summaryrefslogtreecommitdiff
path: root/arch/x86/kernel/apic/io_apic.c
AgeCommit message (Collapse)Author
2010-02-17x86, irq: Remove arch_probe_nr_irqsYinghai Lu
So keep nr_irqs == NR_IRQS. With radix trees is matters less. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-33-git-send-email-yinghai@kernel.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-16x86: Convert i8259_lock to raw_spinlockThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-16x86: Convert ioapic_lock and vector_lock to raw_spinlockThomas Gleixner
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2010-02-10x86: Avoid race condition in pci_enable_msix()Brandon Phiilps
Keep chip_data in create_irq_nr and destroy_irq. When two drivers are setting up MSI-X at the same time via pci_enable_msix() there is a race. See this dmesg excerpt: [ 85.170610] ixgbe 0000:02:00.1: irq 97 for MSI/MSI-X [ 85.170611] alloc irq_desc for 99 on node -1 [ 85.170613] igb 0000:08:00.1: irq 98 for MSI/MSI-X [ 85.170614] alloc kstat_irqs on node -1 [ 85.170616] alloc irq_2_iommu on node -1 [ 85.170617] alloc irq_desc for 100 on node -1 [ 85.170619] alloc kstat_irqs on node -1 [ 85.170621] alloc irq_2_iommu on node -1 [ 85.170625] ixgbe 0000:02:00.1: irq 99 for MSI/MSI-X [ 85.170626] alloc irq_desc for 101 on node -1 [ 85.170628] igb 0000:08:00.1: irq 100 for MSI/MSI-X [ 85.170630] alloc kstat_irqs on node -1 [ 85.170631] alloc irq_2_iommu on node -1 [ 85.170635] alloc irq_desc for 102 on node -1 [ 85.170636] alloc kstat_irqs on node -1 [ 85.170639] alloc irq_2_iommu on node -1 [ 85.170646] BUG: unable to handle kernel NULL pointer dereference at 0000000000000088 As you can see igb and ixgbe are both alternating on create_irq_nr() via pci_enable_msix() in their probe function. ixgbe: While looping through irq_desc_ptrs[] via create_irq_nr() ixgbe choses irq_desc_ptrs[102] and exits the loop, drops vector_lock and calls dynamic_irq_init. Then it sets irq_desc_ptrs[102]->chip_data = NULL via dynamic_irq_init(). igb: Grabs the vector_lock now and starts looping over irq_desc_ptrs[] via create_irq_nr(). It gets to irq_desc_ptrs[102] and does this: cfg_new = irq_desc_ptrs[102]->chip_data; if (cfg_new->vector != 0) continue; This hits the NULL deref. Another possible race exists via pci_disable_msix() in a driver or in the number of error paths that call free_msi_irqs(): destroy_irq() dynamic_irq_cleanup() which sets desc->chip_data = NULL ...race window... desc->chip_data = cfg; Remove the save and restore code for cfg in create_irq_nr() and destroy_irq() and take the desc->lock when checking the irq_cfg. Reported-and-analyzed-by: Brandon Philips <bphilips@suse.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-3-git-send-email-yinghai@kernel.org> Signed-off-by: Brandon Phililps <bphilips@suse.de> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-10x86: Fix SCI on IOAPIC != 0Yinghai Lu
Thomas Renninger <trenn@suse.de> reported on IBM x3330 booting a latest kernel on this machine results in: PCI: PCI BIOS revision 2.10 entry at 0xfd61c, last bus=1 PCI: Using configuration type 1 for base access bio: create slab <bio-0> at 0 ACPI: SCI (IRQ30) allocation failed ACPI Exception: AE_NOT_ACQUIRED, Unable to install System Control Interrupt handler (20090903/evevent-161) ACPI: Unable to start the ACPI Interpreter Later all kind of devices fail... and bisect it down to this commit: commit b9c61b70075c87a8612624736faf4a2de5b1ed30 x86/pci: update pirq_enable_irq() to setup io apic routing it turns out we need to set irq routing for the sci on ioapic1 early. -v2: make it work without sparseirq too. -v3: fix checkpatch.pl warning, and cc to stable Reported-by: Thomas Renninger <trenn@suse.de> Bisected-by: Thomas Renninger <trenn@suse.de> Tested-by: Thomas Renninger <trenn@suse.de> Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <1265793639-15071-2-git-send-email-yinghai@kernel.org> Cc: stable@kernel.org Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-02-07x86: Remove trailing spaces in messagesFrans Pop
Signed-off-by: Frans Pop <elendil@planet.nl> Cc: Avi Kivity <avi@redhat.com> Cc: x86@kernel.org LKML-Reference: <1265478443-31072-10-git-send-email-elendil@planet.nl> [ Left out the KVM bits. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-01-29x86, irq: Move __setup_vector_irq() before the first irq enable in cpu ↵Suresh Siddha
online path Lowest priority delivery of logical flat mode is broken on some systems, such that even when IO-APIC RTE says deliver the interrupt to a particular CPU, interrupt subsystem delivers the interrupt to totally different CPU. For example, this behavior was observed on a P4 based system with SiS chipset which was reported by Li Zefan. We have been handling this kind of behavior by making sure that in logical flat mode, we assign the same vector to irq mappings on all the 8 possible logical cpu's. But we have been doing this initial assignment (__setup_vector_irq()) a little late (before which interrupts were already enabled for a short duration). Move the __setup_vector_irq() before the first irq enable point in the cpu online path to avoid the issue of not handling some interrupts that wrongly hit the cpu which is still coming online. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100129194330.283696385@sbs-t61.sc.intel.com> Tested-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-29x86, irq: Update the vector domain for legacy irqs handled by io-apicSuresh Siddha
In the recent change of not reserving IRQ0_VECTOR..IRQ15_VECTOR's on all cpu's, we start with irq 0..15 getting directed to (and handled on) cpu-0. In the logical flat mode, once the AP's are online (and before irqbalance comes into picture), kernel intends to handle these IRQ's on any cpu (as the logical flat mode allows to specify multiple cpu's for the irq destination and the chipset based routing can deliver to the interrupt to any one of the specified cpu's). This was broken with our recent change, which was ending up using only cpu 0 as the destination, even when the kernel was specifying to use all online cpu's for the logical flat mode case. Fix this by updating vector allocation domain (cfg->domain) for legacy irqs, when the IO-APIC handles them. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100129194330.207790269@sbs-t61.sc.intel.com> Tested-by: Li Zefan <lizf@cn.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-19x86, irq: Don't block IRQ0_VECTOR..IRQ15_VECTOR's on all cpu'sSuresh Siddha
Currently IRQ0..IRQ15 are assigned to IRQ0_VECTOR..IRQ15_VECTOR's on all the cpu's. If these IRQ's are handled by legacy pic controller, then the kernel handles them only on cpu 0. So there is no need to block this vector space on all cpu's. Similarly if these IRQ's are handled by IO-APIC, then the IRQ affinity will determine on which cpu's we need allocate the vector resource for that particular IRQ. This can be done dynamically and here also there is no need to block 16 vectors for IRQ0..IRQ15 on all cpu's. Fix this by initially assigning IRQ0..IRQ15 to IRQ0_VECTOR..IRQ15_VECTOR's only on cpu 0. If the legacy controllers like pic handles these irq's, then this configuration will be fixed. If more modern controllers like IO-APIC handle these IRQ's, then we start with this configuration and as IRQ's migrate, vectors (/and cpu's) associated with these IRQ's change dynamically. This will freeup the block of 16 vectors on other cpu's which don't handle IRQ0..IRQ15, which can now be used for other IRQ's that the particular cpu handle. [ hpa: this also an architectural cleanup for future legacy-PIC-free configurations. ] [ hpa: fixed typo NR_LEGACY_IRQS -> NR_IRQS_LEGACY ] Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1263932453.2814.52.camel@sbs-t61.sc.intel.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-18x86, irq: Use 0x20 for the IRQ_MOVE_CLEANUP_VECTOR instead of 0x1fSuresh Siddha
After talking to some more folks inside intel (Peter Anvin, Asit Mallick), the safest option (for future compatibility etc) seen was to use vector 0x20 for IRQ_MOVE_CLEANUP_VECTOR instead of using vector 0x1f (which is documented as reserved vector in the Intel IA32 manuals). Also we don't need to reserve the entire privilege level (all 16 vectors in the priority bucket that IRQ_MOVE_CLEANUP_VECTOR falls into), as the x86 architecture (section 10.9.3 in SDM Vol3a) specifies that with in the priority level, the higher the vector number the higher the priority. And hence we don't need to reserve the complete priority level 0x20-0x2f for the IRQ migration cleanup logic. So change the IRQ_MOVE_CLEANUP_VECTOR to 0x20 and allow 0x21-0x2f to be used for device interrupts. 0x30-0x3f will be used for ISA interrupts (these also can be migrated in the context of IOAPIC and hence need to be at a higher priority level than IRQ_MOVE_CLEANUP_VECTOR). Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <20100114002118.521826763@sbs-t61.sc.intel.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Eric W. Biederman <ebiederm@xmission.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-06x86, irq: Check move_in_progress before freeing the vector mappingSuresh Siddha
With the recent irq migration fixes (post 2.6.32), Gary Hade has noticed "No IRQ handler for vector" messages during the 2.6.33-rc1 kernel boot on IBM AMD platforms and root caused the issue to this commit: > commit 23359a88e7eca3c4f402562b102f23014db3c2aa > Author: Suresh Siddha <suresh.b.siddha@intel.com> > Date: Mon Oct 26 14:24:33 2009 -0800 > > x86: Remove move_cleanup_count from irq_cfg As part of this patch, we have removed the move_cleanup_count check in smp_irq_move_cleanup_interrupt(). With this change, we can run into a situation where an irq cleanup interrupt on a cpu can cleanup the vector mappings associated with multiple irqs, of which one of the irq's migration might be still in progress. As such when that irq hits the old cpu, we get the "No IRQ handler" messages. Fix this by checking for the irq_cfg's move_in_progress and if the move is still in progress delay the vector cleanup to another irq cleanup interrupt request (which will happen when the irq starts arriving at the new cpu destination). Reported-and-tested-by: Gary Hade <garyhade@us.ibm.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1262804191.2732.7.camel@sbs-t61.sc.intel.com> Cc: Eric W. Biederman <ebiederm@xmission.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2010-01-04x86, apic: Don't waste a vector to improve vector spreadH. Peter Anvin
We want to use a vector-assignment sequence that avoids stumbling onto 0x80 earlier in the sequence, in order to improve the spread of vectors across priority levels on machines with a small number of interrupt sources. Right now, this is done by simply making the first vector (0x31 or 0x41) completely unusable. This is unnecessary; all we need is to start assignment at a +1 offset, we don't actually need to prohibit the usage of this vector once we have wrapped around. Signed-off-by: H. Peter Anvin <hpa@zytor.com> LKML-Reference: <4B426550.6000209@kernel.org>
2009-12-30x86: Increase NR_IRQS and nr_irqsYinghai Lu
I have a system with lots of igb and ixgbe, when iov/vf are enabled for them, we hit the limit of 3064. when system has 20 pcie installed, and one card has 2 functions, and one function needs 64 msi-x, may need 20 * 2 * 64 = 2560 for msi-x but if iov and vf are enabled may need 20 * 2 * 64 * 3 = 7680 for msi-x assume system with 5 ioapic, nr_irqs_gsi will be 120. NR_CPUS = 512, and nr_cpu_ids = 128 will have NR_IRQS = 256 + 512 * 64 = 33024 will have nr_irqs = 120 + 8 * 128 + 120 * 64 = 8824 When SPARSE_IRQ is not set, there is no increase with kernel data size. when NR_CPUS=128, and SPARSE_IRQ is set: text data bss dec hex filename 21837444 4216564 12480736 38534744 24bfe58 vmlinux.before 21837442 4216580 12480736 38534758 24bfe66 vmlinux.after when NR_CPUS=4096, and SPARSE_IRQ is set text data bss dec hex filename 21878619 5610244 13415392 40904255 270263f vmlinux.before 21878617 5610244 13415392 40904253 270263d vmlinux.after Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> LKML-Reference: <4B398ECD.1080506@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-28Merge branch 'iommu/fixes' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/joro/linux-2.6-iommu into x86/urgent
2009-12-19Merge branch 'x86-fixes-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu system Makefile: Unexport LC_ALL instead of clearing it x86: Fix objdump version check in arch/x86/tools/chkobjdump.awk x86: Reenable TSC sync check at boot, even with NONSTOP_TSC x86: Don't use POSIX character classes in gen-insn-attr-x86.awk Makefile: set LC_CTYPE, LC_COLLATE, LC_NUMERIC to C x86: Increase MAX_EARLY_RES; insufficient on 32-bit NUMA x86: Fix checking of SRAT when node 0 ram is not from 0 x86, cpuid: Add "volatile" to asm in native_cpuid() x86, msr: msrs_alloc/free for CONFIG_SMP=n x86, amd: Get multi-node CPU info from NodeId MSR instead of PCI config space x86: Add IA32_TSC_AUX MSR and use it x86, msr/cpuid: Register enough minors for the MSR and CPUID drivers initramfs: add missing decompressor error check bzip2: Add missing checks for malloc returning NULL bzip2/lzma/gzip: pre-boot malloc doesn't return NULL on failure
2009-12-17x86, irq: Allow 0xff for /proc/irq/[n]/smp_affinity on an 8-cpu systemSuresh Siddha
John Blackwood reported: > on an older Dell PowerEdge 6650 system with 8 cpus (4 are hyper-threaded), > and 32 bit (x86) kernel, once you change the irq smp_affinity of an irq > to be less than all cpus in the system, you can never change really the > irq smp_affinity back to be all cpus in the system (0xff) again, > even though no error status is returned on the "/bin/echo ff > > /proc/irq/[n]/smp_affinity" operation. > > This is due to that fact that BAD_APICID has the same value as > all cpus (0xff) on 32bit kernels, and thus the value returned from > set_desc_affinity() via the cpu_mask_to_apicid_and() function is treated > as a failure in set_ioapic_affinity_irq_desc(), and no affinity changes > are made. set_desc_affinity() is already checking if the incoming cpu mask intersects with the cpu online mask or not. So there is no need for the apic op cpu_mask_to_apicid_and() to check again and return BAD_APICID. Remove the BAD_APICID return value from cpu_mask_to_apicid_and() and also fix set_desc_affinity() to return -1 instead of using BAD_APICID to represent error conditions (as cpu_mask_to_apicid_and() can return logical or physical apicid values and BAD_APICID is really to represent bad physical apic id). Reported-by: John Blackwood <john.blackwood@ccur.com> Root-caused-by: John Blackwood <john.blackwood@ccur.com> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> LKML-Reference: <1261103386.2535.409.camel@sbs-t61> Signed-off-by: H. Peter Anvin <hpa@zytor.com>
2009-12-14genirq: Convert irq_desc.lock to raw_spinlockThomas Gleixner
Convert locks which cannot be sleeping locks in preempt-rt to raw_spinlocks. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Acked-by: Ingo Molnar <mingo@elte.hu>
2009-12-08Merge branch 'timers-for-linus-hpet' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'timers-for-linus-hpet' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: x86: hpet: Make WARN_ON understandable x86: arch specific support for remapping HPET MSIs intr-remap: generic support for remapping HPET MSIs x86, hpet: Simplify the HPET code x86, hpet: Disable per-cpu hpet timer if ARAT is supported
2009-12-02x86, ioapic: Document another case when level irq is seen as an edgeSuresh Siddha
In the case when cpu goes offline, fixup_irqs() will forward any unhandled interrupt on the offlined cpu to the new cpu destination that is handling the corresponding interrupt. This interrupt forwarding is done via IPI's. Hence, in this case also level-triggered io-apic interrupt will be seen as an edge interrupt in the cpu's APIC IRR. Document this scenario in the code which handles this case by doing an explicit EOI to the io-apic to clear remote IRR of the io-apic RTE. Requested-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: ebiederm@xmission.com Cc: garyhade@us.ibm.com LKML-Reference: <20091201233335.143970505@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02x86, ioapic: Fix the EOI register detection mechanismSuresh Siddha
Maciej W. Rozycki reported: > 82093AA I/O APIC has its version set to 0x11 and it > does not support the EOI register. Similarly I/O APICs > integrated into the 82379AB south bridge and the 82374EB/SB > EISA component. IO-APIC versions below 0x20 don't support EOI register. Some of the Intel ICH Specs (ICH2 to ICH5) documents the io-apic version as 0x2. This is an error with documentation and these ICH chips use io-apic's of version 0x20 and indeed has a working EOI register for the io-apic. Fix the EOI register detection mechanism to check for version 0x20 and beyond. And also, a platform can potentially have io-apic's with different versions. Make the EOI register check per io-apic. Reported-by: Maciej W. Rozycki <macro@linux-mips.org> Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: ebiederm@xmission.com Cc: garyhade@us.ibm.com LKML-Reference: <20091201233335.065361533@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-12-02x86, io-apic: Move the effort of clearing remoteIRR explicitly before ↵Maciej W. Rozycki
migrating the irq When the level-triggered interrupt is seen as an edge interrupt, we try to clear the remoteIRR explicitly (using either an io-apic eoi register when present or through the idea of changing trigger mode of the io-apic RTE to edge and then back to level). But this explicit try also needs to happen before we try to migrate the irq. Otherwise irq migration attempt will fail anyhow, as it postpones the irq migration to a later attempt when it sees the remoteIRR in the io-apic RTE still set. Signed-off-by: "Maciej W. Rozycki" <macro@linux-mips.org> Reviewed-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: ebiederm@xmission.com Cc: garyhade@us.ibm.com LKML-Reference: <20091201233334.975416130@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-23x86: Re-get cfg_new in case reuse/move irq_descYinghai Lu
When irq_desc is moved, we need to make sure to use the right cfg_new. Signed-off-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <4B07A739.3030104@kernel.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-16x86: io-apic: IO-APIC MMIO should not fail on resource insertionCyrill Gorcunov
If IO-APIC base address is 1K aligned we should not fail on resourse insertion procedure. For this sake we define IO_APIC_SLOT_SIZE constant which should cover all IO-APIC direct accessible registers. An example of a such configuration is there http://marc.info/?l=linux-kernel&m=118114792006520 | | Quoting the message | | IOAPIC[0]: apic_id 2, version 32, address 0xfec00000, GSI 0-23 | IOAPIC[1]: apic_id 3, version 32, address 0xfec80000, GSI 24-47 | IOAPIC[2]: apic_id 4, version 32, address 0xfec80400, GSI 48-71 | IOAPIC[3]: apic_id 5, version 32, address 0xfec84000, GSI 72-95 | IOAPIC[4]: apic_id 8, version 32, address 0xfec84400, GSI 96-119 | Reported-by: "Maciej W. Rozycki" <macro@linux-mips.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Acked-by: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20091116151426.GC5653@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-10x86: apic: Do not use stacked physid_mask_tCyrill Gorcunov
We should not use physid_mask_t as a stack based variable in apic code. This type depends on MAX_APICS parameter which may be huge enough. Especially it became a problem with apic NOOP driver which is portable between 32 bit and 64 bit environment (where we have really huge MAX_APICS). So apic driver should operate with pointers and a caller in turn should aware of allocation physid_mask_t variable. As a side (but positive) effect -- we may use already implemented physid_set_mask_of_physid function eliminating default_apicid_to_cpu_present completely. Note that physids_coerce and physids_promote turned into static inline from macro (since macro hides the fact that parameter is being interpreted as unsigned long, make it explicit). Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Maciej W. Rozycki <macro@linux-mips.org> Cc: Stephen Rothwell <sfr@canb.auug.org.au> LKML-Reference: <20091109220659.GA5568@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08x86, ioapic: Use snrpintf while set names for IO-APIC resoursesCyrill Gorcunov
We should be ready that one day MAX_IO_APICS may raise its number. To prevent memory overwrite we're to use safe snprintf while set IO-APIC resourse name. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20091108155431.GC25940@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-08x86, apic: Use PAGE_SIZE instead of numbersCyrill Gorcunov
The whole page is reserved for IO-APIC fixmap due to non-cacheable requirement. So lets note this explicitly instead of playing with numbers. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Maciej W. Rozycki <macro@linux-mips.org> LKML-Reference: <20091108155356.GB25940@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02x86: Use EOI register in io-apic on intel platformsSuresh Siddha
IO-APIC's in intel chipsets support EOI register starting from IO-APIC version 2. Use that when ever we need to clear the IO-APIC RTE's RemoteIRR bit explicitly. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Gary Hade <garyhade@us.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> LKML-Reference: <20091026230001.947855317@sbs-t61.sc.intel.com> [ Marked use_eio_reg as __read_mostly, fixed small details ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02x86: Force irq complete move during cpu offlineSuresh Siddha
When a cpu goes offline, fixup_irqs() try to move irq's currently destined to the offline cpu to a new cpu. But this attempt will fail if the irq is recently moved to this cpu and the irq still hasn't arrived at this cpu (for non intr-remapping platforms this is when we free the vector allocation at the previous destination) that is about to go offline. This will endup with the interrupt subsystem still pointing the irq to the offline cpu, causing that irq to not work any more. Fix this by forcing the irq to complete its move (its been a long time we moved the irq to this cpu which we are offlining now) and then move this irq to a new cpu before this cpu goes offline. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Gary Hade <garyhade@us.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> LKML-Reference: <20091026230001.848830905@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-11-02x86: Remove move_cleanup_count from irq_cfgSuresh Siddha
move_cleanup_count for each irq in irq_cfg is keeping track of the total number of cpus that need to free the corresponding vectors associated with the irq which has now been migrated to new destination. As long as this move_cleanup_count is non-zero (i.e., as long as we have n't freed the vector allocations on the old destinations) we were preventing the irq's further migration. This cleanup count is unnecessary and it is enough to not allow the irq migration till we send the cleanup vector to the previous irq destination, for which we already have irq_cfg's move_in_progress. All we need to make sure is that we free the vector at the old desintation but we don't need to wait till that gets freed. Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Acked-by: Gary Hade <garyhade@us.ibm.com> Cc: Eric W. Biederman <ebiederm@xmission.com> LKML-Reference: <20091026230001.752968906@sbs-t61.sc.intel.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14x86, apic: Move SGI UV functionality out of generic IO-APIC codeDimitri Sivanich
Move UV specific functionality out of the generic IO-APIC code. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> LKML-Reference: <20091013203236.GD20543@sgi.com> [ Cleaned up the code some more in their new places. ] Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14x86: SGI UV: Fix irq affinity for hub based interruptsDimitri Sivanich
This patch fixes handling of uv hub irq affinity. IRQs with ALL or NODE affinity can be routed to cpus other than their originally assigned cpu. Those with CPU affinity cannot be rerouted. Signed-off-by: Dimitri Sivanich <sivanich@sgi.com> LKML-Reference: <20090930160259.GA7822@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-10-14x86, apic: Limit apic dumping, introduce new show_lapic= setup optionCyrill Gorcunov
In case if a system has a large number of cpus printing apics contents may consume a long time period. We limit such an output by 1 apic by default. But to have an ability to see all apics or some part of them we introduce "show_lapic" setup option which allow us to limit/unlimit the number of APICs being dumped. Example: apic=debug show_lapic=5, or apic=debug show_lapic=all Also move apic_verbosity checking upper that way so helper routines do not need to inspect it at all. Suggested-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: yinghai@kernel.org Cc: macro@linux-mips.org LKML-Reference: <20091013201022.926793122@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-24cpumask: use zalloc_cpumask_var() where possibleLi Zefan
Remove open-coded zalloc_cpumask_var() and zalloc_cpumask_var_node(). Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
2009-09-20Merge branch 'linus' into x86/urgentIngo Molnar
Merge reason: Bring in changes that the next patch will depend on. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-20x86, apic: Fix missed handling of discrete apicsCyrill Gorcunov
In case of discrete (pretty old) apics we may have cpu_has_apic bit not set but have to check if smp_found_config (MP spec) is there and apic was not disabled. Also don't forget to print apic/io-apic for such case as well. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Maciej W. Rozycki" <macro@linux-mips.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090915071230.GA10604@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-09-18Merge branch 'x86-platform-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip * 'x86-platform-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (38 commits) x86: Move get/set_wallclock to x86_platform_ops x86: platform: Fix section annotations x86: apic namespace cleanup x86: Distangle ioapic and i8259 x86: Add Moorestown early detection x86: Add hardware_subarch ID for Moorestown x86: Add early platform detection x86: Move tsc_init to late_time_init x86: Move tsc_calibration to x86_init_ops x86: Replace the now identical time_32/64.c by time.c x86: time_32/64.c unify profile_pc x86: Move calibrate_cpu to tsc.c x86: Make timer setup and global variables the same in time_32/64.c x86: Remove mca bus ifdef from timer interrupt x86: Simplify timer_ack magic in time_32.c x86: Prepare unification of time_32/64.c x86: Remove do_timer hook x86: Add timer_init to x86_init_ops x86: Move percpu clockevents setup to x86_init_ops x86: Move xen_post_allocator_init into xen_pagetable_setup_done ... Fix up conflicts in arch/x86/include/asm/io_apic.h
2009-08-31x86: Distangle ioapic and i8259Thomas Gleixner
The proposed Moorestown support patches use an extra feature flag mechanism to make the ioapic work w/o an i8259. There is a much simpler solution. Most i8259 specific functions are already called dependend on the irq number less than NR_IRQS_LEGACY. Replacing that constant by a read_mostly variable which can be set to 0 by the platform setup code allows us to achieve the same without any special feature flags. That trivial change allows us to proceed with MRST w/o doing a full blown overhaul of the ioapic code which would delay MRST unduly. Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-29Merge branch 'for-ingo' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux-sfi-2.6 into x86/apic Merge reason: the SFI (Simple Firmware Interface) feature in the ACPI tree needs this cleanup, pull it into the APIC branch as well so that there's no interactions. Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-28ACPI, x86: expose some IO-APIC routines when CONFIG_ACPI=nFeng Tang
Some IO-APIC routines are ACPI specific now, but need to be exposed when CONFIG_ACPI=n for the benefit of SFI. Remove #ifdef ACPI around these routines: io_apic_get_unique_id(int ioapic, int apic_id); io_apic_get_version(int ioapic); io_apic_get_redir_entries(int ioapic); Move these routines from ACPI-specific boot.c to io_apic.c: uniq_ioapic_id(u8 id) mp_find_ioapic() mp_find_ioapic_pin() mp_register_ioapic() Also, since uniq_ioapic_id() is now no longer static, re-name it to io_apic_unique_id() for consistency with the other public io_apic routines. For simplicity, do not #ifdef the resulting code ACPI || SFI, thought that could be done in the future if it is important to optimize the !ACPI !SFI IO-APIC x86 kernel for size. Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Len Brown <len.brown@intel.com> Cc: x86@kernel.org
2009-08-27x86: arch specific support for remapping HPET MSIsSuresh Siddha
x86 arch support for remapping HPET MSI's by associating the HPET timer block with the interrupt-remapping HW unit and setting up appropriate irq_chip Signed-off-by: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Jesse Barnes <jbarnes@virtuousgeek.org> Cc: Jay Fenlason <fenlason@redhat.com> LKML-Reference: <20090804190729.630510000@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-27x86: Move ioapic_ids_setup to x86_init_opsThomas Gleixner
32bit and also the numaq code have special requirements on the ioapic_id setup. Convert it to a x86_init_ops function and get rid of the quirks and #ifdefs Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2009-08-26x86, ioapic: Get rid of needless check and simplify ioapic_setup_resources()Cyrill Gorcunov
alloc_bootmem() already panics on allocation failure. There is no need to check the result. Also there is a way to unbind global variable from its body and use it as a parameter which allow us to simplify ioapic_init_mappings as well -- "for" cycle already uses nr_ioapics as a conditional variable and there is no need to check if ioapic_setup_resources was returning NULL again. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090824175551.493629148@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-08x86, ioapic: Panic on irq-pin binding only if neededCyrill Gorcunov
Though the most time we are to panic on irq-pin allocation fails, for PCI interrupts it's not the case and we could continue operate even if irq-pin allocation failed. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Yinghai Lu <yinghai@kernel.org> LKML-Reference: <20090805200931.GB5319@lenovo> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-05x86, ioapic: Throw BUG instead of NULL dereferenceCyrill Gorcunov
Instead of plain NULL deref we better throw error message with a backtrace. Actually we need more gracious error handling here. Meanwhile leave it as is. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: yinghai@kernel.org LKML-Reference: <20090801075435.769301745@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-05x86, ioapic: Introduce for_each_irq_pin() helperCyrill Gorcunov
This allow us to save a few lines of code. Signed-off-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: yinghai@kernel.org LKML-Reference: <20090801075435.597863129@openvz.org> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-04x86, UV: Complete IRQ interrupt migration in arch_enable_uv_irq()Jack Steiner
In uv_setup_irq(), the call to create_irq() initially assigns IRQ vectors to cpu 0. The subsequent call to assign_irq_vector() in arch_enable_uv_irq() migrates the IRQ to another cpu and frees the cpu 0 vector - at least it will be freed as soon as the "IRQ move" completes. arch_enable_uv_irq() needs to send a cleanup IPI to complete the IRQ move. Otherwise, assignment of GRU interrupts on large systems (>200 cpus) will exhaust the cpu 0 interrupt vectors and initialization of the GRU driver will fail. Signed-off-by: Jack Steiner <steiner@sgi.com> LKML-Reference: <20090720142840.GA8885@sgi.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-08-02x86: Remove superfluous NULL pointer check in destroy_irq()Bartlomiej Zolnierkiewicz
This takes care of the following entry from Dan's list: arch/x86/kernel/apic/io_apic.c +3241 destroy_irq(11) warning: variable derefenced before check 'desc' Reported-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Eugene Teo <eteo@redhat.com> Cc: Julia Lawall <julia@diku.dk> LKML-Reference: <200907302321.19086.bzolnier@gmail.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-07-14x86/ioapic.c: unify ioapic_retrigger_irq()Jeremy Fitzhardinge
The 32 and 64-bit versions of ioapic_retrigger_irq() are identical except the 64-bit one takes vector_lock. vector_lock is defined and used on 32-bit too, so just use a common ioapic_retrigger_irq(). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14x86/ioapic.c: convert __target_IO_APIC_irq to conventional for() loopJeremy Fitzhardinge
Use a normal for() loop in __target_IO_APIC_irq(). Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>
2009-07-14x86/ioapic.c: clean up replace_pin_at_irq_node logic and commentsJeremy Fitzhardinge
There's no need for a control variable in replace_pin_at_irq_node(); it can just return if it finds the old apic/pin to replace. If the loop terminates, then it didn't find the old apic/pin, so it can add the new ones. Signed-off-by: Jeremy Fitzhardinge <jeremy.fitzhardinge@citrix.com>