summaryrefslogtreecommitdiff
path: root/arch/x86/kernel
AgeCommit message (Collapse)Author
2013-01-31x86/common.c: Make have_cpuid_p() a global functionFenghua Yu
Remove static declaration in have_cpuid_p() to make it a global function. The function will be called in early loading microcode. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-4-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31x86/microcode_intel.h: Define functions and macros for early loading ucodeFenghua Yu
Define some functions and macros that will be used in early loading ucode. Some of them are moved from microcode_intel.c driver in order to be called in early boot phase before module can be called. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Link: http://lkml.kernel.org/r/1356075872-3054-3-git-send-email-fenghua.yu@intel.com Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-31Merge branch 'timers/for-arm' into timers/coreThomas Gleixner
2013-01-31perf: Make EVENT_ATTR globalSukadev Bhattiprolu
Rename EVENT_ATTR() to PMU_EVENT_ATTR() and make it global so it is available to all architectures. Further to allow architectures flexibility, have PMU_EVENT_ATTR() pass in the variable name as a parameter. Changelog[v2] - [Jiri Olsa] No need to define PMU_EVENT_PTR() Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com> Acked-by: Jiri Olsa <jolsa@redhat.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Anton Blanchard <anton@au1.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Robert Richter <robert.richter@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: linuxppc-dev@ozlabs.org Link: http://lkml.kernel.org/r/20130123062422.GC13720@us.ibm.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-01-31Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Peter Anvin: "This is a collection of miscellaneous fixes, the most important one is the fix for the Samsung laptop bricking issue (auto-blacklisting the samsung-laptop driver); the efi_enabled() changes you see below are prerequisites for that fix. The other issues fixed are booting on OLPC XO-1.5, an UV fix, NMI debugging, and requiring CAP_SYS_RAWIO for MSR references, just as with I/O port references." * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: samsung-laptop: Disable on EFI hardware efi: Make 'efi_enabled' a function to query EFI facilities smp: Fix SMP function call empty cpu mask race x86/msr: Add capabilities check x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES x86/olpc: Fix olpc-xo1-sci.c build errors arch/x86/platform/uv: Fix incorrect tlb flush all issue x86-64: Fix unwind annotations in recent NMI changes x86-32: Start out cr0 clean, disable paging before modifying cr3/4
2013-01-30efi: Make 'efi_enabled' a function to query EFI facilitiesMatt Fleming
Originally 'efi_enabled' indicated whether a kernel was booted from EFI firmware. Over time its semantics have changed, and it now indicates whether or not we are booted on an EFI machine with bit-native firmware, e.g. 64-bit kernel with 64-bit firmware. The immediate motivation for this patch is the bug report at, https://bugs.launchpad.net/ubuntu-cdimage/+bug/1040557 which details how running a platform driver on an EFI machine that is designed to run under BIOS can cause the machine to become bricked. Also, the following report, https://bugzilla.kernel.org/show_bug.cgi?id=47121 details how running said driver can also cause Machine Check Exceptions. Drivers need a new means of detecting whether they're running on an EFI machine, as sadly the expression, if (!efi_enabled) hasn't been a sufficient condition for quite some time. Users actually want to query 'efi_enabled' for different reasons - what they really want access to is the list of available EFI facilities. For instance, the x86 reboot code needs to know whether it can invoke the ResetSystem() function provided by the EFI runtime services, while the ACPI OSL code wants to know whether the EFI config tables were mapped successfully. There are also checks in some of the platform driver code to simply see if they're running on an EFI machine (which would make it a bad idea to do BIOS-y things). This patch is a prereq for the samsung-laptop fix patch. Cc: David Airlie <airlied@linux.ie> Cc: Corentin Chary <corentincj@iksaif.net> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Cc: Dave Jiang <dave.jiang@intel.com> Cc: Olof Johansson <olof@lixom.net> Cc: Peter Jones <pjones@redhat.com> Cc: Colin Ian King <colin.king@canonical.com> Cc: Steve Langasek <steve.langasek@canonical.com> Cc: Tony Luck <tony.luck@intel.com> Cc: Konrad Rzeszutek Wilk <konrad@kernel.org> Cc: Rafael J. Wysocki <rjw@sisk.pl> Cc: <stable@vger.kernel.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86: Merge early kernel reserve for 32bit and 64bitYinghai Lu
They are the same, and we could move them out from head32/64.c to setup.c. We are using memblock, and it could handle overlapping properly, so we don't need to reserve some at first to hold the location, and just need to make sure we reserve them before we are using memblock to find free mem to use. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-32-git-send-email-yinghai@kernel.org Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86: Add Crash kernel low reservationYinghai Lu
During kdump kernel's booting stage, it need to find low ram for swiotlb buffer when system does not support intel iommu/dmar remapping. kexed-tools is appending memmap=exactmap and range from /proc/iomem with "Crash kernel", and that range is above 4G for 64bit after boot protocol 2.12. We need to add another range in /proc/iomem like "Crash kernel low", so kexec-tools could find that info and append to kdump kernel command line. Try to reserve some under 4G if the normal "Crash kernel" is above 4G. User could specify the size with crashkernel_low=XX[KMG]. -v2: fix warning that is found by Fengguang's test robot. -v3: move out get_mem_size change to another patch, to solve compiling warning that is found by Borislav Petkov <bp@alien8.de> -v4: user must specify crashkernel_low if system does not support intel or amd iommu. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-31-git-send-email-yinghai@kernel.org Cc: Eric Biederman <ebiederm@xmission.com> Cc: Rob Landley <rob@landley.net> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, kdump: Remove crashkernel range find limit for 64bitYinghai Lu
Now kexeced kernel/ramdisk could be above 4g, so remove 896 limit for 64bit. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-30-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29memblock: Add memblock_mem_size()Yinghai Lu
Use it to get mem size under the limit_pfn. to replace local version in x86 reserved_initrd. -v2: remove not needed cast that is pointed out by HPA. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-29-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, boot: Not need to check setup_header version for setup_dataYinghai Lu
That is for bootloaders. setup_data is in setup_header, and bootloader is copying that from bzImage. So for old bootloader should keep that as 0 already. old kexec-tools till now for elf image set setup_data to 0, so it is ok. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-28-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, boot: Support loading bzImage, boot_params and ramdisk above 4GYinghai Lu
xloadflags bit 1 indicates that we can load the kernel and all data structures above 4G; it is set if kernel is relocatable and 64bit. bootloader will check if xloadflags bit 1 is set to decide if it could load ramdisk and kernel high above 4G. bootloader will fill value to ext_ramdisk_image/size for high 32bits when it load ramdisk above 4G. kernel use get_ramdisk_image/size to use ext_ramdisk_image/size to get right positon for ramdisk. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Cc: Rob Landley <rob@landley.net> Cc: Matt Fleming <matt.fleming@intel.com> Cc: Gokul Caushik <caushik1@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joe Millenbach <jmillenbach@gmail.com> Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, kexec, 64bit: Only set ident mapping for ram.Yinghai Lu
We should set mappings only for usable memory ranges under max_pfn Otherwise causes same problem that is fixed by x86, mm: Only direct map addresses that are marked as E820_RAM This patch exposes pfn_mapped array, and only sets ident mapping for ranges in that array. This patch relies on new kernel_ident_mapping_init that could handle existing pgd/pud between different calls. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-25-git-send-email-yinghai@kernel.org Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, kexec: Replace ident_mapping_init and init_level4_pageYinghai Lu
Now ident_mapping_init is checking if pgd/pud is present for every 2M, so several 2Ms are in same PUD, it will keep checking if pud is there with same pud. init_level4_page just does not check existing pgd/pud. We could use generic mapping_init with different settings in info to replace those two local grown version functions. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-24-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, kexec: Set ident mapping for kernel that is above max_pfnYinghai Lu
When first kernel is booted with memmap= or mem= to limit max_pfn. kexec can load second kernel above that max_pfn. We need to set ident mapping for whole image in this case instead of just for first 2M. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-23-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, boot: Add get_cmd_line_ptr()Yinghai Lu
Add an accessor function for the command line address. Later we will add support for holding a 64-bit address via ext_cmd_line_ptr. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-17-git-send-email-yinghai@kernel.org Cc: Gokul Caushik <caushik1@gmail.com> Cc: Josh Triplett <josh@joshtriplett.org> Cc: Joe Millenbach <jmillenbach@gmail.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86: Add get_ramdisk_image/size()Yinghai Lu
There are several places to find ramdisk information early for reserving and relocating. Use accessor functions to make code more readable and consistent. Later will add ext_ramdisk_image/size in those functions to support loading ramdisk above 4g. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-16-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86: Merge early_reserve_initrd for 32bit and 64bitYinghai Lu
They are the same, could move them out from head32/64.c to setup.c. We are using memblock, and it could handle overlapping properly, so we don't need to reserve some at first to hold the location, and just need to make sure we reserve them before we are using memblock to find free mem to use. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-15-git-send-email-yinghai@kernel.org Reviewed-by: Pekka Enberg <penberg@kernel.org> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, 64bit: Don't set max_pfn_mapped wrong value early on native pathYinghai Lu
We are not having max_pfn_mapped set correctly until init_memory_mapping. So don't print its initial value for 64bit Also need to use KERNEL_IMAGE_SIZE directly for highmap cleanup. -v2: update comments about max_pfn_mapped according to Stefano Stabellini. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-14-git-send-email-yinghai@kernel.org Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, 64bit: #PF handler set page to cover only 2M per #PFYinghai Lu
We only map a single 2 MiB page per #PF, even though we should be able to do this a full gigabyte at a time with no additional memory cost. This is a workaround for a broken AMD reference BIOS (and its derivatives in shipping system) which maps a large chunk of memory as WB in the MTRR system but will #MC if the processor wanders off and tries to prefetch that memory, which can happen any time the memory is mapped in the TLB. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-13-git-send-email-yinghai@kernel.org Cc: Alexander Duyck <alexander.h.duyck@intel.com> [ hpa: rewrote the patch description ] Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, 64bit: Use a #PF handler to materialize early mappings on demandH. Peter Anvin
Linear mode (CR0.PG = 0) is mutually exclusive with 64-bit mode; all 64-bit code has to use page tables. This makes it awkward before we have first set up properly all-covering page tables to access objects that are outside the static kernel range. So far we have dealt with that simply by mapping a fixed amount of low memory, but that fails in at least two upcoming use cases: 1. We will support load and run kernel, struct boot_params, ramdisk, command line, etc. above the 4 GiB mark. 2. need to access ramdisk early to get microcode to update that as early possible. We could use early_iomap to access them too, but it will make code to messy and hard to be unified with 32 bit. Hence, set up a #PF table and use a fixed number of buffers to set up page tables on demand. If the buffers fill up then we simply flush them and start over. These buffers are all in __initdata, so it does not increase RAM usage at runtime. Thus, with the help of the #PF handler, we can set the final kernel mapping from blank, and switch to init_level4_pgt later. During the switchover in head_64.S, before #PF handler is available, we use three pages to handle kernel crossing 1G, 512G boundaries with sharing page by playing games with page aliasing: the same page is mapped twice in the higher-level tables with appropriate wraparound. The kernel region itself will be properly mapped; other mappings may be spurious. early_make_pgtable is using kernel high mapping address to access pages to set page table. -v4: Add phys_base offset to make kexec happy, and add init_mapping_kernel() - Yinghai -v5: fix compiling with xen, and add back ident level3 and level2 for xen also move back init_level4_pgt from BSS to DATA again. because we have to clear it anyway. - Yinghai -v6: switch to init_level4_pgt in init_mem_mapping. - Yinghai -v7: remove not needed clear_page for init_level4_page it is with fill 512,8,0 already in head_64.S - Yinghai -v8: we need to keep that handler alive until init_mem_mapping and don't let early_trap_init to trash that early #PF handler. So split early_trap_pf_init out and move it down. - Yinghai -v9: switchover only cover kernel space instead of 1G so could avoid touch possible mem holes. - Yinghai -v11: change far jmp back to far return to initial_code, that is needed to fix failure that is reported by Konrad on AMD systems. - Yinghai Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-12-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, realmode: Separate real_mode reserve and setupYinghai Lu
After we switch to use #PF handler help to set page table, init_level4_pgt will only have entries set after init_mem_mapping(). We need to move copying init_level4_pgt to trampoline_pgd after that. So split reserve and setup, and move the setup after init_mem_mapping() Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-11-git-send-email-yinghai@kernel.org Cc: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86, 64bit: Copy struct boot_params earlyYinghai Lu
We want to support struct boot_params (formerly known as the zero-page, or real-mode data) above the 4 GiB mark. We will have #PF handler to set page table for not accessible ram early, but want to limit it before x86_64_start_reservations to limit the code change to native path only. Also we will need the ramdisk info in struct boot_params to access the microcode blob in ramdisk in x86_64_start_kernel, so copy struct boot_params early makes it accessing ramdisk info simple. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-9-git-send-email-yinghai@kernel.org Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29x86: Factor out e820_add_kernel_range()Yinghai Lu
Separate out the reservation of the kernel static memory areas into a separate function. Also add support for case when memmap=xxM$yyM is used without exactmap. Need to remove reserved range at first before we add E820_RAM range, otherwise added E820_RAM range will be ignored. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-5-git-send-email-yinghai@kernel.org Cc: Jacob Shin <jacob.shin@amd.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29Merge remote-tracking branch 'origin/x86/boot' into x86/mm2H. Peter Anvin
Coming patches to x86/mm2 require the changes and advanced baseline in x86/boot. Resolved Conflicts: arch/x86/kernel/setup.c mm/nobootmem.c Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-29Merge branch 'master' into for-nextJiri Kosina
Conflicts: drivers/devfreq/exynos4_bus.c Sync with Linus' tree to be able to apply patches that are against newer code (mvneta).
2013-01-29x86, boot: Sanitize boot_params if not zeroed on creationH. Peter Anvin
Use the new sentinel field to detect bootloaders which fail to follow protocol and don't initialize fields in struct boot_params that they do not explicitly initialize to zero. Based on an original patch and research by Yinghai Lu. Changed by hpa to be invoked both in the decompression path and in the kernel proper; the latter for the case where a bootloader takes over decompression. Originally-by: Yinghai Lu <yinghai@kernel.org> Link: http://lkml.kernel.org/r/1359058816-7615-26-git-send-email-yinghai@kernel.org Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-28x86, io_apic: Introduce eoi_ioapic_pin call-backJoerg Roedel
This callback replaces the old __eoi_ioapic_pin function which needs a special path for interrupt remapping. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, msi: Introduce x86_msi.compose_msi_msg call-backJoerg Roedel
This call-back points to the right function for initializing the msi_msg structure. The old code for msi_msg generation was split up into the irq-remapped and the default case. The irq-remapped case just calls into the specific Intel or AMD implementation when the device is behind an IOMMU. Otherwise the default function is called. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, irq: Introduce setup_remapped_irq()Joerg Roedel
This function does irq-remapping specific interrupt setup like modifying the chip defaults. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, irq: Move irq_remapped() check into free_remapped_irqJoerg Roedel
The function is called unconditionally now in IO-APIC code removing another irq_remapped() check from x86 core code. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io-apic: Remove !irq_remapped() check from __target_IO_APIC_irq()Joerg Roedel
This function is only called from default_ioapic_set_affinity() which is only used when interrupt remapping is disabled since the introduction of the set_affinity function pointer. So the check will always evaluate as true and can be removed. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io-apic: Move CONFIG_IRQ_REMAP code out of x86 coreJoerg Roedel
Move all the code to either to the header file asm/irq_remapping.h or to drivers/iommu/. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io_apic: Remove irq_remapping_enabled check in setup_timer_IRQ0_pinJoerg Roedel
This function is only called when irq-remapping is disabled. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io_apic: Move irq_remapping_enabled checks out of check_timer()Joerg Roedel
Move these checks to IRQ remapping code by introducing the panic_on_irq_remap() function. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io_apic: Convert setup_ioapic_entry to function pointerJoerg Roedel
This pointer is changed to a different function when IRQ remapping is enabled. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io_apic: Introduce set_affinity function pointerJoerg Roedel
With interrupt remapping a special function is used to change the affinity of an IO-APIC interrupt. Abstract this with a function pointer. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, msi: Use IRQ remapping specific setup_msi_irqs routineJoerg Roedel
Use seperate routines to setup MSI IRQs for both irq_remapping_enabled cases. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, hpet: Introduce x86_msi_ops.setup_hpet_msiJoerg Roedel
This function pointer can be overwritten by the IRQ remapping code. The irq_remapping_enabled check can be removed from default_setup_hpet_msi. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io_apic: Introduce x86_io_apic_ops.print_entries for debuggingJoerg Roedel
This call-back is used to dump IO-APIC entries for debugging purposes into the kernel log. VT-d needs a special routine for this and will overwrite the default. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, io_apic: Introduce x86_io_apic_ops.disable()Joerg Roedel
This function pointer is used to call a system-specific function for disabling the IO-APIC. Currently this is used for IRQ remapping which has its own disable routine. Also introduce the necessary infrastructure in the interrupt remapping code to overwrite this and other function pointers as necessary by interrupt remapping. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, apic: Mask IO-APIC and PIC unconditionally on LAPIC resumeJoerg Roedel
IO-APIC and PIC use the same resume routines when IRQ remapping is enabled or disabled. So it should be safe to mask the other APICs for the IRQ-remapping-disabled case too. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-28x86, apic: Move irq_remapping_enabled checks into IRQ-remapping codeJoerg Roedel
Move the three easy to move checks in the x86' apic.c file into the IRQ-remapping code. Signed-off-by: Joerg Roedel <joro@8bytes.org> Acked-by: Sebastian Andrzej Siewior <sebastian@breakpoint.cc> Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
2013-01-27cputime: Use accessors to read task cputime statsFrederic Weisbecker
This is in preparation for the full dynticks feature. While remotely reading the cputime of a task running in a full dynticks CPU, we'll need to do some extra-computation. This way we can account the time it spent tickless in userspace since its last cputime snapshot. Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Li Zhong <zhong@linux.vnet.ibm.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Paul Gortmaker <paul.gortmaker@windriver.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Thomas Gleixner <tglx@linutronix.de>
2013-01-25x86, kvm: Fix kvm's use of __pa() on percpu areasDave Hansen
In short, it is illegal to call __pa() on an address holding a percpu variable. This replaces those __pa() calls with slow_virt_to_phys(). All of the cases in this patch are in boot time (or CPU hotplug time at worst) code, so the slow pagetable walking in slow_virt_to_phys() is not expected to have a performance impact. The times when this actually matters are pretty obscure (certain 32-bit NUMA systems), but it _does_ happen. It is important to keep KVM guests working on these systems because the real hardware is getting harder and harder to find. This bug manifested first by me seeing a plain hang at boot after this message: CPU 0 irqstacks, hard=f3018000 soft=f301a000 or, sometimes, it would actually make it out to the console: [ 0.000000] BUG: unable to handle kernel paging request at ffffffff I eventually traced it down to the KVM async pagefault code. This can be worked around by disabling that code either at compile-time, or on the kernel command-line. The kvm async pagefault code was injecting page faults in to the guest which the guest misinterpreted because its "reason" was not being properly sent from the host. The guest passes a physical address of an per-cpu async page fault structure via an MSR to the host. Since __pa() is broken on percpu data, the physical address it sent was bascially bogus and the host went scribbling on random data. The guest never saw the real reason for the page fault (it was injected by the host), assumed that the kernel had taken a _real_ page fault, and panic()'d. The behavior varied, though, depending on what got corrupted by the bad write. Signed-off-by: Dave Hansen <dave@linux.vnet.ibm.com> Link: http://lkml.kernel.org/r/20130122212435.4905663F@kernel.stglabs.ibm.com Acked-by: Rik van Riel <riel@redhat.com> Reviewed-by: Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-25Merge tag 'v3.8-rc5' into x86/mmH. Peter Anvin
The __pa() fixup series that follows touches KVM code that is not present in the existing branch based on v3.7-rc5, so merge in the current upstream from Linus. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2013-01-26PM / tracing: remove deprecated power trace APIPaul Gortmaker
The text in Documentation said it would be removed in 2.6.41; the text in the Kconfig said removal in the 3.1 release. Either way you look at it, we are well past both, so push it off a cliff. Note that the POWER_CSTATE and the POWER_PSTATE are part of the legacy tracing API. Remove all tracepoints which use these flags. As can be seen from context, most already have a trace entry via trace_cpu_idle anyways. Also, the cpufreq/cpufreq.c PSTATE one is actually unpaired, as compared to the CSTATE ones which all have a clear start/stop. As part of this, the trace_power_frequency also becomes orphaned, so it too is deleted. Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com> Acked-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2013-01-24x86/process: Change %8s to %s for pr_warn() in release_thread()Chen Gang
the length of dead_task->comm[] is 16 (TASK_COMM_LEN) on pr_warn(), it is not meaningful to use %8s for task->comm[]. So change it to %s, since the line is not solid anyway. Additional information: %8s limit the width, not for the original string output length if name length is more than 8, it still can be fully displayed. if name length is less than 8, the ' ' will be filled before name. %.8s truly limit the original string output length (precision) Signed-off-by: Chen Gang <gang.chen@asianux.com> Link: http://lkml.kernel.org/n/tip-nridm1zvreai1tgfLjuexDmd@git.kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24x86/msr: Add capabilities checkAlan Cox
At the moment the MSR driver only relies upon file system checks. This means that anything as root with any capability set can write to MSRs. Historically that wasn't very interesting but on modern processors the MSRs are such that writing to them provides several ways to execute arbitary code in kernel space. Sample code and documentation on doing this is circulating and MSR attacks are used on Windows 64bit rootkits already. In the Linux case you still need to be able to open the device file so the impact is fairly limited and reduces the security of some capability and security model based systems down towards that of a generic "root owns the box" setup. Therefore they should require CAP_SYS_RAWIO to prevent an elevation of capabilities. The impact of this is fairly minimal on most setups because they don't have heavy use of capabilities. Those using SELinux, SMACK or AppArmor rules might want to consider if their rulesets on the MSR driver could be tighter. Signed-off-by: Alan Cox <alan@linux.intel.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Horses <stable@kernel.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-01-24x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIESMaarten Lankhorst
I ran out of free entries when I had CONFIG_DMA_API_DEBUG enabled. Some other archs seem to default to 65536, so increase this limit for x86 too. Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Cc: Bjorn Helgaas <bhelgaas@google.com> Link: http://lkml.kernel.org/r/50A612AA.7040206@canonical.com Signed-off-by: Ingo Molnar <mingo@kernel.org> ----