summaryrefslogtreecommitdiff
path: root/arch/x86/include/asm
AgeCommit message (Collapse)Author
2014-10-11Merge tag 'stable/for-linus-3.18-rc0-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen updates from David Vrabel: "Features and fixes: - Add pvscsi frontend and backend drivers. - Remove _PAGE_IOMAP PTE flag, freeing it for alternate uses. - Try and keep memory contiguous during PV memory setup (reduces SWIOTLB usage). - Allow front/back drivers to use threaded irqs. - Support large initrds in PV guests. - Fix PVH guests in preparation for Xen 4.5" * tag 'stable/for-linus-3.18-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: (22 commits) xen: remove DEFINE_XENBUS_DRIVER() macro xen/xenbus: Remove BUG_ON() when error string trucated xen/xenbus: Correct the comments for xenbus_grant_ring() x86/xen: Set EFER.NX and EFER.SCE in PVH guests xen: eliminate scalability issues from initrd handling xen: sync some headers with xen tree xen: make pvscsi frontend dependant on xenbus frontend arm{,64}/xen: Remove "EXPERIMENTAL" in the description of the Xen options xen-scsifront: don't deadlock if the ring becomes full x86: remove the Xen-specific _PAGE_IOMAP PTE flag x86/xen: do not use _PAGE_IOMAP PTE flag for I/O mappings x86: skip check for spurious faults for non-present faults xen/efi: Directly include needed headers xen-scsiback: clean up a type issue in scsiback_make_tpg() xen-scsifront: use GFP_ATOMIC under spin_lock MAINTAINERS: Add xen pvscsi maintainer xen-scsiback: Add Xen PV SCSI backend driver xen-scsifront: Add Xen PV SCSI frontend driver xen: Add Xen pvSCSI protocol description xen/events: support threaded irqs for interdomain event channels ...
2014-10-09mm: remove misleading ARCH_USES_NUMA_PROT_NONEMel Gorman
ARCH_USES_NUMA_PROT_NONE was defined for architectures that implemented _PAGE_NUMA using _PROT_NONE. This saved using an additional PTE bit and relied on the fact that PROT_NONE vmas were skipped by the NUMA hinting fault scanner. This was found to be conceptually confusing with a lot of implicit assumptions and it was asked that an alternative be found. Commit c46a7c81 "x86: define _PAGE_NUMA by reusing software bits on the PMD and PTE levels" redefined _PAGE_NUMA on x86 to be one of the swap PTE bits and shrunk the maximum possible swap size but it did not go far enough. There are no architectures that reuse _PROT_NONE as _PROT_NUMA but the relics still exist. This patch removes ARCH_USES_NUMA_PROT_NONE and removes some unnecessary duplication in powerpc vs the generic implementation by defining the types the core NUMA helpers expected to exist from x86 with their ppc64 equivalent. This necessitated that a PTE bit mask be created that identified the bits that distinguish present from NUMA pte entries but it is expected this will only differ between arches based on _PAGE_PROTNONE. The naming for the generic helpers was taken from x86 originally but ppc64 has types that are equivalent for the purposes of the helper so they are mapped instead of duplicating code. Signed-off-by: Mel Gorman <mgorman@suse.de> Cc: Hugh Dickins <hughd@google.com> Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com> Cc: Rik van Riel <riel@redhat.com> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Cyrill Gorcunov <gorcunov@gmail.com> Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-10-09Merge branch 'timers-nohz-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull timer fixes from Ingo Molnar: "Main changes: - Fix the deadlock reported by Dave Jones et al - Clean up and fix nohz_full interaction with arch abilities - nohz init code consolidation/cleanup" * 'timers-nohz-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: nohz: nohz full depends on irq work self IPI support nohz: Consolidate nohz full init code arm64: Tell irq work about self IPI support arm: Tell irq work about self IPI support x86: Tell irq work about self IPI support irq_work: Force raised irq work to run on irq work interrupt irq_work: Introduce arch_irq_work_has_interrupt() nohz: Move nohz full init call to tick init
2014-10-08Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM updates from Paolo Bonzini: "Fixes and features for 3.18. Apart from the usual cleanups, here is the summary of new features: - s390 moves closer towards host large page support - PowerPC has improved support for debugging (both inside the guest and via gdbstub) and support for e6500 processors - ARM/ARM64 support read-only memory (which is necessary to put firmware in emulated NOR flash) - x86 has the usual emulator fixes and nested virtualization improvements (including improved Windows support on Intel and Jailhouse hypervisor support on AMD), adaptive PLE which helps overcommitting of huge guests. Also included are some patches that make KVM more friendly to memory hot-unplug, and fixes for rare caching bugs. Two patches have trivial mm/ parts that were acked by Rik and Andrew. Note: I will soon switch to a subkey for signing purposes" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: (157 commits) kvm: do not handle APIC access page if in-kernel irqchip is not in use KVM: s390: count vcpu wakeups in stat.halt_wakeup KVM: s390/facilities: allow TOD-CLOCK steering facility bit KVM: PPC: BOOK3S: HV: CMA: Reserve cma region only in hypervisor mode arm/arm64: KVM: Report correct FSC for unsupported fault types arm/arm64: KVM: Fix VTTBR_BADDR_MASK and pgd alloc kvm: Fix kvm_get_page_retry_io __gup retval check arm/arm64: KVM: Fix set_clear_sgi_pend_reg offset kvm: x86: Unpin and remove kvm_arch->apic_access_page kvm: vmx: Implement set_apic_access_page_addr kvm: x86: Add request bit to reload APIC access page address kvm: Add arch specific mmu notifier for page invalidation kvm: Rename make_all_cpus_request() to kvm_make_all_cpus_request() and make it non-static kvm: Fix page ageing bugs kvm/x86/mmu: Pass gfn and level to rmapp callback. x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-only kvm: x86: use macros to compute bank MSRs KVM: x86: Remove debug assertion of non-PAE reserved bits kvm: don't take vcpu mutex for obviously invalid vcpu ioctls kvm: Faults which trigger IO release the mmap_sem ...
2014-10-08x86: Reject x32 executables if x32 ABI not supportedBen Hutchings
It is currently possible to execve() an x32 executable on an x86_64 kernel that has only ia32 compat enabled. However all its syscalls will fail, even _exit(). This usually causes it to segfault. Change the ELF compat architecture check so that x32 executables are rejected if we don't support the x32 ABI. Signed-off-by: Ben Hutchings <ben@decadent.org.uk> Link: http://lkml.kernel.org/r/1410120305.6822.9.camel@decadent.org.uk Cc: stable@vger.kernel.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
2014-10-07Merge tag 'tiny/for-3.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux Pull "tinification" patches from Josh Triplett. Work on making smaller kernels. * tag 'tiny/for-3.18' of git://git.kernel.org/pub/scm/linux/kernel/git/josh/linux: bloat-o-meter: Ignore syscall aliases SyS_ and compat_SyS_ mm: Support compiling out madvise and fadvise x86: Support compiling out human-friendly processor feature names x86: Drop support for /proc files when !CONFIG_PROC_FS x86, boot: Don't compile early_serial_console.c when !CONFIG_EARLY_PRINTK x86, boot: Don't compile aslr.c when !CONFIG_RANDOMIZE_BASE x86, boot: Use the usual -y -n mechanism for objects in vmlinux x86: Add "make tinyconfig" to configure the tiniest possible kernel x86, platform, kconfig: move kvmconfig functionality to a helper
2014-10-03Merge branch 'next' into efi-next-mergeMatt Fleming
Conflicts: arch/x86/boot/compressed/eboot.c
2014-10-03efi: Delete the in_nmi() conditional runtime lockingMatt Fleming
commit 5dc3826d9f08 ("efi: Implement mandatory locking for UEFI Runtime Services") implemented some conditional locking when accessing variable runtime services that Ingo described as "pretty disgusting". The intention with the !efi_in_nmi() checks was to avoid live-locks when trying to write pstore crash data into an EFI variable. Such lockless accesses are allowed according to the UEFI specification when we're in a "non-recoverable" state, but whether or not things are implemented correctly in actual firmware implementations remains an unanswered question, and so it would seem sensible to avoid doing any kind of unsynchronized variable accesses. Furthermore, the efi_in_nmi() tests are inadequate because they don't account for the case where we call EFI variable services from panic or oops callbacks and aren't executing in NMI context. In other words, live-locking is still possible. Let's just remove the conditional locking altogether. Now we've got the ->set_variable_nonblocking() EFI variable operation we can abort if the runtime lock is already held. Aborting is by far the safest option. Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: Matthew Garrett <mjg59@srcf.ucam.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03x86/efi: Mark initialization code as suchMathias Krause
The 32 bit and 64 bit implementations differ in their __init annotations for some functions referenced from the common EFI code. Namely, the 32 bit variant is missing some of the __init annotations the 64 bit variant has. To solve the colliding annotations, mark the corresponding functions in efi_32.c as initialization code, too -- as it is such. Actually, quite a few more functions are only used during initialization and therefore can be marked __init. They are therefore annotated, too. Also add the __init annotation to the prototypes in the efi.h header so users of those functions will see it's meant as initialization code only. This patch also fixes the "prelog" typo. ("prologue" / "epilogue" might be more appropriate but this is C code after all, not an opera! :D) Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03x86/efi: Unexport add_efi_memmap variableMathias Krause
This variable was accidentally exported, even though it's only used in this compilation unit and only during initialization. Remove the bogus export, make the variable static instead and mark it as __initdata. Fixes: 200001eb140e ("x86 boot: only pick up additional EFI memmap...") Cc: Paul Jackson <pj@sgi.com> Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03x86/efi: Remove unused efi_call* macrosMathias Krause
Complement commit 62fa6e69a436 ("x86/efi: Delete most of the efi_call* macros") and delete the stub macros for the !CONFIG_EFI case, too. In fact, there are no EFI calls in this case so we don't need a dummy for efi_call() even. Signed-off-by: Mathias Krause <minipli@googlemail.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03efi: Implement mandatory locking for UEFI Runtime ServicesArd Biesheuvel
According to section 7.1 of the UEFI spec, Runtime Services are not fully reentrant, and there are particular combinations of calls that need to be serialized. Use a spinlock to serialize all Runtime Services with respect to all others, even if this is more than strictly needed. We've managed to get away without requiring a runtime services lock until now because most of the interactions with EFI involve EFI variables, and those operations are already serialised with __efivars->lock. Some of the assumptions underlying the decision whether locks are needed or not (e.g., SetVariable() against ResetSystem()) may not apply universally to all [new] architectures that implement UEFI. Rather than try to reason our way out of this, let's just implement at least what the spec requires in terms of locking. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-10-03locking,arch: Use ACCESS_ONCE() instead of cast to volatile in atomic_read()Pranith Kumar
Use the much more reader friendly ACCESS_ONCE() instead of the cast to volatile. This is purely a stylistic change. Signed-off-by: Pranith Kumar <bobby.prani@gmail.com> Acked-by: Jesper Nilsson <jesper.nilsson@axis.com> Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no> Acked-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: linux-arch@vger.kernel.org Link: http://lkml.kernel.org/r/1411482607-20948-1-git-send-email-bobby.prani@gmail.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-27Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "This has: - EFI revert to fix a boot regression - early_ioremap() fix for boot failure - KASLR fix for possible boot failures - EFI fix for corrupted string printing - remove a misleading EFI bootup 'failed!' error message Unfortunately it's all rather close to the merge window" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/efi: Truncate 64-bit values when calling 32-bit OutputString() x86/efi: Delete misleading efi_printk() error message Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>" x86/kaslr: Avoid the setup_data area when picking location x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8
2014-09-25Merge tag 'efi-urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull EFI fixes from Matt Fleming: * Revert the static library changes from the merge window since they're causing issues for Macbooks and Fedora + Grub2 (Matt Fleming) * Delete the misleading "setup_efi_pci() failed!" message which some people are seeing when booting EFI (Matt Fleming) * Fix printing strings from the 32-bit EFI boot stub by only passing 32-bit addresses to the firmware (Matt Fleming) Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-24kvm: x86: Unpin and remove kvm_arch->apic_access_pageTang Chen
In order to make the APIC access page migratable, stop pinning it in memory. And because the APIC access page is not pinned in memory, we can remove kvm_arch->apic_access_page. When we need to write its physical address into vmcs, we use gfn_to_page() to get its page struct, which is needed to call page_to_phys(); the page is then immediately unpinned. Suggested-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24kvm: x86: Add request bit to reload APIC access page addressTang Chen
Currently, the APIC access page is pinned by KVM for the entire life of the guest. We want to make it migratable in order to make memory hot-unplug available for machines that run KVM. This patch prepares to handle this in generic code, through a new request bit (that will be set by the MMU notifier) and a new hook that is called whenever the request bit is processed. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24kvm: Add arch specific mmu notifier for page invalidationTang Chen
This will be used to let the guest run while the APIC access page is not pinned. Because subsequent patches will fill in the function for x86, place the (still empty) x86 implementation in the x86.c file instead of adding an inline function in kvm_host.h. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24kvm: Fix page ageing bugsAndres Lagar-Cavilla
1. We were calling clear_flush_young_notify in unmap_one, but we are within an mmu notifier invalidate range scope. The spte exists no more (due to range_start) and the accessed bit info has already been propagated (due to kvm_pfn_set_accessed). Simply call clear_flush_young. 2. We clear_flush_young on a primary MMU PMD, but this may be mapped as a collection of PTEs by the secondary MMU (e.g. during log-dirty). This required expanding the interface of the clear_flush_young mmu notifier, so a lot of code has been trivially touched. 3. In the absence of shadow_accessed_mask (e.g. EPT A bit), we emulate the access bit by blowing the spte. This requires proper synchronizing with MMU notifier consumers, like every other removal of spte's does. Signed-off-by: Andres Lagar-Cavilla <andreslc@google.com> Acked-by: Rik van Riel <riel@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24x86: kvm: use alternatives for VMCALL vs. VMMCALL if kernel text is read-onlyPaolo Bonzini
On x86_64, kernel text mappings are mapped read-only with CONFIG_DEBUG_RODATA. In that case, KVM will fail to patch VMCALL instructions to VMMCALL as required on AMD processors. The failure mode is currently a divide-by-zero exception, which obviously is a KVM bug that has to be fixed. However, picking the right instruction between VMCALL and VMMCALL will be faster and will help if you cannot upgrade the hypervisor. Reported-by: Chris Webb <chris@arachsys.com> Tested-by: Chris Webb <chris@arachsys.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: x86@kernel.org Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-24KVM: x86: directly use kvm_make_request againLiang Chen
A one-line wrapper around kvm_make_request is not particularly useful. Replace kvm_mmu_flush_tlb() with kvm_make_request(). Signed-off-by: Liang Chen <liangchen.linux@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-23Revert "efi/x86: efistub: Move shared dependencies to <asm/efi.h>"Matt Fleming
This reverts commit f23cf8bd5c1f ("efi/x86: efistub: Move shared dependencies to <asm/efi.h>") as well as the x86 parts of commit f4f75ad5741f ("efi: efistub: Convert into static library"). The road leading to these two reverts is long and winding. The above two commits were merged during the v3.17 merge window and turned the common EFI boot stub code into a static library. This necessitated making some symbols global in the x86 boot stub which introduced new entries into the early boot GOT. The problem was that we weren't fixing up the newly created GOT entries before invoking the EFI boot stub, which sometimes resulted in hangs or resets. This failure was reported by Maarten on his Macbook pro. The proposed fix was commit 9cb0e394234d ("x86/efi: Fixup GOT in all boot code paths"). However, that caused issues for Linus when booting his Sony Vaio Pro 11. It was subsequently reverted in commit f3670394c29f. So that leaves us back with Maarten's Macbook pro not booting. At this stage in the release cycle the least risky option is to revert the x86 EFI boot stub to the pre-merge window code structure where we explicitly #include efi-stub-helper.c instead of linking with the static library. The arm64 code remains unaffected. We can take another swing at the x86 parts for v3.18. Conflicts: arch/x86/include/asm/efi.h Tested-by: Josh Boyer <jwboyer@fedoraproject.org> Tested-by: Maarten Lankhorst <maarten.lankhorst@canonical.com> Tested-by: Leif Lindholm <leif.lindholm@linaro.org> [arm64] Tested-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>, Cc: Ingo Molnar <mingo@kernel.org> Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-23x86: remove the Xen-specific _PAGE_IOMAP PTE flagDavid Vrabel
The _PAGE_IO_MAP PTE flag was only used by Xen PV guests to mark PTEs that were used to map I/O regions that are 1:1 in the p2m. This allowed Xen to obtain the correct PFN when converting the MFNs read from a PTE back to their PFN. Xen guests no longer use _PAGE_IOMAP for this. Instead mfn_to_pfn() returns the correct PFN by using a combination of the m2p and p2m to determine if an MFN corresponds to a 1:1 mapping in the the p2m. Remove _PAGE_IOMAP, replacing it with _PAGE_UNUSED2 to allow for future uses of the PTE flag. Signed-off-by: David Vrabel <david.vrabel@citrix.com> Acked-by: "H. Peter Anvin" <hpa@zytor.com>
2014-09-22x86: use generic dma-contiguous.hZubair Lutfullah Kakakhel
dma-contiguous.h is now in asm-generic. Use that to avoid code repetition in x86. Signed-off-by: Zubair Lutfullah Kakakhel <Zubair.Kakakhel@imgtec.com> Acked-by: Michal Nazarewicz <mina86@mina86.com> Cc: catalin.marinas@arm.com Cc: will.deacon@arm.com Cc: tglx@linutronix.de Cc: mingo@redhat.com Cc: hpa@zytor.com Cc: arnd@arndb.de Cc: gregkh@linuxfoundation.org Cc: m.szyprowski@samsung.com Cc: x86@kernel.org Cc: linux-arm-kernel@lists.infradead.org Cc: linux-kernel@vger.kernel.org Cc: linux-mips@linux-mips.org Cc: linux-arch@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/7359/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org>
2014-09-19Merge branch 'x86-urgent-for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Ingo Molnar: "Misc fixes: EFI fixes, a build fix, a page table dumping (debug) fix and a clang build fix" * 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: efi/arm64: Fix fdt-related memory reservation x86/mm: Apply the section attribute to the variable, not its type x86/efi: Fixup GOT in all boot code paths x86/efi: Only load initrd above 4g on second try x86-64, ptdump: Mark espfix area only if existent x86, irq: Fix build error caused by 9eabc99a635a77cbf09
2014-09-19Merge tag 'efi-urgent' of ↵Ingo Molnar
git://git.kernel.org/pub/scm/linux/kernel/git/mfleming/efi into x86/urgent Pull EFI fix from Matt Fleming: * Increase the number of early_ioremap() slots to fix a regression with earlyprintk=efi after recent changes to the ACPI code (Dave Young) Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-17kvm: Remove ept_identity_pagetable from struct kvm_arch.Tang Chen
kvm_arch->ept_identity_pagetable holds the ept identity pagetable page. But it is never used to refer to the page at all. In vcpu initialization, it indicates two things: 1. indicates if ept page is allocated 2. indicates if a memory slot for identity page is initialized Actually, kvm_arch->ept_identity_pagetable_done is enough to tell if the ept identity pagetable is initialized. So we can remove ept_identity_pagetable. NOTE: In the original code, ept identity pagetable page is pinned in memroy. As a result, it cannot be migrated/hot-removed. After this patch, since kvm_arch->ept_identity_pagetable is removed, ept identity pagetable page is no longer pinned in memory. And it can be migrated/hot-removed. Signed-off-by: Tang Chen <tangchen@cn.fujitsu.com> Reviewed-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-16x86/mm/numa: Drop dead code and rename setup_node_data() to setup_alloc_data()Luiz Capitulino
The setup_node_data() function allocates a pg_data_t object, inserts it into the node_data[] array and initializes the following fields: node_id, node_start_pfn and node_spanned_pages. However, a few function calls later during the kernel boot, free_area_init_node() re-initializes those fields, possibly with setup_node_data() is not used. This causes a small glitch when running Linux as a hyperv numa guest: SRAT: PXM 0 -> APIC 0x00 -> Node 0 SRAT: PXM 0 -> APIC 0x01 -> Node 0 SRAT: PXM 1 -> APIC 0x02 -> Node 1 SRAT: PXM 1 -> APIC 0x03 -> Node 1 SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff] SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff] SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff] NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff] Initmem setup node 0 [mem 0x00000000-0x7fffffff] NODE_DATA [mem 0x7ffdc000-0x7ffeffff] Initmem setup node 1 [mem 0x80800000-0x1081fffff] NODE_DATA [mem 0x1081ea000-0x1081fdfff] crashkernel: memory value expected [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0 [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1 Zone ranges: DMA [mem 0x00001000-0x00ffffff] DMA32 [mem 0x01000000-0xffffffff] Normal [mem 0x100000000-0x1081fffff] Movable zone start for each node Early memory node ranges node 0: [mem 0x00001000-0x0009efff] node 0: [mem 0x00100000-0x7ffeffff] node 1: [mem 0x80200000-0xf7ffffff] node 1: [mem 0x100000000-0x1081fffff] On node 0 totalpages: 524174 DMA zone: 64 pages used for memmap DMA zone: 21 pages reserved DMA zone: 3998 pages, LIFO batch:0 DMA32 zone: 8128 pages used for memmap DMA32 zone: 520176 pages, LIFO batch:31 On node 1 totalpages: 524288 DMA32 zone: 7672 pages used for memmap DMA32 zone: 491008 pages, LIFO batch:31 Normal zone: 520 pages used for memmap Normal zone: 33280 pages, LIFO batch:7 In this dmesg, the SRAT table reports that the memory range for node 1 starts at 0x80200000. However, the line starting with "Initmem" reports that node 1 memory range starts at 0x80800000. The "Initmem" line is reported by setup_node_data() and is wrong, because the kernel ends up using the range as reported in the SRAT table. This commit drops all that dead code from setup_node_data(), renames it to alloc_node_data() and adds a printk() to free_area_init_node() so that we report a node's memory range accurately. Here's the same dmesg section with this patch applied: SRAT: PXM 0 -> APIC 0x00 -> Node 0 SRAT: PXM 0 -> APIC 0x01 -> Node 0 SRAT: PXM 1 -> APIC 0x02 -> Node 1 SRAT: PXM 1 -> APIC 0x03 -> Node 1 SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff] SRAT: Node 1 PXM 1 [mem 0x80200000-0xf7ffffff] SRAT: Node 1 PXM 1 [mem 0x100000000-0x1081fffff] NUMA: Node 1 [mem 0x80200000-0xf7ffffff] + [mem 0x100000000-0x1081fffff] -> [mem 0x80200000-0x1081fffff] NODE_DATA(0) allocated [mem 0x7ffdc000-0x7ffeffff] NODE_DATA(1) allocated [mem 0x1081ea000-0x1081fdfff] crashkernel: memory value expected [ffffea0000000000-ffffea0001ffffff] PMD -> [ffff88007de00000-ffff88007fdfffff] on node 0 [ffffea0002000000-ffffea00043fffff] PMD -> [ffff880105600000-ffff8801077fffff] on node 1 Zone ranges: DMA [mem 0x00001000-0x00ffffff] DMA32 [mem 0x01000000-0xffffffff] Normal [mem 0x100000000-0x1081fffff] Movable zone start for each node Early memory node ranges node 0: [mem 0x00001000-0x0009efff] node 0: [mem 0x00100000-0x7ffeffff] node 1: [mem 0x80200000-0xf7ffffff] node 1: [mem 0x100000000-0x1081fffff] Initmem setup node 0 [mem 0x00001000-0x7ffeffff] On node 0 totalpages: 524174 DMA zone: 64 pages used for memmap DMA zone: 21 pages reserved DMA zone: 3998 pages, LIFO batch:0 DMA32 zone: 8128 pages used for memmap DMA32 zone: 520176 pages, LIFO batch:31 Initmem setup node 1 [mem 0x80200000-0x1081fffff] On node 1 totalpages: 524288 DMA32 zone: 7672 pages used for memmap DMA32 zone: 491008 pages, LIFO batch:31 Normal zone: 520 pages used for memmap Normal zone: 33280 pages, LIFO batch:7 This commit was tested on a two node bare-metal NUMA machine and Linux as a numa guest on hyperv and qemu/kvm. PS: The wrong memory range reported by setup_node_data() seems to be harmless in the current kernel because it's just not used. However, that bad range is used in kernel 2.6.32 to initialize the old boot memory allocator, which causes a crash during boot. Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com> Acked-by: Rik van Riel <riel@redhat.com> Cc: Andi Kleen <andi@firstfloor.org> Cc: David Rientjes <rientjes@google.com> Cc: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Cc: Yinghai Lu <yinghai@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-16x86/mm/hotplug: Modify PGD entry when removing memoryYasuaki Ishimatsu
When hot-adding/removing memory, sync_global_pgds() is called for synchronizing PGD to PGD entries of all processes MM. But when hot-removing memory, sync_global_pgds() does not work correctly. At first, sync_global_pgds() checks whether target PGD is none or not. And if PGD is none, the PGD is skipped. But when hot-removing memory, PGD may be none since PGD may be cleared by free_pud_table(). So when sync_global_pgds() is called after hot-removing memory, sync_global_pgds() should not skip PGD even if the PGD is none. And sync_global_pgds() must clear PGD entries of all processes MM. Currently sync_global_pgds() does not clear PGD entries of all processes MM when hot-removing memory. So when hot adding memory which is same memory range as removed memory after hot-removing memory, following call traces are shown: kernel BUG at arch/x86/mm/init_64.c:206! ... [<ffffffff815e0c80>] kernel_physical_mapping_init+0x1b2/0x1d2 [<ffffffff815ced94>] init_memory_mapping+0x1d4/0x380 [<ffffffff8104aebd>] arch_add_memory+0x3d/0xd0 [<ffffffff815d03d9>] add_memory+0xb9/0x1b0 [<ffffffff81352415>] acpi_memory_device_add+0x1af/0x28e [<ffffffff81325dc4>] acpi_bus_device_attach+0x8c/0xf0 [<ffffffff813413b9>] acpi_ns_walk_namespace+0xc8/0x17f [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7 [<ffffffff81325d38>] ? acpi_bus_type_and_status+0xb7/0xb7 [<ffffffff813418ed>] acpi_walk_namespace+0x95/0xc5 [<ffffffff81326b4c>] acpi_bus_scan+0x9a/0xc2 [<ffffffff81326bff>] acpi_scan_bus_device_check+0x8b/0x12e [<ffffffff81326cb5>] acpi_scan_device_check+0x13/0x15 [<ffffffff81320122>] acpi_os_execute_deferred+0x25/0x32 [<ffffffff8107e02b>] process_one_work+0x17b/0x460 [<ffffffff8107edfb>] worker_thread+0x11b/0x400 [<ffffffff8107ece0>] ? rescuer_thread+0x400/0x400 [<ffffffff81085aef>] kthread+0xcf/0xe0 [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 [<ffffffff815fc76c>] ret_from_fork+0x7c/0xb0 [<ffffffff81085a20>] ? kthread_create_on_node+0x140/0x140 This patch clears PGD entries of all processes MM when sync_global_pgds() is called after hot-removing memory Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com> Acked-by: Toshi Kani <toshi.kani@hp.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Cc: Tang Chen <tangchen@cn.fujitsu.com> Cc: Gu Zheng <guz.fnst@cn.fujitsu.com> Cc: Zhang Yanfei <zhangyanfei@cn.fujitsu.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-14x86 early_ioremap: Increase FIX_BTMAPS_SLOTS to 8Dave Young
3.16 kernel boot fail with earlyprintk=efi, it keeps scrolling at the bottom line of screen. Bisected, the first bad commit is below: commit 86dfc6f339886559d80ee0d4bd20fe5ee90450f0 Author: Lv Zheng <lv.zheng@intel.com> Date: Fri Apr 4 12:38:57 2014 +0800 ACPICA: Tables: Fix table checksums verification before installation. I did some debugging by enabling both serial and efi earlyprintk, below is some debug dmesg, seems early_ioremap fails in scroll up function due to no free slot, see below dmesg output: WARNING: CPU: 0 PID: 0 at mm/early_ioremap.c:116 __early_ioremap+0x90/0x1c4() __early_ioremap(ed00c800, 00000c80) not found slot Modules linked in: CPU: 0 PID: 0 Comm: swapper Not tainted 3.17.0-rc1+ #204 Hardware name: Hewlett-Packard HP Z420 Workstation/1589, BIOS J61 v03.15 05/09/2013 Call Trace: dump_stack+0x4e/0x7a warn_slowpath_common+0x75/0x8e ? __early_ioremap+0x90/0x1c4 warn_slowpath_fmt+0x47/0x49 __early_ioremap+0x90/0x1c4 ? sprintf+0x46/0x48 early_ioremap+0x13/0x15 early_efi_map+0x24/0x26 early_efi_scroll_up+0x6d/0xc0 early_efi_write+0x1b0/0x214 call_console_drivers.constprop.21+0x73/0x7e console_unlock+0x151/0x3b2 ? vprintk_emit+0x49f/0x532 vprintk_emit+0x521/0x532 ? console_unlock+0x383/0x3b2 printk+0x4f/0x51 acpi_os_vprintf+0x2b/0x2d acpi_os_printf+0x43/0x45 acpi_info+0x5c/0x63 ? __acpi_map_table+0x13/0x18 ? acpi_os_map_iomem+0x21/0x147 acpi_tb_print_table_header+0x177/0x186 acpi_tb_install_table_with_override+0x4b/0x62 acpi_tb_install_standard_table+0xd9/0x215 ? early_ioremap+0x13/0x15 ? __acpi_map_table+0x13/0x18 acpi_tb_parse_root_table+0x16e/0x1b4 acpi_initialize_tables+0x57/0x59 acpi_table_init+0x50/0xce acpi_boot_table_init+0x1e/0x85 setup_arch+0x9b7/0xcc4 start_kernel+0x94/0x42d ? early_idt_handlers+0x120/0x120 x86_64_start_reservations+0x2a/0x2c x86_64_start_kernel+0xf3/0x100 Quote reply from Lv.zheng about the early ioremap slot usage in this case: """ In early_efi_scroll_up(), 2 mapping entries will be used for the src/dst screen buffer. In drivers/acpi/acpica/tbutils.c, we've improved the early table loading code in acpi_tb_parse_root_table(). We now need 2 mapping entries: 1. One mapping entry is used for RSDT table mapping. Each RSDT entry contains an address for another ACPI table. 2. For each entry in RSDP, we need another mapping entry to map the table to perform necessary check/override before installing it. When acpi_tb_parse_root_table() prints something through EFI earlyprintk console, we'll have 4 mapping entries used. The current 4 slots setting of early_ioremap() seems to be too small for such a use case. """ Thus increase the slot to 8 in this patch to fix this issue. boot-time mappings become 512 page with this patch. Signed-off-by: Dave Young <dyoung@redhat.com> Cc: <stable@vger.kernel.org> # v3.16 Signed-off-by: Matt Fleming <matt.fleming@intel.com>
2014-09-13Make ARCH_HAS_FAST_MULTIPLIER a real config variableLinus Torvalds
It used to be an ad-hoc hack defined by the x86 version of <asm/bitops.h> that enabled a couple of library routines to know whether an integer multiply is faster than repeated shifts and additions. This just makes it use the real Kconfig system instead, and makes x86 (which was the only architecture that did this) select the option. NOTE! Even for x86, this really is kind of wrong. If we cared, we would probably not enable this for builds optimized for netburst (P4), where shifts-and-adds are generally faster than multiplies. This patch does *not* change that kind of logic, though, it is purely a syntactic change with no code changes. This was triggered by the fact that we have other places that really want to know "do I want to expand multiples by constants by hand or not", particularly the hash generation code. Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2014-09-13x86: Tell irq work about self IPI supportFrederic Weisbecker
x86 supports irq work self-IPIs when local apic is available. This is partly known on runtime so lets implement arch_irq_work_has_interrupt() accordingly. This should be safely called after setup_arch(). Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-09-13irq_work: Introduce arch_irq_work_has_interrupt()Peter Zijlstra
The nohz full code needs irq work to trigger its own interrupt so that the subsystem can work even when the tick is stopped. Lets introduce arch_irq_work_has_interrupt() that archs can override to tell about their support for this ability. Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-09-11Merge tag 'stable/for-linus-3.17-b-rc4-tag' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip Pull Xen bug fixes from David Vrabel: - fix for PVHVM suspend/resume and migration - don't pointlessly retry certain ballooning ops - fix gntalloc when grefs have run out. - fix PV boot if KSALR is enable or very large modules are used. * tag 'stable/for-linus-3.17-b-rc4-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip: x86/xen: don't copy bogus duplicate entries into kernel page tables xen/gntalloc: safely delete grefs in add_grefs() undo path xen/gntalloc: fix oops after runnning out of grant refs xen/balloon: cancel ballooning if adding new memory failed xen/manage: Always freeze/thaw processes when suspend/resuming
2014-09-11x86: Add more disabled featuresDave Hansen
The original motivation for these patches was for an Intel CPU feature called MPX. The patch to add a disabled feature for it will go in with the other parts of the support. But, in the meantime, there are a few other features than MPX that we can make assumptions about at compile-time based on compile options. Add them to disabled-features.h and check them with cpu_feature_enabled(). Note that this gets rid of the last things that needed an #ifdef CONFIG_X86_64 in cpufeature.h. Yay! Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140911211524.C0EC332A@viggo.jf.intel.com Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-11x86: Introduce disabled-featuresDave Hansen
I believe the REQUIRED_MASK aproach was taken so that it was easier to consult in assembly (arch/x86/kernel/verify_cpu.S). DISABLED_MASK does not have the same restriction, but I implemented it the same way for consistency. We have a REQUIRED_MASK... which does two things: 1. Keeps a list of cpuid bits to check in very early boot and refuse to boot if those are not present. 2. Consulted during cpu_has() checks, which allows us to optimize out things at compile-time. In other words, if we *KNOW* we will not boot with the feature off, then we can safely assume that it will be present forever. But, we don't have a similar mechanism for CPU features which may be present but that we know we will not use. We simply use our existing mechanisms to repeatedly check the status of the bit at runtime (well, the alternatives patching helps here but it does not provide compile-time optimization). Adding a feature to disabled-features.h allows the bit to be checked via a new macro: cpu_feature_enabled(). Note that for features in DISABLED_MASK, checks with this macro have all of the benefits of an #ifdef. Before, we would have done this in a header: #ifdef CONFIG_X86_INTEL_MPX #define cpu_has_mpx cpu_has(X86_FEATURE_MPX) #else #define cpu_has_mpx 0 #endif and this in the code: if (cpu_has_mpx) do_some_mpx_thing(); Now, just add your feature to DISABLED_MASK and you can do this everywhere, and get the same benefits you would have from #ifdefs: if (cpu_feature_enabled(X86_FEATURE_MPX)) do_some_mpx_thing(); We need a new function and *not* a modification to cpu_has() because there are cases where we actually need to check the CPU itself, despite what features the kernel supports. The best example of this is a hypervisor which has no control over what features its guests are using and where the guest does not depend on the host for support. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140911211513.9E35E931@viggo.jf.intel.com Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-11x86: Axe the lightly-used cpu_has_paeDave Hansen
cpu_has_pae is only referenced in one place: the X86_32 kexec code (in a file not even built on 64-bit). It hardly warrants its own macro, or the trouble we go to ensuring that it can't be called in X86_64 code. Axe the macro and replace it with a direct cpu feature check. Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Link: http://lkml.kernel.org/r/20140911211511.AD76E774@viggo.jf.intel.com Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-10x86/xen: don't copy bogus duplicate entries into kernel page tablesStefan Bader
When RANDOMIZE_BASE (KASLR) is enabled; or the sum of all loaded modules exceeds 512 MiB, then loading modules fails with a warning (and hence a vmalloc allocation failure) because the PTEs for the newly-allocated vmalloc address space are not zero. WARNING: CPU: 0 PID: 494 at linux/mm/vmalloc.c:128 vmap_page_range_noflush+0x2a1/0x360() This is caused by xen_setup_kernel_pagetables() copying level2_kernel_pgt into level2_fixmap_pgt, overwriting many non-present entries. Without KASLR, the normal kernel image size only covers the first half of level2_kernel_pgt and module space starts after that. L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[ 0..255]->kernel [256..511]->module [511]->level2_fixmap_pgt[ 0..505]->module This allows 512 MiB of of module vmalloc space to be used before having to use the corrupted level2_fixmap_pgt entries. With KASLR enabled, the kernel image uses the full PUD range of 1G and module space starts in the level2_fixmap_pgt. So basically: L4[511]->level3_kernel_pgt[510]->level2_kernel_pgt[0..511]->kernel [511]->level2_fixmap_pgt[0..505]->module And now no module vmalloc space can be used without using the corrupt level2_fixmap_pgt entries. Fix this by properly converting the level2_fixmap_pgt entries to MFNs, and setting level1_fixmap_pgt as read-only. A number of comments were also using the the wrong L3 offset for level2_kernel_pgt. These have been corrected. Signed-off-by: Stefan Bader <stefan.bader@canonical.com> Signed-off-by: David Vrabel <david.vrabel@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: stable@vger.kernel.org
2014-09-10locking/rwlock, x86: Delete unused asm/rwlock.h and rwlock.SWaiman Long
This patch removes the unused asm/rwlock.h and rwlock.S files. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: http://lkml.kernel.org/r/1408037251-45918-3-git-send-email-Waiman.Long@hp.com Cc: Scott J Norton <scott.norton@hp.com> Cc: Borislav Petkov <bp@suse.de> Cc: Daniel Borkmann <dborkman@redhat.com> Cc: David S. Miller <davem@davemloft.net> Cc: Francesco Fusco <ffusco@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Thomas Graf <tgraf@redhat.com> Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-10locking/rwlock, x86: Clean up asm/spinlock*.h to remove old rwlock codeWaiman Long
As the x86 architecture now uses qrwlock for its read/write lock implementation, it is no longer necessary to keep the old rwlock code around. This patch removes the old rwlock code in the asm/spinlock.h and asm/spinlock_types.h files. Now the ARCH_USE_QUEUE_RWLOCK config parameter cannot be removed from x86/Kconfig or there will be a compilation error. Signed-off-by: Waiman Long <Waiman.Long@hp.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Scott J Norton <scott.norton@hp.com> Cc: Dave Jones <davej@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Waiman Long <Waiman.Long@hp.com> Link: http://lkml.kernel.org/r/1408037251-45918-2-git-send-email-Waiman.Long@hp.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-09Merge tag 'v3.17-rc4' into perf/core, to pick up fixesIngo Molnar
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-08x86_64, entry: Treat regs->ax the same in fastpath and slowpath syscallsAndy Lutomirski
For slowpath syscalls, we initialize regs->ax to -ENOSYS and stick the syscall number into regs->orig_ax prior to any possible tracing and syscall execution. This is user-visible ABI used by ptrace syscall emulation and seccomp. For fastpath syscalls, there's no good reason not to do the same thing. It's even slightly simpler than what we're currently doing. It probably has no measureable performance impact. It should have no user-visible effect. The purpose of this patch is to prepare for two-phase syscall tracing, in which the first phase might modify the saved RAX without leaving the fast path. This change is just subtle enough that I'm keeping it separate. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/01218b493f12ae2f98034b78c9ae085e38e94350.1409954077.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-08x86: Split syscall_trace_enter into two phasesAndy Lutomirski
This splits syscall_trace_enter into syscall_trace_enter_phase1 and syscall_trace_enter_phase2. Only phase 2 has full pt_regs, and only phase 2 is permitted to modify any of pt_regs except for orig_ax. The intent is that phase 1 can be called from the syscall fast path. In this implementation, phase1 can handle any combination of TIF_NOHZ (RCU context tracking), TIF_SECCOMP, and TIF_SYSCALL_AUDIT, unless seccomp requests a ptrace event, in which case phase2 is forced. In principle, this could yield a big speedup for TIF_NOHZ as well as for TIF_SECCOMP if syscall exit work were similarly split up. Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/2df320a600020fda055fccf2b668145729dd0c04.1409954077.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-06x86/tty/serial/8250: Clean up the asm/serial.h include file a bitIngo Molnar
- correct spelling - align fields vertically to make things more readable - make the layout of magic defines more obvious Cc: Mark Rustad <mark.d.rustad@intel.com> Cc: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Link: http://lkml.kernel.org/r/1409972149-26272-1-git-send-email-jeffrey.t.kirsher@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-06x86/tty/serial/8250: Resolve missing-field-initializers warningsMark Rustad
Resolve some missing-field-initializers warnings by using designated initialization in the expansion of the SERIAL_PORT_DFNS macro. Signed-off-by: Mark Rustad <mark.d.rustad@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com> Link: http://lkml.kernel.org/r/1409972149-26272-1-git-send-email-jeffrey.t.kirsher@intel.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-09-05KVM: x86: propagate exception from permission checks on the nested page faultPaolo Bonzini
Currently, if a permission error happens during the translation of the final GPA to HPA, walk_addr_generic returns 0 but does not fill in walker->fault. To avoid this, add an x86_exception* argument to the translate_gpa function, and let it fill in walker->fault. The nested_page_fault field will be true, since the walk_mmu is the nested_mmu and translate_gpu instead operates on the "outer" (NPT) instance. Reported-by: Valentine Sinitsyn <valentine.sinitsyn@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-05KVM: x86: skip writeback on injection of nested exceptionPaolo Bonzini
If a nested page fault happens during emulation, we will inject a vmexit, not a page fault. However because writeback happens after the injection, we will write ctxt->eip from L2 into the L1 EIP. We do not write back if an instruction caused an interception vmexit---do the same for page faults. Suggested-by: Gleb Natapov <gleb@kernel.org> Reviewed-by: Gleb Natapov <gleb@kernel.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-03kvm: x86: fix stale mmio cache bugDavid Matlack
The following events can lead to an incorrect KVM_EXIT_MMIO bubbling up to userspace: (1) Guest accesses gpa X without a memory slot. The gfn is cached in struct kvm_vcpu_arch (mmio_gfn). On Intel EPT-enabled hosts, KVM sets the SPTE write-execute-noread so that future accesses cause EPT_MISCONFIGs. (2) Host userspace creates a memory slot via KVM_SET_USER_MEMORY_REGION covering the page just accessed. (3) Guest attempts to read or write to gpa X again. On Intel, this generates an EPT_MISCONFIG. The memory slot generation number that was incremented in (2) would normally take care of this but we fast path mmio faults through quickly_check_mmio_pf(), which only checks the per-vcpu mmio cache. Since we hit the cache, KVM passes a KVM_EXIT_MMIO up to userspace. This patch fixes the issue by using the memslot generation number to validate the mmio cache. Cc: stable@vger.kernel.org Signed-off-by: David Matlack <dmatlack@google.com> [xiaoguangrong: adjust the code to make it simpler for stable-tree fix.] Signed-off-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Reviewed-by: David Matlack <dmatlack@google.com> Reviewed-by: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Tested-by: David Matlack <dmatlack@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2014-09-02x86, fpu: Change __thread_fpu_begin() to use use_eager_fpu()Oleg Nesterov
__thread_fpu_begin() checks X86_FEATURE_EAGER_FPU by hand, we have a helper for that. Signed-off-by: Oleg Nesterov <oleg@redhat.com> Link: http://lkml.kernel.org/r/20140902175720.GA21656@redhat.com Reviewed-by: Suresh Siddha <sbsiddha@gmail.com> Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
2014-09-01x86: Remove set_pmd_pfnMatthew Wilcox
The last user of set_pmd_pfn() went away in commit f03574f2d5b2, so this has been dead code for over a year. Signed-off-by: Matthew Wilcox <matthew.r.wilcox@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> arch/x86/include/asm/pgtable_32.h | 3 --- arch/x86/mm/pgtable_32.c | 35 ----------------------------------- 2 files changed, 38 deletions(-)