summaryrefslogtreecommitdiff
path: root/arch/x86
AgeCommit message (Collapse)Author
2022-07-19x86/fpu: Add a helper to prepare AMX state for low-power CPU idleChang S. Bae
When a CPU enters an idle state, a non-initialized AMX register state may be the cause of preventing a deeper low-power state. Other extended register states whether initialized or not do not impact the CPU idle state. The new helper can ensure the AMX state is initialized before the CPU is idle, and it will be used by the intel idle driver. Check the AMX_TILE feature bit before using XGETBV1 as a chain of dependencies was established via cpuid_deps[]: AMX->XFD->XGETBV1. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20220608164748.11864-2-chang.seok.bae@intel.com
2022-07-19x86/mm/tlb: Ignore f->new_tlb_gen when zeroNadav Amit
Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") introduced an optimization to skip superfluous TLB flushes based on the generation provided in flush_tlb_info. However, arch_tlbbatch_flush() does not provide any generation in flush_tlb_info and populates the flush_tlb_info generation with 0. This 0 is causes the flush_tlb_info to be interpreted as a superfluous, old flush. As a result, try_to_unmap_one() would not perform any TLB flushes. Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is anyhow is an invalid generation value. To avoid future confusion, introduce TLB_GENERATION_INVALID constant and use it properly. Add warnings to ensure no partial flushes are done with TLB_GENERATION_INVALID or when f->mm is NULL, since this does not make any sense. In addition, add the missing unlikely(). [ dhansen: change VM_BUG_ON() -> VM_WARN_ON(), clarify changelog ] Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Hugh Dickins <hughd@google.com> Link: https://lkml.kernel.org/r/20220710232837.3618-1-namit@vmware.com
2022-07-18x86/amd: Use IBPB for firmware callsPeter Zijlstra
On AMD IBRS does not prevent Retbleed; as such use IBPB before a firmware call to flush the branch history state. And because in order to do an EFI call, the kernel maps a whole lot of the kernel page table into the EFI page table, do an IBPB just in case in order to prevent the scenario of poisoning the BTB and causing an EFI call using the unprotected RET there. [ bp: Massage. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220715194550.793957-1-cascardo@canonical.com
2022-07-18x86/rdrand: Remove "nordrand" flag in favor of "random.trust_cpu"Jason A. Donenfeld
The decision of whether or not to trust RDRAND is controlled by the "random.trust_cpu" boot time parameter or the CONFIG_RANDOM_TRUST_CPU compile time default. The "nordrand" flag was added during the early days of RDRAND, when there were worries that merely using its values could compromise the RNG. However, these days, RDRAND values are not used directly but always go through the RNG's hash function, making "nordrand" no longer useful. Rather, the correct switch is "random.trust_cpu", which not only handles the relevant trust issue directly, but also is general to multiple CPU types, not just x86. However, x86 RDRAND does have a history of being occasionally problematic. Prior, when the kernel would notice something strange, it'd warn in dmesg and suggest enabling "nordrand". We can improve on that by making the test a little bit better and then taking the step of automatically disabling RDRAND if we detect it's problematic. Also disable RDSEED if the RDRAND test fails. Cc: x86@kernel.org Cc: Theodore Ts'o <tytso@mit.edu> Suggested-by: H. Peter Anvin <hpa@zytor.com> Suggested-by: Borislav Petkov <bp@suse.de> Acked-by: Borislav Petkov <bp@suse.de> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-07-18random: remove CONFIG_ARCH_RANDOMJason A. Donenfeld
When RDRAND was introduced, there was much discussion on whether it should be trusted and how the kernel should handle that. Initially, two mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and "nordrand", a boot-time switch. Later the thinking evolved. With a properly designed RNG, using RDRAND values alone won't harm anything, even if the outputs are malicious. Rather, the issue is whether those values are being *trusted* to be good or not. And so a new set of options were introduced as the real ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu". With these options, RDRAND is used, but it's not always credited. So in the worst case, it does nothing, and in the best case, maybe it helps. Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the center and became something certain platforms force-select. The old options don't really help with much, and it's a bit odd to have special handling for these instructions when the kernel can deal fine with the existence or untrusted existence or broken existence or non-existence of that CPU capability. Simplify the situation by removing CONFIG_ARCH_RANDOM and using the ordinary asm-generic fallback pattern instead, keeping the two options that are actually used. For now it leaves "nordrand" for now, as the removal of that will take a different route. Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-07-17x86/cacheinfo: move shared cache map definitionsSander Vanheule
Patch series "cpumask: Fix invalid uniprocessor assumptions", v4. On uniprocessor builds, it is currently assumed that any cpumask will contain the single CPU: cpu0. This assumption is used to provide optimised implementations. The current assumption also appears to be wrong, by ignoring the fact that users can provide empty cpumasks. This can result in bugs as explained in [1] - for_each_cpu() will run one iteration of the loop even when passed an empty cpumask. This series introduces some basic tests, and updates the optimisations for uniprocessor builds. The x86 patch was written after the kernel test robot [2] ran into a failed build. I have tried to list the files potentially affected by the changes to cpumask.h, in an attempt to find any other cases that fail on !SMP. I've gone through some of the files manually, and ran a few cross builds, but nothing else popped up. I (build) checked about half of the potientally affected files, but I do not have the resources to do them all. I hope we can fix other issues if/when they pop up later. [1] https://lore.kernel.org/all/20220530082552.46113-1-sander@svanheule.net/ [2] https://lore.kernel.org/all/202206060858.wA0FOzRy-lkp@intel.com/ This patch (of 5): The maps to keep track of shared caches between CPUs on SMP systems are declared in asm/smp.h, among them specifically cpu_llc_shared_map. These maps are externally defined in cpu/smpboot.c. The latter is only compiled on CONFIG_SMP=y, which means the declared extern symbols from asm/smp.h do not have a corresponding definition on uniprocessor builds. The inline cpu_llc_shared_mask() function from asm/smp.h refers to the map declaration mentioned above. This function is referenced in cacheinfo.c inside for_each_cpu() loop macros, to provide cpumask for the loop. On uniprocessor builds, the symbol for the cpu_llc_shared_map does not exist. However, the current implementation of for_each_cpu() also (wrongly) ignores the provided mask. By sheer luck, the compiler thus optimises out this unused reference to cpu_llc_shared_map, and the linker therefore does not require the cpu_llc_shared_mask to actually exist on uniprocessor builds. Only on SMP bulids does smpboot.o exist to provide the required symbols. To no longer rely on compiler optimisations for successful uniprocessor builds, move the definitions of cpu_llc_shared_map and cpu_l2c_shared_map from smpboot.c to cacheinfo.c. Link: https://lkml.kernel.org/r/cover.1656777646.git.sander@svanheule.net Link: https://lkml.kernel.org/r/e8167ddb570f56744a3dc12c2149a660a324d969.1656777646.git.sander@svanheule.net Signed-off-by: Sander Vanheule <sander@svanheule.net> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Marco Elver <elver@google.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Valentin Schneider <vschneid@redhat.com> Cc: Yury Norov <yury.norov@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17mm/mmap: drop ARCH_HAS_VM_GET_PAGE_PROTAnshuman Khandual
Now all the platforms enable ARCH_HAS_GET_PAGE_PROT. They define and export own vm_get_page_prot() whether custom or standard DECLARE_VM_GET_PAGE_PROT. Hence there is no need for default generic fallback for vm_get_page_prot(). Just drop this fallback and also ARCH_HAS_GET_PAGE_PROT mechanism. Link: https://lkml.kernel.org/r/20220711070600.2378316-27-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17um/mm: enable ARCH_HAS_VM_GET_PAGE_PROTAnshuman Khandual
This enables ARCH_HAS_VM_GET_PAGE_PROT on the platform and exports standard vm_get_page_prot() implementation via DECLARE_VM_GET_PAGE_PROT, which looks up a private and static protection_map[] array. Subsequently all __SXXX and __PXXX macros can be dropped which are no longer needed. Link: https://lkml.kernel.org/r/20220711070600.2378316-25-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@infradead.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17x86/mm: move protection_map[] inside the platformAnshuman Khandual
This moves protection_map[] inside the platform and makes it a static. This also defines a helper function add_encrypt_protection_map() that can update the protection_map[] array with pgprot_encrypted(). Link: https://lkml.kernel.org/r/20220711070600.2378316-7-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-17um: include linux/stddef.h for __always_inlineJason A. Donenfeld
When compiling against musl, their shipped <stddef.h> doesn't have __always_inline. So instead explicitly include the kernel uapi header, <linux/stddef.h>, which does. This prevents the following build error: In file included from arch/x86/um/shared/sysdep/stub.h:11, from arch/um/kernel/skas/clone.c:14: arch/x86/um/shared/sysdep/stub_64.h:111:23: error: expected ‘;’ before ‘void’ 111 | static __always_inline void *get_stub_page(void) | ^~~~~ | ; make[4]: *** [scripts/Makefile.build:249: arch/um/kernel/skas/clone.o] Error 1 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2022-07-17UML: add support for KASAN under x86_64Patricia Alfonso
Make KASAN run on User Mode Linux on x86_64. The UML-specific KASAN initializer uses mmap to map the ~16TB of shadow memory to the location defined by KASAN_SHADOW_OFFSET. kasan_init() utilizes constructors to initialize KASAN before main(). The location of the KASAN shadow memory, starting at KASAN_SHADOW_OFFSET, can be configured using the KASAN_SHADOW_OFFSET option. The default location of this offset is 0x100000000000, which keeps it out-of-the-way even on UML setups with more "physical" memory. For low-memory setups, 0x7fff8000 can be used instead, which fits in an immediate and is therefore faster, as suggested by Dmitry Vyukov. There is usually enough free space at this location; however, it is a config option so that it can be easily changed if needed. Note that, unlike KASAN on other architectures, vmalloc allocations still use the shadow memory allocated upfront, rather than allocating and free-ing it per-vmalloc allocation. If another architecture chooses to go down the same path, we should replace the checks for CONFIG_UML with something more generic, such as: - A CONFIG_KASAN_NO_SHADOW_ALLOC option, which architectures could set - or, a way of having architecture-specific versions of these vmalloc and module shadow memory allocation options. Also note that, while UML supports both KASAN in inline mode (CONFIG_KASAN_INLINE) and static linking (CONFIG_STATIC_LINK), it does not support both at the same time. Signed-off-by: Patricia Alfonso <trishalfonso@google.com> Co-developed-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: Vincent Whitchurch <vincent.whitchurch@axis.com> Signed-off-by: David Gow <davidgow@google.com> Reviewed-by: Johannes Berg <johannes@sipsolutions.net> Reviewed-by: Dmitry Vyukov <dvyukov@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2022-07-17um: x86: print RIP with symbolJohannes Berg
This is not only nicer to read by default, but also lets decode_stacktrace.sh work on it, rather than removing it. Signed-off-by: Johannes Berg <johannes.berg@intel.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2022-07-17x86/um: Kconfig: Fix indentationJuerg Haefliger
The convention for indentation seems to be a single tab. Help text is further indented by an additional two whitespaces. Fix the lines that violate these rules. Signed-off-by: Juerg Haefliger <juerg.haefliger@canonical.com> Signed-off-by: Richard Weinberger <richard@nod.at>
2022-07-17Merge tag 'x86_urgent_for_v5.19_rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - Improve the check whether the kernel supports WP mappings so that it can accomodate a XenPV guest due to how the latter is setting up the PAT machinery - Now that the retbleed nightmare is public, here's the first round of fallout fixes: * Fix a build failure on 32-bit due to missing include * Remove an untraining point in espfix64 return path * other small cleanups * tag 'x86_urgent_for_v5.19_rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/bugs: Remove apostrophe typo um: Add missing apply_returns() x86/entry: Remove UNTRAIN_RET from native_irq_return_ldt x86/bugs: Mark retbleed_strings static x86/pat: Fix x86_has_pat_wp() x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bit
2022-07-16Merge tag 'acpi-5.19-rc7' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm Pull ACPI fix from Rafael Wysocki: "Fix more fallout from recent changes of the ACPI CPPC handling on AMD platforms (Mario Limonciello)" * tag 'acpi-5.19-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: ACPI: CPPC: Fix enabling CPPC on AMD systems with shared memory
2022-07-16efi/x86: use naked RET on mixed mode call wrapperThadeu Lima de Souza Cascardo
When running with return thunks enabled under 32-bit EFI, the system crashes with: kernel tried to execute NX-protected page - exploit attempt? (uid: 0) BUG: unable to handle page fault for address: 000000005bc02900 #PF: supervisor instruction fetch in kernel mode #PF: error_code(0x0011) - permissions violation PGD 18f7063 P4D 18f7063 PUD 18ff063 PMD 190e063 PTE 800000005bc02063 Oops: 0011 [#1] PREEMPT SMP PTI CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.19.0-rc6+ #166 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 0.0.0 02/06/2015 RIP: 0010:0x5bc02900 Code: Unable to access opcode bytes at RIP 0x5bc028d6. RSP: 0018:ffffffffb3203e10 EFLAGS: 00010046 RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000048 RDX: 000000000190dfac RSI: 0000000000001710 RDI: 000000007eae823b RBP: ffffffffb3203e70 R08: 0000000001970000 R09: ffffffffb3203e28 R10: 747563657865206c R11: 6c6977203a696665 R12: 0000000000001710 R13: 0000000000000030 R14: 0000000001970000 R15: 0000000000000001 FS: 0000000000000000(0000) GS:ffff8e013ca00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0018 ES: 0018 CR0: 0000000080050033 CR2: 000000005bc02900 CR3: 0000000001930000 CR4: 00000000000006f0 Call Trace: ? efi_set_virtual_address_map+0x9c/0x175 efi_enter_virtual_mode+0x4a6/0x53e start_kernel+0x67c/0x71e x86_64_start_reservations+0x24/0x2a x86_64_start_kernel+0xe9/0xf4 secondary_startup_64_no_verify+0xe5/0xeb That's because it cannot jump to the return thunk from the 32-bit code. Using a naked RET and marking it as safe allows the system to proceed booting. Fixes: aa3d480315ba ("x86: Use return-thunk in asm code") Reported-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: <stable@vger.kernel.org> Tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-16x86/bugs: Remove apostrophe typoKim Phillips
Remove a superfluous ' in the mitigation string. Fixes: e8ec1b6e08a2 ("x86/bugs: Enable STIBP for JMP2RET") Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-15Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvmLinus Torvalds
Pull KVM fixes from Paolo Bonzini: "RISC-V: - Fix missing PAGE_PFN_MASK - Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests() x86: - Fix for nested virtualization when TSC scaling is active - Estimate the size of fastcc subroutines conservatively, avoiding disastrous underestimation when return thunks are enabled - Avoid possible use of uninitialized fields of 'struct kvm_lapic_irq' Generic: - Mark as such the boolean values available from the statistics file descriptors - Clarify statistics documentation" * tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm: KVM: emulate: do not adjust size of fastop and setcc subroutines KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op() Documentation: kvm: clarify histogram units kvm: stats: tell userspace which values are boolean x86/kvm: fix FASTOP_SIZE when return thunks are enabled KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1 RISC-V: KVM: Fix SRCU deadlock caused by kvm_riscv_check_vcpu_requests() riscv: Fix missing PAGE_PFN_MASK
2022-07-15kexec, KEYS: make the code in bzImage64_verify_sig genericCoiby Xu
commit 278311e417be ("kexec, KEYS: Make use of platform keyring for signature verify") adds platform keyring support on x86 kexec but not arm64. The code in bzImage64_verify_sig uses the keys on the .builtin_trusted_keys, .machine, if configured and enabled, .secondary_trusted_keys, also if configured, and .platform keyrings to verify the signed kernel image as PE file. Cc: kexec@lists.infradead.org Cc: keyrings@vger.kernel.org Cc: linux-security-module@vger.kernel.org Reviewed-by: Michal Suchanek <msuchanek@suse.de> Signed-off-by: Coiby Xu <coxu@redhat.com> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-15kexec: drop weak attribute from functionsNaveen N. Rao
Drop __weak attribute from functions in kexec_core.c: - machine_kexec_post_load() - arch_kexec_protect_crashkres() - arch_kexec_unprotect_crashkres() - crash_free_reserved_phys_range() Link: https://lkml.kernel.org/r/c0f6219e03cb399d166d518ab505095218a902dd.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-15kexec_file: drop weak attribute from functionsNaveen N. Rao
As requested (http://lkml.kernel.org/r/87ee0q7b92.fsf@email.froward.int.ebiederm.org), this series converts weak functions in kexec to use the #ifdef approach. Quoting the 3e35142ef99fe ("kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]") changelog: : Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") : [1], binutils (v2.36+) started dropping section symbols that it thought : were unused. This isn't an issue in general, but with kexec_file.c, gcc : is placing kexec_arch_apply_relocations[_add] into a separate : .text.unlikely section and the section symbol ".text.unlikely" is being : dropped. Due to this, recordmcount is unable to find a non-weak symbol in : .text.unlikely to generate a relocation record against. This patch (of 2); Drop __weak attribute from functions in kexec_file.c: - arch_kexec_kernel_image_probe() - arch_kimage_file_post_load_cleanup() - arch_kexec_kernel_image_load() - arch_kexec_locate_mem_hole() - arch_kexec_kernel_verify_sig() arch_kexec_kernel_image_load() calls into kexec_image_load_default(), so drop the static attribute for the latter. arch_kexec_kernel_verify_sig() is not overridden by any architecture, so drop the __weak attribute. Link: https://lkml.kernel.org/r/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Link: https://lkml.kernel.org/r/2cd7ca1fe4d6bb6ca38e3283c717878388ed6788.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-15x86/olpc: fix 'logical not is only applied to the left hand side'Alexander Lobakin
The bitops compile-time optimization series revealed one more problem in olpc-xo1-sci.c:send_ebook_state(), resulted in GCC warnings: arch/x86/platform/olpc/olpc-xo1-sci.c: In function 'send_ebook_state': arch/x86/platform/olpc/olpc-xo1-sci.c:83:63: warning: logical not is only applied to the left hand side of comparison [-Wlogical-not-parentheses] 83 | if (!!test_bit(SW_TABLET_MODE, ebook_switch_idev->sw) == state) | ^~ arch/x86/platform/olpc/olpc-xo1-sci.c:83:13: note: add parentheses around left hand side expression to silence this warning Despite this code working as intended, this redundant double negation of boolean value, together with comparing to `char` with no explicit conversion to bool, makes compilers think the author made some unintentional logical mistakes here. Make it the other way around and negate the char instead to silence the warnings. Fixes: d2aa37411b8e ("x86/olpc/xo1/sci: Produce wakeup events for buttons and switches") Cc: stable@vger.kernel.org # 3.5+ Reported-by: Guenter Roeck <linux@roeck-us.net> Reported-by: kernel test robot <lkp@intel.com> Reviewed-and-tested-by: Guenter Roeck <linux@roeck-us.net> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-07-15KVM: emulate: do not adjust size of fastop and setcc subroutinesPaolo Bonzini
Instead of doing complicated calculations to find the size of the subroutines (which are even more complicated because they need to be stringified into an asm statement), just hardcode to 16. It is less dense for a few combinations of IBT/SLS/retbleed, but it has the advantage of being really simple. Cc: stable@vger.kernel.org # 5.15.x: 84e7051c0bc1: x86/kvm: fix FASTOP_SIZE when return thunks are enabled Cc: stable@vger.kernel.org Suggested-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-15crypto: x86/blowfish - remove redundant assignment to variable nytesColin Ian King
Variable nbytes is being assigned a value that is never read, it is being re-assigned in the next statement in the while-loop. The assignment is redundant and can be removed. Cleans up clang scan-build warnings, e.g.: arch/x86/crypto/blowfish_glue.c:147:10: warning: Although the value stored to 'nbytes' is used in the enclosing expression, the value is never actually read from 'nbytes' Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2022-07-15x86/boot/tboot: Move tboot_force_iommu() to Intel IOMMULu Baolu
tboot_force_iommu() is only called by the Intel IOMMU driver. Move the helper into that driver. No functional change intended. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20220514014322.2927339-7-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-15KVM: x86: Remove unnecessary includeLu Baolu
intel-iommu.h is not needed in kvm/x86 anymore. Remove its include. Signed-off-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Steve Wahl <steve.wahl@hpe.com> Link: https://lore.kernel.org/r/20220514014322.2927339-6-baolu.lu@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2022-07-14Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
include/net/sock.h 310731e2f161 ("net: Fix data-races around sysctl_mem.") e70f3c701276 ("Revert "net: set SK_MEM_QUANTUM to 4096"") https://lore.kernel.org/all/20220711120211.7c8b7cba@canb.auug.org.au/ net/ipv4/fib_semantics.c 747c14307214 ("ip: fix dflt addr selection for connected nexthop") d62607c3fe45 ("net: rename reference+tracking helpers") net/tls/tls.h include/net/tls.h 3d8c51b25a23 ("net/tls: Check for errors in tls_device_init") 587903142308 ("tls: create an internal header") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-14x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_currentNathan Chancellor
Clang warns: arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on redeclared variable [-Werror,-Wsection] DEFINE_PER_CPU(u64, x86_spec_ctrl_current); ^ arch/x86/include/asm/nospec-branch.h:283:12: note: previous declaration is here extern u64 x86_spec_ctrl_current; ^ 1 error generated. The declaration should be using DECLARE_PER_CPU instead so all attributes stay in sync. Cc: stable@vger.kernel.org Fixes: fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-14KVM: x86: Check target, not vCPU's x2APIC ID, when applying hotplug hackSean Christopherson
When applying the hotplug hack to match x2APIC IDs for vCPUs in xAPIC mode, check the target APID ID for being unaddressable in xAPIC mode instead of checking the vCPU's x2APIC ID, and in that case proceed as if apic_x2apic_mode(vcpu) were true. Functionally, it does not matter whether you compare kvm_x2apic_id(apic) or mda with 0xff, since the two values are then checked for equality. But in isolation, checking the x2APIC ID takes an unnecessary dependency on the x2APIC ID being read-only (which isn't strictly true on AMD CPUs, and is difficult to document as well); it also requires KVM to fallthrough and check the xAPIC ID as well to deal with a writable xAPIC ID, whereas the xAPIC ID _can't_ match a target ID greater than 0xff. Opportunistically reword the comment to call out the various subtleties, and to fix a typo reported by Zhang Jiaming. No functional change intended. Cc: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: x86: Fully initialize 'struct kvm_lapic_irq' in kvm_pv_kick_cpu_op()Vitaly Kuznetsov
'vector' and 'trig_mode' fields of 'struct kvm_lapic_irq' are left uninitialized in kvm_pv_kick_cpu_op(). While these fields are normally not needed for APIC_DM_REMRD, they're still referenced by __apic_accept_irq() for trace_kvm_apic_accept_irq(). Fully initialize the structure to avoid consuming random stack memory. Fixes: a183b638b61c ("KVM: x86: make apic_accept_irq tracepoint more generic") Reported-by: syzbot+d6caa905917d353f0d07@syzkaller.appspotmail.com Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220708125147.593975-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: x86: Restrict get_mt_mask() to a u8, use KVM_X86_OP_OPTIONAL_RET0Sean Christopherson
Restrict get_mt_mask() to a u8 and reintroduce using a RET0 static_call for the SVM implementation. EPT stores the memtype information in the lower 8 bits (bits 6:3 to be precise), and even returns a shifted u8 without an explicit cast to a larger type; there's no need to return a full u64. Note, RET0 doesn't play nice with a u64 return on 32-bit kernels, see commit bf07be36cd88 ("KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask"). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220714153707.3239119-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: x86: Add dedicated helper to get CPUID entry with significant indexSean Christopherson
Add a second CPUID helper, kvm_find_cpuid_entry_index(), to handle KVM queries for CPUID leaves whose index _may_ be significant, and drop the index param from the existing kvm_find_cpuid_entry(). Add a WARN in the inner helper, cpuid_entry2_find(), to detect attempts to retrieve a CPUID entry whose index is significant without explicitly providing an index. Using an explicit magic number and letting callers omit the index avoids confusion by eliminating the myriad cases where KVM specifies '0' as a dummy value. Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: SVM: fix task switch emulation on INTn instruction.Maxim Levitsky
Recently KVM's SVM code switched to re-injecting software interrupt events, if something prevented their delivery. Task switch due to task gate in the IDT, however is an exception to this rule, because in this case, INTn instruction causes a task switch intercept and its emulation completes the INTn emulation as well. Add a missing case to task_switch_interception for that. This fixes 32 bit kvm unit test taskswitch2. Fixes: 7e5b5ef8dca322 ("KVM: SVM: Re-inject INTn instead of retrying the insn on "failure"") Signed-off-by: Maxim Levitsky <mlevitsk@redhat.com> Reviewed-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com> Message-Id: <20220714124453.188655-1-mlevitsk@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: x86/mmu: Fix typo and tweak comment for split_desc_cache capacitySean Christopherson
Remove a spurious closing paranthesis and tweak the comment about the cache capacity for PTE descriptors (rmaps) eager page splitting to tone down the assertion slightly, and to call out that topup requires dropping mmu_lock, which is the real motivation for avoiding topup (as opposed to memory usage). Cc: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220712020724.1262121-4-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: x86/mmu: Expand quadrant comment for PG_LEVEL_4K shadow pagesSean Christopherson
Tweak the comment above the computation of the quadrant for PG_LEVEL_4K shadow pages to explicitly call out how and why KVM uses role.quadrant to consume gPTE bits. Opportunistically wrap an unnecessarily long line. No functional change intended. Link: https://lore.kernel.org/all/YqvWvBv27fYzOFdE@google.com Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220712020724.1262121-3-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: x86/mmu: Add optimized helper to retrieve an SPTE's indexSean Christopherson
Add spte_index() to dedup all the code that calculates a SPTE's index into its parent's page table and/or spt array. Opportunistically tweak the calculation to avoid pointer arithmetic, which is subtle (subtract in 8-byte chunks) and less performant (requires the compiler to generate the subtraction). Suggested-by: David Matlack <dmatlack@google.com> Reviewed-by: David Matlack <dmatlack@google.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220712020724.1262121-2-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14Merge commit 'kvm-vmx-nested-tsc-fix' into kvm-masterPaolo Bonzini
Merge bugfix needed in both 5.19 (because it's bad) and 5.20 (because it is a prerequisite to test new features).
2022-07-14kvm: stats: tell userspace which values are booleanPaolo Bonzini
Some of the statistics values exported by KVM are always only 0 or 1. It can be useful to export this fact to userspace so that it can track them specially (for example by polling the value every now and then to compute a % of time spent in a specific state). Therefore, add "boolean value" as a new "unit". While it is not exactly a unit, it walks and quacks like one. In particular, using the type would be wrong because boolean values could be instantaneous or peak values (e.g. "is the rmap allocated?") or even two-bucket histograms (e.g. "number of posted vs. non-posted interrupt injections"). Suggested-by: Amneesh Singh <natto@weirdnatto.in> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14x86/kvm: fix FASTOP_SIZE when return thunks are enabledThadeu Lima de Souza Cascardo
The return thunk call makes the fastop functions larger, just like IBT does. Consider a 16-byte FASTOP_SIZE when CONFIG_RETHUNK is enabled. Otherwise, functions will be incorrectly aligned and when computing their position for differently sized operators, they will executed in the middle or end of a function, which may as well be an int3, leading to a crash like: [ 36.091116] int3: 0000 [#1] SMP NOPTI [ 36.091119] CPU: 3 PID: 1371 Comm: qemu-system-x86 Not tainted 5.15.0-41-generic #44 [ 36.091120] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.15.0-1 04/01/2014 [ 36.091121] RIP: 0010:xaddw_ax_dx+0x9/0x10 [kvm] [ 36.091185] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3 cc cc [ 36.091186] RSP: 0018:ffffb1f541143c98 EFLAGS: 00000202 [ 36.091188] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000 [ 36.091188] RDX: 0000000076543210 RSI: ffffffffc073c6d0 RDI: 0000000000000200 [ 36.091189] RBP: ffffb1f541143ca0 R08: ffff9f1803350a70 R09: 0000000000000002 [ 36.091190] R10: ffff9f1803350a70 R11: 0000000000000000 R12: ffff9f1803350a70 [ 36.091190] R13: ffffffffc077fee0 R14: 0000000000000000 R15: 0000000000000000 [ 36.091191] FS: 00007efdfce8d640(0000) GS:ffff9f187dd80000(0000) knlGS:0000000000000000 [ 36.091192] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 36.091192] CR2: 0000000000000000 CR3: 0000000009b62002 CR4: 0000000000772ee0 [ 36.091195] PKRU: 55555554 [ 36.091195] Call Trace: [ 36.091197] <TASK> [ 36.091198] ? fastop+0x5a/0xa0 [kvm] [ 36.091222] x86_emulate_insn+0x7b8/0xe90 [kvm] [ 36.091244] x86_emulate_instruction+0x2f4/0x630 [kvm] [ 36.091263] ? kvm_arch_vcpu_load+0x7c/0x230 [kvm] [ 36.091283] ? vmx_prepare_switch_to_host+0xf7/0x190 [kvm_intel] [ 36.091290] complete_emulated_mmio+0x297/0x320 [kvm] [ 36.091310] kvm_arch_vcpu_ioctl_run+0x32f/0x550 [kvm] [ 36.091330] kvm_vcpu_ioctl+0x29e/0x6d0 [kvm] [ 36.091344] ? kvm_vcpu_ioctl+0x120/0x6d0 [kvm] [ 36.091357] ? __fget_files+0x86/0xc0 [ 36.091362] ? __fget_files+0x86/0xc0 [ 36.091363] __x64_sys_ioctl+0x92/0xd0 [ 36.091366] do_syscall_64+0x59/0xc0 [ 36.091369] ? syscall_exit_to_user_mode+0x27/0x50 [ 36.091370] ? do_syscall_64+0x69/0xc0 [ 36.091371] ? syscall_exit_to_user_mode+0x27/0x50 [ 36.091372] ? __x64_sys_writev+0x1c/0x30 [ 36.091374] ? do_syscall_64+0x69/0xc0 [ 36.091374] ? exit_to_user_mode_prepare+0x37/0xb0 [ 36.091378] ? syscall_exit_to_user_mode+0x27/0x50 [ 36.091379] ? do_syscall_64+0x69/0xc0 [ 36.091379] ? do_syscall_64+0x69/0xc0 [ 36.091380] ? do_syscall_64+0x69/0xc0 [ 36.091381] ? do_syscall_64+0x69/0xc0 [ 36.091381] entry_SYSCALL_64_after_hwframe+0x61/0xcb [ 36.091384] RIP: 0033:0x7efdfe6d1aff [ 36.091390] Code: 00 48 89 44 24 18 31 c0 48 8d 44 24 60 c7 04 24 10 00 00 00 48 89 44 24 08 48 8d 44 24 20 48 89 44 24 10 b8 10 00 00 00 0f 05 <41> 89 c0 3d 00 f0 ff ff 77 1f 48 8b 44 24 18 64 48 2b 04 25 28 00 [ 36.091391] RSP: 002b:00007efdfce8c460 EFLAGS: 00000246 ORIG_RAX: 0000000000000010 [ 36.091393] RAX: ffffffffffffffda RBX: 000000000000ae80 RCX: 00007efdfe6d1aff [ 36.091393] RDX: 0000000000000000 RSI: 000000000000ae80 RDI: 000000000000000c [ 36.091394] RBP: 0000558f1609e220 R08: 0000558f13fb8190 R09: 00000000ffffffff [ 36.091394] R10: 0000558f16b5e950 R11: 0000000000000246 R12: 0000000000000000 [ 36.091394] R13: 0000000000000001 R14: 0000000000000000 R15: 0000000000000000 [ 36.091396] </TASK> [ 36.091397] Modules linked in: isofs nls_iso8859_1 kvm_intel joydev kvm input_leds serio_raw sch_fq_codel dm_multipath scsi_dh_rdac scsi_dh_emc scsi_dh_alua ipmi_devintf ipmi_msghandler drm msr ip_tables x_tables autofs4 btrfs blake2b_generic zstd_compress raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel virtio_net net_failover crypto_simd ahci xhci_pci cryptd psmouse virtio_blk libahci xhci_pci_renesas failover [ 36.123271] ---[ end trace db3c0ab5a48fabcc ]--- [ 36.123272] RIP: 0010:xaddw_ax_dx+0x9/0x10 [kvm] [ 36.123319] Code: 00 0f bb d0 c3 cc cc cc cc 48 0f bb d0 c3 cc cc cc cc 0f 1f 80 00 00 00 00 0f c0 d0 c3 cc cc cc cc 66 0f c1 d0 c3 cc cc cc cc <0f> 1f 80 00 00 00 00 0f c1 d0 c3 cc cc cc cc 48 0f c1 d0 c3 cc cc [ 36.123320] RSP: 0018:ffffb1f541143c98 EFLAGS: 00000202 [ 36.123321] RAX: 0000000089abcdef RBX: 0000000000000001 RCX: 0000000000000000 [ 36.123321] RDX: 0000000076543210 RSI: ffffffffc073c6d0 RDI: 0000000000000200 [ 36.123322] RBP: ffffb1f541143ca0 R08: ffff9f1803350a70 R09: 0000000000000002 [ 36.123322] R10: ffff9f1803350a70 R11: 0000000000000000 R12: ffff9f1803350a70 [ 36.123323] R13: ffffffffc077fee0 R14: 0000000000000000 R15: 0000000000000000 [ 36.123323] FS: 00007efdfce8d640(0000) GS:ffff9f187dd80000(0000) knlGS:0000000000000000 [ 36.123324] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 36.123325] CR2: 0000000000000000 CR3: 0000000009b62002 CR4: 0000000000772ee0 [ 36.123327] PKRU: 55555554 [ 36.123328] Kernel panic - not syncing: Fatal exception in interrupt [ 36.123410] Kernel Offset: 0x1400000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff) [ 36.135305] ---[ end Kernel panic - not syncing: Fatal exception in interrupt ]--- Fixes: aa3d480315ba ("x86: Use return-thunk in asm code") Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Co-developed-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Borislav Petkov <bp@suse.de> Cc: Josh Poimboeuf <jpoimboe@kernel.org> Cc: Paolo Bonzini <pbonzini@redhat.com> Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Message-Id: <20220713171241.184026-1-cascardo@canonical.com> Tested-by: Jack Wang <jinpu.wang@ionos.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14KVM: nVMX: Always enable TSC scaling for L2 when it was enabled for L1Vitaly Kuznetsov
Windows 10/11 guests with Hyper-V role (WSL2) enabled are observed to hang upon boot or shortly after when a non-default TSC frequency was set for L1. The issue is observed on a host where TSC scaling is supported. The problem appears to be that Windows doesn't use TSC frequency for its guests even when the feature is advertised and KVM filters SECONDARY_EXEC_TSC_SCALING out when creating L2 controls from L1's. This leads to L2 running with the default frequency (matching host's) while L1 is running with an altered one. Keep SECONDARY_EXEC_TSC_SCALING in secondary exec controls for L2 when it was set for L1. TSC_MULTIPLIER is already correctly computed and written by prepare_vmcs02(). Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Message-Id: <20220712135009.952805-1-vkuznets@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-14x86/entry: Remove UNTRAIN_RET from native_irq_return_ldtAlexandre Chartre
UNTRAIN_RET is not needed in native_irq_return_ldt because RET untraining has already been done at this point. In addition, when the RETBleed mitigation is IBPB, UNTRAIN_RET clobbers several registers (AX, CX, DX) so here it trashes user values which are in these registers. Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/35b0d50f-12d1-10c3-f5e8-d6c140486d4a@oracle.com
2022-07-14x86/bugs: Mark retbleed_strings staticJiapeng Chong
This symbol is not used outside of bugs.c, so mark it static. Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220714072939.71162-1-jiapeng.chong@linux.alibaba.com
2022-07-13KVM: VMX: Update PT MSR intercepts during filter change iff PT in host+guestSean Christopherson
Update the Processor Trace (PT) MSR intercepts during a filter change if and only if PT may be exposed to the guest, i.e. only if KVM is operating in the so called "host+guest" mode where PT can be used simultaneously by both the host and guest. If PT is in system mode, the host is the sole owner of PT and the MSRs should never be passed through to the guest. Luckily the missed check only results in unnecessary work, as select RTIT MSRs are passed through only when RTIT tracing is enabled "in" the guest, and tracing can't be enabled in the guest when KVM is in system mode (writes to guest.MSR_IA32_RTIT_CTL are disallowed). Cc: Xiaoyao Li <xiaoyao.li@intel.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Xiaoyao Li <xiaoyao.li@intel.com> Link: https://lore.kernel.org/r/20220712015838.1253995-1-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-13KVM: x86: WARN only once if KVM leaves a dangling userspace I/O requestSean Christopherson
Change a WARN_ON() to separate WARN_ON_ONCE() if KVM has an outstanding PIO or MMIO request without an associated callback, i.e. if KVM queued a userspace I/O exit but didn't actually exit to userspace before moving on to something else. Warning on every KVM_RUN risks spamming the kernel if KVM gets into a bad state. Opportunistically split the WARNs so that it's easier to triage failures when a WARN fires. Deliberately do not use KVM_BUG_ON(), i.e. don't kill the VM. While the WARN is all but guaranteed to fire if and only if there's a KVM bug, a dangling I/O request does not present a danger to KVM (that flag is truly truly consumed only in a single emulator path), and any such bug is unlikely to be fatal to the VM (KVM essentially failed to do something it shouldn't have tried to do in the first place). In other words, note the bug, but let the VM keep running. Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220711232750.1092012-4-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-13KVM: x86: Set error code to segment selector on LLDT/LTR non-canonical #GPSean Christopherson
When injecting a #GP on LLDT/LTR due to a non-canonical LDT/TSS base, set the error code to the selector. Intel SDM's says nothing about the #GP, but AMD's APM explicitly states that both LLDT and LTR set the error code to the selector, not zero. Note, a non-canonical memory operand on LLDT/LTR does generate a #GP(0), but the KVM code in question is specific to the base from the descriptor. Fixes: e37a75a13cda ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR") Cc: stable@vger.kernel.org Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220711232750.1092012-3-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-13KVM: x86: Mark TSS busy during LTR emulation _after_ all fault checksSean Christopherson
Wait to mark the TSS as busy during LTR emulation until after all fault checks for the LTR have passed. Specifically, don't mark the TSS busy if the new TSS base is non-canonical. Opportunistically drop the one-off !seg_desc.PRESENT check for TR as the only reason for the early check was to avoid marking a !PRESENT TSS as busy, i.e. the common !PRESENT is now done before setting the busy bit. Fixes: e37a75a13cda ("KVM: x86: Emulator ignores LDTR/TR extended base on LLDT/LTR") Reported-by: syzbot+760a73552f47a8cd0fd9@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp> Cc: Hou Wenlong <houwenlong.hwl@antgroup.com> Signed-off-by: Sean Christopherson <seanjc@google.com> Reviewed-by: Maxim Levitsky <mlevitsk@redhat.com> Link: https://lore.kernel.org/r/20220711232750.1092012-2-seanjc@google.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-13KVM: x86: Tweak name of MONITOR/MWAIT #UD quirk to make it #UD specificSean Christopherson
Add a "UD" clause to KVM_X86_QUIRK_MWAIT_NEVER_FAULTS to make it clear that the quirk only controls the #UD behavior of MONITOR/MWAIT. KVM doesn't currently enforce fault checks when MONITOR/MWAIT are supported, but that could change in the future. SVM also has a virtualization hole in that it checks all faults before intercepts, and so "never faults" is already a lie when running on SVM. Fixes: bfbcc81bb82c ("KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior") Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220711225753.1073989-4-seanjc@google.com
2022-07-13ACPI: CPPC: Fix enabling CPPC on AMD systems with shared memoryMario Limonciello
When commit 72f2ecb7ece7 ("ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported") was introduced, we found collateral damage that a number of AMD systems that supported CPPC but didn't advertise support in _OSC stopped having a functional amd-pstate driver. The _OSC was only enforced on Intel systems at that time. This was fixed for the MSR based designs by commit 8b356e536e69f ("ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported") but some shared memory based designs also support CPPC but haven't advertised support in the _OSC. Add support for those designs as well by hardcoding the list of systems. Fixes: 72f2ecb7ece7 ("ACPI: bus: Set CPPC _OSC bits for all and when CPPC_LIB is supported") Fixes: 8b356e536e69f ("ACPI: CPPC: Don't require _OSC if X86_FEATURE_CPPC is supported") Link: https://lore.kernel.org/all/3559249.JlDtxWtqDm@natalenko.name/ Cc: 5.18+ <stable@vger.kernel.org> # 5.18+ Reported-and-tested-by: Oleksandr Natalenko <oleksandr@natalenko.name> Signed-off-by: Mario Limonciello <mario.limonciello@amd.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-13x86/pat: Fix x86_has_pat_wp()Juergen Gross
x86_has_pat_wp() is using a wrong test, as it relies on the normal PAT configuration used by the kernel. In case the PAT MSR has been setup by another entity (e.g. Xen hypervisor) it might return false even if the PAT configuration is allowing WP mappings. This due to the fact that when running as Xen PV guest the PAT MSR is setup by the hypervisor and cannot be changed by the guest. This results in the WP related entry to be at a different position when running as Xen PV guest compared to the bare metal or fully virtualized case. The correct way to test for WP support is: 1. Get the PTE protection bits needed to select WP mode by reading __cachemode2pte_tbl[_PAGE_CACHE_MODE_WP] (depending on the PAT MSR setting this might return protection bits for a stronger mode, e.g. UC-) 2. Translate those bits back into the real cache mode selected by those PTE bits by reading __pte2cachemode_tbl[__pte2cm_idx(prot)] 3. Test for the cache mode to be _PAGE_CACHE_MODE_WP Fixes: f88a68facd9a ("x86/mm: Extend early_memremap() support with additional attrs") Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: <stable@vger.kernel.org> # 4.14 Link: https://lore.kernel.org/r/20220503132207.17234-1-jgross@suse.com
2022-07-13x86/asm/32: Fix ANNOTATE_UNRET_SAFE use on 32-bitJiri Slaby
The build on x86_32 currently fails after commit 9bb2ec608a20 (objtool: Update Retpoline validation) with: arch/x86/kernel/../../x86/xen/xen-head.S:35: Error: no such instruction: `annotate_unret_safe' ANNOTATE_UNRET_SAFE is defined in nospec-branch.h. And head_32.S is missing this include. Fix this. Fixes: 9bb2ec608a20 ("objtool: Update Retpoline validation") Signed-off-by: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/63e23f80-033f-f64e-7522-2816debbc367@kernel.org