summaryrefslogtreecommitdiff
path: root/arch/x86/include
AgeCommit message (Collapse)Author
2022-07-24Merge tag 'x86_urgent_for_v5.19_rc8' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: "A couple more retbleed fallout fixes. It looks like their urgency is decreasing so it seems like we've managed to catch whatever snafus the limited -rc testing has exposed. Maybe we're getting ready... :) - Make retbleed mitigations 64-bit only (32-bit will need a bit more work if even needed, at all). - Prevent return thunks patching of the LKDTM modules as it is not needed there - Avoid writing the SPEC_CTRL MSR on every kernel entry on eIBRS parts - Enhance error output of apply_returns() when it fails to patch a return thunk - A sparse fix to the sev-guest module - Protect EFI fw calls by issuing an IBPB on AMD" * tag 'x86_urgent_for_v5.19_rc8' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/speculation: Make all RETbleed mitigations 64-bit only lkdtm: Disable return thunks in rodata.c x86/bugs: Warn when "ibrs" mitigation is selected on Enhanced IBRS parts x86/alternative: Report missing return thunk details virt: sev-guest: Pass the appropriate argument type to iounmap() x86/amd: Use IBPB for firmware calls
2022-07-22PCI: Move isa_dma_bridge_buggy out of asm/dma.hStafford Horne
The isa_dma_bridge_buggy symbol is only used for x86_32, and only x86_32 platforms or quirks ever set it. Add a new linux/isa-dma.h header that #defines isa_dma_bridge_buggy to 0 except on x86_32, where we keep it as a variable, and remove all the arch- specific definitions. [bhelgaas: commit log] Suggested-by: Arnd Bergmann <arnd@arndb.de> Suggested-by: Christoph Hellwig <hch@infradead.org> Link: https://lore.kernel.org/r/20220722214944.831438-3-shorne@gmail.com Signed-off-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>
2022-07-22PCI: Remove pci_get_legacy_ide_irq() and asm-generic/pci.hStafford Horne
pci_get_legacy_ide_irq() is only used on platforms that support PNP, so many architectures define it but never use it. Replace uses of it with ATA_PRIMARY_IRQ() and ATA_SECONDARY_IRQ(), which provide the same functionality. Since pci_get_legacy_ide_irq() is no longer used, remove all the architecture-specific definitions of it as well as asm-generic/pci.h, which only provides pci_get_legacy_ide_irq() [bhelgaas: commit log] Co-developed-by: Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20220722214944.831438-2-shorne@gmail.com Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Stafford Horne <shorne@gmail.com> Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Pierre Morel <pmorel@linux.ibm.com> Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-21mmu_gather: Remove per arch tlb_{start,end}_vma()Peter Zijlstra
Scattered across the archs are 3 basic forms of tlb_{start,end}_vma(). Provide two new MMU_GATHER_knobs to enumerate them and remove the per arch tlb_{start,end}_vma() implementations. - MMU_GATHER_NO_FLUSH_CACHE indicates the arch has flush_cache_range() but does *NOT* want to call it for each VMA. - MMU_GATHER_MERGE_VMAS indicates the arch wants to merge the invalidate across multiple VMAs if possible. With these it is possible to capture the three forms: 1) empty stubs; select MMU_GATHER_NO_FLUSH_CACHE and MMU_GATHER_MERGE_VMAS 2) start: flush_cache_range(), end: empty; select MMU_GATHER_MERGE_VMAS 3) start: flush_cache_range(), end: flush_tlb_range(); default Obviously, if the architecture does not have flush_cache_range() then it also doesn't need to select MMU_GATHER_NO_FLUSH_CACHE. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Will Deacon <will@kernel.org> Cc: David Miller <davem@davemloft.net> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-21x86,nospec: Simplify {JMP,CALL}_NOSPECPeter Zijlstra
Have {JMP,CALL}_NOSPEC generate the same code GCC does for indirect calls and rely on the objtool retpoline patching infrastructure. There's no reason these should be alternatives while the vast bulk of compiler generated retpolines are not. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
2022-07-19x86/fpu: Add a helper to prepare AMX state for low-power CPU idleChang S. Bae
When a CPU enters an idle state, a non-initialized AMX register state may be the cause of preventing a deeper low-power state. Other extended register states whether initialized or not do not impact the CPU idle state. The new helper can ensure the AMX state is initialized before the CPU is idle, and it will be used by the intel idle driver. Check the AMX_TILE feature bit before using XGETBV1 as a chain of dependencies was established via cpuid_deps[]: AMX->XFD->XGETBV1. Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20220608164748.11864-2-chang.seok.bae@intel.com
2022-07-19x86/mm/tlb: Ignore f->new_tlb_gen when zeroNadav Amit
Commit aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") introduced an optimization to skip superfluous TLB flushes based on the generation provided in flush_tlb_info. However, arch_tlbbatch_flush() does not provide any generation in flush_tlb_info and populates the flush_tlb_info generation with 0. This 0 is causes the flush_tlb_info to be interpreted as a superfluous, old flush. As a result, try_to_unmap_one() would not perform any TLB flushes. Fix it by checking whether f->new_tlb_gen is nonzero. Zero value is anyhow is an invalid generation value. To avoid future confusion, introduce TLB_GENERATION_INVALID constant and use it properly. Add warnings to ensure no partial flushes are done with TLB_GENERATION_INVALID or when f->mm is NULL, since this does not make any sense. In addition, add the missing unlikely(). [ dhansen: change VM_BUG_ON() -> VM_WARN_ON(), clarify changelog ] Fixes: aa44284960d5 ("x86/mm/tlb: Avoid reading mm_tlb_gen when possible") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Nadav Amit <namit@vmware.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Tested-by: Hugh Dickins <hughd@google.com> Link: https://lkml.kernel.org/r/20220710232837.3618-1-namit@vmware.com
2022-07-18x86/amd: Use IBPB for firmware callsPeter Zijlstra
On AMD IBRS does not prevent Retbleed; as such use IBPB before a firmware call to flush the branch history state. And because in order to do an EFI call, the kernel maps a whole lot of the kernel page table into the EFI page table, do an IBPB just in case in order to prevent the scenario of poisoning the BTB and causing an EFI call using the unprotected RET there. [ bp: Massage. ] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220715194550.793957-1-cascardo@canonical.com
2022-07-18random: remove CONFIG_ARCH_RANDOMJason A. Donenfeld
When RDRAND was introduced, there was much discussion on whether it should be trusted and how the kernel should handle that. Initially, two mechanisms cropped up, CONFIG_ARCH_RANDOM, a compile time switch, and "nordrand", a boot-time switch. Later the thinking evolved. With a properly designed RNG, using RDRAND values alone won't harm anything, even if the outputs are malicious. Rather, the issue is whether those values are being *trusted* to be good or not. And so a new set of options were introduced as the real ones that people use -- CONFIG_RANDOM_TRUST_CPU and "random.trust_cpu". With these options, RDRAND is used, but it's not always credited. So in the worst case, it does nothing, and in the best case, maybe it helps. Along the way, CONFIG_ARCH_RANDOM's meaning got sort of pulled into the center and became something certain platforms force-select. The old options don't really help with much, and it's a bit odd to have special handling for these instructions when the kernel can deal fine with the existence or untrusted existence or broken existence or non-existence of that CPU capability. Simplify the situation by removing CONFIG_ARCH_RANDOM and using the ordinary asm-generic fallback pattern instead, keeping the two options that are actually used. For now it leaves "nordrand" for now, as the removal of that will take a different route. Acked-by: Michael Ellerman <mpe@ellerman.id.au> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Acked-by: Borislav Petkov <bp@suse.de> Acked-by: Heiko Carstens <hca@linux.ibm.com> Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
2022-07-17x86/mm: move protection_map[] inside the platformAnshuman Khandual
This moves protection_map[] inside the platform and makes it a static. This also defines a helper function add_encrypt_protection_map() that can update the protection_map[] array with pgprot_encrypted(). Link: https://lkml.kernel.org/r/20220711070600.2378316-7-anshuman.khandual@arm.com Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Arnd Bergmann <arnd@arndb.de> Cc: Brian Cain <bcain@quicinc.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Christoph Hellwig <hch@infradead.org> Cc: Chris Zankel <chris@zankel.net> Cc: "David S. Miller" <davem@davemloft.net> Cc: Dinh Nguyen <dinguyen@kernel.org> Cc: Geert Uytterhoeven <geert@linux-m68k.org> Cc: Guo Ren <guoren@kernel.org> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: "James E.J. Bottomley" <James.Bottomley@HansenPartnership.com> Cc: Jeff Dike <jdike@addtoit.com> Cc: Jonas Bonn <jonas@southpole.se> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Michal Simek <monstr@monstr.eu> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Richard Henderson <rth@twiddle.net> Cc: Rich Felker <dalias@libc.org> Cc: Russell King <linux@armlinux.org.uk> Cc: Sam Ravnborg <sam@ravnborg.org> Cc: Stafford Horne <shorne@gmail.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Cc: Vineet Gupta <vgupta@kernel.org> Cc: WANG Xuerui <kernel@xen0n.name> Cc: Will Deacon <will@kernel.org> Cc: Yoshinori Sato <ysato@users.sourceforge.jp> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2022-07-15kexec: drop weak attribute from functionsNaveen N. Rao
Drop __weak attribute from functions in kexec_core.c: - machine_kexec_post_load() - arch_kexec_protect_crashkres() - arch_kexec_unprotect_crashkres() - crash_free_reserved_phys_range() Link: https://lkml.kernel.org/r/c0f6219e03cb399d166d518ab505095218a902dd.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-15kexec_file: drop weak attribute from functionsNaveen N. Rao
As requested (http://lkml.kernel.org/r/87ee0q7b92.fsf@email.froward.int.ebiederm.org), this series converts weak functions in kexec to use the #ifdef approach. Quoting the 3e35142ef99fe ("kexec_file: drop weak attribute from arch_kexec_apply_relocations[_add]") changelog: : Since commit d1bcae833b32f1 ("ELF: Don't generate unused section symbols") : [1], binutils (v2.36+) started dropping section symbols that it thought : were unused. This isn't an issue in general, but with kexec_file.c, gcc : is placing kexec_arch_apply_relocations[_add] into a separate : .text.unlikely section and the section symbol ".text.unlikely" is being : dropped. Due to this, recordmcount is unable to find a non-weak symbol in : .text.unlikely to generate a relocation record against. This patch (of 2); Drop __weak attribute from functions in kexec_file.c: - arch_kexec_kernel_image_probe() - arch_kimage_file_post_load_cleanup() - arch_kexec_kernel_image_load() - arch_kexec_locate_mem_hole() - arch_kexec_kernel_verify_sig() arch_kexec_kernel_image_load() calls into kexec_image_load_default(), so drop the static attribute for the latter. arch_kexec_kernel_verify_sig() is not overridden by any architecture, so drop the __weak attribute. Link: https://lkml.kernel.org/r/cover.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Link: https://lkml.kernel.org/r/2cd7ca1fe4d6bb6ca38e3283c717878388ed6788.1656659357.git.naveen.n.rao@linux.vnet.ibm.com Signed-off-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Suggested-by: Eric Biederman <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Mimi Zohar <zohar@linux.ibm.com>
2022-07-14x86/speculation: Use DECLARE_PER_CPU for x86_spec_ctrl_currentNathan Chancellor
Clang warns: arch/x86/kernel/cpu/bugs.c:58:21: error: section attribute is specified on redeclared variable [-Werror,-Wsection] DEFINE_PER_CPU(u64, x86_spec_ctrl_current); ^ arch/x86/include/asm/nospec-branch.h:283:12: note: previous declaration is here extern u64 x86_spec_ctrl_current; ^ 1 error generated. The declaration should be using DECLARE_PER_CPU instead so all attributes stay in sync. Cc: stable@vger.kernel.org Fixes: fc02735b14ff ("KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Nathan Chancellor <nathan@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-07-14KVM: x86: Restrict get_mt_mask() to a u8, use KVM_X86_OP_OPTIONAL_RET0Sean Christopherson
Restrict get_mt_mask() to a u8 and reintroduce using a RET0 static_call for the SVM implementation. EPT stores the memtype information in the lower 8 bits (bits 6:3 to be precise), and even returns a shifted u8 without an explicit cast to a larger type; there's no need to return a full u64. Note, RET0 doesn't play nice with a u64 return on 32-bit kernels, see commit bf07be36cd88 ("KVM: x86: do not use KVM_X86_OP_OPTIONAL_RET0 for get_mt_mask"). Signed-off-by: Sean Christopherson <seanjc@google.com> Message-Id: <20220714153707.3239119-1-seanjc@google.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2022-07-13KVM: x86: Tweak name of MONITOR/MWAIT #UD quirk to make it #UD specificSean Christopherson
Add a "UD" clause to KVM_X86_QUIRK_MWAIT_NEVER_FAULTS to make it clear that the quirk only controls the #UD behavior of MONITOR/MWAIT. KVM doesn't currently enforce fault checks when MONITOR/MWAIT are supported, but that could change in the future. SVM also has a virtualization hole in that it checks all faults before intercepts, and so "never faults" is already a lie when running on SVM. Fixes: bfbcc81bb82c ("KVM: x86: Add a quirk for KVM's "MONITOR/MWAIT are NOPs!" behavior") Signed-off-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/20220711225753.1073989-4-seanjc@google.com
2022-07-12KVM: x86/mmu: Replace UNMAPPED_GVA with INVALID_GPA for gva_to_gpa()Hou Wenlong
The result of gva_to_gpa() is physical address not virtual address, it is odd that UNMAPPED_GVA macro is used as the result for physical address. Replace UNMAPPED_GVA with INVALID_GPA and drop UNMAPPED_GVA macro. No functional change intended. Signed-off-by: Hou Wenlong <houwenlong.hwl@antgroup.com> Reviewed-by: Sean Christopherson <seanjc@google.com> Link: https://lore.kernel.org/r/6104978956449467d3c68f1ad7f2c2f6d771d0ee.1656667239.git.houwenlong.hwl@antgroup.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2022-07-11Merge tag 'x86_bugs_retbleed' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 retbleed fixes from Borislav Petkov: "Just when you thought that all the speculation bugs were addressed and solved and the nightmare is complete, here's the next one: speculating after RET instructions and leaking privileged information using the now pretty much classical covert channels. It is called RETBleed and the mitigation effort and controlling functionality has been modelled similar to what already existing mitigations provide" * tag 'x86_bugs_retbleed' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (54 commits) x86/speculation: Disable RRSBA behavior x86/kexec: Disable RET on kexec x86/bugs: Do not enable IBPB-on-entry when IBPB is not supported x86/entry: Move PUSH_AND_CLEAR_REGS() back into error_entry x86/bugs: Add Cannon lake to RETBleed affected CPU list x86/retbleed: Add fine grained Kconfig knobs x86/cpu/amd: Enumerate BTC_NO x86/common: Stamp out the stepping madness KVM: VMX: Prevent RSB underflow before vmenter x86/speculation: Fill RSB on vmexit for IBRS KVM: VMX: Fix IBRS handling after vmexit KVM: VMX: Prevent guest RSB poisoning attacks with eIBRS KVM: VMX: Convert launched argument to flags KVM: VMX: Flatten __vmx_vcpu_run() objtool: Re-add UNWIND_HINT_{SAVE_RESTORE} x86/speculation: Remove x86_spec_ctrl_mask x86/speculation: Use cached host SPEC_CTRL value for guest entry/exit x86/speculation: Fix SPEC_CTRL write on SMT state change x86/speculation: Fix firmware entry SPEC_CTRL handling x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=n ...
2022-07-11x86/setup: Use rng seeds from setup_dataJason A. Donenfeld
Currently, the only way x86 can get an early boot RNG seed is via EFI, which is generally always used now for physical machines, but is very rarely used in VMs, especially VMs that are optimized for starting "instantaneously", such as Firecracker's MicroVM. For tiny fast booting VMs, EFI is not something you generally need or want. Rather, the image loader or firmware should be able to pass a single random seed, exactly as device tree platforms do with the "rng-seed" property. Additionally, this is something that bootloaders can append, with their own seed file management, which is something every other major OS ecosystem has that Linux does not (yet). Add SETUP_RNG_SEED, similar to the other eight setup_data entries that are parsed at boot. It also takes care to zero out the seed immediately after using, in order to retain forward secrecy. This all takes about 7 trivial lines of code. Then, on kexec_file_load(), a new fresh seed is generated and passed to the next kernel, just as is done on device tree architectures when using kexec. And, importantly, I've tested that QEMU is able to properly pass SETUP_RNG_SEED as well, making this work for every step of the way. This code too is pretty straight forward. Together these measures ensure that VMs and nested kexec()'d kernels always receive a proper boot time RNG seed at the earliest possible stage from their parents: - Host [already has strongly initialized RNG] - QEMU [passes fresh seed in SETUP_RNG_SEED field] - Linux [uses parent's seed and gathers entropy of its own] - kexec [passes this in SETUP_RNG_SEED field] - Linux [uses parent's seed and gathers entropy of its own] - kexec [passes this in SETUP_RNG_SEED field] - Linux [uses parent's seed and gathers entropy of its own] - kexec [passes this in SETUP_RNG_SEED field] - ... I've verified in several scenarios that this works quite well from a host kernel to QEMU and down inwards, mixing and matching loaders, with every layer providing a seed to the next. [ bp: Massage commit message. ] Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: H. Peter Anvin (Intel) <hpa@zytor.com> Link: https://lore.kernel.org/r/20220630113300.1892799-1-Jason@zx2c4.com
2022-07-11Merge tag 'v5.19-rc6' into tip:x86/kdumpBorislav Petkov
Merge rc6 to pick up dependent changes to the bootparam UAPI header. Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-10x86/boot: Fix the setup data types max limitBorislav Petkov
Commit in Fixes forgot to change the SETUP_TYPE_MAX definition which contains the highest valid setup data type. Correct that. Fixes: 5ea98e01ab52 ("x86/boot: Add Confidential Computing type to setup_data") Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/ddba81dd-cc92-699c-5274-785396a17fb5@zytor.com
2022-07-09x86/speculation: Disable RRSBA behaviorPawan Gupta
Some Intel processors may use alternate predictors for RETs on RSB-underflow. This condition may be vulnerable to Branch History Injection (BHI) and intramode-BTI. Kernel earlier added spectre_v2 mitigation modes (eIBRS+Retpolines, eIBRS+LFENCE, Retpolines) which protect indirect CALLs and JMPs against such attacks. However, on RSB-underflow, RET target prediction may fallback to alternate predictors. As a result, RET's predicted target may get influenced by branch history. A new MSR_IA32_SPEC_CTRL bit (RRSBA_DIS_S) controls this fallback behavior when in kernel mode. When set, RETs will not take predictions from alternate predictors, hence mitigating RETs as well. Support for this is enumerated by CPUID.7.2.EDX[RRSBA_CTRL] (bit2). For spectre v2 mitigation, when a user selects a mitigation that protects indirect CALLs and JMPs against BHI and intramode-BTI, set RRSBA_DIS_S also to protect RETs for RSB-underflow case. Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-07-07x86/sgx: Support complete page removalReinette Chatre
The SGX2 page removal flow was introduced in previous patch and is as follows: 1) Change the type of the pages to be removed to SGX_PAGE_TYPE_TRIM using the ioctl() SGX_IOC_ENCLAVE_MODIFY_TYPES introduced in previous patch. 2) Approve the page removal by running ENCLU[EACCEPT] from within the enclave. 3) Initiate actual page removal using the ioctl() SGX_IOC_ENCLAVE_REMOVE_PAGES introduced here. Support the final step of the SGX2 page removal flow with ioctl() SGX_IOC_ENCLAVE_REMOVE_PAGES. With this ioctl() the user specifies a page range that should be removed. All pages in the provided range should have the SGX_PAGE_TYPE_TRIM page type and the request will fail with EPERM (Operation not permitted) if a page that does not have the correct type is encountered. Page removal can fail on any page within the provided range. Support partial success by returning the number of pages that were successfully removed. Since actual page removal will succeed even if ENCLU[EACCEPT] was not run from within the enclave the ENCLU[EMODPR] instruction with RWX permissions is used as a no-op mechanism to ensure ENCLU[EACCEPT] was successfully run from within the enclave before the enclave page is removed. If the user omits running SGX_IOC_ENCLAVE_REMOVE_PAGES the pages will still be removed when the enclave is unloaded. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Haitao Huang <haitao.huang@intel.com> Tested-by: Vijay Dhanraj <vijay.dhanraj@intel.com> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lkml.kernel.org/r/b75ee93e96774e38bb44a24b8e9bbfb67b08b51b.1652137848.git.reinette.chatre@intel.com
2022-07-07x86/sgx: Support modifying SGX page typeReinette Chatre
Every enclave contains one or more Thread Control Structures (TCS). The TCS contains meta-data used by the hardware to save and restore thread specific information when entering/exiting the enclave. With SGX1 an enclave needs to be created with enough TCSs to support the largest number of threads expecting to use the enclave and enough enclave pages to meet all its anticipated memory demands. In SGX1 all pages remain in the enclave until the enclave is unloaded. SGX2 introduces a new function, ENCLS[EMODT], that is used to change the type of an enclave page from a regular (SGX_PAGE_TYPE_REG) enclave page to a TCS (SGX_PAGE_TYPE_TCS) page or change the type from a regular (SGX_PAGE_TYPE_REG) or TCS (SGX_PAGE_TYPE_TCS) page to a trimmed (SGX_PAGE_TYPE_TRIM) page (setting it up for later removal). With the existing support of dynamically adding regular enclave pages to an initialized enclave and changing the page type to TCS it is possible to dynamically increase the number of threads supported by an enclave. Changing the enclave page type to SGX_PAGE_TYPE_TRIM is the first step of dynamically removing pages from an initialized enclave. The complete page removal flow is: 1) Change the type of the pages to be removed to SGX_PAGE_TYPE_TRIM using the SGX_IOC_ENCLAVE_MODIFY_TYPES ioctl() introduced here. 2) Approve the page removal by running ENCLU[EACCEPT] from within the enclave. 3) Initiate actual page removal using the ioctl() introduced in the following patch. Add ioctl() SGX_IOC_ENCLAVE_MODIFY_TYPES to support changing SGX enclave page types within an initialized enclave. With SGX_IOC_ENCLAVE_MODIFY_TYPES the user specifies a page range and the enclave page type to be applied to all pages in the provided range. The ioctl() itself can return an error code based on failures encountered by the kernel. It is also possible for SGX specific failures to be encountered. Add a result output parameter to communicate the SGX return code. It is possible for the enclave page type change request to fail on any page within the provided range. Support partial success by returning the number of pages that were successfully changed. After the page type is changed the page continues to be accessible from the kernel perspective with page table entries and internal state. The page may be moved to swap. Any access until ENCLU[EACCEPT] will encounter a page fault with SGX flag set in error code. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Haitao Huang <haitao.huang@intel.com> Tested-by: Vijay Dhanraj <vijay.dhanraj@intel.com> Link: https://lkml.kernel.org/r/babe39318c5bf16fc65fbfb38896cdee72161575.1652137848.git.reinette.chatre@intel.com
2022-07-07x86/sgx: Support restricting of enclave page permissionsReinette Chatre
In the initial (SGX1) version of SGX, pages in an enclave need to be created with permissions that support all usages of the pages, from the time the enclave is initialized until it is unloaded. For example, pages used by a JIT compiler or when code needs to otherwise be relocated need to always have RWX permissions. SGX2 includes a new function ENCLS[EMODPR] that is run from the kernel and can be used to restrict the EPCM permissions of regular enclave pages within an initialized enclave. Introduce ioctl() SGX_IOC_ENCLAVE_RESTRICT_PERMISSIONS to support restricting EPCM permissions. With this ioctl() the user specifies a page range and the EPCM permissions to be applied to all pages in the provided range. ENCLS[EMODPR] is run to restrict the EPCM permissions followed by the ENCLS[ETRACK] flow that will ensure no cached linear-to-physical address mappings to the changed pages remain. It is possible for the permission change request to fail on any page within the provided range, either with an error encountered by the kernel or by the SGX hardware while running ENCLS[EMODPR]. To support partial success the ioctl() returns an error code based on failures encountered by the kernel as well as two result output parameters: one for the number of pages that were successfully changed and one for the SGX return code. The page table entry permissions are not impacted by the EPCM permission changes. VMAs and PTEs will continue to allow the maximum vetted permissions determined at the time the pages are added to the enclave. The SGX error code in a page fault will indicate if it was an EPCM permission check that prevented an access attempt. No checking is done to ensure that the permissions are actually being restricted. This is because the enclave may have relaxed the EPCM permissions from within the enclave without the kernel knowing. An attempt to relax permissions using this call will be ignored by the hardware. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Jarkko Sakkinen <jarkko@kernel.org> Tested-by: Haitao Huang <haitao.huang@intel.com> Tested-by: Vijay Dhanraj <vijay.dhanraj@intel.com> Link: https://lkml.kernel.org/r/082cee986f3c1a2f4fdbf49501d7a8c5a98446f8.1652137848.git.reinette.chatre@intel.com
2022-07-07x86/sgx: Keep record of SGX page typeReinette Chatre
SGX2 functions are not allowed on all page types. For example, ENCLS[EMODPR] is only allowed on regular SGX enclave pages and ENCLS[EMODPT] is only allowed on TCS and regular pages. If these functions are attempted on another type of page the hardware would trigger a fault. Keep a record of the SGX page type so that there is more certainty whether an SGX2 instruction can succeed and faults can be treated as real failures. The page type is a property of struct sgx_encl_page and thus does not cover the VA page type. VA pages are maintained in separate structures and their type can be determined in a different way. The SGX2 instructions needing the page type do not operate on VA pages and this is thus not a scenario needing to be covered at this time. struct sgx_encl_page hosting this information is maintained for each enclave page so the space consumed by the struct is important. The existing sgx_encl_page->vm_max_prot_bits is already unsigned long while only using three bits. Transition to a bitfield for the two members to support the additional information without increasing the space consumed by the struct. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lkml.kernel.org/r/a0a6939eefe7ba26514f6c49723521cde372de64.1652137848.git.reinette.chatre@intel.com
2022-07-07x86/sgx: Add wrapper for SGX2 EMODPR functionReinette Chatre
Add a wrapper for the EMODPR ENCLS leaf function used to restrict enclave page permissions as maintained in the SGX hardware's Enclave Page Cache Map (EPCM). EMODPR: 1) Updates the EPCM permissions of an enclave page by treating the new permissions as a mask. Supplying a value that attempts to relax EPCM permissions has no effect on EPCM permissions (PR bit, see below, is changed). 2) Sets the PR bit in the EPCM entry of the enclave page to indicate that permission restriction is in progress. The bit is reset by the enclave by invoking ENCLU leaf function EACCEPT or EACCEPTCOPY. The enclave may access the page throughout the entire process if conforming to the EPCM permissions for the enclave page. After performing the permission restriction by issuing EMODPR the kernel needs to collaborate with the hardware to ensure that all logical processors sees the new restricted permissions. This is required for the enclave's EACCEPT/EACCEPTCOPY to succeed and is accomplished with the ETRACK flow. Expand enum sgx_return_code with the possible EMODPR return values. Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Jarkko Sakkinen <jarkko@kernel.org> Link: https://lkml.kernel.org/r/d15e7a769e13e4ca671fa2d0a0d3e3aec5aedbd4.1652137848.git.reinette.chatre@intel.com
2022-07-01x86/kexec: Carry forward IMA measurement log on kexecJonathan McDowell
On kexec file load, the Integrity Measurement Architecture (IMA) subsystem may verify the IMA signature of the kernel and initramfs, and measure it. The command line parameters passed to the kernel in the kexec call may also be measured by IMA. A remote attestation service can verify a TPM quote based on the TPM event log, the IMA measurement list and the TPM PCR data. This can be achieved only if the IMA measurement log is carried over from the current kernel to the next kernel across the kexec call. PowerPC and ARM64 both achieve this using device tree with a "linux,ima-kexec-buffer" node. x86 platforms generally don't make use of device tree, so use the setup_data mechanism to pass the IMA buffer to the new kernel. Signed-off-by: Jonathan McDowell <noodles@fb.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Mimi Zohar <zohar@linux.ibm.com> # IMA function definitions Link: https://lore.kernel.org/r/YmKyvlF3my1yWTvK@noodles-fedora-PC23Y6EG
2022-07-01x86/xen: Use clear_bss() for Xen PV guestsJuergen Gross
Instead of clearing the bss area in assembly code, use the clear_bss() function. This requires to pass the start_info address as parameter to xen_start_kernel() in order to avoid the xen_start_info being zeroed again. Signed-off-by: Juergen Gross <jgross@suse.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Jan Beulich <jbeulich@suse.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Link: https://lore.kernel.org/r/20220630071441.28576-2-jgross@suse.com
2022-06-30bitops: unify non-atomic bitops prototypes across architecturesAlexander Lobakin
Currently, there is a mess with the prototypes of the non-atomic bitops across the different architectures: ret bool, int, unsigned long nr int, long, unsigned int, unsigned long addr volatile unsigned long *, volatile void * Thankfully, it doesn't provoke any bugs, but can sometimes make the compiler angry when it's not handy at all. Adjust all the prototypes to the following standard: ret bool retval can be only 0 or 1 nr unsigned long native; signed makes no sense addr volatile unsigned long * bitmaps are arrays of ulongs Next, some architectures don't define 'arch_' versions as they don't support instrumentation, others do. To make sure there is always the same set of callables present and to ease any potential future changes, make them all follow the rule: * architecture-specific files define only 'arch_' versions; * non-prefixed versions can be defined only in asm-generic files; and place the non-prefixed definitions into a new file in asm-generic to be included by non-instrumented architectures. Finally, add some static assertions in order to prevent people from making a mess in this room again. I also used the %__always_inline attribute consistently, so that they always get resolved to the actual operations. Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Alexander Lobakin <alexandr.lobakin@intel.com> Acked-by: Mark Rutland <mark.rutland@arm.com> Reviewed-by: Yury Norov <yury.norov@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Signed-off-by: Yury Norov <yury.norov@gmail.com>
2022-06-29x86/retbleed: Add fine grained Kconfig knobsPeter Zijlstra
Do fine-grained Kconfig for all the various retbleed parts. NOTE: if your compiler doesn't support return thunks this will silently 'upgrade' your mitigation to IBPB, you might not like this. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-28treewide: uapi: Replace zero-length arrays with flexible-array membersGustavo A. R. Silva
There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. This code was transformed with the help of Coccinelle: (linux-5.19-rc2$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch) @@ identifier S, member, array; type T1, T2; @@ struct S { ... T1 member; T2 array[ - 0 ]; }; -fstrict-flex-arrays=3 is coming and we need to land these changes to prevent issues like these in the short future: ../fs/minix/dir.c:337:3: warning: 'strcpy' will always overflow; destination buffer has size 0, but the source string has length 2 (including NUL byte) [-Wfortify-source] strcpy(de3->name, "."); ^ Since these are all [0] to [] changes, the risk to UAPI is nearly zero. If this breaks anything, we can use a union with a new member name. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/78 Build-tested-by: kernel test robot <lkp@intel.com> Link: https://lore.kernel.org/lkml/62b675ec.wKX6AOZ6cbE71vtF%25lkp@intel.com/ Acked-by: Dan Williams <dan.j.williams@intel.com> # For ndctl.h Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2022-06-28efi: Simplify arch_efi_call_virt() macroSudeep Holla
Currently, the arch_efi_call_virt() assumes all users of it will have defined a type 'efi_##f##_t' to make use of it. Simplify the arch_efi_call_virt() macro by eliminating the explicit need for efi_##f##_t type for every user of this macro. Signed-off-by: Sudeep Holla <sudeep.holla@arm.com> Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk> [ardb: apply Sudeep's ARM fix to i686, Loongarch and RISC-V too] Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
2022-06-28arch/*/: remove CONFIG_VIRT_TO_BUSArnd Bergmann
All architecture-independent users of virt_to_bus() and bus_to_virt() have been fixed to use the dma mapping interfaces or have been removed now. This means the definitions on most architectures, and the CONFIG_VIRT_TO_BUS symbol are now obsolete and can be removed. The only exceptions to this are a few network and scsi drivers for m68k Amiga and VME machines and ppc32 Macintosh. These drivers work correctly with the old interfaces and are probably not worth changing. On alpha and parisc, virt_to_bus() were still used in asm/floppy.h. alpha can use isa_virt_to_bus() like x86 does, and parisc can just open-code the virt_to_phys() here, as this is architecture specific code. I tried updating the bus-virt-phys-mapping.rst documentation, which started as an email from Linus to explain some details of the Linux-2.0 driver interfaces. The bits about virt_to_bus() were declared obsolete backin 2000, and the rest is not all that relevant any more, so in the end I just decided to remove the file completely. Reviewed-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Geert Uytterhoeven <geert@linux-m68k.org> Acked-by: Michael Ellerman <mpe@ellerman.id.au> (powerpc) Acked-by: Helge Deller <deller@gmx.de> # parisc Signed-off-by: Arnd Bergmann <arnd@arndb.de>
2022-06-27x86/cpu/amd: Enumerate BTC_NOAndrew Cooper
BTC_NO indicates that hardware is not susceptible to Branch Type Confusion. Zen3 CPUs don't suffer BTC. Hypervisors are expected to synthesise BTC_NO when it is appropriate given the migration pool, to prevent kernels using heuristics. [ bp: Massage. ] Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/speculation: Fill RSB on vmexit for IBRSJosh Poimboeuf
Prevent RSB underflow/poisoning attacks with RSB. While at it, add a bunch of comments to attempt to document the current state of tribal knowledge about RSB attacks and what exactly is being mitigated. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27KVM: VMX: Prevent guest RSB poisoning attacks with eIBRSJosh Poimboeuf
On eIBRS systems, the returns in the vmexit return path from __vmx_vcpu_run() to vmx_vcpu_run() are exposed to RSB poisoning attacks. Fix that by moving the post-vmexit spec_ctrl handling to immediately after the vmexit. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27objtool: Re-add UNWIND_HINT_{SAVE_RESTORE}Josh Poimboeuf
Commit c536ed2fffd5 ("objtool: Remove SAVE/RESTORE hints") removed the save/restore unwind hints because they were no longer needed. Now they're going to be needed again so re-add them. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/speculation: Fix firmware entry SPEC_CTRL handlingJosh Poimboeuf
The firmware entry code may accidentally clear STIBP or SSBD. Fix that. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/speculation: Fix RSB filling with CONFIG_RETPOLINE=nJosh Poimboeuf
If a kernel is built with CONFIG_RETPOLINE=n, but the user still wants to mitigate Spectre v2 using IBRS or eIBRS, the RSB filling will be silently disabled. There's nothing retpoline-specific about RSB buffer filling. Remove the CONFIG_RETPOLINE guards around it. Signed-off-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/cpu/amd: Add Spectral ChickenPeter Zijlstra
Zen2 uarchs have an undocumented, unnamed, MSR that contains a chicken bit for some speculation behaviour. It needs setting. Note: very belatedly AMD released naming; it's now officially called MSR_AMD64_DE_CFG2 and MSR_AMD64_DE_CFG2_SUPPRESS_NOBR_PRED_BIT but shall remain the SPECTRAL CHICKEN. Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27objtool: Add entry UNRET validationPeter Zijlstra
Since entry asm is tricky, add a validation pass that ensures the retbleed mitigation has been done before the first actual RET instruction. Entry points are those that either have UNWIND_HINT_ENTRY, which acts as UNWIND_HINT_EMPTY but marks the instruction as an entry point, or those that have UWIND_HINT_IRET_REGS at +0. This is basically a variant of validate_branch() that is intra-function and it will simply follow all branches from marked entry points and ensures that all paths lead to ANNOTATE_UNRET_END. If a path hits RET or an indirection the path is a fail and will be reported. There are 3 ANNOTATE_UNRET_END instances: - UNTRAIN_RET itself - exception from-kernel; this path doesn't need UNTRAIN_RET - all early exceptions; these also don't need UNTRAIN_RET Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Add retbleed=ibpbPeter Zijlstra
jmp2ret mitigates the easy-to-attack case at relatively low overhead. It mitigates the long speculation windows after a mispredicted RET, but it does not mitigate the short speculation window from arbitrary instruction boundaries. On Zen2, there is a chicken bit which needs setting, which mitigates "arbitrary instruction boundaries" down to just "basic block boundaries". But there is no fix for the short speculation window on basic block boundaries, other than to flush the entire BTB to evict all attacker predictions. On the spectrum of "fast & blurry" -> "safe", there is (on top of STIBP or no-SMT): 1) Nothing System wide open 2) jmp2ret May stop a script kiddy 3) jmp2ret+chickenbit Raises the bar rather further 4) IBPB Only thing which can count as "safe". Tentative numbers put IBPB-on-entry at a 2.5x hit on Zen2, and a 10x hit on Zen1 according to lmbench. [ bp: Fixup feature bit comments, document option, 32-bit build fix. ] Suggested-by: Andrew Cooper <Andrew.Cooper3@citrix.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27objtool: Update Retpoline validationPeter Zijlstra
Update retpoline validation with the new CONFIG_RETPOLINE requirement of not having bare naked RET instructions. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27intel_idle: Disable IBRS during long idlePeter Zijlstra
Having IBRS enabled while the SMT sibling is idle unnecessarily slows down the running sibling. OTOH, disabling IBRS around idle takes two MSR writes, which will increase the idle latency. Therefore, only disable IBRS around deeper idle states. Shallow idle states are bounded by the tick in duration, since NOHZ is not allowed for them by virtue of their short target residency. Only do this for mwait-driven idle, since that keeps interrupts disabled across idle, which makes disabling IBRS vs IRQ-entry a non-issue. Note: C6 is a random threshold, most importantly C1 probably shouldn't disable IBRS, benchmarking needed. Suggested-by: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Report Intel retbleed vulnerabilityPeter Zijlstra
Skylake suffers from RSB underflow speculation issues; report this vulnerability and it's mitigation (spectre_v2=ibrs). [jpoimboe: cleanups, eibrs] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/speculation: Add spectre_v2=ibrs option to support Kernel IBRSPawan Gupta
Extend spectre_v2= boot option with Kernel IBRS. [jpoimboe: no STIBP with IBRS] Signed-off-by: Pawan Gupta <pawan.kumar.gupta@linux.intel.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Optimize SPEC_CTRL MSR writesPeter Zijlstra
When changing SPEC_CTRL for user control, the WRMSR can be delayed until return-to-user when KERNEL_IBRS has been enabled. This avoids an MSR write during context switch. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/entry: Add kernel IBRS implementationPeter Zijlstra
Implement Kernel IBRS - currently the only known option to mitigate RSB underflow speculation issues on Skylake hardware. Note: since IBRS_ENTER requires fuller context established than UNTRAIN_RET, it must be placed after it. However, since UNTRAIN_RET itself implies a RET, it must come after IBRS_ENTER. This means IBRS_ENTER needs to also move UNTRAIN_RET. Note 2: KERNEL_IBRS is sub-optimal for XenPV. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Keep a per-CPU IA32_SPEC_CTRL valuePeter Zijlstra
Due to TIF_SSBD and TIF_SPEC_IB the actual IA32_SPEC_CTRL value can differ from x86_spec_ctrl_base. As such, keep a per-CPU value reflecting the current task's MSR content. [jpoimboe: rename] Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>
2022-06-27x86/bugs: Report AMD retbleed vulnerabilityAlexandre Chartre
Report that AMD x86 CPUs are vulnerable to the RETBleed (Arbitrary Speculative Code Execution with Return Instructions) attack. [peterz: add hygon] [kim: invert parity; fam15h] Co-developed-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Kim Phillips <kim.phillips@amd.com> Signed-off-by: Alexandre Chartre <alexandre.chartre@oracle.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Josh Poimboeuf <jpoimboe@kernel.org> Signed-off-by: Borislav Petkov <bp@suse.de>