path: root/arch/x86/kernel/traps.c
Age  Commit message  Author
2024-03-11  Merge tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull core x86 updates from Ingo Molnar: - The biggest change is the rework of the percpu code, to support the 'Named Address Spaces' GCC feature, by Uros Bizjak: - This allows C code to access GS and FS segment relative memory via variables declared with such attributes, which allows the compiler to better optimize those accesses than the previous inline assembly code. - The series also includes a number of micro-optimizations for various percpu access methods, plus a number of cleanups of %gs accesses in assembly code. - These changes have been exposed to linux-next testing for the last ~5 months, with no known regressions in this area. - Fix/clean up __switch_to()'s broken but accidentally working handling of FPU switching - which also generates better code - Propagate more RIP-relative addressing in assembly code, to generate slightly better code - Rework the CPU mitigations Kconfig space to be less idiosyncratic, to make it easier for distros to follow & maintain these options - Rework the x86 idle code to cure RCU violations and to clean up the logic - Clean up the vDSO Makefile logic - Misc cleanups and fixes * tag 'x86-core-2024-03-11' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (52 commits) x86/idle: Select idle routine only once x86/idle: Let prefer_mwait_c1_over_halt() return bool x86/idle: Cleanup idle_setup() x86/idle: Clean up idle selection x86/idle: Sanitize X86_BUG_AMD_E400 handling sched/idle: Conditionally handle tick broadcast in default_idle_call() x86: Increase brk randomness entropy for 64-bit systems x86/vdso: Move vDSO to mmap region x86/vdso/kbuild: Group non-standard build attributes and primary object file rules together x86/vdso: Fix rethunk patching for vdso-image-{32,64}.o x86/retpoline: Ensure default return thunk isn't used at runtime x86/vdso: Use CONFIG_COMPAT_32 to specify vdso32 x86/vdso: Use $(addprefix ) instead of $(foreach ) x86/vdso: Simplify obj-y addition x86/vdso: Consolidate targets and clean-files x86/bugs: Rename CONFIG_RETHUNK => CONFIG_MITIGATION_RETHUNK x86/bugs: Rename CONFIG_CPU_SRSO => CONFIG_MITIGATION_SRSO x86/bugs: Rename CONFIG_CPU_IBRS_ENTRY => CONFIG_MITIGATION_IBRS_ENTRY x86/bugs: Rename CONFIG_CPU_UNRET_ENTRY => CONFIG_MITIGATION_UNRET_ENTRY x86/bugs: Rename CONFIG_SLS => CONFIG_MITIGATION_SLS ...
2024-02-14  Merge tag 'v6.8-rc4' into x86/percpu, to resolve conflicts and refresh the branch  [Ingo Molnar]
Conflicts: arch/x86/include/asm/percpu.h, arch/x86/include/asm/text-patching.h. Signed-off-by: Ingo Molnar <mingo@kernel.org>
2024-01-31  x86/fred: Invoke FRED initialization code to enable FRED  [H. Peter Anvin (Intel)]
Let cpu_init_exception_handling() call cpu_init_fred_exceptions() to initialize FRED. However, if FRED is unavailable or disabled, it falls back to setting up the TSS IST and initializing the IDT. Co-developed-by: Xin Li <xin3.li@intel.com> Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Shan Kang <shan.kang@intel.com> Link: https://lore.kernel.org/r/20231205105030.8698-36-xin3.li@intel.com
2024-01-31  x86/fred: Add a debug fault entry stub for FRED  [H. Peter Anvin (Intel)]
When a #DB occurs, it needs to be handled on a different stack depending on the ring level it came from, i.e., user or kernel context: a user #DB is handled on the current task stack, while a kernel #DB on a dedicated stack. This is exactly how FRED event delivery invokes an exception handler: ring 3 event on level 0 stack, i.e., current task stack; ring 0 event on the #DB dedicated stack specified in the IA32_FRED_STKLVLS MSR. So unlike IDT, the FRED debug exception entry stub doesn't do stack switch. On a FRED system, the debug trap status information (DR6) is passed on the stack, to avoid the problem of transient state. Furthermore, FRED transitions avoid a lot of ugly corner cases the handling of which can, and should be, skipped. The FRED debug trap status information saved on the stack differs from DR6 in both stickiness and polarity; it is exactly in the format which debug_read_clear_dr6() returns for the IDT entry points. Signed-off-by: H. Peter Anvin (Intel) <hpa@zytor.com> Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Shan Kang <shan.kang@intel.com> Link: https://lore.kernel.org/r/20231205105030.8698-24-xin3.li@intel.com
2024-01-30  x86/fred: Add a fred= cmdline param  [Xin Li]
Let command line option "fred" accept multiple options to make it easier to tweak its behavior. Currently, two options 'on' and 'off' are allowed, and the default behavior is to disable FRED. To enable FRED, append "fred=on" to the kernel command line. [ bp: Use cpu_feature_enabled(), touch ups. ] Signed-off-by: Xin Li <xin3.li@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Tested-by: Shan Kang <shan.kang@intel.com> Link: https://lore.kernel.org/r/20231205105030.8698-9-xin3.li@intel.com
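For illustration, a minimal sketch of how such a switch is commonly wired up with early_param(); the flag name and its placement here are assumptions, not the actual patch:

#include <linux/errno.h>
#include <linux/init.h>
#include <linux/printk.h>
#include <linux/string.h>
#include <linux/types.h>

/* Hypothetical flag: FRED stays off unless "fred=on" is passed. */
static bool fred_cmdline_on __initdata;

static int __init fred_setup(char *str)
{
        if (!str)
                return -EINVAL;

        if (!strcmp(str, "on"))
                fred_cmdline_on = true;
        else if (!strcmp(str, "off"))
                fred_cmdline_on = false;
        else
                pr_warn("fred: unknown option '%s', ignoring\n", str);

        return 0;
}
early_param("fred", fred_setup);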
2024-01-18  Merge tag 'iommu-updates-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: "Core changes: - Fix race conditions in device probe path - Retire IOMMU bus_ops - Support for passing custom allocators to page table drivers - Clean up Kconfig around IOMMU_SVA - Support for sharing SVA domains with all devices bound to a mm - Firmware data parsing cleanup - Tracing improvements for iommu-dma code - Some smaller fixes and cleanups ARM-SMMU drivers: - Device-tree binding updates: - Add additional compatible strings for Qualcomm SoCs - Document Adreno clocks for Qualcomm's SM8350 SoC - SMMUv2: - Implement support for the ->domain_alloc_paging() callback - Ensure Secure context is restored following suspend of Qualcomm SMMU implementation - SMMUv3: - Disable stalling mode for the "quiet" context descriptor - Minor refactoring and driver cleanups Intel VT-d driver: - Cleanup and refactoring AMD IOMMU driver: - Improve IO TLB invalidation logic - Small cleanups and improvements Rockchip IOMMU driver: - DT binding update to add Rockchip RK3588 Apple DART driver: - Apple M1 USB4/Thunderbolt DART support - Cleanups Virtio IOMMU driver: - Add support for iotlb_sync_map - Enable deferred IO TLB flushes" * tag 'iommu-updates-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (66 commits) iommu: Don't reserve 0-length IOVA region iommu/vt-d: Move inline helpers to header files iommu/vt-d: Remove unused vcmd interfaces iommu/vt-d: Remove unused parameter of intel_pasid_setup_pass_through() iommu/vt-d: Refactor device_to_iommu() to retrieve iommu directly iommu/sva: Fix memory leak in iommu_sva_bind_device() dt-bindings: iommu: rockchip: Add Rockchip RK3588 iommu/dma: Trace bounce buffer usage when mapping buffers iommu/arm-smmu: Convert to domain_alloc_paging() iommu/arm-smmu: Pass arm_smmu_domain to internal functions iommu/arm-smmu: Implement IOMMU_DOMAIN_BLOCKED iommu/arm-smmu: Convert to a global static identity domain iommu/arm-smmu: Reorganize arm_smmu_domain_add_master() iommu/arm-smmu-v3: Remove ARM_SMMU_DOMAIN_NESTED iommu/arm-smmu-v3: Master cannot be NULL in arm_smmu_write_strtab_ent() iommu/arm-smmu-v3: Add a type for the STE iommu/arm-smmu-v3: disable stall for quiet_cd iommu/qcom: restore IOMMU state if needed iommu/arm-smmu-qcom: Add QCM2290 MDSS compatible iommu/arm-smmu-qcom: Add missing GMU entry to match table ...
2023-12-12  iommu: Add mm_get_enqcmd_pasid() helper function  [Tina Zhang]
mm_get_enqcmd_pasid() should be used by architecture code and closely related code to learn the PASID value that the x86 ENQCMD operation should use for the mm. For the moment SMMUv3 uses this without any connection to ENQCMD; it will be cleaned up similarly to how the prior patch made VT-d use the PASID argument of set_dev_pasid(). The motivation is to replace mm->pasid with an iommu private data structure that is introduced in a later patch. Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Tested-by: Nicolin Chen <nicolinc@nvidia.com> Signed-off-by: Tina Zhang <tina.zhang@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20231027000525.1278806-4-tina.zhang@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-12  iommu: Change kconfig around IOMMU_SVA  [Jason Gunthorpe]
Linus suggested that the kconfig here is confusing: https://lore.kernel.org/all/CAHk-=wgUiAtiszwseM1p2fCJ+sC4XWQ+YN4TanFhUgvUqjr9Xw@mail.gmail.com/ Let's break it into three kconfigs controlling distinct things: - CONFIG_IOMMU_MM_DATA controls if the mm_struct has the additional fields for the IOMMU. Currently only PASID, but later patches store a struct iommu_mm_data * - CONFIG_ARCH_HAS_CPU_PASID controls if the arch needs the scheduling bit for keeping track of the ENQCMD instruction. x86 will select this if IOMMU_SVA is enabled - IOMMU_SVA controls if the IOMMU core compiles in the SVA support code for iommu driver use and the IOMMU exported API This way ARM will not enable CONFIG_ARCH_HAS_CPU_PASID Signed-off-by: Jason Gunthorpe <jgg@nvidia.com> Link: https://lore.kernel.org/r/20231027000525.1278806-2-tina.zhang@intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-12-11  x86/traps: Use current_top_of_stack() helper in traps.c  [Uros Bizjak]
Use current_top_of_stack() helper in sync_regs() and vc_switch_off_ist() instead of open-coding the reading of the top_of_stack percpu variable explicitly. Signed-off-by: Uros Bizjak <ubizjak@gmail.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Link: https://lore.kernel.org/r/20231204210320.114429-2-ubizjak@gmail.com
2023-11-23  arch: include linux/cpu.h for trap_init() prototype  [Arnd Bergmann]
Some architectures run into a -Wmissing-prototypes warning for trap_init(): arch/microblaze/kernel/traps.c:21:6: warning: no previous prototype for 'trap_init' [-Wmissing-prototypes] Include the right header to avoid this consistently, removing the extra declarations on m68k and x86 that were added as local workarounds already. Signed-off-by: Arnd Bergmann <arnd@arndb.de>
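In sketch form, the pattern the patch converges on (illustrative only): the shared header carries the prototype, so the architecture's definition needs no local declaration:

#include <linux/cpu.h>          /* declares: void trap_init(void); */
#include <linux/init.h>

void __init trap_init(void)
{
        /* architecture-specific trap/IDT setup */
}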
2023-08-31  Merge tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 shadow stack support from Dave Hansen: "This is the long awaited x86 shadow stack support, part of Intel's Control-flow Enforcement Technology (CET). CET consists of two related security features: shadow stacks and indirect branch tracking. This series implements just the shadow stack part of this feature, and just for userspace. The main use case for shadow stack is providing protection against return oriented programming attacks. It works by maintaining a secondary (shadow) stack using a special memory type that has protections against modification. When executing a CALL instruction, the processor pushes the return address to both the normal stack and to the special permission shadow stack. Upon RET, the processor pops the shadow stack copy and compares it to the normal stack copy. For more information, refer to the links below for the earlier versions of this patch set" Link: https://lore.kernel.org/lkml/20220130211838.8382-1-rick.p.edgecombe@intel.com/ Link: https://lore.kernel.org/lkml/20230613001108.3040476-1-rick.p.edgecombe@intel.com/ * tag 'x86_shstk_for_6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (47 commits) x86/shstk: Change order of __user in type x86/ibt: Convert IBT selftest to asm x86/shstk: Don't retry vm_munmap() on -EINTR x86/kbuild: Fix Documentation/ reference x86/shstk: Move arch detail comment out of core mm x86/shstk: Add ARCH_SHSTK_STATUS x86/shstk: Add ARCH_SHSTK_UNLOCK x86: Add PTRACE interface for shadow stack selftests/x86: Add shadow stack test x86/cpufeatures: Enable CET CR4 bit for shadow stack x86/shstk: Wire in shadow stack interface x86: Expose thread features in /proc/$PID/status x86/shstk: Support WRSS for userspace x86/shstk: Introduce map_shadow_stack syscall x86/shstk: Check that signal frame is shadow stack mem x86/shstk: Check that SSP is aligned on sigreturn x86/shstk: Handle signals for shadow stack x86/shstk: Introduce routines modifying shstk x86/shstk: Handle thread shadow stack x86/shstk: Add user-mode shadow stack support ...
2023-08-14  x86/CPU/AMD: Fix the DIV(0) initial fix attempt  [Borislav Petkov (AMD)]
Initially, it was thought that doing an innocuous division in the #DE handler would take care of preventing any leaking of old data from the divider, but by the time the fault is raised, the speculation has already advanced too far and such data could already have been used by younger operations. Therefore, do the innocuous division on every exit to userspace so that userspace doesn't see any potentially old data from integer divisions in kernel space. Do the same before VMRUN, to protect host data from leaking into the guest too. Fixes: 77245f1c3c64 ("x86/CPU/AMD: Do not leak quotient data after a division by 0") Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Link: https://lore.kernel.org/r/20230811213824.10025-1-bp@alien8.de
2023-08-09  x86/CPU/AMD: Do not leak quotient data after a division by 0  [Borislav Petkov (AMD)]
Under certain circumstances, an integer division by 0 which faults can leave stale quotient data from a previous division operation on Zen1 microarchitectures. Do a dummy division 0/1 before returning from the #DE exception handler in order to avoid any leaks of potentially sensitive data. Signed-off-by: Borislav Petkov (AMD) <bp@alien8.de> Cc: <stable@kernel.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
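A rough sketch of the idea (simplified; the in-tree helper and its alternatives-based patching differ):

/* Overwrite any stale divider state with an innocuous 0/1 division. */
static __always_inline void dummy_div0_workaround(void)
{
        unsigned int lo = 0, hi = 0;

        /* EDX:EAX = 0 divided by 1; quotient and remainder are discarded. */
        asm volatile("div %2" : "+a" (lo), "+d" (hi) : "r" (1U));
}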
2023-08-02  x86/shstk: Add user control-protection fault handler  [Rick Edgecombe]
A control-protection fault is triggered when a control-flow transfer attempt violates Shadow Stack or Indirect Branch Tracking constraints. For example, the return address for a RET instruction differs from the copy on the shadow stack. There already exists a control-protection fault handler for handling kernel IBT faults. Refactor this fault handler into separate user and kernel handlers, like the page fault handler. Add a control-protection handler for usermode. To avoid ifdeffery, put them both in a new file cet.c, which is compiled in the case of either of the two CET features supported in the kernel: kernel IBT or user mode shadow stack. Move some static inline functions from traps.c into a header so they can be used in cet.c. Opportunistically fix a comment in the kernel IBT part of the fault handler that is on the end of the line instead of preceding it. Keep the same behavior for the kernel side of the fault handler, except for converting a BUG to a WARN in the case of a #CP happening when the feature is missing. This unifies the behavior with the new shadow stack code, and also prevents the kernel from crashing under this situation which is potentially recoverable. The control-protection fault handler works in a similar way as the general protection fault handler. It provides the si_code SEGV_CPERR to the signal handler. Co-developed-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Yu-cheng Yu <yu-cheng.yu@intel.com> Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Tested-by: John Allen <john.allen@amd.com> Tested-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/all/20230613001108.3040476-28-rick.p.edgecombe%40intel.com
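From userspace, such a fault is observable as SIGSEGV with si_code SEGV_CPERR; a minimal, hypothetical observer might look like this (the fallback #define is an assumption for older libc headers):

#include <signal.h>
#include <unistd.h>

#ifndef SEGV_CPERR
#define SEGV_CPERR 10   /* kernel uapi value; assumption if the libc headers lack it */
#endif

static void segv_handler(int sig, siginfo_t *info, void *ctx)
{
        static const char msg[] = "control-protection fault\n";

        (void)sig;
        (void)ctx;
        if (info->si_code == SEGV_CPERR)
                write(2, msg, sizeof(msg) - 1);
        _exit(1);
}

int main(void)
{
        struct sigaction sa = { 0 };

        sa.sa_sigaction = segv_handler;
        sa.sa_flags = SA_SIGINFO;
        sigaction(SIGSEGV, &sa, NULL);
        /* a shadow stack violation in this process would now be reported */
        return 0;
}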
2023-07-25  x86/traps: Fix load_unaligned_zeropad() handling for shared TDX memory  [Kirill A. Shutemov]
Commit c4e34dd99f2e ("x86: simplify load_unaligned_zeropad() implementation") changes how exceptions around load_unaligned_zeropad() are handled. The kernel now uses the fault_address in fixup_exception() to verify the address calculations for the load_unaligned_zeropad(). It works fine for #PF, but breaks on #VE since no fault address is passed down to fixup_exception(). Propagating ve_info.gla down to fixup_exception() resolves the issue. See commit 1e7769653b06 ("x86/tdx: Handle load_unaligned_zeropad() page-cross to a shared page") for more context. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Reported-by: Michael Kelley <mikelley@microsoft.com> Fixes: c4e34dd99f2e ("x86: simplify load_unaligned_zeropad() implementation") Acked-by: Dave Hansen <dave.hansen@linux.intel.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2023-07-11  x86/traps: Move control protection handler to separate file  [Rick Edgecombe]
Today the control protection handler is defined in traps.c and used only for the kernel IBT feature. To reduce ifdeffery, move it to its own file. In future patches, functionality will be added to make this handler also handle user shadow stack faults. So name the file cet.c. No functional change. Signed-off-by: Rick Edgecombe <rick.p.edgecombe@intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Borislav Petkov (AMD) <bp@alien8.de> Reviewed-by: Kees Cook <keescook@chromium.org> Acked-by: Mike Rapoport (IBM) <rppt@kernel.org> Tested-by: Pengfei Xu <pengfei.xu@intel.com> Tested-by: John Allen <john.allen@amd.com> Tested-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/all/20230613001108.3040476-8-rick.p.edgecombe%40intel.com
2023-04-30  Merge tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu Pull iommu updates from Joerg Roedel: - Convert to platform remove callback returning void - Extend changing default domain to normal group - Intel VT-d updates: - Remove VT-d virtual command interface and IOASID - Allow the VT-d driver to support non-PRI IOPF - Remove PASID supervisor request support - Various small and misc cleanups - ARM SMMU updates: - Device-tree binding updates: * Allow Qualcomm GPU SMMUs to accept relevant clock properties * Document Qualcomm 8550 SoC as implementing an MMU-500 * Favour new "qcom,smmu-500" binding for Adreno SMMUs - Fix S2CR quirk detection on non-architectural Qualcomm SMMU implementations - Acknowledge SMMUv3 PRI queue overflow when consuming events - Document (in a comment) why ATS is disabled for bypass streams - AMD IOMMU updates: - 5-level page-table support - NUMA awareness for memory allocations - Unisoc driver: Support for reattaching an existing domain - Rockchip driver: Add missing set_platform_dma_ops callback - Mediatek driver: Adjust the dma-ranges - Various other small fixes and cleanups * tag 'iommu-updates-v6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/joro/iommu: (82 commits) iommu: Remove iommu_group_get_by_id() iommu: Make iommu_release_device() static iommu/vt-d: Remove BUG_ON in dmar_insert_dev_scope() iommu/vt-d: Remove a useless BUG_ON(dev->is_virtfn) iommu/vt-d: Remove BUG_ON in map/unmap() iommu/vt-d: Remove BUG_ON when domain->pgd is NULL iommu/vt-d: Remove BUG_ON in handling iotlb cache invalidation iommu/vt-d: Remove BUG_ON on checking valid pfn range iommu/vt-d: Make size of operands same in bitwise operations iommu/vt-d: Remove PASID supervisor request support iommu/vt-d: Use non-privileged mode for all PASIDs iommu/vt-d: Remove extern from function prototypes iommu/vt-d: Do not use GFP_ATOMIC when not needed iommu/vt-d: Remove unnecessary checks in iopf disabling path iommu/vt-d: Move PRI handling to IOPF feature path iommu/vt-d: Move pfsid and ats_qdep calculation to device probe path iommu/vt-d: Move iopf code from SVA to IOPF enabling path iommu/vt-d: Allow SVA with device-specific IOPF dmaengine: idxd: Add enable/disable device IOPF feature arm64: dts: mt8186: Add dma-ranges for the parent "soc" node ...
2023-03-31  iommu/ioasid: Rename INVALID_IOASID  [Jacob Pan]
INVALID_IOASID and IOMMU_PASID_INVALID are duplicated. Rename INVALID_IOASID and consolidate since we are moving away from IOASID infrastructure. Reviewed-by: Dave Jiang <dave.jiang@intel.com> Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Reviewed-by: Lu Baolu <baolu.lu@linux.intel.com> Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com> Link: https://lore.kernel.org/r/20230322200803.869130-7-jacob.jun.pan@linux.intel.com Signed-off-by: Joerg Roedel <jroedel@suse.de>
2023-03-16  iommu/sva: Replace pasid_valid() helper with mm_valid_pasid()  [Kirill A. Shutemov]
The kernel has a few users of pasid_valid() and all but one check if the process has a PASID allocated. The helper takes ioasid_t as the input. Replace the helper with mm_valid_pasid(), which takes mm_struct as the argument. The only call that checks a PASID not tied to mm_struct is open-coded now. This is a preparatory patch. It helps avoid ifdeffery: no need to dereference mm->pasid in generic code to check if the process has a PASID. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/all/20230312112612.31869-11-kirill.shutemov%40linux.intel.com
2022-12-14  Merge tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Add the call depth tracking mitigation for Retbleed which has been long in the making. It is a lighterweight software-only fix for Skylake-based cores where enabling IBRS is a big hammer and causes a significant performance impact. What it basically does is, it aligns all kernel functions to 16 bytes boundary and adds a 16-byte padding before the function, objtool collects all functions' locations and when the mitigation gets applied, it patches a call accounting thunk which is used to track the call depth of the stack at any time. When that call depth reaches a magical, microarchitecture-specific value for the Return Stack Buffer, the code stuffs that RSB and avoids its underflow which could otherwise lead to the Intel variant of Retbleed. This software-only solution brings a lot of the lost performance back, as benchmarks suggest: https://lore.kernel.org/all/20220915111039.092790446@infradead.org/ That page above also contains a lot more detailed explanation of the whole mechanism - Implement a new control flow integrity scheme called FineIBT which is based on the software kCFI implementation and uses hardware IBT support where present to annotate and track indirect branches using a hash to validate them - Other misc fixes and cleanups * tag 'x86_core_for_v6.2' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (80 commits) x86/paravirt: Use common macro for creating simple asm paravirt functions x86/paravirt: Remove clobber bitmask from .parainstructions x86/debug: Include percpu.h in debugreg.h to get DECLARE_PER_CPU() et al x86/cpufeatures: Move X86_FEATURE_CALL_DEPTH from bit 18 to bit 19 of word 11, to leave space for WIP X86_FEATURE_SGX_EDECCSSA bit x86/Kconfig: Enable kernel IBT by default x86,pm: Force out-of-line memcpy() objtool: Fix weak hole vs prefix symbol objtool: Optimize elf_dirty_reloc_sym() x86/cfi: Add boot time hash randomization x86/cfi: Boot time selection of CFI scheme x86/ibt: Implement FineIBT objtool: Add --cfi to generate the .cfi_sites section x86: Add prefix symbols for function padding objtool: Add option to generate prefix symbols objtool: Avoid O(bloody terrible) behaviour -- an ode to libelf objtool: Slice up elf_create_section_symbol() kallsyms: Revert "Take callthunks into account" x86: Unconfuse CONFIG_ and X86_FEATURE_ namespaces x86/retpoline: Fix crash printing warning x86/paravirt: Fix a !PARAVIRT build warning ...
2022-12-12  Merge tag 'x86-cleanups-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 cleanups from Thomas Gleixner: "A set of x86 cleanups: - Rework the handling of x86_regset for 32 and 64 bit. The original implementation tried to minimize the allocation size with quite some hard to understand and fragile tricks. Make it robust and straight forward by separating the register enumerations for 32 and 64 bit completely. - Add a few missing static annotations - Remove the stale unused setup_once() assembly function - Address a few minor static analysis and kernel-doc warnings" * tag 'x86-cleanups-2022-12-10' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/asm/32: Remove setup_once() x86/kaslr: Fix process_mem_region()'s return value x86: Fix misc small issues x86/boot: Repair kernel-doc for boot_kstrtoul() x86: Improve formatting of user_regset arrays x86: Separate out x86_regset for 32 and 64 bit x86/i8259: Make default_legacy_pic static x86/tsc: Make art_related_clocksource static
2022-11-21  Merge tag 'v6.1-rc6' into x86/core, to resolve conflicts  [Ingo Molnar]
Resolve conflicts between these commits in arch/x86/kernel/asm-offsets.c: # upstream: debc5a1ec0d1 ("KVM: x86: use a separate asm-offsets.c file") # retbleed work in x86/core: 5d8213864ade ("x86/retbleed: Add SKL return thunk") ... and these commits in include/linux/bpf.h: # upstream: 18acb7fac22f ("bpf: Revert ("Fix dispatcher patchable function entry to 5 bytes nop")") # x86/core commits: 931ab63664f0 ("x86/ibt: Implement FineIBT") bea75b33895f ("x86/Kconfig: Introduce function padding") The latter two modify BPF_DISPATCHER_ATTRIBUTES(), which was removed upstream. Conflicts: arch/x86/kernel/asm-offsets.c include/linux/bpf.h Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-11-08  x86/traps: avoid KMSAN bugs originating from handle_bug()  [Alexander Potapenko]
There is a case in exc_invalid_op handler that is executed outside the irqentry_enter()/irqentry_exit() region when an UD2 instruction is used to encode a call to __warn(). In that case the `struct pt_regs` passed to the interrupt handler is never unpoisoned by KMSAN (this is normally done in irqentry_enter()), which leads to false positives inside handle_bug(). Use kmsan_unpoison_entry_regs() to explicitly unpoison those registers before using them. Link: https://lkml.kernel.org/r/20221102110611.1085175-5-glider@google.com Signed-off-by: Alexander Potapenko <glider@google.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Marco Elver <elver@google.com> Cc: Masahiro Yamada <masahiroy@kernel.org> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
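The shape of the fix, as a simplified sketch (the real handler contains more logic):

#include <linux/kmsan.h>
#include <asm/ptrace.h>

static void example_handle_invalid_op(struct pt_regs *regs)
{
        /*
         * On the UD2-in-__warn() path regs may not have gone through
         * irqentry_enter(), so mark it initialized for KMSAN before
         * handle_bug() reads it.
         */
        kmsan_unpoison_entry_regs(regs);
        /* ... handle_bug(regs) ... */
}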
2022-11-08  x86: Fix misc small issues  [Jiapeng Chong]
Fix: ./arch/x86/kernel/traps.c: asm/proto.h is included more than once. ./arch/x86/kernel/alternative.c:1610:2-3: Unneeded semicolon. [ bp: Merge into a single patch. ] Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/1620902768-53822-1-git-send-email-jiapeng.chong@linux.alibaba.com Link: https://lore.kernel.org/r/20220926054628.116957-1-jiapeng.chong@linux.alibaba.com
2022-10-17  x86/percpu: Move current_top_of_stack next to current_task  [Thomas Gleixner]
Extend the struct pcpu_hot cacheline with current_top_of_stack; another very frequently used value. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20220915111145.493038635@infradead.org
2022-09-26  x86: Add support for CONFIG_CFI_CLANG  [Sami Tolvanen]
With CONFIG_CFI_CLANG, the compiler injects a type preamble immediately before each function and a check to validate the target function type before indirect calls: ; type preamble __cfi_function: mov <id>, %eax function: ... ; indirect call check mov -<id>,%r10d add -0x4(%r11),%r10d je .Ltmp1 ud2 .Ltmp1: call __x86_indirect_thunk_r11 Add error handling code for the ud2 traps emitted for the checks, and allow CONFIG_CFI_CLANG to be selected on x86_64. This produces the following oops on CFI failure (generated using lkdtm): [ 21.441706] CFI failure at lkdtm_indirect_call+0x16/0x20 [lkdtm] (target: lkdtm_increment_int+0x0/0x10 [lkdtm]; expected type: 0x7e0c52a) [ 21.444579] invalid opcode: 0000 [#1] PREEMPT SMP NOPTI [ 21.445296] CPU: 0 PID: 132 Comm: sh Not tainted 5.19.0-rc8-00020-g9f27360e674c #1 [ 21.445296] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.14.0-0-g155821a1990b-prebuilt.qemu.org 04/01/2014 [ 21.445296] RIP: 0010:lkdtm_indirect_call+0x16/0x20 [lkdtm] [ 21.445296] Code: 52 1c c0 48 c7 c1 c5 50 1c c0 e9 25 48 2a cc 0f 1f 44 00 00 49 89 fb 48 c7 c7 50 b4 1c c0 41 ba 5b ad f3 81 45 03 53 f8 [ 21.445296] RSP: 0018:ffffa9f9c02ffdc0 EFLAGS: 00000292 [ 21.445296] RAX: 0000000000000027 RBX: ffffffffc01cb300 RCX: 385cbbd2e070a700 [ 21.445296] RDX: 0000000000000000 RSI: c0000000ffffdfff RDI: ffffffffc01cb450 [ 21.445296] RBP: 0000000000000006 R08: 0000000000000000 R09: ffffffff8d081610 [ 21.445296] R10: 00000000bcc90825 R11: ffffffffc01c2fc0 R12: 0000000000000000 [ 21.445296] R13: ffffa31b827a6000 R14: 0000000000000000 R15: 0000000000000002 [ 21.445296] FS: 00007f08b42216a0(0000) GS:ffffa31b9f400000(0000) knlGS:0000000000000000 [ 21.445296] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 21.445296] CR2: 0000000000c76678 CR3: 0000000001940000 CR4: 00000000000006f0 [ 21.445296] Call Trace: [ 21.445296] <TASK> [ 21.445296] lkdtm_CFI_FORWARD_PROTO+0x30/0x50 [lkdtm] [ 21.445296] direct_entry+0x12d/0x140 [lkdtm] [ 21.445296] full_proxy_write+0x5d/0xb0 [ 21.445296] vfs_write+0x144/0x460 [ 21.445296] ? 
__x64_sys_wait4+0x5a/0xc0 [ 21.445296] ksys_write+0x69/0xd0 [ 21.445296] do_syscall_64+0x51/0xa0 [ 21.445296] entry_SYSCALL_64_after_hwframe+0x63/0xcd [ 21.445296] RIP: 0033:0x7f08b41a6fe1 [ 21.445296] Code: be 07 00 00 00 41 89 c0 e8 7e ff ff ff 44 89 c7 89 04 24 e8 91 c6 02 00 8b 04 24 48 83 c4 68 c3 48 63 ff b8 01 00 00 03 [ 21.445296] RSP: 002b:00007ffcdf65c2e8 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 21.445296] RAX: ffffffffffffffda RBX: 00007f08b4221690 RCX: 00007f08b41a6fe1 [ 21.445296] RDX: 0000000000000012 RSI: 0000000000c738f0 RDI: 0000000000000001 [ 21.445296] RBP: 0000000000000001 R08: fefefefefefefeff R09: fefefefeffc5ff4e [ 21.445296] R10: 00007f08b42222b0 R11: 0000000000000246 R12: 0000000000c738f0 [ 21.445296] R13: 0000000000000012 R14: 00007ffcdf65c401 R15: 0000000000c70450 [ 21.445296] </TASK> [ 21.445296] Modules linked in: lkdtm [ 21.445296] Dumping ftrace buffer: [ 21.445296] (ftrace buffer empty) [ 21.471442] ---[ end trace 0000000000000000 ]--- [ 21.471811] RIP: 0010:lkdtm_indirect_call+0x16/0x20 [lkdtm] [ 21.472467] Code: 52 1c c0 48 c7 c1 c5 50 1c c0 e9 25 48 2a cc 0f 1f 44 00 00 49 89 fb 48 c7 c7 50 b4 1c c0 41 ba 5b ad f3 81 45 03 53 f8 [ 21.474400] RSP: 0018:ffffa9f9c02ffdc0 EFLAGS: 00000292 [ 21.474735] RAX: 0000000000000027 RBX: ffffffffc01cb300 RCX: 385cbbd2e070a700 [ 21.475664] RDX: 0000000000000000 RSI: c0000000ffffdfff RDI: ffffffffc01cb450 [ 21.476471] RBP: 0000000000000006 R08: 0000000000000000 R09: ffffffff8d081610 [ 21.477127] R10: 00000000bcc90825 R11: ffffffffc01c2fc0 R12: 0000000000000000 [ 21.477959] R13: ffffa31b827a6000 R14: 0000000000000000 R15: 0000000000000002 [ 21.478657] FS: 00007f08b42216a0(0000) GS:ffffa31b9f400000(0000) knlGS:0000000000000000 [ 21.479577] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 21.480307] CR2: 0000000000c76678 CR3: 0000000001940000 CR4: 00000000000006f0 [ 21.481460] Kernel panic - not syncing: Fatal exception Signed-off-by: Sami Tolvanen <samitolvanen@google.com> Reviewed-by: Kees Cook <keescook@chromium.org> Tested-by: Kees Cook <keescook@chromium.org> Tested-by: Nathan Chancellor <nathan@kernel.org> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org> Signed-off-by: Kees Cook <keescook@chromium.org> Link: https://lore.kernel.org/r/20220908215504.3686827-23-samitolvanen@google.com
2022-05-23  Merge tag 'x86_asm_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 asm updates from Borislav Petkov: - A bunch of changes towards streamlining low level asm helpers' calling conventions so that former can be converted to C eventually - Simplify PUSH_AND_CLEAR_REGS so that it can be used at the system call entry paths instead of having opencoded, slightly different variants of it everywhere - Misc other fixes * tag 'x86_asm_for_v5.19_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/entry: Fix register corruption in compat syscall objtool: Fix STACK_FRAME_NON_STANDARD reloc type linkage: Fix issue with missing symbol size x86/entry: Remove skip_r11rcx x86/entry: Use PUSH_AND_CLEAR_REGS for compat x86/entry: Simplify entry_INT80_compat() x86/mm: Simplify RESERVE_BRK() x86/entry: Convert SWAPGS to swapgs and remove the definition of SWAPGS x86/entry: Don't call error_entry() for XENPV x86/entry: Move CLD to the start of the idtentry macro x86/entry: Move PUSH_AND_CLEAR_REGS out of error_entry() x86/entry: Switch the stack after error_entry() returns x86/traps: Use pt_regs directly in fixup_bad_iret()
2022-05-03  x86/traps: Use pt_regs directly in fixup_bad_iret()  [Lai Jiangshan]
Always stash the address error_entry() is going to return to, in %r12 and get rid of the void *error_entry_ret; slot in struct bad_iret_stack which was supposed to account for it and pt_regs pushed on the stack. After this, both fixup_bad_iret() and sync_regs() can work on a struct pt_regs pointer directly. [ bp: Rewrite commit message, touch ups. ] Signed-off-by: Lai Jiangshan <jiangshan.ljs@antgroup.com> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20220503032107.680190-2-jiangshanlai@gmail.com
2022-04-07  x86/mm/cpa: Add support for TDX shared memory  [Kirill A. Shutemov]
Intel TDX protects guest memory from VMM access. Any memory that is required for communication with the VMM must be explicitly shared. It is a two-step process: the guest sets the shared bit in the page table entry and notifies VMM about the change. The notification happens using MapGPA hypercall. Conversion back to private memory requires clearing the shared bit, notifying VMM with MapGPA hypercall following with accepting the memory with AcceptPage hypercall. Provide a TDX version of x86_platform.guest.* callbacks. It makes __set_memory_enc_pgtable() work right in TDX guest. Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lkml.kernel.org/r/20220405232939.73860-27-kirill.shutemov@linux.intel.com
2022-04-07  x86/traps: Add #VE support for TDX guest  [Kirill A. Shutemov]
Virtualization Exceptions (#VE) are delivered to TDX guests due to specific guest actions which may happen in either user space or the kernel: * Specific instructions (WBINVD, for example) * Specific MSR accesses * Specific CPUID leaf accesses * Access to specific guest physical addresses Syscall entry code has a critical window where the kernel stack is not yet set up. Any exception in this window leads to hard to debug issues and can be exploited for privilege escalation. Exceptions in the NMI entry code also cause issues. Returning from the exception handler with IRET will re-enable NMIs and nested NMI will corrupt the NMI stack. For these reasons, the kernel avoids #VEs during the syscall gap and the NMI entry code. Entry code paths do not access TD-shared memory, MMIO regions, use #VE triggering MSRs, instructions, or CPUID leaves that might generate #VE. VMM can remove memory from TD at any point, but access to unaccepted (or missing) private memory leads to VM termination, not to #VE. Similarly to page faults and breakpoints, #VEs are allowed in NMI handlers once the kernel is ready to deal with nested NMIs. During #VE delivery, all interrupts, including NMIs, are blocked until TDGETVEINFO is called. It prevents #VE nesting until the kernel reads the VE info. TDGETVEINFO retrieves the #VE info from the TDX module, which also clears the "#VE valid" flag. This must be done before anything else as any #VE that occurs while the valid flag is set escalates to #DF by TDX module. It will result in an oops. Virtual NMIs are inhibited if the #VE valid flag is set. NMI will not be delivered until TDGETVEINFO is called. For now, convert unhandled #VE's (everything, until later in this series) so that they appear just like a #GP by calling the ve_raise_fault() directly. The ve_raise_fault() function is similar to #GP handler and is responsible for sending SIGSEGV to userspace and CPU die and notifying debuggers and other die chain users. Co-developed-by: Sean Christopherson <sean.j.christopherson@intel.com> Co-developed-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Sean Christopherson <sean.j.christopherson@intel.com> Signed-off-by: Kuppuswamy Sathyanarayanan <sathyanarayanan.kuppuswamy@linux.intel.com> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220405232939.73860-8-kirill.shutemov@linux.intel.com
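The resulting handler flow, roughly (a simplified sketch of what is described above, not the verbatim kernel code):

DEFINE_IDTENTRY(exc_virtualization_exception)
{
        struct ve_info ve;

        /*
         * Must be called first: it reads and clears the "#VE valid"
         * flag, which also re-enables NMI delivery.
         */
        tdx_get_ve_info(&ve);

        cond_local_irq_enable(regs);

        /* Anything not handled yet is treated like a #GP. */
        if (!tdx_handle_virt_exception(regs, &ve))
                ve_raise_fault(regs, 0);

        cond_local_irq_disable(regs);
}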
2022-04-07  x86/traps: Refactor exc_general_protection()  [Kirill A. Shutemov]
TDX brings a new exception -- Virtualization Exception (#VE). Handling of #VE is structurally very similar to handling of #GP. Extract two helpers from exc_general_protection() that can be reused for handling #VE. No functional changes. Suggested-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Dave Hansen <dave.hansen@linux.intel.com> Reviewed-by: Dave Hansen <dave.hansen@linux.intel.com> Link: https://lkml.kernel.org/r/20220405232939.73860-7-kirill.shutemov@linux.intel.com
2022-03-15  Merge branch 'x86/pasid' into x86/core, to resolve conflicts  [Ingo Molnar]
Conflicts: tools/objtool/arch/x86/decode.c Signed-off-by: Ingo Molnar <mingo@kernel.org>
2022-03-15  x86/ibt: Add IBT feature, MSR and #CP handling  [Peter Zijlstra]
The bits required to make the hardware go. Of note is that, provided the syscall entry points are covered with ENDBR, #CP doesn't need to be an IST because we'll never hit the syscall gap. Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220308154318.582331711@infradead.org
2022-03-11  x86/traps: Mark do_int3() NOKPROBE_SYMBOL  [Li Huafei]
Since kprobe_int3_handler() is called in do_int3(), probing do_int3() can cause a breakpoint recursion and crash the kernel. Therefore, do_int3() should be marked as NOKPROBE_SYMBOL. Fixes: 21e28290b317 ("x86/traps: Split int3 handler up") Signed-off-by: Li Huafei <lihuafei1@huawei.com> Signed-off-by: Borislav Petkov <bp@suse.de> Acked-by: Masami Hiramatsu <mhiramat@kernel.org> Cc: <stable@vger.kernel.org> Link: https://lore.kernel.org/r/20220310120915.63349-1-lihuafei1@huawei.com
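The fix boils down to one annotation, as described above:

NOKPROBE_SYMBOL(do_int3);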
2022-02-15  x86/traps: Demand-populate PASID MSR via #GP  [Fenghua Yu]
All tasks start with PASID state disabled. This means that the first time they execute an ENQCMD instruction they will take a #GP fault. Modify the #GP fault handler to check if the "mm" for the task has already been allocated a PASID. If so, try to fix the #GP fault by loading the IA32_PASID MSR. Signed-off-by: Fenghua Yu <fenghua.yu@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Tony Luck <tony.luck@intel.com> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Link: https://lore.kernel.org/r/20220207230254.3342514-9-fenghua.yu@intel.com
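A condensed sketch of that #GP fixup path (simplified; helper names and the mm PASID check vary across kernel versions):

static bool try_fixup_enqcmd_gp(void)
{
        u32 pasid;

        if (!cpu_feature_enabled(X86_FEATURE_ENQCMD))
                return false;

        /* Only fix up when the mm already has a PASID allocated. */
        if (!current->mm || !pasid_valid(current->mm->pasid))
                return false;

        pasid = current->mm->pasid;
        wrmsrl(MSR_IA32_PASID, pasid | MSR_IA32_PASID_VALID);

        return true;
}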
2021-11-02  Merge tag 'x86_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 core updates from Borislav Petkov: - Do not #GP on userspace use of CLI/STI but pretend it was a NOP to keep old userspace from breaking. Adjust the corresponding iopl selftest to that. - Improve stack overflow warnings to say which stack got overflowed and raise the exception stack sizes to 2 pages since overflowing the single page of exception stack is very easy to do nowadays with all the tracing machinery enabled. With that, rip out the custom mapping of AMD SEV's too. - A bunch of changes in preparation for FGKASLR like supporting more than 64K section headers in the relocs tool, correct ORC lookup table size to cover the whole kernel .text and other adjustments. * tag 'x86_core_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: selftests/x86/iopl: Adjust to the faked iopl CLI/STI usage vmlinux.lds.h: Have ORC lookup cover entire _etext - _stext x86/boot/compressed: Avoid duplicate malloc() implementations x86/boot: Allow a "silent" kaslr random byte fetch x86/tools/relocs: Support >64K section headers x86/sev: Make the #VC exception stacks part of the default stacks storage x86: Increase exception stack sizes x86/mm/64: Improve stack overflow warnings x86/iopl: Fake iopl(3) CLI/STI usage
2021-11-01  Merge tag 'x86_sev_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 SEV updates from Borislav Petkov: - Export sev_es_ghcb_hv_call() so that HyperV Isolation VMs can use it too - Non-urgent fixes and cleanups * tag 'x86_sev_for_v5.16_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/sev: Expose sev_es_ghcb_hv_call() for use by HyperV x86/sev: Allow #VC exceptions on the VC2 stack x86/sev: Fix stack type check in vc_switch_off_ist() x86/sme: Use #define USE_EARLY_PGTABLE_L5 in mem_encrypt_identity.c x86/sev: Carve out HV call's return value verification
2021-10-26  x86/fpu/xstate: Add XFD #NM handler  [Chang S. Bae]
If the XFD MSR has feature bits set then #NM will be raised when user space attempts to use an instruction related to one of these features. When the task has no permissions to use that feature, raise SIGILL, which is the same behavior as #UD. If the task has permissions, calculate the new buffer size for the extended feature set and allocate a larger fpstate. In the unlikely case that vzalloc() fails, SIGSEGV is raised. The allocation function will be added in the next step. Provide a stub which fails for now. [ tglx: Updated serialization ] Signed-off-by: Chang S. Bae <chang.seok.bae@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lore.kernel.org/r/20211021225527.10184-18-chang.seok.bae@intel.com
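The decision logic described above, as a simplified sketch (not the exact in-tree code):

static bool handle_xfd_event(struct pt_regs *regs)
{
        u64 xfd_err;

        rdmsrl(MSR_IA32_XFD_ERR, xfd_err);
        if (!xfd_err)
                return false;           /* not an XFD-induced #NM */

        wrmsrl(MSR_IA32_XFD_ERR, 0);

        switch (xfd_enable_feature(xfd_err)) {
        case -EPERM:                    /* no permission: behave like #UD */
                force_sig_fault(SIGILL, ILL_ILLOPC, error_get_trap_addr(regs));
                break;
        case -EFAULT:                   /* fpstate reallocation failed */
                force_sig(SIGSEGV);
                break;
        }

        return true;
}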
2021-10-21  x86/sev: Fix stack type check in vc_switch_off_ist()  [Joerg Roedel]
The value of STACK_TYPE_EXCEPTION_LAST points to the last _valid_ exception stack. Reflect that in the check done in the vc_switch_off_ist() function. Fixes: a13644f3a53de ("x86/entry/64: Add entry code for #VC handler") Reported-by: Tom Lendacky <thomas.lendacky@amd.com> Signed-off-by: Joerg Roedel <jroedel@suse.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211021080833.30875-2-joro@8bytes.org
2021-10-20  x86/fpu: Replace the includes of fpu/internal.h  [Thomas Gleixner]
Now that the file is empty, fixup all references with the proper includes and delete the former kitchen sink. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20211015011540.001197214@linutronix.de
2021-09-21  x86/mm/64: Improve stack overflow warnings  [Peter Zijlstra]
Current code has an explicit check for hitting the task stack guard, but overflowing any of the other stacks will get you a non-descript general #DF warning. Improve matters by using get_stack_info_noinstr() to determine if and which stack guard page got hit, enabling a better stack warning. Specifically, Michael Wang reported what turned out to be an NMI exception stack overflow, which is now clearly reported as such: [] BUG: NMI stack guard page was hit at 0000000085fd977b (stack is 000000003a55b09e..00000000d8cce1a5) Reported-by: Michael Wang <yun.wang@linux.alibaba.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Tested-by: Michael Wang <yun.wang@linux.alibaba.com> Link: https://lkml.kernel.org/r/YUTE/NuqnaWbST8n@hirez.programming.kicks-ass.net
2021-09-21  x86/iopl: Fake iopl(3) CLI/STI usage  [Peter Zijlstra]
Since commit c8137ace5638 ("x86/iopl: Restrict iopl() permission scope") it's possible to emulate iopl(3) using ioperm(), except for the CLI/STI usage. Userspace CLI/STI usage is very dubious (read broken), since any exception taken during that window can lead to rescheduling anyway (or worse). The IOPL(2) manpage even states that usage of CLI/STI is highly discouraged and might even crash the system. Of course, that won't stop people and HP has the dubious honour of being the first vendor to be found using this in their hp-health package. In order to enable this 'software' to still 'work', have the #GP treat the CLI/STI instructions as NOPs when iopl(3). Warn the user that their program is doing dubious things. Fixes: a24ca9976843 ("x86/iopl: Remove legacy IOPL option") Reported-by: Ondrej Zary <linux@zary.sk> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Thomas Gleixner <tglx@linutronix.de> Cc: stable@kernel.org # v5.5+ Link: https://lkml.kernel.org/r/20210918090641.GD5106@worktop.programming.kicks-ass.net
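A simplified sketch of that #GP fixup (illustrative; the in-tree version also limits the warning to once per task):

static bool fixup_iopl_exception(struct pt_regs *regs)
{
        unsigned char byte;
        unsigned long ip;

        if (current->thread.iopl_emul != 3)
                return false;

        if (insn_get_effective_ip(regs, &ip))
                return false;

        if (get_user(byte, (const char __user *)ip))
                return false;

        if (byte != 0xfa && byte != 0xfb)       /* CLI / STI opcodes */
                return false;

        pr_warn_ratelimited("%s[%d] uses CLI/STI under iopl(3), emulating as NOP\n",
                            current->comm, task_pid_nr(current));

        regs->ip += 1;                          /* treat it as a one-byte NOP */
        return true;
}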
2021-07-07  Merge tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fpu updates from Thomas Gleixner: "Fixes and improvements for FPU handling on x86: - Prevent sigaltstack out of bounds writes. The kernel unconditionally writes the FPU state to the alternate stack without checking whether the stack is large enough to accomodate it. Check the alternate stack size before doing so and in case it's too small force a SIGSEGV instead of silently corrupting user space data. - MINSIGSTKZ and SIGSTKSZ are constants in signal.h and have never been updated despite the fact that the FPU state which is stored on the signal stack has grown over time which causes trouble in the field when AVX512 is available on a CPU. The kernel does not expose the minimum requirements for the alternate stack size depending on the available and enabled CPU features. ARM already added an aux vector AT_MINSIGSTKSZ for the same reason. Add it to x86 as well. - A major cleanup of the x86 FPU code. The recent discoveries of XSTATE related issues unearthed quite some inconsistencies, duplicated code and other issues. The fine granular overhaul addresses this, makes the code more robust and maintainable, which allows to integrate upcoming XSTATE related features in sane ways" * tag 'x86-fpu-2021-07-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (74 commits) x86/fpu/xstate: Clear xstate header in copy_xstate_to_uabi_buf() again x86/fpu/signal: Let xrstor handle the features to init x86/fpu/signal: Handle #PF in the direct restore path x86/fpu: Return proper error codes from user access functions x86/fpu/signal: Split out the direct restore code x86/fpu/signal: Sanitize copy_user_to_fpregs_zeroing() x86/fpu/signal: Sanitize the xstate check on sigframe x86/fpu/signal: Remove the legacy alignment check x86/fpu/signal: Move initial checks into fpu__restore_sig() x86/fpu: Mark init_fpstate __ro_after_init x86/pkru: Remove xstate fiddling from write_pkru() x86/fpu: Don't store PKRU in xstate in fpu_reset_fpstate() x86/fpu: Remove PKRU handling from switch_fpu_finish() x86/fpu: Mask PKRU from kernel XRSTOR[S] operations x86/fpu: Hook up PKRU into ptrace() x86/fpu: Add PKRU storage outside of task XSAVE buffer x86/fpu: Dont restore PKRU in fpregs_restore_userspace() x86/fpu: Rename xfeatures_mask_user() to xfeatures_mask_uabi() x86/fpu: Move FXSAVE_LEAK quirk info __copy_kernel_to_fpregs() x86/fpu: Rename __fpregs_load_activate() to fpregs_restore_userregs() ...
2021-06-23  x86/fpu: Rename and sanitize fpu__save/copy()  [Thomas Gleixner]
Both function names are a misnomer. fpu__save() is actually about synchronizing the hardware register state into the task's memory state so that either coredump or a math exception handler can inspect the state at the time where the problem happens. The function guarantees to preserve the register state, while "save" is a common terminology for saving the current state so it can be modified and restored later. This is clearly not the case here. Rename it to fpu_sync_fpstate(). fpu__copy() is used to clone the current task's FPU state when duplicating task_struct. While the register state is a copy the rest of the FPU state is not. Name it accordingly and remove the really pointless @src argument along with the warning which comes along with it. Nothing can ever copy the FPU state of a non-current task. It's clearly just a consequence of arch_dup_task_struct(), but it makes no sense to proliferate that further. Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Borislav Petkov <bp@suse.de> Link: https://lkml.kernel.org/r/20210623121455.196727450@linutronix.de
2021-05-18  x86/idt: Rework IDT setup for boot CPU  [Thomas Gleixner]
A basic IDT setup for the boot CPU has to be done before invoking cpu_init() because that might trigger #GP when accessing certain MSRs. This setup cannot install the IST variants on 64-bit because the TSS setup which is required for ISTs to work happens in cpu_init(). That leaves a theoretical window where an NMI would invoke the ASM entry point which relies on IST being enabled on the kernel stack which is undefined behaviour. This setup logic has never worked correctly, but on the other hand an NMI hitting the boot CPU before it has fully set up the IDT would be fatal anyway. So the small window between the wrong NMI gate and the IST based NMI gate is not really adding a substantial amount of risk. But the setup logic is nevertheless more convoluted than necessary. The recent separation of the TSS setup into a separate function makes it possible to set up the TSS first, then initialize the IDT with the IST variants before invoking cpu_init(), and to get rid of the post-cpu_init() IST setup. Move the invocation of cpu_init_exception_handling() ahead of idt_setup_traps() and merge the IST setup into the default setup table. Reported-by: Lai Jiangshan <laijs@linux.alibaba.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Lai Jiangshan <laijs@linux.alibaba.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/20210507114000.569244755@linutronix.de
2021-05-18  x86/cpu: Init AP exception handling from cpu_init_secondary()  [Borislav Petkov]
SEV-ES guests require a properly set up task register with which the TSS descriptor in the GDT can be located so that the IST-type #VC exception handler, which they need to function properly, can be executed. This setup needs to happen before attempting to load microcode in ucode_cpu_init() on secondary CPUs, which can cause such #VC exceptions. Simplify the machinery by running that exception setup from a new function cpu_init_secondary() and explicitly call cpu_init_exception_handling() for the boot CPU before cpu_init(). The latter prepares for fixing and simplifying the exception/IST setup on the boot CPU. There should be no functional changes resulting from this patch. [ tglx: Reworked it so cpu_init_exception_handling() stays separate ] Signed-off-by: Borislav Petkov <bp@suse.de> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Lai Jiangshan <laijs@linux.alibaba.com> Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org> Link: https://lore.kernel.org/r/87k0o6gtvu.ffs@nanos.tec.linutronix.de
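The resulting ordering on APs, roughly:

void cpu_init_secondary(void)
{
        /*
         * Exception/IST setup first, so that a #VC raised while loading
         * microcode can already be handled.
         */
        cpu_init_exception_handling();
        cpu_init();
}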
2021-04-27  Merge tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 updates from Borislav Petkov: - Turn the stack canary into a normal __percpu variable on 32-bit which gets rid of the LAZY_GS stuff and a lot of code. - Add an insn_decode() API which all users of the instruction decoder should preferrably use. Its goal is to keep the details of the instruction decoder away from its users and simplify and streamline how one decodes insns in the kernel. Convert its users to it. - kprobes improvements and fixes - Set the maximum DIE per package variable on Hygon - Rip out the dynamic NOP selection and simplify all the machinery around selecting NOPs. Use the simplified NOPs in objtool now too. - Add Xeon Sapphire Rapids to list of CPUs that support PPIN - Simplify the retpolines by folding the entire thing into an alternative now that objtool can handle alternatives with stack ops. Then, have objtool rewrite the call to the retpoline with the alternative which then will get patched at boot time. - Document Intel uarch per models in intel-family.h - Make Sub-NUMA Clustering topology the default and Cluster-on-Die the exception on Intel. * tag 'x86_core_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (53 commits) x86, sched: Treat Intel SNC topology as default, COD as exception x86/cpu: Comment Skylake server stepping too x86/cpu: Resort and comment Intel models objtool/x86: Rewrite retpoline thunk calls objtool: Skip magical retpoline .altinstr_replacement objtool: Cache instruction relocs objtool: Keep track of retpoline call sites objtool: Add elf_create_undef_symbol() objtool: Extract elf_symbol_add() objtool: Extract elf_strtab_concat() objtool: Create reloc sections implicitly objtool: Add elf_create_reloc() helper objtool: Rework the elf_rebuild_reloc_section() logic objtool: Fix static_call list generation objtool: Handle per arch retpoline naming objtool: Correctly handle retpoline thunk calls x86/retpoline: Simplify retpolines x86/alternatives: Optimize optimize_nops() x86: Add insn_decode_kernel() x86/kprobes: Move 'inline' to the beginning of the kprobe_is_ss() declaration ...
2021-04-26  Merge tag 'x86-splitlock-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
Pull x86 bus lock detection updates from Thomas Gleixner: "Support for enhanced split lock detection: Newer CPUs provide a second mechanism to detect operations with lock prefix which go across a cache line boundary. Such operations have to take a bus lock, which causes a system-wide performance degradation when these operations happen frequently. The new mechanism is not using the #AC exception. It triggers #DB and is restricted to operations in user space. Kernel side split lock access can only be detected by the #AC based variant. Contrary to the #AC based mechanism the #DB based variant triggers _after_ the instruction was executed. The mechanism is CPUID enumerated and, contrary to the #AC version which is based on the magic TEST_CTRL_MSR and model/family based enumeration, is on the way to become architectural" * tag 'x86-splitlock-2021-04-26' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: Documentation/admin-guide: Change doc for split_lock_detect parameter x86/traps: Handle #DB for bus lock x86/cpufeatures: Enumerate #DB for bus lock detection
2021-04-26  Merge tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip  [Linus Torvalds]
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull misc x86 cleanups from Borislav Petkov: "Trivial cleanups and fixes all over the place" * tag 'x86_cleanups_for_v5.13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: MAINTAINERS: Remove me from IDE/ATAPI section x86/pat: Do not compile stubbed functions when X86_PAT is off x86/asm: Ensure asm/proto.h can be included stand-alone x86/platform/intel/quark: Fix incorrect kernel-doc comment syntax in files x86/msr: Make locally used functions static x86/cacheinfo: Remove unneeded dead-store initialization x86/process/64: Move cpu_current_top_of_stack out of TSS tools/turbostat: Unmark non-kernel-doc comment x86/syscalls: Fix -Wmissing-prototypes warnings from COND_SYSCALL() x86/fpu/math-emu: Fix function cast warning x86/msr: Fix wr/rdmsr_safe_regs_on_cpu() prototypes x86: Fix various typos in comments, take #2 x86: Remove unusual Unicode characters from comments x86/kaslr: Return boolean values from a function returning bool x86: Fix various typos in comments x86/setup: Remove unused RESERVE_BRK_ARRAY() stacktrace: Move documentation for arch_stack_walk_reliable() to header x86: Remove duplicate TSC DEADLINE MSR definitions
2021-04-09  x86/traps: Correct exc_general_protection() and math_error() return paths  [Thomas Tai]
Commit 334872a09198 ("x86/traps: Attempt to fixup exceptions in vDSO before signaling") added return statements which bypass calling cond_local_irq_disable(). According to ca4c6a9858c2 ("x86/traps: Make interrupt enable/disable symmetric in C code"), cond_local_irq_disable() is needed because the asm return code no longer disables interrupts. Follow the existing code as an example and use "goto exit" instead of a "return" statement. [ bp: Massage commit message. ] Fixes: 334872a09198 ("x86/traps: Attempt to fixup exceptions in vDSO before signaling") Signed-off-by: Thomas Tai <thomas.tai@oracle.com> Signed-off-by: Borislav Petkov <bp@suse.de> Reviewed-by: Alexandre Chartre <alexandre.chartre@oracle.com> Link: https://lkml.kernel.org/r/1617902914-83245-1-git-send-email-thomas.tai@oracle.com
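The pattern being restored, in outline (an illustrative fragment, not a complete handler):

        cond_local_irq_enable(regs);
        ...
        if (fixup_vdso_exception(regs, trapnr, error_code, 0))
                goto exit;              /* not "return": keep enable/disable paired */
        ...
exit:
        cond_local_irq_disable(regs);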