summaryrefslogtreecommitdiff
path: root/arch/powerpc/kernel
AgeCommit message (Collapse)Author
2022-02-07powerpc/module_32: Fix livepatching for RO modulesChristophe Leroy
Livepatching a loaded module involves applying relocations through apply_relocate_add(), which attempts to write to read-only memory when CONFIG_STRICT_MODULE_RWX=y. R_PPC_ADDR16_LO, R_PPC_ADDR16_HI, R_PPC_ADDR16_HA and R_PPC_REL24 are the types generated by the kpatch-build userspace tool or klp-convert kernel tree observed applying a relocation to a post-init module. Use patch_instruction() to patch those relocations. Commit 8734b41b3efe ("powerpc/module_64: Fix livepatching for RO modules") did similar change in module_64. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Acked-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/d5697157cb7dba3927e19aa17c915a83bc550bb2.1640017960.git.christophe.leroy@csgroup.eu
2022-02-07powerpc/32: Remove _ENTRY() macroChristophe Leroy
_ENTRY() is now redundant with _GLOBAL(). Remove it. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/62a35f8dde2bb74c8d0d7a5430cce07a5a3a6fb6.1638273868.git.christophe.leroy@csgroup.eu
2022-02-07powerpc/32: Remove remaining .stabs annotationsChristophe Leroy
STABS debug format has been superseded long time ago by DWARF. Remove the few remaining .stabs annotations from old 32 bits code. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/68932ec2ba6b868d35006b96e90f0890f3da3c05.1638273868.git.christophe.leroy@csgroup.eu
2022-02-07powerpc/603: Fix boot failure with DEBUG_PAGEALLOC and KFENCEChristophe Leroy
Allthough kernel text is always mapped with BATs, we still have inittext mapped with pages, so TLB miss handling is required when CONFIG_DEBUG_PAGEALLOC or CONFIG_KFENCE is set. The final solution should be to set a BAT that also maps inittext but that BAT then needs to be cleared at end of init, and it will require more changes to be able to do it properly. As DEBUG_PAGEALLOC or KFENCE are debugging, performance is not a big deal so let's fix it simply for now to enable easy stable application. Fixes: 035b19a15a98 ("powerpc/32s: Always map kernel text and rodata with BATs") Cc: stable@vger.kernel.org # v5.11+ Reported-by: Maxime Bizon <mbizon@freebox.fr> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/aea33b4813a26bdb9378b5f273f00bd5d4abe240.1638857364.git.christophe.leroy@csgroup.eu
2022-02-07powerpc: Set crashkernel offset to mid of RMA regionSourabh Jain
On large config LPARs (having 192 and more cores), Linux fails to boot due to insufficient memory in the first memblock. It is due to the memory reservation for the crash kernel which starts at 128MB offset of the first memblock. This memory reservation for the crash kernel doesn't leave enough space in the first memblock to accommodate other essential system resources. The crash kernel start address was set to 128MB offset by default to ensure that the crash kernel get some memory below the RMA region which is used to be of size 256MB. But given that the RMA region size can be 512MB or more, setting the crash kernel offset to mid of RMA size will leave enough space for the kernel to allocate memory for other system resources. Since the above crash kernel offset change is only applicable to the LPAR platform, the LPAR feature detection is pushed before the crash kernel reservation. The rest of LPAR specific initialization will still be done during pseries_probe_fw_features as usual. This patch is dependent on changes to paca allocation for boot CPU. It expect boot CPU to discover 1T segment support which is introduced by the patch posted here: https://lists.ozlabs.org/pipermail/linuxppc-dev/2022-January/239175.html Reported-by: Abdul haleem <abdhalee@linux.vnet.ibm.com> Signed-off-by: Sourabh Jain <sourabhjain@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220204085601.107257-1-sourabhjain@linux.ibm.com
2022-02-03powerpc/603: Clear C bit when PTE is read onlyChristophe Leroy
On book3s/32 MMU, PP bits don't offer kernel RO protection, kernel pages are always RW. However, on the 603 a page fault is always generated when the C bit (change bit = dirty bit) is not set. Enforce kernel RO protection by clearing C bit in TLB miss handler when the page doesn't have _PAGE_RW flag. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/bbb13848ff0100a76ee9ea95118058c30ae95f2c.1643613343.git.christophe.leroy@csgroup.eu
2022-02-03powerpc/603: Remove outdated commentChristophe Leroy
Since commit 84de6ab0e904 ("powerpc/603: don't handle PAGE_ACCESSED in TLB miss handlers.") page table is not updated anymore by TLB miss handlers. Remove the comment. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/38b1ffefd2146fa56bf8aa605d476ad9736bbb37.1643613296.git.christophe.leroy@csgroup.eu
2022-02-03powerpc/module_64: use module_init_section instead of patching namesWedson Almeida Filho
Without this patch, module init sections are disabled by patching their names in arch-specific code when they're loaded (which prevents code in layout_sections from finding init sections). This patch uses the new arch-specific module_init_section instead. This allows modules that have .init_array sections to have the initialisers properly called (on load, before init). Without this patch, the initialisers are not called because .init_array is renamed to _init_array, and thus isn't found by code in find_module_sections(). Signed-off-by: Wedson Almeida Filho <wedsonaf@google.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220202055123.2144842-1-wedsonaf@google.com
2022-02-03powerpc: Fix debug print in smp_setup_cpu_mapsFabiano Rosas
When figuring out the number of threads, the debug message prints "1 thread" for the first iteration of the loop, instead of the actual number of threads calculated from the length of the "ibm,ppc-interrupt-server#s" property. * /cpus/PowerPC,POWER8@20... ibm,ppc-interrupt-server#s -> 1 threads <--- WRONG thread 0 -> cpu 0 (hard id 32) thread 1 -> cpu 1 (hard id 33) thread 2 -> cpu 2 (hard id 34) thread 3 -> cpu 3 (hard id 35) thread 4 -> cpu 4 (hard id 36) thread 5 -> cpu 5 (hard id 37) thread 6 -> cpu 6 (hard id 38) thread 7 -> cpu 7 (hard id 39) * /cpus/PowerPC,POWER8@28... ibm,ppc-interrupt-server#s -> 8 threads thread 0 -> cpu 8 (hard id 40) thread 1 -> cpu 9 (hard id 41) thread 2 -> cpu 10 (hard id 42) thread 3 -> cpu 11 (hard id 43) thread 4 -> cpu 12 (hard id 44) thread 5 -> cpu 13 (hard id 45) thread 6 -> cpu 14 (hard id 46) thread 7 -> cpu 15 (hard id 47) (...) Signed-off-by: Fabiano Rosas <farosas@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210120181847.952106-1-farosas@linux.ibm.com
2022-02-02powerpc/64: Move paca allocation later in bootMichael Ellerman
Mahesh & Sourabh identified two problems[1][2] with ppc64_bolted_size() and paca allocation. The first is that on a Radix capable machine but with "disable_radix" on the command line, there is a window during early boot where early_radix_enabled() is true, even though it will later become false. early_init_devtree: <- early_radix_enabled() = false early_init_dt_scan_cpus: <- early_radix_enabled() = false ... check_cpu_pa_features: <- early_radix_enabled() = false ... ^ <- early_radix_enabled() = TRUE allocate_paca: | <- early_radix_enabled() = TRUE ... | ppc64_bolted_size: | <- early_radix_enabled() = TRUE if (early_radix_enabled())| <- early_radix_enabled() = TRUE return ULONG_MAX; | ... | ... | <- early_radix_enabled() = TRUE ... | <- early_radix_enabled() = TRUE mmu_early_init_devtree() V ... <- early_radix_enabled() = false This causes ppc64_bolted_size() to return ULONG_MAX for the boot CPU's paca allocation, even though later it will return a different value. This is not currently a bug because the paca allocation is also limited by the RMA size, but that is very fragile. The second issue is that when using the Hash MMU, when we call ppc64_bolted_size() for the boot CPU's paca allocation, we have not yet detected whether 1T segments are available. That causes ppc64_bolted_size() to return 256MB, even if the machine can actually support up to 1T. This is usually OK, we generally have space below 256MB for one paca, but for a kdump kernel placed above 256MB it causes the boot to fail. At boot we cannot discover all the features of the machine instantaneously, so there will always be some periods where we have incomplete knowledge of the system. However both the above problems stem from the fact that we allocate the boot CPU's paca (and paca pointers array) before we decide which MMU we are using, or discover its exact features. Moving the paca allocation slightly later still can solve both the issues described above, and means for a normal boot we don't do any permanent allocations until after we've discovered the MMU. Note that although we move the boot CPU's paca allocation later, we still have a temporary paca (boot_paca) accessible via r13, so code that does read only access to paca fields is safe. The only risk is that some code writes to the boot_paca, and that write will then be lost when we switch away from the boot_paca later in early_setup(). The additional code that runs before the paca allocation is primarily mmu_early_init_devtree(), which is scanning the device tree and populating globals and cur_cpu_spec with MMU related flags. I do not see any additional code that writes to paca fields. [1]: https://lore.kernel.org/r/20211018084434.217772-2-sourabhjain@linux.ibm.com [2]: https://lore.kernel.org/r/20211018084434.217772-3-sourabhjain@linux.ibm.com Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220124130544.408675-1-mpe@ellerman.id.au
2022-01-31powerpc: add link stack flush mitigation status in debugfs.Michal Suchanek
The link stack flush status is not visible in debugfs. It can be enabled even when count cache flush is disabled. Add separate file for its status. Signed-off-by: Michal Suchanek <msuchanek@suse.de> [mpe: Update for change to link_stack_flush_type] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20191127220959.6208-1-msuchanek@suse.de
2022-01-25powerpc/64s/interrupt: Fix decrementer stormNicholas Piggin
The decrementer exception can fail to be cleared when the interrupt returns in the case where the decrementer wraps with the next timer still beyond decrementer_max. This results in a decrementer interrupt storm. This is triggerable with small decrementer system with hard and soft watchdogs disabled. Fix this by always programming the decrementer if there was no timer. Fixes: 0faf20a1ad16 ("powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is in use") Reported-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220124143930.3923442-1-npiggin@gmail.com
2022-01-23Merge tag 'powerpc-5.17-2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: - A series of bpf fixes, including an oops fix and some codegen fixes. - Fix a regression in syscall_get_arch() for compat processes. - Fix boot failure on some 32-bit systems with KASAN enabled. - A couple of other build/minor fixes. Thanks to Athira Rajeev, Christophe Leroy, Dmitry V. Levin, Jiri Olsa, Johan Almbladh, Maxime Bizon, Naveen N. Rao, and Nicholas Piggin. * tag 'powerpc-5.17-2' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/64s: Mask SRR0 before checking against the masked NIP powerpc/perf: Only define power_pmu_wants_prompt_pmi() for CONFIG_PPC64 powerpc/32s: Fix kasan_init_region() for KASAN powerpc/time: Fix build failure due to do_hard_irq_enable() on PPC32 powerpc/audit: Fix syscall_get_arch() powerpc64/bpf: Limit 'ldbrx' to processors compliant with ISA v2.06 tools/bpf: Rename 'struct event' to avoid naming conflict powerpc/bpf: Update ldimm64 instructions during extra pass powerpc32/bpf: Fix codegen for bpf-to-bpf calls bpf: Guard against accessing NULL pt_regs in bpf_get_task_stack()
2022-01-22proc: remove PDE_DATA() completelyMuchun Song
Remove PDE_DATA() completely and replace it with pde_data(). [akpm@linux-foundation.org: fix naming clash in drivers/nubus/proc.c] [akpm@linux-foundation.org: now fix it properly] Link: https://lkml.kernel.org/r/20211124081956.87711-2-songmuchun@bytedance.com Signed-off-by: Muchun Song <songmuchun@bytedance.com> Acked-by: Christian Brauner <christian.brauner@ubuntu.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Alexey Gladkov <gladkov.alexey@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "55 patches. Subsystems affected by this patch series: percpu, procfs, sysctl, misc, core-kernel, get_maintainer, lib, checkpatch, binfmt, nilfs2, hfs, fat, adfs, panic, delayacct, kconfig, kcov, and ubsan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (55 commits) lib: remove redundant assignment to variable ret ubsan: remove CONFIG_UBSAN_OBJECT_SIZE kcov: fix generic Kconfig dependencies if ARCH_WANTS_NO_INSTR lib/Kconfig.debug: make TEST_KMOD depend on PAGE_SIZE_LESS_THAN_256KB btrfs: use generic Kconfig option for 256kB page size limit arch/Kconfig: split PAGE_SIZE_LESS_THAN_256KB from PAGE_SIZE_LESS_THAN_64KB configs: introduce debug.config for CI-like setup delayacct: track delays from memory compact Documentation/accounting/delay-accounting.rst: add thrashing page cache and direct compact delayacct: cleanup flags in struct task_delay_info and functions use it delayacct: fix incomplete disable operation when switch enable to disable delayacct: support swapin delay accounting for swapping without blkio panic: remove oops_id panic: use error_report_end tracepoint on warnings fs/adfs: remove unneeded variable make code cleaner FAT: use io_schedule_timeout() instead of congestion_wait() hfsplus: use struct_group_attr() for memcpy() region nilfs2: remove redundant pointer sbufs fs/binfmt_elf: use PT_LOAD p_align values for static PIE const_structs.checkpatch: add frequently used ops structs ...
2022-01-20mm: percpu: add generic pcpu_populate_pte() functionKefeng Wang
With NEED_PER_CPU_PAGE_FIRST_CHUNK enabled, we need a function to populate pte, this patch adds a generic pcpu populate pte function, pcpu_populate_pte(), which is marked __weak and used on most architectures, but it is overridden on x86, which has its own implementation. Link: https://lkml.kernel.org/r/20211216112359.103822-5-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20mm: percpu: add generic pcpu_fc_alloc/free funcitonKefeng Wang
With the previous patch, we could add a generic pcpu first chunk allocate and free function to cleanup the duplicated definations on each architecture. Link: https://lkml.kernel.org/r/20211216112359.103822-4-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-20mm: percpu: add pcpu_fc_cpu_to_node_fn_t typedefKefeng Wang
Add pcpu_fc_cpu_to_node_fn_t and pass it into pcpu_fc_alloc_fn_t, pcpu first chunk allocation will call it to alloc memblock on the corresponding node by it, this is prepare for the next patch. Link: https://lkml.kernel.org/r/20211216112359.103822-3-wangkefeng.wang@huawei.com Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: "David S. Miller" <davem@davemloft.net> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ingo Molnar <mingo@redhat.com> Cc: Borislav Petkov <bp@alien8.de> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: "Rafael J. Wysocki" <rafael@kernel.org> Cc: Dennis Zhou <dennis@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Christoph Lameter <cl@linux.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-18powerpc/64s: Mask SRR0 before checking against the masked NIPNicholas Piggin
Commit 314f6c23dd8d ("powerpc/64s: Mask NIP before checking against SRR0") masked off the low 2 bits of the NIP value in the interrupt stack frame in case they are non-zero and mis-compare against a SRR0 register value of a CPU which always reads back 0 from the 2 low bits which are reserved. This now causes the opposite problem that an implementation which does implement those bits in SRR0 will mis-compare against the masked NIP value in which they have been cleared. QEMU is one such implementation, and this is allowed by the architecture. This can be triggered by sigfuz by setting low bits of PT_NIP in the signal context. Fix this for now by masking the SRR0 bits as well. Cleaner is probably to sanitise these values before putting them in registers or stack, but this is the quick and backportable fix. Fixes: 314f6c23dd8d ("powerpc/64s: Mask NIP before checking against SRR0") Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220117134403.2995059-1-npiggin@gmail.com
2022-01-17Merge branch 'signal-for-v5.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace Pull signal/exit/ptrace updates from Eric Biederman: "This set of changes deletes some dead code, makes a lot of cleanups which hopefully make the code easier to follow, and fixes bugs found along the way. The end-game which I have not yet reached yet is for fatal signals that generate coredumps to be short-circuit deliverable from complete_signal, for force_siginfo_to_task not to require changing userspace configured signal delivery state, and for the ptrace stops to always happen in locations where we can guarantee on all architectures that the all of the registers are saved and available on the stack. Removal of profile_task_ext, profile_munmap, and profile_handoff_task are the big successes for dead code removal this round. A bunch of small bug fixes are included, as most of the issues reported were small enough that they would not affect bisection so I simply added the fixes and did not fold the fixes into the changes they were fixing. There was a bug that broke coredumps piped to systemd-coredump. I dropped the change that caused that bug and replaced it entirely with something much more restrained. Unfortunately that required some rebasing. Some successes after this set of changes: There are few enough calls to do_exit to audit in a reasonable amount of time. The lifetime of struct kthread now matches the lifetime of struct task, and the pointer to struct kthread is no longer stored in set_child_tid. The flag SIGNAL_GROUP_COREDUMP is removed. The field group_exit_task is removed. Issues where task->exit_code was examined with signal->group_exit_code should been examined were fixed. There are several loosely related changes included because I am cleaning up and if I don't include them they will probably get lost. The original postings of these changes can be found at: https://lkml.kernel.org/r/87a6ha4zsd.fsf@email.froward.int.ebiederm.org https://lkml.kernel.org/r/87bl1kunjj.fsf@email.froward.int.ebiederm.org https://lkml.kernel.org/r/87r19opkx1.fsf_-_@email.froward.int.ebiederm.org I trimmed back the last set of changes to only the obviously correct once. Simply because there was less time for review than I had hoped" * 'signal-for-v5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (44 commits) ptrace/m68k: Stop open coding ptrace_report_syscall ptrace: Remove unused regs argument from ptrace_report_syscall ptrace: Remove second setting of PT_SEIZED in ptrace_attach taskstats: Cleanup the use of task->exit_code exit: Use the correct exit_code in /proc/<pid>/stat exit: Fix the exit_code for wait_task_zombie exit: Coredumps reach do_group_exit exit: Remove profile_handoff_task exit: Remove profile_task_exit & profile_munmap signal: clean up kernel-doc comments signal: Remove the helper signal_group_exit signal: Rename group_exit_task group_exec_task coredump: Stop setting signal->group_exit_task signal: Remove SIGNAL_GROUP_COREDUMP signal: During coredumps set SIGNAL_GROUP_EXIT in zap_process signal: Make coredump handling explicit in complete_signal signal: Have prepare_signal detect coredumps using signal->core_state signal: Have the oom killer detect coredumps using signal->core_state exit: Move force_uaccess back into do_exit exit: Guarantee make_task_dead leaks the tsk when calling do_task_exit ...
2022-01-15Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge misc updates from Andrew Morton: "146 patches. Subsystems affected by this patch series: kthread, ia64, scripts, ntfs, squashfs, ocfs2, vfs, and mm (slab-generic, slab, kmemleak, dax, kasan, debug, pagecache, gup, shmem, frontswap, memremap, memcg, selftests, pagemap, dma, vmalloc, memory-failure, hugetlb, userfaultfd, vmscan, mempolicy, oom-kill, hugetlbfs, migration, thp, ksm, page-poison, percpu, rmap, zswap, zram, cleanups, hmm, and damon)" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (146 commits) mm/damon: hide kernel pointer from tracepoint event mm/damon/vaddr: hide kernel pointer from damon_va_three_regions() failure log mm/damon/vaddr: use pr_debug() for damon_va_three_regions() failure logging mm/damon/dbgfs: remove an unnecessary variable mm/damon: move the implementation of damon_insert_region to damon.h mm/damon: add access checking for hugetlb pages Docs/admin-guide/mm/damon/usage: update for schemes statistics mm/damon/dbgfs: support all DAMOS stats Docs/admin-guide/mm/damon/reclaim: document statistics parameters mm/damon/reclaim: provide reclamation statistics mm/damon/schemes: account how many times quota limit has exceeded mm/damon/schemes: account scheme actions that successfully applied mm/damon: remove a mistakenly added comment for a future feature Docs/admin-guide/mm/damon/usage: update for kdamond_pid and (mk|rm)_contexts Docs/admin-guide/mm/damon/usage: mention tracepoint at the beginning Docs/admin-guide/mm/damon/usage: remove redundant information Docs/admin-guide/mm/damon/usage: update for scheme quotas and watermarks mm/damon: convert macro functions to static inline functions mm/damon: modify damon_rand() macro to static inline function mm/damon: move damon_rand() definition into damon.h ...
2022-01-15mm/mempolicy: wire up syscall set_mempolicy_home_nodeAneesh Kumar K.V
Link: https://lkml.kernel.org/r/20211202123810.267175-4-aneesh.kumar@linux.ibm.com Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com> Cc: Ben Widawsky <ben.widawsky@intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Feng Tang <feng.tang@intel.com> Cc: Michal Hocko <mhocko@kernel.org> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Andi Kleen <ak@linux.intel.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Huang Ying <ying.huang@intel.com> Cc: <linux-api@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-01-14Merge tag 'powerpc-5.17-1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc updates from Michael Ellerman: - Optimise radix KVM guest entry/exit by 2x on Power9/Power10. - Allow firmware to tell us whether to disable the entry and uaccess flushes on Power10 or later CPUs. - Add BPF_PROBE_MEM support for 32 and 64-bit BPF jits. - Several fixes and improvements to our hard lockup watchdog. - Activate HAVE_DYNAMIC_FTRACE_WITH_REGS on 32-bit. - Allow building the 64-bit Book3S kernel without hash MMU support, ie. Radix only. - Add KUAP (SMAP) support for 40x, 44x, 8xx, Book3E (64-bit). - Add new encodings for perf_mem_data_src.mem_hops field, and use them on Power10. - A series of small performance improvements to 64-bit interrupt entry. - Several commits fixing issues when building with the clang integrated assembler. - Many other small features and fixes. Thanks to Alan Modra, Alexey Kardashevskiy, Ammar Faizi, Anders Roxell, Arnd Bergmann, Athira Rajeev, Cédric Le Goater, Christophe JAILLET, Christophe Leroy, Christoph Hellwig, Daniel Axtens, David Yang, Erhard Furtner, Fabiano Rosas, Greg Kroah-Hartman, Guo Ren, Hari Bathini, Jason Wang, Joel Stanley, Julia Lawall, Kajol Jain, Kees Cook, Laurent Dufour, Madhavan Srinivasan, Mark Brown, Minghao Chi, Nageswara R Sastry, Naresh Kamboju, Nathan Chancellor, Nathan Lynch, Nicholas Piggin, Nick Child, Oliver O'Halloran, Peiwei Hu, Randy Dunlap, Ravi Bangoria, Rob Herring, Russell Currey, Sachin Sant, Sean Christopherson, Segher Boessenkool, Thadeu Lima de Souza Cascardo, Tyrel Datwyler, Xiang wangx, and Yang Guang. * tag 'powerpc-5.17-1' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: (240 commits) powerpc/xmon: Dump XIVE information for online-only processors. powerpc/opal: use default_groups in kobj_type powerpc/cacheinfo: use default_groups in kobj_type powerpc/sched: Remove unused TASK_SIZE_OF powerpc/xive: Add missing null check after calling kmalloc powerpc/floppy: Remove usage of the deprecated "pci-dma-compat.h" API selftests/powerpc: Add a test of sigreturning to an unaligned address powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warnings powerpc/64s: Mask NIP before checking against SRR0 powerpc/perf: Fix spelling of "its" powerpc/32: Fix boot failure with GCC latent entropy plugin powerpc/code-patching: Replace patch_instruction() by ppc_inst_write() in selftests powerpc/code-patching: Move code patching selftests in its own file powerpc/code-patching: Move instr_is_branch_{i/b}form() in code-patching.h powerpc/code-patching: Move patch_exception() outside code-patching.c powerpc/code-patching: Use test_trampoline for prefixed patch test powerpc/code-patching: Fix patch_branch() return on out-of-range failure powerpc/code-patching: Reorganise do_patch_instruction() to ease error handling powerpc/code-patching: Fix unmap_patch_area() error handling powerpc/code-patching: Fix error handling in do_patch_instruction() ...
2022-01-12Merge tag 'devicetree-for-5.17' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux Pull devicetree updates from Rob Herring: "Bindings: - DT schema conversions for Samsung clocks, RNG bindings, Qcom Command DB and rmtfs, gpio-restart, i2c-mux-gpio, i2c-mux-pinctl, Tegra I2C and BPMP, pwm-vibrator, Arm DSU, and Cadence macb - DT schema conversions for Broadcom platforms: interrupt controllers, STB GPIO, STB waketimer, STB reset, iProc MDIO mux, iProc PCIe, Cygnus PCIe PHY, PWM, USB BDC, BCM6328 LEDs, TMON, SYSTEMPORT, AMAC, Northstar 2 PCIe PHY, GENET, moca PHY, GISB arbiter, and SATA - Add binding schemas for Tegra210 EMC table, TI DC-DC converters, - Clean-ups of MDIO bus schemas to fix 'unevaluatedProperties' issues - More fixes due to 'unevaluatedProperties' enabling - Data type fixes and clean-ups of binding examples found in preparation to move to validating DTB files directly (instead of intermediate YAML representation. - Vendor prefixes for T-Head Semiconductor, OnePlus, and Sunplus - Add various new compatible strings DT core: - Silence a warning for overlapping reserved memory regions - Reimplement unittest overlay tracking - Fix stack frame size warning in unittest - Clean-ups of early FDT scanning functions - Fix handling of "linux,usable-memory-range" on EFI booted systems - Add support for 'fail' status on CPU nodes - Improve error message in of_phandle_iterator_next() - kbuild: Disable duplicate unit-address warnings for disabled nodes" * tag 'devicetree-for-5.17' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux: (114 commits) dt-bindings: net: mdio: Drop resets/reset-names child properties dt-bindings: clock: samsung: convert S5Pv210 to dtschema dt-bindings: clock: samsung: convert Exynos5410 to dtschema dt-bindings: clock: samsung: convert Exynos5260 to dtschema dt-bindings: clock: samsung: extend Exynos7 bindings with UFS dt-bindings: clock: samsung: convert Exynos7 to dtschema dt-bindings: clock: samsung: convert Exynos5433 to dtschema dt-bindings: i2c: maxim,max96712: Add bindings for Maxim Integrated MAX96712 dt-bindings: iio: adi,ltc2983: Fix 64-bit property sizes dt-bindings: power: maxim,max17040: Fix incorrect type for 'maxim,rcomp' dt-bindings: interrupt-controller: arm,gic-v3: Fix 'interrupts' cell size in example dt-bindings: iio/magnetometer: yamaha,yas530: Fix invalid 'interrupts' in example dt-bindings: clock: imx5: Drop clock consumer node from example dt-bindings: Drop required 'interrupt-parent' dt-bindings: net: ti,dp83869: Drop value on boolean 'ti,max-output-impedance' dt-bindings: net: wireless: mt76: Fix 8-bit property sizes dt-bindings: PCI: snps,dw-pcie-ep: Drop conflicting 'max-functions' schema dt-bindings: i2c: st,stm32-i2c: Make each example a separate entry dt-bindings: net: stm32-dwmac: Make each example a separate entry dt-bindings: net: Cleanup MDIO node schemas ...
2022-01-10Merge tag 'core_entry_for_v5.17_rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull thread_info flag accessor helper updates from Borislav Petkov: "Add a set of thread_info.flags accessors which snapshot it before accesing it in order to prevent any potential data races, and convert all users to those new accessors" * tag 'core_entry_for_v5.17_rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: powerpc: Snapshot thread flags powerpc: Avoid discarding flags in system_call_exception() openrisc: Snapshot thread flags microblaze: Snapshot thread flags arm64: Snapshot thread flags ARM: Snapshot thread flags alpha: Snapshot thread flags sched: Snapshot thread flags entry: Snapshot thread flags x86: Snapshot thread flags thread_info: Add helpers to snapshot thread flags
2022-01-10Merge tag 'arm64-upstream' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux Pull arm64 updates from Catalin Marinas: - KCSAN enabled for arm64. - Additional kselftests to exercise the syscall ABI w.r.t. SVE/FPSIMD. - Some more SVE clean-ups and refactoring in preparation for SME support (scalable matrix extensions). - BTI clean-ups (SYM_FUNC macros etc.) - arm64 atomics clean-up and codegen improvements. - HWCAPs for FEAT_AFP (alternate floating point behaviour) and FEAT_RPRESS (increased precision of reciprocal estimate and reciprocal square root estimate). - Use SHA3 instructions to speed-up XOR. - arm64 unwind code refactoring/unification. - Avoid DC (data cache maintenance) instructions when DCZID_EL0.DZP == 1 (potentially set by a hypervisor; user-space already does this). - Perf updates for arm64: support for CI-700, HiSilicon PCIe PMU, Marvell CN10K LLC-TAD PMU, miscellaneous clean-ups. - Other fixes and clean-ups; highlights: fix the handling of erratum 1418040, correct the calculation of the nomap region boundaries, introduce io_stop_wc() mapped to the new DGH instruction (data gathering hint). * tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (81 commits) arm64: Use correct method to calculate nomap region boundaries arm64: Drop outdated links in comments arm64: perf: Don't register user access sysctl handler multiple times drivers: perf: marvell_cn10k: fix an IS_ERR() vs NULL check perf/smmuv3: Fix unused variable warning when CONFIG_OF=n arm64: errata: Fix exec handling in erratum 1418040 workaround arm64: Unhash early pointer print plus improve comment asm-generic: introduce io_stop_wc() and add implementation for ARM64 arm64: Ensure that the 'bti' macro is defined where linkage.h is included arm64: remove __dma_*_area() aliases docs/arm64: delete a space from tagged-address-abi arm64: Enable KCSAN kselftest/arm64: Add pidbench for floating point syscall cases arm64/fp: Add comments documenting the usage of state restore functions kselftest/arm64: Add a test program to exercise the syscall ABI kselftest/arm64: Allow signal tests to trigger from a function kselftest/arm64: Parameterise ptrace vector length information arm64/sve: Minor clarification of ABI documentation arm64/sve: Generalise vector length configuration prctl() for SME arm64/sve: Make sysctl interface for SVE reusable by SME ...
2022-01-05powerpc/cacheinfo: use default_groups in kobj_typeGreg Kroah-Hartman
There are currently 2 ways to create a set of sysfs files for a kobj_type, through the default_attrs field, and the default_groups field. Move the powerpc cacheinfo sysfs code to use default_groups field which has been the preferred way since aa30f47cf666 ("kobject: Add support for default attribute groups to kobj_type") so that we can soon get rid of the obsolete default_attrs field. Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Reviewed-by: Tyrel Datwyler <tyreld@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20220104155450.1291277-1-gregkh@linuxfoundation.org
2021-12-25powerpc/64s: Use EMIT_WARN_ENTRY for SRR debug warningsMichael Ellerman
When CONFIG_PPC_RFI_SRR_DEBUG=y we check the SRR values before returning from interrupts. This is done in asm using EMIT_BUG_ENTRY, and passing BUGFLAG_WARNING. However that fails to create an exception table entry for the warning, and so do_program_check() fails the exception table search and proceeds to call _exception(), resulting in an oops like: Oops: Exception in kernel mode, sig: 5 [#1] LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries Modules linked in: CPU: 2 PID: 1204 Comm: sigreturn_unali Tainted: P 5.16.0-rc2-00194-g91ca3d4f77c5 #12 NIP: c00000000000c5b0 LR: 0000000000000000 CTR: 0000000000000000 ... NIP [c00000000000c5b0] system_call_common+0x150/0x268 LR [0000000000000000] 0x0 Call Trace: [c00000000db73e10] [c00000000000c558] system_call_common+0xf8/0x268 (unreliable) ... Instruction dump: 7cc803a6 888d0931 2c240000 4082001c 38800000 988d0931 e8810170 e8a10178 7c9a03a6 7cbb03a6 7d7a02a6 e9810170 <7f0b6088> 7d7b02a6 e9810178 7f0b6088 We should instead use EMIT_WARN_ENTRY, which creates an exception table entry for the warning, allowing the warning to be correctly recognised, and the code to resume after printing the warning. Note however that because this warning is buried deep in the interrupt return path, we are not able to recover from it (due to MSR_RI being clear), so we still end up in die() with an unrecoverable exception. Fixes: 59dc5bfca0cb ("powerpc/64s: avoid reloading (H)SRR registers if they are still valid") Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211221135101.2085547-2-mpe@ellerman.id.au
2021-12-25powerpc/64s: Mask NIP before checking against SRR0Michael Ellerman
When CONFIG_PPC_RFI_SRR_DEBUG=y we check that NIP and SRR0 match when returning from interrupts. This can trigger falsely if NIP has either of its two low bits set via sigreturn or ptrace, while SRR0 has its low two bits masked in hardware. As a quick fix make sure to mask the low bits before doing the check. Fixes: 59dc5bfca0cb ("powerpc/64s: avoid reloading (H)SRR registers if they are still valid") Reported-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Tested-by: Sachin Sant <sachinp@linux.vnet.ibm.com> Link: https://lore.kernel.org/r/20211221135101.2085547-1-mpe@ellerman.id.au
2021-12-23powerpc/32: Fix boot failure with GCC latent entropy pluginChristophe Leroy
Boot fails with GCC latent entropy plugin enabled. This is due to early boot functions trying to access 'latent_entropy' global data while the kernel is not relocated at its final destination yet. As there is no way to tell GCC to use PTRRELOC() to access it, disable latent entropy plugin in early_32.o and feature-fixups.o and code-patching.o Fixes: 38addce8b600 ("gcc-plugins: Add latent_entropy plugin") Cc: stable@vger.kernel.org # v4.9+ Reported-by: Erhard Furtner <erhard_f@mailbox.org> Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://bugzilla.kernel.org/show_bug.cgi?id=215217 Link: https://lore.kernel.org/r/2bac55483b8daf5b1caa163a45fa5f9cdbe18be4.1640178426.git.christophe.leroy@csgroup.eu
2021-12-23powerpc/mm: Switch obsolete dssall to .longAlexey Kardashevskiy
The dssall ("Data Stream Stop All") instruction is obsolete altogether with other Data Cache Instructions since ISA 2.03 (year 2006). LLVM IAS does not support it but PPC970 seems to be using it. This switches dssall to .long as there is no much point in fixing LLVM. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211221055904.555763-6-aik@ozlabs.ru
2021-12-23powerpc/64/asm: Do not reassign labelsDaniel Axtens
The LLVM integrated assembler really does not like us reassigning things to the same label: <instantiation>:7:9: error: invalid reassignment of non-absolute variable 'fs_label' This happens across a bunch of platforms: https://github.com/ClangBuiltLinux/linux/issues/1043 https://github.com/ClangBuiltLinux/linux/issues/1008 https://github.com/ClangBuiltLinux/linux/issues/920 https://github.com/ClangBuiltLinux/linux/issues/1050 There is no hope of getting this fixed in LLVM (see https://github.com/ClangBuiltLinux/linux/issues/1043#issuecomment-641571200 and https://bugs.llvm.org/show_bug.cgi?id=47798#c1 ) so if we want to build with LLVM_IAS, we need to hack around it ourselves. For us the big problem comes from this: \#define USE_FIXED_SECTION(sname) \ fs_label = start_##sname; \ fs_start = sname##_start; \ use_ftsec sname; \#define USE_TEXT_SECTION() fs_label = start_text; \ fs_start = text_start; \ .text and in particular fs_label. This works around it by not setting those 'variables' and requiring that users of the variables instead track for themselves what section they are in. This isn't amazing, by any stretch, but it gets us further in the compilation. Note that even though users have to keep track of the section, using a wrong one produces an error with both binutils and llvm which prevents from using wrong section at the compile time: llvm error example: AS arch/powerpc/kernel/head_64.o <unknown>:0: error: Cannot represent a difference across sections make[3]: *** [/home/aik/p/kernels-llvm/llvm/scripts/Makefile.build:388: arch/powerpc/kernel/head_64.o] Error 1 binutils error example: /home/aik/p/kernels-llvm/llvm/arch/powerpc/kernel/exceptions-64s.S: Assembler messages: /home/aik/p/kernels-llvm/llvm/arch/powerpc/kernel/exceptions-64s.S:1974: Error: can't resolve `system_call_common' {.text section} - `start_r eal_vectors' {.head.text.real_vectors section} make[3]: *** [/home/aik/p/kernels-llvm/llvm/scripts/Makefile.build:388: arch/powerpc/kernel/head_64.o] Error 1 Signed-off-by: Daniel Axtens <dja@axtens.net> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211221055904.555763-5-aik@ozlabs.ru
2021-12-23powerpc/64/asm: Inline BRANCH_TO_C000Alexey Kardashevskiy
It is used just once and does not really help with readability, remove it. Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211221055904.555763-4-aik@ozlabs.ru
2021-12-23powerpc/toc: Future proof kernel tocAlan Modra
This patch future-proofs the kernel against linker changes that might put the toc pointer at some location other than .got+0x8000, by replacing __toc_start+0x8000 with .TOC. throughout. If the kernel's idea of the toc pointer doesn't agree with the linker, bad things happen. prom_init.c code relocating its toc is also changed so that a symbolic __prom_init_toc_start toc-pointer relative address is calculated rather than assuming that it is always at toc-pointer - 0x8000. The length calculations loading values from the toc are also avoided. It's a little incestuous to do that with unreloc_toc picking up adjusted values (which is fine in practice, they both adjust by the same amount if all goes well). I've also changed the way .got is aligned in vmlinux.lds and zImage.lds, mostly so that dumping out section info by objdump or readelf plainly shows the alignment is 256. This linker script feature was added 2005-09-27, available in FSF binutils releases from 2.17 onwards. Should be safe to use in the kernel, I think. Finally, put *(.got) before the prom_init.o entry which only needs *(.toc), so that the GOT header goes in the correct place. I don't believe this makes any difference for the kernel as it would for dynamic objects being loaded by ld.so. That change is just to stop lusers who blindly copy kernel scripts being led astray. Of course, this change needs the prom_init.c changes. Some notes on .toc and .got. .toc is a compiler generated section of addresses. .got is a linker generated section of addresses, generally built when the linker sees R_*_*GOT* relocations. In the case of powerpc64 ld.bfd, there are multiple generated .got sections, one per input object file. So you can somewhat reasonably write in a linker script an input section statement like *prom_init.o(.got .toc) to mean "the .got and .toc section for files matching *prom_init.o". On other architectures that doesn't make sense, because the linker generally has just one .got section. Even on powerpc64, note well that the GOT entries for prom_init.o may be merged with GOT entries from other objects. That means that if prom_init.o references, say, _end via some GOT relocation, and some other object also references _end via a GOT relocation, the GOT entry for _end may be in the range __prom_init_toc_start to __prom_init_toc_end and if the kernel does something special to GOT/TOC entries in that range then the value of _end as seen by objects other than prom_init.o will be affected. On the other hand the GOT entry for _end may not be in the range __prom_init_toc_start to __prom_init_toc_end. Which way it turns out is deterministic but a detail of linker operation that should not be relied on. A feature of ld.bfd is that input .toc (and .got) sections matching one linker input section statement may be sorted, to put entries used by small-model code first, near the toc base. This is why scripts for powerpc64 normally use *(.got .toc) rather than *(.got) *(.toc), since the first form allows more freedom to sort. Another feature of ld.bfd is that indirect addressing sequences using the GOT/TOC may be edited by the linker to relative addressing. In many cases relative addressing would be emitted by gcc for -mcmodel=medium if you appropriately decorate variable declarations with non-default visibility. The original patch is here: https://lore.kernel.org/linuxppc-dev/20210310034813.GM6042@bubble.grove.modra.org/ Signed-off-by: Alan Modra <amodra@au1.ibm.com> [aik: removed non-relocatable which is gone in 24d33ac5b8ffb] [aik: added <=2.24 check] [aik: because of llvm-as, kernel_toc_addr() uses "mr" instead of global register variable] Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211221055904.555763-2-aik@ozlabs.ru
2021-12-23powerpc/kernel: Add __init attribute to eligible functionsNick Child
Some functions defined in `arch/powerpc/kernel` (and one in `arch/powerpc/ kexec`) are deserving of an `__init` macro attribute. These functions are only called by other initialization functions and therefore should inherit the attribute. Also, change function declarations in header files to include `__init`. Signed-off-by: Nick Child <nick.child@ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211216220035.605465-2-nick.child@ibm.com
2021-12-19Merge tag 'powerpc-5.16-4' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux Pull powerpc fixes from Michael Ellerman: "Fix a recently introduced oops at boot on 85xx in some configurations. Fix crashes when loading some livepatch modules with STRICT_MODULE_RWX. Thanks to Joe Lawrence, Russell Currey, and Xiaoming Ni" * tag 'powerpc-5.16-4' of git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux: powerpc/module_64: Fix livepatching for RO modules powerpc/85xx: Fix oops when CONFIG_FSL_PMC=n
2021-12-16of/fdt: Rework early_init_dt_scan_memory() to call directlyRob Herring
Use of the of_scan_flat_dt() function predates libfdt and is discouraged as libfdt provides a nicer set of APIs. Rework early_init_dt_scan_memory() to be called directly and use libfdt. Cc: John Crispin <john@phrozen.org> Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Frank Rowand <frowand.list@gmail.com> Cc: linux-mips@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Reviewed-by: Frank Rowand <frank.rowand@sony.com> Signed-off-by: Rob Herring <robh@kernel.org> Tested-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211215150102.1303588-1-robh@kernel.org
2021-12-16of/fdt: Rework early_init_dt_scan_root() to call directlyRob Herring
Use of the of_scan_flat_dt() function predates libfdt and is discouraged as libfdt provides a nicer set of APIs. Rework early_init_dt_scan_root() to be called directly and use libfdt. Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Frank Rowand <frowand.list@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Frank Rowand <frank.rowand@sony.com> Link: https://lore.kernel.org/r/20211118181213.1433346-3-robh@kernel.org
2021-12-16of/fdt: Rework early_init_dt_scan_chosen() to call directlyRob Herring
Use of the of_scan_flat_dt() function predates libfdt and is discouraged as libfdt provides a nicer set of APIs. Rework early_init_dt_scan_chosen() to be called directly and use libfdt. Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Frank Rowand <frowand.list@gmail.com> Cc: linuxppc-dev@lists.ozlabs.org Signed-off-by: Rob Herring <robh@kernel.org> Reviewed-by: Frank Rowand <frank.rowand@sony.com> Link: https://lore.kernel.org/r/20211118181213.1433346-2-robh@kernel.org
2021-12-16powerpc/64s/interrupt: avoid saving CFAR in some asynchronous interruptsNicholas Piggin
Reading the CFAR register is quite costly (~20 cycles on POWER9). It is a good idea to have for most synchronous interrupts, but for async ones it is much less important. Doorbell, external, and decrementer interrupts are the important asynchronous ones. HV interrupts can't skip CFAR if KVM HV is possible, because it might be a guest exit that requires CFAR preserved. But the important pseries interrupts can avoid loading CFAR. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210922145452.352571-7-npiggin@gmail.com
2021-12-16powerpc/64s/interrupt: Don't enable MSR[EE] in irq handlers unless perf is ↵Nicholas Piggin
in use Enabling MSR[EE] in interrupt handlers while interrupts are still soft masked allows PMIs to profile interrupt handlers to some degree, beyond what SIAR latching allows. When perf is not being used, this is almost useless work. It requires an extra mtmsrd in the irq handler, and it also opens the door to masked interrupts hitting and requiring replay, which is more expensive than just taking them directly. This effect can be noticable in high IRQ workloads. Avoid enabling MSR[EE] unless perf is currently in use. This saves about 60 cycles (or 8%) on a simple decrementer interrupt microbenchmark. Replayed interrupts drop from 1.4% of all interrupts taken, to 0.003%. This does prevent the soft-nmi interrupt being taken in these handlers, but that's not too reliable anyway. The SMP watchdog will continue to be the reliable way to catch lockups. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210922145452.352571-5-npiggin@gmail.com
2021-12-16powerpc/64s/interrupt: handle MSR EE and RI in interrupt entry wrapperNicholas Piggin
The mtmsrd to enable MSR[RI] can be combined with the mtmsrd to enable MSR[EE] in interrupt entry code, for those interrupts which enable EE. This helps performance of important synchronous interrupts (e.g., page faults). This is similar to what commit dd152f70bdc1 ("powerpc/64s: system call avoid setting MSR[RI] until we set MSR[EE]") does for system calls. Do this by enabling EE and RI together at the beginning of the entry wrapper if PACA_IRQ_HARD_DIS is clear, and only enabling RI if it is set. Asynchronous interrupts set PACA_IRQ_HARD_DIS, but synchronous ones leave it unchanged, so by default they always get EE=1 unless they have interrupted a caller that is hard disabled. When the sync interrupt later calls interrupt_cond_local_irq_enable(), it will not require another mtmsrd because MSR[EE] was already enabled here. This avoids one mtmsrd L=1 for synchronous interrupts on 64s, which saves about 20 cycles on POWER9. And for kernel-mode interrupts, both synchronous and asynchronous, this saves an additional 40 cycles due to the mtmsrd being moved ahead of mfspr SPRN_AMR, which prevents a SPR scoreboard stall. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20210922145452.352571-3-npiggin@gmail.com
2021-12-14powerpc/module_64: Fix livepatching for RO modulesRussell Currey
Livepatching a loaded module involves applying relocations through apply_relocate_add(), which attempts to write to read-only memory when CONFIG_STRICT_MODULE_RWX=y. Work around this by performing these writes through the text poke area by using patch_instruction(). R_PPC_REL24 is the only relocation type generated by the kpatch-build userspace tool or klp-convert kernel tree that I observed applying a relocation to a post-init module. A more comprehensive solution is planned, but using patch_instruction() for R_PPC_REL24 on should serve as a sufficient fix. This does have a performance impact, I observed ~15% overhead in module_load() on POWER8 bare metal with checksum verification off. Fixes: c35717c71e98 ("powerpc: Set ARCH_HAS_STRICT_MODULE_RWX") Cc: stable@vger.kernel.org # v5.14+ Reported-by: Joe Lawrence <joe.lawrence@redhat.com> Signed-off-by: Russell Currey <ruscur@russell.cc> Tested-by: Joe Lawrence <joe.lawrence@redhat.com> [mpe: Check return codes from patch_instruction()] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211214121248.777249-1-mpe@ellerman.id.au
2021-12-13exit: Add and use make_task_dead.Eric W. Biederman
There are two big uses of do_exit. The first is it's design use to be the guts of the exit(2) system call. The second use is to terminate a task after something catastrophic has happened like a NULL pointer in kernel code. Add a function make_task_dead that is initialy exactly the same as do_exit to cover the cases where do_exit is called to handle catastrophic failure. In time this can probably be reduced to just a light wrapper around do_task_dead. For now keep it exactly the same so that there will be no behavioral differences introducing this new concept. Replace all of the uses of do_exit that use it for catastraphic task cleanup with make_task_dead to make it clear what the code is doing. As part of this rename rewind_stack_do_exit rewind_stack_and_make_dead. Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
2021-12-10arch: Make ARCH_STACKWALK independent of STACKTRACEPeter Zijlstra
Make arch_stack_walk() available for ARCH_STACKWALK architectures without it being entangled in STACKTRACE. Link: https://lore.kernel.org/lkml/20211022152104.356586621@infradead.org/ Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> [Mark: rebase, drop unnecessary arm change] Signed-off-by: Mark Rutland <mark.rutland@arm.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Borislav Petkov <bp@alien8.de> Cc: Christian Borntraeger <borntraeger@de.ibm.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Heiko Carstens <hca@linux.ibm.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vasily Gorbik <gor@linux.ibm.com> Link: https://lore.kernel.org/r/20211129142849.3056714-2-mark.rutland@arm.com Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2021-12-09powerpc/fadump: Fix inaccurate CPU state info in vmcore generated with panicHari Bathini
In panic path, fadump is triggered via a panic notifier function. Before calling panic notifier functions, smp_send_stop() gets called, which stops all CPUs except the panic'ing CPU. Commit 8389b37dffdc ("powerpc: stop_this_cpu: remove the cpu from the online map.") and again commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()") started marking CPUs as offline while stopping them. So, if a kernel has either of the above commits, vmcore captured with fadump via panic path would not process register data for all CPUs except the panic'ing CPU. Sample output of crash-utility with such vmcore: # crash vmlinux vmcore ... KERNEL: vmlinux DUMPFILE: vmcore [PARTIAL DUMP] CPUS: 1 DATE: Wed Nov 10 09:56:34 EST 2021 UPTIME: 00:00:42 LOAD AVERAGE: 2.27, 0.69, 0.24 TASKS: 183 NODENAME: XXXXXXXXX RELEASE: 5.15.0+ VERSION: #974 SMP Wed Nov 10 04:18:19 CST 2021 MACHINE: ppc64le (2500 Mhz) MEMORY: 8 GB PANIC: "Kernel panic - not syncing: sysrq triggered crash" PID: 3394 COMMAND: "bash" TASK: c0000000150a5f80 [THREAD_INFO: c0000000150a5f80] CPU: 1 STATE: TASK_RUNNING (PANIC) crash> p -x __cpu_online_mask __cpu_online_mask = $1 = { bits = {0x2, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0} } crash> crash> crash> p -x __cpu_active_mask __cpu_active_mask = $2 = { bits = {0xff, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0} } crash> While this has been the case since fadump was introduced, the issue was not identified for two probable reasons: - In general, the bulk of the vmcores analyzed were from crash due to exception. - The above did change since commit 8341f2f222d7 ("sysrq: Use panic() to force a crash") started using panic() instead of deferencing NULL pointer to force a kernel crash. But then commit de6e5d38417e ("powerpc: smp_send_stop do not offline stopped CPUs") stopped marking CPUs as offline till kernel commit bab26238bbd4 ("powerpc: Offline CPU in stop_this_cpu()") reverted that change. To ensure post processing register data of all other CPUs happens as intended, let panic() function take the crash friendly path (read crash_smp_send_stop()) with the help of crash_kexec_post_notifiers option. Also, as register data for all CPUs is captured by f/w, skip IPI callbacks here for fadump, to avoid any complications in finding the right backtraces. Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211207103719.91117-2-hbathini@linux.ibm.com
2021-12-09powerpc: handle kdump appropriately with crash_kexec_post_notifiers optionHari Bathini
Kdump can be triggered after panic_notifers since commit f06e5153f4ae2 ("kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers") introduced crash_kexec_post_notifiers option. But using this option would mean smp_send_stop(), that marks all other CPUs as offline, gets called before kdump is triggered. As a result, kdump routines fail to save other CPUs' registers. To fix this, kdump friendly crash_smp_send_stop() function was introduced with kernel commit 0ee59413c967 ("x86/panic: replace smp_send_stop() with kdump friendly version in panic path"). Override this kdump friendly weak function to handle crash_kexec_post_notifiers option appropriately on powerpc. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Hari Bathini <hbathini@linux.ibm.com> [Fixed signature of crash_stop_this_cpu() - reported by lkp@intel.com] Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/20211207103719.91117-1-hbathini@linux.ibm.com
2021-12-09powerpc/inst: Define ppc_inst_t as u32 on PPC32Christophe Leroy
Unlike PPC64 ABI, PPC32 uses the stack to pass a parameter defined as a struct, even when the struct has a single simple element. To avoid that, define ppc_inst_t as u32 on PPC32. Keep it as 'struct ppc_inst' when __CHECKER__ is defined so that sparse can perform type checking. Also revert commit 511eea5e2ccd ("powerpc/kprobes: Fix Oops by passing ppc_inst as a pointer to emulate_step() on ppc32") as now the instruction to be emulated is passed as a register to emulate_step(). Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/c6d0c46f598f76ad0b0a88bc0d84773bd921b17c.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09powerpc/inst: Define ppc_inst_tChristophe Leroy
In order to stop using 'struct ppc_inst' on PPC32, define a ppc_inst_t typedef. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/fe5baa2c66fea9db05a8b300b3e8d2880a42596c.1638208156.git.christophe.leroy@csgroup.eu
2021-12-09powerpc/kuap: Wire-up KUAP on 85xx in 32 bits mode.Christophe Leroy
This adds KUAP support to 85xx in 32 bits mode. This is done by reading the content of SPRN_MAS1 and checking the TID at the time user pgtable is loaded. Signed-off-by: Christophe Leroy <christophe.leroy@csgroup.eu> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Link: https://lore.kernel.org/r/f8696f8980ca1532ada3a2f0e0a03e756269c7fe.1634627931.git.christophe.leroy@csgroup.eu