bcachefs.git - Unnamed repository; edit this file 'description' to name the repository.

Age	Commit message (Collapse)	Author
2024-02-22	selftests/mm: map_populate: conform test to TAP format output	Muhammad Usama Anjum
	Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages. Minor cleanups have also been included. Link: https://lkml.kernel.org/r/20240202113119.2047740-4-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22	selftests/mm: map_hugetlb: conform test to TAP format output	Muhammad Usama Anjum
	Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages. Link: https://lkml.kernel.org/r/20240202113119.2047740-3-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22	selftests/mm: map_fixed_noreplace: conform test to TAP format output	Muhammad Usama Anjum
	Patch series "conform tests to TAP format output", v2. This patch (of 12): Conform the layout, informational and status messages to TAP. No functional change is intended other than the layout of output messages. While at it, convert commenting style from // to /**/. Link: https://lkml.kernel.org/r/20240202113119.2047740-1-usama.anjum@collabora.com Link: https://lkml.kernel.org/r/20240202113119.2047740-2-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22	selftets/damon: prepare for monitor_on file renaming	SeongJae Park
	Following change will rename 'monitor_on' DAMON debugfs file to 'monitor_on_DEPRECATED', to make the deprecation unignorable in runtime. Since it could make DAMON selftests fail and disturb future bisects, update DAMON selftests to support the change. Link: https://lkml.kernel.org/r/20240130013549.89538-7-sj@kernel.org Signed-off-by: SeongJae Park <sj@kernel.org> Cc: Alex Shi <alexs@kernel.org> Cc: Hu Haowen <2023002089@link.tyut.edu.cn> Cc: Jonathan Corbet <corbet@lwn.net> Cc: Shuah Khan <shuah@kernel.org> Cc: Yanteng Si <siyanteng@loongson.cn> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22	selftests/mm: new test that steals pages	Breno Leitao
	This test stresses the race between of madvise(DONTNEED), a page fault and a parallel huge page mmap, which should fail due to lack of available page available for mapping. This test case must run on a system with one and only one huge page available. # echo 1 > /sys/kernel/mm/hugepages/hugepages-2048kB/nr_hugepages During setup, the test allocates the only available page, and starts three threads: - thread 1: * madvise(MADV_DONTNEED) on the allocated huge page - thread 2: * Write to the allocated huge page - thread 3: * Tries to allocated (steal) an extra huge page (which is not available) thread 3 should never succeed in the allocation, since the only huge page was never unmapped, and should be reserved. Touching the old page after thread3 allocation will raise a SIGBUS. Link: https://lkml.kernel.org/r/20240105155419.1939484-2-leitao@debian.org Signed-off-by: Breno Leitao <leitao@debian.org> Cc: Mike Rapoport (IBM) <rppt@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Rik van Riel <riel@surriel.com> Cc: Shuah Khan <shuah@kernel.org> Cc: Vegard Nossum <vegard.nossum@oracle.com> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22	selftests: mm: perform some system cleanup before using hugepages	Nico Pache
	When running with CATEGORY= (thp \| hugetlb) we see a large numbers of tests failing. These failures are due to not being able to allocate a hugepage and normally occur on memory contrainted systems or when using large page sizes. drop_cache and compact_memory before the tests for a higher chance at a successful hugepage allocation. Link: https://lkml.kernel.org/r/20240117180037.15734-1-npache@redhat.com Signed-off-by: Nico Pache <npache@redhat.com> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22	bpf: Clarify batch lookup/lookup_and_delete semantics	Martin Kelly
	The batch lookup and lookup_and_delete APIs have two parameters, in_batch and out_batch, to facilitate iterative lookup/lookup_and_deletion operations for supported maps. Except NULL for in_batch at the start of these two batch operations, both parameters need to point to memory equal or larger than the respective map key size, except for various hashmaps (hash, percpu_hash, lru_hash, lru_percpu_hash) where the in_batch/out_batch memory size should be at least 4 bytes. Document these semantics to clarify the API. Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240221211838.1241578-1-martin.kelly@crowdstrike.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2024-02-22	selftests/memfd: delete unused declarations	Greg Thelen
	Commit 32d118ad50a5 ("selftests/memfd: add tests for F_SEAL_EXEC"): - added several unused 'nbytes' local variables Commit 6469b66e3f5a ("selftests: improve vm.memfd_noexec sysctl tests"): - orphaned 'newpid_thread_fn2()' forward declaration - orphaned 'join_newpid_thread()' forward declaration - added unused 'pid' local in sysctl_simple_child() - orphaned 'fd' local in sysctl_simple_child() - added unused 'fd' in sysctl_nested_child() Delete the unused locals and forward declarations. Link: https://lkml.kernel.org/r/20240118095057.677544-1-gthelen@google.com Signed-off-by: Greg Thelen <gthelen@google.com> Cc: Aleksa Sarai <cyphar@cyphar.com> Cc: Daniel Verkamp <dverkamp@chromium.org> Cc: Jeff Xu <jeffxu@google.com> Cc: Kees Cook <keescook@chromium.org> Cc: Shuah Khan <shuah@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22	tools/mm: add thpmaps script to dump THP usage info	Ryan Roberts
	With the proliferation of large folios for file-backed memory, and more recently the introduction of multi-size THP for anonymous memory, it is becoming useful to be able to see exactly how large folios are mapped into processes. For some architectures (e.g. arm64), if most memory is mapped using contpte-sized and -aligned blocks, TLB usage can be optimized so it's useful to see where these requirements are and are not being met. thpmaps is a Python utility that reads /proc/<pid>/smaps, /proc/<pid>/pagemap and /proc/kpageflags to print information about how transparent huge pages (both file and anon) are mapped to a specified process or cgroup. It aims to help users debug and optimize their workloads. In future we may wish to introduce stats directly into the kernel (e.g. smaps or similar), but for now this provides a short term solution without the need to introduce any new ABI. Run with help option for a full listing of the arguments: # ./thpmaps --help --8<-- usage: thpmaps [-h] [--pid pid \| --cgroup path] [--rollup] [--cont size[KMG]] [--inc-smaps] [--inc-empty] [--periodic sleep_ms] Prints information about how transparent huge pages are mapped, either system-wide, or for a specified process or cgroup. When run with --pid, the user explicitly specifies the set of pids to scan. e.g. "--pid 10 [--pid 134 ...]". When run with --cgroup, the user passes either a v1 or v2 cgroup and all pids that belong to the cgroup subtree are scanned. When run with neither --pid nor --cgroup, the full set of pids on the system is gathered from /proc and scanned as if the user had provided "--pid 1 --pid 2 ...". A default set of statistics is always generated for THP mappings. However, it is also possible to generate additional statistics for "contiguous block mappings" where the block size is user-defined. Statistics are maintained independently for anonymous and file-backed (pagecache) memory and are shown both in kB and as a percentage of either total anonymous or total file-backed memory as appropriate. THP Statistics -------------- Statistics are always generated for fully- and contiguously-mapped THPs whose mapping address is aligned to their size, for each <size> supported by the system. Separate counters describe THPs mapped by PTE vs those mapped by PMD. (Although note a THP can only be mapped by PMD if it is PMD-sized): - anon-thp-pte-aligned-<size>kB - file-thp-pte-aligned-<size>kB - anon-thp-pmd-aligned-<size>kB - file-thp-pmd-aligned-<size>kB Similarly, statistics are always generated for fully- and contiguously- mapped THPs whose mapping address is not aligned to their size, for each <size> supported by the system. Due to the unaligned mapping, it is impossible to map by PMD, so there are only PTE counters for this case: - anon-thp-pte-unaligned-<size>kB - file-thp-pte-unaligned-<size>kB Statistics are also always generated for mapped pages that belong to a THP but where the is THP is not fully- and contiguously- mapped. These "partial" mappings are all counted in the same counter regardless of the size of the THP that is partially mapped: - anon-thp-pte-partial - file-thp-pte-partial Contiguous Block Statistics --------------------------- An optional, additional set of statistics is generated for every contiguous block size specified with `--cont <size>`. These statistics show how much memory is mapped in contiguous blocks of <size> and also aligned to <size>. A given contiguous block must all belong to the same THP, but there is no requirement for it to be the whole THP. Separate counters describe contiguous blocks mapped by PTE vs those mapped by PMD: - anon-cont-pte-aligned-<size>kB - file-cont-pte-aligned-<size>kB - anon-cont-pmd-aligned-<size>kB - file-cont-pmd-aligned-<size>kB As an example, if monitoring 64K contiguous blocks (--cont 64K), there are a number of sources that could provide such blocks: a fully- and contiguously-mapped 64K THP that is aligned to a 64K boundary would provide 1 block. A fully- and contiguously-mapped 128K THP that is aligned to at least a 64K boundary would provide 2 blocks. Or a 128K THP that maps its first 100K, but contiguously and starting at a 64K boundary would provide 1 block. A fully- and contiguously-mapped 2M THP would provide 32 blocks. There are many other possible permutations. options: -h, --help show this help message and exit --pid pid Process id of the target process. Maybe issued multiple times to scan multiple processes. --pid and --cgroup are mutually exclusive. If neither are provided, all processes are scanned to provide system-wide information. --cgroup path Path to the target cgroup in sysfs. Iterates over every pid in the cgroup and its children. --pid and --cgroup are mutually exclusive. If neither are provided, all processes are scanned to provide system-wide information. --rollup Sum the per-vma statistics to provide a summary over the whole system, process or cgroup. --cont size[KMG] Adds stats for memory that is mapped in contiguous blocks of <size> and also aligned to <size>. May be issued multiple times to track multiple sized blocks. Useful to infer e.g. arm64 contpte and hpa mappings. Size must be a power-of-2 number of pages. --inc-smaps Include all numerical, additive /proc/<pid>/smaps stats in the output. --inc-empty Show all statistics including those whose value is 0. --periodic sleep_ms Run in a loop, polling every sleep_ms milliseconds. Requires root privilege to access pagemap and kpageflags. --8<-- Example command to summarise fully and partially mapped THPs and 64K contiguous blocks over all VMAs in all processes in the system (--inc-empty forces printing stats that are 0): # ./thpmaps --cont 64K --rollup --inc-empty --8<-- anon-thp-pmd-aligned-2048kB: 139264 kB ( 6%) file-thp-pmd-aligned-2048kB: 0 kB ( 0%) anon-thp-pte-aligned-16kB: 0 kB ( 0%) anon-thp-pte-aligned-32kB: 0 kB ( 0%) anon-thp-pte-aligned-64kB: 72256 kB ( 3%) anon-thp-pte-aligned-128kB: 0 kB ( 0%) anon-thp-pte-aligned-256kB: 0 kB ( 0%) anon-thp-pte-aligned-512kB: 0 kB ( 0%) anon-thp-pte-aligned-1024kB: 0 kB ( 0%) anon-thp-pte-aligned-2048kB: 0 kB ( 0%) anon-thp-pte-unaligned-16kB: 0 kB ( 0%) anon-thp-pte-unaligned-32kB: 0 kB ( 0%) anon-thp-pte-unaligned-64kB: 0 kB ( 0%) anon-thp-pte-unaligned-128kB: 0 kB ( 0%) anon-thp-pte-unaligned-256kB: 0 kB ( 0%) anon-thp-pte-unaligned-512kB: 0 kB ( 0%) anon-thp-pte-unaligned-1024kB: 0 kB ( 0%) anon-thp-pte-unaligned-2048kB: 0 kB ( 0%) anon-thp-pte-partial: 63232 kB ( 3%) file-thp-pte-aligned-16kB: 809024 kB (47%) file-thp-pte-aligned-32kB: 43168 kB ( 3%) file-thp-pte-aligned-64kB: 98496 kB ( 6%) file-thp-pte-aligned-128kB: 17536 kB ( 1%) file-thp-pte-aligned-256kB: 0 kB ( 0%) file-thp-pte-aligned-512kB: 0 kB ( 0%) file-thp-pte-aligned-1024kB: 0 kB ( 0%) file-thp-pte-aligned-2048kB: 0 kB ( 0%) file-thp-pte-unaligned-16kB: 21712 kB ( 1%) file-thp-pte-unaligned-32kB: 704 kB ( 0%) file-thp-pte-unaligned-64kB: 896 kB ( 0%) file-thp-pte-unaligned-128kB: 44928 kB ( 3%) file-thp-pte-unaligned-256kB: 0 kB ( 0%) file-thp-pte-unaligned-512kB: 0 kB ( 0%) file-thp-pte-unaligned-1024kB: 0 kB ( 0%) file-thp-pte-unaligned-2048kB: 0 kB ( 0%) file-thp-pte-partial: 9252 kB ( 1%) anon-cont-pmd-aligned-64kB: 139264 kB ( 6%) file-cont-pmd-aligned-64kB: 0 kB ( 0%) anon-cont-pte-aligned-64kB: 100672 kB ( 4%) file-cont-pte-aligned-64kB: 161856 kB ( 9%) --8<-- Link: https://lkml.kernel.org/r/20240116141235.960842-1-ryan.roberts@arm.com Signed-off-by: Ryan Roberts <ryan.roberts@arm.com> Tested-by: Barry Song <v-songbaohua@oppo.com> Cc: Alistair Popple <apopple@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: John Hubbard <jhubbard@nvidia.com> Cc: Kefeng Wang <wangkefeng.wang@huawei.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: William Kucharski <william.kucharski@oracle.com> Cc: Zenghui Yu <yuzenghui@huawei.com> Cc: Zi Yan <ziy@nvidia.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-22	Merge tag 'net-6.8.0-rc6' of ↵	Linus Torvalds
	git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Paolo Abeni: "Including fixes from bpf and netfilter. Current release - regressions: - af_unix: fix another unix GC hangup Previous releases - regressions: - core: fix a possible AF_UNIX deadlock - bpf: fix NULL pointer dereference in sk_psock_verdict_data_ready() - netfilter: nft_flow_offload: release dst in case direct xmit path is used - bridge: switchdev: ensure MDB events are delivered exactly once - l2tp: pass correct message length to ip6_append_data - dccp/tcp: unhash sk from ehash for tb2 alloc failure after check_estalblished() - tls: fixes for record type handling with PEEK - devlink: fix possible use-after-free and memory leaks in devlink_init() Previous releases - always broken: - bpf: fix an oops when attempting to read the vsyscall page through bpf_probe_read_kernel - sched: act_mirred: use the backlog for mirred ingress - netfilter: nft_flow_offload: fix dst refcount underflow - ipv6: sr: fix possible use-after-free and null-ptr-deref - mptcp: fix several data races - phonet: take correct lock to peek at the RX queue Misc: - handful of fixes and reliability improvements for selftests" * tag 'net-6.8.0-rc6' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (72 commits) l2tp: pass correct message length to ip6_append_data net: phy: realtek: Fix rtl8211f_config_init() for RTL8211F(D)(I)-VD-CG PHY selftests: ioam: refactoring to align with the fix Fix write to cloned skb in ipv6_hop_ioam() phonet/pep: fix racy skb_queue_empty() use phonet: take correct lock to peek at the RX queue net: sparx5: Add spinlock for frame transmission from CPU net/sched: flower: Add lock protection when remove filter handle devlink: fix port dump cmd type net: stmmac: Fix EST offset for dwmac 5.10 tools: ynl: don't leak mcast_groups on init error tools: ynl: make sure we always pass yarg to mnl_cb_run net: mctp: put sock on tag allocation failure netfilter: nf_tables: use kzalloc for hook allocation netfilter: nf_tables: register hooks last when adding new chain/flowtable netfilter: nft_flow_offload: release dst in case direct xmit path is used netfilter: nft_flow_offload: reset dst in route object after setting up flow netfilter: nf_tables: set dormant flag on hook register failure selftests: tls: add test for peeking past a record of a different type selftests: tls: add test for merging of same-type control messages ...
2024-02-22	perf tests: Add option to run tests in parallel	Ian Rogers
	By default tests are forked, add an option (-p or --parallel) so that the forked tests are all started in parallel and then their output gathered serially. This is opt-in as running in parallel can cause test flakes. Rather than fork within the code, the start_command/finish_command from libsubcmd are used. This changes how stderr and stdout are handled. The child stderr and stdout are always read to avoid the child blocking. If verbose is 1 (-v) then if the test fails the child stdout and stderr are displayed. If the verbose is >1 (e.g. -vv) then the stdout and stderr from the child are immediately displayed. An unscientific test on my laptop shows the wall clock time for perf test without parallel being 5 minutes 21 seconds and with parallel (-p) being 1 minute 50 seconds. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-9-irogers@google.com
2024-02-22	perf tests: Run time generate shell test suites	Ian Rogers
	Rather than special shell test logic, do a single pass to create an array of test suites. Hold the shell test file name in the test suite priv field. This makes the special shell test logic in builtin-test.c redundant so remove it. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-8-irogers@google.com
2024-02-22	perf tests: Use scandirat for shell script finding	Ian Rogers
	Avoid filename appending buffers by using openat, faccessat and scandirat more widely. Turn the script's path back to a file name using readlink from /proc/<pid>/fd/<fd>. Read the script's description using api/io.h to avoid fdopen conversions. Whilst reading perform additional sanity checks on the script's contents. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-7-irogers@google.com
2024-02-22	perf test: Rename builtin-test-list and add missed header guard	Ian Rogers
	builtin-test-list is primarily concerned with shell script tests. Rename the file to better reflect this and add a missed header guard. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-6-irogers@google.com
2024-02-22	tools subcmd: Add a no exec function call option	Ian Rogers
	Tools like perf fork tests in case they crash, but they don't want to exec a full binary. Add an option to call a function rather than do an exec. The child process exits with the result of the function call and is passed the struct of the run_command, things like container_of can then allow the child process function to determine additional arguments. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-5-irogers@google.com
2024-02-22	perf tests: Avoid fork in perf_has_symbol test	Ian Rogers
	perf test -vv Symbols is used to indentify symbols within the perf binary. Add the -F flag so that the test command doesn't fork the test before running. This removes a little overhead. Acked-by: Adrian Hunter <adrian.hunter@intel.com> Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-4-irogers@google.com
2024-02-22	perf list: Add scandirat compatibility function	Ian Rogers
	scandirat is used during the printing of tracepoint events but may be missing from certain libcs. Add a compatibility implementation that uses the symlink of an fd in /proc as a path for the reliably present scandir. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-3-irogers@google.com
2024-02-22	perf thread_map: Skip exited threads when scanning /proc	Ian Rogers
	Scanning /proc is inherently racy. Scanning /proc/pid/task within that is also racy as the pid can terminate. Rather than failing in __thread_map__new_all_cpus, skip pids for such failures. Signed-off-by: Ian Rogers <irogers@google.com> Cc: James Clark <james.clark@arm.com> Cc: Justin Stitt <justinstitt@google.com> Cc: Bill Wendling <morbo@google.com> Cc: Nick Desaulniers <ndesaulniers@google.com> Cc: Yang Jihong <yangjihong1@huawei.com> Cc: Nathan Chancellor <nathan@kernel.org> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com> Cc: llvm@lists.linux.dev Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221034155.1500118-2-irogers@google.com
2024-02-22	perf list: fix short description for some cache events	Thomas Richter
	Correct the short description of the following events: DCW_REQ, DCW_REQ_CHIP_HIT, DCW_REQ_DRAWER_HIT, DCW_REQ_IV, DCW_ON_CHIP, DCW_ON_CHIP_IV, DCW_ON_CHIP_CHIP_HIT, DCW_ON_CHIP_DRAWER_HIT, CW_ON_MODULE, DCW_ON_DRAWER, DCW_OFF_DRAWER, IDCW_ON_MODULE_IV, IDCW_ON_MODULE_CHIP_HIT, IDCW_ON_MODULE_DRAWER_HIT, IDCW_ON_DRAWER_IV, IDCW_ON_DRAWER_CHIP_HIT, IDCW_ON_DRAWER_DRAWER_HIT, IDCW_OFF_DRAWER_IV, IDCW_OFF_DRAWER_CHIP_HIT, IDCW_OFF_DRAWER_DRAWER_HIT, ICW_REQ, ICW_REQ_IV, CW_REQ_CHIP_HIT, ICW_REQ_DRAWER_HIT, ICW_ON_CHIP, ICW_ON_CHIP_IV, ICW_ON_CHIP_CHIP_HIT, ICW_ON_CHIP_DRAWER_HIT, ICW_ON_MODULE and ICW_OFF_DRAWER. The second Cache should be L2-Cache. Output before (display diff of the first four events) # perf list -d DCW_REQ [Directory Write Level 1 Data Cache from Cache. Unit: cpum_cf] DCW_REQ_CHIP_HIT [Directory Write Level 1 Data Cache from Cache with Chip HP \ Hit. Unit: cpum_cf] DCW_REQ_DRAWER_HIT [Directory Write Level 1 Data Cache from Cache with Drawer \ HP Hit. Unit: cpum_cf] DCW_REQ_IV [Directory Write Level 1 Data Cache from Cache with Intervention. \ Unit: cpum_cf] Output after: # perf list -d DCW_REQ [Directory Write Level 1 Data Cache from L2-Cache. Unit: cpum_cf] DCW_REQ_CHIP_HIT [Directory Write Level 1 Data Cache from L2-Cache with Chip HP \ Hit. Unit: cpum_cf] DCW_REQ_DRAWER_HIT [Directory Write Level 1 Data Cache from L2-Cache with Drawer \ HP Hit. Unit: cpum_cf] DCW_REQ_IV [Directory Write Level 1 Data Cache from L2-Cache with \ Intervention. Unit: cpum_cf] Fixes: 7f76b3113068 ("perf list: Add IBM z16 event description for s390") Reported-by: Andreas Krebbel <krebbel@linux.ibm.com> Signed-off-by: Thomas Richter <tmricht@linux.ibm.com> Acked-by: Andreas Krebbel <krebbel@linux.ibm.com> Reviewed-by: Ian Rogers <irogers@google.com> Cc: gor@linux.ibm.com Cc: hca@linux.ibm.com Cc: sumanthk@linux.ibm.com Cc: svens@linux.ibm.com Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221091908.1759083-1-tmricht@linux.ibm.com
2024-02-22	perf stat: Fix metric-only aggregation index	Ian Rogers
	Aggregation index was being computed using the evsel's cpumap which may have a different (typically the same or fewer) entries. Before: ``` $ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total CPU0 12.8 0.0 12.9 12.7 0.0 12.6 CPU1 1.007806367 seconds time elapsed ``` After: ``` $ perf stat --metric-only -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total MB/s memory_bandwidth_total CPU0 15.4 0.0 15.3 15.0 0.0 14.9 CPU18 0.0 0.0 13.5 5.2 0.0 11.9 1.007858736 seconds time elapsed ``` Signed-off-by: Ian Rogers <irogers@google.com> \| Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Kaige Ye <ye@kaige.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221070754.4163916-3-irogers@google.com
2024-02-22	perf metrics: Compute unmerged uncore metrics individually	Ian Rogers
	When merging counts from multiple uncore PMUs the metric is only computed for the metric leader. When merging/aggregation is disabled, prior to this patch just the leader's metric would be computed. Fix this by computing the metric for each PMU. On a SkylakeX: Before: ``` $ perf stat -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': CPU0 82,217 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.2 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total CPU0 61,395 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU0 81,570 UNC_M_CAS_COUNT.RD [uncore_imc_2] CPU18 113,886 UNC_M_CAS_COUNT.RD [uncore_imc_2] CPU0 62,330 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU18 66,942 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU0 75,489 UNC_M_CAS_COUNT.RD [uncore_imc_3] CPU18 27,958 UNC_M_CAS_COUNT.RD [uncore_imc_3] CPU0 55,864 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU18 38,727 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU0 75,423 UNC_M_CAS_COUNT.RD [uncore_imc_5] CPU18 104,527 UNC_M_CAS_COUNT.RD [uncore_imc_5] CPU0 57,596 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU18 56,777 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU0 1,003,440,851 ns duration_time 1.003440851 seconds time elapsed ``` After: ``` $ perf stat -A -M memory_bandwidth_total -a sleep 1 Performance counter stats for 'system wide': CPU0 88,968 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 9.5 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_0] # 0.0 MB/s memory_bandwidth_total CPU0 59,498 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_0] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_1] # 0.0 MB/s memory_bandwidth_total CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_1] CPU0 88,635 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 9.5 MB/s memory_bandwidth_total CPU18 117,975 UNC_M_CAS_COUNT.RD [uncore_imc_2] # 11.5 MB/s memory_bandwidth_total CPU0 60,829 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU18 62,105 UNC_M_CAS_COUNT.WR [uncore_imc_2] CPU0 82,238 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 8.7 MB/s memory_bandwidth_total CPU18 22,906 UNC_M_CAS_COUNT.RD [uncore_imc_3] # 3.6 MB/s memory_bandwidth_total CPU0 53,959 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU18 32,990 UNC_M_CAS_COUNT.WR [uncore_imc_3] CPU0 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total CPU18 0 UNC_M_CAS_COUNT.RD [uncore_imc_4] # 0.0 MB/s memory_bandwidth_total CPU0 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU18 0 UNC_M_CAS_COUNT.WR [uncore_imc_4] CPU0 83,595 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 8.9 MB/s memory_bandwidth_total CPU18 110,151 UNC_M_CAS_COUNT.RD [uncore_imc_5] # 10.5 MB/s memory_bandwidth_total CPU0 56,540 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU18 53,816 UNC_M_CAS_COUNT.WR [uncore_imc_5] CPU0 1,003,353,416 ns duration_time ``` Signed-off-by: Ian Rogers <irogers@google.com> \| Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Kaige Ye <ye@kaige.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221070754.4163916-2-irogers@google.com
2024-02-22	perf stat: Pass fewer metric arguments	Ian Rogers
	Pass metric_expr and evsel rather than specific variables from the struct, thereby reducing the number of arguments. This will enable later fixes. To reduce the size of the diff, local variables are added to match the previous parameter names. This isn't done in the case of "name" as evsel->name is more intention revealing. A whitespace issue is also addressed. Signed-off-by: Ian Rogers <irogers@google.com> Acked-by: Namhyung Kim <namhyung@kernel.org> Cc: K Prateek Nayak <kprateek.nayak@amd.com> Cc: Stephane Eranian <eranian@google.com> Cc: Kaige Ye <ye@kaige.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: John Garry <john.g.garry@oracle.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240221070754.4163916-1-irogers@google.com
2024-02-22	selftests/bpf: update tcp_custom_syncookie to use scalar packet offset	Eduard Zingerman
	This commit updates tcp_custom_syncookie.c:tcp_parse_option() to use explicit packet offset (ctx->off) for packet access instead of ever moving pointer (ctx->ptr), this reduces verification complexity: - the tcp_parse_option() is passed as a callback to bpf_loop(); - suppose a checkpoint is created each time at function entry; - the ctx->ptr is tracked by verifier as PTR_TO_PACKET; - the ctx->ptr is incremented in tcp_parse_option(), thus umax_value field tracked for it is incremented as well; - on each next iteration of tcp_parse_option() checkpoint from a previous iteration can't be reused for state pruning, because PTR_TO_PACKET registers are considered equivalent only if old->umax_value >= cur->umax_value; - on the other hand, the ctx->off is a SCALAR, subject to widen_imprecise_scalars(); - it's exact bounds are eventually forgotten and it is tracked as unknown scalar at entry to tcp_parse_option(); - hence checkpoints created at the start of the function eventually converge. The change is similar to one applied in [0] to xdp_synproxy_kern.c. Comparing before and after with veristat yields following results: File Insns (A) Insns (B) Insns (DIFF) ------------------------------- --------- --------- ----------------- test_tcp_custom_syncookie.bpf.o 466657 12423 -454234 (-97.34%) [0] commit 977bc146d4eb ("selftests/bpf: track tcp payload offset as scalar in xdp_synproxy") Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240222150300.14909-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-02-22	KVM: s390: selftests: memop: add a simple AR test	Eric Farman
	There is a selftest that checks for an (expected) error when an invalid AR is specified, but not one that exercises the AR path. Add a simple test that mirrors the vanilla write/read test while providing an AR. An AR that contains zero will direct the CPU to use the primary address space normally used anyway. AR[1] is selected for this test because the host AR[1] is usually non-zero, and KVM needs to correctly swap those values. Reviewed-by: Nina Schoetterl-Glausch <nsg@linux.ibm.com> Acked-by: Christian Borntraeger <borntraeger@linux.ibm.com> Signed-off-by: Eric Farman <farman@linux.ibm.com> Link: https://lore.kernel.org/r/20240220211211.3102609-3-farman@linux.ibm.com Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
2024-02-22	KVM: selftests: re-map Xen's vcpu_info using HVA rather than GPA	Paul Durrant
	If the relevant capability (KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) is present then re-map vcpu_info using the HVA part way through the tests to make sure then there is no functional change. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20240215152916.1158-16-paul@xen.org Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-22	KVM: selftests: map Xen's shared_info page using HVA rather than GFN	Paul Durrant
	Using the HVA of the shared_info page is more efficient, so if the capability (KVM_XEN_HVM_CONFIG_SHARED_INFO_HVA) is present use that method to do the mapping. NOTE: Have the juggle_shinfo_state() thread map and unmap using both GFN and HVA, to make sure the older mechanism is not broken. Signed-off-by: Paul Durrant <pdurrant@amazon.com> Reviewed-by: David Woodhouse <dwmw@amazon.co.uk> Link: https://lore.kernel.org/r/20240215152916.1158-15-paul@xen.org Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-22	selftests/iommu: fix the config fragment	Muhammad Usama Anjum
	The config fragment doesn't follow the correct format to enable those config options which make the config options getting missed while merging with other configs. ➜ merge_config.sh -m .config tools/testing/selftests/iommu/config Using .config as base Merging tools/testing/selftests/iommu/config ➜ make olddefconfig .config:5295:warning: unexpected data: CONFIG_IOMMUFD .config:5296:warning: unexpected data: CONFIG_IOMMUFD_TEST While at it, add CONFIG_FAULT_INJECTION as well which is needed for CONFIG_IOMMUFD_TEST. If CONFIG_FAULT_INJECTION isn't present in base config (such as x86 defconfig), CONFIG_IOMMUFD_TEST doesn't get enabled. Fixes: 57f0988706fe ("iommufd: Add a selftest") Link: https://lore.kernel.org/r/20240222074934.71380-1-usama.anjum@collabora.com Signed-off-by: Muhammad Usama Anjum <usama.anjum@collabora.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2024-02-22	net: mctp: tests: Test that outgoing skbs have flow data populated	Jeremy Kerr
	When CONFIG_MCTP_FLOWS is enabled, outgoing skbs should have their SKB_EXT_MCTP extension set for drivers to consume. Add two tests for local-to-output routing that check for the flow extensions: one for the simple single-packet case, and one for fragmentation. We now make MCTP_TEST select MCTP_FLOWS, so we always get coverage of these flow tests. The tests are skippable if MCTP_FLOWS is (otherwise) disabled, but that would need manual config tweaking. Signed-off-by: Jeremy Kerr <jk@codeconstruct.com.au> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22	x86/insn: Directly assign x86_64 state in insn_init()	Nikolay Borisov
	No point in checking again as this was already done by the caller. Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240222111636.2214523-3-nik.borisov@suse.com
2024-02-22	x86/insn: Remove superfluous checks from instruction decoding routines	Nikolay Borisov
	It's pointless checking if a particular part of an instruction is decoded before calling the routine responsible for decoding it as this check is duplicated in the routines itself. Streamline the code by removing the superfluous checks. No functional difference. Signed-off-by: Nikolay Borisov <nik.borisov@suse.com> Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Josh Poimboeuf <jpoimboe@redhat.com> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/20240222111636.2214523-2-nik.borisov@suse.com
2024-02-22	Merge tag 'for-netdev' of ↵	Paolo Abeni
	https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Daniel Borkmann says: ==================== pull-request: bpf 2024-02-22 The following pull-request contains BPF updates for your net tree. We've added 11 non-merge commits during the last 24 day(s) which contain a total of 15 files changed, 217 insertions(+), 17 deletions(-). The main changes are: 1) Fix a syzkaller-triggered oops when attempting to read the vsyscall page through bpf_probe_read_kernel and friends, from Hou Tao. 2) Fix a kernel panic due to uninitialized iter position pointer in bpf_iter_task, from Yafang Shao. 3) Fix a race between bpf_timer_cancel_and_free and bpf_timer_cancel, from Martin KaFai Lau. 4) Fix a xsk warning in skb_add_rx_frag() (under CONFIG_DEBUG_NET) due to incorrect truesize accounting, from Sebastian Andrzej Siewior. 5) Fix a NULL pointer dereference in sk_psock_verdict_data_ready, from Shigeru Yoshida. 6) Fix a resolve_btfids warning when bpf_cpumask symbol cannot be resolved, from Hari Bathini. bpf-for-netdev * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, sockmap: Fix NULL pointer dereference in sk_psock_verdict_data_ready() selftests/bpf: Add negtive test cases for task iter bpf: Fix an issue due to uninitialized bpf_iter_task selftests/bpf: Test racing between bpf_timer_cancel_and_free and bpf_timer_cancel bpf: Fix racing between bpf_timer_cancel_and_free and bpf_timer_cancel selftest/bpf: Test the read of vsyscall page under x86-64 x86/mm: Disallow vsyscall page read for copy_from_kernel_nofault() x86/mm: Move is_vsyscall_vaddr() into asm/vsyscall.h bpf, scripts: Correct GPL license name xsk: Add truesize to skb_add_rx_frag(). bpf: Fix warning for bpf_cpumask in verifier ==================== Link: https://lore.kernel.org/r/20240221231826.1404-1-daniel@iogearbox.net Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-22	selftests: ioam: refactoring to align with the fix	Justin Iurman
	ioam6_parser uses a packet socket. After the fix to prevent writing to cloned skb's, the receiver does not see its IOAM data anymore, which makes input/forward ioam-selftests to fail. As a workaround, ioam6_parser now uses an IPv6 raw socket and leverages ancillary data to get hop-by-hop options. As a consequence, the hook is "after" the IOAM data insertion by the receiver and all tests are working again. Signed-off-by: Justin Iurman <justin.iurman@uliege.be> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-02-21	tools: ynl: don't leak mcast_groups on init error	Jakub Kicinski
	Make sure to free the already-parsed mcast_groups if we don't get an ack from the kernel when reading family info. This is part of the ynl_sock_create() error path, so we won't get a call to ynl_sock_destroy() to free them later. Fixes: 86878f14d71a ("tools: ynl: user space helpers") Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20240220161112.2735195-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	tools: ynl: make sure we always pass yarg to mnl_cb_run	Jakub Kicinski
	There is one common error handler in ynl - ynl_cb_error(). It expects priv to be a pointer to struct ynl_parse_arg AKA yarg. To avoid potential crashes if we encounter a stray NLMSG_ERROR always pass yarg as priv (or a struct which has it as the first member). ynl_cb_null() has a similar problem directly - it expects yarg but priv passed by the caller is ys. Found by code inspection. Fixes: 86878f14d71a ("tools: ynl: user space helpers") Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Link: https://lore.kernel.org/r/20240220161112.2735195-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	selftests/mm/ksm_functional: prevent unmapping undefined address	JP Kobryn
	Replace some goto statements with return statements so that unmap() is not called on an undefined address. This change is made so that unmap() can only be reached after mmap() is called (and the address mentioned is defined). Returning MAP_FAILED seems acceptable since client code checks for this value. Link: https://lkml.kernel.org/r/20240105202401.28851-1-inwardvessel@gmail.com Fixes: 42096aa24b82 ("selftest/mm: ksm_functional_tests: test in mmap_and_merge_range() if anything got merged") Signed-off-by: JP Kobryn <inwardvessel@gmail.com> Reviewed-by: David Hildenbrand <david@redhat.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
2024-02-21	selftests: tls: add test for peeking past a record of a different type	Sabrina Dubroca
	If we queue 3 records: - record 1, type DATA - record 2, some other type - record 3, type DATA the current code can look past the 2nd record and merge the 2 data records. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/4623550f8617c239581030c13402d3262f2bd14f.1708007371.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	selftests: tls: add test for merging of same-type control messages	Sabrina Dubroca
	Two consecutive control messages of the same type should never be merged into one large received blob of data. Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Link: https://lore.kernel.org/r/018f1633d5471684c65def5fe390de3b15c3d683.1708007371.git.sd@queasysnail.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-21	selftests/bpf: Remove intermediate test files.	Alexei Starovoitov
	The test of linking process creates several intermediate files. Remove them once the build is over. This reduces the number of files in selftests/bpf/ directory from ~4400 to ~2600. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240220231102.49090-1-alexei.starovoitov@gmail.com
2024-02-21	kselftest/arm64: Test that ptrace takes effect in the target process	Mark Brown
	While we have test coverage for the ptrace interface in our selftests the current programs have a number of gaps. The testing is done per regset so does not cover interactions and at no point do any of the tests actually run the traced processes meaning that there is no validation that anything we read or write corresponds to register values the process actually sees. Let's add a new program which attempts to cover these gaps. Each test we do performs a single ptrace write. For each test we generate some random initial register data in memory and then fork() and trace a child. The child will load the generated data into the registers then trigger a breakpoint. The parent waits for the breakpoint then reads the entire child register state via ptrace, verifying that the values expected were actually loaded by the child. It then does the write being tested and resumes the child. Once resumed the child saves the register state it sees to memory and executes another breakpoint. The parent uses process_vm_readv() to get these values from the child and verifies that the values were as expected before cleaning up the child. We generate configurations with combinations of vector lengths and SVCR values and then try every ptrace write which will implement the transition we generated. In order to control execution time (especially in emulation) we only cover the minimum and maximum VL for each of SVE and SME, this will ensure we generate both increasing and decreasing changes in vector length. In order to provide a baseline test we also check the case where we resume the child without doing a ptrace write. In order to simplify the generation of the test count for kselftest we will report but skip a substantial number of tests that can't actually be expressed via a single ptrace write, several times more than we actually run. This is noisy and will add some overhead but is very much simpler so is probably worth the tradeoff. Signed-off-by: Mark Brown <broonie@kernel.org> Link: https://lore.kernel.org/r/20240122-arm64-test-ptrace-regs-v1-1-0897f822d73e@kernel.org Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
2024-02-21	KVM: selftests: Test top-down slots event in x86's pmu_counters_test	Dapeng Mi
	Although the fixed counter 3 and its exclusive pseudo slots event are not supported by KVM yet, the architectural slots event is supported by KVM and can be programmed on any GP counter. Thus add validation for this architectural slots event. Top-down slots event "counts the total number of available slots for an unhalted logical processor, and increments by machine-width of the narrowest pipeline as employed by the Top-down Microarchitecture Analysis method." As for the slot, it's an abstract concept which indicates how many uops (decoded from instructions) can be processed simultaneously (per cycle) on HW. In Top-down Microarchitecture Analysis (TMA) method, the processor is divided into two parts, frond-end and back-end. Assume there is a processor with classic 5-stage pipeline, fetch, decode, execute, memory access and register writeback. The former 2 stages (fetch/decode) are classified to frond-end and the latter 3 stages are classified to back-end. In modern Intel processors, a complicated instruction would be decoded into several uops (micro-operations) and so these uops can be processed simultaneously and then improve the performance. Thus, assume a processor can decode and dispatch 4 uops in front-end and execute 4 uops in back-end simultaneously (per-cycle), so the machine-width of this processor is 4 and this processor has 4 topdown slots per-cycle. If a slot is spare and can be used to process a new upcoming uop, then the slot is available, but if a uop occupies a slot for several cycles and can't be retired (maybe blocked by memory access), then this slot is stall and unavailable. Considering the testing instruction sequence can't be macro-fused on x86 platforms, the measured slots count should not be less than NUM_INSNS_RETIRED. Thus assert the slots count against NUM_INSNS_RETIRED. pmu_counters_test passed with this patch on Intel Sapphire Rapids. About the more information about TMA method, please refer the below link. https://www.intel.com/content/www/us/en/docs/vtune-profiler/cookbook/2023-0/top-down-microarchitecture-analysis-method.html Signed-off-by: Dapeng Mi <dapeng1.mi@linux.intel.com> Link: https://lore.kernel.org/r/20240218043003.2424683-1-dapeng1.mi@linux.intel.com Signed-off-by: Sean Christopherson <seanjc@google.com>
2024-02-21	Merge tag 'wireless-next-2024-02-20' of ↵	David S. Miller
	git://git.kernel.org/pub/scm/linux/kernel/git/wireless/wireless-next Kalle Valo says: ==================== wireless-next patches for v6.9 The second "new features" pull request for v6.9. Lots of iwlwifi and stack changes this time. And naturally smaller changes to other drivers. We also twice merged wireless into wireless-next to avoid conflicts between the trees. Major changes: stack * mac80211: negotiated TTLM request support * SPP A-MSDU support * mac80211: wider bandwidth OFDMA config support iwlwifi * kunit tests * bump FW API to 89 for AX/BZ/SC devices * enable SPP A-MSDUs * support for new devices ath12k * refactoring in preparation for Multi-Link Operation (MLO) support * 1024 Block Ack window size support * provide firmware wmi logs via a trace event ath11k * 36 bit DMA mask support * support 6 GHz station power modes: Low Power Indoor (LPI), Standard Power) SP and Very Low Power (VLP) rtl8xxxu * TP-Link TL-WN823N V2 support ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2024-02-21	clocksource: Scale the watchdog read retries automatically	Feng Tang
	On a 8-socket server the TSC is wrongly marked as 'unstable' and disabled during boot time on about one out of 120 boot attempts: clocksource: timekeeping watchdog on CPU227: wd-tsc-wd excessive read-back delay of 153560ns vs. limit of 125000ns, wd-wd read-back delay only 11440ns, attempt 3, marking tsc unstable tsc: Marking TSC unstable due to clocksource watchdog TSC found unstable after boot, most likely due to broken BIOS. Use 'tsc=unstable'. sched_clock: Marking unstable (119294969739, 159204297)<-(125446229205, -5992055152) clocksource: Checking clocksource tsc synchronization from CPU 319 to CPUs 0,99,136,180,210,542,601,896. clocksource: Switched to clocksource hpet The reason is that for platform with a large number of CPUs, there are sporadic big or huge read latencies while reading the watchog/clocksource during boot or when system is under stress work load, and the frequency and maximum value of the latency goes up with the number of online CPUs. The cCurrent code already has logic to detect and filter such high latency case by reading the watchdog twice and checking the two deltas. Due to the randomness of the latency, there is a low probabilty that the first delta (latency) is big, but the second delta is small and looks valid. The watchdog code retries the readouts by default twice, which is not necessarily sufficient for systems with a large number of CPUs. There is a command line parameter 'max_cswd_read_retries' which allows to increase the number of retries, but that's not user friendly as it needs to be tweaked per system. As the number of required retries is proportional to the number of online CPUs, this parameter can be calculated at runtime. Scale and enlarge the number of retries according to the number of online CPUs and remove the command line parameter completely. [ tglx: Massaged change log and comments ] Signed-off-by: Feng Tang <feng.tang@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Jin Wang <jin1.wang@intel.com> Tested-by: Paul E. McKenney <paulmck@kernel.org> Reviewed-by: Waiman Long <longman@redhat.com> Reviewed-by: Paul E. McKenney <paulmck@kernel.org> Link: https://lore.kernel.org/r/20240221060859.1027450-1-feng.tang@intel.com
2024-02-20	Merge branch 'for-6.8/cxl-cper' into for-6.8/cxl	Dan Williams
	Pick up CXL CPER notification removal for v6.8-rc6, to return in a later merge window.
2024-02-20	perf: script: prefer capstone to XED	Changbin Du
	Now perf can show assembly instructions with libcapstone for x86, and the capstone is better in general. Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-6-changbin.du@huawei.com
2024-02-20	perf: script: add raw\|disasm arguments to --insn-trace option	Changbin Du
	Now '--insn-trace' accept a argument to specify the output format: - raw: display raw instructions. - disasm: display mnemonic instructions (if capstone is installed). $ sudo perf script --insn-trace=raw ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: 48 89 e7 ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: e8 e8 0c 00 00 ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) insn: f3 0f 1e fa $ sudo perf script --insn-trace=disasm ls 1443864 [006] 2275506.209908875: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi ls 1443864 [006] 2275506.209908875: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0 ls 1443864 [006] 2275506.209908875: 7f216b426df0 _dl_start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) illegal instruction ls 1443864 [006] 2275506.209908875: 7f216b426df4 _dl_start+0x4 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %rbp ls 1443864 [006] 2275506.209908875: 7f216b426df5 _dl_start+0x5 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rbp ls 1443864 [006] 2275506.209908875: 7f216b426df8 _dl_start+0x8 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) pushq %r15 Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-5-changbin.du@huawei.com
2024-02-20	perf: script: add field 'disasm' to display mnemonic instructions	Changbin Du
	In addition to the 'insn' field, this adds a new field 'disasm' to display mnemonic instructions instead of the raw code. $ sudo perf script -F +disasm perf-exec 1443864 [006] 2275506.209848: psb: psb offs: 0 0 [unknown] ([unknown]) perf-exec 1443864 [006] 2275506.209848: cbr: cbr: 41 freq: 4100 MHz (114%) 0 [unknown] ([unknown]) ls 1443864 [006] 2275506.209905: 1 branches:uH: 7f216b426100 _start+0x0 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) movq %rsp, %rdi ls 1443864 [006] 2275506.209908: 1 branches:uH: 7f216b426103 _start+0x3 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) callq _dl_start+0x0 Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-4-changbin.du@huawei.com
2024-02-20	perf: util: use capstone disasm engine to show assembly instructions	Changbin Du
	Currently, the instructions of samples are shown as raw hex strings which are hard to read. x86 has a special option '--xed' to disassemble the hex string via intel XED tool. Here we use capstone as our disassembler engine to give more friendly instructions. We select libcapstone because capstone can provide more insn details. Perf will fallback to raw instructions if libcapstone is not available. The advantages compared to XED tool: * Support arm, arm64, x86-32, x86_64 (more could be supported), xed only for x86_64. * Immediate address operands are shown as symbol+offs. Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-3-changbin.du@huawei.com
2024-02-20	perf: build: introduce the libcapstone	Changbin Du
	Later we will use libcapstone to disassemble instructions of samples. Signed-off-by: Changbin Du <changbin.du@huawei.com> Reviewed-by: Adrian Hunter <adrian.hunter@intel.com> Cc: changbin.du@gmail.com Cc: Thomas Richter <tmricht@linux.ibm.com> Cc: Andi Kleen <ak@linux.intel.com> Signed-off-by: Namhyung Kim <namhyung@kernel.org> Link: https://lore.kernel.org/r/20240217074046.4100789-2-changbin.du@huawei.com
2024-02-20	selftests: sched: Fix spelling mistake "hiearchy" -> "hierarchy"	Colin Ian King
	There is a spelling mistake in a printed message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
2024-02-20	selftests/mqueue: Set timeout to 180 seconds	SeongJae Park
	While mq_perf_tests runs with the default kselftest timeout limit, which is 45 seconds, the test takes about 60 seconds to complete on i3.metal AWS instances. Hence, the test always times out. Increase the timeout to 180 seconds. Fixes: 852c8cbf34d3 ("selftests/kselftest/runner.sh: Add 45 second timeout per test") Cc: <stable@vger.kernel.org> # 5.4.x Signed-off-by: SeongJae Park <sj@kernel.org> Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>