summaryrefslogtreecommitdiff
path: root/lib
AgeCommit message (Collapse)Author
2022-06-04Delete seq_bufprintbuf_v3Kent Overstreet
No longer has any users, so delete it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04vsprintf: Refactor hex_string, bitmap_string_list, bitmap_stringKent Overstreet
This patch cleans up printf_spec handling: these functions only use spec.field_width and they do not interpret it in the normal way - instead it's a number of bits/bytes passed in to print, so these functions are changed to take that parameter directly. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04vsprintf: Refactor device_node_string, fwnode_stringKent Overstreet
- eliminate on-stack buffer in device_node_string - eliminate unnecessary uses of printf_spec, lift format string precision/field width to pointer() Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04vsprintf: flags_string() no longer takes printf_specKent Overstreet
We're attempting to consolidate printf_spec and format string handling in the top level vpr_buf(), this changes time_and_date() to not take printf_spec. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04vsprintf: time_and_date() no longer takes printf_specKent Overstreet
We're attempting to consolidate printf_spec and format string handling in the top level vpr_buf(), this changes time_and_date() to not take printf_spec. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04vsprintf: Refactor mac_address_string()Kent Overstreet
- We're attempting to consolidate printf_spec and format string handling in the top level ptr_vprintf(), this changes mac_address_string() to not take printf_spec - With the new printbuf helpers there's no need to use a separate stack allocated buffer, so this patch deletes it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04vsprintf: Refactor ip_addr_string()Kent Overstreet
- We're attempting to consolidate printf_spec and format string handling in the top level vpr_buf(), this changes ip_addr_string() to not take printf_spec - With the new printbuf helpers there's no need to use a separate stack allocated buffer, so this patch deletes it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04vsprintf: Refactor fourcc_string()Kent Overstreet
- We're attempting to consolidate printf_spec and format string handling in the top level vpr_buf(), this changes fourcc_string() to not take printf_spec - With the new printbuf helpers there's no need to use a separate stack allocated buffer, so this patch deletes it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04vsprintf: Refactor resource_string()Kent Overstreet
Two changes: - We're attempting to consolidate printf_spec and format string handling in the top level vpr_buf(), this changes resource_string to not take printf_spec - With the new printbuf helpers there's no need to use a separate stack allocated buffer, so this patch deletes it. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04vsprintf: Start consolidating printf_spec handlingKent Overstreet
printf_spec is right now something of a mess - it's a grab-bag of state that's interpreted inconsistently by different code, and it's scattered throughout vsprintf.c. We'd like to get it out of the pretty-printers, and have it be solely the responsibility of vsprintf()/vpr_buf(), the code that parses and handles format strings. Most of the code that uses printf_spec is only using it for a minimum & maximum field width - that can be done at the toplevel by checking how much we just printed, and padding or truncating it as necessary. This patch takes those "simple" uses of printf_spec and moves them as far up the call stack as possible. This patch also renames some helpers and creates new ones that don't take printf_spec: - do_width_precision: new helper that handles with/precision of printf_spec - error_string -> error_string_spec - check_pointer -> check_pointer_spec - string -> string_spec Next patches will be reducing/eliminating uses of the *_spec versions. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04test_printf: Drop requirement that sprintf not write past nulKent Overstreet
The current test code checks that sprintf never writes past the terminating nul. This is a rather strange requirement, completely separate from writing past the end of the buffer, which of course we can't do: writing anywhere to the buffer passed to snprintf, within size of course, should be perfectly fine. Since this check has no documentation as to where it comes from or what depends on it, and it's getting in the way of further refactoring (printf_spec handling is right now scattered massively throughout the code, and we'd like to consolidate it) - delete it. Also, many current pretty-printers building up their output on the stack, and then copy it to the actual output buffer - by eliminating this requirement we can kill those extra buffers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04vsprintf: prt_u64_minwidth(), prt_u64()Kent Overstreet
This adds two new-style printbuf helpers for printing simple u64s, and converts num_to_str() to be a simple wrapper around prt_u64_minwidth(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04vsprintf: Improve number()Kent Overstreet
This patch refactors number() to make it a bit clearer, and it also changes it to call printbuf_make_room() only once at the start, instead of in the printbuf output helpers. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04lib/pretty-printers: prt_string_option(), prt_bitflags()Kent Overstreet
Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04lib/printbuf: Unit specifiersKent Overstreet
This adds options to printbuf for specifying whether units should be printed raw (default) or with human readable units, and for controlling whether human-readable units should be base 2 (default), or base 10. This also adds new helpers that obey these options: - pr_human_readable_u64 - pr_human_readable_s64 These obey printbuf->si_units - pr_units_u64 - pr_units_s64 These obey both printbuf-human_readable_units and printbuf->si_units Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04lib/printbuf: Tabstops, indentingKent Overstreet
This patch adds two new features to printbuf for structured formatting: - Indent level: the indent level, as a number of spaces, may be increased with pr_indent_add() and decreased with pr_indent_sub(). Subsequent lines, when started with pr_newline() (not "\n", although that may change) will then be intended according to the current indent level. This helps with pretty-printers that structure a large amonut of data across multiple lines and multiple functions. - Tabstops: Tabstops may be set by assigning to the printbuf->tabstops array. Then, pr_tab() may be used to advance to the next tabstop, printing as many spaces as required - leaving previous output left justified to the previous tabstop. pr_tab_rjust() advances to the next tabstop but inserts the spaces just after the previous tabstop - right justifying the previously-outputted text to the next tabstop. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04lib/printbuf: Heap allocationKent Overstreet
This makes printbufs optionally heap allocated: a printbuf initialized with the PRINTBUF initializer will automatically heap allocate and resize as needed. Allocations are done with GFP_KERNEL: code should use e.g. memalloc_nofs_save()/restore() as needed. Since we do not currently have memalloc_nowait_save()/restore(), in contexts where it is not safe to block we provide the helpers printbuf_atomic_inc() printbuf_atomic_dec() When the atomic count is nonzero, memory allocations will be done with GFP_NOWAIT. On memory allocation failure, output will be truncated. Code that wishes to check for memory allocation failure (in contexts where we should return -ENOMEM) should check if printbuf->allocation_failure is set. Since printbufs are expected to be typically used for log messages and on a best effort basis, we don't return errors directly. Other helpers provided by this patch: - printbuf_make_room(buf, extra) Reallocates if necessary to make room for @extra bytes (not including terminating null). - printbuf_str(buf) Returns a null terminated string equivalent to the contents of @buf. If @buf was never allocated (or allocation failed), returns a constant empty string. - printbuf_exit(buf) Releases memory allocated by a printbuf. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04lib/string_helpers: string_get_size() now returns characters wroteKent Overstreet
printbuf now needs to know the number of characters that would have been written if the buffer was too small, like snprintf(); this changes string_get_size() to return the the return value of snprintf(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04vsprintf: %pf(%p)Kent Overstreet
This implements two new format strings: both do the same thing, one more compatible with current gcc format string checking, the other that we'd like to standardize: %pf(%p) - more compatible %(%p) - more prettier Both can take variable numbers of arguments, i.e. %(%p,%p,%p). They're used to indicate that snprintf or pr_buf should interpret the next argument as a pretty-printer function to call, and subsequent arguments within the parentheses should be passed to the pretty-printer. A pretty printer takes as its first argument a printbuf, and then zero or more pointer arguments - integer arguments are not (currently) supported. Example usage: static void foo_to_text(struct printbuf *out, struct foo *foo) { pr_buf(out, "bar=%u baz=%u", foo->bar, foo->baz); } printf("%(%p)", foo_to_text, foo); The goal is to replace most of our %p format extensions with this interface, and to move pretty-printers out of the core vsprintf.c code - this will get us better organization and better discoverability (you'll be able to cscope to pretty printer calls!), as well as eliminate a lot of dispatch code in vsprintf.c. Currently, we can only call pretty printers with pointer arguments. This could be changed to also allow at least integer arguments in the future by using libffi. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-06-04lib/hexdump: Convert to printbufKent Overstreet
This converts most of the hexdump code to printbufs, along with some significant cleanups and a bit of reorganization. The old non-printbuf functions are mostly left as wrappers around the new printbuf versions. Big note: byte swabbing behaviour Previously, hex_dump_to_buffer() would byteswab the groups of bytes being printed on little endian machines. This behaviour is... not standard or typical for a hex dumper, and this behaviour was silently added/changed without documentation (in 2007). Given that the hex dumpers are just used for debugging output, nothing is likely to break, and hopefully by reverting to more standard behaviour the end result will be _less_ confusion, modulo a few kernel developers who will certainly be annoyed by their tools changing. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04vsprintf: Convert to printbufKent Overstreet
This converts vsnprintf() to printbufs: instead of passing around raw char * pointers for current buf position and end of buf, we have a real type! This makes the calling convention for our existing pretty printers a lot saner and less error prone, plus printbufs add some new helpers that make the code smaller and more readable, with a lot less crazy pointer arithmetic. There are a lot more refactorings to be done: this patch tries to stick to just converting the calling conventions, as that needs to be done all at once in order to avoid introducing a ton of wrappers that will just be deleted. Thankfully we have good unit tests for printf, and they have been run and are all passing with this patch. We have two new exported functions with this patch: - prt_printf(), which is like snprintf but outputs to a printbuf - prt_vprintf, like vsnprintf These are the actual core print routines now - vsnprintf() is a wrapper around prt_vprintf(). Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-06-04lib/string_helpers: Convert string_escape_mem() to printbufKent Overstreet
Like the upcoming vsprintf.c conversion, this converts string_escape_mem to prt_escaped_string(), which uses and outputs to a printbuf, and makes string_escape_mem() a smaller wrapper to support existing users. The new printbuf helpers greatly simplify the code. Signed-off-by: Kent Overstreet <kent.overstreet@gmail.com>
2022-05-18Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfsLinus Torvalds
Pull misc fixes from Al Viro: "vhost race fix and a percpu_ref_init-caused cgroup double-free fix. The latter had manifested as buggered struct mount refcounting - those are also using percpu data structures, but anything that does percpu allocations could be hit" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Fix double fget() in vhost_net_set_backend() percpu_ref_init(): clean ->percpu_count_ref on failure
2022-05-18percpu_ref_init(): clean ->percpu_count_ref on failureAl Viro
That way percpu_ref_exit() is safe after failing percpu_ref_init(). At least one user (cgroup_create()) had a double-free that way; there might be other similar bugs. Easier to fix in percpu_ref_init(), rather than playing whack-a-mole in sloppy users... Usual symptoms look like a messed refcounting in one of subsystems that use percpu allocations (might be percpu-refcount, might be something else). Having refcounts for two different objects share memory is Not Nice(tm)... Reported-by: syzbot+5b1e53987f858500ec00@syzkaller.appspotmail.com Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2022-05-09dim: initialize all struct fieldsJesse Brandeburg
The W=2 build pointed out that the code wasn't initializing all the variables in the dim_cq_moder declarations with the struct initializers. The net change here is zero since these structs were already static const globals and were initialized with zeros by the compiler, but removing compiler warnings has value in and of itself. lib/dim/net_dim.c: At top level: lib/dim/net_dim.c:54:9: warning: missing initializer for field ‘comps’ of ‘const struct dim_cq_moder’ [-Wmissing-field-initializers] 54 | NET_DIM_RX_EQE_PROFILES, | ^~~~~~~~~~~~~~~~~~~~~~~ In file included from lib/dim/net_dim.c:6: ./include/linux/dim.h:45:13: note: ‘comps’ declared here 45 | u16 comps; | ^~~~~ and repeats for the tx struct, and once you fix the comps entry then the cq_period_mode field needs the same treatment. Use the commonly accepted style to indicate to the compiler that we know what we're doing, and add a comma at the end of each struct initializer to clean up the issue, and use explicit initializers for the fields we are initializing which makes the compiler happy. While here and fixing these lines, clean up the code slightly with a fix for the super long lines by removing the word "_MODERATION" from a couple defines only used in this file. Fixes: f8be17b81d44 ("lib/dim: Fix -Wunused-const-variable warnings") Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com> Link: https://lore.kernel.org/r/20220507011038.14568-1-jesse.brandeburg@intel.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-05-01Merge tag 'x86_urgent_for_v5.18_rc5' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip Pull x86 fixes from Borislav Petkov: - A fix to disable PCI/MSI[-X] masking for XEN_HVM guests as that is solely controlled by the hypervisor - A build fix to make the function prototype (__warn()) as visible as the definition itself - A bunch of objtool annotation fixes which have accumulated over time - An ORC unwinder fix to handle bad input gracefully - Well, we thought the microcode gets loaded in time in order to restore the microcode-emulated MSRs but we thought wrong. So there's a fix for that to have the ordering done properly - Add new Intel model numbers - A spelling fix * tag 'x86_urgent_for_v5.18_rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: x86/pci/xen: Disable PCI/MSI[-X] masking for XEN_HVM guests bug: Have __warn() prototype defined unconditionally x86/Kconfig: fix the spelling of 'becoming' in X86_KERNEL_IBT config objtool: Use offstr() to print address of missing ENDBR objtool: Print data address for "!ENDBR" data warnings x86/xen: Add ANNOTATE_NOENDBR to startup_xen() x86/uaccess: Add ENDBR to __put_user_nocheck*() x86/retpoline: Add ANNOTATE_NOENDBR for retpolines x86/static_call: Add ANNOTATE_NOENDBR to static call trampoline objtool: Enable unreachable warnings for CLANG LTO x86,objtool: Explicitly mark idtentry_body()s tail REACHABLE x86,objtool: Mark cpu_startup_entry() __noreturn x86,xen,objtool: Add UNWIND hint lib/strn*,objtool: Enforce user_access_begin() rules MAINTAINERS: Add x86 unwinding entry x86/unwind/orc: Recheck address range after stack info was updated x86/cpu: Load microcode during restore_processor_state() x86/cpu: Add new Alderlake and Raptorlake CPU model numbers
2022-04-27hex2bin: fix access beyond string endMikulas Patocka
If we pass too short string to "hex2bin" (and the string size without the terminating NUL character is even), "hex2bin" reads one byte after the terminating NUL character. This patch fixes it. Note that hex_to_bin returns -1 on error and hex2bin return -EINVAL on error - so we can't just return the variable "hi" or "lo" on error. This inconsistency may be fixed in the next merge window, but for the purpose of fixing this bug, we just preserve the existing behavior and return -1 and -EINVAL. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Reviewed-by: Andy Shevchenko <andy.shevchenko@gmail.com> Fixes: b78049831ffe ("lib: add error checking to hex2bin") Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-27hex2bin: make the function hex_to_bin constant-timeMikulas Patocka
The function hex2bin is used to load cryptographic keys into device mapper targets dm-crypt and dm-integrity. It should take constant time independent on the processed data, so that concurrently running unprivileged code can't infer any information about the keys via microarchitectural convert channels. This patch changes the function hex_to_bin so that it contains no branches and no memory accesses. Note that this shouldn't cause performance degradation because the size of the new function is the same as the size of the old function (on x86-64) - and the new function causes no branch misprediction penalties. I compile-tested this function with gcc on aarch64 alpha arm hppa hppa64 i386 ia64 m68k mips32 mips64 powerpc powerpc64 riscv sh4 s390x sparc32 sparc64 x86_64 and with clang on aarch64 arm hexagon i386 mips32 mips64 powerpc powerpc64 s390x sparc32 sparc64 x86_64 to verify that there are no branches in the generated code. Signed-off-by: Mikulas Patocka <mpatocka@redhat.com> Cc: stable@vger.kernel.org Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-22XArray: Disallow sibling entries of nodesMatthew Wilcox (Oracle)
There is a race between xas_split() and xas_load() which can result in the wrong page being returned, and thus data corruption. Fortunately, it's hard to hit (syzbot took three months to find it) and often guarded with VM_BUG_ON(). The anatomy of this race is: thread A thread B order-9 page is stored at index 0x200 lookup of page at index 0x274 page split starts load of sibling entry at offset 9 stores nodes at offsets 8-15 load of entry at offset 8 The entry at offset 8 turns out to be a node, and so we descend into it, and load the page at index 0x234 instead of 0x274. This is hard to fix on the split side; we could replace the entire node that contains the order-9 page instead of replacing the eight entries. Fixing it on the lookup side is easier; just disallow sibling entries that point to nodes. This cannot ever be a useful thing as the descent would not know the correct offset to use within the new node. The test suite continues to pass, but I have not added a new test for this bug. Reported-by: syzbot+cf4cf13056f85dec2c40@syzkaller.appspotmail.com Tested-by: syzbot+cf4cf13056f85dec2c40@syzkaller.appspotmail.com Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-04-19lib/strn*,objtool: Enforce user_access_begin() rulesPeter Zijlstra
Apparently GCC can fail to inline a 'static inline' single caller function: lib/strnlen_user.o: warning: objtool: strnlen_user()+0x33: call to do_strnlen_user() with UACCESS enabled lib/strncpy_from_user.o: warning: objtool: strncpy_from_user()+0x33: call to do_strncpy_from_user() with UACCESS enabled Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Link: https://lore.kernel.org/r/20220408094718.262932488@infradead.org
2022-04-10Merge tag 'driver-core-5.18-rc2' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core Pull driver core updates from Greg KH: "Here are two small driver core changes for 5.18-rc2. They are the final bits in the removal of the default_attrs field in struct kobj_type. I had to wait until after 5.18-rc1 for all of the changes to do this came in through different development trees, and then one new user snuck in. So this series has two changes: - removal of the default_attrs field in the powerpc/pseries/vas code. The change has been acked by the PPC maintainers to come through this tree - removal of default_attrs from struct kobj_type now that all in-kernel users are removed. This cleans up the kobject code a little bit and removes some duplicated functionality that confused people (now there is only one way to do default groups) Both of these have been in linux-next for all of this week with no reported problems" * tag 'driver-core-5.18-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core: kobject: kobj_type: remove default_attrs powerpc/pseries/vas: use default_groups in kobj_type
2022-04-08lz4: fix LZ4_decompress_safe_partial read out of boundGuo Xuenan
When partialDecoding, it is EOF if we've either filled the output buffer or can't proceed with reading an offset for following match. In some extreme corner cases when compressed data is suitably corrupted, UAF will occur. As reported by KASAN [1], LZ4_decompress_safe_partial may lead to read out of bound problem during decoding. lz4 upstream has fixed it [2] and this issue has been disscussed here [3] before. current decompression routine was ported from lz4 v1.8.3, bumping lib/lz4 to v1.9.+ is certainly a huge work to be done later, so, we'd better fix it first. [1] https://lore.kernel.org/all/000000000000830d1205cf7f0477@google.com/ [2] https://github.com/lz4/lz4/commit/c5d6f8a8be3927c0bec91bcc58667a6cfad244ad# [3] https://lore.kernel.org/all/CC666AE8-4CA4-4951-B6FB-A2EFDE3AC03B@fb.com/ Link: https://lkml.kernel.org/r/20211111105048.2006070-1-guoxuenan@huawei.com Reported-by: syzbot+63d688f1d899c588fb71@syzkaller.appspotmail.com Signed-off-by: Guo Xuenan <guoxuenan@huawei.com> Reviewed-by: Nick Terrell <terrelln@fb.com> Acked-by: Gao Xiang <hsiangkao@linux.alibaba.com> Cc: Yann Collet <cyan@fb.com> Cc: Chengyang Fan <cy.fan@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-04-05kobject: kobj_type: remove default_attrsGreg Kroah-Hartman
Now that all in-kernel users of default_attrs for the kobj_type are gone and converted to properly use the default_groups pointer instead, it can be safely removed. There is one standard way to create sysfs files in a kobj_type, and not two like before, causing confusion as to which should be used. Cc: "Rafael J. Wysocki" <rafael@kernel.org> Link: https://lore.kernel.org/r/20220106133151.607703-1-gregkh@linuxfoundation.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2022-04-01Merge tag 'for-5.18/block-2022-04-01' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block fixes from Jens Axboe: "Either fixes or a few additions that got missed in the initial merge window pull. In detail: - List iterator fix to avoid leaking value post loop (Jakob) - One-off fix in minor count (Christophe) - Fix for a regression in how io priority setting works for an exiting task (Jiri) - Fix a regression in this merge window with blkg_free() being called in an inappropriate context (Ming) - Misc fixes (Ming, Tom)" * tag 'for-5.18/block-2022-04-01' of git://git.kernel.dk/linux-block: blk-wbt: remove wbt_track stub block: use dedicated list iterator variable block: Fix the maximum minor value is blk_alloc_ext_minor() block: restore the old set_task_ioprio() behaviour wrt PF_EXITING block: avoid calling blkg_free() in atomic context lib/sbitmap: allocate sb->map via kvzalloc_node
2022-04-01Merge tag 'xarray-5.18' of git://git.infradead.org/users/willy/xarrayLinus Torvalds
Pull XArray updates from Matthew Wilcox: - Documentation update - Fix test-suite build after move of bitmap.h - Fix xas_create_range() when a large entry is already present - Fix xas_split() of a shadow entry * tag 'xarray-5.18' of git://git.infradead.org/users/willy/xarray: XArray: Update the LRU list in xas_split() XArray: Fix xas_create_range() when multi-order entry present XArray: Include bitmap.h from xarray.h XArray: Document the locking requirement for the xa_state
2022-03-31Merge tag 'for-linus-5.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml Pull UML updates from Richard Weinberger: - Devicetree support (for testing) - Various cleanups and fixes: UBD, port_user, uml_mconsole - Maintainer update * tag 'for-linus-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/uml: um: run_helper: Write error message to kernel log on exec failure on host um: port_user: Improve error handling when port-helper is not found um: port_user: Allow setting path to port-helper using UML_PORT_HELPER envvar um: port_user: Search for in.telnetd in PATH um: clang: Strip out -mno-global-merge from USER_CFLAGS docs: UML: Mention telnetd for port channel um: Remove unused timeval_to_ns() function um: Fix uml_mconsole stop/go um: Cleanup syscall_handler_t definition/cast, fix warning uml: net: vector: fix const issue um: Fix WRITE_ZEROES in the UBD Driver um: Migrate vector drivers to NAPI um: Fix order of dtb unflatten/early init um: fix and optimize xor select template for CONFIG64 and timetravel mode um: Document dtb command line option lib/logic_iomem: correct fallback config references um: Remove duplicated include in syscalls_64.c MAINTAINERS: Update UserModeLinux entry
2022-03-31XArray: Update the LRU list in xas_split()Matthew Wilcox (Oracle)
When splitting a value entry, we may need to add the new nodes to the LRU list and remove the parent node from the LRU list. The WARN_ON checks in shadow_lru_isolate() catch this oversight. This bug was latent until we stopped splitting folios in shrink_page_list() with commit 820c4e2e6f51 ("mm/vmscan: Free non-shmem folios without splitting them"). That allows the creation of large shadow entries, and subsequently when trying to page in a small page, we will split the large shadow entry in __filemap_add_folio(). Fixes: 8fc75643c5e1 ("XArray: add xas_split") Reported-by: Hugh Dickins <hughd@google.com> Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-29lib/test: use after free in register_test_dev_kmod()Dan Carpenter
The "test_dev" pointer is freed but then returned to the caller. Fixes: d9c6a72d6fa2 ("kmod: add test driver to stress test the module loader") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
2022-03-28XArray: Fix xas_create_range() when multi-order entry presentMatthew Wilcox (Oracle)
If there is already an entry present that is of order >= XA_CHUNK_SHIFT when we call xas_create_range(), xas_create_range() will misinterpret that entry as a node and dereference xa_node->parent, generally leading to a crash that looks something like this: general protection fault, probably for non-canonical address 0xdffffc0000000001: 0000 [#1] PREEMPT SMP KASAN KASAN: null-ptr-deref in range [0x0000000000000008-0x000000000000000f] CPU: 0 PID: 32 Comm: khugepaged Not tainted 5.17.0-rc8-syzkaller-00003-g56e337f2cf13 #0 RIP: 0010:xa_parent_locked include/linux/xarray.h:1207 [inline] RIP: 0010:xas_create_range+0x2d9/0x6e0 lib/xarray.c:725 It's deterministically reproducable once you know what the problem is, but producing it in a live kernel requires khugepaged to hit a race. While the problem has been present since xas_create_range() was introduced, I'm not aware of a way to hit it before the page cache was converted to use multi-index entries. Fixes: 6b24ca4a1a8d ("mm: Use multi-index entries in the page cache") Reported-by: syzbot+0d2b0bf32ca5cfd09f2e@syzkaller.appspotmail.com Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
2022-03-26Merge tag 'memcpy-v5.18-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux Pull FORTIFY_SOURCE updates from Kees Cook: "This series consists of two halves: - strict compile-time buffer size checking under FORTIFY_SOURCE for the memcpy()-family of functions (for extensive details and rationale, see the first commit) - enabling FORTIFY_SOURCE for Clang, which has had many overlapping bugs that we've finally worked past" * tag 'memcpy-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: fortify: Add Clang support fortify: Make sure strlen() may still be used as a constant expression fortify: Use __diagnose_as() for better diagnostic coverage fortify: Make pointer arguments const Compiler Attributes: Add __diagnose_as for Clang Compiler Attributes: Add __overloadable for Clang Compiler Attributes: Add __pass_object_size for Clang fortify: Replace open-coded __gnu_inline attribute fortify: Update compile-time tests for Clang 14 fortify: Detect struct member overflows in memset() at compile-time fortify: Detect struct member overflows in memmove() at compile-time fortify: Detect struct member overflows in memcpy() at compile-time
2022-03-26Merge tag 'for-5.18/64bit-pi-2022-03-25' of git://git.kernel.dk/linux-blockLinus Torvalds
Pull block layer 64-bit data integrity support from Jens Axboe: "This adds support for 64-bit data integrity in the block layer and in NVMe" * tag 'for-5.18/64bit-pi-2022-03-25' of git://git.kernel.dk/linux-block: crypto: fix crc64 testmgr digest byte order nvme: add support for enhanced metadata block: add pi for extended integrity crypto: add rocksoft 64b crc guard tag framework lib: add rocksoft model crc64 linux/kernel: introduce lower_48_bits function asm-generic: introduce be48 unaligned accessors nvme: allow integrity on extended metadata formats block: support pi with extended metadata
2022-03-25Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge yet more updates from Andrew Morton: "This is the material which was staged after willystuff in linux-next. Subsystems affected by this patch series: mm (debug, selftests, pagecache, thp, rmap, migration, kasan, hugetlb, pagemap, madvise), and selftests" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (113 commits) selftests: kselftest framework: provide "finished" helper mm: madvise: MADV_DONTNEED_LOCKED mm: fix race between MADV_FREE reclaim and blkdev direct IO read mm: generalize ARCH_HAS_FILTER_PGPROT mm: unmap_mapping_range_tree() with i_mmap_rwsem shared mm: warn on deleting redirtied only if accounted mm/huge_memory: remove stale locking logic from __split_huge_pmd() mm/huge_memory: remove stale page_trans_huge_mapcount() mm/swapfile: remove stale reuse_swap_page() mm/khugepaged: remove reuse_swap_page() usage mm/huge_memory: streamline COW logic in do_huge_pmd_wp_page() mm: streamline COW logic in do_swap_page() mm: slightly clarify KSM logic in do_swap_page() mm: optimize do_wp_page() for fresh pages in local LRU pagevecs mm: optimize do_wp_page() for exclusive pages in the swapcache mm/huge_memory: make is_transparent_hugepage() static userfaultfd/selftests: enable hugetlb remap and remove event testing selftests/vm: add hugetlb madvise MADV_DONTNEED MADV_REMOVE test mm: enable MADV_DONTNEED for hugetlb mappings kasan: disable LOCKDEP when printing reports ...
2022-03-24kasan: update function name in commentsPeter Collingbourne
The function kasan_global_oob was renamed to kasan_global_oob_right, but the comments referring to it were not updated. Do so. Link: https://linux-review.googlesource.com/id/I20faa90126937bbee77d9d44709556c3dd4b40be Link: https://lkml.kernel.org/r/20220219012433.890941-1-pcc@google.com Signed-off-by: Peter Collingbourne <pcc@google.com> Reviewed-by: Miaohe Lin <linmiaohe@huawei.com> Reviewed-by: Marco Elver <elver@google.com> Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24kasan: test: support async (again) and asymm modes for HW_TAGSAndrey Konovalov
Async mode support has already been implemented in commit e80a76aa1a91 ("kasan, arm64: tests supports for HW_TAGS async mode") but then got accidentally broken in commit 99734b535d9b ("kasan: detect false-positives in tests"). Restore the changes removed by the latter patch and adapt them for asymm mode: add a sync_fault flag to kunit_kasan_expectation that only get set if the MTE fault was synchronous, and reenable MTE on such faults in tests. Also rename kunit_kasan_expectation to kunit_kasan_status and move its definition to mm/kasan/kasan.h from include/linux/kasan.h, as this structure is only internally used by KASAN. Also put the structure definition under IS_ENABLED(CONFIG_KUNIT). Link: https://lkml.kernel.org/r/133970562ccacc93ba19d754012c562351d4a8c8.1645033139.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Cc: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24kasan: improve vmalloc testsAndrey Konovalov
Update the existing vmalloc_oob() test to account for the specifics of the tag-based modes. Also add a few new checks and comments. Add new vmalloc-related tests: - vmalloc_helpers_tags() to check that exported vmalloc helpers can handle tagged pointers. - vmap_tags() to check that SW_TAGS mode properly tags vmap() mappings. - vm_map_ram_tags() to check that SW_TAGS mode properly tags vm_map_ram() mappings. - vmalloc_percpu() to check that SW_TAGS mode tags regions allocated for __alloc_percpu(). The tagging of per-cpu mappings is best-effort; proper tagging is tracked in [1]. [1] https://bugzilla.kernel.org/show_bug.cgi?id=215019 [sfr@canb.auug.org.au: similar to "kasan: test: fix compatibility with FORTIFY_SOURCE"] Link: https://lkml.kernel.org/r/20220128144801.73f5ced0@canb.auug.org.au Link: https://lkml.kernel.org/r/865c91ba49b90623ab50c7526b79ccb955f544f0.1644950160.git.andreyknvl@google.com [andreyknvl@google.com: set_memory_rw/ro() are not exported to modules] Link: https://lkml.kernel.org/r/019ac41602e0c4a7dfe96dc8158a95097c2b2ebd.1645554036.git.andreyknvl@google.com [akpm@linux-foundation.org: fix build] Cc: Andrey Konovalov <andreyknvl@gmail.com> [andreyknvl@google.com: vmap_tags() and vm_map_ram_tags() pass invalid page array size] Link: https://lkml.kernel.org/r/bbdc1c0501c5275e7f26fdb8e2a7b14a40a9f36b.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Signed-off-by: Stephen Rothwell <sfr@canb.auug.org.au> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24kasan: allow enabling KASAN_VMALLOC and SW/HW_TAGSAndrey Konovalov
Allow enabling CONFIG_KASAN_VMALLOC with SW_TAGS and HW_TAGS KASAN modes. Also adjust CONFIG_KASAN_VMALLOC description: - Mention HW_TAGS support. - Remove unneeded internal details: they have no place in Kconfig description and are already explained in the documentation. Link: https://lkml.kernel.org/r/bfa0fdedfe25f65e5caa4e410f074ddbac7a0b59.1643047180.git.andreyknvl@google.com Signed-off-by: Andrey Konovalov <andreyknvl@google.com> Acked-by: Marco Elver <elver@google.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Catalin Marinas <catalin.marinas@arm.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Evgenii Stepanov <eugenis@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Collingbourne <pcc@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: Will Deacon <will@kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24lib/vsprintf: avoid redundant work with 0 sizeWaiman Long
Patch series "mm/page_owner: Extend page_owner to show memcg information", v4. While debugging the constant increase in percpu memory consumption on a system that spawned large number of containers, it was found that a lot of offline mem_cgroup structures remained in place without being freed. Further investigation indicated that those mem_cgroup structures were pinned by some pages. In order to find out what those pages are, the existing page_owner debugging tool is extended to show memory cgroup information and whether those memcgs are offline or not. With the enhanced page_owner tool, the following is a typical page that pinned the mem_cgroup structure in my test case: Page allocated via order 0, mask 0x1100cca(GFP_HIGHUSER_MOVABLE), pid 162970 (podman), ts 1097761405537 ns, free_ts 1097760838089 ns PFN 1925700 type Movable Block 3761 type Movable Flags 0x17ffffc00c001c(uptodate|dirty|lru|reclaim|swapbacked|node=0|zone=2|lastcpupid=0x1fffff) prep_new_page+0xac/0xe0 get_page_from_freelist+0x1327/0x14d0 __alloc_pages+0x191/0x340 alloc_pages_vma+0x84/0x250 shmem_alloc_page+0x3f/0x90 shmem_alloc_and_acct_page+0x76/0x1c0 shmem_getpage_gfp+0x281/0x940 shmem_write_begin+0x36/0xe0 generic_perform_write+0xed/0x1d0 __generic_file_write_iter+0xdc/0x1b0 generic_file_write_iter+0x5d/0xb0 new_sync_write+0x11f/0x1b0 vfs_write+0x1ba/0x2a0 ksys_write+0x59/0xd0 do_syscall_64+0x37/0x80 entry_SYSCALL_64_after_hwframe+0x44/0xae Charged to offline memcg libpod-conmon-15e4f9c758422306b73b2dd99f9d50a5ea53cbb16b4a13a2c2308a4253cc0ec8. So the page was not freed because it was part of a shmem segment. That is useful information that can help users to diagnose similar problems. With cgroup v1, /proc/cgroups can be read to find out the total number of memory cgroups (online + offline). With cgroup v2, the cgroup.stat of the root cgroup can be read to find the number of dying cgroups (most likely pinned by dying memcgs). The page_owner feature is not supposed to be enabled for production system due to its memory overhead. However, if it is suspected that dying memcgs are increasing over time, a test environment with page_owner enabled can then be set up with appropriate workload for further analysis on what may be causing the increasing number of dying memcgs. This patch (of 4): For *scnprintf(), vsnprintf() is always called even if the input size is 0. That is a waste of time, so just return 0 in this case. Note that vsnprintf() will never return -1 to indicate an error. So skipping the call to vsnprintf() when size is 0 will have no functional impact at all. Link: https://lkml.kernel.org/r/20220202203036.744010-1-longman@redhat.com Link: https://lkml.kernel.org/r/20220202203036.744010-2-longman@redhat.com Signed-off-by: Waiman Long <longman@redhat.com> Acked-by: David Rientjes <rientjes@google.com> Reviewed-by: Sergey Senozhatsky <senozhatsky@chromium.org> Acked-by: Roman Gushchin <guro@fb.com> Acked-by: Rafael Aquini <aquini@redhat.com> Acked-by: Mike Rapoport <rppt@linux.ibm.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Vladimir Davydov <vdavydov.dev@gmail.com> Cc: Petr Mladek <pmladek@suse.com> Cc: Steven Rostedt (Google) <rostedt@goodmis.org> Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk> Cc: Ira Weiny <ira.weiny@intel.com> Cc: David Rientjes <rientjes@google.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2022-03-24Merge tag 'cxl-for-5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl Pull CXL (Compute Express Link) updates from Dan Williams: "This development cycle extends the subsystem to discover CXL resources throughout a CXL/PCIe switch topology and respond to hot add/remove events anywhere in that topology. This is more foundational infrastructure in preparation for dynamic memory region provisioning support. Recall that CXL memory regions, as the new "Theory of Operation" section of Documentation/driver-api/cxl/memory-devices.rst describes, bring storage volume striping semantics to memory. The hot add/remove behavior is validated with extensions to the cxl_test unit test environment and this test in the cxl-cli test suite: https://github.com/pmem/ndctl/blob/djbw/for-74/cxl/test/cxl-topology.sh Summary: - Add a driver for 'struct cxl_memdev' objects responsible for CXL.mem operation as distinct from 'cxl_pci' mailbox operations. Its primary responsibility is enumerating an endpoint 'struct cxl_port' and all the 'struct cxl_port' instances between an endpoint and the CXL platform root. - Add a driver for 'struct cxl_port' objects responsible for enumerating and operating all Host-managed Device Memory (HDM) decoder resources between the platform-level CXL memory description, all intervening host bridges / switches, and the HDM resources in endpoints. - Update the cxl_pci driver to validate CXL.mem operation precursors to HDM decoder operation like ready-polling, and legacy CXL 1.1 DVSEC based CXL.mem configuration. - Add basic lockdep coverage for usage of device_lock() on CXL subsystem objects similar to what exists for LIBNVDIMM. Include a compile-time switch for which subsystem to validate at run-time. - Update cxl_test to emulate a one level switch topology. - Document a "Theory of Operation" for the subsystem. - Add 'numa_node' and 'serial' attributes to cxl_memdev sysfs - Include miscellaneous fixes for spec / QEMU CXL emulation compatibility and static analysis reports" * tag 'cxl-for-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl: (48 commits) cxl/core/port: Fix NULL but dereferenced coccicheck error cxl/port: Hold port reference until decoder release cxl/port: Fix endpoint refcount leak cxl/core: Fix cxl_device_lock() class detection cxl/core/port: Fix unregister_port() lock assertion cxl/regs: Fix size of CXL Capability Header Register cxl/core/port: Handle invalid decoders cxl/core/port: Fix / relax decoder target enumeration tools/testing/cxl: Add a physical_node link tools/testing/cxl: Enumerate mock decoders tools/testing/cxl: Mock one level of switches tools/testing/cxl: Fix root port to host bridge assignment tools/testing/cxl: Mock dvsec_ranges() cxl/core/port: Add endpoint decoders cxl/core: Move target_list out of base decoder attributes cxl/mem: Add the cxl_mem driver cxl/core/port: Add switch port enumeration cxl/memdev: Add numa_node attribute cxl/pci: Emit device serial number cxl/pci: Implement wait for media active ...
2022-03-24Merge branch 'akpm' (patches from Andrew)Linus Torvalds
Merge more updates from Andrew Morton: "Various misc subsystems, before getting into the post-linux-next material. 41 patches. Subsystems affected by this patch series: procfs, misc, core-kernel, lib, checkpatch, init, pipe, minix, fat, cgroups, kexec, kdump, taskstats, panic, kcov, resource, and ubsan" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: (41 commits) Revert "ubsan, kcsan: Don't combine sanitizer with kcov on clang" kernel/resource: fix kfree() of bootmem memory again kcov: properly handle subsequent mmap calls kcov: split ioctl handling into locked and unlocked parts panic: move panic_print before kmsg dumpers panic: add option to dump all CPUs backtraces in panic_print docs: sysctl/kernel: add missing bit to panic_print taskstats: remove unneeded dead assignment kasan: no need to unset panic_on_warn in end_report() ubsan: no need to unset panic_on_warn in ubsan_epilogue() panic: unset panic_on_warn inside panic() docs: kdump: add scp example to write out the dump file docs: kdump: update description about sysfs file system support arm64: mm: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef x86/setup: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef riscv: mm: init: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef kexec: make crashk_res, crashk_low_res and crash_notes symbols always visible cgroup: use irqsave in cgroup_rstat_flush_locked(). fat: use pointer to simple type in put_user() minix: fix bug when opening a file with O_DIRECT ...
2022-03-24Merge tag 'net-next-5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one and Mark sent us a SPI driver interface conversion pull request. Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup" * tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits) llc: fix netdevice reference leaks in llc_ui_bind() drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool ice: don't allow to run ice_send_event_to_aux() in atomic ctx ice: fix 'scheduling while atomic' on aux critical err interrupt net/sched: fix incorrect vlan_push_eth dest field net: bridge: mst: Restrict info size queries to bridge ports net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init() drivers: net: xgene: Fix regression in CRC stripping net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT net: dsa: fix missing host-filtered multicast addresses net/mlx5e: Fix build warning, detected write beyond size of field iwlwifi: mvm: Don't fail if PPAG isn't supported selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" netdevice: add missing dm_private kdoc net: bridge: mst: prevent NULL deref in br_mst_info_size() selftests: forwarding: Use same VRF for port and VLAN upper ...