Age | Commit message (Collapse) | Author |
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
- Add cpufreq pressure feedback for the scheduler
- Rework misfit load-balancing wrt affinity restrictions
- Clean up and simplify the code around ::overutilized and
::overload access.
- Simplify sched_balance_newidle()
- Bump SCHEDSTAT_VERSION to 16 due to a cleanup of CPU_MAX_IDLE_TYPES
handling that changed the output.
- Rework & clean up <asm/vtime.h> interactions wrt arch_vtime_task_switch()
- Reorganize, clean up and unify most of the higher level
scheduler balancing function names around the sched_balance_*()
prefix
- Simplify the balancing flag code (sched_balance_running)
- Miscellaneous cleanups & fixes
* tag 'sched-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (50 commits)
sched/pelt: Remove shift of thermal clock
sched/cpufreq: Rename arch_update_thermal_pressure() => arch_update_hw_pressure()
thermal/cpufreq: Remove arch_update_thermal_pressure()
sched/cpufreq: Take cpufreq feedback into account
cpufreq: Add a cpufreq pressure feedback for the scheduler
sched/fair: Fix update of rd->sg_overutilized
sched/vtime: Do not include <asm/vtime.h> header
s390/irq,nmi: Include <asm/vtime.h> header directly
s390/vtime: Remove unused __ARCH_HAS_VTIME_TASK_SWITCH leftover
sched/vtime: Get rid of generic vtime_task_switch() implementation
sched/vtime: Remove confusing arch_vtime_task_switch() declaration
sched/balancing: Simplify the sg_status bitmask and use separate ->overloaded and ->overutilized flags
sched/fair: Rename set_rd_overutilized_status() to set_rd_overutilized()
sched/fair: Rename SG_OVERLOAD to SG_OVERLOADED
sched/fair: Rename {set|get}_rd_overload() to {set|get}_rd_overloaded()
sched/fair: Rename root_domain::overload to ::overloaded
sched/fair: Use helper functions to access root_domain::overload
sched/fair: Check root_domain::overload value before update
sched/fair: Combine EAS check with root_domain::overutilized access
sched/fair: Simplify the continue_balancing logic in sched_balance_newidle()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull perf events updates from Ingo Molnar:
- Combine perf and BPF for fast evalution of HW breakpoint
conditions
- Add LBR capture support outside of hardware events
- Trigger IO signals for watermark_wakeup
- Add RAPL support for Intel Arrow Lake and Lunar Lake
- Optimize frequency-throttling
- Miscellaneous cleanups & fixes
* tag 'perf-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (21 commits)
perf/bpf: Mark perf_event_set_bpf_handler() and perf_event_free_bpf_handler() as inline too
selftests/perf_events: Test FASYNC with watermark wakeups
perf/ring_buffer: Trigger IO signals for watermark_wakeup
perf: Move perf_event_fasync() to perf_event.h
perf/bpf: Change the !CONFIG_BPF_SYSCALL stubs to static inlines
selftest/bpf: Test a perf BPF program that suppresses side effects
perf/bpf: Allow a BPF program to suppress all sample side effects
perf/bpf: Remove unneeded uses_default_overflow_handler()
perf/bpf: Call BPF handler directly, not through overflow machinery
perf/bpf: Remove #ifdef CONFIG_BPF_SYSCALL from struct perf_event members
perf/bpf: Create bpf_overflow_handler() stub for !CONFIG_BPF_SYSCALL
perf/bpf: Reorder bpf_overflow_handler() ahead of __perf_event_overflow()
perf/x86/rapl: Add support for Intel Lunar Lake
perf/x86/rapl: Add support for Intel Arrow Lake
perf/core: Reduce PMU access to adjust sample freq
perf/core: Optimize perf_adjust_freq_unthr_context()
perf/x86/amd: Don't reject non-sampling events with configured LBR
perf/x86/amd: Support capturing LBR from software events
perf/x86/amd: Avoid taking branches before disabling LBR
perf/x86/amd: Ensure amd_pmu_core_disable_all() is always inlined
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull locking updates from Ingo Molnar:
- Over a dozen code generation micro-optimizations for the atomic
and spinlock code
- Add more __ro_after_init attributes
- Robustify the lockdevent_*() macros
* tag 'locking-core-2024-05-13' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
locking/pvqspinlock/x86: Use _Q_LOCKED_VAL in PV_UNLOCK_ASM macro
locking/qspinlock/x86: Micro-optimize virt_spin_lock()
locking/atomic/x86: Merge __arch{,_try}_cmpxchg64_emu_local() with __arch{,_try}_cmpxchg64_emu()
locking/atomic/x86: Introduce arch_try_cmpxchg64_local()
locking/pvqspinlock/x86: Remove redundant CMP after CMPXCHG in __raw_callee_save___pv_queued_spin_unlock()
locking/pvqspinlock: Use try_cmpxchg() in qspinlock_paravirt.h
locking/pvqspinlock: Use try_cmpxchg_acquire() in trylock_clear_pending()
locking/qspinlock: Use atomic_try_cmpxchg_relaxed() in xchg_tail()
locking/atomic/x86: Define arch_atomic_sub() family using arch_atomic_add() functions
locking/atomic/x86: Rewrite x86_32 arch_atomic64_{,fetch}_{and,or,xor}() functions
locking/atomic/x86: Introduce arch_atomic64_read_nonatomic() to x86_32
locking/atomic/x86: Introduce arch_atomic64_try_cmpxchg() to x86_32
locking/atomic/x86: Introduce arch_try_cmpxchg64() for !CONFIG_X86_CMPXCHG64
locking/atomic/x86: Modernize x86_32 arch_{,try_}_cmpxchg64{,_local}()
locking/atomic/x86: Correct the definition of __arch_try_cmpxchg128()
x86/tsc: Make __use_tsc __ro_after_init
x86/kvm: Make kvm_async_pf_enabled __ro_after_init
context_tracking: Make context_tracking_key __ro_after_init
jump_label,module: Don't alloc static_key_mod for __ro_after_init keys
locking/qspinlock: Always evaluate lockevent* non-event parameter once
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2024-05-13
We've added 119 non-merge commits during the last 14 day(s) which contain
a total of 134 files changed, 9462 insertions(+), 4742 deletions(-).
The main changes are:
1) Add BPF JIT support for 32-bit ARCv2 processors, from Shahab Vahedi.
2) Add BPF range computation improvements to the verifier in particular
around XOR and OR operators, refactoring of checks for range computation
and relaxing MUL range computation so that src_reg can also be an unknown
scalar, from Cupertino Miranda.
3) Add support to attach kprobe BPF programs through kprobe_multi link in
a session mode, meaning, a BPF program is attached to both function entry
and return, the entry program can decide if the return program gets
executed and the entry program can share u64 cookie value with return
program. Session mode is a common use-case for tetragon and bpftrace,
from Jiri Olsa.
4) Fix a potential overflow in libbpf's ring__consume_n() and improve libbpf
as well as BPF selftest's struct_ops handling, from Andrii Nakryiko.
5) Improvements to BPF selftests in context of BPF gcc backend,
from Jose E. Marchesi & David Faust.
6) Migrate remaining BPF selftest tests from test_sock_addr.c to prog_test-
-style in order to retire the old test, run it in BPF CI and additionally
expand test coverage, from Jordan Rife.
7) Big batch for BPF selftest refactoring in order to remove duplicate code
around common network helpers, from Geliang Tang.
8) Another batch of improvements to BPF selftests to retire obsolete
bpf_tcp_helpers.h as everything is available vmlinux.h,
from Martin KaFai Lau.
9) Fix BPF map tear-down to not walk the map twice on free when both timer
and wq is used, from Benjamin Tissoires.
10) Fix BPF verifier assumptions about socket->sk that it can be non-NULL,
from Alexei Starovoitov.
11) Change BTF build scripts to using --btf_features for pahole v1.26+,
from Alan Maguire.
12) Small improvements to BPF reusing struct_size() and krealloc_array(),
from Andy Shevchenko.
13) Fix s390 JIT to emit a barrier for BPF_FETCH instructions,
from Ilya Leoshkevich.
14) Extend TCP ->cong_control() callback in order to feed in ack and
flag parameters and allow write-access to tp->snd_cwnd_stamp
from BPF program, from Miao Xu.
15) Add support for internal-only per-CPU instructions to inline
bpf_get_smp_processor_id() helper call for arm64 and riscv64 BPF JITs,
from Puranjay Mohan.
16) Follow-up to remove the redundant ethtool.h from tooling infrastructure,
from Tushar Vyavahare.
17) Extend libbpf to support "module:<function>" syntax for tracing
programs, from Viktor Malik.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (119 commits)
bpf: make list_for_each_entry portable
bpf: ignore expected GCC warning in test_global_func10.c
bpf: disable strict aliasing in test_global_func9.c
selftests/bpf: Free strdup memory in xdp_hw_metadata
selftests/bpf: Fix a few tests for GCC related warnings.
bpf: avoid gcc overflow warning in test_xdp_vlan.c
tools: remove redundant ethtool.h from tooling infra
selftests/bpf: Expand ATTACH_REJECT tests
selftests/bpf: Expand getsockname and getpeername tests
sefltests/bpf: Expand sockaddr hook deny tests
selftests/bpf: Expand sockaddr program return value tests
selftests/bpf: Retire test_sock_addr.(c|sh)
selftests/bpf: Remove redundant sendmsg test cases
selftests/bpf: Migrate ATTACH_REJECT test cases
selftests/bpf: Migrate expected_attach_type tests
selftests/bpf: Migrate wildcard destination rewrite test
selftests/bpf: Migrate sendmsg6 v4 mapped address tests
selftests/bpf: Migrate sendmsg deny test cases
selftests/bpf: Migrate WILDCARD_IP test
selftests/bpf: Handle SYSCALL_EPERM and SYSCALL_ENOTSUPP test cases
...
====================
Link: https://lore.kernel.org/r/20240513134114.17575-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
MSIX irq allocation and free APIs are no longer
in use. Hence, remove the dead code.
Signed-off-by: Parav Pandit <parav@nvidia.com>
Reviewed-by: Dragos Tatulea <dtatulea@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Reviewed-by: Kalesh AP <kalesh-anakkur.purayil@broadcom.com>
Link: https://lore.kernel.org/r/20240512124306.740898-4-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch adds to mlx5 drivers support for 8 ports HCAs.
Starting with ConnectX-8 HCAs with 8 ports are possible.
As most driver parts aren't affected by such configuration most driver
code is unchanged.
Specially the only affected areas are:
- Lag
- Multiport E-Switch
- Single FDB E-Switch
All of the above are already factored in generic way, and LAG and VF LAG
are tested, so all that left is to change a #define and remove checks
which are no longer needed.
However, Multiport E-Switch is not tested yet, so it is left untouched.
This patch will allow to create hardware LAG/VF LAG when all 8 ports are
added to the same bond device.
for example, In order to activate the hardware lag a user can execute
the following:
ip link add bond0 type bond
ip link set bond0 type bond miimon 100 mode 2
ip link set eth2 master bond0
ip link set eth3 master bond0
ip link set eth4 master bond0
ip link set eth5 master bond0
ip link set eth6 master bond0
ip link set eth7 master bond0
ip link set eth8 master bond0
ip link set eth9 master bond0
Where eth2, eth3, eth4, eth5, eth6, eth7, eth8 and eth9 are the PFs of
the same HCA.
Signed-off-by: Shay Drory <shayd@nvidia.com>
Reviewed-by: Mark Bloch <mbloch@nvidia.com>
Signed-off-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240512124306.740898-2-tariqt@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
The origin ax25_dev_list implements its own single linked list,
which is complicated and error-prone. For example, when deleting
the node of ax25_dev_list in ax25_dev_device_down(), we have to
operate on the head node and other nodes separately.
This patch uses kernel universal linked list to replace original
ax25_dev_list, which make the operation of ax25_dev_list easier.
We should do "dev->ax25_ptr = ax25_dev;" and "dev->ax25_ptr = NULL;"
while holding the spinlock, otherwise the ax25_dev_device_up() and
ax25_dev_device_down() could race.
Suggested-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Duoming Zhou <duoming@zju.edu.cn>
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/85bba3af651ca0e1a519da8d0d715b949891171c.1715247018.git.duoming@zju.edu.cn
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Currently, user-space extracts data from the ring-buffer via splice,
which is handy for storage or network sharing. However, due to splice
limitations, it is imposible to do real-time analysis without a copy.
A solution for that problem is to let the user-space map the ring-buffer
directly.
The mapping is exposed via the per-CPU file trace_pipe_raw. The first
element of the mapping is the meta-page. It is followed by each
subbuffer constituting the ring-buffer, ordered by their unique page ID:
* Meta-page -- include/uapi/linux/trace_mmap.h for a description
* Subbuf ID 0
* Subbuf ID 1
...
It is therefore easy to translate a subbuf ID into an offset in the
mapping:
reader_id = meta->reader->id;
reader_offset = meta->meta_page_size + reader_id * meta->subbuf_size;
When new data is available, the mapper must call a newly introduced ioctl:
TRACE_MMAP_IOCTL_GET_READER. This will update the Meta-page reader ID to
point to the next reader containing unread data.
Mapping will prevent snapshot and buffer size modifications.
Link: https://lore.kernel.org/linux-trace-kernel/20240510140435.3550353-4-vdonnefort@google.com
CC: <linux-mm@kvack.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
In preparation for allowing the user-space to map a ring-buffer, add
a set of mapping functions:
ring_buffer_{map,unmap}()
And controls on the ring-buffer:
ring_buffer_map_get_reader() /* swap reader and head */
Mapping the ring-buffer also involves:
A unique ID for each subbuf of the ring-buffer, currently they are
only identified through their in-kernel VA.
A meta-page, where are stored ring-buffer statistics and a
description for the current reader
The linear mapping exposes the meta-page, and each subbuf of the
ring-buffer, ordered following their unique ID, assigned during the
first mapping.
Once mapped, no subbuf can get in or out of the ring-buffer: the buffer
size will remain unmodified and the splice enabling functions will in
reality simply memcpy the data instead of swapping subbufs.
Link: https://lore.kernel.org/linux-trace-kernel/20240510140435.3550353-3-vdonnefort@google.com
CC: <linux-mm@kvack.org>
Signed-off-by: Vincent Donnefort <vdonnefort@google.com>
Acked-by: David Hildenbrand <david@redhat.com>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
|
|
Fix to take the i_rwsem (through the netfs locking wrappers) before taking
cinode->lock_sem.
Fixes: 3ee1a1fc3981 ("cifs: Cut over to using netfslib")
Reported-by: Enzo Matsumiya <ematsumiya@suse.de>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Steve French <stfrench@microsoft.com>
|
|
TX queue stop and wake are counted by some drivers.
Support reporting these via netdev-genl queue stats.
Signed-off-by: Daniel Jurgens <danielj@nvidia.com>
Reviewed-by: Jiri Pirko <jiri@nvidia.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
Link: https://lore.kernel.org/r/20240510201927.1821109-2-danielj@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Xiumei reports the following splat when netlabel and TCP socket are used:
=============================
WARNING: suspicious RCU usage
6.9.0-rc2+ #637 Not tainted
-----------------------------
net/ipv4/cipso_ipv4.c:1880 suspicious rcu_dereference_protected() usage!
other info that might help us debug this:
rcu_scheduler_active = 2, debug_locks = 1
1 lock held by ncat/23333:
#0: ffffffff906030c0 (rcu_read_lock){....}-{1:2}, at: netlbl_sock_setattr+0x25/0x1b0
stack backtrace:
CPU: 11 PID: 23333 Comm: ncat Kdump: loaded Not tainted 6.9.0-rc2+ #637
Hardware name: Supermicro SYS-6027R-72RF/X9DRH-7TF/7F/iTF/iF, BIOS 3.0 07/26/2013
Call Trace:
<TASK>
dump_stack_lvl+0xa9/0xc0
lockdep_rcu_suspicious+0x117/0x190
cipso_v4_sock_setattr+0x1ab/0x1b0
netlbl_sock_setattr+0x13e/0x1b0
selinux_netlbl_socket_post_create+0x3f/0x80
selinux_socket_post_create+0x1a0/0x460
security_socket_post_create+0x42/0x60
__sock_create+0x342/0x3a0
__sys_socket_create.part.22+0x42/0x70
__sys_socket+0x37/0xb0
__x64_sys_socket+0x16/0x20
do_syscall_64+0x96/0x180
? do_user_addr_fault+0x68d/0xa30
? exc_page_fault+0x171/0x280
? asm_exc_page_fault+0x22/0x30
entry_SYSCALL_64_after_hwframe+0x71/0x79
RIP: 0033:0x7fbc0ca3fc1b
Code: 73 01 c3 48 8b 0d 05 f2 1b 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 90 f3 0f 1e fa b8 29 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d d5 f1 1b 00 f7 d8 64 89 01 48
RSP: 002b:00007fff18635208 EFLAGS: 00000246 ORIG_RAX: 0000000000000029
RAX: ffffffffffffffda RBX: 0000000000000001 RCX: 00007fbc0ca3fc1b
RDX: 0000000000000006 RSI: 0000000000000001 RDI: 0000000000000002
RBP: 000055d24f80f8a0 R08: 0000000000000003 R09: 0000000000000001
R10: 0000000000020000 R11: 0000000000000246 R12: 000055d24f80f8a0
R13: 0000000000000000 R14: 000055d24f80fb88 R15: 0000000000000000
</TASK>
The current implementation of cipso_v4_sock_setattr() replaces IP options
under the assumption that the caller holds the socket lock; however, such
assumption is not true, nor needed, in selinux_socket_post_create() hook.
Let all callers of cipso_v4_sock_setattr() specify the "socket lock held"
condition, except selinux_socket_post_create() _ where such condition can
safely be set as true even without holding the socket lock.
Fixes: f6d8bd051c39 ("inet: add RCU protection to inet->opt")
Reported-by: Xiumei Mu <xmu@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Acked-by: Casey Schaufler <casey@schaufler-ca.com>
Acked-by: Paul Moore <paul@paul-moore.com>
Link: https://lore.kernel.org/r/f4260d000a3a55b9e8b6a3b4e3fffc7da9f82d41.1715359817.git.dcaratti@redhat.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
"API:
- Remove crypto stats interface
Algorithms:
- Add faster AES-XTS on modern x86_64 CPUs
- Forbid curves with order less than 224 bits in ecc (FIPS 186-5)
- Add ECDSA NIST P521
Drivers:
- Expose otp zone in atmel
- Add dh fallback for primes > 4K in qat
- Add interface for live migration in qat
- Use dma for aes requests in starfive
- Add full DMA support for stm32mpx in stm32
- Add Tegra Security Engine driver
Others:
- Introduce scope-based x509_certificate allocation"
* tag 'v6.10-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (123 commits)
crypto: atmel-sha204a - provide the otp content
crypto: atmel-sha204a - add reading from otp zone
crypto: atmel-i2c - rename read function
crypto: atmel-i2c - add missing arg description
crypto: iaa - Use kmemdup() instead of kzalloc() and memcpy()
crypto: sahara - use 'time_left' variable with wait_for_completion_timeout()
crypto: api - use 'time_left' variable with wait_for_completion_killable_timeout()
crypto: caam - i.MX8ULP donot have CAAM page0 access
crypto: caam - init-clk based on caam-page0-access
crypto: starfive - Use fallback for unaligned dma access
crypto: starfive - Do not free stack buffer
crypto: starfive - Skip unneeded fallback allocation
crypto: starfive - Skip dma setup for zeroed message
crypto: hisilicon/sec2 - fix for register offset
crypto: hisilicon/debugfs - mask the unnecessary info from the dump
crypto: qat - specify firmware files for 402xx
crypto: x86/aes-gcm - simplify GCM hash subkey derivation
crypto: x86/aes-gcm - delete unused GCM assembly code
crypto: x86/aes-xts - simplify loop in xts_crypt_slowpath()
hwrng: stm32 - repair clock handling
...
|
|
A way for an application to know if an MPTCP connection fell back to TCP
is to use getsockopt(MPTCP_INFO) and look for errors. The issue with
this technique is that the same errors -- EOPNOTSUPP (IPv4) and
ENOPROTOOPT (IPv6) -- are returned if there was a fallback, *or* if the
kernel doesn't support this socket option. The userspace then has to
look at the kernel version to understand what the errors mean.
It is not clean, and it doesn't take into account older kernels where
the socket option has been backported. A cleaner way would be to expose
this info to the TCP socket level. In case of MPTCP socket where no
fallback happened, the socket options for the TCP level will be handled
in MPTCP code, in mptcp_getsockopt_sol_tcp(). If not, that will be in
TCP code, in do_tcp_getsockopt(). So MPTCP simply has to set the value
1, while TCP has to set 0.
If the socket option is not supported, one of these two errors will be
reported:
- EOPNOTSUPP (95 - Operation not supported) for MPTCP sockets
- ENOPROTOOPT (92 - Protocol not available) for TCP sockets, e.g. on the
socket received after an 'accept()', when the client didn't request to
use MPTCP: this socket will be a TCP one, even if the listen socket
was an MPTCP one.
With this new option, the kernel can return a clear answer to both "Is
this kernel new enough to tell me the fallback status?" and "If it is
new enough, is it currently a TCP or MPTCP socket?" questions, while not
breaking the previous method.
Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Matthieu Baerts (NGI0) <matttbe@kernel.org>
Link: https://lore.kernel.org/r/20240509-upstream-net-next-20240509-mptcp-tcp_is_mptcp-v1-1-f846df999202@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
{inet,ipv6}_gro_receive functions perform flush checks (ttl, flags,
iph->id, ...) against all packets in a loop. These flush checks are used in
all merging UDP and TCP flows.
These checks need to be done only once and only against the found p skb,
since they only affect flush and not same_flow.
This patch leverages correct network header offsets from the cb for both
outer and inner network headers - allowing these checks to be done only
once, in tcp_gro_receive and udp_gro_receive_segment. As a result,
NAPI_GRO_CB(p)->flush is not used at all. In addition, flush_id checks are
more declarative and contained in inet_gro_flush, thus removing the need
for flush_id in napi_gro_cb.
This results in less parsing code for non-loop flush tests for TCP and UDP
flows.
To make sure results are not within noise range - I've made netfilter drop
all TCP packets, and measured CPU performance in GRO (in this case GRO is
responsible for about 50% of the CPU utilization).
perf top while replaying 64 parallel IP/TCP streams merging in GRO:
(gro_receive_network_flush is compiled inline to tcp_gro_receive)
net-next:
6.94% [kernel] [k] inet_gro_receive
3.02% [kernel] [k] tcp_gro_receive
patch applied:
4.27% [kernel] [k] tcp_gro_receive
4.22% [kernel] [k] inet_gro_receive
perf top while replaying 64 parallel IP/IP/TCP streams merging in GRO (same
results for any encapsulation, in this case inet_gro_receive is top
offender in net-next)
net-next:
10.09% [kernel] [k] inet_gro_receive
2.08% [kernel] [k] tcp_gro_receive
patch applied:
6.97% [kernel] [k] inet_gro_receive
3.68% [kernel] [k] tcp_gro_receive
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240509190819.2985-3-richardbgobert@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
This patch converts references of skb->network_header to napi_gro_cb's
network_offset and inner_network_offset.
Signed-off-by: Richard Gobert <richardbgobert@gmail.com>
Reviewed-by: Willem de Bruijn <willemb@google.com>
Link: https://lore.kernel.org/r/20240509190819.2985-2-richardbgobert@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull hardening updates from Kees Cook:
"The bulk of the changes here are related to refactoring and expanding
the KUnit tests for string helper and fortify behavior.
Some trivial strncpy replacements in fs/ were carried in my tree. Also
some fixes to SCSI string handling were carried in my tree since the
helper for those was introduce here. Beyond that, just little fixes
all around: objtool getting confused about LKDTM+KCFI, preparing for
future refactors (constification of sysctl tables, additional
__counted_by annotations), a Clang UBSAN+i386 crash fix, and adding
more options in the hardening.config Kconfig fragment.
Summary:
- selftests: Add str*cmp tests (Ivan Orlov)
- __counted_by: provide UAPI for _le/_be variants (Erick Archer)
- Various strncpy deprecation refactors (Justin Stitt)
- stackleak: Use a copy of soon-to-be-const sysctl table (Thomas
Weißschuh)
- UBSAN: Work around i386 -regparm=3 bug with Clang prior to
version 19
- Provide helper to deal with non-NUL-terminated string copying
- SCSI: Fix older string copying bugs (with new helper)
- selftests: Consolidate string helper behavioral tests
- selftests: add memcpy() fortify tests
- string: Add additional __realloc_size() annotations for "dup"
helpers
- LKDTM: Fix KCFI+rodata+objtool confusion
- hardening.config: Enable KCFI"
* tag 'hardening-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux: (29 commits)
uapi: stddef.h: Provide UAPI macros for __counted_by_{le, be}
stackleak: Use a copy of the ctl_table argument
string: Add additional __realloc_size() annotations for "dup" helpers
kunit/fortify: Fix replaced failure path to unbreak __alloc_size
hardening: Enable KCFI and some other options
lkdtm: Disable CFI checking for perms functions
kunit/fortify: Add memcpy() tests
kunit/fortify: Do not spam logs with fortify WARNs
kunit/fortify: Rename tests to use recommended conventions
init: replace deprecated strncpy with strscpy_pad
kunit/fortify: Fix mismatched kvalloc()/vfree() usage
scsi: qla2xxx: Avoid possible run-time warning with long model_num
scsi: mpi3mr: Avoid possible run-time warning with long manufacturer strings
scsi: mptfusion: Avoid possible run-time warning with long manufacturer strings
fs: ecryptfs: replace deprecated strncpy with strscpy
hfsplus: refactor copy_name to not use strncpy
reiserfs: replace deprecated strncpy with scnprintf
virt: acrn: replace deprecated strncpy with strscpy
ubsan: Avoid i386 UBSAN handler crashes with Clang
ubsan: Remove 1-element array usage in debug reporting
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux
Pull execve updates from Kees Cook:
- Provide knob to change (previously fixed) coredump NOTES size
(Allen Pais)
- Add sched_prepare_exec tracepoint (Marco Elver)
- Make /proc/$pid/auxv work under binfmt_elf_fdpic (Max Filippov)
- Convert ARCH_HAVE_EXTRA_ELF_NOTES to proper Kconfig (Vignesh
Balasubramanian)
- Leave a gap between .bss and brk
* tag 'execve-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/kees/linux:
fs/coredump: Enable dynamic configuration of max file note size
binfmt_elf_fdpic: fix /proc/<pid>/auxv
binfmt_elf: Leave a gap between .bss and brk
Replace macro "ARCH_HAVE_EXTRA_ELF_NOTES" with kconfig
tracing: Add sched_prepare_exec tracepoint
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next
Pablo Neira Ayuso says:
====================
Netfilter updates for net-next
The following patchset contains Netfilter updates for net-next:
Patch #1 skips transaction if object type provides no .update interface.
Patch #2 skips NETDEV_CHANGENAME which is unused.
Patch #3 enables conntrack to handle Multicast Router Advertisements and
Multicast Router Solicitations from the Multicast Router Discovery
protocol (RFC4286) as untracked opposed to invalid packets.
From Linus Luessing.
Patch #4 updates DCCP conntracker to mark invalid as invalid, instead of
dropping them, from Jason Xing.
Patch #5 uses NF_DROP instead of -NF_DROP since NF_DROP is 0,
also from Jason.
Patch #6 removes reference in netfilter's sysctl documentation on pickup
entries which were already removed by Florian Westphal.
Patch #7 removes check for IPS_OFFLOAD flag to disable early drop which
allows to evict entries from the conntrack table,
also from Florian.
Patches #8 to #16 updates nf_tables pipapo set backend to allocate
the datastructure copy on-demand from preparation phase,
to better deal with OOM situations where .commit step is too late
to fail. Series from Florian Westphal.
Patch #17 adds a selftest with packetdrill to cover conntrack TCP state
transitions, also from Florian.
Patch #18 use GFP_KERNEL to clone elements from control plane to avoid
quick atomic reserves exhaustion with large sets, reporter refers
to million entries magnitude.
* tag 'nf-next-24-05-12' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next:
netfilter: nf_tables: allow clone callbacks to sleep
selftests: netfilter: add packetdrill based conntrack tests
netfilter: nft_set_pipapo: remove dirty flag
netfilter: nft_set_pipapo: move cloning of match info to insert/removal path
netfilter: nft_set_pipapo: prepare pipapo_get helper for on-demand clone
netfilter: nft_set_pipapo: merge deactivate helper into caller
netfilter: nft_set_pipapo: prepare walk function for on-demand clone
netfilter: nft_set_pipapo: prepare destroy function for on-demand clone
netfilter: nft_set_pipapo: make pipapo_clone helper return NULL
netfilter: nft_set_pipapo: move prove_locking helper around
netfilter: conntrack: remove flowtable early-drop test
netfilter: conntrack: documentation: remove reference to non-existent sysctl
netfilter: use NF_DROP instead of -NF_DROP
netfilter: conntrack: dccp: try not to drop skb in conntrack
netfilter: conntrack: fix ct-state for ICMPv6 Multicast Router Discovery
netfilter: nf_tables: remove NETDEV_CHANGENAME from netdev chain event handler
netfilter: nf_tables: skip transaction if update object is not implemented
====================
Link: https://lore.kernel.org/r/20240512161436.168973-1-pablo@netfilter.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
|
|
Pull block updates from Jens Axboe:
- Add a partscan attribute in sysfs, fixing an issue with systemd
relying on an internal interface that went away.
- Attempt #2 at making long running discards interruptible. The
previous attempt went into 6.9, but we ended up mostly reverting it
as it had issues.
- Remove old ida_simple API in bcache
- Support for zoned write plugging, greatly improving the performance
on zoned devices.
- Remove the old throttle low interface, which has been experimental
since 2017 and never made it beyond that and isn't being used.
- Remove page->index debugging checks in brd, as it hasn't caught
anything and prepares us for removing in struct page.
- MD pull request from Song
- Don't schedule block workers on isolated CPUs
* tag 'for-6.10/block-20240511' of git://git.kernel.dk/linux: (84 commits)
blk-throttle: delay initialization until configuration
blk-throttle: remove CONFIG_BLK_DEV_THROTTLING_LOW
block: fix that util can be greater than 100%
block: support to account io_ticks precisely
block: add plug while submitting IO
bcache: fix variable length array abuse in btree_iter
bcache: Remove usage of the deprecated ida_simple_xx() API
md: Revert "md: Fix overflow in is_mddev_idle"
blk-lib: check for kill signal in ioctl BLKDISCARD
block: add a bio_await_chain helper
block: add a blk_alloc_discard_bio helper
block: add a bio_chain_and_submit helper
block: move discard checks into the ioctl handler
block: remove the discard_granularity check in __blkdev_issue_discard
block/ioctl: prefer different overflow check
null_blk: Fix the WARNING: modpost: missing MODULE_DESCRIPTION()
block: fix and simplify blkdevparts= cmdline parsing
block: refine the EOF check in blkdev_iomap_begin
block: add a partscan sysfs attribute for disks
block: add a disk_has_partscan helper
...
|
|
Pull io_uring updates from Jens Axboe:
- Greatly improve send zerocopy performance, by enabling coalescing of
sent buffers.
MSG_ZEROCOPY already does this with send(2) and sendmsg(2), but the
io_uring side did not. In local testing, the crossover point for send
zerocopy being faster is now around 3000 byte packets, and it
performs better than the sync syscall variants as well.
This feature relies on a shared branch with net-next, which was
pulled into both branches.
- Unification of how async preparation is done across opcodes.
Previously, opcodes that required extra memory for async retry would
allocate that as needed, using on-stack state until that was the
case. If async retry was needed, the on-stack state was adjusted
appropriately for a retry and then copied to the allocated memory.
This led to some fragile and ugly code, particularly for read/write
handling, and made storage retries more difficult than they needed to
be. Allocate the memory upfront, as it's cheap from our pools, and
use that state consistently both initially and also from the retry
side.
- Move away from using remap_pfn_range() for mapping the rings.
This is really not the right interface to use and can cause lifetime
issues or leaks. Additionally, it means the ring sq/cq arrays need to
be physically contigious, which can cause problems in production with
larger rings when services are restarted, as memory can be very
fragmented at that point.
Move to using vm_insert_page(s) for the ring sq/cq arrays, and apply
the same treatment to mapped ring provided buffers. This also helps
unify the code we have dealing with allocating and mapping memory.
Hard to see in the diffstat as we're adding a few features as well,
but this kills about ~400 lines of code from the codebase as well.
- Add support for bundles for send/recv.
When used with provided buffers, bundles support sending or receiving
more than one buffer at the time, improving the efficiency by only
needing to call into the networking stack once for multiple sends or
receives.
- Tweaks for our accept operations, supporting both a DONTWAIT flag for
skipping poll arm and retry if we can, and a POLLFIRST flag that the
application can use to skip the initial accept attempt and rely
purely on poll for triggering the operation. Both of these have
identical flags on the receive side already.
- Make the task_work ctx locking unconditional.
We had various code paths here that would do a mix of lock/trylock
and set the task_work state to whether or not it was locked. All of
that goes away, we lock it unconditionally and get rid of the state
flag indicating whether it's locked or not.
The state struct still exists as an empty type, can go away in the
future.
- Add support for specifying NOP completion values, allowing it to be
used for error handling testing.
- Use set/test bit for io-wq worker flags. Not strictly needed, but
also doesn't hurt and helps silence a KCSAN warning.
- Cleanups for io-wq locking and work assignments, closing a tiny race
where cancelations would not be able to find the work item reliably.
- Misc fixes, cleanups, and improvements
* tag 'for-6.10/io_uring-20240511' of git://git.kernel.dk/linux: (97 commits)
io_uring: support to inject result for NOP
io_uring: fail NOP if non-zero op flags is passed in
io_uring/net: add IORING_ACCEPT_POLL_FIRST flag
io_uring/net: add IORING_ACCEPT_DONTWAIT flag
io_uring/filetable: don't unnecessarily clear/reset bitmap
io_uring/io-wq: Use set_bit() and test_bit() at worker->flags
io_uring/msg_ring: cleanup posting to IOPOLL vs !IOPOLL ring
io_uring: Require zeroed sqe->len on provided-buffers send
io_uring/notif: disable LAZY_WAKE for linked notifs
io_uring/net: fix sendzc lazy wake polling
io_uring/msg_ring: reuse ctx->submitter_task read using READ_ONCE instead of re-reading it
io_uring/rw: reinstate thread check for retries
io_uring/notif: implement notification stacking
io_uring/notif: simplify io_notif_flush()
net: add callback for setting a ubuf_info to skb
net: extend ubuf_info callback to ops structure
io_uring/net: support bundles for recv
io_uring/net: support bundles for send
io_uring/kbuf: add helpers for getting/peeking multiple buffers
io_uring/net: add provided buffer support for IORING_OP_SEND
...
|
|
Pull vfs rw iterator updates from Christian Brauner:
"The core fs signalfd, userfaultfd, and timerfd subsystems did still
use f_op->read() instead of f_op->read_iter(). Convert them over since
we should aim to get rid of f_op->read() at some point.
Aside from that io_uring and others want to mark files as FMODE_NOWAIT
so it can make use of per-IO nonblocking hints to enable more
efficient IO. Converting those users to f_op->read_iter() allows them
to be marked with FMODE_NOWAIT"
* tag 'vfs-6.10.rw' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs:
signalfd: convert to ->read_iter()
userfaultfd: convert to ->read_iter()
timerfd: convert to ->read_iter()
new helper: copy_to_iter_full()
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull netfs updates from Christian Brauner:
"This reworks the netfslib writeback implementation so that pages read
from the cache are written to the cache through ->writepages(),
thereby allowing the fscache page flag to be retired.
The reworking also:
- builds on top of the new writeback_iter() infrastructure
- makes it possible to use vectored write RPCs as discontiguous
streams of pages can be accommodated
- makes it easier to do simultaneous content crypto and stream
division
- provides support for retrying writes and re-dividing a stream
- replaces the ->launder_folio() op, so that ->writepages() is used
instead
- uses mempools to allocate the netfs_io_request and
netfs_io_subrequest structs to avoid allocation failure in the
writeback path
Some code that uses the fscache page flag is retained for
compatibility purposes with nfs and ceph. The code is switched to
using the synonymous private_2 label instead and marked with
deprecation comments.
The merge commit contains additional details on the new algorithm that
I've left out of here as it would probably be excessively detailed.
On top of the netfslib infrastructure this contains the work to
convert cifs over to netfslib"
* tag 'vfs-6.10.netfs' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (38 commits)
cifs: Enable large folio support
cifs: Remove some code that's no longer used, part 3
cifs: Remove some code that's no longer used, part 2
cifs: Remove some code that's no longer used, part 1
cifs: Cut over to using netfslib
cifs: Implement netfslib hooks
cifs: Make add_credits_and_wake_if() clear deducted credits
cifs: Add mempools for cifs_io_request and cifs_io_subrequest structs
cifs: Set zero_point in the copy_file_range() and remap_file_range()
cifs: Move cifs_loose_read_iter() and cifs_file_write_iter() to file.c
cifs: Replace the writedata replay bool with a netfs sreq flag
cifs: Make wait_mtu_credits take size_t args
cifs: Use more fields from netfs_io_subrequest
cifs: Replace cifs_writedata with a wrapper around netfs_io_subrequest
cifs: Replace cifs_readdata with a wrapper around netfs_io_subrequest
cifs: Use alternative invalidation to using launder_folio
netfs, afs: Use writeback retry to deal with alternate keys
netfs: Miscellaneous tidy ups
netfs: Remove the old writeback code
netfs: Cut over to using new writeback code
...
|
|
Merge x86-specific ACPI updates, an ACPI DPTF driver update adding new
platform support to it, and an ACPI APEI update:
- Add a num-cs device property to specify the number of chip selects
for Intel Braswell to the ACPI LPSS (Intel SoC) driver and remove a
nested CONFIG_PM #ifdef from it (Andy Shevchenko).
- Move three x86-specific ACPI files to the x86 directory (Andy
Shevchenko).
- Mark SMO8810 accel on Dell XPS 15 9550 as always present and add a
PNP_UART1_SKIP quirk for Lenovo Blade2 tablets (Hans de Goede).
- Move acpi_blacklisted() declaration to asm/acpi.h (Kuppuswamy
Sathyanarayanan).
- Add Lunar Lake support to the ACPI DPTF driver (Sumeet Pawnikar).
- Mark the einj_driver driver's remove callback as __exit because it
cannot get unbound via sysfs (Uwe Kleine-König).
* acpi-x86:
ACPI: Move acpi_blacklisted() declaration to asm/acpi.h
ACPI: x86: Add PNP_UART1_SKIP quirk for Lenovo Blade2 tablets
ACPI: x86: utils: Mark SMO8810 accel on Dell XPS 15 9550 as always present
ACPI: x86: Move LPSS to x86 folder
ACPI: x86: Move blacklist to x86 folder
ACPI: x86: Move acpi_cmos_rtc to x86 folder
ACPI: x86: Introduce a Makefile
ACPI: LPSS: Remove nested ifdeffery for CONFIG_PM
ACPI: LPSS: Advertise number of chip selects via property
* acpi-dptf:
ACPI: DPTF: Add Lunar Lake support
* acpi-apei:
ACPI: APEI: EINJ: mark remove callback as __exit
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs
Pull misc vfs updates from Christian Brauner:
"This contains the usual miscellaneous features, cleanups, and fixes
for vfs and individual fses.
Features:
- Free up FMODE_* bits. I've freed up bits 6, 7, 8, and 24. That
means we now have six free FMODE_* bits in total (but bit #6
already got used for FMODE_WRITE_RESTRICTED)
- Add FOP_HUGE_PAGES flag (follow-up to FMODE_* cleanup)
- Add fd_raw cleanup class so we can make use of automatic cleanup
provided by CLASS(fd_raw, f)(fd) for O_PATH fds as well
- Optimize seq_puts()
- Simplify __seq_puts()
- Add new anon_inode_getfile_fmode() api to allow specifying f_mode
instead of open-coding it in multiple places
- Annotate struct file_handle with __counted_by() and use
struct_size()
- Warn in get_file() whether f_count resurrection from zero is
attempted (epoll/drm discussion)
- Folio-sophize aio
- Export the subvolume id in statx() for both btrfs and bcachefs
- Relax linkat(AT_EMPTY_PATH) requirements
- Add F_DUPFD_QUERY fcntl() allowing to compare two file descriptors
for dup*() equality replacing kcmp()
Cleanups:
- Compile out swapfile inode checks when swap isn't enabled
- Use (1 << n) notation for FMODE_* bitshifts for clarity
- Remove redundant variable assignment in fs/direct-io
- Cleanup uses of strncpy in orangefs
- Speed up and cleanup writeback
- Move fsparam_string_empty() helper into header since it's currently
open-coded in multiple places
- Add kernel-doc comments to proc_create_net_data_write()
- Don't needlessly read dentry->d_flags twice
Fixes:
- Fix out-of-range warning in nilfs2
- Fix ecryptfs overflow due to wrong encryption packet size
calculation
- Fix overly long line in xfs file_operations (follow-up to FMODE_*
cleanup)
- Don't raise FOP_BUFFER_{R,W}ASYNC for directories in xfs (follow-up
to FMODE_* cleanup)
- Don't call xfs_file_open from xfs_dir_open (follow-up to FMODE_*
cleanup)
- Fix stable offset api to prevent endless loops
- Fix afs file server rotations
- Prevent xattr node from overflowing the eraseblock in jffs2
- Move fdinfo PTRACE_MODE_READ procfs check into the .permission()
operation instead of .open() operation since this caused userspace
regressions"
* tag 'vfs-6.10.misc' of git://git.kernel.org/pub/scm/linux/kernel/git/vfs/vfs: (39 commits)
afs: Fix fileserver rotation getting stuck
selftests: add F_DUPDFD_QUERY selftests
fcntl: add F_DUPFD_QUERY fcntl()
file: add fd_raw cleanup class
fs: WARN when f_count resurrection is attempted
seq_file: Simplify __seq_puts()
seq_file: Optimize seq_puts()
proc: Move fdinfo PTRACE_MODE_READ check into the inode .permission operation
fs: Create anon_inode_getfile_fmode()
xfs: don't call xfs_file_open from xfs_dir_open
xfs: drop fop_flags for directories
xfs: fix overly long line in the file_operations
shmem: Fix shmem_rename2()
libfs: Add simple_offset_rename() API
libfs: Fix simple_offset_rename_exchange()
jffs2: prevent xattr node from overflowing the eraseblock
vfs, swap: compile out IS_SWAPFILE() on swapless configs
vfs: relax linkat() AT_EMPTY_PATH - aka flink() - requirements
fs/direct-io: remove redundant assignment to variable retval
fs/dcache: Re-use value stored to dentry->d_flags instead of re-reading
...
|
|
Merge Enery Model update and a power management documentation update for
6.10:
- Make the Samsung exynos-asv driver update the Energy Model after
adjusting voltage on top of some preliminary changes of the OPP and
Enery Model generic code (Lukasz Luba).
- Remove a reference to a function that has been dropped from the power
management documentation (Bjorn Helgaas).
* pm-em:
soc: samsung: exynos-asv: Update Energy Model after adjusting voltage
PM: EM: Add em_dev_update_chip_binning()
PM: EM: Refactor em_adjust_new_capacity()
OPP: OF: Export dev_opp_pm_calc_power() for usage from EM
* pm-docs:
Documentation: PM: Update platform_pci_wakeup_init() reference
|
|
Merge cpuidle updates, changes related to system sleep and power capping
updates for 6.10:
- Fix kerneldoc description of ladder_do_selection() (Jeff Johnson).
- Convert the cpuidle kirkwood driver to platform remove callback
returning void (Yangtao Li).
- Replace deprecated strncpy() with strscpy() in the hibernation core
code (Justin Stitt).
- Use %ps to simplify debug output in the core system-wide suspend and
resume code (Len Brown).
- Remove unnecessary else from device_init_wakeup() and make
device_wakeup_disable() return void (Dhruva Gole).
- Enable PMU support in the Intel TPMI RAPL driver (Zhang Rui).
- Add support for ArrowLake-H platform to the Intel RAPL driver (Zhang
Rui).
- Avoid explicit cpumask allocation on stack in DTPM (Dawei Li).
* pm-cpuidle:
cpuidle: ladder: fix ladder_do_selection() kernel-doc
cpuidle: kirkwood: Convert to platform remove callback returning void
* pm-sleep:
PM: hibernate: replace deprecated strncpy() with strscpy()
PM: sleep: Take advantage of %ps to simplify debug output
PM: wakeup: Remove unnecessary else from device_init_wakeup()
PM: wakeup: make device_wakeup_disable() return void
* pm-powercap:
powercap: intel_rapl_tpmi: Enable PMU support
powercap: intel_rapl: Introduce APIs for PMU support
powercap: intel_rapl: Sort header files
powercap: intel_rapl: Add support for ArrowLake-H platform
powercap: DTPM: Avoid explicit cpumask allocation on stack
|
|
Merge cpufreq updates for 6.10:
- Rework the handling of disabled turbo in the intel_pstate driver and
make it update the maximum CPU frequency consistently regardless of
the reason on top of a number of cleanups (Rafael Wysocki).
- Add missing checks for NULL .exit() cpufreq driver callback to the
cpufreq core (Viresh Kumar).
- Prevent pulicy->max from going above the frequency QoS maximum value
when cpufreq_frequency_table_verify() is used (Xuewen Yan).
- Prevent a negative CPU number or frequency value from being printed
if they are really large (Joshua Yeong).
- Update MAINTAINERS entry for amd-pstate to add two new submaintainers
and a designated reviewer (Huang Rui).
- Clean up the amd-pstate driver and update its documentation (Gautham
Shenoy).
- Fix the highest frequency issue in the amd-pstate driver which limits
performance (Perry Yuan).
- Enable CPPC v2 for certain processors in the family 17H, as requested
by TR40 processor users who expect improved performance and lower
system temperature (Perry Yuan).
- Change latency and delay values to be read from platform firmware
firstly for more accurate timing (Perry Yuan).
- A new quirk is introduced for supporting amd-pstate on legacy
processors which either lack CPPC capability, or only only have CPPC
v2 capability (Perry Yuan).
- Sun50i: Add support for opp_supported_hw, H616 platform and general
cleanups (Andre Przywara, Martin Botka, Brandon Cheo Fusi, Dan
Carpenter, Viresh Kumar).
- CPPC: Fix possible null pointer dereference (Aleksandr Mishin).
- Eliminate uses of of_node_put() (Javier Carrasco, and Shivani Gupta).
- brcmstb-avs: ISO C90 forbids mixed declarations (Portia Stephens).
- mediatek: Add support for MT7988A (Sam Shih).
- cpufreq-qcom-hw: Add SM4450 compatibles in DT bindings (Tengfei Fan).
- Fix struct cpudata::epp_cached kernel-doc in the intel_pstate cpufreq
driver (Jeff Johnson).
* pm-cpufreq: (46 commits)
cpufreq: amd-pstate: fix the highest frequency issue which limits performance
cpufreq: intel_pstate: fix struct cpudata::epp_cached kernel-doc
cpufreq: Fix up printing large CPU numbers and frequency values
MAINTAINERS: cpufreq: amd-pstate: Add co-maintainers and reviewer
cpufreq: amd-pstate: remove unused variable lowest_nonlinear_freq
cpufreq: amd-pstate: fix code format problems
cpufreq: amd-pstate: Add quirk for the pstate CPPC capabilities missing
cppc_acpi: print error message if CPPC is unsupported
cpufreq: amd-pstate: get transition delay and latency value from ACPI tables
cpufreq: amd-pstate: Bail out if min/max/nominal_freq is 0
cpufreq: amd-pstate: Remove amd_get_{min,max,nominal,lowest_nonlinear}_freq()
cpufreq: amd-pstate: Unify computation of {max,min,nominal,lowest_nonlinear}_freq
cpufreq: amd-pstate: Document the units for freq variables in amd_cpudata
cpufreq: amd-pstate: Document *_limit_* fields in struct amd_cpudata
dt-bindings: cpufreq: cpufreq-qcom-hw: Add SM4450 compatibles
cpufreq: sun50i: fix error returns in dt_has_supported_hw()
cpufreq: brcmstb-avs-cpufreq: ISO C90 forbids mixed declarations
cpufreq: dt-platdev: eliminate uses of of_node_put()
cpufreq: dt: eliminate uses of of_node_put()
cpufreq: ti: Implement scope-based cleanup in ti_cpufreq_match_node()
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull TPM updates from Jarkko Sakkinen:
"These are the changes for the TPM driver with a single major new
feature: TPM bus encryption and integrity protection. The key pair on
TPM side is generated from so called null random seed per power on of
the machine [1]. This supports the TPM encryption of the hard drive by
adding layer of protection against bus interposer attacks.
Other than that, a few minor fixes and documentation for tpm_tis to
clarify basics of TPM localities for future patch review discussions
(will be extended and refined over times, just a seed)"
Link: https://lore.kernel.org/linux-integrity/20240429202811.13643-1-James.Bottomley@HansenPartnership.com/ [1]
* tag 'tpmdd-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd: (28 commits)
Documentation: tpm: Add TPM security docs toctree entry
tpm: disable the TPM if NULL name changes
Documentation: add tpm-security.rst
tpm: add the null key name as a sysfs export
KEYS: trusted: Add session encryption protection to the seal/unseal path
tpm: add session encryption protection to tpm2_get_random()
tpm: add hmac checks to tpm2_pcr_extend()
tpm: Add the rest of the session HMAC API
tpm: Add HMAC session name/handle append
tpm: Add HMAC session start and end functions
tpm: Add TCG mandated Key Derivation Functions (KDFs)
tpm: Add NULL primary creation
tpm: export the context save and load commands
tpm: add buffer function to point to returned parameters
crypto: lib - implement library version of AES in CFB mode
KEYS: trusted: tpm2: Use struct tpm_buf for sized buffers
tpm: Add tpm_buf_read_{u8,u16,u32}
tpm: TPM2B formatted buffers
tpm: Store the length of the tpm_buf data separately.
tpm: Update struct tpm_buf documentation comments
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd
Pull trusted keys updates from Jarkko Sakkinen:
"This contains a new key type for the Data Co-Processor (DCP), which is
an IP core built into many NXP SoCs such as i.mx6ull"
* tag 'keys-trusted-next-6.10-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/jarkko/linux-tpmdd:
docs: trusted-encrypted: add DCP as new trust source
docs: document DCP-backed trusted keys kernel params
MAINTAINERS: add entry for DCP-based trusted keys
KEYS: trusted: Introduce NXP DCP-backed trusted keys
KEYS: trusted: improve scalability of trust source config
crypto: mxs-dcp: Add support for hardware-bound keys
|
|
Make ACPI resource management quirks, a documentation update related to
the ACPI handling of device properties and ACPI NUMA handling changes
for 6.10:
- Add ACPI IRQ override quirks for Asus Vivobook Pro N6506MV, TongFang
GXxHRXx and GMxHGxx, and XMG APEX 17 M23 (Guenter Schafranek, Tamim
Khan, Christoffer Sandberg).
- Add reference to UEFI DSD Guide to the documentation related to the
ACPI handling of device properties (Sakari Ailus).
- Fix SRAT lookup of CFMWS ranges with numa_fill_memblks(), remove
lefover architecture-dependent code from the ACPI NUMA handling code
and simplify it on top of that (Robert Richter).
* acpi-resource:
ACPI: resource: Skip IRQ override on Asus Vivobook Pro N6506MV
ACPI: resource: Do IRQ override on TongFang GXxHRXx and GMxHGxx
ACPI: resource: Do IRQ override on GMxBGxx (XMG APEX 17 M23)
* acpi-property:
ACPI: property: Add reference to UEFI DSD Guide
* acpi-numa:
ACPI/NUMA: Squash acpi_numa_memory_affinity_init() into acpi_parse_memory_affinity()
ACPI/NUMA: Squash acpi_numa_slit_init() into acpi_parse_slit()
ACPI/NUMA: Remove architecture dependent remainings
x86/numa: Fix SRAT lookup of CFMWS ranges with numa_fill_memblks()
|
|
Merge ACPI device enumeration changes and ACPI data-only tables support
updates for 6.10:
- Rearrange fields in several structures to effectively eliminate
computations from container_of() in some cases (Andy Shevchenko).
- Do some assorted cleanups of the ACPI device enumeration code (Andy
Shevchenko).
- Make the ACPI device enumeration code skip devices with _STA values
clearly identified by the specification as invalid (Rafael Wysocki).
- Rework the handling of the NHLT table to simplify and clarify it and
drop some obsolete pieces (Cezary Rojewski).
* acpi-scan:
ACPI: scan: Avoid enumerating devices with clearly invalid _STA values
ACPI: scan: Introduce typedef:s for struct acpi_hotplug_context members
ACPI: scan: Use standard error checking pattern
ACPI: scan: Move misleading comment to acpi_dma_configure_id()
ACPI: scan: Use list_first_entry_or_null() in acpi_device_hid()
ACPI: bus: Don't use "proxy" headers
ACPI: bus: Make container_of() no-op where it makes sense
* acpi-tables:
ACPI: NHLT: Streamline struct naming
ACPI: NHLT: Drop redundant types
ACPI: NHLT: Introduce API for the table
ACPI: NHLT: Reintroduce types the table consists of
|
|
Merge changes related to _OSC handling and updates eliminating the owner
field from struct acpi_driver:
- Make the kernel indicate support for several ACPI features that are
in fact supported to the platform firmware through _OSC and fix
the Generic Initiator Affinity _OSC bit (Armin Wolf).
- Make the ACPI core set the owner value for ACPI drivers, drop the
owner setting from a number of drivers and eliminate the owner
field from struct acpi_driver (Krzysztof Kozlowski).
* acpi-bus: (24 commits)
ACPI: drop redundant owner from acpi_driver
virt: vmgenid: drop owner assignment
ptp: vmw: drop owner assignment
platform/x86/wireless-hotkey: drop owner assignment
platform/x86/toshiba_haps: drop owner assignment
platform/x86/toshiba_bluetooth: drop owner assignment
platform/x86/toshiba_acpi: drop owner assignment
platform/x86/sony-laptop: drop owner assignment
platform/x86/lg-laptop: drop owner assignment
platform/x86/intel/smartconnect: drop owner assignment
platform/x86/intel/rst: drop owner assignment
platform/x86/eeepc: drop owner assignment
platform/x86/dell: drop owner assignment
platform: classmate-laptop: drop owner assignment
platform: asus-laptop: drop owner assignment
platform/chrome: wilco_ec: drop owner assignment
net: fjes: drop owner assignment
Input: atlas - drop owner assignment
ACPI: store owner from modules with acpi_bus_register_driver()
ACPI: bus: Indicate support for IRQ ResourceSource thru _OSC
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull kcsan update from Paul McKenney:
"Introduce __data_racy type qualifier
This adds a __data_racy type qualifier that enables kernel developers
to inform KCSAN that a given variable is a shared variable without
needing to mark each and every access.
This allows pre-KCSAN code to be correctly (if approximately)
instrumented withh very little effort, and also provides people
reading the code a clear indication that the variable is in fact
shared.
In addition, it permits incremental transition to per-access KCSAN
marking, so that (for example) a given subsystem can be transitioned
one variable at a time, while avoiding large numbers of KCSAN warnings
during this transition"
* tag 'kcsan.2024.05.10a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
kcsan, compiler_types: Introduce __data_racy type qualifier
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu
Pull cmpxchg updates from Paul McKenney:
"Provide one-byte and two-byte cmpxchg() support on sparc32, parisc,
and csky
This provides native one-byte and two-byte cmpxchg() support for
sparc32 and parisc, courtesy of Al Viro. This support is provided by
the same hashed-array-of-locks technique used for the other atomic
operations provided for these two platforms.
There is also emulated one-byte cmpxchg() support for csky using a new
cmpxchg_emu_u8() function that uses a four-byte cmpxchg() to emulate
the one-byte variant.
Similar patches for emulation of one-byte cmpxchg() for arc, sh, and
xtensa have not yet received maintainer acks, so they are slated for
the v6.11 merge window"
* tag 'cmpxchg.2024.05.11a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu:
csky: Emulate one-byte cmpxchg
lib: Add one-byte emulation function
parisc: add u16 support to cmpxchg()
parisc: add missing export of __cmpxchg_u8()
parisc: unify implementations of __cmpxchg_u{8,32,64}
parisc: __cmpxchg_u32(): lift conversion into the callers
sparc32: add __cmpxchg_u{8,16}() and teach __cmpxchg() to handle those sizes
sparc32: unify __cmpxchg_u{32,64}
sparc32: make the first argument of __cmpxchg_u64() volatile u64 *
sparc32: make __cmpxchg_u32() return u32
|
|
Pull RCU updates from Uladzislau Rezki:
- Fix a lockdep complain for lazy-preemptible kernel, remove redundant
BH disable for TINY_RCU, remove redundant READ_ONCE() in tree.c, fix
false positives KCSAN splat and fix buffer overflow in the
print_cpu_stall_info().
- Misc updates related to bpf, tracing and update the MAINTAINERS file.
- An improvement of a normal synchronize_rcu() call in terms of
latency. It maintains a separate track for sync. users only. This
approach bypasses per-cpu nocb-lists thus sync-users do not depend on
nocb-list length and how fast regular callbacks are processed.
- RCU tasks: switch tasks RCU grace periods to sleep at TASK_IDLE
priority, fix some comments, add some diagnostic warning to the
exit_tasks_rcu_start() and fix a buffer overflow in the
show_rcu_tasks_trace_gp_kthread().
- RCU torture: Increase memory to guest OS, fix a Tasks Rude RCU
testing, some updates for TREE09, dump mode information to debug GP
kthread state, remove redundant READ_ONCE(), fix some comments about
RCU_TORTURE_PIPE_LEN and pipe_count, remove some redundant pointer
initialization, fix a hung splat task by when the rcutorture tests
start to exit, fix invalid context warning, add '--do-kvfree'
parameter to torture test and use slow register unregister callbacks
only for rcutype test.
* tag 'rcu.next.v6.10' of https://github.com/urezki/linux: (48 commits)
rcutorture: Use rcu_gp_slow_register/unregister() only for rcutype test
torture: Scale --do-kvfree test time
rcutorture: Fix invalid context warning when enable srcu barrier testing
rcutorture: Make stall-tasks directly exit when rcutorture tests end
rcutorture: Removing redundant function pointer initialization
rcutorture: Make rcutorture support print rcu-tasks gp state
rcutorture: Use the gp_kthread_dbg operation specified by cur_ops
rcutorture: Re-use value stored to ->rtort_pipe_count instead of re-reading
rcutorture: Fix rcu_torture_one_read() pipe_count overflow comment
rcutorture: Remove extraneous rcu_torture_pipe_update_one() READ_ONCE()
rcu: Allocate WQ with WQ_MEM_RECLAIM bit set
rcu: Support direct wake-up of synchronize_rcu() users
rcu: Add a trace event for synchronize_rcu_normal()
rcu: Reduce synchronize_rcu() latency
rcu: Fix buffer overflow in print_cpu_stall_info()
rcu: Mollify sparse with RCU guard
rcu-tasks: Fix show_rcu_tasks_trace_gp_kthread buffer overflow
rcu-tasks: Fix the comments for tasks_rcu_exit_srcu_stall_timer
rcu-tasks: Replace exit_tasks_rcu_start() initialization with WARN_ON_ONCE()
rcu: Remove redundant CONFIG_PROVE_RCU #if condition
...
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic
Pull alpha updates from Arnd Bergmann:
"I had investigated dropping support for alpha EV5 and earlier a while
ago after noticing that this is the only supported CPU family in the
kernel without native byte access and that Debian has already dropped
support for this generation last year [1] in order to improve
performance for the newer machines.
This topic came up again when Paul McKenney noticed that parts of the
RCU code already rely on byte access and do not work on alpha EV5
reliably, so we decided on using my series to avoid the problem
entirely.
Al Viro did another series for alpha to address all the known build
issues. I rebased his patches without any further changes and included
it as a baseline for my work here to avoid conflicts and allow
backporting the fixes to stable kernels for the now removed hardware
support as well"
[ I dearly loved alpha back in the days, but the lack of byte and word
operations was a horrible mistake and made everything worse -
including very much the crazy IO contortions that resulted from it.
It certainly wasn't the only mistake in the architecture, but it's the
first-order issue.
So while it's a bit sad to see the support for my first alpha go away,
if you want to run museum hardware, maybe you should use museum
kernels.. - Linus ]
* tag 'asm-generic-alpha' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/asm-generic:
alpha: drop pre-EV56 support
alpha: cabriolet: remove EV5 CPU support
alpha: remove LCA and APECS based machines
alpha: sable: remove early machine support
alpha: remove DECpc AXP150 (Jensen) support
alpha: trim the unused stuff from asm-offsets.c
alpha: jensen, t2 - make __EXTERN_INLINE same as for the rest
alpha: core_lca: take the unused functions out
alpha: missing includes
alpha: sys_sio: fix misspelled ifdefs
alpha: don't make functions public without a reason
alpha: add clone3() support
alpha: fix modversions for strcpy() et.al.
alpha: sort scr_mem{cpy,move}w() out
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc
Pull SoC driver updates from Arnd Bergmann:
"As usual, these are updates for drivers that are specific to certain
SoCs or firmware running on them.
Notable updates include
- The new STMicroelectronics STM32 "firewall" bus driver that is used
to provide a barrier between different parts of an SoC
- Lots of updates for the Qualcomm platform drivers, in particular
SCM, which gets a rewrite of its initialization code
- Firmware driver updates for Arm FF-A notification interrupts and
indirect messaging, SCMI firmware support for pin control and
vendor specific interfaces, and TEE firmware interface changes
across multiple TEE drivers
- A larger cleanup of the Mediatek CMDQ driver and some related bits
- Kconfig changes for riscv drivers to prepare for adding Kanaan k230
support
- Multiple minor updates for the TI sysc bus driver, memory
controllers, hisilicon hccs and more"
* tag 'soc-drivers-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (103 commits)
firmware: qcom: uefisecapp: Allow on sc8180x Primus and Flex 5G
soc: qcom: pmic_glink: Make client-lock non-sleeping
dt-bindings: soc: qcom,wcnss: fix bluetooth address example
soc/tegra: pmc: Add EQOS wake event for Tegra194 and Tegra234
bus: stm32_firewall: fix off by one in stm32_firewall_get_firewall()
bus: etzpc: introduce ETZPC firewall controller driver
firmware: arm_ffa: Avoid queuing work when running on the worker queue
bus: ti-sysc: Drop legacy idle quirk handling
bus: ti-sysc: Drop legacy quirk handling for smartreflex
bus: ti-sysc: Drop legacy quirk handling for uarts
bus: ti-sysc: Add a description and copyrights
bus: ti-sysc: Move check for no-reset-on-init
soc: hisilicon: kunpeng_hccs: replace MAILBOX dependency with PCC
soc: hisilicon: kunpeng_hccs: Add the check for obtaining complete port attribute
firmware: arm_ffa: Fix memory corruption in ffa_msg_send2()
bus: rifsc: introduce RIFSC firewall controller driver
of: property: fw_devlink: Add support for "access-controller"
soc: mediatek: mtk-socinfo: Correct the marketing name for MT8188GV
soc: mediatek: mtk-socinfo: Add entry for MT8395AV/ZA Genio 1200
soc: mediatek: mtk-mutex: Add support for MT8188 VPPSYS
...
|
|
Pull SoC devicetree updates from Arnd Bergmann:
"The updates this time are a bit smaller than most times, mainly
because it is not totally dominated by new Qualcomm hardware support.
Instead, we larger than average updates for Rockchips, NXP, Allwinner
and TI. The only two new SoCs this time are both from NXP and are
minor variants of already supported ones.
The updates for aspeed, amlogic and mediatek came a little late, so
I'm saving those for part 2 in a few days if everything turns out
fine.
New machines this time contain:
- two Broadcom SoC based wireless routers from Asus
- Five allwinner based consumer devices for gaming, set-top-box and
eboot reader applications
- Three older phones based on Qualcomm chips, plus the more recent
Sony Xperia 1 V
- 14 industrial and embedded boards based on NXP i.MX6, i.MX8,
layerscape and s32g3 SoCs
- six rockchips boards including another handheld game console and a
few single-board computers
On top of these, we have the usual cleanups for dtc warnings and
updates to add more features to already merged machines"
* tag 'soc-dt-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/soc/soc: (612 commits)
arm64: dts: marvell: espressobin-ultra: fix Ethernet Switch unit address
arm64: dts: marvell: turris-mox: drop unneeded flash address/size-cells
arm64: dts: marvell: eDPU: drop redundant address/size-cells
arm64: dts: qcom: pm6150: correct USB VBUS regulator compatible
arm64: dts: rockchip: add rk3588 pcie and php IOMMUs
arm64: dts: rockchip: enable onboard spi flash for rock-3a
arm64: dts: rockchip: add USB-C support to rk3588s-orangepi-5
arm64: dts: rockchip: Enable GPU on Orange Pi 5
arm64: dts: rockchip: enable GPU on khadas-edge2
arm64: dts: rockchip: Add USB3 on Edgeble NCM6A-IO board
arm64: dts: rockchip: Support poweroff on Edgeble Neural Compute Module
arm64: dts: rockchip: Add Radxa ROCK 3C
dt-bindings: arm: rockchip: add Radxa ROCK 3C
arm64: dts: exynos: gs101: specify empty clocks for remaining pinctrl
arm64: dts: exynos: gs101: specify bus clock for pinctrl_hsi2
arm64: dts: exynos: gs101: specify bus clock for pinctrl_peric[01]
arm64: dts: exynos: gs101: specify bus clock for pinctrl (far) alive
arm64: dts: Add/fix /memory node unit-addresses
arm64: dts: qcom: qcs404: fix bluetooth device address
arm64: dts: qcom: sc8280xp-x13s: enable USB MP and fingerprint reader
...
|
|
Due to an erratum with the SPR_DSA and SPR_IAX devices, it is not secure to assign
these devices to virtual machines. Add the PCI IDs of these devices to the VFIO
denylist to ensure that this is handled appropriately by the VFIO subsystem.
The SPR_DSA and SPR_IAX devices are on-SOC devices for the Sapphire Rapids
(and related) family of products that perform data movement and compression.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
|
|
Merge our topic branch containing kdump hotplug changes, more detail from the
original cover letter:
Commit 247262756121 ("crash: add generic infrastructure for crash
hotplug support") added a generic infrastructure that allows
architectures to selectively update the kdump image component during CPU
or memory add/remove events within the kernel itself.
This patch series adds crash hotplug handler for PowerPC and enable
support to update the kdump image on CPU/Memory add/remove events.
Among the 6 patches in this series, the first two patches make changes
to the generic crash hotplug handler to assist PowerPC in adding support
for this feature. The last four patches add support for this feature.
The following section outlines the problem addressed by this patch
series, along with the current solution, its shortcomings, and the
proposed resolution.
Problem:
========
Due to CPU/Memory hotplug or online/offline events the elfcorehdr
(which describes the CPUs and memory of the crashed kernel) and FDT
(Flattened Device Tree) of kdump image becomes outdated. Consequently,
attempting dump collection with an outdated elfcorehdr or FDT can lead
to failed or inaccurate dump collection.
Going forward CPU hotplug or online/offline events are referred as
CPU/Memory add/remove events.
Existing solution and its shortcoming:
======================================
The current solution to address the above issue involves monitoring the
CPU/memory add/remove events in userspace using udev rules and whenever
there are changes in CPU and memory resources, the entire kdump image
is loaded again. The kdump image includes kernel, initrd, elfcorehdr,
FDT, purgatory. Given that only elfcorehdr and FDT get outdated due to
CPU/Memory add/remove events, reloading the entire kdump image is
inefficient. More importantly, kdump remains inactive for a substantial
amount of time until the kdump reload completes.
Proposed solution:
==================
Instead of initiating a full kdump image reload from userspace on
CPU/Memory hotplug and online/offline events, the proposed solution aims
to update only the necessary kdump image component within the kernel
itself.
|
|
Merge our KVM topic branch.
|
|
into next
|
|
https://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus
ASoC: Updates for v6.10
This is a very big update, in large part due to extensive work the Intel
people have been doing in their drivers though it's also been busy
elsewhere. There's also a big overhaul of the DAPM documentation from
Luca Ceresoli arising from the work he did putting together his recent
ELC talk, and he also contributed a new tool for visualising the DAPM
state.
- A new tool dapm-graph for visualising the DAPM state.
- Substantial fixes and clarifications for the DAPM documentation.
- Very large updates throughout the Intel audio drivers.
- Cleanups of accessors for driver data, module labelling, and for
constification.
- Modernsation and cleanup work in the Mediatek drivers.
- Several fixes and features for the DaVinci I2S driver.
- New drivers for several AMD and Intel platforms, Nuvoton NAU8325,
Rockchip RK3308 and Texas Instruments PCM6240.
|
|
Introduces the LANDLOCK_ACCESS_FS_IOCTL_DEV right
and increments the Landlock ABI version to 5.
This access right applies to device-custom IOCTL commands
when they are invoked on block or character device files.
Like the truncate right, this right is associated with a file
descriptor at the time of open(2), and gets respected even when the
file descriptor is used outside of the thread which it was originally
opened in.
Therefore, a newly enabled Landlock policy does not apply to file
descriptors which are already open.
If the LANDLOCK_ACCESS_FS_IOCTL_DEV right is handled, only a small
number of safe IOCTL commands will be permitted on newly opened device
files. These include FIOCLEX, FIONCLEX, FIONBIO and FIOASYNC, as well
as other IOCTL commands for regular files which are implemented in
fs/ioctl.c.
Noteworthy scenarios which require special attention:
TTY devices are often passed into a process from the parent process,
and so a newly enabled Landlock policy does not retroactively apply to
them automatically. In the past, TTY devices have often supported
IOCTL commands like TIOCSTI and some TIOCLINUX subcommands, which were
letting callers control the TTY input buffer (and simulate
keypresses). This should be restricted to CAP_SYS_ADMIN programs on
modern kernels though.
Known limitations:
The LANDLOCK_ACCESS_FS_IOCTL_DEV access right is a coarse-grained
control over IOCTL commands.
Landlock users may use path-based restrictions in combination with
their knowledge about the file system layout to control what IOCTLs
can be done.
Cc: Paul Moore <paul@paul-moore.com>
Cc: Christian Brauner <brauner@kernel.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Günther Noack <gnoack@google.com>
Link: https://lore.kernel.org/r/20240419161122.2023765-2-gnoack@google.com
Signed-off-by: Mickaël Salaün <mic@digikod.net>
|
|
Inline the calls to bpf_get_smp_processor_id() in the riscv bpf jit.
RISCV saves the pointer to the CPU's task_struct in the TP (thread
pointer) register. This makes it trivial to get the CPU's processor id.
As thread_info is the first member of task_struct, we can read the
processor id from TP + offsetof(struct thread_info, cpu).
RISCV64 JIT output for `call bpf_get_smp_processor_id`
======================================================
Before After
-------- -------
auipc t1,0x848c ld a5,32(tp)
jalr 604(t1)
mv a5,a0
Benchmark using [1] on Qemu.
./benchs/run_bench_trigger.sh glob-arr-inc arr-inc hash-inc
+---------------+------------------+------------------+--------------+
| Name | Before | After | % change |
|---------------+------------------+------------------+--------------|
| glob-arr-inc | 1.077 ± 0.006M/s | 1.336 ± 0.010M/s | + 24.04% |
| arr-inc | 1.078 ± 0.002M/s | 1.332 ± 0.015M/s | + 23.56% |
| hash-inc | 0.494 ± 0.004M/s | 0.653 ± 0.001M/s | + 32.18% |
+---------------+------------------+------------------+--------------+
NOTE: This benchmark includes changes from this patch and the previous
patch that implemented the per-cpu insn.
[1] https://github.com/anakryiko/linux/commit/8dec900975ef
Signed-off-by: Puranjay Mohan <puranjay@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Björn Töpel <bjorn@kernel.org>
Link: https://lore.kernel.org/r/20240502151854.9810-3-puranjay@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull x86 fixes from Borislav Petkov:
- Add a new PCI ID which belongs to a new AMD CPU family 0x1a
- Ensure that that last level cache ID is set in all cases, in the AMD
CPU topology parsing code, in order to prevent invalid scheduling
domain CPU masks
* tag 'x86_urgent_for_v6.9' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/topology/amd: Ensure that LLC ID is initialized
x86/amd_nb: Add new PCI IDs for AMD family 0x1a
|
|
KVM cleanups for 6.10:
- Misc cleanups extracted from the "exit on missing userspace mapping" series,
which has been put on hold in anticipation of a "KVM Userfault" approach,
which should provide a superset of functionality.
- Remove kvm_make_all_cpus_request_except(), which got added to hack around an
AVIC bug, and then became dead code when a more robust fix came along.
- Fix a goof in the KVM_CREATE_GUEST_MEMFD documentation.
|
|
git://git.kernel.org/pub/scm/linux/kernel/git/kvmarm/kvmarm into HEAD
KVM/arm64 updates for Linux 6.10
- Move a lot of state that was previously stored on a per vcpu
basis into a per-CPU area, because it is only pertinent to the
host while the vcpu is loaded. This results in better state
tracking, and a smaller vcpu structure.
- Add full handling of the ERET/ERETAA/ERETAB instructions in
nested virtualisation. The last two instructions also require
emulating part of the pointer authentication extension.
As a result, the trap handling of pointer authentication has
been greattly simplified.
- Turn the global (and not very scalable) LPI translation cache
into a per-ITS, scalable cache, making non directly injected
LPIs much cheaper to make visible to the vcpu.
- A batch of pKVM patches, mostly fixes and cleanups, as the
upstreaming process seems to be resuming. Fingers crossed!
- Allocate PPIs and SGIs outside of the vcpu structure, allowing
for smaller EL2 mapping and some flexibility in implementing
more or less than 32 private IRQs.
- Purge stale mpidr_data if a vcpu is created after the MPIDR
map has been created.
- Preserve vcpu-specific ID registers across a vcpu reset.
- Various minor cleanups and improvements.
|
|
As one can see in include/trace/stages/stage4_event_fields.h, the
implementation of __field() uses the is_signed_type() macro. As one can
see in commit dcf8e5633e2e ("tracing: Define the is_signed_type() macro
once"), there has been an attempt to not make is_signed_type() trigger
sparse warnings for bitwise types.
Despite that change, sparse complains when passing a bitwise type to
is_signed_type(). The reason is that in its definition below, an
inequality comparison will be made against bitwise types, which are random
collections of bits (the casts to bitwise types themselves are
semantically valid and not problematic):
#define is_signed_type(type) (((type)(-1)) < (__force type)1)
So, as a workaround, follow the example of <trace/events/initcall.h> and
suppress the following sparse warnings by changing __field() into
__field_struct() that doesn't use is_signed_type():
fs/nilfs2/segment.c: note: in included file (through
include/trace/trace_events.h, include/trace/define_trace.h,
include/trace/events/nilfs2.h):
./include/trace/events/nilfs2.h:191:1: warning: cast to restricted
blk_opf_t
./include/trace/events/nilfs2.h:191:1: warning: restricted blk_opf_t
degrades to integer
./include/trace/events/nilfs2.h:191:1: warning: restricted blk_opf_t
degrades to integer
[konishi.ryusuke: describe the reason for the warnings per Linus's explanation]
Link: https://lkml.kernel.org/r/20240507222041.4876-1-konishi.ryusuke@gmail.com
Link: https://lkml.kernel.org/r/20240507142454.3344-1-konishi.ryusuke@gmail.com
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202401092241.I4mm9OWl-lkp@intel.com/
Reported-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
Closes: https://lore.kernel.org/all/20240430080019.4242-2-konishi.ryusuke@gmail.com/
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
|