summaryrefslogtreecommitdiff
path: root/include/linux/mlx5
AgeCommit message (Collapse)Author
2022-09-09Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Many bug fixes in several drivers: - Fix misuse of the DMA API in rtrs - Several irdma issues: hung task due to SQ flushing, incorrect capability reporting to userspace, improper error handling for MW corners, touching an uninitialized SGL for during invalidation. - hns was using the wrong page size limits for the HW, an incorrect calculation of wqe_shift causing WQE corruption, and mis computed a timer id. - Fix a crash in SRP triggered by blktests - Fix compiler errors by calling virt_to_page() with the proper type in siw - Userspace triggerable deadlock in ODP - mlx5 could use the wrong profile due to some driver loading races, counters were not working in some device configurations, and a crash on error unwind" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/irdma: Report RNR NAK generation in device caps RDMA/irdma: Use s/g array in post send only when its valid RDMA/irdma: Return correct WC error for bind operation failure RDMA/irdma: Return error on MR deregister CQP failure RDMA/irdma: Report the correct max cqes from query device MAINTAINERS: Update maintainers of HiSilicon RoCE RDMA/mlx5: Fix UMR cleanup on error flow of driver init RDMA/mlx5: Set local port to one when accessing counters RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profile IB/core: Fix a nested dead lock as part of ODP flow RDMA/siw: Pass a pointer to virt_to_page() RDMA/srp: Set scmnd->result only when scmnd is not NULL RDMA/hns: Remove the num_qpc_timer variable RDMA/hns: Fix wrong fixed value of qp->rq.wqe_shift RDMA/hns: Fix supported page size RDMA/cma: Fix arguments order in net device validation RDMA/irdma: Fix drain SQ hang with no completion RDMA/rtrs-srv: Pass the correct number of entries for dma mapped SGL RDMA/rtrs-clt: Use the right sg_cnt after ib_dma_map_sg
2022-09-05RDMA/mlx5: Rely on RoCE fw cap instead of devlink when setting profileMaher Sanalla
When the RDMA auxiliary driver probes, it sets its profile based on devlink driverinit value. The latter might not be in sync with FW yet (In case devlink reload is not performed), thus causing a mismatch between RDMA driver and FW. This results in the following FW syndrome when the RDMA driver tries to adjust RoCE state, which fails the probe: "0xC1F678 | modify_nic_vport_context: roce_en set on a vport that doesn't support roce" To prevent this, select the PF profile based on FW RoCE capability instead of relying on devlink driverinit value. To provide backward compatibility of the RoCE disable feature, on older FW's where roce_rw is not set (FW RoCE capability is read-only), keep the current behavior e.g., rely on devlink driverinit value. Fixes: fbfa97b4d79f ("net/mlx5: Disable roce at HCA level") Reviewed-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Maher Sanalla <msanalla@nvidia.com> Link: https://lore.kernel.org/r/cb34ce9a1df4a24c135cb804db87f7d2418bd6cc.1661763459.git.leonro@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-22net/mlx5: Avoid false positive lockdep warning by adding lock_class_keyMoshe Shemesh
Add a lock_class_key per mlx5 device to avoid a false positive "possible circular locking dependency" warning by lockdep, on flows which lock more than one mlx5 device, such as adding SF. kernel log: ====================================================== WARNING: possible circular locking dependency detected 5.19.0-rc8+ #2 Not tainted ------------------------------------------------------ kworker/u20:0/8 is trying to acquire lock: ffff88812dfe0d98 (&dev->intf_state_mutex){+.+.}-{3:3}, at: mlx5_init_one+0x2e/0x490 [mlx5_core] but task is already holding lock: ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&(&notifier->n_head)->rwsem){++++}-{3:3}: down_write+0x90/0x150 blocking_notifier_chain_register+0x53/0xa0 mlx5_sf_table_init+0x369/0x4a0 [mlx5_core] mlx5_init_one+0x261/0x490 [mlx5_core] probe_one+0x430/0x680 [mlx5_core] local_pci_probe+0xd6/0x170 work_for_cpu_fn+0x4e/0xa0 process_one_work+0x7c2/0x1340 worker_thread+0x6f6/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 -> #0 (&dev->intf_state_mutex){+.+.}-{3:3}: __lock_acquire+0x2fc7/0x6720 lock_acquire+0x1c1/0x550 __mutex_lock+0x12c/0x14b0 mlx5_init_one+0x2e/0x490 [mlx5_core] mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core] auxiliary_bus_probe+0x9d/0xe0 really_probe+0x1e0/0xaa0 __driver_probe_device+0x219/0x480 driver_probe_device+0x49/0x130 __device_attach_driver+0x1b8/0x280 bus_for_each_drv+0x123/0x1a0 __device_attach+0x1a3/0x460 bus_probe_device+0x1a2/0x260 device_add+0x9b1/0x1b40 __auxiliary_device_add+0x88/0xc0 mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core] blocking_notifier_call_chain+0xd5/0x130 mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core] process_one_work+0x7c2/0x1340 worker_thread+0x59d/0xec0 kthread+0x28f/0x330 ret_from_fork+0x1f/0x30 other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&(&notifier->n_head)->rwsem); lock(&dev->intf_state_mutex); lock(&(&notifier->n_head)->rwsem); lock(&dev->intf_state_mutex); *** DEADLOCK *** 4 locks held by kworker/u20:0/8: #0: ffff888150612938 ((wq_completion)mlx5_events){+.+.}-{0:0}, at: process_one_work+0x6e2/0x1340 #1: ffff888100cafdb8 ((work_completion)(&work->work)#3){+.+.}-{0:0}, at: process_one_work+0x70f/0x1340 #2: ffff888101aa7898 (&(&notifier->n_head)->rwsem){++++}-{3:3}, at: blocking_notifier_call_chain+0x5a/0x130 #3: ffff88813682d0e8 (&dev->mutex){....}-{3:3}, at:__device_attach+0x76/0x460 stack backtrace: CPU: 6 PID: 8 Comm: kworker/u20:0 Not tainted 5.19.0-rc8+ Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS rel-1.13.0-0-gf21b5a4aeb02-prebuilt.qemu.org 04/01/2014 Workqueue: mlx5_events mlx5_vhca_state_work_handler [mlx5_core] Call Trace: <TASK> dump_stack_lvl+0x57/0x7d check_noncircular+0x278/0x300 ? print_circular_bug+0x460/0x460 ? lock_chain_count+0x20/0x20 ? register_lock_class+0x1880/0x1880 __lock_acquire+0x2fc7/0x6720 ? register_lock_class+0x1880/0x1880 ? register_lock_class+0x1880/0x1880 lock_acquire+0x1c1/0x550 ? mlx5_init_one+0x2e/0x490 [mlx5_core] ? lockdep_hardirqs_on_prepare+0x400/0x400 __mutex_lock+0x12c/0x14b0 ? mlx5_init_one+0x2e/0x490 [mlx5_core] ? mlx5_init_one+0x2e/0x490 [mlx5_core] ? _raw_read_unlock+0x1f/0x30 ? mutex_lock_io_nested+0x1320/0x1320 ? __ioremap_caller.constprop.0+0x306/0x490 ? mlx5_sf_dev_probe+0x269/0x370 [mlx5_core] ? iounmap+0x160/0x160 mlx5_init_one+0x2e/0x490 [mlx5_core] mlx5_sf_dev_probe+0x29c/0x370 [mlx5_core] ? mlx5_sf_dev_remove+0x130/0x130 [mlx5_core] auxiliary_bus_probe+0x9d/0xe0 really_probe+0x1e0/0xaa0 __driver_probe_device+0x219/0x480 ? auxiliary_match_id+0xe9/0x140 driver_probe_device+0x49/0x130 __device_attach_driver+0x1b8/0x280 ? driver_allows_async_probing+0x140/0x140 bus_for_each_drv+0x123/0x1a0 ? bus_for_each_dev+0x1a0/0x1a0 ? lockdep_hardirqs_on_prepare+0x286/0x400 ? trace_hardirqs_on+0x2d/0x100 __device_attach+0x1a3/0x460 ? device_driver_attach+0x1e0/0x1e0 ? kobject_uevent_env+0x22d/0xf10 bus_probe_device+0x1a2/0x260 device_add+0x9b1/0x1b40 ? dev_set_name+0xab/0xe0 ? __fw_devlink_link_to_suppliers+0x260/0x260 ? memset+0x20/0x40 ? lockdep_init_map_type+0x21a/0x7d0 __auxiliary_device_add+0x88/0xc0 ? auxiliary_device_init+0x86/0xa0 mlx5_sf_dev_state_change_handler+0x67e/0x9d0 [mlx5_core] blocking_notifier_call_chain+0xd5/0x130 mlx5_vhca_state_work_handler+0x2b0/0x3f0 [mlx5_core] ? mlx5_vhca_event_arm+0x100/0x100 [mlx5_core] ? lock_downgrade+0x6e0/0x6e0 ? lockdep_hardirqs_on_prepare+0x286/0x400 process_one_work+0x7c2/0x1340 ? lockdep_hardirqs_on_prepare+0x400/0x400 ? pwq_dec_nr_in_flight+0x230/0x230 ? rwlock_bug.part.0+0x90/0x90 worker_thread+0x59d/0xec0 ? process_one_work+0x1340/0x1340 kthread+0x28f/0x330 ? kthread_complete_and_exit+0x20/0x20 ret_from_fork+0x1f/0x30 </TASK> Fixes: 6a3273217469 ("net/mlx5: SF, Port function state change support") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-12Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: - A huge patchset supporting vq resize using the new vq reset capability - Features, fixes, and cleanups all over the place * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (88 commits) vdpa/mlx5: Fix possible uninitialized return value vdpa_sim_blk: add support for discard and write-zeroes vdpa_sim_blk: add support for VIRTIO_BLK_T_FLUSH vdpa_sim_blk: make vdpasim_blk_check_range usable by other requests vdpa_sim_blk: check if sector is 0 for commands other than read or write vdpa_sim: Implement suspend vdpa op vhost-vdpa: uAPI to suspend the device vhost-vdpa: introduce SUSPEND backend feature bit vdpa: Add suspend operation virtio-blk: Avoid use-after-free on suspend/resume virtio_vdpa: support the arg sizes of find_vqs() vhost-vdpa: Call ida_simple_remove() when failed vDPA: fix 'cast to restricted le16' warnings in vdpa.c vDPA: !FEATURES_OK should not block querying device config space vDPA/ifcvf: support userspace to query features and MQ of a management device vDPA/ifcvf: get_config_size should return a value no greater than dev implementation vhost scsi: Allow user to control num virtqueues vhost-scsi: Fix max number of virtqueues vdpa/mlx5: Support different address spaces for control and data vdpa/mlx5: Implement susupend virtqueue callback ...
2022-08-11vdpa/mlx5: Implement susupend virtqueue callbackEli Cohen
Implement the suspend callback allowing to suspend the virtqueues so they stop processing descriptors. This is required to allow to query a consistent state of the virtqueue while live migration is taking place. Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220714113927.85729-2-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-08-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "This cycle we got a new RDMA driver "ERDMA" for the Alibaba cloud environment. Otherwise the changes are dominated by rxe fixes. There is another RDMA driver on the list that might get merged next cycle, 'MANA' for the Azure cloud environment. Summary: - Bug fixes and small features for irdma, hns, siw, qedr, hfi1, mlx5 - General spelling/grammer fixes - rdma cm can follow changes in neighbours for control packets - Significant amounts of rxe fixes and spec compliance changes - Use the modern NAPI API - Use the bitmap API instead of open coding - Performance improvements for rtrs - Add the ERDMA driver for Alibaba cloud - Fix a use after free bug in SRP" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits) RDMA/ib_srpt: Unify checking rdma_cm_id condition in srpt_cm_req_recv() RDMA/rxe: Fix error unwind in rxe_create_qp() RDMA/mlx5: Add missing check for return value in get namespace flow RDMA/rxe: Split qp state for requester and completer RDMA/rxe: Generate error completion for error requester QP state RDMA/rxe: Update wqe_index for each wqe error completion RDMA/srpt: Fix a use-after-free RDMA/srpt: Introduce a reference count in struct srpt_device RDMA/srpt: Duplicate port name members IB/qib: Fix repeated "in" within comments RDMA/erdma: Add driver to kernel build environment RDMA/erdma: Add the ABI definitions RDMA/erdma: Add the erdma module RDMA/erdma: Add connection management (CM) support RDMA/erdma: Add verbs implementation RDMA/erdma: Add verbs header file RDMA/erdma: Add event queue implementation RDMA/erdma: Add cmdq implementation RDMA/erdma: Add main include file RDMA/erdma: Add the hardware related definitions ...
2022-07-27RDMA/mlx5: Rename the mkey cache variables and functionsAharon Landau
After replacing the MR cache with an Mkey cache, rename the variables and functions to fit the new meaning. Link: https://lore.kernel.org/r/20220726071911.122765-6-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-19net/mlx5: Expose ts_cqe_metadata_size2wqe_counterAya Levin
Add capability field which indicates the mask for wqe_counter which connects between loopback CQE and the original WQE. With this connection the driver can identify lost of the loopback CQE and reply PTP synchronization with timestamp given in the original CQE. Signed-off-by: Aya Levin <ayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-17net/mlx5: fs, allow flow table creation with a UIDMark Bloch
Add UID field to flow table attributes to allow creating flow tables with a non default (zero) uid. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Alex Vesker <valex@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-17net/mlx5: fs, expose flow table ID to usersMark Bloch
Expose the flow table ID to users. This will be used by downstream patches to allow creating steering rules that point to a flow table ID. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-17net/mlx5: Expose the ability to point to any UID from shared UIDMark Bloch
Expose shared_object_to_user_object_allowed, this capability means an object created with shared UID can point to any UID. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-13net/mlx5: Use software VHCA id when it's supportedYishai Hadas
Use software VHCA id when it's supported by the firmware. A unique id is allocated upon mlx5_mdev_init() and freed upon mlx5_mdev_uninit(), as such it stays the same during the full life cycle of the device including upon health recovery if occurred. The conjunction of sw_vhca_id with sw_owner_id will be a global unique id per function which uses mlx5_core. The sw_vhca_id is set upon init_hca command and is used to specify the VHCA that the NIC vport is affiliated with. This functionality is needed upon migration of VM which is MPV based. (i.e. multi port device). Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-13net/mlx5: Introduce ifc bits for using software vhca idYishai Hadas
Introduce ifc related stuff to enable using software vhca id functionality. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-07-12net/mlx5: Use devl_ API in mlx5e_devlink_port_registerMoshe Shemesh
As part of the flows invoked by mlx5_devlink_eswitch_mode_set() get to mlx5_rescan_drivers_locked() which can call mlx5e_probe()/mlx5e_remove and register/unregister mlx5e driver ports accordingly. This can lead to deadlock once mlx5_devlink_eswitch_mode_set() will use devlink lock. Use devl_port_register/unregister() instead of devlink_port_register/unregister() and add devlink instance locks in the driver paths to this function to have it locked while calling devl_ API function. If remove or probe were called by module init or module cleanup flows, need to lock devlink just before calling devl_port_register(), otherwise it is called by attach/detach or register/unregister flow and we can have the flow locked. Added flag to distinguish between these cases. This will be used by the downstream patch to invoke mlx5_devlink_eswitch_mode_set() with devlink locked. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2022-07-02net/mlx5: E-switch, Remove dependency between sriov and eswitch modeChris Mi
Currently, there are three eswitch modes, none, legacy and switchdev. None is the default mode. Remove redundant none mode as eswitch mode should always be either legacy mode or switchdev mode. With this patch, there are two behavior changes: 1. Legacy becomes the default mode. When querying eswitch mode using devlink, a valid mode is always returned. 2. When disabling sriov, the eswitch mode will not change, only vfs are unloaded. Signed-off-by: Chris Mi <cmi@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net/mlx5: Add bits and fields to support enhanced CQE compressionOfer Levi
Expose ifc bits and add needed structure fields and methods to support enhanced CQE compression feature. The enhanced CQE compression feature improves cpu utiliziation with better packet latency from nic to host. Signed-off-by: Ofer Levi <oferle@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net/mlx5: Remove not used MLX5_CAP_BITS_RW_MASKShay Drory
Remove not used MLX5_CAP_BITS_RW_MASK. While at it, remove CAP_MASK, MLX5_CAP_OFF_CMDIF_CSUM and MLX5_DEV_CAP_FLAG_*, since MLX5_CAP_BITS_RW_MASK was their only user. Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net/mlx5: Add support EXECUTE_ASO action for flow entryJianbo Liu
Attach flow meter to FTE with object id and index. Use metadata register C5 to store the packet color meter result. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Roi Dayan <roid@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net/mlx5: Add HW definitions of vport debug countersSaeed Mahameed
total_q_under_processor_handle - number of queues in error state due to an async error or errored command. send_queue_priority_update_flow - number of QP/SQ priority/SL update events. cq_overrun - number of times CQ entered an error state due to an overflow. async_eq_overrun -number of time an EQ mapped to async events was overrun. comp_eq_overrun - number of time an EQ mapped to completion events was overrun. quota_exceeded_command - number of commands issued and failed due to quota exceeded. invalid_command - number of commands issued and failed dues to any reason other than quota exceeded. Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net/mlx5: Add IFC bits and enums for flow meterJianbo Liu
Add/extend structure layouts and defines for flow meter. Signed-off-by: Jianbo Liu <jianbol@nvidia.com> Reviewed-by: Ariel Levkovich <lariel@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net/mlx5: Manage ICM of type modify-header patternYevgeny Kliteynik
Added support for managing new type of ICM for devices that support sw_owner_v2. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-13net/mlx5: Introduce header-modify-pattern ICM propertiesYevgeny Kliteynik
Added new fields for device memory capabilities, in order to support creation of ICM memory for modify header patterns. Signed-off-by: Erez Shitrit <erezsh@mellanox.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Acked-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-06-03Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhostLinus Torvalds
Pull virtio updates from Michael Tsirkin: "vhost,virtio and vdpa features, fixes, and cleanups: - mac vlan filter and stats support in mlx5 vdpa - irq hardening in virtio - performance improvements in virtio crypto - polling i/o support in virtio blk - ASID support in vhost - fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (64 commits) vdpa: ifcvf: set pci driver data in probe vdpa/mlx5: Add RX MAC VLAN filter support vdpa/mlx5: Remove flow counter from steering vhost: rename vhost_work_dev_flush vhost-test: drop flush after vhost_dev_cleanup vhost-scsi: drop flush after vhost_dev_cleanup vhost_vsock: simplify vhost_vsock_flush() vhost_test: remove vhost_test_flush_vq() vhost_net: get rid of vhost_net_flush_vq() and extra flush calls vhost: flush dev once during vhost_dev_stop vhost: get rid of vhost_poll_flush() wrapper vhost-vdpa: return -EFAULT on copy_to_user() failure vdpasim: Off by one in vdpasim_set_group_asid() virtio: Directly use ida_alloc()/free() virtio: use WARN_ON() to warning illegal status value virtio: harden vring IRQ virtio: allow to unbreak virtqueue virtio-ccw: implement synchronize_cbs() virtio-mmio: implement synchronize_cbs() virtio-pci: implement synchronize_cbs() ...
2022-06-02Merge tag 'net-5.19-rc1' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from bpf and netfilter. Current release - new code bugs: - af_packet: make sure to pull the MAC header, avoid skb panic in GSO - ptp_clockmatrix: fix inverted logic in is_single_shot() - netfilter: flowtable: fix missing FLOWI_FLAG_ANYSRC flag - dt-bindings: net: adin: fix adi,phy-output-clock description syntax - wifi: iwlwifi: pcie: rename CAUSE macro, avoid MIPS build warning Previous releases - regressions: - Revert "net: af_key: add check for pfkey_broadcast in function pfkey_process" - tcp: fix tcp_mtup_probe_success vs wrong snd_cwnd - nf_tables: disallow non-stateful expression in sets earlier - nft_limit: clone packet limits' cost value - nf_tables: double hook unregistration in netns path - ping6: fix ping -6 with interface name Previous releases - always broken: - sched: fix memory barriers to prevent skbs from getting stuck in lockless qdiscs - neigh: set lower cap for neigh_managed_work rearming, avoid constantly scheduling the probe work - bpf: fix probe read error on big endian in ___bpf_prog_run() - amt: memory leak and error handling fixes Misc: - ipv6: expand & rename accept_unsolicited_na to accept_untracked_na" * tag 'net-5.19-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (80 commits) net/af_packet: make sure to pull mac header net: add debug info to __skb_pull() net: CONFIG_DEBUG_NET depends on CONFIG_NET stmmac: intel: Add RPL-P PCI ID net: stmmac: use dev_err_probe() for reporting mdio bus registration failure tipc: check attribute length for bearer name ice: fix access-beyond-end in the switch code nfp: remove padding in nfp_nfdk_tx_desc ax25: Fix ax25 session cleanup problems net: usb: qmi_wwan: Add support for Cinterion MV31 with new baseline sfc/siena: fix wrong tx channel offset with efx_separate_tx_channels sfc/siena: fix considering that all channels have TX queues socket: Don't use u8 type in uapi socket.h net/sched: act_api: fix error code in tcf_ct_flow_table_fill_tuple_ipv6() net: ping6: Fix ping -6 with interface name macsec: fix UAF bug for real_dev octeontx2-af: fix error code in is_valid_offset() wifi: mac80211: fix use-after-free in chanctx code bonding: guard ns_targets by CONFIG_IPV6 tcp: tcp_rtx_synack() can be called from process context ...
2022-06-01Merge tag 'vfio-v5.19-rc1' of https://github.com/awilliam/linux-vfioLinus Torvalds
Pull vfio updates from Alex Williamson: - Improvements to mlx5 vfio-pci variant driver, including support for parallel migration per PF (Yishai Hadas) - Remove redundant iommu_present() check (Robin Murphy) - Ongoing refactoring to consolidate the VFIO driver facing API to use vfio_device (Jason Gunthorpe) - Use drvdata to store vfio_device among all vfio-pci and variant drivers (Jason Gunthorpe) - Remove redundant code now that IOMMU core manages group DMA ownership (Jason Gunthorpe) - Remove vfio_group from external API handling struct file ownership (Jason Gunthorpe) - Correct typo in uapi comments (Thomas Huth) - Fix coccicheck detected deadlock (Wan Jiabing) - Use rwsem to remove races and simplify code around container and kvm association to groups (Jason Gunthorpe) - Harden access to devices in low power states and use runtime PM to enable d3cold support for unused devices (Abhishek Sahu) - Fix dma_owner handling of fake IOMMU groups (Jason Gunthorpe) - Set driver_managed_dma on vfio-pci variant drivers (Jason Gunthorpe) - Pass KVM pointer directly rather than via notifier (Matthew Rosato) * tag 'vfio-v5.19-rc1' of https://github.com/awilliam/linux-vfio: (38 commits) vfio: remove VFIO_GROUP_NOTIFY_SET_KVM vfio/pci: Add driver_managed_dma to the new vfio_pci drivers vfio: Do not manipulate iommu dma_owner for fake iommu groups vfio/pci: Move the unused device into low power state with runtime PM vfio/pci: Virtualize PME related registers bits and initialize to zero vfio/pci: Change the PF power state to D0 before enabling VFs vfio/pci: Invalidate mmaps and block the access in D3hot power state vfio: Change struct vfio_group::container_users to a non-atomic int vfio: Simplify the life cycle of the group FD vfio: Fully lock struct vfio_group::container vfio: Split up vfio_group_get_device_fd() vfio: Change struct vfio_group::opened from an atomic to bool vfio: Add missing locking for struct vfio_group::kvm kvm/vfio: Fix potential deadlock problem in vfio include/uapi/linux/vfio.h: Fix trivial typo - _IORW should be _IOWR instead vfio/pci: Use the struct file as the handle not the vfio_group kvm/vfio: Remove vfio_group from kvm vfio: Change vfio_group_set_kvm() to vfio_file_set_kvm() vfio: Change vfio_external_check_extension() to vfio_file_enforced_coherent() vfio: Remove vfio_external_group_match_file() ...
2022-05-31net/mlx5: correct ECE offset in query qp outputChangcheng Liu
ECE field should be after opt_param_mask in query qp output. Fixes: 6b646a7e4af6 ("net/mlx5: Add ability to read and write ECE options") Signed-off-by: Changcheng Liu <jerrliu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-31vdpa/mlx5: Add support for reading descriptor statisticsEli Cohen
Implement the get_vq_stats calback of vdpa_config_ops to return the statistics for a virtqueue. The statistics are provided as vendor specific statistics where the driver provides a pair of attribute name and attribute value. Currently supported are received descriptors and completed descriptors. Signed-off-by: Eli Cohen <elic@nvidia.com> Message-Id: <20220518133804.1075129-6-elic@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-17net/mlx5: Support multiport eswitch modeEli Cohen
Multiport eswitch mode is a LAG mode that allows to add rules that forward traffic to a specific physical port without being affected by LAG affinity configuration. This mode of operation is mutual exclusive with the other LAG modes used by multipath and bonding. To make the transition between the modes, we maintain a counter on the number of rules specifying one of the uplink representors as the target of mirred egress redirect action. An example of such rule would be: $ tc filter add dev enp8s0f0_0 prot all root flower dst_mac \ 00:11:22:33:44:55 action mirred egress redirect dev enp8s0f0 If the reference count just grows to one and LAG is not in use, we create the LAG in multiport eswitch mode. Other mode changes are not allowed while in this mode. When the reference count reaches zero, we destroy the LAG and let other modes be used if needed. logic also changed such that if forwarding to some uplink destination cannot be guaranteed, we fail the operation so the rule will eventually be in software and not in hardware. Signed-off-by: Eli Cohen <elic@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17net/mlx5: Inline db alloc API functionTariq Toukan
Take the wrapper version which picks default node into a header file. This reduces the number of exported functions. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-17net/mlx5: Add last command failure syndrome to debugfsMoshe Shemesh
Add syndrome of last command failure per command type to debugfs to ease debugging of such failure. last_failed_syndrome - last command failed syndrome returned by FW. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-10net/mlx5: Expose mlx5_sriov_blocking_notifier_register / unregister APIsYishai Hadas
Expose mlx5_sriov_blocking_notifier_register / unregister APIs to let a VF register to be notified for its enablement / disablement by the PF. Upon VF probe it will call mlx5_sriov_blocking_notifier_register() with its notifier block and upon VF remove it will call mlx5_sriov_blocking_notifier_unregister() to drop its registration. This can give a VF the ability to clean some resources upon disable before that the command interface goes down and on the other hand sets some stuff before that it's enabled. This may be used by a VF which is migration capable in few cases.(e.g. PF load/unload upon an health recovery). Link: https://lore.kernel.org/r/20220510090206.90374-2-yishaih@nvidia.com Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-05-09net/mlx5: Lag, add debugfs to query hardware lag stateMark Bloch
Lag state has become very complicated with many modes, flags, types and port selections methods and future work will add additional features. Add a debugfs to query the current lag state. A new directory named "lag" will be created under the mlx5 debugfs directory. As the driver has debugfs per pci function the location will be: <debugfs>/mlx5/<BDF>/lag For example: /sys/kernel/debug/mlx5/0000:08:00.0/lag The following files are exposed: - state: Returns "active" or "disabled". If "active" it means hardware lag is active. - members: Returns the BDFs of all the members of lag object. - type: Returns the type of the lag currently configured. Valid only if hardware lag is active. * "roce" - Members are bare metal PFs. * "switchdev" - Members are in switchdev mode. * "multipath" - ECMP offloads. - port_sel_mode: Returns the egress port selection method, valid only if hardware lag is active. * "queue_affinity" - Egress port is selected by the QP/SQ affinity. * "hash" - Egress port is selected by hash done on each packet. Controlled by: xmit_hash_policy of the bond device. - flags: Returns flags that are specific per lag @type. Valid only if hardware lag is active. * "shared_fdb" - "on" or "off", if "on" single FDB is used. - mapping: Returns the mapping which is used to select egress port. Valid only if hardware lag is active. If @port_sel_mode is "hash" returns the active egress ports. The hash result will select only active ports. if @port_sel_mode is "queue_affinity" returns the mapping between the configured port affinity of the QP/SQ and actual egress port. For example: * 1:1 - Mapping means if the configured affinity is port 1 traffic will egress via port 1. * 1:2 - Mapping means if the configured affinity is port 1 traffic will egress via port 2. This can happen if port 1 is down or in active/backup mode and port 1 is backup. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09net/mlx5: Support devices with more than 2 portsMark Bloch
Increase the define MLX5_MAX_PORTS to 4 as the driver is ready to support NICs with 4 ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09net/mlx5: Lag, expose number of lag portsMark Bloch
Downstream patches will add support for hardware lag with more than 2 ports. Add a way for users to query the number of lag ports. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-09net/mlx5: Add exit route when waiting for FWGavin Li
Currently, removing a device needs to get the driver interface lock before doing any cleanup. If the driver is waiting in a loop for FW init, there is no way to cancel the wait, instead the device cleanup waits for the loop to conclude and release the lock. To allow immediate response to remove device commands, check the TEARDOWN flag while waiting for FW init, and exit the loop if it has been set. Signed-off-by: Gavin Li <gavinl@nvidia.com> Reviewed-by: Moshe Shemesh <moshe@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Remove not-supported ICV lengthLeon Romanovsky
mlx5 doesn't allow to configure any AEAD ICV length other than 128, so remove the logic that configures other unsupported values. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Merge various control path IPsec headers into one fileLeon Romanovsky
The mlx5 IPsec code has logical separation between code that operates with XFRM objects (ipsec.c), HW objects (ipsec_offload.c), flow steering logic (ipsec_fs.c) and data path (ipsec_rxtx.c). Such separation makes sense for C-files, but isn't needed at all for H-files as they are included in batch anyway. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Remove useless validity checkLeon Romanovsky
All callers build xfrm attributes with help of mlx5e_ipsec_build_accel_xfrm_attrs() function that ensure validity of attributes. There is no need to recheck them again. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-03net/mlx5: Store IPsec ESN update work in XFRM stateLeon Romanovsky
mlx5 IPsec code updated ESN through workqueue with allocation calls in the data path, which can be saved easily if the work is created during XFRM state initialization routine. The locking used later in the work didn't protect from anything because change of HW context is possible during XFRM state add or delete only, which can cancel work and make sure that it is not running. Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-02net/mlx5: fs, add unused destination typeMark Bloch
When the caller doesn't pass a destination fs_core will create a unused rule just so a context can be returned. This unused rule is zeroed out and its type is 0 which can be mixed up with MLX5_FLOW_DESTINATION_TYPE_VPORT. Create a dedicated type to differentiate between the two named MLX5_FLOW_DESTINATION_TYPE_NONE. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-05-02net/mlx5: fs, split software and IFC flow destination definitionsMark Bloch
Separate flow destinations between software and IFC. Flow destination type passed by callers was used as the input in firmware commands and over the years software only types were added which resulted in mixing between the two. Create an IFC enum that contains only the flow destinations defined when talking to the firmware. Now that there is a proper software only enum for flow destinations the hardcoded values can be removed as the values are no longer used in firmware commands. Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-04-09net/mlx5: Remove not-implemented IPsec capabilitiesLeon Romanovsky
Clean a capabilities enum to remove not-implemented bits. Link: https://lore.kernel.org/r/1044bb7b779107ff38e48e3f6553421104f3f819.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09net/mlx5: Remove ipsec_ops function tableLeon Romanovsky
There is only one IPsec implementation and ipsec_ops is not needed at all in this situation. Together with removal of ipsec_ops, we can drop the entry checks as these functions are called for IPsec devices only. Link: https://lore.kernel.org/r/bc8dd1c8a77b65dbf5e2cf92c813ffaca2505c5f.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09net/mlx5: Remove not-needed IPsec configLeon Romanovsky
In current code, the CONFIG_MLX5_IPSEC and CONFIG_MLX5_EN_IPSEC are the same. So remove useless indirection. Link: https://lore.kernel.org/r/fd14492cbc01a0d51a5bfedde02bcd2154123fde.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09net/mlx5: Unify device IPsec capabilities checkLeon Romanovsky
Merge two different function to one in order to provide coherent picture if the device is IPsec capable or not. Link: https://lore.kernel.org/r/8f10ea06ad19c6f651e9fb33921009658f01e1d5.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09RDMA/mlx5: Drop crypto flow steering APILeon Romanovsky
The mlx5 flow steering crypto API was intended to be used in FPGA devices, which is not supported for years already. The removal of mlx5 crypto FPGA code together with inability to configure encryption keys makes the low steering API completely unusable. So delete the code, so any ESP flow steering requests will fail with not supported error, as it is happening now anyway as no device support this type of API. Link: https://lore.kernel.org/r/634a5face7734381463d809bfb89850f6998deac.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-09net/mlx5_fpga: Drop INNOVA IPsec supportLeon Romanovsky
Mellanox INNOVA IPsec cards are EOL in Nov, 2019 [1]. As such, the code is unmaintained, untested and not in-use by any upstream/distro oriented customers. In order to reduce code complexity, drop the kernel code. [1] https://network.nvidia.com/related-docs/eol/LCR-000535.pdf Link: https://lore.kernel.org/r/2afe88ec5020a491079eacf6fe3c89b64d65195c.1649232994.git.leonro@nvidia.com Reviewed-by: Raed Salem <raeds@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-06IB/mlx5: Fix undefined behavior due to shift overflowing the constantBorislav Petkov
Fix: drivers/infiniband/hw/mlx5/main.c: In function ‘translate_eth_legacy_proto_oper’: drivers/infiniband/hw/mlx5/main.c:370:2: error: case label does not reduce to an integer constant case MLX5E_PROT_MASK(MLX5E_50GBASE_KR2): ^~~~ See https://lore.kernel.org/r/YkwQ6%2BtIH8GQpuct@zn.tnic for the gory details as to why it triggers with older gccs only. Link: https://lore.kernel.org/all/20220405151517.29753-11-bp@alien8.de Signed-off-by: Borislav Petkov <bp@suse.de> Cc: Leon Romanovsky <leon@kernel.org> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: linux-rdma@vger.kernel.org Cc: netdev@vger.kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-04-06net/mlx5_fpga: Drop INNOVA TLS supportLeon Romanovsky
Mellanox INNOVA TLS cards are EOL in May, 2018 [1]. As such, the code is unmaintained, untested and not in-use by any upstream/distro oriented customers. In order to reduce code complexity, drop the kernel code. [1] https://network.nvidia.com/related-docs/eol/LCR-000286.pdf Link: https://lore.kernel.org/r/b88add368def721ea9d054cb69def72d9e3f67aa.1649073691.git.leonro@nvidia.com Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2022-03-24Merge tag 'net-next-5.18' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one and Mark sent us a SPI driver interface conversion pull request. Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup" * tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits) llc: fix netdevice reference leaks in llc_ui_bind() drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool ice: don't allow to run ice_send_event_to_aux() in atomic ctx ice: fix 'scheduling while atomic' on aux critical err interrupt net/sched: fix incorrect vlan_push_eth dest field net: bridge: mst: Restrict info size queries to bridge ports net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init() drivers: net: xgene: Fix regression in CRC stripping net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT net: dsa: fix missing host-filtered multicast addresses net/mlx5e: Fix build warning, detected write beyond size of field iwlwifi: mvm: Don't fail if PPAG isn't supported selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" netdevice: add missing dm_private kdoc net: bridge: mst: prevent NULL deref in br_mst_info_size() selftests: forwarding: Use same VRF for port and VLAN upper ...