path: root/include/linux/mlx5/mlx5_ifc.h
Age  Commit message  Author
2024-03-15  Merge tag 'vfio-v6.9-rc1' of https://github.com/awilliam/linux-vfio  Linus Torvalds
Pull VFIO updates from Alex Williamson: - Add warning in unlikely case that device is not captured with driver_override (Kunwu Chan) - Error handling improvements in mlx5-vfio-pci to detect firmware tracking object error states, logging of firmware error syndrome, and releasing of firmware resources in aborted migration sequence (Yishai Hadas) - Correct an un-alphabetized VFIO MAINTAINERS entry (Alex Williamson) - Make the mdev_bus_type const and also make the class struct const for a couple of the vfio-mdev sample drivers (Ricardo B. Marliere) - Addition of a new vfio-pci variant driver for the GPU of NVIDIA's Grace-Hopper superchip. During initialization of the chip-to-chip interconnect in this hardware module, the PCI BARs of the device become unused in favor of a faster, coherent mechanism for exposing device memory. This driver primarily changes the VFIO representation of the device to masquerade this coherent aperture to replace the physical PCI BARs for userspace drivers. This also incorporates use of a new vma flag allowing KVM to use write combining attributes for uncached device memory (Ankit Agrawal) - Reset fixes and cleanups for the pds-vfio-pci driver. Save and restore files were previously leaked if the device didn't pass through an error state, this is resolved and later re-fixed to prevent access to the now freed files. Reset handling is also refactored to remove the complicated deferred reset mechanism (Brett Creeley) - Remove some references to pl330 in the vfio-platform amba driver (Geert Uytterhoeven) - Remove twice redundant and ugly code to unpin incidental pins of the zero-page (Alex Williamson) - Deferred reset logic is also removed from the hisi-acc-vfio-pci driver as a simplification (Shameer Kolothum) - Enforce that mlx5-vfio-pci devices must support PRE_COPY and remove resulting unnecessary code.
There is no device firmware that has been available publicly without this support (Yishai Hadas) - Switch over to using the .remove_new callback for vfio-platform in support of the broader transition for a void remove function (Uwe Kleine-König) - Resolve multiple issues in interrupt code for VFIO bus drivers that allow calling eventfd_signal() on a NULL context. This also removes a potential race in INTx setup on certain hardware for vfio-pci, races with various mechanisms to mask INTx, and leaked virqfds in vfio-platform (Alex Williamson) * tag 'vfio-v6.9-rc1' of https://github.com/awilliam/linux-vfio: (29 commits) vfio/fsl-mc: Block calling interrupt handler without trigger vfio/platform: Create persistent IRQ handlers vfio/platform: Disable virqfds on cleanup vfio/pci: Create persistent INTx handler vfio: Introduce interface to flush virqfd inject workqueue vfio/pci: Lock external INTx masking ops vfio/pci: Disable auto-enable of exclusive INTx IRQ vfio/pds: Refactor/simplify reset logic vfio/pds: Make sure migration file isn't accessed after reset vfio/platform: Convert to platform remove callback returning void vfio/mlx5: Enforce PRE_COPY support vfio/mbochs: make mbochs_class constant vfio/mdpy: make mdpy_class constant hisi_acc_vfio_pci: Remove the deferred_reset logic Revert "vfio/type1: Unpin zero pages" vfio/nvgrace-gpu: Convey kvm to map device memory region as noncached vfio: amba: Rename pl330_ids[] to vfio_amba_ids[] vfio/pds: Always clear the save/restore FDs on reset vfio/nvgrace-gpu: Add vfio pci variant module for grace hopper vfio/pci: rename and export range_intersect_range ...
2024-03-08  Merge tag 'mlx5-socket-direct-v3' of ↵  Jakub Kicinski
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== Support Multi-PF netdev (Socket Direct) This series adds support for combining multiple devices (PFs) of the same port under one netdev instance. Passing traffic through different devices belonging to different NUMA sockets saves cross-numa traffic and allows apps running on the same netdev from different numas to still feel a sense of proximity to the device and achieve improved performance. We achieve this by grouping PFs together, and creating the netdev only once all group members are probed. Symmetrically, we destroy the netdev once any of the PFs is removed. The channels are distributed between all devices, a proper configuration would utilize the correct close numa when working on a certain app/cpu. We pick one device to be a primary (leader), and it fills a special role. The other devices (secondaries) are disconnected from the network in the chip level (set to silent mode). All RX/TX traffic is steered through the primary to/from the secondaries. Currently, we limit the support to PFs only, and up to two devices (sockets). 
* tag 'mlx5-socket-direct-v3' of git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux: Documentation: networking: Add description for multi-pf netdev net/mlx5: Enable SD feature net/mlx5e: Block TLS device offload on combined SD netdev net/mlx5e: Support per-mdev queue counter net/mlx5e: Support cross-vhca RSS net/mlx5e: Let channels be SD-aware net/mlx5e: Create EN core HW resources for all secondary devices net/mlx5e: Create single netdev per SD group net/mlx5: SD, Add debugfs net/mlx5: SD, Add informative prints in kernel log net/mlx5: SD, Implement steering for primary and secondaries net/mlx5: SD, Implement devcom communication and primary election net/mlx5: SD, Implement basic query and instantiation net/mlx5: SD, Introduce SD lib net/mlx5: Add MPIR bit in mcam_access_reg ==================== Link: https://lore.kernel.org/r/20240307084229.500776-1-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
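As a rough illustration of the channel distribution described in the Socket Direct series above, the following Python sketch spreads channel indices across the devices of one group. The round-robin policy, the function name `assign_channels`, and the device labels are assumptions made for illustration; the merge message itself only states that the channels are distributed between all devices.

```python
def assign_channels(num_channels, devices):
    """Map each channel index to a device of the group.

    Assumption: a simple round-robin policy (channel i goes to device
    i modulo the number of devices). The names here are hypothetical.
    """
    if not devices:
        raise ValueError("a group needs at least one device")
    return {ch: devices[ch % len(devices)] for ch in range(num_channels)}


if __name__ == "__main__":
    # Two PFs, as in the "up to two devices (sockets)" limit mentioned above.
    for channel, device in assign_channels(4, ["pf0", "pf1"]).items():
        print(channel, device)
```

With such a mapping, an application pinned to a CPU near one socket could pick a channel whose device sits on the same NUMA node.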
2024-03-07  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  Jakub Kicinski
Cross-merge networking fixes after downstream PR. No conflicts. Adjacent changes: net/core/page_pool_user.c 0b11b1c5c320 ("netdev: let netlink core handle -EMSGSIZE errors") 429679dcf7d9 ("page_pool: fix netlink dump stop/resume") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-07  net/mlx5: Add MPIR bit in mcam_access_reg  Tariq Toukan
Add a cap bit in mcam_access_reg to check for MPIR support. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-03-01  net/mlx5: Check capability for fw_reset  Moshe Shemesh
Functions that can't access the MFRL (Management Firmware Reset Level) register have no use for fw_reset structures or events. Remove fw_reset structures allocation and registration for fw reset events notifications for these functions. Having the devlink param enable_remote_dev_reset on functions that don't have this capability is misleading, as these functions are not allowed to influence the reset flow. Hence, this patch removes this parameter for such functions. In addition, return not supported on devlink reload action fw_activate for these functions. Fixes: 38b9f903f22b ("net/mlx5: Handle sync reset request event") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Aya Levin <ayal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-22  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  Jakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: net/ipv4/udp.c f796feabb9f5 ("udp: add local "peek offset enabled" flag") 56667da7399e ("net: implement lockless setsockopt(SO_PEEK_OFF)") Adjacent changes: net/unix/garbage.c aa82ac51d633 ("af_unix: Drop oob_skb ref before purging queue in GC.") 11498715f266 ("af_unix: Remove io_uring code for GC.") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-02-22  net/mlx5: Add the IFC related bits for query tracker  Yishai Hadas
Add the IFC related bits for query tracker. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Reviewed-by: Kevin Tian <kevin.tian@intel.com> Acked-by: Leon Romanovsky <leon@kernel.org> Link: https://lore.kernel.org/r/20240205124828.232701-2-yishaih@nvidia.com Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
2024-02-20  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma  Linus Torvalds
Pull rdma fixes from Jason Gunthorpe: "Mostly irdma and bnxt_re fixes: - Missing error unwind in hfi1 - For bnxt - fix fencing behavior to work on new chips, fail unsupported SRQ resize back to userspace, propagate SRQ FW failure back to userspace. - Correctly fail unsupported SRQ resize back to userspace in bnxt - Adjust a memcpy in mlx5 to not overflow a struct field. - Prevent userspace from triggering mlx5 fw syndrome logging from sysfs - Use the correct access mode for MLX5_IB_METHOD_DEVX_OBJ_MODIFY to avoid a userspace failure on modify - For irdma - Don't UAF a concurrent tasklet during destroy, prevent userspace from issuing invalid QP attrs, fix a possible CQ overflow, capture a missing HW async error event - sendmsg() triggerable memory access crash in hfi1 - Fix the srpt_service_guid parameter to not crash due to missing function pointer - Don't leak objects in error unwind in qedr - Don't weirdly cast function pointers in srpt" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: RDMA/srpt: fix function pointer cast warnings RDMA/qedr: Fix qedr_create_user_qp error flow RDMA/srpt: Support specifying the srpt_service_guid parameter IB/hfi1: Fix sdma.h tx->num_descs off-by-one error RDMA/irdma: Add AE for too many RNRS RDMA/irdma: Set the CQ read threshold for GEN 1 RDMA/irdma: Validate max_send_wr and max_recv_wr RDMA/irdma: Fix KASAN issue with tasklet RDMA/mlx5: Relax DEVX access upon modify commands IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not supported RDMA/mlx5: Fix fortify source warning while accessing Eth segment RDMA/bnxt_re: Add a missing check in bnxt_qplib_query_srq RDMA/bnxt_re: Return error for SRQ resize RDMA/bnxt_re: Fix unconditional fence for newer adapters RDMA/bnxt_re: Remove a redundant check inside bnxt_re_vf_res_config RDMA/bnxt_re: Avoid creating fence MR for newer adapters IB/hfi1: Fix a memleak in init_credit_return
2024-02-05  net/mlx5: Remove initial segmentation duplicate definitions  Gal Pressman
Device definitions belong in mlx5_ifc, remove the duplicates in mlx5_core.h. Signed-off-by: Gal Pressman <gal@nvidia.com> Reviewed-by: Jianbo Liu <jianbol@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-02-01  net/mlx5: DPLL, Implement lock status error value  Jiri Pirko
Fill in the lock status error value properly. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Acked-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Reviewed-by: Simon Horman <horms@kernel.org> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-01-31  IB/mlx5: Don't expose debugfs entries for RRoCE general parameters if not ↵  Mark Zhang
supported debugfs entries for RRoCE general CC parameters must be exposed only when they are supported, otherwise when accessing them there may be a syndrome error in kernel log, for example: $ cat /sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp cat: '/sys/kernel/debug/mlx5/0000:08:00.1/cc_params/rtt_resp_dscp': Invalid argument $ dmesg mlx5_core 0000:08:00.1: mlx5_cmd_out_err:805:(pid 1253): QUERY_CONG_PARAMS(0x824) op_mod(0x0) failed, status bad parameter(0x3), syndrome (0x325a82), err(-22) Fixes: 66fb1d5df6ac ("IB/mlx5: Extend debug control for CC parameters") Reviewed-by: Edward Srouji <edwards@nvidia.com> Signed-off-by: Mark Zhang <markzhang@nvidia.com> Link: https://lore.kernel.org/r/e7ade70bad52b7468bdb1de4d41d5fad70c8b71c.1706433934.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2024-01-24  net/mlx5: Bridge, fix multicast packets sent to uplink  Moshe Shemesh
To enable multicast packets which are offloaded in bridge multicast offload mode to be sent also to uplink, FTE bit uplink_hairpin_en should be set. Add this bit to FTE for the bridge multicast offload rules. Fixes: 18c2916cee12 ("net/mlx5: Bridge, snoop igmp/mld packets") Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-24  net/mlx5: Fix query of sd_group field  Tariq Toukan
The sd_group field moved in the HW spec from the MPIR register to the vport context. Align the query accordingly. Fixes: f5e956329960 ("net/mlx5: Expose Management PCIe Index Register (MPIR)") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2024-01-18  Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost  Linus Torvalds
Pull virtio updates from Michael Tsirkin: - vdpa/mlx5: support for resumable vqs - virtio_scsi: mq_poll support - virtio_pmem: support SHMEM_REGION - virtio_balloon: stay awake while adjusting balloon - virtio: support for no-reset virtio PCI PM - Fixes, cleanups * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: vdpa/mlx5: Add mkey leak detection vdpa/mlx5: Introduce reference counting to mrs vdpa/mlx5: Use vq suspend/resume during .set_map vdpa/mlx5: Mark vq state for modification in hw vq vdpa/mlx5: Mark vq addrs for modification in hw vq vdpa/mlx5: Introduce per vq and device resume vdpa/mlx5: Allow modifying multiple vq fields in one modify command vdpa/mlx5: Expose resumable vq capability vdpa: Block vq property changes in DRIVER_OK vdpa: Track device suspended state scsi: virtio_scsi: Add mq_poll support virtio_pmem: support feature SHMEM_REGION virtio_balloon: stay awake while adjusting balloon vdpa: Remove usage of the deprecated ida_simple_xx() API virtio: Add support for no-reset virtio PCI PM virtio_net: fix missing dma unmap for resize vhost-vdpa: account iommu allocations vdpa: Fix an error handling path in eni_vdpa_probe()
2024-01-12  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma  Linus Torvalds
Pull rdma updates from Jason Gunthorpe: "Small cycle, with some typical driver updates: - General code tidying in siw, hfi1, irdma, usnic, hns rtrs and bnxt_re - Many small siw cleanups without an overarching theme - Debugfs stats for hns - Fix a TX queue timeout in IPoIB and missed locking of the mcast list - Support more features of P7 devices in bnxt_re including a new work submission protocol - CQ interrupts for MANA - netlink stats for erdma - EFA multipath PCI support - Fix incorrect MR invalidation in iser" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (66 commits) RDMA/bnxt_re: Fix error code in bnxt_re_create_cq() RDMA/efa: Add EFA query MR support IB/iser: Prevent invalidating wrong MR RDMA/erdma: Add hardware statistics support RDMA/erdma: Introduce dma pool for hardware responses of CMDQ requests IB/iser: iscsi_iser.h: fix kernel-doc warning and spellos RDMA/mana_ib: Add CQ interrupt support for RAW QP RDMA/mana_ib: query device capabilities RDMA/mana_ib: register RDMA device with GDMA RDMA/bnxt_re: Fix the sparse warnings RDMA/bnxt_re: Fix the offset for GenP7 adapters for user applications RDMA/bnxt_re: Share a page to expose per CQ info with userspace RDMA/bnxt_re: Add UAPI to share a page with user space IB/ipoib: Fix mcast list locking RDMA/mlx5: Expose register c0 for RDMA device net/mlx5: E-Switch, expose eswitch manager vport net/mlx5: Manage ICM type of SW encap RDMA/mlx5: Support handling of SW encap ICM area net/mlx5: Introduce indirect-sw-encap ICM properties RDMA/bnxt_re: Adds MSN table capability for Gen P7 adapters ...
2024-01-10  vdpa/mlx5: Expose resumable vq capability  Dragos Tatulea
Necessary for checking if resumable vqs are supported by the hardware. Actual support will be added in a downstream patch. Reviewed-by: Gal Pressman <gal@nvidia.com> Acked-by: Eugenio Pérez <eperezma@redhat.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Message-Id: <20231225151203.152687-2-dtatulea@nvidia.com> Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2024-01-07  Revert "mlx5 updates 2023-12-20"  Jakub Kicinski
Revert "net/mlx5: Implement management PF Ethernet profile" This reverts commit 22c4640698a1d47606b5a4264a584e8046641784. Revert "net/mlx5: Enable SD feature" This reverts commit c88c49ac9c18fb7c3fa431126de1d8f8f555e912. Revert "net/mlx5e: Block TLS device offload on combined SD netdev" This reverts commit 83a59ce0057b7753d7fbece194b89622c663b2a6. Revert "net/mlx5e: Support per-mdev queue counter" This reverts commit d72baceb92539a178d2610b0e9ceb75706a75b55. Revert "net/mlx5e: Support cross-vhca RSS" This reverts commit c73a3ab8fa6e93a783bd563938d7cf00d62d5d34. Revert "net/mlx5e: Let channels be SD-aware" This reverts commit e4f9686bdee7b4dd89e0ed63cd03606e4bda4ced. Revert "net/mlx5e: Create EN core HW resources for all secondary devices" This reverts commit c4fb94aa822d6c9d05fc3c5aee35c7e339061dc1. Revert "net/mlx5e: Create single netdev per SD group" This reverts commit e2578b4f983cfcd47837bbe3bcdbf5920e50b2ad. Revert "net/mlx5: SD, Add informative prints in kernel log" This reverts commit c82d360325112ccc512fc11a3b68cdcdf04a1478. Revert "net/mlx5: SD, Implement steering for primary and secondaries" This reverts commit 605fcce33b2d1beb0139b6e5913fa0b2062116b2. Revert "net/mlx5: SD, Implement devcom communication and primary election" This reverts commit a45af9a96740873db9a4b5bb493ce2ad81ccb4d5. Revert "net/mlx5: SD, Implement basic query and instantiation" This reverts commit 63b9ce944c0e26c44c42cdd5095c2e9851c1a8ff. Revert "net/mlx5: SD, Introduce SD lib" This reverts commit 4a04a31f49320d078b8078e1da4b0e2faca5dfa3. Revert "net/mlx5: Fix query of sd_group field" This reverts commit e04984a37398b3f4f5a79c993b94c6b1224184cc. Revert "net/mlx5e: Use the correct lag ports number when creating TISes" This reverts commit a7e7b40c4bc115dbf2a2bb453d7bbb2e0ea99703. There are some unanswered questions on the list, and we don't have any docs. Given the lack of replies so far and the fact that v6.8 merge window has started - let's revert this and revisit for v6.9. 
Link: https://lore.kernel.org/all/20231221005721.186607-1-saeed@kernel.org/ Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-20  net/mlx5: Implement management PF Ethernet profile  Armen Ratner
Add management PF modules, which introduce support for the structures needed to create the resources for the MGMT PF to work. Also, add the necessary calls and functions to establish this functionality. Signed-off-by: Armen Ratner <armeng@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Reviewed-by: Daniel Jurgens <danielj@nvidia.com>
2023-12-20  net/mlx5: Fix query of sd_group field  Tariq Toukan
The sd_group field moved in the HW spec from the MPIR register to the vport context. Align the query accordingly. Fixes: f5e956329960 ("net/mlx5: Expose Management PCIe Index Register (MPIR)") Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-15  Merge tag 'mlx5-updates-2023-12-13' of ↵  David S. Miller
git://git.kernel.org/pub/scm/linux/kernel/git/saeed/linux Saeed Mahameed says: ==================== mlx5-updates-2023-12-13 Preparation for mlx5e socket direct feature. Socket direct will allow multiple PF devices attached to different NUMA nodes but sharing the same physical port. The following series is a small refactoring series in preparation to support socket direct in the following submission. Highlights: - Define required device registers and bits related to socket direct - Flow steering re-arrangements - Generalize TX objects (TISs) and store them in a common object, will be useful in the next series for per function object management. - Decouple raw CQ objects from their parent netdev priv - Prepare devcom for Socket Direct device group discovery. Please see the individual patches for more information. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-14  Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net  Jakub Kicinski
Cross-merge networking fixes after downstream PR. Conflicts: drivers/net/ethernet/intel/iavf/iavf_ethtool.c 3a0b5a2929fd ("iavf: Introduce new state machines for flow director") 95260816b489 ("iavf: use iavf_schedule_aq_request() helper") https://lore.kernel.org/all/84e12519-04dc-bd80-bc34-8cf50d7898ce@intel.com/ drivers/net/ethernet/broadcom/bnxt/bnxt.c c13e268c0768 ("bnxt_en: Fix HWTSTAMP_FILTER_ALL packet timestamp logic") c2f8063309da ("bnxt_en: Refactor RX VLAN acceleration logic.") a7445d69809f ("bnxt_en: Add support for new RX and TPA_START completion types for P7") 1c7fd6ee2fe4 ("bnxt_en: Rename some macros for the P5 chips") https://lore.kernel.org/all/20231211110022.27926ad9@canb.auug.org.au/ drivers/net/ethernet/broadcom/bnxt/bnxt_ptp.c bd6781c18cb5 ("bnxt_en: Fix wrong return value check in bnxt_close_nic()") 84793a499578 ("bnxt_en: Skip nic close/open when configuring tstamp filters") https://lore.kernel.org/all/20231214113041.3a0c003c@canb.auug.org.au/ drivers/net/ethernet/mellanox/mlx5/core/fw_reset.c 3d7a3f2612d7 ("net/mlx5: Nack sync reset request when HotPlug is enabled") cecf44ea1a1f ("net/mlx5: Allow sync reset flow when BF MGT interface device is present") https://lore.kernel.org/all/20231211110328.76c925af@canb.auug.org.au/ No adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-12-13  net/mlx5: Expose Management PCIe Index Register (MPIR)  Tariq Toukan
The MPIR register allows querying the PCIe indexes and Socket-Direct related parameters. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-13  net/mlx5: Add mlx5_ifc bits used for supporting single netdev Socket-Direct  Tariq Toukan
Multiple device caps and features are required to support single netdev Socket-Direct. Add them here in preparation for the feature implementation. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-12-12  net/mlx5: Introduce indirect-sw-encap ICM properties  Shun Hao
Add new fields for device memory capabilities, in order to support creation of new ICM memory type of SW encap. Signed-off-by: Shun Hao <shunh@nvidia.com> Link: https://lore.kernel.org/r/107cca7dd6a932a1704abf6ebd1b801105546a8e.1701871118.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-12-04  net/mlx5e: Tidy up IPsec NAT-T SA discovery  Leon Romanovsky
IPsec NAT-T packets are ESP packets encapsulated in UDP. When they arrive on RX, the SPI and ESP are located in the inner header, while the check was performed on the outer header instead. That wrong check caused received rekeying requests to be missed, which led to a rekey timeout that "compensated" for this failure by completing the rekeying. Fixes: d65954934937 ("net/mlx5e: Support IPsec NAT-T functionality") Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-12-04  net/mlx5e: Honor user choice of IPsec replay window size  Leon Romanovsky
Users can configure the IPsec replay window size, but the mlx5 driver didn't honor their choice and always set it to 32 bits. Fix the assignment logic to configure the right size from the beginning. Fixes: 7db21ef4566e ("net/mlx5e: Set IPsec replay sequence numbers") Reviewed-by: Patrisious Haddad <phaddad@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2023-11-15  net/mlx5: Query maximum frequency adjustment of the PTP hardware clock  Rahul Rameshbabu
Some mlx5 devices do not support the default advertised maximum frequency adjustment value for the PTP hardware clock that is set by the driver. These devices need to be queried when initializing the clock functionality in order to get the maximum supported frequency adjustment value. This value can be greater than the minimum supported frequency adjustment across mlx5 devices (50 million ppb). Signed-off-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-11-05  Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost  Linus Torvalds
Pull virtio updates from Michael Tsirkin: "vhost,virtio,vdpa: features, fixes, cleanups. vdpa/mlx5: - VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK - new maintainer vdpa: - support for vq descriptor mappings - decouple reset of iotlb mapping from device reset and fixes, cleanups all over the place" * tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost: (34 commits) vdpa_sim: implement .reset_map support vdpa/mlx5: implement .reset_map driver op vhost-vdpa: clean iotlb map during reset for older userspace vdpa: introduce .compat_reset operation callback vhost-vdpa: introduce IOTLB_PERSIST backend feature bit vhost-vdpa: reset vendor specific mapping to initial state in .release vdpa: introduce .reset_map operation callback virtio_pci: add check for common cfg size virtio-blk: fix implicit overflow on virtio_max_dma_size virtio_pci: add build offset check for the new common cfg items virtio: add definition of VIRTIO_F_NOTIF_CONFIG_DATA feature bit vduse: make vduse_class constant vhost-scsi: Spelling s/preceeding/preceding/g virtio: kdoc for struct virtio_pci_modern_device vdpa: Update sysfs ABI documentation MAINTAINERS: Add myself as mlx5_vdpa driver virtio-balloon: correct the comment of virtballoon_migratepage() mlx5_vdpa: offer VHOST_BACKEND_F_ENABLE_AFTER_DRIVER_OK vdpa/mlx5: Update cvq iotlb mapping on ASID change vdpa/mlx5: Make iotlb helper functions more generic ...
2023-10-13  Merge branch 'mlx5-next' of ↵  Jakub Kicinski
https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux Leon Romanovsky says: ==================== This PR is collected from https://lore.kernel.org/all/cover.1695296682.git.leon@kernel.org This series from Patrisious extends mlx5 to support IPsec packet offload in multiport devices (MPV, see [1] for more details). These devices have single flow steering logic and two netdev interfaces, which require extra logic to manage IPsec configurations as they are performed on netdevs. [1] https://lore.kernel.org/linux-rdma/20180104152544.28919-1-leon@kernel.org/ * 'mlx5-next' of https://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux: net/mlx5: Handle IPsec steering upon master unbind/bind net/mlx5: Configure IPsec steering for ingress RoCEv2 MPV traffic net/mlx5: Configure IPsec steering for egress RoCEv2 MPV traffic net/mlx5: Add create alias flow table function to ipsec roce net/mlx5: Implement alias object allow and create functions net/mlx5: Add alias flow table bits net/mlx5: Store devcom pointer inside IPsec RoCE net/mlx5: Register mlx5e priv to devcom in MPV mode RDMA/mlx5: Send events from IB driver about device affiliation state net/mlx5: Introduce ifc bits for migration in a chunk mode ==================== Link: https://lore.kernel.org/r/20231002083832.19746-1-leon@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-10-02  vdpa/mlx5: Expose descriptor group mkey hw capability  Dragos Tatulea
Necessary for improved live migration flow. Actual support will be added in a downstream patch. Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Dragos Tatulea <dtatulea@nvidia.com> Link: https://lore.kernel.org/r/20230928164550.980832-3-dtatulea@nvidia.com Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-10-02  net/mlx5: Add alias flow table bits  Patrisious Haddad
Add all the capabilities needed to check for alias object support. As well as all the fields or commands needed for its creation and the creation of flow table that is able to jump to an alias object. Signed-off-by: Patrisious Haddad <phaddad@nvidia.com> Reviewed-by: Mark Bloch <mbloch@nvidia.com> Link: https://lore.kernel.org/r/544c030f2a78c4adf3fe6b64f97a39cc1bbdabb9.1695296682.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-28  net/mlx5: Introduce ifc bits for migration in a chunk mode  Yishai Hadas
Introduce ifc related stuff to enable migration in a chunk mode. Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Link: https://lore.kernel.org/r/20230911093856.81910-2-yishaih@nvidia.com Reviewed-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2023-09-19  net/mlx5: Add a health error syndrome for pci data poisoned  Moshe Shemesh
Add new health error syndrome to indicate that pci data poisoned error has been received while fetching device ICM data. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-09-17  mlx5: Implement SyncE support using DPLL infrastructure  Jiri Pirko
Implement SyncE support using the newly introduced DPLL support. Make sure that each PF/VF/SF probed with the appropriate capability will spawn a dpll auxiliary device and register the appropriate dpll device and pin instances. Signed-off-by: Jiri Pirko <jiri@nvidia.com> Signed-off-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com> Signed-off-by: Vadim Fedorenko <vadim.fedorenko@linux.dev> Signed-off-by: David S. Miller <davem@davemloft.net>
2023-08-27  net/mlx5: Add IFC bits to support IPsec enable/disable  Leon Romanovsky
Add hardware definitions that allow controlling IPsec capabilities. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Link: https://lore.kernel.org/r/20230825062836.103744-6-saeed@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2023-08-14  net/mlx5: Remove unused CAPs  Shay Drory
The mlx5 driver queries the device for the VECTOR_CALC and SHAMPO caps, but no user requires them. Likewise, MLX5_MCAM_REGS_0x9080_0x90FF is queried but not used. Thus, drop all usages and definitions of the caps mentioned above. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maher Sanalla <msanalla@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-14  net/mlx5: Check with FW that sync reset completed successfully  Moshe Shemesh
Even if the PF driver had no error in its part of the sync reset flow, the firmware can see the wider picture, as it syncs all the PFs in the flow. So, at the end of the sync reset flow, check with the firmware, by reading the MFRL register and the initialization segment, that the flow had no issue from the firmware's point of view either. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-08-09  net/mlx5: Expose NIC temperature via hardware monitoring kernel API  Adham Faris
Expose NIC temperature by implementing the hwmon kernel API, which makes the current thermal zone kernel API redundant.

For each of the supported and exposed thermal diode sensors, expose the following attributes:
1) Input temperature.
2) Highest temperature.
3) Temperature label: depends on the firmware capability; if the firmware doesn't support sensor naming, the fallback naming convention is "sensorX", where X is the HW spec (MTMP register) sensor index.
4) Temperature critical max value: refers to the high threshold of the Warning Event. It is exposed as the `tempY_crit` hwmon attribute (RO attribute). For example, for ConnectX5 HCAs this temperature value is 105 Celsius, 10 degrees lower than the HW shutdown temperature.
5) Temperature reset history: resets the highest temperature.

For example, a dual-port ConnectX5 NIC with a single IC thermal diode sensor will have 2 hwmon directories (one for each PCI function) under "/sys/class/hwmon/hwmon[X,Y]".

Listing one of the directories above (hwmonX/Y) generates the corresponding output below:

$ grep -H -d skip . /sys/class/hwmon/hwmon0/*

Output
=======================================================================
/sys/class/hwmon/hwmon0/name:mlx5
/sys/class/hwmon/hwmon0/temp1_crit:105000
/sys/class/hwmon/hwmon0/temp1_highest:48000
/sys/class/hwmon/hwmon0/temp1_input:46000
/sys/class/hwmon/hwmon0/temp1_label:asic
grep: /sys/class/hwmon/hwmon0/temp1_reset_history: Permission denied

In addition, displaying the sensors data via lm_sensors generates the corresponding output below:

$ sensors

Output
=======================================================================
mlx5-pci-0800
Adapter: PCI adapter
asic: +46.0°C (crit = +105.0°C, highest = +48.0°C)

mlx5-pci-0801
Adapter: PCI adapter
asic: +46.0°C (crit = +105.0°C, highest = +48.0°C)

CC: Jean Delvare <jdelvare@suse.com>
Signed-off-by: Adham Faris <afaris@nvidia.com>
Reviewed-by: Tariq Toukan <tariqt@nvidia.com>
Reviewed-by: Gal Pressman <gal@nvidia.com>
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20230807180507.22984-3-saeed@kernel.org
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
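The hwmon attributes quoted in the commit above report temperatures in millidegrees Celsius (46000 corresponds to the +46.0°C shown by `sensors`). A minimal Python sketch of reading such `grep -H` style output follows; the helper names `parse_hwmon_lines` and `millidegrees_to_celsius` are hypothetical, and the sample lines are copied from the commit message rather than read from a live system.

```python
from pathlib import PurePosixPath

# Sample lines taken from the commit message above, not from a live system.
SAMPLE = [
    "/sys/class/hwmon/hwmon0/name:mlx5",
    "/sys/class/hwmon/hwmon0/temp1_crit:105000",
    "/sys/class/hwmon/hwmon0/temp1_highest:48000",
    "/sys/class/hwmon/hwmon0/temp1_input:46000",
    "/sys/class/hwmon/hwmon0/temp1_label:asic",
    "grep: /sys/class/hwmon/hwmon0/temp1_reset_history: Permission denied",
]


def parse_hwmon_lines(lines):
    """Return {attribute name: raw string value} for `path:value` lines."""
    attrs = {}
    for line in lines:
        path, sep, value = line.partition(":")
        if not sep or not path.startswith("/sys/class/hwmon/"):
            continue  # skip noise such as grep's "Permission denied" message
        attrs[PurePosixPath(path).name] = value.strip()
    return attrs


def millidegrees_to_celsius(raw):
    """Convert a hwmon temperature string (millidegrees C) to degrees C."""
    return int(raw) / 1000.0


if __name__ == "__main__":
    attrs = parse_hwmon_lines(SAMPLE)
    print(attrs["temp1_label"], millidegrees_to_celsius(attrs["temp1_input"]))
```

The write-only `temp1_reset_history` attribute is skipped here because grep could not read it.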
2023-07-25net/mlx5: Add relevant capabilities bits to support NAT-TLeon Romanovsky
Provide an ability to check if flow steering supports UDP encapsulation and decapsulation of IPsec ESP packets. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-06-23net/mlx5: Fix reserved at offset in hca_cap registerLama Kayal
A member of struct mlx5_ifc_cmd_hca_cap_bits has been mistakenly assigned the wrong reserved_at offset value. Correct it to align to the right value, thus avoid future miscalculation. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
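To illustrate why a wrong `reserved_at_` name invites miscalculation: in this header's convention each field is declared as a `u8` array whose length is the field width in bits, so `offsetof()` yields the bit offset and the suffix of a `reserved_at_N` member is expected to equal it. The sketch below uses a made-up structure (`example_ifc_bits` and its fields are hypothetical, not a real firmware layout) and checks the names at compile time.

```c
#include <assert.h>
#include <stddef.h>

typedef unsigned char u8;

/*
 * Hypothetical layout for illustration only. Each array element
 * stands for one bit, so the array length is the field width in bits.
 */
struct example_ifc_bits {
	u8 feature_a[0x1];
	u8 feature_b[0x1];
	u8 reserved_at_2[0x6];
	u8 field_c[0x8];
	u8 reserved_at_10[0x10];
};

/* Bit offset and bit size of a field, derived from the array layout. */
#define EX_BIT_OFF(fld) offsetof(struct example_ifc_bits, fld)
#define EX_BIT_SZ(fld)  sizeof(((struct example_ifc_bits *)0)->fld)

/* The name of each reserved field must match its real bit offset. */
_Static_assert(EX_BIT_OFF(reserved_at_2) == 0x2,
	       "reserved_at_2 is not at bit offset 0x2");
_Static_assert(EX_BIT_OFF(reserved_at_10) == 0x10,
	       "reserved_at_10 is not at bit offset 0x10");
_Static_assert(sizeof(struct example_ifc_bits) == 0x20,
	       "structure is expected to span 32 bits");
```

If a field were inserted before `reserved_at_10` without renaming it, the second assertion would fail at build time. That catches the same kind of mismatch this commit corrects by hand.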
2023-06-16net/mlx5: Expose bits for local loopback counterOr Har-Toov
Add needed HW bits for querying local loopback counter and the HCA capability for it. Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Avihai Horon <avihaih@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-16net/mlx5: Handle sync reset unload eventMoshe Shemesh
Added a new event handler to firmware sync reset, which is used to support firmware sync reset flow on smart NIC. Adding this new stage to the flow enables the firmware to ensure host PFs unload before ECPFs unload, to avoid race of PFs recovery. If firmware sends sync_reset_unload event to driver the driver should unload and close all HW resources of the function. Once the driver finishes unloading part, it can't get any more events from firmware as event queues are closed, so it polls the reset state field to know when to continue to next stage of the sync reset flow. Added capability bit for supporting sync_reset_unload event. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
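The polling step described above can be sketched as a bounded loop. This is an illustration under stated assumptions, not the driver's code: `poll_reset_state`, the `reset_state_reader` callback type, the `fake_reader` state source, and the use of a poll count in place of a real timeout are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback that returns the current reset state value. */
typedef int (*reset_state_reader)(void *ctx);

enum poll_result { POLL_OK = 0, POLL_TIMEOUT = -1 };

/*
 * Poll until the reset state equals `wanted`, giving up after
 * `max_polls` reads. With event queues closed, polling is the only
 * way left to learn that the next stage may begin.
 */
static int poll_reset_state(reset_state_reader read_state, void *ctx,
			    int wanted, unsigned int max_polls)
{
	unsigned int i;

	assert(read_state != NULL);

	for (i = 0; i < max_polls; i++) {
		if (read_state(ctx) == wanted)
			return POLL_OK;
		/* A real implementation would sleep between reads. */
	}

	return POLL_TIMEOUT;
}

/*
 * Fake state source for illustration: reports state 0 until the
 * counter behind ctx reaches zero, then reports state 1.
 */
static int fake_reader(void *ctx)
{
	int *remaining = ctx;

	if (*remaining > 0) {
		(*remaining)--;
		return 0;
	}
	return 1;
}
```

Bounding the loop matters because the function has already closed its hardware resources; if the expected state never appears, the caller needs a defined failure path rather than an endless wait.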
2023-06-16net/mlx5: Expose timeout for sync reset unload stageMoshe Shemesh
Expose a new timeout in the Default Timeouts Register to be used in the sync reset flow running on a smart NIC. In this flow the driver needs to know how long to wait from receiving the unload request until the firmware asks the PF to continue to the next stage of the flow. Signed-off-by: Moshe Shemesh <moshe@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-09net/mlx5: mlx5_ifc updates for embedded CPU SRIOVDaniel Jurgens
Add ec_vf_vport_base to HCA Capabilities 2. This indicates the base vport of embedded CPU virtual functions that are connected to the eswitch. Add ec_vf_function to query/set_hca_caps. If set this indicates accessing a virtual function on the embedded CPU by function ID. This should only be used with other_function set to 1. Signed-off-by: Daniel Jurgens <danielj@nvidia.com> Reviewed-by: Bodong Wang <bodong@nvidia.com> Reviewed-by: William Tu <witu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-06-07net/mlx5e: Expose catastrophic steering error countersLama Kayal
Add generated_pkt_steering_fail and handled_pkt_steering_fail to the devlink health reporter. generated_pkt_steering_fail indicates the number of packets dropped due to an illegal steering operation within the vport steering domain. handled_pkt_steering_fail indicates the number of packets dropped due to an illegal steering operation originated by the vport. Also, update the devlink reporter functionality documentation with the newly exposed counters. Signed-off-by: Lama Kayal <lkayal@nvidia.com> Reviewed-by: Rahul Rameshbabu <rrameshbabu@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-05-22net/mlx5: DR, Check force-loopback RC QP capability independently from RoCEYevgeny Kliteynik
SW Steering uses RC QP for writing STEs to ICM. This writing is done in LB (loopback), and FL (force-loopback) QP is preferred for performance. FL is available when RoCE is enabled or disabled based on RoCE caps. This patch adds reading of FL capability from HCA caps in addition to the existing reading from RoCE caps, thus fixing the case where we didn't have loopback enabled when RoCE was disabled. Fixes: 7304d603a57a ("net/mlx5: DR, Add support for force-loopback QP") Signed-off-by: Itamar Gozlan <igozlan@nvidia.com> Signed-off-by: Yevgeny Kliteynik <kliteyn@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-29Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "Usual wide collection of unrelated items in drivers: - Driver bug fixes and treewide cleanups in hfi1, siw, qib, mlx5, rxe, usnic, bnxt_re, ocrdma, iser: - remove unnecessary NULL checks - kmap obsolescence - pci_enable_pcie_error_reporting() obsolescence - unused variables and macros - trace event related warnings - casting warnings - Code cleanups for irdma and erdma - EFA reporting of 128 byte PCIe TLP support - mlx5 more aggressively uses the out of order HW feature - Big rework of how state machines and tasks work in rxe - Fix a syzkaller found crash netdev refcount leak in siw - bnxt_re revises their HW description header - Congestion control for bnxt_re - Use mmu_notifiers more safely in hfi1 - mlx5 gets better support for PCIe relaxed ordering inside VMs" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (81 commits) RDMA/efa: Add rdma write capability to device caps RDMA/mlx5: Use correct device num_ports when modify DC RDMA/irdma: Drop spurious WQ_UNBOUND from alloc_ordered_workqueue() call RDMA/rxe: Fix spinlock recursion deadlock on requester RDMA/mlx5: Fix flow counter query via DEVX RDMA/rxe: Protect QP state with qp->state_lock RDMA/rxe: Move code to check if drained to subroutine RDMA/rxe: Remove qp->req.state RDMA/rxe: Remove qp->comp.state RDMA/rxe: Remove qp->resp.state RDMA/mlx5: Allow relaxed ordering read in VFs and VMs net/mlx5: Update relaxed ordering read HCA capabilities RDMA/mlx5: Check pcie_relaxed_ordering_enabled() in UMR RDMA/mlx5: Remove pcie_relaxed_ordering_enabled() check for RO write RDMA: Add ib_virt_dma_to_page() RDMA/rxe: Fix the error "trying to register non-static key in rxe_cleanup_task" RDMA/irdma: Slightly optimize irdma_form_ah_cm_frame() RDMA/rxe: Fix incorrect TASKLET_STATE_SCHED check in rxe_task.c IB/hfi1: Place struct mmu_rb_handler on cache line start IB/hfi1: Fix bugs with non-PAGE_SIZE-end multi-iovec user SDMA requests ...
2023-04-20net/mlx5: Update op_mode to op_mod for port selectionRoi Dayan
To be consistent with the other enum keys use OP_MOD instead of OP_MODE. Signed-off-by: Roi Dayan <roid@nvidia.com> Reviewed-by: Maor Dickman <maord@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2023-04-18RDMA/mlx5: Fix flow counter query via DEVXMark Bloch
The commit cited in the "Fixes" tag added bulk support for flow counters, but it didn't account for the fact that it is also possible to query a counter using a non-base id if the counter was allocated as bulk. When a user performs a query, validate that the flow counter id given in the mailbox is inside the valid range, taking the bulk value into account. Fixes: 208d70f562e5 ("IB/mlx5: Support flow counters offset for bulk counters") Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Link: https://lore.kernel.org/r/79d7fbe291690128e44672418934256254d93115.1681377114.git.leon@kernel.org Signed-off-by: Leon Romanovsky <leon@kernel.org>
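The range validation described above can be sketched as follows. This is a simplified illustration, not the code from the patch: the function name `counter_id_in_bulk`, its parameters, and the choice to treat a bulk length of 0 as a single counter are assumptions made for the example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical check: a queried id is valid if it falls inside
 * [base_id, base_id + bulk_len). A bulk_len of 0 is treated here as a
 * single, non-bulk counter (an assumption of this sketch).
 */
static bool counter_id_in_bulk(uint32_t base_id, uint32_t bulk_len,
			       uint32_t query_id)
{
	if (bulk_len == 0)
		bulk_len = 1;

	/* Subtract instead of adding so base_id + bulk_len cannot wrap. */
	return query_id >= base_id && query_id - base_id < bulk_len;
}

/* Usage examples: ids 100..103 belong to a bulk of 4 based at 100. */
void counter_range_examples(void)
{
	assert(counter_id_in_bulk(100, 4, 100));
	assert(counter_id_in_bulk(100, 4, 103));
	assert(!counter_id_in_bulk(100, 4, 104));
	assert(!counter_id_in_bulk(100, 4, 99));
}
```

Comparing only against the base id would reject id 103 in this example, which is the kind of non-base query the commit message says was not accounted for.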
2023-04-17net/mlx5e: Add IPsec packet offload tunnel bitsLeon Romanovsky
Extend packet reformat types and flow table capabilities with IPsec packet offload tunnel bits. Reviewed-by: Simon Horman <simon.horman@corigine.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>