path: root/drivers/infiniband/hw/mlx5/odp.c
Age  Commit message  Author
2023-02-17  Merge mlx5-next into rdma.git for-next  (Jason Gunthorpe)
Synchronize the shared mlx5 branch with net:

 - From Jiri: fix a deadlock in mlx5_ib's netdev notifier unregister.
 - From Mark and Patrisious: add IPsec RoCEv2 support.
 - From Or: Rely on firmware to get special mkeys.

* branch mlx5-next:
  RDMA/mlx5: Use query_special_contexts for mkeys
  net/mlx5e: Use query_special_contexts for mkeys
  net/mlx5: Change define name for 0x100 lkey value
  net/mlx5: Expose bits for querying special mkeys
  net/mlx5: Configure IPsec steering for egress RoCEv2 traffic
  net/mlx5: Configure IPsec steering for ingress RoCEv2 traffic
  net/mlx5: Add IPSec priorities in RDMA namespaces
  net/mlx5: Implement new destination type TABLE_TYPE
  net/mlx5: Introduce new destination type TABLE_TYPE
  RDMA/mlx5: Track netdev to avoid deadlock during netdev notifier unregister
  net/mlx5e: Propagate an internal event in case uplink netdev changes
  net/mlx5e: Fix trap event handling

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17  RDMA/mlx5: Use query_special_contexts for mkeys  (Or Har-Toov)
Use query_special_contexts to get the correct value of mkeys such as null_mkey, terminate_scatter_list_mkey and dump_fill_mkey, as FW will change them in certain configurations. Link: https://lore.kernel.org/r/000236f0a9487d48809f87bcc3620a3964b2d3d3.1673960981.git.leon@kernel.org Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-17  net/mlx5: Change define name for 0x100 lkey value  (Or Har-Toov)
Change define of 0x100 lkey value from MLX5_INVALID_LKEY to be MLX5_TERMINATE_SCATTER_LIST_LKEY as 0x100 is the value of terminate_scatter_list_mkey. Link: https://lore.kernel.org/r/3a116dc3fbae4cb6b76a63d27d418830b06ade0c.1673960981.git.leon@kernel.org Signed-off-by: Or Har-Toov <ohartoov@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27  RDMA/mlx5: Add work to remove temporary entries from the cache  (Michael Guralnik)
The non-cache mkeys are stored in the cache only to shorten application restart time. Don't store them longer than needed. Configure cache entries that store non-cache MRs as temporary entries. If 30 seconds have passed and no user has reclaimed the temporarily cached mkeys, asynchronous work will destroy the mkey entries. Link: https://lore.kernel.org/r/20230125222807.6921-7-michaelgur@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
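A minimal sketch of the "30 seconds, then asynchronous cleanup" idea described above, using the kernel's delayed-work API; the entry structure and helper names are hypothetical stand-ins, not the driver's actual symbols.

    #include <linux/workqueue.h>
    #include <linux/jiffies.h>

    /* Hypothetical cache entry: mkeys parked here stay reclaimable for 30s. */
    struct tmp_cache_ent {
            struct delayed_work cleanup_work;
            /* ... list of temporarily cached mkeys ... */
    };

    static void tmp_ent_cleanup(struct work_struct *work)
    {
            struct tmp_cache_ent *ent =
                    container_of(to_delayed_work(work), struct tmp_cache_ent,
                                 cleanup_work);

            /* Destroy whatever mkeys nobody reclaimed within the grace period. */
            (void)ent;
    }

    static void tmp_ent_init(struct tmp_cache_ent *ent)
    {
            INIT_DELAYED_WORK(&ent->cleanup_work, tmp_ent_cleanup);
    }

    static void tmp_ent_park_mkey(struct tmp_cache_ent *ent)
    {
            /* Re-arm a 30 second grace period each time an mkey is parked. */
            mod_delayed_work(system_wq, &ent->cleanup_work,
                             msecs_to_jiffies(30 * 1000));
    }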
2023-01-27  RDMA/mlx5: Introduce mlx5r_cache_rb_key  (Michael Guralnik)
Switch from using the mkey order to using the new struct as the key to the RB tree of cache entries. The key is all the mkey properties that UMR operations can't modify. This key is used to define the cache entries and to search for and create cache mkeys. Link: https://lore.kernel.org/r/20230125222807.6921-5-michaelgur@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27  RDMA/mlx5: Change the cache structure to an RB-tree  (Michael Guralnik)
Currently, the cache structure is a static linear array. Therefore, its size is limited to the number of entries in it and is not expandable. The entries are dedicated to mkeys of size 2^x and no access_flags. Mkeys with different properties are not cacheable. In this patch, we change the cache structure to an RB-tree. This allows extending the cache to support more entries with different mkey properties. Link: https://lore.kernel.org/r/20230125222807.6921-4-michaelgur@nvidia.com Signed-off-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
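As an illustration of an mkey cache keyed by immutable properties, here is a minimal sketch using the kernel rb-tree API; the key struct, its fields and the entry layout are assumptions for the example, not the driver's actual mlx5r_cache_rb_key.

    #include <linux/rbtree.h>
    #include <linux/types.h>

    /* Hypothetical key: the mkey properties UMR cannot modify. */
    struct cache_rb_key {
            u8  access_mode;
            u32 access_flags;
            unsigned int ndescs;
    };

    struct cache_ent {
            struct rb_node node;
            struct cache_rb_key key;
            /* ... cached mkeys for this property set ... */
    };

    static int key_cmp(const struct cache_rb_key *a, const struct cache_rb_key *b)
    {
            if (a->access_mode != b->access_mode)
                    return a->access_mode < b->access_mode ? -1 : 1;
            if (a->access_flags != b->access_flags)
                    return a->access_flags < b->access_flags ? -1 : 1;
            if (a->ndescs != b->ndescs)
                    return a->ndescs < b->ndescs ? -1 : 1;
            return 0;
    }

    static struct cache_ent *cache_find(struct rb_root *root,
                                        const struct cache_rb_key *key)
    {
            struct rb_node *n = root->rb_node;

            while (n) {
                    struct cache_ent *ent = rb_entry(n, struct cache_ent, node);
                    int cmp = key_cmp(key, &ent->key);

                    if (cmp < 0)
                            n = n->rb_left;
                    else if (cmp > 0)
                            n = n->rb_right;
                    else
                            return ent;
            }
            return NULL;
    }

    static void cache_insert(struct rb_root *root, struct cache_ent *new)
    {
            struct rb_node **link = &root->rb_node, *parent = NULL;

            while (*link) {
                    struct cache_ent *ent = rb_entry(*link, struct cache_ent, node);

                    parent = *link;
                    if (key_cmp(&new->key, &ent->key) < 0)
                            link = &(*link)->rb_left;
                    else
                            link = &(*link)->rb_right;
            }
            rb_link_node(&new->node, parent, link);
            rb_insert_color(&new->node, root);
    }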
2023-01-27  RDMA/mlx5: Remove implicit ODP cache entry  (Aharon Landau)
Implicit ODP mkey doesn't have unique properties. It shares the same properties as the order 18 cache entry. There is no need to devote a special entry for that. Link: https://lore.kernel.org/r/20230125222807.6921-3-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27  RDMA/mlx5: Don't keep umrable 'page_shift' in cache entries  (Aharon Landau)
mkc.log_page_size can be changed using UMR. Therefore, don't treat it as a cache entry property, and remove it from struct mlx5_cache_ent. All cache mkeys will be created with the default PAGE_SHIFT and updated with the needed page_shift using UMR when passing them to a user. Link: https://lore.kernel.org/r/20230125222807.6921-2-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-29  net/mlx5: Generalize name of UMR alignment definition  (Tariq Toukan)
Per the device spec, MLX5_UMR_MTT_ALIGNMENT is good not only for UMR MTT entries, but for all other entries as well, like KLMs and KSMs. Signed-off-by: Tariq Toukan <tariqt@nvidia.com> Reviewed-by: Gal Pressman <gal@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2022-08-23  IB/mlx5: Remove duplicate header inclusion related to ODP  (Daisuke Matsuda)
rdma/ib_umem.h and rdma/ib_verbs.h are included by rdma/ib_umem_odp.h. This patch removes the redundant entries. Link: https://lore.kernel.org/r/20220823025131.862811-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-16  RDMA/mlx5: Don't compare mkey tags in DEVX indirect mkey  (Aharon Landau)
According to the IB spec: "If the CI supports the Base Memory Management Extensions defined in this specification, the L_Key format must consist of: 24 bit index in the most significant bits of the R_Key, and 8 bit key in the least significant bits of the R_Key. Through a successful Allocate L_Key verb invocation, the CI must let the consumer own the key portion of the returned R_Key." Therefore, when creating a mkey using DEVX, the consumer is allowed to change the key part. The kernel should compare only the index part of a R_Key to determine equality with another R_Key. A capability is added in order not to break backward compatibility. Fixes: 534fd7aac56a ("IB/mlx5: Manage indirection mkey upon DEVX flow for ODP") Link: https://lore.kernel.org/r/3d669aacea85a3a15c3b3b953b3eaba3f80ef9be.1659255945.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
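A minimal illustration of comparing only the index part of an R_Key/L_Key, as described above; the helpers are hypothetical, and the 24/8 bit split follows the spec text quoted in the commit.

    #include <linux/types.h>

    /* Per the quoted spec text: the upper 24 bits of an R_Key/L_Key are the
     * index, the lower 8 bits are the consumer-owned key (tag). */
    static inline u32 mkey_index(u32 rkey)
    {
            return rkey >> 8;
    }

    /* Equality that survives the consumer rewriting the 8-bit key part. */
    static inline bool mkey_index_equal(u32 a, u32 b)
    {
            return mkey_index(a) == mkey_index(b);
    }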
2022-07-27  RDMA/mlx5: Rename the mkey cache variables and functions  (Aharon Landau)
After replacing the MR cache with an Mkey cache, rename the variables and functions to fit the new meaning. Link: https://lore.kernel.org/r/20220726071911.122765-6-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-19  RDMA/mlx5: Remove duplicate pointer assignment in mlx5_ib_alloc_implicit_mr()  (Daisuke Matsuda)
The pointer imr->umem is assigned twice. Fix this by removing the redundant one. Link: https://lore.kernel.org/r/20220518044914.1903125-1-matsuda-daisuke@fujitsu.com Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25  RDMA/mlx5: Use mlx5_umr_post_send_wait() to update xlt  (Aharon Landau)
Move mlx5_ib_update_xlt logic to umr.c, and use mlx5_umr_post_send_wait() instead of mlx5_ib_post_send_wait(). Since it is the last use of mlx5_ib_post_send_wait(), remove it. Link: https://lore.kernel.org/r/55a4972f156aba3592a2fc9bcb33e2059acf295f.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25  RDMA/mlx5: Use mlx5_umr_post_send_wait() to update MR pas  (Aharon Landau)
Move mlx5_ib_update_mr_pas logic to umr.c, and use mlx5_umr_post_send_wait() instead of mlx5_ib_post_send_wait(). Link: https://lore.kernel.org/r/ed8f2ee6c64804072155d727149abf7105f92536.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-04-25  RDMA/mlx5: Move umr checks to umr.h  (Aharon Landau)
Move mlx5_ib_can_load_pas_with_umr() and mlx5_ib_can_reconfig_with_umr() to umr.h and rename them accordingly. Link: https://lore.kernel.org/r/1b799b0142534a63dfd5bacc5f8ad2256d7777ad.1649747695.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Michael Guralnik <michaelgur@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-02-23  RDMA/mlx5: Store ndescs instead of the translation table size  (Aharon Landau)
Currently, ent->xlt stores the translation table size. This data should not be stored in the cache entry but be written directly to the mailbox. Store ndescs instead, and deduce the translation table size from it according to the access mode. Link: https://lore.kernel.org/r/e9dbfaa1f279793a6bd28ee5a31cb4f0f0d70f05.1644947594.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
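A small sketch of deducing the translation table size from ndescs and the access mode, as described above; the access modes and descriptor sizes below are illustrative assumptions, not the device's actual values.

    #include <linux/types.h>

    /* Illustrative access modes; a direct translation (MTT-like) entry is
     * assumed to be 8 bytes and an indirect (KLM-like) entry 16 bytes. */
    enum xlt_access_mode { XLT_MODE_MTT, XLT_MODE_KLM };

    static size_t xlt_size_bytes(enum xlt_access_mode mode, size_t ndescs)
    {
            size_t desc_size = (mode == XLT_MODE_MTT) ? 8 : 16;

            /* Written directly to the mailbox rather than stored in the
             * cache entry, which only keeps ndescs. */
            return ndescs * desc_size;
    }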
2022-02-23  RDMA/mlx5: Merge similar flows of allocating MR from the cache  (Aharon Landau)
When allocating a MR from the cache, the driver calls get_cache_mr(), and in case of failure, retries with create_cache_mr(). This is the flow of mlx5_mr_cache_alloc(), so use it instead. Link: https://lore.kernel.org/r/53c85fcd4de6ec9de0b8e6cbb1bf5d5fe19900c3.1644947594.git.leonro@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-06  net/mlx5: Introduce API for bulk request and release of IRQs  (Shay Drory)
Currently, IRQs are requested one by one. With such a scheme, balancing the spread of IRQs among CPUs requires remembering the CPU mask of the CPUs used for a given device. This complicates the IRQ allocation scheme in a subsequent patch. Hence, prepare the code for bulk IRQ allocation. This enables spreading IRQs among CPUs in a subsequent patch. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-11-03  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma  (Linus Torvalds)
Pull rdma updates from Jason Gunthorpe:
"A typical collection of patches this cycle, mostly fixing with a few new features:

 - Fixes from static tools: clang warnings, dead code, unused variable, coccinelle sweeps, etc
 - Driver bug fixes and minor improvements in rxe, bnxt_re, hfi1, mlx5, irdma, qedr
 - rtrs ULP bug fixes and improvements
 - Additional counters for bnxt_re
 - Support verbs CQ notifications in EFA
 - Continued reworking and fixing of rxe
 - netlink control to enable/disable optional device counters
 - rxe now can use AH objects for its UD path, fixing various bugs in the process
 - Add DMABUF support to EFA"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (103 commits)
  RDMA/core: Require the driver to set the IOVA correctly during rereg_mr
  RDMA/bnxt_re: Remove unsupported bnxt_re_modify_ah callback
  RDMA/irdma: optimize rx path by removing unnecessary copy
  RDMA/qed: Use helper function to set GUIDs
  RDMA/hns: Use the core code to manage the fixed mmap entries
  IB/opa_vnic: Rebranding of OPA VNIC driver to Cornelis Networks
  IB/qib: Rebranding of qib driver to Cornelis Networks
  IB/hfi1: Rebranding of hfi1 driver to Cornelis Networks
  RDMA/bnxt_re: Use helper function to set GUIDs
  RDMA/bnxt_re: Fix kernel panic when trying to access bnxt_re_stat_descs
  RDMA/qedr: Fix NULL deref for query_qp on the GSI QP
  RDMA/hns: Modify the value of MAX_LP_MSG_LEN to meet hardware compatibility
  RDMA/hns: Fix initial arm_st of CQ
  RDMA/rxe: Make rxe_type_info static const
  RDMA/rxe: Use 'bitmap_zalloc()' when applicable
  RDMA/rxe: Save a few bytes from struct rxe_pool
  RDMA/irdma: Remove the unused variable local_qp
  RDMA/core: Fix missed initialization of rdma_hw_stats::lock
  RDMA/efa: Add support for dmabuf memory regions
  RDMA/umem: Allow pinned dmabuf umem usage
  ...
2021-10-27  Merge branch 'mlx5-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux into net-next  (Saeed Mahameed)
Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-19  Merge branch 'mlx5_mkey' into rdma.git for-next  (Jason Gunthorpe)
A small series to clean up the mlx5 mkey code across mlx5_core and InfiniBand.

* branch 'mlx5_mkey':
  RDMA/mlx5: Attach ndescs to mlx5_ib_mkey
  RDMA/mlx5: Move struct mlx5_core_mkey to mlx5_ib
  RDMA/mlx5: Replace struct mlx5_core_mkey by u32 key
  RDMA/mlx5: Remove pd from struct mlx5_core_mkey
  RDMA/mlx5: Remove size from struct mlx5_core_mkey
  RDMA/mlx5: Remove iova from struct mlx5_core_mkey

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-10-19  RDMA/mlx5: Attach ndescs to mlx5_ib_mkey  (Aharon Landau)
Generalize the use of ndescs by adding it to mlx5_ib_mkey. Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19  RDMA/mlx5: Move struct mlx5_core_mkey to mlx5_ib  (Aharon Landau)
Move mlx5_core_mkey struct to mlx5_ib, as the mlx5_core doesn't use it at this point. Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19  RDMA/mlx5: Replace struct mlx5_core_mkey by u32 key  (Aharon Landau)
In mlx5_core and vdpa there is no use of mlx5_core_mkey members except for the key itself. As preparation for moving mlx5_core_mkey to mlx5_ib, the occurrences of struct mlx5_core_mkey in all modules except for mlx5_ib are replaced by a u32 key. Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-19  RDMA/mlx5: Remove iova from struct mlx5_core_mkey  (Aharon Landau)
iova is already stored in ibmr->iova, no need to store it here. Signed-off-by: Aharon Landau <aharonl@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
2021-10-04  net/mlx5: Shift control IRQ to the last index  (Shay Drory)
Control IRQ is the first IRQ vector. This complicates handling of completion IRQs, as we need to offset them by one. In the next patch there are scenarios where completion and control EQs will share the same IRQ, for example functions with a single IRQ. To ease such scenarios, shift the control IRQ to the end of the IRQ array. Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-10-01  IB/mlx5: Flow through a more detailed return code from get_prefetchable_mr()  (Jason Gunthorpe)
The error returns for various cases detected by get_prefetchable_mr() get confused as it flows back to userspace. Properly label each error path and flow the error code properly back to the system call. Link: https://lore.kernel.org/r/20210928170846.GA1721590@nvidia.com Suggested-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-01  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma  (Linus Torvalds)
Pull rdma updates from Jason Gunthorpe:
"This contains a replacement driver for Intel iWarp hardware. This new driver supports the old ethernet hardware and also newer chips that can do ROCE. Other than that, this contains the typical mix of patches:

 - Driver updates and cleanups for bnxt_re, cxgb4, mlx4, and mlx5
 - Many static checker driven code clean ups, including a wide refcount_t conversion
 - Several series for the hns driver, more HIP09 HW capabilities, migration to new HW register manipulators, and code cleanups
 - Minor fixes and improvements in srp, rts, and cm
 - Improvements throughout for sysfs related code to use DEVICE_ATTR_*, make the ib_port sysfs first-class, and overall use sysfs APIs properly
 - Intel's new irdma driver replacing i40iw
 - rxe general clean ups and Memory Window support"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (211 commits)
  RDMA/core: Always release restrack object
  RDMA/mlx5: Don't access NULL-cleared mpi pointer
  RDMA/irdma: Fix potential overflow expression in irdma_prm_get_pbles
  RDMA/irdma: Check contents of user-space irdma_mem_reg_req object
  RDMA/rxe: Missing unlock on error in get_srq_wqe()
  RDMA/cma: Fix rdma_resolve_route() memory leak
  RDMA/core/sa_query: Remove unused argument
  RDMA/cma: Fix incorrect Packet Lifetime calculation
  RDMA/cma: Protect RMW with qp_mutex
  RDMA/cma: Remove unnecessary INIT->INIT transition
  RDMA/hns: Add window selection field of congestion control
  RDMA/hfi1: Remove use of kmap()
  RDMA/irdma: Remove use of kmap()
  RDMA/bnxt_re: Fix uninitialized struct bit field rsvd1
  IB/isert: Align target max I/O size to initiator size
  RDMA/hns: Fix incorrect vlan enable bit in QPC
  MAINTAINERS: Update Broadcom RDMA maintainers
  RDMA/irdma: Use the queried port attributes
  RDMA/rxe: Fix redundant skb_put_zero
  RDMA/rxe: Fix extra copy in prepare_ack_packet
  ...
2021-06-14  net/mlx5: Round-Robin EQs over IRQs  (Shay Drory)
Whenever users provide affinity for an EQ creation request, map the EQ to a matching IRQ. A matching IRQ is one with the same affinity and type (completion/control) as the EQ being created. This mapping is done using an aggressive dedicated IRQ allocation scheme, described below.

First, check whether there is a matching IRQ whose min threshold is not exhausted:
 - min_eqs_threshold = 3 for a control EQ
 - min_eqs_threshold = 1 for a completion EQ

If no matching IRQ was found, try to request a new IRQ. If a new IRQ cannot be requested, reuse the least-used matching IRQ. Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Tariq Toukan <tariqt@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
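A simplified sketch of the selection policy described above; the structures, thresholds, and helper below are illustrative assumptions, not the driver's actual irq-pool code.

    #include <linux/types.h>

    /* Hypothetical per-IRQ bookkeeping: how many EQs already use this vector. */
    struct irq_slot {
            int users;
            bool completion;        /* completion vs. control vector */
    };

    /* Placeholder for "try to allocate another vector"; returns NULL when the
     * pool is exhausted, which is the interesting fallback case here. */
    static struct irq_slot *request_new_irq(void)
    {
            return NULL;
    }

    /* Pick an IRQ for a new EQ: a matching vector under its threshold first,
     * then a freshly requested vector, and finally the least-used match. */
    static struct irq_slot *pick_irq(struct irq_slot *irqs, int nirqs, bool completion)
    {
            int threshold = completion ? 1 : 3;     /* min_eqs_threshold above */
            struct irq_slot *least_used = NULL;
            struct irq_slot *fresh;
            int i;

            for (i = 0; i < nirqs; i++) {
                    if (irqs[i].completion != completion)
                            continue;
                    if (irqs[i].users < threshold)
                            return &irqs[i];
                    if (!least_used || irqs[i].users < least_used->users)
                            least_used = &irqs[i];
            }

            fresh = request_new_irq();
            return fresh ? fresh : least_used;
    }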
2021-06-14  net/mlx5: Provide cpumask at EQ creation phase  (Leon Romanovsky)
The users of EQ are running their code on different CPUs and with various affinity patterns. Move the cpumask setting close to their actual usage. Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Reviewed-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com>
2021-05-26  RDMA/mlx5: Take qp type from mlx5_ib_qp  (Maor Gottlieb)
Change all the places in the mlx5_ib driver to take the qp type from the mlx5_ib_qp struct, except the QP initialization flow. It will ensure that we check the right QP type also for vendor specific QPs. Link: https://lore.kernel.org/r/b2e16cd65b59cd24fa81c01c7989248da44e58ea.1621413899.git.leonro@nvidia.com Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-20  RDMA/mlx5: Remove unused parameter udata  (Lang Cheng)
The old version of ib_umem_get() needed udata as a parameter, but it is now unnecessary. Fixes: c320e527e154 ("IB: Allow calls to ib_umem_get from kernel ULPs") Link: https://lore.kernel.org/r/1620807142-39157-4-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-11  RDMA/mlx5: Remove redundant assignment to ret  (Yang Li)
Variable 'ret' is set to the return value of mlx5_mr_cache_alloc(), but this value is never read as it is overwritten with a new value later on; hence it is a redundant assignment and can be removed. Clean up the following clang-analyzer warning: drivers/infiniband/hw/mlx5/odp.c:421:2: warning: Value stored to 'ret' is never read [clang-analyzer-deadcode.DeadStores] Fixes: e6fb246ccafb ("RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()") Link: https://lore.kernel.org/r/1620296001-120406-1-git-send-email-yang.lee@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-26  RDMA/mlx5: Set ODP caps only if device profile support ODP  (Shay Drory)
Currently, ODP caps are set during the init stage of mlx5_ib_dev, regardless of whether the device profile supports ODP or not. There is no point in setting ODP caps if the device profile doesn't support ODP. Hence, move setting the ODP caps to the odp_init stage. Link: https://lore.kernel.org/r/20210318135259.681264-1-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Shay Drory <shayd@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-23  RDMA/mlx5: Create ODP EQ only when ODP MR is created  (Shay Drory)
There is no need to create the ODP EQ if the user doesn't use ODP MRs. Hence, create it only when the first ODP MR is created. This EQ will be destroyed only when the device is unloaded. This decreases the number of EQs created per device; for example, if 1K devices (SFs/VFs/etc.) are created, the number of EQs is reduced by 1K. Link: https://lore.kernel.org/r/20210314125418.179716-1-leon@kernel.org Signed-off-by: Shay Drory <shayd@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-11  RDMA/mlx5: Consolidate MR destruction to mlx5_ib_dereg_mr()  (Jason Gunthorpe)
Now that the SRCU stuff has been removed, the entire MR destroy logic can be made a lot simpler. Currently there are many different ways to destroy a MR and it makes it really hard to do this task correctly. Route all destruction through mlx5_ib_dereg_mr() and make it work for all situations. Since it turns out all the different MR types do basically the same thing, this removes a lot of knowledge of MR internals from ODP and leaves ODP just exporting an operation to clean up children. This fixes a few weird corner case bugs and firmly uses the correct ordering of the MR destruction:

 - Stop parallel access to the mkey via the ODP xarray
 - Stop DMA
 - Release the umem
 - Clean up ODP children
 - Free/Recycle the MR

Link: https://lore.kernel.org/r/20210304120745.1090751-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
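The same ordering, expressed as a hedged sketch; every function below is a hypothetical stand-in used only to show the sequence, not one of the driver's real helpers.

    struct my_mr;

    static void stop_odp_xarray_access(struct my_mr *mr) { }
    static void stop_dma(struct my_mr *mr) { }
    static void release_umem(struct my_mr *mr) { }
    static void clean_up_odp_children(struct my_mr *mr) { }
    static void free_or_recycle_mr(struct my_mr *mr) { }

    static void dereg_mr_sketch(struct my_mr *mr)
    {
            stop_odp_xarray_access(mr);     /* 1. no new pagefault lookups */
            stop_dma(mr);                   /* 2. quiesce the hardware */
            release_umem(mr);               /* 3. drop the memory pins */
            clean_up_odp_children(mr);      /* 4. implicit ODP child MRs */
            free_or_recycle_mr(mr);         /* 5. free or return to the cache */
    }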
2021-03-11  RDMA/mlx5: Zero out ODP related items in the mlx5_ib_mr  (Jason Gunthorpe)
All of the ODP code assumes that when it calls mlx5_mr_cache_alloc() the ODP related fields are zero'd. This is true if the MR was just allocated, but if the MR is recycled through the cache then the values are never zero'd. This causes a bug in the odp_stats: they don't reset when the MR is reallocated, and is_odp_implicit is never 0'd. So that we can use memset() on a block of the mlx5_ib_mr, reorganize the structure to put all the data that can be zero'd by the cache at the end. It is organized as an anonymous struct because the next patch will make this a union. Delete the unused smr_info. Don't set the kernel-only desc_size on the user path. There is no longer any need to zero mr->parent before freeing it; the memset() will get it now. Fixes: a3de94e3d61e ("IB/mlx5: Introduce ODP diagnostic counters") Link: https://lore.kernel.org/r/20210304120745.1090751-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
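A small illustration of the "zero the reusable tail of the struct" pattern; the struct and field names are made up for the example and do not match the mlx5_ib_mr layout.

    #include <linux/string.h>
    #include <linux/stddef.h>
    #include <linux/types.h>

    /* Hypothetical MR layout: everything from 'odp_stats' onward must be
     * zero whenever the MR is handed out again from the cache. */
    struct my_mr {
            u32 mkey;               /* preserved across cache reuse */
            /* fields below are reset on every cache allocation */
            u64 odp_stats;
            bool is_odp_implicit;
            void *parent;
    };

    static void mr_reset_cached_fields(struct my_mr *mr)
    {
            memset((char *)mr + offsetof(struct my_mr, odp_stats), 0,
                   sizeof(*mr) - offsetof(struct my_mr, odp_stats));
    }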
2021-03-03  RDMA/mlx5: Set correct kernel-doc identifier  (Leon Romanovsky)
The W=1 allmodconfig build produces the following warning: drivers/infiniband/hw/mlx5/odp.c:1086: warning: wrong kernel-doc identifier on line: * Parse a series of data segments for page fault handling. Fix it by changing /** to be /* as it is written in kernel-doc documentation. Fixes: 5e769e444d26 ("RDMA/hw/mlx5/odp: Fix formatting and add missing descriptions in 'pagefault_data_segments()'") Link: https://lore.kernel.org/r/20210302074214.1054299-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-02-08  RDMA/mlx5: Cleanup the synchronize_srcu() from the ODP flow  (Yishai Hadas)
Cleanup the synchronize_srcu() from the ODP flow, as it was found to be a very heavy time consumer as part of dereg_mr. For example, de-registration of 10000 ODP MRs, each with a size of 2M hugepages, took 19.6 sec, compared to de-registration of the same number of non-ODP MRs, which took 172 ms. The new locking scheme uses the wait_event() mechanism, which follows the use count of the MR, instead of using synchronize_srcu(). With that change, the time required for the above test is 95 ms, which is even better than the non-ODP flow. Once the SRCU usage was fully dropped, a lock was needed to protect the XA access. As part of using the above mechanism, we could also clean up the num_deferred_work stuff and follow the use count instead. Link: https://lore.kernel.org/r/20210202071309.2057998-1-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
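A hedged sketch of the use-count plus wait_event() scheme described above; the structure and function names are illustrative, not the driver's.

    #include <linux/wait.h>
    #include <linux/refcount.h>

    /* Hypothetical MR: pagefault paths hold a reference while they touch the
     * mkey; dereg waits for the count to drop instead of synchronize_srcu(). */
    struct my_odp_mr {
            refcount_t uses;
            wait_queue_head_t release_wait;
    };

    static void mr_init_release(struct my_odp_mr *mr)
    {
            refcount_set(&mr->uses, 1);     /* the creator's reference */
            init_waitqueue_head(&mr->release_wait);
    }

    static void mr_put(struct my_odp_mr *mr)
    {
            if (refcount_dec_and_test(&mr->uses))
                    wake_up(&mr->release_wait);
    }

    static void mr_wait_for_release(struct my_odp_mr *mr)
    {
            /* Drop the dereg caller's own reference, then sleep until all
             * in-flight pagefault users are gone. */
            mr_put(mr);
            wait_event(mr->release_wait, refcount_read(&mr->uses) == 0);
    }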
2021-01-22  RDMA/hw/mlx5/odp: Fix formatting and add missing descriptions in 'pagefault_data_segments()'  (Lee Jones)
Fixes the following W=1 kernel build warning(s):

  drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'dev' not described in 'pagefault_data_segments'
  drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'pfault' not described in 'pagefault_data_segments'
  drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'wqe' not described in 'pagefault_data_segments'
  drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'wqe_end' not described in 'pagefault_data_segments'
  drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'bytes_mapped' not described in 'pagefault_data_segments'
  drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'total_wqe_bytes' not described in 'pagefault_data_segments'
  drivers/infiniband/hw/mlx5/odp.c:1062: warning: Function parameter or member 'receive_queue' not described in 'pagefault_data_segments'

Link: https://lore.kernel.org/r/20210121094519.2044049-2-lee.jones@linaro.org Cc: Leon Romanovsky <leon@kernel.org> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-rdma@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-01-20  RDMA/mlx5: Support dma-buf based userspace memory region  (Jianxin Xiong)
Implement the new driver method 'reg_user_mr_dmabuf'. Utilize the core functions to import dma-buf based memory region and update the mappings. Add code to handle dma-buf related page fault. Link: https://lore.kernel.org/r/1608067636-98073-5-git-send-email-jianxin.xiong@intel.com Signed-off-by: Jianxin Xiong <jianxin.xiong@intel.com> Reviewed-by: Sean Hefty <sean.hefty@intel.com> Acked-by: Michael J. Ruhl <michael.j.ruhl@intel.com> Acked-by: Christian Koenig <christian.koenig@amd.com> Acked-by: Daniel Vetter <daniel.vetter@ffwll.ch> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07  RDMA/mlx5: Assign dev to DM MR  (Maor Gottlieb)
Currently, DM MR registration flow doesn't set the mlx5_ib_dev pointer and can cause a NULL pointer dereference if userspace dumps the MR via rdma tool. Assign the IB device together with the other fields and remove the redundant reference of mlx5_ib_dev from mlx5_ib_mr. Cc: stable@vger.kernel.org Fixes: 6c29f57ea475 ("IB/mlx5: Device memory mr registration support") Link: https://lore.kernel.org/r/20201203190807.127189-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-12-07  RDMA/mlx5: Reorganize mlx5_ib_reg_user_mr()  (Jason Gunthorpe)
This function handles an ODP and regular MR flow all mushed together, even though the two flows are quite different. Split them into two dedicated functions. Link: https://lore.kernel.org/r/20201130075839.278575-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01  RDMA/mlx5: Sync device with CPU pages upon ODP MR registration  (Yishai Hadas)
Sync device with CPU pages upon ODP MR registration. mlx5 already has to zero the HW's version of the PAS list, may as well deliver a PAS list that matches the current CPU page tables configuration. Link: https://lore.kernel.org/r/20200930163828.1336747-5-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01  RDMA/mlx5: Extend advice MR to support non faulting mode  (Yishai Hadas)
Extend advice MR to support non faulting mode, this can improve performance by increasing the populated page tables in the device. Link: https://lore.kernel.org/r/20200930163828.1336747-4-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01  IB/core: Enable ODP sync without faulting  (Yishai Hadas)
Enable ODP sync without faulting; this improves performance by reducing the number of page faults in the system. The gain from this option is that the device page table can be aligned with the presented pages in the CPU page table without causing page faults. As a result, the data-path overhead of the hardware triggering a fault, which ends up calling the driver to bring in the pages, is dropped. Link: https://lore.kernel.org/r/20200930163828.1336747-3-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01  IB/core: Improve ODP to use hmm_range_fault()  (Yishai Hadas)
Move to use hmm_range_fault() instead of get_user_pages_remote() to improve performance in a few aspects:

 - Drop the need to allocate and free memory to hold its output
 - No more need to use put_page() to unpin the pages
 - The logic to detect contiguous pages is based on the returned order; there is no need to iterate over each page and evaluate it

In addition, moving to hmm_range_fault() enables reducing page faults in the system with its snapshot mode; this will be introduced in the next patches of this series. As part of this, clean up some flows and use the required data structures to work with hmm_range_fault(). Link: https://lore.kernel.org/r/20200930163828.1336747-2-leon@kernel.org Signed-off-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18  RDMA/mlx5: Clarify what the UMR is for when creating MRs  (Jason Gunthorpe)
Once a mkey is created it can be modified using UMR. This is desirable for performance reasons. However, different hardware has restrictions on what modifications are possible using UMR. Make sense of these checks:

 - mlx5_ib_can_reconfig_with_umr() returns true if the access flags can be altered. Most cases create MRs using 0 access flags (now made clear by consistent use of set_mkc_access_pd_addr_fields()), but the old logic here was tormented. Make it clear that this is checking if the current access_flags can be modified using UMR to different access_flags. It is always OK to use UMR to change flags that all HW supports.
 - mlx5_ib_can_load_pas_with_umr() returns true if UMR can be used to enable and update the PAS/XLT. Enabling requires updating the entity size, so UMR ends up completely disabled on this old hardware. Make it clear why it is disabled. FRWR, ODP and cache always requires mlx5_ib_can_load_pas_with_umr().
 - mlx5_ib_pas_fits_in_mr() is used to tell if an existing MR can be resized to hold a new PAS list. This only works for cached MR's because we don't store the PAS list size in other cases.

To be very clear, arrange things so any pre-created MR's in the cache check the newly requested access_flags before allowing the MR to leave the cache. If UMR cannot set the required access_flags the cache fails to create the MR. This in turn means relaxed ordering and atomic are now correctly blocked early for implicit ODP on older HW. Link: https://lore.kernel.org/r/20200914112653.345244-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
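A hedged sketch of the kind of "can we reconfigure via UMR" check described in the first bullet above; the capability names, flag bits, and helper are illustrative assumptions rather than the driver's real definitions.

    #include <linux/types.h>
    #include <linux/bits.h>

    /* Hypothetical HW capabilities: which access-flag changes UMR may apply. */
    struct umr_caps {
            bool relaxed_ordering_write_umr;
            bool atomic_umr;
    };

    #define MY_ACCESS_RELAXED_ORDERING  BIT(0)
    #define MY_ACCESS_REMOTE_ATOMIC     BIT(1)

    /* Return true if UMR can change current_access into target_access on this
     * device; flags that all hardware can toggle are always allowed. */
    static bool can_reconfig_with_umr(const struct umr_caps *caps,
                                      unsigned int current_access,
                                      unsigned int target_access)
    {
            unsigned int diff = current_access ^ target_access;

            if ((diff & MY_ACCESS_RELAXED_ORDERING) &&
                !caps->relaxed_ordering_write_umr)
                    return false;
            if ((diff & MY_ACCESS_REMOTE_ATOMIC) && !caps->atomic_umr)
                    return false;
            return true;
    }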
2020-08-06  Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma  (Linus Torvalds)
Pull rdma updates from Jason Gunthorpe:
"A quiet cycle after the larger 5.8 effort. Substantially cleanup and driver work with a few smaller features this time.

 - Driver updates for hfi1, rxe, mlx5, hns, qedr, usnic, bnxt_re
 - Removal of dead or redundant code across the drivers
 - RAW resource tracker dumps to include a device specific data blob for device objects to aid device debugging
 - Further advance the IOCTL interface, remove the ability to turn it off. Add QUERY_CONTEXT, QUERY_MR, and QUERY_PD commands
 - Remove stubs related to devices with no pkey table
 - A shared CQ scheme to allow multiple ULPs to share the CQ rings of a device to give higher performance
 - Several more static checker, syzkaller and rare crashers fixed"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (121 commits)
  RDMA/mlx5: Fix flow destination setting for RDMA TX flow table
  RDMA/rxe: Remove pkey table
  RDMA/umem: Add a schedule point in ib_umem_get()
  RDMA/hns: Fix the unneeded process when getting a general type of CQE error
  RDMA/hns: Fix error during modify qp RTS2RTS
  RDMA/hns: Delete unnecessary memset when allocating VF resource
  RDMA/hns: Remove redundant parameters in set_rc_wqe()
  RDMA/hns: Remove support for HIP08_A
  RDMA/hns: Refactor hns_roce_v2_set_hem()
  RDMA/hns: Remove redundant hardware opcode definitions
  RDMA/netlink: Remove CAP_NET_RAW check when dump a raw QP
  RDMA/include: Replace license text with SPDX tags
  RDMA/rtrs: remove WQ_MEM_RECLAIM for rtrs_wq
  RDMA/rtrs-clt: add an additional random 8 seconds before reconnecting
  RDMA/cma: Execute rdma_cm destruction from a handler properly
  RDMA/cma: Remove unneeded locking for req paths
  RDMA/cma: Using the standard locking pattern when delivering the removal event
  RDMA/cma: Simplify DEVICE_REMOVAL for internal_id
  RDMA/efa: Add EFA 0xefa1 PCI ID
  RDMA/efa: User/kernel compatibility handshake mechanism
  ...