summaryrefslogtreecommitdiff
path: root/drivers/infiniband/hw
AgeCommit message (Collapse)Author
2022-08-05Merge tag 'trace-v6.0' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace Pull tracing updates from Steven Rostedt: - Runtime verification infrastructure This is the biggest change here. It introduces the runtime verification that is necessary for running Linux on safety critical systems. It allows for deterministic automata models to be inserted into the kernel that will attach to tracepoints, where the information on these tracepoints will move the model from state to state. If a state is encountered that does not belong to the model, it will then activate a given reactor, that could just inform the user or even panic the kernel (for which safety critical systems will detect and can recover from). - Two monitor models are also added: Wakeup In Preemptive (WIP - not to be confused with "work in progress"), and Wakeup While Not Running (WWNR). - Added __vstring() helper to the TRACE_EVENT() macro to replace several vsnprintf() usages that were all doing it wrong. - eprobes now can have their event autogenerated when the event name is left off. - The rest is various cleanups and fixes. * tag 'trace-v6.0' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace: (50 commits) rv: Unlock on error path in rv_unregister_reactor() tracing: Use alignof__(struct {type b;}) instead of offsetof() tracing/eprobe: Show syntax error logs in error_log file scripts/tracing: Fix typo 'the the' in comment tracepoints: It is CONFIG_TRACEPOINTS not CONFIG_TRACEPOINT tracing: Use free_trace_buffer() in allocate_trace_buffers() tracing: Use a struct alignof to determine trace event field alignment rv/reactor: Add the panic reactor rv/reactor: Add the printk reactor rv/monitor: Add the wwnr monitor rv/monitor: Add the wip monitor rv/monitor: Add the wip monitor skeleton created by dot2k Documentation/rv: Add deterministic automata instrumentation documentation Documentation/rv: Add deterministic automata monitor synthesis documentation tools/rv: Add dot2k Documentation/rv: Add deterministic automaton documentation tools/rv: Add dot2c Documentation/rv: Add a basic documentation rv/include: Add instrumentation helper functions rv/include: Add deterministic automata monitor definition via C macros ...
2022-08-04Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdmaLinus Torvalds
Pull rdma updates from Jason Gunthorpe: "This cycle we got a new RDMA driver "ERDMA" for the Alibaba cloud environment. Otherwise the changes are dominated by rxe fixes. There is another RDMA driver on the list that might get merged next cycle, 'MANA' for the Azure cloud environment. Summary: - Bug fixes and small features for irdma, hns, siw, qedr, hfi1, mlx5 - General spelling/grammer fixes - rdma cm can follow changes in neighbours for control packets - Significant amounts of rxe fixes and spec compliance changes - Use the modern NAPI API - Use the bitmap API instead of open coding - Performance improvements for rtrs - Add the ERDMA driver for Alibaba cloud - Fix a use after free bug in SRP" * tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (99 commits) RDMA/ib_srpt: Unify checking rdma_cm_id condition in srpt_cm_req_recv() RDMA/rxe: Fix error unwind in rxe_create_qp() RDMA/mlx5: Add missing check for return value in get namespace flow RDMA/rxe: Split qp state for requester and completer RDMA/rxe: Generate error completion for error requester QP state RDMA/rxe: Update wqe_index for each wqe error completion RDMA/srpt: Fix a use-after-free RDMA/srpt: Introduce a reference count in struct srpt_device RDMA/srpt: Duplicate port name members IB/qib: Fix repeated "in" within comments RDMA/erdma: Add driver to kernel build environment RDMA/erdma: Add the ABI definitions RDMA/erdma: Add the erdma module RDMA/erdma: Add connection management (CM) support RDMA/erdma: Add verbs implementation RDMA/erdma: Add verbs header file RDMA/erdma: Add event queue implementation RDMA/erdma: Add cmdq implementation RDMA/erdma: Add main include file RDMA/erdma: Add the hardware related definitions ...
2022-08-02RDMA/mlx5: Add missing check for return value in get namespace flowMaor Gottlieb
Add missing check for return value when calling to mlx5_ib_ft_type_to_namespace, even though it can't really fail in this specific call. Fixes: 52438be44112 ("RDMA/mlx5: Allow inserting a steering rule to the FDB") Link: https://lore.kernel.org/r/7b9ceda217d9368a51dc47a46b769bad4af9ac92.1659256069.git.leonro@nvidia.com Reviewed-by: Itay Aveksis <itayav@nvidia.com> Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-28IB/qib: Fix repeated "in" within commentswangjianli
Delete the redundant word 'in'. Link: https://lore.kernel.org/r/20220724074407.18552-1-wangjianli@cdjrlc.com Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27Merge branch 'erdma' into rdma.git for-nextJason Gunthorpe
Cheng Xu says ==================== This v14 patch set introduces the Elastic RDMA Adapter (ERDMA) driver, which released in Apsara Conference 2021 by Alibaba. The PR of ERDMA userspace provider has already been created [1]. ERDMA enables large-scale RDMA acceleration capability in Alibaba ECS environment, initially offered in g7re instance. It can improve the efficiency of large-scale distributed computing and communication significantly and expand dynamically with the cluster scale of Alibaba Cloud. ERDMA is a RDMA networking adapter based on the Alibaba MOC hardware. It works in the VPC network environment (overlay network), and uses iWarp transport protocol. ERDMA supports reliable connection (RC). ERDMA also supports both kernel space and user space verbs. Now we have already supported HPC/AI applications with libfabric, NoF and some other internal verbs libraries, such as xrdma, epsl, etc,. For the ECS instance with RDMA enabled, our MOC hardware generates two kinds of PCI devices: one for ERDMA, and one for the original net device (virtio-net). They are separated PCI devices. ==================== * branch 'erdma': RDMA/erdma: Add driver to kernel build environment RDMA/erdma: Add the ABI definitions RDMA/erdma: Add the erdma module RDMA/erdma: Add connection management (CM) support RDMA/erdma: Add verbs implementation RDMA/erdma: Add verbs header file RDMA/erdma: Add event queue implementation RDMA/erdma: Add cmdq implementation RDMA/erdma: Add main include file RDMA/erdma: Add the hardware related definitions RDMA: Add ERDMA to rdma_driver_id definition Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add driver to kernel build environmentCheng Xu
Add erdma to the kernel build environment, and sort the source order in drivers/infiniband/Kconfig. Link: https://lore.kernel.org/r/20220727014927.76564-12-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add the erdma moduleCheng Xu
Add the main erdma module, which provides interface to infiniband subsystem. This commit includes a modification from Christophe, that using the bitmap API to allocate bitmaps instead of hand-writing. And the commit also fixes warnings reported by static checkers. Link: https://lore.kernel.org/r/20220727014927.76564-10-chengyou@linux.alibaba.com Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add connection management (CM) supportCheng Xu
ERDMA's transport protocol is iWarp, so the driver must support CM interface. In CM part, we use the same way as SoftiWarp: using kernel socket to set up the connection, then performing MPA negotiation in kernel. So, this part of code mainly comes from SoftiWarp, base on it, we add some more features, such as non-blocking iw_connect implementation. This commit also fixes a duplicated include issue reported by Abaci Robot. Link: https://lore.kernel.org/r/20220727014927.76564-9-chengyou@linux.alibaba.com Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add verbs implementationCheng Xu
The RDMA verbs implementation of erdma is divided into three files: erdma_qp.c, erdma_cq.c, and erdma_verbs.c. Internal used functions and datapath functions of QP/CQ are put in erdma_qp.c and erdma_cq.c, the rest is in erdma_verbs.c. This commit also fixes some static check warnings. Link: https://lore.kernel.org/r/20220727014927.76564-8-chengyou@linux.alibaba.com Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Reported-by: Hulk Robot <hulkci@huawei.com> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add verbs header fileCheng Xu
This header file defines the main structures and functions used for RDMA Verbs, including qp, cq, mr, ucontext, etc,. Link: https://lore.kernel.org/r/20220727014927.76564-7-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add event queue implementationCheng Xu
Event queue (EQ) is the main notification way from erdma hardware to its driver. Each erdma device contains 2 kinds EQs: asynchronous EQ (AEQ) and completion EQ (CEQ). Per device has 1 AEQ, which used for RDMA async event report, and max to 32 CEQs (numbered for CEQ0 to CEQ31). CEQ0 is used for cmdq completion event report, and the rest CEQs are used for RDMA completion event report. Link: https://lore.kernel.org/r/20220727014927.76564-6-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add cmdq implementationCheng Xu
Cmdq is the main control plane channel between erdma driver and hardware. After erdma device is initialized, the cmdq channel will be active in the whole lifecycle of this driver. This commit also includes two modifications from Christophe, one is using the bitmap API to allocate bitmaps instead of hand-writing, and another is using the non-atomic bitmap API when applicable. Link: https://lore.kernel.org/r/20220727014927.76564-5-chengyou@linux.alibaba.com Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add main include fileCheng Xu
Add ERDMA driver main header file, defining internal used data structures and operations. The defined data structures includes *cmdq*, which is used as the communication channel between ERDMA driver and hardware. Link: https://lore.kernel.org/r/20220727014927.76564-4-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/erdma: Add the hardware related definitionsCheng Xu
ERDMA is a PCIe device, and this file provides ERDMA hardware related definitions, mainly including PCIe device capabilities and restrictions, device registers definitions, doorbell space, doorbell structure definitions and WQE definitions. Link: https://lore.kernel.org/r/20220727014927.76564-3-chengyou@linux.alibaba.com Signed-off-by: Cheng Xu <chengyou@linux.alibaba.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/mlx5: Rename the mkey cache variables and functionsAharon Landau
After replacing the MR cache with an Mkey cache, rename the variables and functions to fit the new meaning. Link: https://lore.kernel.org/r/20220726071911.122765-6-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/mlx5: Store in the cache mkeys instead of mrsAharon Landau
Currently, the driver stores mlx5_ib_mr struct in the cache entries, although the only use of the cached MR is the mkey. Store only the mkey in the cache. Link: https://lore.kernel.org/r/20220726071911.122765-5-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/mlx5: Store the number of in_use cache mkeys instead of total_mrsAharon Landau
total_mrs is used only to calculate the number of mkeys currently in use. To simplify things, replace it with a new member called "in_use" and directly store the number of mkeys currently in use. Link: https://lore.kernel.org/r/20220726071911.122765-4-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/mlx5: Replace cache list with XarrayAharon Landau
The Xarray allows us to store the cached mkeys in memory efficient way. Entries are reserved in the Xarray using xa_cmpxchg before calling to the upcoming callbacks to avoid allocations in interrupt context. The xa_cmpxchg can sleep when using GFP_KERNEL, so we call it in a loop to ensure one reserved entry for each process trying to reserve. Link: https://lore.kernel.org/r/20220726071911.122765-3-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-27RDMA/mlx5: Replace ent->lock with xa_lockAharon Landau
In the next patch, ent->list will be replaced with an xarray. The xarray uses an internal lock to protect the indexes. Use it to protect all the entry fields, and get rid of ent->lock. Link: https://lore.kernel.org/r/20220726071911.122765-2-michaelgur@nvidia.com Signed-off-by: Aharon Landau <aharonl@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-22IB: Fix repeated words 'the the' commentsSlark Xiao
Replace 'the the' with 'the' in the comments. Signed-off-by: Slark Xiao <slark_xiao@163.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-21Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-07-19RDMA/mlx5: Expose steering anchor to userspaceMark Bloch
Expose a steering anchor per priority to allow users to re-inject packets back into default NIC pipeline for additional processing. MLX5_IB_METHOD_STEERING_ANCHOR_CREATE returns a flow table ID which a user can use to re-inject packets at a specific priority. A FTE (flow table entry) can be created and the flow table ID used as a destination. When a packet is taken into a RDMA-controlled steering domain (like software steering) there may be a need to insert the packet back into the default NIC pipeline. This exposes a flow table ID to the user that can be used as a destination in a flow table entry. With this new method priorities that are exposed to users via MLX5_IB_METHOD_FLOW_MATCHER_CREATE can be reached from a non-zero UID. As user-created flow tables (via RDMA DEVX) are created with a non-zero UID thus it's impossible to point to a NIC core flow table (core driver flow tables are created with UID value of zero) from userspace. Create flow tables that are exposed to users with the shared UID, this allows users to point to default NIC flow tables. Steering loops are prevented at FW level as FW enforces that no flow table at level X can point to a table at level lower than X. Link: https://lore.kernel.org/all/20220703205407.110890-6-saeed@kernel.org/ Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Yishai Hadas <yishaih@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-19RDMA/mlx5: Refactor get flow table functionMark Bloch
_get_flow_table() requires the entire matcher being passed while all it needs is the priority and namespace type. Pass the priority and namespace type directly instead. Link: https://lore.kernel.org/all/20220703205407.110890-5-saeed@kernel.org/ Signed-off-by: Mark Bloch <mbloch@nvidia.com> Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Saeed Mahameed <saeedm@nvidia.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-19IB/qib: Fix comment typoJason Wang
The double `are' is duplicated in line 156, remove one. Link: https://lore.kernel.org/r/20220715054007.5320-1-wangborong@cdjrlc.com Signed-off-by: Jason Wang <wangborong@cdjrlc.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-19RDMA/hfi1: fix potential memory leak in setup_base_ctxt()Jianglei Nie
setup_base_ctxt() allocates a memory chunk for uctxt->groups with hfi1_alloc_ctxt_rcv_groups(). When init_user_ctxt() fails, uctxt->groups is not released, which will lead to a memory leak. We should release the uctxt->groups with hfi1_free_ctxt_rcv_groups() when init_user_ctxt() fails. Fixes: e87473bc1b6c ("IB/hfi1: Only set fd pointer when base context is completely initialized") Link: https://lore.kernel.org/r/20220711070718.2318320-1-niejianglei2021@163.com Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Acked-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/hns: Recover 1bit-ECC error of RAM on chipHaoyue Xu
Since ECC memory maintains a memory system immune to single-bit errors, add support for correcting the 1bit-ECC error, which prevents a 1bit-ECC error become an uncorrected type error. When a 1bit-ECC error happens in the internal ram of the ROCE engine, such as the QPC table, as a 1bit-ECC error caused by reading, the ROCE engine only corrects those 1bit ECC errors by writing. Link: https://lore.kernel.org/r/20220714134353.16700-6-liangwenpeng@huawei.com Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/hns: Refactor the abnormal interrupt handler functionHaoyue Xu
Use a single function to handle the same kind of abnormal interrupts. Link: https://lore.kernel.org/r/20220714134353.16700-5-liangwenpeng@huawei.com Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/hns: Fix incorrect clearing of interrupt status registerHaoyue Xu
The driver will clear all the interrupts in the same area when the driver handles the interrupt of type AEQ overflow. It should only set the interrupt status bit of type AEQ overflow. Fixes: a5073d6054f7 ("RDMA/hns: Add eq support of hip08") Link: https://lore.kernel.org/r/20220714134353.16700-4-liangwenpeng@huawei.com Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/hns: Fix the wrong type of return value of the interrupt handlerHaoyue Xu
The type of return value of the interrupt handler should be irqreturn_t. Link: https://lore.kernel.org/r/20220714134353.16700-3-liangwenpeng@huawei.com Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/hns: Remove unused abnormal interrupt of type RASHaoyue Xu
The HNS NIC driver receives and handles the abnormal interrupt of the RAS type generated by ROCEE, and the HNS RDMA driver does not need to handle this type of interrupt. Therefore, delete unused codes in the HNS RDMA driver. Link: https://lore.kernel.org/r/20220714134353.16700-2-liangwenpeng@huawei.com Signed-off-by: Haoyue Xu <xuhaoyue1@hisilicon.com> Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/qedr: Fix potential memory leak in __qedr_alloc_mr()Jianglei Nie
__qedr_alloc_mr() allocates a memory chunk for "mr->info.pbl_table" with init_mr_info(). When rdma_alloc_tid() and rdma_register_tid() fail, "mr" is released while "mr->info.pbl_table" is not released, which will lead to a memory leak. We should release the "mr->info.pbl_table" with qedr_free_pbl() when error occurs to fix the memory leak. Fixes: e0290cce6ac0 ("qedr: Add support for memory registeration verbs") Link: https://lore.kernel.org/r/20220714061505.2342759-1-niejianglei2021@163.com Signed-off-by: Jianglei Nie <niejianglei2021@163.com> Acked-by: Michal Kalderon <michal.kalderon@marvell.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/hfi1: Depend on !UMLEhab Ababneh
Both hfi1 and UML depend on x86_64, this can trigger build errors. This driver must depends on !UML because it accesses x86_64 features that are not supported by UML. Link: https://lore.kernel.org/r/165755127879.2996325.5668395672492732376.stgit@awfm-02.cornelisnetworks.com Signed-off-by: Ehab Ababneh <ehab.ababneh@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Use the bitmap API to allocate bitmapsChristophe JAILLET
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Link: https://lore.kernel.org/r/1f671b1af5881723ee265a0a12809c92950e58aa.1657567269.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Acked-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/qib: Use the bitmap API to allocate bitmapsChristophe JAILLET
Use bitmap_zalloc()/bitmap_free() instead of hand-writing them. It is less verbose and it improves the semantic. Link: https://lore.kernel.org/r/f7a8588447679e80a438b6188b0603c1a11ad877.1657300671.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Reviewed-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Fix setting of QP context err_rq_idx_valid fieldMustafa Ismail
Setting err_rq_idx_valid field in QP context when the AE source of the AEQE is not associated with an RQ causes the firmware flush to fail. Set err_rq_idx_valid field in QP context only if it is associated with an RQ. Additionally, cleanup the redundant setting of this field in irdma_process_aeq. Fixes: 44d9e52977a1 ("RDMA/irdma: Implement device initialization definitions") Link: https://lore.kernel.org/r/20220705230815.265-8-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Fix VLAN connection with wildcard addressMustafa Ismail
When an application listens on a wildcard address, and there are VLAN and non-VLAN IP addresses, iWARP connection establishemnt can fail if the listen node VLAN ID does not match. Fix this by checking the vlan_id only if not a wildcard listen node. Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Link: https://lore.kernel.org/r/20220705230815.265-7-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Fix a window for use-after-freeMustafa Ismail
During a destroy CQ an interrupt may cause processing of a CQE after CQ resources are freed by irdma_cq_free_rsrc(). Fix this by moving the call to irdma_cq_free_rsrc() after the irdma_sc_cleanup_ceqes(), which is called under the cq_lock. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Link: https://lore.kernel.org/r/20220705230815.265-6-shiraz.saleem@intel.com Signed-off-by: Bartosz Sobczak <bartosz.sobczak@intel.com> Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Make resource distribution algorithm more QP orientedNayan Kumar
Adapt the resource distribution algorithm in irdma_cfg_fpm_val to be more QP oriented. If the configuration is too big for the available memory, trim the MR and PBLE's first before trimming the QPs. This also avoids having to double QPs requested as input to algorithm for GEN1 devices. Link: https://lore.kernel.org/r/20220705230815.265-5-shiraz.saleem@intel.com Signed-off-by: Nayan Kumar <nayan.kumar@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Make CQP invalid state error non-criticalMustafa Ismail
The invalid state error returned by the Control Queue-Pair (CQP) is not a critical error. Add it to the irdma_noncrit_err_list and drop reporting it as device error message. Link: https://lore.kernel.org/r/20220705230815.265-4-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Add AE source to error logMustafa Ismail
To assist with debugging add the Asynchronous Event (AE) source when logging the abnormal AE error log message. Link: https://lore.kernel.org/r/20220705230815.265-3-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-18RDMA/irdma: Add 2 level PBLE support for FMRMustafa Ismail
Level 2 Physical Buffer List Entry (PBLE) is currently not supported for Fast MRs which limits memory registrations to 256K pages. Adapt irdma_set_page and irdma_alloc_mr to allow for 2 level PBLEs. Link: https://lore.kernel.org/r/20220705230815.265-2-shiraz.saleem@intel.com Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-17IB/hfi1: switch to netif_napi_add_weight()Jakub Kicinski
Since we'll remove the last argument from netif_napi_add() soon switch this RDMA driver to netif_napi_add_weight() for now to avoid cross-tree patches. Link: https://lore.kernel.org/r/20220705230208.924408-3-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-17IB/hfi1: switch to netif_napi_add_tx()Jakub Kicinski
Switch to the new API not requiring the NAPI_POLL_WEIGHT argument. Link: https://lore.kernel.org/r/20220705230208.924408-2-kuba@kernel.org Signed-off-by: Jakub Kicinski <kuba@kernel.org> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-17RDMA/qib: Use the bitmap API when applicableChristophe JAILLET
Using the bitmap API is less verbose than hand writing them. It also improves the semantic. While at it, initialize the bitmaps. It can't hurt. Link: https://lore.kernel.org/r/33d8992586d382bec8b8efd83e4729fb7feaf89e.1656834106.git.christophe.jaillet@wanadoo.fr Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr> Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-15tracing/IB/hfi1: Use the new __vstring() helperSteven Rostedt (Google)
Instead of open coding a __dynamic_array() with a fixed length (which defeats the purpose of the dynamic array in the first place). Use the new __vstring() helper that will use a va_list and only write enough of the string into the ring buffer that is needed. Link: https://lkml.kernel.org/r/20220705224749.239494531@goodmis.org Cc: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Cc: Ingo Molnar <mingo@kernel.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: Leon Romanovsky <leon@kernel.org> Cc: linux-rdma@vger.kernel.org Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
2022-07-11RDMA/irdma: Fix sleep from invalid context BUGMustafa Ismail
Taking the qos_mutex to process RoCEv2 QP's on netdev events causes a kernel splat. Fix this by removing the handling for RoCEv2 in irdma_cm_teardown_connections that uses the mutex. This handling is only needed for iWARP to avoid having connections established while the link is down or having connections remain functional after the IP address is removed. BUG: sleeping function called from invalid context at kernel/locking/mutex. Call Trace: kernel: dump_stack+0x66/0x90 kernel: ___might_sleep.cold.92+0x8d/0x9a kernel: mutex_lock+0x1c/0x40 kernel: irdma_cm_teardown_connections+0x28e/0x4d0 [irdma] kernel: ? check_preempt_curr+0x7a/0x90 kernel: ? select_idle_sibling+0x22/0x3c0 kernel: ? select_task_rq_fair+0x94c/0xc90 kernel: ? irdma_exec_cqp_cmd+0xc27/0x17c0 [irdma] kernel: ? __wake_up_common+0x7a/0x190 kernel: irdma_if_notify+0x3cc/0x450 [irdma] kernel: ? sched_clock_cpu+0xc/0xb0 kernel: irdma_inet6addr_event+0xc6/0x150 [irdma] Fixes: 146b9756f14c ("RDMA/irdma: Add connection manager") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-11RDMA/irdma: Do not advertise 1GB page size for x722Mustafa Ismail
x722 does not support 1GB page size but the irdma driver incorrectly advertises 1GB page size support for x722 device to ib_core to compute the best page size to use on this MR. This could lead to incorrect start offsets computed by hardware on the MR. Fixes: b48c24c2d710 ("RDMA/irdma: Implement device supported verb APIs") Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-04IB: Fix spelling of 'writable'Zhang Jiaming
There is a typo (writeable) in qib_file_ops.c, qib_sd7220.c's comments, and in rxe_check_bind_mw() Link: https://lore.kernel.org/r/20220701074812.12615-1-jiaming@nfschina.com Link: https://lore.kernel.org/r/20220701080019.13329-1-jiaming@nfschina.com Signed-off-by: Zhang Jiaming <jiaming@nfschina.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/netJakub Kicinski
drivers/net/ethernet/microchip/sparx5/sparx5_switchdev.c 9c5de246c1db ("net: sparx5: mdb add/del handle non-sparx5 devices") fbb89d02e33a ("net: sparx5: Allow mdb entries to both CPU and ports") Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2022-06-24RDMA: Correct duplicated words in commentsJiang Jian
There is a duplicated word 'is' and 'for' in a comment that needs to be dropped. Link: https://lore.kernel.org/r/20220622170853.3644-1-jiangjian@cdjrlc.com Link: https://lore.kernel.org/r/20220623103708.43104-1-jiangjian@cdjrlc.com Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>