path: root/drivers/infiniband/hw/hfi1/affinity.c
Age  Commit message  Author
2016-08-22  IB/hfi1: Remove duplicated include from affinity.c  Wei Yongjun
    Remove duplicated include.

    Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com>
    Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
    Reviewed-by: Ira Weiny <ira.weiny@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-22  IB/hfi1: Allocate cpu mask on the heap to silence warning  Tadeusz Struk
    If CONFIG_FRAME_WARN is small (1K) and CONFIG_NR_CPUS is big, a frame size warning is triggered during the build. Allocate the cpu mask dynamically to silence the warning.

    Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
    Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02  IB/hfi1: Add sysfs entry to override SDMA interrupt affinity  Tadeusz Struk
    Add a sysfs entry to allow the user to override the affinity for SDMA engine interrupts.

    Reviewed-by: Dean Luick <dean.luick@intel.com>
    Signed-off-by: Tadeusz Struk <tadeusz.struk@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-08-02  IB/hfi1: Refine user process affinity algorithm  Sebastian Sanchez
    When performing process affinity recommendations for MPI ranks, the current algorithm doesn't take into account multiple HFI units. Also, real cores and HT cores are not distinguished from one another. Therefore, all HT cores are recommended to be assigned first within the local NUMA node before recommending the assignments of cores in other NUMA nodes. It's ideal to assign all real cores across all NUMA nodes first, then all HT 1 cores, then all HT 2 cores, and so on to balance CPU workload. CPU cores in other NUMA nodes could be running interrupt handlers, and this is not taken into account.

    To balance the CPU workload for user processes, the following recommendation algorithm is used:

    For each user process that is opening a context on HFI Y:
    a) If all cores are assigned to user processes, start assignments all over from the first core.
    b) Assign real cores first, then HT cores (first set of HT cores on all physical cores, then second set of HT cores, and so on) in the following order:
       1. Same NUMA node as HFI Y and not running an IRQ handler
       2. Same NUMA node as HFI Y and running an IRQ handler
       3. Different NUMA node to HFI Y and not running an IRQ handler
       4. Different NUMA node to HFI Y and running an IRQ handler
    c) Mark core as assigned in the global affinity structure. As user processes are done, remove core assignments from the global affinity structure.

    This implementation allows an arbitrary number of HT cores and provides support for multiple HFIs.

    This is being included in the kernel rather than user space because user space has no way of knowing the CPU recommendations for contexts running as part of other jobs.

    Reviewed-by: Ira Weiny <ira.weiny@intel.com>
    Reviewed-by: Mitko Haralanov <mitko.haralanov@intel.com>
    Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
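The selection order in steps a) to c) of the "Refine user process affinity algorithm" commit can be illustrated with a small user-space sketch. This is not the driver code: `struct cpu_info`, `recommend_cpu()` and `release_cpu()` are hypothetical names invented for illustration, and the flat array stands in for whatever cpumask-based bookkeeping the driver actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-CPU record; the real driver tracks this with cpumasks. */
struct cpu_info {
    int  node;      /* NUMA node this logical CPU belongs to */
    int  ht_level;  /* 0 = real core, 1 = first HT sibling set, ... */
    bool runs_irq;  /* an IRQ handler is bound to this CPU */
    bool assigned;  /* already recommended to a user process */
};

/* First unassigned CPU of the given HT level and priority class, or -1. */
static int pick(const struct cpu_info *cpus, size_t n, int ht,
                bool local, bool irq, int hfi_node)
{
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].assigned || cpus[i].ht_level != ht)
            continue;
        if ((cpus[i].node == hfi_node) != local)
            continue;
        if (cpus[i].runs_irq != irq)
            continue;
        return (int)i;
    }
    return -1;
}

/* Recommend a CPU for a process opening a context on an HFI on hfi_node. */
int recommend_cpu(struct cpu_info *cpus, size_t n, int hfi_node)
{
    bool all_assigned = true;
    int max_ht = 0;

    assert(cpus != NULL);
    if (n == 0)
        return -1;

    for (size_t i = 0; i < n; i++) {
        if (!cpus[i].assigned)
            all_assigned = false;
        if (cpus[i].ht_level > max_ht)
            max_ht = cpus[i].ht_level;
    }

    /* Step a): everything is taken, so start over from the first core. */
    if (all_assigned)
        for (size_t i = 0; i < n; i++)
            cpus[i].assigned = false;

    /* Step b): real cores first, then each HT set, in class order 1..4. */
    for (int ht = 0; ht <= max_ht; ht++) {
        for (int cls = 0; cls < 4; cls++) {
            bool local = cls < 2;        /* classes 1-2: same NUMA node */
            bool irq = (cls % 2) == 1;   /* classes 2, 4: runs an IRQ   */
            int cpu = pick(cpus, n, ht, local, irq, hfi_node);

            if (cpu >= 0) {
                cpus[cpu].assigned = true;   /* step c) */
                return cpu;
            }
        }
    }
    return -1;
}

/* Called when a user process is done with its context. */
void release_cpu(struct cpu_info *cpus, size_t n, int cpu)
{
    if (cpu >= 0 && (size_t)cpu < n)
        cpus[cpu].assigned = false;
}
```

With a toy topology of three real cores (two on the HFI's node, one of them running an IRQ handler) and one HT sibling, successive calls hand out the local IRQ-free core, then the local IRQ core, then the remote core, and only then the HT sibling.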
2016-08-02  IB/hfi1: Reserve and collapse CPU cores for contexts  Sebastian Sanchez
    Kernel receive queues oversubscribe CPU cores on multi-HFI systems. To prevent this, the kernel receive queues are separated onto different cores, and the SDMA engine interrupts are constrained to fewer cores. hfi1s_on_numa_node * krcvqs is the number of CPU cores that are reserved for kernel receive queues for all HFIs. Each HFI initializes its kernel receive queues to one of the reserved CPU cores. If there are 0 CPU cores left over for SDMA engines, use the same CPU cores as the receive contexts.

    In addition, general and control contexts are assigned to their own CPU core; however, both types of contexts tend to have low traffic. To save CPU cores, collapse general and control contexts to one CPU core for all HFI units.

    This change prevents SDMA engine interrupts from wrapping around general contexts.

    Reviewed-by: Dean Luick <dean.luick@intel.com>
    Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
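The core budget described in the "Reserve and collapse CPU cores for contexts" commit comes down to simple arithmetic, sketched below. `struct core_plan` and `split_node_cores()` are hypothetical names, and two details are assumptions rather than facts from the commit message: that the shared general/control core is set aside before the receive-queue reservation, and that the reservation is clamped to the cores that remain.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical summary of how one NUMA node's cores are divided. */
struct core_plan {
    int  general_cores;    /* shared by general and control contexts */
    int  rcv_cores;        /* reserved for kernel receive queues */
    int  sdma_cores;       /* available to SDMA engine interrupts */
    bool sdma_shares_rcv;  /* true if SDMA falls back to the rcv cores */
};

struct core_plan split_node_cores(int node_cores, int hfis_on_node, int krcvqs)
{
    struct core_plan p = { 0, 0, 0, false };
    int avail, want, left;

    if (node_cores <= 0)
        return p;

    /* Assumption: one core is collapsed for general + control contexts. */
    p.general_cores = 1;
    avail = node_cores - 1;

    /* Reservation from the commit message: hfi1s_on_numa_node * krcvqs. */
    want = hfis_on_node * krcvqs;
    if (want < 0)
        want = 0;
    p.rcv_cores = want < avail ? want : avail;   /* assumption: clamp */

    left = avail - p.rcv_cores;
    if (left > 0) {
        p.sdma_cores = left;
    } else {
        /* Nothing left over: SDMA uses the same cores as receive contexts. */
        p.sdma_cores = p.rcv_cores;
        p.sdma_shares_rcv = true;
    }
    return p;
}
```

For example, under these assumptions a 14-core node with two HFIs and `krcvqs = 2` reserves 4 cores for receive queues and leaves 9 for SDMA, whereas a 5-core node with the same settings leaves none, so SDMA shares the 4 receive cores.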
2016-08-02  IB/hfi1: Add global structure for affinity assignments  Dennis Dalessandro
    When HFI units get initialized, they each use their own mask copy for affinity assignments. On a multi-HFI system, affinity assignments overbook CPU cores as each HFI doesn't have knowledge of affinity assignments for other HFI units. Therefore, some CPU cores are never used for interrupt handlers in systems with a high number of CPU cores per NUMA node.

    For multi-HFI systems, SDMA engine interrupt assignments start all over from the first CPU in the local NUMA node after the first HFI initialization. This change allows assignments to continue where the last HFI unit left off.

    Add a global structure for affinity assignments so that multiple HFIs share an affinity mask.

    Reviewed-by: Jianxin Xiong <jianxin.xiong@intel.com>
    Reviewed-by: Jubin John <jubin.john@intel.com>
    Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
    Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
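The effect of sharing one assignment structure across HFI units, as the "Add global structure for affinity assignments" commit describes, can be sketched in user space as follows. `struct shared_cpu_set` and `next_cpu()` are hypothetical names, and a single `unsigned long` bitmask is used as a stand-in for the kernel's cpumask type; the driver's real data layout may differ.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical shared state: one instance per NUMA node, used by all HFIs. */
struct shared_cpu_set {
    unsigned long mask;  /* CPUs eligible for interrupt handlers */
    unsigned long used;  /* CPUs already handed out in this pass */
    unsigned int  gen;   /* how many times the set has wrapped around */
};

/* Hand out the next unused CPU, wrapping only once every CPU was used. */
int next_cpu(struct shared_cpu_set *set)
{
    unsigned long free_cpus;
    int nbits = (int)(sizeof(unsigned long) * CHAR_BIT);

    assert(set != 0);
    if (set->mask == 0)
        return -1;

    /* All eligible CPUs are taken: start a new generation. */
    if ((set->mask & ~set->used) == 0) {
        set->used = 0;
        set->gen++;
    }

    free_cpus = set->mask & ~set->used;
    for (int cpu = 0; cpu < nbits; cpu++) {
        if (free_cpus & (1UL << cpu)) {
            set->used |= 1UL << cpu;
            return cpu;
        }
    }
    return -1;
}
```

Because both HFIs call `next_cpu()` on the same structure, a second HFI picks up at the first CPU the first HFI did not use. With per-HFI copies of the mask, each unit would begin again at CPU 0, which is the overbooking the commit message describes.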
2016-06-06  IB/hfi1: Avoid large frame size warning  Leon Romanovsky
    When CONFIG_FRAME_WARN is set to 1024 bytes, which is useful to find stack consumers, we get a warning in the hfi1 driver.

    drivers/infiniband/hw/hfi1/affinity.c: In function ‘hfi1_get_proc_affinity’:
    drivers/infiniband/hw/hfi1/affinity.c:415:1: warning: the frame size of 1056 bytes is larger than 1024 bytes [-Wframe-larger-than=]

    This change removes the unneeded buf[1024] declaration and usage.

    Fixes: f48ad614c100 ("IB/hfi1: Move driver out of staging")
    Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
    Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>
2016-05-26  IB/hfi1: Move driver out of staging  Dennis Dalessandro
    The TODO list for the hfi1 driver was completed during 4.6. In addition, other objections raised (which are far beyond what was in the TODO list) have been addressed as well. It is now time to move the driver out of staging and into the drivers/infiniband sub-tree.

    Reviewed-by: Jubin John <jubin.john@intel.com>
    Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
    Signed-off-by: Doug Ledford <dledford@redhat.com>