linux/drivers/infiniband/core
Doug Ledford 29f27e8477 IB/cma: Use cached gids
The cma_acquire_dev function was changed by commit 3c86aa70bf
("RDMA/cm: Add RDMA CM support for IBoE devices") to use find_gid_port()
because multiport devices might have either IB or IBoE formatted gids.
The old function assumed that all ports on the same device used the
same GID format.

However, when it was changed to use find_gid_port(), we inadvertently
lost usage of the GID cache.  This turned out to be a very costly
change.  In our testing, each iteration through each index of the GID
table takes roughly 35us.  When you have multiple devices in a system,
and the GID you are looking for is on one of the later devices, the
code loops through all of the GID indexes on all of the early devices
before it finally succeeds on the target device.  This pathological
search behavior, combined with 35us per GID table index retrieval,
produces results such as the following from the cmtime application
that's part of the latest librdmacm git repo:

ib1:
step              total ms     max ms     min us  us / conn
create id    :       29.42       0.04       1.00       2.94
bind addr    :   186705.66      19.00   18556.00   18670.57
resolve addr :       41.93       9.68     619.00       4.19
resolve route:      486.93       0.48     101.00      48.69
create qp    :     4021.95       6.18     330.00     402.20
connect      :    68350.39   68588.17   24632.00    6835.04
disconnect   :     1460.43     252.65 -1862269.00     146.04
destroy      :       41.16       0.04       2.00       4.12

ib0:
step              total ms     max ms     min us  us / conn
create id    :       28.61       0.68       1.00       2.86
bind addr    :     2178.86       2.95     201.00     217.89
resolve addr :       51.26      16.85     845.00       5.13
resolve route:      620.08       0.43      92.00      62.01
create qp    :     3344.40       6.36     273.00     334.44
connect      :     6435.99    6368.53    7844.00     643.60
disconnect   :     5095.38     321.90     757.00     509.54
destroy      :       37.13       0.02       2.00       3.71

Clearly, both the bind address and connect operations suffer
a huge penalty when the GID in use is anything other than the
default GID on the first port in the system.
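
The search cost described above can be modelled with a small toy program. This is only a sketch under stated assumptions, not the kernel code: every name here (toy_port, toy_query_gid, toy_find_uncached, toy_find_cached, toy_cost_us) is hypothetical, the table size is kept artificially small, and the only figure taken from the text is the roughly 35us per index retrieval.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define GID_LEN        16   /* a GID is 128 bits */
#define GIDS_PER_PORT  4    /* hypothetical table size, tiny on purpose */
#define QUERY_COST_US  35   /* per-index cost quoted in the commit message */

/* Hypothetical stand-in for one port's GID table. */
struct toy_port {
	unsigned char gids[GIDS_PER_PORT][GID_LEN];
};

/* Counts simulated device queries so both strategies can be compared. */
static unsigned long toy_hw_queries;

/* Stand-in for a slow per-index query that goes to the device. */
static void toy_query_gid(const struct toy_port *port, int index,
			  unsigned char *out)
{
	toy_hw_queries++;
	memcpy(out, port->gids[index], GID_LEN);
}

/*
 * Uncached search: one device query per table index, on every port
 * visited before the match.  Returns the matching port or -1.
 */
static int toy_find_uncached(const struct toy_port *ports, int nports,
			     const unsigned char *gid)
{
	unsigned char tmp[GID_LEN];

	assert(ports != NULL && gid != NULL);
	for (int p = 0; p < nports; p++) {
		for (int i = 0; i < GIDS_PER_PORT; i++) {
			toy_query_gid(&ports[p], i, tmp);
			if (memcmp(tmp, gid, GID_LEN) == 0)
				return p;
		}
	}
	return -1;
}

/*
 * Cached search: same loop order, but it compares against the
 * in-memory copy of each table and never queries the device.
 */
static int toy_find_cached(const struct toy_port *ports, int nports,
			   const unsigned char *gid)
{
	assert(ports != NULL && gid != NULL);
	for (int p = 0; p < nports; p++) {
		for (int i = 0; i < GIDS_PER_PORT; i++) {
			if (memcmp(ports[p].gids[i], gid, GID_LEN) == 0)
				return p;
		}
	}
	return -1;
}

/* Simulated time spent in device queries, in microseconds. */
static unsigned long toy_cost_us(unsigned long queries)
{
	return queries * QUERY_COST_US;
}
```

In this model, a lookup that matches the first entry of the second port still pays for every index of the first port on the uncached path, while the cached path pays for no device queries at all.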

After applying this patch, the numbers now look like this:

ib1:
step              total ms     max ms     min us  us / conn
create id    :       30.15       0.03       1.00       3.01
bind addr    :       80.27       0.04       7.00       8.03
resolve addr :       43.02      13.53     589.00       4.30
resolve route:      482.90       0.45     100.00      48.29
create qp    :     3986.55       5.80     330.00     398.66
connect      :     7141.53    7051.29    5005.00     714.15
disconnect   :     5038.85     193.63     918.00     503.88
destroy      :       37.02       0.04       2.00       3.70

ib0:
step              total ms     max ms     min us  us / conn
create id    :       34.27       0.05       1.00       3.43
bind addr    :       26.45       0.04       1.00       2.64
resolve addr :       38.25      10.54     760.00       3.82
resolve route:      604.79       0.43      97.00      60.48
create qp    :     3314.95       6.34     273.00     331.49
connect      :    12399.26   12351.10    8609.00    1239.93
disconnect   :     5096.76     270.72    1015.00     509.68
destroy      :       37.10       0.03       2.00       3.71

It's worth noting that we still suffer a bit of a penalty on
connect to the wrong device, but the penalty is much less than
it used to be.  Follow-on patches deal with this penalty.

Many thanks to Neil Horman for helping to track down the slow
function, which led us to discover that the original patch
mentioned above backed out cache usage, and to identify just
how much that impacted the system.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:24 -08:00
..
addr.c IB/addr: Add AF_IB support to ip_addr_size 2013-06-20 13:08:02 -07:00
agent.c IB/mad: Improve an error message so error code is included 2011-03-18 09:42:20 -07:00
agent.h
cache.c IB/core: Add ib_find_exact_cached_pkey() 2012-09-30 20:33:30 -07:00
cm_msgs.h IB/core: Move CM_xxx_ATTR_ID macros from cm_msgs.h to ib_cm.h 2012-07-08 18:05:06 -07:00
cm.c idr: remove MAX_IDR_MASK and move left MAX_IDR_* into idr.c 2013-02-27 19:10:20 -08:00
cma.c IB/cma: Use cached gids 2013-11-08 14:42:24 -08:00
core_priv.h IB/core: Allow device-specific per-port sysfs files 2010-05-21 10:34:44 -07:00
device.c IB/core: Handle table with full and partial membership for the same P_Key 2012-09-30 20:33:29 -07:00
fmr_pool.c hlist: drop the node parameter from iterators 2013-02-27 19:10:24 -08:00
iwcm.c RDMA/iwcm: Don't touch cmid after dropping reference 2013-04-24 17:47:33 -07:00
iwcm.h
mad_priv.h IB/mad: Allow tuning of QP0 and QP1 sizes 2009-09-07 08:28:48 -07:00
mad_rmpp.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
mad_rmpp.h
mad.c IB/core: Create QP1 using the pkey index which contains the default pkey 2013-07-31 14:15:17 -07:00
Makefile RDMA: Add netlink infrastructure 2011-05-20 11:46:11 -07:00
multicast.c infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULE 2011-10-31 19:31:35 -04:00
netlink.c infiniband: pass rdma_cm module to netlink_dump_start 2012-10-07 00:30:56 -04:00
packer.c infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULE 2011-10-31 19:31:35 -04:00
sa_query.c IB/sa: Export function to pack a path record into wire format 2013-06-20 23:35:39 -07:00
sa.h
smi.c IB/mad: Check hop count field in directed route MAD to avoid array overflow 2009-09-05 20:24:10 -07:00
smi.h
sysfs.c Main batch of InfiniBand/RDMA changes for 3.11 merge window: 2013-07-13 12:57:21 -07:00
ucm.c IB/core: convert to idr_alloc() 2013-02-27 19:10:16 -08:00
ucma.c RDMA/ucma: Allow user space to specify AF_IB when joining multicast 2013-06-20 23:35:45 -07:00
ud_header.c infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULE 2011-10-31 19:31:35 -04:00
umem.c IB/core: Fix mismatch between locked and pinned pages 2012-05-11 11:38:22 -07:00
user_mad.c switch device_get_devnode() and ->devnode() to umode_t * 2012-01-03 22:54:55 -05:00
uverbs_cmd.c IB/core: Temporarily disable create_flow/destroy_flow uverbs 2013-10-21 09:44:17 -07:00
uverbs_main.c IB/core: Temporarily disable create_flow/destroy_flow uverbs 2013-10-21 09:44:17 -07:00
uverbs_marshall.c infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULE 2011-10-31 19:31:35 -04:00
uverbs.h IB/core: Temporarily disable create_flow/destroy_flow uverbs 2013-10-21 09:44:17 -07:00
verbs.c Merge branches 'cxgb4', 'flowsteer', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next 2013-09-03 09:01:08 -07:00