linux/drivers/infiniband/core
Håkon Bugge 5f5a650999 RDMA/core/sa_query: Retry SA queries
A MAD packet is sent as an unreliable datagram (UD). SA requests are sent
as MAD packets. As such, SA requests or responses may be silently dropped.

IB Core's MAD layer has a timeout and retry mechanism, which amongst
other, is used by RDMA CM. But it is not used by SA queries. The lack of
retries of SA queries leads to long specified timeout, and error being
returned in case of packet loss. The ULP or user-land process has to
perform the retry.

Fix this by taking advantage of the MAD layer's retry mechanism.

First, a check against a zero timeout is added in rdma_resolve_route(). In
send_mad(), we set the MAD layer timeout to one tenth of the specified
timeout and the number of retries to 10. The special case when timeout is
less than 10 is handled.

With this fix:

 # ucmatose -c 1000 -S 1024 -C 1

runs stable on an Infiniband fabric. Without this fix, we see an
intermittent behavior and it errors out with:

cmatose: event: RDMA_CM_EVENT_ROUTE_ERROR, error: -110

(110 is ETIMEDOUT)

Link: https://lore.kernel.org/r/1628784755-28316-1-git-send-email-haakon.bugge@oracle.com
Signed-off-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-25 13:42:47 -03:00
..
addr.c RDMA/addr: Be strict with gid size 2021-04-08 16:14:56 -03:00
agent.c RDMA: Mark if destroy address handle is in a sleepable context 2018-12-19 16:28:03 -07:00
agent.h
cache.c IB/core: Shifting initialization of device->cache_lock 2021-07-16 10:57:28 -03:00
cgroup.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 288 2019-06-05 17:36:37 +02:00
cm_msgs.h RDMA/core: Add necessary spaces 2021-04-12 14:52:22 -03:00
cm_trace.c RDMA/cm: Replace pr_debug() call sites with tracepoints 2020-08-24 19:41:41 -03:00
cm_trace.h RDMA/cm: Add tracepoints to track MAD send operations 2020-08-24 19:41:41 -03:00
cm.c RDMA/core: Fix incorrect print format specifier 2021-06-21 15:38:30 -03:00
cma_configfs.c RDMA: Support more than 255 rdma ports 2021-03-26 09:31:21 -03:00
cma_priv.h IB/cma: Introduce rdma_set_min_rnr_timer() 2021-04-12 19:51:48 -03:00
cma_trace.c RDMA/cma: Add trace points in RDMA Connection Manager 2020-01-07 16:10:53 -04:00
cma_trace.h RDMA/core: Move the rdma_show_ib_cm_event() macro 2020-08-24 16:01:47 -03:00
cma.c RDMA/core/sa_query: Retry SA queries 2021-08-25 13:42:47 -03:00
core_priv.h RDMA/core: Create clean QP creations interface for uverbs 2021-08-03 15:26:19 -03:00
counters.c RDMA: Split the alloc_hw_stats() ops to port and device variants 2021-06-16 20:58:29 -03:00
cq.c RDMA/core: Clean up cq pool mechanism 2020-12-10 15:05:17 -04:00
device.c RDMA: Globally allocate and release QP memory 2021-08-03 13:44:27 -03:00
ib_core_uverbs.c RDMA/core: Ensure that rdma_user_mmap_entry_remove() is a fence 2020-01-25 14:48:33 -04:00
iwcm.c RDMA/iwcm: Release resources if iw_cm module initialization fails 2021-07-30 10:01:40 -03:00
iwcm.h RDMA/core: Use refcount_t instead of atomic_t on refcount of iwcm_id_private 2021-06-08 14:35:44 -03:00
iwpm_msg.c RDMA/iwpm: Rely on the rdma_nl_[un]register() to ensure that requests are valid 2021-07-30 10:01:41 -03:00
iwpm_util.c RDMA/iwpm: Rely on the rdma_nl_[un]register() to ensure that requests are valid 2021-07-30 10:01:41 -03:00
iwpm_util.h RDMA/iwpm: Rely on the rdma_nl_[un]register() to ensure that requests are valid 2021-07-30 10:01:41 -03:00
lag.c RDMA/core: Consider flow label when building skb 2020-05-06 16:51:43 -03:00
mad_priv.h RDMA/core: Remove refcount from struct ib_mad_snoop_private 2021-06-08 14:43:28 -03:00
mad_rmpp.c RDMA/core: Remove redundant spaces 2021-04-12 14:56:48 -03:00
mad_rmpp.h
mad.c RDMA/core: Fix incorrect print format specifier 2021-06-21 15:38:30 -03:00
Makefile RDMA/umem: Support importing dma-buf as user memory region 2021-01-20 16:07:52 -04:00
mr_pool.c Linux 5.2-rc6 2019-06-28 21:18:23 -03:00
multicast.c RDMA/core: Use refcount_t instead of atomic_t on refcount of mcast_port 2021-06-08 14:45:07 -03:00
netlink.c RDMA/core: Fix incorrect print format specifier 2021-06-21 15:38:30 -03:00
nldev.c RDMA/core: Replace the ib_port_data hw_stats pointers with a ib_port pointer 2021-06-16 20:58:29 -03:00
opa_smi.h RDMA: Support more than 255 rdma ports 2021-03-26 09:31:21 -03:00
packer.c
rdma_core.c RDMA/core: Correct misspellings of two words in comments 2021-03-26 11:58:26 -03:00
rdma_core.h IB/uverbs: Introduce create/destroy QP commands over ioctl 2020-05-21 20:39:36 -03:00
restrack.c RDMA: Globally allocate and release QP memory 2021-08-03 13:44:27 -03:00
restrack.h RDMA/restrack: Improve readability in task name management 2020-09-22 19:47:35 -03:00
roce_gid_mgmt.c RDMA: Fix kernel-doc warnings about wrong comment 2021-06-21 20:32:50 -03:00
rw.c RDMA/core: Fix incorrect print format specifier 2021-06-21 15:38:30 -03:00
sa_query.c RDMA/core/sa_query: Retry SA queries 2021-08-25 13:42:47 -03:00
sa.h RDMA: Support more than 255 rdma ports 2021-03-26 09:31:21 -03:00
security.c IB/core: Removed port validity check from ib_get_cached_subnet_prefix 2021-06-21 20:49:32 -03:00
smi.c RDMA: Support more than 255 rdma ports 2021-03-26 09:31:21 -03:00
smi.h RDMA: Support more than 255 rdma ports 2021-03-26 09:31:21 -03:00
sysfs.c RDMA/core: Fix incorrect print format specifier 2021-06-21 15:38:30 -03:00
trace.c RDMA/core: Clean up tracepoint headers 2020-07-06 14:54:46 -03:00
ucma.c RDMA/core: Use the DEVICE_ATTR_RO macro 2021-05-28 20:39:51 -03:00
ud_header.c RDMA/core: Fix incorrect print format specifier 2021-06-21 15:38:30 -03:00
umem_dmabuf.c RDMA/umem: fix missing automated rename 2021-06-07 15:41:50 +02:00
umem_odp.c IB/core: Remove deprecated current_seq comments 2021-08-23 13:43:08 -03:00
umem.c RDMA merge window pull request 2021-05-01 09:15:05 -07:00
user_mad.c RDMA/core: Fix incorrect print format specifier 2021-06-21 15:38:30 -03:00
uverbs_cmd.c RDMA/core: Create clean QP creations interface for uverbs 2021-08-03 15:26:19 -03:00
uverbs_ioctl.c IB/core: Split uverbs_get_const/default to consider target type 2021-03-11 20:20:36 -04:00
uverbs_main.c RDMA/core: Use refcount_t instead of atomic_t on refcount of ib_uverbs_device 2021-06-08 15:04:36 -03:00
uverbs_marshall.c IB/cm: Replace members of sa_path_rec with 'struct sgid_attr *' 2018-06-25 14:19:57 -06:00
uverbs_std_types_async_fd.c RDMA/core: Make FD destroy callback void 2020-11-12 12:32:17 -04:00
uverbs_std_types_counters.c RDMA/core: Postpone uobject cleanup on failure till FD close 2020-11-12 12:32:17 -04:00
uverbs_std_types_cq.c RDMA/core: Postpone uobject cleanup on failure till FD close 2020-11-12 12:32:17 -04:00
uverbs_std_types_device.c RDMA/uverbs: Fix a NULL vs IS_ERR() bug 2021-05-19 15:32:07 -03:00
uverbs_std_types_dm.c RDMA/core: Postpone uobject cleanup on failure till FD close 2020-11-12 12:32:17 -04:00
uverbs_std_types_flow_action.c RDMA/core: Postpone uobject cleanup on failure till FD close 2020-11-12 12:32:17 -04:00
uverbs_std_types_mr.c RDMA/uverbs: Add uverbs command for dma-buf based MR registration 2021-01-20 16:07:52 -04:00
uverbs_std_types_qp.c RDMA/core: Create clean QP creations interface for uverbs 2021-08-03 15:26:19 -03:00
uverbs_std_types_srq.c RDMA/core: Postpone uobject cleanup on failure till FD close 2020-11-12 12:32:17 -04:00
uverbs_std_types_wq.c RDMA/core: Postpone uobject cleanup on failure till FD close 2020-11-12 12:32:17 -04:00
uverbs_std_types.c RDMA/core: Make FD destroy callback void 2020-11-12 12:32:17 -04:00
uverbs_uapi.c RDMA/core: Fix incorrect print format specifier 2021-06-21 15:38:30 -03:00
uverbs.h RDMA/core: Use refcount_t instead of atomic_t on refcount of ib_uverbs_device 2021-06-08 15:04:36 -03:00
verbs.c RDMA/core: Create clean QP creations interface for uverbs 2021-08-03 15:26:19 -03:00