Commit Graph

2774 Commits

Author SHA1 Message Date
Christoph Hellwig
b116c70279 RDMA/umem: Use ib_dma_max_seg_size instead of dma_get_max_seg_size
RDMA ULPs must not call DMA mapping APIs directly but instead use the
ib_dma_* wrappers.

Fixes: 0c16d9635e ("RDMA/umem: Move to allocate SG table from pages")
Link: https://lore.kernel.org/r/20201106181941.1878556-3-hch@lst.de
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 13:33:43 -04:00
Leon Romanovsky
c5633a72a1 RDMA/core: Make FD destroy callback void
All FD object destroy implementations return 0, so declare this callback
void.

Link: https://lore.kernel.org/r/20201104144556.3809085-3-leon@kernel.org
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 12:32:17 -04:00
Leon Romanovsky
efa968ee20 RDMA/core: Postpone uobject cleanup on failure till FD close
Remove the ib_is_destroyable_retryable() concept.

The idea here was to allow the drivers to forcibly clean the HW object
even if they otherwise didn't want to (eg because of usecnt). This was an
attempt to clean up in a world where drivers were not allowed to fail HW
object destruction.

Now that we are going back to allowing HW objects to fail destroy this
doesn't make sense. Instead if a uobject's HW object can't be destroyed it
is left on the uobject list and it is up to uverbs_destroy_ufile_hw() to
clean it. Multiple passes over the uobject list allow hidden dependencies
to be resolved. If that fails the HW driver is broken, throw a WARN_ON and
leak the HW object memory.

All the other tricky failure paths (eg on creation error unwind) have
already been updated to this new model.

Link: https://lore.kernel.org/r/20201104144556.3809085-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-12 12:32:17 -04:00
Meir Lichtinger
c7adf77173 IB/core: Add support for NDR link speed
Add new IBTA speed NDR, supporting signaling rate of 100Gb.

Link: https://lore.kernel.org/r/20201026133738.1340432-2-leon@kernel.org
Signed-off-by: Meir Lichtinger <meirl@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:48:56 -04:00
Jason Gunthorpe
d5c7916fe4 RDMA/mlx5: Use ib_umem_find_best_pgsz() for mkc's
Now that all the PAS arrays or UMR XLT's for mkcs are filled using
rdma_for_each_block() we can use the common ib_umem_find_best_pgsz()
algorithm.

Link: https://lore.kernel.org/r/20201026132314.1336717-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-11-02 15:10:50 -04:00
Joe Perches
e28bf1f03b RDMA: Convert various random sprintf sysfs _show uses to sysfs_emit
Manual changes for sysfs_emit as cocci scripts can't easily convert them.

Link: https://lore.kernel.org/r/ecde7791467cddb570c6f6d2c908ffbab9145cac.1602122880.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-30 21:03:52 -03:00
Joe Perches
45808361d4 RDMA: Manual changes for sysfs_emit and neatening
Make changes to use sysfs_emit in the RDMA code as cocci scripts can not
be written to handle _all_ the possible variants of various sprintf family
uses in sysfs show functions.

While there, make the code more legible and update its style to be more
like the typical kernel styles.

Miscellanea:

o Use intermediate pointers for dereferences
o Add and use string lookup functions
o return early when any intermediate call fails so normal return is
  at the bottom of the function
o mlx4/mcg.c:sysfs_show_group: use scnprintf to format intermediate strings

Link: https://lore.kernel.org/r/f5c9e4c9d8dafca1b7b70bd597ee7f8f219c31c8.1602122880.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Acked-by: Jack Wang <jinpu.wang@cloud.ionos.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-30 21:03:52 -03:00
Jing Xiangfeng
5333499c60 RDMA/core: Fix error return in _ib_modify_qp()
Fix to return error code PTR_ERR() from the error handling case instead of
0.

Fixes: 51aab12631 ("RDMA/core: Get xmit slave for LAG")
Link: https://lore.kernel.org/r/20201016075845.129562-1-jingxiangfeng@huawei.com
Signed-off-by: Jing Xiangfeng <jingxiangfeng@huawei.com>
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-28 10:32:22 -03:00
Joe Perches
1c7fd72687 RDMA: Convert sysfs device * show functions to use sysfs_emit()
Done with cocci script:

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
expression chr;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	return
-	strcpy(buf, chr);
+	sysfs_emit(buf, chr);
	...>
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	sprintf(buf,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	snprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
	len =
-	scnprintf(buf, PAGE_SIZE,
+	sysfs_emit(buf,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
identifier len;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	<...
-	len += scnprintf(buf + len, PAGE_SIZE - len,
+	len += sysfs_emit_at(buf, len,
	...);
	...>
	return len;
}

@@
identifier d_show;
identifier dev, attr, buf;
expression chr;
@@

ssize_t d_show(struct device *dev, struct device_attribute *attr, char *buf)
{
	...
-	strcpy(buf, chr);
-	return strlen(buf);
+	return sysfs_emit(buf, chr);
}

Link: https://lore.kernel.org/r/7f406fa8e3aa2552c022bec680f621e38d1fe414.1602122879.git.joe@perches.com
Signed-off-by: Joe Perches <joe@perches.com>
Reviewed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:53:21 -03:00
Jason Gunthorpe
676a80adba RDMA: Remove AH from uverbs_cmd_mask
Drivers that need a uverbs AH should instead set the create_user_ah() op
similar to reg_user_mr(). MODIFY_AH and QUERY_AH cmds were never
implemented so are just deleted.

Link: https://lore.kernel.org/r/11-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:28:00 -03:00
Jason Gunthorpe
bd2a40cc24 RDMA/core Remove uverbs_ex_cmd_mask
No driver sets it, and the core code sets a maximum mask, simply remove
it.

Disabled operations are now handled either by having a NULL ops pointer,
or by having the common driver callbacks check for unsupported extended
attributes.

Link: https://lore.kernel.org/r/9-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:59 -03:00
Jason Gunthorpe
1f11a7610e RDMA: Check create_flags during create_qp
Each driver should check that the QP attrs create_flags is supported.
Unfortuantely when create_flags was added to the QP attrs the drivers were
not updated. uverbs_ex_cmd_mask was used to block it - even though kernel
drivers use these flags too.

Check that flags is zero in all drivers that don't use it, remove
IB_USER_VERBS_EX_CMD_CREATE_QP from uverbs_ex_cmd_mask. Fix the error code
to be EOPNOTSUPP.

Link: https://lore.kernel.org/r/8-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:59 -03:00
Jason Gunthorpe
1c407cb5d7 RDMA: Check flags during create_cq
Each driver should check that the CQ attrs is supported. Unfortuantely
when flags was added to the CQ attrs the drivers were not updated,
uverbs_ex_cmd_mask was used to block it. This was missed when create CQ
was converted to ioctl, so non-zero flags could have been passed into
drivers.

Check that flags is zero in all drivers that don't use it, remove
IB_USER_VERBS_EX_CMD_CREATE_CQ from uverbs_ex_cmd_mask.

Fixes: 41b2a71fc8 ("IB/uverbs: Move ioctl path of create_cq and destroy_cq to a new file")
Link: https://lore.kernel.org/r/7-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:59 -03:00
Jason Gunthorpe
26e990badd RDMA: Check attr_mask during modify_qp
Each driver should check that it can support the provided attr_mask during
modify_qp. IB_USER_VERBS_EX_CMD_MODIFY_QP was being used to block
modify_qp_ex because the driver didn't check RATE_LIMIT.

Link: https://lore.kernel.org/r/6-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe
652caba5b5 RDMA: Check srq_type during create_srq
uverbs was blocking srq_types the driver doesn't support based on the
CREATE_XSRQ cmd_mask. Fix all drivers to check for supported srq_types
during create_srq and move CREATE_XSRQ to the core code.

Link: https://lore.kernel.org/r/5-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe
44ce37bc8b RDMA: Move more uverbs_cmd_mask settings to the core
These functions all depend on the driver providing a specific op:

- REREG_MR is rereg_user_mr(). bnxt_re set this without providing the op
- ATTACH/DEATCH_MCAST is attach_mcast()/detach_mcast(). usnic set this
  without providing the op
- OPEN_QP doesn't involve the driver but requires a XRCD. qedr provides
  xrcd but forgot to set it, usnic doesn't provide XRCD but set it anyhow.
- OPEN/CLOSE_XRCD are the ops alloc_xrcd()/dealloc_xrcd()
- CREATE_SRQ/DESTROY_SRQ are the ops create_srq()/destroy_srq()
- QUERY/MODIFY_SRQ is op query_srq()/modify_srq(). hns sets this but
  sometimes supplies a NULL op.
- RESIZE_CQ is op resize_cq(). bnxt_re sets this boes doesn't supply an op
- ALLOC/DEALLOC_MW is alloc_mw()/dealloc_mw(). cxgb4 provided an
  (now deleted) implementation but no userspace

All drivers were checked that no drivers provide the op without also
setting uverbs_cmd_mask so this should have no functional change.

Link: https://lore.kernel.org/r/4-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:58 -03:00
Jason Gunthorpe
c074bb1e30 RDMA: Remove elements in uverbs_cmd_mask that all drivers set
This is a step toward eliminating uverbs_cmd_mask. Preset this list in the
core code. Only the op reg_user_mr wasn't already being required from the
drivers.

Link: https://lore.kernel.org/r/3-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:57 -03:00
Jason Gunthorpe
b8e3130dd9 RDMA: Remove uverbs_ex_cmd_mask values that are linked to functions
Since a while now the uverbs layer checks if the driver implements a
function before allowing the ucmd to proceed. This largely obsoletes the
cmd_mask stuff, but there is some tricky bits in drivers preventing it
from being removed.

Remove the easy elements of uverbs_ex_cmd_mask by pre-setting them in the
core code. These are triggered soley based on the related ops function
pointer.

query_device_ex is not triggered based on an op, but all drivers already
implement something compatible with the extension, so enable it globally
too.

Link: https://lore.kernel.org/r/2-v1-caa70ba3d1ab+1436e-ucmd_mask_jgg@nvidia.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 19:27:56 -03:00
Linus Torvalds
a1e16bc7d5 RDMA 5.10 pull request
The typical set of driver updates across the subsystem:
 
  - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma, hns,
    usnic, qib, qedr, cxgb4, hns, bnxt_re
 
  - Various rtrs fixes and updates
 
  - Bug fix for mlx4 CM emulation for virtualization scenarios where MRA
    wasn't working right
 
  - Use tracepoints instead of pr_debug in the CM code
 
  - Scrub the locking in ucma and cma to close more syzkaller bugs
 
  - Use tasklet_setup in the subsystem
 
  - Revert the idea that 'destroy' operations are not allowed to fail at
    the driver level. This proved unworkable from a HW perspective.
 
  - Revise how the umem API works so drivers make fewer mistakes using it
 
  - XRC support for qedr
 
  - Convert uverbs objects RWQ and MW to new the allocation scheme
 
  - Large queue entry sizes for hns
 
  - Use hmm_range_fault() for mlx5 On Demand Paging
 
  - uverbs APIs to inspect the GID table instead of sysfs
 
  - Move some of the RDMA code for building large page SGLs into
    lib/scatterlist
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEfB7FMLh+8QxL+6i3OG33FX4gmxoFAl+J37MACgkQOG33FX4g
 mxrKfRAAnIecwdE8df0yvVU5k0Eg6qVjMy9MMHq4va9m7g6GpUcNNI0nIlOASxH2
 l+9vnUQS3ebgsPeECaDYzEr0hh/u53+xw2g4WV5ts/hE8KkQ6erruXb9kasCe8yi
 5QWJ9K36T3c03Cd3EeH6JVtytAxuH42ombfo9BkFLPVyfG/R2tsAzvm5pVi73lxk
 46wtU1Bqi4tsLhyCbifn1huNFGbHp08OIBPAIKPUKCA+iBRPaWS+Dpi+93h3g3Bp
 oJwDhL9CBCGcHM+rKWLzek3Dy87FnQn7R1wmTpUFwkK+4AH3U/XazivhX035w1vL
 YJyhakVU0kosHlX9hJTNKDHJGkt0YEV2mS8dxAuqilFBtdnrVszb5/MirvlzC310
 /b5xCPSEusv9UVZV0G4zbySVNA9knZ4YaRiR3VDVMLKl/pJgTOwEiHIIx+vs3ejk
 p8GRWa1SjXw5LfZEQcq39J689ljt6xjCTonyuBSv7vSQq5v8pjBxvHxiAe2FIa2a
 ZyZeSCYoSh0SwJQukO2VO7aprhHP3TcCJ/987+X03LQ8tV2VWPktHqm62YCaDcOl
 fgiQuQdPivRjDDkJgMfDWDGKfZeHoWLKl5XsJhWByt0lablVrsvc+8ylUl1UI7gI
 16hWB/Qtlhfwg10VdApn+aOFpIS+s5P4XIp8ik57MZO+VeJzpmE=
 =LKpl
 -----END PGP SIGNATURE-----

Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma

Pull rdma updates from Jason Gunthorpe:
 "A usual cycle for RDMA with a typical mix of driver and core subsystem
  updates:

   - Driver minor changes and bug fixes for mlx5, efa, rxe, vmw_pvrdma,
     hns, usnic, qib, qedr, cxgb4, hns, bnxt_re

   - Various rtrs fixes and updates

   - Bug fix for mlx4 CM emulation for virtualization scenarios where
     MRA wasn't working right

   - Use tracepoints instead of pr_debug in the CM code

   - Scrub the locking in ucma and cma to close more syzkaller bugs

   - Use tasklet_setup in the subsystem

   - Revert the idea that 'destroy' operations are not allowed to fail
     at the driver level. This proved unworkable from a HW perspective.

   - Revise how the umem API works so drivers make fewer mistakes using
     it

   - XRC support for qedr

   - Convert uverbs objects RWQ and MW to new the allocation scheme

   - Large queue entry sizes for hns

   - Use hmm_range_fault() for mlx5 On Demand Paging

   - uverbs APIs to inspect the GID table instead of sysfs

   - Move some of the RDMA code for building large page SGLs into
     lib/scatterlist"

* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma: (191 commits)
  RDMA/ucma: Fix use after free in destroy id flow
  RDMA/rxe: Handle skb_clone() failure in rxe_recv.c
  RDMA/rxe: Move the definitions for rxe_av.network_type to uAPI
  RDMA: Explicitly pass in the dma_device to ib_register_device
  lib/scatterlist: Do not limit max_segment to PAGE_ALIGNED values
  IB/mlx4: Convert rej_tmout radix-tree to XArray
  RDMA/rxe: Fix bug rejecting all multicast packets
  RDMA/rxe: Fix skb lifetime in rxe_rcv_mcast_pkt()
  RDMA/rxe: Remove duplicate entries in struct rxe_mr
  IB/hfi,rdmavt,qib,opa_vnic: Update MAINTAINERS
  IB/rdmavt: Fix sizeof mismatch
  MAINTAINERS: CISCO VIC LOW LATENCY NIC DRIVER
  RDMA/bnxt_re: Fix sizeof mismatch for allocation of pbl_tbl.
  RDMA/bnxt_re: Use rdma_umem_for_each_dma_block()
  RDMA/umem: Move to allocate SG table from pages
  lib/scatterlist: Add support in dynamic allocation of SG table from pages
  tools/testing/scatterlist: Show errors in human readable form
  tools/testing/scatterlist: Rejuvenate bit-rotten test
  RDMA/ipoib: Set rtnl_link_ops for ipoib interfaces
  RDMA/uverbs: Expose the new GID query API to user space
  ...
2020-10-17 11:18:18 -07:00
Jann Horn
4d45e75a99 mm: remove the now-unnecessary mmget_still_valid() hack
The preceding patches have ensured that core dumping properly takes the
mmap_lock.  Thanks to that, we can now remove mmget_still_valid() and all
its users.

Signed-off-by: Jann Horn <jannh@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: "Eric W . Biederman" <ebiederm@xmission.com>
Cc: Oleg Nesterov <oleg@redhat.com>
Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/r/20200827114932.3572699-8-jannh@google.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2020-10-16 11:11:22 -07:00
Maor Gottlieb
c7a198c700 RDMA/ucma: Fix use after free in destroy id flow
ucma_free_ctx() should call to __destroy_id() on all the connection requests
that have not been delivered to user space. Currently it calls on the
context itself and cause to use after free.

Fixes the trace:

   BUG: Unable to handle kernel data access on write at 0x5deadbeef0000108
   Faulting instruction address: 0xc0080000002428f4
   Oops: Kernel access of bad area, sig: 11 [#1]
   Call Trace:
   [c000000207f2b680] [c00800000024280c] .__destroy_id+0x28c/0x610 [rdma_ucm] (unreliable)
   [c000000207f2b750] [c0080000002429c4] .__destroy_id+0x444/0x610 [rdma_ucm]
   [c000000207f2b820] [c008000000242c24] .ucma_close+0x94/0xf0 [rdma_ucm]
   [c000000207f2b8c0] [c00000000046fbdc] .__fput+0xac/0x330
   [c000000207f2b960] [c00000000015d48c] .task_work_run+0xbc/0x110
   [c000000207f2b9f0] [c00000000012fb00] .do_exit+0x430/0xc50
   [c000000207f2bae0] [c0000000001303ec] .do_group_exit+0x5c/0xd0
   [c000000207f2bb70] [c000000000144a34] .get_signal+0x194/0xe30
   [c000000207f2bc60] [c00000000001f6b4] .do_notify_resume+0x124/0x470
   [c000000207f2bd60] [c000000000032484] .interrupt_exit_user_prepare+0x1b4/0x240
   [c000000207f2be20] [c000000000010034] interrupt_return+0x14/0x1c0

Rename listen_ctx to conn_req_ctx as the poor name was the cause of this
bug.

Fixes: a1d33b70db ("RDMA/ucma: Rework how new connections are passed through event delivery")
Link: https://lore.kernel.org/r/20201012045600.418271-4-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16 14:07:08 -03:00
Jason Gunthorpe
e0477b34d9 RDMA: Explicitly pass in the dma_device to ib_register_device
The code in setup_dma_device has become rather convoluted, move all of
this to the drivers. Drives now pass in a DMA capable struct device which
will be used to setup DMA, or drivers must fully configure the ibdev for
DMA and pass in NULL.

Other than setting the masks in rvt all drivers were doing this already
anyhow.

mthca, mlx4 and mlx5 were already setting up maximum DMA segment size for
DMA based on their hardweare limits in:
__mthca_init_one()
  dma_set_max_seg_size (1G)

__mlx4_init_one()
  dma_set_max_seg_size (1G)

mlx5_pci_init()
  set_dma_caps()
    dma_set_max_seg_size (2G)

Other non software drivers (except usnic) were extended to UINT_MAX [1, 2]
instead of 2G as was before.

[1] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/
[2] https://lore.kernel.org/linux-rdma/20200924114940.GE9475@nvidia.com/

Link: https://lore.kernel.org/r/20201008082752.275846-1-leon@kernel.org
Link: https://lore.kernel.org/r/6b2ed339933d066622d5715903870676d8cc523a.1602590106.git.mchehab+huawei@kernel.org
Suggested-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-16 13:53:46 -03:00
Jason Gunthorpe
16e7483e6f Merge branch 'dynamic_sg' into rdma.git for-next
From Maor Gottlieb says:

====================
This series extends __sg_alloc_table_from_pages to allow chaining of new
pages to an already initialized SG table.

This allows for drivers to utilize the optimization of merging contiguous
pages without a need to pre allocate all the pages and hold them in a very
large temporary buffer prior to the call to SG table initialization.

The last patch changes the Infiniband core to use the new API. It removes
duplicate functionality from the code and benefits from the optimization
of allocating dynamic SG table from pages.

In huge pages system of 2MB page size, without this change, the SG table
would contain x512 SG entries.
====================

* branch 'dynamic_sg':
  RDMA/umem: Move to allocate SG table from pages
  lib/scatterlist: Add support in dynamic allocation of SG table from pages
  tools/testing/scatterlist: Show errors in human readable form
  tools/testing/scatterlist: Rejuvenate bit-rotten test
2020-10-16 12:40:58 -03:00
Maor Gottlieb
0c16d9635e RDMA/umem: Move to allocate SG table from pages
Remove the implementation of ib_umem_add_sg_table and instead
call to __sg_alloc_table_from_pages which already has the logic to
merge contiguous pages.

Besides that it removes duplicated functionality, it reduces the
memory consumption of the SG table significantly. Prior to this
patch, the SG table was allocated in advance regardless consideration
of contiguous pages.

In huge pages system of 2MB page size, without this change, the SG table
would contain x512 SG entries.
E.g. for 100GB memory registration:

	 Number of entries	Size
Before 	      26214400          600.0MB
After            51200		  1.2MB

Link: https://lore.kernel.org/r/20201004154340.1080481-5-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-05 20:45:45 -03:00
Linus Torvalds
165563c050 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from David Miller:

 1) Make sure SKB control block is in the proper state during IPSEC
    ESP-in-TCP encapsulation. From Sabrina Dubroca.

 2) Various kinds of attributes were not being cloned properly when we
    build new xfrm_state objects from existing ones. Fix from Antony
    Antony.

 3) Make sure to keep BTF sections, from Tony Ambardar.

 4) TX DMA channels need proper locking in lantiq driver, from Hauke
    Mehrtens.

 5) Honour route MTU during forwarding, always. From Maciej
    Żenczykowski.

 6) Fix races in kTLS which can result in crashes, from Rohit
    Maheshwari.

 7) Skip TCP DSACKs with rediculous sequence ranges, from Priyaranjan
    Jha.

 8) Use correct address family in xfrm state lookups, from Herbert Xu.

 9) A bridge FDB flush should not clear out user managed fdb entries
    with the ext_learn flag set, from Nikolay Aleksandrov.

10) Fix nested locking of netdev address lists, from Taehee Yoo.

11) Fix handling of 32-bit DATA_FIN values in mptcp, from Mat Martineau.

12) Fix r8169 data corruptions on RTL8402 chips, from Heiner Kallweit.

13) Don't free command entries in mlx5 while comp handler could still be
    running, from Eran Ben Elisha.

14) Error flow of request_irq() in mlx5 is busted, due to an off by one
    we try to free and IRQ never allocated. From Maor Gottlieb.

15) Fix leak when dumping netlink policies, from Johannes Berg.

16) Sendpage cannot be performed when a page is a slab page, or the page
    count is < 1. Some subsystems such as nvme were doing so. Create a
    "sendpage_ok()" helper and use it as needed, from Coly Li.

17) Don't leak request socket when using syncookes with mptcp, from
    Paolo Abeni.

* git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (111 commits)
  net/core: check length before updating Ethertype in skb_mpls_{push,pop}
  net: mvneta: fix double free of txq->buf
  net_sched: check error pointer in tcf_dump_walker()
  net: team: fix memory leak in __team_options_register
  net: typhoon: Fix a typo Typoon --> Typhoon
  net: hinic: fix DEVLINK build errors
  net: stmmac: Modify configuration method of EEE timers
  tcp: fix syn cookied MPTCP request socket leak
  libceph: use sendpage_ok() in ceph_tcp_sendpage()
  scsi: libiscsi: use sendpage_ok() in iscsi_tcp_segment_map()
  drbd: code cleanup by using sendpage_ok() to check page for kernel_sendpage()
  tcp: use sendpage_ok() to detect misused .sendpage
  nvme-tcp: check page by sendpage_ok() before calling kernel_sendpage()
  net: add WARN_ONCE in kernel_sendpage() for improper zero-copy send
  net: introduce helper sendpage_ok() in include/linux/net.h
  net: usb: pegasus: Proper error handing when setting pegasus' MAC address
  net: core: document two new elements of struct net_device
  netlink: fix policy dump leak
  net/mlx5e: Fix race condition on nhe->n pointer in neigh update
  net/mlx5e: Fix VLAN create flow
  ...
2020-10-05 11:27:14 -07:00
Avihai Horon
9f85cbe50a RDMA/uverbs: Expose the new GID query API to user space
Expose the query GID table and entry API to user space by adding two new
methods and method handlers to the device object.

This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.

Link: https://lore.kernel.org/r/20200923165015.2491894-5-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 21:20:11 -03:00
Avihai Horon
c4b4d548fa RDMA/core: Introduce new GID table query API
Introduce rdma_query_gid_table which enables querying all the GID tables
of a given device and copying the attributes of all valid GID entries to a
provided buffer.

This API provides a faster way to query a GID table using single call and
will be used in libibverbs to improve current approach that requires
multiple calls to open, close and read multiple sysfs files for a single
GID table entry.

Link: https://lore.kernel.org/r/20200923165015.2491894-4-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 21:20:11 -03:00
Avihai Horon
1c15b4f2a4 RDMA/core: Modify enum ib_gid_type and enum rdma_network_type
Separate IB_GID_TYPE_IB and IB_GID_TYPE_ROCE to two different values, so
enum ib_gid_type will match the gid types of the new query GID table API
which will be introduced in the following patches.

This change in enum ib_gid_type requires to change also enum
rdma_network_type by separating RDMA_NETWORK_IB and RDMA_NETWORK_ROCE_V1
values.

Link: https://lore.kernel.org/r/20200923165015.2491894-3-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 21:20:11 -03:00
Avihai Horon
3ff4de8f60 RDMA/core: Change rdma_get_gid_attr returned error code
Change the error code returned from rdma_get_gid_attr when the GID entry
is invalid but the GID index is in the gid table size range to -ENODATA
instead of -EINVAL.

This change is done in order to provide a more accurate error reporting to
be used by the new GID query API in user space. Nevertheless, -EINVAL is
still returned from sysfs in the aforementioned case to maintain
compatibility with user space that expects -EINVAL.

Link: https://lore.kernel.org/r/20200923165015.2491894-2-leon@kernel.org
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 21:20:11 -03:00
Rikard Falkeborn
42d5179c89 RDMA/core: Constify struct attribute_group
The only usage of the pma_table field in the ib_port struct is to pass its
address to sysfs_create_group() and sysfs_remove_group(). Make it const to
make it possible to constify a couple of static struct
attribute_group. This allows the compiler to put them in read-only memory.

Link: https://lore.kernel.org/r/20200930224004.24279-2-rikard.falkeborn@gmail.com
Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 20:44:51 -03:00
Yishai Hadas
8bfafde086 IB/core: Enable ODP sync without faulting
Enable ODP sync without faulting, this improves performance by reducing
the number of page faults in the system.

The gain from this option is that the device page table can be aligned
with the presented pages in the CPU page table without causing page
faults.

As of that, the overhead on data path from hardware point of view to
trigger a fault which end-up by calling the driver to bring the pages
will be dropped.

Link: https://lore.kernel.org/r/20200930163828.1336747-3-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:44:05 -03:00
Yishai Hadas
36f30e486d IB/core: Improve ODP to use hmm_range_fault()
Move to use hmm_range_fault() instead of get_user_pags_remote() to improve
performance in a few aspects:

This includes:
- Dropping the need to allocate and free memory to hold its output

- No need any more to use put_page() to unpin the pages

- The logic to detect contiguous pages is done based on the returned
  order, no need to run per page and evaluate.

In addition, moving to use hmm_range_fault() enables to reduce page faults
in the system with it's snapshot mode, this will be introduced in next
patches from this series.

As part of this, cleanup some flows and use the required data structures
to work with hmm_range_fault().

Link: https://lore.kernel.org/r/20200930163828.1336747-2-leon@kernel.org
Signed-off-by: Yishai Hadas <yishaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-01 16:39:54 -03:00
Jason Gunthorpe
2ee9bf346f RDMA/addr: Fix race with netevent_callback()/rdma_addr_cancel()
This three thread race can result in the work being run once the callback
becomes NULL:

       CPU1                 CPU2                   CPU3
 netevent_callback()
                     process_one_req()       rdma_addr_cancel()
                      [..]
     spin_lock_bh()
  	set_timeout()
     spin_unlock_bh()

						spin_lock_bh()
						list_del_init(&req->list);
						spin_unlock_bh()

		     req->callback = NULL
		     spin_lock_bh()
		       if (!list_empty(&req->list))
                         // Skipped!
		         // cancel_delayed_work(&req->work);
		     spin_unlock_bh()

		    process_one_req() // again
		     req->callback() // BOOM
						cancel_delayed_work_sync()

The solution is to always cancel the work once it is completed so any
in between set_timeout() does not result in it running again.

Cc: stable@vger.kernel.org
Fixes: 44e75052bc ("RDMA/rdma_cm: Make rdma_addr_cancel into a fence")
Link: https://lore.kernel.org/r/20200930072007.1009692-1-leon@kernel.org
Reported-by: Dan Aloni <dan@kernelim.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-30 15:29:05 -03:00
Jason Gunthorpe
a6f0b08dba RDMA/core: Remove ucontext->closing
Nothing reads this any more, and the reason for its existence has passed
due to the deferred fput() scheme.

Fixes: 8ea1f989aa ("drivers/IB,usnic: reduce scope of mmap_sem")
Link: https://lore.kernel.org/r/0-v1-df64ff042436+42-uctx_closing_jgg@nvidia.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-30 15:27:19 -03:00
Leon Romanovsky
5807bb3205 RDMA/core: Align write and ioctl checks of QP types
The ioctl flow checks that the user provides only a supported list of QP
types, while write flow didn't do it and relied on the driver to check
it. Align those flows to fail as early as possible.

Link: https://lore.kernel.org/r/20200926102450.2966017-8-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-29 13:11:06 -03:00
Taehee Yoo
eff7423365 net: core: introduce struct netdev_nested_priv for nested interface infrastructure
Functions related to nested interface infrastructure such as
netdev_walk_all_{ upper | lower }_dev() pass both private functions
and "data" pointer to handle their own things.
At this point, the data pointer type is void *.
In order to make it easier to expand common variables and functions,
this new netdev_nested_priv structure is added.

In the following patch, a new member variable will be added into this
struct to fix the lockdep issue.

Signed-off-by: Taehee Yoo <ap420073@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2020-09-28 15:00:15 -07:00
Leon Romanovsky
b09c4d7012 RDMA/restrack: Improve readability in task name management
Use rdma_restrack_set_name() and rdma_restrack_parent_name() instead of
tricky uses of rdma_restrack_attach_task()/rdma_restrack_uadd().

This uniformly makes all restracks add'd using rdma_restrack_add().

Link: https://lore.kernel.org/r/20200922091106.2152715-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 19:47:35 -03:00
Leon Romanovsky
c34a23c28c RDMA/restrack: Simplify restrack tracking in kernel flows
Have a single rdma_restrack_add() that adds an entry, there is no reason
to split the user/kernel here, the rdma_restrack_set_task() is responsible
for this difference.

This patch prepares the code to the future requirement of making restrack
is mandatory for managing ib objects.

Link: https://lore.kernel.org/r/20200922091106.2152715-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 19:47:35 -03:00
Leon Romanovsky
13ef5539de RDMA/restrack: Count references to the verbs objects
Refactor the restrack code to make sure the kref inside the restrack entry
properly kref's the object in which it is embedded. This slight change is
needed for future conversions of MR and QP which are refcounted before the
release and kfree.

The ideal flow from ib_core perspective as follows:
* Allocate ib_* structure with rdma_zalloc_*.
* Set everything that is known to ib_core to that newly created object.
* Initialize kref with restrack help
* Call to driver specific allocation functions.
* Insert into restrack DB
....
* Return and release restrack with restrack_put.

Largely this means a rdma_restrack_new() should be called near allocating
the containing structure.

Link: https://lore.kernel.org/r/20200922091106.2152715-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 19:47:35 -03:00
Leon Romanovsky
60aaeffa36 RDMA/cma: Delete from restrack DB after successful destroy
Update the code to have similar destroy pattern like other IB objects.

This change create asymmetry to the rdma_id_private create flow to make
sure that memory is managed by restrack.

Link: https://lore.kernel.org/r/20200922091106.2152715-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-22 19:47:34 -03:00
Jason Gunthorpe
f5449e7480 RDMA/ucma: Rework ucma_migrate_id() to avoid races with destroy
ucma_destroy_id() assumes that all things accessing the ctx will do so via
the xarray. This assumption violated only in the case the FD is being
closed, then the ctx is reached via the ctx_list. Normally this is OK
since ucma_destroy_id() cannot run concurrenty with release(), however
with ucma_migrate_id() is involved this can violated as the close of the
2nd FD can run concurrently with destroy on the first:

                CPU0                      CPU1
        ucma_destroy_id(fda)
                                  ucma_migrate_id(fda -> fdb)
                                       ucma_get_ctx()
        xa_lock()
         _ucma_find_context()
         xa_erase()
        xa_unlock()
                                       xa_lock()
                                        ctx->file = new_file
                                        list_move()
                                       xa_unlock()
                                      ucma_put_ctx()

                                   ucma_close(fdb)
                                      _destroy_id()
                                      kfree(ctx)

        _destroy_id()
          wait_for_completion()
          // boom, ctx was freed

The ctx->file must be modified under the handler and xa_lock, and prior to
modification the ID must be rechecked that it is still reachable from
cur_file, ie there is no parallel destroy or migrate.

To make this work remove the double locking and streamline the control
flow. The double locking was obsoleted by the handler lock now directly
preventing new uevents from being created, and the ctx_list cannot be read
while holding fgets on both files. Removing the double locking also
removes the need to check for the same file.

Fixes: 88314e4dda ("RDMA/cma: add support for rdma_migrate_id()")
Link: https://lore.kernel.org/r/0-v1-05c5a4090305+3a872-ucma_syz_migrate_jgg@nvidia.com
Reported-and-tested-by: syzbot+cc6fc752b3819e082d0c@syzkaller.appspotmail.com
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 20:54:01 -03:00
Jason Gunthorpe
5dee5872f8 Merge branch 'mlx5_active_speed' into rdma.git for-next
Leon Romanovsky says:

====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_active_speed':
  RDMA: Fix link active_speed size
  RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
  net/mlx5: Refactor query port speed functions
2020-09-18 10:31:45 -03:00
Aharon Landau
376ceb31ff RDMA: Fix link active_speed size
According to the IB spec active_speed size should be u16 and not u8 as
before. Changing it to allow further extensions in offered speeds.

Link: https://lore.kernel.org/r/20200917090223.1018224-4-leon@kernel.org
Signed-off-by: Aharon Landau <aharonl@mellanox.com>
Reviewed-by: Michael Guralnik <michaelgur@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-18 10:31:24 -03:00
Leon Romanovsky
c0a6b5ecc5 RDMA: Convert RWQ table logic to ib_core allocation scheme
Move struct ib_rwq_ind_table allocation to ib_core.

Link: https://lore.kernel.org/r/20200902081623.746359-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 14:04:33 -03:00
Leon Romanovsky
d18bb3e152 RDMA: Clean MW allocation and free flows
Move allocation and destruction of memory windows under ib_core
responsibility and clean drivers to ensure that no updates to MW
ib_core structures are done in driver layer.

Link: https://lore.kernel.org/r/20200902081623.746359-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 14:04:32 -03:00
Jason Gunthorpe
b5de0c60cc RDMA/cma: Fix use after free race in roce multicast join
The roce path triggers a work queue that continues to touch the id_priv
but doesn't hold any reference on it. Futher, unlike in the IB case, the
work queue is not fenced during rdma_destroy_id().

This can trigger a use after free if a destroy is triggered in the
incredibly narrow window after the queue_work and the work starting and
obtaining the handler_mutex.

The only purpose of this work queue is to run the ULP event callback from
the standard context, so switch the design to use the existing
cma_work_handler() scheme. This simplifies quite a lot of the flow:

- Use the cma_work_handler() callback to launch the work for roce. This
  requires generating the event synchronously inside the
  rdma_join_multicast(), which in turn means the dummy struct
  ib_sa_multicast can become a simple stack variable.

- cm_work_handler() used the id_priv kref, so we can entirely eliminate
  the kref inside struct cma_multicast. Since the cma_multicast never
  leaks into an unprotected work queue the kfree can be done at the same
  time as for IB.

- Eliminating the general multicast.ib requires using cma_set_mgid() in a
  few places to recompute the mgid.

Fixes: 3c86aa70bf ("RDMA/cm: Add RDMA CM support for IBoE devices")
Link: https://lore.kernel.org/r/20200902081122.745412-9-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:25 -03:00
Jason Gunthorpe
3788d2997b RDMA/cma: Consolidate the destruction of a cma_multicast in one place
Two places were open coding this sequence, and also pull in
cma_leave_roce_mc_group() which was called only once.

Link: https://lore.kernel.org/r/20200902081122.745412-8-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00
Jason Gunthorpe
1bb5091def RDMA/cma: Remove dead code for kernel rdmacm multicast
There is no kernel user of RDMA CM multicast so this code managing the
multicast subscription of the kernel-only internal QP is dead. Remove it.

This makes the bug fixes in the next patches much simpler.

Link: https://lore.kernel.org/r/20200902081122.745412-7-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00
Jason Gunthorpe
7e85bcda8b RDMA/cma: Combine cma_ndev_work with cma_work
These are the same thing, except that cma_ndev_work doesn't have a state
transition. Signal no state transition by setting old_state and new_state
== 0.

In all cases the handler function should not be called once
rdma_destroy_id() has progressed passed setting the state.

Link: https://lore.kernel.org/r/20200902081122.745412-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00
Jason Gunthorpe
5cfbf9291e RDMA/cma: Remove cma_comp()
The only place that still uses it is rdma_join_multicast() which is only
doing a sanity check that the caller hasn't done something wrong and
doesn't need the spinlock.

At least in the case of rdma_join_multicast() the information it needs
will remain until the ID is destroyed once it enters these
states. Similarly there is no reason to check for these specific states in
the handler callback, instead use the usual check for a destroyed id under
the handler_mutex.

Link: https://lore.kernel.org/r/20200902081122.745412-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-09-17 09:09:24 -03:00