Commit Graph

70 Commits

Author SHA1 Message Date
Bob Pearson
72a0362744 RDMA/rxe: Remove rxe_alloc()
Currently all the object types in the rxe driver are allocated in
rdma-core except for MRs. By moving tha kzalloc() call outside of
the pool code the rxe_alloc() subroutine can be eliminated and code
checking for MR as a special case can be removed.

This patch moves the kzalloc() and kfree_rcu() calls into the mr
registration and destruction verbs. It removes that code from
rxe_pool.c including the rxe_alloc() subroutine which is no longer
used.

Link: https://lore.kernel.org/r/20230213225551.12437-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Reviewed-by: Devesh Sharma <devesh.s.sharma@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@linux.dev>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16 11:30:11 -04:00
Bob Pearson
5ff31dfcd6 Subject: RDMA/rxe: Handle zero length rdma
Currently the rxe driver does not handle all cases of zero length rdma
operations correctly. The client does not have to provide an rkey for zero
length RDMA read or write operations so the rkey provided may be invalid
and should not be used to lookup an mr.

This patch corrects the driver to ignore the provided rkey if the reth
length is zero for read or write operations and make sure to set the mr to
NULL. In read_reply() if length is zero rxe_recheck_mr() is not
called. Warnings are added in the routines in rxe_mr.c to catch NULL MRs
when the length is non-zero.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20230202044240.6304-1-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-02-16 11:13:52 -04:00
Bob Pearson
592627ccbd RDMA/rxe: Replace rxe_map and rxe_phys_buf by xarray
Replace struct rxe-phys_buf and struct rxe_map by struct xarray
in rxe_verbs.h. This allows using rcu locking on reads for
the memory maps stored in each mr.

This is based off of a sketch of a patch from Jason Gunthorpe in the
link below. Some changes were needed to make this work. It applies
cleanly to the current for-next and passes the pyverbs, perftest
and the same blktests test cases which run today.

Link: https://lore.kernel.org/r/20230119235936.19728-7-rpearsonhpe@gmail.com
Link: https://lore.kernel.org/linux-rdma/Y3gvZr6%2FNCii9Avy@nvidia.com/
Co-developed-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-27 12:14:14 -04:00
Bob Pearson
325a7eb851 RDMA/rxe: Cleanup page variables in rxe_mr.c
Cleanup usage of mr->page_shift and mr->page_mask and introduce
an extractor for mr->ibmr.page_size. Normal usage in the kernel
has page_mask masking out offset in page rather than masking out
the page number. The rxe driver had reversed that which was confusing.
Implicitly there can be a per mr page_size which was not uniformly
supported.

Link: https://lore.kernel.org/r/20230119235936.19728-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:46 -04:00
Bob Pearson
d8bdb0ebca RDMA-rxe: Isolate mr code from atomic_write_reply()
Isolate mr specific code from atomic_write_reply() in rxe_resp.c into
a subroutine rxe_mr_do_atomic_write() in rxe_mr.c.
Check length for atomic write operation.
Make iova_to_vaddr() static.

Link: https://lore.kernel.org/r/20230119235936.19728-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:45 -04:00
Bob Pearson
f04d5b3d91 RDMA-rxe: Isolate mr code from atomic_reply()
Isolate mr specific code from atomic_reply() in rxe_resp.c into
a subroutine rxe_mr_do_atomic_op() in rxe_mr.c.
Minor cleanups to rxe_check_range() and iova_to_vaddr().
Move enum resp_state to rxe.h

Link: https://lore.kernel.org/r/20230119235936.19728-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:45 -04:00
Bob Pearson
db4729a525 RDMA/rxe: Move rxe_map_mr_sg to rxe_mr.c
Move rxe_map_mr_sg() to rxe_mr.c where it makes a little more sense.

Link: https://lore.kernel.org/r/20230119235936.19728-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:45 -04:00
Bob Pearson
ade58da2a7 RDMA/rxe: Cleanup mr_check_range
Remove blank lines and replace EFAULT by EINVAL when an invalid
mr type is used.

Link: https://lore.kernel.org/r/20230119235936.19728-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2023-01-26 15:04:44 -04:00
Li Zhijian
ea1bb00ee9 RDMA/rxe: Implement flush execution in responder side
Only the requested placement types that also registered in the destination
memory region are acceptable.
Otherwise, responder will also reply NAK "Remote Access Error" if it
found a placement type violation.

We will persist data via arch_wb_cache_pmem(), which could be
architecture specific.

This commit also adds 2 helpers to update qp.resp from the incoming packet.

Link: https://lore.kernel.org/r/20221206130201.30986-8-lizhijian@fujitsu.com
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Li Zhijian
02ea0a5115 RDMA/rxe: Allow registering persistent flag for pmem MR only
Memory region could  support at most 2 flush access flags:
IB_ACCESS_FLUSH_PERSISTENT and IB_ACCESS_FLUSH_GLOBAL

But we only allow user to register persistent flush flags to the pmem MR
where it has the ability of persisting data across power cycles.

So registering a persistent access flag to a non-pmem MR will be rejected.

Link: https://lore.kernel.org/r/20221206130201.30986-5-lizhijian@fujitsu.com
CC: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-12-09 19:36:02 -04:00
Jason Gunthorpe
cb6562c380 RDMA/rxe: Do not NULL deref on debugging failure path
Correct the mistake, mr is obviously NULL in this code path.

Fixes: 2778b72b1d ("RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mr.c")
Link: https://lore.kernel.org/r/Y3eeJW0AdyJYhYyQ@kili
Reported-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-22 15:53:02 -04:00
Li Zhijian
7d984dac8f RDMA/rxe: Fix mr->map double free
rxe_mr_cleanup() which tries to free mr->map again will be called when
rxe_mr_init_user() fails:

   CPU: 0 PID: 4917 Comm: rdma_flush_serv Kdump: loaded Not tainted 6.1.0-rc1-roce-flush+ #25
   Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
   Call Trace:
    <TASK>
    dump_stack_lvl+0x45/0x5d
    panic+0x19e/0x349
    end_report.part.0+0x54/0x7c
    kasan_report.cold+0xa/0xf
    rxe_mr_cleanup+0x9d/0xf0 [rdma_rxe]
    __rxe_cleanup+0x10a/0x1e0 [rdma_rxe]
    rxe_reg_user_mr+0xb7/0xd0 [rdma_rxe]
    ib_uverbs_reg_mr+0x26a/0x480 [ib_uverbs]
    ib_uverbs_handler_UVERBS_METHOD_INVOKE_WRITE+0x1a2/0x250 [ib_uverbs]
    ib_uverbs_cmd_verbs+0x1397/0x15a0 [ib_uverbs]

This issue was firstly exposed since commit b18c7da63f ("RDMA/rxe: Fix
memory leak in error path code") and then we fixed it in commit
8ff5f5d9d8 ("RDMA/rxe: Prevent double freeing rxe_map_set()") but this
fix was reverted together at last by commit 1e75550648 (Revert
"RDMA/rxe: Create duplicate mapping tables for FMRs")

Simply let rxe_mr_cleanup() always handle freeing the mr->map once it is
successfully allocated.

Fixes: 1e75550648 ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"")
Link: https://lore.kernel.org/r/1667099073-2-1-git-send-email-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-18 20:15:51 -04:00
Bob Pearson
2778b72b1d RDMA/rxe: Replace pr_xxx by rxe_dbg_xxx in rxe_mr.c
Replace calls to pr_xxx() in rxe_mr.c by rxe_dbg_mr().

Link: https://lore.kernel.org/r/20221103171013.20659-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-11-10 15:33:03 -04:00
Xiao Yang
b071850ef6 RDMA/rxe: Remove the duplicate assignment of mr->map_shift
mr->map_shift is set to ilog2(RXE_BUF_PER_MAP) in both rxe_mr_init() and
rxe_mr_alloc() so remove the duplicate one in rxe_mr_init().

Link: https://lore.kernel.org/r/1666855893-145-1-git-send-email-yangx.jy@fujitsu.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 15:08:41 -03:00
Li Zhijian
875ab4a8d9 RDMA/rxe: Make sure requested access is a subset of {mr,mw}->access
We should reject the requests with access flags that is not registered by
MR/MW. For example, lookup_mr() should return NULL when requested access
is 0x03 and mr->access is 0x01.

Link: https://lore.kernel.org/r/20220927055337.22630-2-lizhijian@fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-10-28 14:39:47 -03:00
yangx.jy@fujitsu.com
71d2363991 RDMA/rxe: Remove the member 'type' of struct rxe_mr
The member 'type' is included in both struct rxe_mr and struct ib_mr
so remove the duplicate one of struct rxe_mr.

Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Link: https://lore.kernel.org/r/20221021134513.17730-1-yangx.jy@fujitsu.com
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-10-24 14:50:14 +03:00
Bob Pearson
58651bbb30 RDMA/rxe: Set pd early in mr alloc routines
Move setting of pd in mr objects ahead of any possible errors so that it
will always be set in rxe_mr_cleanup() to avoid seg faults when
rxe_put(mr_pd(mr)) is called.

Fixes: cf40367961 ("RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup()")
Link: https://lore.kernel.org/r/20220805183153.32007-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-09-27 10:15:24 -03:00
Daisuke Matsuda
954afc5a8f RDMA/rxe: Use members of generic struct in rxe_mr
rxe_mr and ib_mr have interchangeable members. Remove device specific
members and use ones in the generic struct. Both 'iova' and 'length' are
filled in ib_uverbs or ib_core layer after MR registration.

Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Link: https://lore.kernel.org/r/20220921080844.1616883-2-matsuda-daisuke@fujitsu.com
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-09-22 12:46:39 +03:00
Daisuke Matsuda
d4ecb56e86 RDMA/rxe: Remove an unused member from struct rxe_mr
Commit 1e75550648 ("Revert "RDMA/rxe: Create duplicate mapping tables for
FMRs"") brought back the member 'va' to struct rxe_mr. However, it is
actually used by nobody and thus can be removed.

Fixes: 1e75550648 ("Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"")
Link: https://lore.kernel.org/r/20220829012335.1212697-1-matsuda-daisuke@fujitsu.com
Signed-off-by: Daisuke Matsuda <matsuda-daisuke@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-08-29 09:44:07 +03:00
Li Zhijian
1e75550648 Revert "RDMA/rxe: Create duplicate mapping tables for FMRs"
Below 2 commits will be reverted:
 commit 8ff5f5d9d8 ("RDMA/rxe: Prevent double freeing rxe_map_set()")
 commit 647bf13ce9 ("RDMA/rxe: Create duplicate mapping tables for FMRs")

The community has a few bug reports which pointed this commit at last.
Some proposals are raised up in the meantime but all of them have no
follow-up operation.

The previous commit led the map_set of FMR to be not available any more if
the MR is registered again after invalidating. Although the mentioned
patch try to fix a potential race in building/accessing the same table
for fast memory regions, it broke rtrs etc ULPs. Since the latter could
be worse, revert this patch.

With previous commit, it's observed that a same MR in rnbd server will
trigger below code path:
 -> rxe_mr_init_fast()
 |-> alloc map_set() # map_set is uninitialized
 |...-> rxe_map_mr_sg() # build the map_set
     |-> rxe_mr_set_page()
 |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE that means
                          # we can access host memory(such rxe_mr_copy)
 |...-> rxe_invalidate_mr() # mr->state change to FREE from VALID
 |...-> rxe_reg_fast_mr() # mr->state change to VALID from FREE,
                          # but map_set was not built again
 |...-> rxe_mr_copy() # kernel crash due to access wild addresses
                      # that lookup from the map_set

The backtraces are not always identical.
[1st]----------
  RIP: 0010:lookup_iova+0x66/0xa0 [rdma_rxe]
  Code: 00 00 00 48 d3 ee 89 32 c3 4c 8b 18 49 8b 3b 48 8b 47 08 48 39 c6 72 38 48 29 c6 45 31 d2 b8 01 00 00 00 48 63 c8 48 c1 e1 04 <48> 8b 4c 0f 08 48 39 f1 77 21 83 c0 01 48 29 ce 3d 00 01 00 00 75
  RSP: 0018:ffffb7ff80063bf0 EFLAGS: 00010246
  RAX: 0000000000000000 RBX: ffff9b9949d86800 RCX: 0000000000000000
  RDX: ffffb7ff80063c00 RSI: 0000000049f6b378 RDI: 002818da00000004
  RBP: 0000000000000120 R08: ffffb7ff80063c08 R09: ffffb7ff80063c04
  R10: 0000000000000002 R11: ffff9b9916f7eef8 R12: ffff9b99488a0038
  R13: ffff9b99488a0038 R14: ffff9b9914fb346a R15: ffff9b990ab27000
  FS:  0000000000000000(0000) GS:ffff9b997dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007efc33a98ed0 CR3: 0000000014f32004 CR4: 00000000001706f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   rxe_mr_copy.part.0+0x6f/0x140 [rdma_rxe]
   rxe_responder+0x12ee/0x1b60 [rdma_rxe]
   ? rxe_icrc_check+0x7e/0x100 [rdma_rxe]
   ? rxe_rcv+0x1d0/0x780 [rdma_rxe]
   ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe]
   rxe_do_task+0x67/0xb0 [rdma_rxe]
   rxe_xmit_packet+0xc7/0x210 [rdma_rxe]
   rxe_requester+0x680/0xee0 [rdma_rxe]
   ? update_load_avg+0x5f/0x690
   ? update_load_avg+0x5f/0x690
   ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client]

[2nd]----------
  RIP: 0010:rxe_mr_copy.part.0+0xa8/0x140 [rdma_rxe]
  Code: 00 00 49 c1 e7 04 48 8b 00 4c 8d 2c d0 48 8b 44 24 10 4d 03 7d 00 85 ed 7f 10 eb 6c 89 54 24 0c 49 83 c7 10 31 c0 85 ed 7e 5e <49> 8b 3f 8b 14 24 4c 89 f6 48 01 c7 85 d2 74 06 48 89 fe 4c 89 f7
  RSP: 0018:ffffae3580063bf8 EFLAGS: 00010202
  RAX: 0000000000018978 RBX: ffff9d7ef7a03600 RCX: 0000000000000008
  RDX: 000000000000007c RSI: 000000000000007c RDI: ffff9d7ef7a03600
  RBP: 0000000000000120 R08: ffffae3580063c08 R09: ffffae3580063c04
  R10: ffff9d7efece0038 R11: ffff9d7ec4b1db00 R12: ffff9d7efece0038
  R13: ffff9d7ef4098260 R14: ffff9d7f11e23c6a R15: 4c79500065708144
  FS:  0000000000000000(0000) GS:ffff9d7f3dc00000(0000) knlGS:0000000000000000
  CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
  CR2: 00007fce47276c60 CR3: 0000000003f66004 CR4: 00000000001706f0
  DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
  DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
  Call Trace:
   <TASK>
   rxe_responder+0x12ee/0x1b60 [rdma_rxe]
   ? rxe_icrc_check+0x7e/0x100 [rdma_rxe]
   ? rxe_rcv+0x1d0/0x780 [rdma_rxe]
   ? rxe_icrc_hdr.isra.0+0xf6/0x160 [rdma_rxe]
   rxe_do_task+0x67/0xb0 [rdma_rxe]
   rxe_xmit_packet+0xc7/0x210 [rdma_rxe]
   rxe_requester+0x680/0xee0 [rdma_rxe]
   ? update_load_avg+0x5f/0x690
   ? update_load_avg+0x5f/0x690
   ? rtrs_clt_recv_done+0x1b/0x30 [rtrs_client]
   rxe_do_task+0x67/0xb0 [rdma_rxe]
   tasklet_action_common.constprop.0+0x92/0xc0
   __do_softirq+0xe1/0x2d8
   run_ksoftirqd+0x21/0x30
   smpboot_thread_fn+0x183/0x220
   ? sort_range+0x20/0x20
   kthread+0xe2/0x110
   ? kthread_complete_and_exit+0x20/0x20
   ret_from_fork+0x22/0x30

Link: https://lore.kernel.org/r/1658805386-2-1-git-send-email-lizhijian@fujitsu.com
Link: https://lore.kernel.org/all/20220210073655.42281-1-guoqing.jiang@linux.dev/T/
Link: https://www.spinics.net/lists/linux-rdma/msg110836.html
Link: https://lore.kernel.org/lkml/94a5ea93-b8bb-3a01-9497-e2021f29598a@linux.dev/t/
Tested-by: Md Haris Iqbal <haris.iqbal@ionos.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Li Zhijian <lizhijian@fujitsu.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
2022-07-27 12:22:57 +03:00
Md Haris Iqbal
174e7b1370 RDMA/rxe: For invalidate compare according to set keys in mr
The 'rkey' input can be an lkey or rkey, and in rxe the lkey or rkey have
the same value, including the variant bits.

So, if mr->rkey is set, compare the invalidate key with it, otherwise
compare with the mr->lkey.

Since we already did a lookup on the non-varient bits to get this far, the
check's only purpose is to confirm that the wqe has the correct variant
bits.

Fixes: 001345339f ("RDMA/rxe: Separate HW and SW l/rkeys")
Link: https://lore.kernel.org/r/20220707073006.328737-1-haris.phnx@gmail.com
Signed-off-by: Md Haris Iqbal <haris.phnx@gmail.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-07-22 17:35:53 -03:00
Bob Pearson
215d0a755e RDMA/rxe: Stop lookup of partially built objects
Currently the rdma_rxe driver has a security weakness due to giving
objects which are partially initialized indices allowing external actors
to gain access to them by sending packets which refer to their
index (e.g. qpn, rkey, etc) causing unpredictable results.

This patch adds a new API rxe_finalize(obj) which enables looking up pool
objects from indices using rxe_pool_get_index() for AH, QP, MR, and
MW. They are added in create verbs only after the objects are fully
initialized.

It also adds wait for completion to destroy/dealloc verbs to assure that
all references have been dropped before returning to rdma_core by
implementing a new rxe_pool API rxe_cleanup() which drops a reference to
the object and then waits for all other references to be dropped.  When
the last reference is dropped the object is completed by kref.  After that
it cleans up the object and if locally allocated frees the memory. In the
special case of address handle objects the delay is implemented separately
if the destroy_ah call is not sleepable.

Combined with deferring cleanup code to type specific cleanup routines
this allows all pending activity referring to objects to complete before
returning to rdma_core.

Link: https://lore.kernel.org/r/20220612223434.31462-2-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-06-30 10:56:01 -03:00
Bob Pearson
cf40367961 RDMA/rxe: Move mr cleanup code to rxe_mr_cleanup()
Move the code which tears down an mr to rxe_mr_cleanup to allow operations
holding a reference to the mr to complete.

Link: https://lore.kernel.org/r/20220421014042.26985-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-05-09 09:03:45 -03:00
Bob Pearson
3197706abd RDMA/rxe: Use standard names for ref counting
Rename rxe_add_ref() to rxe_get() and rxe_drop_ref() to rxe_put().
Significantly improves readability for new readers.

Link: https://lore.kernel.org/r/20220304000808.225811-10-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-03-16 10:34:42 -03:00
Bob Pearson
3225717f6d RDMA/rxe: Replace red-black trees by xarrays
Currently the rxe driver uses red-black trees to add indices to the rxe
object pools. Linux xarrays provide a better way to implement the same
functionality for indices. This patch replaces red-black trees by xarrays
for pool objects. Since xarrays already have a spinlock use that in place
of the pool rwlock. Make sure that all changes in the xarray(index) and
kref(ref counnt) occur atomically.

Link: https://lore.kernel.org/r/20220304000808.225811-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-03-16 10:34:42 -03:00
Jason Gunthorpe
c0fe82baae Linux 5.16
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmHbZ+YeHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGDs4H/RgC8JOV3Dki1VtO
 6OwPxUKKojhVU9LJis7kyG5voB/zE7tK5nI+jC3gYGQUFKWaZ3YY8s3UcV1zvg/b
 a44b91boA+dKxEwOq4RZNQ9mU+QWnNoG5+UqBkmB8vewi3QC3T8xEmpWcERLbU7d
 KrI2T6i4ksJ9OYSYMEMyrvrpt7nt3n1tDX8b71faXjf1zbLeGo9zT53t6BJ/LknV
 AK406Eq/3bg36OZrKFuG7hCJfRE/cSlxF9bxK3sIfMBMQ2YPe1S5+pxl5iBD0nyl
 NaHOBYcLTxPAne3YgIvK0zDdsS+EtPSlaVdWfSmNjQhX2vqEixldgdrOCmwp37vd
 3gV9D28=
 =hrOo
 -----END PGP SIGNATURE-----

Merge tag 'v5.16' into rdma.git for-next

To resolve minor conflict in:
        drivers/infiniband/hw/mlx5/mlx5_ib.h

By merging both hunks.

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-13 13:21:03 -04:00
Li Zhijian
8ff5f5d9d8 RDMA/rxe: Prevent double freeing rxe_map_set()
The same rxe_map_set could be freed twice:

rxe_reg_user_mr()
  -> rxe_mr_init_user()
    -> rxe_mr_free_map_set() # 1st

  -> rxe_drop_ref()
   ...
    -> rxe_mr_cleanup()
      -> rxe_mr_free_map_set() # 2nd

Follow normal convection and put resource cleanup either in the error
unwind of the allocator, or the overall free function. Leave the object
unchanged with a NULL cur_map_set on failure and remove the unncessary
free in rxe_mr_init_user().

Link: https://lore.kernel.org/r/20211228014406.1033444-1-lizhijian@cn.fujitsu.com
Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2022-01-04 10:29:34 -04:00
Bob Pearson
02827b6708 RDMA/rxe: Cleanup rxe_pool_entry
Currently three different names are used to describe rxe pool elements.
They are referred to as entries, elems or pelems. This patch chooses one
'elem' and changes the other ones.

Link: https://lore.kernel.org/r/20211103050241.61293-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-11-19 13:29:14 -04:00
Bob Pearson
450f4f6aa1 RDMA/rxe: Only allow invalidate for appropriate MRs
Local and remote invalidate operations are not allowed by IBA for MRs
created by (re)register memory verbs. This patch checks the MR type in
rxe_invalidate_mr().

Link: https://lore.kernel.org/r/20210914164206.19768-6-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-24 10:15:00 -03:00
Bob Pearson
647bf13ce9 RDMA/rxe: Create duplicate mapping tables for FMRs
For fast memory regions create duplicate mapping tables so ib_map_mr_sg()
can build a new mapping table which is then swapped into place
synchronously with the execution of an IB_WR_REG_MR work request.

Currently the rxe driver uses the same table for receiving RDMA operations
and for building new tables in preparation for reusing the MR. This
exposes users to potentially incorrect results.

Link: https://lore.kernel.org/r/20210914164206.19768-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-24 10:15:00 -03:00
Bob Pearson
001345339f RDMA/rxe: Separate HW and SW l/rkeys
Separate software and simulated hardware lkeys and rkeys for MRs and MWs.
This makes struct ib_mr and struct ib_mw isolated from hardware changes
triggered by executing work requests.

This change fixes a bug seen in blktest.

Link: https://lore.kernel.org/r/20210914164206.19768-4-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-24 10:14:59 -03:00
Bob Pearson
47b7f7064b RDMA/rxe: Cleanup MR status and type enums
Eliminate RXE_MR_STATE_ZOMBIE which is not compatible with IBA.
RXE_MR_STATE_INVALID is better.

Replace RXE_MR_TYPE_XXX by IB_MR_TYPE_XXX which covers all the needed
types.

Link: https://lore.kernel.org/r/20210914164206.19768-3-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-09-24 10:14:59 -03:00
Jason Gunthorpe
6a217437f9 Merge branch 'sg_nents' into rdma.git for-next
From Maor Gottlieb
====================

Fix the use of nents and orig_nents in the sg table append helpers. The
nents should be used by the DMA layer to store the number of DMA mapped
sges, the orig_nents is the number of CPU sges.

Since the sg append logic doesn't always create a SGL with exactly
orig_nents entries store a total_nents as well to allow the table to be
properly free'd and reorganize the freeing logic to share across all the
use cases.

====================

Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>

* 'sg_nents':
  RDMA: Use the sg_table directly and remove the opencoded version from umem
  lib/scatterlist: Fix wrong update of orig_nents
  lib/scatterlist: Provide a dedicated function to support table append
2021-08-30 09:49:59 -03:00
Maor Gottlieb
79fbd3e124 RDMA: Use the sg_table directly and remove the opencoded version from umem
This allows using the normal sg_table APIs and makes all the code
cleaner. Remove sgt, nents and nmapd from ib_umem.

Link: https://lore.kernel.org/r/20210824142531.3877007-4-maorg@nvidia.com
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-08-24 19:52:40 -03:00
Bob Pearson
1117f26ea7 RDMA/rxe: Move ICRC generation to a subroutine
Isolate ICRC generation into a single subroutine named rxe_generate_icrc()
in rxe_icrc.c. Remove scattered crc generation code from elsewhere.

Link: https://lore.kernel.org/r/20210707040040.15434-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-16 12:43:34 -03:00
Bob Pearson
b18c7da63f RDMA/rxe: Fix memory leak in error path code
In rxe_mr_init_user() at the third error the driver fails to free the
memory at mr->map. This patch adds code to do that.  This error only
occurs if page_address() fails to return a non zero address which should
never happen for 64 bit architectures.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20210705164153.17652-1-rpearsonhpe@gmail.com
Reported by: Haakon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-15 14:44:12 -03:00
Xiao Yang
cdbdb77247 RDMA/rxe: Remove the repeated 'mr->umem = umem'
Drop duplicated code

Link: https://lore.kernel.org/r/20210702123024.37025-1-ice_yangxiao@163.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Reviewed-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-07-15 14:43:19 -03:00
Xiao Yang
20ec0a6d60 RDMA/rxe: Don't overwrite errno from ib_umem_get()
rxe_mr_init_user() always returns the fixed -EINVAL when ib_umem_get()
fails so it's hard for user to know which actual error happens in
ib_umem_get(). For example, ib_umem_get() will return -EOPNOTSUPP when
trying to pin pages on a DAX file.

Return actual error as mlx4/mlx5 does.

Link: https://lore.kernel.org/r/20210621071456.4259-1-ice_yangxiao@163.com
Signed-off-by: Xiao Yang <yangx.jy@fujitsu.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-21 21:05:02 -03:00
Bob Pearson
570d2b99d0 RDMA/rxe: Disallow MR dereg and invalidate when bound
Check that an MR has no bound MWs before allowing a dereg or invalidate
operation.

Link: https://lore.kernel.org/r/20210608042552.33275-11-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16 20:51:19 -03:00
Bob Pearson
3902b429ca RDMA/rxe: Implement invalidate MW operations
Implement invalidate MW and cleaned up invalidate MR operations.

Added code to perform remote invalidate for send with invalidate.  Added
code to perform local invalidation. Deleted some blank lines in rxe_loc.h.

Link: https://lore.kernel.org/r/20210608042552.33275-9-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16 20:51:18 -03:00
Bob Pearson
beec0239c3 RDMA/rxe: Add ib_alloc_mw and ib_dealloc_mw verbs
Add ib_alloc_mw and ib_dealloc_mw verbs APIs.

Added new file rxe_mw.c focused on MWs. Changed the 8 bit random key
generator. Added a cleanup routine for MWs. Added verbs routines to
ib_device_ops.

Link: https://lore.kernel.org/r/20210608042552.33275-5-rpearsonhpe@gmail.com
Signed-off-by: Bob Pearson <rpearsonhpe@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-06-16 20:51:17 -03:00
Lang Cheng
cd5b010fff RDMA/rxe: Remove unused parameter udata
The old version of ib_umem_get() need these udata as a parameter but now
they are unnecessary.

Fixes: c320e527e1 ("IB: Allow calls to ib_umem_get from kernel ULPs")
Link: https://lore.kernel.org/r/1620807142-39157-5-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-05-20 11:52:17 -03:00
Bob Pearson
364e282c4f RDMA/rxe: Split MEM into MR and MW
In the original rxe implementation it was intended to use a common object
to represent MRs and MWs but they are different enough to separate these
into two objects.

This allows replacing the mem name with mr for MRs which is more
consistent with the style for the other objects and less likely to be
confusing. This is a long patch that mostly changes mem to mr where it
makes sense and adds a new rxe_mw struct.

Link: https://lore.kernel.org/r/20210325212425.2792-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Acked-by: Zhu Yanjun <zyjzyj2000@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2021-03-30 17:11:30 -03:00
Bob Pearson
eeed696507 RDMA/rxe: Remove unused RXE_MR_TYPE_FMR
This is a left over from the past. It is no longer used.

Link: https://lore.kernel.org/r/20201009165112.271143-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-26 20:00:27 -03:00
Bob Pearson
1858d98b83 RDMA/rxe: Remove duplicate entries in struct rxe_mr
Struct rxe_mem had pd, lkey and rkey values both in itself and in the
struct ib_mr which is also included in rxe_mem.

Delete these entries and replace references with the ones in ibmr.Add
mr_pd, mr_lkey and mr_rkey macros which extract these values from mr.

Link: https://lore.kernel.org/r/20201008212818.265303-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-10-08 20:07:50 -03:00
Jason Gunthorpe
5dee5872f8 Merge branch 'mlx5_active_speed' into rdma.git for-next
Leon Romanovsky says:

====================
IBTA declares speed as 16 bits, but kernel stores it in u8. This series
fixes in-kernel declaration while keeping external interface intact.
====================

Based on the mlx5-next branch at
     git://git.kernel.org/pub/scm/linux/kernel/git/mellanox/linux
due to dependencies.

* branch 'mlx5_active_speed':
  RDMA: Fix link active_speed size
  RDMA/mlx5: Delete duplicated mlx5_ptys_width enum
  net/mlx5: Refactor query port speed functions
2020-09-18 10:31:45 -03:00
Bob Pearson
63fa15dbd4 RDMA/rxe: Add SPDX hdrs to rxe source files
Add SPDX headers to all rxe .c and .h files.

Link: https://lore.kernel.org/r/20200827145439.2273-1-rpearson@hpe.com
Signed-off-by: Bob Pearson <rpearson@hpe.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-31 12:20:02 -03:00
Dinghao Liu
e3ddd6067e RDMA/rxe: Fix memleak in rxe_mem_init_user
When page_address() fails, umem should be freed just like when
rxe_mem_alloc() fails.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200819075632.22285-1-dinghao.liu@zju.edu.cn
Signed-off-by: Dinghao Liu <dinghao.liu@zju.edu.cn>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-08-27 08:45:59 -03:00
Kamal Heib
293d8440a0 RDMA/rxe: Return void from rxe_mem_init_dma()
The return value from rxe_mem_init_dma() is always 0 - change it to be
void and fix the callers accordingly.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200705104313.283034-4-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
2020-07-16 13:57:21 -03:00
Kamal Heib
f6b4c11fc5 RDMA/rxe: Remove unused rxe_mem_map_pages
This function is not in use - delete it.

Fixes: 8700e3e7c4 ("Soft RoCE driver")
Link: https://lore.kernel.org/r/20200622100731.27359-1-kamalheib1@gmail.com
Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2020-06-22 14:49:27 -03:00