======================
Userspace verbs access
======================

The ib_uverbs module, built by enabling CONFIG_INFINIBAND_USER_VERBS,
enables direct userspace access to IB hardware via "verbs," as
described in chapter 11 of the InfiniBand Architecture Specification.

To use the verbs, the libibverbs library, available from
https://github.com/linux-rdma/rdma-core, is required. libibverbs contains a
device-independent API for using the ib_uverbs interface.
libibverbs also requires the appropriate device-dependent kernel and
userspace drivers for your InfiniBand hardware. For example, to use
a Mellanox HCA, you will need the ib_mthca kernel module and the
libmthca userspace driver to be installed.

User-kernel communication
=========================

Userspace communicates with the kernel for slow path, resource
management operations via the /dev/infiniband/uverbsN character
devices. Fast path operations are typically performed by writing
directly to hardware registers mmap()ed into userspace, with no
system call or context switch into the kernel.

Commands are sent to the kernel via write()s on these device files.
The ABI is defined in include/uapi/rdma/ib_user_verbs.h.
The structs for commands that require a response from the kernel
contain a 64-bit field used to pass a pointer to an output buffer.
Status is returned to userspace as the return value of the write()
system call.

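As a rough illustration of the command format described above, the
following Python sketch packs a command header followed by the 64-bit
field that carries the address of an output buffer. The header layout
(a 32-bit command number, then 16-bit input and output lengths counted
in 32-bit words) and the command number 0 are assumptions that should be
checked against the ABI header; build_command() is a hypothetical helper,
and nothing is written to a device here.

```python
import ctypes
import struct

# Assumed header layout: u32 command, u16 in_words, u16 out_words.
HDR = struct.Struct("=IHH")
# Assumed command body: one 64-bit field holding the output buffer address.
BODY = struct.Struct("=Q")


def build_command(command, response_buf):
    """Pack a header plus a 64-bit pointer to response_buf (hypothetical helper)."""
    in_words = (HDR.size + BODY.size) // 4        # lengths are in 32-bit words
    out_words = ctypes.sizeof(response_buf) // 4
    header = HDR.pack(command, in_words, out_words)
    body = BODY.pack(ctypes.addressof(response_buf))
    return header + body


# An 8-byte buffer that the kernel would fill in with its response.
response = ctypes.create_string_buffer(8)
cmd = build_command(0, response)  # 0 is an assumed command number
```

A real client would pass such a buffer to os.write() on an open
/dev/infiniband/uverbsN descriptor; a failed write() surfaces in Python
as an OSError carrying the errno value.
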
Resource management
===================

Since creation and destruction of all IB resources is done by
commands passed through a file descriptor, the kernel can keep track
of which resources are attached to a given userspace context. The
ib_uverbs module maintains idr tables that are used to translate
between kernel pointers and opaque userspace handles, so that kernel
pointers are never exposed to userspace and userspace cannot trick
the kernel into following a bogus pointer.

This also allows the kernel to clean up when a process exits and
prevent one process from touching another process's resources.

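The handle translation idea can be sketched with a toy table in Python.
This is only an illustration of the concept, not the kernel's idr
implementation: the HandleTable class and its method names are
hypothetical, and a plain dictionary stands in for the idr.

```python
import itertools


class HandleTable:
    """Toy per-context table mapping opaque integer handles to objects."""

    def __init__(self):
        self._next = itertools.count(1)
        self._objects = {}

    def add(self, obj):
        """Store obj and hand back an opaque handle instead of a pointer."""
        handle = next(self._next)
        self._objects[handle] = obj
        return handle

    def lookup(self, handle):
        # Unknown handles are rejected rather than dereferenced.
        if handle not in self._objects:
            raise KeyError(f"invalid handle {handle}")
        return self._objects[handle]

    def close(self):
        """Release everything still attached to this context."""
        released = list(self._objects.values())
        self._objects.clear()
        return released


ctx_a, ctx_b = HandleTable(), HandleTable()
pd = ctx_a.add("protection domain")
```

Because each context owns its own table, a handle issued to ctx_a means
nothing in ctx_b, and close() gives the owner a single place to release
whatever a departing process left behind.
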
Memory pinning
==============

Direct userspace I/O requires that memory regions that are potential
I/O targets be kept resident at the same physical address. The
ib_uverbs module manages pinning and unpinning memory regions via
get_user_pages() and put_page() calls. It also accounts for the
amount of memory pinned in the process's pinned_vm, and checks that
unprivileged processes do not exceed their RLIMIT_MEMLOCK limit.

Pages that are pinned multiple times are counted each time they are
pinned, so the value of pinned_vm may be an overestimate of the
number of pages pinned by a process.

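The accounting described above can be approximated in userspace with
the Python sketch below. It is a simplified model under stated
assumptions, not the kernel's code: pages_spanned() and
within_memlock_limit() are hypothetical helpers, the limit check ignores
privileged processes, and the addresses are made-up values chosen so
that two regions share one page.

```python
import resource

PAGE_SIZE = resource.getpagesize()


def pages_spanned(addr, length):
    """Number of pages touched by the byte range [addr, addr + length)."""
    first = addr // PAGE_SIZE
    last = (addr + length - 1) // PAGE_SIZE
    return last - first + 1


def within_memlock_limit(pinned_pages, new_pages):
    """Check a new pin request against this process's RLIMIT_MEMLOCK soft limit."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_MEMLOCK)
    if soft == resource.RLIM_INFINITY:
        return True
    return (pinned_pages + new_pages) * PAGE_SIZE <= soft


# Two overlapping regions: the shared page is counted once per pin.
region_a = pages_spanned(0, 2 * PAGE_SIZE)          # pages 0 and 1
region_b = pages_spanned(PAGE_SIZE, 2 * PAGE_SIZE)  # pages 1 and 2
counted = region_a + region_b   # 4 counted, although only 3 pages are distinct
```

The final lines show why the pinned count can overestimate: the page
shared by both regions contributes twice to the total.
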
/dev files
==========

To create the appropriate character device files automatically with
udev, a rule like::

    KERNEL=="uverbs*", NAME="infiniband/%k"

can be used. This will create device nodes named::

    /dev/infiniband/uverbs0

and so on. Since the InfiniBand userspace verbs should be safe for
use by non-privileged processes, it may be useful to add an
appropriate MODE or GROUP to the udev rule.

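To check what a udev rule actually produced, a short Python sketch like
the following can list the device nodes together with their mode and
group. list_uverbs_nodes() is a hypothetical helper, and the default
path simply follows the naming shown above; on a machine without IB
hardware it returns an empty list.

```python
import glob
import grp
import os
import stat


def list_uverbs_nodes(pattern="/dev/infiniband/uverbs*"):
    """Return (path, mode string, group name) for each matching device node."""
    nodes = []
    for path in sorted(glob.glob(pattern)):
        st = os.stat(path)
        try:
            group = grp.getgrgid(st.st_gid).gr_name
        except KeyError:
            group = str(st.st_gid)  # gid with no entry in the group database
        nodes.append((path, stat.filemode(st.st_mode), group))
    return nodes


for path, mode, group in list_uverbs_nodes():
    print(path, mode, group)
```

A mode string that grants read and write access to the intended group
suggests that the MODE or GROUP setting in the rule took effect.
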