Update the supported DEVX commands, it includes adding to the
query/modify command's list and to the encoding handling.
In addition, a valid range for general commands was added to be used for
future commands.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enforce DEVX privilege by firmware, this enables future device
functionality without the need to make driver changes unless a new
privilege type will be introduced.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enables modify and query verbs objects via the DEVX interface.
To support this the above DEVX handlers were changed to get any
object type via the UVERBS_IDR_ANY_OBJECT mechanism.
The type checking and handling is done per object as part of the
driver code.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Enable getting an object type from a given uobject, the type is saved
upon tree merging and is returned as part of some helper function.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Introduce the UVERBS_IDR_ANY_OBJECT type to match any IDR object.
Once used, the infrastructure skips checking for the IDR type, it
becomes the driver handler responsibility.
This enables drivers to get in a given method an object from various of
types.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The enhanced devx support series needs commit:
9d43faac02 ("net/mlx5: Update mlx5_ifc with DEVX UCTX capabilities bits")
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is no need to perform modify_rmp in two separate function,
while one of them uses stack as a placeholder for data while other
allocates it dynamically. Combine those two functions to one call
instead of two.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
There is no need to perform create_rmp in two separate function, while
one of them uses stack as a placeholder for data while other allocates
it dynamically. Combine those two functions to one instead of two.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Transfer initialization and cleanup from mlx5_priv struct of
mlx5_core_dev to be part of mlx5_ib_dev. This completes removal
of SRQ from mlx5_core.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reflect the change of moving SRQ code from mlx5_core to mlx5_ib by
updating function signatures do not require mlx5_core_dev as an input,
because all operations in mlx5_ib are supposed to use mlx5_ib_dev.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reuse existing infrastructure to initialize and release DEVX uid.
The DevX interface is intended for user space access, so it is supposed
to be initialized before ib_register_device(). Also it isn't supported
in switchdev mode and don't need to initialize it in that mode.
Fixes: 76dc5a8406 ("IB/mlx5: Manage device uid for DEVX white list commands")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
SRQ signature is not supported, hence no need for special static
global variable to announce it.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
There is no need to keep SRQ which is RDMA object in mlx5_core.
In this patch, we partially move the execution code, while next patches
will move table initialization/release logic too.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Ensure that both RDMA and netdev parts of SRQ implementation
has same copyright and license information annotated by SPDX
tags.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Add ability to track allocated ib_ucontext, which are limited
resource and worth to be visible by users.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
All of the old arguments can be derived from the uverbs_attr_bundle
structure, so get rid of the redundant arguments. Most of the prior work
has been removing users of the arguments to allow this to be a simple
patch.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If the user did not provide a long enough command buffer then the missing
bytes are forced to zero. There is no reason to check the length if a zero
value is OK.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This has a very complicated memory layout, with two flex arrays. Use
the iterator API to make reading it clearer.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Several methods have a command with a trailing flex array, and they
all open code some extraction scheme. Centralize this into a simple
iterator API.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
We truncate the response structure if there is not enough room in the
user buffer so there is no reason to have all the mess with finely managing
response_length. Just fully fill the attrs and truncate on copy.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A response struct was defined, and userspace is providing it (but not
checking it). Fill it in and write it out.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The write_ex handlers have this horrible boilerplate in every function to
do the zero extend/zero check and min size checks. This is now handled in
the core code via the meta-data, and the zero checks are handled by
uverbs_request(). Replace all the occurrences.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This function properly zero-extends, and zero-checks if the user
buffer is not the same size as the kernel command struct.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This function properly truncates and zero-fills the response which is the
standard used by the ioctl uAPI when working with user data.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There is no reason for this. For response processing we simply need to
copy, truncate, and zero fill the response into whatever output buffer
was provided. Add a function uverbs_response() that does this
consistently.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This creates a consistent way to access the two core buffers across write
and write_ex handlers.
Remove the open coded ucore conversion in the write/ex compatibility
handlers.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
write() methods must work with fixed sized structures as that is the only
way to know where the udata segment starts. The common udata code now
rejects any write() that has a response buffer shorter than the core's
response.
Thus all the checks of out_len for write methods are redundant and can be
removed.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Handle FW general event rq delay drop as it was received from FW via mlx5
notifiers API, instead of handling the processed software version of that
event. After this patch we can safely remove all software processed FW
events types and definitions.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Use the FW version of the port change event as forwarded via new mlx5
notifiers API.
After this patch, processed software version of the port change event
will become deprecated and will be totally removed in downstream
patches.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
Remove the deprecated mlx5_interface->event mlx5_ib callback and use new
mlx5 notifier API to subscribe for mlx5 events.
For native mlx5_ib devices profiles pf_profile/nic_rep_profile register
the notifier callback mlx5_ib_handle_event which treats the notifier
context as mlx5_ib_dev.
For vport repesentors, don't register any notifier, same as before, they
didn't receive any mlx5 events.
For slave port (mlx5_ib_multiport_info) register a different notifier
callback mlx5_ib_event_slave_port, which knows that the event is coming
for mlx5_ib_multiport_info and prepares the event job accordingly.
Before this on the event handler work we had to ask mlx5_core if this is
a slave port mlx5_core_is_mp_slave(work->dev), now it is not needed
anymore.
mlx5_ib_multiport_info notifier registration is done on
mlx5_ib_bind_slave_port and de-registration is done on
mlx5_ib_unbind_slave_port.
Signed-off-by: Saeed Mahameed <saeedm@mellanox.com>
The current implementation of create QP requires contiguous memory, such a
requirement is problematic once the memory is fragmented or the system is
low in memory, it causes failures in dma_zalloc_coherent().
This patch takes advantage of the new mlx5_core API which allocates a
fragmented buffer. This makes the QP creation much more resilient to
memory fragmentation. Data-path code was adapted to the fact that WQEs can
cross buffers.
We also use the opportunity to fix some cosmetic legacy coding convention
errors which were in the feature scope.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The current implementation of create SRQ requires contiguous memory, such
a requirement is problematic once the memory is fragmented or the system
is low in memory, it causes failures in dma_zalloc_coherent().
This patch takes the advantage of the new mlx5_core API which allocates a
fragmented buffer, and makes the SRQ creation much more resilient to
memory fragmentation. Data-path code was adapted to the fact that WQEs can
cross buffers.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
FRWR memory registration is done with a series of calls and WRs.
1. ULP invokes ib_dma_map_sg()
2. ULP invokes ib_map_mr_sg()
3. ULP posts an IB_WR_REG_MR on the Send queue
Step 2 generates an iova. It is permissible for ULPs to change this
iova (with certain restrictions) between steps 2 and 3.
rxe_map_mr_sg captures the MR's iova but later when rxe processes the
REG_MR WR, it ignores the MR's iova field. If a ULP alters the MR's iova
after step 2 but before step 3, rxe never captures that change.
When the remote sends an RDMA Read targeting that MR, rxe looks up the
R_key, but the altered iova does not match the iova stored in the MR,
causing the RDMA Read request to fail.
Reported-by: Anna Schumaker <schumaker.anna@gmail.com>
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Allow a user to attach a DEVX counter via mlx5 raw flow creation. In order
to attach a counter we introduce a new attribute:
MLX5_IB_ATTR_CREATE_FLOW_ARR_COUNTERS_DEVX
A counter can be attached to multiple flow steering rules.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
QIB driver was added in 2010 with many BUG_ON(), most of them were cleaned
out after years of development and usages.
It looks like that it is safe now to remove rest of BUG_ONs.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is a spelling mistake in a usnic_err error message, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
drivers/infiniband/core/uverbs_cmd.c:1095:1-3: WARNING: PTR_ERR_OR_ZERO can be used
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Fixes: 7106a97697 ("RDMA/uverbs: Make write() handlers return 0 on success")
Signed-off-by: kbuild test robot <fengguang.wu@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fix spelling mistake in usnic_err error message
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Have the core code initialize the driver_udata if the method has a udata
description. This is done using the same create_udata the handler was
supposed to call.
This makes ioctl consistent with the write and write_ex paths.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Now that we have metadata describing the command format the core code can
directly compute the udata pointers and all the really ugly
ib_uverbs_init_udata() calls can be removed from the handlers.
This means all the write() handlers are no longer sensitive to the layout
of the command buffer.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The core code needs to compute the udata so we may as well pass it in the
uverbs_attr_bundle instead of on the stack. This converts the simple case
of write_ex() which already has a core calculation.
Also change the write() path to use the attrs for ib_uverbs_init_udata()
instead of on the stack. This lets the write to write_ex compatibility
path continue to follow the lead of the _ex path.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The size meta-data in the prior patch describes the smallest acceptable
buffer for the write() interface. Globally check this in the core code.
This is necessary in the case of write() methods that have a driver udata
to prevent computing a negative udata buffer length.
The return code of -ENOSPC is chosen here as some of the handlers already
use this code, however many other handler use EINVAL.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
We need the structure sizes to compute the location of the udata in the
core code. Annotate the sizes into the new macro language.
This is generated largely by script and checked by comparing against the
similar list in rdma-core.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
The uverbs_attr_bundle already contains this pointer, and most methods
don't actually need it. Get rid of the redundant function argument.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Currently they return the command length, while all other handlers return
0. This makes the write path closer to the write_ex and ioctl path.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Now that we can add meta-data to the description of write() methods we
need to pass the uverbs_attr_bundle into all write based handlers so
future patches can use it as a container for any new data transferred out
of the core.
This is the first step to bringing the write() and ioctl() methods to a
common interface signature.
This is a simple search/replace, and we push the attr down into the uobj
and other APIs to keep changes minimal.
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
There is a spelling mistake in the module description text, fix it.
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When the rdma device is getting removed, get resource info can race with
device removal, as below:
CPU-0 CPU-1
-------- --------
rdma_nl_rcv_msg()
nldev_res_get_cq_dumpit()
mutex_lock(device_lock);
get device reference
mutex_unlock(device_lock); [..]
ib_unregister_device()
/* Valid reference to
* device->dev exists.
*/
ib_dealloc_device()
[..]
provider->fill_res_entry();
Even though device object is not freed, fill_res_entry() can get called on
device which doesn't have a driver anymore. Kernel core device reference
count is not sufficient, as this only keeps the structure valid, and
doesn't guarantee the driver is still loaded.
Similar race can occur with device renaming and device removal, where
device_rename() tries to rename a unregistered device. While this is fine
for devices of a class which are not net namespace aware, but it is
incorrect for net namespace aware class coming in subsequent series. If a
class is net namespace aware, then the below [1] call trace is observed in
above situation.
Therefore, to avoid the race, keep a reference count and let device
unregistration wait until all netlink users drop the reference.
[1] Call trace:
kernfs: ns required in 'infiniband' for 'mlx5_0'
WARNING: CPU: 18 PID: 44270 at fs/kernfs/dir.c:842 kernfs_find_ns+0x104/0x120
libahci i2c_core mlxfw libata dca [last unloaded: devlink]
RIP: 0010:kernfs_find_ns+0x104/0x120
Call Trace:
kernfs_find_and_get_ns+0x2e/0x50
sysfs_rename_link_ns+0x40/0xb0
device_rename+0xb2/0xf0
ib_device_rename+0xb3/0x100 [ib_core]
nldev_set_doit+0x165/0x190 [ib_core]
rdma_nl_rcv_msg+0x249/0x250 [ib_core]
? netlink_deliver_tap+0x8f/0x3e0
rdma_nl_rcv+0xd6/0x120 [ib_core]
netlink_unicast+0x17c/0x230
netlink_sendmsg+0x2f0/0x3e0
sock_sendmsg+0x30/0x40
__sys_sendto+0xdc/0x160
Fixes: da5c850782 ("RDMA/nldev: add driver-specific resource tracking")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>