The high 6 bits of traffic class in GRH is DSCP (Differentiated Services
Codepoint), the driver should shift it before the hardware gets it when
using RoCEv2.
Fixes: 606bf89e98 ("RDMA/hns: Refactor for hns_roce_v2_modify_qp function")
Fixes: fba429fcf9 ("RDMA/hns: Fix missing fields in address vector")
Link: https://lore.kernel.org/r/1607650657-35992-4-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Whether to enable the these features should better depend on the enable
flags, not the value of related fields.
Fixes: 5c1f167af1 ("RDMA/hns: Init SRQ table for hip08")
Fixes: 3cb2c996c9 ("RDMA/hns: Add support for SCCC in size of 64 Bytes")
Link: https://lore.kernel.org/r/1607650657-35992-3-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
For ib_copy_from_user(), the length of udata may not be the same as that
of cmd. For ib_copy_to_user(), the length of udata may not be the same as
that of resp. So limit the length to prevent out-of-bounds read and write
operations from ib_copy_from_user() and ib_copy_to_user().
Fixes: de77503a59 ("RDMA/hns: RDMA/hns: Assign rq head pointer when enable rq record db")
Fixes: 633fb4d9fd ("RDMA/hns: Use structs to describe the uABI instead of opencoding")
Fixes: ae85bf92ef ("RDMA/hns: Optimize qp param setup flow")
Fixes: 6fd610c573 ("RDMA/hns: Support 0 hop addressing for SRQ buffer")
Fixes: 9d9d4ff788 ("RDMA/hns: Update the kernel header file of hns")
Link: https://lore.kernel.org/r/1607650657-35992-2-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
It is not clear what this lock protects. If the authors wanted to ensure
that "dev" does not disappear, that is impossible, given the following
code path:
mlx4_ib_netdev_event (under RTNL mutex)
-> mlx4_ib_scan_netdevs
-> mlx4_ib_update_qps
Also, the dev_base_lock does not protect dev->dev_addr either.
So it serves no purpose here. Remove it.
Link: https://lore.kernel.org/r/20201208193928.1500893-1-vladimir.oltean@nxp.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Fix incorrect type of max_entries in UVERBS_METHOD_QUERY_GID_TABLE -
max_entries is of type size_t although it can take negative values.
The following static check revealed it:
drivers/infiniband/core/uverbs_std_types_device.c:338 ib_uverbs_handler_UVERBS_METHOD_QUERY_GID_TABLE() warn: 'max_entries' unsigned <= 0
Fixes: 9f85cbe50a ("RDMA/uverbs: Expose the new GID query API to user space")
Link: https://lore.kernel.org/r/20201208073545.9723-4-leon@kernel.org
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In procedure ib_register_device, procedure kobject_uevent is called
(advertising that the device is ready for userspace usage) even when
device_enable_and_get() returned an error.
As a result, various RDMA modules attempted to register for the device
even while the device driver was preparing to unregister the device.
Fix this by advertising the device availability only after enabling the
device succeeds.
Fixes: e7a5b4aafd ("RDMA/device: Don't fire uevent before device is fully initialized")
Link: https://lore.kernel.org/r/20201208073545.9723-3-leon@kernel.org
Suggested-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The CQ pool mechanism had two problems:
1. The CQ pool lists were uninitialized in the device registration error
flow. As a result, all the list pointers remained NULL. This caused
the kernel to crash (in procedure ib_cq_pool_destroy) when that error
flow was taken (and unregister called). The stack trace snippet:
BUG: kernel NULL pointer dereference, address: 0000000000000000
#PF: supervisor read access in kernel mode
#PF: error_code(0×0000) ? not-present page
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
. . .
RIP: 0010:ib_cq_pool_destroy+0x1b/0×70 [ib_core]
. . .
Call Trace:
disable_device+0x9f/0×130 [ib_core]
__ib_unregister_device+0x35/0×90 [ib_core]
ib_register_device+0x529/0×610 [ib_core]
__mlx5_ib_add+0x3a/0×70 [mlx5_ib]
mlx5_add_device+0x87/0×1c0 [mlx5_core]
mlx5_register_interface+0x74/0xc0 [mlx5_core]
do_one_initcall+0x4b/0×1f4
do_init_module+0x5a/0×223
load_module+0x1938/0×1d40
2. At device unregister, when cleaning up the cq pool, the cq's in the
pool lists were freed, but the cq entries were left in the list.
The fix for the first issue is to initialize the cq pool lists when the
ib_device structure is allocated for a new device (in procedure
_ib_alloc_device).
The fix for the second problem is to delete cq entries from the pool lists
when cleaning up the cq pool.
In addition, procedure ib_cq_pool_destroy() is renamed to the more
appropriate name ib_cq_pool_cleanup().
Fixes: 4aa1615268 ("RDMA/core: Fix ordering of CQ pool destruction")
Link: https://lore.kernel.org/r/20201208073545.9723-2-leon@kernel.org
Suggested-by: Jason Gunthorpe <jgg@nvidia.com>
Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Commit 66f57b871e ("RDMA/restrack: Support all QP types") extends
ib_create_qp() to a named ib_create_named_qp(), which takes the caller's
name as argument, but it did not add the new argument description to the
function's kerneldoc.
make htmldocs warns:
./drivers/infiniband/core/verbs.c:1206: warning: Function parameter or member 'caller' not described in 'ib_create_named_qp'
Add a description for this new argument based on the description of the
same argument in other related functions.
Fixes: 66f57b871e ("RDMA/restrack: Support all QP types")
Link: https://lore.kernel.org/r/20201207173255.13355-1-lukas.bulwahn@gmail.com
Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
iser_initialize_task_headers() uses in_interrupt() to find out if it is
safe to acquire a mutex.
in_interrupt() is deprecated as it is ill defined and does not provide
what it suggests. Aside of that it covers only parts of the contexts in
which a mutex may not be acquired.
The following callchains exist:
iscsi_queuecommand() *locks* iscsi_session::frwd_lock
-> iscsi_prep_scsi_cmd_pdu()
-> session->tt->init_task() (iscsi_iser_task_init())
-> iser_initialize_task_headers()
-> iscsi_iser_task_xmit() (iscsi_transport::xmit_task)
-> iscsi_iser_task_xmit_unsol_data()
-> iser_send_data_out()
-> iser_initialize_task_headers()
iscsi_data_xmit() *locks* iscsi_session::frwd_lock
-> iscsi_prep_mgmt_task()
-> session->tt->init_task() (iscsi_iser_task_init())
-> iser_initialize_task_headers()
-> iscsi_prep_scsi_cmd_pdu()
-> session->tt->init_task() (iscsi_iser_task_init())
-> iser_initialize_task_headers()
__iscsi_conn_send_pdu() caller has iscsi_session::frwd_lock
-> iscsi_prep_mgmt_task()
-> session->tt->init_task() (iscsi_iser_task_init())
-> iser_initialize_task_headers()
-> session->tt->xmit_task() (
The only callchain that is close to be invoked in preemptible context:
iscsi_xmitworker() worker
-> iscsi_data_xmit()
-> iscsi_xmit_task()
-> conn->session->tt->xmit_task() (iscsi_iser_task_xmit()
In iscsi_iser_task_xmit() there is this check:
if (!task->sc)
return iscsi_iser_mtask_xmit(conn, task);
so it does end up in iser_initialize_task_headers() and
iser_initialize_task_headers() relies on iscsi_task::sc == NULL.
Remove conditional locking of iser_conn::state_mutex because there is no
call chain to do so. Remove the goto label and return early now that there
is no clean up needed.
Link: https://lore.kernel.org/r/20201204174256.62xfcvudndt7oufl@linutronix.de
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Sagi Grimberg <sagi@grimberg.me>
Cc: Max Gurtovoy <maxg@nvidia.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Jason Gunthorpe <jgg@ziepe.ca>
Cc: linux-rdma@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Currently, DM MR registration flow doesn't set the mlx5_ib_dev pointer and
can cause a NULL pointer dereference if userspace dumps the MR via rdma
tool.
Assign the IB device together with the other fields and remove the
redundant reference of mlx5_ib_dev from mlx5_ib_mr.
Cc: stable@vger.kernel.org
Fixes: 6c29f57ea4 ("IB/mlx5: Device memory mr registration support")
Link: https://lore.kernel.org/r/20201203190807.127189-1-leon@kernel.org
Signed-off-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
These flags will be returned to the userspace through ABI, so they should
be defined in hns-abi.h. Furthermore, there is no need to include
hns-abi.h in every source files, it just needs to be included in the
common header file.
Link: https://lore.kernel.org/r/1606872560-17823-1-git-send-email-liweihang@huawei.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
While creating qps, the driver adds one extra entry to the sq size passed
by the ULPs in order to avoid queue full condition. When ULPs creates QPs
with max_qp_wr reported, driver creates QP with 1 more than the max_wqes
supported by HW. Create QP fails in this case. To avoid this error, reduce
1 entry in max_qp_wqes and report it to the stack.
Link: https://lore.kernel.org/r/1606741986-16477-1-git-send-email-selvin.xavier@broadcom.com
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This is all a giant train wreck of error handling, in many cases the MR is
left in some corrupted state where continuing on is going to lead to
chaos, or various unwinds/order is missed.
rereg had three possible completely different actions, depending on flags
and various details about the MR. Split the three actions into three
functions, and call the right action from the start.
For each action carefully design the error handling to fit the action:
- UMR access/PD update is a simple UMR, if it fails the MR isn't changed,
so do nothing
- PAS update over UMR is multiple UMR operations. To keep everything sane
revoke access to the MKey while it is being changed and restore it once
the MR is correct.
- Recreating the mkey should completely build a parallel MR with a fully
loaded PAS then swap and destroy the old one. If it fails the original
should be left untouched. This is handled in the core code. Directly
call the normal MR creation functions, possibly re-using the existing
umem.
Add support for working with ODP MRs. The READ/WRITE access flags can be
changed by UMR and we can trivially convert to/from ODP MRs using the
logic to build a completely new MR.
This new logic also fixes various problems with MRs continuing to work
while their PAS lists are no longer valid, eg during a page size change.
Link: https://lore.kernel.org/r/20201130075839.278575-6-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
This function handles an ODP and regular MR flow all mushed together, even
though the two flows are quite different. Split them into two dedicated
functions.
Link: https://lore.kernel.org/r/20201130075839.278575-5-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
mlx5 has an ugly flow where it tries to allocate a new MR and replace the
existing MR in the same memory during rereg. This is very complicated and
buggy. Instead of trying to replace in-place inside the driver, provide
support from uverbs to change the entire HW object assigned to a handle
during rereg_mr.
Since destroying a MR is allowed to fail (ie if a MW is pointing at it)
and can't be detected in advance, the algorithm creates a completely new
uobject to hold the new MR and swaps the IDR entries of the two objects.
The old MR in the temporary IDR entry is destroyed, and if it fails
rereg_mr succeeds and destruction is deferred to FD release. This
complexity is why this cannot live in a driver safely.
Link: https://lore.kernel.org/r/20201130075839.278575-4-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
No reason only one caller checks this. This properly blocks ODP
from the rereg flow if the device does not support ODP.
Link: https://lore.kernel.org/r/20201130075839.278575-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Unknown flags should be EOPNOTSUPP, only zero flags is EINVAL. Flags is
actually the rereg action to perform.
The checking of the start/hca_va/etc is also redundant and ib_umem_get()
does these checks and returns proper error codes.
Fixes: 7e6edb9b2e ("IB/core: Add user MR re-registration support")
Link: https://lore.kernel.org/r/20201130075839.278575-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Use dma_set_mask_and_coherent() instead of pci_set_dma_mask() followed by
a pci_set_consistent_dma_mask().
Link: https://lore.kernel.org/r/20201201091811.37984-1-galpress@amazon.com
Reviewed-by: Firas JahJah <firasj@amazon.com>
Reviewed-by: Yossi Leybovich <sleybo@amazon.com>
Signed-off-by: Gal Pressman <galpress@amazon.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The variable 'cnt' is used to represent the max number of sge an SQ WQE
can use at first, then it means how many extended sge an SQ has. In
addition, this function has no need to return a value. So refactor and
encapsulate the parts of getting number of extended sge a WQE can use to
make it easier to understand.
Link: https://lore.kernel.org/r/1606558959-48510-4-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Page alignment is required when setting the number of extended sge
according to the hardware's achivement. If the space of needed extended
sge is greater than one page, the roundup_pow_of_two() can ensure
that. But if the needed extended sge isn't 0 and can not be filled in a
whole page, the driver should align it specifically.
Fixes: 54d6638765 ("RDMA/hns: Optimize WQE buffer size calculating process")
Link: https://lore.kernel.org/r/1606558959-48510-3-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
One RC SQ WQE can store 2 sges but UD can't, so ignore 2 valid sges of
wr.sglist for RC which have been filled in WQE before setting extended
sge. Either of RC and UD can not contain 0-length sges, so these 0-length
sges should be skipped.
Fixes: 54d6638765 ("RDMA/hns: Optimize WQE buffer size calculating process")
Link: https://lore.kernel.org/r/1606558959-48510-2-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The push feature does not work as expected in x722 and has historically
been disabled in the driver.
Purge all remaining code related to the push feature in i40iw.
Link: https://lore.kernel.org/r/20201125005616.1800-3-shiraz.saleem@intel.com
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
-----BEGIN PGP SIGNATURE-----
iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl/EM9oeHHRvcnZhbGRz
QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/3kH/RNkFyTlHlUkZpJx
8Ks2yWgUln7YhZcmOaG/IcIyWnhCgo3l35kiaH7XxM+rPMZzidp51MHUllaTAQDc
u+5EFHMJsmTWUfE8ocHPb1cPdYEDSoVr6QUsixbL9+uADpRz+VZVtWMb89EiyMrC
wvLIzpnqY5UNriWWBxD0hrmSsT4g9XCsauer4k2KB+zvebwg6vFOMCFLFc2qz7fb
ABsrPFqLZOMp+16chGxyHP7LJ6ygI/Hwf7tPW8ppv4c+hes4HZg7yqJxXhV02QbJ
s10s6BTcEWMqKg/T6L/VoScsMHWUcNdvrr3uuPQhgup240XdmB1XO8rOKddw27e7
VIjrjNw=
=4ZaP
-----END PGP SIGNATURE-----
Merge tag 'v5.10-rc6' into rdma.git for-next
For dependencies in following patches
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Stash is a mechanism that uses the core information carried by the ARM AXI
bus to access the L3 cache. It can be used to improve the performance by
increasing the hit ratio of L3 cache. QPs need to enable stash by default.
Link: https://lore.kernel.org/r/1606374251-21512-3-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Stash is a mechanism that uses the core information carried by the ARM AXI
bus to access the L3 cache. It can be used to improve the performance by
increasing the hit ratio of L3 cache. CQs need to enable stash by default.
Link: https://lore.kernel.org/r/1606374251-21512-2-git-send-email-liweihang@huawei.com
Signed-off-by: Lang Cheng <chenglang@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
In order to improve performance by balancing the load between different
banks of cache, the QPC cache is desigend to choose one of 8 banks
according to lower 3 bits of QPN. The hns driver needs to count the number
of QP on each bank and then assigns the QP being created to the bank with
the minimum load first.
Link: https://lore.kernel.org/r/1606220649-1465-1-git-send-email-liweihang@huawei.com
Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The latest changes in restrack name handling allowed to simplify the QP
creation code to support all types of QPs.
For example XRC QP are presented with rdmatool.
$ ibv_xsrq_pingpong &
$ rdma res show qp
link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core]
link ibp0s9/1 lqpn 7 type UD state RTS sq-psn 0 comm [mlx5_ib]
link ibp0s9/1 lqpn 42 type XRC_TGT state INIT sq-psn 0 path-mig-state MIGRATED comm [ib_uverbs]
link ibp0s9/1 lqpn 43 type XRC_INI state INIT sq-psn 0 path-mig-state MIGRATED pdn 197 pid 419 comm ibv_xsrq_pingpong
Link: https://lore.kernel.org/r/20201117070148.1974114-4-leon@kernel.org
Reviewed-by: Mark Zhang <markz@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Driver QP types are special case with no IBTA restrictions. For example,
EFA implemented creation of this QP type as regular one, while mlx5
separated create to two step: create and modify. That separation causes to
the situation where DC QP (mlx5) is always added to the same xarray index
zero.
This change allows to drivers like mlx5 simply disable restrack DB
tracking, but it doesn't disable kref on the memory.
Fixes: 52e0a118a2 ("RDMA/restrack: Track driver QP types in resource tracker")
Link: https://lore.kernel.org/r/20201117070148.1974114-3-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Device memory (DM) are registered as MR during initialization flow, these
MRs were not tracked by resource tracker and had res->valid set as a
false. Update the code to manage them too.
Before this change:
$ ibv_rc_pingpong -j &
$ rdma res show mr <-- shows nothing
After this change:
$ ibv_rc_pingpong -j &
$ rdma res show mr
dev ibp0s9 mrn 0 mrlen 4096 pdn 3 pid 734 comm ibv_rc_pingpong
Fixes: be934cca9e ("IB/uverbs: Add device memory registration ioctl support")
Link: https://lore.kernel.org/r/20201117070148.1974114-2-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
DMA operation of the IB device is done using ib_device->dma_device.
Instead of accessing parent of the IB device, use the PCI dma device which
is setup to ib_device->dma_device during IB device registration.
Link: https://lore.kernel.org/r/20201125064628.8431-1-leon@kernel.org
Signed-off-by: Parav Pandit <parav@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Coverity reports "Potentially overflowing expression ..." warning, which
is correct thing to complain from the compiler point of view, but this is
not possible in the current code. Still, this is a small error as there
are some future situations that might need to use a 32 bit offset. Use ULL
so the calculation works up to 63.
Fixes: b045db62f6 ("RDMA/mlx5: Use ib_umem_find_best_pgoff() for SRQ")
Link: https://lore.kernel.org/r/20201125061704.6580-1-leon@kernel.org
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The return code from uverbs_zalloc() was wrongly checked, it is ERR_PTR
not NULL like other allocators:
drivers/infiniband/hw/mlx5/devx.c:2110 devx_umem_reg_cmd_alloc() warn: passing zero to 'PTR_ERR'
Fixes: 878f7b31c3 ("RDMA/mlx5: Use ib_umem_find_best_pgsz() for devx")
Link: https://lore.kernel.org/r/0-v1-4d05ccc1c223+173-devx_err_ptr_jgg@nvidia.com
Reported-by: kernel test robot <lkp@intel.com>
Acked-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The loopback flag will be set to 1 by the hardware when the source mac
address is same as the destination mac address. So the driver don't need
to compare them.
Fixes: d6a3627e31 ("RDMA/hns: Optimize wqe buffer set flow for post send")
Link: https://lore.kernel.org/r/1605526408-6936-4-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Traffic class and hop limit in address vector is not assigned from GRH,
but it will be filled into UD SQ WQE. So the hardware will get a wrong
value.
Fixes: 82e620d9c3 ("RDMA/hns: Modify the data structure of hns_roce_av")
Link: https://lore.kernel.org/r/1605526408-6936-3-git-send-email-liweihang@huawei.com
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Address handle is set for connected QP types such as RC and UC, and thus
can also be queried.
Since XRC QP types INI and TGT are connected, it should be possible to
query their address handle as well.
Until now it was not the case, and although the firmware supported it, the
driver allowed querying the address handle only for RC and UC.
Hence, we enable it now for INI and TGT QPs as well.
Link: https://lore.kernel.org/r/20201115121425.139833-3-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Address handle is set for connected QP types such as RC and UC, and thus
can also be queried.
Since XRC QP types INI and TGT are connected, it should be possible to
query their address handle as well.
Until now it was not the case, and although the firmware supported it, the
driver allowed querying the address handle only for RC and UC.
Hence, we enable it now for INI and TGT QPs as well.
Link: https://lore.kernel.org/r/20201115121425.139833-2-leon@kernel.org
Reviewed-by: Maor Gottlieb <maorg@nvidia.com>
Signed-off-by: Avihai Horon <avihaih@nvidia.com>
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
When a memory window is bound to a memory region, the local write access
should be set for its mtpt table.
Fixes: c7c2819140 ("RDMA/hns: Add MW support for hip08")
Link: https://lore.kernel.org/r/1606386372-21094-1-git-send-email-liweihang@huawei.com
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The maximum number of retransmission should be returned when querying QP,
not the value of retransmission counter.
Fixes: 99fcf82521 ("RDMA/hns: Fix the wrong value of rnr_retry when querying qp")
Fixes: 926a01dc00 ("RDMA/hns: Add QP operations support for hip08 SoC")
Link: https://lore.kernel.org/r/1606382977-21431-1-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
The SRQ capacity is got from the firmware, whose field should be ended at
bit 19.
Fixes: ba6bb7e974 ("RDMA/hns: Add interfaces to get pf capabilities from firmware")
Link: https://lore.kernel.org/r/1606382812-23636-1-git-send-email-liweihang@huawei.com
Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Two earlier bug fixes have created a security problem in the hfi1
driver. One fix aimed to solve an issue where current->mm was not valid
when closing the hfi1 cdev. It attempted to do this by saving a cached
value of the current->mm pointer at file open time. This is a problem if
another process with access to the FD calls in via write() or ioctl() to
pin pages via the hfi driver. The other fix tried to solve a use after
free by taking a reference on the mm.
To fix this correctly we use the existing cached value of the mm in the
mmu notifier. Now we can check in the insert, evict, etc. routines that
current->mm matched what the notifier was registered for. If not, then
don't allow access. The register of the mmu notifier will save the mm
pointer.
Since in do_exit() the exit_mm() is called before exit_files(), which
would call our close routine a reference is needed on the mm. We rely on
the mmgrab done by the registration of the notifier, whereas before it was
explicit. The mmu notifier deregistration happens when the user context is
torn down, the creation of which triggered the registration.
Also of note is we do not do any explicit work to protect the interval
tree notifier. It doesn't seem that this is going to be needed since we
aren't actually doing anything with current->mm. The interval tree
notifier stuff still has a FIXME noted from a previous commit that will be
addressed in a follow on patch.
Cc: <stable@vger.kernel.org>
Fixes: e0cf75deab ("IB/hfi1: Fix mm_struct use after free")
Fixes: 3faa3d9a30 ("IB/hfi1: Make use of mm consistent")
Link: https://lore.kernel.org/r/20201125210112.104301.51331.stgit@awfm-01.aw.intel.com
Suggested-by: Jann Horn <jannh@google.com>
Reported-by: Jason Gunthorpe <jgg@nvidia.com>
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
rdma_detroy_id() cannot be called under &lock - we must instead keep the
error'd ID around until &lock can be released, then destroy it.
This is complicated by the usual way listen IDs are destroyed through
cma_process_remove() which can run at any time and will asynchronously
destroy the same ID.
Remove the ID from visiblity of cma_process_remove() before going down the
destroy path outside the locking.
Fixes: c80a0c52d8 ("RDMA/cma: Add missing error handling of listen_id")
Link: https://lore.kernel.org/r/20201118133756.GK244516@ziepe.ca
Reported-by: syzbot+1bc48bf7f78253f664a9@syzkaller.appspotmail.com
Reviewed-by: Leon Romanovsky <leonro@nvidia.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
i40iw_mmap manipulates the vma->vm_pgoff to differentiate a push page mmap
vs a doorbell mmap, and uses it to compute the pfn in remap_pfn_range
without any validation. This is vulnerable to an mmap exploit as described
in: https://lore.kernel.org/r/20201119093523.7588-1-zhudi21@huawei.com
The push feature is disabled in the driver currently and therefore no push
mmaps are issued from user-space. The feature does not work as expected in
the x722 product.
Remove the push module parameter and all VMA attribute manipulations for
this feature in i40iw_mmap. Update i40iw_mmap to only allow DB user
mmapings at offset = 0. Check vm_pgoff for zero and if the mmaps are bound
to a single page.
Cc: <stable@kernel.org>
Fixes: d374984179 ("i40iw: add files for iwarp interface")
Link: https://lore.kernel.org/r/20201125005616.1800-2-shiraz.saleem@intel.com
Reported-by: Di Zhu <zhudi21@huawei.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>
Add a group of flags to control the 'struct hns_roce_buf' allocation
flow, this is used to support the caller running in atomic context.
Link: https://lore.kernel.org/r/1605347916-15964-1-git-send-email-liweihang@huawei.com
Signed-off-by: Xi Wang <wangxi11@huawei.com>
Signed-off-by: Weihang Li <liweihang@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>