linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-11 21:38:32 +08:00

Author	SHA1	Message	Date
Weihang Li	603bee935f	RDMA/hns: Do shift on traffic class when using RoCEv2 The high 6 bits of traffic class in GRH is DSCP (Differentiated Services Codepoint), the driver should shift it before the hardware gets it when using RoCEv2. Fixes: `606bf89e98` ("RDMA/hns: Refactor for hns_roce_v2_modify_qp function") Fixes: `fba429fcf9` ("RDMA/hns: Fix missing fields in address vector") Link: https://lore.kernel.org/r/1607650657-35992-4-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-11 15:21:33 -04:00
Wenpeng Liang	4ddeacf68a	RDMA/hns: Normalization the judgment of some features Whether to enable the these features should better depend on the enable flags, not the value of related fields. Fixes: `5c1f167af1` ("RDMA/hns: Init SRQ table for hip08") Fixes: `3cb2c996c9` ("RDMA/hns: Add support for SCCC in size of 64 Bytes") Link: https://lore.kernel.org/r/1607650657-35992-3-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-11 15:21:33 -04:00
Wenpeng Liang	1c0ca9cd17	RDMA/hns: Limit the length of data copied between kernel and userspace For ib_copy_from_user(), the length of udata may not be the same as that of cmd. For ib_copy_to_user(), the length of udata may not be the same as that of resp. So limit the length to prevent out-of-bounds read and write operations from ib_copy_from_user() and ib_copy_to_user(). Fixes: `de77503a59` ("RDMA/hns: RDMA/hns: Assign rq head pointer when enable rq record db") Fixes: `633fb4d9fd` ("RDMA/hns: Use structs to describe the uABI instead of opencoding") Fixes: `ae85bf92ef` ("RDMA/hns: Optimize qp param setup flow") Fixes: `6fd610c573` ("RDMA/hns: Support 0 hop addressing for SRQ buffer") Fixes: `9d9d4ff788` ("RDMA/hns: Update the kernel header file of hns") Link: https://lore.kernel.org/r/1607650657-35992-2-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-11 15:21:33 -04:00
Vladimir Oltean	6f320f6990	RDMA/mlx4: Remove bogus dev_base_lock usage It is not clear what this lock protects. If the authors wanted to ensure that "dev" does not disappear, that is impossible, given the following code path: mlx4_ib_netdev_event (under RTNL mutex) -> mlx4_ib_scan_netdevs -> mlx4_ib_update_qps Also, the dev_base_lock does not protect dev->dev_addr either. So it serves no purpose here. Remove it. Link: https://lore.kernel.org/r/20201208193928.1500893-1-vladimir.oltean@nxp.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-10 16:58:25 -04:00
Avihai Horon	e0da68994d	RDMA/uverbs: Fix incorrect variable type Fix incorrect type of max_entries in UVERBS_METHOD_QUERY_GID_TABLE - max_entries is of type size_t although it can take negative values. The following static check revealed it: drivers/infiniband/core/uverbs_std_types_device.c:338 ib_uverbs_handler_UVERBS_METHOD_QUERY_GID_TABLE() warn: 'max_entries' unsigned <= 0 Fixes: `9f85cbe50a` ("RDMA/uverbs: Expose the new GID query API to user space") Link: https://lore.kernel.org/r/20201208073545.9723-4-leon@kernel.org Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-10 15:05:17 -04:00
Jack Morgenstein	779e0bf476	RDMA/core: Do not indicate device ready when device enablement fails In procedure ib_register_device, procedure kobject_uevent is called (advertising that the device is ready for userspace usage) even when device_enable_and_get() returned an error. As a result, various RDMA modules attempted to register for the device even while the device driver was preparing to unregister the device. Fix this by advertising the device availability only after enabling the device succeeds. Fixes: `e7a5b4aafd` ("RDMA/device: Don't fire uevent before device is fully initialized") Link: https://lore.kernel.org/r/20201208073545.9723-3-leon@kernel.org Suggested-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-10 15:05:17 -04:00
Jack Morgenstein	286e1d3f9b	RDMA/core: Clean up cq pool mechanism The CQ pool mechanism had two problems: 1. The CQ pool lists were uninitialized in the device registration error flow. As a result, all the list pointers remained NULL. This caused the kernel to crash (in procedure ib_cq_pool_destroy) when that error flow was taken (and unregister called). The stack trace snippet: BUG: kernel NULL pointer dereference, address: 0000000000000000 #PF: supervisor read access in kernel mode #PF: error_code(0×0000) ? not-present page PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI . . . RIP: 0010:ib_cq_pool_destroy+0x1b/0×70 [ib_core] . . . Call Trace: disable_device+0x9f/0×130 [ib_core] __ib_unregister_device+0x35/0×90 [ib_core] ib_register_device+0x529/0×610 [ib_core] __mlx5_ib_add+0x3a/0×70 [mlx5_ib] mlx5_add_device+0x87/0×1c0 [mlx5_core] mlx5_register_interface+0x74/0xc0 [mlx5_core] do_one_initcall+0x4b/0×1f4 do_init_module+0x5a/0×223 load_module+0x1938/0×1d40 2. At device unregister, when cleaning up the cq pool, the cq's in the pool lists were freed, but the cq entries were left in the list. The fix for the first issue is to initialize the cq pool lists when the ib_device structure is allocated for a new device (in procedure _ib_alloc_device). The fix for the second problem is to delete cq entries from the pool lists when cleaning up the cq pool. In addition, procedure ib_cq_pool_destroy() is renamed to the more appropriate name ib_cq_pool_cleanup(). Fixes: `4aa1615268` ("RDMA/core: Fix ordering of CQ pool destruction") Link: https://lore.kernel.org/r/20201208073545.9723-2-leon@kernel.org Suggested-by: Jason Gunthorpe <jgg@nvidia.com> Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-10 15:05:17 -04:00
Lukas Bulwahn	d1dec0cae5	RDMA/core: Update kernel documentation for ib_create_named_qp() Commit `66f57b871e` ("RDMA/restrack: Support all QP types") extends ib_create_qp() to a named ib_create_named_qp(), which takes the caller's name as argument, but it did not add the new argument description to the function's kerneldoc. make htmldocs warns: ./drivers/infiniband/core/verbs.c:1206: warning: Function parameter or member 'caller' not described in 'ib_create_named_qp' Add a description for this new argument based on the description of the same argument in other related functions. Fixes: `66f57b871e` ("RDMA/restrack: Support all QP types") Link: https://lore.kernel.org/r/20201207173255.13355-1-lukas.bulwahn@gmail.com Signed-off-by: Lukas Bulwahn <lukas.bulwahn@gmail.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-10 13:17:00 -04:00
Tom Rix	7f1d2dfa30	RDMA/mlx5: Remove unneeded semicolon A semicolon is not needed after a switch statement. Link: https://lore.kernel.org/r/20201031134638.2135060-1-trix@redhat.com Signed-off-by: Tom Rix <trix@redhat.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-10 12:00:27 -04:00
Sebastian Andrzej Siewior	0583531bb9	RDMA/iser: Remove in_interrupt() usage iser_initialize_task_headers() uses in_interrupt() to find out if it is safe to acquire a mutex. in_interrupt() is deprecated as it is ill defined and does not provide what it suggests. Aside of that it covers only parts of the contexts in which a mutex may not be acquired. The following callchains exist: iscsi_queuecommand() locks iscsi_session::frwd_lock -> iscsi_prep_scsi_cmd_pdu() -> session->tt->init_task() (iscsi_iser_task_init()) -> iser_initialize_task_headers() -> iscsi_iser_task_xmit() (iscsi_transport::xmit_task) -> iscsi_iser_task_xmit_unsol_data() -> iser_send_data_out() -> iser_initialize_task_headers() iscsi_data_xmit() locks iscsi_session::frwd_lock -> iscsi_prep_mgmt_task() -> session->tt->init_task() (iscsi_iser_task_init()) -> iser_initialize_task_headers() -> iscsi_prep_scsi_cmd_pdu() -> session->tt->init_task() (iscsi_iser_task_init()) -> iser_initialize_task_headers() __iscsi_conn_send_pdu() caller has iscsi_session::frwd_lock -> iscsi_prep_mgmt_task() -> session->tt->init_task() (iscsi_iser_task_init()) -> iser_initialize_task_headers() -> session->tt->xmit_task() ( The only callchain that is close to be invoked in preemptible context: iscsi_xmitworker() worker -> iscsi_data_xmit() -> iscsi_xmit_task() -> conn->session->tt->xmit_task() (iscsi_iser_task_xmit() In iscsi_iser_task_xmit() there is this check: if (!task->sc) return iscsi_iser_mtask_xmit(conn, task); so it does end up in iser_initialize_task_headers() and iser_initialize_task_headers() relies on iscsi_task::sc == NULL. Remove conditional locking of iser_conn::state_mutex because there is no call chain to do so. Remove the goto label and return early now that there is no clean up needed. Link: https://lore.kernel.org/r/20201204174256.62xfcvudndt7oufl@linutronix.de Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Cc: Sagi Grimberg <sagi@grimberg.me> Cc: Max Gurtovoy <maxg@nvidia.com> Cc: Doug Ledford <dledford@redhat.com> Cc: Jason Gunthorpe <jgg@ziepe.ca> Cc: linux-rdma@vger.kernel.org Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-07 16:05:12 -04:00
Maor Gottlieb	ca991a7d14	RDMA/mlx5: Assign dev to DM MR Currently, DM MR registration flow doesn't set the mlx5_ib_dev pointer and can cause a NULL pointer dereference if userspace dumps the MR via rdma tool. Assign the IB device together with the other fields and remove the redundant reference of mlx5_ib_dev from mlx5_ib_mr. Cc: stable@vger.kernel.org Fixes: `6c29f57ea4` ("IB/mlx5: Device memory mr registration support") Link: https://lore.kernel.org/r/20201203190807.127189-1-leon@kernel.org Signed-off-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-07 15:52:54 -04:00
Weihang Li	53ef4999f0	RDMA/hns: Move capability flags of QP and CQ to hns-abi.h These flags will be returned to the userspace through ABI, so they should be defined in hns-abi.h. Furthermore, there is no need to include hns-abi.h in every source files, it just needs to be included in the common header file. Link: https://lore.kernel.org/r/1606872560-17823-1-git-send-email-liweihang@huawei.com Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-07 15:48:51 -04:00
Mauro Carvalho Chehab	2988ca08ba	IB: Fix kernel-doc markups Some functions have different names between their prototypes and the kernel-doc markup. Others need to be fixed, as kernel-doc markups should use this format: identifier - description Link: https://lore.kernel.org/r/78b98c41a5a0f4c0106433d305b143028a4168b0.1606823973.git.mchehab+huawei@kernel.org Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-07 15:45:00 -04:00
Selvin Xavier	c63e1c4dfc	RDMA/bnxt_re: Fix max_qp_wrs reported While creating qps, the driver adds one extra entry to the sq size passed by the ULPs in order to avoid queue full condition. When ULPs creates QPs with max_qp_wr reported, driver creates QP with 1 more than the max_wqes supported by HW. Create QP fails in this case. To avoid this error, reduce 1 entry in max_qp_wqes and report it to the stack. Link: https://lore.kernel.org/r/1606741986-16477-1-git-send-email-selvin.xavier@broadcom.com Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com> Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-07 15:43:42 -04:00
Yejune Deng	c277f98b3e	RDMA/i40iw: Replace atomic_add_return(1, ..) atomic_inc_return() is a little neater Link: https://lore.kernel.org/r/1606726376-7675-1-git-send-email-yejune.deng@gmail.com Signed-off-by: Yejune Deng <yejune.deng@gmail.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-07 15:40:23 -04:00
Jason Gunthorpe	ef3642c4f5	RDMA/mlx5: Fix error unwinds for rereg_mr This is all a giant train wreck of error handling, in many cases the MR is left in some corrupted state where continuing on is going to lead to chaos, or various unwinds/order is missed. rereg had three possible completely different actions, depending on flags and various details about the MR. Split the three actions into three functions, and call the right action from the start. For each action carefully design the error handling to fit the action: - UMR access/PD update is a simple UMR, if it fails the MR isn't changed, so do nothing - PAS update over UMR is multiple UMR operations. To keep everything sane revoke access to the MKey while it is being changed and restore it once the MR is correct. - Recreating the mkey should completely build a parallel MR with a fully loaded PAS then swap and destroy the old one. If it fails the original should be left untouched. This is handled in the core code. Directly call the normal MR creation functions, possibly re-using the existing umem. Add support for working with ODP MRs. The READ/WRITE access flags can be changed by UMR and we can trivially convert to/from ODP MRs using the logic to build a completely new MR. This new logic also fixes various problems with MRs continuing to work while their PAS lists are no longer valid, eg during a page size change. Link: https://lore.kernel.org/r/20201130075839.278575-6-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-07 14:06:23 -04:00
Jason Gunthorpe	38f8ff5b44	RDMA/mlx5: Reorganize mlx5_ib_reg_user_mr() This function handles an ODP and regular MR flow all mushed together, even though the two flows are quite different. Split them into two dedicated functions. Link: https://lore.kernel.org/r/20201130075839.278575-5-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-07 14:06:23 -04:00
Jason Gunthorpe	6e0954b11c	RDMA/uverbs: Allow drivers to create a new HW object during rereg_mr mlx5 has an ugly flow where it tries to allocate a new MR and replace the existing MR in the same memory during rereg. This is very complicated and buggy. Instead of trying to replace in-place inside the driver, provide support from uverbs to change the entire HW object assigned to a handle during rereg_mr. Since destroying a MR is allowed to fail (ie if a MW is pointing at it) and can't be detected in advance, the algorithm creates a completely new uobject to hold the new MR and swaps the IDR entries of the two objects. The old MR in the temporary IDR entry is destroyed, and if it fails rereg_mr succeeds and destruction is deferred to FD release. This complexity is why this cannot live in a driver safely. Link: https://lore.kernel.org/r/20201130075839.278575-4-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-07 14:06:23 -04:00
Jason Gunthorpe	adac4cb3c1	RDMA/uverbs: Check ODP in ib_check_mr_access() as well No reason only one caller checks this. This properly blocks ODP from the rereg flow if the device does not support ODP. Link: https://lore.kernel.org/r/20201130075839.278575-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-07 14:06:23 -04:00
Jason Gunthorpe	b9653b31d7	RDMA/uverbs: Tidy input validation of ib_uverbs_rereg_mr() Unknown flags should be EOPNOTSUPP, only zero flags is EINVAL. Flags is actually the rereg action to perform. The checking of the start/hca_va/etc is also redundant and ib_umem_get() does these checks and returns proper error codes. Fixes: `7e6edb9b2e` ("IB/core: Add user MR re-registration support") Link: https://lore.kernel.org/r/20201130075839.278575-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-07 14:06:22 -04:00
Gal Pressman	87524494a7	RDMA/efa: Use dma_set_mask_and_coherent() to simplify code Use dma_set_mask_and_coherent() instead of pci_set_dma_mask() followed by a pci_set_consistent_dma_mask(). Link: https://lore.kernel.org/r/20201201091811.37984-1-galpress@amazon.com Reviewed-by: Firas JahJah <firasj@amazon.com> Reviewed-by: Yossi Leybovich <sleybo@amazon.com> Signed-off-by: Gal Pressman <galpress@amazon.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-01 21:03:23 -04:00
Weihang Li	05201e01be	RDMA/hns: Refactor process of setting extended sge The variable 'cnt' is used to represent the max number of sge an SQ WQE can use at first, then it means how many extended sge an SQ has. In addition, this function has no need to return a value. So refactor and encapsulate the parts of getting number of extended sge a WQE can use to make it easier to understand. Link: https://lore.kernel.org/r/1606558959-48510-4-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-01 20:58:43 -04:00
Yangyang Li	d34895c319	RDMA/hns: Bugfix for calculation of extended sge Page alignment is required when setting the number of extended sge according to the hardware's achivement. If the space of needed extended sge is greater than one page, the roundup_pow_of_two() can ensure that. But if the needed extended sge isn't 0 and can not be filled in a whole page, the driver should align it specifically. Fixes: `54d6638765` ("RDMA/hns: Optimize WQE buffer size calculating process") Link: https://lore.kernel.org/r/1606558959-48510-3-git-send-email-liweihang@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-01 20:58:42 -04:00
Lang Cheng	0fd0175e30	RDMA/hns: Fix 0-length sge calculation error One RC SQ WQE can store 2 sges but UD can't, so ignore 2 valid sges of wr.sglist for RC which have been filled in WQE before setting extended sge. Either of RC and UD can not contain 0-length sges, so these 0-length sges should be skipped. Fixes: `54d6638765` ("RDMA/hns: Optimize WQE buffer size calculating process") Link: https://lore.kernel.org/r/1606558959-48510-2-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-01 20:58:42 -04:00
Shiraz Saleem	1d11d26cf0	RDMA/i40iw: Remove push code from i40iw The push feature does not work as expected in x722 and has historically been disabled in the driver. Purge all remaining code related to the push feature in i40iw. Link: https://lore.kernel.org/r/20201125005616.1800-3-shiraz.saleem@intel.com Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-01 20:50:01 -04:00
Jason Gunthorpe	2b0a999ba0	Linux 5.10-rc6 -----BEGIN PGP SIGNATURE----- iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAl/EM9oeHHRvcnZhbGRz QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiG/3kH/RNkFyTlHlUkZpJx 8Ks2yWgUln7YhZcmOaG/IcIyWnhCgo3l35kiaH7XxM+rPMZzidp51MHUllaTAQDc u+5EFHMJsmTWUfE8ocHPb1cPdYEDSoVr6QUsixbL9+uADpRz+VZVtWMb89EiyMrC wvLIzpnqY5UNriWWBxD0hrmSsT4g9XCsauer4k2KB+zvebwg6vFOMCFLFc2qz7fb ABsrPFqLZOMp+16chGxyHP7LJ6ygI/Hwf7tPW8ppv4c+hes4HZg7yqJxXhV02QbJ s10s6BTcEWMqKg/T6L/VoScsMHWUcNdvrr3uuPQhgup240XdmB1XO8rOKddw27e7 VIjrjNw= =4ZaP -----END PGP SIGNATURE----- Merge tag 'v5.10-rc6' into rdma.git for-next For dependencies in following patches Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-12-01 20:40:50 -04:00
Lang Cheng	f93c39bc95	RDMA/hns: Add support for QP stash Stash is a mechanism that uses the core information carried by the ARM AXI bus to access the L3 cache. It can be used to improve the performance by increasing the hit ratio of L3 cache. QPs need to enable stash by default. Link: https://lore.kernel.org/r/1606374251-21512-3-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-27 12:53:59 -04:00
Lang Cheng	bfefae9f10	RDMA/hns: Add support for CQ stash Stash is a mechanism that uses the core information carried by the ARM AXI bus to access the L3 cache. It can be used to improve the performance by increasing the hit ratio of L3 cache. CQs need to enable stash by default. Link: https://lore.kernel.org/r/1606374251-21512-2-git-send-email-liweihang@huawei.com Signed-off-by: Lang Cheng <chenglang@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-27 12:53:59 -04:00
Yangyang Li	71586dd200	RDMA/hns: Create QP with selected QPN for bank load balance In order to improve performance by balancing the load between different banks of cache, the QPC cache is desigend to choose one of 8 banks according to lower 3 bits of QPN. The hns driver needs to count the number of QP on each bank and then assigns the QP being created to the bank with the minimum load first. Link: https://lore.kernel.org/r/1606220649-1465-1-git-send-email-liweihang@huawei.com Signed-off-by: Yangyang Li <liyangyang20@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-27 12:44:26 -04:00
Leon Romanovsky	66f57b871e	RDMA/restrack: Support all QP types The latest changes in restrack name handling allowed to simplify the QP creation code to support all types of QPs. For example XRC QP are presented with rdmatool. $ ibv_xsrq_pingpong & $ rdma res show qp link ibp0s9/1 lqpn 0 type SMI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 1 type GSI state RTS sq-psn 0 comm [ib_core] link ibp0s9/1 lqpn 7 type UD state RTS sq-psn 0 comm [mlx5_ib] link ibp0s9/1 lqpn 42 type XRC_TGT state INIT sq-psn 0 path-mig-state MIGRATED comm [ib_uverbs] link ibp0s9/1 lqpn 43 type XRC_INI state INIT sq-psn 0 path-mig-state MIGRATED pdn 197 pid 419 comm ibv_xsrq_pingpong Link: https://lore.kernel.org/r/20201117070148.1974114-4-leon@kernel.org Reviewed-by: Mark Zhang <markz@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-27 11:38:46 -04:00
Leon Romanovsky	2b1f747071	RDMA/core: Allow drivers to disable restrack DB Driver QP types are special case with no IBTA restrictions. For example, EFA implemented creation of this QP type as regular one, while mlx5 separated create to two step: create and modify. That separation causes to the situation where DC QP (mlx5) is always added to the same xarray index zero. This change allows to drivers like mlx5 simply disable restrack DB tracking, but it doesn't disable kref on the memory. Fixes: `52e0a118a2` ("RDMA/restrack: Track driver QP types in resource tracker") Link: https://lore.kernel.org/r/20201117070148.1974114-3-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-27 11:38:46 -04:00
Leon Romanovsky	b47a98efa9	RDMA/core: Track device memory MRs Device memory (DM) are registered as MR during initialization flow, these MRs were not tracked by resource tracker and had res->valid set as a false. Update the code to manage them too. Before this change: $ ibv_rc_pingpong -j & $ rdma res show mr <-- shows nothing After this change: $ ibv_rc_pingpong -j & $ rdma res show mr dev ibp0s9 mrn 0 mrlen 4096 pdn 3 pid 734 comm ibv_rc_pingpong Fixes: `be934cca9e` ("IB/uverbs: Add device memory registration ioctl support") Link: https://lore.kernel.org/r/20201117070148.1974114-2-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@mellanox.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-27 11:38:46 -04:00
Parav Pandit	7ec3df174f	RDMA/mlx5: Use PCI device for dma mappings DMA operation of the IB device is done using ib_device->dma_device. Instead of accessing parent of the IB device, use the PCI dma device which is setup to ib_device->dma_device during IB device registration. Link: https://lore.kernel.org/r/20201125064628.8431-1-leon@kernel.org Signed-off-by: Parav Pandit <parav@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 15:49:05 -04:00
Leon Romanovsky	d4b2d19dc5	RDMA/mlx5: Silence the overflow warning while building offset mask Coverity reports "Potentially overflowing expression ..." warning, which is correct thing to complain from the compiler point of view, but this is not possible in the current code. Still, this is a small error as there are some future situations that might need to use a 32 bit offset. Use ULL so the calculation works up to 63. Fixes: `b045db62f6` ("RDMA/mlx5: Use ib_umem_find_best_pgoff() for SRQ") Link: https://lore.kernel.org/r/20201125061704.6580-1-leon@kernel.org Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 15:49:05 -04:00
Jason Gunthorpe	d0b7721c5e	RDMA/mlx5: Check for ERR_PTR from uverbs_zalloc() The return code from uverbs_zalloc() was wrongly checked, it is ERR_PTR not NULL like other allocators: drivers/infiniband/hw/mlx5/devx.c:2110 devx_umem_reg_cmd_alloc() warn: passing zero to 'PTR_ERR' Fixes: `878f7b31c3` ("RDMA/mlx5: Use ib_umem_find_best_pgsz() for devx") Link: https://lore.kernel.org/r/0-v1-4d05ccc1c223+173-devx_err_ptr_jgg@nvidia.com Reported-by: kernel test robot <lkp@intel.com> Acked-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 15:49:04 -04:00
Weihang Li	66d86e529d	RDMA/hns: Add UD support for HIP09 HIP09 supports service type of Unreliable Datagram, add necessary process to enable this feature. Link: https://lore.kernel.org/r/1605526408-6936-7-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 15:24:48 -04:00
Weihang Li	534c9bdb02	RDMA/hns: Simplify process of filling UD SQ WQE There are some codes can be simplified or encapsulated in set_ud_wqe() to make them easier to be understand. Link: https://lore.kernel.org/r/1605526408-6936-6-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 15:24:48 -04:00
Weihang Li	148f904c6f	RDMA/hns: Remove the portn field in UD SQ WQE This field in UD WQE in not used by hardware. Fixes: `7bdee4158b` ("RDMA/hns: Fill sq wqe context of ud type in hip08") Link: https://lore.kernel.org/r/1605526408-6936-5-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 15:24:48 -04:00
Weihang Li	3631dadfb1	RDMA/hns: Avoid setting loopback indicator when smac is same as dmac The loopback flag will be set to 1 by the hardware when the source mac address is same as the destination mac address. So the driver don't need to compare them. Fixes: `d6a3627e31` ("RDMA/hns: Optimize wqe buffer set flow for post send") Link: https://lore.kernel.org/r/1605526408-6936-4-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 15:24:48 -04:00
Weihang Li	fba429fcf9	RDMA/hns: Fix missing fields in address vector Traffic class and hop limit in address vector is not assigned from GRH, but it will be filled into UD SQ WQE. So the hardware will get a wrong value. Fixes: `82e620d9c3` ("RDMA/hns: Modify the data structure of hns_roce_av") Link: https://lore.kernel.org/r/1605526408-6936-3-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 15:24:47 -04:00
Weihang Li	7406c0036f	RDMA/hns: Only record vlan info for HIP08 Information about vlan is stored in GMV(GID/MAC/VLAN) table for HIP09, so there is no need to copy it to address vector. Link: https://lore.kernel.org/r/1605526408-6936-2-git-send-email-liweihang@huawei.com Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 15:24:47 -04:00
Avihai Horon	8138a4c21b	RDMA/mlx4: Enable querying AH for XRC QP types Address handle is set for connected QP types such as RC and UC, and thus can also be queried. Since XRC QP types INI and TGT are connected, it should be possible to query their address handle as well. Until now it was not the case, and although the firmware supported it, the driver allowed querying the address handle only for RC and UC. Hence, we enable it now for INI and TGT QPs as well. Link: https://lore.kernel.org/r/20201115121425.139833-3-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 11:58:19 -04:00
Avihai Horon	f957d4d09a	RDMA/mlx5: Enable querying AH for XRC QP types Address handle is set for connected QP types such as RC and UC, and thus can also be queried. Since XRC QP types INI and TGT are connected, it should be possible to query their address handle as well. Until now it was not the case, and although the firmware supported it, the driver allowed querying the address handle only for RC and UC. Hence, we enable it now for INI and TGT QPs as well. Link: https://lore.kernel.org/r/20201115121425.139833-2-leon@kernel.org Reviewed-by: Maor Gottlieb <maorg@nvidia.com> Signed-off-by: Avihai Horon <avihaih@nvidia.com> Signed-off-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 11:58:19 -04:00
Yixian Liu	17475e104d	RDMA/hns: Bugfix for memory window mtpt configuration When a memory window is bound to a memory region, the local write access should be set for its mtpt table. Fixes: `c7c2819140` ("RDMA/hns: Add MW support for hip08") Link: https://lore.kernel.org/r/1606386372-21094-1-git-send-email-liweihang@huawei.com Signed-off-by: Yixian Liu <liuyixian@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 10:57:32 -04:00
Wenpeng Liang	ab6f7248cc	RDMA/hns: Fix retry_cnt and rnr_cnt when querying QP The maximum number of retransmission should be returned when querying QP, not the value of retransmission counter. Fixes: `99fcf82521` ("RDMA/hns: Fix the wrong value of rnr_retry when querying qp") Fixes: `926a01dc00` ("RDMA/hns: Add QP operations support for hip08 SoC") Link: https://lore.kernel.org/r/1606382977-21431-1-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 10:57:32 -04:00
Wenpeng Liang	ebed7b7ca4	RDMA/hns: Fix wrong field of SRQ number the device supports The SRQ capacity is got from the firmware, whose field should be ended at bit 19. Fixes: `ba6bb7e974` ("RDMA/hns: Add interfaces to get pf capabilities from firmware") Link: https://lore.kernel.org/r/1606382812-23636-1-git-send-email-liweihang@huawei.com Signed-off-by: Wenpeng Liang <liangwenpeng@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-26 10:57:32 -04:00
Dennis Dalessandro	3d2a9d6425	IB/hfi1: Ensure correct mm is used at all times Two earlier bug fixes have created a security problem in the hfi1 driver. One fix aimed to solve an issue where current->mm was not valid when closing the hfi1 cdev. It attempted to do this by saving a cached value of the current->mm pointer at file open time. This is a problem if another process with access to the FD calls in via write() or ioctl() to pin pages via the hfi driver. The other fix tried to solve a use after free by taking a reference on the mm. To fix this correctly we use the existing cached value of the mm in the mmu notifier. Now we can check in the insert, evict, etc. routines that current->mm matched what the notifier was registered for. If not, then don't allow access. The register of the mmu notifier will save the mm pointer. Since in do_exit() the exit_mm() is called before exit_files(), which would call our close routine a reference is needed on the mm. We rely on the mmgrab done by the registration of the notifier, whereas before it was explicit. The mmu notifier deregistration happens when the user context is torn down, the creation of which triggered the registration. Also of note is we do not do any explicit work to protect the interval tree notifier. It doesn't seem that this is going to be needed since we aren't actually doing anything with current->mm. The interval tree notifier stuff still has a FIXME noted from a previous commit that will be addressed in a follow on patch. Cc: <stable@vger.kernel.org> Fixes: `e0cf75deab` ("IB/hfi1: Fix mm_struct use after free") Fixes: `3faa3d9a30` ("IB/hfi1: Make use of mm consistent") Link: https://lore.kernel.org/r/20201125210112.104301.51331.stgit@awfm-01.aw.intel.com Suggested-by: Jann Horn <jannh@google.com> Reported-by: Jason Gunthorpe <jgg@nvidia.com> Reviewed-by: Ira Weiny <ira.weiny@intel.com> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@cornelisnetworks.com> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@cornelisnetworks.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-25 20:30:46 -04:00
Jason Gunthorpe	dd37d2f59e	RDMA/cma: Fix deadlock on &lock in rdma_cma_listen_on_all() error unwind rdma_detroy_id() cannot be called under &lock - we must instead keep the error'd ID around until &lock can be released, then destroy it. This is complicated by the usual way listen IDs are destroyed through cma_process_remove() which can run at any time and will asynchronously destroy the same ID. Remove the ID from visiblity of cma_process_remove() before going down the destroy path outside the locking. Fixes: `c80a0c52d8` ("RDMA/cma: Add missing error handling of listen_id") Link: https://lore.kernel.org/r/20201118133756.GK244516@ziepe.ca Reported-by: syzbot+1bc48bf7f78253f664a9@syzkaller.appspotmail.com Reviewed-by: Leon Romanovsky <leonro@nvidia.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-25 11:07:01 -04:00
Shiraz Saleem	2ed381439e	RDMA/i40iw: Address an mmap handler exploit in i40iw i40iw_mmap manipulates the vma->vm_pgoff to differentiate a push page mmap vs a doorbell mmap, and uses it to compute the pfn in remap_pfn_range without any validation. This is vulnerable to an mmap exploit as described in: https://lore.kernel.org/r/20201119093523.7588-1-zhudi21@huawei.com The push feature is disabled in the driver currently and therefore no push mmaps are issued from user-space. The feature does not work as expected in the x722 product. Remove the push module parameter and all VMA attribute manipulations for this feature in i40iw_mmap. Update i40iw_mmap to only allow DB user mmapings at offset = 0. Check vm_pgoff for zero and if the mmaps are bound to a single page. Cc: <stable@kernel.org> Fixes: `d374984179` ("i40iw: add files for iwarp interface") Link: https://lore.kernel.org/r/20201125005616.1800-2-shiraz.saleem@intel.com Reported-by: Di Zhu <zhudi21@huawei.com> Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-25 10:38:11 -04:00
Xi Wang	6f6e2dcbb8	RDMA/hns: Refactor the hns_roce_buf allocation flow Add a group of flags to control the 'struct hns_roce_buf' allocation flow, this is used to support the caller running in atomic context. Link: https://lore.kernel.org/r/1605347916-15964-1-git-send-email-liweihang@huawei.com Signed-off-by: Xi Wang <wangxi11@huawei.com> Signed-off-by: Weihang Li <liweihang@huawei.com> Signed-off-by: Jason Gunthorpe <jgg@nvidia.com>	2020-11-23 19:36:54 -04:00

1 2 3 4 5 ...

11798 Commits