It always returns 0. Change return type to void.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
A DC Target (DCT) QP is represented in the hardware as a unique object.
This object is created by CREATE_DCT command and destroyed by DESTROY_DCT
command. However, in the driver we describe it as a QP.
The hardware command that creates a DCT needs parameters that the verb
create_qp() does not provide. Those remaining parameters are provided
with the call to the verb modify_qp(). Therefore we delay the actual
creation of a DCT in the hardware until the stage of modify_qp() to RTR.
A support for query_qp() was added as well. It uses QUERY_DCT command to
retrieve the applicable fields.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
DC Initiator (DCI) QP is represented like any other QP in the hardware.
However, like any other transport QP there are attributes and settings
that are special to DCI QP and needs specific attention and care.
Make necessary changes to configure DCI QP.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The QP type IB_QPT_DRIVER doesn't describe the transport or the service
that the QP provides but those are known only to the hardware driver.
The actual type of the QP is stored in the hardware driver context (i.e.
mlx5_qp) under the field qp_sub_type.
Take the real QP type and any extra data that is required to create the QP
from the driver channel and modify the QP initial attributes before continuing
with create_qp().
Downstream patches from this series will add support for both DCI and
DCT driver QPs.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Reporting that a device doesn't support RoCE seems like a valuable piece
of information to have when trying to determine why a driver is not binding
to a device. Better to report this at info log level instead of requiring
a user to enable all debug messages in the driver.
Signed-off-by: Jonathan Toppins <jtoppins@redhat.com>
Acked-By: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The cases for FDR/EDR signalling speed, were missing in
ib_rate_to_mult and mult_to_ib_rate giving wrong return values when
drivers convert static rate to/from inter-packet-delay.
Signed-off-by: Hans Westgaard Ry <hans.westgaard.ry@oracle.com>
Reviewed-by: Håkon Bugge <haakon.bugge@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
i40iw_wait_pe_ready is not called in an interrupt handler
nor holding a spinlock.
The function mdelay in it can be replaced with msleep,
to reduce busy wait.
Signed-off-by: Jia-Ju Bai <baijiaju1990@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
This patch adds static trace for RNRNAK timer. Currently the output from
hrtimer static trace only shows the addresses of hrtimers in the system
and there is no easy way to correlate an RNRNAK timer with its entries in
the hrtimer trace. This patch adds the correlation among a QP, its RNRNAK
timer, and its entries in the hrtimer trace. This correlation will be
enormously helpful when debugging RNRNAK related issues. In addition, this
patch cleans up rvt_stop_rnr_timer() to be void while here.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The get_unit_name() function crafts a string based on the device name
and the device unit number. It then stores this in a static variable.
This has concurrency issues as can be seen with this log:
hfi1 0000:02:00.0: hfi1_1: read_idle_message: read idle message 0x203
hfi1 0000:01:00.0: hfi1_1: read_idle_message: read idle message 0x203
The PCI device ID (0000:02:00.0 vs. 0000:01:00.0) is correct for the
message, but the device string hfi1_1 is incorrect (it should be
hfi1_0 for the second log message).
Remove get_unit_name() function.
Instead, use the rvt accessor rvt_get_ibdev_name() to get the IB name
string.
Clean up any hfi1_early_xx calls that can now use the new path.
QIB has the same (qib_get_unit_name()) issue. Updating as necessary.
Remove qib_get_unit_name() function.
Update log message that has redundant device name.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
CQ allocation does not ensure that completion queue entries
and the completion queue structure are allocated on the correct
numa node.
Fix by allocating the rvt_cq and kernel CQ entries on the device node,
leaving the user CQ entries on the default local node. Also ensure
CQ resizes use the correct allocator when extending a CQ.
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When an 8051 command times out, the entire DC block is restarted. During
the restart, the host interface version bit is set, which calls
do_8051_command() recursively. The host version bit needs to be set
before the link moves into polling, so the host version bit can be set
in set_local_link_attributes() instead. Thus, the 8051 command functions
can be simplied as a non-locking version (dd->dc8051_lock) of those
functions are no longer needed.
Fixes: 9be6a5d788 ("IB/hfi1: Prevent LNI out of sync by resetting host interface version")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Normal receive queue allocation ensures that kernel receive queues
are allocated on the local numa node. Shared receive queues
do not behave the same way.
Ensure that kernel shared receive queues are allocated on the device
local node.
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
rdmavt has a down call to client drivers to retrieve a crafted card
name.
This name should be the IB defined name.
Rather than craft the name each time it is needed, simply retrieve
the IB allocated name from the IB device.
Update the function name to reflect its application.
Clean up driver code to match this change.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Currently the HFI and QIB drivers allow the IB core to assign a unit
number to the driver name string.
If multiple devices exist in a system, there is a possibility that the
device unit number and the IB core number will be mismatched.
Fix by using the driver defined unit number to generate the device
name.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When the rdmavt's RNRNAK timer is fired, it tries to cancel the timer by
calling hrtimer_try_to_cancel(), which always returns -1 because the timer
is currently running. This patch removes this useless call.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Kaike Wan <kaike.wan@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Make sure that the initiator port GUID is stored in ch->ini_guid.
Note: when initiating a connection sgid and dgid members in struct
sa_path_rec represent the source and destination GIDs. When accepting
a connection however sgid represents the destination GID and dgid the
source GID.
Fixes: commit 2bce1a6d22 ("IB/srpt: Accept GUIDs as port names")
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
With the SRP protocol all RDMA operations are initiated by the target.
Since no RDMA operations are initiated by the initiator, do not grant
the initiator permission to submit RDMA reads or writes to the target.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Unconditional locks/list and ODP srcu initialization should be done in
the INIT stage. Remove those from the CAPS stage and move them to the
proper stage.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The loopback stage only initializes a lock, move it to be in
the CAPS initialization phase and get rid loopback step completely.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that we have a stage just for hardware counters, move all relevant
initialization logic into one place.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that we have a stage just for ODP, move all relevant
initialization logic into one place.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Now that we have a stage just for RoCE/ETH, move all relevant
initialization logic into one place.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Today we have single function which is used when we add an IB interface,
break this function into multiple functions.
Create stages and a generic mechanism to execute each stage.
This is in preparation for RDMA/IB representors which might not need
all stages or will do things differently in some of the stages.
This patch doesn't change any functionality.
Signed-off-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When debugging issues with RC QPs, it is useful to know if a QP
has an associated RQ or SRQ, the size of the RQ, and any RNR timeout
values.
Add the necessary information to the QP stats output.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
It needs to create eight reserve QPs for resolving
a bug of hip06. When deregistering mr, it will issue
a rdma write for every reserve QPs.
When modify qp from init to rtr, it needs to set
the value of dest_qp_num. Otherwise, it will lead
an error of freeing mr.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The QP can accept send work requests only when the QP is
in the states that allow them to be submitted.
This patch updates the QP state judgement based on the
specification.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When the length of sge is zero, the driver need to filter it
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch refactors the code of setting access flags
for RDMA operation as well as adds the scene when
attr->max_dest_rd_atomic is zero.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch fixes the usage with sr_max filed and rr_max of qp
context when modify qp. Its modifications include:
1. Adjust location of filling sr_max filed of qpc
2. Only assign the number of responder resource if
IB_QP_MAX_DEST_RD_ATOMIC bit is set
3. Only assign the number of outstanding resource if
IB_QP_MAX_QP_RD_ATOMIC
4. Fix the assgin algorithms for the field of sr_max
and rr_max of qp context
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch mainly implement rq inline data feature for hip08
RoCE in kernel mode.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If IB_CQ_REPORT_MISSED_EVENTS flag is passed in ib_req_notify_cq()
it may return positive value indicating non-empty CQ.
If return code not verified the log might be flooded with false
warning messages "request notify on send CQ failed".
Fixes: 8966e28d2e ("IB/ipoib: Use NAPI in UD/TX flows")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Pull RCU updates from Paul E. McKenney:
- Updates to use cond_resched() instead of cond_resched_rcu_qs()
where feasible (currently everywhere except in kernel/rcu and
in kernel/torture.c). Also a couple of fixes to avoid sending
IPIs to offline CPUs.
- Updates to simplify RCU's dyntick-idle handling.
- Updates to remove almost all uses of smp_read_barrier_depends()
and read_barrier_depends().
- Miscellaneous fixes.
- Torture-test updates.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
The commit 1a1c116f3d ("RDMA/netlink: Simplify the put_msg and put_attr")
removes nlmsg_len calculation in ibnl_put_attr causing netlink messages and
caused to miss source and destination addresses.
Fixes: 1a1c116f3d ("RDMA/netlink: Simplify the put_msg and put_attr")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Holding locks is mandatory when calling __ib_device_get_by_index,
otherwise there are races during the list iteration with device removal.
Since the locks are static to device.c, __ib_device_get_by_index can
never be called correctly by any user out side the file.
Make the function static and provide a safe function that gets the
correct locks and returns a kref'd pointer. Fix all callers.
Fixes: e5c9469efc ("RDMA/netlink: Add nldev device doit implementation")
Fixes: c3f66f7b00 ("RDMA/netlink: Implement nldev port doit callback")
Fixes: 7d02f605f0 ("RDMA/netlink: Add nldev port dumpit implementation")
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The NLDEV commands are using IB device indexes and names as a handle
for netlink communications. Put all relevant code into one function
to remove code duplication in followup patches.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is an existing function to decrease reference counter
of the device, let's use it.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The request_module() call is internally wrapped by CONFIG_MODULE,
so there is no need to check it in our RDMA code too.
Refactor to simplify the code.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The RXE driver is standalone module and hence doesn't need to export
symbols, nor does this one line function deserve to be not inlined.
Reviewed-by: Mark Bloch <markb@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Advance the qed* drivers to use firmware 8.33.1.0:
Modify core driver (qed) to utilize the new FW and initialize the device
with it. This is the lion's share of the patch, and includes changes to FW
interface files, device initialization flows, FW interaction flows, and
debug collection flows.
Modify Ethernet driver (qede) to make use of new FW in fastpath.
Modify RoCE/iWARP driver (qedr) to make use of new FW in fastpath.
Modify FCoE driver (qedf) to make use of new FW in fastpath.
Modify iSCSI driver (qedi) to make use of new FW in fastpath.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Yuval Bason <Yuval.Bason@cavium.com>
Signed-off-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Manish Chopra <Manish.Chopra@cavium.com>
Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch renames defines and structures in the FW HSI files to allow a
distinction between different types of HW.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Chad Dupuis <Chad.Dupuis@cavium.com>
Signed-off-by: Manish Rangankar <Manish.Rangankar@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
This patch refactors and reorders the FW API files in preparation of
upgrading the code to support new FW.
- Make use of the BIT macro in appropriate places.
- Whitespace changes to align values and code blocks.
- Comments are updated (spelling mistakes, removed if not clear).
- Group together code blocks which are related or deal with similar
matters.
Signed-off-by: Ariel Elior <Ariel.Elior@cavium.com>
Signed-off-by: Michal Kalderon <Michal.Kalderon@cavium.com>
Signed-off-by: Tomer Tayar <Tomer.Tayar@cavium.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.
Done using Coccinelle.
Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
0-day tested with no failures.
Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.
Done using Coccinelle.
Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
0-day tested with no failures.
Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.
Done using Coccinelle.
Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
0-day tested with no failures.
Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Acked-by: Ram Amrani <Ram.Amrani@cavium.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.
Done using Coccinelle.
Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
0-day tested with no failures.
Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use dma_zalloc_coherent for allocating zeroed
memory and remove unnecessary memset function.
Done using Coccinelle.
Generated-by: scripts/coccinelle/api/alloc/kzalloc-simple.cocci
0-day tested with no failures.
Suggested-by: Luis R. Rodriguez <mcgrof@kernel.org>
Signed-off-by: Himanshu Jha <himanshujha199640@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When using enhanced mode for IPoIB, two threads may execute xmit in
parallel to two different TX queues while the target is the same.
In this case, both of them will add the same neighbor to the path's
neigh link list and we might see the following message:
list_add double add: new=ffff88024767a348, prev=ffff88024767a348...
WARNING: lib/list_debug.c:31__list_add_valid+0x4e/0x70
ipoib_start_xmit+0x477/0x680 [ib_ipoib]
dev_hard_start_xmit+0xb9/0x3e0
sch_direct_xmit+0xf9/0x250
__qdisc_run+0x176/0x5d0
__dev_queue_xmit+0x1f5/0xb10
__dev_queue_xmit+0x55/0xb10
Analysis:
Two SKB are scheduled to be transmitted from two cores.
In ipoib_start_xmit, both gets NULL when calling ipoib_neigh_get.
Two calls to neigh_add_path are made. One thread takes the spin-lock
and calls ipoib_neigh_alloc which creates the neigh structure,
then (after the __path_find) the neigh is added to the path's neigh
link list. When the second thread enters the critical section it also
calls ipoib_neigh_alloc but in this case it gets the already allocated
ipoib_neigh structure, which is already linked to the path's neigh
link list and adds it again to the list. Which beside of triggering
the list, it creates a loop in the linked list. This loop leads to
endless loop inside path_rec_completion.
Solution:
Check list_empty(&neigh->list) before adding to the list.
Add a similar fix in "ipoib_multicast.c::ipoib_mcast_send"
Fixes: b63b70d877 ('IPoIB: Use a private hash table for path lookup in xmit path')
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
ibmr.device is being set only after ib_alloc_mr() is successfully complete.
Therefore, in case imlx4_mr_enable() returns with error, the error flow
unwinder calls to mlx4_free_priv_pages(), which uses ibmr.device.
Such usage causes to NULL dereference oops and to fix it, the IB device
should be set in the mr struct earlier stage (e.g. prior to calling
mlx4_free_priv_pages()).
Fixes: 1b2cd0fc67 ("IB/mlx4: Support the new memory registration API")
Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
These prints not neccesarily mean error/warning, so changing them to
debug.
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds more detailed comments when we call the
memory barrier function, such as rmb, wmb and mb. Three
mb() callers are deleted since they are unnecessary.
Suggested-by: Jason Gunthorpe <jgg@mellanox.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
- cxgb4 fix for an iser testing failure as debugged by Steve and Sagi.
The problem was a driver bug in the handling of shutting down a QP.
- Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on error
unwind and a use after free bug
- Improper congestion counter values on mlx5 when link aggregation is enabled
- ipoib lockdep regression introduced in this merge window
- hfi1 regression supporting the device in a VM introduced in a recent patch
- Typo that breaks future uAPI compatibility in the verbs core
- More SELinux related oops fixing
- Fix an oops during error unwind in mlx5
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJaRIC/AAoJEDht9xV+IJsaJfQP/1Z97/kDlJGIJQ4vBJ52xdHV
LfRdmCBqU5nrAihEBpFLRc2S+kaSJbYAY48tRn28Jx6s9dmSvU6v2J2IqhmnM6p6
ruWLR0Yqjg+xHcw+eaEoscJjRw+jDUEeVOgfbYc0HViWwvMNTrBB32HpAV48HuAl
aCbM/qrQYXdYuJBImM4glERIpjlvYKoxv4D9xCJhJRRQvTnKOymHzZpKbqNujWxl
dzCmZeOrw+HVxNW9MHHtUxClBoLNnykfRVKzMcdDjsqJ+Fdo2bY3ksgMvgiatRwY
NxGfixhouhOz9vjN/ljpWXxTV5TTm6Nrib8XcHuOWjcYn/AFwJMMRsM+1w1AuCKs
Zviq7QVApZzYuvHw1ewupRGvDX+P13sufD5sbc6cfVUT3w6ZX0Clpspl4++JN4ER
WvBZikozaviL3w9ir0drlZ6k9BDnjQ6P7wZcBjDZC/j0zXKM65rISZrTsK7TeiTk
lBNdLCkwZhO0dvafCNwA910tTaXEPhqqAh8Okob2A5U5lUAewd0AEHJusL/iCmSl
uXnnxu8ik61QzOqwneEHSyVMkOSLEC+kk13fiFAq/LjPUSm9N/MihZd4JNxwSa6W
4Rah7IKdh9F6qEnaKLPEfHxPhfghhb7O51zCA8mwA/JNCneqc4Gqi0U2JXkuloml
395aK2aZSShIkZvIwbI8
=IkGi
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"This is the next batch of for-rc patches from RDMA. It includes the
fix for the ipoib regression I mentioned last time, and the result of
a fairly major debugging effort to get iser working reliably on cxgb4
hardware - it turns out the cxgb4 driver was not handling QP error
flushing properly causing iser to fail.
- cxgb4 fix for an iser testing failure as debugged by Steve and
Sagi. The problem was a driver bug in the handling of shutting down
a QP.
- Various vmw_pvrdma fixes for bogus WARN_ON, missed resource free on
error unwind and a use after free bug
- Improper congestion counter values on mlx5 when link aggregation is
enabled
- ipoib lockdep regression introduced in this merge window
- hfi1 regression supporting the device in a VM introduced in a
recent patch
- Typo that breaks future uAPI compatibility in the verbs core
- More SELinux related oops fixing
- Fix an oops during error unwind in mlx5"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/mlx5: Fix mlx5_ib_alloc_mr error flow
IB/core: Verify that QP is security enabled in create and destroy
IB/uverbs: Fix command checking as part of ib_uverbs_ex_modify_qp()
IB/mlx5: Serialize access to the VMA list
IB/hfi: Only read capability registers if the capability exists
IB/ipoib: Fix lockdep issue found on ipoib_ib_dev_heavy_flush
IB/mlx5: Fix congestion counters in LAG mode
RDMA/vmw_pvrdma: Avoid use after free due to QP/CQ/SRQ destroy
RDMA/vmw_pvrdma: Use refcount_dec_and_test to avoid warning
RDMA/vmw_pvrdma: Call ib_umem_release on destroy QP path
iw_cxgb4: when flushing, complete all wrs in a chain
iw_cxgb4: reflect the original WR opcode in drain cqes
iw_cxgb4: Only validate the MSN for successful completions
This patch enables QP creation with a given BF index, this allows the
user space driver to share same BF between few QPs or alternatively have
a dedicated BF per QP.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch exposes the option to dynamic allocates a UAR, this
functionality will be used in downstream patch in this series as
part of QP creation.
Specifically, the user space driver asks for a UAR allocation in a given
page index, upon success this UAR and its bfregs can be used as part of
QP creation by the user space driver.
To enable allocating more than 256 UARs the page index is encoded in an
extra one byte just after the command byte.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch extends the alloc context flow to be prepared for working
with dynamic UAR allocations.
Currently upon alloc context there is some fix size of UARs that are
allocated (named 'static allocation') and there is no option to user
application to ask for more or control which UAR will be used by which
QP.
In this patch the driver prepares its data structures to manage both the
static and the dynamic allocations and let the user driver knows about
the max value of dynamic blue-flame registers that are allowed.
Downstream patches from this series will enable the dynamic allocation
and the association as part of QP creation.
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add missing inner RSS support capability as part of
the RSS supported fields.
In addition change MLX5_RX_HASH_INNER to 1UL << 31 in
order to define it as unsigned.
Fixes: 309fa3470f ("IB/mlx5: Add support for RSS on the inner packet")
Signed-off-by: Maor Gottlieb <maorg@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Support RSS hash for inner headers according to a new flag,
MLX4_IB_RX_HASH_INNER provided by the vendor channel.
In case the flag is set, RSS hash will be done on the inner headers of
VXLAN packets (which are encapsulated).
Non-encapsulated packets will be hashed according to the outer headers.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Patches for 4.16 that are dependent on patches sent to 4.15-rc.
These are small clean ups for the vmw_pvrdma and i40iw drivers.
* 'from-rc' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git:
RDMA/vmw_pvrdma: Remove usage of BIT() from UAPI header
RDMA/vmw_pvrdma: Use refcount_t instead of atomic_t
RDMA/vmw_pvrdma: Use more specific sizeof in kcalloc
RDMA/vmw_pvrdma: Clarify QP and CQ is_kernel logic
RDMA/vmw_pvrdma: Add UAR SRQ macros in ABI header file
i40iw: Change accelerated flag to bool
refcount_t is the preferred type for refcounts. Change the
QP and CQ refcnt fields to use refcount_t.
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Convert the sizeof(void *) in two kcalloc calls to be more
specific for the arrays that are being allocated.
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Be more consistent in setting and checking is_kernel
flag for QPs and CQs.
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The accelerated flag only utilizes two values: 0 and 1.
Modify accelerated flag in struct i40iw_cm_node to bool.
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Delete ibnl_chk_listeners() and its kernel-doc comments from the
core_priv.h header file. There is no such function.
Fixes: 233c195583 ("RDMA/netlink: Reduce exposure of RDMA netlink functions")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The rq/sq->psn is 24 bits as defined in the IB spec, therefore we mask
out the 8 most significant bits to avoid overflow in modify_qp.
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
ibmr.device is being set only after ib_alloc_mr() is
(successfully) complete. Therefore, in case mlx5_core_create_mkey()
return with error, the error flow calls mlx5_free_priv_descs()
which uses ibmr.device (which doesn't exist yet), causing
a NULL dereference oops.
To fix this, the IB device should be set in the mr struct earlier
stage (e.g. prior to calling mlx5_core_create_mkey()).
Fixes: 8a187ee52b ("IB/mlx5: Support the new memory registration API")
Signed-off-by: Max Gurtovoy <maxg@mellanox.com>
Signed-off-by: Nitzan Carmi <nitzanc@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The XRC target QP create flow sets up qp_sec only if there is an IB link with
LSM security enabled. However, several other related uAPI entry points blindly
follow the qp_sec NULL pointer, resulting in a possible oops.
Check for NULL before using qp_sec.
Cc: <stable@vger.kernel.org> # v4.12
Fixes: d291f1a652 ("IB/core: Enforce PKey security on QPs")
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If the input command length is larger than the kernel supports an error should
be returned in case the unsupported bytes are not cleared, instead of the
other way aroudn. This matches what all other callers of ib_is_udata_cleared
do and will avoid user ABI problems in the future.
Cc: <stable@vger.kernel.org> # v4.10
Fixes: 189aba99e7 ("IB/uverbs: Extend modify_qp and support packet pacing")
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
User-space applications can do mmap and munmap directly at
any time.
Since the VMA list is not protected with a mutex, concurrent
accesses to the VMA list from the mmap and munmap can cause
data corruption. Add a mutex around the list.
Cc: <stable@vger.kernel.org> # v4.7
Fixes: 7c2344c3bb ("IB/mlx5: Implements disassociate_ucontext API")
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Change the slid arg to ingress_pkey_table_fail() to a full
32Bits and do not convert to 16Bits in caller. This is so we
can keep everything 32bit in the kernel and only change to
16bit at the uapi boundary.
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use rdma_cap_opa_mad() to check for OPA to promote code reuse.
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The accepting QP ORD value should be adjusted not to
exceed the peer QP IRD value (RFC 6581). This is
skipped for loopback. After the ORD is validated
by i40iw_record_ird_ord(), adjust the ORD value of
the loopback accepting QP to prevent overrunning the
IRD space of the peer QP. Also move the ORD accounting
for 0-byte RDMA read to i40iw_record_ird_ord().
Fixes: f27b4746f3 ("i40iw: add connection management code")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Casting to u16 before validating IRD/ORD connection
parameters could cause recording wrong IRD/ORD values
in the cm_node. Validate the IRD/ORD parameters as
they are passed by the application before recording
them.
Fixes: f27b4746f3 ("i40iw: add connection management code")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The LLP_DOUBT_REACHABILITY Asynchronous Event (AE) is an early
warning of a connection issue. It is followed by LLP_TOO_MANY_RETRIES
AE, if the retransmit threshold is reached and recovery is not possible
for the connection.
Currently we terminate the connection on receiving the
LLP_DOUBT_REACHABILITY AE. Ignore this AE and
terminate the connection only on LLP_TOO_MANY_RETRIES AE.
This improves the user experience on cable disconnect/reconnect
scenario while running iWARP traffic. On cable disconnect,
the QP traffic is paused and the user has a larger and more
reasonable timeout within which if the cable is reconnected,
traffic can continue.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Partial FPDU processing is broken as the sequence number
for the first partial FPDU is wrong due to incorrect
Q2 buffer offset. The offset should be 64 rather than 16.
Fixes: 786c6adb3a ("i40iw: add puda code")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
On IP address change event, all connected QPs are torn down
irrespective of whether IP address is involved in a connection.
Only teardown connections those source or destination address
matches the netdev interface IP address being changed, and if
they are on the same VLAN as the netdev.
Fixes: e5e74b61b1 ("i40iw: Add IP addr handling on netdev events")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Register a netdevice notifier for netdev UP/DOWN
notification events and report the appropriate ib event.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Lower Inbound RDMA Read Queue (Q1) object count by a factor of 2
as it is incorrectly doubled. Also, round up Q1 and Transmit FIFO (XF)
object count to power of 2 to satisfy hardware requirement.
Fixes: 86dbcd0f12 ("i40iw: add file to handle cqp calls")
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Consolidate all power of 2 round calculations to
use kernel utility function roundup_pow_of_two().
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Increase I40IW_MAX_IRD_SIZE to 64 which is the device limit.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
SA queries SM for class port info when there is a LID_CHANGE event.
When a base lid is configured before fm is started ie when smlid is
not yet assigned, SA handles the LID_CHANGE event and tries query SM
with lid 0. This will cause an hang.
[ 1106.958820] INFO: task kworker/2:0:23 blocked for more than 120 seconds.
[ 1106.965082] Tainted: G O 4.12.0+ #1
[ 1106.969602] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables
this message.
[ 1106.977227] kworker/2:0 D 0 23 2 0x00000000
[ 1106.977250] Workqueue: infiniband update_ib_cpi [ib_core]
[ 1106.977261] Call Trace:
[ 1106.977273] __schedule+0x28e/0x860
[ 1106.977285] schedule+0x36/0x80
[ 1106.977298] schedule_timeout+0x1a3/0x2e0
[ 1106.977310] ? radix_tree_iter_tag_clear+0x1b/0x20
[ 1106.977322] ? idr_alloc+0x64/0x90
[ 1106.977334] wait_for_completion+0xe3/0x140
[ 1106.977347] ? wake_up_q+0x80/0x80
[ 1106.977369] update_ib_cpi+0x163/0x210 [ib_core]
[ 1106.977381] process_one_work+0x147/0x370
[ 1106.977394] worker_thread+0x4a/0x390
[ 1106.977406] kthread+0x109/0x140
[ 1106.977418] ? process_one_work+0x370/0x370
[ 1106.977430] ? kthread_park+0x60/0x60
[ 1106.977443] ret_from_fork+0x22/0x30
Always ensure a proper smlid is assigned before querying SM for cpi.
Fixes: ee1c60b1bf ("IB/SA: Modify SA to implicitly cache Class Port info")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The accelerated flag only utilizes two values: 0 and 1.
Modify accelerated flag in struct nes_cm_node to bool.
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
During driver init, various registers are saved to allow restoration
after an FLR or gen3 bump. Some of these registers are not available
in some circumstances (i.e. Virtual machines).
This bug makes the driver unusable when the PCI device is passed into
a VM, it fails during probe.
Delete unnecessary register read/write, and only access register if
the capability exists.
Cc: <stable@vger.kernel.org> # 4.14.x
Fixes: a618b7e40a ("IB/hfi1: Move saving PCI values to a separate function")
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
These duplicate includes have been found with scripts/checkincludes.pl but
they have been removed manually to avoid removing false positives.
Signed-off-by: Pravin Shedge <pravin.shedge4linux@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch adds eq support for hip08. The eq table can
be multi-hop addressed.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Reviewed-by: Lijun Ou <oulijun@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Considering the compatibility of supporting hip08's eq
process and possible changes of data structure, this patch
refactors the eq code structure of hip06.
We move all the eq process code for hip06 from hns_roce_eq.c
into hns_roce_hw_v1.c, and also for hns_roce_eq.h. With
these changes, it will be convenient to add the eq support
for later hardware version.
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Reviewed-by: Lijun Ou <oulijun@huawei.com>
Reviewed-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The locking order of vlan_rwsem (LOCK A) and then rtnl (LOCK B),
contradicts other flows such as ipoib_open possibly causing a deadlock.
To prevent this deadlock heavy flush is called with RTNL locked and
only then tries to acquire vlan_rwsem.
This deadlock is possible only when there are child interfaces.
[ 140.941758] ======================================================
[ 140.946276] WARNING: possible circular locking dependency detected
[ 140.950950] 4.15.0-rc1+ #9 Tainted: G O
[ 140.954797] ------------------------------------------------------
[ 140.959424] kworker/u32:1/146 is trying to acquire lock:
[ 140.963450] (rtnl_mutex){+.+.}, at: [<ffffffffc083516a>] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
[ 140.970006]
but task is already holding lock:
[ 140.975141] (&priv->vlan_rwsem){++++}, at: [<ffffffffc0834ee1>] __ipoib_ib_dev_flush+0x51/0x4e0 [ib_ipoib]
[ 140.982105]
which lock already depends on the new lock.
[ 140.990023]
the existing dependency chain (in reverse order) is:
[ 140.998650]
-> #1 (&priv->vlan_rwsem){++++}:
[ 141.005276] down_read+0x4d/0xb0
[ 141.009560] ipoib_open+0xad/0x120 [ib_ipoib]
[ 141.014400] __dev_open+0xcb/0x140
[ 141.017919] __dev_change_flags+0x1a4/0x1e0
[ 141.022133] dev_change_flags+0x23/0x60
[ 141.025695] devinet_ioctl+0x704/0x7d0
[ 141.029156] sock_do_ioctl+0x20/0x50
[ 141.032526] sock_ioctl+0x221/0x300
[ 141.036079] do_vfs_ioctl+0xa6/0x6d0
[ 141.039656] SyS_ioctl+0x74/0x80
[ 141.042811] entry_SYSCALL_64_fastpath+0x1f/0x96
[ 141.046891]
-> #0 (rtnl_mutex){+.+.}:
[ 141.051701] lock_acquire+0xd4/0x220
[ 141.055212] __mutex_lock+0x88/0x970
[ 141.058631] __ipoib_ib_dev_flush+0x2da/0x4e0 [ib_ipoib]
[ 141.063160] __ipoib_ib_dev_flush+0x71/0x4e0 [ib_ipoib]
[ 141.067648] process_one_work+0x1f5/0x610
[ 141.071429] worker_thread+0x4a/0x3f0
[ 141.074890] kthread+0x141/0x180
[ 141.078085] ret_from_fork+0x24/0x30
[ 141.081559]
other info that might help us debug this:
[ 141.088967] Possible unsafe locking scenario:
[ 141.094280] CPU0 CPU1
[ 141.097953] ---- ----
[ 141.101640] lock(&priv->vlan_rwsem);
[ 141.104771] lock(rtnl_mutex);
[ 141.109207] lock(&priv->vlan_rwsem);
[ 141.114032] lock(rtnl_mutex);
[ 141.116800]
*** DEADLOCK ***
Fixes: b4b678b06f ("IB/ipoib: Grab rtnl lock on heavy flush when calling ndo_open/stop")
Signed-off-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Congestion counters are counted and queried per physical function.
When working in LAG mode, CNP packets can be sent or received on both
of the functions, thus congestion counters should be aggregated from
the two physical functions.
Fixes: e1f24a79f4 ("IB/mlx5: Support congestion related counters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Aviv Heller <avivh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The use of wait queues in vmw_pvrdma for handling concurrent
access to a resource leaves a race condition which can cause a use
after free bug.
Fix this by using the pattern from other drivers, complete() protected by
dec_and_test to ensure complete() is called only once.
Fixes: 29c8d9eba5 ("IB: Add vmw_pvrdma driver")
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
refcount_dec generates a warning when the operation
causes the refcount to hit zero. Avoid this by using
refcount_dec_and_test.
Fixes: 8b10ba783c ("RDMA/vmw_pvrdma: Add shared receive queue support")
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If a wr chain was posted and needed to be flushed, only the first
wr in the chain was completed with FLUSHED status. The rest were
never completed. This caused isert to hang on shutdown due to the
missing completions which left iscsi IO commands referenced, stalling
the shutdown.
Fixes: 4fe7c2962e ("iw_cxgb4: refactor sq/rq drain logic")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The flush/drain logic was not retaining the original wr opcode in
its completion. This can cause problems if the application uses
the completion opcode to make decisions.
Use bit 10 of the CQE header word to indicate the CQE is a special
drain completion, and save the original WR opcode in the cqe header
opcode field.
Fixes: 4fe7c2962e ("iw_cxgb4: refactor sq/rq drain logic")
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If the RECV CQE is in error, ignore the MSN check. This was causing
recvs that were flushed into the sw cq to be completed with the wrong
status (BAD_MSN instead of FLUSHED).
Cc: stable@vger.kernel.org
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Debugfs file reset_stats is created with S_IRUSR permissions,
but ocrdma_dbgfs_ops_read() doesn't support OCRDMA_RESET_STATS,
whereas ocrdma_dbgfs_ops_write() supports only OCRDMA_RESET_STATS.
The patch fixes misstype with permissions.
Found by Linux Driver Verification project (linuxtesting.org).
Signed-off-by: Anton Vasilyev <vasilyev@ispras.ru>
Acked-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When path ah_attr initialization from path record
fails, ib_cm_send_rej() uses av.ah_attr fields to send out reject
message. In such cases initialization of path record software fields
is not needed. Code is simplified for same.
Additionally in current code in cm_req_handler, when ib_get_cached_gid
fails for a given sgid_index of the GID of the GRH of the incoming CM MAD,
error code 12 is sent. This error code refers to primary GID in incoming
CM REQ and not for the GID in in MAD packet.
Therefore code is refactored to send code 5 (unsupported request) for such
error.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Hal Rosenstock <hal@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently ib_init_ah_from_wc initializes address handle attributes and
not the address handle object itself.
To avoid confusion between ah_attr vs ah, ib_init_ah_from_wc is
renamed to ib_init_ah_attr_from_wc to reflect that its initialzes
ah_attr.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since ib_init_ah_from_path initializes the address handle attribute, it is
renamed to reflect so.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In case of LAP are used for RoCE, it can lead to a problem of sleeping a
context while spin lock is held in below flow.
cm_lap_handler
->spin_lock
-> <..switch_case..>
-> cm_init_av_for_response
-> ib_init_ah_from_wc
-> rdma_addr_find_l2_eth_by_grh
wait_for_completion()
Therefore ah attribute initialization is done for incoming lap requests
outside of the lock context.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
cm_init_av_by_path depends on ib_init_ah_from_path to initialize ah
attribute and ib_init_ah_from_path() can fail, such error should not
be ignored.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
cm_init_av_for_response depends on ib_init_ah_from_wc() whose return
status is ignored.
ib_init_ah_from_wc() can fail and its return status should be handled as
done in this patch.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently there are no users of ib_find_gid for RoCE transport. It is
only used by IPoIB.
Therefore its simplified to ignore RoCE ports and GID type check which
was previously done for every port.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
rdma_copy_addr copies the ifndex to bound_dev_if.
Therefore avoid copying it again after rdma_copy_addr call is completed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Since no caller needs vlan, rdma_translate_ip is simplified to avoid
vlan pointer.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
rdma_addr_find_smac_by_sgid() is exported symbol not used by any kernel
module. Therefore its removed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
rdma_resolve_ip already copies 'addr' to its dev_addr argument.
Remove the duplicate memcpy and since it was the only user, remove the
'addr' member from resolve_cb_context.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
ib_find_gid_by_filter() is used only by ib_core, therefore avoid
exporting it.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
While searching for GID, returned index is not used, so avoid passing
pointer during invocation.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently on every gid entry comparison miss found variable is checked;
which is not needed as those two comparison fail already indicate that
GID is not found yet.
So refactor to avoid such check and copy the GID index when found.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Type cast from void to struct find_gid_index_context is not needed.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Introduce and user helper functions to initialize work for address
resolved and route resolved event that avoid code duplication at few
places.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Avoid setting path record type twice for RoCE.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Current code checks for NULL ndev twice where 2nd check is always
invalid given the fact that during route resolving stage, device address
must be bound to netdevice interface.
This patch simplifies such check.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
As the function name suggests cma_resolve_iboe_route() resolves RoCE
route. However, its default GID type is IB_GID_TYPE_IB and not
IB_GID_TYPE_ROCE, even though both are mapped to the same enum value.
Change default GID type to IB_GID_TYPE_ROCE.
cma_iboe_set_mgid() is updated to reflect the RoCEv2 GID check.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In ib_umem structure npages holds original number of sg entries, while
nmap is number of DMA blocks returned by dma_map_sg.
Fixes: c5d76f130b ('IB/core: Add umem function to read data from user-space')
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Add debug prints to the error paths in the connection manager control
flows, to help debug connection management problems.
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The code was using the src size when formatting the dst. They are almost
certainly the same value but it reads wrong.
Fixes: ce117ffac2 ("RDMA/cma: Export AF_IB statistics")
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In case that the PathRecord is not valid (SM changed its network prefix)
ipoib will continue issue PathQuery requests with the same parameters
that are in its database, which are no longer valid anymore.
Now the driver in that case will re-initialize the record from a valid
place (the priv structure keeps the updated values), and a valid request
will be issued.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Alex Vesker <valex@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The ipoib path database is organized around DGIDs from the LLADDR, but the
SA is free to return a different GID when asked for path. This causes a
bug because the SA's modified DGID is copied into the database key, even
though it is no longer the correct lookup key, causing a memory leak and
other malfunctions.
Ensure the database key does not change after the SA query completes.
Demonstration of the bug is as follows
ipoib wants to send to GID fe80:0000:0000:0000:0002:c903:00ef:5ee2, it
creates new record in the DB with that gid as a key, and issues a new
request to the SM.
Now, the SM from some reason returns path-record with other SGID (for
example, 2001:0000:0000:0000:0002:c903:00ef:5ee2 that contains the local
subnet prefix) now ipoib will overwrite the current entry with the new
one, and if new request to the original GID arrives ipoib will not find
it in the DB (was overwritten) and will create new record that in its
turn will also be overwritten by the response from the SM, and so on
till the driver eats all the device memory.
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
ib_security_modify_qp and ib_security_pkey_access are core internal
function. So avoid exporting them.
ib_security_pkey_access is used only when secuirty hooks are enabled so
avoid defining it otherwise.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
RoCEv1 does not use the IPv6 stack to resolve the link local DGID since it
uses GID address. It forms the DMAC directly from the DGID.
The code became confused and also tried to use this bypass for RoCEv2
packets, however RoCEv2 always uses a IP address in the GID and must
always use ARP or neighbor discovery to get the DMAC address.
Now that rdma_addr_find_l2_eth_by_grh() supports resolving link local
address to find destination mac address, lets make use of it.
This aligns it to how the rest of the IPv6 stack resolves link local
destination IPv6 address.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Eli Cohen <eli@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When computing a UD reverse path (return AH) from a WC the code was not
doing a route lookup anchored in a specific netdevice. This caused several
bugs, including broken IPv6 link-local address support in RoCEv2. [1]
This fixes the lookup by determining the GID table entry that the HW
matched to the SGID for the WC and then using the netdevice from that
entry to perform the route and ND lookup for the 'DGID' to build a return
AH.
RoCE GID table management ensures that right upper netdevices of the
physical netdevices are added. Therefore init_ah_from_wc doesn't need to
perform such check.
Now that route lookup is done based on the netdevice of the GID entry,
simplify code to not have ifindex and vlan pointers. As part of that,
refactor to have netdevice as input parameter. This is already discussed
at [2].
Finally ib_init_ah_from_wc resolves dmac for unicast GID in similar way as
what ib_resolve_eth_dmac() does. So ib_resolve_eth_dmac is refactored to
split for unicast and non unicast GIDs, so that it can be reused by
ib_init_ah_from_wc.
While we are at refactoring ib_resolve_eth_dmac(), it is further
simplified
(a) to avoid hoplimit as optional parameter, as there is only one
user who always queries hoplimit.
(b) for empty line.
(c) avoided zero initialization of ret.
(d) removed as exported symbol as only ib core uses it.
For IPv6, this is tested using simple rping test as below.
rping -sv -a ::0
rping -c -a fe80::268a:7ff:fe55:4661%ens2f1 -C 1 -v -d
[1] https://www.spinics.net/lists/linux-rdma/msg45690.html
[2] https://www.spinics.net/lists/linux-rdma/msg45710.html
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Matan Barak <matanb@mellanox.com>
Reviewed-by: Mark Bloch <markb@mellanox.com>
Reported-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
- Fix for SELinux on the umad SMI path. Some old hardware
does not fill the PKey properly exposing another bug in the newer
SELinux code.
- Check the input port as we can exceed array bounds from this
user supplied value
- Users are unable to use the hash field support as they want due to
incorrect checks on the field restrictions, correct that so the
feature works as intended
- User triggerable oops in the NETLINK_RDMA handler
- cxgb4 driver fix for a bad interaction with CQ flushing in iser
caused by patches in this merge window, and bad CQ flushing during
normal close.
- Unbalanced memalloc_noio in ipoib in an error path.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJaNGJ9AAoJEDht9xV+IJsawK4P/iVlUR8DReXKVPkxYOQk15bI
GEKG3t2Ce1GaeFZY7TBmsdHBRTHhxf2osEM57TbBmWv6N/pG83GLresE6xxOhRHz
3s2hzWElJXpYnM0QttHCNjJvySIzjzLZiaQyhiWFqs0+cPVUM9zQd0G77LwHngRf
1gO7toTMYk8eZkJt0ClQwHMeH6qR892o+zDUtorX/Ez4Ly4tT3I/RwRpbZ1HHpsA
uWMYcsge7lRzFbZnC+lDoeqozcv20B7n9UBEcAHJkVSh5JFC+TByRmCAZ/hPzjXb
Pr2E4gTYT+ULUsPRECtIwupT30xfFdByFYBAl+EQ+fiJvGgBxcgVjdLDQ3Ddlb6n
ga5UEverYKivizitKowtpMCJ0nVH6R4qLt5vcPwxuoHKQmUtXQFeg/haZPWCiPwr
B4Ahm371yRx8xo4AITBFX4L4PdtmdAueyrrjz/MxJm5YM2eRy08OONFVlBqXTuqT
EdbtHFCbXtE3aAIiWGmUA0jbswKN9fkUct/wMwkny8T3h/XPKhBqA/SWN5SX1KC3
EHAjczAcX+MOS52pyhf07C3Z/oq4gpXSCQQHSat9es8oxst4w0CWcCGqIhyqyu2q
s5CZG3Ok+OvTmKWRJkaEAJXHRoTB1OjkgEod13xRDQmONW/cabeBKe/BC1zmvctM
g2eyl4amP8MGaRTldpou
=oyIl
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"More fixes from testing done on the rc kernel, including more SELinux
testing. Looking forward, lockdep found regression today in ipoib
which is still being fixed.
Summary:
- Fix for SELinux on the umad SMI path. Some old hardware does not
fill the PKey properly exposing another bug in the newer SELinux
code.
- Check the input port as we can exceed array bounds from this user
supplied value
- Users are unable to use the hash field support as they want due to
incorrect checks on the field restrictions, correct that so the
feature works as intended
- User triggerable oops in the NETLINK_RDMA handler
- cxgb4 driver fix for a bad interaction with CQ flushing in iser
caused by patches in this merge window, and bad CQ flushing during
normal close.
- Unbalanced memalloc_noio in ipoib in an error path"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/ipoib: Restore MM behavior in case of tx_ring allocation failure
iw_cxgb4: only insert drain cqes if wq is flushed
iw_cxgb4: only clear the ARMED bit if a notification is needed
RDMA/netlink: Fix general protection fault
IB/mlx4: Fix RSS hash fields restrictions
IB/core: Don't enforce PKey security on SMI MADs
IB/core: Bound check alternate path port number
Smatch complains about this code:
drivers/infiniband/hw/mlx4/qp.c:1827 _mlx4_set_path()
error: buffer overflow 'dev->dev->caps.gid_table_len' 3 <= 255
The mlx4_ib_gid_index_to_real_index() does check that "port" is within
bounds, but we don't check the return value for errors. It seems simple
enough to add a check for that.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The story is that Smatch marks skb->data as untrusted so it generates
a warning message here:
drivers/infiniband/hw/cxgb4/cm.c:4100 process_work()
error: buffer overflow 'work_handlers' 241 <= 255
In other places which handle this such as t4_uld_rx_handler() there is
some checking to make sure that the function pointer is not NULL. I
have added bounds checking and a check for NULL here as well.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The pointer reg_workq is local to the source and does not need to be
in global scope, so make it static.
Cleans up sparse warning:
drivers/infiniband/hw/cxgb4/device.c:69:25: warning: symbol 'reg_workq'
was not declared. Should it be static?
Fixes: 1c8f1da5d8 ("iw_cxgb4: Fix possible circular dependency locking warning")
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The debugfs file prints the difference between host timestamps as a
seconds/nanoseconds tuple, along with a 64-bit nanoseconds hardware
timestamp. The host time is read using getnstimeofday() which is
deprecated because of the y2038 overflow, and it suffers from time jumps
during settimeofday() and leap seconds.
Converting to ktime_get_ts64() would solve those two, but I'm going
a little further here by changing to ktime_get() and printing 64-bit
nanoseconds on both host and hw timestamps. This simplifies the code
further and makes the output easier to understand.
The format of the debugfs file obviously changes here, but this should
only be read by humans and not scripts, so I assume it's fine.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
With gcc-4.1.2:
drivers/infiniband/core/iwpm_util.c: In function ‘iwpm_send_mapinfo’:
drivers/infiniband/core/iwpm_util.c:647: warning: ‘ret’ may be used uninitialized in this function
Indeed, if nl_client is not found in any of the scanned has buckets, ret
will be used uninitialized.
Preinitialize ret to -EINVAL to fix this.
Fixes: 30dc5e63d6 ("RDMA/core: Add support for iWARP Port Mapper user space service")
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Reviewed-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If one port fails to initialize an error message should indicate the
reason and driver should continue serving the working port(s) and other
HCA(s).
Fixes: e4b2d06892 ("IB/ipoib: Remove device when one port fails to init").
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The bnxt_qplib_disable_nq() call is redundant as it occurs
after 'goto fail' and hence it called twice. Remove it.
Signed-off-by: Arvind Yadav <arvind.yadav.cs@gmail.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
memalloc_noio_save modifies the behavior of MM, we must restore it after
we are done.
Fixes: d83187dda9 ("IB/IPoIB: Convert IPoIB to memalloc_noio_* calls")
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is no need to have a duplication of the generic library, i.e.
hex2bin().
Replace the open coded variant.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Acked-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
A warning that I thought I had fixed before occasionally comes
back in rare randconfig builds (I found 7 instances in the last
100000 builds, originally it was much more frequent):
drivers/infiniband/hw/mlx5/mr.c: In function 'mlx5_ib_reg_user_mr':
drivers/infiniband/hw/mlx5/mr.c:1229:5: error: 'order' may be used uninitialized in this function [-Werror=maybe-uninitialized]
if (order <= mr_cache_max_order(dev)) {
^
drivers/infiniband/hw/mlx5/mr.c:1247:8: error: 'ncont' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/infiniband/hw/mlx5/mr.c:1247:8: error: 'page_shift' may be used uninitialized in this function [-Werror=maybe-uninitialized]
drivers/infiniband/hw/mlx5/mr.c:1260:2: error: 'npages' may be used uninitialized in this function [-Werror=maybe-uninitialized]
I've looked at all those findings again and noticed that they are all
with CONFIG_INFINIBAND_USER_MEM=n, which means ib_umem_get() returns
an error unconditionally and we never initialize or use those variables.
This triggers a condition in gcc iff mr_umem_get() is partially but not
entirely inlined, which in turn depends on the exact combination of
optimization settings. This is a known problem with gcc, with no
easy solution in the compiler, so this adds another workaround that
should be more reliable than my previous attempt.
Returning an error from mlx5_ib_reg_user_mr() earlier means that we
can completely bypass the logic that caused the warning, the compiler
can now see that the variable is never accessed.
Fixes: 14ab8896f5 ("IB/mlx5: avoid bogus -Wmaybe-uninitialized warning")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
add_sd_cnt in info structure passed to i40iw_create_hmc_obj_type
must be 0 and since it is modified during the call, it must be
reset in the loop. This avoids unnecessarily reprogramming the
SDs multiple times with the same values.
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Use sqsize instead of I40IW_CQP_SW_SQSIZE_2048 to initialize
cqp_requests elements in the for-loop as sqsize is used
to allocate memory for cqp_requests.
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
pr_* is the preferred way to print messages, replace all
printk(KERN_WARN, ...) with pr_warn.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Fix ptr_ret.cocci warnings:
drivers/infiniband/core/uverbs_cmd.c:1156:1-3: WARNING: PTR_ERR_OR_ZERO can be used
Use PTR_ERR_OR_ZERO rather than if(IS_ERR(...)) + PTR_ERR
Generated by: scripts/coccinelle/api/ptr_ret.cocci
Signed-off-by: Vasyl Gomonovych <gomonovych@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is a stale entry in nes_cm_tcp_context that has apparently
never been used in mainline linux.
I'm trying to kill off all users of timeval as part of the
y2038-safety work, so let's just remove this one.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is a stale entry in i40iw_cm_tcp_context that apparently
was copied from the 'nes' driver but never used in i40iw.
I'm trying to kill off all users of timeval as part of the
y2038-safety work, so let's just remove this one.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
There is no need to re-calculate the number of pages since it is already
done in ib_umem_get.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Acked-by: Adit Ranadive <aditr@vmware.com>
Tested-by: Adit Ranadive <aditr@vmware.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Perhaps the function is better written without
the empty bail: label and without setting ret
and just using return.
Combining the int/bool conversion of any and the
direct returns makes the resulting code clearer.
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The variable all is being set but is never read after this
hence it can be removed from the for loop initialization.
Cleans up clang warning:
drivers/infiniband/hw/qib/qib_file_ops.c:640:7: warning: Value
stored to 'any' is never read
Tested-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This failure exists with qib:
ver_rc_compare_swap:
mismatch, sequence 2, expected 123456789abcdef, got 0
The request builder was using the incorrect inlines to
build the request header resulting in incorrect data
in the atomic header.
Fix by using the appropriate inlines to create the request.
Cc: <stable@vger.kernel.org> # 4.9.x+
Fixes: 261a435184 ("IB/qib,IB/hfi: Use core common header file")
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently, if a port is queried that has an invalid
Maximum Transmission Unit, driver reports default MTU of 2048.
This in incorrect.
Use default value of 4096 if invalid.
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
ULPs do not understand OPA GIDs and will reject CM requests
if the sgid does not match the local_gid. In order to
fix this behavior we convert the OPA GID back to an IB GID.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
16B packets require that the path bits are masked with the LMC.
This mask is done correctly in all 16B header creation but was
left out for the RC Acknowledge.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Only insert our special drain CQEs to support ib_drain_sq/rq() after
the wq is flushed. Otherwise, existing but not yet polled CQEs can be
returned out of order to the user application. This can happen when the
QP has exited RTS but not yet flushed the QP, which can happen during
a normal close (vs abortive close).
In addition never count the drain CQEs when determining how many CQEs
need to be synthesized during the flush operation. This latter issue
should never happen if the QP is properly flushed before inserting the
drain CQE, but I wanted to avoid corrupting the CQ state. So we handle
it and log a warning once.
Fixes: 4fe7c2962e ("iw_cxgb4: refactor sq/rq drain logic")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Cc: stable@vger.kernel.org
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In __flush_qp(), the CQ ARMED bit was being cleared regardless of
whether any notification is actually needed. This resulted in the iser
termination logic getting stuck in ib_drain_sq() because the CQ was not
marked ARMED and thus the drain CQE notification wasn't triggered.
This new bug was exposed when this commit was merged:
commit cbb40fadd3 ("iw_cxgb4: only call the cq comp_handler when the
cq is armed")
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Mistakenly the driver didn't allow RSS hash fields combinations which
involve both IPv4 and IPv6 protocols. This bug caused to failures for
user's use cases for RSS.
Consequently, this patch fixes this bug and allows any combination that
the HW can support.
Additionally, the patch fixes the driver to return an error in case the
user provides an unsupported mask for RSS hash fields.
Fixes: 3078f5f1bd ("IB/mlx4: Add support for RSS QP")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Per the infiniband spec an SMI MAD can have any PKey. Checking the pkey
on SMI MADs is not necessary, and it seems that some older adapters
using the mthca driver don't follow the convention of using the default
PKey, resulting in false denials, or errors querying the PKey cache.
SMI MAD security is still enforced, only agents allowed to manage the
subnet are able to receive or send SMI MADs.
Reported-by: Chris Blake <chrisrblake93@gmail.com>
Cc: <stable@vger.kernel.org> # v4.12
Fixes: 47a2b338fe ("IB/core: Enforce security on management datagrams")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The alternate port number is used as an array index in the IB
security implementation, invalid values can result in a kernel panic.
Cc: <stable@vger.kernel.org> # v4.12
Fixes: d291f1a652 ("IB/core: Enforce PKey security on QPs")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The smp_read_barrier_depends() does nothing at all except on DEC Alpha,
and no current DEC Alpha systems use Infiniband:
lkml.kernel.org/r/20171023085921.jwbntptn6ictbnvj@tower
This commit therefore makes Infiniband depend on !ALPHA and removes
the now-ineffective invocations of smp_read_barrier_depends() from
the InfiniBand driver.
Please note that this patch should not be construed as my saying that
InfiniBand's memory ordering is correct, but rather that this patch does
not in any way affect InfiniBand's correctness. In other words, the
result of applying this patch is bug-for-bug compatible with the original.
Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Richard Henderson <rth@twiddle.net>
Cc: Ivan Kokshaysky <ink@jurassic.park.msu.ru>
Cc: Matt Turner <mattst88@gmail.com>
Cc: Michael Cree <mcree@orcon.net.nz>
Cc: Andrea Parri <parri.andrea@gmail.com>
Cc: <linux-rdma@vger.kernel.org>
Cc: <linux-alpha@vger.kernel.org>
[ paulmck: Removed drivers/dma/ioat/dma.c per Jason Gunthorpe's feedback. ]
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
fix for a regression in iWarp if SELinux is enabled, a fix for a compilation
regression introduced in this merge window, and one obscure kconfig
combination that oops's the kernel.
For drivers, we have hns fixes needed to make their devices work on certain
ARM IOMMU configurations, a stack data leak for hfi1, and various testing
discovered -rc bug fixes for i40iw.
This cycle we pushed back on the driver maintainers to have better commit
messages for -rc material.
You may need to pull my latest PGP key from the GPG key servers for this, I am
not certain if the subkey update will make it to kernel.org's WKD before you
need it.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2
iQIcBAABCgAGBQJaJcu6AAoJEDht9xV+IJsa8aoP/Rh3tHT3mqN6v/p9HgUyUNRS
gyDJ4BHg3A+O1UrnBjFjrAhpX9bqik/96n9t14Er7kVM9gxaDzOaCNd9+ASsKjDf
fXRusCnKS5RP5CpQ16e6qurkOsBXghsJKTL+zpqGSmDf0yUBQCJUkmRNJNhiaUtW
YEp92dfZytTK+iEmuXW4fJoIKWK3N5aOkttiK8BFb6XvmsUnWSp1wlBS2FhRzDq9
PPwfM2EE/x46dFF1/w04M5hVDPO6Bngq0Tvo+EdOlAMwKN3Zmun+fSOLKaxg44Of
dyN6dsu5tKi200Nbdq6cBkehWL6CukSGdJnepeI+xW+8hve9Eu9O6j6O3pMb/dYn
/vvqE14KhrR1B3F5LFkJLcxxKRl97S2uPhOY2j3oU4L93s9B4X6geXX2oLVIos1r
41YPu1/7OQyQffp4eKgsz4eA38TpdG6DoOlFMXgdIboJ8bASuRuyfLISVviMc8dx
SKQTZTY54FK7uJMRw4rkcOlVUpJ2tyuVZr+Lt8p80IpnySCdJsEgAZkJngCPOKRT
2h8VdfFwzhdlf3Ni5tZRZdMtE6oMD5BMa0jri7xtyKYa0o3gUqvHDGMTSVlQ1maF
qXMP2mApTcpdFuFvdbnxIeLzP8zigJVkvIsqeKHGS8gt+dxF/934rTY1NTj3rQN5
zmClIoiVg7NvHlDzvwg+
=YaUB
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma
Pull rdma fixes from Jason Gunthorpe:
"Here is the first rc pull request for RDMA. This includes an important
core fix for a regression in iWarp if SELinux is enabled, a fix for a
compilation regression introduced in this merge window, and one
obscure kconfig combination that oops's the kernel.
For drivers, we have hns fixes needed to make their devices work on
certain ARM IOMMU configurations, a stack data leak for hfi1, and
various testing discovered -rc bug fixes for i40iw.
This cycle we pushed back on the driver maintainers to have better
commit messages for -rc material"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma:
IB/core: Only enforce security for InfiniBand
RDMA/hns: Get rid of page operation after dma_alloc_coherent
RDMA/hns: Get rid of virt_to_page and vmap calls after dma_alloc_coherent
RDMA/hns: Fix the issue of IOVA not page continuous in hip08
IB/core: Init subsys if compiled to vmlinuz-core
RDMA/cma: Make sure that PSN is not over max allowed
i40iw: Notify user of established connection after QP in RTS
i40iw: Move MPA request event for loopback after connect
i40iw: Correct ARP index mask
i40iw: Do not free sqbuf when event is I40IW_TIMER_TYPE_CLOSE
i40iw: Allocate a sdbuf per CQP WQE
IB: INFINIBAND should depend on HAS_DMA
IB/hfi1: Initialize bth1 in 16B rc ack builder
For now the only LSM security enforcement mechanism available is
specific to InfiniBand. Bypass enforcement for non-IB link types.
This fixes a regression where modify_qp fails for iWARP because
querying the PKEY returns -EINVAL.
Cc: Paul Moore <paul@paul-moore.com>
Cc: Don Dutile <ddutile@redhat.com>
Cc: stable@vger.kernel.org
Reported-by: Potnuri Bharat Teja <bharat@chelsio.com>
Fixes: d291f1a65232("IB/core: Enforce PKey security on QPs")
Fixes: 47a2b338fe63("IB/core: Enforce security on management datagrams")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Tested-by: Potnuri Bharat Teja <bharat@chelsio.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In general, dma_alloc_coherent() returns a CPU virtual address and
a DMA address, and we have no guarantee that the underlying memory
even has an associated struct page at all.
This patch gets rid of the page operation after dma_alloc_coherent,
and records the VA returned form dma_alloc_coherent in the struct
of hem in hns RoCE driver.
Fixes: 9a44353("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Xiping Zhang (Francis) <zhangxiping3@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
In general dma_alloc_coherent() returns a CPU virtual address and
a DMA address, and we have no guarantee that the virtual address
is either in the linear map or vmalloc. It could be in some other special
place. We have no guarantee that the underlying memory even has
an associated struct page at all.
In current code, there are incorrect usage as below:
dma_alloc_coherent + virt_to_page + vmap. There will probably
introduce coherency problem. This patch fixes it to get rid of
virt_to_page and vmap calls at Leon's suggestion. The related
link: https://lkml.org/lkml/2017/11/7/34
Fixes: 9a44353("IB/hns: Add driver files for hns RoCE driver")
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Xiping Zhang (Francis) <zhangxiping3@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
If the smmu is enabled, the length of sg obtained from
__iommu_map_sg_attrs is not 4kB. When the IOVA is set with the sg
dma address, the IOVA will not be page continuous. so, the current
code has MTPT configuration error that probably cause dma operation
failure. In order to fix this issue, the IOVA should be calculated
based on the sg length.
Fixes: 3958cc5("RDMA/hns: Configure the MTPT in hip08")
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Xiping Zhang (Francis) <zhangxiping3@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Once infiniband is compiled as a core component its subsystem must be
enabled before device initialization. Otherwise there is a NULL pointer
dereference during mlx4_core init, calltrace:
->device_add
if (dev->class) {
deref dev->class->p =>NULLPTR
#Config
CONFIG_NET_DEVLINK=y
CONFIG_MAY_USE_DEVLINK=y
CONFIG_MLX4_EN=y
Signed-off-by: Dmitry Monakhov <dmonakhov@openvz.org>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
This patch limits the initial value for PSN to 24 bits as
spec requires.
Signed-off-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Mukesh Kacker <mukesh.kacker@oracle.com>
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Established CM event is sent prior to modifying QP to RTS state.
This can result in application closing the connection before the
QP is actually in RTS state. Move sending of established CM
event to after modify QP to RTS.
Fixes: f27b4746f3 ("i40iw: add connection management code")
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
For loopback, a MPA request event is generated when cm_node
is initialized, which allows applications to act on the
connect request before i40iw_connect() has completed.
In some cases, the reject flow executes in parallel with
the connect flow and doesn't delete an APBVT entry,
because the apbvt_set variable is still not set by the
connect flow. Move the MPA request event to the end of
i40iw_connect() to notify application for a connect
request, after connect has completed.
Fixes: f27b4746f3 ("i40iw: add connection management code")
Signed-off-by: Tatyana Nikolova <tatyana.e.nikolova@intel.com>
Signed-off-by: Henry Orosco <henry.orosco@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
The ARP table entry indexes are aliased to 12bits
instead of the intended 16bits when uploaded to
the QP Context. This will present an issue when the
number of connections exceeds 4096 as ARP entries are
reused. Fix this by adjusting the mask to account for
the full 16bits.
Fixes: 4e9042e647 ("i40iw: add hw and utils files")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
When the event type is I40IW_TIMER_TYPE_CLOSE, there is no sqbuf and
it should not be freed as one in i40iw_schedule_cm_timer().
Fixes: f27b4746f3 ("i40iw: add connection management code")
Signed-off-by: Mustafa Ismail <mustafa.ismail@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Currently there is only one sdbuf per Control QP (CQP) for
programming Segment Descriptor (SD). If multiple SD work
requests are posted simultaneously, the sdbuf is reused
by all WQEs and new WQEs can corrupt previous WQEs sdbuf
leading to incorrect SD programming.
Fix this by allocating one sdbuf per CQP SQ WQE. When an
SD command is posted, it will use the corresponding sdbuf
for the WQE.
Fixes: 86dbcd0f12 ("i40iw: add file to handle cqp calls")
Signed-off-by: Chien Tin Tung <chien.tin.tung@intel.com>
Signed-off-by: Shiraz Saleem <shiraz.saleem@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
It is possible the bth1 variable could be used uninitialized so going
ahead and giving it a default value.
Otherwise we leak stack memory to the network.
Fixes: 5b6cabb0db ("IB/hfi1: Add 16B RC/UC support")
Reviewed-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
Until there is a solution to the dma-to-dax vs truncate problem it is
not safe to allow RDMA to create long standing memory registrations
against filesytem-dax vmas.
Link: http://lkml.kernel.org/r/151068941011.7446.7766030590347262502.stgit@dwillia2-desk3.amr.corp.intel.com
Fixes: 3565fce3a6 ("mm, x86: get_user_pages() for dax mappings")
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Reported-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Jason Gunthorpe <jgg@mellanox.com>
Acked-by: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Jeff Moyer <jmoyer@redhat.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Inki Dae <inki.dae@samsung.com>
Cc: Jan Kara <jack@suse.cz>
Cc: Joonyoung Shim <jy0922.shim@samsung.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Seung-Woo Kim <sw0312.kim@samsung.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: <stable@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
With all callbacks converted, and the timer callback prototype
switched over, the TIMER_FUNC_TYPE cast is no longer needed,
so remove it. Conversion was done with the following scripts:
perl -pi -e 's|\(TIMER_FUNC_TYPE\)||g' \
$(git grep TIMER_FUNC_TYPE | cut -d: -f1 | sort -u)
perl -pi -e 's|\(TIMER_DATA_TYPE\)||g' \
$(git grep TIMER_DATA_TYPE | cut -d: -f1 | sort -u)
The now unused macros are also dropped from include/linux/timer.h.
Signed-off-by: Kees Cook <keescook@chromium.org>
This mechanically converts all remaining cases of ancient open-coded timer
setup with the old setup_timer() API, which is the first step in timer
conversions. This has no behavioral changes, since it ultimately just
changes the order of assignment to fields of struct timer_list when
finding variations of:
init_timer(&t);
f.function = timer_callback;
t.data = timer_callback_arg;
to be converted into:
setup_timer(&t, timer_callback, timer_callback_arg);
The conversion is done with the following Coccinelle script, which
is an improved version of scripts/cocci/api/setup_timer.cocci, in the
following ways:
- assignments-before-init_timer() cases
- limit the .data case removal to the specific struct timer_list instance
- handling calls by dereference (timer->field vs timer.field)
spatch --very-quiet --all-includes --include-headers \
-I ./arch/x86/include -I ./arch/x86/include/generated \
-I ./include -I ./arch/x86/include/uapi \
-I ./arch/x86/include/generated/uapi -I ./include/uapi \
-I ./include/generated/uapi --include ./include/linux/kconfig.h \
--dir . \
--cocci-file ~/src/data/setup_timer.cocci
@fix_address_of@
expression e;
@@
init_timer(
-&(e)
+&e
, ...)
// Match the common cases first to avoid Coccinelle parsing loops with
// "... when" clauses.
@match_immediate_function_data_after_init_timer@
expression e, func, da;
@@
-init_timer
+setup_timer
( \(&e\|e\)
+, func, da
);
(
-\(e.function\|e->function\) = func;
-\(e.data\|e->data\) = da;
|
-\(e.data\|e->data\) = da;
-\(e.function\|e->function\) = func;
)
@match_immediate_function_data_before_init_timer@
expression e, func, da;
@@
(
-\(e.function\|e->function\) = func;
-\(e.data\|e->data\) = da;
|
-\(e.data\|e->data\) = da;
-\(e.function\|e->function\) = func;
)
-init_timer
+setup_timer
( \(&e\|e\)
+, func, da
);
@match_function_and_data_after_init_timer@
expression e, e2, e3, e4, e5, func, da;
@@
-init_timer
+setup_timer
( \(&e\|e\)
+, func, da
);
... when != func = e2
when != da = e3
(
-e.function = func;
... when != da = e4
-e.data = da;
|
-e->function = func;
... when != da = e4
-e->data = da;
|
-e.data = da;
... when != func = e5
-e.function = func;
|
-e->data = da;
... when != func = e5
-e->function = func;
)
@match_function_and_data_before_init_timer@
expression e, e2, e3, e4, e5, func, da;
@@
(
-e.function = func;
... when != da = e4
-e.data = da;
|
-e->function = func;
... when != da = e4
-e->data = da;
|
-e.data = da;
... when != func = e5
-e.function = func;
|
-e->data = da;
... when != func = e5
-e->function = func;
)
... when != func = e2
when != da = e3
-init_timer
+setup_timer
( \(&e\|e\)
+, func, da
);
@r1 exists@
expression t;
identifier f;
position p;
@@
f(...) { ... when any
init_timer@p(\(&t\|t\))
... when any
}
@r2 exists@
expression r1.t;
identifier g != r1.f;
expression e8;
@@
g(...) { ... when any
\(t.data\|t->data\) = e8
... when any
}
// It is dangerous to use setup_timer if data field is initialized
// in another function.
@script:python depends on r2@
p << r1.p;
@@
cocci.include_match(False)
@r3@
expression r1.t, func, e7;
position r1.p;
@@
(
-init_timer@p(&t);
+setup_timer(&t, func, 0UL);
... when != func = e7
-t.function = func;
|
-t.function = func;
... when != func = e7
-init_timer@p(&t);
+setup_timer(&t, func, 0UL);
|
-init_timer@p(t);
+setup_timer(t, func, 0UL);
... when != func = e7
-t->function = func;
|
-t->function = func;
... when != func = e7
-init_timer@p(t);
+setup_timer(t, func, 0UL);
)
Signed-off-by: Kees Cook <keescook@chromium.org>
Here is the set of driver core / debugfs patches for 4.15-rc1.
Not many here, mostly all are debugfs fixes to resolve some
long-reported problems with files going away with references to them in
userspace. There's also some SPDX cleanups for the debugfs code, as
well as a few other minor driver core changes for issues reported by
people.
All of these have been in linux-next for a week or more with no reported
issues.
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-----BEGIN PGP SIGNATURE-----
iG0EABECAC0WIQT0tgzFv3jCIUoxPcsxR9QN2y37KQUCWg2NCA8cZ3JlZ0Brcm9h
aC5jb20ACgkQMUfUDdst+ymUNgCfYq434CFh+YtwITBNYdqkFYFf0ZAAn3qfhh2+
M3rmZzwk2MKBvNQ2npvt
=/8+Y
-----END PGP SIGNATURE-----
Merge tag 'driver-core-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core
Pull driver core updates from Greg KH:
"Here is the set of driver core / debugfs patches for 4.15-rc1.
Not many here, mostly all are debugfs fixes to resolve some
long-reported problems with files going away with references to them
in userspace. There's also some SPDX cleanups for the debugfs code, as
well as a few other minor driver core changes for issues reported by
people.
All of these have been in linux-next for a week or more with no
reported issues"
* tag 'driver-core-4.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/driver-core:
driver core: Fix device link deferred probe
debugfs: Remove redundant license text
debugfs: add SPDX identifiers to all debugfs files
debugfs: defer debugfs_fsdata allocation to first usage
debugfs: call debugfs_real_fops() only after debugfs_file_get()
debugfs: purge obsolete SRCU based removal protection
IB/hfi1: convert to debugfs_file_get() and -put()
debugfs: convert to debugfs_file_get() and -put()
debugfs: debugfs_real_fops(): drop __must_hold sparse annotation
debugfs: implement per-file removal protection
debugfs: add support for more elaborate ->d_fsdata
driver core: Move device_links_purge() after bus_remove_device()
arch_topology: Fix section miss match warning due to free_raw_capacity()
driver-core: pr_err() strings should end with newlines
Merge updates from Andrew Morton:
- a few misc bits
- ocfs2 updates
- almost all of MM
* emailed patches from Andrew Morton <akpm@linux-foundation.org>: (131 commits)
memory hotplug: fix comments when adding section
mm: make alloc_node_mem_map a void call if we don't have CONFIG_FLAT_NODE_MEM_MAP
mm: simplify nodemask printing
mm,oom_reaper: remove pointless kthread_run() error check
mm/page_ext.c: check if page_ext is not prepared
writeback: remove unused function parameter
mm: do not rely on preempt_count in print_vma_addr
mm, sparse: do not swamp log with huge vmemmap allocation failures
mm/hmm: remove redundant variable align_end
mm/list_lru.c: mark expected switch fall-through
mm/shmem.c: mark expected switch fall-through
mm/page_alloc.c: broken deferred calculation
mm: don't warn about allocations which stall for too long
fs: fuse: account fuse_inode slab memory as reclaimable
mm, page_alloc: fix potential false positive in __zone_watermark_ok
mm: mlock: remove lru_add_drain_all()
mm, sysctl: make NUMA stats configurable
shmem: convert shmem_init_inodecache() to void
Unify migrate_pages and move_pages access checks
mm, pagevec: rename pagevec drained field
...
Now that we have a NUMA-aware version of kmalloc_array() we can use it
instead of kmalloc_node() without an overflow check in the size
calculation.
Link: http://lkml.kernel.org/r/20170927082038.3782-5-jthumshirn@suse.de
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Mike Marciniszyn <infinipath@intel.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that we have a NUMA-aware version of kmalloc_array() we can use it
instead of kmalloc_node() without an overflow check in the size
calculation.
Link: http://lkml.kernel.org/r/20170927082038.3782-4-jthumshirn@suse.de
Signed-off-by: Johannes Thumshirn <jthumshirn@suse.de>
Reviewed-by: Christoph Lameter <cl@linux.com>
Cc: Mike Marciniszyn <infinipath@intel.com>
Cc: Doug Ledford <dledford@redhat.com>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Damien Le Moal <damien.lemoal@wdc.com>
Cc: David Rientjes <rientjes@google.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Cc: Vlastimil Babka <vbabka@suse.cz>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
- Add iWARP support to qedr driver
- Lots of misc fixes across subsystem
- Multiple update series to hns roce driver
- Multiple update series to hfi1 driver
- Updates to vnic driver
- Add kref to wait struct in cxgb4 driver
- Updates to i40iw driver
- Mellanox shared pull request
- timer_setup changes
- massive cleanup series from Bart Van Assche
- Two series of SRP/SRPT changes from Bart Van Assche
- Core updates from Mellanox
- i40iw updates
- IPoIB updates
- mlx5 updates
- mlx4 updates
- hns updates
- bnxt_re fixes
- PCI write padding support
- Sparse/Smatch/warning cleanups/fixes
- CQ moderation support
- SRQ support in vmw_pvrdma
-----BEGIN PGP SIGNATURE-----
iQIcBAABAgAGBQJaDF9JAAoJELgmozMOVy/dDXUP/i92g+G4OJ+4hHMh4KCjQMHT
eMr/w9l1C033HrtsU1afPhqHOsKSxwCuJSiTgN4uXIm67/2kPK5Vlx+ir7mbOLwB
3ukVK6Q/aFdigWCUhIaJSlDpjbd2sEj7JwKtM3rucvMWJlBJ4mAbcVQVfU96CCsv
V9mO7dpR3QtYWDId9DukfnAfPUPFa3SMZnD7tdl6mKNRg/MjWGYLAL4nJoBfex5f
b4o+MTrbuFWXYsfDru1m9BpHgyul20ldfcnbe8C/sVOQmOgkX7ngD5Sdi1FLeRJP
GF/DnAqInC9N7cAxZHx4kH9x6mLMmEdfnwQ9VTVqGUHBsj3H4hQTVIAFfHUhWUbG
TP5ZHgZG2CewZ0rf092cWlDZwp6n0BalnbQJr+QN4MzPmYbofs3AccSKUwrle+e+
E6yYf4XxJdt7wRr4F1QKygtUEXSnNkNYUDQ4ZFbpJS/D4Sq80R1ZV/WZ7PJxm1D/
EIKoi7NU9cbPMIlbCzn8kzgfjS7Pe4p0WW/Xxc/IYmACzpwNPkZuFGSND79ksIpF
jhHqwZsOWFuXISjvcR4loc8wW6a5w5vjOiX0lLVz0NSdXSzVqav/2at7ZLDx/PT+
Lh9YVL51akA3hiD+3X6iOhfOUu6kskjT9HijE5T8rJnf0V+C6AtIRpwrQ7ONmjJm
3JMrjjLxtCIvpUyzCvDW
=A1oL
-----END PGP SIGNATURE-----
Merge tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma
Pull rdma updates from Doug Ledford:
"This is a fairly plain pull request. Lots of driver updates across the
stack, a huge number of static analysis cleanups including a close to
50 patch series from Bart Van Assche, and a number of new features
inside the stack such as general CQ moderation support.
Nothing really stands out, but there might be a few conflicts as you
take things in. In particular, the cleanups touched some of the same
lines as the new timer_setup changes.
Everything in this pull request has been through 0day and at least two
days of linux-next (since Stephen doesn't necessarily flag new
errors/warnings until day2). A few more items (about 30 patches) from
Intel and Mellanox showed up on the list on Tuesday. I've excluded
those from this pull request, and I'm sure some of them qualify as
fixes suitable to send any time, but I still have to review them
fully. If they contain mostly fixes and little or no new development,
then I will probably send them through by the end of the week just to
get them out of the way.
There was a break in my acceptance of patches which coincides with the
computer problems I had, and then when I got things mostly back under
control I had a backlog of patches to process, which I did mostly last
Friday and Monday. So there is a larger number of patches processed in
that timeframe than I was striving for.
Summary:
- Add iWARP support to qedr driver
- Lots of misc fixes across subsystem
- Multiple update series to hns roce driver
- Multiple update series to hfi1 driver
- Updates to vnic driver
- Add kref to wait struct in cxgb4 driver
- Updates to i40iw driver
- Mellanox shared pull request
- timer_setup changes
- massive cleanup series from Bart Van Assche
- Two series of SRP/SRPT changes from Bart Van Assche
- Core updates from Mellanox
- i40iw updates
- IPoIB updates
- mlx5 updates
- mlx4 updates
- hns updates
- bnxt_re fixes
- PCI write padding support
- Sparse/Smatch/warning cleanups/fixes
- CQ moderation support
- SRQ support in vmw_pvrdma"
* tag 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma: (296 commits)
RDMA/core: Rename kernel modify_cq to better describe its usage
IB/mlx5: Add CQ moderation capability to query_device
IB/mlx4: Add CQ moderation capability to query_device
IB/uverbs: Add CQ moderation capability to query_device
IB/mlx5: Exposing modify CQ callback to uverbs layer
IB/mlx4: Exposing modify CQ callback to uverbs layer
IB/uverbs: Allow CQ moderation with modify CQ
iw_cxgb4: atomically flush the qp
iw_cxgb4: only call the cq comp_handler when the cq is armed
iw_cxgb4: Fix possible circular dependency locking warning
RDMA/bnxt_re: report vlan_id and sl in qp1 recv completion
IB/core: Only maintain real QPs in the security lists
IB/ocrdma_hw: remove unnecessary code in ocrdma_mbx_dealloc_lkey
RDMA/core: Make function rdma_copy_addr return void
RDMA/vmw_pvrdma: Add shared receive queue support
RDMA/core: avoid uninitialized variable warning in create_udata
RDMA/bnxt_re: synchronize poll_cq and req_notify_cq verbs
RDMA/bnxt_re: Flush CQ notification Work Queue before destroying QP
RDMA/bnxt_re: Set QP state in case of response completion errors
RDMA/bnxt_re: Add memory barriers when processing CQ/EQ entries
...
Summary of modules changes for the 4.15 merge window:
- Treewide module_param_call() cleanup, fix up set/get function
prototype mismatches, from Kees Cook
- Minor code cleanups
Signed-off-by: Jessica Yu <jeyu@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQIcBAABCgAGBQJaDCyzAAoJEMBFfjjOO8FyaYQP/AwHBy6XmwwVlWDP4BqIF6hL
Vhy3ccVLYEORvePv68tWSRPUz5n6+1Ebqanmwtkw6i8l+KwxY2SfkZql09cARc33
2iBE4bHF98iWQmnJbF6me80fedY9n5bZJNMQKEF9VozJWwTMOTQFTCfmyJRDBmk9
iidQj6M3idbSUOYIJjvc40VGx5NyQWSr+FFfqsz1rU5iLGRGEvA3I2/CDT0oTuV6
D4MmFxzE2Tv/vIMa2GzKJ1LGScuUfSjf93Lq9Kk0cG36qWao8l930CaXyVdE9WJv
bkUzpf3QYv/rDX6QbAGA0cada13zd+dfBr8YhchclEAfJ+GDLjMEDu04NEmI6KUT
5lP0Xw0xYNZQI7bkdxDMhsj5jaz/HJpXCjPCtZBnSEKiL4OPXVMe+pBHoCJ2/yFN
6M716XpWYgUviUOdiE+chczB5p3z4FA6u2ykaM4Tlk0btZuHGxjcSWwvcIdlPmjm
kY4AfDV6K0bfEBVguWPJicvrkx44atqT5nWbbPhDwTSavtsuRJLb3GCsHedx7K8h
ZO47lCQFAWCtrycK1HYw+oupNC3hYWQ0SR42XRdGhL1bq26C+1sei1QhfqSgA9PQ
7CwWH4UTOL9fhtrzSqZngYOh9sjQNFNefqQHcecNzcEjK2vjrgQZvRNWZKHSwaFs
fbGX8juZWP4ypbK+irTB
=c8vb
-----END PGP SIGNATURE-----
Merge tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux
Pull module updates from Jessica Yu:
"Summary of modules changes for the 4.15 merge window:
- treewide module_param_call() cleanup, fix up set/get function
prototype mismatches, from Kees Cook
- minor code cleanups"
* tag 'modules-for-v4.15' of git://git.kernel.org/pub/scm/linux/kernel/git/jeyu/linux:
module: Do not paper over type mismatches in module_param_call()
treewide: Fix function prototypes for module_param_call()
module: Prepare to convert all module_param_call() prototypes
kernel/module: Delete an error message for a failed memory allocation in add_module_usage()
Pull networking updates from David Miller:
"Highlights:
1) Maintain the TCP retransmit queue using an rbtree, with 1GB
windows at 100Gb this really has become necessary. From Eric
Dumazet.
2) Multi-program support for cgroup+bpf, from Alexei Starovoitov.
3) Perform broadcast flooding in hardware in mv88e6xxx, from Andrew
Lunn.
4) Add meter action support to openvswitch, from Andy Zhou.
5) Add a data meta pointer for BPF accessible packets, from Daniel
Borkmann.
6) Namespace-ify almost all TCP sysctl knobs, from Eric Dumazet.
7) Turn on Broadcom Tags in b53 driver, from Florian Fainelli.
8) More work to move the RTNL mutex down, from Florian Westphal.
9) Add 'bpftool' utility, to help with bpf program introspection.
From Jakub Kicinski.
10) Add new 'cpumap' type for XDP_REDIRECT action, from Jesper
Dangaard Brouer.
11) Support 'blocks' of transformations in the packet scheduler which
can span multiple network devices, from Jiri Pirko.
12) TC flower offload support in cxgb4, from Kumar Sanghvi.
13) Priority based stream scheduler for SCTP, from Marcelo Ricardo
Leitner.
14) Thunderbolt networking driver, from Amir Levy and Mika Westerberg.
15) Add RED qdisc offloadability, and use it in mlxsw driver. From
Nogah Frankel.
16) eBPF based device controller for cgroup v2, from Roman Gushchin.
17) Add some fundamental tracepoints for TCP, from Song Liu.
18) Remove garbage collection from ipv6 route layer, this is a
significant accomplishment. From Wei Wang.
19) Add multicast route offload support to mlxsw, from Yotam Gigi"
* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (2177 commits)
tcp: highest_sack fix
geneve: fix fill_info when link down
bpf: fix lockdep splat
net: cdc_ncm: GetNtbFormat endian fix
openvswitch: meter: fix NULL pointer dereference in ovs_meter_cmd_reply_start
netem: remove unnecessary 64 bit modulus
netem: use 64 bit divide by rate
tcp: Namespace-ify sysctl_tcp_default_congestion_control
net: Protect iterations over net::fib_notifier_ops in fib_seq_sum()
ipv6: set all.accept_dad to 0 by default
uapi: fix linux/tls.h userspace compilation error
usbnet: ipheth: prevent TX queue timeouts when device not ready
vhost_net: conditionally enable tx polling
uapi: fix linux/rxrpc.h userspace compilation errors
net: stmmac: fix LPI transitioning for dwmac4
atm: horizon: Fix irq release error
net-sysfs: trigger netlink notification on ifalias change via sysfs
openvswitch: Using kfree_rcu() to simplify the code
openvswitch: Make local function ovs_nsh_key_attr_size() static
openvswitch: Fix return value check in ovs_meter_cmd_features()
...
- proper use of the bool type (Thomas Meyer)
- constification of struct config_item_type (Bhumika Goyal)
-----BEGIN PGP SIGNATURE-----
iQI/BAABCAApFiEEgdbnc3r/njty3Iq9D55TZVIEUYMFAloLSTALHGhjaEBsc3Qu
ZGUACgkQD55TZVIEUYNxfhAAv3cunxiEPEAvs+1xuGd3cZYaxz7qinvIODPxIKoF
kRWiuy5PUklRMnJ8seOgJ1p1QokX6Sk4cZ8HcctDJVByqODjOq4K5eaKVN1ZqJoz
BUzO/gOqfs64r9yaFIlKfe8nFA+gpUftSeWyv3lThxAIJ1iSbue7OZ/A10tTOS1m
RWp9FPepFv+nJMfWqeQU64BsoDQ4kgZ2NcEA+jFxNx5dlmIbLD49tk0lfddvZQXr
j5WyAH73iugilLtNUGVOqSzHBY4kUvfCKUV7leirCegyMoGhFtA87m6Wzwbo6ZUI
DwQLzWvuPaGv1P2PpNEHfKiNbfIEp75DRyyyf87DD3lc5ffAxQSm28mGuwcr7Rn5
Ow/yWL6ERMzCLExoCzEkXYJISy7T5LIzYDgNggKMpeWxysAduF7Onx7KfW1bTuhK
mHvY7iOXCjEvaIVaF8uMKE6zvuY1vCMRXaJ+kC9jcIE3gwhg+2hmQvrdJ2uAFXY+
rkeF2Poj/JlblPU4IKWAjiPUbzB7Lv0gkypCB2pD4riaYIN5qCAgF8ULIGQp2hsO
lYW1EEgp5FBop85oSO/HAGWeH9dFg0WaV7WqNRVv0AGXhKjgy+bVd7iYPpvs7mGw
z9IqSQDORcG2ETLcFhZgiJpCk/itwqXBD+wgMOjJPP8lL+4kZ8FcuhtY9kc9WlJE
Tew=
=+tMO
-----END PGP SIGNATURE-----
Merge tag 'configfs-for-4.15' of git://git.infradead.org/users/hch/configfs
Pull configfs updates from Christoph Hellwig:
"A couple of configfs cleanups:
- proper use of the bool type (Thomas Meyer)
- constification of struct config_item_type (Bhumika Goyal)"
* tag 'configfs-for-4.15' of git://git.infradead.org/users/hch/configfs:
RDMA/cma: make config_item_type const
stm class: make config_item_type const
ACPI: configfs: make config_item_type const
nvmet: make config_item_type const
usb: gadget: configfs: make config_item_type const
PCI: endpoint: make config_item_type const
iio: make function argument and some structures const
usb: gadget: make config_item_type structures const
dlm: make config_item_type const
netconsole: make config_item_type const
nullb: make config_item_type const
ocfs2/cluster: make config_item_type const
target: make config_item_type const
configfs: make ci_type field, some pointers and function arguments const
configfs: make config_item_type const
configfs: Fix bool initialization/comparison
Current ib_modify_cq() is used to set CQ moderation parameters.
This patch renames ib_modify_cq() to be rdma_set_cq_moderation(),
because the kernel version of RDMA API doesn't need to follow already
exposed to user's API pattern (create_XXX/modify_XXX/query_XXX/destroy_XXX)
and better to have more accurate name which describes the actual usage.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
query_device can now obtain the maximum values for
cq_max_count and cq_period, needed for cq moderation.
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
query_device can now obtain the maximum values for
cq_max_count and cq_period, needed for cq moderation.
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The query_device function can now obtain the maximum values for
cq_max_count and cq_period, needed for CQ moderation.
cq_max_count is a 16 bits number that determines the number
of CQEs to accumulate before generating an event.
cq_period is a 16 bits number that determines the timeout in micro
seconds from the last event generated, upon which a new event will
be generated even if cq_max_count was not reached.
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Exposed mlx5_ib_modify_cq to be called from ib device
verb list.
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Exposed mlx4_ib_modify_cq to be called from ib device
verb list.
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Uverbs support in modify_cq for CQ moderation only.
Gives ability to change cq_max_count and cq_period.
CQ moderation enhance performance by moderating the number
of CQEs needed to create an event instead of application
having to suffer from event per-CQE.
To achieve CQ moderation the application needs to set cq_max_count
and cq_period.
cq_max_count - defines the number of CQEs needed to create an event.
cq_period - defines the timeout (micro seconds) between last
event and a new one that will occur even if
cq_max_count was not satisfied
Signed-off-by: Yonatan Cohen <yonatanc@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
__flush_qp() has a race condition where during the flush operation,
the qp lock is released allowing another thread to possibly post a WR,
which corrupts the queue state, possibly causing crashes. The lock was
released to preserve the cq/qp locking hierarchy of cq first, then qp.
However releasing the qp lock is not necessary; both RQ and SQ CQ locks
can be acquired first, followed by the qp lock, and then the RQ and SQ
flushing can be done w/o unlocking.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The ULPs completion handler should only be called if the CQ is
armed for notification.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Locking sequence of iw_cxgb4 and RoCE drivers in ib_register_device() is
slightly different and this leads to possible circular dependency locking
warning when both the devices are brought up.
Here is the locking sequence upto ib_register_device():
iw_cxgb4: rtnl_mutex(net stack) --> uld_mutex --> device_mutex
RoCE drivers: device_mutex --> rtnl_mutex
Here is the possibility of cross locking:
CPU #0 (iw_cxgb4) CPU #1 (RoCE drivers)
-> on interface up cxgb4_up()
executed with rtnl_mutex held
-> hold uld_mutex and try
registering ib device
-> In ib_register_device() hold
device_mutex
-> hold device mutex in
ib_register_device
-> try acquiring rtnl_mutex in
ib_enum_roce_netdev()
Current patch schedules the ib_register_device() functionality of
iw_cxgb4 to a workqueue to prevent the possible cross-locking.
Also rename the labels in c4iw_reister_device().
Signed-off-by: Potnuri Bharat Teja <bharat@chelsio.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In a real RoCE v2 network it is possible to have two
Sections of network have same IP hence same gid. However
those may have different vlans. During connection resolution
it is important to report the actual vlan on which the
MAD packet was received instead of relying on other means
to resolve vlan-id. ib_find_gid_index should not be used
to resolve the vlan-id using sgid of the local system
where the packet was received.
Our device has the capability to report the actual VLAN-ID
in the GSI qp completions. Since we have the capability our
driver should move away from resolving the vlan-id with the
help of SGID at the destination port.
Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Reported-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When modify QP is called on a shared QP update the security context for
the real QP. When security is subsequently enforced the shared QP
handles will be checked as well.
Without this change shared QP handles get added to the port/pkey lists,
which is a bug, because not all shared QP handles will be checked for
access. Also the shared QP security context wouldn't get removed from
the port/pkey lists causing access to free memory and list corruption
when they are destroyed.
Cc: stable@vger.kernel.org
Fixes: d291f1a652 ("IB/core: Enforce PKey security on QPs")
Signed-off-by: Daniel Jurgens <danielj@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Check on return value and goto label mbx_err are unnecessary.
Addresses-Coverity-ID: 1268780
Signed-off-by: Gustavo A. R. Silva <garsilva@embeddedor.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Function returns zero - make it void.
While there make struct net_device const.
Signed-off-by: Yuval Shaia <yuval.shaia@oracle.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add the required functions needed to support SRQs. Currently, kernel
clients are not supported. SRQs will only be available in userspace.
Reviewed-by: Adit Ranadive <aditr@vmware.com>
Reviewed-by: Aditya Sarwade <asarwade@vmware.com>
Reviewed-by: Jorgen Hansen <jhansen@vmware.com>
Reviewed-by: Nitish Bhat <bnitish@vmware.com>
Signed-off-by: Bryan Tan <bryantan@vmware.com>
Reviewed-by: Yuval Shaia <yuval.shaia@oracle.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
As Dan pointed out, the rework I did makes it harder for smatch and other
static checkers to figure out what is going on with the uninitialized
pointers.
By open-coding the call in create_udata(), we make it more readable for
both humans and tools.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 12f727721e ("IB/uverbs: clean up INIT_UDATA_BUF_OR_NULL usage")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Destroy_qp shall wait for any outstanding CQ notification to be
flushed out before proceeding with QP destroy. Flushing the WQ
before destroying the QP.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Moves the driver QP state to error in case of response completion
errors. Handles the scenarios which doesn't generate a terminal CQE.
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The code determines if the next ring entry is valid before proceeding
further to read the rest of the entry. The CPU can re-order and read
the rest of the entry first, possibly reading a stale entry, if DMA
of a new entry happens right after reading it.
Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When the driver is loaded it sets the default CCTI value to be 0. When the FM
starts and CCA is disabled the driver sets the max value to 65535 due the driver
subtracting 1 from 0 and the fact that the CCTI value is a u16.
Special case the subtraction to find the index for a 0 value.
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Pass only the lower 16Bits of an Extended LIDs to rvt_cq_entry
to avoid triggering a WARN_ON_ONCE during conversion there.
These upper 16Bits are okay to drop as they are obtained elsewhere.
Fixes: 62ede77799 ("Add OPA extended LID support")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When deciding to convert an OPA AH to IB we were incorrectly
including the IB multicast range. At this layer, all Extended
LIDs will be larger than IB_LID_PERMISSIVE. Change comparison
accordingly.
Fixes: d541e45500 ("IB/core: Convert ah_attr from OPA to IB when copying to user")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Don Hiatt <don.hiatt@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
On host shutdown, driver sends 'SMA_Disabled' as a reason
for link down. This is incorrect.
Send 'reboot' as a linkdown reason.
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
It is invalid to change Link state from Init to Armed if
IsSmConfigurationStarted bit is not set in Attribute modifier
for Set subnet management method in case of PortInfo
and PortStateInfo attribute.
Set response MAD status field bits accordingly to react correctly
in such situations and avoid changing Link state.
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Grzegorz Morys <grzegorz.morys@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
OPA VNIC does not use PIO contexts and instead only uses SDMA
engines. Do not allocate PIO contexts for VNIC ports.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Niranjana Vishwanathapura <niranjana.vishwanathapura@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
A for loop condition of data_iovs in user_sdma_free_request
is unnecessarily repeated before the loop as an if check.
Remove the if enveloping the loop.
Reviewed-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The "2 * UINT_MAX" statement:
if ((u64)(ts - cce->timestamp) > 2 * UINT_MAX) {
is equivalent to:
if ((u64)(ts - cce->timestamp) > UINT_MAX - 1) {
This results in a premature timeout of the cong log entry.
Fix by using unsigned 64 bit integers, removing casts, and using
an algebraic equivalent test to avoid the "2 * UINT_MAX" issue.
Also make use of kernel API to get nanoseconds instead of
open coding.
Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Wrapper functions were used to call the same function
mmu_notifier_mem_invalidate for 2 callbacks in
mmu_notifier.
The commit 7def96f0a9 ("IB/hfi1: update to new mmu_notifier semantic")
removed the invalidate_page callback.
Therefore, the wrapper function is no longer needed.
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Kamenee Arumugam <kamenee.arumugam@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Timeout of 20 seconds is too long for active wait performed
for 8051 command completion. It was required for scenarios
when transition to polling was requested before offline.quiet
state was reached. Currently wait for offline.quiet is
properly implemented and timeout can be reduced to 1 second.
Reviewed-by: Dean Luick <dean.luick@intel.com>
Reviewed-by: Duane McCrory <duane.mccrory@intel.com>
Signed-off-by: Jakub Byczkowski <jakub.byczkowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
HFI's are hard-wired to send Device Info frames with
MgmtAllowed bit set to 0. This means in B2B setups,
MgmtAllowed would never be allowed, which prevents
remote opa management tools from working properly.
Assume MgmtAllowed if a neighbor is also an HFI.
Fixes: 98b9ee2002 ("IB/hfi1: Cache neighbor secure data after link up")
Reviewed-by: Sebastian Sanchez <sebastian.sanchez@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Signed-off-by: Jan Sokolowski <jan.sokolowski@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The use_srq configfs attribute is created after it is read. Hence
modify srpt_tpg_attrib_use_srq_store() such that this function
switches dynamically between non-SRQ and SRQ mode.
Reported-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Introduce the helper function srpt_set_enabled(). Protect
sport->enabled changes with sdev->mutex. Makes configfs writes
into 'enabled' wait until all channel resources have been freed.
Wait until channel release has finished during kernel module
unload.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Except for changing a BUG_ON() call into a WARN_ON_ONCE() call, this
patch does not change any functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The only functional change in this patch is in the srpt_add_one()
error path: if allocating the ring buffer for the SRQ fails, fall
back to non-SRQ mode instead of disabling SRP target functionality.
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
IB and iWARP specs both spell out that posting a receive work request
to a queue pair in the RESET state is an invalid operation and required
to fail. Postpone posting receive work requests until after the
transition to the INIT state.
Fixes: commit dea262094c ("IB/srpt: Change default behavior from using SRQ to using RC")
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
iw_cxgb4 has many BUG_ON()s that were left over from various enhancemnets
made over the years. Almost all of them should just be removed. Some,
however indicate a ULP usage error and can be handled w/o bringing down
the system.
If the condition cannot happen with correctly implemented cxgb4 sw/fw,
then remove the BUG_ON.
If the condition indicates a misbehaving ULP (like CQ overflows), add
proper recovery logic.
Signed-off-by: Steve Wise <swise@opengridcomputing.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
While adding a new gid, the driver currently does not return the context
back to the stack. A subsequent del_gid() (e.g, when ip address is changed)
doesn't find the right context in the driver and it ends up dropping that
request. This results in the HW caching a stale gid entry and traffic fails
because of that. Fix by returning the proper context in bnxt_re_add_gid().
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In bnxt_qplib_process_qp_event(), for qp error events we look up the
qp-handle and pass it for further processing. But we don't check if the
handle is NULL. This could lead to a crash in the called functions when
that qp-handle is dereferenced, if the qp is destroyed in the meantime.
Fix this by checking for a valid qp-handle in that function.
Signed-off-by: Sriharsha Basavapatna <sriharsha.basavapatna@broadcom.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Since there is nothing done with non zero return value, such check is
avoided.
Signed-off-by: Parav Pandit <parav@mellanox.com>
Reviewed-by: Daniel Jurgens <danielj@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Maximal message should be used as a limit to the max message payload allowed,
without the headers. The ConnectX-3 check is done against this value includes
the headers. When the payload is 4K this will cause the NIC to drop packets.
Increase maximal message to 8K as workaround, this shouldn't change current
behaviour because we continue to set the MTU to 4k.
To reproduce;
set MTU to 4296 on the corresponding interface, for example:
ifconfig eth0 mtu 4296 (both server and client)
On server:
ib_send_bw -c UD -d mlx4_0 -s 4096 -n 1000000 -i1 -m 4096
On client:
ib_send_bw -d mlx4_0 -c UD <server_ip> -s 4096 -n 1000000 -i 1 -m 4096
Fixes: 6e0d733d92 ("IB/mlx4: Allow 4K messages for UD QPs")
Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Taking advantage of the optimization which was introduced in previous
commit ("IB/mlx4: Use optimal numbers of MTT entries") to optimize the
MTT usage for QP and CQ.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Optimize the device performance by assigning multiple physical pages,
which are contiguous, to a single MTT. As a result, the number of MTTs
is reduced and in turn save cache misses of MTTs.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When working over a RoCE network, the UDP source port should be set only
for statically connected QPs (RC, UC and XRC).
Fixes: 2811ba51b0 ("IB/mlx5: Add RoCE fields to Address Vector")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The UMR's QP is created by calling mlx5_ib_create_qp directly, and
therefore the send CQ and the recv CQ on the ibqp weren't assigned.
Assign them right after calling the mlx5_ib_create_qp to assure
that any access to those pointers will work as expected and won't
crash the system as might happen as part of reset flow.
Fixes: e126ba97db ("mlx5: Add driver for Mellanox Connect-IB adapters")
Signed-off-by: Majd Dibbiny <majd@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Smatch tool reports the following error:
drivers/infiniband/hw/cxgb4/qp.c:1886
c4iw_create_qp() error: we previously assumed 'ucontext'
could be null (see line 1804)
Cc: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The Broadcom driver produces the following compilation warning
drivers/infiniband/hw/bnxt_re/ib_verbs.c:
In function ‘bnxt_re_create_ah’:
drivers/infiniband/hw/bnxt_re/ib_verbs.c:668:6:
warning: variable ‘vlan_tag’ set but not used [-Wunused-but-set-variable]
u16 vlan_tag;
Let's remove it till vlan_tag will be implemented properly.
Cc: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Add the PCI write end padding flag to device_cap_flags enum and set it
during mlx5_ib_query_device so it will be reported to user-space.
During WQ/QP creation, set that capability for WQ/QP if user requested
it and HW supports it.
PCI write end padding modification is not supported for now. There's no
such flag for a QP but for a WQ, create and modify use the same flag.
Return an error if PCI write end padding flag is set during modify_wq.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
There are root complexes that are able to optimize their
performance when incoming data is multiple full cache lines.
PCI write end padding is the device's ability to pad the ending of
incoming packets (scatter) to full cache line such that the last
upstream write generated by an incoming packet will be a full cache
line.
Add a relevant entry to ib_device_cap_flags to report such capability
of an RDMA device.
Add the QP and WQ create flags:
* A QP/WQ created with a scatter end padding flag will cause
HW to pad the last upstream write generated by a packet to cache line.
User should consider several factors before activating this feature:
- In case of high CPU memory load (which may cause PCI back pressure in
turn), if a large percent of the writes are partial cache line, this
feature should be checked as an optional solution.
- This feature might reduce performance if most packets are between one
and two cache lines and PCIe throughput has reached its maximum
capacity. E.g. 65B packet from the network port will lead to 128B
write on PCIe, which may cause traffic on PCIe to reach high
throughput.
Signed-off-by: Noa Osherovich <noaos@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Following crash happens, if crc algorithm couldn't be allocated:
[ 1087.989072] rdma_rxe: loaded
[ 1097.855397] PCLMULQDQ-NI instructions are not detected.
[ 1097.901220] rdma_rxe: failed to allocate crc algorithmi err:-2
[ 1097.901248] BUG: unable to handle kernel
[ 1097.901249] NULL pointer dereference
[ 1097.901250] at 0000000000000046
[...]
Reason is that rxe->tfm is assigned the error return, which will then
be used for crypto_free_shash() in rxe_cleanup. Fix by using a
temporary variable and assigning it rxe->tfm after allocation succeeded.
Fixes: cee2688e3c ("IB/rxe: Offload CRC calculation when possible")
Signed-off-by: Thomas Bogendoerfer <tbogendoerfer@suse.de>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Acked-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Chelsio cxgb4 HW is big-endian, hence there is need to properly
annotate r2 and stag fields as __be32 and not __u32 to fix the
following sparse warnings.
drivers/infiniband/hw/cxgb4/qp.c:614:16:
warning: incorrect type in assignment (different base types)
expected unsigned int [unsigned] [usertype] r2
got restricted __be32 [usertype] <noident>
drivers/infiniband/hw/cxgb4/qp.c:615:18:
warning: incorrect type in assignment (different base types)
expected unsigned int [unsigned] [usertype] stag
got restricted __be32 [usertype] <noident>
Cc: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In the modify QP handler the base_qpn_udp field in the RSS QPC is
overwrite later by irrelevant value assignment. Hence, ingress packets
which gets to the RSS QP will be steered then to a garbage QPN.
The patch fixes this by skipping the above assignment when a RSS QP is
modified, also, the RSS context's attributes assignments are relocated
just before the context is posted to avoid future issues like this.
Additionally, this patch takes the opportunity to change the code to be
disciplined to the device's manual and assigns the RSS QP context just at
RESET to INIT transition.
Fixes:3078f5f1bd8b ("IB/mlx4: Add support for RSS QP")
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The mlx4's RSS patches submission missed a report of RSS capabilities
which should be reported by the vendor channel in query_device.
Signed-off-by: Guy Levi <guyle@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leon@kernel.org>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The RDMA/umem uses generic RB-trees macros to generate various ib_umem
access functions. The generation is performed with INTERVAL_TREE_DEFINE
macro, which allows one of two modes: declare all functions as static or
declare none of the function to be static.
The second mode of operation produces the following sparse errors:
drivers/infiniband/core/umem_rbtree.c:69:1:
warning: symbol 'rbt_ib_umem_iter_first' was not declared.
Should it be static?
drivers/infiniband/core/umem_rbtree.c:69:1:
warning: symbol 'rbt_ib_umem_iter_next' was not declared.
Should it be static?
Code relocation together with declaration of such functions to be
"static" solves the issue.
Because there is no need to have separate file for two functions,
let's consolidate umem_rtree.c and umem_odp.c into one file.
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The cmd_sn field of CQ doorbell inits for 0. It should be
increment on each first db rung after a completion Event.
if the cmd_sn of notify doorbell Adjacent two times is the
same, the hardware will distinguish it for the same notify
request and update its type according to the priority level
of next event and solicited event.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The calculation of hem index are different between hns_roce_table_get
and hns_roce_table_find. When the table chunk size of TRRL is not
divisible by object size, it will faile to find the trrl table.
This patch is to update the calculation of the hem index in the
hns_roce_table_find to the same as which in the hns_roce_table_get.
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
the owner need to be set when posting sqwqe in hip08 RoCE.
The owner be used according to the below algorithm:
The value of owner should be 1 in the first lap, it
should be 0 in the second lap and in turn.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
In hip08 RoCE, it need to add the sq_invld_flg field
in QP context for RoCE hardware.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The ack timeout's value in qp context shall be a 5-bit value
and be assgined by users. When at of qpc is set zero, the
timer is disabled.
When attr_mask set for IB_QP_TIMEOUT, The ack timeout field
is effective.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
If the extend sges exist, the sq_cur_sge_blk_addr field in QPC
(qp context) should be configured.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When sig_type of qpc is non-selectable, all sq's wqes will produce
cqe and not depend on the cqe attribute of wqe. When sig_type of
qpc is selectable, The cqe attribute of wqe will decide whether to
produce the cqe.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When send flags is IB_SEND_SOLICITED, the se(solicated event)
field of sqwqe will be set.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
When post wr for mixed rdma operation, we need to use fence
mechanism to keep the correct execute order.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
The TRRL(Target RDMA Read/aTOMIC List) record the information
of receiving RDMA READ or ATOMIC operation in hip08. It will
be used the hardware. The driver need to assign a continuous
physical address for trrl_ba field of qp context.
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Wei Hu (Xavier) <xavier.huwei@huawei.com>
Signed-off-by: Shaobo Xu <xushaobo2@huawei.com>
Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>