Commit Graph

9066 Commits

Author SHA1 Message Date
Leon Romanovsky
dbace111e5 RDMA/core: Annotate timeout as unsigned long
The ucma users supply timeout in u32 format, it means that any number
with most significant bit set will be converted to negative value
by various rdma_*, cma_* and sa_query functions, which treat timeout
as int.

In the lowest level, the timeout is converted back to be unsigned long.
Remove this ambiguous conversion by updating all function signatures to
receive unsigned long.

Reported-by: Noa Osherovich <noaos@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16 13:34:01 -04:00
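The sign problem described in dbace111e5 can be reproduced in user space. The following is a minimal sketch, not the kernel code: `timeout_as_int` and `timeout_as_ulong` are hypothetical stand-ins for the old and new function signatures, and the negative result assumes the usual two's-complement behavior of GCC and Clang.

```c
#include <stdint.h>

/* Hypothetical stand-in for the old signatures, which took the timeout as int. */
static long timeout_as_int(uint32_t user_timeout_ms)
{
    /* Implementation-defined conversion; wraps to a negative value on common ABIs. */
    int t = (int)user_timeout_ms;
    return t;
}

/* Hypothetical stand-in for the new signatures, which take unsigned long. */
static unsigned long timeout_as_ulong(uint32_t user_timeout_ms)
{
    /* Widening (or same-width) unsigned conversion keeps the value intact. */
    return user_timeout_ms;
}
```

With the most significant bit set, for example 0x80000000, the first helper yields a negative number while the second keeps the value the user supplied.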
Leon Romanovsky
9549c2bd09 RDMA/core: Align multiple functions to kernel coding style
This patch aligns a small number of functions to the kernel coding
style. It is needed to minimize the diffstat of the following patch.
It doesn't change any functionality.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16 13:34:01 -04:00
Leon Romanovsky
d6f9125207 RDMA/cma: Remove unused timeout_ms parameter from cma_resolve_iw_route()
cma_resolve_iw_route() doesn't use timeout_ms parameter, so let's remove it.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-16 13:34:01 -04:00
Artemy Kovalyov
013c2403bf IB/mlx5: Fix MR cache initialization
Schedule MR cache work only after bucket was initialized.

Cc: <stable@vger.kernel.org> # 4.10
Fixes: 49780d42df ("IB/mlx5: Expose MR cache for mlx5_ib")
Signed-off-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 08:30:37 -06:00
Leon Romanovsky
e54b6a3bcd RDMA/cm: Respect returned status of cm_init_av_by_path
Add missing check for failure of cm_init_av_by_path

Fixes: e1444b5a16 ("IB/cm: Fix automatic path migration support")
Reported-by: Slava Shwartsman <slavash@mellanox.com>
Reviewed-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 08:29:24 -06:00
Denis Drozdov
4d6e4d12da IB/ipoib: Clear IPCB before icmp_send
IPCB should be cleared before icmp_send, since it may contain data from
previous layers and the data could be misinterpreted as ip header options,
which later caused the ihl to be set to an invalid value and resulted in
the following stack corruption:

[ 1083.031512] ib0: packet len 57824 (> 2048) too long to send, dropping
[ 1083.031843] ib0: packet len 37904 (> 2048) too long to send, dropping
[ 1083.032004] ib0: packet len 4040 (> 2048) too long to send, dropping
[ 1083.032253] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.032481] ib0: packet len 23960 (> 2048) too long to send, dropping
[ 1083.033149] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.033439] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.033700] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.034124] ib0: packet len 63800 (> 2048) too long to send, dropping
[ 1083.034387] ==================================================================
[ 1083.034602] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0xf08/0x1310
[ 1083.034798] Write of size 4 at addr ffff880353457c5f by task kworker/u16:0/7
[ 1083.034990]
[ 1083.035104] CPU: 7 PID: 7 Comm: kworker/u16:0 Tainted: G           O      4.19.0-rc5+ #1
[ 1083.035316] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Ubuntu-1.8.2-1ubuntu2 04/01/2014
[ 1083.035573] Workqueue: ipoib_wq ipoib_cm_skb_reap [ib_ipoib]
[ 1083.035750] Call Trace:
[ 1083.035888]  dump_stack+0x9a/0xeb
[ 1083.036031]  print_address_description+0xe3/0x2e0
[ 1083.036213]  kasan_report+0x18a/0x2e0
[ 1083.036356]  ? __ip_options_echo+0xf08/0x1310
[ 1083.036522]  __ip_options_echo+0xf08/0x1310
[ 1083.036688]  icmp_send+0x7b9/0x1cd0
[ 1083.036843]  ? icmp_route_lookup.constprop.9+0x1070/0x1070
[ 1083.037018]  ? netif_schedule_queue+0x5/0x200
[ 1083.037180]  ? debug_show_all_locks+0x310/0x310
[ 1083.037341]  ? rcu_dynticks_curr_cpu_in_eqs+0x85/0x120
[ 1083.037519]  ? debug_locks_off+0x11/0x80
[ 1083.037673]  ? debug_check_no_obj_freed+0x207/0x4c6
[ 1083.037841]  ? check_flags.part.27+0x450/0x450
[ 1083.037995]  ? debug_check_no_obj_freed+0xc3/0x4c6
[ 1083.038169]  ? debug_locks_off+0x11/0x80
[ 1083.038318]  ? skb_dequeue+0x10e/0x1a0
[ 1083.038476]  ? ipoib_cm_skb_reap+0x2b5/0x650 [ib_ipoib]
[ 1083.038642]  ? netif_schedule_queue+0xa8/0x200
[ 1083.038820]  ? ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib]
[ 1083.038996]  ipoib_cm_skb_reap+0x544/0x650 [ib_ipoib]
[ 1083.039174]  process_one_work+0x912/0x1830
[ 1083.039336]  ? wq_pool_ids_show+0x310/0x310
[ 1083.039491]  ? lock_acquire+0x145/0x3a0
[ 1083.042312]  worker_thread+0x87/0xbb0
[ 1083.045099]  ? process_one_work+0x1830/0x1830
[ 1083.047865]  kthread+0x322/0x3e0
[ 1083.050624]  ? kthread_create_worker_on_cpu+0xc0/0xc0
[ 1083.053354]  ret_from_fork+0x3a/0x50

For instance __ip_options_echo is failing to proceed with invalid srr and
optlen passed from another layer via IPCB

[  762.139568] IPv4: __ip_options_echo rr=0 ts=0 srr=43 cipso=0
[  762.139720] IPv4: ip_options_build: IPCB 00000000f3cd969e opt 000000002ccb3533
[  762.139838] IPv4: __ip_options_echo in srr: optlen 197 soffset 84
[  762.139852] IPv4: ip_options_build srr=0 is_frag=0 rr_needaddr=0 ts_needaddr=0 ts_needtime=0 rr=0 ts=0
[  762.140269] ==================================================================
[  762.140713] IPv4: __ip_options_echo rr=0 ts=0 srr=0 cipso=0
[  762.141078] BUG: KASAN: stack-out-of-bounds in __ip_options_echo+0x12ec/0x1680
[  762.141087] Write of size 4 at addr ffff880353457c7f by task kworker/u16:0/7

Signed-off-by: Denis Drozdov <denisd@mellanox.com>
Reviewed-by: Erez Shitrit <erezsh@mellanox.com>
Reviewed-by: Feras Daoud <ferasda@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 08:25:43 -06:00
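The idea behind 4d6e4d12da can be sketched in user space: a per-packet scratch area is reused by several layers, so stale bytes from one layer can be read as option metadata by the next unless the area is zeroed first. Everything here is hypothetical (`struct packet`, `struct ip_ctl`, `prepare_for_icmp`); it only mirrors the shape of the problem, not the kernel's `sk_buff` or `IPCB` definitions.

```c
#include <string.h>

/* Hypothetical packet with a scratch area shared between layers. */
struct packet {
    unsigned char cb[48];
};

/* Hypothetical view of the scratch area as the IP layer would read it. */
struct ip_ctl {
    unsigned char optlen;
    unsigned char srr;
};

/* Zero the scratch area so the next layer sees "no options". */
static void prepare_for_icmp(struct packet *p)
{
    memset(p->cb, 0, sizeof(p->cb));
}

/* Read the option length the way the next layer would interpret it. */
static unsigned int ip_optlen(const struct packet *p)
{
    struct ip_ctl c;

    memcpy(&c, p->cb, sizeof(c));
    return c.optlen;
}
```

Without the call to `prepare_for_icmp`, whatever the previous layer left in `cb` is returned as the option length, which corresponds to the bogus optlen and srr values in the log above.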
Leon Romanovsky
fe9bc16449 RDMA/restrack: Protect from reentry to resource return path
Nullify the resource task struct pointer to ensure that subsequent calls
won't try to release task_struct again.

------------[ cut here ]------------
ODEBUG: free active (active state 1) object type: rcu_head hint:
(null)
WARNING: CPU: 0 PID: 6048 at lib/debugobjects.c:329
debug_print_object+0x16a/0x210 lib/debugobjects.c:326
Kernel panic - not syncing: panic_on_warn set ...

CPU: 0 PID: 6048 Comm: syz-executor022 Not tainted
4.19.0-rc7-next-20181008+ #89
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS
Google 01/01/2011
Call Trace:
  __dump_stack lib/dump_stack.c:77 [inline]
  dump_stack+0x244/0x3ab lib/dump_stack.c:113
  panic+0x238/0x4e7 kernel/panic.c:184
  __warn.cold.8+0x163/0x1ba kernel/panic.c:536
  report_bug+0x254/0x2d0 lib/bug.c:186
  fixup_bug arch/x86/kernel/traps.c:178 [inline]
  do_error_trap+0x11b/0x200 arch/x86/kernel/traps.c:271
  do_invalid_op+0x36/0x40 arch/x86/kernel/traps.c:290
  invalid_op+0x14/0x20 arch/x86/entry/entry_64.S:969
RIP: 0010:debug_print_object+0x16a/0x210 lib/debugobjects.c:326
Code: 41 88 48 89 fa 48 c1 ea 03 80 3c 02 00 0f 85 92 00 00 00 48 8b 14
dd
60 02 41 88 4c 89 fe 48 c7 c7 00 f8 40 88 e8 36 2f b4 fd <0f> 0b 83 05
a9
f4 5e 06 01 48 83 c4 18 5b 41 5c 41 5d 41 5e 41 5f
RSP: 0018:ffff8801d8c3eda8 EFLAGS: 00010086
RAX: 0000000000000000 RBX: 0000000000000003 RCX: 0000000000000000
RDX: 0000000000000000 RSI: ffffffff8164d235 RDI: 0000000000000005
RBP: ffff8801d8c3ede8 R08: ffff8801d70aa280 R09: ffffed003b5c3eda
R10: ffffed003b5c3eda R11: ffff8801dae1f6d7 R12: 0000000000000001
R13: ffffffff8939a760 R14: 0000000000000000 R15: ffffffff8840fca0
  __debug_check_no_obj_freed lib/debugobjects.c:786 [inline]
  debug_check_no_obj_freed+0x3ae/0x58d lib/debugobjects.c:818
  kmem_cache_free+0x202/0x290 mm/slab.c:3759
  free_task_struct kernel/fork.c:163 [inline]
  free_task+0x16e/0x1f0 kernel/fork.c:457
  __put_task_struct+0x2e6/0x620 kernel/fork.c:730
  put_task_struct include/linux/sched/task.h:96 [inline]
  finish_task_switch+0x66c/0x900 kernel/sched/core.c:2715
  context_switch kernel/sched/core.c:2834 [inline]
  __schedule+0x8d7/0x21d0 kernel/sched/core.c:3480
  schedule+0xfe/0x460 kernel/sched/core.c:3524
  freezable_schedule include/linux/freezer.h:172 [inline]
  futex_wait_queue_me+0x3f9/0x840 kernel/futex.c:2530
  futex_wait+0x45c/0xa50 kernel/futex.c:2645
  do_futex+0x31a/0x26d0 kernel/futex.c:3528
  __do_sys_futex kernel/futex.c:3589 [inline]
  __se_sys_futex kernel/futex.c:3557 [inline]
  __x64_sys_futex+0x472/0x6a0 kernel/futex.c:3557
  do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
  entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x446549
Code: e8 2c b3 02 00 48 83 c4 18 c3 0f 1f 80 00 00 00 00 48 89 f8 48 89 f7
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff
ff 0f 83 2b 09 fc ff c3 66 2e 0f 1f 84 00 00 00 00
RSP: 002b:00007f3a998f5da8 EFLAGS: 00000246 ORIG_RAX: 00000000000000ca
RAX: ffffffffffffffda RBX: 00000000006dbc38 RCX: 0000000000446549
RDX: 0000000000000000 RSI: 0000000000000080 RDI: 00000000006dbc38
RBP: 00000000006dbc30 R08: 0000000000000000 R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000246 R12: 00000000006dbc3c
R13: 2f646e6162696e69 R14: 666e692f7665642f R15: 00000000006dbd2c
Kernel Offset: disabled

Reported-by: syzbot+71aff6ea121ffefc280f@syzkaller.appspotmail.com
Fixes: ed7a01fd3f ("RDMA/restrack: Release task struct which was hold by CM_ID object")
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 08:24:36 -06:00
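The pattern described in fe9bc16449, clearing a pointer once its reference has been dropped so that a second call becomes a no-op, can be sketched as follows. The types and helpers (`struct task`, `struct resource`, `put_task`, `resource_release_task`) are hypothetical user-space stand-ins, not the restrack API.

```c
#include <stddef.h>

/* Hypothetical stand-in for task_struct with a plain reference count. */
struct task {
    int refcount;
};

/* Hypothetical stand-in for a restrack entry holding a task reference. */
struct resource {
    struct task *task;
};

static void put_task(struct task *t)
{
    t->refcount--;
}

/* Drop the task reference at most once, however often this is called. */
static void resource_release_task(struct resource *res)
{
    if (!res->task)
        return;

    put_task(res->task);
    res->task = NULL; /* later calls see NULL and do nothing */
}
```

Calling `resource_release_task` twice on the same resource decrements the count only once, which is the reentry protection the commit message refers to.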
Mark Bloch
ba4a411983 RDMA/mlx5: Add support for flow tag to raw create flow
A user can provide a hint which will be attached to the packet and written
to the CQE on receive. This can be used to offload operations into the HW,
for example checking whether a packet is tunneled and, if so, passing 0x1
as the hint. The software can use that hint to decapsulate the packet and
parse only the inner headers, thus saving CPU cycles.

Signed-off-by: Mark Bloch <markb@mellanox.com>
Reviewed-by: Yishai Hadas <yishaih@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:24:45 -06:00
Gal Pressman
645ba5970c RDMA/mlx5: Remove extraneous error check
Remove double error check from create user RQ error flow.

Fixes: 79b20a6c30 ("IB/mlx5: Add receive Work Queue verbs")
Signed-off-by: Gal Pressman <pressmangal@gmail.com>
Reviewed-by: Majd Dibbiny <majd@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:21:38 -06:00
Yishai Hadas
2351776e87 IB/mlx5: Verify DEVX object type
Verify that the input DEVX object type matches the created object.

As the obj_id in the firmware is not globally unique the object type must
be considered upon checking for a valid object id.

Once both the type and the id match we know that the lock was taken on the
correct object by the uverbs layer.

Fixes: e662e14d80 ("IB/mlx5: Add DEVX support for modify and query commands")
Signed-off-by: Yishai Hadas <yishaih@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:19:59 -06:00
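The check described in 2351776e87 amounts to comparing a (type, id) pair rather than the id alone. A minimal sketch, assuming a hypothetical `struct devx_obj` and helper name; the real driver's encoding of type and id is not shown in the commit message:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical record of the object that the uverbs layer locked. */
struct devx_obj {
    uint16_t type;
    uint32_t id;
};

/* An id is only meaningful together with its type, since ids repeat across types. */
static bool devx_obj_matches(const struct devx_obj *locked,
                             uint16_t type, uint32_t id)
{
    return locked->type == type && locked->id == id;
}
```

Two objects of different types may share id 5; only the one whose type also matches the command input is accepted.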
Yixian Liu
68a997c5d2 RDMA/hns: Add FRMR support for hip08
This patch adds fast register physical memory region (FRMR) support for
hip08.

Signed-off-by: Yixian Liu <liuyixian@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:17:08 -06:00
Selvin Xavier
5df9509949 RDMA/bnxt_re: Avoid resource leak in case the NQ registration fails
In case the NQ alloc/enable fails, free up the already allocated/enabled
NQ before reporting failure. Also, track the alloc/enable using proper
state checking.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:03:51 -06:00
Selvin Xavier
a08b9e9a70 RDMA/bnxt_re: Wait for delayed work to finish before device removal
The delayed work bnxt_re_worker may still be running after
cancel_delayed_work returns. This causes a crash as the driver proceeds
with device removal. To make sure that the work has finished before
returning, use cancel_delayed_work_sync.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:03:51 -06:00
Devesh Sharma
854a202001 RDMA/bnxt_re: Limit max_pkey to 16 bit value
Some FW versions return pkey values greater than 0xFFFF. pkey_tbl_len of
ib_port_attr is a 16-bit value, so restrict max_pkeys to 0xFFFF.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:03:51 -06:00
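The limit applied in 854a202001 is a saturating conversion from a wider firmware value to a 16-bit field. A small sketch, with a hypothetical helper name:

```c
#include <stdint.h>

/* Saturate a firmware-reported count so that it fits a 16-bit field. */
static uint16_t clamp_max_pkeys(uint32_t fw_max_pkeys)
{
    return fw_max_pkeys > 0xFFFFu ? 0xFFFFu : (uint16_t)fw_max_pkeys;
}
```

A plain cast would silently truncate 0x10000 to 0; saturating keeps the largest value the field can represent.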
Devesh Sharma
4c01f2e3a9 RDMA/bnxt_re: Fix qp async event reporting
Report affiliated async events on the qp-async event channel instead of
the global event channel.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:03:50 -06:00
Selvin Xavier
316dd2825d RDMA/bnxt_re: Report out of sequence hw counters
Expose out of sequence errors received from FW. This is a 32-bit counter,
so the driver has to accumulate it. Store the previous value to calculate
the difference in the next query.

Also, update the HW statistics structure with new fields.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:03:50 -06:00
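The accumulation described in 316dd2825d can be sketched as a 64-bit running total fed by differences of a 32-bit hardware value. The struct and function names are hypothetical, and the sketch assumes the hardware counter wraps at most once between two queries; the commit message does not say how the driver treats wraparound.

```c
#include <stdint.h>

/* Hypothetical driver-side state for one 32-bit hardware counter. */
struct oos_counter {
    uint32_t prev;  /* raw value seen at the previous query */
    uint64_t total; /* accumulated 64-bit value reported upward */
};

/* Fold the latest raw hardware value into the running total. */
static void oos_update(struct oos_counter *c, uint32_t hw_now)
{
    /* Unsigned subtraction is modulo 2^32, so one wrap is handled. */
    c->total += (uint32_t)(hw_now - c->prev);
    c->prev = hw_now;
}
```

The total keeps growing past 2^32 even though each raw reading fits in 32 bits.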
Selvin Xavier
5c80c9138e RDMA/bnxt_re: Expose rx discards and drop counters
Expose the RoCE discard and drop counters from the HW statistics context

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:03:50 -06:00
Somnath Kotur
bb22c36cba RDMA/bnxt_re: Prevent driver crash due to NULL pointer in error message print
crsqe->resp would be NULL in case the host command timed out before
getting a response from HW. Check for NULL pointer to avoid a potential
crash while printing the error message.

Signed-off-by: Somnath Kotur <somnath.kotur@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:03:50 -06:00
Devesh Sharma
f2bd4d096e RDMA/bnxt_re: Drop L2 async events silently
In some FW versions, the RoCE driver also receives async notifications
that were directed to the L2 driver. The RoCE driver does not handle these
and prints a message to syslog. Drop these notifications silently.

Signed-off-by: Devesh Sharma <devesh.sharma@broadcom.com>
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:03:50 -06:00
Selvin Xavier
ed51efd2ce RDMA/bnxt_re: Avoid accessing nq->bar_reg_iomem in failure case
In the failure path, nq->bar_reg_iomem gets accessed without being
initialized. Avoid this by calling bnxt_qplib_nq_stop_irq only if the
initialization is complete.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 1ac5a40479 ("RDMA/bnxt_re: Add bnxt_re RoCE driver")
Fixes: 6e04b10356 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:03:50 -06:00
Selvin Xavier
eae4ad1b0c RDMA/bnxt_re: Avoid NULL check after accessing the pointer
This is reported by a smatch check. rcfw->creq_bar_reg_iomem is accessed
in bnxt_qplib_rcfw_stop_irq, so checking this variable afterwards doesn't
make sense. Also, rcfw->creq_bar_reg_iomem will never be NULL, so remove
this check.

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Fixes: 6e04b10356 ("RDMA/bnxt_re: Fix broken RoCE driver due to recent L2 driver changes")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:03:50 -06:00
Selvin Xavier
1b7042d7a5 RDMA/bnxt_re: Remove the unnecessary version macro definition
The version macro is not required as the driver does not maintain a
version. Remove the references to this macro too.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:03:50 -06:00
Selvin Xavier
d455f29f6d RDMA/bnxt_re: Fix recursive lock warning in debug kernel
Fix a possible recursive lock warning. It's a false warning as the locks
are part of two different HW queue data structures, cmdq and creq. The
debug kernel throws the following warning and stack trace.

[  783.914967] ============================================
[  783.914970] WARNING: possible recursive locking detected
[  783.914973] 4.19.0-rc2+ #33 Not tainted
[  783.914976] --------------------------------------------
[  783.914979] swapper/2/0 is trying to acquire lock:
[  783.914982] 000000002aa3949d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
[  783.914999]
but task is already holding lock:
[  783.915002] 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re]
[  783.915013]
other info that might help us debug this:
[  783.915016]  Possible unsafe locking scenario:

[  783.915019]        CPU0
[  783.915021]        ----
[  783.915034]   lock(&(&hwq->lock)->rlock);
[  783.915035]   lock(&(&hwq->lock)->rlock);
[  783.915037]
 *** DEADLOCK ***

[  783.915038]  May be due to missing lock nesting notation

[  783.915039] 1 lock held by swapper/2/0:
[  783.915040]  #0: 00000000be73920d (&(&hwq->lock)->rlock){..-.}, at: bnxt_qplib_service_creq+0x2a/0x350 [bnxt_re]
[  783.915044]
stack backtrace:
[  783.915046] CPU: 2 PID: 0 Comm: swapper/2 Not tainted 4.19.0-rc2+ #33
[  783.915047] Hardware name: Dell Inc. PowerEdge R730/0599V5, BIOS 1.0.4 08/28/2014
[  783.915048] Call Trace:
[  783.915049]  <IRQ>
[  783.915054]  dump_stack+0x90/0xe3
[  783.915058]  __lock_acquire+0x106c/0x1080
[  783.915061]  ? sched_clock+0x5/0x10
[  783.915063]  lock_acquire+0xbd/0x1a0
[  783.915065]  ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
[  783.915069]  _raw_spin_lock_irqsave+0x4a/0x90
[  783.915071]  ? bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
[  783.915073]  bnxt_qplib_service_creq+0x232/0x350 [bnxt_re]
[  783.915078]  tasklet_action_common.isra.17+0x197/0x1b0
[  783.915081]  __do_softirq+0xcb/0x3a6
[  783.915084]  irq_exit+0xe9/0x100
[  783.915085]  do_IRQ+0x6a/0x120
[  783.915087]  common_interrupt+0xf/0xf
[  783.915088]  </IRQ>

Use nested notation for the spin_lock to avoid this warning.

Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:03:50 -06:00
Selvin Xavier
5a23e0b1dd RDMA/bnxt_re: Add missing spin lock initialization
Add the missing initialization of the cq_lock and qplib.flush_lock.

Fixes: 942c9b6ca8 ("RDMA/bnxt_re: Avoid Hard lockup during error CQE processing")
Signed-off-by: Selvin Xavier <selvin.xavier@broadcom.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:03:50 -06:00
Jason Gunthorpe
59bfc59a68 Merge branch 'for-rc' into rdma.git for-next
From git://git.kernel.org/pub/scm/linux/kernel/git/rdma/rdma.git

This is required to resolve dependencies of the next series of RDMA
patches.

The code motion conflicts in drivers/infiniband/core/cache.c were
resolved.

Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-16 00:01:02 -06:00
Valentine Fatiev
dd9a403495 IB/mlx5: Unmap DMA addr from HCA before IOMMU
The function that puts the MR back in the cache also removes the DMA
address from the HCA. Therefore we need to call this function before we
remove the DMA mapping from the IOMMU. Otherwise the HCA may access memory
that is no longer DMA mapped.

Call trace:
NMI: IOCK error (debug interrupt?) for reason 71 on CPU 0.
CPU: 0 PID: 0 Comm: swapper/0 Not tainted 4.19.0-rc6+ #4
Hardware name: HP ProLiant DL360p Gen8, BIOS P71 08/20/2012
RIP: 0010:intel_idle+0x73/0x120
Code: 80 5c 01 00 0f ae 38 0f ae f0 31 d2 65 48 8b 04 25 80 5c 01 00 48 89 d1 0f 60 02
RSP: 0018:ffffffff9a403e38 EFLAGS: 00000046
RAX: 0000000000000030 RBX: 0000000000000005 RCX: 0000000000000001
RDX: 0000000000000000 RSI: ffffffff9a5790c0 RDI: 0000000000000000
RBP: 0000000000000030 R08: 0000000000000000 R09: 0000000000007cf9
R10: 000000000000030a R11: 0000000000000018 R12: 0000000000000000
R13: ffffffff9a5792b8 R14: ffffffff9a5790c0 R15: 0000002b48471e4d
FS:  0000000000000000(0000) GS:ffff9c6caf400000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 00007f5737185000 CR3: 0000000590c0a002 CR4: 00000000000606f0
Call Trace:
 cpuidle_enter_state+0x7e/0x2e0
 do_idle+0x1ed/0x290
 cpu_startup_entry+0x6f/0x80
 start_kernel+0x524/0x544
 ? set_init_arg+0x55/0x55
 secondary_startup_64+0xa4/0xb0
DMAR: DRHD: handling fault status reg 2
DMAR: [DMA Read] Request device [04:00.0] fault addr b34d2000 [fault reason 06] PTE Read access is not set
DMAR: [DMA Read] Request device [01:00.2] fault addr bff8b000 [fault reason 06] PTE Read access is not set

Fixes: f3f134f526 ("RDMA/mlx5: Fix crash while accessing garbage pointer and freed memory")
Signed-off-by: Valentine Fatiev <valentinef@mellanox.com>
Reviewed-by: Moni Shoua <monis@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Doug Ledford <dledford@redhat.com>
2018-10-10 14:52:43 -04:00
Leon Romanovsky
ed7a01fd3f RDMA/restrack: Release task struct which was hold by CM_ID object
Tracking CM_ID resource is performed in two stages: creation of cm_id
and connecting it to the cma_dev. It is needed because rdma-cm protocol
exports two separate user-visible calls rdma_create_id and rdma_accept.

At the time of CM_ID creation, the real owner of that object is not yet
known and we need to grab task_struct. This task_struct is released or
reassigned in the attach phase later on, but a call to rdma_destroy_id
left this task_struct unreleased.

Such separation is unique to CM_ID; other restrack objects are initialized
in one shot. It means that it is safe to use the "res->valid" check to
catch an unfinished CM_ID flow and release task_struct for that object.

Fixes: 00313983cd ("RDMA/nldev: provide detailed CM_ID information")
Reported-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-05 16:07:39 -06:00
Leon Romanovsky
2165fc2640 RDMA/restrack: Consolidate task name updates in one place
Unify task update and kernel name set in one place.

Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-05 16:07:39 -06:00
Leon Romanovsky
363ad35577 RDMA/restrack: Un-inline set task implementation
Prepare rdma_restrack_set_task() call to accommodate more
code by moving its implementation from *.h to *.c.

Reviewed-by: Artemy Kovalyov <artemyko@mellanox.com>
Reviewed-by: Yossi Itigin <yosefe@mellanox.com>
Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-05 16:07:39 -06:00
Parav Pandit
fe33507ec3 RDMA/core: Check error status of rdma_find_ndev_for_src_ip_rcu
rdma_find_ndev_for_src_ip_rcu() returns either valid netdev pointer or
ERR_PTR().  Instead of checking for NULL, check for error.

Fixes: caf1e3ae9f ("RDMA/core Introduce and use rdma_find_ndev_for_src_ip_rcu")
Reported-by: syzbot+20c32fa6ff84a2d28c36@syzkaller.appspotmail.com
Signed-off-by: Parav Pandit <parav@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 20:47:41 -06:00
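The bug class fixed by fe33507ec3 is easy to show with a user-space imitation of the kernel's error-pointer convention. The helpers below (`err_ptr`, `is_err`, `ptr_err`, `find_ndev`) are simplified stand-ins written for this sketch, not the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() macros, and the choice of ENODEV as the error code is an assumption.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_ERRNO 4095

/* Encode a negative errno value in a pointer (top 4095 addresses). */
static void *err_ptr(long err)
{
    return (void *)(intptr_t)err;
}

/* True if the pointer carries an encoded errno instead of an address. */
static bool is_err(const void *p)
{
    return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

/* Recover the errno value from an error pointer. */
static long ptr_err(const void *p)
{
    return (long)(intptr_t)p;
}

struct netdev {
    int ifindex;
};

static struct netdev dev0 = { 1 };

/* Hypothetical lookup: returns a valid device or an error pointer, never NULL. */
static struct netdev *find_ndev(bool found)
{
    return found ? &dev0 : err_ptr(-ENODEV);
}
```

A caller that only tests the result against NULL treats the error pointer as a valid device and dereferences it; testing with `is_err` catches the failure.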
Venkata Sandeep Dhanalakota
1570346153 IB/{hfi1, qib, rdmavt}: Move ruc_loopback to rdmavt
This patch moves ruc_loopback() from hfi1 into rdmavt for code sharing
with the qib driver.

Reviewed-by: Brian Welty <brian.welty@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:38:28 -06:00
Venkata Sandeep Dhanalakota
116aa0330e IB/{hfi1, qib, rdmavt}: Move send completion logic to rdmavt
Moving send completion code into rdmavt in order to have shared logic
between qib and hfi1 drivers.

Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Venkata Sandeep Dhanalakota <venkata.s.dhanalakota@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:38:28 -06:00
Brian Welty
019f118b94 IB/{hfi1, qib, rdmavt}: Move copy SGE logic into rdmavt
This patch moves hfi1_copy_sge() into rdmavt for sharing with qib.
This patch also moves all the wss_*() functions into rdmavt as
several wss_*() functions are called from hfi1_copy_sge()

When SGE copy mode is adaptive, cacheless copy may be done in some cases
for performance reasons. In those cases, X86 cacheless copy function
is called since the drivers that use rdmavt and may set SGE copy mode
to adaptive are X86 only. For this reason, this patch adds
"depends on X86_64" to rdmavt/Kconfig.

Reviewed-by: Ashutosh Dixit <ashutosh.dixit@intel.com>
Reviewed-by: Michael J. Ruhl <michael.j.ruhl@intel.com>
Reviewed-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Brian Welty <brian.welty@intel.com>
Signed-off-by: Harish Chegondi <harish.chegondi@intel.com>
Signed-off-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:38:28 -06:00
Lijun Ou
d9581bf358 RDMA/hns: Bugfix for atomic operation
Atomic operations do not support inline data. In addition, a standard
atomic operation supports only one sge, and that sge is placed in the wqe.

Fix: 384f881("RDMA/hns: Add atomic support")
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:21:18 -06:00
Lijun Ou
caf3e4064a RDMA/hns: Add vlan enable bit for hip08
In order to extend the vlan device range, the design adds two fields to
the qp context for checking vlan packets at the sender and at the receiver.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:21:18 -06:00
Lijun Ou
e93df01085 RDMA/hns: Support local invalidate for hip08 in kernel space
This patch adds local invalidate Memory Region (MR) support in the kernel
space driver.

Signed-off-by: Yangyang Li <liyangyang20@huawei.com>
Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:21:18 -06:00
Lijun Ou
2362cceef3 RDMA/hns: Update some fields of qp context
The hip08 hardware has two versions. The version ids are 0x20 and 0x21,
according to the PCI revision. Some fields need to be adjusted to extend
new features. The specific updates include:

1. Add some fields to support new features by enabling some reserved
   fields in the 0x20 version.
2. Remove some fields that are not visible to the user in order to support
   the extended features.
3. Initialize some fields to zero.

These updates are compatible with the 0x20 version.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:21:18 -06:00
Lijun Ou
b28ca7ccef RDMA/hns: Limit extend sq sge num
According to the hip08 limit, the buffer size of the extended sge needs to
be an integer multiple of the wqe_sge_buf_page size. For example, the value
of the sge_shift field of the qp context is greater than or equal to eight
when the buffer page size is 4K. The value of the sge_shift field of the qp
context is assigned from hr_qp->sge.sge_cnt.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:21:18 -06:00
Lijun Ou
3a63c964ea RDMA/hns: Update some attributes of the RoCE device
According to the IB protocol definition, the driver needs to report the
correct device information, which is queried through the device
attributes.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:21:17 -06:00
Lijun Ou
157b52a08d RDMA/hns: Configure ecn field of ip header
To be compatible with third-party RoCE devices, the hardware changes how
the ecn field of the ip header is set in the new hip08 version. The high
6 bits of tclass are assigned to the dscp field of the packet.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:21:17 -06:00
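The bit split mentioned in 157b52a08d follows the standard layout of the IP traffic class byte: the high 6 bits carry the DSCP and the low 2 bits carry the ECN field. A minimal sketch with hypothetical helper names; how the hip08 hardware itself fills the ECN bits is not described in the commit message.

```c
#include <stdint.h>

/* High 6 bits of the traffic class byte: DSCP. */
static uint8_t tclass_dscp(uint8_t tclass)
{
    return tclass >> 2;
}

/* Low 2 bits of the traffic class byte: ECN. */
static uint8_t tclass_ecn(uint8_t tclass)
{
    return tclass & 0x3;
}
```

For a traffic class of 0xB8, the DSCP is 46 and the ECN bits are 0.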
Lijun Ou
05ad5482a5 RDMA/hns: Limit the size of extend sge of sq
The hip08 comes in two hardware versions. The version ids are 0x20 and
0x21, according to the PCI revision. The max size of the extended sge of
the sq is limited to 2M for the 0x20 version and 8M for the 0x21 version.
The product of the wqe count and the number of extended sges per wqe may
exceed 2M, but it is always less than 8M.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:21:17 -06:00
Lijun Ou
15fc056fba RDMA/hns: Bugfix for CM test
A warning is printed when the MSB of the SLID is not zero while running
the cm_req_handler function during CM testing. The bit needs to be fixed
to zero when testing a RoCE device.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:21:17 -06:00
Lijun Ou
c80e066100 RDMA/hns: Submit bad wr when post send wr exception
When a user issues an RDMA read with sq inline enabled, a bad wr needs to
be reported to the user.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:21:17 -06:00
Lijun Ou
06ef0ee4b5 RDMA/hns: Bugfix for reserved qp number
Two special qps need to be included for every port. The hip08 has four
ports, so there are eight reserved qps in total.

Signed-off-by: Lijun Ou <oulijun@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:21:17 -06:00
Leon Romanovsky
38716732f1 RDMA/netlink: Simplify netlink listener existence check
All users of rdma_nl_chk_listeners() only need a boolean answer to whether
the netlink socket has listeners, so convert it to a boolean function and
update all callers.

Signed-off-by: Leon Romanovsky <leonro@mellanox.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:06:07 -06:00
Kamal Heib
d31131bba5 RDMA: Remove unused parameter from ib_modify_qp_is_ok()
The ll parameter is not used in ib_modify_qp_is_ok(), so remove it.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:05:46 -06:00
Kamal Heib
03241627b2 RDMA/rxe: Remove unused addr_same()
This function is not in use - delete it.

Signed-off-by: Kamal Heib <kamalheib1@gmail.com>
Reviewed-by: Leon Romanovsky <leonro@mellanox.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:04:32 -06:00
Zhu Yanjun
aae0484e15 IB/rxe: avoid srq memory leak
In rxe_queue_init, q and q->buf are allocated. In do_mmap_info, q->ip is
allocated. When an error occurs, rxe_srq_from_init and the later error
handler do not free this allocated memory, which causes a memory leak.

Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:03:36 -06:00
Wei Yongjun
39f2495618 IB/mthca: Fix error return code in __mthca_init_one()
Fix to return a negative error code from the mthca_cmd_init() error
handling case instead of 0, as done elsewhere in this function.

Fixes: 80fd823873 ("[PATCH] IB/mthca: Encapsulate command interface init")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:02:10 -06:00
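The mistake fixed by 39f2495618 is a common one in goto-based cleanup: the failing call's result is tested but never stored, so the shared exit path returns a stale 0. A sketch with hypothetical function names (`cmd_init`, `init_one_buggy`, `init_one_fixed`), not the mthca code:

```c
#include <errno.h>

/* Hypothetical sub-initialization that can fail. */
static int cmd_init(int fail)
{
    return fail ? -ENOMEM : 0;
}

/* Buggy shape: the error is detected but err is never updated. */
static int init_one_buggy(int fail_cmd_init)
{
    int err = 0;

    if (cmd_init(fail_cmd_init))
        goto err_out;

    return 0;

err_out:
    return err; /* still 0, so the caller sees success */
}

/* Fixed shape: store the result before branching to the error path. */
static int init_one_fixed(int fail_cmd_init)
{
    int err;

    err = cmd_init(fail_cmd_init);
    if (err)
        goto err_out;

    return 0;

err_out:
    return err; /* negative errno reaches the caller */
}
```

With a failing `cmd_init`, the buggy variant reports success while the fixed variant returns the negative error code.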
Jason Gunthorpe
e73798f20e RDMA/uverbs: Fix RCU annotation for radix slot deference
The uapi radix tree is a write-once data structure protected by kref.
Once we get to the ioctl() fop it is not possible for anything else
to be writing to it, so the access should use rcu_dereference_protected.

Reported-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Jason Gunthorpe <jgg@mellanox.com>
2018-10-03 16:01:40 -06:00