2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-28 23:23:55 +08:00
Commit Graph

811 Commits

Author SHA1 Message Date
Wei Yongjun
94a7111043 iser-target: fix error return code in isert_create_device_ib_res()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-12-11 11:54:47 -08:00
Linus Torvalds
b0e3636f65 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Things have been quiet this round with mostly bugfixes, percpu
  conversions, and other minor iscsi-target conformance testing changes.

  The highlights include:

   - Add demo_mode_discovery attribute for iscsi-target (Thomas)
   - Convert tcm_fc(FCoE) to use percpu-ida pre-allocation
   - Add send completion interrupt coalescing for ib_isert
   - Convert target-core to use percpu-refcounting for se_lun
   - Fix mutex_trylock usage bug in iscsit_increment_maxcmdsn
   - tcm_loop updates (Hannes)
   - target-core ALUA cleanups + prep for v3.14 SCSI Referrals support (Hannes)

  v3.14 is currently shaping to be a busy development cycle in target
  land, with initial support for T10 Referrals and T10 DIF currently on
  the roadmap"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (40 commits)
  iscsi-target: chap auth shouldn't match username with trailing garbage
  iscsi-target: fix extract_param to handle buffer length corner case
  iscsi-target: Expose default_erl as TPG attribute
  target_core_configfs: split up ALUA supported states
  target_core_alua: Make supported states configurable
  target_core_alua: Store supported ALUA states
  target_core_alua: Rename ALUA_ACCESS_STATE_OPTIMIZED
  target_core_alua: spellcheck
  target core: rename (ex,im)plict -> (ex,im)plicit
  percpu-refcount: Add percpu-refcount.o to obj-y
  iscsi-target: Do not reject non-immediate CmdSNs exceeding MaxCmdSN
  iscsi-target: Convert iscsi_session statistics to atomic_long_t
  target: Convert se_device statistics to atomic_long_t
  target: Fix delayed Task Aborted Status (TAS) handling bug
  iscsi-target: Reject unsupported multi PDU text command sequence
  ib_isert: Avoid duplicate iscsit_increment_maxcmdsn call
  iscsi-target: Fix mutex_trylock usage in iscsit_increment_maxcmdsn
  target: Core does not need blkdev.h
  target: Pass through I/O topology for block backstores
  iser-target: Avoid using FRMR for single dma entry requests
  ...
2013-11-22 10:52:03 -08:00
Linus Torvalds
1ea406c0e0 Main batch of InfiniBand/RDMA changes for 3.13:
- Re-enable flow steering verbs with new improved userspace ABI
  - Fixes for slow connection due to GID lookup scalability
  - IPoIB fixes
  - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
  - Further improvements to SRP error handling
  - Add new transport type for Cisco usNIC
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCAAGBQJSil7BAAoJEENa44ZhAt0hbtgP/A+AmUalbOX6ZKzuOFxsrtY2
 r55CX9b1JBeFM/Zhn2o6y+81lpCjkckJSggESMe4izNgocGw0nW4vYGN4SBynatj
 y8sR9OSn+G3ihuENrzG41MJUGEa5WbcNMy4boN+Oa+qyTlV/WjLR7Fv4WbikK7Wm
 o8FNlXiiDhMoGfHHG5J0MD0EQsnxuLDk2XP+ciu4tLtTs+wBka+gFK8WnMvztle3
 gTeMNna5ilvCS2fdBxteuPA3KeDnJE9AgJSMJ2a4Rh+DR8uTgWYQ6n7amjmOc546
 yhAKkoBkxPE10+Yj82WOPhCFxSeWcuSwJvpgv5dTVZ1XqUUcC1V3TEcZDHmyyHQ7
 uPXgS1A+erBW3OYPBjZqtKvnHObscV12fL+rId3vIhcAQIbFroci08ZwPidEYRkn
 fvwlEKcrIsBIpRXEyjlFCxsiiDnfq1wC1VayMR3jrIK0P6idf1SXf/geiRp9+RGT
 wKUc0j51jvEx29qc65xuhEP9FQV9pCMxyd+FEE0d0KkjMz5hsIkjmcUcBbgF0CGg
 GEyDPlgRLv+vmWDGpT8XraaV/0CJOEQDIgB4WSN87/AZ4UoNt7spW2xqsLsp1toy
 5e0100tpWUleTPLe/Wig5GtBdagQ2jAUK1+186CP93pFPtkwc4/7X3hyp7qPIPTz
 VDvT9DEy6zjSMCLpMcdo
 =xxC+
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma updates from Roland Dreier:
 - Re-enable flow steering verbs with new improved userspace ABI
 - Fixes for slow connection due to GID lookup scalability
 - IPoIB fixes
 - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
 - Further improvements to SRP error handling
 - Add new transport type for Cisco usNIC

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
  IB/core: Re-enable create_flow/destroy_flow uverbs
  IB/core: extended command: an improved infrastructure for uverbs commands
  IB/core: Remove ib_uverbs_flow_spec structure from userspace
  IB/core: Use a common header for uverbs flow_specs
  IB/core: Make uverbs flow structure use names like verbs ones
  IB/core: Rename 'flow' structs to match other uverbs structs
  IB/core: clarify overflow/underflow checks on ib_create/destroy_flow
  IB/ucma: Convert use of typedef ctl_table to struct ctl_table
  IB/cm: Convert to using idr_alloc_cyclic()
  IB/mlx5: Fix page shift in create CQ for userspace
  IB/mlx4: Fix device max capabilities check
  IB/mlx5: Fix list_del of empty list
  IB/mlx5: Remove dead code
  IB/core: Encorce MR access rights rules on kernel consumers
  IB/mlx4: Fix endless loop in resize CQ
  RDMA/cma: Remove unused argument and minor dead code
  RDMA/ucma: Discard events for IDs not yet claimed by user space
  IB/core: Add Cisco usNIC rdma node and transport types
  RDMA/nes: Remove self-assignment from nes_query_qp()
  IB/srp: Report receive errors correctly
  ...
2013-11-18 15:36:04 -08:00
Roland Dreier
b4fdf52b3f Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-next 2013-11-17 08:22:19 -08:00
Linus Torvalds
9073e1a804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual earth-shaking, news-breaking, rocket science pile from
  trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
  doc: add missing files to timers/00-INDEX
  timekeeping: Fix some trivial typos in comments
  mm: Fix some trivial typos in comments
  irq: Fix some trivial typos in comments
  NUMA: fix typos in Kconfig help text
  mm: update 00-INDEX
  doc: Documentation/DMA-attributes.txt fix typo
  DRM: comment: `halve' -> `half'
  Docs: Kconfig: `devlopers' -> `developers'
  doc: typo on word accounting in kprobes.c in mutliple architectures
  treewide: fix "usefull" typo
  treewide: fix "distingush" typo
  mm/Kconfig: Grammar s/an/a/
  kexec: Typo s/the/then/
  Documentation/kvm: Update cpuid documentation for steal time and pv eoi
  treewide: Fix common typo in "identify"
  __page_to_pfn: Fix typo in comment
  Correct some typos for word frequency
  clk: fixed-factor: Fix a trivial typo
  ...
2013-11-15 16:47:22 -08:00
Nicholas Bellinger
04d9cd1224 ib_isert: Avoid duplicate iscsit_increment_maxcmdsn call
This patch avoids a duplicate iscsit_increment_maxcmdsn() call for
ISER_IB_RDMA_WRITE within isert_map_rdma() + isert_reg_rdma_frwr(),
which will already be occuring once during isert_put_datain() ->
iscsit_build_rsp_pdu() operation.

It also removes the local conn->stat_sn assignment + increment,
and changes the third parameter to iscsit_build_rsp_pdu() to
signal this should be done by iscsi_target_mode code.

Tested-by: Moussa Ba <moussaba@micron.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-12 18:05:07 -08:00
Vu Pham
f01b9f7339 iser-target: Avoid using FRMR for single dma entry requests
This patch changes isert_reg_rdma_frwr() to not use FRMR for single
dma entry requests from small I/Os, in order to avoid the associated
memory registration overhead.

Using DMA MR is sufficient here for the single dma entry requests,
and addresses a >= v3.12 performance regression.

Signed-off-by: Vu Pham <vu@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-12 13:07:52 -08:00
Bart Van Assche
cd4e38542a IB/srp: Report receive errors correctly
The IB spec does not guarantee that the opcode is available in error
completions.  Hence do not rely on it.  See also commit 948d1e889e
("IB/srp: Introduce srp_handle_qp_err()").

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org> # v3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:18 -08:00
Bart Van Assche
99b6697a50 IB/srp: Avoid offlining operational SCSI devices
If SCSI commands are submitted with a SCSI request timeout that is
lower than the the IB RC timeout, it can happen that the SCSI error
handler has already started device recovery before transport layer
error handling starts.  So it can happen that the SCSI error handler
tries to abort a SCSI command after it has been reset by
srp_rport_reconnect().

Tell the SCSI error handler that such commands have finished and that
it is not necessary to continue its recovery strategy for commands
that have been reset by srp_rport_reconnect().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:18 -08:00
Vu Pham
65d7dd2f34 IB/srp: Remove target from list before freeing Scsi_Host structure
Remove an SRP target from the SRP target list before invoking the last
scsi_host_put() call.  This change is necessary because that last put
frees the memory that holds the srp_target_port structure.

This patch prevents the following kernel oops:

    RIP: 0010:[<ffffffff810b00d0>] __lock_acquire+0x500/0x1570
    Call Trace:
     [<ffffffff810b11e4>] lock_acquire+0xa4/0x120
     [<ffffffff81531206>] _spin_lock+0x36/0x70
     [<ffffffffa01b6d8f>] srp_remove_work+0xef/0x180 [ib_srp]
     [<ffffffff8109125c>] worker_thread+0x21c/0x3d0
     [<ffffffff81096e86>] kthread+0x96/0xa0
     [<ffffffff8100c20a>] child_rip+0xa/0x20

Signed-off-by: Vu Pham <vuhuong@mellanox.com>

[ bvanassche - Modified path description and CC'ed stable. ]

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:17 -08:00
Jack Wang
71444b9781 IB/srp: Add change_queue_depth and change_queue_type support
Currently, it's not possible to change queue depth for a device behind
SRP host. Sometimes, we need to adjust queue_depth for performance
reason (eg storage busy, we need lower queue_depth to avoid running
into SCSI error handler), so this patch add support for SRP driver.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:17 -08:00
Bart Van Assche
4d73f95f70 IB/srp: Make queue size configurable
Certain storage configurations, e.g. a sufficiently large array of
hard disks in a RAID configuration, need a queue depth above 64 to
achieve optimal performance. Hence make the queue depth configurable.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Tested-by: Jack Wang <xjtuwjp@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:17 -08:00
Bart Van Assche
b81d00bddf IB/srp: Introduce srp_alloc_req_data()
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Vu Pham <vu@mellanox.com>
Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:17 -08:00
Bart Van Assche
848b3082db IB/srp: Export sgid to sysfs
On an initiator system with multiple IB ports it is not yet possible
to figure out what the originating port of an SRP connection is. Hence
make the source GID available in sysfs.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:16 -08:00
Bart Van Assche
a95cadb9da IB/srp: Add periodic reconnect functionality
After a transport layer occurred, periodically try to reconnect
to the target until the dev_loss timer expires.  Protect the
callback functions that can be invoked from inside the SCSI EH
against concurrent invocation with srp_reconnect_rport() via the
rport mutex. Change the default dev_loss_tmo from 60s into 600s
to give the reconnect mechanism a chance to kick in.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:16 -08:00
Bart Van Assche
8c64e4531c scsi_transport_srp: Add periodic reconnect support
Add support for periodically reconnecting to an SRP target until
the dev_loss timer expires. After the tenth reconnection attempt,
gradually slow down subsequent reconnect attempts.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:16 -08:00
Bart Van Assche
c1120f8981 IB/srp: Start timers if a transport layer error occurs
Start the reconnect timer, fast_io_fail timer and dev_loss timers if a
transport layer error occurs.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:15 -08:00
Bart Van Assche
ed9b2264fb IB/srp: Use SRP transport layer error recovery
Enable fast_io_fail_tmo and dev_loss_tmo functionality for the IB SRP
initiator.  Add kernel module parameters that allow to specify default
values for these parameters.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:15 -08:00
Bart Van Assche
9dd69a600a IB/srp: Keep rport as long as the IB transport layer
Keep the rport data structure around after srp_remove_host() has
finished until cleanup of the IB transport layer has finished
completely. This is necessary because later patches use the rport
pointer inside the queuecommand callback. Without this patch
accessing the rport from inside a queuecommand callback is racy
because srp_remove_host() must be invoked before scsi_remove_host()
and because the queuecommand callback could get invoked after
srp_remove_host() has finished. In other words, without this patch
the queuecommand callback can get invoked after the rport data
structure has been freed.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:15 -08:00
Vu Pham
7bb312e4a2 IB/srp: Make transport layer retry count configurable
Allow the InfiniBand RC retry count to be configured by the user as an
option in the target login string.  Reducing this retry count allows to
reduce the path failover time.

Signed-off-by: Vu Pham <vu@mellanox.com>

[ bvanassche: Rewrote patch description / changed default retry count ]

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:15 -08:00
Michal Schmidt
7f1a38671c IPoIB: lower NAPI weight
Since commit 82dc3c63c6 ("net: introduce NAPI_POLL_WEIGHT")
netif_napi_add() produces an error message if a NAPI poll weight
greater than 64 is requested.

Use the standard NAPI weight.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:50 -08:00
Erez Shitrit
94232d9ce8 IPoIB: Start multicast join process only on active ports
The driver starts the mcast_join task whenever the netdev interface is
UP without relation to the underlying IB port state.

Until the port state is ACTIVE all the join requests are irrelevant,
and the IB core returns -EINVAL. So the user will see errors such as:
"multicast join failed for ff12:401b:... , status -22".

Instead, have ipoib_mcast_join_task() return when the port is not active.

It will be called again when the port state is changed and the
low-level driver triggers the IB_EVENT_PORT_ACTIVE event or the
IB_EVENT_CLIENT_REREGISTER event.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Erez Shitrit
a39c52ab88 IPoIB: Add path query flushing in ipoib_ib_dev_cleanup
The path_rec_completion() callback may be invoked asynchronously even
at the middle of "driver uninit" process.  This can lead to scheduling
a task that tries to touch members of the priv object that are no
longer valid.  For example the function cm_create_tx_qp can attempt to
create qp with no valid priv->pd object.

The following crash is one of the results:
RIP: 0010:[<ffffffffa021bb47>]  [<ffffffffa021bb47>] ipoib_cm_create_tx_qp+0x57/0x90 [ib_ipoib]
Process ipoib (pid: 5916, threadinfo ffff8803786e4000, task ffff8804150e1500)
Stack:
Call Trace:
[<ffffffff81309ef0>] ? get_random_bytes+0x20/0x30
[<ffffffffa021be2a>] ipoib_cm_tx_init+0xca/0x340 [ib_ipoib]
[<ffffffffa021f765>] ipoib_cm_tx_start+0x215/0x3f0 [ib_ipoib]
[<ffffffffa021f550>] ? ipoib_cm_tx_start+0x0/0x3f0 [ib_ipoib]
[<ffffffff8108b2b0>] worker_thread+0x170/0x2a0
[<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108b140>] ? worker_thread+0x0/0x2a0
[<ffffffff81090886>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffff810907f0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20

Fix that by flushing all pending path queries at this point.

Signed-off-by: Alex Markuze <markuze@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Erez Shitrit
a9c8ba5884 IPoIB: Fix usage of uninitialized multicast objects
The driver should avoid calling ib_sa_free_multicast on the mcast->mc
object until it finishes its initialization state.  Otherwise we can
crash when ipoib_mcast_dev_flush() attempts to use the uninitialized
multicast object.

Instead, only call wait_for_completion() for multicast entries that
started the join process, meaning that ib_sa_join_multicast() finished.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Erez Shitrit
aede25011f IPoIB: Avoid flushing the driver workqueue on dev_down
The driver should not flush the whole workqueue when only one work (the
pkey poll one) needs to be cancelled.  Use cancel_delayed_work_sync()
instead.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Erez Shitrit
f47944cc2d IPoIB: Fix deadlock between dev_change_flags() and __ipoib_dev_flush()
When ipoib interface is going down it takes all of its children with
it, under mutex.

For each child, dev_change_flags() is called.  That function calls
ipoib_stop() via the ndo, and causes flush of the workqueue.
Sometimes in the workqueue an __ipoib_dev_flush work() is waiting and
when invoked tries to get the same mutex, which leads to a deadlock,
as seen below.

The solution is to switch to rw-sem instead of mutex.

The deadlock:
[11028.165303]  [<ffffffff812b0977>] ? vgacon_scroll+0x107/0x2e0
[11028.171844]  [<ffffffff814eaac5>] schedule_timeout+0x215/0x2e0
[11028.178465]  [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80
[11028.185962]  [<ffffffff814ea743>] wait_for_common+0x123/0x180
[11028.192491]  [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
[11028.199504]  [<ffffffff814ea85d>] wait_for_completion+0x1d/0x20
[11028.206224]  [<ffffffff8108b4f1>] flush_cpu_workqueue+0x61/0x90
[11028.212948]  [<ffffffff8108b5a0>] ? wq_barrier_func+0x0/0x20
[11028.219375]  [<ffffffff8108bfc4>] flush_workqueue+0x54/0x80
[11028.225712]  [<ffffffffa05a0576>] ipoib_mcast_stop_thread+0x66/0x90 [ib_ipoib]
[11028.233988]  [<ffffffffa059ccea>] ipoib_ib_dev_down+0x6a/0x100 [ib_ipoib]
[11028.241678]  [<ffffffffa059849a>] ipoib_stop+0x8a/0x140 [ib_ipoib]
[11028.248692]  [<ffffffff8142adf1>] dev_close+0x71/0xc0
[11028.254447]  [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0
[11028.261062]  [<ffffffffa059851b>] ipoib_stop+0x10b/0x140 [ib_ipoib]
[11028.268172]  [<ffffffff8142adf1>] dev_close+0x71/0xc0
[11028.273922]  [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0
[11028.280452]  [<ffffffff8148f20b>] devinet_ioctl+0x5eb/0x6a0
[11028.286786]  [<ffffffff814903b8>] inet_ioctl+0x88/0xa0
[11028.292633]  [<ffffffff8141591a>] sock_ioctl+0x7a/0x280
[11028.298576]  [<ffffffff81189012>] vfs_ioctl+0x22/0xa0
[11028.304326]  [<ffffffff81140540>] ? unmap_region+0x110/0x130
[11028.310756]  [<ffffffff811891b4>] do_vfs_ioctl+0x84/0x580
[11028.316897]  [<ffffffff81189731>] sys_ioctl+0x81/0xa0

and

11028.017533]  [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80
[11028.025030]  [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
[11028.031945]  [<ffffffff814eb2ae>] __mutex_lock_slowpath+0x13e/0x180
[11028.039053]  [<ffffffff814eb14b>] mutex_lock+0x2b/0x50
[11028.044910]  [<ffffffffa059f7e7>] __ipoib_ib_dev_flush+0x37/0x210 [ib_ipoib]
[11028.052894]  [<ffffffffa059fa00>] ? ipoib_ib_dev_flush_light+0x0/0x20 [ib_ipoib]
[11028.061363]  [<ffffffffa059fa17>] ipoib_ib_dev_flush_light+0x17/0x20 [ib_ipoib]
[11028.069738]  [<ffffffff8108b120>] worker_thread+0x170/0x2a0
[11028.076068]  [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
[11028.083374]  [<ffffffff8108afb0>] ? worker_thread+0x0/0x2a0
[11028.089709]  [<ffffffff81090626>] kthread+0x96/0xa0
[11028.095266]  [<ffffffff8100c0ca>] child_rip+0xa/0x20
[11028.100921]  [<ffffffff81090590>] ? kthread+0x0/0xa0
[11028.106573]  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
[11028.112423] INFO: task ifconfig:23640 blocked for more than 120 seconds.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Tal Alon
22252b4e09 IPoIB: Change CM skb memory allocation to be non-atomic during init
Change CM skb memory allocation to use GFP_KERNEL when possible.

During device init there's no need to use GFP_ATOMIC when allocating
memory for the CM skbs -- use GFP_KERNEL instead.

Signed-off-by: Tal Alon <talal@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:48 -08:00
Erez Shitrit
c2bb5628db IPoIB: Fix crash in dev_open error flow
If napi has never been enabled when calling ipoib_ib_dev_stop, a
kernel crash occurs, because the verbs layer completion handler
(ipoib_ib_completion) calls napi_schedule unconditionally.

If the napi structure passed in the napi_schedule call has not
been initialized, napi will crash.

The cleanest solution is to simply enable napi before calling
ipoib_ib_dev_stop in the dev_open error flow. (dev_stop then
immediately disables napi).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:48 -08:00
Nicholas Bellinger
4a9a6c8d53 target: Drop left-over se_lun->lun_cmd_list shutdown code
Now with percpu refcounting for se_lun in place, go ahead and drop
the legacy per se_cmd accounting for se_lun shutdown.

This includes __transport_clear_lun_from_sessions(), the associated
transport_lun_wait_for_tasks() logic, along with a handful of now
unused se_cmd structure members and ->transport_state bits.

Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-07 14:25:02 -08:00
Nicholas Bellinger
95b60f0788 ib_isert: Add support for completion interrupt coalescing
This patch adds support for completion interrupt coalescing that
allows only every ISERT_COMP_BATCH_COUNT (8) to set IB_SEND_SIGNALED,
thus avoiding completion interrupts for every posted iser_tx_desc.

The batch processing is done using a per isert_conn llist that once
IB_SEND_SIGNALED has been set is saved to tx_desc->comp_llnode_batch,
and completion processing of previously posted iser_tx_descs is done
in a single shot from within isert_send_completion() code.

Note this is only done for response PDUs from ISCSI_OP_SCSI_CMD, and
all other control type of PDU responses will force an implicit batch
drain to occur.

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-06 12:48:22 -08:00
Vu Pham
0a66614b93 iser-target: check device before dereferencing its variable
This patch changes isert_connect_release() to correctly check for
the existence struct isert_device *device before checking for
isert_device->use_frwr.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-23 21:42:33 -07:00
Masanari Iida
8c88126bbb treewide: Fix typo in Kconfig
Correct spelling typo in Kconfig.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-14 15:23:02 +02:00
Jack Wang
c807f64340 ib_srpt: always set response for task management
The SRP specification requires:

  "Response data shall be provided in any SRP_RSP response that is sent in
   response to an SRP_TSK_MGMT request (see 6.7). The information in the
   RSP_CODE field (see table 24) shall indicate the completion status of
   the task management function."

So fix this to avoid the SRP initiator interprets task management functions
that succeeded as failed.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-03 04:23:17 -07:00
Nicholas Bellinger
0b41d6ca61 ib_srpt: Destroy cm_id before destroying QP.
This patch fixes a bug where ib_destroy_cm_id() was incorrectly being called
after srpt_destroy_ch_ib() had destroyed the active QP.

This would result in the following failed SRP_LOGIN_REQ messages:

Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff1762bd, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c903009f8f41)
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff1758f9, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c903009f8f42)
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff175941, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c90300a3cfb2)
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1)
mlx4_core 0000:84:00.0: command 0x19 failed: fw status = 0x9
rejected SRP_LOGIN_REQ because creating a new RDMA channel failed.
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1)
mlx4_core 0000:84:00.0: command 0x19 failed: fw status = 0x9
rejected SRP_LOGIN_REQ because creating a new RDMA channel failed.
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1)

Reported-by: Navin Ahuja <navin.ahuja@saratoga-speed.com>
Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-01 21:27:30 -07:00
Linus Torvalds
48efe453e6 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Lots of activity again this round for I/O performance optimizations
  (per-cpu IDA pre-allocation for vhost + iscsi/target), and the
  addition of new fabric independent features to target-core
  (COMPARE_AND_WRITE + EXTENDED_COPY).

  The main highlights include:

   - Support for iscsi-target login multiplexing across individual
     network portals
   - Generic Per-cpu IDA logic (kent + akpm + clameter)
   - Conversion of vhost to use per-cpu IDA pre-allocation for
     descriptors, SGLs and userspace page pointer list
   - Conversion of iscsi-target + iser-target to use per-cpu IDA
     pre-allocation for descriptors
   - Add support for generic COMPARE_AND_WRITE (AtomicTestandSet)
     emulation for virtual backend drivers
   - Add support for generic EXTENDED_COPY (CopyOffload) emulation for
     virtual backend drivers.
   - Add support for fast memory registration mode to iser-target (Vu)

  The patches to add COMPARE_AND_WRITE and EXTENDED_COPY support are of
  particular significance, which make us the first and only open source
  target to support the full set of VAAI primitives.

  Currently Linux clients are lacking upstream support to actually
  utilize these primitives.  However, with server side support now in
  place for folks like MKP + ZAB working on the client, this logic once
  reserved for the highest end of storage arrays, can now be run in VMs
  on their laptops"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
  target/iscsi: Bump versions to v4.1.0
  target: Update copyright ownership/year information to 2013
  iscsi-target: Bump default TCP listen backlog to 256
  target: Fix >= v3.9+ regression in PR APTPL + ALUA metadata write-out
  iscsi-target; Bump default CmdSN Depth to 64
  iscsi-target: Remove unnecessary wait_for_completion in iscsi_get_thread_set
  iscsi-target: Add thread_set->ts_activate_sem + use common deallocate
  iscsi-target: Fix race with thread_pre_handler flush_signals + ISCSI_THREAD_SET_DIE
  target: remove unused including <linux/version.h>
  iser-target: introduce fast memory registration mode (FRWR)
  iser-target: generalize rdma memory registration and cleanup
  iser-target: move rdma wr processing to a shared function
  target: Enable global EXTENDED_COPY setup/release
  target: Add Third Party Copy (3PC) bit in INQUIRY response
  target: Enable EXTENDED_COPY setup in spc_parse_cdb
  target: Add support for EXTENDED_COPY copy offload emulation
  target: Avoid non-existent tg_pt_gp_mem in target_alua_state_check
  target: Add global device list for EXTENDED_COPY
  target: Make helpers non static for EXTENDED_COPY command setup
  target: Make spc_parse_naa_6h_vendor_specific non static
  ...
2013-09-12 16:11:45 -07:00
Nicholas Bellinger
4c76251e8e target: Update copyright ownership/year information to 2013
Update copyright ownership/year information for target-core,
loopback, iscsi-target, tcm_qla2xx, vhost and iser-target.

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 20:23:36 -07:00
Vu Pham
59464ef4fb iser-target: introduce fast memory registration mode (FRWR)
This model was introduced in 00f7ec36c "RDMA/core: Add memory
management extensions support" and works when the IB device
supports the IB_DEVICE_MEM_MGT_EXTENSIONS capability.

Upon creating the isert device, ib_isert will test whether the HCA
supports FRWR. If supported then set the flag and assign
function pointers that handle fast registration and deregistration
of appropriate resources (fast_reg descriptors).

When new connection coming in, ib_isert will check frwr flag and
create frwr resouces, if fail to do it will switch back to
old model of using global dma key and turn off the frwr support.

Registration is done using posting IB_WR_FAST_REG_MR to the QP and
invalidations using posting IB_WR_LOCAL_INV.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 16:48:51 -07:00
Vu Pham
d40945d8c2 iser-target: generalize rdma memory registration and cleanup
Current driver uses global dma key to register the memory pointed
by sg list provided by the target core.

This is the preparation step for adding more methods like fast
path memory registration, make the reg/unreg calls be function
pointers.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 16:48:50 -07:00
Vu Pham
90ecc6e251 iser-target: move rdma wr processing to a shared function
isert_put_datain() and isert_get_dataout() share a lot of code
in rdma wr processing, move this common code to a shared function.

Use isert_unmap_cmd to cleanup for RDMA_READ completion.
Remove duplicate field in isert_cmd and isert_rdma_wr structs
Change misc debug messages to track isert_cmd

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 16:48:49 -07:00
Nicholas Bellinger
d703ce2f7f iscsi/iser-target: Convert to command priv_size usage
This command converts iscsi/isert-target to use allocations based on
iscsit_transport->priv_size within iscsit_allocate_cmd(), instead of
using an embedded isert_cmd->iscsi_cmd.

This includes removing iscsit_transport->alloc_cmd() usage, along
with updating isert-target code to use iscsit_priv_cmd().

Also, remove left-over iscsit_transport->release_cmd() usage for
direct calls to iscsit_release_cmd(), and drop the now unused
lio_cmd_cache and isert_cmd_cache.

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 14:29:21 -07:00
Nicholas Bellinger
6faaa85f37 iser-target: Updates for login negotiation multi-plexing support
This patch updates iser-target code to support login negotiation
multi-plexing.  This includes only using isert_conn->conn_login_comp
for the first login request PDU, pushing the subsequent processing
to iscsi_conn->login_work -> iscsi_target_do_login_rx(), and turning
isert_get_login_rx() into a NOP.

v3 changes:
   - Drop unnecessary LOGIN_FLAGS_READ_ACTIVE bit set in
     isert_rx_login_req()

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09 14:27:21 -07:00
Linus Torvalds
7c049d0869 Main batch of InfiniBand/RDMA changes for 3.12 merge window:
- Large ocrdma HW driver update: add "fast register" work requests,
    fixes, cleanups
  - Add receive flow steering support for raw QPs
  - Fix IPoIB neighbour race that leads to crash
  - iSER updates including support for using "fast register" memory
    registration
  - IPv6 support for iWARP
  - XRC transport fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCAAGBQJSJ2drAAoJEENa44ZhAt0hpaUQAJ6EdNFh2bon9c/uHz1lw58A
 DIvVniUwWGCRgpbsv/IxZDBVX3G5IKSW6U0Y3/JxMXeIF/cGOUaJZgCeKPiYNoNA
 yxiEGI0ffvpWGAJUHSE2ARJKvgdjrpqGo5UzsiinEhJx3uczeZBcooosQTWDXxya
 /qSWH3rARkic9abuaizkuVzdWEjWLNjyh2bWnqJ4HNZplE8RfSK4bk8QtvEmpn9Z
 dNBKFFujysfHHGflLuSkFAkP1NdqlZeQ4/2uzD23p3YofbJwrJoJNFxkCfx3UUml
 fjPptlhU6kZMCwSXsD24tAV8Exr/CCgmxriFIN/Xqhu4gvBELScRPPqPqFquhtXI
 pP5hbG9/9P5C8BLgABe6IAvlU1lUyraR67bA78nYeKm2JpATE6T23Nx7nUgMgzRS
 Ee3ZvZGZvk8BZ7zyQh8D7Aig1XOtzqA9D5nLUyEJ2aRPSnXJxyi0S3Fbn0vmOMb7
 8CF+99KECG8fb4sjNAjX5Zm48+7WJd+WCd3t89oUzCbJKKpa1Chctph8VH7dEfke
 JkEjOJhuAGqau4bIMZ2nZkISoTfSD7vbjPAxMPASS7fOdR7fe6AqDn9/LMAhWjpw
 4Fb25ZW7rYf9+6jqA4MgMRFCTkXhR25FsqUUL9h8NaWai4F1dfVoZHbRvGIZah5n
 5HQupXGquwES4y6pvHjc
 =rabl
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull main batch of InfiniBand/RDMA changes from Roland Dreier:
 - Large ocrdma HW driver update: add "fast register" work requests,
   fixes, cleanups
 - Add receive flow steering support for raw QPs
 - Fix IPoIB neighbour race that leads to crash
 - iSER updates including support for using "fast register" memory
   registration
 - IPv6 support for iWARP
 - XRC transport fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (54 commits)
  RDMA/ocrdma: Fix compiler warning about int/pointer size mismatch
  IB/iser: Fix redundant pointer check in dealloc flow
  IB/iser: Fix possible memory leak in iser_create_frwr_pool()
  IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards
  RDMA/ocrdma: Fix passing wrong opcode to modify_srq
  RDMA/ocrdma: Fill PVID in UMC case
  RDMA/ocrdma: Add ABI versioning support
  RDMA/ocrdma: Consider multiple SGES in case of DPP
  RDMA/ocrdma: Fix for displaying proper link speed
  RDMA/ocrdma: Increase STAG array size
  RDMA/ocrdma: Dont use PD 0 for userpace CQ DB
  RDMA/ocrdma: FRMA code cleanup
  RDMA/ocrdma: For ERX2 irrespective of Qid, num_posted offset is 24
  RDMA/ocrdma: Fix to work with even a single MSI-X vector
  RDMA/ocrdma: Remove the MTU check based on Ethernet MTU
  RDMA/ocrdma: Add support for fast register work requests (FRWR)
  RDMA/ocrdma: Create IRD queue fix
  IB/core: Better checking of userspace values for receive flow steering
  IB/mlx4: Add receive flow steering support
  IB/core: Export ib_create/destroy_flow through uverbs
  ...
2013-09-05 09:39:27 -07:00
Roland Dreier
82af24ac6f Merge branches 'cxgb4', 'flowsteer', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next 2013-09-03 09:01:08 -07:00
Sagi Grimberg
2e02d653fe IB/iser: Fix redundant pointer check in dealloc flow
This bug was discovered by Smatch static checker run by Dan Carpenter.
If in free_rx_descriptors(), rx_descs are not NULL then the iser
device is definately not NULL, so no need to check it before
dereferencing it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-09-02 21:26:16 -07:00
Roi Dayan
27ae2d1ea5 IB/iser: Fix possible memory leak in iser_create_frwr_pool()
Fix leak where desc is not being freed in error flows.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-09-02 21:24:08 -07:00
Or Gerlitz
6a06a4b8cf [SCSI] IB/iser: Add Discovery support
To run discovery over iSER we need to advertize the CAP_TEXT_NEGO capability
towards user space. Also need to make sure the login RX buffer is posted when
SendTargets TEXT PDUs are sent. For that end, we use a setting of the
ISCSI_PARAM_DISCOVERY_SESS iscsi param as an indication that this is
discovery session.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-26 18:53:49 +04:00
Jim Foraker
49b8e74438 IPoIB: Fix race in deleting ipoib_neigh entries
In several places, this snippet is used when removing neigh entries:

	list_del(&neigh->list);
	ipoib_neigh_free(neigh);

The list_del() removes neigh from the associated struct ipoib_path, while
ipoib_neigh_free() removes neigh from the device's neigh entry lookup
table.  Both of these operations are protected by the priv->lock
spinlock.  The table however is also protected via RCU, and so naturally
the lock is not held when doing reads.

This leads to a race condition, in which a thread may successfully look
up a neigh entry that has already been deleted from neigh->list.  Since
the previous deletion will have marked the entry with poison, a second
list_del() on the object will cause a panic:

  #5 [ffff8802338c3c70] general_protection at ffffffff815108c5
     [exception RIP: list_del+16]
     RIP: ffffffff81289020  RSP: ffff8802338c3d20  RFLAGS: 00010082
     RAX: dead000000200200  RBX: ffff880433e60c88  RCX: 0000000000009e6c
     RDX: 0000000000000246  RSI: ffff8806012ca298  RDI: ffff880433e60c88
     RBP: ffff8802338c3d30   R8: ffff8806012ca2e8   R9: 00000000ffffffff
     R10: 0000000000000001  R11: 0000000000000000  R12: ffff8804346b2020
     R13: ffff88032a3e7540  R14: ffff8804346b26e0  R15: 0000000000000246
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib]
  #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm]
  #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm]
  #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10
 #10 [ffff8802338c3ee8] kthread at ffffffff81096c66
 #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca

We move the list_del() into ipoib_neigh_free(), so that deletion happens
only once, after the entry has been successfully removed from the lookup
table.  This same behavior is already used in ipoib_del_neighs_by_gid()
and __ipoib_reap_neigh().

Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-13 11:57:37 -07:00
Sagi Grimberg
5587856c96 IB/iser: Introduce fast memory registration model (FRWR)
Newer HCAs and Virtual functions may not support FMRs but rather a fast
registration model, which we call FRWR - "Fast Registration Work Requests".

This model was introduced in 00f7ec36c ("RDMA/core: Add memory management
extensions support") and works when the IB device supports the
IB_DEVICE_MEM_MGT_EXTENSIONS capability.

Upon creating the iser device iser will test whether the HCA supports
FMRs.  If no support for FMRs, check if IB_DEVICE_MEM_MGT_EXTENSIONS
is supported and assign function pointers that handle fast
registration and allocation of appropriate resources (fast_reg
descriptors).

Registration is done using posting IB_WR_FAST_REG_MR to the QP and
invalidations using posting IB_WR_LOCAL_INV.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:10 -07:00
Sagi Grimberg
e657571b76 IB/iser: Place the fmr pool into a union in iser's IB conn struct
This is preparation step for other memory registration methods to be
added.  In addition, change reg/unreg routines signature to indicate
they use FMRs.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:09 -07:00
Sagi Grimberg
919fc274ef IB/iser: Handle unaligned SG in separate function
This routine will be shared with other rdma management schemes.  The
bounce buffer solution for unaligned SG-lists and the sg_to_page_vec
routine are likely to be used for other registration schemes and not
just FMR.

Move them out of the FMR specific code, and call them from there.
Later they will be called also from other reg methods code.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:09 -07:00
Sagi Grimberg
b4e155ffbb IB/iser: Generalize rdma memory registration
Currently the driver uses FMRs as the only means to register the
memory pointed by SG provided by the SCSI mid-layer with the RDMA
device.

As preparation step for adding more methods for fast path memory
registration, make the alloc/free and reg/unreg calls function
pointers, which are for now just set to the existing FMR ones.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:09 -07:00
Shlomo Pongratz
b7f0451309 IB/iser: Accept session->cmds_max from user space
Use cmds_max passed from user space to be the number of PDUs to be
supported for the session instead of hard-coded ISCSI_DEF_XMIT_CMDS_MAX.
This allow controlling the max number of SCSI commands for the session.
Also don't ignore the qdepth passed from user space.

Derive from session->cmds_max the actual number of RX buffers and FMR
pool size to allocate during the connection bind phase.

Since the iser transport connection is established before the iscsi
session/connection are created and bound, we still use one hard-coded
quantity ISER_DEF_XMIT_CMDS_MAX to compute the maximum number of
work-requests to be supported by the RC QP used for the connection.

The above quantity is made to be a power of two between ISCSI_TOTAL_CMDS_MIN
(16) and ISER_DEF_XMIT_CMDS_MAX (512) inclusive.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:08 -07:00
Shlomo Pongratz
986db0d6c0 IB/iser: Restructure allocation/deallocation of connection resources
This is a preparation step to a patch that accepts the number of max
SCSI commands to be supported a session from user space iSCSI tools.

Move the allocation of the login buffer, FMR pool and its associated
page vector from iser_create_ib_conn_res() (which is called prior when
we actually know how many commands should be supported) to
iser_alloc_rx_descriptors() (which is called during the iscsi
connection bind step where this quantity is known).

Also do small refactoring around the deallocation to make that path
similar to the allocation one.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:08 -07:00
Or Gerlitz
f91424cf5b IB/iser: Use proper debug level value for info prints
Commit 4f36388261 ("IB/iser: Move informational messages from error
to info level") set info prints to be emitted at a lower debug level
than warning prints, which is a bit odd.  Fix that.

Also move the prints on unaligned SG from warning to debug level.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:08 -07:00
Erez Shitrit
c290414169 IPoIB: Fix pkey change flow for virtualization environments
IPoIB's required behaviour w.r.t to the pkey used by the device is the following:

- For "parent" interfaces (e.g ib0, ib1, etc) who are created
  automatically as a result of hot-plug events from the IB core, the
  driver needs to take whatever pkey vlaue it finds in index 0, and
  stick to that index.

- For child interfaces (e.g ib0.8001, etc) created by admin directive,
  the driver needs to use and stick to the value provided during its
  creation.

In SR-IOV environment its possible for the VF probe to take place
before the cloud management software provisions the suitable pkey for
the VF in the paravirtualed PKEY table index 0. When this is the case,
the VF IB stack will find in index 0 an invalide pkey, which is all
zeros.

Moreover, the cloud managment can assign the pkey value at index 0 at
any time of the guest life cycle.

The correct behavior for IPoIB to address these requirements for
parent interfaces is to use PKEY_CHANGE event as trigger to optionally
re-init the device pkey value and re-create all the relevant resources
accordingly, if the value of the pkey in index 0 has changed (from
invalid to valid or from valid value X to invalid value Y).

This patch enhances the heavy flushing code which is triggered by pkey
change event, to behave correctly for parent devices. For child
devices, the code remains the same, namely chases pkey value and not
index.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-31 14:23:44 -07:00
Or Gerlitz
3d790a4c26 IPoIB: Make sure child devices use valid/proper pkeys
Make sure that the IB invalid pkey (0x0000 or 0x8000) isn't used for
child devices.

Also, make sure to always set the full membership bit for the pkey of
devices created by rtnl link ops.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-31 14:23:40 -07:00
Linus Torvalds
c552441373 Main batch of InfiniBand/RDMA changes for 3.11 merge window:
- AF_IB (native IB addressing) for CMA from Sean Hefty
  - New mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes)
  - SRP fixes from Bart Van Assche (including fix to first merge request)
  - qib HW driver updates
  - Resurrection of ocrdma HW driver development
  - uverbs conversion to create fds with O_CLOEXEC set
  - Other small changes and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJR30TKAAoJEENa44ZhAt0h854P/jvAhK5u+XTM5VyjAi0DKJ7P
 bWcsu+KxbOIFnjEdsYQl1mGP44gdO8GPZp7+JR5nDHDRpw9K76qy6QQiPbaF6Y8D
 cZH8Xlq4hzBfElTWBkExEemPrVUUq77j03FE9TBatdLAtEyYkgrNyqr7Ys6zVwVK
 ugR8nAahvnB7Jh1tsyZBBd9kfbWtXJnaGC8/Zk3Na4n4zXRAbr0DcnRF0sncTL38
 VFnWbi33OQAxu5bsb2jGec/SNP3BbNwspFPjSCKqiiItRaCj13JiHhrKKvVk4RZe
 hIRnPH47kjLRp2/PwBo6o+gTXZuRg48VGBx4CKUTwx1nCzPPN1iz9ZOfqUv9Qwcv
 LX8mxC7QS/Yvud4KeEBsj6kotb80EkRF2KV5RkIKCxQiwetGD9127bZylC8ttxGw
 2f6MzYtAGD4R4C10lO8N+59VugSg1xAvwsqz0a/jy2XyVHbI1ugQedzkB20x5WPY
 51S08ABvtU9yIxIYrw2VEaa/5WN+XJ6+LpG9QBAGXdMLiCiiAe7n/YzyXI6AgwaW
 Jl/uKr6H6/jEHUHKwkyqsmbpVGPhtGWu8deyr1oYvOEP4i48gcDqMQsfMcCISrQV
 MeQU3hS/obykUlNeqjmMI2CXrecqSsiq0hXd4DLaSoZ2Rb4Drx2Wj6sTQLIAgL2q
 GBYjHWMUpZXIFHQaH7am
 =nZh8
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - AF_IB (native IB addressing) for CMA from Sean Hefty
 - new mlx5 driver for Mellanox Connect-IB adapters (including post
   merge request fixes)
 - SRP fixes from Bart Van Assche (including fix to first merge request)
 - qib HW driver updates
 - resurrection of ocrdma HW driver development
 - uverbs conversion to create fds with O_CLOEXEC set
 - other small changes and fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
  mlx5: Return -EFAULT instead of -EPERM
  IB/qib: Log all SDMA errors unconditionally
  IB/qib: Fix module-level leak
  mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec
  IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
  IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()
  mlx5_core: Fixes for sparse warnings
  IB/mlx5: Make profile[] static in main.c
  mlx5: Fix parameter type of health_handler_t
  mlx5: Add driver for Mellanox Connect-IB adapters
  IB/core: Add reserved values to enums for low-level driver use
  IB/srp: Bump driver version and release date
  IB/srp: Make HCA completion vector configurable
  IB/srp: Maintain a single connection per I_T nexus
  IB/srp: Fail I/O fast if target offline
  IB/srp: Skip host settle delay
  IB/srp: Avoid skipping srp_reset_host() after a transport error
  IB/srp: Fix remove_one crash due to resource exhaustion
  IB/qib: New transmitter tunning settings for Dell 1.1 backplane
  IB/core: Fix error return code in add_port()
  ...
2013-07-13 12:57:21 -07:00
Bart Van Assche
80d5e8a235 IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
If the transport layer is offline it is more appropriate to let
srp_abort() return FAST_IO_FAIL instead of SUCCESS.

Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-11 16:43:48 -07:00
Linus Torvalds
6d2fa9e141 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Lots of activity this round on performance improvements in target-core
  while benchmarking the prototype scsi-mq initiator code with
  vhost-scsi fabric ports, along with a number of iscsi/iser-target
  improvements and hardening fixes for exception path cases post v3.10
  merge.

  The highlights include:

   - Make persistent reservations APTPL buffer allocated on-demand, and
     drop per t10_reservation buffer.  (grover)
   - Make virtual LUN=0 a NULLIO device, and skip allocation of NULLIO
     device pages (grover)
   - Add transport_cmd_check_stop write_pending bit to avoid extra
     access of ->t_state_lock is WRITE I/O submission fast-path.  (nab)
   - Drop unnecessary CMD_T_DEV_ACTIVE check from
     transport_lun_remove_cmd to avoid extra access of ->t_state_lock in
     release fast-path.  (nab)
   - Avoid extra t_state_lock access in __target_execute_cmd fast-path
     (nab)
   - Drop unnecessary vhost-scsi wait_for_tasks=true usage +
     ->t_state_lock access in release fast-path.  (nab)
   - Convert vhost-scsi to use modern se_cmd->cmd_kref
     TARGET_SCF_ACK_KREF usage (nab)
   - Add tracepoints for SCSI commands being processed (roland)
   - Refactoring of iscsi-target handling of ISCSI_OP_NOOP +
     ISCSI_OP_TEXT to be transport independent (nab)
   - Add iscsi-target SendTargets=$IQN support for in-band discovery
     (nab)
   - Add iser-target support for in-band discovery (nab + Or)
   - Add iscsi-target demo-mode TPG authentication context support (nab)
   - Fix isert_put_reject payload buffer post (nab)
   - Fix iscsit_add_reject* usage for iser (nab)
   - Fix iscsit_sequence_cmd reject handling for iser (nab)
   - Fix ISCSI_OP_SCSI_TMFUNC handling for iser (nab)
   - Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED (nab)

  The last five iscsi/iser-target items are CC'ed to stable, as they do
  address issues present in v3.10 code.  They are certainly larger than
  I'd like for stable patch set, but are important to ensure proper
  REJECT exception handling in iser-target for 3.10.y"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
  iser-target: Ignore non TEXT + LOGOUT opcodes for discovery
  target: make queue_tm_rsp() return void
  target: remove unused codes from enum tcm_tmrsp_table
  iscsi-target: kstrtou* configfs attribute parameter cleanups
  iscsi-target: Fix tfc_tpg_auth_cit configfs length overflow
  iscsi-target: Fix tfc_tpg_nacl_auth_cit configfs length overflow
  iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling
  iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len]
  iser-target: Add vendor_err debug output
  target: Add (obsolete) checking for PMI/LBA fields in READ CAPACITY(10)
  target: Return correct sense data for IO past the end of a device
  target: Add tracepoints for SCSI commands being processed
  iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED
  iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser
  iscsi-target: Fix iscsit_sequence_cmd reject handling for iser
  iscsi-target: Fix iscsit_add_reject* usage for iser
  iser-target: Fix isert_put_reject payload buffer post
  iscsi-target: missing kfree() on error path
  iscsi-target: Drop left-over iscsi_conn->bad_hdr
  target: Make core_scsi3_update_and_write_aptpl return sense_reason_t
  ...
2013-07-11 12:57:19 -07:00
Nicholas Bellinger
ca40d24eb8 iser-target: Ignore non TEXT + LOGOUT opcodes for discovery
This patch adds a check in isert_rx_opcode() to ignore non TEXT + LOGOUT
opcodes when SessionType=Discovery has been negotiated.

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:37:02 -07:00
Joern Engel
b79fafac70 target: make queue_tm_rsp() return void
The return value wasn't checked by any of the callers.  Assuming this is
correct behaviour, we can simplify some code by not bothering to
generate it.

nab: Add srpt_queue_data_in() + srpt_queue_tm_rsp() nops around
     srpt_queue_response() void return

Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:36:53 -07:00
Nicholas Bellinger
adb54c2931 iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling
This patch adds isert_handle_text_cmd() to handle incoming
ISCSI_OP_TEXT PDU processing, along with isert_put_text_rsp()
for posting ISCSI_OP_TEXT_RSP ib_send_wr response.

It copies ISCSI_OP_TEXT payload using unsolicited payload at
&iser_rx_desc->data[0] into iscsi_cmd->text_in_ptr for usage
with outgoing isert_put_text_rsp() -> iscsit_build_text_rsp()

v2 changes:
  - Let iscsit_build_text_rsp() determine any extra padding

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:36:48 -07:00
Nicholas Bellinger
dbbc5d1107 iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len]
Now that these two variables are used for REJECT payloads as well
as SCSI response sense payloads, rename them to something that
makes more sense.

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:36:47 -07:00
Nicholas Bellinger
c5a2adbfcb iser-target: Add vendor_err debug output
Add output for ib_wc.vendor_err in isert_cq_[t,r]x_work(), which
is useful for debugging future issues.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:36:46 -07:00
Nicholas Bellinger
b2cb96494d iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED
This patch addresses a bug where RDMA_CM_EVENT_DISCONNECTED may occur
before the connection shutdown has been completed by rx/tx threads,
that causes isert_free_conn() to wait indefinately on ->conn_wait.

This patch allows isert_disconnect_work code to invoke rdma_disconnect
when isert_disconnect_work() process context is started by client
session reset before isert_free_conn() code has been reached.

It also adds isert_conn->conn_mutex protection for ->state within
isert_disconnect_work(), isert_cq_comp_err() and isert_free_conn()
code, along with isert_check_state() for wait_event usage.

(v2: Add explicit iscsit_cause_connection_reinstatement call
     during isert_disconnect_work() to force conn reset)

Cc: stable@vger.kernel.org  # 3.10+
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:35:56 -07:00
Nicholas Bellinger
186a964701 iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser
This patch adds target_get_sess_cmd reference counting for
iscsit_handle_task_mgt_cmd(), and adds a target_put_sess_cmd()
for the failure case.

It also fixes a bug where ISCSI_OP_SCSI_TMFUNC type commands
where leaking iscsi_cmd->i_conn_node and eventually triggering
an OOPs during struct isert_conn shutdown.

Cc: stable@vger.kernel.org  # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-06 22:01:23 -07:00
Nicholas Bellinger
561bf15892 iscsi-target: Fix iscsit_sequence_cmd reject handling for iser
This patch moves ISCSI_OP_REJECT failures into iscsit_sequence_cmd()
in order to avoid external iscsit_reject_cmd() reject usage for all
PDU types.

It also updates PDU specific handlers for traditional iscsi-target
code to not reset the session after posting a ISCSI_OP_REJECT during
setup.

(v2: Fix CMDSN_LOWER_THAN_EXP for ISCSI_OP_SCSI to call
     target_put_sess_cmd() after iscsit_sequence_cmd() failure)

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: stable@vger.kernel.org  # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-06 21:59:54 -07:00
Nicholas Bellinger
ba15991408 iscsi-target: Fix iscsit_add_reject* usage for iser
This patch changes iscsit_add_reject() + iscsit_add_reject_from_cmd()
usage to not sleep on iscsi_cmd->reject_comp to address a free-after-use
usage bug in v3.10 with iser-target code.

It saves ->reject_reason for use within iscsit_build_reject() so the
correct value for both transport cases.  It also drops the legacy
fail_conn parameter usage throughput iscsi-target code and adds
two iscsit_add_reject_cmd() and iscsit_reject_cmd helper functions,
along with various small cleanups.

(v2: Re-enable target_put_sess_cmd() to be called from
     iscsit_add_reject_from_cmd() for rejects invoked after
     target_get_sess_cmd() has been called)

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: stable@vger.kernel.org  # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-06 21:59:53 -07:00
Nicholas Bellinger
3df8f68aaf iser-target: Fix isert_put_reject payload buffer post
This patch adds the missing isert_put_reject() logic to post
a outgoing payload buffer to hold the 48 bytes of original PDU
header request payload for the rejected cmd.

It also fixes ISTATE_SEND_REJECT handling in isert_response_completion()
-> isert_do_control_comp() code, and drops incorrect iscsi_cmd_t->reject_comp
usage.

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: stable@vger.kernel.org  # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-06 21:59:35 -07:00
Linus Torvalds
80cc38b163 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual stuff from trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  treewide: relase -> release
  Documentation/cgroups/memory.txt: fix stat file documentation
  sysctl/net.txt: delete reference to obsolete 2.4.x kernel
  spinlock_api_smp.h: fix preprocessor comments
  treewide: Fix typo in printk
  doc: device tree: clarify stuff in usage-model.txt.
  open firmware: "/aliasas" -> "/aliases"
  md: bcache: Fixed a typo with the word 'arithmetic'
  irq/generic-chip: fix a few kernel-doc entries
  frv: Convert use of typedef ctl_table to struct ctl_table
  sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
  doc: clk: Fix incorrect wording
  Documentation/arm/IXP4xx fix a typo
  Documentation/networking/ieee802154 fix a typo
  Documentation/DocBook/media/v4l fix a typo
  Documentation/video4linux/si476x.txt fix a typo
  Documentation/virtual/kvm/api.txt fix a typo
  Documentation/early-userspace/README fix a typo
  Documentation/video4linux/soc-camera.txt fix a typo
  lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
  ...
2013-07-04 11:40:58 -07:00
Vu Pham
e8ca413558 IB/srp: Bump driver version and release date
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-01 10:42:02 -07:00
Bart Van Assche
4b5e5f41c8 IB/srp: Make HCA completion vector configurable
Several InfiniBand HCAs allow configuring the completion vector per
CQ.  This allows spreading the workload created by IB completion
interrupts over multiple MSI-X vectors and hence over multiple CPU
cores.  In other words, configuring the completion vector properly not
only allows reducing latency on an initiator connected to multiple
SRP targets but also allows improving throughput.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-01 10:40:55 -07:00
Bart Van Assche
96fc248a4c IB/srp: Maintain a single connection per I_T nexus
An SRP target is required to maintain a single connection between
initiator and target.  This means that if the 'add_target' attribute
is used to create a second connection to a target, the first
connection will be logged out and that the SCSI error handler will
kick in.  The SCSI error handler will cause the SRP initiator to
reconnect, which will cause I/O over the second connection to fail.
Avoid such ping-pong behavior by disabling relogins.

If reconnecting manually is necessary, that is possible by deleting
and recreating an rport via sysfs.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-01 10:40:07 -07:00
Bart Van Assche
99e1c1398f IB/srp: Fail I/O fast if target offline
If reconnecting failed we know that no command completion will
be received anymore.  Hence let the SCSI error handler fail such
commands immediately.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-01 10:37:13 -07:00
Bart Van Assche
2742c1dadd IB/srp: Skip host settle delay
The SRP initiator implements host reset by reconnecting to the SRP
target.  That means that communication with the target is possible as
soon as host reset finished. Hence skip the host settle delay.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-27 16:44:39 -07:00
Bart Van Assche
086f44f588 IB/srp: Avoid skipping srp_reset_host() after a transport error
The SCSI error handler assumes that the transport layer is operational
if an eh_abort_handler() returns SUCCESS.  Hence srp_abort() only
should return SUCCESS if sending the ABORT TASK task management
function succeeded.  This patch avoids the SCSI error handler skipping
the srp_reset_host() call after a transport layer error.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-27 16:44:39 -07:00
Dotan Barak
1fe0cb8488 IB/srp: Fix remove_one crash due to resource exhaustion
If the add_one callback fails during driver load no resources are
allocated so there isn't a need to release any resources. Trying
to clean the resource may lead to the following kernel panic:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp]
    RIP: 0010:[<ffffffffa0132331>]  [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp]
    Process rmmod (pid: 4562, threadinfo ffff8800dd738000, task ffff8801167e60c0)
    Call Trace:
     [<ffffffffa024500e>] ib_unregister_client+0x4e/0x120 [ib_core]
     [<ffffffffa01361bd>] srp_cleanup_module+0x15/0x71 [ib_srp]
     [<ffffffff810ac6a4>] sys_delete_module+0x194/0x260
     [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-27 16:44:38 -07:00
Nicholas Bellinger
778de36896 iscsi/isert-target: Refactor ISCSI_OP_NOOP RX handling
This patch refactors ISCSI_OP_NOOP handling within iscsi-target in
order to handle iscsi_nopout payloads in a transport specific manner.

This includes splitting existing iscsit_handle_nop_out() into
iscsit_setup_nop_out() and iscsit_process_nop_out() calls, and
makes iscsit_handle_nop_out() be only used internally by traditional
iscsi socket calls.

Next update iser-target code to use new callers and add FIXME for
the handling iscsi_nopout payloads.  Also fix reject response handling
in iscsit_setup_nop_out() to use proper iscsit_add_reject_from_cmd().

v2: Fix uninitialized iscsit_handle_nop_out() payload_length usage (Fengguang)
v3: Remove left-over dead code in iscsit_setup_nop_out() (DanC)

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-06-24 22:35:51 -07:00
Or Gerlitz
28f292e879 IB/iser: Add Mellanox copyright
Add Mellanox copyright to the iser initiator source code which I maintain.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-04 17:03:12 -07:00
Roi Dayan
5b61ff43a7 IB/iser: Fix device removal flow
Change the code to destroy the "last opened" rdma_cm id after making
sure we released all other objects (QP, CQs, PD, etc) associated with
the IB device.

Since iser accesses the IB device using the rdma_cm id, we need to
free any objects that are related to the device that is associated
with the rdma_cm id prior to destroying that id.  When this isn't
done, the low level driver that created this device can be unloaded
before iser has a chance to free all the objects and a such a call may
invoke code segment which isn't valid any more and crash.

Cc: Sean Hefty <sean.hefty@intel.com
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-04 17:03:11 -07:00
Nicholas Bellinger
1d19f7800d ib_srpt: Call target_sess_cmd_list_set_waiting during shutdown_session
Given that srpt_release_channel_work() calls target_wait_for_sess_cmds()
to allow outstanding se_cmd_t->cmd_kref a change to complete, the call
to perform target_sess_cmd_list_set_waiting() needs to happen in
srpt_shutdown_session()

Also, this patch adds an explicit call to srpt_shutdown_session() within
srpt_drain_channel() so that target_sess_cmd_list_set_waiting() will be
called in the cases where TFO->shutdown_session() is not triggered
directly by TCM.

Cc: Joern Engel <joern@logfs.org>
Cc: Roland Dreier <roland@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-29 21:30:46 -07:00
Masanari Iida
8b513d0cf6 treewide: Fix typo in printk
Correct spelling typo in various part of drivers

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-28 12:02:13 +02:00
Joern Engel
be646c2d2b target: Remove unused wait_for_tasks bit in target_wait_for_sess_cmds
Drop unused transport_wait_for_tasks() check in target_wait_for_sess_cmds
shutdown code, and convert tcm_qla2xxx + ib_srpt fabric drivers.

Cc: Joern Engel <joern@logfs.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-20 21:44:10 -07:00
Linus Torvalds
e0fd9affeb InfiniBand/RDMA changes for the 3.10 merge window:
- XRC transport fixes
  - Fix DHCP on IPoIB
  - mlx4 preparations for flow steering
  - iSER fixes
  - miscellaneous other fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJRisChAAoJEENa44ZhAt0hHLoP/iX5MxtHJ3X1u5KcARX/7nci
 CCH/VnD172re/KavCCg7zkbZQpS4jHCtW/CzLUCSPqBGOaj78HFTUkB3ragvUW7m
 ndERwplF8DP/i0x7Kk7Wau2a4RdlH0lqwucOjqTyQnbIdxknkz6w3Jcsb9Ic2lzx
 up0T0HaHtTxdVF6lXOB5QOIpUGg3l0Yu4euX2BA61WDoZj+VIYhgeWZewq0iV0D1
 rLtarJ+Or7mdwu2rNcDHgrD0lhF4SCBd3rx4lbc4F68Cr8JUz0Xe7liPLNskeLhW
 f3NEm3gmkYp9YI1otGsA0X/CyV6wnRk4mT8JMlOb2WNzeq2V13Z54/9ZyF5/gFD2
 JgzkQB9Ibf7EmTwXWd6+0+FA40Q6dNvnRnhddRM255dvDVw7nxUr2UzYH4Re/Z9K
 rNFjkvix2YUwEmoPjitWocz2kj2reDMqjtiVDmdGy1YbtnicH5GtkQsWkoPg8ON9
 m5jORUdzydTD+yBJwTiFP1EuFoG3TdfoZ7zHMJwWy/u8i308xD6WPGms9MTdjh8j
 7gjz2TCKr+vpuVRh/p6esCPPOTSsSeWDeowy7Sgpdf3qoqAImXsWXVrl2kXLhtyl
 1VIgHU3ztm7oqwmy0gQ/zVCo4CLLdsif2zmEIDpxJPnWaSq+D9LJdyLTcfdBjhQ8
 9SjUafe4msT1pIjNb7ND
 =1ojD
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - XRC transport fixes
 - Fix DHCP on IPoIB
 - mlx4 preparations for flow steering
 - iSER fixes
 - miscellaneous other fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (23 commits)
  IB/iser: Add support for iser CM REQ additional info
  IB/iser: Return error to upper layers on EAGAIN registration failures
  IB/iser: Move informational messages from error to info level
  IB/iser: Add module version
  mlx4_core: Expose a few helpers to fill DMFS HW strucutures
  mlx4_core: Directly expose fields of DMFS HW rule control segment
  mlx4_core: Change a few DMFS fields names to match firmare spec
  mlx4: Match DMFS promiscuous field names to firmware spec
  mlx4_core: Move DMFS HW structs to common header file
  IB/mlx4: Set link type for RAW PACKET QPs in the QP context
  IB/mlx4: Disable VLAN stripping for RAW PACKET QPs
  mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level
  RDMA/iwcm: Don't touch cmid after dropping reference
  IB/qib: Correct qib_verbs_register_sysfs() error handling
  IB/ipath: Correct ipath_verbs_register_sysfs() error handling
  RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled
  SRPT: Fix odd use of WARN_ON()
  IPoIB: Fix ipoib_hard_header() return value
  RDMA: Rename random32() to prandom_u32()
  RDMA/cxgb3: Fix uninitialized variable
  ...
2013-05-08 15:29:48 -07:00
Roland Dreier
ea9627c800 Merge branches 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next 2013-05-08 14:12:37 -07:00
Andrew Morton
50bea5c0d5 drivers/infiniband/hw: rename random32() to prandom_u32()
Use preferable function name which implies using a pseudo-random number
generator.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 18:38:27 -07:00
Or Gerlitz
8d8399deb0 IB/iser: Add support for iser CM REQ additional info
Annex A12 of the IBTA spec defines additional information that needs
to be provided through the CM exchange relating to usage of ZBVA (Zero
Based VAs) and Send With Invalidate over an iSER connection.

Currently, the initiator sets both to not supported, but does provide
the header so that existing iSER targets can be patched to start
looking on the private data carried by the CM.

This is a preparation step to enable iSER with HW drivers for which
FMRs are not supported, such as mlx4 VF instances or new HW devices
which might support only FRWR (Fast Registration Work-Requests) along
the details of the IB_DEVICE_MEM_MGT_EXTENSIONS device capability.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-05-01 17:34:14 -07:00
Or Gerlitz
450d1e40d5 IB/iser: Return error to upper layers on EAGAIN registration failures
Commit 819a087316 ("IB/iser: Avoid error prints on EAGAIN
registration failures") not only eliminated the error print on that
case, but rather also modified the code such that it doesn't return
any error to upper layers.  As a result a wrong mapping was used.  Fix
this to correctly return the error in that case.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-05-01 17:34:13 -07:00
Roi Dayan
4f36388261 IB/iser: Move informational messages from error to info level
Introduce iser_info() and move informational messages that were
printed as errors to use that macro. Also, cleanup printk leftovers to
use the existing macros.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>

[ Use pr_warn(... instead of printk(KERN_WARNING ....  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-05-01 17:34:13 -07:00
Roi Dayan
c1d786e682 IB/iser: Add module version
Add displaying module version, update the version to 1.1,
and remove the DRV_DATE define.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-05-01 17:34:13 -07:00
Linus Torvalds
73287a43cc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights (1721 non-merge commits, this has to be a record of some
  sort):

   1) Add 'random' mode to team driver, from Jiri Pirko and Eric
      Dumazet.

   2) Make it so that any driver that supports configuration of multiple
      MAC addresses can provide the forwarding database add and del
      calls by providing a default implementation and hooking that up if
      the driver doesn't have an explicit set of handlers.  From Vlad
      Yasevich.

   3) Support GSO segmentation over tunnels and other encapsulating
      devices such as VXLAN, from Pravin B Shelar.

   4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

   5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
      Dukkipati.

   6) In the PHY layer, allow supporting wake-on-lan in situations where
      the PHY registers have to be written for it to be configured.

      Use it to support wake-on-lan in mv643xx_eth.

      From Michael Stapelberg.

   7) Significantly improve firewire IPV6 support, from YOSHIFUJI
      Hideaki.

   8) Allow multiple packets to be sent in a single transmission using
      network coding in batman-adv, from Martin Hundebøll.

   9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

  10) Generalize the VXLAN forwarding tables so that there is more
      flexibility in configurating various aspects of the endpoints.
      From David Stevens.

  11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
      from Dmitry Kravkov.

  12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
      Neira Ayuso.

  13) Start adding networking selftests.

  14) In situations of overload on the same AF_PACKET fanout socket, or
      per-cpu packet receive queue, minimize drop by distributing the
      load to other cpus/fanouts.  From Willem de Bruijn and Eric
      Dumazet.

  15) Add support for new payload offset BPF instruction, from Daniel
      Borkmann.

  16) Convert several drivers over to mdoule_platform_driver(), from
      Sachin Kamat.

  17) Provide a minimal BPF JIT image disassembler userspace tool, from
      Daniel Borkmann.

  18) Rewrite F-RTO implementation in TCP to match the final
      specification of it in RFC4138 and RFC5682.  From Yuchung Cheng.

  19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
      you like netlink, so I implemented netlink dumping of netlink
      sockets.") From Andrey Vagin.

  20) Remove ugly passing of rtnetlink attributes into rtnl_doit
      functions, from Thomas Graf.

  21) Allow userspace to be able to see if a configuration change occurs
      in the middle of an address or device list dump, from Nicolas
      Dichtel.

  22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
      Frederic Sowa.

  23) Increase accuracy of packet length used by packet scheduler, from
      Jason Wang.

  24) Beginning set of changes to make ipv4/ipv6 fragment handling more
      scalable and less susceptible to overload and locking contention,
      from Jesper Dangaard Brouer.

  25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
      instead.  From Hong Zhiguo.

  26) Optimize route usage in IPVS by avoiding reference counting where
      possible, from Julian Anastasov.

  27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

  28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
      Eitzenberger.

  29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
      nfnetlink_log, and nfnetlink_queue.  From Gao feng.

  30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

  31) Support several new r8169 chips, from Hayes Wang.

  32) Support tokenized interface identifiers in ipv6, from Daniel
      Borkmann.

  33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

  34) Add 802.1ad vlan offload support, from Patrick McHardy.

  35) Support mmap() based netlink communication, also from Patrick
      McHardy.

  36) Support HW timestamping in mlx4 driver, from Amir Vadai.

  37) Rationalize AF_PACKET packet timestamping when transmitting, from
      Willem de Bruijn and Daniel Borkmann.

  38) Bring parity to what's provided by /proc/net/packet socket dumping
      and the info provided by netlink socket dumping of AF_PACKET
      sockets.  From Nicolas Dichtel.

  39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
      Poirier"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  filter: fix va_list build error
  af_unix: fix a fatal race with bit fields
  bnx2x: Prevent memory leak when cnic is absent
  bnx2x: correct reading of speed capabilities
  net: sctp: attribute printl with __printf for gcc fmt checks
  netlink: kconfig: move mmap i/o into netlink kconfig
  netpoll: convert mutex into a semaphore
  netlink: Fix skb ref counting.
  net_sched: act_ipt forward compat with xtables
  mlx4_en: fix a build error on 32bit arches
  Revert "bnx2x: allow nvram test to run when device is down"
  bridge: avoid OOPS if root port not found
  drivers: net: cpsw: fix kernel warn on cpsw irq enable
  sh_eth: use random MAC address if no valid one supplied
  3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
  tg3: fix to append hardware time stamping flags
  unix/stream: fix peeking with an offset larger than data in queue
  unix/dgram: fix peeking with an offset larger than data in queue
  unix/dgram: peek beyond 0-sized skbs
  openvswitch: Remove unneeded ovs_netdev_get_ifindex()
  ...
2013-05-01 14:08:52 -07:00
Linus Torvalds
6da6dc2380 Merge branch 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target update from Nicholas Bellinger:
 "The highlights this round include:

   - Add fileio support for WRITE_SAME w/ UNMAP=1 discard (asias)
   - Add fileio support for UNMAP discard (asias)
   - Add tcm_vhost hotplug support to work with upstream QEMU
     vhost-scsi-pci code (asias + mst)
   - Check for aborted sequence in tcm_fc response path (mdr)
   - Add initial iscsit_transport support into iscsi-target code (nab)
   - Refactor iscsi-target RX PDU logic + export request PDU handling
     (nab)
   - Refactor iscsi-target TX queue logic + export response PDU creation
     (nab)
   - Add new iSCSI Extentions for RDMA (ISER) target driver (Or + nab)

  The biggest changes revolve around iscsi-target refactoring in order
  to support the iser-target driver.  This includes the conversion of
  the iscsi-target data-path to use modern se_cmd->cmd_kref counting,
  and allowing transport independent aspects of RX/TX PDU
  request/response handling be shared across existing traditional
  iscsi-target code, and the new iser-target code.

  Thanks to Or Gerlitz + Mellanox for supporting the iser-target
  development effort!"

* 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (25 commits)
  iser-target: Add iSCSI Extensions for RDMA (iSER) target driver
  tcm_vhost: Enable VIRTIO_SCSI_F_HOTPLUG
  tcm_vhost: Add ioctl to get and set events missed flag
  tcm_vhost: Add hotplug/hotunplug support
  tcm_vhost: Refactor the lock nesting rule
  tcm_fc: Check for aborted sequence
  iscsi-target: Add iser network portal attribute
  iscsi-target: Refactor TX queue logic + export response PDU creation
  iscsi-target: Refactor RX PDU logic + export request PDU handling
  iscsi-target: Add per transport iscsi_cmd alloc/free
  iscsi-target: Add iser-target parameter keys + setup during login
  iscsi-target: Initial traditional TCP conversion to iscsit_transport
  iscsi-target: Add iscsit_transport API template
  target: Add export of target_get_sess_cmd symbol
  target: Change default sense key of NOT_READY
  target/file: Set is_nonrot attribute
  target: Add sbc_execute_unmap() helper
  target/iblock: Add iblock_do_unmap() helper
  target/file: Add fd_do_unmap() helper
  target/file: Add UNMAP emulation support
  ...
2013-04-30 13:14:57 -07:00
Nicholas Bellinger
b8d26b3be8 iser-target: Add iSCSI Extensions for RDMA (iSER) target driver
This patch adds support for iSCSI Extensions for RDMA target mode,
and includes CQ pooling per isert_device context distributed across
multiple active iser target sessions.

It also uses cmwq process context for RX / TX ib_post_cq() polling
via isert_cq_desc->cq_[rx,tx]_work invoked by isert_cq_[rx,tx]_callback()
hardIRQ context callbacks.

v5 changes:

- Use ISER_RECV_DATA_SEG_LEN instead of hardcoded value in ISER_RX_PAD_SIZE (Or)
- Fix make W=1 warnings (Or)
- Add missing depends on NET && INFINIBAND_ADDR_TRANS in Kconfig (Randy + Or)
- Make isert_device_find_by_ib_dev() return proper ERR_PTR (Wei Yongjun)
- Properly setup iscsi_np->np_sockaddr in isert_setup_np() (Shlomi + nab)
- Add special case for early ISCSI_OP_SCSI_CMD exception handling (nab)

v4 changes:
- Mark isert_cq_rx_work as static (Or)
- Drop unnecessary ib_dma_sync_single_for_cpu + ib_dma_sync_single_for_device
  calls for isert_cmd->sense_buf_dma from isert_put_response (Or)
- Use 12288 for ISER_RX_PAD_SIZE base to save extra page per
  struct iser_rx_desc (Or + nab)
- Drop now unnecessary isert_rx_desc usage, and convert RX users to
  iser_rx_desc (Or + nab)
- Move isert_[alloc,free]_rx_descriptors() ahead of
  isert_create_device_ib_res() usage (nab)
- Mark isert_cq_[rx,tx]_callback() + prototypes as static
- Fix 'warning: 'ret' may be used uninitialized' warning for
  isert_create_device_ib_res on powerpc allmodconfig (fengguang + nab)
- Fix 'warning: 'ret' may be used uninitialized' warning for
  isert_connect_request on i386 allyesconfig (fengguang + nab)
- Fix pr_debug conversion specification in isert_rx_completion()
  (fengguang + nab)
- Drop unnecessary isert_conn->conn_cm_id != NULL check in
  isert_connect_release causing the build warning:
  "variable dereferenced before check 'isert_conn->conn_cm_id'"
- Fix isert_lid + isert_np leak in isert_setup_np failure path
- Add isert_conn->conn_wait_comp_err usage in isert_free_conn()
  for isert_cq_comp_err completion path
- Add isert_conn->logout_posted bit to determine decrement of
  isert_conn->post_send_buf_count from logout response completion
- Always set ISER_CONN_DOWN from isert_disconnect_work() callback

v3 changes:

- Convert to use per isert_cq_desc->cq_[rx,tx]_work + drop tasklets (Or + nab)
- Move IB_EVENT_QP_LAST_WQE_REACHED warn into correct
  isert_qp_event_callback (Or)
- Drop unnecessary IB_ACCESS_REMOTE_* access flag usage in
  isert_create_device_ib_res (Or)
- Add common isert_init_send_wr(), and convert isert_put_* calls (Or)
- Move to verbs+core logic to single ib_isert.[c,h]  (Or + nab)
- Add kmem_cache isert_cmd_cache usage for descriptor allocation (nab)
- Move common ib_post_send() logic used by isert_put_*() to
  isert_post_response() (nab)
- Add isert_put_reject call in isert_response_queue() for posting
  ISCSI_REJECT response. (nab)
- Add ISTATE_SEND_REJECT checking in isert_do_control_comp. (nab)

v2 changes:

- Drop unused ISERT_ADDR_ROUTE_TIMEOUT define
- Add rdma_notify() call for IB_EVENT_COMM_EST in isert_qp_event_callback()
- Make isert_query_device() less verbose
- Drop unused RDMA_CM_EVENT_ADDR_ERROR and RDMA_CM_EVENT_ROUTE_ERROR
  cases from isert_cma_handler()
- Drop unused rdma/ib_fmr_pool.h include
- Update isert_conn_setup_qp() to assign cq based upon least used
- Add isert_create_device_ib_res() to setup PD, CQs and MRs for each
  underlying struct ib_device, instead of using per isert_conn resources.
- Add isert_free_device_ib_res() to release PD, CQs and MRs for each
  underlying struct ib_device.
- Add isert_device_find_by_ib_dev()
- Change isert_connect_request() to drop PD, CQs and MRs allocation,
  and use isert_device_find_by_ib_dev() instead.
- Add isert_device_try_release()
- Change isert_connect_release() to decrement cq_active_qps, and drop
  PD, CQs and MRs resource release.
- Update isert_connect_release() to call isert_device_try_release()
- Make isert_create_device_ib_res() determine device->cqs_used based
  upon num_online_cpus()
- Drop misleading isert_dump_ib_wc() usage
- Drop unused rdma/ib_fmr_pool.h include
- Use proper xfer_len for login PDUs in isert_rx_completion()
- Add isert_release_cmd() usage
- Change isert_alloc_cmd() to setup iscsi_cmd.release_cmd() pointer
- Change isert_put_cmd() to perform per iscsi_opcode specific release
  logic
- Add isert_unmap_cmd() call for ISCSI_OP_SCSI_CMD from isert_put_cmd()
- Change isert_send_completion() to call
  atomic_dec(&isert_conn->post_send_buf_count)
  based upon per iscsi_opcode logic
- Drop ISTATE_REMOVE processing from isert_immediate_queue()
- Drop ISTATE_SEND_DATAIN processing from isert_response_queue()
- Drop ISTATE_SEND_STATUS processing from isert_response_queue()
- Drop iscsit_transport->iscsit_unmap_cmd() and ->iscsit_free_cmd()
- Convert iser_cq_tx_tasklet() to use struct isert_cq_desc pooling logic
- Convert isert_cq_tx_callback() to use struct isert_cq_desc pooling
  logic
- Convert iser_cq_rx_tasklet() to use struct isert_cq_desc pooling logic
- Convert isert_cq_rx_callback() to use struct isert_cq_desc pooling
  logic
- Add explict iscsit_stop_dataout_timer() call to
  isert_do_rdma_read_comp()
- Use isert_get_dataout() for iscsit_transport->iscsit_get_dataout()
  caller
- Drop ISTATE_SEND_R2T processing from isert_immediate_queue()
- Drop unused rdma/ib_fmr_pool.h include
- Drop isert_cmd->cmd_kref in favor of se_cmd->cmd_kref usage
- Add struct isert_device in order to support multiple EQs + CQ pooling
- Add struct isert_cq_desc
- Drop tasklets and cqs from isert_conn
- Bump ISERT_MAX_CQ to 64
- Various minor checkpatch fixes

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-04-25 01:09:41 -07:00
Patrick McHardy
dc850b0e68 IPoIB: add support for TIPC protocol
Support TIPC in the IPoIB driver. Since IPoIB now keeps track of its own
neighbour entries and doesn't require the packet to have a dst_entry
anymore, the only necessary changes are to:

- not drop multicast TIPC packets because of the unknown ethernet type
- handle unicast TIPC packets similar to IPv4/IPv6 unicast packets

in ipoib_start_xmit().

An alternative would be to remove all ethertype limitations since they're
not necessary anymore, all TIPC needs to know about is ARP and RARP since
it wants to always perform "path find", even if a path is already known.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-17 14:18:33 -04:00
Grant Grundler
532ec6f1c0 SRPT: Fix odd use of WARN_ON()
While WARN_ON("const string") will work, it's intent is not obvious.
The warning is more useful by showing the "state" value.

Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-16 22:58:08 -07:00
Doug Ledford
83bdd3b96c IPoIB: Fix ipoib_hard_header() return value
If you have a patched up dhcp server (and dhclient), they will use
AF_PACKET/SOCK_DGRAM pair to send dhcp packets over IPoIB.

However, when testing an upstream kernel, this has been broken for a
very long time (I tested 2.6.34, 2.6.38, 3.0, 3.1, 3.8, HEAD).

It turns out that the hard_header routine in ipoib is not following
the API and is returning 0 even when it pushed data onto the skb.
This then causes af_packet.c to overwrite the header just pushed with
data from user space.

Fixing this gets DHCP working on IPoIB.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-16 22:57:09 -07:00
Akinobu Mita
cc529c0d72 RDMA: Rename random32() to prandom_u32()
Use more preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Steve Wise <swise@chelsio.com>
Cc: linux-rdma@vger.kernel.org
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-16 22:47:05 -07:00
Mike Marciniszyn
1ee9e2aa7b IPoIB: Fix send lockup due to missed TX completion
Commit f0dc117abd ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.

The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.

This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the
ib_req_notify_cq().  If the rc is above 0, the UD send cq completion
callback is called directly to re-arm the send completion timer.

This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-03-22 18:01:04 -07:00
Roland Dreier
ef4e359d9b Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next 2013-02-26 09:17:56 -08:00
Roland Dreier
f72dd56690 IPoIB: Free ipoib neigh on path record failure so path rec queries are retried
If IPoIB fails to look up a path record (eg if it tries during an SM
failover when one SM is dead but the new one hasn't taken over yet), the
driver ends up with a neighbour structure but no address handle (AH).
There's no mechanism to recover from this: any further packets sent to
this destination will be silently dumped in ipoib_start_xmit().

Fix this by freeing the neighbour structures when a path rec query
fails, so that the next packet queued to be sent will trigger a new path
record query.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:42:15 -08:00