2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-28 07:04:00 +08:00
Commit Graph

1260 Commits

Author SHA1 Message Date
Sagi Grimberg
6e6fe2fb1d IB/iser: Optimize completion polling
Poll in batch of 16. Since we don't want it on the stack, keep under
iser completion context (iser_comp).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:07 -07:00
Sagi Grimberg
ff3dd52d26 IB/iser: Use beacon to indicate all completions were consumed
Avoid post_send counting (atomic) in the IO path just to keep track of
how many completions we need to consume.  Use a beacon post to indicate
that all prior posts completed.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:07 -07:00
Sagi Grimberg
6aabfa76f5 IB/iser: Use single CQ for RX and TX
This will solve a possible condition where we might miss TX completion
(flush error) during session teardown.  Since we are using a single
CQ, we don't need to actively drain the TX CQ, instead just wait for
flush_completion (when counters reach zero) and remove iser_poll_for_flush_errors().

This patch might introduce a minor performance regression on its own,
but the next patches will enhance performance using a single CQ for RX
and TX.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:07 -07:00
Sagi Grimberg
183cfa434e IB/iser: Use internal polling budget to avoid possible live-lock
We need a way to guarentee that we don't stay in soft-IRQ context for
too long.  We might starve other pending CQ tasklets or worse lock
against application trying to issue IO on the running CPU.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg
bf17554035 IB/iser: Centralize iser completion contexts
Introduce iser_comp which centralizes all iser completion related
items and is referenced by iser_device and each ib_conn.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Ariel Nahum
aea8f4df6d IB/iser: Use iser_warn instead of BUG_ON in iser_conn_release
In case iscsid was violently killed (SIGKILL) during its error
recovery stage, we may never get a connection teardown sequence for
some of the old connections.  No harm done, but when we try to unload
the module we will need to cleanup all these connections.  So we
actually may end-up here - so it's not a BUG_ON(), just give a relaxed
warning that this happened and continue with normal unload.  BUG_ON()
will cause segfault on module_exit and we don't want that.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg
8c204e69ce IB/iser: Signal iSCSI layer that transport is broken in error completions
Previously we notified iscsi layer about the connection layer when
we consumed all of our flush errors. This was racy as there
was no guarentee that iscsi_conn wasn't terminated by then (which ends
up in an invalid memory access). In case we got a non FLUSH error
completion, we are guarenteed that iscsi_conn is still alive. We should
notify iSCSI layer with iscsi_conn_failure to initiate error handling.

While we are at it, add a nice kernel-doc style documentation.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg
3a940daf6f IB/iser: Protect tasks cleanup in case IB device was already released
Bailout in case a task cleanup (iscsi_iser_cleanup_task) is called
after the IB device was removed (DEVICE_REMOVAL CM event).  We also
call iscsi_conn_stop with a lock taken to prevent DEVICE_REMOVAL and
tasks cleanup from racing.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Ariel Nahum
ec370e2b63 IB/iser: Unbind at conn_stop stage
Previously we didn't need to unbind the iser_conn and iscsi_conn since
we always relied on iscsi daemon to teardown the connection and never
let it finish before we cleanup all that is needed in iser.  This is
not the case anymore (for DEVICE_REMOVAL event).  So avoid any possible
chance we cause iscsi_conn dereference after iscsi_conn was freed.

We also call iser_conn_terminate (safe to call multiple times) just
for the corner case of iscsi daemon stopping an old connection before
invoking endpoint removal (might happen if it was violently killed).

Notice we are unbinding under a lock - which is required.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg
c107a6c0cf IB/iser: Don't bound release_work completions timeouts
We no longer rely on iscsi connection teardown sequence, so no need to
give a grace period and continue cleanup if it expired. Have
iser_conn_release wait for full completion before freeing iser_conn.

ib_completion:
	Guaranteed to come when:
	    - Got DISCONNECTED/ADDR_CHANGE event or
	    - iSCSI called ep_disconnect/conn_stop
	Guaranteed to finish when:
	    - Got TIMEWAIT_EXIT/DEVICE_REMOVAL event
	    - All Flush errors are consumed
	    - IB related resources are destroyed

stop_completion:
	Guaranteed to come when:
	    - iSCSI calls conn_stop
	Guaranteed to finish when:
	    - All inflight tasks were cleaned up

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg
c47a3c9ed5 IB/iser: Fix DEVICE REMOVAL handling in the absence of iscsi daemon
iscsi daemon is in user-space, thus we can't rely on it to be invoked
at connection teardown (if not running or does not receive CPU time).

This patch addresses the issue by re-structuring iSER connection
teardown logic and CM events handling.

The CM events will dictate the RDMA resources destruction (ib_conn)
and iser_conn is kept around as long as iscsi_conn is left around
allowing iscsi/iser callbacks to continue after RDMA transport was
destroyed.

This patch introduces a separation in logic when handling CM events:

- DISCONNECTED_HANDLER, ADDR_CHANGED
  This events indicate the start of teardown process.
  Actions:
  1. Terminate the connection: rdma_disconnect (send DREQ/DREP)
  2. Notify iSCSI of connection failure
  3. Change state to TERMINATING
  4. Poll for all flush errors to be consumed

- TIMEWAIT_EXIT, DEVICE_REMOVAL
  These events indicate the final stage of termination process and
  we can free RDMA related resources.
  Actions:
  1. Call disconnected handler (we are not guaranteed that DISCONNECTED
     event was invoked in the past)
  2. Cleanup RDMA related resources
  3. For DEVICE_REMOVAL return non-zero rc from cma_handler to
     implicitly destroy the cm_id (Can't rely on user-space, make sure
     we have forward progress)

We replace flush_completion (indicate all flushes were consumed) with
ib_completion (rdma resources were cleaned up).

The iser_conn_release_work will wait for teardown completions:

- conn_stop was completed (tasks were cleaned-up) - stop_completion
- RDMA resources were destroyed - ib_completion

And then will continue to free iser connection representation (iser_conn).

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg
96f15198c1 IB/iser: Extend iser_free_ib_conn_res()
Put all connection IB related resources release in this routine.  One
exception is the cm_id which cannot be destroyed as the routine is
protected by the state mutex.  Also move its position to avoid forward
declaration.  While at it fix qp NULL assignment.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Roi Dayan
6bb0279f95 IB/iser: Remove unused variables and dead code
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg
a4ee3539f6 IB/iser: Re-introduce ib_conn
Structure that describes the RDMA relates connection objects.  Static
member of iser_conn.

This patch does not change any functionality

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Sagi Grimberg
5716af6e52 IB/iser: Rename ib_conn -> iser_conn
Two reasons why we choose to do this:

1. No point today calling struct iser_conn by another name ib_conn
2. In the next patches we will restructure iser control plane representation
   - struct iser_conn: connection logical representation
   - struct ib_conn: connection RDMA layout representation

This patch does not change any functionality.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-10-09 00:06:06 -07:00
Eric Dumazet
0287587884 net: better IFF_XMIT_DST_RELEASE support
Testing xmit_more support with netperf and connected UDP sockets,
I found strange dst refcount false sharing.

Current handling of IFF_XMIT_DST_RELEASE is not optimal.

Dropping dst in validate_xmit_skb() is certainly too late in case
packet was queued by cpu X but dequeued by cpu Y

The logical point to take care of drop/force is in __dev_queue_xmit()
before even taking qdisc lock.

As Julian Anastasov pointed out, need for skb_dst() might come from some
packet schedulers or classifiers.

This patch adds new helper to cleanly express needs of various drivers
or qdiscs/classifiers.

Drivers that need skb_dst() in their ndo_start_xmit() should call
following helper in their setup instead of the prior :

	dev->priv_flags &= ~IFF_XMIT_DST_RELEASE;
->
	netif_keep_dst(dev);

Instead of using a single bit, we use two bits, one being
eventually rebuilt in bonding/team drivers.

The other one, is permanent and blocks IFF_XMIT_DST_RELEASE being
rebuilt in bonding/team. Eventually, we could add something
smarter later.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Julian Anastasov <ja@ssi.bg>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-10-07 13:22:11 -04:00
Nicholas Bellinger
0d0f660d88 iser-target: Disable TX completion interrupt coalescing
This patch explicitly disables TX completion interrupt coalescing logic
in isert_put_response() and isert_put_datain() that was originally added
as an efficiency optimization in commit 95b60f07.

It has been reported that this change can trigger ABORT_TASK timeouts
under certain small block workloads, where disabling coalescing was
required for stability.  According to Sagi, this doesn't impact
overall performance, so go ahead and disable it for now.

Reported-by: Moussa Ba <moussaba@micron.com>
Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Cc: <stable@vger.kernel.org> # 3.13+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-10-05 02:27:05 -07:00
Sagi Grimberg
1acff63f6e iser-target: Fix smatch warning
Unused return value from down_interruptible

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-10-03 11:16:11 -07:00
Linus Torvalds
452b6361c4 Last late set of InfiniBand/RDMA fixes for 3.17:
- Fixes for the new memory region re-registration support
  - iSER initiator error path fixes
  - Grab bag of small fixes for the qib and ocrdma hardware drivers
  - Larger set of fixes for mlx4, especially in RoCE mode
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJUIexdAAoJEENa44ZhAt0hP10QAJztxlS2a8U3JCJzthwSYxlI
 ohT9487iLk1uEcj4Z3i7w2ERRUzXaHbRTktNHFjwfRb8x2qMUgT2PfD6/30sQ250
 nJAk3FRFNipxKkJSfmcc3+O4r91i4F+CaN8DGypaBDHcupeD2drKocl/Iu5MIvkG
 e5CzLlS7i/xrWKmgYP4bIqqFZsqQ+2rJrYBDybuLZSaZNd0PTDE3yCDihfOcsxjn
 TeOCVbm5895fPRtxzeCGHy8bXbYYN9vItuhtHC+sntYtbhNJhjpmP+1yD6M2SoZR
 34sGd7AA1j1H6ATmanzeW2aALkFYPIuGihDbbnRQlDG1v09lEPfP2GtfLxoQ9Ibo
 nfe2rsthzV6Qh2xcXjn6KicgV7bb6aSUXEK24zKx7O3MkOvHkOC/JIIrd9dFe+uj
 R7pUd3XlAk8SBhTQ4gLub06Dl7ynzSRArwcdMTHp30LvtnjJZoQR67WGGrsdwlIW
 MV43105i7iLCcdaSd0ihKnR6OFlSh13Z0wpu+B386bwxkHxjFJXkVHxOJir/iAk9
 cW4RXbA/ic7nwIjes4GbMNDOvdJO2tDcg9KGSgiDY3kC5GksPqfxXYVDlMB2rFoE
 PhfQ8TOcbZYTmlcKLMpMIFXP484VPhWQJeYWPOf9KGS6aW5QRNPsPCmAvaoSXWLs
 GVSlvjbE6O7MgonqG1Jh
 =Kpm1
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma fixes from Roland Dreier:
 "Last late set of InfiniBand/RDMA fixes for 3.17:

   - fixes for the new memory region re-registration support
   - iSER initiator error path fixes
   - grab bag of small fixes for the qib and ocrdma hardware drivers
   - larger set of fixes for mlx4, especially in RoCE mode"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
  IB/mlx4: Fix VF mac handling in RoCE
  IB/mlx4: Do not allow APM under RoCE
  IB/mlx4: Don't update QP1 in native mode
  IB/mlx4: Avoid accessing netdevice when building RoCE qp1 header
  mlx4: Fix mlx4 reg/unreg mac to work properly with 0-mac addresses
  IB/core: When marshaling uverbs path, clear unused fields
  IB/mlx4: Avoid executing gid task when device is being removed
  IB/mlx4: Fix lockdep splat for the iboe lock
  IB/mlx4: Get upper dev addresses as RoCE GIDs when port comes up
  IB/mlx4: Reorder steps in RoCE GID table initialization
  IB/mlx4: Don't duplicate the default RoCE GID
  IB/mlx4: Avoid null pointer dereference in mlx4_ib_scan_netdevs()
  IB/iser: Bump version to 1.4.1
  IB/iser: Allow bind only when connection state is UP
  IB/iser: Fix RX/TX CQ resource leak on error flow
  RDMA/ocrdma: Use right macro in query AH
  RDMA/ocrdma: Resolve L2 address when creating user AH
  mlx4: Correct error flows in rereg_mr
  IB/qib: Correct reference counting in debugfs qp_stats
  IPoIB: Remove unnecessary port query
  ...
2014-09-23 16:47:34 -07:00
Linus Torvalds
98f75b8291 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Pull networking fixes from David Miller:

 1) If the user gives us a msg_namelen of 0, don't try to interpret
    anything pointed to by msg_name.  From Ani Sinha.

 2) Fix some bnx2i/bnx2fc randconfig compilation errors.

    The gist of the issue is that we firstly have drivers that span both
    SCSI and networking.  And at the top of that chain of dependencies
    we have things like SCSI_FC_ATTRS and SCSI_NETLINK which are
    selected.

    But since select is a sledgehammer and ignores dependencies,
    everything to select's SCSI_FC_ATTRS and/or SCSI_NETLINK has to also
    explicitly select their dependencies and so on and so forth.

    Generally speaking 'select' is supposed to only be used for child
    nodes, those which have no dependencies of their own.  And this
    whole chain of dependencies in the scsi layer violates that rather
    strongly.

    So just make SCSI_NETLINK depend upon it's dependencies, and so on
    and so forth for the things selecting it (either directly or
    indirectly).

    From Anish Bhatt and Randy Dunlap.

 3) Fix generation of blackhole routes in IPSEC, from Steffen Klassert.

 4) Actually notice netdev feature changes in rtl_open() code, from
    Hayes Wang.

 5) Fix divide by zero in bond enslaving, from Nikolay Aleksandrov.

 6) Missing memory barrier in sunvnet driver, from David Stevens.

 7) Don't leave anycast addresses around when ipv6 interface is
    destroyed, from Sabrina Dubroca.

 8) Don't call efx_{arch}_filter_sync_rx_mode before addr_list_lock is
    initialized in SFC driver, from Edward Cree.

 9) Fix missing DMA error checking in 3c59x, from Neal Horman.

10) Openvswitch doesn't emit OVS_FLOW_CMD_NEW notifications accidently,
    fix from Samuel Gauthier.

11) pch_gbe needs to select NET_PTP_CLASSIFY otherwise we can get a
    build error.

12) Fix macvlan regression wherein we stopped emitting
    broadcast/multicast frames over software devices.  From Nicolas
    Dichtel.

13) Fix infiniband bug due to unintended overflow of skb->cb[], from
    Eric Dumazet.  And add an assertion so this doesn't happen again.

14) dm9000_parse_dt() should return error pointers, not NULL.  From
    Tobias Klauser.

15) IP tunneling code uses this_cpu_ptr() in preemptible contexts, fix
    from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (87 commits)
  net: bcmgenet: call bcmgenet_dma_teardown in bcmgenet_fini_dma
  net: bcmgenet: fix TX reclaim accounting for fragments
  ipv4: do not use this_cpu_ptr() in preemptible context
  dm9000: Return an ERR_PTR() in all error conditions of dm9000_parse_dt()
  r8169: fix an if condition
  r8152: disable ALDPS
  ipoib: validate struct ipoib_cb size
  net: sched: shrink struct qdisc_skb_cb to 28 bytes
  tg3: Work around HW/FW limitations with vlan encapsulated frames
  macvlan: allow to enqueue broadcast pkt on virtual device
  pch_gbe: 'select' NET_PTP_CLASSIFY.
  scsi: Use 'depends' with LIBFC instead of 'select'.
  openvswitch: restore OVS_FLOW_CMD_NEW notifications
  genetlink: add function genl_has_listeners()
  lib: rhashtable: remove second linux/log2.h inclusion
  net: allow macvlans to move to net namespace
  3c59x: Fix bad offset spec in skb_frag_dma_map
  3c59x: Add dma error checking and recovery
  sparc: bpf_jit: fix support for ldx/stx mem and SKF_AD_VLAN_TAG
  can: at91_can: add missing prepare and unprepare of the clock
  ...
2014-09-22 18:23:33 -07:00
Eric Dumazet
b49fe36208 ipoib: validate struct ipoib_cb size
To catch future errors sooner.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-09-22 14:29:55 -04:00
Roland Dreier
3bdad2d13f Merge branches 'core', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next 2014-09-22 10:05:40 -07:00
Or Gerlitz
61aabb3c91 IB/iser: Bump version to 1.4.1
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22 09:46:42 -07:00
Sagi Grimberg
91eb1df39a IB/iser: Allow bind only when connection state is UP
We need to fail the bind operation if the iser connection state != UP
(started teardown) and this should be done under the state lock.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22 09:46:42 -07:00
Roi Dayan
c33b15f00b IB/iser: Fix RX/TX CQ resource leak on error flow
When failing to allocate TX CQ we already allocated RX CQ, so we need to make
sure we release it. Also, when failing to register notification to the RX CQ
we currently leak both RX and TX CQs of the current index, fix that too.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-22 09:46:42 -07:00
Alex Estrin
68f9d83c77 IPoIB: Remove unnecessary port query
There are two queries for port attributes one after another. A second
call is not needed since port_attr structure already holds the data.

Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-09-19 10:17:40 -07:00
Sagi Grimberg
1a92e17e39 Target/iser: Fix initiator_depth and responder_resources
The iser target is the RDMA requester and the iser initiator is the
RDMA responder. In order to determine the max inflight RDMA READ requests
to set on the QP (initiator_depth), it should take the min between the
initiator published initiator_depth and the max inflight rdma read
requests its local HCA support (max_qp_init_rd_atom).

The target will never handle incoming RDMA READ requests so no need to
set responder_resources.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-09-15 14:02:55 -07:00
Sagi Grimberg
38a8316b5d Target/iser: Avoid calling rdma_disconnect twice
rdma_disconnect may be called in 2 code flows:
- isert_wait_conn: disconnect initiated be the target
- disconnected_handler: disconnect invoked by the initiator

In case isert_conn->disconnect is true then rdma_disconnect
was called in disconnected handler, no need to call it again
from isert_wait_conn.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-09-15 14:02:30 -07:00
Sagi Grimberg
0fc4ea701f Target/iser: Don't put isert_conn inside disconnected handler
disconnected_handler is invoked on several CM events (such
as DISCONNECTED, DEVICE_REMOVAL, TIMEWAIT_EXIT...). Since
multiple  events can occur while before isert_free_conn is
invoked, we might put all isert_conn references and free
the connection too early.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-09-15 14:02:16 -07:00
Sagi Grimberg
c2f88b17a1 Target/iser: Get isert_conn reference once got to connected_handler
In case the connection didn't reach connected state, disconnected
handler will never be invoked thus the second kref_put on
isert_conn will be missing.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-09-15 14:02:02 -07:00
Linus Torvalds
e3b1fd56f1 Main set of InfiniBand/RDMA updates for 3.17 merge window:
- MR reregistration support
  - MAD support for RMPP in userspace
  - iSER and SRP initiator updates
  - ocrdma hardware driver updates
  - other fixes...
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJT7N2GAAoJEENa44ZhAt0hUiIQAKBqYIjpB3QY6Z/B19mxDxku
 I81B3OkirumbAaCLoLckvq4gnwQ+BAD+YUmXgP08TCIgrABZAYYw+WvMEY9WNyQB
 x3Pv+BzX+wKKNaQkSnB9JVdku+BSI76eW0YYrIX0F0x1o0Jq9JpSkia91KmRvqmX
 YSFy2R7BjEZ4lfo/uscydHT26Q6EdT3od4iv48K8qq5rKdjtyYNgD/75DLCN599Z
 uI1f98e6Tl+7nHWaioQB61zlYPkNPLAnZtMrY2j4tarTwYwX1KhF3eV7z39L1l81
 nhMIXr+qBXtYuZw3I9rKqw3VVCCqB9e6E8FIA5K/d2jWqO+0TqIMUYOuZXuCezWg
 o+uGgwbDOweBqwrKRmiR2M0lk2I1Z16jBxYuaUBbLImG0/NPtuUB22t6RbPAojTa
 EjDkb9XBA7uFUMrYYnou+HxEzmJUYkin6wgGxtklYEKUqvh8G9ccGt6httWrCSrV
 mpjwJv+S4LdFM49wdP993lYthpCoZ42yxEzZ7zJ/KTt17/Wb5F4RtHIROGVFkHdT
 mo8RUz5DGDznfNQgn0m/jC3woFnMNpLOduI+CivhrwrWCwMpSUxIjqWu3263pIJ7
 +H0kNOKDAbp6+F27j+AVznlyXnaEtYyM8EZnysG1Hkz24gCSWBYo5Ep8eSH8LgY4
 9VPc75KVG6uddx5mhg5h
 =wOm0
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma updates from Roland Dreier:
 "Main set of InfiniBand/RDMA updates for 3.17 merge window:

   - MR reregistration support
   - MAD support for RMPP in userspace
   - iSER and SRP initiator updates
   - ocrdma hardware driver updates
   - other fixes..."

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (52 commits)
  IB/srp: Fix return value check in srp_init_module()
  RDMA/ocrdma: report asic-id in query device
  RDMA/ocrdma: Update sli data structure for endianness
  RDMA/ocrdma: Obtain SL from device structure
  RDMA/uapi: Include socket.h in rdma_user_cm.h
  IB/srpt: Handle GID change events
  IB/mlx5: Use ARRAY_SIZE instead of sizeof/sizeof[0]
  IB/mlx4: Use ARRAY_SIZE instead of sizeof/sizeof[0]
  RDMA/amso1100: Check for integer overflow in c2_alloc_cq_buf()
  IPoIB: Remove unnecessary test for NULL before debugfs_remove()
  IB/mad: Add user space RMPP support
  IB/mad: add new ioctl to ABI to support new registration options
  IB/mad: Add dev_notice messages for various umad/mad registration failures
  IB/mad: Update module to [pr|dev]_* style print messages
  IB/ipoib: Avoid multicast join attempts with invalid P_key
  IB/umad: Update module to [pr|dev]_* style print messages
  IB/ipoib: Avoid flushing the workqueue from worker context
  IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
  IB/ipath: Add P_Key change event support
  mlx4_core: Add support for secure-host and SMP firewall
  ...
2014-08-14 11:09:05 -06:00
Roland Dreier
d087f6ad72 Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'iwcm', 'mad', 'misc', 'mlx4', 'mlx5', 'ocrdma' and 'srp' into for-next 2014-08-14 08:58:04 -07:00
Wei Yongjun
da05be290f IB/srp: Fix return value check in srp_init_module()
In case of error, the function create_workqueue() returns NULL pointer
not ERR_PTR().  The IS_ERR() test in the return value check should be
replaced with NULL test.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Acked-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-14 08:57:54 -07:00
Doug Ledford
2aa1cf64aa IB/srpt: Handle GID change events
GID change events need a refresh just like LID change events and several
others.  Handle this the same as the others.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-12 22:01:55 -07:00
Fabian Frederick
e42fa2092c IPoIB: Remove unnecessary test for NULL before debugfs_remove()
Fix checkpatch warning:

    WARNING: debugfs_remove(NULL) is safe this check is probably not required

Signed-off-by: Fabian Frederick <fabf@skynet.be>
Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-12 21:59:54 -07:00
Ira Weiny
0f29b46d49 IB/mad: add new ioctl to ABI to support new registration options
Registrations options are specified through flags.  Definitions of flags will
be in subsequent patches.

Signed-off-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-10 20:36:00 -07:00
Alex Estrin
dd57c9308a IB/ipoib: Avoid multicast join attempts with invalid P_key
Currently, the parent interface keeps sending broadcast group join
requests even if p_key index 0 is invalid, which is possible/common in
virtualized environments where a VF has been probed to VM but the
actual P_key configuration has not yet been assigned by the management
software. This creates unnecessary noise on the fabric and in the
kernel logs:

    ib0: multicast join failed for ff12:401b:8000:0000:0000:0000:ffff:ffff, status -22

The original code run the multicast task regardless of the actual
P_key value, which can be avoided. The fix is to re-init resources and
bring interface up only if P_key index 0 is valid either when starting
up or on PKEY_CHANGE event.

Fixes: c290414169 ("IPoIB: Fix pkey change flow for virtualization environments")
Reviewed-by: Ira Weiny <ira.weiny@intel.com>
Signed-off-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-10 19:53:56 -07:00
Erez Shitrit
4eae374845 IB/ipoib: Avoid flushing the workqueue from worker context
The error flow of ipoib_ib_dev_open() invokes ipoib_ib_dev_stop() with
workqueue flushing enabled, which deadlocks if the open procedure
itself was called by a worker thread.

Fix this by adding a flush enabled flag to ipoib_ib_dev_open() and set
it accordingly from the locations where such a call is made.

The call trace was the following:

 [<ffffffff81095bc4>] ? flush_workqueue+0x54/0x80
 [<ffffffffa056c657>] ? ipoib_ib_dev_stop+0x447/0x650 [ib_ipoib]
 [<ffffffffa056cc34>] ? ipoib_ib_dev_open+0x284/0x430 [ib_ipoib]
 [<ffffffffa05674a8>] ? ipoib_open+0x78/0x1d0 [ib_ipoib]
 [<ffffffffa05697b8>] ? ipoib_pkey_open+0x38/0x40 [ib_ipoib]
 [<ffffffffa056cf3c>] ? __ipoib_ib_dev_flush+0x15c/0x2c0 [ib_ipoib]
 [<ffffffffa056ce56>] ? __ipoib_ib_dev_flush+0x76/0x2c0 [ib_ipoib]
 [<ffffffffa056d0a0>] ? ipoib_ib_dev_flush_heavy+0x0/0x20 [ib_ipoib]
 [<ffffffffa056d0ba>] ? ipoib_ib_dev_flush_heavy+0x1a/0x20 [ib_ipoib]
 [<ffffffff81094d20>] ? worker_thread+0x170/0x2a0
 [<ffffffff8109b2a0>] ? autoremove_wake_function+0x0/0x40

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-05 07:47:33 -07:00
Erez Shitrit
db84f88037 IB/ipoib: Use P_Key change event instead of P_Key polling mechanism
The current code use a dedicated polling logic to determine when the P_Key
assigned to the ipoib device is present in HCA port table and act accordingly.

Move to use the code which acts upon getting PKEY_CHANGE event to handle this
task and remove the P_Key polling logic/thread as they add extra complexity
which isn't needed.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Acked-by: Alex Estrin <alex.estrin@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-05 07:47:33 -07:00
Bart Van Assche
e714531a34 IB/srp: Fix residual handling
From Documentation/scsi/scsi_mid_low_api.txt: "resid - an LLD should
set this signed integer to the requested transfer length (i.e.
'request_bufflen') less the number of bytes that are actually
transferred."  This means that resid > 0 in case of an underrun and
also that resid < 0 in case of an overrun.  Modify the SRP initiator
code such that it matches this requirement.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: David Dillow <dave@thedillows.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01 15:21:51 -07:00
Bart Van Assche
bcc0591035 IB/srp: Fix deadlock between host removal and multipathd
If scsi_remove_host() is invoked after a SCSI device has been blocked,
if the fast_io_fail_tmo or dev_loss_tmo work gets scheduled on the
workqueue executing srp_remove_work() and if an I/O request is
scheduled after the SCSI device had been blocked by e.g. multipathd
then the following deadlock can occur:

    kworker/6:1     D ffff880831f3c460     0   195      2 0x00000000
    Call Trace:
     [<ffffffff814aafd9>] schedule+0x29/0x70
     [<ffffffff814aa0ef>] schedule_timeout+0x10f/0x2a0
     [<ffffffff8105af6f>] msleep+0x2f/0x40
     [<ffffffff8123b0ae>] __blk_drain_queue+0x4e/0x180
     [<ffffffff8123d2d5>] blk_cleanup_queue+0x225/0x230
     [<ffffffffa0010732>] __scsi_remove_device+0x62/0xe0 [scsi_mod]
     [<ffffffffa000ed2f>] scsi_forget_host+0x6f/0x80 [scsi_mod]
     [<ffffffffa0002eba>] scsi_remove_host+0x7a/0x130 [scsi_mod]
     [<ffffffffa07cf5c5>] srp_remove_work+0x95/0x180 [ib_srp]
     [<ffffffff8106d7aa>] process_one_work+0x1ea/0x6c0
     [<ffffffff8106dd9b>] worker_thread+0x11b/0x3a0
     [<ffffffff810758bd>] kthread+0xed/0x110
     [<ffffffff814b972c>] ret_from_fork+0x7c/0xb0
    multipathd      D ffff880096acc460     0  5340      1 0x00000000
    Call Trace:
     [<ffffffff814aafd9>] schedule+0x29/0x70
     [<ffffffff814aa0ef>] schedule_timeout+0x10f/0x2a0
     [<ffffffff814ab79b>] io_schedule_timeout+0x9b/0xf0
     [<ffffffff814abe1c>] wait_for_completion_io_timeout+0xdc/0x110
     [<ffffffff81244b9b>] blk_execute_rq+0x9b/0x100
     [<ffffffff8124f665>] sg_io+0x1a5/0x450
     [<ffffffff8124fd21>] scsi_cmd_ioctl+0x2a1/0x430
     [<ffffffff8124fef2>] scsi_cmd_blk_ioctl+0x42/0x50
     [<ffffffffa00ec97e>] sd_ioctl+0xbe/0x140 [sd_mod]
     [<ffffffff8124bd04>] blkdev_ioctl+0x234/0x840
     [<ffffffff811cb491>] block_ioctl+0x41/0x50
     [<ffffffff811a0df0>] do_vfs_ioctl+0x300/0x520
     [<ffffffff811a1051>] SyS_ioctl+0x41/0x80
     [<ffffffff814b9962>] tracesys+0xd0/0xd5

Fix this by scheduling removal work on another workqueue than the
transport layer timers.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: David Dillow <dave@thedillows.org>
Cc: Sebastian Parschauer <sebastian.riemer@profitbricks.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01 15:21:51 -07:00
Roi Dayan
8d4aca7f04 IB/iser: Clarify a duplicate counters check
This is to prevent someone from thinking that this code section is
redundant.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01 15:10:05 -07:00
Ariel Nahum
9a6d3234a1 IB/iser: Replace connection waitqueue with completion object
Instead of waiting for events and condition changes of the iser
connection state, we wait for explicit completion of connection
establishment and teardown.

Separate connection establishment wait object from the teardown object
to avoid a situation where racing connection establishment and
teardown may concurrently wakeup each other.

ep_poll will wait for up_completion invoked by
iser_connected_handler() and iser release worker will wait for
flush_completion before releasing the connection.

Bound the completion wait with a 30 seconds timeout for cases where
iscsid (the user space iscsi daemon) is too slow or gone.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01 15:10:05 -07:00
Ariel Nahum
504130c039 IB/iser: Protect iser state machine with a mutex
The iser connection state lookups and transitions are not fully protected.

Some transitions are protected with a spinlock, and in some cases the
state is accessed unprotected due to specific assumptions of the flow.

Introduce a new mutex to protect the connection state access. We use a
mutex since we need to also include a scheduling operations executed
under the state lock.

Each state transition/condition and its corresponding action will be
protected with the state mutex.

The rdma_cm events handler acquires the mutex when handling connection
events. Since iser connection state can transition to DOWN
concurrently during connection establishment, we bailout from
addr/route resolution events when the state is not PENDING.

This addresses a scenario where ep_poll retries expire during CMA
connection establishment. In this case ep_disconnect is invoked while
CMA events keep coming (address/route resolution, connected, etc...).

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01 15:10:05 -07:00
Sagi Grimberg
f1a8bf0983 IB/iser: Remove redundant return code in iser_free_ib_conn_res()
Make it void.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01 15:10:04 -07:00
Ariel Nahum
0a6907588a IB/iser: Seperate iser_conn and iscsi_endpoint storage space
iser connection needs asynchronous cleanup completions which are
triggered in ep_disconnect.  As a result we are keeping the
corresponding iscsi_endpoint structure hanging for no good reason. In
order to avoid that, we seperate iser_conn from iscsi_endpoint storage
space to have their destruction being independent.

iscsi_endpoint will be destroyed at ep_disconnect stage, while the
iser connection will wait for asynchronous completions to be released
in an orderly fashion.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01 15:10:04 -07:00
Sagi Grimberg
2ea32938f3 IB/iser: Fix responder resources advertisement
The iser initiator is the RDMA responder so it should publish to the
target the max inflight rdma read requests its local HCA can handle in
responder_resources (max_qp_rd_atom).

The iser target should take the min of that and its local HCA max
inflight oustanding rdma read requests (max_qp_init_rd_atom).

We keep initiator_depth set to 1 in order to compat with old targets.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01 15:10:04 -07:00
Roi Dayan
9579d60350 IB/iser: Add TIMEWAIT_EXIT event handling
In case the DISCONNECTED event is not delivered after rdma_disconnect
is called, the CM waits TIMEWAIT seconds and delivers the
TIMEWAIT_EXIT local event. We use this as the notification needed to
continue in the teardown and release sequence.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01 15:10:04 -07:00
Roi Dayan
96ed02d4be IB/iser: Support IPv6 address family
Replace struct sockaddr_in with struct sockaddr which supports both
IPv4 and IPv6, and print using the %pIS format directive.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-08-01 15:10:04 -07:00
Tom Gundersen
c835a67733 net: set name_assign_type in alloc_netdev()
Extend alloc_netdev{,_mq{,s}}() to take name_assign_type as argument, and convert
all users to pass NET_NAME_UNKNOWN.

Coccinelle patch:

@@
expression sizeof_priv, name, setup, txqs, rxqs, count;
@@

(
-alloc_netdev_mqs(sizeof_priv, name, setup, txqs, rxqs)
+alloc_netdev_mqs(sizeof_priv, name, NET_NAME_UNKNOWN, setup, txqs, rxqs)
|
-alloc_netdev_mq(sizeof_priv, name, setup, count)
+alloc_netdev_mq(sizeof_priv, name, NET_NAME_UNKNOWN, setup, count)
|
-alloc_netdev(sizeof_priv, name, setup)
+alloc_netdev(sizeof_priv, name, NET_NAME_UNKNOWN, setup)
)

v9: move comments here from the wrong commit

Signed-off-by: Tom Gundersen <teg@jklm.no>
Reviewed-by: David Herrmann <dh.herrmann@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-07-15 16:12:48 -07:00
Linus Torvalds
ed9ea4ed3a Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "The highlights this round include:

   - Add support for T10 PI pass-through between vhost-scsi +
     virtio-scsi (MST + Paolo + MKP + nab)
   - Add support for T10 PI in qla2xxx target mode (Quinn + MKP + hch +
     nab, merged through scsi.git)
   - Add support for percpu-ida pre-allocation in qla2xxx target code
     (Quinn + nab)
   - A number of iser-target fixes related to hardening the network
     portal shutdown path (Sagi + Slava)
   - Fix response length residual handling for a number of control CDBs
     (Roland + Christophe V.)
   - Various iscsi RFC conformance fixes in the CHAP authentication path
     (Tejas and Calsoft folks + nab)
   - Return TASK_SET_FULL status for tcm_fc(FCoE) DataIn + Response
     failures (Vasu + Jun + nab)
   - Fix long-standing ABORT_TASK + session reset hang (nab)
   - Convert iser-initiator + iser-target to include T10 bytes into EDTL
     (Sagi + Or + MKP + Mike Christie)
   - Fix NULL pointer dereference regression related to XCOPY introduced
     in v3.15 + CC'ed to v3.12.y (nab)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (34 commits)
  target: Fix NULL pointer dereference for XCOPY in target_put_sess_cmd
  vhost-scsi: Include prot_bytes into expected data transfer length
  TARGET/sbc,loopback: Adjust command data length in case pi exists on the wire
  libiscsi, iser: Adjust data_length to include protection information
  scsi_cmnd: Introduce scsi_transfer_length helper
  target: Report correct response length for some commands
  target/sbc: Check that the LBA and number of blocks are correct in VERIFY
  target/sbc: Remove sbc_check_valid_sectors()
  Target/iscsi: Fix sendtargets response pdu for iser transport
  Target/iser: Fix a wrong dereference in case discovery session is over iser
  iscsi-target: Fix ABORT_TASK + connection reset iscsi_queue_req memory leak
  target: Use complete_all for se_cmd->t_transport_stop_comp
  target: Set CMD_T_ACTIVE bit for Task Management Requests
  target: cleanup some boolean tests
  target/spc: Simplify INQUIRY EVPD=0x80
  tcm_fc: Generate TASK_SET_FULL status for response failures
  tcm_fc: Generate TASK_SET_FULL status for DataIN failures
  iscsi-target: Reject mutual authentication with reflected CHAP_C
  iscsi-target: Remove no-op from iscsit_tpg_del_portal_group
  iscsi-target: Fix CHAP_A parameter list handling
  ...
2014-06-12 22:38:32 -07:00
Linus Torvalds
f9da455b93 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:

 1) Seccomp BPF filters can now be JIT'd, from Alexei Starovoitov.

 2) Multiqueue support in xen-netback and xen-netfront, from Andrew J
    Benniston.

 3) Allow tweaking of aggregation settings in cdc_ncm driver, from Bjørn
    Mork.

 4) BPF now has a "random" opcode, from Chema Gonzalez.

 5) Add more BPF documentation and improve test framework, from Daniel
    Borkmann.

 6) Support TCP fastopen over ipv6, from Daniel Lee.

 7) Add software TSO helper functions and use them to support software
    TSO in mvneta and mv643xx_eth drivers.  From Ezequiel Garcia.

 8) Support software TSO in fec driver too, from Nimrod Andy.

 9) Add Broadcom SYSTEMPORT driver, from Florian Fainelli.

10) Handle broadcasts more gracefully over macvlan when there are large
    numbers of interfaces configured, from Herbert Xu.

11) Allow more control over fwmark used for non-socket based responses,
    from Lorenzo Colitti.

12) Do TCP congestion window limiting based upon measurements, from Neal
    Cardwell.

13) Support busy polling in SCTP, from Neal Horman.

14) Allow RSS key to be configured via ethtool, from Venkata Duvvuru.

15) Bridge promisc mode handling improvements from Vlad Yasevich.

16) Don't use inetpeer entries to implement ID generation any more, it
    performs poorly, from Eric Dumazet.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1522 commits)
  rtnetlink: fix userspace API breakage for iproute2 < v3.9.0
  tcp: fixing TLP's FIN recovery
  net: fec: Add software TSO support
  net: fec: Add Scatter/gather support
  net: fec: Increase buffer descriptor entry number
  net: fec: Factorize feature setting
  net: fec: Enable IP header hardware checksum
  net: fec: Factorize the .xmit transmit function
  bridge: fix compile error when compiling without IPv6 support
  bridge: fix smatch warning / potential null pointer dereference
  via-rhine: fix full-duplex with autoneg disable
  bnx2x: Enlarge the dorq threshold for VFs
  bnx2x: Check for UNDI in uncommon branch
  bnx2x: Fix 1G-baseT link
  bnx2x: Fix link for KR with swapped polarity lane
  sctp: Fix sk_ack_backlog wrap-around problem
  net/core: Add VF link state control policy
  net/fsl: xgmac_mdio is dependent on OF_MDIO
  net/fsl: Make xgmac_mdio read error message useful
  net_sched: drr: warn when qdisc is not work conserving
  ...
2014-06-12 14:27:40 -07:00
Sagi Grimberg
d77e65350f libiscsi, iser: Adjust data_length to include protection information
In case protection information exists over the wire
iscsi header data length is required to include it.
Use protection information aware scsi helpers to set
the correct transfer length.

In order to avoid breakage, remove iser transfer length
checks for each task as they are not always true and
somewhat redundant anyway.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-11 13:06:45 -07:00
Sagi Grimberg
22c7aaa57e Target/iscsi: Fix sendtargets response pdu for iser transport
In case the transport is iser we should not include the
iscsi target info in the sendtargets text response pdu.
This causes sendtargets response to include the target
info twice.

Modify iscsit_build_sendtargets_response to filter
transport types that don't match.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Cc: stable@vger.kernel.org # 3.11+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-11 11:52:39 -07:00
Sagi Grimberg
e0546fc1ba Target/iser: Fix a wrong dereference in case discovery session is over iser
In case the discovery session is carried over iser, we can't
access the assumed network portal since the default portal is
used. In this case we don't really need to allocate the fastreg
pool, just prepare to the text pdu that will follow.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reported-by: Alex Tabachnik <alext@mellanox.com>
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-11 11:52:39 -07:00
Linus Torvalds
1d21b1bf53 Main batch of InfiniBand/RDMA changes for 3.16:
- Add iWARP port mapper to avoid conflicts between RDMA and normal
    stack TCP connections.
 
  - Fixes for i386 / x86-64 structure padding differences (ABI
    compatibility for 32-on-64) from Yann Droneaud.
 
  - A pile of SRP initiator fixes from Bart Van Assche.
 
  - Fixes for a writeback / memory allocation deadlock with NFS over
    IPoIB connected mode from Jiri Kosina.
 
  - The usual fixes and cleanups to mlx4, mlx5, cxgb4 and other
    low-level drivers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJTlzyEAAoJEENa44ZhAt0h9yoP/1UeXlejOpCJyiNdtJZ+ilcU
 cb0PEzsjzqACyDqcoQ0EpQM3/3emccVIC3uUXK12mzlTIXOFYTeRLays/TbxZDLt
 FK5D/NrMmmJmciPt1ZRgUX82kFFRGScEfpkXYs7jxtRaNT7CW5KwSNQr6aFXskUz
 1gpdK1ARCN5rWcGl2HJx5o9C4c/Fa/Vov8lOsAkUZXD1SuPNT/fFN0u1pRzU68g0
 k3oj81XnZq5ejOBQKXEHImcmjXwaJ2yjmzxhSsKebqDWDdXuS/F9e4taKneHTZmr
 AdwJaLLJPWmAGi/vYYhkuLKpzIDpzMCqwr39lEabmjWvznYOlnjfVUXwUTE2nwNC
 DIXuHOLFrSvF2cNxh8ZeEYKS8AV+PjAOahPC5whkWkY256Q67uB7cy9ilWAK+7xS
 QcQ5Inr6iXvxIGYA4hNwUo8aK0NuKFwhkVVFEbkPaurbQZPqiKwyVE3w2FOws/Qp
 0kLLCVvpRQYjKzkxyof2tb1AcNuVNKXHrYk6RaBDJ9mjxHbhvY4OSt4CBxAAXBu6
 zoedUydN1Nz1UgAB1jDsBdyE2QQnXockA1+JJKNq6gM5Dz0DUdAylzQ2NqY9tnYz
 RTzihEPYIiQUkV3B8ErbqsuO6z7M830AXO5AR6bLZn1zgJ0cbMLBaKLA8LRufJI/
 qxNVwL32Uv1PjKZ+yX1x
 =Wcdc
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull main InfiniBand/RDMA updates from Roland Dreier:

 - add iWARP port mapper to avoid conflicts between RDMA and normal
   stack TCP connections.

 - fixes for i386 / x86-64 structure padding differences (ABI
   compatibility for 32-on-64) from Yann Droneaud.

 - a pile of SRP initiator fixes from Bart Van Assche.

 - fixes for a writeback / memory allocation deadlock with NFS over
   IPoIB connected mode from Jiri Kosina.

 - the usual fixes and cleanups to mlx4, mlx5, cxgb4 and other low-level
   drivers.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (61 commits)
  RDMA/cxgb4: Add support for iWARP Port Mapper user space service
  RDMA/nes: Add support for iWARP Port Mapper user space service
  RDMA/core: Add support for iWARP Port Mapper user space service
  IB/mlx4: Fix gfp passing in create_qp_common()
  IB/umad: Fix use-after-free on close
  IB/core: Fix kobject leak on device register error flow
  RDMA/cxgb4: add missing padding at end of struct c4iw_alloc_ucontext_resp
  mlx4_core: Fix GFP flags parameters to be gfp_t
  IB/core: Fix port kobject deletion during error flow
  IB/core: Remove unneeded kobject_get/put calls
  IB/core: Fix sparse warnings about redeclared functions
  IB/mad: Fix sparse warning about gfp_t use
  IB/mlx4: Implement IB_QP_CREATE_USE_GFP_NOIO
  IB: Add a QP creation flag to use GFP_NOIO allocations
  IB: Return error for unsupported QP creation flags
  IB: Allow build of hw/ and ulp/ subdirectories independently
  mlx4_core: Move handling of MLX4_QP_ST_MLX to proper switch statement
  RDMA/cxgb4: Add missing padding at end of struct c4iw_create_cq_resp
  IB/srp: Avoid problems if a header uses pr_fmt
  IB/umad: Fix error handling
  ...
2014-06-10 10:41:33 -07:00
Roland Dreier
eeaddf3670 Merge branches 'core', 'cxgb3', 'cxgb4', 'iser', 'iwpm', 'misc', 'mlx4', 'mlx5', 'noio', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next 2014-06-10 10:12:14 -07:00
Nicholas Bellinger
6cc44a6fb4 iser-target: Add missing target_put_sess_cmd for ImmedateData failure
This patch addresses a bug where an early exception for SCSI WRITE
with ImmediateData=Yes was missing the target_put_sess_cmd() call
to drop the extra se_cmd->cmd_kref reference obtained during the
normal iscsit_setup_scsi_cmd() codepath execution.

This bug was manifesting itself during session shutdown within
isert_cq_rx_comp_err() where target_wait_for_sess_cmds() would
end up waiting indefinately for the last se_cmd->cmd_kref put to
occur for the failed SCSI WRITE + ImmediateData descriptors.

This fix follows what traditional iscsi-target code already does
for the same failure case within iscsit_get_immediate_data().

Reported-by: Sagi Grimberg <sagig@dev.mellanox.co.il>
Cc: Sagi Grimberg <sagig@dev.mellanox.co.il>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-06-03 19:17:31 -07:00
Or Gerlitz
09b93088d7 IB: Add a QP creation flag to use GFP_NOIO allocations
This addresses a problem where NFS client writes over IPoIB connected
mode may deadlock on memory allocation/writeback.

The problem is not directly memory reclamation.  There is an indirect
dependency between network filesystems writing back pages and
ipoib_cm_tx_init() due to how a kworker is used.  Page reclaim cannot
make forward progress until ipoib_cm_tx_init() succeeds and it is
stuck in page reclaim itself waiting for network transmission.
Ordinarily this situation may be avoided by having the caller use
GFP_NOFS but ipoib_cm_tx_init() does not have that information.

To address this, take a general approach and add a new QP creation
flag that tells the low-level hardware driver to use GFP_NOIO for the
memory allocations related to the new QP.

Use the new flag in the ipoib connected mode path, and if the driver
doesn't support it, re-issue the QP creation without the flag.

Signed-off-by: Mel Gorman <mgorman@suse.de>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02 14:58:11 -07:00
Yann Droneaud
729ee4efcc IB: Allow build of hw/ and ulp/ subdirectories independently
It is not possible to build only the drivers/infiniband/hw/ (or ulp/)
subdirectory with command such as:

    $ make ARCH=x86_64 O=./obj-x86_64/ drivers/infiniband/hw/

This fails with following error messages:

    make[2]: Nothing to be done for `all'.
    make[2]: Nothing to be done for `relocs'.
      CHK     include/config/kernel.release
      Using /home/ydroneaud/src/linux as source for kernel
      GEN     /home/ydroneaud/src/linux/obj-x86_64/Makefile
      CHK     include/generated/uapi/linux/version.h
      CHK     include/generated/utsrelease.h
      CALL    /home/ydroneaud/src/linux/scripts/checksyscalls.sh
    /home/ydroneaud/src/linux/scripts/Makefile.build:44: /home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile: No such file or directory
    make[2]: *** No rule to make target `/home/ydroneaud/src/linux/drivers/infiniband/hw/Makefile'.  Stop.
    make[1]: *** [drivers/infiniband/hw/] Error 2
    make: *** [sub-make] Error 2

This patch creates a Makefile in hw/ and ulp/ and moves each
corresponding parts of drivers/infiniband/Makefile in the new
Makefiles.

It should not break build except if some hw/ drivers or ulp/ were
allowed previously to be built while CONFIG_INFINIBAND is set to 'n',
but according to drivers/infiniband/Kconfig, it's not possible. So it
should be safe to apply.

Signed-off-by: Yann Droneaud <ydroneaud@opteya.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-06-02 14:51:12 -07:00
Joe Perches
d236cd0e20 IB/srp: Avoid problems if a header uses pr_fmt
SRP defines pr_fmt(fmt) to be "PFX fmt", and then includes a bunch of
header files before it gets around to defining PFX.  This causes
problems if any of the header files do a pr_... and use pr_fmt().

Fix this by using KBUILD_MODNAME instead of the private PFX.

Acked-by: Chris Metcalf <cmetcalf@tilera.com>

Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-29 21:32:54 -07:00
Or Gerlitz
c7ca4b69d9 IB/iser: Bump version to 1.4
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-26 08:19:48 -07:00
Roi Dayan
e7eeffa4a0 IB/iser: Add missing newlines to logging messages
Logging messages need terminating newlines to avoid possible message
interleaving.  Add them.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-26 08:19:48 -07:00
Ariel Nahum
66d4e62d27 IB/iser: Fix a possible race in iser connection states transition
In some circumstances (multiple targets), RDMA_CM ESTABLISHED event
and ep_disconnect may race. In this case, the iser connection state
may transition to UP (after ep_disconnect transitioned it to
TERMINATING), while the connection is being torn down.

Upon RDMA_CM event ESTABLISHED we allow iser connection state to
transition to UP only from PENDING. We also make sure to protect this
state change (done under the connection lock).

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-26 08:19:48 -07:00
Ariel Nahum
b73c3adabd IB/iser: Simplify connection management
iSER relies on refcounting to manage iser connections establishment
and teardown.

Following commit 39ff05dbbb ("IB/iser: Enhance disconnection logic
for multi-pathing"), iser connection maintain 3 references:

 - iscsi_endpoint (at creation stage)
 - cma_id (at connection request stage)
 - iscsi_conn (at bind stage)

We can avoid taking explicit refcounts by correctly serializing iser
teardown flows (graceful and non-graceful).

Our approach is to trigger a scheduled work to handle ordered teardown
by gracefully waiting for 2 cleanup stages to complete:

 1. Cleanup of live pending tasks indicated by iscsi_conn_stop completion
 2. Flush errors processing

Each completed stage will notify a waiting worker thread when it is
done to allow teardwon continuation.

Since iSCSI connection establishment may trigger endpoint disconnect
without a successful endpoint connect, we rely on the iscsi <-> iser
binding (.conn_bind) to learn about the teardown policy we should take
wrt cleanup stages.

Since all cleanup worker threads are scheduled (release_wq) in
.ep_disconnect it is safe to assume that when module_exit is called,
all cleanup workers are already scheduled. Thus proper module unload
shall flush all scheduled works before allowing safe exit, to
guarantee no resources got left behind.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reviewed-by: Roi Dayan <roid@mellanox.com>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-26 08:19:48 -07:00
David S. Miller
54e5c4def0 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/bonding/bond_alb.c
	drivers/net/ethernet/altera/altera_msgdma.c
	drivers/net/ethernet/altera/altera_sgdma.c
	net/ipv6/xfrm6_output.c

Several cases of overlapping changes.

The xfrm6_output.c has a bug fix which overlaps the renaming
of skb->local_df to skb->ignore_df.

In the Altera TSE driver cases, the register access cleanups
in net-next overlapped with bug fixes done in net.

Similarly a bug fix to send ALB packets in the bonding driver using
the right source address overlaps with cleanups in net-next.

Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-24 00:32:30 -04:00
Sagi Grimberg
5f80ff8ecc Target/iser: Gracefully reject T10-PI enabled connect request if not supported
In case user chose to set T10-PI enable on the target while
the IB device does not support it, gracefully reject the request.

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.15+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-20 11:18:44 -07:00
Sagi Grimberg
f5ebec9629 Target/iser: Wait for proper cleanup before unloading
disconnected_handler works are scheduled on system_wq.
When attempting to unload, first make sure all works
have cleaned up.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-20 11:18:43 -07:00
Sagi Grimberg
88c4015fda Target/iser: Improve cm events handling
There are 4 RDMA_CM events that all basically mean that
the user should teardown the IB connection:
- DISCONNECTED
- ADDR_CHANGE
- DEVICE_REMOVAL
- TIMEWAIT_EXIT

Only in DISCONNECTED/ADDR_CHANGE it makes sense to
call rdma_disconnect (send DREQ/DREP to our initiator).
So we keep the same teardown handler for all of them
but only indicate calling rdma_disconnect for the relevant
events.

This patch also removes redundant debug prints for each single
event.

v2 changes:
 - Call isert_disconnected_handler() for DEVICE_REMOVAL (Or + Sag)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-20 11:17:40 -07:00
Bart Van Assche
5cfb17828d IB/srp: Add fast registration support
Certain HCA types (e.g. Connect-IB) and certain configurations (e.g.
ConnectX VF) support fast registration but not FMR. Hence add fast
registration support.

In function srp_rport_reconnect(), move the the srp_finish_req()
loop from after to before the srp_create_target_ib() call. This is
needed to avoid that srp_finish_req() tries to queue any
invalidation requests for rkeys associated with the old queue pair
on the newly allocated queue pair. Invoking srp_finish_req() before
the queue pair has been reallocated is safe since srp_claim_req()
handles completions correctly that arrive after srp_finish_req()
has been invoked.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
52ede08f00 IB/srp: Rename FMR-related variables
The next patch will cause the renamed variables to be shared between
the code for FMR and for FR memory registration. Make the names of
these variables independent of the memory registration mode. This
patch does not change any functionality. The start of this patch was
the changes applied via the following shell command:

sed -i.orig 's/SRP_FMR_SIZE/SRP_MAX_PAGES_PER_MR/g; \
    s/fmr_page_mask/mr_page_mask/g;s/fmr_page_size/mr_page_size/g; \
    s/fmr_page_shift/mr_page_shift/g;s/fmr_max_size/mr_max_size/g; \
    s/max_pages_per_fmr/max_pages_per_mr/g;s/nfmr/nmdesc/g; \
    s/fmr_len/dma_len/g' drivers/infiniband/ulp/srp/ib_srp.[ch]

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
d1b4289e16 IB/srp: One FMR pool per SRP connection
Allocate one FMR pool per SRP connection instead of one SRP pool
per HCA. This improves scalability of the SRP initiator.

Only request the SCSI mid-layer to retry a SCSI command after a
temporary mapping failure (-ENOMEM) but not after a permanent
mapping failure. This avoids that SCSI commands are retried
indefinitely if a permanent memory mapping failure occurs.

Tell the SCSI mid-layer to reduce queue depth temporarily in the
unlikely case where an application is queuing many requests with
more than max_pages_per_fmr sg-list elements.

For FMR pool allocation, base the max_pages_per_fmr parameter on
the HCA memory registration limit. Only try to allocate an FMR
pool if FMR is supported.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
b1b8854d16 IB/srp: Introduce the 'register_always' kernel module parameter
Add a kernel module parameter that enables memory registration also for SG-lists
that can be processed without memory registration. This makes it easier for kernel
developers to test the memory registration code.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
539dde6fc5 IB/srp: Introduce srp_finish_mapping()
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:52 -07:00
Bart Van Assche
76bc1e1ddd IB/srp: Introduce srp_map_fmr()
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:51 -07:00
Bart Van Assche
62154b2e89 IB/srp: Introduce an additional local variable
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:51 -07:00
Bart Van Assche
af24663bc8 IB/srp: Fix kernel-doc warnings
Avoid that the kernel-doc tool warns about missing argument
descriptions for the ib_srp.[ch] source files.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:51 -07:00
Bart Van Assche
024ca90151 IB/srp: Fix a sporadic crash triggered by cable pulling
Avoid that the loops that iterate over the request ring can encounter
a pointer to a SCSI command in req->scmnd that is no longer associated
with that request. If the function srp_unmap_data() is invoked twice
for a SCSI command that is not in flight then that would cause
ib_fmr_pool_unmap() to be invoked with an invalid pointer as argument,
resulting in a kernel oops.

Reported-by: Sagi Grimberg <sagig@mellanox.com>
Reference: http://thread.gmane.org/gmane.linux.drivers.rdma/19068/focus=19069
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-05-20 09:20:51 -07:00
Sagi Grimberg
9d49f5e284 Target/iser: Fix hangs in connection teardown
In ungraceful teardowns isert close flows seem racy such that
isert_wait_conn hangs as RDMA_CM_EVENT_DISCONNECTED never
gets invoked (no one called rdma_disconnect).

Both graceful and ungraceful teardowns will have rx flush errors
(isert posts a batch once connection is established). Once all
flush errors are consumed we invoke isert_wait_conn and it will
be responsible for calling rdma_disconnect. This way it can be
sure that rdma_disconnect was called and it won't wait forever.

This patch also removes the logout_posted indicator. either the
logout completion was consumed and no problem decrementing the
post_send_buf_count, or it was consumed as a flush error. no point
of keeping it for isert_wait_conn as there is no danger that
isert_conn will be accidentally removed while it is running.

(Drop unnecessary sleep_on_conn_wait_comp check in
 isert_cq_rx_comp_err - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-19 14:38:49 -07:00
Sagi Grimberg
e346ab343f Target/iser: Bail from accept_np if np_thread is trying to close
In case np_thread state is in RESET/SHUTDOWN/EXIT states,
no point for isert to stall there as we may get a hang in
case no one will wake it up later.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-19 14:32:58 -07:00
Sagi Grimberg
14f4b54fe3 Target/iscsi,iser: Avoid accepting transport connections during stop stage
When the target is in stop stage, iSER transport initiates RDMA disconnects.
The iSER initiator may wish to establish a new connection over the
still existing network portal. In this case iSER transport should not
accept and resume new RDMA connections. In order to learn that, iscsi_np
is added with enabled flag so the iSER transport can check when deciding
weather to accept and resume a new connection request.

The iscsi_np is enabled after successful transport setup, and disabled
before iscsi_np login threads are cleaned up.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-15 17:09:11 -07:00
Sagi Grimberg
531b7bf4bd Target/iser: Fix iscsit_accept_np and rdma_cm racy flow
RDMA CM and iSCSI target flows are asynchronous and completely
uncorrelated. Relying on the fact that iscsi_accept_np will be called
after CM connection request event and will wait for it is a mistake.

When attempting to login to a few targets this flow is racy and
unpredictable, but for parallel login to dozens of targets will
race and hang every time.

The correct synchronizing mechanism in this case is pending on
a semaphore rather than a wait_for_event. We keep the pending
interruptible for iscsi_np cleanup stage.

(Squash patch to remove dead code into parent - nab)

Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-15 17:09:10 -07:00
Sagi Grimberg
9fe63c88b1 Target/iser: Fix wrong connection requests list addition
Should be adding list_add_tail($new, $head) and not
the other way around.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: stable@vger.kernel.org # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-05-15 17:09:10 -07:00
Wilfried Klaebe
7ad24ea4bf net: get rid of SET_ETHTOOL_OPS
net: get rid of SET_ETHTOOL_OPS

Dave Miller mentioned he'd like to see SET_ETHTOOL_OPS gone.
This does that.

Mostly done via coccinelle script:
@@
struct ethtool_ops *ops;
struct net_device *dev;
@@
-       SET_ETHTOOL_OPS(dev, ops);
+       dev->ethtool_ops = ops;

Compile tested only, but I'd seriously wonder if this broke anything.

Suggested-by: Dave Miller <davem@davemloft.net>
Signed-off-by: Wilfried Klaebe <w-lkml@lebenslange-mailadresse.de>
Acked-by: Felipe Balbi <balbi@ti.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-05-13 17:43:20 -04:00
Linus Torvalds
141eaccd01 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Here are the target pending updates for v3.15-rc1.  Apologies in
  advance for waiting until the second to last day of the merge window
  to send these out.

  The highlights this round include:

   - iser-target support for T10 PI (DIF) offloads (Sagi + Or)
   - Fix Task Aborted Status (TAS) handling in target-core (Alex Leung)
   - Pass in transport supported PI at session initialization (Sagi + MKP + nab)
   - Add WRITE_INSERT + READ_STRIP T10 PI support in target-core (nab + Sagi)
   - Fix iscsi-target ERL=2 ASYNC_EVENT connection pointer bug (nab)
   - Fix tcm_fc use-after-free of ft_tpg (Andy Grover)
   - Use correct ib_sg_dma primitives in ib_isert (Mike Marciniszyn)

  Also, note the virtio-scsi + vhost-scsi changes to expose T10 PI
  metadata into KVM guest have been left-out for now, as there where a
  few comments from MST + Paolo that where not able to be addressed in
  time for v3.15.  Please expect this feature for v3.16-rc1"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (43 commits)
  ib_srpt: Use correct ib_sg_dma primitives
  target/tcm_fc: Rename ft_tport_create to ft_tport_get
  target/tcm_fc: Rename ft_{add,del}_lport to {add,del}_wwn
  target/tcm_fc: Rename structs and list members for clarity
  target/tcm_fc: Limit to 1 TPG per wwn
  target/tcm_fc: Don't export ft_lport_list
  target/tcm_fc: Fix use-after-free of ft_tpg
  target: Add check to prevent Abort Task from aborting itself
  target: Enable READ_STRIP emulation in target_complete_ok_work
  target/sbc: Add sbc_dif_read_strip software emulation
  target: Enable WRITE_INSERT emulation in target_execute_cmd
  target/sbc: Add sbc_dif_generate software emulation
  target/sbc: Only expose PI read_cap16 bits when supported by fabric
  target/spc: Only expose PI mode page bits when supported by fabric
  target/spc: Only expose PI inquiry bits when supported by fabric
  target: Pass in transport supported PI at session initialization
  target/iblock: Fix double bioset_integrity_free bug
  Target/sbc: Initialize COMPARE_AND_WRITE write_sg scatterlist
  target/rd: T10-Dif: RAM disk is allocating more space than required.
  iscsi-target: Fix ERL=2 ASYNC_EVENT connection pointer bug
  ...
2014-04-12 16:51:08 -07:00
Mike Marciniszyn
b076808051 ib_srpt: Use correct ib_sg_dma primitives
The code was incorrectly using sg_dma_address() and
sg_dma_len() instead of ib_sg_dma_address() and
ib_sg_dma_len().

This prevents srpt from functioning with the
Intel HCA and indeed will corrupt memory
badly.

Cc: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Dennis Dalessandro <dennis.dalessandro@intel.com>
Tested-by: Vinod Kumar <vinod.kumar@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-11 15:31:20 -07:00
Nicholas Bellinger
e70beee783 target: Pass in transport supported PI at session initialization
In order to support local WRITE_INSERT + READ_STRIP operations for
non PI enabled fabrics, the fabric driver needs to be able signal
what protection offload operations are supported.

This is done at session initialization time so the modes can be
signaled by individual se_wwn + se_portal_group endpoints, as well
as optionally across different transports on the same endpoint.

For iser-target, set TARGET_PROT_ALL if the underlying ib_device
has already signaled PI offload support, and allow this to be
exposed via a new iscsit_transport->iscsit_get_sup_prot_ops()
callback.

For loopback, set TARGET_PROT_ALL to signal SCSI initiator mode
operation.

For all other drivers, set TARGET_PROT_NORMAL to disable fabric
level PI.

Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Quinn Tran <quinn.tran@qlogic.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:54 -07:00
Sagi Grimberg
f225225848 Target/iser: Use Fastreg only if device supports signature
Fastreg is mandatory for signature, so if the
device doesn't support it we don't need to use it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:52 -07:00
Nicholas Bellinger
03e7848a64 iser-target: Add missing se_cmd put for WRITE_PENDING in tx_comp_err
This patch fixes a bug where outstanding RDMA_READs with WRITE_PENDING
status require an extra target_put_sess_cmd() in isert_put_cmd() code
when called from isert_cq_tx_comp_err() + isert_cq_drain_comp_llist()
context during session shutdown.

The extra kref PUT is required so that transport_generic_free_cmd()
invokes the last target_put_sess_cmd() -> target_release_cmd_kref(),
which will complete(&se_cmd->cmd_wait_comp) the outstanding se_cmd
descriptor with WRITE_PENDING status, and awake the completion in
target_wait_for_sess_cmds() to invoke TFO->release_cmd().

The bug was manifesting itself in target_wait_for_sess_cmds() where
a se_cmd descriptor with WRITE_PENDING status would end up sleeping
indefinately.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:51 -07:00
Nicholas Bellinger
131e6abc67 target: Add TFO->abort_task for aborted task resources release
Now that TASK_ABORTED status is not generated for all cases by
TMR ABORT_TASK + LUN_RESET, a new TFO->abort_task() caller is
necessary in order to give fabric drivers a chance to unmap
hardware / software resources before the se_cmd descriptor is
released via the normal TFO->release_cmd() codepath.

This patch adds TFO->aborted_task() in core_tmr_abort_task()
in place of the original transport_send_task_abort(), and
also updates all fabric drivers to implement this caller.

The fabric drivers that include changes to perform cleanup
via ->aborted_task() are:

  - iscsi-target
  - iser-target
  - srpt
  - tcm_qla2xxx

The fabric drivers that currently set ->aborted_task() to
NOPs are:

  - loopback
  - tcm_fc
  - usb-gadget
  - sbp-target
  - vhost-scsi

For the latter five, there appears to be no additional cleanup
required before invoking TFO->release_cmd() to release the
se_cmd descriptor.

v2 changes:
  - Move ->aborted_task() call into transport_cmd_finish_abort (Alex)

Cc: Alex Leung <amleung21@yahoo.com>
Cc: Mark Rustad <mark.d.rustad@intel.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Vu Pham <vu@mellanox.com>
Cc: Chris Boot <bootc@bootc.net>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Giridhar Malavali <giridhar.malavali@qlogic.com>
Cc: Saurav Kashyap <saurav.kashyap@qlogic.com>
Cc: Quinn Tran <quinn.tran@qlogic.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:51 -07:00
Nicholas Bellinger
f46d6a8a01 iser-target: Match FRMR descriptors to available session tags
This patch changes isert_conn_create_fastreg_pool() to follow
logic in iscsi_target_locate_portal() for determining how many
FRMR descriptors to allocate based upon the number of possible
per-session command slots that are available.

This addresses an OOPs in isert_reg_rdma() where due to the
use of ISCSI_DEF_XMIT_CMDS_MAX could end up returning a bogus
fast_reg_descriptor when the number of active tags exceeded
the original hardcoded max.

Note this also includes moving isert_conn_create_fastreg_pool()
from isert_connect_request() to isert_put_login_tx() before
posting the final Login Response PDU in order to determine the
se_nacl->queue_depth (eg: number of tags) per session the target
will be enforcing.

v2 changes:
  - Move isert_conn->conn_fr_pool list_head init into
    isert_conn_request()
v3 changes:
  - Drop unnecessary list_empty() check in isert_reg_rdma()
    (Sagi)

Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:50 -07:00
Sagi Grimberg
5bac4b1a1f Target/iser: Fail SCSI WRITE command if device detected integrity error
If during data-transfer a data-integrity error was detected we
must fail the command with CHECK_CONDITION and not execute
the command.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:49 -07:00
Sagi Grimberg
96b7973e1c Target/iser: Move check signature status to a function
Remove code duplication from RDMA_READ and RDMA_WRITE
completions that do basically the same check.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:49 -07:00
Sagi Grimberg
897bb2c916 Target/iser: Consider DIF and RDMA_READ completions when calculating post_send counter
If protection is involved, iSER target must wait for
completion of RDMA_READ before sending SCSI response.
So we must consider that when calculating post_send_buf_count
additions, also when processing good/error completions.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:48 -07:00
Sagi Grimberg
c2caa20777 Target/iser: Fix signature work requests accounting
As REG_SIG_MR work request and it's LOCAL_INVALIDATE are
not accounted in post_send_buf_count we must color these
with ISER_FASTREG_LI_WRID in order to process their error
completions when the QP flushes.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:48 -07:00
Sagi Grimberg
9e961ae73c IB/isert: Support T10-PI protected transactions
In case the Target core passed transport T10 protection
operation:

1. Register data buffer (data memory region)
2. Register protection buffer if exsists (prot memory region)
3. Register signature region (signature memory region)
   - use work request IB_WR_REG_SIG_MR
4. Execute RDMA
5. Upon RDMA completion check the signature status
   - if succeeded send good SCSI response
   - if failed send SCSI bad response with appropriate sense buffer

(Fix up compile error in isert_reg_sig_mr, and fix up incorrect
 se_cmd->prot_type -> TARGET_PROT_NORMAL comparision - nab)
(Fix failed sector assignment in isert_completion_rdma_* - Sagi + nab)
(Fix enum assignements for protection type - Sagi)
(Fix devision on 32-bit in isert_completion_rdma_* - Sagi + Fengguang)
(Fix context change for v3.14-rc6 code - nab)
(Fix iscsit_build_rsp_pdu inc_statsn flag usage - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:47 -07:00
Sagi Grimberg
f93f3a70da IB/isert: Accept RDMA_WRITE completions
In case of protected transactions, we will need to check the
protection status of the transaction before sending SCSI response.
So be ready for RDMA_WRITE completions. currently we don't ask
for these completions, but for T10-PI we will.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:46 -07:00
Sagi Grimberg
d3e125dac1 IB/isert: Initialize T10-PI resources
Introduce pi_context to hold relevant RDMA protection resources.
We eliminate data_key_valid boolean and replace it with indicators
container to indicate:
- Is the descriptor protected (registered via signature MR)
- Is the data_mr key valid (can spare LOCAL_INV WR)
- Is the prot_mr key valid (can spare LOCAL_INV WR)
- Is the sig_mr key valid (can spare LOCAL_INV WR)

Upon connection establishment check if network portal is T10-PI
enabled and allocate T10-PI resources if necessary, allocate
signature enabled memory regions and mark connection queue-pair
as signature enabled.

(Fix context change for v3.14-rc6 code - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:46 -07:00
Sagi Grimberg
e3d7e4c30c IB/isert: Introduce isert_map/unmap_data_buf
export map/unmap data buffer to a routine that may
be used in various places in the code and keep the
mapping data in a designated descriptor. Also, let
isert_fast_reg_mr to decide weather to use global
MR or do fast registration.

This commit does not change any functionality.

(Fix context change for v3.14-rc6 code - nab)

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-04-07 01:48:45 -07:00
Linus Torvalds
877f075aac Main batch of InfiniBand/RDMA changes for 3.15:
- The biggest change is core API extensions and mlx5 low-level driver
    support for handling DIF/DIX-style protection information, and the
    addition of PI support to the iSER initiator.  Target support will be
    arriving shortly through the SCSI target tree.
 
  - A nice simplification to the "umem" memory pinning library now that
    we have chained sg lists.  Kudos to Yishai Hadas for realizing our
    code didn't have to be so crazy.
 
  - Another nice simplification to the sg wrappers used by qib, ipath and
    ehca to handle their mapping of memory to adapter.
 
  - The usual batch of fixes to bugs found by static checkers etc. from
    intrepid people like Dan Carpenter and Yann Droneaud.
 
  - A large batch of cxgb4, ocrdma, qib driver updates.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABCAAGBQJTPYBnAAoJEENa44ZhAt0hGI4P/29eotGwpkANUQE6FQvxCUL2
 CXJtSg52lmYvGJrPK4IhihpbtQmHJz3iXEzlOOWidTw1dJgObR6vFaRymh7+vDLs
 CdzybMcXdasarqTuYeJbFzhkimpwtWWrMy/8Ik/Jj/5glGQ6cUSpdYZzVtFhYNqf
 hCGE8iLi+tuekJJj1htut5D6apXM7udcdc2yLJNOdsSj/VUXt1oqG1x9xAi9R8Tq
 7o8eFSStdlja0EBQ6Hli2zauCSnQkaUtr8h6EAFbcCtvBK8HqsHSc2gfq2ViFUiN
 ztt167oWoQnVkR0qCPL5nVt+CRQHHROprVXvbpcTI3aW61gNIl6OrUUOXefzHXac
 TNi+fdMpiEB/JQ4Z04Jzd1dGCSjYeTqPj4rO4meFjBmxRDdTgZHu7FWwejT1nYJ5
 d2abVdCOT+QWlIlM7m/pjdWJII5OYM+4/jtTayGepEaR4fTUzKtPZPBLNUBDBKE+
 4f92PC8LiuPkwJgb6XT96onPz1bDCOnPSEdwoKUFKPeGUcwgVOM/Wx5NU4Yf7rfg
 RxQwZ7mJXbjCYFlmGGo/0QDy6UEGkIFYlJSzooP+wlK1JvZ5h2M+9QKX2FtwzR+R
 I2kBxcTXWsM/h88R7MkNqbNIllmhssrJwmAE46OneZbfoBOB+JZjb4nLRTu0jEcS
 zn6f16GmJ37BKn2/qYY/
 =Ww6H
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband updates from Roland Dreier:
 "Main batch of InfiniBand/RDMA changes for 3.15:

   - The biggest change is core API extensions and mlx5 low-level driver
     support for handling DIF/DIX-style protection information, and the
     addition of PI support to the iSER initiator.  Target support will
     be arriving shortly through the SCSI target tree.

   - A nice simplification to the "umem" memory pinning library now that
     we have chained sg lists.  Kudos to Yishai Hadas for realizing our
     code didn't have to be so crazy.

   - Another nice simplification to the sg wrappers used by qib, ipath
     and ehca to handle their mapping of memory to adapter.

   - The usual batch of fixes to bugs found by static checkers etc.
     from intrepid people like Dan Carpenter and Yann Droneaud.

   - A large batch of cxgb4, ocrdma, qib driver updates"

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (102 commits)
  RDMA/ocrdma: Unregister inet notifier when unloading ocrdma
  RDMA/ocrdma: Fix warnings about pointer <-> integer casts
  RDMA/ocrdma: Code clean-up
  RDMA/ocrdma: Display FW version
  RDMA/ocrdma: Query controller information
  RDMA/ocrdma: Support non-embedded mailbox commands
  RDMA/ocrdma: Handle CQ overrun error
  RDMA/ocrdma: Display proper value for max_mw
  RDMA/ocrdma: Use non-zero tag in SRQ posting
  RDMA/ocrdma: Memory leak fix in ocrdma_dereg_mr()
  RDMA/ocrdma: Increment abi version count
  RDMA/ocrdma: Update version string
  be2net: Add abi version between be2net and ocrdma
  RDMA/ocrdma: ABI versioning between ocrdma and be2net
  RDMA/ocrdma: Allow DPP QP creation
  RDMA/ocrdma: Read ASIC_ID register to select asic_gen
  RDMA/ocrdma: SQ and RQ doorbell offset clean up
  RDMA/ocrdma: EQ full catastrophe avoidance
  RDMA/cxgb4: Disable DSGL use by default
  RDMA/cxgb4: rx_data() needs to hold the ep mutex
  ...
2014-04-03 16:57:19 -07:00
Roland Dreier
f7eaa7ed8f Merge branches 'core', 'cxgb4', 'ip-roce', 'iser', 'misc', 'mlx4', 'nes', 'ocrdma', 'qib', 'sgwrapper', 'srp' and 'usnic' into for-next 2014-04-03 08:30:17 -07:00
Or Gerlitz
5de2ad986d IB/iser: Bump driver version to 1.3
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Or Gerlitz
3ee07d27ac IB/iser: Update Mellanox copyright note
Update Mellanox copyrights for 2014 on the iser initiator driver.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Or Gerlitz
4f9208ad3f IB/iser: Print QP information once connection is established
Add an iser info print with the local/remote QP information carried
out when the connection is established.  While here, fix a little
leftover from the T10 work and set a debug print to be carried in
debug and not info level.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Ariel Nahum
4667f5dfb0 IB/iser: Remove struct iscsi_iser_conn
The iscsi stack has existing mechanisms to link back and forth between
the iscsi connection and the iscsi transport (e.g iser/tcp) connection.

This is done through a dd_data pointer field in struct iscsi_conn
which can be set to point to the transport connection, etc.

The iscsi_iser_conn structure was used to get this linking done in
another way, which is uneeded and adds extra complication to the iser
code, so we just remove it.

Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Roi Dayan
1d6c2b736f IB/iser: Drain the tx cq once before looping on the rx cq
The iser disconnection flow isn't done before all the inflight
recv/send buffers posted to the QP are either flushed or normally
completed to the CQ that serves this connection.  The condition check
is done in iser_handle_comp_error().

Currently, it's possible for the send buffer completion that makes the
posted send buffers counter reach zero to be polled in the drain tx
call, which is after the rx cq is fully drained.  Since this
completion might be not an error one (for example, it might be a
completion of the logout request iSCSI PDU) we will skip
iser_handle_comp_error().  So the connection will never terminate from
the iscsi stack point of view, and we hang.

To resolve this race, do the draining of the tx cq before the loop on
the rx cq.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:09:46 -07:00
Randy Dunlap
39c978cd17 IB/iser: Fix sector_t format warning
Fix pr_err (printk) format warning:

    drivers/infiniband/ulp/iser/iser_verbs.c:1181:4: warning: format '%lx' expects argument of type 'long unsigned int', but argument 3 has type 'sector_t' [-Wformat]

Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-04-01 11:01:08 -07:00
Bart Van Assche
b3fe628da2 IB/srp: Fix a race condition between failing I/O and I/O completion
Avoid that srp_terminate_io() can access req->scmnd after it has been
cleared by the I/O completion code. Do this by protecting req->scmnd
accesses from srp_terminate_io() via locking

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:31 -07:00
Bart Van Assche
ac72d766a8 IB/srp: Avoid that writing into "add_target" hangs due to a cable pull
If a cable is pulled while srp_connect_target() is in progress
that can result in that function never to return. That makes the
process, e.g. srp_daemon, that invoked this function unkillable.
Avoid this by letting srp_connect_target() finish if the event
IB_CM_TIMEWAIT_EXIT is received. This patch fixes a hang with the
following call trace:

 [<ffffffff814eae85>] schedule_timeout+0x215/0x2e0
 [<ffffffff814eab03>] wait_for_common+0x123/0x180
 [<ffffffff814eac1d>] wait_for_completion+0x1d/0x20
 [<ffffffffa03b398c>] srp_connect_target+0x1dc/0x410 [ib_srp]
 [<ffffffffa03b5809>] srp_create_target+0xba9/0xe70 [ib_srp]
 [<ffffffff8133e590>] dev_attr_store+0x20/0x30
 [<ffffffff811eb8f5>] sysfs_write_file+0xe5/0x170
 [<ffffffff811767c8>] vfs_write+0xb8/0x1a0
 [<ffffffff811770c1>] sys_write+0x51/0x90
 [<ffffffff8100b072>] system_call_fastpath+0x16/0x1b

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:31 -07:00
Bart Van Assche
a702adceec IB/srp: Make writing into the "add_target" sysfs attribute interruptible
Avoid that stopping srp_daemon takes unusually long due to a cable
pull by making writing into the "add_target" sysfs attribute
interruptible.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:31 -07:00
Bart Van Assche
2d7091bcf6 IB/srp: Avoid duplicate connections
The connection uniqueness check is performed before a new connection
is added to the target list. This patch protects both actions by a
mutex such that simultaneous writes from two different threads into the
"add_target" variable do not result in duplicate connections.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:31 -07:00
Bart Van Assche
e7ffde0164 IB/srp: Add more logging
Log sgid and dgid when reporting that a login has been rejected or when
a host has been added. This makes it easy to figure out which initiator
and target ports these messages apply to.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:30 -07:00
Sagi Grimberg
2088ca66f5 IB/srp: Check ib_query_gid return value
Detected by Coverity.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-24 10:05:30 -07:00
Sagi Grimberg
6192d4e6bb IB/iser: Publish T10-PI support to SCSI midlayer
After allocating a scsi_host we set protection types and guard type
supported.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Sagi Grimberg
0a7a08ad6f IB/iser: Implement check_protection
Once the iSCSI transaction is completed we must implement
check_protection in order to notify on DIF errors that may have
occured.

The routine boils down to calling ib_check_mr_status to get the
signature status of the transaction.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Sagi Grimberg
177e31bd5a IB/iser: Support T10-PI operations
Add logic to initialize protection information entities.  Upon each
iSCSI task, we keep the scsi_cmnd in order to query the scsi
protection operations and reference to protection buffers.

Modify iser_fast_reg_mr to receive indication whether it is
registering the data or protection buffers.

In addition introduce iser_reg_sig_mr which performs fast registration
work-request for a signature enabled memory region
(IB_WR_REG_SIG_MR).  In this routine we set all the protection
relevants for the device to offload protection data-transfer and
verification.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Alex Tabachnik
6b5a8fb0d2 IB/iser: Initialize T10-PI resources
During connection establishment we also initialize T10-PI resources
(QP, PI contexts) in order to support SCSI's protection operations.

Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Alex Tabachnik
7f73384752 IB/iser: Introduce pi_enable, pi_guard module parameters
Use modparams to activate protection information support.

pi_enable bool: Based on this parameter iSER will know if it should
support T10-PI.  We don't want to do this by default as it requires to
allocate and initialize extra resources.  In case pi_enable=N, iSER
won't publish to SCSI midlayer any DIF capabilities.

pi_guard int: Based on this parameter iSER will publish DIX guard type
support to SCSI midlayer.  0 means CRC is allowed to be passed in DIX
buffers, 1 (or non-zero) means IP-CSUM is allowed to be passed in DIX
buffers.  Note that over the wire, only CRC is allowed.

In the next phase, it is worth considering passing these parameters
from iscsid via nlmsg.  This will allow these parameters to be
connection based rather than global.

Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:58 -07:00
Sagi Grimberg
5f588e3d0c IB/iser: Generalize fall_to_bounce_buf routine
Unaligned SG-lists may also happen for protection information.
Generalize bounce buffer routine to handle any iser_data_buf which may
be data and/or protection.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
9a8b08fad2 IB/iser: Generalize iser_unmap_task_data and finalize_rdma_unaligned_sg
This routines operates on data buffers and may also work with
protection infomation buffers.  So we generalize them to handle an
iser_data_buf which can be the command data or command protection
information.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
73bc06b7ed IB/iser: Replace fastreg descriptor valid bool with indicators container
In T10-PI support we will have memory keys for protection buffers and
signature transactions.  We prefer to compact indicators rather than
keeping multiple bools.

This commit does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
65198d6b84 IB/iser: Keep IB device attributes under iser_device
For T10-PI offload support, we will need to know the device signature
offload capability upon every connection establishment.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
310b347c60 IB/iser: Move fast_reg_descriptor initialization to a function
fastreg descriptor will include protection information context.  In
order to place the logic in one place we introduce iser_create_fr_desc
function.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
d11ec4ecf0 IB/iser: Push the decision what memory key to use into fast_reg_mr routine
This is a preparation step for T10-PI offload support.  We prefer to
push the desicion of which mkey to use (global or fastreg) to
iser_fast_reg_mr.  We choose to do this since in T10-PI we may need to
register for protection buffers and in this case we wish to simplify
iser_fast_reg_mr instead of repeating the logic of which key to use.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
7306b8fad4 IB/iser: Avoid FRWR notation, use fastreg instead
FRWR stands for "fast registration work request". We want to avoid
calling the fastreg pool with that name, instead we name it fastreg
which stands for "fast registration".

This pool will include more elements in the future, so it is a good
idea to generalize the name.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:57 -07:00
Sagi Grimberg
db523b8de1 IB/iser: Suppress completions for fast registration work requests
In case iSER uses fast registration method, it should not request for
successful completions on fast registration nor local invalidate
requests.  We color wr_id with ISER_FRWR_LI_WRID in order to correctly
consume error completions.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-03-17 22:33:54 -07:00
Linus Torvalds
66a523db70 Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "This series addresses a number of outstanding issues wrt to active I/O
  shutdown using iser-target.  This includes:

   - Fix a long standing tpg_state bug where a tpg could be referenced
     during explicit shutdown (v3.1+ stable)
   - Use list_del_init for iscsi_cmd->i_conn_node so list_empty checks
     work as expected (v3.10+ stable)
   - Fix a isert_conn->state related hung task bug + ensure outstanding
     I/O completes during session shutdown.  (v3.10+ stable)
   - Fix isert_conn->post_send_buf_count accounting for RDMA READ/WRITEs
     (v3.10+ stable)
   - Ignore FRWR completions during active I/O shutdown (v3.12+ stable)
   - Fix command leakage for interrupt coalescing during active I/O
     shutdown (v3.13+ stable)

  Also included is another DIF emulation fix from Sagi specific to
  v3.14-rc code"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  Target/sbc: Fix sbc_copy_prot for offset scatters
  iser-target: Fix command leak for tx_desc->comp_llnode_batch
  iser-target: Ignore completions for FRWRs in isert_cq_tx_work
  iser-target: Fix post_send_buf_count for RDMA READ/WRITE
  iscsi/iser-target: Fix isert_conn->state hung shutdown issues
  iscsi/iser-target: Use list_del_init for ->i_conn_node
  iscsi-target: Fix iscsit_get_tpg_from_np tpg_state bug
2014-03-09 13:50:14 -07:00
Nicholas Bellinger
ebbe442183 iser-target: Fix command leak for tx_desc->comp_llnode_batch
This patch addresses a number of active I/O shutdown issues
related to isert_cmd descriptors being leaked that are part
of a completion interrupt coalescing batch.

This includes adding logic in isert_cq_tx_comp_err() to
drain any associated tx_desc->comp_llnode_batch, as well
as isert_cq_drain_comp_llist() to drain any associated
isert_conn->conn_comp_llist.

Also, set tx_desc->llnode_active in isert_init_send_wr()
in order to determine when work requests need to be skipped
in isert_cq_tx_work() exception path code.

Finally, update isert_init_send_wr() to only allow interrupt
coalescing when ISER_CONN_UP.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.13+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:10 -08:00
Nicholas Bellinger
9bb4ca68fc iser-target: Ignore completions for FRWRs in isert_cq_tx_work
This patch changes IB_WR_FAST_REG_MR + IB_WR_LOCAL_INV related
work requests to include a ISER_FRWR_LI_WRID value in order to
signal isert_cq_tx_work() that these requests should be ignored.

This is necessary because even though IB_SEND_SIGNALED is not
set for either work request, during a QP failure event the work
requests will be returned with exception status from the TX
completion queue.

v2 changes:
 - Rename ISER_FRWR_LI_WRID -> ISER_FASTREG_LI_WRID (Sagi)

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:10 -08:00
Nicholas Bellinger
b6b87a1df6 iser-target: Fix post_send_buf_count for RDMA READ/WRITE
This patch fixes the incorrect setting of ->post_send_buf_count
related to RDMA WRITEs + READs where isert_rdma_rw->send_wr_num
was not being taken into account.

This includes incrementing ->post_send_buf_count within
isert_put_datain() + isert_get_dataout(), decrementing within
__isert_send_completion() + isert_response_completion(), and
clearing wr->send_wr_num within isert_completion_rdma_read()

This is necessary because even though IB_SEND_SIGNALED is
not set for RDMA WRITEs + READs, during a QP failure event
the work requests will be returned with exception status
from the TX completion queue.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Nicholas Bellinger
defd884845 iscsi/iser-target: Fix isert_conn->state hung shutdown issues
This patch addresses a couple of different hug shutdown issues
related to wait_event() + isert_conn->state.  First, it changes
isert_conn->conn_wait + isert_conn->conn_wait_comp_err from
waitqueues to completions, and sets ISER_CONN_TERMINATING from
within isert_disconnect_work().

Second, it splits isert_free_conn() into isert_wait_conn() that
is called earlier in iscsit_close_connection() to ensure that
all outstanding commands have completed before continuing.

Finally, it breaks isert_cq_comp_err() into seperate TX / RX
related code, and adds logic in isert_cq_rx_comp_err() to wait
for outstanding commands to complete before setting ISER_CONN_DOWN
and calling complete(&isert_conn->conn_wait_comp_err).

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Nicholas Bellinger
5159d763f6 iscsi/iser-target: Use list_del_init for ->i_conn_node
There are a handful of uses of list_empty() for cmd->i_conn_node
within iser-target code that expect to return false once a cmd
has been removed from the per connect list.

This patch changes all uses of list_del -> list_del_init in order
to ensure that list_empty() returns false as expected.

Acked-by: Sagi Grimberg <sagig@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-03-04 17:54:09 -08:00
Linus Torvalds
946dd683af Merge git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target fixes from Nicholas Bellinger:
 "Mostly minor fixes this time to v3.14-rc1 related changes.  Also
  included is one fix for a free after use regression in persistent
  reservations UNREGISTER logic that is CC'ed to >= v3.11.y stable"

* git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  Target/sbc: Fix protection copy routine
  IB/srpt: replace strict_strtoul() with kstrtoul()
  target: Simplify command completion by removing CMD_T_FAILED flag
  iser-target: Fix leak on failure in isert_conn_create_fastreg_pool
  iscsi-target: Fix SNACK Type 1 + BegRun=0 handling
  target: Fix missing length check in spc_emulate_evpd_83()
  qla2xxx: Remove last vestiges of qla_tgt_cmd.cmd_list
  target: Fix 32-bit + CONFIG_LBDAF=n link error w/ sector_div
  target: Fix free-after-use regression in PR unregister
2014-02-15 16:18:47 -08:00
Dan Carpenter
fd8b48b22a IB/iser: Fix use after free in iser_snd_completion()
We use "tx_desc" again after we free it.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:48:09 -08:00
Roi Dayan
7d9eacf945 IB/iser: Avoid dereferencing iscsi_iser conn object when not bound to iser connection
Fix a possible NULL pointer dereference in disconnection flow. This
can happen if the target disconnected/rejected the connection request,
e.g before the binding stage between iscsi connection to the transport
connection.

Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-02-14 09:48:03 -08:00
Jingoo Han
9d8abf4594 IB/srpt: replace strict_strtoul() with kstrtoul()
The usage of strict_strtoul() is not preferred, because
strict_strtoul() is obsolete. Thus, kstrtoul() should be
used.

Signed-off-by: Jingoo Han <jg1.han@samsung.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-02-12 15:14:40 -08:00
Nicholas Bellinger
a80e21b3b2 iser-target: Fix leak on failure in isert_conn_create_fastreg_pool
This patch fixes a memory leak for fr_desc upon failure of
isert_create_fr_desc() in isert_conn_create_fastreg_pool()
code.

As reported by Coverity 1166659:

*** CID 1166659:  Resource leak  (RESOURCE_LEAK)
/drivers/infiniband/ulp/isert/ib_isert.c: 470 in isert_conn_create_fastreg_pool()
464                      isert_conn, isert_conn->conn_fr_pool_size);
465
466             return 0;
467
468     err:
469             isert_conn_free_fastreg_pool(isert_conn);
>>>     CID 1166659:  Resource leak  (RESOURCE_LEAK)
>>>     Variable "fr_desc" going out of scope leaks the storage it points to.
470             return ret;
471     }
472
473     static int
474     isert_connect_request(struct rdma_cm_id *cma_id, struct rdma_cm_event *event)
475     {

Cc: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-02-12 15:14:11 -08:00
Linus Torvalds
4e13c5d021 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "The highlights this round include:

  - add support for SCSI Referrals (Hannes)
  - add support for T10 DIF into target core (nab + mkp)
  - add support for T10 DIF emulation in FILEIO + RAMDISK backends (Sagi + nab)
  - add support for T10 DIF -> bio_integrity passthrough in IBLOCK backend (nab)
  - prep changes to iser-target for >= v3.15 T10 DIF support (Sagi)
  - add support for qla2xxx N_Port ID Virtualization - NPIV (Saurav + Quinn)
  - allow percpu_ida_alloc() to receive task state bitmask (Kent)
  - fix >= v3.12 iscsi-target session reset hung task regression (nab)
  - fix >= v3.13 percpu_ref se_lun->lun_ref_active race (nab)
  - fix a long-standing network portal creation race (Andy)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
  target: Fix percpu_ref_put race in transport_lun_remove_cmd
  target/iscsi: Fix network portal creation race
  target: Report bad sector in sense data for DIF errors
  iscsi-target: Convert gfp_t parameter to task state bitmask
  iscsi-target: Fix connection reset hang with percpu_ida_alloc
  percpu_ida: Make percpu_ida_alloc + callers accept task state bitmask
  iscsi-target: Pre-allocate more tags to avoid ack starvation
  qla2xxx: Configure NPIV fc_vport via tcm_qla2xxx_npiv_make_lport
  qla2xxx: Enhancements to enable NPIV support for QLOGIC ISPs with TCM/LIO.
  qla2xxx: Fix scsi_host leak on qlt_lport_register callback failure
  IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
  IB/isert: Move fastreg descriptor creation to a function
  IB/isert: Avoid frwr notation, user fastreg
  IB/isert: seperate connection protection domains and dma MRs
  tcm_loop: Enable DIF/DIX modes in SCSI host LLD
  target/rd: Add DIF protection into rd_execute_rw
  target/rd: Add support for protection SGL setup + release
  target/rd: Refactor rd_build_device_space + rd_release_device_space
  target/file: Add DIF protection support to fd_execute_rw
  target/file: Add DIF protection init/format support
  ...
2014-01-31 15:31:23 -08:00
Nicholas Bellinger
676687c696 iscsi-target: Convert gfp_t parameter to task state bitmask
This patch propigates the use of task state bitmask now used by
percpu_ida_alloc() up the iscsi-target callchain, replacing the
use of GFP_ATOMIC for TASK_RUNNING, and GFP_KERNEL for
TASK_INTERRUPTIBLE.

Also, drop the unnecessary gfp_t parameter to isert_allocate_cmd(),
and just pass TASK_INTERRUPTIBLE into iscsit_allocate_cmd().

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-25 06:58:52 +00:00
Roland Dreier
8f399921ea Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'ocrdma', 'qib', 'srp' and 'usnic' into for-next 2014-01-22 23:24:13 -08:00
Michal Schmidt
437708c443 IPoIB: Report operstate consistently when brought up without a link
After booting without a working link, "ip link" shows:

 5: mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast
 state DOWN qlen 256
    ...
 7: mlx4_ib1.8003@mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc
 pfifo_fast state DOWN qlen 256
    ...

Then after connecting and disconnecting the link, which should result
in exactly the same state as before, it shows:

 5: mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc pfifo_fast
 state DOWN qlen 256
    ...
 7: mlx4_ib1.8003@mlx4_ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 2044 qdisc
 pfifo_fast state LOWERLAYERDOWN qlen 256
    ...

Notice the (now correct) LOWERLAYERDOWN operstate shown for the
mlx4_ib1.8003 interface. Ideally the identical state would be shown
right after boot.

The problem is related to the calling of netif_carrier_off() in
network drivers.  For a long time it was known that doing
netif_carrier_off() before registering the netdevice would result in
the interface's operstate being shown as UNKNOWN if the device was
brought up without a working link. This problem was fixed in commit
8f4cccbbd9 ('net: Set device operstate at registration time'), but
still there remains the minor inconsistency demonstrated above.

This patch fixes it by moving ipoib's call to netif_carrier_off() into
the .ndo_open method, which is where network drivers ordinarily do it.
With the patch when doing the same test as above, the operstate of
mlx4_ib1.8003 is shown as LOWERLAYERDOWN right after boot.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Acked-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-22 23:01:05 -08:00
Bart Van Assche
93079162bf scsi_transport_srp: Fix a race condition
The rport timers must be stopped before the SRP initiator destroys the
resources associated with the SCSI host. This is necessary because
otherwise the callback functions invoked from the SRP transport layer
could trigger a use-after-free. Stopping the rport timers before
invoking scsi_remove_host() can trigger long delays in the SCSI error
handler if a transport layer failure occurs while scsi_remove_host()
is in progress. Hence move the code for stopping the rport timers from
srp_rport_release() into a new function and invoke that function after
scsi_remove_host() has finished. This patch fixes the following
sporadic kernel crash:

     kernel BUG at include/asm-generic/dma-mapping-common.h:64!
     invalid opcode: 0000 [#1] SMP
     RIP: 0010:[<ffffffffa03b20b1>]  [<ffffffffa03b20b1>] srp_unmap_data+0x121/0x130 [ib_srp]
     Call Trace:
     [<ffffffffa03b20fc>] srp_free_req+0x3c/0x80 [ib_srp]
     [<ffffffffa03b2188>] srp_finish_req+0x48/0x70 [ib_srp]
     [<ffffffffa03b21fb>] srp_terminate_io+0x4b/0x60 [ib_srp]
     [<ffffffffa03a6fb5>] __rport_fail_io_fast+0x75/0x80 [scsi_transport_srp]
     [<ffffffffa03a7438>] rport_fast_io_fail_timedout+0x88/0xc0 [scsi_transport_srp]
     [<ffffffff8108b370>] worker_thread+0x170/0x2a0
     [<ffffffff81090876>] kthread+0x96/0xa0
     [<ffffffff8100c0ca>] child_rip+0xa/0x20

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-21 10:46:17 -08:00
Sagi Grimberg
9bd626e79d IB/isert: pass scatterlist instead of cmd to fast_reg_mr routine
This routine may help for protection registration as well.
This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-19 02:22:09 +00:00
Sagi Grimberg
dc87a90f5d IB/isert: Move fastreg descriptor creation to a function
This routine may be called both by fast registration
descriptors for data and for integrity buffers.

This patch does not change any functionality.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-19 02:22:08 +00:00
Sagi Grimberg
a3a5a82627 IB/isert: Avoid frwr notation, user fastreg
Use fast registration lingo. fast registration will
also incorporate signature/DIF registration.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-19 02:22:07 +00:00
Sagi Grimberg
eb6ab13267 IB/isert: seperate connection protection domains and dma MRs
It is more correct to seperate connections protection domains
and dma_mr handles. protection information support requires to
do so.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2014-01-19 02:22:07 +00:00
Matan Barak
90f1d1b41b IB/core: Add flow steering support for IPoIB UD traffic
When creating an IPoIB UD QP, provide a hint to the low level driver
that the QP should support flow-steering.  This means that privileged
user space applications can steer TCP/IP IPoIB traffic from the
network stack, in a similar manner done with Ethernet RAW_PACKET QPs.

The hint is provided through new QP creation flag called NETIF_QP.

Signed-off-by: Matan Barak <matanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2014-01-14 14:06:50 -08:00
Hangbin Liu
0d68fc4f12 infiniband: make sure the src net is infiniband when create new link
When we create a new infiniband link with uninfiniband device, e.g. `ip link
add link em1 type ipoib pkey 0x8001`. We will get a NULL pointer dereference
cause other dev like Ethernet don't have struct ib_device.

The code path is:
rtnl_newlink
  |-- ipoib_new_child_link
        |-- __ipoib_vlan_add
              |-- ipoib_set_dev_features
                    |-- ib_query_device

Fix this bug by make sure the src net is infiniband when create new link.

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2014-01-03 20:38:56 -05:00
Nicholas Bellinger
2853c2b667 iser-target: Move INIT_WORK setup into isert_create_device_ib_res
This patch moves INIT_WORK setup for cq_desc->cq_[rx,tx]_work into
isert_create_device_ib_res(), instead of being done each callback
invocation in isert_cq_[rx,tx]_callback().

This also fixes a 'INFO: trying to register non-static key' warning
when cancel_work_sync() is called before INIT_WORK has setup the
struct work_struct.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> #3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-12-19 00:18:43 -08:00
Wei Yongjun
94a7111043 iser-target: fix error return code in isert_create_device_ib_res()
Fix to return a negative error code from the error handling
case instead of 0, as done elsewhere in this function.

Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Cc: <stable@vger.kernel.org> #3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-12-11 11:54:47 -08:00
Linus Torvalds
b0e3636f65 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Things have been quiet this round with mostly bugfixes, percpu
  conversions, and other minor iscsi-target conformance testing changes.

  The highlights include:

   - Add demo_mode_discovery attribute for iscsi-target (Thomas)
   - Convert tcm_fc(FCoE) to use percpu-ida pre-allocation
   - Add send completion interrupt coalescing for ib_isert
   - Convert target-core to use percpu-refcounting for se_lun
   - Fix mutex_trylock usage bug in iscsit_increment_maxcmdsn
   - tcm_loop updates (Hannes)
   - target-core ALUA cleanups + prep for v3.14 SCSI Referrals support (Hannes)

  v3.14 is currently shaping to be a busy development cycle in target
  land, with initial support for T10 Referrals and T10 DIF currently on
  the roadmap"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (40 commits)
  iscsi-target: chap auth shouldn't match username with trailing garbage
  iscsi-target: fix extract_param to handle buffer length corner case
  iscsi-target: Expose default_erl as TPG attribute
  target_core_configfs: split up ALUA supported states
  target_core_alua: Make supported states configurable
  target_core_alua: Store supported ALUA states
  target_core_alua: Rename ALUA_ACCESS_STATE_OPTIMIZED
  target_core_alua: spellcheck
  target core: rename (ex,im)plict -> (ex,im)plicit
  percpu-refcount: Add percpu-refcount.o to obj-y
  iscsi-target: Do not reject non-immediate CmdSNs exceeding MaxCmdSN
  iscsi-target: Convert iscsi_session statistics to atomic_long_t
  target: Convert se_device statistics to atomic_long_t
  target: Fix delayed Task Aborted Status (TAS) handling bug
  iscsi-target: Reject unsupported multi PDU text command sequence
  ib_isert: Avoid duplicate iscsit_increment_maxcmdsn call
  iscsi-target: Fix mutex_trylock usage in iscsit_increment_maxcmdsn
  target: Core does not need blkdev.h
  target: Pass through I/O topology for block backstores
  iser-target: Avoid using FRMR for single dma entry requests
  ...
2013-11-22 10:52:03 -08:00
Linus Torvalds
1ea406c0e0 Main batch of InfiniBand/RDMA changes for 3.13:
- Re-enable flow steering verbs with new improved userspace ABI
  - Fixes for slow connection due to GID lookup scalability
  - IPoIB fixes
  - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
  - Further improvements to SRP error handling
  - Add new transport type for Cisco usNIC
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCAAGBQJSil7BAAoJEENa44ZhAt0hbtgP/A+AmUalbOX6ZKzuOFxsrtY2
 r55CX9b1JBeFM/Zhn2o6y+81lpCjkckJSggESMe4izNgocGw0nW4vYGN4SBynatj
 y8sR9OSn+G3ihuENrzG41MJUGEa5WbcNMy4boN+Oa+qyTlV/WjLR7Fv4WbikK7Wm
 o8FNlXiiDhMoGfHHG5J0MD0EQsnxuLDk2XP+ciu4tLtTs+wBka+gFK8WnMvztle3
 gTeMNna5ilvCS2fdBxteuPA3KeDnJE9AgJSMJ2a4Rh+DR8uTgWYQ6n7amjmOc546
 yhAKkoBkxPE10+Yj82WOPhCFxSeWcuSwJvpgv5dTVZ1XqUUcC1V3TEcZDHmyyHQ7
 uPXgS1A+erBW3OYPBjZqtKvnHObscV12fL+rId3vIhcAQIbFroci08ZwPidEYRkn
 fvwlEKcrIsBIpRXEyjlFCxsiiDnfq1wC1VayMR3jrIK0P6idf1SXf/geiRp9+RGT
 wKUc0j51jvEx29qc65xuhEP9FQV9pCMxyd+FEE0d0KkjMz5hsIkjmcUcBbgF0CGg
 GEyDPlgRLv+vmWDGpT8XraaV/0CJOEQDIgB4WSN87/AZ4UoNt7spW2xqsLsp1toy
 5e0100tpWUleTPLe/Wig5GtBdagQ2jAUK1+186CP93pFPtkwc4/7X3hyp7qPIPTz
 VDvT9DEy6zjSMCLpMcdo
 =xxC+
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband/rdma updates from Roland Dreier:
 - Re-enable flow steering verbs with new improved userspace ABI
 - Fixes for slow connection due to GID lookup scalability
 - IPoIB fixes
 - Many fixes to HW drivers including mlx4, mlx5, ocrdma and qib
 - Further improvements to SRP error handling
 - Add new transport type for Cisco usNIC

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
  IB/core: Re-enable create_flow/destroy_flow uverbs
  IB/core: extended command: an improved infrastructure for uverbs commands
  IB/core: Remove ib_uverbs_flow_spec structure from userspace
  IB/core: Use a common header for uverbs flow_specs
  IB/core: Make uverbs flow structure use names like verbs ones
  IB/core: Rename 'flow' structs to match other uverbs structs
  IB/core: clarify overflow/underflow checks on ib_create/destroy_flow
  IB/ucma: Convert use of typedef ctl_table to struct ctl_table
  IB/cm: Convert to using idr_alloc_cyclic()
  IB/mlx5: Fix page shift in create CQ for userspace
  IB/mlx4: Fix device max capabilities check
  IB/mlx5: Fix list_del of empty list
  IB/mlx5: Remove dead code
  IB/core: Encorce MR access rights rules on kernel consumers
  IB/mlx4: Fix endless loop in resize CQ
  RDMA/cma: Remove unused argument and minor dead code
  RDMA/ucma: Discard events for IDs not yet claimed by user space
  IB/core: Add Cisco usNIC rdma node and transport types
  RDMA/nes: Remove self-assignment from nes_query_qp()
  IB/srp: Report receive errors correctly
  ...
2013-11-18 15:36:04 -08:00
Roland Dreier
b4fdf52b3f Merge branches 'cma', 'cxgb4', 'flowsteer', 'ipoib', 'misc', 'mlx4', 'mlx5', 'nes', 'ocrdma', 'qib' and 'srp' into for-next 2013-11-17 08:22:19 -08:00
Linus Torvalds
9073e1a804 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "Usual earth-shaking, news-breaking, rocket science pile from
  trivial.git"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (23 commits)
  doc: usb: Fix typo in Documentation/usb/gadget_configs.txt
  doc: add missing files to timers/00-INDEX
  timekeeping: Fix some trivial typos in comments
  mm: Fix some trivial typos in comments
  irq: Fix some trivial typos in comments
  NUMA: fix typos in Kconfig help text
  mm: update 00-INDEX
  doc: Documentation/DMA-attributes.txt fix typo
  DRM: comment: `halve' -> `half'
  Docs: Kconfig: `devlopers' -> `developers'
  doc: typo on word accounting in kprobes.c in mutliple architectures
  treewide: fix "usefull" typo
  treewide: fix "distingush" typo
  mm/Kconfig: Grammar s/an/a/
  kexec: Typo s/the/then/
  Documentation/kvm: Update cpuid documentation for steal time and pv eoi
  treewide: Fix common typo in "identify"
  __page_to_pfn: Fix typo in comment
  Correct some typos for word frequency
  clk: fixed-factor: Fix a trivial typo
  ...
2013-11-15 16:47:22 -08:00
Nicholas Bellinger
04d9cd1224 ib_isert: Avoid duplicate iscsit_increment_maxcmdsn call
This patch avoids a duplicate iscsit_increment_maxcmdsn() call for
ISER_IB_RDMA_WRITE within isert_map_rdma() + isert_reg_rdma_frwr(),
which will already be occuring once during isert_put_datain() ->
iscsit_build_rsp_pdu() operation.

It also removes the local conn->stat_sn assignment + increment,
and changes the third parameter to iscsit_build_rsp_pdu() to
signal this should be done by iscsi_target_mode code.

Tested-by: Moussa Ba <moussaba@micron.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-12 18:05:07 -08:00
Vu Pham
f01b9f7339 iser-target: Avoid using FRMR for single dma entry requests
This patch changes isert_reg_rdma_frwr() to not use FRMR for single
dma entry requests from small I/Os, in order to avoid the associated
memory registration overhead.

Using DMA MR is sufficient here for the single dma entry requests,
and addresses a >= v3.12 performance regression.

Signed-off-by: Vu Pham <vu@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.12+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-12 13:07:52 -08:00
Bart Van Assche
cd4e38542a IB/srp: Report receive errors correctly
The IB spec does not guarantee that the opcode is available in error
completions.  Hence do not rely on it.  See also commit 948d1e889e
("IB/srp: Introduce srp_handle_qp_err()").

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org> # v3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:18 -08:00
Bart Van Assche
99b6697a50 IB/srp: Avoid offlining operational SCSI devices
If SCSI commands are submitted with a SCSI request timeout that is
lower than the the IB RC timeout, it can happen that the SCSI error
handler has already started device recovery before transport layer
error handling starts.  So it can happen that the SCSI error handler
tries to abort a SCSI command after it has been reset by
srp_rport_reconnect().

Tell the SCSI error handler that such commands have finished and that
it is not necessary to continue its recovery strategy for commands
that have been reset by srp_rport_reconnect().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:18 -08:00
Vu Pham
65d7dd2f34 IB/srp: Remove target from list before freeing Scsi_Host structure
Remove an SRP target from the SRP target list before invoking the last
scsi_host_put() call.  This change is necessary because that last put
frees the memory that holds the srp_target_port structure.

This patch prevents the following kernel oops:

    RIP: 0010:[<ffffffff810b00d0>] __lock_acquire+0x500/0x1570
    Call Trace:
     [<ffffffff810b11e4>] lock_acquire+0xa4/0x120
     [<ffffffff81531206>] _spin_lock+0x36/0x70
     [<ffffffffa01b6d8f>] srp_remove_work+0xef/0x180 [ib_srp]
     [<ffffffff8109125c>] worker_thread+0x21c/0x3d0
     [<ffffffff81096e86>] kthread+0x96/0xa0
     [<ffffffff8100c20a>] child_rip+0xa/0x20

Signed-off-by: Vu Pham <vuhuong@mellanox.com>

[ bvanassche - Modified path description and CC'ed stable. ]

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:17 -08:00
Jack Wang
71444b9781 IB/srp: Add change_queue_depth and change_queue_type support
Currently, it's not possible to change queue depth for a device behind
SRP host. Sometimes, we need to adjust queue_depth for performance
reason (eg storage busy, we need lower queue_depth to avoid running
into SCSI error handler), so this patch add support for SRP driver.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:17 -08:00
Bart Van Assche
4d73f95f70 IB/srp: Make queue size configurable
Certain storage configurations, e.g. a sufficiently large array of
hard disks in a RAID configuration, need a queue depth above 64 to
achieve optimal performance. Hence make the queue depth configurable.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Tested-by: Jack Wang <xjtuwjp@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:17 -08:00
Bart Van Assche
b81d00bddf IB/srp: Introduce srp_alloc_req_data()
This patch does not change any functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Vu Pham <vu@mellanox.com>
Cc: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:17 -08:00
Bart Van Assche
848b3082db IB/srp: Export sgid to sysfs
On an initiator system with multiple IB ports it is not yet possible
to figure out what the originating port of an SRP connection is. Hence
make the source GID available in sysfs.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:16 -08:00
Bart Van Assche
a95cadb9da IB/srp: Add periodic reconnect functionality
After a transport layer occurred, periodically try to reconnect
to the target until the dev_loss timer expires.  Protect the
callback functions that can be invoked from inside the SCSI EH
against concurrent invocation with srp_reconnect_rport() via the
rport mutex. Change the default dev_loss_tmo from 60s into 600s
to give the reconnect mechanism a chance to kick in.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:16 -08:00
Bart Van Assche
8c64e4531c scsi_transport_srp: Add periodic reconnect support
Add support for periodically reconnecting to an SRP target until
the dev_loss timer expires. After the tenth reconnection attempt,
gradually slow down subsequent reconnect attempts.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:16 -08:00
Bart Van Assche
c1120f8981 IB/srp: Start timers if a transport layer error occurs
Start the reconnect timer, fast_io_fail timer and dev_loss timers if a
transport layer error occurs.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:15 -08:00
Bart Van Assche
ed9b2264fb IB/srp: Use SRP transport layer error recovery
Enable fast_io_fail_tmo and dev_loss_tmo functionality for the IB SRP
initiator.  Add kernel module parameters that allow to specify default
values for these parameters.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:15 -08:00
Bart Van Assche
9dd69a600a IB/srp: Keep rport as long as the IB transport layer
Keep the rport data structure around after srp_remove_host() has
finished until cleanup of the IB transport layer has finished
completely. This is necessary because later patches use the rport
pointer inside the queuecommand callback. Without this patch
accessing the rport from inside a queuecommand callback is racy
because srp_remove_host() must be invoked before scsi_remove_host()
and because the queuecommand callback could get invoked after
srp_remove_host() has finished. In other words, without this patch
the queuecommand callback can get invoked after the rport data
structure has been freed.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:15 -08:00
Vu Pham
7bb312e4a2 IB/srp: Make transport layer retry count configurable
Allow the InfiniBand RC retry count to be configured by the user as an
option in the target login string.  Reducing this retry count allows to
reduce the path failover time.

Signed-off-by: Vu Pham <vu@mellanox.com>

[ bvanassche: Rewrote patch description / changed default retry count ]

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:43:15 -08:00
Michal Schmidt
7f1a38671c IPoIB: lower NAPI weight
Since commit 82dc3c63c6 ("net: introduce NAPI_POLL_WEIGHT")
netif_napi_add() produces an error message if a NAPI poll weight
greater than 64 is requested.

Use the standard NAPI weight.

Signed-off-by: Michal Schmidt <mschmidt@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:50 -08:00
Erez Shitrit
94232d9ce8 IPoIB: Start multicast join process only on active ports
The driver starts the mcast_join task whenever the netdev interface is
UP without relation to the underlying IB port state.

Until the port state is ACTIVE all the join requests are irrelevant,
and the IB core returns -EINVAL. So the user will see errors such as:
"multicast join failed for ff12:401b:... , status -22".

Instead, have ipoib_mcast_join_task() return when the port is not active.

It will be called again when the port state is changed and the
low-level driver triggers the IB_EVENT_PORT_ACTIVE event or the
IB_EVENT_CLIENT_REREGISTER event.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Erez Shitrit
a39c52ab88 IPoIB: Add path query flushing in ipoib_ib_dev_cleanup
The path_rec_completion() callback may be invoked asynchronously even
at the middle of "driver uninit" process.  This can lead to scheduling
a task that tries to touch members of the priv object that are no
longer valid.  For example the function cm_create_tx_qp can attempt to
create qp with no valid priv->pd object.

The following crash is one of the results:
RIP: 0010:[<ffffffffa021bb47>]  [<ffffffffa021bb47>] ipoib_cm_create_tx_qp+0x57/0x90 [ib_ipoib]
Process ipoib (pid: 5916, threadinfo ffff8803786e4000, task ffff8804150e1500)
Stack:
Call Trace:
[<ffffffff81309ef0>] ? get_random_bytes+0x20/0x30
[<ffffffffa021be2a>] ipoib_cm_tx_init+0xca/0x340 [ib_ipoib]
[<ffffffffa021f765>] ipoib_cm_tx_start+0x215/0x3f0 [ib_ipoib]
[<ffffffffa021f550>] ? ipoib_cm_tx_start+0x0/0x3f0 [ib_ipoib]
[<ffffffff8108b2b0>] worker_thread+0x170/0x2a0
[<ffffffff81090bf0>] ? autoremove_wake_function+0x0/0x40
[<ffffffff8108b140>] ? worker_thread+0x0/0x2a0
[<ffffffff81090886>] kthread+0x96/0xa0
[<ffffffff8100c14a>] child_rip+0xa/0x20
[<ffffffff810907f0>] ? kthread+0x0/0xa0
[<ffffffff8100c140>] ? child_rip+0x0/0x20

Fix that by flushing all pending path queries at this point.

Signed-off-by: Alex Markuze <markuze@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Erez Shitrit
a9c8ba5884 IPoIB: Fix usage of uninitialized multicast objects
The driver should avoid calling ib_sa_free_multicast on the mcast->mc
object until it finishes its initialization state.  Otherwise we can
crash when ipoib_mcast_dev_flush() attempts to use the uninitialized
multicast object.

Instead, only call wait_for_completion() for multicast entries that
started the join process, meaning that ib_sa_join_multicast() finished.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Erez Shitrit
aede25011f IPoIB: Avoid flushing the driver workqueue on dev_down
The driver should not flush the whole workqueue when only one work (the
pkey poll one) needs to be cancelled.  Use cancel_delayed_work_sync()
instead.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Erez Shitrit
f47944cc2d IPoIB: Fix deadlock between dev_change_flags() and __ipoib_dev_flush()
When ipoib interface is going down it takes all of its children with
it, under mutex.

For each child, dev_change_flags() is called.  That function calls
ipoib_stop() via the ndo, and causes flush of the workqueue.
Sometimes in the workqueue an __ipoib_dev_flush work() is waiting and
when invoked tries to get the same mutex, which leads to a deadlock,
as seen below.

The solution is to switch to rw-sem instead of mutex.

The deadlock:
[11028.165303]  [<ffffffff812b0977>] ? vgacon_scroll+0x107/0x2e0
[11028.171844]  [<ffffffff814eaac5>] schedule_timeout+0x215/0x2e0
[11028.178465]  [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80
[11028.185962]  [<ffffffff814ea743>] wait_for_common+0x123/0x180
[11028.192491]  [<ffffffff8105fa40>] ? default_wake_function+0x0/0x20
[11028.199504]  [<ffffffff814ea85d>] wait_for_completion+0x1d/0x20
[11028.206224]  [<ffffffff8108b4f1>] flush_cpu_workqueue+0x61/0x90
[11028.212948]  [<ffffffff8108b5a0>] ? wq_barrier_func+0x0/0x20
[11028.219375]  [<ffffffff8108bfc4>] flush_workqueue+0x54/0x80
[11028.225712]  [<ffffffffa05a0576>] ipoib_mcast_stop_thread+0x66/0x90 [ib_ipoib]
[11028.233988]  [<ffffffffa059ccea>] ipoib_ib_dev_down+0x6a/0x100 [ib_ipoib]
[11028.241678]  [<ffffffffa059849a>] ipoib_stop+0x8a/0x140 [ib_ipoib]
[11028.248692]  [<ffffffff8142adf1>] dev_close+0x71/0xc0
[11028.254447]  [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0
[11028.261062]  [<ffffffffa059851b>] ipoib_stop+0x10b/0x140 [ib_ipoib]
[11028.268172]  [<ffffffff8142adf1>] dev_close+0x71/0xc0
[11028.273922]  [<ffffffff8142a631>] dev_change_flags+0xa1/0x1d0
[11028.280452]  [<ffffffff8148f20b>] devinet_ioctl+0x5eb/0x6a0
[11028.286786]  [<ffffffff814903b8>] inet_ioctl+0x88/0xa0
[11028.292633]  [<ffffffff8141591a>] sock_ioctl+0x7a/0x280
[11028.298576]  [<ffffffff81189012>] vfs_ioctl+0x22/0xa0
[11028.304326]  [<ffffffff81140540>] ? unmap_region+0x110/0x130
[11028.310756]  [<ffffffff811891b4>] do_vfs_ioctl+0x84/0x580
[11028.316897]  [<ffffffff81189731>] sys_ioctl+0x81/0xa0

and

11028.017533]  [<ffffffff8105a5c3>] ? perf_event_task_sched_out+0x33/0x80
[11028.025030]  [<ffffffff8100bb8e>] ? apic_timer_interrupt+0xe/0x20
[11028.031945]  [<ffffffff814eb2ae>] __mutex_lock_slowpath+0x13e/0x180
[11028.039053]  [<ffffffff814eb14b>] mutex_lock+0x2b/0x50
[11028.044910]  [<ffffffffa059f7e7>] __ipoib_ib_dev_flush+0x37/0x210 [ib_ipoib]
[11028.052894]  [<ffffffffa059fa00>] ? ipoib_ib_dev_flush_light+0x0/0x20 [ib_ipoib]
[11028.061363]  [<ffffffffa059fa17>] ipoib_ib_dev_flush_light+0x17/0x20 [ib_ipoib]
[11028.069738]  [<ffffffff8108b120>] worker_thread+0x170/0x2a0
[11028.076068]  [<ffffffff81090990>] ? autoremove_wake_function+0x0/0x40
[11028.083374]  [<ffffffff8108afb0>] ? worker_thread+0x0/0x2a0
[11028.089709]  [<ffffffff81090626>] kthread+0x96/0xa0
[11028.095266]  [<ffffffff8100c0ca>] child_rip+0xa/0x20
[11028.100921]  [<ffffffff81090590>] ? kthread+0x0/0xa0
[11028.106573]  [<ffffffff8100c0c0>] ? child_rip+0x0/0x20
[11028.112423] INFO: task ifconfig:23640 blocked for more than 120 seconds.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:49 -08:00
Tal Alon
22252b4e09 IPoIB: Change CM skb memory allocation to be non-atomic during init
Change CM skb memory allocation to use GFP_KERNEL when possible.

During device init there's no need to use GFP_ATOMIC when allocating
memory for the CM skbs -- use GFP_KERNEL instead.

Signed-off-by: Tal Alon <talal@mellanox.com>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:48 -08:00
Erez Shitrit
c2bb5628db IPoIB: Fix crash in dev_open error flow
If napi has never been enabled when calling ipoib_ib_dev_stop, a
kernel crash occurs, because the verbs layer completion handler
(ipoib_ib_completion) calls napi_schedule unconditionally.

If the napi structure passed in the napi_schedule call has not
been initialized, napi will crash.

The cleanest solution is to simply enable napi before calling
ipoib_ib_dev_stop in the dev_open error flow. (dev_stop then
immediately disables napi).

Signed-off-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-11-08 14:42:48 -08:00
Nicholas Bellinger
4a9a6c8d53 target: Drop left-over se_lun->lun_cmd_list shutdown code
Now with percpu refcounting for se_lun in place, go ahead and drop
the legacy per se_cmd accounting for se_lun shutdown.

This includes __transport_clear_lun_from_sessions(), the associated
transport_lun_wait_for_tasks() logic, along with a handful of now
unused se_cmd structure members and ->transport_state bits.

Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-07 14:25:02 -08:00
Nicholas Bellinger
95b60f0788 ib_isert: Add support for completion interrupt coalescing
This patch adds support for completion interrupt coalescing that
allows only every ISERT_COMP_BATCH_COUNT (8) to set IB_SEND_SIGNALED,
thus avoiding completion interrupts for every posted iser_tx_desc.

The batch processing is done using a per isert_conn llist that once
IB_SEND_SIGNALED has been set is saved to tx_desc->comp_llnode_batch,
and completion processing of previously posted iser_tx_descs is done
in a single shot from within isert_send_completion() code.

Note this is only done for response PDUs from ISCSI_OP_SCSI_CMD, and
all other control type of PDU responses will force an implicit batch
drain to occur.

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Sagi Grimberg <sagig@mellanox.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-11-06 12:48:22 -08:00
Vu Pham
0a66614b93 iser-target: check device before dereferencing its variable
This patch changes isert_connect_release() to correctly check for
the existence struct isert_device *device before checking for
isert_device->use_frwr.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-23 21:42:33 -07:00
Masanari Iida
8c88126bbb treewide: Fix typo in Kconfig
Correct spelling typo in Kconfig.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-10-14 15:23:02 +02:00
Jack Wang
c807f64340 ib_srpt: always set response for task management
The SRP specification requires:

  "Response data shall be provided in any SRP_RSP response that is sent in
   response to an SRP_TSK_MGMT request (see 6.7). The information in the
   RSP_CODE field (see table 24) shall indicate the completion status of
   the task management function."

So fix this to avoid the SRP initiator interprets task management functions
that succeeded as failed.

Signed-off-by: Jack Wang <jinpu.wang@profitbricks.com>
Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-03 04:23:17 -07:00
Nicholas Bellinger
0b41d6ca61 ib_srpt: Destroy cm_id before destroying QP.
This patch fixes a bug where ib_destroy_cm_id() was incorrectly being called
after srpt_destroy_ch_ib() had destroyed the active QP.

This would result in the following failed SRP_LOGIN_REQ messages:

Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff1762bd, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c903009f8f41)
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff1758f9, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c903009f8f42)
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff175941, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 2 (guid=0xfe80000000000000:0x2c90300a3cfb2)
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1)
mlx4_core 0000:84:00.0: command 0x19 failed: fw status = 0x9
rejected SRP_LOGIN_REQ because creating a new RDMA channel failed.
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1)
mlx4_core 0000:84:00.0: command 0x19 failed: fw status = 0x9
rejected SRP_LOGIN_REQ because creating a new RDMA channel failed.
Received SRP_LOGIN_REQ with i_port_id 0x0:0x2590ffff176299, t_port_id 0x2c903009f8f40:0x2c903009f8f40 and it_iu_len 260 on port 1 (guid=0xfe80000000000000:0x2c90300a3cfb1)

Reported-by: Navin Ahuja <navin.ahuja@saratoga-speed.com>
Cc: stable@vger.kernel.org # 3.3+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-10-01 21:27:30 -07:00
Linus Torvalds
48efe453e6 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Lots of activity again this round for I/O performance optimizations
  (per-cpu IDA pre-allocation for vhost + iscsi/target), and the
  addition of new fabric independent features to target-core
  (COMPARE_AND_WRITE + EXTENDED_COPY).

  The main highlights include:

   - Support for iscsi-target login multiplexing across individual
     network portals
   - Generic Per-cpu IDA logic (kent + akpm + clameter)
   - Conversion of vhost to use per-cpu IDA pre-allocation for
     descriptors, SGLs and userspace page pointer list
   - Conversion of iscsi-target + iser-target to use per-cpu IDA
     pre-allocation for descriptors
   - Add support for generic COMPARE_AND_WRITE (AtomicTestandSet)
     emulation for virtual backend drivers
   - Add support for generic EXTENDED_COPY (CopyOffload) emulation for
     virtual backend drivers.
   - Add support for fast memory registration mode to iser-target (Vu)

  The patches to add COMPARE_AND_WRITE and EXTENDED_COPY support are of
  particular significance, which make us the first and only open source
  target to support the full set of VAAI primitives.

  Currently Linux clients are lacking upstream support to actually
  utilize these primitives.  However, with server side support now in
  place for folks like MKP + ZAB working on the client, this logic once
  reserved for the highest end of storage arrays, can now be run in VMs
  on their laptops"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
  target/iscsi: Bump versions to v4.1.0
  target: Update copyright ownership/year information to 2013
  iscsi-target: Bump default TCP listen backlog to 256
  target: Fix >= v3.9+ regression in PR APTPL + ALUA metadata write-out
  iscsi-target; Bump default CmdSN Depth to 64
  iscsi-target: Remove unnecessary wait_for_completion in iscsi_get_thread_set
  iscsi-target: Add thread_set->ts_activate_sem + use common deallocate
  iscsi-target: Fix race with thread_pre_handler flush_signals + ISCSI_THREAD_SET_DIE
  target: remove unused including <linux/version.h>
  iser-target: introduce fast memory registration mode (FRWR)
  iser-target: generalize rdma memory registration and cleanup
  iser-target: move rdma wr processing to a shared function
  target: Enable global EXTENDED_COPY setup/release
  target: Add Third Party Copy (3PC) bit in INQUIRY response
  target: Enable EXTENDED_COPY setup in spc_parse_cdb
  target: Add support for EXTENDED_COPY copy offload emulation
  target: Avoid non-existent tg_pt_gp_mem in target_alua_state_check
  target: Add global device list for EXTENDED_COPY
  target: Make helpers non static for EXTENDED_COPY command setup
  target: Make spc_parse_naa_6h_vendor_specific non static
  ...
2013-09-12 16:11:45 -07:00
Nicholas Bellinger
4c76251e8e target: Update copyright ownership/year information to 2013
Update copyright ownership/year information for target-core,
loopback, iscsi-target, tcm_qla2xx, vhost and iser-target.

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 20:23:36 -07:00
Vu Pham
59464ef4fb iser-target: introduce fast memory registration mode (FRWR)
This model was introduced in 00f7ec36c "RDMA/core: Add memory
management extensions support" and works when the IB device
supports the IB_DEVICE_MEM_MGT_EXTENSIONS capability.

Upon creating the isert device, ib_isert will test whether the HCA
supports FRWR. If supported then set the flag and assign
function pointers that handle fast registration and deregistration
of appropriate resources (fast_reg descriptors).

When new connection coming in, ib_isert will check frwr flag and
create frwr resouces, if fail to do it will switch back to
old model of using global dma key and turn off the frwr support.

Registration is done using posting IB_WR_FAST_REG_MR to the QP and
invalidations using posting IB_WR_LOCAL_INV.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 16:48:51 -07:00
Vu Pham
d40945d8c2 iser-target: generalize rdma memory registration and cleanup
Current driver uses global dma key to register the memory pointed
by sg list provided by the target core.

This is the preparation step for adding more methods like fast
path memory registration, make the reg/unreg calls be function
pointers.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 16:48:50 -07:00
Vu Pham
90ecc6e251 iser-target: move rdma wr processing to a shared function
isert_put_datain() and isert_get_dataout() share a lot of code
in rdma wr processing, move this common code to a shared function.

Use isert_unmap_cmd to cleanup for RDMA_READ completion.
Remove duplicate field in isert_cmd and isert_rdma_wr structs
Change misc debug messages to track isert_cmd

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-10 16:48:49 -07:00
Nicholas Bellinger
d703ce2f7f iscsi/iser-target: Convert to command priv_size usage
This command converts iscsi/isert-target to use allocations based on
iscsit_transport->priv_size within iscsit_allocate_cmd(), instead of
using an embedded isert_cmd->iscsi_cmd.

This includes removing iscsit_transport->alloc_cmd() usage, along
with updating isert-target code to use iscsit_priv_cmd().

Also, remove left-over iscsit_transport->release_cmd() usage for
direct calls to iscsit_release_cmd(), and drop the now unused
lio_cmd_cache and isert_cmd_cache.

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Kent Overstreet <kmo@daterainc.com>
Signed-off-by: Nicholas Bellinger <nab@daterainc.com>
2013-09-09 14:29:21 -07:00
Nicholas Bellinger
6faaa85f37 iser-target: Updates for login negotiation multi-plexing support
This patch updates iser-target code to support login negotiation
multi-plexing.  This includes only using isert_conn->conn_login_comp
for the first login request PDU, pushing the subsequent processing
to iscsi_conn->login_work -> iscsi_target_do_login_rx(), and turning
isert_get_login_rx() into a NOP.

v3 changes:
   - Drop unnecessary LOGIN_FLAGS_READ_ACTIVE bit set in
     isert_rx_login_req()

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-09-09 14:27:21 -07:00
Linus Torvalds
7c049d0869 Main batch of InfiniBand/RDMA changes for 3.12 merge window:
- Large ocrdma HW driver update: add "fast register" work requests,
    fixes, cleanups
  - Add receive flow steering support for raw QPs
  - Fix IPoIB neighbour race that leads to crash
  - iSER updates including support for using "fast register" memory
    registration
  - IPv6 support for iWARP
  - XRC transport fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.14 (GNU/Linux)
 
 iQIcBAABCAAGBQJSJ2drAAoJEENa44ZhAt0hpaUQAJ6EdNFh2bon9c/uHz1lw58A
 DIvVniUwWGCRgpbsv/IxZDBVX3G5IKSW6U0Y3/JxMXeIF/cGOUaJZgCeKPiYNoNA
 yxiEGI0ffvpWGAJUHSE2ARJKvgdjrpqGo5UzsiinEhJx3uczeZBcooosQTWDXxya
 /qSWH3rARkic9abuaizkuVzdWEjWLNjyh2bWnqJ4HNZplE8RfSK4bk8QtvEmpn9Z
 dNBKFFujysfHHGflLuSkFAkP1NdqlZeQ4/2uzD23p3YofbJwrJoJNFxkCfx3UUml
 fjPptlhU6kZMCwSXsD24tAV8Exr/CCgmxriFIN/Xqhu4gvBELScRPPqPqFquhtXI
 pP5hbG9/9P5C8BLgABe6IAvlU1lUyraR67bA78nYeKm2JpATE6T23Nx7nUgMgzRS
 Ee3ZvZGZvk8BZ7zyQh8D7Aig1XOtzqA9D5nLUyEJ2aRPSnXJxyi0S3Fbn0vmOMb7
 8CF+99KECG8fb4sjNAjX5Zm48+7WJd+WCd3t89oUzCbJKKpa1Chctph8VH7dEfke
 JkEjOJhuAGqau4bIMZ2nZkISoTfSD7vbjPAxMPASS7fOdR7fe6AqDn9/LMAhWjpw
 4Fb25ZW7rYf9+6jqA4MgMRFCTkXhR25FsqUUL9h8NaWai4F1dfVoZHbRvGIZah5n
 5HQupXGquwES4y6pvHjc
 =rabl
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull main batch of InfiniBand/RDMA changes from Roland Dreier:
 - Large ocrdma HW driver update: add "fast register" work requests,
   fixes, cleanups
 - Add receive flow steering support for raw QPs
 - Fix IPoIB neighbour race that leads to crash
 - iSER updates including support for using "fast register" memory
   registration
 - IPv6 support for iWARP
 - XRC transport fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (54 commits)
  RDMA/ocrdma: Fix compiler warning about int/pointer size mismatch
  IB/iser: Fix redundant pointer check in dealloc flow
  IB/iser: Fix possible memory leak in iser_create_frwr_pool()
  IB/qib: Move COUNTER_MASK definition within qib_mad.h header guards
  RDMA/ocrdma: Fix passing wrong opcode to modify_srq
  RDMA/ocrdma: Fill PVID in UMC case
  RDMA/ocrdma: Add ABI versioning support
  RDMA/ocrdma: Consider multiple SGES in case of DPP
  RDMA/ocrdma: Fix for displaying proper link speed
  RDMA/ocrdma: Increase STAG array size
  RDMA/ocrdma: Dont use PD 0 for userpace CQ DB
  RDMA/ocrdma: FRMA code cleanup
  RDMA/ocrdma: For ERX2 irrespective of Qid, num_posted offset is 24
  RDMA/ocrdma: Fix to work with even a single MSI-X vector
  RDMA/ocrdma: Remove the MTU check based on Ethernet MTU
  RDMA/ocrdma: Add support for fast register work requests (FRWR)
  RDMA/ocrdma: Create IRD queue fix
  IB/core: Better checking of userspace values for receive flow steering
  IB/mlx4: Add receive flow steering support
  IB/core: Export ib_create/destroy_flow through uverbs
  ...
2013-09-05 09:39:27 -07:00
Roland Dreier
82af24ac6f Merge branches 'cxgb4', 'flowsteer', 'ipoib', 'iser', 'mlx4', 'ocrdma' and 'qib' into for-next 2013-09-03 09:01:08 -07:00
Sagi Grimberg
2e02d653fe IB/iser: Fix redundant pointer check in dealloc flow
This bug was discovered by Smatch static checker run by Dan Carpenter.
If in free_rx_descriptors(), rx_descs are not NULL then the iser
device is definately not NULL, so no need to check it before
dereferencing it.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-09-02 21:26:16 -07:00
Roi Dayan
27ae2d1ea5 IB/iser: Fix possible memory leak in iser_create_frwr_pool()
Fix leak where desc is not being freed in error flows.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-09-02 21:24:08 -07:00
Or Gerlitz
6a06a4b8cf [SCSI] IB/iser: Add Discovery support
To run discovery over iSER we need to advertize the CAP_TEXT_NEGO capability
towards user space. Also need to make sure the login RX buffer is posted when
SendTargets TEXT PDUs are sent. For that end, we use a setting of the
ISCSI_PARAM_DISCOVERY_SESS iscsi param as an indication that this is
discovery session.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2013-08-26 18:53:49 +04:00
Jim Foraker
49b8e74438 IPoIB: Fix race in deleting ipoib_neigh entries
In several places, this snippet is used when removing neigh entries:

	list_del(&neigh->list);
	ipoib_neigh_free(neigh);

The list_del() removes neigh from the associated struct ipoib_path, while
ipoib_neigh_free() removes neigh from the device's neigh entry lookup
table.  Both of these operations are protected by the priv->lock
spinlock.  The table however is also protected via RCU, and so naturally
the lock is not held when doing reads.

This leads to a race condition, in which a thread may successfully look
up a neigh entry that has already been deleted from neigh->list.  Since
the previous deletion will have marked the entry with poison, a second
list_del() on the object will cause a panic:

  #5 [ffff8802338c3c70] general_protection at ffffffff815108c5
     [exception RIP: list_del+16]
     RIP: ffffffff81289020  RSP: ffff8802338c3d20  RFLAGS: 00010082
     RAX: dead000000200200  RBX: ffff880433e60c88  RCX: 0000000000009e6c
     RDX: 0000000000000246  RSI: ffff8806012ca298  RDI: ffff880433e60c88
     RBP: ffff8802338c3d30   R8: ffff8806012ca2e8   R9: 00000000ffffffff
     R10: 0000000000000001  R11: 0000000000000000  R12: ffff8804346b2020
     R13: ffff88032a3e7540  R14: ffff8804346b26e0  R15: 0000000000000246
     ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0000
  #6 [ffff8802338c3d38] ipoib_cm_tx_handler at ffffffffa066fe0a [ib_ipoib]
  #7 [ffff8802338c3d98] cm_process_work at ffffffffa05149a7 [ib_cm]
  #8 [ffff8802338c3de8] cm_work_handler at ffffffffa05161aa [ib_cm]
  #9 [ffff8802338c3e38] worker_thread at ffffffff81090e10
 #10 [ffff8802338c3ee8] kthread at ffffffff81096c66
 #11 [ffff8802338c3f48] kernel_thread at ffffffff8100c0ca

We move the list_del() into ipoib_neigh_free(), so that deletion happens
only once, after the entry has been successfully removed from the lookup
table.  This same behavior is already used in ipoib_del_neighs_by_gid()
and __ipoib_reap_neigh().

Signed-off-by: Jim Foraker <foraker1@llnl.gov>
Reviewed-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com>
Reviewed-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-13 11:57:37 -07:00
Sagi Grimberg
5587856c96 IB/iser: Introduce fast memory registration model (FRWR)
Newer HCAs and Virtual functions may not support FMRs but rather a fast
registration model, which we call FRWR - "Fast Registration Work Requests".

This model was introduced in 00f7ec36c ("RDMA/core: Add memory management
extensions support") and works when the IB device supports the
IB_DEVICE_MEM_MGT_EXTENSIONS capability.

Upon creating the iser device iser will test whether the HCA supports
FMRs.  If no support for FMRs, check if IB_DEVICE_MEM_MGT_EXTENSIONS
is supported and assign function pointers that handle fast
registration and allocation of appropriate resources (fast_reg
descriptors).

Registration is done using posting IB_WR_FAST_REG_MR to the QP and
invalidations using posting IB_WR_LOCAL_INV.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:10 -07:00
Sagi Grimberg
e657571b76 IB/iser: Place the fmr pool into a union in iser's IB conn struct
This is preparation step for other memory registration methods to be
added.  In addition, change reg/unreg routines signature to indicate
they use FMRs.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:09 -07:00
Sagi Grimberg
919fc274ef IB/iser: Handle unaligned SG in separate function
This routine will be shared with other rdma management schemes.  The
bounce buffer solution for unaligned SG-lists and the sg_to_page_vec
routine are likely to be used for other registration schemes and not
just FMR.

Move them out of the FMR specific code, and call them from there.
Later they will be called also from other reg methods code.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:09 -07:00
Sagi Grimberg
b4e155ffbb IB/iser: Generalize rdma memory registration
Currently the driver uses FMRs as the only means to register the
memory pointed by SG provided by the SCSI mid-layer with the RDMA
device.

As preparation step for adding more methods for fast path memory
registration, make the alloc/free and reg/unreg calls function
pointers, which are for now just set to the existing FMR ones.

Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:09 -07:00
Shlomo Pongratz
b7f0451309 IB/iser: Accept session->cmds_max from user space
Use cmds_max passed from user space to be the number of PDUs to be
supported for the session instead of hard-coded ISCSI_DEF_XMIT_CMDS_MAX.
This allow controlling the max number of SCSI commands for the session.
Also don't ignore the qdepth passed from user space.

Derive from session->cmds_max the actual number of RX buffers and FMR
pool size to allocate during the connection bind phase.

Since the iser transport connection is established before the iscsi
session/connection are created and bound, we still use one hard-coded
quantity ISER_DEF_XMIT_CMDS_MAX to compute the maximum number of
work-requests to be supported by the RC QP used for the connection.

The above quantity is made to be a power of two between ISCSI_TOTAL_CMDS_MIN
(16) and ISER_DEF_XMIT_CMDS_MAX (512) inclusive.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:08 -07:00
Shlomo Pongratz
986db0d6c0 IB/iser: Restructure allocation/deallocation of connection resources
This is a preparation step to a patch that accepts the number of max
SCSI commands to be supported a session from user space iSCSI tools.

Move the allocation of the login buffer, FMR pool and its associated
page vector from iser_create_ib_conn_res() (which is called prior when
we actually know how many commands should be supported) to
iser_alloc_rx_descriptors() (which is called during the iscsi
connection bind step where this quantity is known).

Also do small refactoring around the deallocation to make that path
similar to the allocation one.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:08 -07:00
Or Gerlitz
f91424cf5b IB/iser: Use proper debug level value for info prints
Commit 4f36388261 ("IB/iser: Move informational messages from error
to info level") set info prints to be emitted at a lower debug level
than warning prints, which is a bit odd.  Fix that.

Also move the prints on unaligned SG from warning to debug level.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-08-09 17:18:08 -07:00
Erez Shitrit
c290414169 IPoIB: Fix pkey change flow for virtualization environments
IPoIB's required behaviour w.r.t to the pkey used by the device is the following:

- For "parent" interfaces (e.g ib0, ib1, etc) who are created
  automatically as a result of hot-plug events from the IB core, the
  driver needs to take whatever pkey vlaue it finds in index 0, and
  stick to that index.

- For child interfaces (e.g ib0.8001, etc) created by admin directive,
  the driver needs to use and stick to the value provided during its
  creation.

In SR-IOV environment its possible for the VF probe to take place
before the cloud management software provisions the suitable pkey for
the VF in the paravirtualed PKEY table index 0. When this is the case,
the VF IB stack will find in index 0 an invalide pkey, which is all
zeros.

Moreover, the cloud managment can assign the pkey value at index 0 at
any time of the guest life cycle.

The correct behavior for IPoIB to address these requirements for
parent interfaces is to use PKEY_CHANGE event as trigger to optionally
re-init the device pkey value and re-create all the relevant resources
accordingly, if the value of the pkey in index 0 has changed (from
invalid to valid or from valid value X to invalid value Y).

This patch enhances the heavy flushing code which is triggered by pkey
change event, to behave correctly for parent devices. For child
devices, the code remains the same, namely chases pkey value and not
index.

Signed-off-by: Erez Shitrit <erezsh@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-31 14:23:44 -07:00
Or Gerlitz
3d790a4c26 IPoIB: Make sure child devices use valid/proper pkeys
Make sure that the IB invalid pkey (0x0000 or 0x8000) isn't used for
child devices.

Also, make sure to always set the full membership bit for the pkey of
devices created by rtnl link ops.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-31 14:23:40 -07:00
Linus Torvalds
c552441373 Main batch of InfiniBand/RDMA changes for 3.11 merge window:
- AF_IB (native IB addressing) for CMA from Sean Hefty
  - New mlx5 driver for Mellanox Connect-IB adapters (including post merge request fixes)
  - SRP fixes from Bart Van Assche (including fix to first merge request)
  - qib HW driver updates
  - Resurrection of ocrdma HW driver development
  - uverbs conversion to create fds with O_CLOEXEC set
  - Other small changes and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJR30TKAAoJEENa44ZhAt0h854P/jvAhK5u+XTM5VyjAi0DKJ7P
 bWcsu+KxbOIFnjEdsYQl1mGP44gdO8GPZp7+JR5nDHDRpw9K76qy6QQiPbaF6Y8D
 cZH8Xlq4hzBfElTWBkExEemPrVUUq77j03FE9TBatdLAtEyYkgrNyqr7Ys6zVwVK
 ugR8nAahvnB7Jh1tsyZBBd9kfbWtXJnaGC8/Zk3Na4n4zXRAbr0DcnRF0sncTL38
 VFnWbi33OQAxu5bsb2jGec/SNP3BbNwspFPjSCKqiiItRaCj13JiHhrKKvVk4RZe
 hIRnPH47kjLRp2/PwBo6o+gTXZuRg48VGBx4CKUTwx1nCzPPN1iz9ZOfqUv9Qwcv
 LX8mxC7QS/Yvud4KeEBsj6kotb80EkRF2KV5RkIKCxQiwetGD9127bZylC8ttxGw
 2f6MzYtAGD4R4C10lO8N+59VugSg1xAvwsqz0a/jy2XyVHbI1ugQedzkB20x5WPY
 51S08ABvtU9yIxIYrw2VEaa/5WN+XJ6+LpG9QBAGXdMLiCiiAe7n/YzyXI6AgwaW
 Jl/uKr6H6/jEHUHKwkyqsmbpVGPhtGWu8deyr1oYvOEP4i48gcDqMQsfMcCISrQV
 MeQU3hS/obykUlNeqjmMI2CXrecqSsiq0hXd4DLaSoZ2Rb4Drx2Wj6sTQLIAgL2q
 GBYjHWMUpZXIFHQaH7am
 =nZh8
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - AF_IB (native IB addressing) for CMA from Sean Hefty
 - new mlx5 driver for Mellanox Connect-IB adapters (including post
   merge request fixes)
 - SRP fixes from Bart Van Assche (including fix to first merge request)
 - qib HW driver updates
 - resurrection of ocrdma HW driver development
 - uverbs conversion to create fds with O_CLOEXEC set
 - other small changes and fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (66 commits)
  mlx5: Return -EFAULT instead of -EPERM
  IB/qib: Log all SDMA errors unconditionally
  IB/qib: Fix module-level leak
  mlx5_core: Adjust hca_cap.uar_page_sz to conform to Connect-IB spec
  IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
  IB/uverbs: Use get_unused_fd_flags(O_CLOEXEC) instead of get_unused_fd()
  mlx5_core: Fixes for sparse warnings
  IB/mlx5: Make profile[] static in main.c
  mlx5: Fix parameter type of health_handler_t
  mlx5: Add driver for Mellanox Connect-IB adapters
  IB/core: Add reserved values to enums for low-level driver use
  IB/srp: Bump driver version and release date
  IB/srp: Make HCA completion vector configurable
  IB/srp: Maintain a single connection per I_T nexus
  IB/srp: Fail I/O fast if target offline
  IB/srp: Skip host settle delay
  IB/srp: Avoid skipping srp_reset_host() after a transport error
  IB/srp: Fix remove_one crash due to resource exhaustion
  IB/qib: New transmitter tunning settings for Dell 1.1 backplane
  IB/core: Fix error return code in add_port()
  ...
2013-07-13 12:57:21 -07:00
Bart Van Assche
80d5e8a235 IB/srp: Let srp_abort() return FAST_IO_FAIL if TL offline
If the transport layer is offline it is more appropriate to let
srp_abort() return FAST_IO_FAIL instead of SUCCESS.

Reported-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-11 16:43:48 -07:00
Linus Torvalds
6d2fa9e141 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "Lots of activity this round on performance improvements in target-core
  while benchmarking the prototype scsi-mq initiator code with
  vhost-scsi fabric ports, along with a number of iscsi/iser-target
  improvements and hardening fixes for exception path cases post v3.10
  merge.

  The highlights include:

   - Make persistent reservations APTPL buffer allocated on-demand, and
     drop per t10_reservation buffer.  (grover)
   - Make virtual LUN=0 a NULLIO device, and skip allocation of NULLIO
     device pages (grover)
   - Add transport_cmd_check_stop write_pending bit to avoid extra
     access of ->t_state_lock is WRITE I/O submission fast-path.  (nab)
   - Drop unnecessary CMD_T_DEV_ACTIVE check from
     transport_lun_remove_cmd to avoid extra access of ->t_state_lock in
     release fast-path.  (nab)
   - Avoid extra t_state_lock access in __target_execute_cmd fast-path
     (nab)
   - Drop unnecessary vhost-scsi wait_for_tasks=true usage +
     ->t_state_lock access in release fast-path.  (nab)
   - Convert vhost-scsi to use modern se_cmd->cmd_kref
     TARGET_SCF_ACK_KREF usage (nab)
   - Add tracepoints for SCSI commands being processed (roland)
   - Refactoring of iscsi-target handling of ISCSI_OP_NOOP +
     ISCSI_OP_TEXT to be transport independent (nab)
   - Add iscsi-target SendTargets=$IQN support for in-band discovery
     (nab)
   - Add iser-target support for in-band discovery (nab + Or)
   - Add iscsi-target demo-mode TPG authentication context support (nab)
   - Fix isert_put_reject payload buffer post (nab)
   - Fix iscsit_add_reject* usage for iser (nab)
   - Fix iscsit_sequence_cmd reject handling for iser (nab)
   - Fix ISCSI_OP_SCSI_TMFUNC handling for iser (nab)
   - Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED (nab)

  The last five iscsi/iser-target items are CC'ed to stable, as they do
  address issues present in v3.10 code.  They are certainly larger than
  I'd like for stable patch set, but are important to ensure proper
  REJECT exception handling in iser-target for 3.10.y"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (51 commits)
  iser-target: Ignore non TEXT + LOGOUT opcodes for discovery
  target: make queue_tm_rsp() return void
  target: remove unused codes from enum tcm_tmrsp_table
  iscsi-target: kstrtou* configfs attribute parameter cleanups
  iscsi-target: Fix tfc_tpg_auth_cit configfs length overflow
  iscsi-target: Fix tfc_tpg_nacl_auth_cit configfs length overflow
  iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling
  iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len]
  iser-target: Add vendor_err debug output
  target: Add (obsolete) checking for PMI/LBA fields in READ CAPACITY(10)
  target: Return correct sense data for IO past the end of a device
  target: Add tracepoints for SCSI commands being processed
  iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED
  iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser
  iscsi-target: Fix iscsit_sequence_cmd reject handling for iser
  iscsi-target: Fix iscsit_add_reject* usage for iser
  iser-target: Fix isert_put_reject payload buffer post
  iscsi-target: missing kfree() on error path
  iscsi-target: Drop left-over iscsi_conn->bad_hdr
  target: Make core_scsi3_update_and_write_aptpl return sense_reason_t
  ...
2013-07-11 12:57:19 -07:00
Nicholas Bellinger
ca40d24eb8 iser-target: Ignore non TEXT + LOGOUT opcodes for discovery
This patch adds a check in isert_rx_opcode() to ignore non TEXT + LOGOUT
opcodes when SessionType=Discovery has been negotiated.

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:37:02 -07:00
Joern Engel
b79fafac70 target: make queue_tm_rsp() return void
The return value wasn't checked by any of the callers.  Assuming this is
correct behaviour, we can simplify some code by not bothering to
generate it.

nab: Add srpt_queue_data_in() + srpt_queue_tm_rsp() nops around
     srpt_queue_response() void return

Signed-off-by: Joern Engel <joern@logfs.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:36:53 -07:00
Nicholas Bellinger
adb54c2931 iser-target: Add support for ISCSI_OP_TEXT opcode + payload handling
This patch adds isert_handle_text_cmd() to handle incoming
ISCSI_OP_TEXT PDU processing, along with isert_put_text_rsp()
for posting ISCSI_OP_TEXT_RSP ib_send_wr response.

It copies ISCSI_OP_TEXT payload using unsolicited payload at
&iser_rx_desc->data[0] into iscsi_cmd->text_in_ptr for usage
with outgoing isert_put_text_rsp() -> iscsit_build_text_rsp()

v2 changes:
  - Let iscsit_build_text_rsp() determine any extra padding

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:36:48 -07:00
Nicholas Bellinger
dbbc5d1107 iser-target: Rename sense_buf_[dma,len] to pdu_[dma,len]
Now that these two variables are used for REJECT payloads as well
as SCSI response sense payloads, rename them to something that
makes more sense.

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:36:47 -07:00
Nicholas Bellinger
c5a2adbfcb iser-target: Add vendor_err debug output
Add output for ib_wc.vendor_err in isert_cq_[t,r]x_work(), which
is useful for debugging future issues.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:36:46 -07:00
Nicholas Bellinger
b2cb96494d iser-target: Fix session reset bug with RDMA_CM_EVENT_DISCONNECTED
This patch addresses a bug where RDMA_CM_EVENT_DISCONNECTED may occur
before the connection shutdown has been completed by rx/tx threads,
that causes isert_free_conn() to wait indefinately on ->conn_wait.

This patch allows isert_disconnect_work code to invoke rdma_disconnect
when isert_disconnect_work() process context is started by client
session reset before isert_free_conn() code has been reached.

It also adds isert_conn->conn_mutex protection for ->state within
isert_disconnect_work(), isert_cq_comp_err() and isert_free_conn()
code, along with isert_check_state() for wait_event usage.

(v2: Add explicit iscsit_cause_connection_reinstatement call
     during isert_disconnect_work() to force conn reset)

Cc: stable@vger.kernel.org  # 3.10+
Cc: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-07 18:35:56 -07:00
Nicholas Bellinger
186a964701 iscsi-target: Fix ISCSI_OP_SCSI_TMFUNC handling for iser
This patch adds target_get_sess_cmd reference counting for
iscsit_handle_task_mgt_cmd(), and adds a target_put_sess_cmd()
for the failure case.

It also fixes a bug where ISCSI_OP_SCSI_TMFUNC type commands
where leaking iscsi_cmd->i_conn_node and eventually triggering
an OOPs during struct isert_conn shutdown.

Cc: stable@vger.kernel.org  # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-06 22:01:23 -07:00
Nicholas Bellinger
561bf15892 iscsi-target: Fix iscsit_sequence_cmd reject handling for iser
This patch moves ISCSI_OP_REJECT failures into iscsit_sequence_cmd()
in order to avoid external iscsit_reject_cmd() reject usage for all
PDU types.

It also updates PDU specific handlers for traditional iscsi-target
code to not reset the session after posting a ISCSI_OP_REJECT during
setup.

(v2: Fix CMDSN_LOWER_THAN_EXP for ISCSI_OP_SCSI to call
     target_put_sess_cmd() after iscsit_sequence_cmd() failure)

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: stable@vger.kernel.org  # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-06 21:59:54 -07:00
Nicholas Bellinger
ba15991408 iscsi-target: Fix iscsit_add_reject* usage for iser
This patch changes iscsit_add_reject() + iscsit_add_reject_from_cmd()
usage to not sleep on iscsi_cmd->reject_comp to address a free-after-use
usage bug in v3.10 with iser-target code.

It saves ->reject_reason for use within iscsit_build_reject() so the
correct value for both transport cases.  It also drops the legacy
fail_conn parameter usage throughput iscsi-target code and adds
two iscsit_add_reject_cmd() and iscsit_reject_cmd helper functions,
along with various small cleanups.

(v2: Re-enable target_put_sess_cmd() to be called from
     iscsit_add_reject_from_cmd() for rejects invoked after
     target_get_sess_cmd() has been called)

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: stable@vger.kernel.org  # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-06 21:59:53 -07:00
Nicholas Bellinger
3df8f68aaf iser-target: Fix isert_put_reject payload buffer post
This patch adds the missing isert_put_reject() logic to post
a outgoing payload buffer to hold the 48 bytes of original PDU
header request payload for the rejected cmd.

It also fixes ISTATE_SEND_REJECT handling in isert_response_completion()
-> isert_do_control_comp() code, and drops incorrect iscsi_cmd_t->reject_comp
usage.

Cc: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Mike Christie <michaelc@cs.wisc.edu>
Cc: stable@vger.kernel.org  # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-07-06 21:59:35 -07:00
Linus Torvalds
80cc38b163 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree updates from Jiri Kosina:
 "The usual stuff from trivial tree"

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (34 commits)
  treewide: relase -> release
  Documentation/cgroups/memory.txt: fix stat file documentation
  sysctl/net.txt: delete reference to obsolete 2.4.x kernel
  spinlock_api_smp.h: fix preprocessor comments
  treewide: Fix typo in printk
  doc: device tree: clarify stuff in usage-model.txt.
  open firmware: "/aliasas" -> "/aliases"
  md: bcache: Fixed a typo with the word 'arithmetic'
  irq/generic-chip: fix a few kernel-doc entries
  frv: Convert use of typedef ctl_table to struct ctl_table
  sgi: xpc: Convert use of typedef ctl_table to struct ctl_table
  doc: clk: Fix incorrect wording
  Documentation/arm/IXP4xx fix a typo
  Documentation/networking/ieee802154 fix a typo
  Documentation/DocBook/media/v4l fix a typo
  Documentation/video4linux/si476x.txt fix a typo
  Documentation/virtual/kvm/api.txt fix a typo
  Documentation/early-userspace/README fix a typo
  Documentation/video4linux/soc-camera.txt fix a typo
  lguest: fix CONFIG_PAE -> CONFIG_x86_PAE in comment
  ...
2013-07-04 11:40:58 -07:00
Vu Pham
e8ca413558 IB/srp: Bump driver version and release date
Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-01 10:42:02 -07:00
Bart Van Assche
4b5e5f41c8 IB/srp: Make HCA completion vector configurable
Several InfiniBand HCAs allow configuring the completion vector per
CQ.  This allows spreading the workload created by IB completion
interrupts over multiple MSI-X vectors and hence over multiple CPU
cores.  In other words, configuring the completion vector properly not
only allows reducing latency on an initiator connected to multiple
SRP targets but also allows improving throughput.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-01 10:40:55 -07:00
Bart Van Assche
96fc248a4c IB/srp: Maintain a single connection per I_T nexus
An SRP target is required to maintain a single connection between
initiator and target.  This means that if the 'add_target' attribute
is used to create a second connection to a target, the first
connection will be logged out and that the SCSI error handler will
kick in.  The SCSI error handler will cause the SRP initiator to
reconnect, which will cause I/O over the second connection to fail.
Avoid such ping-pong behavior by disabling relogins.

If reconnecting manually is necessary, that is possible by deleting
and recreating an rport via sysfs.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-01 10:40:07 -07:00
Bart Van Assche
99e1c1398f IB/srp: Fail I/O fast if target offline
If reconnecting failed we know that no command completion will
be received anymore.  Hence let the SCSI error handler fail such
commands immediately.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-07-01 10:37:13 -07:00
Bart Van Assche
2742c1dadd IB/srp: Skip host settle delay
The SRP initiator implements host reset by reconnecting to the SRP
target.  That means that communication with the target is possible as
soon as host reset finished. Hence skip the host settle delay.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Reviewed-by: Christoph Hellwig <hch@infradead.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-27 16:44:39 -07:00
Bart Van Assche
086f44f588 IB/srp: Avoid skipping srp_reset_host() after a transport error
The SCSI error handler assumes that the transport layer is operational
if an eh_abort_handler() returns SUCCESS.  Hence srp_abort() only
should return SUCCESS if sending the ABORT TASK task management
function succeeded.  This patch avoids the SCSI error handler skipping
the srp_reset_host() call after a transport layer error.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-27 16:44:39 -07:00
Dotan Barak
1fe0cb8488 IB/srp: Fix remove_one crash due to resource exhaustion
If the add_one callback fails during driver load no resources are
allocated so there isn't a need to release any resources. Trying
to clean the resource may lead to the following kernel panic:

    BUG: unable to handle kernel NULL pointer dereference at (null)
    IP: [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp]
    RIP: 0010:[<ffffffffa0132331>]  [<ffffffffa0132331>] srp_remove_one+0x31/0x240 [ib_srp]
    Process rmmod (pid: 4562, threadinfo ffff8800dd738000, task ffff8801167e60c0)
    Call Trace:
     [<ffffffffa024500e>] ib_unregister_client+0x4e/0x120 [ib_core]
     [<ffffffffa01361bd>] srp_cleanup_module+0x15/0x71 [ib_srp]
     [<ffffffff810ac6a4>] sys_delete_module+0x194/0x260
     [<ffffffff8100b0f2>] system_call_fastpath+0x16/0x1b

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Eli Cohen <eli@mellanox.co.il>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Sebastian Riemer <sebastian.riemer@profitbricks.com>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-27 16:44:38 -07:00
Nicholas Bellinger
778de36896 iscsi/isert-target: Refactor ISCSI_OP_NOOP RX handling
This patch refactors ISCSI_OP_NOOP handling within iscsi-target in
order to handle iscsi_nopout payloads in a transport specific manner.

This includes splitting existing iscsit_handle_nop_out() into
iscsit_setup_nop_out() and iscsit_process_nop_out() calls, and
makes iscsit_handle_nop_out() be only used internally by traditional
iscsi socket calls.

Next update iser-target code to use new callers and add FIXME for
the handling iscsi_nopout payloads.  Also fix reject response handling
in iscsit_setup_nop_out() to use proper iscsit_add_reject_from_cmd().

v2: Fix uninitialized iscsit_handle_nop_out() payload_length usage (Fengguang)
v3: Remove left-over dead code in iscsit_setup_nop_out() (DanC)

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-06-24 22:35:51 -07:00
Or Gerlitz
28f292e879 IB/iser: Add Mellanox copyright
Add Mellanox copyright to the iser initiator source code which I maintain.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-04 17:03:12 -07:00
Roi Dayan
5b61ff43a7 IB/iser: Fix device removal flow
Change the code to destroy the "last opened" rdma_cm id after making
sure we released all other objects (QP, CQs, PD, etc) associated with
the IB device.

Since iser accesses the IB device using the rdma_cm id, we need to
free any objects that are related to the device that is associated
with the rdma_cm id prior to destroying that id.  When this isn't
done, the low level driver that created this device can be unloaded
before iser has a chance to free all the objects and a such a call may
invoke code segment which isn't valid any more and crash.

Cc: Sean Hefty <sean.hefty@intel.com
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-06-04 17:03:11 -07:00
Nicholas Bellinger
1d19f7800d ib_srpt: Call target_sess_cmd_list_set_waiting during shutdown_session
Given that srpt_release_channel_work() calls target_wait_for_sess_cmds()
to allow outstanding se_cmd_t->cmd_kref a change to complete, the call
to perform target_sess_cmd_list_set_waiting() needs to happen in
srpt_shutdown_session()

Also, this patch adds an explicit call to srpt_shutdown_session() within
srpt_drain_channel() so that target_sess_cmd_list_set_waiting() will be
called in the cases where TFO->shutdown_session() is not triggered
directly by TCM.

Cc: Joern Engel <joern@logfs.org>
Cc: Roland Dreier <roland@kernel.org>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-29 21:30:46 -07:00
Masanari Iida
8b513d0cf6 treewide: Fix typo in printk
Correct spelling typo in various part of drivers

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2013-05-28 12:02:13 +02:00
Joern Engel
be646c2d2b target: Remove unused wait_for_tasks bit in target_wait_for_sess_cmds
Drop unused transport_wait_for_tasks() check in target_wait_for_sess_cmds
shutdown code, and convert tcm_qla2xxx + ib_srpt fabric drivers.

Cc: Joern Engel <joern@logfs.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-05-20 21:44:10 -07:00
Linus Torvalds
e0fd9affeb InfiniBand/RDMA changes for the 3.10 merge window:
- XRC transport fixes
  - Fix DHCP on IPoIB
  - mlx4 preparations for flow steering
  - iSER fixes
  - miscellaneous other fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQIcBAABCAAGBQJRisChAAoJEENa44ZhAt0hHLoP/iX5MxtHJ3X1u5KcARX/7nci
 CCH/VnD172re/KavCCg7zkbZQpS4jHCtW/CzLUCSPqBGOaj78HFTUkB3ragvUW7m
 ndERwplF8DP/i0x7Kk7Wau2a4RdlH0lqwucOjqTyQnbIdxknkz6w3Jcsb9Ic2lzx
 up0T0HaHtTxdVF6lXOB5QOIpUGg3l0Yu4euX2BA61WDoZj+VIYhgeWZewq0iV0D1
 rLtarJ+Or7mdwu2rNcDHgrD0lhF4SCBd3rx4lbc4F68Cr8JUz0Xe7liPLNskeLhW
 f3NEm3gmkYp9YI1otGsA0X/CyV6wnRk4mT8JMlOb2WNzeq2V13Z54/9ZyF5/gFD2
 JgzkQB9Ibf7EmTwXWd6+0+FA40Q6dNvnRnhddRM255dvDVw7nxUr2UzYH4Re/Z9K
 rNFjkvix2YUwEmoPjitWocz2kj2reDMqjtiVDmdGy1YbtnicH5GtkQsWkoPg8ON9
 m5jORUdzydTD+yBJwTiFP1EuFoG3TdfoZ7zHMJwWy/u8i308xD6WPGms9MTdjh8j
 7gjz2TCKr+vpuVRh/p6esCPPOTSsSeWDeowy7Sgpdf3qoqAImXsWXVrl2kXLhtyl
 1VIgHU3ztm7oqwmy0gQ/zVCo4CLLdsif2zmEIDpxJPnWaSq+D9LJdyLTcfdBjhQ8
 9SjUafe4msT1pIjNb7ND
 =1ojD
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - XRC transport fixes
 - Fix DHCP on IPoIB
 - mlx4 preparations for flow steering
 - iSER fixes
 - miscellaneous other fixes

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (23 commits)
  IB/iser: Add support for iser CM REQ additional info
  IB/iser: Return error to upper layers on EAGAIN registration failures
  IB/iser: Move informational messages from error to info level
  IB/iser: Add module version
  mlx4_core: Expose a few helpers to fill DMFS HW strucutures
  mlx4_core: Directly expose fields of DMFS HW rule control segment
  mlx4_core: Change a few DMFS fields names to match firmare spec
  mlx4: Match DMFS promiscuous field names to firmware spec
  mlx4_core: Move DMFS HW structs to common header file
  IB/mlx4: Set link type for RAW PACKET QPs in the QP context
  IB/mlx4: Disable VLAN stripping for RAW PACKET QPs
  mlx4_core: Reduce warning message for SRQ_LIMIT event to debug level
  RDMA/iwcm: Don't touch cmid after dropping reference
  IB/qib: Correct qib_verbs_register_sysfs() error handling
  IB/ipath: Correct ipath_verbs_register_sysfs() error handling
  RDMA/cxgb4: Fix SQ allocation when on-chip SQ is disabled
  SRPT: Fix odd use of WARN_ON()
  IPoIB: Fix ipoib_hard_header() return value
  RDMA: Rename random32() to prandom_u32()
  RDMA/cxgb3: Fix uninitialized variable
  ...
2013-05-08 15:29:48 -07:00
Roland Dreier
ea9627c800 Merge branches 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next 2013-05-08 14:12:37 -07:00
Andrew Morton
50bea5c0d5 drivers/infiniband/hw: rename random32() to prandom_u32()
Use preferable function name which implies using a pseudo-random number
generator.

Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2013-05-07 18:38:27 -07:00
Or Gerlitz
8d8399deb0 IB/iser: Add support for iser CM REQ additional info
Annex A12 of the IBTA spec defines additional information that needs
to be provided through the CM exchange relating to usage of ZBVA (Zero
Based VAs) and Send With Invalidate over an iSER connection.

Currently, the initiator sets both to not supported, but does provide
the header so that existing iSER targets can be patched to start
looking on the private data carried by the CM.

This is a preparation step to enable iSER with HW drivers for which
FMRs are not supported, such as mlx4 VF instances or new HW devices
which might support only FRWR (Fast Registration Work-Requests) along
the details of the IB_DEVICE_MEM_MGT_EXTENSIONS device capability.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-05-01 17:34:14 -07:00
Or Gerlitz
450d1e40d5 IB/iser: Return error to upper layers on EAGAIN registration failures
Commit 819a087316 ("IB/iser: Avoid error prints on EAGAIN
registration failures") not only eliminated the error print on that
case, but rather also modified the code such that it doesn't return
any error to upper layers.  As a result a wrong mapping was used.  Fix
this to correctly return the error in that case.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-05-01 17:34:13 -07:00
Roi Dayan
4f36388261 IB/iser: Move informational messages from error to info level
Introduce iser_info() and move informational messages that were
printed as errors to use that macro. Also, cleanup printk leftovers to
use the existing macros.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>

[ Use pr_warn(... instead of printk(KERN_WARNING ....  - Roland ]

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-05-01 17:34:13 -07:00
Roi Dayan
c1d786e682 IB/iser: Add module version
Add displaying module version, update the version to 1.1,
and remove the DRV_DATE define.

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-05-01 17:34:13 -07:00
Linus Torvalds
73287a43cc Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking updates from David Miller:
 "Highlights (1721 non-merge commits, this has to be a record of some
  sort):

   1) Add 'random' mode to team driver, from Jiri Pirko and Eric
      Dumazet.

   2) Make it so that any driver that supports configuration of multiple
      MAC addresses can provide the forwarding database add and del
      calls by providing a default implementation and hooking that up if
      the driver doesn't have an explicit set of handlers.  From Vlad
      Yasevich.

   3) Support GSO segmentation over tunnels and other encapsulating
      devices such as VXLAN, from Pravin B Shelar.

   4) Support L2 GRE tunnels in the flow dissector, from Michael Dalton.

   5) Implement Tail Loss Probe (TLP) detection in TCP, from Nandita
      Dukkipati.

   6) In the PHY layer, allow supporting wake-on-lan in situations where
      the PHY registers have to be written for it to be configured.

      Use it to support wake-on-lan in mv643xx_eth.

      From Michael Stapelberg.

   7) Significantly improve firewire IPV6 support, from YOSHIFUJI
      Hideaki.

   8) Allow multiple packets to be sent in a single transmission using
      network coding in batman-adv, from Martin Hundebøll.

   9) Add support for T5 cxgb4 chips, from Santosh Rastapur.

  10) Generalize the VXLAN forwarding tables so that there is more
      flexibility in configurating various aspects of the endpoints.
      From David Stevens.

  11) Support RSS and TSO in hardware over GRE tunnels in bxn2x driver,
      from Dmitry Kravkov.

  12) Zero copy support in nfnelink_queue, from Eric Dumazet and Pablo
      Neira Ayuso.

  13) Start adding networking selftests.

  14) In situations of overload on the same AF_PACKET fanout socket, or
      per-cpu packet receive queue, minimize drop by distributing the
      load to other cpus/fanouts.  From Willem de Bruijn and Eric
      Dumazet.

  15) Add support for new payload offset BPF instruction, from Daniel
      Borkmann.

  16) Convert several drivers over to mdoule_platform_driver(), from
      Sachin Kamat.

  17) Provide a minimal BPF JIT image disassembler userspace tool, from
      Daniel Borkmann.

  18) Rewrite F-RTO implementation in TCP to match the final
      specification of it in RFC4138 and RFC5682.  From Yuchung Cheng.

  19) Provide netlink socket diag of netlink sockets ("Yo dawg, I hear
      you like netlink, so I implemented netlink dumping of netlink
      sockets.") From Andrey Vagin.

  20) Remove ugly passing of rtnetlink attributes into rtnl_doit
      functions, from Thomas Graf.

  21) Allow userspace to be able to see if a configuration change occurs
      in the middle of an address or device list dump, from Nicolas
      Dichtel.

  22) Support RFC3168 ECN protection for ipv6 fragments, from Hannes
      Frederic Sowa.

  23) Increase accuracy of packet length used by packet scheduler, from
      Jason Wang.

  24) Beginning set of changes to make ipv4/ipv6 fragment handling more
      scalable and less susceptible to overload and locking contention,
      from Jesper Dangaard Brouer.

  25) Get rid of using non-type-safe NLMSG_* macros and use nlmsg_*()
      instead.  From Hong Zhiguo.

  26) Optimize route usage in IPVS by avoiding reference counting where
      possible, from Julian Anastasov.

  27) Convert IPVS schedulers to RCU, also from Julian Anastasov.

  28) Support cpu fanouts in xt_NFQUEUE netfilter target, from Holger
      Eitzenberger.

  29) Network namespace support for nf_log, ebt_log, xt_LOG, ipt_ULOG,
      nfnetlink_log, and nfnetlink_queue.  From Gao feng.

  30) Implement RFC3168 ECN protection, from Hannes Frederic Sowa.

  31) Support several new r8169 chips, from Hayes Wang.

  32) Support tokenized interface identifiers in ipv6, from Daniel
      Borkmann.

  33) Use usbnet_link_change() helper in USB net driver, from Ming Lei.

  34) Add 802.1ad vlan offload support, from Patrick McHardy.

  35) Support mmap() based netlink communication, also from Patrick
      McHardy.

  36) Support HW timestamping in mlx4 driver, from Amir Vadai.

  37) Rationalize AF_PACKET packet timestamping when transmitting, from
      Willem de Bruijn and Daniel Borkmann.

  38) Bring parity to what's provided by /proc/net/packet socket dumping
      and the info provided by netlink socket dumping of AF_PACKET
      sockets.  From Nicolas Dichtel.

  39) Fix peeking beyond zero sized SKBs in AF_UNIX, from Benjamin
      Poirier"

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1722 commits)
  filter: fix va_list build error
  af_unix: fix a fatal race with bit fields
  bnx2x: Prevent memory leak when cnic is absent
  bnx2x: correct reading of speed capabilities
  net: sctp: attribute printl with __printf for gcc fmt checks
  netlink: kconfig: move mmap i/o into netlink kconfig
  netpoll: convert mutex into a semaphore
  netlink: Fix skb ref counting.
  net_sched: act_ipt forward compat with xtables
  mlx4_en: fix a build error on 32bit arches
  Revert "bnx2x: allow nvram test to run when device is down"
  bridge: avoid OOPS if root port not found
  drivers: net: cpsw: fix kernel warn on cpsw irq enable
  sh_eth: use random MAC address if no valid one supplied
  3c509.c: call SET_NETDEV_DEV for all device types (ISA/ISAPnP/EISA)
  tg3: fix to append hardware time stamping flags
  unix/stream: fix peeking with an offset larger than data in queue
  unix/dgram: fix peeking with an offset larger than data in queue
  unix/dgram: peek beyond 0-sized skbs
  openvswitch: Remove unneeded ovs_netdev_get_ifindex()
  ...
2013-05-01 14:08:52 -07:00
Linus Torvalds
6da6dc2380 Merge branch 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target update from Nicholas Bellinger:
 "The highlights this round include:

   - Add fileio support for WRITE_SAME w/ UNMAP=1 discard (asias)
   - Add fileio support for UNMAP discard (asias)
   - Add tcm_vhost hotplug support to work with upstream QEMU
     vhost-scsi-pci code (asias + mst)
   - Check for aborted sequence in tcm_fc response path (mdr)
   - Add initial iscsit_transport support into iscsi-target code (nab)
   - Refactor iscsi-target RX PDU logic + export request PDU handling
     (nab)
   - Refactor iscsi-target TX queue logic + export response PDU creation
     (nab)
   - Add new iSCSI Extentions for RDMA (ISER) target driver (Or + nab)

  The biggest changes revolve around iscsi-target refactoring in order
  to support the iser-target driver.  This includes the conversion of
  the iscsi-target data-path to use modern se_cmd->cmd_kref counting,
  and allowing transport independent aspects of RX/TX PDU
  request/response handling be shared across existing traditional
  iscsi-target code, and the new iser-target code.

  Thanks to Or Gerlitz + Mellanox for supporting the iser-target
  development effort!"

* 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (25 commits)
  iser-target: Add iSCSI Extensions for RDMA (iSER) target driver
  tcm_vhost: Enable VIRTIO_SCSI_F_HOTPLUG
  tcm_vhost: Add ioctl to get and set events missed flag
  tcm_vhost: Add hotplug/hotunplug support
  tcm_vhost: Refactor the lock nesting rule
  tcm_fc: Check for aborted sequence
  iscsi-target: Add iser network portal attribute
  iscsi-target: Refactor TX queue logic + export response PDU creation
  iscsi-target: Refactor RX PDU logic + export request PDU handling
  iscsi-target: Add per transport iscsi_cmd alloc/free
  iscsi-target: Add iser-target parameter keys + setup during login
  iscsi-target: Initial traditional TCP conversion to iscsit_transport
  iscsi-target: Add iscsit_transport API template
  target: Add export of target_get_sess_cmd symbol
  target: Change default sense key of NOT_READY
  target/file: Set is_nonrot attribute
  target: Add sbc_execute_unmap() helper
  target/iblock: Add iblock_do_unmap() helper
  target/file: Add fd_do_unmap() helper
  target/file: Add UNMAP emulation support
  ...
2013-04-30 13:14:57 -07:00
Nicholas Bellinger
b8d26b3be8 iser-target: Add iSCSI Extensions for RDMA (iSER) target driver
This patch adds support for iSCSI Extensions for RDMA target mode,
and includes CQ pooling per isert_device context distributed across
multiple active iser target sessions.

It also uses cmwq process context for RX / TX ib_post_cq() polling
via isert_cq_desc->cq_[rx,tx]_work invoked by isert_cq_[rx,tx]_callback()
hardIRQ context callbacks.

v5 changes:

- Use ISER_RECV_DATA_SEG_LEN instead of hardcoded value in ISER_RX_PAD_SIZE (Or)
- Fix make W=1 warnings (Or)
- Add missing depends on NET && INFINIBAND_ADDR_TRANS in Kconfig (Randy + Or)
- Make isert_device_find_by_ib_dev() return proper ERR_PTR (Wei Yongjun)
- Properly setup iscsi_np->np_sockaddr in isert_setup_np() (Shlomi + nab)
- Add special case for early ISCSI_OP_SCSI_CMD exception handling (nab)

v4 changes:
- Mark isert_cq_rx_work as static (Or)
- Drop unnecessary ib_dma_sync_single_for_cpu + ib_dma_sync_single_for_device
  calls for isert_cmd->sense_buf_dma from isert_put_response (Or)
- Use 12288 for ISER_RX_PAD_SIZE base to save extra page per
  struct iser_rx_desc (Or + nab)
- Drop now unnecessary isert_rx_desc usage, and convert RX users to
  iser_rx_desc (Or + nab)
- Move isert_[alloc,free]_rx_descriptors() ahead of
  isert_create_device_ib_res() usage (nab)
- Mark isert_cq_[rx,tx]_callback() + prototypes as static
- Fix 'warning: 'ret' may be used uninitialized' warning for
  isert_create_device_ib_res on powerpc allmodconfig (fengguang + nab)
- Fix 'warning: 'ret' may be used uninitialized' warning for
  isert_connect_request on i386 allyesconfig (fengguang + nab)
- Fix pr_debug conversion specification in isert_rx_completion()
  (fengguang + nab)
- Drop unnecessary isert_conn->conn_cm_id != NULL check in
  isert_connect_release causing the build warning:
  "variable dereferenced before check 'isert_conn->conn_cm_id'"
- Fix isert_lid + isert_np leak in isert_setup_np failure path
- Add isert_conn->conn_wait_comp_err usage in isert_free_conn()
  for isert_cq_comp_err completion path
- Add isert_conn->logout_posted bit to determine decrement of
  isert_conn->post_send_buf_count from logout response completion
- Always set ISER_CONN_DOWN from isert_disconnect_work() callback

v3 changes:

- Convert to use per isert_cq_desc->cq_[rx,tx]_work + drop tasklets (Or + nab)
- Move IB_EVENT_QP_LAST_WQE_REACHED warn into correct
  isert_qp_event_callback (Or)
- Drop unnecessary IB_ACCESS_REMOTE_* access flag usage in
  isert_create_device_ib_res (Or)
- Add common isert_init_send_wr(), and convert isert_put_* calls (Or)
- Move to verbs+core logic to single ib_isert.[c,h]  (Or + nab)
- Add kmem_cache isert_cmd_cache usage for descriptor allocation (nab)
- Move common ib_post_send() logic used by isert_put_*() to
  isert_post_response() (nab)
- Add isert_put_reject call in isert_response_queue() for posting
  ISCSI_REJECT response. (nab)
- Add ISTATE_SEND_REJECT checking in isert_do_control_comp. (nab)

v2 changes:

- Drop unused ISERT_ADDR_ROUTE_TIMEOUT define
- Add rdma_notify() call for IB_EVENT_COMM_EST in isert_qp_event_callback()
- Make isert_query_device() less verbose
- Drop unused RDMA_CM_EVENT_ADDR_ERROR and RDMA_CM_EVENT_ROUTE_ERROR
  cases from isert_cma_handler()
- Drop unused rdma/ib_fmr_pool.h include
- Update isert_conn_setup_qp() to assign cq based upon least used
- Add isert_create_device_ib_res() to setup PD, CQs and MRs for each
  underlying struct ib_device, instead of using per isert_conn resources.
- Add isert_free_device_ib_res() to release PD, CQs and MRs for each
  underlying struct ib_device.
- Add isert_device_find_by_ib_dev()
- Change isert_connect_request() to drop PD, CQs and MRs allocation,
  and use isert_device_find_by_ib_dev() instead.
- Add isert_device_try_release()
- Change isert_connect_release() to decrement cq_active_qps, and drop
  PD, CQs and MRs resource release.
- Update isert_connect_release() to call isert_device_try_release()
- Make isert_create_device_ib_res() determine device->cqs_used based
  upon num_online_cpus()
- Drop misleading isert_dump_ib_wc() usage
- Drop unused rdma/ib_fmr_pool.h include
- Use proper xfer_len for login PDUs in isert_rx_completion()
- Add isert_release_cmd() usage
- Change isert_alloc_cmd() to setup iscsi_cmd.release_cmd() pointer
- Change isert_put_cmd() to perform per iscsi_opcode specific release
  logic
- Add isert_unmap_cmd() call for ISCSI_OP_SCSI_CMD from isert_put_cmd()
- Change isert_send_completion() to call
  atomic_dec(&isert_conn->post_send_buf_count)
  based upon per iscsi_opcode logic
- Drop ISTATE_REMOVE processing from isert_immediate_queue()
- Drop ISTATE_SEND_DATAIN processing from isert_response_queue()
- Drop ISTATE_SEND_STATUS processing from isert_response_queue()
- Drop iscsit_transport->iscsit_unmap_cmd() and ->iscsit_free_cmd()
- Convert iser_cq_tx_tasklet() to use struct isert_cq_desc pooling logic
- Convert isert_cq_tx_callback() to use struct isert_cq_desc pooling
  logic
- Convert iser_cq_rx_tasklet() to use struct isert_cq_desc pooling logic
- Convert isert_cq_rx_callback() to use struct isert_cq_desc pooling
  logic
- Add explict iscsit_stop_dataout_timer() call to
  isert_do_rdma_read_comp()
- Use isert_get_dataout() for iscsit_transport->iscsit_get_dataout()
  caller
- Drop ISTATE_SEND_R2T processing from isert_immediate_queue()
- Drop unused rdma/ib_fmr_pool.h include
- Drop isert_cmd->cmd_kref in favor of se_cmd->cmd_kref usage
- Add struct isert_device in order to support multiple EQs + CQ pooling
- Add struct isert_cq_desc
- Drop tasklets and cqs from isert_conn
- Bump ISERT_MAX_CQ to 64
- Various minor checkpatch fixes

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2013-04-25 01:09:41 -07:00
Patrick McHardy
dc850b0e68 IPoIB: add support for TIPC protocol
Support TIPC in the IPoIB driver. Since IPoIB now keeps track of its own
neighbour entries and doesn't require the packet to have a dst_entry
anymore, the only necessary changes are to:

- not drop multicast TIPC packets because of the unknown ethernet type
- handle unicast TIPC packets similar to IPv4/IPv6 unicast packets

in ipoib_start_xmit().

An alternative would be to remove all ethertype limitations since they're
not necessary anymore, all TIPC needs to know about is ARP and RARP since
it wants to always perform "path find", even if a path is already known.

Signed-off-by: Patrick McHardy <kaber@trash.net>
Signed-off-by: David S. Miller <davem@davemloft.net>
2013-04-17 14:18:33 -04:00
Grant Grundler
532ec6f1c0 SRPT: Fix odd use of WARN_ON()
While WARN_ON("const string") will work, it's intent is not obvious.
The warning is more useful by showing the "state" value.

Signed-off-by: Grant Grundler <grundler@chromium.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-16 22:58:08 -07:00
Doug Ledford
83bdd3b96c IPoIB: Fix ipoib_hard_header() return value
If you have a patched up dhcp server (and dhclient), they will use
AF_PACKET/SOCK_DGRAM pair to send dhcp packets over IPoIB.

However, when testing an upstream kernel, this has been broken for a
very long time (I tested 2.6.34, 2.6.38, 3.0, 3.1, 3.8, HEAD).

It turns out that the hard_header routine in ipoib is not following
the API and is returning 0 even when it pushed data onto the skb.
This then causes af_packet.c to overwrite the header just pushed with
data from user space.

Fixing this gets DHCP working on IPoIB.

Signed-off-by: Doug Ledford <dledford@redhat.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-16 22:57:09 -07:00
Akinobu Mita
cc529c0d72 RDMA: Rename random32() to prandom_u32()
Use more preferable function name which implies using a pseudo-random
number generator.

Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: Steve Wise <swise@chelsio.com>
Cc: linux-rdma@vger.kernel.org
Reviewed-by: Steve Wise <swise@opengridcomputing.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-04-16 22:47:05 -07:00
Mike Marciniszyn
1ee9e2aa7b IPoIB: Fix send lockup due to missed TX completion
Commit f0dc117abd ("IPoIB: Fix TX queue lockup with mixed UD/CM
traffic") attempts to solve an issue where unprocessed UD send
completions can deadlock the netdev.

The patch doesn't fully resolve the issue because if more than half
the tx_outstanding's were UD and all of the destinations are RC
reachable, arming the CQ doesn't solve the issue.

This patch uses the IB_CQ_REPORT_MISSED_EVENTS on the
ib_req_notify_cq().  If the rc is above 0, the UD send cq completion
callback is called directly to re-arm the send completion timer.

This issue is seen in very large parallel filesystem deployments
and the patch has been shown to correct the issue.

Cc: <stable@vger.kernel.org>
Reviewed-by: Dean Luick <dean.luick@intel.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-03-22 18:01:04 -07:00
Roland Dreier
ef4e359d9b Merge branches 'core', 'cxgb4', 'ipoib', 'iser', 'misc', 'mlx4', 'qib' and 'srp' into for-next 2013-02-26 09:17:56 -08:00
Roland Dreier
f72dd56690 IPoIB: Free ipoib neigh on path record failure so path rec queries are retried
If IPoIB fails to look up a path record (eg if it tries during an SM
failover when one SM is dead but the new one hasn't taken over yet), the
driver ends up with a neighbour structure but no address handle (AH).
There's no mechanism to recover from this: any further packets sent to
this destination will be silently dumped in ipoib_start_xmit().

Fix this by freeing the neighbour structures when a path rec query
fails, so that the next packet queued to be sent will trigger a new path
record query.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:42:15 -08:00
Bart Van Assche
2ce19e72f4 IB/srp: Fail I/O requests if the transport is offline
If an SRP target is no longer reachable and srp_reset_host() fails to
reconnect then ib_srp will invoke scsi_remove_host().  That function
will invoke __scsi_remove_device() for each LUN.  And that last
function will change the device state from SDEV_TRANSPORT_OFFLINE into
SDEV_CANCEL.  Certain user space software, e.g. older versions of
multipathd, continue queueing I/O to SCSI devices that are in the
SDEV_CANCEL state.

If these I/O requests are submitted as SG_IO that means that the
REQ_PREEMPT flag will be set and hence that these requests will be
passed to srp_queuecommand().  These requests will time out.  If new
requests are queued fast enough from user space these active requests
will prevent __scsi_remove_device() to finish.

Avoid this by failing I/O requests in the SDEV_CANCEL state if the
transport is offline.  Introduce a new variable to keep track of the
transport state instead of failing requests if (!target->connected ||
target->qp_in_error), so that the SCSI error handler has a chance to
retry commands after a transport layer failure occurred.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:31:14 -08:00
Bart Van Assche
c7c4e7ff80 IB/srp: Avoid endless SCSI error handling loop
If a SCSI command times out it is passed to the SCSI error
handler. The SCSI error handler will try to abort the commands that
timed out.  If aborting fails, a device reset will be attempted.  If
the device reset also fails a host reset will be attempted.  If the
host reset also fails the whole procedure will be repeated.

srp_abort() and srp_reset_device() fail for a QP in the error state.
srp_reset_host() fails after host removal has started.  Hence if the
SCSI error handler gets invoked after host removal has started and
with the QP in the error state an endless loop will be triggered.

Modify the SCSI error handling functions in ib_srp as follows:
- Abort SCSI commands properly even if the QP is in the error state.
- Make srp_reset_host() reset SCSI requests even after host removal
  has already started or if reconnecting fails.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:31:14 -08:00
Bart Van Assche
3780d1f088 IB/srp: Avoid sending a task management function needlessly
Do not send a task management function if sending will fail anyway
because either there is no RDMA/RC connection or the QP is in the
error state.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:31:14 -08:00
Bart Van Assche
e1b2f13aba IB/srp: Track connection state properly
Remove an assignment that incorrectly overwrites the connection state
update by srp_connect_target().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dave@thedillows.org>
Cc: <stable@vger.kernel.org> # 3.8
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-25 09:31:14 -08:00
Or Gerlitz
5525d210fd IB/iser: Enable iser when FMRs are not supported
Reuse the "SG unaligned for FMR" driver flow to make the initiator
functional when running over driver instance which doesn't support
FMRs, such as a mlx4 virtual function.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-22 00:22:30 -08:00
Or Gerlitz
819a087316 IB/iser: Avoid error prints on EAGAIN registration failures
Under IO/CPU stress its possible that the FMR pool might not have a
free FMR mapping element for iSER to use because of incomplete
background unmapping processing.  In that case we get -EAGAIN and the
IO is pushed back to the SCSI layer which soon retries it.  No need to
be so verbose about that.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-22 00:22:29 -08:00
Or Gerlitz
b96e4abaae IB/iser: Use proper define for the commands per LUN value advertised to SCSI ML
ISER_DEF_CMD_PER_LUN was meant to be ISCSI_DEF_XMIT_CMDS_MAX, not plain 128

Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-22 00:22:29 -08:00
Itai Garbi
5a2815f03c IPoIB: Don't attempt to release resources on error flow
If the ipoib client info isn't found on the _remove_one callback, we
must not attempt to scan the returned null list.  Found by Coverity.

Signed-off-by: Itai Garbi <igarbi@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-19 08:21:36 -08:00
Yan Burman
4b48680b55 IPoIB: Add version and firmware info to ethtool reporting
Implement version info as well as report firmware version and bus info
of the underlying IB HW device.

Signed-off-by: Yan Burman <yanb@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-19 08:21:36 -08:00
Shlomo Pongratz
9d1ad66e3e IPoIB: Fix ipoib_neigh hashing to use the correct daddr octets
The hash function introduced in commit b63b70d877 ("IPoIB: Use a
private hash table for path lookup in xmit path") was designd to use
the 3 octets of the IPoIB HW address that holds the remote QPN.
However, this currently isn't the case on little-endian machines,
because the the code there uses the flags part (octet[0]) and not the
last octet of the QPN (octet[3]).  Fix this.

The fix caused a checkpatch warning on line over 80 characters, to
solve that changed the name of the temp variable that holds the daddr.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-19 08:21:35 -08:00
Shlomo Pongratz
7e5a90c25f IPoIB: Fix crash due to skb double destruct
After commit b13912bbb4 ("IPoIB: Call skb_dst_drop() once skb is
enqueued for sending"), using connected mode and running multithreaded
iperf for long time, ie

    iperf -c <IP> -P 16 -t 3600

results in a crash.

After the above-mentioned patch, the driver is calling skb_orphan() and
skb_dst_drop() after calling post_send() in ipoib_cm.c::ipoib_cm_send()
(also in ipoib_ib.c::ipoib_send())

The problem with this is, as is written in a comment in both routines,
"it's entirely possible that the completion handler will run before we
execute anything after the post_send()."  This leads to running the
skb cleanup routines simultaneously in two different contexts.

The solution is to always perform the skb_orphan() and skb_dst_drop()
before queueing the send work request.  If an error occurs, then it
will be no different than the regular case where dev_free_skb_any() in
the completion path, which is assumed to be after these two routines.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2013-02-05 09:35:06 -08:00
Roland Dreier
b13912bbb4 IPoIB: Call skb_dst_drop() once skb is enqueued for sending
Currently, IPoIB delays collecting send completions for TX packets in
order to batch work more efficiently.  It does skb_orphan() right after
queuing the packets so that destructors run early, to avoid problems
like holding socket send buffers for too long (since we might not
collect a send completion until a long time after the packet is
actually sent).

However, IPoIB clears IFF_XMIT_DST_RELEASE because it actually looks
at skb_dst() to update the PMTU when it gets a too-long packet.  This
means that the packets sitting in the TX ring with uncollected send
completions are holding a reference on the dst.  We've seen this lead
to pathological behavior with respect to route and neighbour GC.  The
easy fix for this is to call skb_dst_drop() when we call skb_orphan().

Also, give packets sent via connected mode (CM) the same skb_orphan()
/ skb_dst_drop() treatment that packets sent via datagram mode get.

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-12-19 09:16:43 -08:00
Linus Torvalds
5bd665f28d Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull target updates from Nicholas Bellinger:
 "It has been a very busy development cycle this time around in target
  land, with the highlights including:

   - Kill struct se_subsystem_dev, in favor of direct se_device usage
     (hch)
   - Simplify reservations code by combining SPC-3 + SCSI-2 support for
     virtual backends only (hch)
   - Simplify ALUA code for virtual only backends, and remove left over
     abstractions (hch)
   - Pass sense_reason_t as return value for I/O submission path (hch)
   - Refactor MODE_SENSE emulation to allow for easier addition of new
     mode pages.  (roland)
   - Add emulation of MODE_SELECT (roland)
   - Fix bug in handling of ExpStatSN wrap-around (steve)
   - Fix bug in TMR ABORT_TASK lookup in qla2xxx target (steve)
   - Add WRITE_SAME w/ UNMAP=0 support for IBLOCK backends (nab)
   - Convert ib_srpt to use modern target_submit_cmd caller + drop
     legacy ioctx->kref usage (nab)
   - Convert ib_srpt to use modern target_submit_tmr caller (nab)
   - Add link_magic for fabric allow_link destination target_items for
     symlinks within target_core_fabric_configfs.c code (nab)
   - Allocate pointers in instead of full structs for
     config_group->default_groups (sebastian)
   - Fix 32-bit highmem breakage for FILEIO (sebastian)

  All told, hch was able to shave off another ~1K LOC by killing the
  se_subsystem_dev abstraction, along with a number of PR + ALUA
  simplifications.  Also, a nice patch by Roland is the refactoring of
  MODE_SENSE handling, along with the addition of initial MODE_SELECT
  emulation support for virtual backends.

  Sebastian found a long-standing issue wrt to allocation of full
  config_group instead of pointers for config_group->default_group[]
  setup in a number of areas, which ends up saving memory with big
  configurations.  He also managed to fix another long-standing BUG wrt
  to broken 32-bit highmem support within the FILEIO backend driver.

  Thank you again to everyone who contributed this round!"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (50 commits)
  target/iscsi_target: Add NodeACL tags for initiator group support
  target/tcm_fc: fix the lockdep warning due to inconsistent lock state
  sbp-target: fix error path in sbp_make_tpg()
  sbp-target: use simple assignment in tgt_agent_rw_agent_state()
  iscsi-target: use kstrdup() for iscsi_param
  target/file: merge fd_do_readv() and fd_do_writev()
  target/file: Fix 32-bit highmem breakage for SGL -> iovec mapping
  target: Add link_magic for fabric allow_link destination target_items
  ib_srpt: Convert TMR path to target_submit_tmr
  ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref
  target: Make spc_get_write_same_sectors return sector_t
  target/configfs: use kmalloc() instead of kzalloc() for default groups
  target/configfs: allocate only 6 slots for dev_cg->default_groups
  target/configfs: allocate pointers instead of full struct for default_groups
  target: update error handling for sbc_setup_write_same()
  iscsit: use GFP_ATOMIC under spin lock
  iscsi_target: Remove redundant null check before kfree
  target/iblock: Forward declare bio helpers
  target: Clean up flow in transport_check_aborted_status()
  target: Clean up logic in transport_put_cmd()
  ...
2012-12-15 14:25:10 -08:00
Bart Van Assche
dc1bdbd9b8 IB/srp: Allow SRP disconnect through sysfs
Make it possible to disconnect the IB RC connection used by the SRP
protocol to communicate with a target.

Have the SRP transport layer create a sysfs "delete" attribute for
initiator drivers that support this functionality.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Robert Jennings <rcj@linux.vnet.ibm.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:33 -08:00
Vu Pham
55d93898a1 IB/srp: send disconnect request without waiting for CM timewait exit
Now that SRP recreates the CM ID, QP, and CQ for each connection,
there is no need to wait for the timewait state to complete.

Signed-off-by: Vu Pham <vu@mellanox.com>
Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:32 -08:00
Ishai Rabinovitz
73aa89ed9e IB/srp: destroy and recreate QP and CQs when reconnecting
HW QP FATAL errors persist over a reset operation, but we can recover
from that by recreating the QP and associated CQs for each connection.
Creating a new QP/CQ also completely forecloses any possibility of
getting stale completions or packets on the new connection.

Signed-off-by: Ishai Rabinovitz <ishai@mellanox.co.il>
Signed-off-by: Michael S. Tsirkin <mst@mellanox.co.il>

[ updated to current code from OFED, cleaned up commit message ]

Signed-off-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:32 -08:00
Bart Van Assche
ef6c49d87c IB/srp: Eliminate state SRP_TARGET_DEAD
Only queue removal work after having changed the target state
into SRP_TARGET_REMOVED and not if that state was already equal
to SRP_TARGET_REMOVED.  That allows us to remove the state
SRP_TARGET_DEAD.  Add a call to srp_disconnect_target() in
srp_remove_target() -- due to previous changes it is now safe to
invoke that function even if the IB connection has already
been disconnected.  This change allows us to replace the target
removal code in srp_remove_one() by an (indirect) call to
srp_remove_target().  Rename srp_target_port.work into
srp_target_port.remove_work to reflect its usage.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche
ee12d6a80c IB/srp: Introduce the helper function srp_remove_target()
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche
294c875a65 IB/srp: Suppress superfluous error messages
Keep track of the connection state.  Only report QP errors while
connected.  Only invoke ib_send_cm_dreq() when connected so that
invoking srp_disconnect_target() after having received a DREQ does not
cause an error message to be printed.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche
4f0af69799 IB/srp: Process all error completions
If the RDMA RC connection is closed, tell the SCSI mid-layer to
terminate all pending commands instead of only the first.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:31 -08:00
Bart Van Assche
948d1e889e IB/srp: Introduce srp_handle_qp_err()
Introduce the function srp_handle_qp_err(), change the type of
qp_in_error from int into bool and move the initialization of that
variable from srp_reconnect_target() to srp_connect_target().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:30 -08:00
Bart Van Assche
224db15743 IB/srp: Simplify SCSI error handling
Since scsi_remove_host() has been modified so that SCSI error handling
functions will no longer be invoked after scsi_remove_host() returns,
the test at the start of srp_send_tsk_mgmt() is now superfluous.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:30 -08:00
Bart Van Assche
f371823120 IB/srp: Keep processing commands during host removal
Some SCSI upper layer drivers, e.g. sd, issue SCSI commands from
inside scsi_remove_host() (see the sd_shutdown() call in sd_remove()).
Make sure that these commands have a chance to reach the SCSI device.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:30 -08:00
Bart Van Assche
09be70a238 IB/srp: Eliminate state SRP_TARGET_CONNECTING
Block the SCSI host while reconnecting instead of representing the
reconnection activity as a distinct SRP target state.  This allows us
to eliminate the target state SRP_TARGET_CONNECTING.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:29 -08:00
Bart Van Assche
c9b03c1ae5 IB/srp: Increase block layer timeout
Increase the block layer timeout for disks so that it is above the
InfiniBand transport layer timeout.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-11-30 17:40:29 -08:00
Nicholas Bellinger
3e4f574857 ib_srpt: Convert TMR path to target_submit_tmr
This patch converts the TMR path in srpt_handle_tsk_mgmt() to use
target_submit_tmr() with TARGET_SCF_ACK_KREF flag usage.

v2: Drop ununused res in target_submit_tmr (Fengguang.Wu)

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-11-28 11:20:28 -08:00
Nicholas Bellinger
9474b04313 ib_srpt: Convert I/O path to target_submit_cmd + drop legacy ioctx->kref
This patch converts the main srpt_handle_cmd() I/O path to use modern
target_submit_cmd() with TARGET_SCF_ACK_KREF flag usage.  This includes
dropping the original internal ioctx->kref + srpt_put_send_ioctx() usage
in favor of target_put_sess_cmd() w/ se_cmd_t->cmd_kref within ib_srpt
response callbacks.

It also updates srpt_abort_cmd() to call target_put_sess_cmd() for
completion of aborted commands, and adds target_wait_for_sess_cmds() into
srpt_release_channel_work() to allow outstanding I/O to complete during
session shutdown.

Also, go ahead and update srpt_handle_tsk_mgmt() to make the remaining
transport_init_se_cmd() to setup the ioctx->cmd with se_tmr_req.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@kernel.org>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-11-28 01:10:59 -08:00
Christoph Hellwig
de103c93af target: pass sense_reason as a return value
Pass the sense reason as an explicit return value from the I/O submission
path instead of storing it in struct se_cmd and using negative return
values.  This cleans up a lot of the code pathes, and with the sparse
annotations for the new sense_reason_t type allows for much better
error checking.

(nab: Convert spc_emulate_modesense + spc_emulate_modeselect to use
      sense_reason_t with Roland's MODE SELECT changes)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-11-06 20:55:46 -08:00
Linus Torvalds
a188e7e93a Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull scsi target updates from Nicholas Bellinger:
 "Things have been calm for the most part with no new fabric drivers in
  flight for v3.7 (we're up to eight now !), so this update is primarily
  focused on addressing a few long-standing items within target-core and
  iscsi-target fabric code.

  The highlights include:

   - target: Simplify fabric sense data length handling (roland)
   - qla2xxx: Fix endianness of task management response code (roland)
   - target: fix truncation of mode data, support zero allocation length
     (paolo)
   - target: Properly support zero-length commands in normal processing
     path (paolo)
   - iscsi-target: Correctly set 0xffffffff field within ISCSI_OP_REJECT
     PDU (ronnie + nab)
   - iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG
     demo-mode (ronnie + nab)
   - target/file: Re-enable optional fd_buffered_io=1 operation (nab +
     hch)
   - iscsi-target: Add MaxXmitDataSegmenthLength forr target ->
     initiator MDRSL declaration (nab)
   - target: Add target_submit_cmd_map_sgls for SGL fabric memory
     passthrough (nab + hch)
   - tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls (hch +
     nab)
   - tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls (nab
     + hch)

  The last series for adding a new target_submit_cmd_map_sgls() fabric
  caller (as requested by hch) that accepts pre-allocated SGL memory
  (using existing logic), along with converting tcm_loop + tcm_vhost has
  only been in -next for the last days, but has gotten enough review
  +testing and is clear enough a mechanical change that I think it's
  reasonable to merge for -rc1 code.

  Thanks again to everyone who contributed this round! Extra special
  thanks to Roland (PureStorage) for tracking down the qla2xxx target
  TMR response code endian issue, and to Paolo (Redhat) for resolving
  the long standing zero-length CDB issues within target-core between
  virtual and pSCSI backends."

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (44 commits)
  iscsi-target: Bump defaults for nopin_timeout + nopin_response_timeout values
  iscsit: proper endianess conversions
  iscsit: use the itt_t abstract type
  iscsit: add missing endianess conversion in iscsit_check_inaddr_any
  iscsit: remove incorrect unlock in iscsit_build_sendtargets_resp
  iscsit: mark various functions static
  target/iscsi: precedence bug in iscsit_set_dataout_sequence_values()
  target/usb-gadget: strlen() doesn't count the terminator
  target/usb-gadget: remove duplicate initialization
  tcm_vhost: Convert I/O path to use target_submit_cmd_map_sgls
  target: Add control CDB READ payload zero work-around
  tcm_loop: Convert I/O path to use target_submit_cmd_map_sgls
  target: Add target_submit_cmd_map_sgls for SGL fabric memory passthrough
  iscsi-target: Add explicit set of cache_dynamic_acls=1 for TPG demo-mode
  iscsi-target: Change iscsi_target_seq_pdu_list.c to honor MaxXmitDataSegmentLength
  iscsi-target: Add MaxXmitDataSegmentLength connection recovery check
  iscsi-target: Convert incoming PDU payload checks to MaxXmitDataSegmentLength
  iscsi-target: Enable MaxXmitDataSegmentLength operation in login path
  iscsi-target: Add base MaxXmitDataSegmentLength code
  target/file: Re-enable optional fd_buffered_io=1 operation
  ...
2012-10-10 19:52:19 +09:00
Roland Dreier
56040f07b2 Merge branches 'cma', 'ipoib', 'iser', 'mlx4' and 'nes' into for-next 2012-10-04 19:12:22 -07:00
Alex Tabachnik
5a33a66949 IB/iser: Add more RX CQs to scale out processing of SCSI responses
RX/TX CQs will now be selected from a per HCA pool.  For the RX flow
this has the effect of using different interrupt vectors when using
low level drivers (such as mlx4) that map the "vector" param provided
by the ULP on CQ creation to a dedicated IRQ/MSI-X vector.  This
allows the RX flow processing of IO responses to be distributed across
multiple CPUs.

QPs (--> iSER sessions) are assigned to CQs in round robin order using
the CQ with the minimum number of sessions attached to it.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Alex Tabachnik <alext@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-03 21:26:49 -07:00
Roland Dreier
71d9c5f9e6 IPoIB: Fix build with CONFIG_INFINIBAND_IPOIB_CM=n
With the new netlink support in commit 862096a8bb ("IB/ipoib: Add more
rtnl_link_ops callbacks") we need ipoib_set_mode() to be available even
if connected mode isn't built.  Move the function from ipoib_cm.c to
ipoib_main.c (and make a few CM-related macros available unconditonally).

This fixes the build error

    drivers/built-in.o: In function 'ipoib_changelink':
    ipoib_netlink.c:(.text+0x6a5fc9): undefined reference to 'ipoib_set_mode'
    ipoib_netlink.c:(.text+0x6a5fe3): undefined reference to 'ipoib_set_mode'

when CONFIG_INFINIBAND_IPOIB_CM isn't set.

Reported-by: Randy Dunlap <rdunlap@xenotime.net>
Reported-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-10-02 21:33:41 -07:00
Linus Torvalds
7a9a2970b5 First batch of InfiniBand/RDMA changes for the 3.7 merge window:
- mlx4 IB support for SR-IOV
  - A couple of SRP initiator fixes
  - Batch of nes hardware driver fixes
  - Fix for long-standing use-after-free crash in IPoIB
  - Other miscellaneous fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQav4jAAoJEENa44ZhAt0hmL0QAJTuMdSOzYFd/NB38owJCNM2
 kz/N1GlBm3z98fIlGo8u+lzgV2qxqZSAzsJsouMeK38KiAX3CL8HKe44A1QvTM6v
 dXTNL4JFX24/YF+nlmMY8Av518I9Mkte3BZCnpYkBjVFBWe0ePwoRC/btfBXPDIV
 0snq4OtjoBAn00dOOyuZ5PoyY9xf0z4UB0Gple9sM4mzEb8wVWdNDDPOiuPJc6fA
 L+gk6HLkZDg54+QswafdKYwpeTq45wIKLmCdS3oUNmppMLVhZY8rECOwzSa+KiTr
 /Yo19n+zl+IBlvjQHhmUqGHvdD17PaGlr+TckAsQqmVfXUH5qqpEnkF8FoEK59c5
 YA3lVU8Sj4BPhJ7qX54CuN3767mZizakkxCr9iPRzABFTgzWVgcSgCrE8jjx4i0h
 Pam+L5bmANFStgmGR8PmXiNgcrCUcEqYHsOWDDAnHa5ekb2nyv1JL1c18hlY9hC3
 Xb1YTMZFwvofGza89hBu7oHrMbLOUc5kW2lBpvUn2nlyf3i0F8ISlVbVbNjFA54p
 60/jHa2VOQ2CcJUJKnJOk4ajOOEfHnPtMn2q96XJ69Dp8+eSYEO/G+0i1OlChq4h
 ClnG0Yp+NkT1o8WXMd7guDR+RsXt+DXIij5TiUWRIqnIlopIsMTRhNH28tMu4jQL
 fgN5n987wru91ewdX4gW
 =PAcy
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull infiniband updates from Roland Dreier:
 "First batch of InfiniBand/RDMA changes for the 3.7 merge window:
   - mlx4 IB support for SR-IOV
   - A couple of SRP initiator fixes
   - Batch of nes hardware driver fixes
   - Fix for long-standing use-after-free crash in IPoIB
   - Other miscellaneous fixes"

This merge also removes a new use of __cancel_delayed_work(), and
replaces it with the regular cancel_delayed_work() that is now irq-safe
thanks to the workqueue updates.

That said, I suspect the sequence in question should probably use
"mod_delayed_work()".  I just did the minimal "don't use deprecated
functions" fixup, though.

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (45 commits)
  IB/qib: Fix local access validation for user MRs
  mlx4_core: Disable SENSE_PORT for multifunction devices
  mlx4_core: Clean up enabling of SENSE_PORT for older (ConnectX-1/-2) HCAs
  mlx4_core: Stash PCI ID driver_data in mlx4_priv structure
  IB/srp: Avoid having aborted requests hang
  IB/srp: Fix use-after-free in srp_reset_req()
  IB/qib: Add a qib driver version
  RDMA/nes: Fix compilation error when nes_debug is enabled
  RDMA/nes: Print hardware resource type
  RDMA/nes: Fix for crash when TX checksum offload is off
  RDMA/nes: Cosmetic changes
  RDMA/nes: Fix for incorrect MSS when TSO is on
  RDMA/nes: Fix incorrect resolving of the loopback MAC address
  mlx4_core: Fix crash on uninitialized priv->cmd.slave_sem
  mlx4_core: Trivial cleanups to driver log messages
  mlx4_core: Trivial readability fix: "0X30" -> "0x30"
  IB/mlx4: Create paravirt contexts for VFs when master IB driver initializes
  mlx4: Modify proxy/tunnel QP mechanism so that guests do no calculations
  mlx4: Paravirtualize Node Guids for slaves
  mlx4: Activate SR-IOV mode for IB
  ...
2012-10-02 17:20:40 -07:00
Roland Dreier
d172f5a4ab Merge branches 'cma', 'cxgb4', 'ipoib', 'mlx4', 'mlx4-sriov', 'nes', 'qib' and 'srp' into for-linus 2012-10-02 07:43:59 -07:00
Or Gerlitz
862096a8bb IB/ipoib: Add more rtnl_link_ops callbacks
Add the rtnl_link_ops changelink and fill_info callbacks, through
which the admin can now set/get the driver mode, etc policies.
Maintain the proprietary sysfs entries only for legacy childs.

For child devices, set dev->iflink to point to the parent
device ifindex, such that user space tools can now correctly
show the uplink relation as done for vlan, macvlan, etc
devices. Pointed out by Patrick McHardy <kaber@trash.net>

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-10-01 17:12:22 -04:00
Bart Van Assche
d853667091 IB/srp: Avoid having aborted requests hang
We need to call scsi_done() for commands after we abort them.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:36:48 -07:00
Bart Van Assche
9b796d06d5 IB/srp: Fix use-after-free in srp_reset_req()
srp_free_req() uses the scsi_cmnd structure contents to unmap
buffers, so we must invoke srp_free_req() before we release
ownership of that structure.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:36:47 -07:00
Patrick McHardy
bea1e22df4 IPoIB: Fix use-after-free of multicast object
Fix a crash in ipoib_mcast_join_task().  (with help from Or Gerlitz)

Commit c8c2afe360 ("IPoIB: Use rtnl lock/unlock when changing device
flags") added a call to rtnl_lock() in ipoib_mcast_join_task(), which
is run from the ipoib_workqueue, and hence the workqueue can't be
flushed from the context of ipoib_stop().

In the current code, ipoib_stop() (which doesn't flush the workqueue)
calls ipoib_mcast_dev_flush(), which goes and deletes all the
multicast entries.  This takes place without any synchronization with
a possible running instance of ipoib_mcast_join_task() for the same
ipoib device, leading to a crash due to NULL pointer dereference.

Fix this by making sure that the workqueue is flushed before
ipoib_mcast_dev_flush() is called.  To make that possible, we move the
RTNL-lock wrapped code to ipoib_mcast_join_finish().

Signed-off-by: Patrick McHardy <kaber@trash.net>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-30 20:32:33 -07:00
David S. Miller
6a06e5e1bb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	drivers/net/team/team.c
	drivers/net/usb/qmi_wwan.c
	net/batman-adv/bat_iv_ogm.c
	net/ipv4/fib_frontend.c
	net/ipv4/route.c
	net/l2tp/l2tp_netlink.c

The team, fib_frontend, route, and l2tp_netlink conflicts were simply
overlapping changes.

qmi_wwan and bat_iv_ogm were of the "use HEAD" variety.

With help from Antonio Quartulli.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-28 14:40:49 -04:00
Or Gerlitz
9baa0b0364 IB/ipoib: Add rtnl_link_ops support
Add rtnl_link_ops to IPoIB, with the first usage being child device
create/delete through them. Childs devices are now either legacy ones,
created/deleted through the ipoib sysfs entries, or RTNL ones.

Adding support for RTNL childs involved refactoring of ipoib_vlan_add
which is now used by both the sysfs and the link_ops code.

Also, added ndo_uninit entry to support calling unregister_netdevice_queue
from the rtnl dellink entry. This required removal of calls to
ipoib_dev_cleanup from the driver in flows which use unregister_netdevice,
since the networking core will invoke ipoib_uninit which does exactly that.

Signed-off-by: Erez Shitrit <erezsh@mellanox.co.il>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-09-20 16:49:17 -04:00
Roland Dreier
9c58b7ddd7 target: Simplify fabric sense data length handling
Every fabric driver has to supply a se_tfo->set_fabric_sense_len()
method, just so iSCSI can return an offset of 2.  However, every fabric
driver is already allocating a sense buffer and passing it into the
target core, either via transport_init_se_cmd() or target_submit_cmd().

So instead of having iSCSI pass the start of its sense buffer into the
core and then later tell the core to skip the first 2 bytes, it seems
easier for iSCSI just to do the offset of 2 when it passes the sense
buffer into the core.  Then we can drop the se_tfo->set_fabric_sense_len()
everywhere, and just add a couple of lines of code to iSCSI to set the
sense data length to the beginning of the buffer right before it sends
it over the network.

(nab: Remove .set_fabric_sense_len usage from tcm_qla2xxx_npiv_ops +
      change transport_get_sense_buffer to follow v3.6-rc6 code w/o
      ->set_fabric_sense_len usage)

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-09-17 17:12:58 -07:00
Roland Dreier
2ed772b7b9 target: Remove unused target_core_fabric_ops.get_fabric_sense_len method
There are no callers of se_tfo->get_fabric_sense_len(), so we should
stop having every fabric driver implement it.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-09-17 16:15:47 -07:00
Shlomo Pongratz
b5120a6e11 IPoIB: Fix AB-BA deadlock when deleting neighbours
Lockdep points out a circular locking dependency betwwen the ipoib
device priv spinlock (priv->lock) and the neighbour table rwlock
(ntbl->rwlock).

In the normal path, ie neigbour garbage collection task, the neigh
table rwlock is taken first and then if the neighbour needs to be
deleted, priv->lock is taken.

However in some error paths, such as in ipoib_cm_handle_tx_wc(),
priv->lock is taken first and then ipoib_neigh_free routine is called
which in turn takes the neighbour table ntbl->rwlock.

The solution is to get rid the neigh table rwlock completely and use
only priv->lock.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-12 09:21:45 -07:00
Shlomo Pongratz
66172c0993 IPoIB: Fix memory leak in the neigh table deletion flow
If the neighbours hash table is empty when unloading the module, then
ipoib_flush_neighs(), the cleanup routine, isn't called and the
memory used for the hash table itself leaked.

To fix this, ipoib_flush_neighs() is allways called, and another
completion object is added to signal when the table is freed.

Once invoked, ipoib_flush_neighs() flushes all the neighbours (if
there are any), calls the the hash table RCU free routine, which now
signals completion of the deletion process, and waits for the last
neighbour to be freed.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-09-12 09:05:03 -07:00
Roland Dreier
c0369b296e Merge branches 'cma', 'ipoib', 'misc', 'mlx4', 'ocrdma', 'qib' and 'srp' into for-next 2012-08-16 09:38:39 -07:00
Bart Van Assche
220329916c IB/srp: Fix a race condition
Avoid a crash caused by the scmnd->scsi_done(scmnd) call in
srp_process_rsp() being invoked with scsi_done == NULL.  This can
happen if a reply is received during or after a command abort.

Reported-by: Joseph Glanville <joseph.glanville@orionvm.com.au>
Reference: http://marc.info/?l=linux-rdma&m=134314367801595
Cc: <stable@vger.kernel.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 12:00:48 -07:00
Masanari Iida
142ad5db2b IB: Fix typos in infiniband drivers
Correct spelling typos in comments in drivers/infiniband.

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-15 11:56:19 -07:00
Shlomo Pongratz
6c723a68c6 IB/ipoib: Fix RCU pointer dereference of wrong object
Commit b63b70d877 ("IPoIB: Use a private hash table for path lookup
in xmit path") introduced a bug where in ipoib_neigh_free() (which is
called from a few errors flows in the driver), rcu_dereference() is
invoked with the wrong pointer object, which results in a crash.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-14 15:21:44 -07:00
Shlomo Pongratz
fa16ebed31 IB/ipoib: Add missing locking when CM object is deleted
Commit b63b70d877 ("IPoIB: Use a private hash table for path lookup
in xmit path") introduced a bug where in ipoib_cm_destroy_tx() a CM
object is moved between lists without any supported locking.  Under a
stress test, this eventually leads to list corruption and a crash.

Previously when this routine was called, callers were taking the
device priv lock.  Currently this function is called from the RCU
callback associated with neighbour deletion.  Fix the race by taking
the same lock we used to before.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-08-14 15:21:44 -07:00
Shlomo Pongratz
b63b70d877 IPoIB: Use a private hash table for path lookup in xmit path
Dave Miller <davem@davemloft.net> provided a detailed description of
why the way IPoIB is using neighbours for its own ipoib_neigh struct
is buggy:

    Any time an ipoib_neigh is changed, a sequence like the following is made:

    			spin_lock_irqsave(&priv->lock, flags);
    			/*
    			 * It's safe to call ipoib_put_ah() inside
    			 * priv->lock here, because we know that
    			 * path->ah will always hold one more reference,
    			 * so ipoib_put_ah() will never do more than
    			 * decrement the ref count.
    			 */
    			if (neigh->ah)
    				ipoib_put_ah(neigh->ah);
    			list_del(&neigh->list);
    			ipoib_neigh_free(dev, neigh);
    			spin_unlock_irqrestore(&priv->lock, flags);
    			ipoib_path_lookup(skb, n, dev);

    This doesn't work, because you're leaving a stale pointer to the freed up
    ipoib_neigh in the special neigh->ha pointer cookie.  Yes, it even fails
    with all the locking done to protect _changes_ to *ipoib_neigh(n), and
    with the code in ipoib_neigh_free() that NULLs out the pointer.

    The core issue is that read side calls to *to_ipoib_neigh(n) are not
    being synchronized at all, they are performed without any locking.  So
    whether we hold the lock or not when making changes to *ipoib_neigh(n)
    you still can have threads see references to freed up ipoib_neigh
    objects.

    	cpu 1			cpu 2
    	n = *ipoib_neigh()
    				*ipoib_neigh() = NULL
    				kfree(n)
    	n->foo == OOPS

    [..]

    Perhaps the ipoib code can have a private path database it manages
    entirely itself, which holds all the necessary information and is
    looked up by some generic key which is available easily at transmit
    time and does not involve generic neighbour entries.

See <http://marc.info/?l=linux-rdma&m=132812793105624&w=2> and
<http://marc.info/?l=linux-rdma&w=2&r=1&s=allows+references+to+freed+memory&q=b>
for the full discussion.

This patch aims to solve the race conditions found in the IPoIB driver.

The patch removes the connection between the core networking neighbour
structure and the ipoib_neigh structure.  In addition to avoiding the
race described above, it allows us to handle SKBs carrying IP packets
that don't have any associated neighbour.

We add an ipoib_neigh hash table with N buckets where the key is the
destination hardware address.  The ipoib_neigh is fetched from the
hash table and instead of the stashed location in the neighbour
structure. The hash table uses both RCU and reference counting to
guarantee that no ipoib_neigh instance is ever deleted while in use.

Fetching the ipoib_neigh structure instance from the hash also makes
the special code in ipoib_start_xmit that handles remote and local
bonding failover redundant.

Aged ipoib_neigh instances are deleted by a garbage collection task
that runs every M seconds and deletes every ipoib_neigh instance that
was idle for at least 2*M seconds. The deletion is safe since the
ipoib_neigh instances are protected using RCU and reference count
mechanisms.

The number of buckets (N) and frequency of running the GC thread (M),
are taken from the exported arb_tbl.

Signed-off-by: Shlomo Pongratz <shlomop@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-30 07:46:50 -07:00
Linus Torvalds
5dedb9f3bd InfiniBand/RDMA changes for the 3.6 merge window:
- Updates to the qib low-level driver
  - First chunk of changes for SR-IOV support for mlx4 IB
  - RDMA CM support for IPv6-only binding
  - Other misc cleanups and fixes
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJQDXhUAAoJEENa44ZhAt0hZxwQAJydS9f9me31AGa45SAq8rA8
 6e3LgnQ6jS38he7LZZdSsT7g25jROCKOcTj6VUkNSVAVSkK8zcp7wngjDxw50IK6
 FF9hNeljMkWBflOhxzB34DRGCW4b4J+Yt1o7v1RtawiG/Mhri/6imKL0Aqjt4GwX
 a1MPZn+xI2osLujfdHJtATPWWB9jCaXdFe4DJUNPdqJhS6TN7s8OP3XMiqoJtdaV
 ptHeSSbjdUR1mg/h2LU2FVmWHXNSBxn7MEsrBBRkQVyiEkXieFBLwPTp4DqmAUJf
 xugm6Hf4sbiQ+QuU0baJODt56wveuYVQ4HKUzE0urUFXyU4TUB9blrehWZnKsRte
 wo/w4nvVlnXgGhHH0Igq76RDX8aCwc/6uQJ/29oChWjrei3HE0LjmIlPAu0vAhyw
 ViLe02/r2gKXQv1NxIqhPmGsJTZizg2mUk2eEJHHPQb/NGL7iNo6b+141AfURoqQ
 goGmlGdzffCRpmo6FXFZ57RPVRS4gwMunCY/Pmvq5a2t4oZh8899l2+V3N7bCfvH
 +JdavxAjia9U4IlgPsAqVaz8z8TyHOY/lEd75Wnw8q9www3kLx7hASvHsdwrS4LL
 ihzECzsaXOSoIXQzghs+6iyp1pzmzE9ve8OqbIGJStzlrOLyn7Gjo+Ixwm0SLsTO
 I7h9iJ1OMKFZ8bzmHsWb
 =f09j
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Updates to the qib low-level driver
 - First chunk of changes for SR-IOV support for mlx4 IB
 - RDMA CM support for IPv6-only binding
 - Other misc cleanups and fixes

Fix up some add-add conflicts in include/linux/mlx4/device.h and
drivers/net/ethernet/mellanox/mlx4/main.c

* tag 'rdma-for-3.6' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (30 commits)
  IB/qib: checkpatch fixes
  IB/qib: Add congestion control agent implementation
  IB/qib: Reduce sdma_lock contention
  IB/qib: Fix an incorrect log message
  IB/qib: Fix QP RCU sparse warnings
  mlx4: Put physical GID and P_Key table sizes in mlx4_phys_caps struct and paravirtualize them
  mlx4_core: Allow guests to have IB ports
  mlx4_core: Implement mechanism for reserved Q_Keys
  net/mlx4_core: Free ICM table in case of error
  IB/cm: Destroy idr as part of the module init error flow
  mlx4_core: Remove double function declarations
  IB/mlx4: Fill the masked_atomic_cap attribute in query device
  IB/mthca: Fill in sq_sig_type in query QP
  IB/mthca: Warning about event for non-existent QPs should show event type
  IB/qib: Fix sparse RCU warnings in qib_keys.c
  net/mlx4_core: Initialize IB port capabilities for all slaves
  mlx4: Use port management change event instead of smp_snoop
  IB/qib: RCU locking for MR validation
  IB/qib: Avoid returning EBUSY from MR deregister
  IB/qib: Fix UC MR refs for immediate operations
  ...
2012-07-24 13:56:26 -07:00
Linus Torvalds
3c4cfadef6 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
Pull networking changes from David S Miller:

 1) Remove the ipv4 routing cache.  Now lookups go directly into the FIB
    trie and use prebuilt routes cached there.

    No more garbage collection, no more rDOS attacks on the routing
    cache.  Instead we now get predictable and consistent performance,
    no matter what the pattern of traffic we service.

    This has been almost 2 years in the making.  Special thanks to
    Julian Anastasov, Eric Dumazet, Steffen Klassert, and others who
    have helped along the way.

    I'm sure that with a change of this magnitude there will be some
    kind of fallout, but such things ought the be simple to fix at this
    point.  Luckily I'm not European so I'll be around all of August to
    fix things :-)

    The major stages of this work here are each fronted by a forced
    merge commit whose commit message contains a top-level description
    of the motivations and implementation issues.

 2) Pre-demux of established ipv4 TCP sockets, saves a route demux on
    input.

 3) TCP SYN/ACK performance tweaks from Eric Dumazet.

 4) Add namespace support for netfilter L4 conntrack helpers, from Gao
    Feng.

 5) Add config mechanism for Energy Efficient Ethernet to ethtool, from
    Yuval Mintz.

 6) Remove quadratic behavior from /proc/net/unix, from Eric Dumazet.

 7) Support for connection tracker helpers in userspace, from Pablo
    Neira Ayuso.

 8) Allow userspace driven TX load balancing functions in TEAM driver,
    from Jiri Pirko.

 9) Kill off NLMSG_PUT and RTA_PUT macros, more gross stuff with
    embedded gotos.

10) TCP Small Queues, essentially minimize the amount of TCP data queued
    up in the packet scheduler layer.  Whereas the existing BQL (Byte
    Queue Limits) limits the pkt_sched --> netdevice queuing levels,
    this controls the TCP --> pkt_sched queueing levels.

    From Eric Dumazet.

11) Reduce the number of get_page/put_page ops done on SKB fragments,
    from Alexander Duyck.

12) Implement protection against blind resets in TCP (RFC 5961), from
    Eric Dumazet.

13) Support the client side of TCP Fast Open, basically the ability to
    send data in the SYN exchange, from Yuchung Cheng.

    Basically, the sender queues up data with a sendmsg() call using
    MSG_FASTOPEN, then they do the connect() which emits the queued up
    fastopen data.

14) Avoid all the problems we get into in TCP when timers or PMTU events
    hit a locked socket.  The TCP Small Queues changes added a
    tcp_release_cb() that allows us to queue work up to the
    release_sock() caller, and that's what we use here too.  From Eric
    Dumazet.

15) Zero copy on TX support for TUN driver, from Michael S. Tsirkin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next: (1870 commits)
  genetlink: define lockdep_genl_is_held() when CONFIG_LOCKDEP
  r8169: revert "add byte queue limit support".
  ipv4: Change rt->rt_iif encoding.
  net: Make skb->skb_iif always track skb->dev
  ipv4: Prepare for change of rt->rt_iif encoding.
  ipv4: Remove all RTCF_DIRECTSRC handliing.
  ipv4: Really ignore ICMP address requests/replies.
  decnet: Don't set RTCF_DIRECTSRC.
  net/ipv4/ip_vti.c: Fix __rcu warnings detected by sparse.
  ipv4: Remove redundant assignment
  rds: set correct msg_namelen
  openvswitch: potential NULL deref in sample()
  tcp: dont drop MTU reduction indications
  bnx2x: Add new 57840 device IDs
  tcp: avoid oops in tcp_metrics and reset tcpm_stamp
  niu: Change niu_rbr_fill() to use unlikely() to check niu_rbr_add_page() return value
  niu: Fix to check for dma mapping errors.
  net: Fix references to out-of-scope variables in put_cmsg_compat()
  net: ethernet: davinci_emac: add pm_runtime support
  net: ethernet: davinci_emac: Remove unnecessary #include
  ...
2012-07-24 10:01:50 -07:00
Linus Torvalds
cb47c1831f Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull target updates from Nicholas Bellinger:
 "There have been lots of work in a number of areas this past round.
  The highlights include:

   - Break out target_core_cdb.c emulation into SPC/SBC ops (hch)
   - Add a parse_cdb method to target backend drivers (hch)
   - Move sync_cache + write_same + unmap into spc_ops (hch)
   - Use target_execute_cmd for WRITEs in iscsi_target + srpt (hch)
   - Offload WRITE I/O backend submission in tcm_qla2xxx + tcm_fc (hch +
     nab)
   - Refactor core_update_device_list_for_node() into enable/disable
     funcs (agrover)
   - Replace the TCM processing thread with a TMR work queue (hch)
   - Fix regression in transport_add_device_to_core_hba from TMR
     conversion (DanC)
   - Remove racy, now-redundant check of sess_tearing_down with qla2xxx
     (roland)
   - Add range checking, fix reading of data len + possible underflow in
     UNMAP (roland)
   - Allow for target_submit_cmd() returning errors + convert fabrics
     (roland + nab)
   - Drop bogus struct file usage for iSCSI/SCTP (viro)"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (54 commits)
  iscsi-target: Drop bogus struct file usage for iSCSI/SCTP
  target: NULL dereference on error path
  target: Allow for target_submit_cmd() returning errors
  target: Check number of unmap descriptors against our limit
  target: Fix possible integer underflow in UNMAP emulation
  target: Fix reading of data length fields for UNMAP commands
  target: Add range checking to UNMAP emulation
  target: Add generation of LOGICAL BLOCK ADDRESS OUT OF RANGE
  target: Make unnecessarily global se_dev_align_max_sectors() static
  target: Remove se_session.sess_wait_list
  qla2xxx: Remove racy, now-redundant check of sess_tearing_down
  target: Check sess_tearing_down in target_get_sess_cmd()
  sbp-target: Consolidate duplicated error path code in sbp_handle_command()
  target: Un-export target_get_sess_cmd()
  qla2xxx: Get rid of redundant qla_tgt_sess.tearing_down
  target: Make core_disable_device_list_for_node use pre-refactoring lock ordering
  target: refactor core_update_device_list_for_node()
  target: Eliminate else using boolean logic
  target: Misc retval cleanups
  target: Remove hba param from core_dev_add_lun
  ...
2012-07-22 13:31:57 -07:00
David S. Miller
6700c2709c net: Pass optional SKB and SK arguments to dst_ops->{update_pmtu,redirect}()
This will be used so that we can compose a full flow key.

Even though we have a route in this context, we need more.  In the
future the routes will be without destination address, source address,
etc. keying.  One ipv4 route will cover entire subnets, etc.

In this environment we have to have a way to possess persistent storage
for redirects and PMTU information.  This persistent storage will exist
in the FIB tables, and that's why we'll need to be able to rebuild a
full lookup flow key here.  Using that flow key will do a fib_lookup()
and create/update the persistent entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-17 03:29:28 -07:00
Christoph Hellwig
e672a47fd9 srpt: use target_execute_cmd for WRITEs in srpt_handle_rdma_comp
srpt_handle_rdma_comp is called from kthread context and thus can execute
target_execute_cmd directly.  srpt_abort_cmd sets the CMD_T_LUN_STOP
flag directly, and thus the abuse of transport_generic_handle_data can be
replaced with an opencoded variant of that code path.  I'm still not happy
about a fabric driver poking into target core internals like this, but
let's defer the bigger architecture changes for now.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-07-16 17:35:18 -07:00
David S. Miller
04c9f416e3 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Conflicts:
	net/batman-adv/bridge_loop_avoidance.c
	net/batman-adv/bridge_loop_avoidance.h
	net/batman-adv/soft-interface.c
	net/mac80211/mlme.c

With merge help from Antonio Quartulli (batman-adv) and
Stephen Rothwell (drivers/net/usb/qmi_wwan.c).

The net/mac80211/mlme.c conflict seemed easy enough, accounting for a
conversion to some new tracing macros.

Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 23:56:33 -07:00
Eric Dumazet
b28ba72665 IPoIB: fix skb truesize underestimatiom
Or Gerlitz reported triggering of WARN_ON_ONCE(delta < len); in
skb_try_coalesce()
This warning tracks drivers that incorrectly set skb->truesize

IPoIB indeed allocates a full page to store a fragment, but only
accounts in skb->truesize the used part of the page (frame length)

This patch fixes skb truesize underestimation, and
also fixes a performance issue, because RX skbs have not enough tailroom
to allow IP and TCP stacks to pull their header in skb linear part
without an expensive call to pskb_expand_head()

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: Erez Shitrit <erezsh@mellanox.com>
Cc: Shlomo Pongartz <shlomop@mellanox.com>
Cc: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-10 23:33:12 -07:00
Roland Dreier
d90f9b3591 IB: Use IS_ENABLED(CONFIG_IPV6)
Instead of testing defined(CONFIG_IPV6) || defined(CONFIG_IPV6_MODULE)

Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-07-08 18:04:32 -07:00
David S. Miller
700db99d01 ipoib: Need to do dst_neigh_lookup_skb() outside of priv->lock.
Otherwise local_bh_enable() complains.

Reported-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 21:08:05 -07:00
David S. Miller
178709bbfe ipoib: Convert over to dev_lookup_neigh_skb().
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-07-05 01:09:36 -07:00
Linus Torvalds
c23ddf7857 InfiniBand/RDMA changes for the 3.5 merge window:
- Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
  - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
    applications to send and receive arbitrary packets directly to/from
    the hardware
  - Add "doorbell drop" handling to the cxgb4 driver
  - A fairly large batch of qib hardware driver changes
  - A few fixes for lockdep-detected issues
  - A few other miscellaneous fixes and cleanups
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPumbZAAoJEENa44ZhAt0h8LgP/0fXe7Szm3n6P6UvMAVqkagM
 4PpreH3mpWUFpzqeQE1JPDtgx700R6aPipbHqgIN+k61RWMpLjICGcNx7iwxn1I+
 zqdquGygWgjceLz+BLVlk+iBmJt3vZ3fPRAXc7fdP+jhIarWkNIOy1pXWTUuRvED
 jL8jIaxhCcgAVzm/zNyt6IPxkaHvCz7K9wqmpyU0dsO9OyPdGvWA9+CkGXwmOCPq
 mxSVhWnfGsMkPBsL7EgTC5KP/ox2PKq6rFgysmVVS+rKCpP0L8BEVQyGX3Gf8KA8
 yV+KdTi9ofDnFrv6R7Wz0v7HRUih8GRssakzBu7Y7HLfK1M/QwMG0GUAibXGZObc
 vUXuQ3uRJ/cIzMPXqKeGYwpb5t+TmxyjhWu44OjCUQkNau91+9BSbA69S88KXc49
 aTJiCZlhPuGf4uGMWJJuPLcE2xO2QCZj+8ckL2STYrIip6GWlCH02kJaQmRkuWH2
 UhMOeJDBC4nvh4EQT/WwHpGzyhkavE2ayfo5YemxBJXo+P5Mdbf7WIDRQDLUEeQH
 F8sPoccH4hDiAorN/SkTsm14jVTP7oWW1M40Ont59Nhbgm88MsVkvjoneHnfBvbD
 HjK92soCWnYTAoREfj0G4xUxZgMdOZcezWrX0rx5LJ8Ju9y4zAi3cKGr7lg6hs4X
 syKfN0VjiDRtJ+pxayi3
 =yWfr
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes from Roland Dreier:
 - Add ocrdma hardware driver for Emulex IB-over-Ethernet adapters
 - Add generic and mlx4 support for "raw" QPs: allow suitably privileged
   applications to send and receive arbitrary packets directly to/from
   the hardware
 - Add "doorbell drop" handling to the cxgb4 driver
 - A fairly large batch of qib hardware driver changes
 - A few fixes for lockdep-detected issues
 - A few other miscellaneous fixes and cleanups

Fix up trivial conflict in drivers/net/ethernet/emulex/benet/be.h.

* tag 'rdma-for-3.5' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (53 commits)
  RDMA/cxgb4: Include vmalloc.h for vmalloc and vfree
  IB/mlx4: Fix mlx4_ib_add() error flow
  IB/core: Fix IB_SA_COMP_MASK macro
  IB/iser: Fix error flow in iser ep connection establishment
  IB/mlx4: Increase the number of vectors (EQs) available for ULPs
  RDMA/cxgb4: Add query_qp support
  RDMA/cxgb4: Remove kfifo usage
  RDMA/cxgb4: Use vmalloc() for debugfs QP dump
  RDMA/cxgb4: DB Drop Recovery for RDMA and LLD queues
  RDMA/cxgb4: Disable interrupts in c4iw_ev_dispatch()
  RDMA/cxgb4: Add DB Overflow Avoidance
  RDMA/cxgb4: Add debugfs RDMA memory stats
  cxgb4: DB Drop Recovery for RDMA and LLD queues
  cxgb4: Common platform specific changes for DB Drop Recovery
  cxgb4: Detect DB FULL events and notify RDMA ULD
  RDMA/cxgb4: Drop peer_abort when no endpoint found
  RDMA/cxgb4: Always wake up waiters in c4iw_peer_abort_intr()
  mlx4_core: Change bitmap allocator to work in round-robin fashion
  RDMA/nes: Don't call event handler if pointer is NULL
  RDMA/nes: Fix for the ORD value of the connecting peer
  ...
2012-05-21 17:54:55 -07:00
Linus Torvalds
c9bfa7d75b Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull scsi-target changes from Nicholas Bellinger:
 "There has been lots of work in existing code in a number of areas this
  past cycle.  The major highlights have been:

   * Removal of transport_do_task_sg_chain() from core + fabrics
     (Roland)
   * target-core: Removal of se_task abstraction from target-core and
     enforce hw_max_sectors for pSCSI backends (hch)
   * Re-factoring of iscsi-target tx immediate/response queues (agrover)
   * Conversion of iscsi-target back to using target core memory
     allocation logic (agrover)

  We've had one last minute iscsi-target patch go into for-next to
  address a nasty regression bug related to the target core allocation
  logic conversion from agrover that is not included in friday's
  linux-next build, but has been included in this series.

  On the new fabric module code front for-3.5, here is a brief status
  update for the three currently in flight this round:

   * usb-gadget target driver:

  Sebastian Siewior's driver for supporting usb-gadget target mode
  operation.  This will be going out as a separate PULL request from
  target-pending/usb-target-merge with subsystem maintainer ACKs.  There
  is one minor target-core patch in this series required to function.

   * sbp ieee-1394/firewire target driver:

  Chris Boot's driver for supportting the Serial Block Protocol (SBP)
  across IEEE-1394 Firewire hardware.  This will be going out as a
  separate PULL request from target-pending/sbp-target-merge with two
  additional drivers/firewire/ patches w/ subsystem maintainer ACKs.

   * qla2xxx LLD target mode infrastructure changes + tcm_qla2xxx:

  The Qlogic >= 24xx series HW target mode LLD infrastructure patch-set
  and tcm_qla2xxx fabric driver.  Support for FC target mode using
  qla2xxx LLD code has been officially submitted by Qlogic to James
  below, and is currently outstanding but not yet merged into
  scsi.git/for-next..

    [PATCH 00/22] qla2xxx: Updates for scsi "misc" branch
    http://www.spinics.net/lists/linux-scsi/msg59350.html

  Note there are *zero* direct dependencies upon this for-next series
  for the qla2xxx LLD target + tcm_qla2xxx patches submitted above, and
  over the last days the target mode team has been tracking down an
  tcm_qla2xxx specific active I/O shutdown bug that appears to now be
  almost squashed for 3.5-rc-fixes."

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (47 commits)
  iscsi-target: Fix iov_count calculation bug in iscsit_allocate_iovecs
  iscsi-target: remove dead code in iscsi_check_valuelist_for_support
  target: Handle ATA_16 passthrough for pSCSI backend devices
  target: Add MI_REPORT_TARGET_PGS ext. header + implict_trans_secs attribute
  target: Fix MAINTENANCE_IN service action CDB checks to use lower 5 bits
  target: add support for the WRITE_VERIFY command
  target: make target_put_session void
  target: cleanup transport_execute_tasks()
  target: Remove max_sectors device attribute for modern se_task less code
  target: lock => unlock typo in transport_lun_wait_for_tasks
  target: Enforce hw_max_sectors for SCF_SCSI_DATA_SG_IO_CDB
  target: remove the t_se_count field in struct se_cmd
  target: remove the t_task_cdbs_ex_left field in struct se_cmd
  target: remove the t_task_cdbs_left field in struct se_cmd
  target: remove struct se_task
  target: move the state and execute lists to the command
  target: simplify command to task linkage
  target: always allocate a single task
  target: replace ->execute_task with ->execute_cmd
  target: remove the task_sectors field in struct se_task
  ...
2012-05-21 17:37:09 -07:00
Or Gerlitz
7d9c0de4ab IB/iser: Fix error flow in iser ep connection establishment
The current error flow code was releasing the IB connection object and
calling iscsi_destroy_endpoint() directly without going through the
reference counting mechanism introduced in commit 39ff05d ("IB/iser:
Enhance disconnection logic for multi-pathing"). This resulted in a
double free of the iscsi endpoint object, which causes a kernel NULL
pointer dereference.  Fix that by plugging into the IB conn reference
counting correctly.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-05-18 17:05:31 -07:00
Andy Grover
a12f41f841 target: Rename target_allocate_tasks to target_setup_cmd_from_cdb
This patch renames a horribly misnamed function that no longer allocate
tasks to something more descriptive for it's modern use in target core.

(nab: Fix up ib_srpt to use this as well ahead of a target_submit_cmd
conversion)

Signed-off-by: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-04-14 17:40:36 -07:00
Roland Dreier
6f9e7f01b6 IB/srpt: Remove use of transport_do_task_sg_chain()
With the modern target core, se_cmd->t_data_sg already points to a
sglist that covers the whole command.  So task_sg chaining is needless
overhead and obfuscation -- instead of splicing the split up task
sglists back into one list, we can just use the original list directly.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-04-14 17:40:32 -07:00
Roland Dreier
6f3603367b IB/srpt: Set srq_type to IB_SRQT_BASIC
Since commit 96104eda01 ("RDMA/core: Add SRQ type field"), kernel
users of SRQs need to specify srq_type = IB_SRQT_BASIC in struct
ib_srq_init_attr, or else most low-level drivers will fail in
when srpt_add_one() calls ib_create_srq() and gets -ENOSYS.

(mlx4_ib works OK nearly all of the time, because it just needs
srq_type != IB_SRQT_XRC.  And apparently nearly everyone using
ib_srpt is using mlx4 hardware)

Reported-by: Alexey Shvetsov <alexxy@gentoo.org>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-04-12 07:56:57 -07:00
Linus Torvalds
1ab142d499 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
Pull SCSI target updates from Nicholas Bellinger:
 "This contains the usual set of updates and bugfixes to target-core +
  existing fabric module code, along with a handful of the patches
  destined for v3.3 stable.

  It also contains the necessary target-core infrastructure pieces
  required to run using tcm_qla2xxx.ko WWPNs with the new Qlogic Fibre
  Channel fabric module currently queued in target-pending/for-next-merge,
  and coming for round 2.

  The highlights for this series include:

   - Add target_submit_tmr() helper function for fabric task management
     (andy)
   - Convert tcm_fc to use target_submit_tmr() (andy)
   - Replace target core various cmd flags with a transport state (hch)
   - Convert loopback to use workqueue submission (hch)
   - Convert target core to use array_zalloc for tpg_lun_list (joern)
   - Convert target core to use array_zalloc for device_list (joern)
   - Add target core support for TMR_ABORT_TASK (nab)
   - Add target core se_sess->sess_kref + get/put helpers (nab)
   - Add target core se_node_acl->acl_kref for ->acl_free_comp usage
     (nab)
   - Convert iscsi-target to use target_put_session + sess_kref (nab)
   - Fix tcm_fc fc_exch memory leak in ft_send_resp_status (nab)
   - Fix ib_srpt srpt_handle_cmd send_ioctx->ioctx_kref leak on
     exception (nab)
   - Fix target core up handling of short INQUIRY buffers (roland)
   - Untangle target-core front-end and back-end meanings of max_sectors
     attribute (roland)
   - Set loopback residual field for SCSI commands (roland)
   - Fix target-core 16-bit target ports for SET TARGET PORT GROUPS
     emulation (roland)

  Thanks again to Andy, Christoph, Joern, Roland, and everyone who has
  contributed this round!"

* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (64 commits)
  ib_srpt: Fix srpt_handle_cmd send_ioctx->ioctx_kref leak on exception
  loopback: Fix transport_generic_allocate_tasks error handling
  iscsi-target: remove improper externs
  iscsi-target: Remove unused variables in iscsi_target_parameters.c
  target: remove obvious warnings
  target: Use array_zalloc for device_list
  target: Use array_zalloc for tpg_lun_list
  target: Fix sense code for unsupported SERVICE ACTION IN
  target: Remove hack to make READ CAPACITY(10) lie if thin provisioning is enabled
  target: Bump core version to v4.1.0-rc2-ml + fabric versions
  tcm_fc: Fix fc_exch memory leak in ft_send_resp_status
  target: Drop unused legacy target_core_fabric_ops API callers
  iscsi-target: Convert to use target_put_session + sess_kref
  target: Convert se_node_acl->acl_group removal to use ->acl_kref
  target: Add se_node_acl->acl_kref for ->acl_free_comp usage
  target: Add se_node_acl->acl_free_comp for NodeACL release path
  target: Add se_sess->sess_kref + get/put helpers
  target: Convert session_lock to irqsave
  target: Fix typo in drivers/target
  iscsi-target: Fix dynamic -> explict NodeACL pointer reference
  ...
2012-03-22 12:38:04 -07:00
Linus Torvalds
0c2fe82a9b InfiniBand/RDMA changes for the 3.4 merge window. Nothing big really
stands out; by patch count lots of fixes to the mlx4 driver plus some
 cleanups and fixes to the core and other drivers.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.11 (GNU/Linux)
 
 iQIcBAABCAAGBQJPZ2THAAoJEENa44ZhAt0hMgkP/AwXjwzz6lYlIizytQ5KOH11
 CT/VoHLIGIwY31aRhZRHggWLEbJi8akgfh04K9PiSh/muFRPYk5KJDoQEoJV5Odv
 12krqtpZeL41YcAPq0wiyP3Ki5AZDCDumzWbOtmxE9m93mVfi3yFTTWDHFo/7SOx
 A2kUITdKuZhgBpCFYS23Gs8QTWcMPUlwNlM76edG90BjSySHsK15tbqTUaEOC6Ux
 SPG0p0c2YjlZCPmRObrCL65o5LFRVajinZ4rXN7rre4LEU+IuXgW+mQU2Kwy8z6a
 oMPE2l4OMSqj270r2PlVK087gGH69nKfLgLQLJiP0Id+KwdYUk7vQcA+Fu0jwjP3
 Ci/ahllvB76IiaNbax0vTy2ohCJmilLLotAP46NW0vPR2/KnmVy0Mqk8pXQQddw4
 tnAFNnafn/MjTty8GwqyXR/enJZs29ePcSxcYnKjXYRIgG4ldejL2wktgSra8+gb
 3/qFag1HRgZzcICkXGpHj7uWJ2KDe++1m6KTb0THUmrjuiel6ATFZpYoeRed3GfS
 ZMqpjZvuXM8nA9CrKMkzm5vIiUFF8Qq0hUTLR38F8FiNEhmC08JqH5PqjJ3Z18if
 R1yG4W4xmp/0ICHg4zyg6LWV3YsweDD+jSCJpW+KNBvu3HtYb41HIlK+i2TTmtPD
 3etEZZRwROAyYKcwN01p
 =DbLx
 -----END PGP SIGNATURE-----

Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband

Pull InfiniBand/RDMA changes for the 3.4 merge window from Roland Dreier:
 "Nothing big really stands out; by patch count lots of fixes to the
  mlx4 driver plus some cleanups and fixes to the core and other
  drivers."

* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (28 commits)
  mlx4_core: Scale size of MTT table with system RAM
  mlx4_core: Allow dynamic MTU configuration for IB ports
  IB/mlx4: Fix info returned when querying IBoE ports
  IB/mlx4: Fix possible missed completion event
  mlx4_core: Report thermal error events
  mlx4_core: Fix one more static exported function
  IB: Change CQE "csum_ok" field to a bit flag
  RDMA/iwcm: Reject connect requests if cmid is not in LISTEN state
  RDMA/cxgb3: Don't pass irq flags to flush_qp()
  mlx4_core: Get rid of redundant ext_port_cap flags
  RDMA/ucma: Fix AB-BA deadlock
  IB/ehca: Fix ilog2() compile failure
  IB: Use central enum for speed instead of hard-coded values
  IB/iser: Post initial receive buffers before sending the final login request
  IB/iser: Free IB connection resources in the proper place
  IB/srp: Consolidate repetitive sysfs code
  IB/srp: Use pr_fmt() and pr_err()/pr_warn()
  IB/core: Fix SDR rates in sysfs
  mlx4: Enforce device max FMR maps in FMR alloc
  IB/mlx4: Set bad_wr for invalid send opcode
  ...
2012-03-21 10:33:42 -07:00
Linus Torvalds
9f3938346a Merge branch 'kmap_atomic' of git://github.com/congwang/linux
Pull kmap_atomic cleanup from Cong Wang.

It's been in -next for a long time, and it gets rid of the (no longer
used) second argument to k[un]map_atomic().

Fix up a few trivial conflicts in various drivers, and do an "evil
merge" to catch some new uses that have come in since Cong's tree.

* 'kmap_atomic' of git://github.com/congwang/linux: (59 commits)
  feature-removal-schedule.txt: schedule the deprecated form of kmap_atomic() for removal
  highmem: kill all __kmap_atomic() [swarren@nvidia.com: highmem: Fix ARM build break due to __kmap_atomic rename]
  drbd: remove the second argument of k[un]map_atomic()
  zcache: remove the second argument of k[un]map_atomic()
  gma500: remove the second argument of k[un]map_atomic()
  dm: remove the second argument of k[un]map_atomic()
  tomoyo: remove the second argument of k[un]map_atomic()
  sunrpc: remove the second argument of k[un]map_atomic()
  rds: remove the second argument of k[un]map_atomic()
  net: remove the second argument of k[un]map_atomic()
  mm: remove the second argument of k[un]map_atomic()
  lib: remove the second argument of k[un]map_atomic()
  power: remove the second argument of k[un]map_atomic()
  kdb: remove the second argument of k[un]map_atomic()
  udf: remove the second argument of k[un]map_atomic()
  ubifs: remove the second argument of k[un]map_atomic()
  squashfs: remove the second argument of k[un]map_atomic()
  reiserfs: remove the second argument of k[un]map_atomic()
  ocfs2: remove the second argument of k[un]map_atomic()
  ntfs: remove the second argument of k[un]map_atomic()
  ...
2012-03-21 09:40:26 -07:00
Linus Torvalds
69a7aebcf0 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial
Pull trivial tree from Jiri Kosina:
 "It's indeed trivial -- mostly documentation updates and a bunch of
  typo fixes from Masanari.

  There are also several linux/version.h include removals from Jesper."

* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (101 commits)
  kcore: fix spelling in read_kcore() comment
  constify struct pci_dev * in obvious cases
  Revert "char: Fix typo in viotape.c"
  init: fix wording error in mm_init comment
  usb: gadget: Kconfig: fix typo for 'different'
  Revert "power, max8998: Include linux/module.h just once in drivers/power/max8998_charger.c"
  writeback: fix fn name in writeback_inodes_sb_nr_if_idle() comment header
  writeback: fix typo in the writeback_control comment
  Documentation: Fix multiple typo in Documentation
  tpm_tis: fix tis_lock with respect to RCU
  Revert "media: Fix typo in mixer_drv.c and hdmi_drv.c"
  Doc: Update numastat.txt
  qla4xxx: Add missing spaces to error messages
  compiler.h: Fix typo
  security: struct security_operations kerneldoc fix
  Documentation: broken URL in libata.tmpl
  Documentation: broken URL in filesystems.tmpl
  mtd: simplify return logic in do_map_probe()
  mm: fix comment typo of truncate_inode_pages_range
  power: bq27x00: Fix typos in comment
  ...
2012-03-20 21:12:50 -07:00
Cong Wang
2a156d094d infiniband: remove the second argument of k[un]map_atomic()
Acked-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: Cong Wang <amwang@redhat.com>
2012-03-20 21:48:18 +08:00
Roland Dreier
f0e88aeb19 Merge branches 'cma', 'cxgb3', 'cxgb4', 'ehca', 'iser', 'mad', 'nes', 'qib', 'srp' and 'srpt' into for-next 2012-03-19 09:50:33 -07:00
Nicholas Bellinger
187e70a554 ib_srpt: Fix srpt_handle_cmd send_ioctx->ioctx_kref leak on exception
This patch addresses a bug in srpt_handle_cmd() failure handling where
send_ioctx->kref is being leaked with the local extra reference after init,
causing the expected kref_put() in srpt_handle_send_comp() to not be the final
call to invoke srpt_put_send_ioctx_kref() -> transport_generic_free_cmd() and
perform se_cmd descriptor memory release.

It also fixes a SCF_SCSI_RESERVATION_CONFLICT handling bug where this code
is incorrectly falling through to transport_handle_cdb_direct() after
invoking srpt_queue_status() to send SAM_STAT_RESERVATION_CONFLICT status.

Note this patch is for >= v3.3 mainline code, and current lio-core.git
code has already been converted to target_submit_cmd() + se_cmd->cmd_kref usage,
and internal ioctx->kref usage has been removed.  I'm including this patch
now into target-pending/for-next with a CC' for v3.3 stable.

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: stable@vger.kernel.org
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-03-17 22:13:48 -07:00
Nicholas Bellinger
c7ec05c82b target: Drop unused legacy target_core_fabric_ops API callers
This patch drops the following unused legacy API callers from target_core_fabric.h:

*) TFO->fall_back_to_erl0()
*) TFO->stop_session()
*) TFO->sess_logged_in()
*) TFO->is_state_remove()

This patch also removes the stub usage in loopback, tcm_fc, iscsi_target,
and ib_srpt fabric modules.

Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-03-10 14:42:55 -08:00
Or Gerlitz
d927d505c5 IB: Change CQE "csum_ok" field to a bit flag
Use a bit in wc_flags rather then a whole integer to hold the
"checksum OK" flag.  By itself, this change doesn't reduce the size of
struct ib_wc on 64bit machines -- it stays on 56 bytes because of
padding.  However, it will allow to add more fields in the future
without enlarging the struct.  Also, it will let us have a unified
approach with future libibverbs checksum offload reporting, because a
bit flag doesn't break the library ABI.

This patch was suggested during conversation with Liran Liss
<liranl@mellanox.com>.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Reviewed-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-08 12:34:27 -08:00
Or Gerlitz
89e984e2c2 IB/iser: Post initial receive buffers before sending the final login request
An iser target may send iscsi NO-OP PDUs as soon as it marks the iSER
iSCSI session as fully operative.  This means that there is window
where there are no posted receive buffers on the initiator side, so
it's possible for the iSER RC connection to break because of RNR NAK /
retry errors.  To fix this, rely on the flags bits in the login
request to have FFP (0x3) in the lower nibble as a marker for the
final login request, and post an initial chunk of receive buffers
before sending that login request instead of after getting the login
response.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-05 08:53:05 -08:00
Doug Ledford
d474186f19 IB/iser: Free IB connection resources in the proper place
We allocate the login dma buffers in iser_verbs.c as part of
alloc_ib_conn_resources(), however we are freeing them in
iser_initiator.c as part of iser_free_rx_descriptors().  This is
needlessly confusing.  We have an alloc_rx_descriptors() and it
doesn't alloc something that the free_rx_descriptors() frees, and we
have an alloc_ib_conn_resources() that allocs something not freed by
free_ib_conn_resources().  Clean that up.

Signed-off-by: Doug Ledford <dledford@redhat.com>

[ Fix build error in iser_free_ib_conn_res().  - Or ]

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-03-05 00:23:27 -08:00
Bart Van Assche
683b159a2e IB/srp: Consolidate repetitive sysfs code
Remove sysfs attributes before removing a target instead of testing
the target state in every sysfs attribute callback method. Note: it is
safe to invoke a sysfs attribute removal method like
device_remove_file() twice on the same attribute.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-27 09:27:57 -08:00
Bart Van Assche
e0bda7d8c3 IB/srp: Use pr_fmt() and pr_err()/pr_warn()
Use pr_fmt() and pr_xxx() instead of more verbose printk() equivalents.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-27 09:26:30 -08:00
Masanari Iida
a776ce7cfc IB/srpt: Fix typo "alocate" -> "allocate"
Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-25 17:49:37 -08:00
Andy Grover
c8e31f26fe target: Add SCF_SCSI_TMR_CDB usage and drop se_tmr_req_cache
Change the test for if a cmd is a tmr request to checking if
SCF_SCSI_TMR_CDB (a new flag) is set in cmd->se_cmd_flags.

Also remove se_tmr_req_cache usage in favor of kzalloc usage,
and make core_tmr_alloc_req() return int + setup se_cmd->se_tmr_req
directly and fix up various fabric module usages

Cc: Andy Grover <agrover@redhat.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-02-25 14:37:47 -08:00
Christoph Hellwig
7d680f3b74 target: replace various cmd flags with a transport state
Replace various atomic_ts used as flags in struct se_cmd with a single
transport_state bitmap that requires t_state_lock to be held for modifications.

In the target core that assumption generally is true, but some recently added
code in the SRP target had to grow new lock calls.  I can't say I like the way
how it messes with the command state directly, but let's leave that for later.

(Re-add missing ib_srpt.c changes that nab dropped..)

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
2012-02-25 14:37:45 -08:00
Linus Torvalds
8df54d622a Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net
Quoth David:

1) GRO MAC header comparisons were ethernet specific, breaking other
   link types.  This required a multi-faceted fix to cure the originally
   noted case (Infiniband), because IPoIB was lying about it's actual
   hard header length.  Thanks to Eric Dumazet, Roland Dreier, and
   others.

2) Fix build failure when INET_UDP_DIAG is built in and ipv6 is modular.
   From Anisse Astier.

3) Off by ones and other bug fixes in netprio_cgroup from Neil Horman.

4) ipv4 TCP reset generation needs to respect any network interface
   binding from the socket, otherwise route lookups might give a
   different result than all the other segments received.  From Shawn
   Lu.

5) Fix unintended regression in ipv4 proxy ARP responses, from Thomas
   Graf.

6) Fix SKB under-allocation bug in sh_eth, from Yoshihiro Shimoda.

7) Revert skge PCI mapping changes that are causing crashes for some
   folks, from Stephen Hemminger.

8) IPV4 route lookups fill in the wildcarded fields of the given flow
   lookup key passed in, which is fine most of the time as this is
   exactly what the caller's want.  However there are a few cases that
   want to retain the original flow key values afterwards, so handle
   those cases properly.  Fix from Julian Anastasov.

9) IGB/IXGBE VF lookup bug fixes from Greg Rose.

10) Properly null terminate filename passed to ethtool flash device
    method, from Ben Hutchings.

11) S3 resume fix in via-velocity from David Lv.

12) Fix double SKB free during xmit failure in CAIF, from Dmitry
    Tarnyagin.

* git://git.kernel.org/pub/scm/linux/kernel/git/davem/net: (72 commits)
  net: Don't proxy arp respond if iif == rt->dst.dev if private VLAN is disabled
  ipv4: Fix wrong order of ip_rt_get_source() and update iph->daddr.
  netprio_cgroup: fix wrong memory access when NETPRIO_CGROUP=m
  netprio_cgroup: don't allocate prio table when a device is registered
  netprio_cgroup: fix an off-by-one bug
  bna: fix error handling of bnad_get_flash_partition_by_offset()
  isdn: type bug in isdn_net_header()
  net: Make qdisc_skb_cb upper size bound explicit.
  ixgbe: ethtool: stats user buffer overrun
  ixgbe: dcb: up2tc mapping lost on disable/enable CEE DCB state
  ixgbe: do not update real num queues when netdev is going away
  ixgbe: Fix broken dependency on MAX_SKB_FRAGS being related to page size
  ixgbe: Fix case of Tx Hang in PF with 32 VFs
  ixgbe: fix vf lookup
  igb: fix vf lookup
  e1000: add dropped DMA receive enable back in for WoL
  gro: more generic L2 header check
  IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses
  zd1211rw: firmware needs duration_id set to zero for non-pspoll frames
  net: enable TC35815 for MIPS again
  ...
2012-02-10 14:18:46 -08:00
Masanari Iida
7367d99b30 SRP: Fix typo in ib_srpt.c
Correct spelling "alocate" to "allocate" in
drivers/infiniband/ulp/srpt/ib_srpt.c

Signed-off-by: Masanari Iida <standby24x7@gmail.com>
Signed-off-by: Jiri Kosina <jkosina@suse.cz>
2012-02-09 23:09:54 +01:00
Roland Dreier
936d7de3d7 IPoIB: Stop lying about hard_header_len and use skb->cb to stash LL addresses
Commit a0417fa3a1 ("net: Make qdisc_skb_cb upper size bound
explicit.") made it possible for a netdev driver to use skb->cb
between its header_ops.create method and its .ndo_start_xmit
method.  Use this in ipoib_hard_header() to stash away the LL address
(GID + QPN), instead of the "ipoib_pseudoheader" hack.  This allows
IPoIB to stop lying about its hard_header_len, which will let us fix
the L2 check for GRO.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2012-02-08 18:26:54 -05:00
Jesper Juhl
715252d419 IB/srpt: Don't return freed pointer from srpt_alloc_ioctx_ring()
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-06 08:57:11 -08:00
Dan Carpenter
3af336376f IB/srpt: Fix ERR_PTR() vs. NULL checking confusion
transport_init_session() and target_fabric_configfs_init() don't
return NULL pointers, they only return ERR_PTRs or valid pointers.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-03 09:07:00 -08:00
Jesper Juhl
ebfded8c4b IB/srpt: Remove unneeded <linux/version.h> include
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-02 12:55:59 -08:00
Roland Dreier
f225066b64 IB/srpt: Use ARRAY_SIZE() instead of open-coding
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-02 12:55:58 -08:00
Roland Dreier
486d8b9f88 IB/srpt: Use DEFINE_SPINLOCK()/LIST_HEAD()
Signed-off-by: Roland Dreier <roland@purestorage.com>
2012-02-02 12:55:58 -08:00
Linus Torvalds
f59e842fc0 Merge branch 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
* 'for-next-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  ib_srpt: Initial SRP Target merge for v3.3-rc1
2012-01-18 16:29:42 -08:00
Linus Torvalds
972b2c7199 Merge branch 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs
* 'for-linus2' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: (165 commits)
  reiserfs: Properly display mount options in /proc/mounts
  vfs: prevent remount read-only if pending removes
  vfs: count unlinked inodes
  vfs: protect remounting superblock read-only
  vfs: keep list of mounts for each superblock
  vfs: switch ->show_options() to struct dentry *
  vfs: switch ->show_path() to struct dentry *
  vfs: switch ->show_devname() to struct dentry *
  vfs: switch ->show_stats to struct dentry *
  switch security_path_chmod() to struct path *
  vfs: prefer ->dentry->d_sb to ->mnt->mnt_sb
  vfs: trim includes a bit
  switch mnt_namespace ->root to struct mount
  vfs: take /proc/*/mounts and friends to fs/proc_namespace.c
  vfs: opencode mntget() mnt_set_mountpoint()
  vfs: spread struct mount - remaining argument of next_mnt()
  vfs: move fsnotify junk to struct mount
  vfs: move mnt_devname
  vfs: move mnt_list to struct mount
  vfs: switch pnode.h macros to struct mount *
  ...
2012-01-08 12:19:57 -08:00
Al Viro
587a1f1659 switch ->is_visible() to returning umode_t
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
2012-01-03 22:54:55 -05:00
Bart Van Assche
a42d985bd5 ib_srpt: Initial SRP Target merge for v3.3-rc1
This patch adds the kernel module ib_srpt SCSI RDMA Protocol (SRP) target
implementation conforming to the SRP r16a specification for the mainline
drivers/target infrastructure.

This driver was originally developed by Vu Pham and has been optimized by
Bart Van Assche and merged into upstream LIO based on his srpt-lio-4.1
branch here:

   https://github.com/bvanassche/srpt-lio/commits/srpt-lio-4.1/

This updated patch also contains the following two changes from
lio-core-2.6.git/master.  One is to fix a bug with 1 >= task->task_sg[]
chained mappings in ib_srpt, and the other to convert the configfs control
plane to reference IB Port GUID and struct srpt_port directly following
mainline v4.x target_core_fabric_configfs.c convertion for ib_srpt
to work with rtslib/rtsadmin v2 code.

These seperate patches can be found here:

ib_srpt: Fix bug with chainged SGLs in srpt_map_sg_to_ib_sge
http://www.risingtidesystems.com/git/?p=lio-core-2.6.git;a=commitdiff;h=ea485147563b6555a97dbf811825fbb586519252

ib_srpt: Convert se_wwn endpoint reference to struct srpt_port->port_wwn
http://www.risingtidesystems.com/git/?p=lio-core-2.6.git;a=commitdiff;h=4e544a210acb227df1bb4ca5086e65bdf4e648ea

This also includes the following recent v1 -> v2 review changes:

ib_srpt: Fix potential out-of-bounds array access
ib_srpt: Avoid failed multipart RDMA transfers
ib_srpt: Fix srpt_alloc_fabric_acl failure case return value
ib_srpt: Update comments to reference $driver/$port layout
ib_srpt: Fix sport->port_guid formatting code
ib_srpt: Remove legacy use_port_guid_in_session_name module parameter
ib_srpt: Convert srp_max_rdma_size into per port configfs attribute
ib_srpt: Convert srp_max_rsp_size into per port configfs attribute
ib_srpt: Convert srpt_sq_size into per port configfs attribute

and v2 -> v3 review changes:

ib_srpt: Fix possible race with srp_sq_size in srpt_create_ch_ib
ib_srpt: Fix possible race with srp_max_rsp_size in srpt_release_channel_work
ib_srpt: Fix up MAX_SRPT_RDMA_SIZE define
ib_srpt: Make srpt_map_sg_to_ib_sge() failure case return -EAGAIN
ib_srpt: Convert port_guid to use subnet_prefix + interface_id formatting
ib_srpt: Make srpt_check_stop_free return kref_put status
ib_srpt: Make compilation with BUG=n proceed`
ib_srpt: Use new target_core_fabric.h include
ib_srpt: Check hex2bin() return code to silence build warning

Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Roland Dreier <roland@purestorage.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Vu Pham <vu@mellanox.com>
Cc: David Dillow <dillowda@ornl.gov>
Signed-off-by: Nicholas A. Bellinger <nab@risingtidesystems.com>
2011-12-16 06:33:56 +00:00
David Miller
17e6abeec4 infiniband: ipoib: Sanitize neighbour handling in ipoib_main.c
Reduce the number of dst_get_neighbour_noref() calls within a single
call chain.  Primarily by passing the neighbour pointer down to the
helper functions.

Handle dst_get_neighbour_noref() returning NULL in ipoib_start_xmit()
by incrementing the dropped counter and freeing the packet.  We don't
want it to fall through into the ARP/RARP/multicast handling, since
that should only happen when skb_dst() is NULL.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
2011-12-05 15:20:20 -05:00
David Miller
2721745501 net: Rename dst_get_neighbour{, _raw} to dst_get_neighbour_noref{, _raw}.
To reflect the fact that a refrence is not obtained to the
resulting neighbour entry.

Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Roland Dreier <roland@purestorage.com>
2011-12-05 15:20:19 -05:00
David S. Miller
b3613118eb Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2011-12-02 13:49:21 -05:00
David Miller
596b9b68ef neigh: Add infrastructure for allocating device neigh privates.
netdev->neigh_priv_len records the private area length.

This will trigger for neigh_table objects which set tbl->entry_size
to zero, and the first instances of this will be forthcoming.

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-30 18:46:43 -05:00
Roland Dreier
a493f1a24a Merge branches 'cxgb4', 'ipoib', 'misc' and 'qib' into for-next 2011-11-29 18:01:53 -08:00
Eric Dumazet
580da35a31 IB: Fix RCU lockdep splats
Commit f2c31e32b3 ("net: fix NULL dereferences in check_peer_redir()")
forgot to take care of infiniband uses of dst neighbours.

Many thanks to Marc Aurele who provided a nice bug report and feedback.

Reported-by: Marc Aurele La France <tsi@ualberta.ca>
Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: David Miller <davem@davemloft.net>
Cc: <stable@kernel.org>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-29 13:37:11 -08:00
Mike Marciniszyn
3874397c0b IB/ipoib: Prevent hung task or softlockup processing multicast response
This following can occur with ipoib when processing a multicast reponse:

    BUG: soft lockup - CPU#0 stuck for 67s! [ib_mad1:982]
    Modules linked in: ...
    CPU 0:
    Modules linked in: ...
    Pid: 982, comm: ib_mad1 Not tainted 2.6.32-131.0.15.el6.x86_64 #1 ProLiant DL160 G5
    RIP: 0010:[<ffffffff814ddb27>]  [<ffffffff814ddb27>] _spin_unlock_irqrestore+0x17/0x20
    RSP: 0018:ffff8802119ed860  EFLAGS: 00000246
    0000000000000004 RBX: ffff8802119ed860 RCX: 000000000000a299
    RDX: ffff88021086c700 RSI: 0000000000000246 RDI: 0000000000000246
    RBP: ffffffff8100bc8e R08: ffff880210ac229c R09: 0000000000000000
    R10: ffff88021278aab8 R11: 0000000000000000 R12: ffff8802119ed860
    R13: ffffffff8100be6e R14: 0000000000000001 R15: 0000000000000003
    FS:  0000000000000000(0000) GS:ffff880028200000(0000) knlGS:0000000000000000
    CS:  0010 DS: 0018 ES: 0018 CR0: 000000008005003b
    CR2: 00000000006d4840 CR3: 0000000209aa5000 CR4: 00000000000406f0
    DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
    DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
    Call Trace:
    [<ffffffffa032c247>] ? ipoib_mcast_send+0x157/0x480 [ib_ipoib]
    [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
    [<ffffffff8100bc8e>] ? apic_timer_interrupt+0xe/0x20
    [<ffffffffa03283d4>] ? ipoib_path_lookup+0x124/0x2d0 [ib_ipoib]
    [<ffffffffa03286fc>] ? ipoib_start_xmit+0x17c/0x430 [ib_ipoib]
    [<ffffffff8141e758>] ? dev_hard_start_xmit+0x2c8/0x3f0
    [<ffffffff81439d0a>] ? sch_direct_xmit+0x15a/0x1c0
    [<ffffffff81423098>] ? dev_queue_xmit+0x388/0x4d0
    [<ffffffffa032d6b7>] ? ipoib_mcast_join_finish+0x2c7/0x510 [ib_ipoib]
    [<ffffffffa032dab8>] ? ipoib_mcast_sendonly_join_complete+0x1b8/0x1f0 [ib_ipoib]
    [<ffffffffa02a0946>] ? mcast_work_handler+0x1a6/0x710 [ib_sa]
    [<ffffffffa015f01e>] ? ib_send_mad+0xfe/0x3c0 [ib_mad]
    [<ffffffffa00f6c93>] ? ib_get_cached_lmc+0xa3/0xb0 [ib_core]
    [<ffffffffa02a0f9b>] ? join_handler+0xeb/0x200 [ib_sa]
    [<ffffffffa029e4fc>] ? ib_sa_mcmember_rec_callback+0x5c/0xa0 [ib_sa]
    [<ffffffffa029e79c>] ? recv_handler+0x3c/0x70 [ib_sa]
    [<ffffffffa01603a4>] ? ib_mad_completion_handler+0x844/0x9d0 [ib_mad]
    [<ffffffffa015fb60>] ? ib_mad_completion_handler+0x0/0x9d0 [ib_mad]
    [<ffffffff81088830>] ? worker_thread+0x170/0x2a0
    [<ffffffff8108e160>] ? autoremove_wake_function+0x0/0x40
    [<ffffffff810886c0>] ? worker_thread+0x0/0x2a0
    [<ffffffff8108ddf6>] ? kthread+0x96/0xa0
    [<ffffffff8100c1ca>] ? child_rip+0xa/0x20

Coinciding with stack trace is the following message:

    ib0: ib_address_create failed

The code below in ipoib_mcast_join_finish() will note the above
failure in the address handle but otherwise continue:

                ah = ipoib_create_ah(dev, priv->pd, &av);
                if (!ah) {
                        ipoib_warn(priv, "ib_address_create failed\n");
                } else {

The while loop at the bottom of ipoib_mcast_join_finish() will attempt
to send queued multicast packets in mcast->pkt_queue and eventually
end up in ipoib_mcast_send():

        if (!mcast->ah) {
                if (skb_queue_len(&mcast->pkt_queue) < IPOIB_MAX_MCAST_QUEUE)
                        skb_queue_tail(&mcast->pkt_queue, skb);
                else {
                        ++dev->stats.tx_dropped;
                        dev_kfree_skb_any(skb);
                }

My read is that the code will requeue the packet and return to the
ipoib_mcast_join_finish() while loop and the stage is set for the
"hung" task diagnostic as the while loop never sees a non-NULL ah, and
will do nothing to resolve.

There are GFP_ATOMIC allocates in the provider routines, so this is
possible and should be dealt with.

The test that induced the failure is associated with a host SM on the
same server during a shutdown.

This patch causes ipoib_mcast_join_finish() to exit with an error
which will flush the queued mcast packets.  Nothing is done to unwind
the QP attached state so that subsequent sends from above will retry
the join.

Reviewed-by: Ram Vepa <ram.vepa@qlogic.com>
Reviewed-by: Gary Leshner <gary.leshner@qlogic.com>
Signed-off-by: Mike Marciniszyn <mike.marciniszyn@qlogic.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-29 13:20:02 -08:00
David S. Miller
9ca36f7db2 infiniband: Update net drivers for netdev_features_t changes.
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-11-16 18:05:50 -05:00
Linus Torvalds
32aaeffbd4 Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux
* 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux: (230 commits)
  Revert "tracing: Include module.h in define_trace.h"
  irq: don't put module.h into irq.h for tracking irqgen modules.
  bluetooth: macroize two small inlines to avoid module.h
  ip_vs.h: fix implicit use of module_get/module_put from module.h
  nf_conntrack.h: fix up fallout from implicit moduleparam.h presence
  include: replace linux/module.h with "struct module" wherever possible
  include: convert various register fcns to macros to avoid include chaining
  crypto.h: remove unused crypto_tfm_alg_modname() inline
  uwb.h: fix implicit use of asm/page.h for PAGE_SIZE
  pm_runtime.h: explicitly requires notifier.h
  linux/dmaengine.h: fix implicit use of bitmap.h and asm/page.h
  miscdevice.h: fix up implicit use of lists and types
  stop_machine.h: fix implicit use of smp.h for smp_processor_id
  of: fix implicit use of errno.h in include/linux/of.h
  of_platform.h: delete needless include <linux/module.h>
  acpi: remove module.h include from platform/aclinux.h
  miscdevice.h: delete unnecessary inclusion of module.h
  device_cgroup.h: delete needless include <linux/module.h>
  net: sch_generic remove redundant use of <linux/module.h>
  net: inet_timewait_sock doesnt need <linux/module.h>
  ...

Fix up trivial conflicts (other header files, and  removal of the ab3550 mfd driver) in
 - drivers/media/dvb/frontends/dibx000_common.c
 - drivers/media/video/{mt9m111.c,ov6650.c}
 - drivers/mfd/ab3550-core.c
 - include/linux/dmaengine.h
2011-11-06 19:44:47 -08:00
Or Gerlitz
52439540ea IB/iser: DMA unmap TX bufs used for iSCSI/iSER headers
The current driver never does DMA unmapping on these buffers.  Fix that
by adding DMA unmapping to the task cleanup callback, and DMA mapping to
the task init function (drop the headers_initialized micro-optimization).

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-04 09:32:44 -07:00
Or Gerlitz
2c4ce60934 IB/iser: Use separate buffers for the login request/response
The driver counted on the transactional nature of iSCSI login/text
flows and used the same buffer for both the request and the response.
We also went further and did DMA mapping only once, with
DMA_FROM_DEVICE, which violates the DMA mapping API.  Fix that by
using different buffers, one for requests and one for responses, and
use the correct DMA mapping direction for each.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-11-04 09:30:52 -07:00
Linus Torvalds
f470f8d4e7 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (62 commits)
  mlx4_core: Deprecate log_num_vlan module param
  IB/mlx4: Don't set VLAN in IBoE WQEs' control segment
  IB/mlx4: Enable 4K mtu for IBoE
  RDMA/cxgb4: Mark QP in error before disabling the queue in firmware
  RDMA/cxgb4: Serialize calls to CQ's comp_handler
  RDMA/cxgb3: Serialize calls to CQ's comp_handler
  IB/qib: Fix issue with link states and QSFP cables
  IB/mlx4: Configure extended active speeds
  mlx4_core: Add extended port capabilities support
  IB/qib: Hold links until tuning data is available
  IB/qib: Clean up checkpatch issue
  IB/qib: Remove s_lock around header validation
  IB/qib: Precompute timeout jiffies to optimize latency
  IB/qib: Use RCU for qpn lookup
  IB/qib: Eliminate divide/mod in converting idx to egr buf pointer
  IB/qib: Decode path MTU optimization
  IB/qib: Optimize RC/UC code by IB operation
  IPoIB: Use the right function to do DMA unmap pages
  RDMA/cxgb4: Use correct QID in insert_recv_cqe()
  RDMA/cxgb4: Make sure flush CQ entries are collected on connection close
  ...
2011-11-01 10:51:38 -07:00
Roland Dreier
504255f8d0 Merge branches 'amso1100', 'cma', 'cxgb3', 'cxgb4', 'fdr', 'ipath', 'ipoib', 'misc', 'mlx4', 'misc', 'nes', 'qib' and 'xrc' into for-next 2011-11-01 09:37:08 -07:00
Paul Gortmaker
fec14d2fce infiniband: add moduleparam.h to drivers/infiniband as required
These files were getting the moduleparam infrastructure from the
implicit presence of module.h being everywhere, but that is going
away soon.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:36 -04:00
Paul Gortmaker
b108d9764c infiniband: add in export.h for files using EXPORT_SYMBOL/THIS_MODULE
These were getting it implicitly via device.h --> module.h but
we are going to stop that when we clean up the headers.

Fix these in advance so the tree remains biscect-clean.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:35 -04:00
Paul Gortmaker
e4dd23d753 infiniband: Fix up module files that need to include module.h
They had been getting it implicitly via device.h but we can't
rely on that for the future, due to a pending cleanup so fix
it now.

Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
2011-10-31 19:31:35 -04:00
Linus Torvalds
ec7ae51753 Merge git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (204 commits)
  [SCSI] qla4xxx: export address/port of connection (fix udev disk names)
  [SCSI] ipr: Fix BUG on adapter dump timeout
  [SCSI] megaraid_sas: Fix instance access in megasas_reset_timer
  [SCSI] hpsa: change confusing message to be more clear
  [SCSI] iscsi class: fix vlan configuration
  [SCSI] qla4xxx: fix data alignment and use nl helpers
  [SCSI] iscsi class: fix link local mispelling
  [SCSI] iscsi class: Replace iscsi_get_next_target_id with IDA
  [SCSI] aacraid: use lower snprintf() limit
  [SCSI] lpfc 8.3.27: Change driver version to 8.3.27
  [SCSI] lpfc 8.3.27: T10 additions for SLI4
  [SCSI] lpfc 8.3.27: Fix queue allocation failure recovery
  [SCSI] lpfc 8.3.27: Change algorithm for getting physical port name
  [SCSI] lpfc 8.3.27: Changed worst case mailbox timeout
  [SCSI] lpfc 8.3.27: Miscellanous logic and interface fixes
  [SCSI] megaraid_sas: Changelog and version update
  [SCSI] megaraid_sas: Add driver workaround for PERC5/1068 kdump kernel panic
  [SCSI] megaraid_sas: Add multiple MSI-X vector/multiple reply queue support
  [SCSI] megaraid_sas: Add support for MegaRAID 9360/9380 12GB/s controllers
  [SCSI] megaraid_sas: Clear FUSION_IN_RESET before enabling interrupts
  ...
2011-10-28 16:44:18 -07:00
Eric Dumazet
9e903e0852 net: add skb frag size accessors
To ease skb->truesize sanitization, its better to be able to localize
all references to skb frags size.

Define accessors : skb_frag_size() to fetch frag size, and
skb_frag_size_{set|add|sub}() to manipulate it.

Signed-off-by: Eric Dumazet <eric.dumazet@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-10-19 03:10:46 -04:00
Dotan Barak
787adb9d6a IPoIB: Use the right function to do DMA unmap pages
Pages that were mapped using ib_dma_map_page() should be unmapped
using ib_dma_unmap_page().

Signed-off-by: Dotan Barak <dotanb@dev.mellanox.co.il>
Reviewed-by: Jack Morgenstein <jackm@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-18 10:08:31 -07:00
Sean Hefty
96104eda01 RDMA/core: Add SRQ type field
Currently, there is only a single ("basic") type of SRQ, but with XRC
support we will add a second.  Prepare for this by defining an SRQ type
and setting all current users to IB_SRQT_BASIC.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-13 09:13:26 -07:00
Marcel Apfelbaum
e36fb88a9a IPoIB: Handle extended rates in debugfs
Use new function ib_rate_to_mbps() to handle printing rate in debugfs,
so that we handle extended rates.

Signed-off-by: Marcel Apfelbaum <marcela@dev.mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-10-11 11:57:08 -07:00
David S. Miller
8decf86879 Merge branch 'master' of github.com:davem330/net
Conflicts:
	MAINTAINERS
	drivers/net/Kconfig
	drivers/net/ethernet/broadcom/bnx2x/bnx2x_link.c
	drivers/net/ethernet/broadcom/tg3.c
	drivers/net/wireless/iwlwifi/iwl-pci.c
	drivers/net/wireless/iwlwifi/iwl-trans-tx-pcie.c
	drivers/net/wireless/rt2x00/rt2800usb.c
	drivers/net/wireless/wl12xx/main.c
2011-09-22 03:23:13 -04:00
Mike Christie
f27fb2ef7b [SCSI] iscsi class: sysfs group is_visible callout for iscsi host attrs
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and driver's host attrs to use the attribute
container sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:14 -06:00
Mike Christie
1d063c1729 [SCSI] iscsi class: sysfs group is_visible callout for session attrs
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and driver's session attrs to use the attribute
container sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:06 -06:00
Mike Christie
3128c6c73c [SCSI] iscsi cls: sysfs group is_visible callout for conn attrs
The iscsi class currently does not support writable sysfs
attrs for LLD sysfs settings. This patch converts the
iscsi class and drivers to use the attribute container
sysfs group and the sysfs group's is_visible callout
to be able to support readable or writable sysfs attrs.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
2011-08-27 08:36:03 -06:00
Ian Campbell
5581be3b48 IPoIB: convert to SKB paged frag API.
Signed-off-by: Ian Campbell <ian.campbell@citrix.com>
Reviewed-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Roland Dreier <roland@kernel.org>
Cc: Sean Hefty <sean.hefty@intel.com>
Cc: Hal Rosenstock <hal.rosenstock@gmail.com>
Cc: linux-rdma@vger.kernel.org
Cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-26 12:38:42 -04:00
Jiri Pirko
afc4b13df1 net: remove use of ndo_set_multicast_list in drivers
replace it by ndo_set_rx_mode

Signed-off-by: Jiri Pirko <jpirko@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-08-17 20:22:03 -07:00
Roland Dreier
80b43de837 Merge branches 'ipoib' and 'iser' into for-next 2011-08-17 10:57:43 -07:00
Or Gerlitz
200ae1a08b IB/iser: Support iSCSI PDU padding
RFC3270 mandates that iSCSI PDUs are padded to the closest integer
number of four byte words.  Fix the iser code to support that on both
the TX/RX flows.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-08-17 09:45:07 -07:00
Or Gerlitz
0ace64b85e IBiser: Fix wrong mask when sizeof (dma_addr_t) > sizeof (unsigned long)
The code that prepares the SG associated with SCSI command for FMR was
buggy for systems with DMA addresses that don't fit in unsigned long,
e.g under the 32-bit based XenServer dom0 sizeof(dma_addr_t) is 8.

Fix that by casting to unsigned long long a masking constant used by
the code. This resolves a crash in iser_sg_to_page_vec on this system.

Signed-off-by: Or Gerlitz <ogerlitz@mellanox.co.il>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-08-17 09:40:55 -07:00
Bernd Schubert
22cfb0bf67 IPoIB: Fix possible NULL dereference in ipoib_start_xmit()
Fix a bug introduced in 69cce1d140 ("net: Abstract dst->neighbour
accesses behind helpers.") where we might dereference skb_dst(skb)
even if it is NULL, which causes:

    [  240.944030] BUG: unable to handle kernel NULL pointer dereference at 0000000000000040
    [  240.948007] IP: [<ffffffffa0366ce9>] ipoib_start_xmit+0x39/0x280 [ib_ipoib]
    [...]
    [  240.948007] Call Trace:
    [  240.948007]  <IRQ>
    [  240.948007]  [<ffffffff812cd5e0>] dev_hard_start_xmit+0x2a0/0x590
    [  240.948007]  [<ffffffff8131f680>] ? arp_create+0x70/0x200
    [  240.948007]  [<ffffffff812e8e1f>] sch_direct_xmit+0xef/0x1c0

Addresses: https://bugzilla.kernel.org/show_bug.cgi?id=41212
Signed-off-by: Bernd Schubert <bernd.schubert@itwm.fraunhofer.de>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-08-16 10:19:20 -07:00
Linus Torvalds
91d41fdf31 Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending:
  target: Convert to DIV_ROUND_UP_SECTOR_T usage for sectors / dev_max_sectors
  kernel.h: Add DIV_ROUND_UP_ULL and DIV_ROUND_UP_SECTOR_T macro usage
  iscsi-target: Add iSCSI fabric support for target v4.1
  iscsi: Add Serial Number Arithmetic LT and GT into iscsi_proto.h
  iscsi: Use struct scsi_lun in iscsi structs instead of u8[8]
  iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi
2011-07-27 13:21:40 -07:00
Arun Sharma
60063497a9 atomic: use <linux/atomic.h>
This allows us to move duplicated code in <asm/atomic.h>
(atomic_inc_not_zero() for now) to <linux/atomic.h>

Signed-off-by: Arun Sharma <asharma@fb.com>
Reviewed-by: Eric Dumazet <eric.dumazet@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: David Miller <davem@davemloft.net>
Cc: Eric Dumazet <eric.dumazet@gmail.com>
Acked-by: Mike Frysinger <vapier@gentoo.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-07-26 16:49:47 -07:00
Nicholas Bellinger
123521830c iscsi: Resolve iscsi_proto.h naming conflicts with drivers/target/iscsi
This patch renames the following iscsi_proto.h structures to avoid
namespace issues with drivers/target/iscsi/iscsi_target_core.h:

*) struct iscsi_cmd -> struct iscsi_scsi_req
*) struct iscsi_cmd_rsp -> struct iscsi_scsi_rsp
*) struct iscsi_login -> struct iscsi_login_req

This patch includes useful ISCSI_FLAG_LOGIN_[CURRENT,NEXT]_STAGE*,
and ISCSI_FLAG_SNACK_TYPE_* definitions used by iscsi_target_mod, and
fixes the incorrect definition of struct iscsi_snack to following
RFC-3720 Section 10.16. SNACK Request.

Also, this patch updates libiscsi, iSER, be2iscsi, and bn2xi to
use the updated structure definitions in a handful of locations.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Nicholas A. Bellinger <nab@linux-iscsi.org>
2011-07-25 07:18:45 +00:00
Linus Torvalds
ece236ce2f Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (26 commits)
  IB/qib: Defer HCA error events to tasklet
  mlx4_core: Bump the driver version to 1.0
  RDMA/cxgb4: Use printk_ratelimited() instead of printk_ratelimit()
  IB/mlx4: Support PMA counters for IBoE
  IB/mlx4: Use flow counters on IBoE ports
  IB/pma: Add include file for IBA performance counters definitions
  mlx4_core: Add network flow counters
  mlx4_core: Fix location of counter index in QP context struct
  mlx4_core: Read extended capabilities into the flags field
  mlx4_core: Extend capability flags to 64 bits
  IB/mlx4: Generate GID change events in IBoE code
  IB/core: Add GID change event
  RDMA/cma: Don't allow IPoIB port space for IBoE
  RDMA: Allow for NULL .modify_device() and .modify_port() methods
  IB/qib: Update active link width
  IB/qib: Fix potential deadlock with link down interrupt
  IB/qib: Add sysfs interface to read free contexts
  IB/mthca: Remove unnecessary read of PCI_CAP_ID_EXP
  IB/qib: Remove double define
  IB/qib: Remove unnecessary read of PCI_CAP_ID_EXP
  ...
2011-07-22 14:50:12 -07:00
David S. Miller
69cce1d140 net: Abstract dst->neighbour accesses behind helpers.
dst_{get,set}_neighbour()

Signed-off-by: David S. Miller <davem@davemloft.net>
2011-07-17 23:11:35 -07:00
Bart Van Assche
fd1b6c4a69 IB/srp: Avoid duplicate devices from LUN scan
SCSI scanning of a channel🆔lun triplet in Linux works as follows
(function scsi_scan_target() in drivers/scsi/scsi_scan.c):

- If lun == SCAN_WILD_CARD, send a REPORT LUNS command to the target
  and process the result.

- If lun != SCAN_WILD_CARD, send an INQUIRY command to the LUN
  corresponding to the specified channel🆔lun triplet to verify
  whether the LUN exists.

So a SCSI driver must either take the channel and target id values in
account in its quecommand() function or it should declare that it only
supports one channel and one target id.

Currently the ib_srp driver does neither.  As a result scanning the
SCSI bus via e.g. rescan-scsi-bus.sh causes many duplicate SCSI
devices to be created. For each 0:0:L device, several duplicates are
created with the same LUN number and with (C:I) != (0:0). Fix this by
declaring that the ib_srp driver only supports one channel and one
target id.

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Cc: <stable@kernel.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-07-13 09:19:16 -07:00
Alexey Dobriyan
a6b7a40786 net: remove interrupt.h inclusion from netdevice.h
* remove interrupt.g inclusion from netdevice.h -- not needed
* fixup fallout, add interrupt.h and hardirq.h back where needed.

Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-06-06 22:55:11 -07:00
Linus Torvalds
4c171acc20 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  RDMA/cma: Save PID of ID's owner
  RDMA/cma: Add support for netlink statistics export
  RDMA/cma: Pass QP type into rdma_create_id()
  RDMA: Update exported headers list
  RDMA/cma: Export enum cma_state in <rdma/rdma_cm.h>
  RDMA/nes: Add a check for strict_strtoul()
  RDMA/cxgb3: Don't post zero-byte read if endpoint is going away
  RDMA/cxgb4: Use completion objects for event blocking
  IB/srp: Fix integer -> pointer cast warnings
  IB: Add devnode methods to cm_class and umad_class
  IB/mad: Return EPROTONOSUPPORT when an RDMA device lacks the QP required
  IB/uverbs: Add devnode method to set path/mode
  RDMA/ucma: Add .nodename/.mode to tell userspace where to create device node
  RDMA: Add netlink infrastructure
  RDMA: Add error handling to ib_core_init()
2011-05-26 12:13:57 -07:00
Roland Dreier
8dc4abdf4c Merge branches 'cma', 'cxgb3', 'cxgb4', 'misc', 'nes', 'netlink', 'srp' and 'uverbs' into for-next 2011-05-25 13:47:20 -07:00
Sean Hefty
b26f9b9949 RDMA/cma: Pass QP type into rdma_create_id()
The RDMA CM currently infers the QP type from the port space selected
by the user.  In the future (eg with RDMA_PS_IB or XRC), there may not
be a 1-1 correspondence between port space and QP type.  For netlink
export of RDMA CM state, we want to export the QP type to userspace,
so it is cleaner to explicitly associate a QP type to an ID.

Modify rdma_create_id() to allow the user to specify the QP type, and
use it to make our selections of datagram versus connected mode.

Signed-off-by: Sean Hefty <sean.hefty@intel.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
2011-05-25 13:46:23 -07:00
Roland Dreier
737b94eb41 IB/srp: Fix integer -> pointer cast warnings
Fix

    drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_handle_recv':
    drivers/infiniband/ulp/srp/ib_srp.c:1150: warning: cast to pointer from integer of different size
    drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_send_completion':
    drivers/infiniband/ulp/srp/ib_srp.c🔢 warning: cast to pointer from integer of different size

by adding an intermediate cast to uintptr_t.

Signed-off-by: Roland Dreier <roland@purestorage.com>
Acked-by: David Dillow <dillowda@ornl.gov>
2011-05-23 11:30:04 -07:00
Michał Mirosław
3d96c74d89 net: infiniband/ulp/ipoib: convert to hw_features
Signed-off-by: Michał Mirosław <mirq-linux@rere.qmqm.pl>
Signed-off-by: David S. Miller <davem@davemloft.net>
2011-04-20 01:30:42 -07:00
Lucas De Marchi
25985edced Fix common misspellings
Fixes generated by 'codespell' and manually reviewed.

Signed-off-by: Lucas De Marchi <lucas.demarchi@profusion.mobi>
2011-03-31 11:26:23 -03:00
Linus Torvalds
0625bef606 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband:
  IB: Increase DMA max_segment_size on Mellanox hardware
  IB/mad: Improve an error message so error code is included
  RDMA/nes: Don't print success message at level KERN_ERR
  RDMA/addr: Fix return of uninitialized ret value
  IB/srp: try to use larger FMR sizes to cover our mappings
  IB/srp: add support for indirect tables that don't fit in SRP_CMD
  IB/srp: rework mapping engine to use multiple FMR entries
  IB/srp: allow sg_tablesize to be set for each target
  IB/srp: move IB CM setup completion into its own function
  IB/srp: always avoid non-zero offsets into an FMR
2011-03-24 07:59:46 -07:00
David Dillow
be8b981453 IB/srp: try to use larger FMR sizes to cover our mappings
Now that we can get larger SG lists, we can take advantage of HCAs that
allow us to use larger FMR sizes. In many cases, we can use up to 512
entries, so start there and work our way down.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:41:30 -04:00
David Dillow
c07d424d61 IB/srp: add support for indirect tables that don't fit in SRP_CMD
This allows us to guarantee the ability to submit up to 8 MB requests
based on the current value of SCSI_MAX_SG_CHAIN_SEGMENTS. While FMR will
usually condense the requests into 8 SG entries, it is imperative that
the target support external tables in case the FMR mapping fails or is
not supported.

We add a safety valve to allow targets without the needed support to
reap the benefits of the large tables, but fail in a manner that lets
the user know that the data didn't make it to the device. The user must
add "allow_ext_sg=1" to the target parameters to indicate that the
target has the needed support.

If indirect_sg_entries is not specified in the modules options, then
the sg_tablesize for the target will default to cmd_sg_entries unless
overridden by the target options.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:37:23 -04:00
David Dillow
8f26c9ff9c IB/srp: rework mapping engine to use multiple FMR entries
Instead of forcing all of the S/G entries to fit in one FMR, and falling
back to indirect descriptors if that fails, allow the use of as many
FMRs as needed to map the request. This lays the groundwork for allowing
indirect descriptor tables that are larger than can fit in the command
IU, but should marginally improve performance now by reducing the number
of indirect descriptors needed.

We increase the minimum page size for the FMR pool to 4K, as larger
pages help increase the coverage of each FMR, and it is rare that the
kernel would send down a request with scattered 512 byte fragments.

This patch also move some of the target initialization code afte the
parsing of options, to keep it together with the new code that needs to
allocate memory based on the options given.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:35:16 -04:00
David Dillow
4924864404 IB/srp: allow sg_tablesize to be set for each target
Different configurations of target software allow differing max sizes of
the command IU. Allowing this to be changed per-target allows all
targets on an initiator to get an optimal setting.

We deprecate srp_sg_tablesize and replace it with cmd_sg_entries in
preparation for allowing more indirect descriptors than can fit in the
IU.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:35:05 -04:00
David Dillow
961e0be89a IB/srp: move IB CM setup completion into its own function
This is to clean up prior to further changes.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:34:48 -04:00
David Dillow
8c4037b501 IB/srp: always avoid non-zero offsets into an FMR
It is unclear exactly how this code works around Mellanox SRP targets,
or if the problem is on the target side or in the HCA itself. In an
abundance of caution, we should always enable the workaround.

Signed-off-by: David Dillow <dillowda@ornl.gov>
2011-03-15 19:34:28 -04:00
Mike Christie
7c53c6f89d [SCSI] iser: export addr and port
This pactch has iser export the address and port
of the endpoint.

Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
2011-02-24 12:41:23 -05:00
David Rientjes
6a108a14fa kconfig: rename CONFIG_EMBEDDED to CONFIG_EXPERT
The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.

This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel.  A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).

Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.

Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <holt@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2011-01-20 17:02:05 -08:00
Roland Dreier
4790f4dc5f Merge branches 'misc', 'mlx4', 'mthca', 'nes' and 'srp' into for-next 2011-01-16 21:22:41 -08:00
Tejun Heo
f06267104d RDMA: Update workqueue usage
* ib_wq is added, which is used as the common workqueue for infiniband
  instead of the system workqueue.  All system workqueue usages
  including flush_scheduled_work() callers are converted to use and
  flush ib_wq.

* cancel_delayed_work() + flush_scheduled_work() converted to
  cancel_delayed_work_sync().

* qib_wq is removed and ib_wq is used instead.

This is to prepare for deprecation of flush_scheduled_work().

Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-16 21:16:31 -08:00
Bart Van Assche
695b83495e IB/srp: Test only once whether iu allocation succeeded
Merge the two tests in srp_queuecommand() of whether information unit
allocation succeeded into one.  An intended side effect of this change
is that we fix the warning:

    drivers/infiniband/ulp/srp/ib_srp.c: In function 'srp_queuecommand':
    drivers/infiniband/ulp/srp/ib_srp.c:1116: warning: 'req' may be used uninitialized in this function

(seen with CONFIG_CC_OPTIMIZE_FOR_SIZE=y at least with gcc 4.4.4)

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: David Dillow <dillowda@ornl.gov>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-13 14:00:43 -08:00
Joe Perches
948579cd8c RDMA: Use vzalloc() to replace vmalloc()+memset(0)
Signed-off-by: Joe Perches <joe@perches.com>
Signed-off-by: Roland Dreier <rolandd@cisco.com>
2011-01-12 11:11:58 -08:00