Might result in a deadlock where completion context waits for
session commands release where the later might need a final
completion for it.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
In order to reduce the contention on CQ locking (present
in some LLDDs) we poll in batches of 16 work completion items.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
In case the CQ is packed with completions, we can't just
hog the CPU forever. Poll until a sufficient budget (currently
hard-coded to 64k completions) and if budget is exhausted, bailout
and give a chance to other threads.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
In order to know that we consumed all the connection completions
we maintain atomic post_send_buf_count for each IO post send. But
we can know that if we post a "beacon" (zero length RECV work request)
after we move the QP into error state and the target does not serve
any new IO. When we consume it, we know we finished all the connection
completion and we can go ahead and destroy stuff.
In error completion handler we now just need to check for ISERT_BEACON_WRID
to arrive and then wait for session commands to cleanup and complete
conn_wait_comp_err.
We reserve another CQ and QP entries to fit the zero length post recv.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
We are calling session reinstatement, wait_conn will start
connection termination.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Using TX and RX CQs attached to the same vector might
create a throttling effect coming from the serial processing
of a work-queue. Use one CQ instead, it will do better in interrupt
processing and it provides a simpler code. Also, We get rid of
redundant isert_rx_wq.
Next we can remove the atomic post_send_buf_count from the IO path.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
A pre-step before going to a single CQ.
Also this makes the code a little more simple to
read.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Nit, uintptr_t is designed for pointer casting, use it.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
As a pre-step to a single CQ, we unite the error completion
handlers to a single handler.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
It is disabled at the moment, we will get that back
in once the target is more stable.
This reverts commit 95b60f0
"Add support for completion interrupt coalescing"
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Currently we have no way to tell that the target stack is in shutdown
sequence. In case we have open connections, the initiator immediately
attempts to reconnect in a DDOS attack style, so we may end up
terminating the iser enabled network portal while it's np_accept_list
still have pending connections.
The workaround is simply release all the connections in the list.
A proper fix will be to start shutdown sequence by shutting the
network portal to avoid initiator immediate reconnect attempts.
But the temporary work around seems to work at this point, so I think
we can do this for now...
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
iSER will report supported protection operations based on
the tpg attribute t10_pi settings and HCA PI offload capabilities.
If the HCA does not support PI offload or tpg attribute t10_pi is
not set, we fall to SW PI mode.
In order to do that, we move iscsit_get_sup_prot_ops after connection
tpg assignment.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Fallback to software mode DIF if HCA does not support
PI (without crashing obviously). It is still possible to
run with backend protection and an unprotected frontend,
so looking at the command prot_op is not enough. Check
device PI capability on a per-IO basis (isert_prot_cmd
inline static) to determine if we need to handle protection
information.
Trace:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000010
IP: [<ffffffffa037f8b1>] isert_reg_sig_mr+0x351/0x3b0 [ib_isert]
Call Trace:
[<ffffffff812b003a>] ? swiotlb_map_sg_attrs+0x7a/0x130
[<ffffffffa038184d>] isert_reg_rdma+0x2fd/0x370 [ib_isert]
[<ffffffff8108f2ec>] ? idle_balance+0x6c/0x2c0
[<ffffffffa0382b68>] isert_put_datain+0x68/0x210 [ib_isert]
[<ffffffffa02acf5b>] lio_queue_data_in+0x2b/0x30 [iscsi_target_mod]
[<ffffffffa02306eb>] target_complete_ok_work+0x21b/0x310 [target_core_mod]
[<ffffffff8106ece2>] process_one_work+0x182/0x3b0
[<ffffffff8106fda0>] worker_thread+0x120/0x3c0
[<ffffffff8106fc80>] ? maybe_create_worker+0x190/0x190
[<ffffffff8107594e>] kthread+0xce/0xf0
[<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70
[<ffffffff8159a22c>] ret_from_fork+0x7c/0xb0
[<ffffffff81075880>] ? kthread_freezable_should_stop+0x70/0x70
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch converts to allocate PI contexts dynamically in order
avoid a potentially bogus np->tpg_np and associated NULL pointer
dereference in isert_connect_request() during iser-target endpoint
shutdown with multiple network portals.
Also, there is really no need to allocate these at connection
establishment since it is not guaranteed that all the IOs on
that connection will be to a PI formatted device.
We can do it in a lazy fashion so the initial burst will have a
transient slow down, but very fast all IOs will allocate a PI
context.
Squashed:
iser-target: Centralize PI context handling code
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.14+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
In situations such as bond failover, The new session establishment
implicitly invokes the termination of the old connection.
So, we don't want to wait for the old connection wait_conn to completely
terminate before we accept the new connection and post a login response.
The solution is to deffer the comp_wait completion and the conn_put to
a work so wait_conn will effectively be non-blocking (flush errors are
assumed to come very fast).
We allocate isert_release_wq with WQ_UNBOUND and WQ_UNBOUND_MAX_ACTIVE
to spread the concurrency of release works.
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
The np listener cm_id will also get ADDR_CHANGE event
upcall (in case it is bound to a specific IP). Handle
it correctly by creating a new cm_id and implicitly
destroy the old one.
Since this is the second event a listener np cm_id may
encounter, we move the np cm_id event handling to a
routine.
Squashed:
iser-target: Move cma_id setup to a function
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Take isert_conn pointer from cm_id->qp->qp_context. This
will allow us to know that the cm_id context is always
the network portal. This will make the cm_id event check
(connection or network portal) more reliable.
In order to avoid a NULL dereference in cma_id->qp->qp_context
we destroy the qp after we destroy the cm_id (and make the
dereference safe). session stablishment/teardown sequences
can happen in parallel, we should take into account that
connected_handler might race with connection teardown flow.
Also, protect isert_conn->conn_device->active_qps decrement
within the error patch during QP creation failure and the
normal teardown path in isert_connect_release().
Squashed:
iser-target: Decrement completion context active_qps in error flow
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
There is no point in accepting a new CM request only
when we are completely done with the last iscsi login.
Instead we accept immediately, this will also cause the
CM connection to reach connected state and the initiator
is allowed to send the first login. We mark that we got
the initial login and let iscsi layer pick it up when it
gets there.
This reduces the parallel login sequence by a factor of
more then 4 (and more for multi-login) and also prevents
the initiator (who does all logins in parallel) from
giving up on login timeout expiration.
In order to support multiple login requests sequence (CHAP)
we call isert_rx_login_req from isert_rx_completion insead
of letting isert_get_login_rx call it.
Squashed:
iser-target: Use kref_get_unless_zero in connected_handler
iser-target: Acquire conn_mutex when changing connection state
iser-target: Reject connect request in failure path
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
ISER_CONN_UP state is not sufficient to know if
we should wait for completion of flush errors and
disconnected_handler event.
Instead, split it to 2 states:
- ISER_CONN_UP: Got to CM connected phase, This state
indicates that we need to wait for a CM disconnect
event before going to teardown.
- ISER_CONN_FULL_FEATURE: Got to full feature phase
after we posted login response, This state indicates
that we posted recv buffers and we need to wait for
flush completions before going to teardown.
Also avoid deffering disconnected handler to a work,
and handle it within disconnected handler.
More work here is needed to handle DEVICE_REMOVAL event
correctly (cleanup all resources).
Squashed:
iser-target: Don't deffer disconnected handler to a work
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Since commit 0fc4ea701f ("Target/iser: Don't put isert_conn inside
disconnected handler") we put the conn kref in isert_wait_conn, so we
need .wait_conn to be invoked also in the error path.
Introduce call to isert_conn_terminate (called under lock)
which transitions the connection state to TERMINATING and calls
rdma_disconnect. If the state is already teminating, just bail
out back (temination started).
Also, make sure to destroy the connection when getting a connect
error event if didn't get to connected (state UP). Same for the
handling of REJECTED and UNREACHABLE cma events.
Squashed:
iscsi-target: Add call to wait_conn in establishment error flow
Reported-by: Slava Shwartsman <valyushash@gmail.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # v3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
isert has an issue of trying to create a CQ with more CQEs than are
supported by the hardware, that currently results in failures during
isert_device creation during first session login.
This is the isert version of the patch that Minh Tran submitted for
iser, and is simple a workaround required to function with existing
ocrdma hardware.
Signed-off-by: Chris Moore <chris.moore@emulex.com>
Reviewied-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
In this case the cm_id->context is the isert_np, and the cm_id->qp
is NULL, so use that to distinct the cases.
Since we don't expect any other events on this cm_id we can
just return -1 for explicit termination of the cm_id by the
cma layer.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
This patch adds a max_send_sge=2 minimum in isert_conn_setup_qp()
to ensure outgoing control PDU responses with tx_desc->num_sge=2
are able to function correctly.
This addresses a bug with RDMA hardware using dev_attr.max_sge=3,
that in the original code with the ConnectX-2 work-around would
result in isert_conn->max_sge=1 being negotiated.
Originally reported by Chris with ocrdma driver.
Reported-by: Chris Moore <Chris.Moore@emulex.com>
Tested-by: Chris Moore <Chris.Moore@emulex.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
It is not guaranteed to that srp_sq_size is supported
by the HCA. So if we failed to create the QP with ENOMEM,
try with a smaller srp_sq_size. Keep it up until we hit
MIN_SRPT_SQ_SIZE, then fail the connection.
Reported-by: Mark Lehrer <lehrer@gmail.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Cc: <stable@vger.kernel.org> # 3.4+
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
Pull SCSI target updates from Nicholas Bellinger:
"Here are the target updates for v3.18-rc2 code. These where
originally destined for -rc1, but due to the combination of travel
last week for KVM Forum and my mistake of taking the three week merge
window literally, the pull request slipped.. Apologies for that.
Things where reasonably quiet this round. The highlights include:
- New userspace backend driver (target_core_user.ko) by Shaohua Li
and Andy Grover
- A number of cleanups in target, iscsi-taret and qla_target code
from Joern Engel
- Fix an OOPs related to queue full handling with CHECK_CONDITION
status from Quinn Tran
- Fix to disable TX completion interrupt coalescing in iser-target,
that was causing problems on some hardware
- Fix for PR APTPL metadata handling with demo-mode ACLs
I'm most excited about the new backend driver that uses UIO + shared
memory ring to dispatch I/O and control commands into user-space.
This was probably the most requested feature by users over the last
couple of years, and opens up a new area of development + porting of
existing user-space storage applications to LIO. Thanks to Shaohua +
Andy for making this happen.
Also another honorable mention, a new Xen PV SCSI driver was merged
via the xen/tip.git tree recently, which puts us now at 10 target
drivers in upstream! Thanks to David Vrabel + Juergen Gross for their
work to get this code merged"
* 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/nab/target-pending: (40 commits)
target/file: fix inclusive vfs_fsync_range() end
iser-target: Disable TX completion interrupt coalescing
target: Add force_pr_aptpl device attribute
target: Fix APTPL metadata handling for dynamic MappedLUNs
qla_target: don't delete changed nacls
target/user: Recalculate pad size inside is_ring_space_avail()
tcm_loop: Fixup tag handling
iser-target: Fix smatch warning
target/user: Fix up smatch warnings in tcmu_netlink_event
target: Add a user-passthrough backstore
target: Add documentation on the target userspace pass-through driver
uio: Export definition of struct uio_device
target: Remove unneeded check in sbc_parse_cdb
target: Fix queue full status NULL pointer for SCF_TRANSPORT_TASK_SENSE
qla_target: rearrange struct qla_tgt_prm
qla_target: improve qlt_unmap_sg()
qla_target: make some global functions static
qla_target: remove unused parameter
target: simplify core_tmr_abort_task
target: encapsulate smp_mb__after_atomic()
...
- Large set of iSER initiator improvements
- Hardware driver fixes for cxgb4, mlx5 and ocrdma
- Small fixes to core midlayer
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
iQIcBAABCAAGBQJUQEczAAoJEENa44ZhAt0h3n8P/RqklU+JJiF1eWRgvdf3fOPC
WDzOzdKUHvv3Lm5qSv8V6q23oYzf2QC/vjuZJgyM5156vj/qSf3iw1ueZTwYSQ3v
B6bV7/ptSpBlxRx/sI9/ks5yqT869jww7QAO+wtvzuq7JDxQr+t4Yw1j3WOM8DGd
F/rBFWJgLCD3zFSeJVY+AgZwIeDpNvBO0/QVnchs9iPUY0jSBvhDLsWegGhs92Uv
wfeiV36f8hPVnbYVMV+xA2t9NkBV21r1sUK1l+CfPDgL/unDoXXqriuqb401l4cj
zR4/Xwro9WzOC0gey2a5KkyX9wQUW9+Y4TnHRLnJ5shO/yzqxmc+/FMksxnaoWtI
koF5LqyfXxNhq0VZBpoy+astY4vv4h34WlyBC2lxDCJBEw8VYzO+Wg9QJAzWOxlq
JXtY9l9zRxfgTLe78xjl2n9LEeOysbYJemp3YFpZVh7a4JvUK4L9Kh9mXKZu8tqt
zd7YniNNJSdDfF5+Gx1kSK4kE1r/89f04ED6hg/eIf/IqhNYJ/vD5joQuJ4RqQNx
5G0wdCfKMy9cCwbx1/eCRJOuP+dbGR73UskgXc41s5/VtXCdq8JHvBcWw5CgxQ5I
AYGUrCuu/ZsQSMNM1i7Pocz8m4uLlnpXF7Qv62ULt/NQJQHj5Xfdvb9M3BK0urIf
+aBpf+LiZahaUEfo5eYt
=+ABw
-----END PGP SIGNATURE-----
Merge tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband
Pull infiniband/RDMA updates from Roland Dreier:
- large set of iSER initiator improvements
- hardware driver fixes for cxgb4, mlx5 and ocrdma
- small fixes to core midlayer
* tag 'rdma-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/roland/infiniband: (47 commits)
RDMA/cxgb4: Fix ntuple calculation for ipv6 and remove duplicate line
RDMA/cxgb4: Add missing neigh_release in find_route
RDMA/cxgb4: Take IPv6 into account for best_mtu and set_emss
RDMA/cxgb4: Make c4iw_wr_log_size_order static
IB/core: Fix XRC race condition in ib_uverbs_open_qp
IB/core: Clear AH attr variable to prevent garbage data
RDMA/ocrdma: Save the bit environment, spare unncessary parenthesis
RDMA/ocrdma: The kernel has a perfectly good BIT() macro - use it
RDMA/ocrdma: Don't memset() buffers we just allocated with kzalloc()
RDMA/ocrdma: Remove a unused-label warning
RDMA/ocrdma: Convert kernel VA to PA for mmap in user
RDMA/ocrdma: Get vlan tag from ib_qp_attrs
RDMA/ocrdma: Add default GID at index 0
IB/mlx5, iser, isert: Add Signature API additions
Target/iser: Centralize ib_sig_domain setting
IB/iser: Centralize ib_sig_domain settings
IB/mlx5: Use extended internal signature layout
IB/iser: Set IP_CSUM as default guard type
IB/iser: Remove redundant assignment
IB/mlx5: Use enumerations for PI copy mask
...
The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics and
a slightly buggy strncasecmp. The latter is the POSIX name, so strnicmp
was renamed to strncasecmp, and strnicmp made into a wrapper for the new
strncasecmp to avoid breaking existing users.
To allow the compat wrapper strnicmp to be removed at some point in the
future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Roland Dreier <roland@kernel.org>
Cc: Nicholas Bellinger <nab@linux-iscsi.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Expose more signature setting parameters. We modify the signature API
to allow usage of some new execution parameters relevant to data
integrity feature.
This patch modifies ib_sig_domain structure by:
- Deprecate DIF type in signature API (operation will
be determined by the parameters alone, no DIF type awareness)
- Add APPTAG check bitmask (for input domain)
- Add REFTAG remap (increment) flag for each domain
- Add APPTAG/REFTAG escape options for each domain
The mlx5 driver is modified to follow the new parameters in HW
signature setup.
At the moment the callers (iser/isert) hard-code new parameters (by
DIF type). In the future, callers will retrieve them from the scsi
command structure.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Later there will be more parameters to set, so we want to do it in a
centralized place.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Later there will be more parameters to set, so we want to do it in a
centralized place.
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In the future this will be a per-command parameter so we can lose it,
but in the mean time IP_CSUM is a lot lighter for SW layers to
compute, set it as default.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We clear the struct before - no need to do 0 assignment.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Update the driver version and add Sagi Grimberg as maintainer
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This patch does not change any functionality.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Match to the debug level of all functions in connect/disconnect flows.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Singal completion of every 32 scsi commands and suppress all the rest.
We don't do anything upon getting the completion so no need to "just
consume" it. Cleanup of scsi command is done in cleanup_task callback.
Still keep dataout and control send completions as we may need to
cleanup there. This helps reducing the amount of interrupts/completions
in the IO path.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Poll in batch of 16. Since we don't want it on the stack, keep under
iser completion context (iser_comp).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Avoid post_send counting (atomic) in the IO path just to keep track of
how many completions we need to consume. Use a beacon post to indicate
that all prior posts completed.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
This will solve a possible condition where we might miss TX completion
(flush error) during session teardown. Since we are using a single
CQ, we don't need to actively drain the TX CQ, instead just wait for
flush_completion (when counters reach zero) and remove iser_poll_for_flush_errors().
This patch might introduce a minor performance regression on its own,
but the next patches will enhance performance using a single CQ for RX
and TX.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We need a way to guarentee that we don't stay in soft-IRQ context for
too long. We might starve other pending CQ tasklets or worse lock
against application trying to issue IO on the running CPU.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Introduce iser_comp which centralizes all iser completion related
items and is referenced by iser_device and each ib_conn.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
In case iscsid was violently killed (SIGKILL) during its error
recovery stage, we may never get a connection teardown sequence for
some of the old connections. No harm done, but when we try to unload
the module we will need to cleanup all these connections. So we
actually may end-up here - so it's not a BUG_ON(), just give a relaxed
warning that this happened and continue with normal unload. BUG_ON()
will cause segfault on module_exit and we don't want that.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Previously we notified iscsi layer about the connection layer when
we consumed all of our flush errors. This was racy as there
was no guarentee that iscsi_conn wasn't terminated by then (which ends
up in an invalid memory access). In case we got a non FLUSH error
completion, we are guarenteed that iscsi_conn is still alive. We should
notify iSCSI layer with iscsi_conn_failure to initiate error handling.
While we are at it, add a nice kernel-doc style documentation.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Bailout in case a task cleanup (iscsi_iser_cleanup_task) is called
after the IB device was removed (DEVICE_REMOVAL CM event). We also
call iscsi_conn_stop with a lock taken to prevent DEVICE_REMOVAL and
tasks cleanup from racing.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Previously we didn't need to unbind the iser_conn and iscsi_conn since
we always relied on iscsi daemon to teardown the connection and never
let it finish before we cleanup all that is needed in iser. This is
not the case anymore (for DEVICE_REMOVAL event). So avoid any possible
chance we cause iscsi_conn dereference after iscsi_conn was freed.
We also call iser_conn_terminate (safe to call multiple times) just
for the corner case of iscsi daemon stopping an old connection before
invoking endpoint removal (might happen if it was violently killed).
Notice we are unbinding under a lock - which is required.
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
We no longer rely on iscsi connection teardown sequence, so no need to
give a grace period and continue cleanup if it expired. Have
iser_conn_release wait for full completion before freeing iser_conn.
ib_completion:
Guaranteed to come when:
- Got DISCONNECTED/ADDR_CHANGE event or
- iSCSI called ep_disconnect/conn_stop
Guaranteed to finish when:
- Got TIMEWAIT_EXIT/DEVICE_REMOVAL event
- All Flush errors are consumed
- IB related resources are destroyed
stop_completion:
Guaranteed to come when:
- iSCSI calls conn_stop
Guaranteed to finish when:
- All inflight tasks were cleaned up
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
iscsi daemon is in user-space, thus we can't rely on it to be invoked
at connection teardown (if not running or does not receive CPU time).
This patch addresses the issue by re-structuring iSER connection
teardown logic and CM events handling.
The CM events will dictate the RDMA resources destruction (ib_conn)
and iser_conn is kept around as long as iscsi_conn is left around
allowing iscsi/iser callbacks to continue after RDMA transport was
destroyed.
This patch introduces a separation in logic when handling CM events:
- DISCONNECTED_HANDLER, ADDR_CHANGED
This events indicate the start of teardown process.
Actions:
1. Terminate the connection: rdma_disconnect (send DREQ/DREP)
2. Notify iSCSI of connection failure
3. Change state to TERMINATING
4. Poll for all flush errors to be consumed
- TIMEWAIT_EXIT, DEVICE_REMOVAL
These events indicate the final stage of termination process and
we can free RDMA related resources.
Actions:
1. Call disconnected handler (we are not guaranteed that DISCONNECTED
event was invoked in the past)
2. Cleanup RDMA related resources
3. For DEVICE_REMOVAL return non-zero rc from cma_handler to
implicitly destroy the cm_id (Can't rely on user-space, make sure
we have forward progress)
We replace flush_completion (indicate all flushes were consumed) with
ib_completion (rdma resources were cleaned up).
The iser_conn_release_work will wait for teardown completions:
- conn_stop was completed (tasks were cleaned-up) - stop_completion
- RDMA resources were destroyed - ib_completion
And then will continue to free iser connection representation (iser_conn).
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Roi Dayan <roid@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>
Put all connection IB related resources release in this routine. One
exception is the cm_id which cannot be destroyed as the routine is
protected by the state mutex. Also move its position to avoid forward
declaration. While at it fix qp NULL assignment.
Signed-off-by: Sagi Grimberg <sagig@mellanox.com>
Signed-off-by: Ariel Nahum <arieln@mellanox.com>
Signed-off-by: Or Gerlitz <ogerlitz@mellanox.com>
Signed-off-by: Roland Dreier <roland@purestorage.com>