iscsi_pool_init simplified
iscsi_pool_init currently has a lot of duplicate kfree() calls that it
makes when an allocation fails. This patch simplifies the code a little by
using iscsi_pool_free to tear down the pool in case of an error.
iscsi_pool_init also returns a copy of the item array to the caller.
Not all callers use this array, so we make it optional.
Instead of allocating a second array and returning that, allocate just one
array of twice the size.
Update users of iscsi_pool_{init,free}
This patch drops the (now useless) second argument to
iscsi_pool_free, and updates all callers.
It also removes the ctask->r2ts array, which was never
used anyway. Since the items argument to iscsi_pool_init
is now optional, we can pass NULL instead.
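A rough userspace sketch of the single-allocation idea described above.
The demo_pool names are made up for illustration, and calloc()/free()
stand in for the kernel allocators; this is not the libiscsi code itself.
When the caller asks for the items array, it gets the second half of the
same allocation, and every error path goes through the one free routine.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for struct iscsi_pool. */
struct demo_pool {
	void **pool;	/* one allocation; second half is the caller's copy */
	int max;	/* number of items in the pool */
};

/* Single teardown path, safe to call on a partially built pool. */
static void demo_pool_free(struct demo_pool *q)
{
	int i;

	if (q->pool)
		for (i = 0; i < q->max; i++)
			free(q->pool[i]);	/* free(NULL) is a no-op */
	free(q->pool);
	q->pool = NULL;
}

/* items may be NULL when the caller does not need a copy of the array. */
static int demo_pool_init(struct demo_pool *q, int max, void ***items,
			  size_t item_size)
{
	int i, num_arrays = items ? 2 : 1;

	memset(q, 0, sizeof(*q));
	q->max = max;

	/* One array, twice the size if a copy was requested. */
	q->pool = calloc((size_t)num_arrays * max, sizeof(void *));
	if (!q->pool)
		return -1;

	for (i = 0; i < max; i++) {
		q->pool[i] = calloc(1, item_size);
		if (!q->pool[i])
			goto fail;
	}

	if (items) {
		*items = q->pool + max;
		memcpy(*items, q->pool, max * sizeof(void *));
	}
	return 0;

fail:
	demo_pool_free(q);
	return -1;
}
```

Because the copy lives in the same allocation, the free routine needs no
second argument.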
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
at libiscsi generic code
- currently code assumes a storage space of pdu header is allocated
at llds ctask and is pointed to by iscsi_cmd_task->hdr. Here I add
a hdr_max field pertaining to that storage, and an hdr_len that
accumulates the current use of the pdu-header.
- Add an iscsi_next_hdr() inline which returns the next free space
to write new Header at. Also iscsi_next_hdr() is used to retrieve
the address at which to write the header-digest.
- Add iscsi_add_hdr(length). The user calls iscsi_next_hdr() for the
address of the new header, then calls iscsi_add_hdr(length) with
the size of the new header. iscsi_add_hdr() will check if space is
available and update to the new size. length must be padded according
to the standard.
- Add 2 padding inline helpers thanks to Olaf. Current patch does not
use them but following patches will.
Also moved definition of ISCSI_PAD_LEN to iscsi_proto.h which had
PAD_WORD_LEN that was never used anywhere.
- Let iscsi_prep_scsi_cmd_pdu() signal an Error return since now it is
possible that it will fail.
- I was tired of yet again writing a "this is a digest" comment next to
sizeof(__u32) so I defined a new ISCSI_DIGEST_SIZE. Now I don't need
any comments. Changed all places that used sizeof(__u32) or "4" in
connection to a digest.
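The next_hdr/add_hdr pairing above can be sketched roughly as follows.
This is a userspace illustration only: the demo_ names, the struct layout
and the -1 return value are assumptions, not the actual libiscsi
definitions.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_PAD_LEN 4	/* iSCSI pads to 4-byte words */

/* Hypothetical stand-in for the header bookkeeping in the task. */
struct demo_task {
	uint8_t *hdr;			/* start of the header storage */
	unsigned short hdr_max;		/* capacity of that storage */
	unsigned short hdr_len;		/* bytes used so far */
};

/* Address at which the next header (or the header digest) is written. */
static void *demo_next_hdr(struct demo_task *t)
{
	return t->hdr + t->hdr_len;
}

/* Account for a header of len bytes; len must already be padded. */
static int demo_add_hdr(struct demo_task *t, unsigned int len)
{
	if (len % DEMO_PAD_LEN)
		return -1;	/* caller forgot to pad */
	if (t->hdr_len + len > t->hdr_max)
		return -1;	/* no space left in the storage */
	t->hdr_len += len;
	return 0;
}
```

The caller writes at demo_next_hdr() first and only then calls
demo_add_hdr(), so a failed add leaves hdr_len untouched.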
iscsi_tcp specific code
- At struct iscsi_tcp_cmd_task allocate maximum space allowed in
standard for all headers following the iscsi_cmd header, and mark
it so in iscsi_tcp_session_create().
- At iscsi_send_cmd_hdr() retrieve the correct headers size and
write header digest at iscsi_next_hdr().
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
- Check to see that OVERFLOW is not negative indicating
a bug.
- Unify handling of UNDERFLOW and OVERFLOW to the same
code.
- Also handle BIDI_OVERFLOW.
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Rewrite recv path. Fixes:
- data digest processing and error handling.
- ahs support.
Some fixups by Mike Christie
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch adds logical unit reset support. This should work for ib_iser,
but I have not finished testing that driver so it is not hooked in yet.
This patch also temporarily reverts the iscsi_tcp r2t write out patch.
That code is completely rewritten in this patchset.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
There is a race condition in iscsi_tcp.c that may cause it to forget
that it received a R2T from the target. This race may cause a data-out
command (such as a write) to lock up. The race occurs here:
static int
iscsi_send_unsol_pdu(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
{
	struct iscsi_tcp_cmd_task *tcp_ctask = ctask->dd_data;
	int rc;

	if (tcp_ctask->xmstate & XMSTATE_UNS_HDR) {
		BUG_ON(!ctask->unsol_count);
		tcp_ctask->xmstate &= ~XMSTATE_UNS_HDR; <---- RACE
...

static int
iscsi_r2t_rsp(struct iscsi_conn *conn, struct iscsi_cmd_task *ctask)
{
...
	tcp_ctask->xmstate |= XMSTATE_SOL_HDR_INIT; <---- RACE
...
While iscsi_xmitworker() (called from scsi_queue_work()) is preparing to
send unsolicited data, iscsi_tcp_data_recv() (called from
tcp_read_sock()) interrupts it upon receipt of a R2T from the target.
Both contexts do read-modify-write of tcp_ctask->xmstate. Usually, gcc
on x86 will make &= and |= atomic on UP (not guaranteed of course), but
in this case iscsi_send_unsol_pdu() reads the value of xmstate before
clearing the bit, which causes gcc to read xmstate into a CPU register,
test it, clear the bit, and then store it back to memory. If the recv
interrupt happens during this sequence, then the XMSTATE_SOL_HDR_INIT
bit set by the recv interrupt will be lost, and the R2T will be
forgotten.
The patch below (against 2.6.24-rc1) converts accesses of xmstate to use
set_bit, clear_bit, and test_bit instead of |= and &=. I have tested
this patch and verified that it fixes the problem. Another possible
approach would be to hold a lock during most of the rx/tx setup and
post-processing, and drop the lock only for the actual rx/tx.
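For readers outside the kernel, a userspace analogue of the fix, using
C11 atomics in place of set_bit/clear_bit/test_bit. The demo_ names and
bit values are invented for this sketch. Each update is one atomic
read-modify-write, so a bit set by the other context between the load and
the store cannot be lost.

```c
#include <stdatomic.h>

/* Hypothetical flag bits standing in for the XMSTATE_* values. */
#define DEMO_UNS_HDR		(1u << 0)
#define DEMO_SOL_HDR_INIT	(1u << 1)

/* Atomic equivalent of "state |= bits". */
static void demo_set_bits(atomic_uint *state, unsigned int bits)
{
	atomic_fetch_or(state, bits);
}

/* Atomic equivalent of "state &= ~bits". */
static void demo_clear_bits(atomic_uint *state, unsigned int bits)
{
	atomic_fetch_and(state, ~bits);
}

/* Returns nonzero if any of the given bits is set. */
static int demo_test_bits(atomic_uint *state, unsigned int bits)
{
	return (atomic_load(state) & bits) != 0;
}
```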
Signed-off-by: Tony Battersby <tonyb@cybernetics.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch fixes the errors made in the users of the crypto layer during
the sg_init_table conversion. It also adds a few conversions that were
missing altogether.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: David S. Miller <davem@davemloft.net>
Most drivers need to set length and offset as well, so may as well fold
those three lines into one.
Add sg_assign_page() for those two locations that only needed to set
the page, where the offset/length is set outside of the function context.
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
It was found by LSI that on setups with large amounts of memory
we were bouncing buffers when we did not need to. If the iscsi tcp
code touches the data buffer (or a helper does),
it will kmap the buffer. iscsi_tcp also does not interact with hardware,
so it does not have any hw dma restrictions. This patch sets the bounce
buffer settings for our device queue so buffers should not be bounced
because of a driver limit.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This prevents the iscsi modules from being unloaded while
there are active mounts from an iscsi target.
Signed-off-by: Olaf Kirch <olaf.kirch@oracle.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
- remove the unnecessary map_single path.
- convert to use the new accessors for the sg lists and the
parameters.
TODO: use scsi_for_each_sg().
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Acked-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
iSCSI must support software iscsi (iscsi_tcp, iser), hardware iscsi (qla4xxx),
and partial offload (broadcom). To be able to allow each stack or driver
or port (virtual or physical) to be able to log into the same target portal
we use the initiator tuple [[HWADDRESS | NETDEVNAME], INITIATOR_NAME] and
the target tuple [TARGETNAME, CONN_ADDRESS, CONN_PORT] to id a session.
This patch adds the netdev name, which is used by software iscsi when
it binds a session to a netdevice using the SO_BINDTODEVICE sock opt.
It cannot use HWADDRESS because if someone did vlans then the same netdevice
will have the same mac and the initiator,target id will not be unique.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: David C Somayajulu <david.somayajulu@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch exports the local address for the session. For
qla4xxx this is the ip of the hba's port. For software
this is the src addr of the socket.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: David C Somayajulu <david.somayajulu@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch should fix the file descriptor leak problem. A quick look
through the kernel shows that users of sockfd_lookup use sockfd_put to
release their handle. We were using sock_release, which from the comments
and code looks like it does not release the get() on the file from the
lookup.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add a slave_configure function to iSCSI TCP to remove any DMA
alignment restriction. This permits the use of direct IO from
arbitrary addresses.
Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If we got the padding, data and header in different skbs,
we were not handling the padding correctly because we attributed it
to the data's skb. This resulted in the initiator reading from
pad bytes + skb offset instead of the correct offset.
If you could not connect with the open solaris target, this
will fix the lock up problem you were hitting.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch allows us to set can_queue and cmds_per_lun from userspace
when we create the session/host. From there we can set it on a per
target basis. The patch fully converts iscsi_tcp, but only hooks
up ib_iser for cmd_per_lun since it currently has a lot of preallocations
based on can_queue.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The cmdsn allocation and pdu transmit code can race, and we can end
up sending a pdu with cmdsn 10 before a pdu with 5. The target will
then fail the connection/session. This patch fixes the problem by
delaying the cmdsn allocation until we are about to send the pdu.
This also removes the xmitmutex. We were using the connection xmitmutex
during error handling to handle races with mtask and ctask cleanup and
completion. For ctasks we now have nice refcounting and for the mtask,
if we hit the case where the mtask times out and it is floating
around somewhere in the driver, we end up dropping the session.
And to handle session level cleanup, we use the xmit suspend bit
along with scsi_flush_queue and the session lock to make sure
that the xmit thread is not possibly transmitting a task while
we are trying to kill it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If iscsi_tcp partially sent a header, it would recalculate the
header size and re-add the size of the digest (if header digests
are used). This would cause us to send sizeof(digest) extra bytes
when we sent the rest of the header.
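One way to avoid this class of bug, sketched in userspace C with made-up
demo_ names: compute the total (header plus optional digest) once at
setup, and on a resend only consult how much has already gone out. This
illustrates the idea and is not the driver's actual code.

```c
#include <stddef.h>

#define DEMO_HDR_SIZE		48	/* iSCSI basic header segment */
#define DEMO_DIGEST_SIZE	4	/* CRC32C header digest */

/* Hypothetical per-header transmit state. */
struct demo_hdr_xmit {
	size_t total;	/* header plus digest, fixed at setup time */
	size_t sent;	/* bytes already handed to the socket */
};

static void demo_hdr_xmit_init(struct demo_hdr_xmit *x, int hdr_digest)
{
	/* The digest size is added exactly once, here. */
	x->total = DEMO_HDR_SIZE + (hdr_digest ? DEMO_DIGEST_SIZE : 0);
	x->sent = 0;
}

/* Record a (possibly short) send; returns the bytes still to send. */
static size_t demo_hdr_xmit_progress(struct demo_hdr_xmit *x, size_t n)
{
	if (n > x->total - x->sent)
		n = x->total - x->sent;
	x->sent += n;
	return x->total - x->sent;
}
```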
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The attached patches add sysfs files for the chap settings
to the iscsi transport class, iscsi_tcp and ib_iser. This is
needed for software iscsi because there are times when iscsid
can die and it will need to reread the values it was using.
And it is needed by qla4xxx for basic management operations.
This patch does not hook in qla4xxx yet, because I am not sure
which mbx command to use.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
- Remove shadow of request length from struct iscsi_cmd_task.
- change all users to use scsi_cmnd->request_bufflen directly
(With bidi we will use scsi-ml API to retrieve in/out length)
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch fixes handling of expected datasn/r2tsn as received from
target. It is done according to RFC 3720, section 3.2.2.3, Data Sequencing.
. unify expected datasn/r2tsn into one counter
. calculate then check expected datasn/r2tsn. On error print a message
and fail the request. (TODO use iscsi retransmits)
. remove the FIXME ;)
. avoid zero length memset
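The unified counter check amounts to something like the sketch below.
The struct and function names are hypothetical, and the real code returns
an iSCSI error code rather than -1.

```c
#include <stdint.h>

/* Hypothetical per-task state: one counter for both DataSN and R2TSN. */
struct demo_seq_task {
	uint32_t exp_datasn;
};

/*
 * Check the sequence number from an incoming Data-In or R2T PDU.
 * Returns 0 and advances the counter when it is the expected one,
 * -1 (leaving the counter alone) when the PDU is out of sequence.
 */
static int demo_check_datasn(struct demo_seq_task *t, uint32_t datasn)
{
	if (datasn != t->exp_datasn)
		return -1;
	t->exp_datasn++;
	return 0;
}
```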
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: Benny Halevy <bhalevy@panasas.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
For iscsi root boot, software iscsi needs to know what the BIOS/OF
initiator used for the initiator name so this puts it in sysfs
for userspace to be able to pick up.
For hw iscsi, it is nice to see what the card is using.
This patch adds the new param, and hooks in qla4xxx, iscsi_tcp, and ib_iser.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Cc: David C Somayajulu <david.somayajulu@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
iscsid and udev need to key off the hw address being
used so add some helpers for iser and iscsi tcp.
Also convert them
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Cc: Roland Dreier <rdreier@cisco.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
People do not read the README and seem to like to
unselect the crc32c module even though iscsi_tcp selects
it for them. This patch spits out an error that tells the user
that they really do need the module. Hopefully, we will
get fewer people asking about this now.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
For a while now, the block layer has separated max sectors
and max hw sectors. Software iscsi has no limit so this patch
increases max hw sectors, so we can support large pass through
commands.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch renames DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH to avoid
confusion with the driver's default values (DEFAULT_MAX_RECV_DATA_SEGMENT_LENGTH
is the iscsi RFC specific default).
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The return value of crypto_alloc_hash() should be checked by
IS_ERR().
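The reason a NULL check is not enough: the allocator reports failure by
encoding an errno in the pointer value, which is never NULL. A userspace
re-creation of that convention follows; the demo_ helpers only mimic the
kernel's ERR_PTR/IS_ERR/PTR_ERR and demo_alloc() is a hypothetical
stand-in for crypto_alloc_hash().

```c
#include <errno.h>
#include <stdint.h>

#define DEMO_MAX_ERRNO 4095	/* error values occupy the top 4095 addresses */

/* Encode a negative errno as a pointer. */
static void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* True if the pointer is really an encoded error. */
static int demo_is_err(const void *p)
{
	return (uintptr_t)p >= (uintptr_t)(-DEMO_MAX_ERRNO);
}

/* Recover the errno from an error pointer. */
static long demo_ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Hypothetical allocator that fails on request. */
static void *demo_alloc(int fail)
{
	static int obj;

	return fail ? demo_err_ptr(-ENOMEM) : (void *)&obj;
}
```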
Signed-off-by: Akinobu Mita <akinobu.mita@gmail.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The transition from crypto_digest_*() to the crypto_hash_*() family
introduced a bug into the data digest calculation: crypto_hash_update() is
called with the number of S/G elements instead of the S/G list's data size.
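To make the difference concrete, a small sketch with a made-up demo_sg
type (not struct scatterlist): the value the hash update wants is the sum
of the element lengths in bytes, not the element count.

```c
#include <stddef.h>

/* Hypothetical scatter/gather element: a buffer and its length. */
struct demo_sg {
	const void *buf;
	unsigned int length;	/* bytes in this element */
};

/* Total number of data bytes described by the list. */
static unsigned int demo_sg_total_len(const struct demo_sg *sg, int nents)
{
	unsigned int total = 0;
	int i;

	for (i = 0; i < nents; i++)
		total += sg[i].length;
	return total;
}
```

For a three-element list of 512, 4096 and 100 bytes the hash must cover
4708 bytes; passing the element count would hash only 3.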
Signed-off-by: Arne Redlich <arne.redlich@xiranet.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
XMSTATE_SOL_HDR could be set when the xmit thread tests it, but there may
not be anything on the r2tqueue yet. Move the XMSTATE_SOL_HDR set
before the addition to the queue to make sure that when we pull something
off it, it is valid. This does not add locks around the xmstate test or make
that an atomic_t because this is a fast path and if it is set when we test it
we can handle it there without the overhead. Later on we check the xmitqueue
for all requests with the session lock so we will not miss it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Unconditionally free crypto state, as it is always allocated during
TCP connection creation. Without this, crypto structures leak and
crc32c module refcounts grow as connections are created and
destroyed.
Signed-off-by: Pete Wyckoff <pw@osc.edu>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch converts ISCSI to use the new crypto_hash interface instead
of crypto_digest. It's a fairly straightforward substitution.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
When a digest is spread across two network buffers, we currently
ignore this and try to check the digest with the partial buffer.
Of course this fails. This patch uses iscsi_tcp_copy to
copy the whole digest before testing it.
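The copy-then-check approach looks roughly like this in userspace C. The
demo_ names are hypothetical and this is not the iscsi_tcp_copy code:
bytes are staged until all four digest bytes have arrived, and only then
is the digest compared.

```c
#include <stdint.h>
#include <string.h>

#define DEMO_DIGEST_SIZE 4

/* Hypothetical staging area for a digest that may arrive in pieces. */
struct demo_digest_rx {
	uint8_t buf[DEMO_DIGEST_SIZE];
	size_t have;	/* digest bytes collected so far */
};

/*
 * Take the missing digest bytes from one network buffer.
 * Returns how many bytes of data were consumed.
 */
static size_t demo_digest_copy(struct demo_digest_rx *rx,
			       const uint8_t *data, size_t len)
{
	size_t want = DEMO_DIGEST_SIZE - rx->have;

	if (len < want)
		want = len;
	memcpy(rx->buf + rx->have, data, want);
	rx->have += want;
	return want;
}

/* The digest may only be verified once this returns nonzero. */
static int demo_digest_complete(const struct demo_digest_rx *rx)
{
	return rx->have == DEMO_DIGEST_SIZE;
}
```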
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When we relogin to a target, we have not yet negotiated digests
so we must reset the hdr_size var.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch built over the last ones fixes a bug in the partial header
resend code, where we add on another 4 bytes to the send length on the resend.
We want just the header plus digest.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We currently allocate separate tfms for data and header digests. There
is no reason for this since we can never calculate an rx header digest and
data digest at the same time. Same for sends. So this patch removes the data
tfms and has the send and recv sides use the rx_tfm or tx_tfm.
I also made the connection creation code preallocate the tfms because I
thought I hit a bug where I changed the digests settings during a
relogin but could not allocate the tfm and then we just failed.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
iscsi_tcp calculates padding by using the expected transfer length. This
has the problem where if we have immediate data = no and initial R2T =
yes, and the transfer length ended up needing padding then we send:
1. header
2. padding which should have gone after data
3. data
Besides this bug, we also assume the target will always ask for nice
transfer lengths and the first burst length will always be a nice value.
As far as I can tell from the RFC this is not a requirement. It would be
silly to do this, but if someone did it we will end up doing bad things.
Finally the last bug in that bit of code is in our handling of the
recalculation of data digests when we do not send a whole iscsi_buf in
one try. The bug here is that we call crypto_digest_final on an
iscsi_sendpage error, then when we send the rest of the iscsi_buf, we
do iscsi_data_digest_init and this causes the previous data digest to be
lost.
And to make matters worse, some of these bugs are replicated over and
over and over again for immediate data, solicited data and unsolicited
data. So the attached patch made over the iscsi git tree (see
kernel.org/git for details) which I updated today to include the patches
I said I merged, consolidates the sending of data, padding and digests
and calculation of data digests and fixes the above bugs.
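As an illustration of the ordering being fixed, here is a sketch that
derives the padding from the length of the data segment actually being
sent and places it after the data. The demo_ names are invented for this
example; it shows the idea, not the consolidated send path itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_PAD_LEN 4	/* data segments are padded to 4-byte words */

/* Pad bytes needed after a data segment of len bytes. */
static unsigned int demo_padding(unsigned int len)
{
	len &= (DEMO_PAD_LEN - 1);
	return len ? DEMO_PAD_LEN - len : 0;
}

/*
 * Lay out one data segment: the data first, then zeroed padding.
 * out must have room for len rounded up to a multiple of 4.
 * Returns the number of bytes written.
 */
static size_t demo_build_segment(uint8_t *out, const uint8_t *data,
				 size_t len)
{
	size_t pad = demo_padding((unsigned int)len);

	memcpy(out, data, len);
	memset(out + len, 0, pad);
	return len + pad;
}
```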
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
A couple of targets, like string bean and MDS, send r2ts with
a data len greater than the max burst we agreed to. We
were being strict in our enforcing of the iscsi rfc in that
code path, but there is no driver limitation that prevents
us from fulfilling the request. To allow those targets
to work we will ignore the max_burst length and send as
much data as the target asks for, assuming it has consciously
decided to override its max burst length.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The iSCSI RFC states that the first burst length must not exceed the
max burst length. We currently assume targets will be good, but that may
not be the case, so this patch adds a check.
This patch also moves the unsol data out offset to the lib so the LLDs
do not have to track it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The version info is useful for iscsi tcp, iser and qla4xxx so move to
transport class.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Must pass ISCSI_ERR values from the recv path and propagate them
upwards.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We currently try to allocate a max_recv_data_segment_length
which can be very large (default is 64K), and common uses
are up to 1MB. It is very very difficult to allocate this
much contiguous memory and it turns out we never even use it.
We really only need a couple of pages, so this patch has us
allocate just what we know we need today.
Later if vendors start adding vendor specific data and
we need to handle large buffers we can do this, but for
the last 4 years we have not seen anyone do this or request
it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When we enter recovery and flush the running commands
we cannot free the connection before flushing the commands.
Some commands may have a reference to the connection
that needs to be released before. iscsi_stop was forcing
the term and suspend too early and was causing an oops
in iser, so this patch removes those callbacks altogether
and allows the LLD to handle that detail.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If iscsi_data_rsp fails we must bail out. Since the pdu values like
data length are invalid we cannot continue to process the data since
it could over run buffers.
This fixes a bug with cisco 5428s where that target is sending
too much data.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The iscsi tcp code can pluck multiple r2ts from the task's r2tqueue
in the xmit code. This can result in the task being queued on the xmit queue
but getting completed at the same time.
This patch fixes the above bug by making the fifo a list so
we always remove the entry on the list del.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
In the xmit path we are sending a -EXXX value to iscsi_conn_failure
which is causing userspace to get confused.
We should be sending an ISCSI_ERR_* value that userspace understands.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Convert iscsi_tcp to new lib functions.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We can race and misset the suspend bit if iscsi_write_space is
called then iscsi_send returns with a failure indicating
there is no space.
To handle this, this patch returns an error upwards allowing xmitworker
to decide if we need to try and transmit again. For the no
write space case xmitworker will not retry, and instead
let iscsi_write_space queue it back up if needed (this relies
on the work queue code to properly requeue us if needed).
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If recovery failed or we are in recovery only overwrite the state
if we are going to terminate the session or if we logged back in.
STOP_CONN_SUSPEND and conn_cnt are not used. We only support
a single connection session ATM, so cleanup that code while
we are working around it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Discovered by steven@hayter.me.uk and patch by michaelc@cs.wisc.edu
The dtask mempool is reserving 261120 items per session! Since we are now
sending headers with sendmsg there is no reason for the mempool and that
was causing us to use crazy amounts of mem. We can preallocate a header in
the r2t and task struct and reuse them.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From Zhen and ported by Mike:
Don't use sendpage for the headers. sendpage for the pdu headers
does not seem to have a performance impact, makes life harder
for multiple data pdus to be in flight and still trips up some
network cards when it is from slab mem.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
debugged by wrwhitehead@novell.com
patch and analysis by fujita.tomonori@lab.ntt.co.jp
Only tcp_read_sock and recv_actor (iscsi_tcp_data_recv for us) see
desc.count. It is used just for permitting tcp_read_sock to read
the portion of data in the socket.
When iscsi_tcp_data_recv sees a partial header, it sets
desc.count. However, it is possible that the next skb (containing the
rest of the header) still does not come. So I'm not sure that this
scheme is completely correct.
Ideally, we should use the exact length of the data in the socket for
desc.count. However, it is not so simple (see SIOCINQ in
tcp_ioctl). So I think that iscsi_tcp_data_recv can just stop playing
with desc.count and tell tcp_read_sock to read all the skbs. As
proposed already, if iscsi_tcp_data_ready sets desc.count to
non-zero, tcp_read_sock does that.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
debugged by Ming and Rohan:
The problem Ming and Rohan debugged was that during a normal session
login, open-iscsi is not incrementing the exp_statsn counter. It was
stuck at zero. From the RFC, it looks like if the login response PDU has
a successful status then we should be incrementing that value. Also from
the RFC, it looks like when we drop a connection then reconnect, we
should be using the exp_statsn from the old connection in the next
relogin attempt.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
align printk output
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
add transport end point callbacks so iscsi drivers that cannot connect
from userspace, like iscsi tcp, using sockets do not have to
implement their own socket infrastructure.
Signed-off-by: Or Gerlitz <ogerlitz@voltaire.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This just converts iscsi_tcp to the lib
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The current iscsi_tcp eh is not nicely setup for dm-multipath
and performs some extra task management functions when they
are not needed.
The attached patch:
- Fixes the TMF issues. If a session is rebuilt
then we do not send aborts.
- Fixes the problem where if the host reset fired, we would
return SUCCESS even though we had not really done anything
yet. This ends up causing problems with scsi_error.c's TUR.
- If someone has turned on the userspace nop daemon code to try
and detect network problems before the scsi command timeout
we can now drop and clean up the session before the scsi command
times out and fires the eh, speeding up the time it takes for a
command to go from one path to another. For network problems
we fail the command with DID_BUS_BUSY so if failfast is set
scsi_decide_disposition fails the command up to dm for it to
try on another path.
- And we had to add some basic iscsi session block code. Previously
if we were trying to repair a session we would return a MLQUEUE code
in the queuecommand. This worked but it was not the most efficient
or pretty thing to do since it would take a while to relogin
to the target. For iscsi_tcp/open-iscsi a lot of the iscsi error handler
is in userspace, so the block code is pretty bare. We will be
adding to that for qla4xxx.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
For iscsi boot when going from initramfs to the real root we
need to stop the userspace iscsi daemon. To later restart it
iscsid needs to be able to rebuild itself and part of that
process is matching a session running in the kernel with the
iscsid representation. To do this the attached patch
adds several required iscsi values. If the LLD does not provide
them because login is done in userspace, then the transport
class and userspace set this up for the LLD.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
from hare@suse.de and michaelc@cs.wisc.edu
hw iscsi like qla4xxx does not allocate a host per session and
for userspace it is difficult to restart iscsid using the
"iscsi handles" for the session and connection, so this
patch just has the class or userspace allocate the id for
the session and connection.
Note: this breaks userspace and requires users to upgrade to the newest
open-iscsi tools. Sorry about this but open-iscsi is still too new to
say we have a stable user-kernel api and we were not good enough
designers to know that other hw iscsi drivers and iscsid itself would
need such changes. Actually we sorta did but at the time we did not
have the HW available to us so we could only guess.
Luckily, the only tools hooking into the class are the open-iscsi ones.
Other tools like iscsistart hook into the open-iscsi engine from
userspace, and programs like anaconda call our tools, so they are not affected.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Modify well over a dozen mempool users to call mempool_create_slab_pool()
rather than calling mempool_create() with extra arguments, saving about 30
lines of code and increasing readability.
Signed-off-by: Matthew Dobson <colpatch@us.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
SLAB_NO_REAP is documented as an option that will cause this slab not to be
reaped under memory pressure. However, that is not what happens. The only
thing that SLAB_NO_REAP controls at the moment is the reclaim of the unused
slab elements that were allocated in batch in cache_reap(). Cache_reap()
is run every few seconds independently of memory pressure.
Could we remove the whole thing? It's only used by three slabs anyway and
I cannot find a reason for having this option.
There is an additional problem with SLAB_NO_REAP. If set then the recovery
of objects from alien caches is switched off. Objects not freed on the
same node where they were initially allocated will only be reused if a
certain amount of objects accumulates from one alien node (not very likely)
or if the cache is explicitly shrunk. (Strangely __cache_shrink does not
check for SLAB_NO_REAP)
Getting rid of SLAB_NO_REAP fixes the problems with alien cache freeing.
Signed-off-by: Christoph Lameter <clameter@sgi.com>
Cc: Pekka Enberg <penberg@cs.helsinki.fi>
Cc: Manfred Spraul <manfred@colorfullife.com>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From ogerlitz@voltaire.com:
mgmtpool should be freed in the immdata_alloc_fail label.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From erezz@voltaire.com:
We are still in ISCSI_STATE_FREE state at create time. The addition
of the first connection puts us in ISCSI_STATE_LOGGED_IN.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From erezz@voltaire.com:
Remove conn->lock since it is not used anymore. The dataqueue is protected
by the session lock and xmitmutex.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From:
michaelc@cs.wisc.edu
fujita.tomonori@lab.ntt.co.jp
da-x@monatomic.org
and err path fixup from:
ogerlitz@voltaire.com
This patch cleans up that interface by having the lld and class
pass an iscsi_cls_session or iscsi_cls_conn between each other when
the function is used by HW and SW iscsi llds. This way the lld
does not have to remember whether to send a handle or a pointer,
or whether it refers to a connection, session or host.
This also has the class verify the session handle that gets passed from
userspace instead of using the pointer passed into the kernel directly.
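The handle check described above can be sketched in plain userspace C.
This is only an illustration of the idea, not the class code: the
sketch_* names, the fixed-size table and the linear search are all
assumptions made for the example. The point is that a value coming from
userspace is treated as an opaque id and looked up, and is never
dereferenced as a pointer.

```c
#include <assert.h>	/* only for the assert()-based checks in the usage note */
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a session object tracked by the class. */
struct sketch_session {
	uint32_t sid;	/* handle handed out to userspace */
	int in_use;	/* non-zero while the slot holds a live session */
};

#define SKETCH_MAX_SESSIONS 4

static struct sketch_session sketch_sessions[SKETCH_MAX_SESSIONS];

/* Register a session under the given handle; returns NULL if the table is full. */
static inline struct sketch_session *sketch_session_add(uint32_t sid)
{
	for (size_t i = 0; i < SKETCH_MAX_SESSIONS; i++) {
		if (!sketch_sessions[i].in_use) {
			sketch_sessions[i].sid = sid;
			sketch_sessions[i].in_use = 1;
			return &sketch_sessions[i];
		}
	}
	return NULL;
}

/*
 * Resolve a handle received from userspace. An unknown handle yields
 * NULL instead of a pointer the caller might dereference.
 */
static inline struct sketch_session *sketch_session_lookup(uint32_t sid)
{
	for (size_t i = 0; i < SKETCH_MAX_SESSIONS; i++) {
		if (sketch_sessions[i].in_use && sketch_sessions[i].sid == sid)
			return &sketch_sessions[i];
	}
	return NULL;
}
```

A caller would register a session with sketch_session_add(7) and later
check that sketch_session_lookup(7) returns the same object, while
sketch_session_lookup(99) returns NULL for a handle that was never
handed out.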
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Remove the "inline" keyword from a bunch of big functions in the kernel with
the goal of shrinking it by 30kb to 40kb.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
From: FUJITA Tomonori <tomof@acm.org> and zhenyu.z.wang@intel.com:
We cannot handle filesystems like XFS because of the pages they
are sending us. We had thought page_count could be used to
work around this, but the correct test is for PageSlab.
The proper solution is to figure out what type of pages
filesystems can use so we do not have to add tests like
this or handle it in the block layer for all network block drivers,
but the issue still has not been resolved on fs-devel,
so we are sending this patch as a temporary fix.
This is the last patch in the series in case it is NAKed with the
explanation that we need to push the correct fix through fs-devel, mm
or the block layer. The rest of the patchset can live without
the patch, but the driver will not work with filesystems like
XFS.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
When we run the xmit code from queuecommand the stack trace
gets too deep. The patch runs the xmit code from the scsi_host
work queue. This fixes 4k stack and xfs support and should
fix the st and sg stack usage bugs.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This is the second version of the patch to address Christoph's comments.
Instead of doing the lib, I just kept everything in scsi_transport_iscsi.c
like the FC and SPI class. This was because the driver model and sysfs
class is tied to the session and connection setup so separating did not
buy very much at this time.
The reason for this patch was because HW iscsi LLDs like qla4xxx cannot
use the iscsi class because the scsi_host was tied to the interface and
class code. This patch just separates the session from scsi host so
that LLDs that allocate the host per some resource like pci device
can still use the class.
This also fixes a couple of refcount bugs that can be triggered
when users have a sysfs file open, close the session, then
read or write to the file.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From Mike Christie <michaelc@cs.wisc.edu> and FUJITA Tomonori <tomof@acm.org>:
We cannot use page_address because some pages could be highmem.
Instead, we can use sock_no_sendpage which does kmap for us.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Users can write to a page while we are sending it and making
digest calculations. This ends up causing us to retry the command
when a digest error is later reported. By using sock_no_sendpage
when data digests are calculated we can avoid many of the retries
(not all, but it helps) because sock_no_sendpage is not zero copy.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We should be taking the host_lock instead of the conn lock when
checking host_busy.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
We need to check the ISCSI_FLAG_DATA_* flags.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Remove extra whitespaces.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The scsi layer is using semaphores in a mutex way; this patch converts
these into using mutexes instead.
Signed-off-by: Arjan van de Ven <arjan@infradead.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This merge is pretty extensive. The conflict is over the new
req->retries parameter, so I had to change the prototype to
scsi_setup_blk_pc_cmnd() and the usage in sd, sr and st.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From Wang Zhenyu:
check header digest for cmd and mgmt tasks
Signed-off-by: Wang Zhenyu <zhenyu.z.wang@intel.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From Wang Zhenyu:
High queue depth was a problem for some targets so make queue_depth adjustable
From Mike Christie:
Make default queue_depth a little lower
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From Wang Zhenyu:
data digest fix (the bug caused data corruption w/Wasabi StorageBuilder target)
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From Wang Zhenyu:
Must check SCSI CMD and R2T response according to the spec
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From tomof@acm.org:
There is one more issue about Equallogic systems. They send
re-direction info with FIN. I think that the kernel module needs to
let iscsid read data from the socket before killing it.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Must check only valid opcode bits.
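As an illustration of what checking only the valid opcode bits means:
in RFC 3720 the opcode occupies the low six bits of the first header
byte, and the top two bits carry other information (reserved and
immediate-delivery), so they have to be masked off before the byte is
compared against an opcode. The sketch below is a standalone
approximation, not the driver code; the SKETCH_* names are made up for
the example, with values taken from the RFC.

```c
#include <assert.h>	/* only for the assert()-based checks in the usage note */
#include <stdint.h>

/* Values per RFC 3720; names are local to this sketch. */
#define SKETCH_OPCODE_MASK	0x3f	/* low six bits hold the opcode */
#define SKETCH_OP_NOOP_IN	0x20
#define SKETCH_OP_SCSI_RSP	0x21

/* Extract the opcode from the first byte of a PDU header. */
static inline uint8_t sketch_pdu_opcode(uint8_t first_byte)
{
	return first_byte & SKETCH_OPCODE_MASK;
}
```

With this helper, a first byte of 0x61 (0x40 | 0x21) still decodes to
SKETCH_OP_SCSI_RSP, whereas a direct comparison of the raw byte against
0x21 would not match.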
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From: michaelc@cs.wisc.edu
I have a bad memory. I cannot remember what versions are which,
so add a module version to help.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From: zhenyu.z.wang@intel.com
This adds a check on the NOOP_IN's ttt: when it is ~0UL we should not send
a NOOP_OUT, per the spec (plus some cleanup).
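A minimal sketch of the ttt check, assuming the target transfer tag has
already been read from the header into a host-order 32-bit value. The
sketch_* names are hypothetical. RFC 3720 reserves the tag value
0xffffffff to mean that no reply is requested; the sketch uses a
fixed-width 32-bit constant because the field is 32 bits wide.

```c
#include <assert.h>	/* only for the assert()-based checks in the usage note */
#include <stdbool.h>
#include <stdint.h>

/* Reserved target transfer tag per RFC 3720; name is local to this sketch. */
#define SKETCH_RESERVED_TAG	0xffffffffU

/* Returns true when a received NOP-In must be answered with a NOP-Out. */
static inline bool sketch_nop_in_needs_reply(uint32_t ttt)
{
	return ttt != SKETCH_RESERVED_TAG;
}
```

For example, sketch_nop_in_needs_reply(0xffffffffU) is false, so no
NOOP_OUT is queued, while any other tag value yields true.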
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From: michaelc@cs.wisc.edu
Cleanup some iscsi_proto defs, add some missing values, and
fix some defs.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From: zhenyu.z.wang@intel.com
Delay the header digest update until xmit time, like data digest update.
[To make things cleaner and avoid a preempt bug]
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
From: tomof@acm.org
I'm not sure about this. I don't think that NODELAY option hurts
performance. However, open-iscsi does not use MSG_MORE properly with
sendpage, so NODELAY option hurts the open-iscsi performance.
I've attached a patch to fix NODELAY and MSG_MORE problems and the
write performance results with disktest.
I use Opteron boxes connected directly, Chelsio NICs, 1500-byte MTU,
64 KB I/O size, and the iSCSI parameters on the open-iscsi web site.
With only the NODELAY fix, the performance drops, as you said. On the
other hand, the NODELAY and MSG_MORE fixes together improve the
performance overall.
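To show how NODELAY and MSG_MORE interact, here is a userspace analogue
using send(2) on Linux. It is not the kernel sendpage path touched by
the patch, and the sketch_* helpers are invented for the example. The
idea is the same, though: with Nagle disabled, a header sent on its own
may go out as a tiny segment, so the sender flags it with MSG_MORE to
tell the stack that the data follows immediately.

```c
#include <assert.h>	/* only for the assert()-based checks in the usage note */
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Disable Nagle on a TCP socket; returns 0 on success, -1 on error. */
static inline int sketch_set_nodelay(int fd)
{
	int one = 1;

	return setsockopt(fd, IPPROTO_TCP, TCP_NODELAY, &one, sizeof(one));
}

/* Send the whole buffer, looping over short writes; -1 on error. */
static inline ssize_t sketch_send_all(int fd, const void *buf, size_t len,
				      int flags)
{
	const char *p = buf;
	size_t done = 0;

	while (done < len) {
		ssize_t n = send(fd, p + done, len - done, flags);

		if (n < 0)
			return -1;
		done += (size_t)n;
	}
	return (ssize_t)done;
}

/*
 * Send a header followed by optional data. The header carries MSG_MORE
 * only when data follows, so the final piece of the PDU is never held
 * back waiting for more.
 */
static inline int sketch_send_pdu(int fd, const void *hdr, size_t hdr_len,
				  const void *data, size_t data_len)
{
	int hdr_flags = data_len ? MSG_MORE : 0;

	if (sketch_send_all(fd, hdr, hdr_len, hdr_flags) < 0)
		return -1;
	if (data_len && sketch_send_all(fd, data, data_len, 0) < 0)
		return -1;
	return 0;
}
```

A caller would typically run sketch_set_nodelay() once after connecting
and then use sketch_send_pdu() for each PDU. MSG_MORE is Linux-specific,
so the sketch is not portable as written.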
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
drivers/scsi/iscsi_tcp.c, iscsi data path.
Signed-off-by: Alex Aizman <itn780@yahoo.com>
Signed-off-by: Dmitry Yusupov <dmitry_yus@yahoo.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>