Use the matching scope for logging messages to allow for
better command tracing.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Suggested-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Convert scsi_normalize_sense() and friends to return 'bool'
instead of an integer.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Reviewed-by: Yoshihiro Yunomae <yoshihiro.yunomae.ez@hitachi.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
We should be using the standard dev_printk() variants for
sense code printing.
[hch: remove __scsi_print_sense call in xen-scsiback, Acked by Juergen]
[hch: folded bracing fix from Dan Carpenter]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Further to a January 2013 thread titled: "[PATCH] SG_SCSI_RESET ioctl
should only perform requested operation" by Jeremy Linton a patch (v3)
is presented that expands the existing ioctl to include "no_escalate"
versions to the existing resets. This requires no changes to SCSI low
level drivers (LLDs); it adds several more finely tuned reset options
to the user space. For example:
/* This call remains the same, with the same escalating semantics
* if the device (LU) reset fail. That is: on failure to try a
* target reset and if that fails, try a bus reset, and if that fails
* try a host (i.e. LLD) reset. */
val = SG_SCSI_RESET_DEVICE;
res = ioctl(<sg_or_block_fd>, SG_SCSI_RESET, &val);
/* What follows is a new option introduced by this patch series. Only
* a device reset is attempted. If that fails then an appropriate
* error code is provided. N.B. There is no reset escalation. */
val = SG_SCSI_RESET_DEVICE | SG_SCSI_RESET_NO_ESCALATE;
res = ioctl(<sg_or_block_fd>, SG_SCSI_RESET, &val);
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Reviewed-by: Jeremy Linton <jlinton@tributary.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Multipath devices using the TUR path checker need to see the sense
code for a failed TUR command in their device handler. Since commit
14216561e1 we always return success for mid
layer issued TUR commands before calling the device handler, which
stopped the TUR path checker from working.
Move the call to the device handler check sense method before the early
return for TUR commands to give the device handler a chance to intercept
them.
Signed-off-by: Christoph Hellwig <hch@infradead.org>
Tested-by: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Setups that use the blk-mq I/O path can lock up if a host with a single
device that has its door locked enters EH. Make sure to only send the
command to re-lock the door to devices that actually were reset and thus
might have lost their state. Otherwise the EH code might be get blocked
on blk_get_request as all requests for non-reset devices might be in use.
Cc: stable@vger.kernel.org
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reported-by: Meelis Roos <meelis.roos@ut.ee>
Tested-by: Meelis Roos <meelis.roos@ut.ee>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull core block layer changes from Jens Axboe:
"This is the core block IO pull request for 3.18. Apart from the new
and improved flush machinery for blk-mq, this is all mostly bug fixes
and cleanups.
- blk-mq timeout updates and fixes from Christoph.
- Removal of REQ_END, also from Christoph. We pass it through the
->queue_rq() hook for blk-mq instead, freeing up one of the request
bits. The space was overly tight on 32-bit, so Martin also killed
REQ_KERNEL since it's no longer used.
- blk integrity updates and fixes from Martin and Gu Zheng.
- Update to the flush machinery for blk-mq from Ming Lei. Now we
have a per hardware context flush request, which both cleans up the
code should scale better for flush intensive workloads on blk-mq.
- Improve the error printing, from Rob Elliott.
- Backing device improvements and cleanups from Tejun.
- Fixup of a misplaced rq_complete() tracepoint from Hannes.
- Make blk_get_request() return error pointers, fixing up issues
where we NULL deref when a device goes bad or missing. From Joe
Lawrence.
- Prep work for drastically reducing the memory consumption of dm
devices from Junichi Nomura. This allows creating clone bio sets
without preallocating a lot of memory.
- Fix a blk-mq hang on certain combinations of queue depths and
hardware queues from me.
- Limit memory consumption for blk-mq devices for crash dump
scenarios and drivers that use crazy high depths (certain SCSI
shared tag setups). We now just use a single queue and limited
depth for that"
* 'for-3.18/core' of git://git.kernel.dk/linux-block: (58 commits)
block: Remove REQ_KERNEL
blk-mq: allocate cpumask on the home node
bio-integrity: remove the needless fail handle of bip_slab creating
block: include func name in __get_request prints
block: make blk_update_request print prefix match ratelimited prefix
blk-merge: don't compute bi_phys_segments from bi_vcnt for cloned bio
block: fix alignment_offset math that assumes io_min is a power-of-2
blk-mq: Make bt_clear_tag() easier to read
blk-mq: fix potential hang if rolling wakeup depth is too high
block: add bioset_create_nobvec()
block: use bio_clone_fast() in blk_rq_prep_clone()
block: misplaced rq_complete tracepoint
sd: Honor block layer integrity handling flags
block: Replace strnicmp with strncasecmp
block: Add T10 Protection Information functions
block: Don't merge requests if integrity flags differ
block: Integrity checksum flag
block: Relocate bio integrity flags
block: Add a disk flag to block integrity profile
block: Add prefix to block integrity profile flags
...
Convert spaces to tabs in kernel-doc notation.
Correct duplicated (copy-paste) kernel-doc comments that are incorrect.
Fix kernel-doc warning:
Warning(..//drivers/scsi/scsi_error.c:1647): No description found for parameter 'shost'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
The blk_get_request function may fail in low-memory conditions or during
device removal (even if __GFP_WAIT is set). To distinguish between these
errors, modify the blk_get_request call stack to return the appropriate
ERR_PTR. Verify that all callers check the return status and consider
IS_ERR instead of a simple NULL pointer check.
For consistency, make a similar change to the blk_mq_alloc_request leg
of blk_get_request. It may fail if the queue is dead, or the caller was
unwilling to wait.
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Acked-by: Jiri Kosina <jkosina@suse.cz> [for pktdvd]
Acked-by: Boaz Harrosh <bharrosh@panasas.com> [for osd]
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
The blk-core dead queue checks introduce an error scenario to
blk_get_request that returns NULL if the request queue has been
shutdown. This affects the behavior for __GFP_WAIT callers, who should
verify the return value before dereferencing.
Signed-off-by: Joe Lawrence <joe.lawrence@stratus.com>
Acked-by: Jiri Kosina <jkosina@suse.cz> [for pktdvd]
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Jens Axboe <axboe@fb.com>
Avoid taking the host-wide host_lock to check the per-host queue limit.
Instead we do an atomic_inc_return early on to grab our slot in the queue,
and if necessary decrement it after finishing all checks.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Webb Scales <webbnh@hp.com>
Acked-by: Jens Axboe <axboe@kernel.dk>
Tested-by: Bart Van Assche <bvanassche@acm.org>
Tested-by: Robert Elliott <elliott@hp.com>
Using dev_printk variants prefixes the logging message with
the originating device, which makes debugging easier.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
scsi_put_command() is either invoked before blk_start_request() or
after block layer processing has completed. scsi_cmnd.abort_work
is scheduled from inside the SCSI timeout handler. The block layer
guarantees that either the regular completion handler
(softirq_done_fn()) or the timeout handler (rq_timed_out_fn()) is
invoked but not both. This means that scsi_put_command() is never
invoked while abort_work is scheduled. Hence remove the
cancel_delayed_work() call from scsi_put_command().
Similarly, scsi_abort_command() is only invoked from the SCSI
timeout handler. If scsi_abort_command() is invoked for a SCSI
command with the SCSI_EH_ABORT_SCHEDULED flag set this means that
scmd_eh_abort_handler() has already invoked scsi_queue_insert() and
hence that scsi_cmnd.abort_work is no longer pending. Hence also
remove the cancel_delayed_work() call from scsi_abort_command().
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Any callbacks in scsi_timeout_out() might return BLK_EH_RESET_TIMER,
in which case we should leave the result alone and not set
DID_TIME_OUT, as the command didn't actually timeout.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
After scsi_try_to_abort_cmd returns, the eh_abort_handler may have
already found that the command has completed in the device, causing
the host_byte to be nonzero (e.g. it could be DID_ABORT). When
this happens, ORing DID_TIME_OUT into the host byte will corrupt
the result field and initiate an unwanted command retry.
Fix this by using set_host_byte instead, following the model of
commit 2082ebc45a.
Cc: stable@vger.kernel.org
Signed-off-by: Ulrich Obergfell <uobergfe@redhat.com>
[Fix all instances according to review comments. - Paolo]
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Pull block layer fixes from Jens Axboe:
"Final small batch of fixes to be included before -rc1. Some general
cleanups in here as well, but some of the blk-mq fixes we need for the
NVMe conversion and/or scsi-mq. The pull request contains:
- Support for not merging across a specified "chunk size", if set by
the driver. Some NVMe devices perform poorly for IO that crosses
such a chunk, so we need to support it generically as part of
request merging avoid having to do complicated split logic. From
me.
- Bump max tag depth to 10Ki tags. Some scsi devices have a huge
shared tag space. Before we failed with EINVAL if a too large tag
depth was specified, now we truncate it and pass back the actual
value. From me.
- Various blk-mq rq init fixes from me and others.
- A fix for enter on a dying queue for blk-mq from Keith. This is
needed to prevent oopsing on hot device removal.
- Fixup for blk-mq timer addition from Ming Lei.
- Small round of performance fixes for mtip32xx from Sam Bradshaw.
- Minor stack leak fix from Rickard Strandqvist.
- Two __init annotations from Fabian Frederick"
* 'for-linus' of git://git.kernel.dk/linux-block:
block: add __init to blkcg_policy_register
block: add __init to elv_register
block: ensure that bio_add_page() always accepts a page for an empty bio
blk-mq: add timer in blk_mq_start_request
blk-mq: always initialize request->start_time
block: blk-exec.c: Cleaning up local variable address returnd
mtip32xx: minor performance enhancements
blk-mq: ->timeout should be cleared in blk_mq_rq_ctx_init()
blk-mq: don't allow queue entering for a dying queue
blk-mq: bump max tag depth to 10K tags
block: add blk_rq_set_block_pc()
block: add notion of a chunk size for request merging
With the optimizations around not clearing the full request at alloc
time, we are leaving some of the needed init for REQ_TYPE_BLOCK_PC
up to the user allocating the request.
Add a blk_rq_set_block_pc() that sets the command type to
REQ_TYPE_BLOCK_PC, and properly initializes the members associated
with this type of request. Update callers to use this function instead
of manipulating rq->cmd_type directly.
Includes fixes from Christoph Hellwig <hch@lst.de> for my half-assed
attempt.
Signed-off-by: Jens Axboe <axboe@fb.com>
->queuecommand returns '0' for successful command submission,
so we need to set the correct SCSI midlayer return value
when calling scsi_log_completion().
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reported-by: Robert Elliott <elliott@hp.com>
Cc: Stephen Cameron <scameron@beardog.cce.hp.com>
Tested-by: Robert Elliott <elliott@hp.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nicholas Bellinger <nab@linux-iscsi.org>
Reviewed-by: Mike Christie <michaelc@cs.wisc.edu>
Reviewed-by: Hannes Reinecke <hare@suse.de>
This patch fixes a corner case in the previous USB Deadlock fix patch (12023e7
[SCSI] Fix USB deadlock caused by SCSI error handling).
The scenario is abort command, set flag, abort completes, send TUR, TUR
doesn't return, so we now try to abort the TUR, but scsi_abort_eh_cmnd()
will skip the abort because the flag is set and move straight to reset.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
USB requires that every command be aborted first before we escalate to reset.
In particular, USB will deadlock if we try to reset first before aborting the
command.
Unfortunately, the flag we use to tell if a command has already been aborted:
SCSI_EH_ABORT_SCHEDULED is not cleared properly leading to cases where we can
requeue a command with the flag set and proceed immediately to reset if it
fails (thus causing USB to deadlock).
Fix by clearing the SCSI_EH_ABORT_SCHEDULED flag if it has been set. Which
means this will be the second time scsi_abort_command() has been called for
the same command. IE the first abort went out, did its thing, but now the
same command has timed out again.
So this flag gets cleared, and scsi_abort_command() returns FAILED, and _no_
asynchronous abort is being scheduled. scsi_times_out() will then proceed to
call scsi_eh_scmd_add(). But as we've cleared the SCSI_EH_ABORT_SCHEDULED
flag the SCSI_EH_CANCEL_CMD flag will continue to be set, and the command will
be aborted with the main SCSI EH routine.
Reported-by: Alan Stern <stern@rowland.harvard.edu>
Tested-by: Andreas Reis <andreas.reis@gmail.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
We're seeing a case where the contents of scmd->result isn't being reset after
a SCSI command encounters an error, is resubmitted, times out and then gets
handled. The error handler acts on the stale result of the previous error
instead of the timeout. Fix this by properly zeroing the scmd->status before
the command is resubmitted.
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
We unconditionally execute scsi_eh_get_sense() to make sure all failed
commands that should have sense attached, do. However, the routine forgets
that some commands, because of the way they fail, will not have any sense code
... we should not bother them with a REQUEST_SENSE command. Fix this by
testing to see if we actually got a CHECK_CONDITION return and skip asking for
sense if we don't.
Tested-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Many callers won't need this and we can optimize them away. In addition
the handling in the __-prefixed variants was inconsistant to start with.
Based on an earlier patch from Bart Van Assche.
[jejb: fix kerneldoc probelm picked up by Fengguang Wu]
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The former minimum valid value of 'eh_deadline' is 1s, which means
the earliest occasion to shorten EH is 1 second later since a
command is failed or timed out. But if we want to skip EH steps
ASAP, we have to wait until the first EH step is finished. If the
duration of the first EH step is long, this waiting time is
excruciating. So, it is necessary to accept 0 as the minimum valid
value for 'eh_deadline'.
According to my test, with Hannes' patchset 'New EH command timeout
handler' as well, the minimum IO time is improved from 73s
(eh_deadline = 1) to 43s(eh_deadline = 0) when commands are timed
out by disabling RSCN and target port.
Signed-off-by: Ren Mingxin <renmx@cn.fujitsu.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
32bit accesses are guaranteed to be atomic, so we can remove
the spinlock when checking for eh_deadline. We only need to
make sure to catch any updates which might happened during
the call to time_before(); if so we just recheck with the
correct value.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
When a command runs into a timeout we need to send an 'ABORT TASK'
TMF. This is typically done by the 'eh_abort_handler' LLDD callback.
Conceptually, however, this function is a normal SCSI command, so
there is no need to enter the error handler.
This patch implements a new scsi_abort_command() function which
invokes an asynchronous function scsi_eh_abort_handler() to
abort the commands via the usual 'eh_abort_handler'.
If abort succeeds the command is either retried or terminated,
depending on the number of allowed retries. However, 'eh_eflags'
records the abort, so if the retry would fail again the
command is pushed onto the error handler without trying to
abort it (again); it'll be cleared up from SCSI EH.
[hare: smatch detected stray switch fixed]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Commit 18a4d0a22e
(Handle disk devices which can not process medium access commands)
was introduced to offline any device which cannot process medium
access commands.
However, commit 3eef6257de
(Reduce error recovery time by reducing use of TURs) reduced
the number of TURs by sending it only on the first failing
command, which might or might not be a medium access command.
So in combination this results in an erratic device offlining
during EH; if the command where the TUR was sent upon happens
to be a medium access command the device will be set offline,
if not everything proceeds as normal.
This patch moves the check to the final test, eliminating
this problem.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
If a command abort fails there is a fair chance that all other
aborts will be failing, too.
So we should be calling LUN reset directly after the first failed
abort and skip aborting the remaining commands.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This patchs adds an 'eh_deadline' sysfs attribute to the scsi
host which limits the overall runtime of the SCSI EH.
The 'eh_deadline' value is stored in the now obsolete field
'resetting'.
When a command is failed the start time of the EH is stored
in 'last_reset'. If the overall runtime of the SCSI EH is longer
than last_reset + eh_deadline, the EH is short-circuited and
falls through to issue a host reset only.
[jejb: add comments in Scsi_Host about new fields]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Generate a uevent when the following Unit Attention ASC/ASCQ
codes are received:
2A/01 MODE PARAMETERS CHANGED
2A/09 CAPACITY DATA HAS CHANGED
38/07 THIN PROVISIONING SOFT THRESHOLD REACHED
3F/03 INQUIRY DATA HAS CHANGED
3F/0E REPORTED LUNS DATA HAS CHANGED
Log kernel messages when the following Unit Attention ASC/ASCQ
codes are received that are not as specific as those above:
2A/xx PARAMETERS CHANGED
3F/xx TARGET OPERATING CONDITIONS HAVE CHANGED
Added logic to set expecting_lun_change for other LUNs on the target
after REPORTED LUNS DATA HAS CHANGED is received, so that duplicate
uevents are not generated, and clear expecting_lun_change when a
REPORT LUNS command completes, in accordance with the SPC-3
specification regarding reporting of the 3F 0E ASC/ASCQ UA.
[jejb: remove SPC3 test in scsi_report_lun_change and some docbook fixes and
unused variable fix, both reported by Fengguang Wu]
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
When a medium error is detected the SCSI stack should return
ENODATA to the upper layers.
[jejb: fix whitespace error]
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
When the thin provisioning hard threshold is reached we
should return ENOSPC to inform upper layers about this fact.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
We should be modifying the host_byte status in scsi_check_sense()
directly; this saves us to introduce a special return code for
each and every condition.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
The patch set is mostly driver updates (usf, zfcp, lpfc, mpt2sas,
megaraid_sas, bfa, ipr) and a few bug fixes. Also of note is that the
Buslogic driver has been rewritten to a better coding style and 64 bit support
added. We also removed the libsas limitation on 16 bytes for the command size
(currently no drivers make use of this).
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQEcBAABAgAGBQJR0ugCAAoJEDeqqVYsXL0MX2sH+gOkWuy5p3igz+VEim8TNaOA
VV5EIxG1v7Q0ZiXCp/wcF6eqhgQkWvkrKSxWkaN0yzq8LEWfQeY7VmFDbGgFeVUZ
XMlX5ay8+FLCIK9M76oxwhV7VAXYbeUUZafh+xX6StWCdKrl0eJbicOGoUk/pjsi
ZjCBpK5BM0SW+s2gMSDQhO2eMsgMp9QrJMiCJHUF1wWPN8Yez6va1tg4b9iW39BZ
dd3sJq+PuN6yDbYAJIjEpiGF9gDaaYxSE6bTKJuY+oy08+VsP/RRWjorTENs9Aev
rQXZIC3nwsv26QRSX7RDSj+UE+kFV6FcPMWMU3HN2UG6ttprtOxT8tslVJf7LcA=
=BxtF
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull first round of SCSI updates from James Bottomley:
"The patch set is mostly driver updates (usf, zfcp, lpfc, mpt2sas,
megaraid_sas, bfa, ipr) and a few bug fixes. Also of note is that the
Buslogic driver has been rewritten to a better coding style and 64 bit
support added. We also removed the libsas limitation on 16 bytes for
the command size (currently no drivers make use of this)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (101 commits)
[SCSI] megaraid: minor cut and paste error fixed.
[SCSI] ufshcd-pltfrm: remove unnecessary dma_set_coherent_mask() call
[SCSI] ufs: fix register address in UIC error interrupt handling
[SCSI] ufshcd-pltfrm: add missing empty slot in ufs_of_match[]
[SCSI] ufs: use devres functions for ufshcd
[SCSI] ufs: Fix the response UPIU length setting
[SCSI] ufs: rework link start-up process
[SCSI] ufs: remove version check before IS reg clear
[SCSI] ufs: amend interrupt configuration
[SCSI] ufs: wrap the i/o access operations
[SCSI] storvsc: Update the storage protocol to win8 level
[SCSI] storvsc: Increase the value of scsi timeout for storvsc devices
[SCSI] MAINTAINERS: Add myself as the maintainer for BusLogic SCSI driver
[SCSI] BusLogic: Port driver to 64-bit.
[SCSI] BusLogic: Fix style issues
[SCSI] libiscsi: Added new boot entries in the session sysfs
[SCSI] aacraid: Fix for arrays are going offline in the system. System hangs
[SCSI] ipr: IOA Status Code(IOASC) update
[SCSI] sd: Update WRITE SAME heuristics
[SCSI] fnic: potential dead lock in fnic_is_abts_pending()
...
Introduce eh_timeout which can be used for error handling purposes. This
was previously hardcoded to 10 seconds in the SCSI error handling
code. However, for some fast-fail scenarios it is necessary to be able
to tune this as it can take several iterations (bus device, target, bus,
controller) before we give up.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
scsi_send_eh_cmnd() is calling queuecommand() directly, so
it needs to check the return value here.
The only valid return codes for queuecommand() are 'busy'
states, so we need to wait for a bit to allow the LLDD
to recover.
Based on an earlier patch from Wen Xiong.
[jejb: fix confusion between msec and jiffies values and other issues]
[bvanassche: correct stall_for interval]
Cc: Wen Xiong <wenxiong@linux.vnet.ibm.com>
Cc: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This patch tries to shorten the path length of scsi_cmd_to_driver(). As only
REQ_TYPE_BLOCK_PC commands can be submitted without a driver, so we could
avoid the related NULL checking, as long as we make sure we don't use it for
REQ_TYPE_BLOCK_PC type commands. Plus, this fixes a bug where you get
different behaviors from REQ_TYPE_BLOCK_PC commands when a driver is and isn't
attached.
Signed-off-by: Li Zhong <zhong@linux.vnet.ibm.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This is a particularly nasty SCSI ATA Translation Layer (SATL) problem.
SAT-2 says (section 8.12.2)
if the device is in the stopped state as the result of
processing a START STOP UNIT command (see 9.11), then the SATL
shall terminate the TEST UNIT READY command with CHECK CONDITION
status with the sense key set to NOT READY and the additional
sense code of LOGICAL UNIT NOT READY, INITIALIZING COMMAND
REQUIRED;
mpt2sas internal SATL seems to implement this. The result is very confusing
standby behaviour (using hdparm -y). If you suspend a drive and then send
another command, usually it wakes up. However, if the next command is a TEST
UNIT READY, the SATL sees that the drive is suspended and proceeds to follow
the SATL rules for this, returning NOT READY to all subsequent commands. This
means that the ordering of TEST UNIT READY is crucial: if you send TUR and
then a command, you get a NOT READY to both back. If you send a command and
then a TUR, you get GOOD status because the preceeding command woke the drive.
This bit us badly because
commit 85ef06d1d2
Author: Tejun Heo <tj@kernel.org>
Date: Fri Jul 1 16:17:47 2011 +0200
block: flush MEDIA_CHANGE from drivers on close(2)
Changed our ordering on TEST UNIT READY commands meaning that SATA drives
connected to an mpt2sas now suspend and refuse to wake (because the mpt2sas
SATL sees the suspend *before* the drives get awoken by the next ATA command)
resulting in lots of failed commands.
The standard is completely nuts forcing this inconsistent behaviour, but we
have to work around it.
The fix for this is twofold:
1. Set the allow_restart flag so we wake the drive when we see it has been
suspended
2. Return all TEST UNIT READY status directly to the mid layer without any
further error handling which prevents us causing error handling which
may offline the device just because of a media check TUR.
Reported-by: Matthias Prager <linux@matthiasprager.de>
Cc: stable@vger.kernel.org
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
A quick reading of scsi_error_handler() one could come away with the
impression that it does its wakeup event check while the task state is
TASK_RUNNING. In fact it sets TASK_INTERRUPTIBLE at the bottom of the
loop, but that is ~50 lines down.
Just set TASK_INTERRUPTIBLE at the top of loop and be done.
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Rapid ata hotplug on a libsas controller results in cases where libsas
is waiting indefinitely on eh to perform an ata probe.
A race exists between scsi_schedule_eh() and scsi_restart_operations()
in the case when scsi_restart_operations() issues i/o to other devices
in the sas domain. When this happens the host state transitions from
SHOST_RECOVERY (set by scsi_schedule_eh) back to SHOST_RUNNING and
->host_busy is non-zero so we put the eh thread to sleep even though
->host_eh_scheduled is active.
Before putting the error handler to sleep we need to check if the
host_state needs to return to SHOST_RECOVERY for another trip through
eh. Since i/o that is released by scsi_restart_operations has been
blocked for at least one eh cycle, this implementation allows those
i/o's to run before another eh cycle starts to discourage hung task
timeouts.
Cc: <stable@vger.kernel.org>
Reported-by: Tom Jackson <thomas.p.jackson@intel.com>
Tested-by: Tom Jackson <thomas.p.jackson@intel.com>
Signed-off-by: Dan Williams <dan.j.williams@intel.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Pull trivial updates from Jiri Kosina:
"As usual, it's mostly typo fixes, redundant code elimination and some
documentation updates."
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial: (57 commits)
edac, mips: don't change code that has been removed in edac/mips tree
xtensa: Change mail addresses of Hannes Weiner and Oskar Schirmer
lib: Change mail address of Oskar Schirmer
net: Change mail address of Oskar Schirmer
arm/m68k: Change mail address of Sebastian Hess
i2c: Change mail address of Oskar Schirmer
net: Fix tcp_build_and_update_options comment in struct tcp_sock
atomic64_32.h: fix parameter naming mismatch
Kconfig: replace "--- help ---" with "---help---"
c2port: fix bogus Kconfig "default no"
edac: Fix spelling errors.
qla1280: Remove redundant NULL check before release_firmware() call
remoteproc: remove redundant NULL check before release_firmware()
qla2xxx: Remove redundant NULL check before release_firmware() call.
aic94xx: Get rid of redundant NULL check before release_firmware() call
tehuti: delete redundant NULL check before release_firmware()
qlogic: get rid of a redundant test for NULL before call to release_firmware()
bna: remove redundant NULL test before release_firmware()
tg3: remove redundant NULL test before release_firmware() call
typhoon: get rid of redundant conditional before all to release_firmware()
...
Commit 18a4d0a22e ("[SCSI] Handle disk devices which can not process
medium access commands") introduced a bug in which we would attempt to
dereference the scsi driver even when the device had no ULD attached.
Ensure that a driver is registered and make the driver accessor function
more resilient to errors during device discovery.
Reported-by: Elric Fu <elricfu1@gmail.com>
Reported-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
We have experienced several devices which fail in a fashion we do not
currently handle gracefully in SCSI. After a failure these devices will
respond to the SCSI primary command set (INQUIRY, TEST UNIT READY, etc.)
but any command accessing the storage medium will time out.
The following patch adds an callback that can be used by upper level
drivers to inspect the results of an error handling command. This in
turn has been used to implement additional checking in the SCSI disk
driver.
If a medium access command fails twice but TEST UNIT READY succeeds both
times in the subsequent error handling we will offline the device. The
maximum number of failed commands required to take a device offline can
be tweaked in sysfs.
Also add a new error flag to scsi_debug which allows this scenario to be
easily reproduced.
[jejb: fix up integer parsing to use kstrtouint]
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
Permanent target failures are non-retryable and should be classified as
TARGET_ERROR; otherwise dm-multipath will retry an IO request that will
always fail at the target.
A SCSI command that fails with ILLEGAL_REQUEST sense and Additional
sense 0x20, 0x21, 0x24 or 0x26 represents a permanent TARGET_ERROR.
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
This patch fixes the host byte settings DID_TARGET_FAILURE and
DID_NEXUS_FAILURE. The function __scsi_error_from_host_byte, tries to reset
the host byte to DID_OK. But that does not happen because of the OR operation.
Here is the flow.
scsi_softirq_done-> scsi_decide_disposition -> __scsi_error_from_host_byte
Let's take an example with DID_NEXUS_FAILURE. In scsi_decide_disposition,
result will be set as DID_NEXUS_FAILURE (=0x11). Then in
__scsi_error_from_host_byte, when we do OR with DID_OK. Purpose is to reset
it back to DID_OK. But that does not happen. This patch fixes this issue.
Signed-off-by: Babu Moger <babu.moger@netapp.com>
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
With previous change, now the ata port runtime suspend will happen as:
disk suspend --> scsi target suspend --> scsi host suspend --> ata port
suspend
ata port(parent device) suspend need to schedule scsi EH which will resume
scsi host(child device). Then the child device resume will in turn make
parent device resume first. This is kind of recursive.
This patch adds a new flag Scsi_Host::eh_noresume.
ata port will set this flag to skip the runtime PM calls on scsi host.
Acked-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: Lin Ming <ming.m.lin@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Some CD-ROMs fail to report a media change correctly. The specific
one for this patch simply fails to respond to commands, then gives a
UNIT ATTENTION after being reset which returns ASC/ASCQ 28/00. This
is out of spec behaviour, but add a check in the eat CC/UA on reset
path to catch this case so the CD-ROM will function somewhat properly.
[jejb: fixed up white space and accepted without signoff]
Signed-off-by: James Bottomley <JBottomley@Parallels.com>
In error recovery, most scsi error recovery stages will send a TUR command
for every bad command when a driver's error handler reports success. When
several bad commands to the same device, this results in a device
being probed multiple times.
This becomes very problematic if the device or connection is in a state
where the device still doesn't respond to commands even after a recovery
function returns success. The error handler must wait for the test
commands to time out. The time waiting for the redundant commands can
drastically lengthen error recovery.
This patch alters the scsi mid-layer's error routines to send test commands
once per device instead of once per bad command. This can drastically
lower error recovery time.
[jejb: fixed up whitespace and formatting]
Signed-of-by: David Jeffery <djeffery@redhat.com>
Signed-off-by: James Bottomley <jbottomley@parallels.com>
At least log the message that we received a THIN PROVISIONING SOFT
THRESHOLD REACHED Unit Attention. Also added it to unit attention
decodes.
Signed-off-by: Shyam Iyer <shyam_iyer@dell.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This patch reduces the number of sequential pointer derefs in
drivers/scsi/scsi_error.c
This has been submitted a number of times over a couple of years. I
believe this version adresses all comments it has gathered over time.
Please apply or reject with a reason.
The benefits are:
- makes the code easier to read. Lots of sequential derefs of the same
pointers is not easy on the eye.
- theoretically at least, just dereferencing the pointers once can
allow the compiler to generally slightly faster code, so in theory
this could also be a micro speed optimization.
- reduces size of object file (tiny effect: on x86-64, in at least one
configuration, the text size decreased from 9439 bytes to 9400)
- removes some pointless (mostly trailing) whitespace.
Signed-off-by: Jesper Juhl <jj@chaosbits.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Instead of just passing 'EIO' for any I/O error we should be
notifying the upper layers with more details about the cause
of this error.
Update the possible I/O errors to:
- ENOLINK: Link failure between host and target
- EIO: Retryable I/O error
- EREMOTEIO: Non-retryable I/O error
- EBADE: I/O error restricted to the I_T_L nexus
'Retryable' in this context means that an I/O error _might_ be
restricted to the I_T_L nexus (vulgo: path), so retrying on another
nexus / path might succeed.
'Non-retryable' in general refers to a target failure, so this
error will always be generated regardless of the I_T_L nexus
it was send on.
I/O errors restricted to the I_T_L nexus might be retried
on another nexus / path, but they should _not_ be queued
if no paths are available.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Mike Snitzer <snitzer@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The current code in scsi_eh_target_reset() has an off by one error
that actually sends spurious extra resets. Since there's no real need
to reset the targets in numerical order, simply chunk up the command
recovery list doing target resets and pulling matching targets out of
the list (that also makes the loop O(N) instead of O(N^2).
[mike christie found and fixed a list_splice -> list_splice_init problem]
Reported-by: Hillf Danton<dhillf@gmail.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The error handler is using the test cmd->serial_number == 0 in the
abort routines to signal that the command to be aborted has already
completed normally. This design was to close a race window in the
original error handler where a command could go through the normal
completion routines after it timed out but before error handling was
started.
Mike Anderson pointed out that when we converted our timeout and
softirq completions, we picked up atomicity here because the block
layer now mediates this with the REQ_ATOM_COMPLETE flag and guarantees
that *either* the command times out or our done routine is called, but
ensures we can't get both occurring. That makes the serial number
zero check redundant and it can be removed.
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
Move the mid-layer's ->queuecommand() invocation from being locked
with the host lock to being unlocked to facilitate speeding up the
critical path for drivers who don't need this lock taken anyway.
The patch below presents a simple SCSI host lock push-down as an
equivalent transformation. No locking or other behavior should change
with this patch. All existing bugs and locking orders are preserved.
Additionally, add one parameter to queuecommand,
struct Scsi_Host *
and remove one parameter from queuecommand,
void (*done)(struct scsi_cmnd *)
Scsi_Host* is a convenient pointer that most host drivers need anyway,
and 'done' is redundant to struct scsi_cmnd->scsi_done.
Minimal code disturbance was attempted with this change. Most drivers
needed only two one-line modifications for their host lock push-down.
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Acked-by: James Bottomley <James.Bottomley@suse.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
REQ_HARDBARRIER is dead now, so remove the leftovers. What's left
at this point is:
- various checks inside the block layer.
- sanity checks in bio based drivers.
- now unused bio_empty_barrier helper.
- Xen blockfront use of BLKIF_OP_WRITE_BARRIER - it's dead for a while,
but Xen really needs to sort out it's barrier situaton.
- setting of ordered tags in uas - dead code copied from old scsi
drivers.
- scsi different retry for barriers - it's dead and should have been
removed when flushes were converted to FS requests.
- blktrace handling of barriers - removed. Someone who knows blktrace
better should add support for REQ_FLUSH and REQ_FUA, though.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (28 commits)
[SCSI] qla4xxx: fix compilation warning
[SCSI] make error handling more robust in the face of reservations
[SCSI] tgt: fix warning
[SCSI] drivers/message/fusion: Adjust confusing if indentation
[SCSI] Return NEEDS_RETRY for eh commands with status BUSY
[SCSI] ibmvfc: Driver version 1.0.9
[SCSI] ibmvfc: Fix terminate_rport_io
[SCSI] ibmvfc: Fix rport add/delete race resulting in oops
[SCSI] lpfc 8.3.16: Change LPFC driver version to 8.3.16
[SCSI] lpfc 8.3.16: FCoE Discovery and Failover Fixes
[SCSI] lpfc 8.3.16: SLI Additions, updates, and code cleanup
[SCSI] pm8001: introduce missing kfree
[SCSI] qla4xxx: Update driver version to 5.02.00-k3
[SCSI] qla4xxx: Added AER support for ISP82xx
[SCSI] qla4xxx: Handle outstanding mbx cmds on hung f/w scenarios
[SCSI] qla4xxx: updated mbx_sys_info struct to sync with FW 4.6.x
[SCSI] qla4xxx: clear AF_DPC_SCHEDULED flage when exit from do_dpc
[SCSI] qla4xxx: Stop firmware before doing init firmware.
[SCSI] qla4xxx: Use the correct request queue.
[SCSI] qla4xxx: set correct value in sess->recovery_tmo
...
commit 5f91bb050e
Author: Michael Reed <mdr@sgi.com>
Date: Mon Aug 10 11:59:28 2009 -0500
[SCSI] reservation conflict after timeout causes device to be taken offline
Flipped us from always returning failed to always returning success in
the name of fixing the problem where reservation conflict returns from
test unit ready cause the device always to be taken offline.
Unfortuantely, it also introduced a problem whereby for commands other
than test unit ready, the eh dispatcher thinks they succeeded when
reservation conflict is returned, whereas in reality they failed. Fix
this by only returning success for the test unit ready case.
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
When the transport is busy and we're sending an EH command drivers
occasionally return 'BUSY'. As this in most cases is the TUR
command sent as part of the error recovery this is a sure way
to make the error recovery escalate. Returning 'NEEDS_RETRY'
here will just retry the TUR command and eventually abort the
original command, thus making error handling far smoother.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
* 'for-2.6.36' of git://git.kernel.dk/linux-2.6-block: (149 commits)
block: make sure that REQ_* types are seen even with CONFIG_BLOCK=n
xen-blkfront: fix missing out label
blkdev: fix blkdev_issue_zeroout return value
block: update request stacking methods to support discards
block: fix missing export of blk_types.h
writeback: fix bad _bh spinlock nesting
drbd: revert "delay probes", feature is being re-implemented differently
drbd: Initialize all members of sync_conf to their defaults [Bugz 315]
drbd: Disable delay probes for the upcomming release
writeback: cleanup bdi_register
writeback: add new tracepoints
writeback: remove unnecessary init_timer call
writeback: optimize periodic bdi thread wakeups
writeback: prevent unnecessary bdi threads wakeups
writeback: move bdi threads exiting logic to the forker thread
writeback: restructure bdi forker loop a little
writeback: move last_active to bdi
writeback: do not remove bdi from bdi_list
writeback: simplify bdi code a little
writeback: do not lose wake-ups in bdi threads
...
Fixed up pretty trivial conflicts in drivers/block/virtio_blk.c and
drivers/scsi/scsi_error.c as per Jens.
scsi-ml uses REQ_TYPE_BLOCK_PC for flush requests from file
systems. The definition of REQ_TYPE_BLOCK_PC is that we don't retry
requests even when we can (e.g. UNIT ATTENTION) and we send the
response to the callers (then the callers can decide what they want).
We need a workaround such as the commit
77a4229719 to retry BLOCK_PC flush
requests. We will need the similar workaround for discard requests too
since SCSI-ml handle them as BLOCK_PC internally.
This uses REQ_TYPE_FS for flush requests from file systems instead of
REQ_TYPE_BLOCK_PC.
scsi-ml retries only REQ_TYPE_FS requests that have data to
transfer when we can retry them (e.g. UNIT_ATTENTION). However, we
also need to retry REQ_TYPE_FS requests without data because the
callers don't.
This also changes scsi_check_sense() to retry all the REQ_TYPE_FS
requests when appropriate. Thanks to scsi_noretry_cmd(),
REQ_TYPE_BLOCK_PC requests don't be retried as before.
Note that basically, this reverts the commit
77a4229719 since now we use REQ_TYPE_FS
for flush requests.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
Remove all the trivial wrappers for the cmd_type and cmd_flags fields in
struct requests. This allows much easier grepping for different request
types instead of unwinding through macros.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
This patch (as1398b) adds runtime PM support to the SCSI layer. Only
the machanism is provided; use of it is up to the various high-level
drivers, and the patch doesn't change any of them. Except for sg --
the patch expicitly prevents a device from being runtime-suspended
while its sg device file is open.
The implementation is simplistic. In general, hosts and targets are
automatically suspended when all their children are asleep, but for
them the runtime-suspend code doesn't actually do anything. (A host's
runtime PM status is propagated up the device tree, though, so a
runtime-PM-aware lower-level driver could power down the host adapter
hardware at the appropriate times.) There are comments indicating
where a transport class might be notified or some other hooks added.
LUNs are runtime-suspended by calling the drivers' existing suspend
handlers (and likewise for runtime-resume). Somewhat arbitrarily, the
implementation delays for 100 ms before suspending an eligible LUN.
This is because there typically are occasions during bootup when the
same device file is opened and closed several times in quick
succession.
The way this all works is that the SCSI core increments a device's
PM-usage count when it is registered. If a high-level driver does
nothing then the device will not be eligible for runtime-suspend
because of the elevated usage count. If a high-level driver wants to
use runtime PM then it can call scsi_autopm_put_device() in its probe
routine to decrement the usage count and scsi_autopm_get_device() in
its remove routine to restore the original count.
Hosts, targets, and LUNs are not suspended while they are being probed
or removed, or while the error handler is running. In fact, a fairly
large part of the patch consists of code to make sure that things
aren't suspended at such times.
[jejb: fix up compile issues in PM config variations]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
If the user accidentally changes LUN mappings or it occurs
due to a bug, then it can cause data corruption that can take
months and months to track down. This patch adds a log
message when getting REPORT_LUNS_DATA_CHANGED and it adds
a generic message for other Unit Attentions with asc == 0x3f.
We are working on adding support for handling of these errors,
but I think until then we should at least log a message so
tracking down problems as a result of one of these changes
is a little easier.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
There's nastyness in the way we currently handle barriers (and
discards): They're effectively filesystem commands, but they get
processed as BLOCK_PC commands. Unfortunately BLOCK_PC commands are
taken by SCSI to be SG_IO commands and the issuer expects to see and
handle any returned errors, however trivial. This leads to a huge
problem, because the block layer doesn't expect this to happen and any
trivially retryable error on a barrier causes an immediate I/O error
to the filesystem.
The only real way to hack around this is to take the usual class of
offending errors (unit attentions) and make them all retryable in the
case of a REQ_HARDBARRIER. A correct fix would involve a rework of
the entire block and SCSI submit system, and so is out of scope for a
quick fix.
Cc: Hannes Reinecke <hare@suse.de>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
If the scsi eh is running and then a FC LLD calls
fc_remote_port_delete, the SCSI commands sent from the eh will fail.
To prevent this, a FC LLD can call fc_block_scsi_eh from the eh
callback, blocking the eh thread until the dev_loss_tmo fires or the
remote port is available again.
If (e.g. for a multipathing setup) the dev_loss_tmo is set to a very
large value, thus preventing the scsi device removal , the scsi eh can
block for a long time. For multipathing, the fast_io_fail_tmo is then
set to a low value to detect path problems sooner.
This patch introduces a new return code FAST_IO_FAIL. The function
fc_block_scsi_eh now returns FAST_IO_FAIL when the fast_io_fail_tmo
fires. This indicates that the LLD terminated all pending I/O requests
and there are no more pending SCSI commands for the scsi eh to wait
for. This return code can be passed back to the scsi eh to stop the
escalation and finish the recovery process for this device.
Signed-off-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Current FC HBA queue_depth ramp up code depends on last queue
full time. The sdev already has last_queue_full_time field to
track last queue full time but stored value is truncated by
last four bits.
So this patch updates last_queue_full_time without truncating
last 4 bits to store full value and then updates its only
current usages in scsi_track_queue_full to ignore last four bits
to keep current usages same while also use this field
in added ramp up code.
Adds scsi_handle_queue_ramp_up to ramp up queue_depth on
successful completion of IO. The scsi_handle_queue_ramp_up will
do ramp up on all luns of a target, just same as ramp down done
on all luns on a target.
The ramp up is skipped in case the change_queue_depth is not
supported by LLD or already reached to added max_queue_depth.
Updates added max_queue_depth on every new update to default
queue_depth value.
The ramp up is also skipped if lapsed time since either last
queue ramp up or down is less than LLD specified
queue_ramp_up_period.
Adds queue_ramp_up_period to sysfs but only if change_queue_depth
is supported since ramp up and queue_ramp_up_period is needed only
in case change_queue_depth is supported first.
Initializes queue_ramp_up_period to 120HZ jiffies as initial
default value, it is same as used in existing lpfc and qla2xxx.
-v2
Combined all ramp code into this single patch.
-v3
Moves max_queue_depth initialization after slave_configure is
called from after slave_alloc calling done. Also adjusted
max_queue_depth check to skip ramp up if current queue_depth
is >= max_queue_depth.
-v4
Changes sdev->queue_ramp_up_period unit to ms when using sysfs i/f
to store or show its value.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Tested-by: Christof Schmitt <christof.schmitt@de.ibm.com>
Tested-by: Giridhar Malavali <giridhar.malavali@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
This has scsi-ml call the change_queue_depth functions when
we get a QUEUE_FULL. It will only change the queue depth if
change_queue_depth is set because the LLD may have to
modify some internal resources, so I thought this would
be the safest route.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
-v2
Limits change_queue_depth to only all luns of target by adding
channel check while iterating for all luns of Scsi_Host. This is
same as currently qla2xxx FC HBA does on QUEUE_FULL event.
Signed-off-by: Vasu Dev <vasu.dev@intel.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
A target reset when I/O is ongoing might result
an eventual device offline, as scsi_eh_completed_normally()
might return ADD_TO_MLQUEUE in addition to the
advertised SUCCESS, FAILED, and NEEDS_RETRY.
Which is unfortunate as scsi_send_eh_cmnd() will
therefore map ADD_TO_MLQUEUE to FAILED instead of
the more appropriate NEEDS_RETRY.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Cc: Stable Tree <stable@kernel.org>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
An IBM tape drive failed to complete a PERSISTENT RESERVE IN within the scsi
cmd timeout. Error recovery was initiated and it sequenced from abort through
taking the tape drive offline.
The device was taken offline because it repeatedly responded to the TUR command
issued by error recovery with a RESERVATION CONFLICT status. The tape drive
was reserved to another system. This is perfectly legitimate response to TUR,
and is one that an escalation of recovery is unlikely to clear. Further,
escalation of recovery can have undesirable side effects on the operation of
tape drives shared with other initiators.
Instead of escalating recovery, error recovery should treat the RESERVATION
CONFLICT response to the TUR as a good status, giving the issuer of the
command the opportunity to handle the timeout and reservation conflict.
Signed-off-by: Michael reed <mdr@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@suse.de>
The Documentation is incorrect (we removed some functions referred to), and
none of the bug warnings now apply. Additionally remove the spurious check on
the return from blk_get_request() which can't fail if __GFP_WAIT is passed in.
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
No one uses scsi_execute_async with data transfer now. We can remove
scsi_req_map_sg.
Only scsi_eh_lock_door uses scsi_execute_async. scsi_eh_lock_door
doesn't handle sense and the callback. So we can remove
scsi_io_context too.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6: (45 commits)
[SCSI] qla2xxx: Update version number to 8.03.00-k1.
[SCSI] qla2xxx: Add ISP81XX support.
[SCSI] qla2xxx: Use proper request/response queues with MQ instantiations.
[SCSI] qla2xxx: Correct MQ-chain information retrieval during a firmware dump.
[SCSI] qla2xxx: Collapse EFT/FCE copy procedures during a firmware dump.
[SCSI] qla2xxx: Don't pollute kernel logs with ZIO/RIO status messages.
[SCSI] qla2xxx: Don't fallback to interrupt-polling during re-initialization with MSI-X enabled.
[SCSI] qla2xxx: Remove support for reading/writing HW-event-log.
[SCSI] cxgb3i: add missing include
[SCSI] scsi_lib: fix DID_RESET status problems
[SCSI] fc transport: restore missing dev_loss_tmo callback to LLDD
[SCSI] aha152x_cs: Fix regression that keeps driver from using shared interrupts
[SCSI] sd: Correctly handle 6-byte commands with DIX
[SCSI] sd: DIF: Fix tagging on platforms with signed char
[SCSI] sd: DIF: Show app tag on error
[SCSI] Fix error handling for DIF/DIX
[SCSI] scsi_lib: don't decrement busy counters when inserting commands
[SCSI] libsas: fix test for negative unsigned and typos
[SCSI] a2091, gvp11: kill warn_unused_result warnings
[SCSI] fusion: Move a dereference below a NULL test
...
Fixed up trivial conflict due to moving the async part of sd_probe
around in the async probes vs using dev_set_name() in naming.
Make sure the control flow in scsi_times_out makes sense.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch improves handling of TASK ABORTED status by Linux SCSI
mid-layer. Currently, command returned with this status considered
failed and returned to upper layers. It leads to additional error
recovery load on file systems and block layer, which sometimes can
cause undesired side effects, like I/O errors and file systems
corruptions. See http://lkml.org/lkml/2008/11/1/38, for instance.
From other side, TASK ABORTED status is returned by SCSI target if the
corresponding command was aborted by another initiator and the target
has TAS bit set in the control mode page. So, in the majority of cases
commands with TASK ABORTED status should be simply retried. In other
cases, maybe_retry path will not retry if no retries are allowed.
This patch implement suggestion by James Bottomley from
http://marc.info/?l=linux-scsi&m=121932916906009&w=2.
Signed-off-by: Vladislav Bolkhovitin <vst@vlnb.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
...and the list of recent breakage goes on and on, this time
it's 242f9dcb8b (block: unify request timeout handling)
which broke it.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
scsi_eh_try_stu() was still using the timeout parameter in the device
which is now not set (i.e. zero filled) meaning that it waited no time
at all for the start unit command to complete (leading the routine to
conclude failure every time). This lead to a 2.6.27 regression:
http://bugzilla.kernel.org/show_bug.cgi?id=12120
Where firewire devices that were non spec compliant wouldn't spin up.
Fix this by using the block queue timeout value instead.
Reported-by: Stefan Richter <stefanr@s5r6.in-berlin.de>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Drivers want to be able to return DID_TRANSPORT_DISRUPTED and
have it do the right thing for commands like tape and passthrouh
as far as retries go. The LLDs previously used DID_BUS_BUSY or DID_ERROR
which followed the cmd->retries limit, but DID_TRANSPORT_DISRUPTED
was skipping that check so it could have caused a problem with tape
commands.
This patch has DID_TRANSPORT_DISRUPTED check the cmd->retries/cmd->allowed.
Signed-off-by: Mike Christie <mchristi@redhat.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
There's a target reset bug.
This loop:
for (id = 0; id <= shost->max_id; id++) {
Never terminates if shost->max_id is set to ~0, like aic94xx does.
It's also pretty inefficient since you mostly have compact target
numbers, but the max_id can be very high. The best way would be to
sort the recovery list by target id and skip them if they're equal,
but even a worst case O(N^2) traversal is probably OK here, so fix it
by finding the next highest target number (assuming n+1) and
terminating when there isn't one.
Cc: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This checks the errors the scsi-ml determined were retryable
and returns if we should fast fail it based on the request
fail fast flags.
Without the patch, drivers like lpfc, qla2xxx and fcoe would return
DID_ERROR for what it determines is a temporary communication problem.
There is no loss of connectivity at that time and the driver thinks
that it would be fast to retry at the driver level. SCSI-ml will however
sees fast fail on the request and DID_ERROR and will fast fail the io.
This will then cause dm-multipath to fail the path and possibley switch
target controllers when we should be retrying at the scsi layer.
We also were fast failing device errors to dm multiapth when
unless the scsi_dh modules think otherwis we want to retry at
the scsi layer because multipath can only retry the IO like scsi
should have done. multipath is a little dumber though because it
does not what the error was for and assumes that it should fail
the paths.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Currently, if there is a transport problem the iscsi drivers will return
outstanding commands (commands being exeucted by the driver/fw/hw) with
DID_BUS_BUSY and block the session so no new commands can be queued.
Commands that are caught between the failure handling and blocking are
failed with DID_IMM_RETRY or one of the scsi ml queuecommand return values.
When the recovery_timeout fires, the iscsi drivers then fail IO with
DID_NO_CONNECT.
For fcp, some drivers will fail some outstanding IO (disk but possibly not
tape) with DID_BUS_BUSY or DID_ERROR or some other value that causes a retry
and hits the scsi_error.c failfast check, block the rport, and commands
caught in the race are failed with DID_IMM_RETRY. Other drivers, may
hold onto all IO and wait for the terminate_rport_io or dev_loss_tmo_callbk
to be called.
The following patches attempt to unify what upper layers will see drivers
like multipath can make a good guess. This relies on drivers being
hooked into their transport class.
This first patch just defines two new host byte errors so drivers can
return the same value for when a rport/session is blocked and for
when the fast_io_fail_tmo fires.
The idea is that if the LLD/class detects a problem and is going to block
a rport/session, then if the LLD wants or must return the command to scsi-ml,
then it can return it with DID_TRANSPORT_DISRUPTED. This will requeue
the IO into the same scsi queue it came from, until the fast io fail timer
fires and the class decides what to do.
When using multipath and the fast_io_fail_tmo fires then the class
can fail commands with DID_TRANSPORT_FAILFAST or drivers can use
DID_TRANSPORT_FAILFAST in their terminate_rport_io callbacks or
the equivlent in iscsi if we ever implement more advanced recovery methods.
A LLD, like lpfc, could continue to return DID_ERROR and then it will hit
the normal failfast path, so drivers do not have fully be ported to
work better. The point of the patches is that upper layers will
not see a failure that could be recovered from while the rport/session is
blocked until fast_io_fail_tmo/recovery_timeout fires.
V3
Remove some comments.
V2
Fixed patch/diff errors and renamed DID_TRANSPORT_BLOCKED to
DID_TRANSPORT_DISRUPTED.
V1
initial patch.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Right now SCSI and others do their own command timeout handling.
Move those bits to the block layer.
Instead of having a timer per command, we try to be a bit more clever
and simply have one per-queue. This avoids the overhead of having to
tear down and setup a timer for each command, so it will result in a lot
less timer fiddling.
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
Change scsi_check_sense HARDWARE_ERROR check to return ADD_TO_MLQUEUE
if device->retry_hwerror is set to allow retries to occur without
restriction of blk_noretry_request check.
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
[jejb: fixed up a ton of missed conversions.
All of you are on notice this has happened, driver trees will now
need to be rebased]
Signed-off-by: Harvey Harrison <harvey.harrison@gmail.com>
Cc: SCSI List <linux-scsi@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This patch (as1116) fixes a bug in scsi_eh_prep_cmnd() and
scsi_eh_restore_cmnd(). These routines are supposed to save any
values they change and restore them later, but someone forgot to
save & restore scmd->underflow.
This fixes part of the problem reported in Bugzilla #9638.
[jejb: fix up rejections around DIF/DIX]
Signed-off-by: Alan Stern <stern@rowland.harvard.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
If initiator or target reject the I/O due to DIF errors there is no
point in retrying.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Controllers that support DMA of protection information must be told
explicitly how to handle the I/O. The controller has no knowledge of
the protection capabilities of the target device so this information
must be passed in the scsi_cmnd.
- The protection operation tells the HBA whether to generate, strip or
verify protection information.
- The protection type tells the HBA which layout the target is
formatted with. This is necessary because the controller must be
able to correctly interpret the included protection information in
order to verify it.
- When a scsi_cmnd is reused for error handling the protection
operation must be cleared and saved while error handling is in
progress.
- prot_op and prot_type are placed in an existing hole in scsi_cmnd
and don't cause the structure to grow.
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Some of the storage devices (that can be accessed through multiple paths),
do need some special handling for
1. Activating the passive path of the storage access.
2. Decode and handle the special sense codes returned by the devices.
3. Handle the I/Os being sent to the passive path, especially
during the device probe time.
when accessed through multiple paths.
As of today this special device handling is done at the dm-multipath
layer using dm-handlers. That works well for (1); for (2) to be handled
at dm layer, scsi sense information need to be exported from SCSI to dm-layer,
which is not very attractive; (3) cannot be done at all at the dm layer.
Device handler has been moved to SCSI mainly to handle (2) and (3) properly.
Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com>
Signed-off-by: Mike Anderson <andmike@linux.vnet.ibm.com>
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
* git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi-misc-2.6:
[SCSI] aic94xx: fix section mismatch
[SCSI] u14-34f: Fix 32bit only problem
[SCSI] dpt_i2o: sysfs code
[SCSI] dpt_i2o: 64 bit support
[SCSI] dpt_i2o: move from virt_to_bus/bus_to_virt to dma_alloc_coherent
[SCSI] dpt_i2o: use standard __init / __exit code
[SCSI] megaraid_sas: fix suspend/resume sections
[SCSI] aacraid: Add Power Management support
[SCSI] aacraid: Fix jbod operations scan issues
[SCSI] aacraid: Fix warning about macro side-effects
[SCSI] add support for variable length extended commands
[SCSI] Let scsi_cmnd->cmnd use request->cmd buffer
[SCSI] bsg: add large command support
[SCSI] aacraid: Fix down_interruptible() to check the return value correctly
[SCSI] megaraid_sas; Update the Version and Changelog
[SCSI] ibmvscsi: Handle non SCSI error status
[SCSI] bug fix for free list handling
[SCSI] ipr: Rename ipr's state scsi host attribute to prevent collisions
[SCSI] megaraid_mbox: fix Dell CERC firmware problem
- struct scsi_cmnd had a 16 bytes command buffer of its own.
This is an unnecessary duplication and copy of request's
cmd. It is probably left overs from the time that scsi_cmnd
could function without a request attached. So clean that up.
- Once above is done, few places, apart from scsi-ml, needed
adjustments due to changing the data type of scsi_cmnd->cmnd.
- Lots of drivers still use MAX_COMMAND_SIZE. So I have left
that #define but equate it to BLK_MAX_CDB. The way I see it
and is reflected in the patch below is.
MAX_COMMAND_SIZE - means: The longest fixed-length (*) SCSI CDB
as per the SCSI standard and is not related
to the implementation.
BLK_MAX_CDB. - The allocated space at the request level
- I have audit all ISA drivers and made sure none use ->cmnd in a DMA
Operation. Same audit was done by Andi Kleen.
(*)fixed-length here means commands that their size can be determined
by their opcode and the CDB does not carry a length specifier, (unlike
the VARIABLE_LENGTH_CMD(0x7f) command). This is actually not exactly
true and the SCSI standard also defines extended commands and
vendor specific commands that can be bigger than 16 bytes. The kernel
will support these using the same infrastructure used for VARLEN CDB's.
So in effect MAX_COMMAND_SIZE means the maximum size command
scsi-ml supports without specifying a cmd_len by ULD's
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Any path needs to call it to initialize the request.
This is a preparation for large command support, which needs to
initialize the request in a proper way (that is, just doing a memset()
will not work).
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Cc: Jens Axboe <jens.axboe@oracle.com>
Signed-off-by: Jens Axboe <jens.axboe@oracle.com>
This adds scsi_build_sense_buffer, a simple helper function to build
sense data in a buffer.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
The problem is that serveral drivers are sending a target reset from the
device reset handler, and if we have multiple devices a target reset gets
sent for each device when only one would be sufficient. And if we do a target
reset it affects all the commands on the target so the device reset handler
code only cleaning up one devices's commands makes programming the driver a
little more difficult than it should be.
This patch adds a target reset handler, which drivers can use to send
a target reset. If successful it cleans up the commands for a devices
accessed through that starget.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
At the block level bidi request uses req->next_rq pointer for a second
bidi_read request.
At Scsi-midlayer a second scsi_data_buffer structure is used for the
bidi_read part. This bidi scsi_data_buffer is put on
request->next_rq->special. Struct scsi_cmnd is not changed.
- Define scsi_bidi_cmnd() to return true if it is a bidi request and a
second sgtable was allocated.
- Define scsi_in()/scsi_out() to return the in or out scsi_data_buffer
from this command This API is to isolate users from the mechanics of
bidi.
- Define scsi_end_bidi_request() to do what scsi_end_request() does but
for a bidi request. This is necessary because bidi commands are a bit
tricky here. (See comments in body)
- scsi_release_buffers() will also release the bidi_read scsi_data_buffer
- scsi_io_completion() on bidi commands will now call
scsi_end_bidi_request() and return.
- The previous work done in scsi_init_io() is now done in a new
scsi_init_sgtable() (which is 99% identical to old scsi_init_io())
The new scsi_init_io() will call the above twice if needed also for
the bidi_read command. Only at this point is a command bidi.
- In scsi_error.c at scsi_eh_prep/restore_cmnd() make sure bidi-lld is not
confused by a get-sense command that looks like bidi. This is done
by puting NULL at request->next_rq, and restoring.
[jejb: update to sg_table and resolve conflicts
also update to blk-end-request and resolve conflicts]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
In preparation for bidi we abstract all IO members of scsi_cmnd,
that will need to duplicate, into a substructure.
- Group all IO members of scsi_cmnd into a scsi_data_buffer
structure.
- Adjust accessors to new members.
- scsi_{alloc,free}_sgtable receive a scsi_data_buffer instead of
scsi_cmnd. And work on it.
- Adjust scsi_init_io() and scsi_release_buffers() for above
change.
- Fix other parts of scsi_lib/scsi.c to members migration. Use
accessors where appropriate.
- fix Documentation about scsi_cmnd in scsi_host.h
- scsi_error.c
* Changed needed members of struct scsi_eh_save.
* Careful considerations in scsi_eh_prep/restore_cmnd.
- sd.c and sr.c
* sd and sr would adjust IO size to align on device's block
size so code needs to change once we move to scsi_data_buff
implementation.
* Convert code to use scsi_for_each_sg
* Use data accessors where appropriate.
- tgt: convert libsrp to use scsi_data_buffer
- isd200: This driver still bangs on scsi_cmnd IO members,
so need changing
[jejb: rebased on top of sg_table patches fixed up conflicts
and used the synergy to eliminate use_sg and sg_count]
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This replaces sizeof sense_buffer with SCSI_SENSE_BUFFERSIZE in
several LLDs. It's a preparation for the future changes to remove
sense_buffer array in scsi_cmnd structure.
Signed-off-by: FUJITA Tomonori <fujita.tomonori@lab.ntt.co.jp>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
- Change title to remove "Mid-Layer" since the doc is about all of the
SCSI layers.
- Use "SCSI" instead of "scsi" in docbook text.
- Use "*/" to end kernel-doc notation blocks.
- A few other minor typo fixes.
Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Add Documentation/DocBook/scsi_midlayer.tmpl, add to Makefile, and update
lots of kerneldoc comments in drivers/scsi/*.
Updated with comments from Stefan Richter, Stephen M. Cameron,
James Bottomley and Randy Dunlap.
Signed-off-by: Rob Landley <rob@landley.net>
Signed-off-by: James Bottomley <James.Bottomley@HansenPartnership.com>
This reverts commit ac40532ef0, which gets
us back the original cleanup of 6f5391c283.
It turns out that the bug that was triggered by that commit was
apparently not actually triggered by that commit at all, and just the
testing conditions had changed enough to make it appear to be due to it.
The real problem seems to have been found by Peter Osterlund:
"pktcdvd sets it [block device size] when opening the /dev/pktcdvd
device, but when the drive is later opened as /dev/scd0, there is
nothing that sets it back. (Btw, 40944 is possible if the disk is a
CDRW that was formatted with "cdrwtool -m 10236".)
The problem is that pktcdvd opens the cd device in non-blocking mode
when pktsetup is run, and doesn't close it again until pktsetup -d is
run. The effect is that if you meanwhile open the cd device,
blkdev.c:do_open() doesn't call bd_set_size() because
bdev->bd_openers is non-zero."
In particular, to repeat the bug (regardless of whether commit
6f5391c283 is applied or not):
" 1. Start with an empty drive.
2. pktsetup 0 /dev/scd0
3. Insert a CD containing an isofs filesystem.
4. mount /dev/pktcdvd/0 /mnt/tmp
5. umount /mnt/tmp
6. Press the eject button.
7. Insert a DVD containing a non-writable filesystem.
8. mount /dev/scd0 /mnt/tmp
9. find /mnt/tmp -type f -print0 | xargs -0 sha1sum >/dev/null
10. If the DVD contains data beyond the physical size of a CD, you
get I/O errors in the terminal, and dmesg reports lots of
"attempt to access beyond end of device" errors."
which in turn is because the nested open after the media change won't
cause the size to be set properly (because the original open still holds
the block device, and we only do the bd_set_size() when we don't have
other people holding the device open).
The proper fix for that is probably to just do something like
bdev->bd_inode->i_size = (loff_t)get_capacity(disk)<<9;
in fs/block_dev.c:do_open() even for the cases where we're not the
original opener (but *not* call bd_set_size(), since that will also
change the block size of the device).
Cc: Peter Osterlund <petero2@telia.com>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Cc: Matthew Wilcox <matthew@wil.cx>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This reverts commit 6f5391c283 ("[SCSI]
Get rid of scsi_cmnd->done") that was supposed to be a cleanup commit,
but apparently it causes regressions:
Bug 9370 - v2.6.24-rc2-409-g9418d5d: attempt to access beyond end of device
http://bugzilla.kernel.org/show_bug.cgi?id=9370
this patch should be reintroduced in a more split-up form to make
testing of it easier.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Matthew Wilcox <matthew@wil.cx>
Cc: James Bottomley <James.Bottomley@HansenPartnership.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Spotted by Paul Jackson <pj@sgi.com>
The error handler rework moved the scatterlist into a globally exposed
structure in scsi_eh.h; unfortunately, the scatterlist include needs
to move from scsi_error.c to scsi_eh.h to allow this to compile
universally.
Acked-by: Paul Jackson <pj@sgi.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
- Drivers/transports that want to send a synchronous REQUEST_SENSE command
as part of their .queuecommand sequence, have 2 new API's that facilitate
in doing so and abstract them from scsi-ml internals.
void scsi_eh_prep_cmnd(struct scsi_cmnd *scmd,
struct scsi_eh_save *sesci, unsigned char *cmnd,
int cmnd_size, int sense_bytes)
Will hijack a command and prepare it for request sense if needed.
And will save any later needed info into a scsi_eh_save structure.
void scsi_eh_restore_cmnd(struct scsi_cmnd* scmd,
struct scsi_eh_save *sesci);
Will undo any changes done to a command by above function. Making
it ready for completion.
- Re-factor scsi_send_eh_cmnd() to use above APIs
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
- regrouped variables for easier reviewing of next patch
- Support of cmnd==NULL in call to scsi_send_eh_cmnd()
- In the @sense_bytes case set transfer size to the minimum
size of sense_buffer and passed @sense_bytes. cmnd[4] is
set accordingly.
- REQUEST_SENSE is set into cmnd[0] so if @sense_bytes is
not Zero passed @cmnd should be NULL.
- Also save/restore resid of failed command.
- Adjust caller
Signed-off-by: Boaz Harrosh <bharrosh@panasas.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The ULD ->done callback moves into the scsi_driver. By moving the call
to scsi_io_completion() from scsi_blk_pc_done() to scsi_finish_command(),
we can eliminate the latter entirely. By returning 'good_bytes' from
the ->done callback (rather than invoking scsi_io_completion()), we can
stop exporting scsi_io_completion().
Also move the prototypes from sd.h to sd.c as they're all internal anyway.
Rename sd_rw_intr to sd_done and rw_intr to sr_done.
Inspired-by: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The pid field is a duplicate of the serial_number field and has been
scheduled for removal for a long time. A few drivers were still using
it, so just change them to use serial_number instead.
Signed-off-by: Matthew Wilcox <willy@linux.intel.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The current code prints:
scsi 13:0:4:0: scsi: Device offlined - not ready after error recovery
which is repetitively redundant. This patch changes that message to:
scsi 6:0:6:0: Device offlined - not ready after error recovery
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Every file should #include the headers containing the prototypes for its
global functions (in this case for scsi_schedule_eh()).
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Currently, the freezer treats all tasks as freezable, except for the kernel
threads that explicitly set the PF_NOFREEZE flag for themselves. This
approach is problematic, since it requires every kernel thread to either
set PF_NOFREEZE explicitly, or call try_to_freeze(), even if it doesn't
care for the freezing of tasks at all.
It seems better to only require the kernel threads that want to or need to
be frozen to use some freezer-related code and to remove any
freezer-related code from the other (nonfreezable) kernel threads, which is
done in this patch.
The patch causes all kernel threads to be nonfreezable by default (ie. to
have PF_NOFREEZE set by default) and introduces the set_freezable()
function that should be called by the freezable kernel threads in order to
unset PF_NOFREEZE. It also makes all of the currently freezable kernel
threads call set_freezable(), so it shouldn't cause any (intentional)
change of behaviour to appear. Additionally, it updates documentation to
describe the freezing of tasks more accurately.
[akpm@linux-foundation.org: build fixes]
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Acked-by: Nigel Cunningham <nigel@nigel.suspend2.net>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Oleg Nesterov <oleg@tv-sign.ru>
Cc: Gautham R Shenoy <ego@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Now that the block submission path correctly bounces, we can simply
use the command sense_buffer to send to retrieve sense information and
junk the unnecessary page allocation.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Use the sysfs configurable timeout when issuing a START_UNIT
command from the scsi error handler. This is needed for devices which
take longer than thirty seconds to respond to the start
unit. The problem was observed when sending a start unit
to a disk array device in an ipr RAID adapter, which results
in the adapter firmware sending potentially multiple commands
to physical devices as a result of this command, which ended
up timing out sometimes. This patch does not change the default
value used for this command.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Currently, the scsi error handler will issue a START_UNIT
command if the drive indicates it needs its motor started
and the allow_restart flag is set in the scsi_device. If,
after the scsi error handler invokes a host adapter reset
due to error recovery, a device is in a unit attention
state AND also needs a START_UNIT, that device will be placed
offline. The disk array devices on an ipr RAID adapter
will do exactly this when in a dual initiator configuration.
This patch adds a single retry to the EH initiated
START_UNIT.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Patch modified and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This fixes a regression caused by commit:
2dc611de5a
The sense buffer code in scsi_send_eh_cmnd was changed to use
alloc_page() and a scatter list, but the sense data copy was not
updated to match so what we actually get in the sense buffer is total
grabage starting with the kernel address of the struct page we got.
Basically the stack frame of scsi_send_eh_cmd() is what ends up
in the sense buffer.
Depending upon how pointers look on a given platform, you can
end up getting sr_ioctl.c errors when you mount a cdrom. If
the CDROM gives a check condition for GPCMD_GET_CONFIGURATION issued
by drivers/cdrom/cdrom.c:cdrom_mmc_profile(), sr_ioctl will
spit out this error message in sr_do_ioctl() with the way pointers
are on sparc64:
default:
printk(KERN_ERR "%s: CDROM (ioctl) error, command: ", cd->cdi.name);
__scsi_print_command(cgc->cmd);
scsi_print_sense_hdr("sr", &sshdr);
err = -EIO;
This is the error Tom Callaway reported in:
http://marc.info/?l=linux-sparc&m=117407453208101&w=2
Anyways, fix this by using page_address(sgl.page) which is OK
because we know this is low-mem due to GFP_ATOMIC.
Signed-off-by: David S. Miller <davem@davemloft.net>
Acked-by: Christoph Hellwig <hch@lst.de>
It looks like megaraid_sas at least needs this to throttle its commands
as they begin to time out. The code keeps the existing transport
template use of eh_timed_out (and allows the transport to override the
host if they both have this callback).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
If an EH command times out today, the LLDD's abort handler
will be called to abort the command. It is assumed that this
completes successfully, which can result in the command getting
completed later resulting in an oops. Improve the current
implementation by escalating all the way to host reset if
necessary in order to clean up the EH command.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
1) If the device reports an uncorrectable MEDIUM ERROR, such
as SK MEDIUM ERROR, ASC UNRECOVERED READ ERR, AMNF DATA
FIELD or RECORD NOT FOUND, then: In scsi_check_sense()
return SUCCESS so as to not retry -- the error is
uncorrectable -- this speeds up total processing time.
Signed-off-by: Luben Tuikov <ltuikov@yahoo.com>
Extracted the MEDIUM ERROR piece and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Export a couple of functions from scsi_error that are needed to handle
failed SCSI commands from the SAS EH.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
make exports GPL and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_send_eh_cmnd is the last user of non-sg commands currently.
This patch switches it to a one-element SG list. Also updates the
kerneldoc comment for scsi_send_eh_cmnd to reflect reality while we're
at it.
Test on my mptsas card, but this should get testing with as many
drivers as possible.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The callers of scsi_send_eh_cmnd are setting the cmnd buffer,
and then scsi_send_eh_cmnd is copying that updated buffer to
the old_cmnd variable. Then after the command runs, we end up
copying that old_cmnd var which has the new cmnd to the scsi
command buffer. When this command gets recent, all types of fun
things happen like getting TUR or START_STOP commands with
data and scatterlists.
This patch made against scsi-rc-fixes, has the callers of
scsi_send_eh_cmnd pass in the command so scsi_send_eh_cmnd
can do the right thing. This should go into 2.6.18 since this
fixes a regression added when we removed some of the scsi_cmnd
fields and replaced them with local variables.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Currently struct scsi_cmnd has various fields that are used to backup
original data after the corresponding fields have been overridden for
EH commands. This means drivers can easily get at it and misuse it.
Due to the old_ naming this doesn't happen for most of them, but two
that have different names have been used wrong a lot (see previous
patch). Another downside is that they unessecarily bloat the scsi_cmnd
size.
This patch moves them onstack in scsi_send_eh_cmnd to fix those two
issues aswell as allowing future EH fixes like moving the EH command
submissions to use SG lists like everything else.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The scsi midlayer portion of the patch
Signed-off-by: James Smart <James.Smart@emulex.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The RQ_SCSI_* flags are a vestiage of a long past history. The EH code
still sets them but we never make use of that information. The other
users is pluto.c which never had a chance to work but needs to be kept
compiling to keep Davem happy, so copy over the definition there.
We could probably get rid of RQ_ACTIVE/RQ_INACTIVE aswell with some
work, there's only two more or less bogus looking uses in ubd and scsi.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
With Achim patch the last user (gdth) is switched away from scsi_request
so we an kill it now. Also disables some code in i2o_scsi that was
broken since the sg driver stopped using scsi_requests.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
libata implemented a feature to schedule EH without an associated EH
by manipulating shost->host_eh_scheduled in ata_scsi_schedule_eh()
directly. Move this function to scsi_error.c and rename it to
scsi_schedule_eh(). It is now an exported API for SCSI transports and
exported via new header file drivers/scsi/scsi_transport_api.h
This patch also de-export scsi_eh_wakeup() which was exported
specifically for ata_scsi_schedule_eh().
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
libata needs to invoke EH without scmd. This patch adds
shost->host_eh_scheduled to implement such behavior.
Currently the only user of this feature is libata and no general
interface is defined. This patch simply adds handling for
host_eh_scheduled where needed and exports scsi_eh_wakeup() to
modules. The rest is upto libata. This is the result of the
following discussion.
http://thread.gmane.org/gmane.linux.scsi/23853/focus=9760
In short, SCSI host is not supposed to know about exceptions unrelated
to specific device or command. Such exceptions should be handled by
transport layer proper. However, the distinction is not essential to
ATA and libata is planning to depart from SCSI, so, for the time
being, libata will be using SCSI EH to handle such exceptions.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Overriding the whole EH code is a per-transport, not per-host thing.
Move ->eh_strategy_handler to the transport class, same as
->eh_timed_out.
Downside is that scsi_host_alloc can't check for the total lack of EH
anymore, but the transition period from old EH where we needed it is
long gone already.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jeff Garzik <jeff@garzik.org>
This moves the eh_timed_out functionality from the scsi_host_template
to the transport_template. Given that this is now a transport function,
the EH_RESET_TIMER case no longer caps the timer reschedulings. The
transport guarantees that this is not an infinite condition.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fix up an off by one error in calculating retries for scsi
commands. This bug was discovered when an SG_IO request
was sent to scsi core with retries = 0, causing the overall
timeout check to go off in scsi_softirq_done.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Export two SCSI EH command handling functions. To be used by libata EH.
Signed-off-by: Tejun Heo <htejun@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
When the scsi_execute_async interface was added it ended up reducing
the flexibility of userspace to send arbitrary scsi commands through
sg using SG_IO. The SG_IO interface allows userspace to specify the
CDB length. This is now ignored in scsi_execute_async and it is
guessed using the COMMAND_SIZE macro, which is not always correct,
particularly for vendor specific commands. This patch adds a cmd_len
parameter to the scsi_execute_async interface to allow the caller
to specify the length of the CDB.
Signed-off-by: Brian King <brking@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This merge is pretty extensive. The conflict is over the new
req->retries parameter, so I had to change the prototype to
scsi_setup_blk_pc_cmnd() and the usage in sd, sr and st.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add scsi helpers to create really-large-requests and convert
scsi-ml to scsi_execute_async().
Per Jens's previous comments, I placed this function in scsi_lib.c.
I made it follow all the queue's limits - I think I did at least :), so
I removed the warning on the function header.
I think the scsi_execute_* functions should eventually take a request_queue
and be placed some place where the dm-multipath hw_handler can use them
if that failover code is going to stay in the kernel. That conversion
patch will be sent in another mail though.
Signed-off-by: Mike Christie <michaelc@cs.wisc.edu>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The eh_action semaphore in scsi_eh_send_command is cleared after a
command timeout. The command is subsequently aborted and the abort
will try to call scsi_done() on it. Unfortunately, the scsi_eh_done()
routine unconditinally completes the semaphore (which is now null).
Fix this race by makiong the scsi_eh_done() routine check that the
semaphore is non null before completing it (mirroring the ordinary
command done/timeout logic).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
scsi_send_eh_cmnd currently uses a semaphore and an overload of eh_timer
to either get a completion for a command for a timeout.
Switch to using a completion and wait_for_completion_timeout to simply
the code and not having to deal with the races ourselves.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
now that the abuse in qla2xxx is gone this field can be remove.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
adjust comments, remove a useless cast and remove a write-only variable.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>