libsas: Don't BUG when connecting two expanders via wide port
When a device is connected to an expander, the discovery process goes through
sas_ex_discover_dev to figure out what's attached to the phy. If it is the
case that the phy being discovered happens to be the second phy of a wide link
to an expander, that discover_dev function will incorrectly call
sas_ex_discover_expander, which creates another sas_port and tries to attach the
other sas_phys to the new port, thus triggering a BUG. The correct thing to do is
to check the other ex_phys of the expander to see if there's a sas_port for this
sas_phy, and attach the sas_phy to the existing sas_port.
This is easily triggered if one enables the phys of a wide port between
expanders one by one.
This second version of the patch fixes a small regression in the case where
all the phys show up at once and we accidentally try to attach to a port
that hasn't been created yet.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
On Thu, 1 Feb 2007, Andrew Morton wrote:
> On Thu, 1 Feb 2007 15:34:29 -0800
> bugme-daemon@bugzilla.kernel.org wrote:
>
> > http://bugzilla.kernel.org/show_bug.cgi?id=7919
> >
> > Summary: Tape dies if wrong block size used
> > Kernel Version: 2.6.20-rc5
> > Status: NEW
> > Severity: normal
> > Owner: scsi_drivers-other@kernel-bugs.osdl.org
> > Submitter: dmartin@sccd.ctc.edu
> >
> >
> > Most recent kernel where this bug did *NOT* occur: 2.6.17.14
> >
> > Other Kernels Tested and Results:
> >
> > OK 2.6.15.7
> > OK 2.6.16.37
> > OK 2.6.17.14
> > BAD 2.6.18.6
> > BAD 2.6.18-1.2869.fc6
> > BAD 2.6.19.2 +
> > BAD 2.6.20-rc5
> >
> > NOTE: 2.6.18-1.2869.fc6 is a Fedora modified kernel, all others are from kernel.org
> >
...
> > Steps to reproduce:
> > Get a Adaptec AHA-2940U/UW/D / AIC-7881U card and a tape drive,
> > install a recent kernel
> > set the tape block size - mt setblk 4096
> > read from or write to tape using wrong block size - tar -b 7 -cvf /dev/tape foo
> >
Write does not trigger this bug because the driver refuses in fixed block
mode writes that are not a multiple of the block size. Read does trigger
it in my system.
The bug is not associated with any specific HBA. st tries to do direct i/o
in fixed block mode with reads that are not a multiple of tape block size.
The patch in this message fixes the st problem by switching to using the
driver buffer up to the next close of the device file in fixed block mode
if the user asks for a read like this.
I don't know why the bug has surfaced only after 2.6.17 although the st
problem is old. There may be another bug in the block subsystem and this
patch works around it. However, the patch fixes a problem in st and in
this way it is a valid fix.
This patch may also fix the bug 7900.
The patch compiles and is lightly tested.
Signed-off-by: Kai Makisara <kai.makisara@kolumbus.fi>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
1) If the device reports an uncorrectable MEDIUM ERROR, such
as SK MEDIUM ERROR, ASC UNRECOVERED READ ERR, AMNF DATA
FIELD or RECORD NOT FOUND, then: In scsi_check_sense()
return SUCCESS so as to not retry -- the error is
uncorrectable -- this speeds up total processing time.
Signed-off-by: Luben Tuikov <ltuikov@yahoo.com>
Extracted the MEDIUM ERROR piece and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Since, mailbox commands are executed in a synchronous
manner, there is no need to have a separate spinlock
primitive to protect data/register access shared by callers.
Signed-off-by: Seokmann Ju <seokmann.ju@qlogic.com>
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
As ISP24xx firmware can return a CS_DATA_UNDERRUN completion
status when the storage has returned a
SAM_STAT_TASK_SET_FULL scsi-status.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Non-ISP24xx cards must have a loop-id in order to query host
statistics.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Limit assignments via qla2x00_model_name[] array to HBA
subsystem vendor IDs equal to QLogic.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Previous work to add asynchronous-scsi-scanning support
(d19044c32b) caused peculiar
semantic changes when no cabling was attached to the HBA
whereby unneeded and intrusive 'error-handling' would take
place due to the initial link state being unset.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Similarly to previous LOGO requests on non-24xx hardware,
perform an implicit-LOGO as to avoid the potential 2 *
R_A_TOV delay which can result during an explicit-LOGO
request.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This includes BIOS, EFI, FCODE and firmware versions.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
No restriction should be placed on the IRQ number assigned
to a given ISP. Original code incorrectly assumed a
non-zero IRQ number assignment by the system. In these
circumstances the proper freeing of the IRQ (via free_irq())
would not take place.
Signed-off-by: Andrew Vasquez <andrew.vasquez@qlogic.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
What DMA for 16bit pcmcia card, anyway? We never do request_dma()
there and ->dma_channel never changes since initialization to -1.
IOW, that call is dead code.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
This patch removes a duplicate device id from the IPR driver. Based on
the ipr.h file, I'm not so sure this was intended to be a duplicate, and
if so, the .h file should be modified to use the proper sub-device id
instead.
This was pointed out to me by Kay Sievers <kay.sievers@suse.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Acked-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Set allow_restart=1 for all SAS disks so that they are spun up when needed.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Register libsas's default device reset code with the scsi.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch moves the code that handles SAS failures out of the main EH
function and into a separate function. It also detects commands that have
no sas_task (i.e. they completed, but with error data) and sends them into
scsi_error for processing. This allows us to handle SCSI errors (and
enables auto-spinup as a side effect) instead of dropping them on the
floor and falling into an infinite loop. It also requires the
implementation of a device reset function, which the SAS failure code has
been modified to employ for REQ_DEVICE_RESET.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Export a couple of functions from scsi_error that are needed to handle
failed SCSI commands from the SAS EH.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
make exports GPL and
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Get rid of: "warning: ignoring return value of sysfs_create_link..."
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sas_rphy_delete does two things: it removes the sas_rphy from the transport
layer and frees the sas_rphy. This can be broken down into two functions,
sas_rphy_remove and sas_rphy_free; sas_rphy_remove is of interest to
sas_discover_root_expander because it calls functions that require
sas_rphy_add as a prerequisite and can fail (namely sas_discover_expander).
In that case, sas_discover_root_expander needs to be able to undo the effects
of sas_rphy_add yet leave the job of freeing the sas_rphy to the caller of
sas_discover_root_expander.
This patch also removes some unnecessary code from sas_discover_end_dev
to eliminate an unnecessary cycle of sas_notify_lldd_gone/found for SAS
devices, thus eliminating a sas_rphy_remove call (and fixing a race condition
where a SCSI target scan can come in between the gone and found call).
It also moves the sas_rphy_free calls into sas_discover_domain and
sas_ex_discover_end_dev to complement the sas_rphy_allocation via
sas_get_port_device.
This patch does not change the semantics of sas_rphy_delete.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Currently, sas_form_port checks the given asd_sas_phy's sas_phy to see if
there's already a port attached. If so, the SAS addresses of the port and
the phy are compared to determine if we need to detach from the port
because the addresses don't match or if we can stop; the SAS address stored
in the sas_port reflects whatever device _was_ attached to the port/phy, and
the SAS address stored in the sas_port reflects whatever device we just
discovered. As written, the code detaches from the port if the addresses
_do_ match, and prints an error if they do _not_ match. I believe this to
be incorrect, as it seems more logical to keep the port if the addresses
match (i.e. the phy was reset but the device didn't change), and detach it
they do not (i.e. the device changed).
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn,
Take the expose_physicals flag and allow the user to select default (physicals
available via /dev/sg), exposed (physicals available via /dev/sd for
experimental reasons) and hidden (physicals blocked from all access). This
expands the functionality of the previous expose_physicals insmod parameter
which was added to support some experimental configurations.
Signed-off-by Mark Haverkamp <markh@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn,
Replace all if/else packet formations with platform function calls. This is in
recognition of the proliferation of read and write packet types, and in the
need to migrate to up-and-coming packets for new products.
Signed-off-by Mark Haverkamp <markh@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn,
Add in the NEMER/ARK physical register mapping, represented in up and coming
products currently under test at Adaptec.
Signed-off-by Mark Haverkamp <markh@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Received from Mark Salyzyn,
Replace all if/else communication transports with a platform function call.
This is in recognition of the need to migrate to up-and-coming transports.
Currently the Linux driver does not support two available communication
transports provided by our products, these will be added in future patches, and
will expand the platform function set.
Signed-off-by Mark Haverkamp <markh@linux-foundation.org>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Since the pci_block_user_cfg_access API was modified to track
block/unblocks, it was discovered that the ipr driver had a
path through its code (in PCI error recovery) which would unblock
when not previously blocked.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Don't fail initialization of an adapter if the PCI-X registers
cannot be found since it may be a PCI-E adapter.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Since ipr handles dynamic ids, it must handle driver_data
not being set, so remove the current usage of driver_data
so it can be used for other things in future patches.
Signed-off-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
fix typos and bump version number
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Acked-by: Alexis Bruemmer <alexisb@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The EH should fall into I_T recovery (and potentially stronger
remedies) if ABORT TASK fails.
Signed-off-by: Alexis Bruemmer <alexisb@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Track sas_ha_struct state so that we ignore events that come in while
we're shutting things down.
Signed-off-by: Malahal Naineni <malahal@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Convert the phy port locks to use irq spinlocks.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add the necessary hooks to the aic94xx driver to support the asynchronous SCSI
device scan infrastructure.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: Matthew Wilcox <matthew@wil.cx>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Allowing the phy reset controls to be world-triggerable does not seem like
a terribly good idea because SAS devices can be disrupted (and ATA devices
are really disrupted) by a phy reset. By default only root should be able
to do things like that.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Extend the use of the DDB lock to include all DDB accesses, because
DDB updates now occur from multiple threads. This fixes the SMP timeout
problems that we were occasionally seeing with a x260, because the
controller got confused when the DDBs got corrupted.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Ed Chim of Adaptec informs us that the DDB registers need to be zeroed at
initialization time and that some SCB initializations need to happen even if
we don't use the SCB.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The vmalloc() blob holding the sequencer firmware wasn't being released at
module unload time, which resulted in a memory leak.
Signed-off-by: Alexis Bruemmer <alexisb@us.ibm.com>
Acked-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Now that task aborts and device port resets are done by the EH, we can
remove all the code that set up workqueues and such and simply call
sas_task_abort and let libsas figure things out.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
sas_task_abort() should simply abort the upper-level SCSI command and wait
until the error handler to send the actual ABORT TASK command. By
deferring things to the EH we simplify the concurrency coordination and
eliminate some race conditions. Note that sas_task_abort has a few hooks
to handle libsas internal commands properly too.
Also rename do_sas_task_abort to __sas_task_abort just in case we really
want to abort the task *right now* and we don't have a scsi_cmnd attached
to the command. This is a hook for libata internal commands to abort.
Signed-off-by: Darrick J. Wong <djwong@us.ibm.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>