MPT Fusion driver initialization fails while the second kernel is booting
after a system crash (if a kdump kernel is configured). The oops message is
pasted below.
*****************************************************************************
Fusion MPT base driver 3.03.08
Copyright (c) 1999-2005 LSI Logic Corporation
Fusion MPT SAS Host driver 3.03.08
ACPI: PCI Interrupt 0000:01:00.0[A] -> Link [LNKA] -> GSI 5 (level, low) -> IRQ 5
mptbase: Initiating ioc0 bringup
BUG: unable to handle kernel paging request at virtual address 00002608
printing eip:
c11782fd
*pde = 00000000
Oops: 0000 [#1]
Modules linked in:
CPU: 0
EIP: 0060:[<c11782fd>] Not tainted VLI
EFLAGS: 00010046 (2.6.17-rc1-16M #2)
EIP is at mptscsih_io_done+0x27/0x3a3
eax: c4fed000 ebx: c4fed000 ecx: 00002600 edx: 00000298
esi: c11782d6 edi: 00002600 ebp: 00000000 esp: c1332f74
ds: 007b es: 007b ss: 0068
Process swapper (pid: 0, threadinfo=c1332000 task=c128f9c0)
Stack: <0>0000006c 00000020 00000298 00002600 c4fed000 c4fed000 c11782d6 00002600
00000000 c1172c49 c4fed000 c1305b40 00000005 00000000 c1172d75 c48877e0
c1029687 00000000 c1307fb8 00000000 c1305a00 00000001 00000000 c1307fb8
Call Trace:
<c11782d6> mptscsih_io_done+0x0/0x3a3 <c1172c49> mpt_turbo_reply+0xbb/0xd3
<c1172d75> mpt_interrupt+0x22/0x2b <c1029687> misrouted_irq+0x63/0xcb
<c10297b3> note_interrupt+0x43/0x98 <c10292f9> __do_IRQ+0x68/0x8f
<c1003fac> do_IRQ+0x36/0x4e
=======================
<c1002aa6> common_interrupt+0x1a/0x20 <c1001150> mwait_idle+0x1a/0x2a
<c10010bf> cpu_idle+0x40/0x5c <c1308610> start_kernel+0x17a/0x17c
Code: 5e 5f 5d c3 55 89 cd 57 56 53 83 ec 14 89 54 24 0c 89 44 24 10 8b 90 cc 00 00 00 8b 4c 24 0c 81 c2 98 02 00 00 85 ed 89 54 24 08 <0f> b7 79 08 89 fe 74 04 0f b7 75 08 66 39 f7 75 0d 8b 44 24 0c
*******************************************************************************
o The kdump capture kernel fails to boot during initialization of the MPT
Fusion driver.
(LSI Logic / Symbios Logic SAS1064E PCI-Express Fusion-MPT SAS (rev 01))
o The problem is easily reproducible if the system crashes while disk
activity, such as a cp operation, is in progress.
o After a system crash, devices are not shut down, and the capture kernel
starts booting without going through the BIOS. Hence the underlying device
is left in an operational state. In this case the SCSI controller was left
with its interrupt line asserted and its reply FIFO not empty. When the
driver starts initializing in the second kernel, it receives an interrupt
the moment request_irq() is called. The interrupt handler reads a message
from the reply FIFO, tries to access the associated message frame, and
panics, because in the new kernel's context that message frame is not
valid at all.
o In this scenario, we should probably delay the request_irq() call: first
bring up the IOC, reset it if needed, and only then register for the IRQ.
o I have tested the patch with SAS1064E and 53c1030 controllers.
Signed-off-by: Vivek Goyal <vgoyal@in.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Acked-by: "Moore, Eric Dean" <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
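As a rough illustration of why the bring-up order matters, here is a minimal user-space model of a controller whose reply FIFO still holds replies left behind by the crashed kernel. All names (fake_ioc, fake_isr, fake_reset, bringup_*) are hypothetical; this is not the mptbase.c code, and the IOC reset is modeled simply as emptying the FIFO.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a controller left running by the crashed kernel. */
struct fake_ioc {
    size_t fifo_len;        /* stale replies left in the reply FIFO */
    bool irq_registered;
    int stale_frames_seen;  /* stale frames the handler would dereference */
};

/* Stand-in for the interrupt handler: drains the reply FIFO. In the real
 * driver each stale reply would turn into an invalid frame pointer. */
static void fake_isr(struct fake_ioc *ioc)
{
    assert(ioc->irq_registered);
    ioc->stale_frames_seen += (int)ioc->fifo_len;
    ioc->fifo_len = 0;
}

/* Stand-in for an IOC reset: discards what the old kernel left behind. */
static void fake_reset(struct fake_ioc *ioc)
{
    ioc->fifo_len = 0;
}

/* With the interrupt line still asserted, the handler runs immediately. */
static void fake_request_irq(struct fake_ioc *ioc)
{
    ioc->irq_registered = true;
    if (ioc->fifo_len > 0)
        fake_isr(ioc);
}

/* Old order: register the IRQ, then bring up and reset the IOC. */
int bringup_irq_first(struct fake_ioc *ioc)
{
    fake_request_irq(ioc);
    fake_reset(ioc);
    return ioc->stale_frames_seen;
}

/* Patched order: bring up and reset the IOC, then register the IRQ. */
int bringup_reset_first(struct fake_ioc *ioc)
{
    fake_reset(ioc);
    fake_request_irq(ioc);
    return ioc->stale_frames_seen;
}
```

With two stale replies in the FIFO, the old order hands both to the handler, while the reset-first order hands it none.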
All registered reset callback handlers are called during reset processing.
The mptspi module has its own reset callback handler, recently
added for issuing domain validation after a host reset. If either the mptsas
or mptfc driver is loaded, this callback could be called, resulting
in domain validation being issued to SAS or Fibre Channel end devices.
Fix this by having mptbase.c check the bus type against the driver
type and only call the reset handler if they match (or if it's a
non-bus specific reset handler).
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
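The bus-type filter described above can be sketched as a dispatch loop that skips handlers registered for a different bus. The enum values, struct layout, and handler names here are hypothetical illustrations, not the actual mptbase.c data structures.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative bus types; BUS_ANY marks a non-bus-specific handler. */
enum bus_type { BUS_ANY = 0, BUS_SPI, BUS_FC, BUS_SAS, BUS_MAX };

struct reset_handler {
    enum bus_type type;
    void (*fn)(int *calls);
};

/* Two sample handlers that only count how often they were called. */
void spi_dv_reset(int *calls) { calls[BUS_SPI]++; }
void generic_reset(int *calls) { calls[BUS_ANY]++; }

/* Call a handler only if it is bus-agnostic or matches the IOC's bus. */
void dispatch_reset(enum bus_type ioc_bus, const struct reset_handler *h,
                    size_t n, int *calls)
{
    assert(h != NULL && calls != NULL);
    for (size_t i = 0; i < n; i++) {
        if (h[i].type != BUS_ANY && h[i].type != ioc_bus)
            continue;   /* e.g. skip the SPI DV handler on a SAS IOC */
        h[i].fn(calls);
    }
}
```

On a SAS IOC only the generic handler runs. On an SPI IOC both run.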
A race condition exists in mptfc between the thread registering a device
with the fc transport and the scan work generated by the transport.
This race existed prior to the application of the mptfc bug fix patch.
mptfc_register_dev() calls fc_remote_port_add() with the FC_RPORT_ROLE_TARGET
bit set in the rport ids passed to the function. Having this bit set causes
fc_remote_port_add() to schedule a scan of the device.
This scan can execute before mptfc_register_dev() fills in the dd_data
field in the rport structure. When this happens, mptfc_target_alloc() fails
because dd_data is NULL.
Attached is a patch which fixes the problem. The patch changes the rport ids
passed to fc_remote_port_add() to not have the TARGET bit set. This prevents
the scan from being scheduled. After mptfc_register_dev() fills in the rport
dd_data field, fc_remote_port_rolechg() is called, changing the role of the
rport to TARGET. Thus, the scan is scheduled only after dd_data is filled
in, which prevents the failure in mptfc_target_alloc().
Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
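The registration ordering in this fix can be modeled as follows. The model makes one simplifying assumption: the transport's scan runs synchronously the moment it is scheduled, which is the worst case of the real asynchronous race. port_add() and port_rolechg() are hypothetical stand-ins for fc_remote_port_add() and fc_remote_port_rolechg(), and ROLE_TARGET stands in for FC_RPORT_ROLE_TARGET.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define ROLE_TARGET 0x1u   /* stand-in for FC_RPORT_ROLE_TARGET */

struct rport {
    unsigned roles;
    void *dd_data;          /* filled in by the LLD after registration */
    bool scan_ok;           /* result of the simulated target_alloc */
};

/* Simulated scan: target_alloc fails if dd_data has not been set yet. */
static void scan(struct rport *rp)
{
    rp->scan_ok = (rp->dd_data != NULL);
}

/* Stand-in for fc_remote_port_add(): a TARGET role triggers a scan. */
static void port_add(struct rport *rp, unsigned roles)
{
    rp->roles = roles;
    if (roles & ROLE_TARGET)
        scan(rp);
}

/* Stand-in for fc_remote_port_rolechg(): gaining TARGET triggers a scan. */
static void port_rolechg(struct rport *rp, unsigned roles)
{
    bool gained = !(rp->roles & ROLE_TARGET) && (roles & ROLE_TARGET);
    rp->roles = roles;
    if (gained)
        scan(rp);
}

/* Old order: the scan can run before dd_data is filled in. */
bool register_racy(struct rport *rp, void *priv)
{
    assert(rp != NULL && priv != NULL);
    port_add(rp, ROLE_TARGET);
    rp->dd_data = priv;
    return rp->scan_ok;
}

/* Fixed order: add without TARGET, fill dd_data, then change role. */
bool register_fixed(struct rport *rp, void *priv)
{
    assert(rp != NULL && priv != NULL);
    port_add(rp, 0);
    rp->dd_data = priv;
    port_rolechg(rp, ROLE_TARGET);
    return rp->scan_ok;
}
```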
This is a bug fix for the mptspi driver: after a host reset or
resume, we now revalidate the negotiation parameters for all devices.
This bug was introduced when the driver was ported to use the SPI
transport layer.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Bug fix for a stack buffer overflow in EventDescriptionStr (a function
for debugging firmware events). We allocated 50 bytes on the local stack
for buff[]; however, there are places in the code where we attempted to
copy more than 50 bytes into buff[].
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
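One conventional way to keep a fixed-size stack buffer from overflowing is to bound every write with snprintf(), which truncates instead of running past the end. This is a generic sketch, not necessarily how the patch changed EventDescriptionStr (it may simply have enlarged or restructured the buffer). describe_event() and its format string are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define DESC_LEN 50   /* size of the on-stack buffer mentioned above */

/* Hypothetical sketch: format an event description into a fixed buffer
 * without overflowing it, then copy the result out. */
void describe_event(char *out, size_t outlen, const char *name, unsigned code)
{
    char buff[DESC_LEN];

    /* snprintf never writes more than sizeof(buff) bytes, NUL included. */
    snprintf(buff, sizeof(buff), "%s (code 0x%08x)", name, code);
    assert(strlen(buff) < sizeof(buff));
    snprintf(out, outlen, "%s", buff);
}
```

An over-long event name is truncated to 49 characters plus the terminating NUL instead of corrupting the stack.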
mptbase.h
bump version number to 3.03.09
remove unneeded flags
define workq and remove old fc specific locks
mptbase.c
initialize new lock and don't initialize two removed locks
mptscsih.c
when firmware reports target is no longer there, return
DID_REQUEUE for fc hosts so that i/o doesn't get killed until
the transport has an opportunity to manage the loss via its
dev loss timer
when the "eh_abort" routine is called, check to see if the
driver has the command or not before looking to see if a reset
is pending. James Smart and I talked about this and believe
that the API for this routine is: if driver doesn't have
command, return SUCCESS. This change helps prevent a target
from being taken offline. SUCCESS is returned because it's
likely that the command completed after error recovery timed
it out but before it could be aborted.
provide a routine to queue work to newly created workq, and
use it.
remove the local "ioc" from mptscsih_abort(); it was only used once.
the other references were via hd->ioc, so I just moved it.
net change in references to ioc via hd->ioc is zero
move hd->resetPending test and hd->timeouts increment to after
the test for whether the command to be aborted remains known
to the driver
Make certain that the workq exists before queuing work to it.
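The eh_abort ordering described above (return SUCCESS if the driver no longer owns the command, and only then look at resetPending and bump timeouts) can be sketched like this. The struct, the fixed-size outstanding table, and the FAILED result for a pending reset are hypothetical simplifications, not the mptscsih.c implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_CMDS 16

enum eh_result { EH_SUCCESS, EH_FAILED };   /* stand-ins for SUCCESS/FAILED */

struct fake_host {
    const void *outstanding[MAX_CMDS];  /* commands the driver still owns */
    bool reset_pending;
    int timeouts;
};

static bool driver_owns(const struct fake_host *hd, const void *cmd)
{
    for (size_t i = 0; i < MAX_CMDS; i++)
        if (cmd != NULL && hd->outstanding[i] == cmd)
            return true;
    return false;
}

/* Check ownership before looking at reset_pending or bumping timeouts. */
enum eh_result fake_abort(struct fake_host *hd, const void *cmd)
{
    assert(hd != NULL);
    if (!driver_owns(hd, cmd))
        return EH_SUCCESS;      /* likely completed after the timeout fired */
    if (hd->reset_pending)
        return EH_FAILED;
    hd->timeouts++;
    /* A real driver would issue the abort task management request here. */
    return EH_SUCCESS;
}
```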
mptfc.c
no longer need to lock rport data structures as I was able to
single thread the code! I fixed up the debug code to
eliminate compilation messages due to type mismatch in the
printk. Got rid of some no longer needed rport flags.
Initialize and destroy the workq used for the rescan work.
simplify the logic regarding the increment of
fc_rescan_work_count: use post-increment and test for zero
instead of pre-increment and test for one; eliminate the work_count
variable. queue_work() can be called with work_lock held, as
it doesn't sleep
Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
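The post-increment-and-test-for-zero pattern from the mptfc.c notes queues the rescan work only for the first requester. Later requesters only bump the count for the worker to notice. In this sketch the struct and fake_queue_work() are hypothetical, and the work_lock is shown only as comments because the model is single-threaded.

```c
#include <assert.h>

/* Hypothetical model of fc_rescan_work_count and the queued work item. */
struct rescan_state {
    int work_count;
    int queued;      /* how many times the work item was queued */
};

/* Stand-in for queue_work(): just records that the work was queued. */
static void fake_queue_work(struct rescan_state *s)
{
    s->queued++;
}

/* Equivalent to "if (++count == 1)", written as post-increment vs. zero. */
void request_rescan(struct rescan_state *s)
{
    /* spin_lock(&work_lock) in the driver */
    if (s->work_count++ == 0)
        fake_queue_work(s);
    /* spin_unlock(&work_lock) */
    assert(s->work_count > 0);
}
```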
This patch handles the case where hidden RAID components
are not removed when power is turned off to a device
attached to an expander, as well as the case of
exposing RAID components when power is turned back on
to devices attached to an expander. (This is a repost
of this patch, with mptsas_is_end_device declared
further up in the code.)
This patch contains some other miscellaneous bug fixes.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The driver panicked when a RAID logical volume was present at driver load
time, or when a RAID logical volume was created on the fly.
This issue was introduced by a recent scsi_transport_sas change,
when sas_read_port_mode_page was added to the mptsas driver's
slave_config entry point.
This new API expects every sdev to be associated with an rphy; however,
that is not the case for logical volumes, as they are created using
scsi_add_device() instead of sas_rphy_add().
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
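A guard that avoids calling rphy-based helpers for devices that have no rphy could look like the sketch below. The structs and fake_slave_configure() are hypothetical, and the boolean flag stands in for a call to sas_read_port_mode_page(). The actual mptsas fix may have been structured differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct fake_rphy { int id; };

struct fake_sdev {
    struct fake_rphy *rphy;   /* NULL for volumes added via scsi_add_device */
    bool mode_page_read;
};

/* Only devices registered through sas_rphy_add() have an rphy. */
int fake_slave_configure(struct fake_sdev *sdev)
{
    assert(sdev != NULL);
    if (sdev->rphy == NULL)
        return 0;                 /* RAID logical volume: skip rphy-only API */
    sdev->mode_page_read = true;  /* stand-in for sas_read_port_mode_page() */
    return 0;
}
```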
The conversion of mptsas should allow the elimination of the contained
flag in the sas transport class.
Acked-by: "Moore, Eric" <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This adds support for hot adding and removing
expanders, and its associated attached devices.
When there is a change in topology,
the fusion firmware sends the
MPI_EVENT_SAS_DISCOVERY event to the driver.
The driver reads firmware config pages
to determine what changes took place, and refreshes the
driver's view of the world stored in ioc->sas_topology.
Here are the details of the actions the driver takes:
(1) Expander Added: The mptsas_discovery_work
workqueue is called. Config pages are read, and
ioc->sas_topology is refreshed. sas_phy_add()
is called for each phy of the expander. The
expander's attached devices are added via
sas_rphy_add(). Added end devices are handled within
the MPT_ADD_DEVICE logic in the mptsas_hotplug_work
workqueue.
(2) Expander Delete: sas_rphy_delete() is
called for the topmost component of the parent that the
expander is attached to. The sas_rphy_delete() call
deletes all the child phys, rphys, and end devices.
This is handled from the mptsas_discovery_work workqueue.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
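The expander-delete behavior, where one delete call on the topmost component removes everything beneath it, is a recursive subtree deletion. The generic first-child/next-sibling tree here is a hypothetical stand-in for the SAS transport's object hierarchy. It is not how sas_rphy_delete() is implemented.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical topology node: an expander, phy, or end device. */
struct topo_node {
    struct topo_node *child;    /* first child */
    struct topo_node *sibling;  /* next sibling under the same parent */
};

void topo_add_child(struct topo_node *parent, struct topo_node *child)
{
    assert(parent != NULL && child != NULL);
    child->sibling = parent->child;
    parent->child = child;
}

/* Free a node and all of its descendants (but not its siblings).
 * Returns the number of nodes removed. */
int topo_delete(struct topo_node *n)
{
    int removed = 0;

    if (n == NULL)
        return 0;
    for (struct topo_node *c = n->child; c != NULL;) {
        struct topo_node *next = c->sibling;   /* save before freeing c */
        removed += topo_delete(c);
        c = next;
    }
    free(n);
    return removed + 1;
}
```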
Support for exposing hidden RAID components
via the sg interface. The sdev->no_uld_attach flag
will be set accordingly.
The sas module supports adding/removing RAID
volumes using the online storage management application
interface.
This patch relies on patches provided to me
by Christoph Hellwig that export device_reprobe.
I will post those patches on behalf of Christoph.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Changelog:
(1) fix memory leak: p->phy_info
(2) initialize device_info and port_info data fields
(3) initialize the hba firmware handle
(4) initialize phy_id for attached phy_info data fields
(5) initialize attached phy_info data fields
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Cleanup of mptsas firmware event handlers.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
It makes no sense in keeping the target_id and bus_id
in the VirtDevice structure, when it can be obtained
from the VirtTarget structure.
In addition, this patch fixes a couple of compilation bugs
in mptfc.c when MPT_DEBUG_FC is enabled. This was
provided by Michael Reed.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Patch previously provided by Adrian Bunk <bunk@stusta.de>,
making some functions static. This is already in
the -mm tree.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Created a debug level MPT_DEBUG_VERBOSE_EVENTS.
Moving some of the more verbose debug messages
for firmware events into the new debug level. Also
added some more firmware event descriptions.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This header is provided to better understand
loginfo codes returned by the mpt fusion firmware.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
It was actually rendered unused by the move to the spi transport
class, but never taken out.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This is the first half of a patch to add the generic domain validation
to mptspi. It also creates a secondary "virtual" channel for raid
component devices since these are now exported with no_uld_attach.
What Eric and I would have really liked is to export all physical
components on channel 0 and all raid components on channel 1.
Unfortunately, this would result in device renumbering on platforms with
mixed RAID/Physical devices which was considered unacceptable for
userland stability reasons.
Still to be done is to plug back the extra parameter setting and DV
pieces on reset and hotplug.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Adds support to retrieve the enclosure and bay identifiers. This patch
is from Eric with minor modifications from me, rewritten from a buggy
patch of mine, based on the earlier CSMI implementation from Eric.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch makes two needlessly global functions static.
Signed-off-by: Adrian Bunk <bunk@stusta.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Removes weird humor and a bad-language printk in mptlan.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Fixes the firmware download ioctl to work with SAS.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Moving the toolbox call from mptbase.c over to
mptctl.c, and using the mptctl infrastructure to issue
the call. The existing code hangs on certain HP platforms
when this ioctl is issued, and this patch fixes that.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Bug fix for correctly setting sense width
for the MPTCOMMAND ioctl.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Use the hard-coded value MPTCTL_EVENT_LOG_SIZE to fix
a bug where, in certain cases, ioc->eventLogSize was not
initialized.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Change from using wait_event_interruptible_timeout to
wait_event_timeout. Also delete whitespace and a duplicate
line of code.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Add bus_type recognition in the ioctl path for SAS.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This adds support for new function types in
the existing MPTCOMMAND ioctl.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This adds a sanity check in the interrupt routine
to ensure incoming message frames are valid
message frames.
The code for setting 0xdeadbeaf in the freed message
frames, apparently was already submitted by Christoph
in previous patch submission.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This patch inhibits sending spi negotiation parameters
for non-configured devices from the slave_destroy function.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The ioc->alt_ioc->alt_ioc pointer is not getting cleared
at driver unload time. This dangling pointer
can result in a panic in certain circumstances, such
as error recovery, or firmware download in flashless
environments. This only impacts dual-function controllers,
such as the 1030. Please apply.
This patch also includes a small cosmetic name change
for mpt_spi_log_info.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
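Clearing the partner's back-pointer on unload, as this fix describes, can be sketched with a minimal two-IOC model. struct fake_ioc and ioc_detach() are hypothetical names. In the driver the pointer lives in the MPT_ADAPTER structure.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of the two functions of a dual-function chip. */
struct fake_ioc {
    struct fake_ioc *alt_ioc;
};

/* On unload, clear the partner's pointer back to us before we go away,
 * so the partner never follows ioc->alt_ioc->alt_ioc into freed memory. */
void ioc_detach(struct fake_ioc *ioc)
{
    assert(ioc != NULL);
    if (ioc->alt_ioc != NULL) {
        if (ioc->alt_ioc->alt_ioc == ioc)
            ioc->alt_ioc->alt_ioc = NULL;
        ioc->alt_ioc = NULL;
    }
}
```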
When people use the userspace scanning facilities on SAS hardware the
LLDD gets bogus slave_alloc calls. Just fail those gracefully instead
of printing a warning in mptsas and another one in the midlayer.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Adding verbose messages returned from firmware
when a task management request fails.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
On Mon, Jan 16, 2006 at 06:53:24PM -0700, Moore, Eric wrote:
> Adding MSI support, and command line for enabling
> it. By default, the command line option has MSI disabled.
mpt_msi_enable is initialized to 0 implicitly, no need to do that. Also
replace if (mpt_msi_enable == 1) tests with just if (mpt_msi_enable).
Updated patch below:
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
A customer request to send RAID async action events
from firmware to the syslog. This shows
when RAID volumes go degraded, complete a resync,
or are created/deleted, etc.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Increasing the reply frame size by 16 bytes, to
be in sync with the other fusion drivers.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
On Mon, Jan 16, 2006 at 06:53:13PM -0700, Moore, Eric wrote:
> The task management request timeout in the eh threads was set
> for U320 timing, which is between 2-5 seconds.
> This is too small for FC and SAS.
> According to the firmware engineers, Fibre needs to be 40 seconds
> and SAS needs to be 10 seconds.
The timeout selection should probably be done in a little helper instead
of duplicated in a few places. Updated patch below.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Increase the port enable timeout, for SAS only, from 30 to 300 seconds.
This was a customer request, for handling large topologies.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
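The two timeout changes above (per-bus task management timeouts, and the larger SAS port enable timeout) could each be centralized in a small helper, as suggested in the review. This is a sketch with hypothetical function names. The FC (40 s), SAS (10 s), and port enable (30/300 s) values come from the text. The 2-second SPI value is an assumption, since the text gives only a 2-5 second range.

```c
#include <assert.h>

enum bus_type { BUS_SPI, BUS_FC, BUS_SAS };

/* Task management timeout in seconds, chosen in one place by bus type. */
int tm_timeout_secs(enum bus_type bus)
{
    assert(bus == BUS_SPI || bus == BUS_FC || bus == BUS_SAS);
    switch (bus) {
    case BUS_FC:
        return 40;
    case BUS_SAS:
        return 10;
    default:
        return 2;   /* assumption: low end of the 2-5 s U320 range */
    }
}

/* Port enable timeout in seconds: raised for SAS only. */
int port_enable_timeout_secs(enum bus_type bus)
{
    return bus == BUS_SAS ? 300 : 30;
}
```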
This patch is for SPI. It issues a bus reset when the driver
loads, to handle the case where the initiator has negotiated packetized
transfers and the target has negotiated non-packetized; effectively, this
bus reset gets both target and initiator on the same sheet of music.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
The prior fusion patches moved an invocation of a function,
mptscsih_TMHandler(), which is static to mptscsih.c, into mptsas.c.
Make the function non-static, move its prototype to mptscsih.h, and export it.
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This fixes problems with the recent FC submission regarding
I/O being redirected to the wrong target.
Signed-off-by: Michael Reed <mdr@sgi.com>
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
This moves code intended for SAS from
the generic mptscsih module over to the
mptsas module.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>
Issue a target reset in the device
hot-removal case, so that the firmware
queue is flushed of outstanding
commands.
Signed-off-by: Eric Moore <Eric.Moore@lsil.com>
Signed-off-by: James Bottomley <James.Bottomley@SteelEye.com>