In 2009, While running "cache read" performance test of drives behind
SII PMP we encountered a "all 5 drives" timeout on more than 30% of the
machines under test. This patch reduces the rate by a factor of about 70.
Low enough that we didn't care to further investigate the issue.
Performance impact with any sort of "normal" use was ~2%+ CPU and less
than 1% throughput degradation. Worst case impact (cached read) was
6% IOPS reduction. This is with NCQ off (q=1) but I believe FIS based
switching enabled in the SATA driver.
The patch disables "Early ACK" in the 3726 port multiplier.
"Early ACK" is issued when device sends a FIS to the host (via PMP)
and the PMP sends an ACK immediately back to the device - well before
the host gets the response. Under worst case IOPs load (cached read
test) and more than 2 PMPs connected to a 4-port SATA controller,
I suspect the time to service all of the PMPs is exceeding the PMPs
ability to keep track of outstanding FIS it owes the Host. Reducing
the number of PMPs to 2 (or 1) reduces the frequency by several orders
of magnitude. Kudos to Gwendal for initial debugging of this issue.
[Any errors in the description are mine, not his.]
Patch is currently in production on Google servers.
Signed-off-by: Grant Grundler <grundler@google.com>
Signed-off-by: Gwendal Grignou <gwendal@google.com>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Implicit slab.h inclusion via percpu.h is about to go away. Make sure
gfp.h or slab.h is included as necessary.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Stephen Rothwell <sfr@canb.auug.org.au>
Cc: Jeff Garzik <jgarzik@pobox.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
It turns out different generations of MCPs have differing quirks.
* MCP 65-73 : FPDMA AA broken, lies about PMP support, forgets to report NCQ
* MCP 77-79 : FPDMA AA broken, lies about PMP support
* MCP 89 : FPDMA AA broken
Instead of turngin off FPDMA AA on all NVIDIAs, implement
HFLAG_NO_FPDMA_AA, define additional board IDs and apply necessary
quirks.
This fixes bko#15481 and the list of quirks is verified by Peer Chen.
http://bugzilla.kernel.org/show_bug.cgi?id=15481
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peer Chen <pchen@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
I've prepared a totally simple patch that, if I did it and measured it
correctly, reduces the text size as of the ppc-6xx-size command of
pata-mpc52xx by more than 10%, by reducing the rodata size from 0x4a4
to 0x17e bytes. This is simply done by changing the data types of the
ATA timing constants.
If you are interested at all, and it's worth the trouble, here the
details:
ppc-6xx-size:
text data bss dec hex filename
old: 6532 1068 0 7600 1db0 pata-mpc52xx.o
new: 5718 1068 0 6786 1a82 pata-mpc52xx.o
The (assembler) code itself doesn't really change very much. I double
checked the final results inside mpc52xx-ata-apply-timings() and they
match. The driver is still working fine of course.
Signed-off-by: Roman Fietze <roman.fietze@telemotive.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
ahci over time has grown a number of board IDs and it's a bit of mess
right now. Clean it up such that,
* board_id_* now live in a separate enum board_ids and numbers are
assigned automatically.
* Board IDs assigned to features are separated from the ones assigned
to specific implementations and both are ordered alphabetically.
* For NV MCPs, define per-generation alias board_ids and assign
matching aliases in the pci id table. This makes mcp_linux, 67-73
use board_ahci_mcp65 instead of board_ahci_yesncq. Both are
identical in content.
* Kill now unused board_ahci_nopmp and board_ahci_yesncq.
This patch doesn't cause any functional change but will make future
changes to board_ids and quirks much less painful.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peer Chen <pchen@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
According to section 10.3.1 of the AHCI spec, PxCMD.ST must not be set
unless there's a device attached. Following this saves us a measurable
quantity of power and does not impair hotplug support. Based on a patch
by Kristen Carlson Accardi.
Signed-off-by: Matthew Garrett <mjg@redhat.com>
Cc: Kristen Carlson Accardi <kristen.c.accardi@intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This can be used for AHCI-compatible interfaces implemented inside
System-On-Chip solutions, or AHCI devices connected via localbus.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch should contain no functional changes, just moves code
around.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Factor out some ahci_em_messages handling code from ahci_init_one().
We would like to reuse it for non-PCI devices.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Introduce ahci_pci_print_info() that now handles PCI stuff.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Move PCI stuff into ahci_pci_init_controller().
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
To make the function bus-independand we have to get rid of
"struct pci_dev *", so let's pass just "struct devce *".
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Move PCI stuff into ahci_pci_reset_controller().
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
To make the function generic we have to get rid of "struct pci_dev *",
so let's pass just a "struct devce *".
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Make ahci_save_initial_config() a bit more generic by introducing
force_port_map and mask_port_map arguments.
Move PCI stuff into ahci_pci_save_initial_config().
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Currently the driver uses host->iomap to store all the iomapped BARs
of a PCI device (while AHCI devices actually use just a single memory
window).
We're going to teach AHCI to work with non-PCI buses, so there are two
options to make this work:
1. "fake" host->iomap array for non-PCI devices, and place the needed
address at iomap[AHCI_PCI_BAR];
2. Get rid of host->iomap usage, instead introduce a private mmio
field.
This patch implements the second option.
Signed-off-by: Anton Vorontsov <avorontsov@ru.mvista.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch fixes the bad hashes for one Kingston and one Transcend card.
Thanks to komuro for pointing this out.
Signed-off-by: Kristoffer Ericson <kristoffer.ericson@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This patch adds idstrings for Kingston 1GB/4GB and Transcend 4GB/8GB.
Signed-off-by: Kristoffer Ericson <kristoffer.ericson@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
blk_abort_request() expectes queue lock to be held by the caller.
Grab it before calling the function.
Lack of this synchronization led to infinite loop on corrupt
q->timeout_list.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: stable@kernel.org
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Some BIOSes don't configure HPA during boot but do so while resuming.
This causes harddrives to shrink during resume making libata detach
and reattach them. This can be worked around by unlocking HPA if old
size equals native size.
Add ATA_DFLAG_UNLOCK_HPA so that HPA unlocking can be controlled
per-device and update ata_dev_revalidate() such that it sets
ATA_DFLAG_UNLOCK_HPA and fails with -EIO when the above condition is
detected.
This patch fixes the following bug.
https://bugzilla.kernel.org/show_bug.cgi?id=15396
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Oleksandr Yermolenko <yaa.bta@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Crucial said,
Thank you for contacting us. We know that with our M225 line of SSDs
you sometimes need to disable NCQ (native command queuing) to avoid
just the type of errors you're seeing. Our recommendation for the
M225 is to add libata.force=noncq to your Linux kernel boot options,
under the kernel ATA library option.
I have sent your feedback to the engineers working on the C300, and
asked them to please pass it on to the firmware team. I have been
notified that they are in the process of testing and finalizing a
new firmware version, that you can expect to see released around the
end of April. We’ll keep you posted as to when it will be available
for download.
So, turn off NCQ on the drive w/ the current firmware revision.
Reported in the following bug.
https://bugzilla.kernel.org/show_bug.cgi?id=15573
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: lethalwp@scarlet.be
Reported-by: Luke Macken <lmacken@redhat.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
On configurations where IRQ line is shared with a different
controller, spurious IRQs may happen continuously. The message was
put there primarily for debugging anyway. Kill it.
Signed-off-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
When using VT6410/6415/6330 chips on some VIA's platforms, the HDD
connection to VT6410/6415/6330 cannot be detected.
It is because the driver detects wrong via_isa_bridge ID, and then
causes this issue to happen.
Signed-off-by: Joseph Chan <josephchan@via.com.tw>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Commit 27943620cb introduced spurious
IRQ handling but it has a race condition where valid completion can be
lost while trying to clear spurious IRQ leading to occassional command
timeouts.
This patch improves SFF interrupt handler such that
1. Once BMDMA HSM is stopped, the condition is never considered
spurious. As there's no way to resume stopped BMDMA HSM, if device
status doesn't agree with BMDMA status, the only way out is
aborting the command (otherwise, it will just end up timing out).
2. ap->ops->sff_check_status() can be safely called to clear spurious
device IRQ as it atomically returns completion status but BMDMA IRQ
status can't be cleared in safe way if command is in flight. After
a spurious IRQ, call ap->ops->sff_irq_clear() only if the
respective device is idle and retry completion if
sff_check_status() indicates command completion.
Please note that ata_piix uses bmdma_status for sff_irq_check() and #2
won't weaken spurious IRQ handling even with in-flight command because
if bmdma_status indicates IRQ pending but device status is not on
spurious check, the next IRQ handler invocation will abort the command
due to #1.
This fixes bko#15537.
https://bugzilla.kernel.org/show_bug.cgi?id=15537
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Andrew Benton <b3nton@gmail.com>
Cc: Petr Uzel <petr.uzel@centrum.cz>
Cc: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
pp->active_link is not reliable when FBS is enabled.
Both PORT_SCR_ACT and PORT_CMD_ISSUE should be checked
because mixed NCQ and non-NCQ commands may be in flight.
Signed-off-by: Shane Huang <shane.huang@amd.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
HP is recycling both DMI_PRODUCT_NAME and DMI_BIOS_VERSION making
ahci_broken_suspend() trigger for later products which are not
affected by the original problems. Match BIOS date instead of version
and add references to bko's so that full information can be found
easier later.
This fixes http://bugzilla.kernel.org/show_bug.cgi?id=15462
Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: tigerfishdaisy@gmail.com
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
bko#15481 shows that we're missing some NVIDIA ahci PCI IDs. Peer
Chen confirms that IDs 0x580-0x58f are reserved for cases where Linux
ID option is selected in the BIOS and are only used for mcp65-73. Add
0x0581-0x058f.
http://bugzilla.kernel.org/show_bug.cgi?id=15481
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Peer Chen <pchen@nvidia.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: (38 commits)
sata_via: Delay on vt6420 when starting ATAPI DMA write
ata: Detect Delkin Devices compact flash
pata_efar: Enable parallel scanning
pata_atiixp: enable parallel scan
[libata] pata_atiixp: add locking for parallel scanning
[libata] pata_efar: add locking for parallel scanning
libata: Pass host flags into the pci helper
[libata] pata_marvell: CONFIG_AHCI is really CONFIG_SATA_AHCI
libata: Allow pata_legacy to be built on non-ISA but PCI systems
pata_pdc202xx_old: fix UDMA mode for PDC2026x chipsets
pata_pdc202xx_old: fix UDMA mode for Promise UDMA33 cards
[libata] pata_at91: fix backslash-continued string
pata_via: store UDMA masks in via_isa_bridges table
pata_via: fix address setup timings underlocking
pata_serverworks: fix error message
pata_serverworks: fix PIO setup for the second channel
pata_efar: fix secondary port support
pata_cypress: fix PIO timings underclocking
pata_cs5535: use correct values for PIO1 and PIO2 data timings
pata_cmd64x: remove unused definitions
...
When writing a disc on certain lite-on dvd-writers (also rebadged
as optiarc/LG/...) connected to a vt6420, the ATAPI CDB ends
up in the datastream and on the disc, causing silent corruption.
Delaying between sending the CDB and starting DMA seems to
prevent this.
I do not know if there are burners that do not suffer from
this, but the patch should be safe for those as well.
There are many reports of this issue, but AFAICT no solution was
found before. For example:
http://lkml.indiana.edu/hypermail/linux/kernel/0802.3/0561.html
Signed-off-by: Bart Hartgers <bart.hartgers@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Again originally proposed by Bartlomiej but this does it by using the
generic helper logic instead.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This was originally proposed by Bartlomiej but as a device specific
expansion of the init_one function rather than making the helper more
generic.
Enable the parallel scan via the generic flags.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This is similar change as commit 60c3be3 for ata_piix host driver
and while pata_atiixp doesn't enable parallel scan yet the race
could probably also be triggered by requesting re-scanning of both
ports at the same time using SCSI sysfs interface.
[Ported to current tree without other patch dependancies by Alan Cox]
Original is
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
This one is
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Add clearing of UDMA enable bit also for PIO modes and then add
extra locking for parallel scanning.
This is similar change as commit 60c3be3 for ata_piix host driver
and while pata_efar doesn't enable parallel scan yet the race could
probably also be triggered by requesting re-scanning of both ports
at the same time using SCSI sysfs interface.
[Ported to current kernel without other patch dependancies by
Alan Cox]
Original is
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
This one is
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This allows parallel scan and the like to be set without having to stop
using the existing full helper functions. This patch merely adds the argument
and fixes up the callers. It doesn't undo the special cases already in the
tree or add any new parallel callers.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
The marvell driver comtains a fallback to ahci for the sata ports
which is incorrectly checked as CONFIG_AHCI while the only AHCI config
item is actually called SATA_AHCI (which also sounds sensible
considering it's a fallback for the sata ports).
Signed-off-by: Christoph Egger <siccegge@stud.informatik.uni-erlangen.de>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
This is needed for some unsupported hardware setups on strange 64bit
mainboards where crazy stuff has been done like putting flash ata adapters
on the LPC bus, or where the real hardware is hidden/confused.
Signed-off-by: Alan Cox <alan@linux.intel.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
PDC2026x chipsets need the same treatment as PDC20246 one.
This is completely untested but will hopefully fix UDMA issues
that people have been reporting against pata_pdc202xx_old for
the last couple of years.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
On Monday 04 January 2010 02:30:24 pm Russell King wrote:
> Found the problem - getting rid of the read of the alt status register
> after the command has been written fixes the UDMA CRC errors on write:
>
> @@ -676,7 +676,8 @@ void ata_sff_exec_command(struct ata_port *ap, const struct
> ata_taskfile *tf)
> DPRINTK("ata%u: cmd 0x%X\n", ap->print_id, tf->command);
>
> iowrite8(tf->command, ap->ioaddr.command_addr);
> - ata_sff_pause(ap);
> + ndelay(400);
> +// ata_sff_pause(ap);
> }
> EXPORT_SYMBOL_GPL(ata_sff_exec_command);
>
>
> This rather makes sense. The PDC20247 handles the UDMA part of the
> protocol. It has no way to tell the PDC20246 to wait while it suspends
> UDMA, so that a normal register access can take place - the 246 ploughs
> on with the register access without any regard to the state of the 247.
>
> If the drive immediately starts the UDMA protocol after a write to the
> command register (as it probably will for the DMA WRITE command), then
> we'll be accessing the taskfile in the middle of the UDMA setup, which
> can't be good. It's certainly a violation of the ATA specs.
Fix it by adding custom ->sff_exec_command method for UDMA33 chipsets.
Debugged-by: Russell King <rmk@arm.linux.org.uk>
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* store UDMA masks in via_isa_bridges[] and while at it make "flags"
field to be u8 instead of u16
* convert the driver to use UDMA masks from via_isa_bridges[]
* remove no longer needed VIA_UDMA* defines
Make some minor documentation and CodingStyle fixes while at it.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Correct via_do_set_mode() documentation while at it.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
Timing registers should be programmed with the desired number of clocks
minus one clock.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
There shouldn't be any problems with it as IDE cs5535 host driver
has been using those values for years and they match values given
in the (publicly available) datasheet.
Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Signed-off-by: Jeff Garzik <jgarzik@redhat.com>