Instead of running the request queue of each device associated with a host
every 3 ms (BLK_MQ_RESOURCE_DELAY) while host error handling is in
progress, run the request queue after error handling has finished.
Cc: Christoph Hellwig <hch@lst.de>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230518193159.1166304-4-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use min() instead of open-coding it in scsi_normalize_sense().
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Benjamin Block <bblock@linux.ibm.com>
Cc: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230518193159.1166304-2-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Smatch complains that:
tw_probe() warn: missing error code 'retval'
This patch adds error checking to tw_probe() to handle initialization
failure. If tw_reset_sequence() function returns a non-zero value, the
function will return -EINVAL to indicate initialization failure.
Fixes: 1da177e4c3 ("Linux-2.6.12-rc2")
Signed-off-by: Yuchen Yang <u202114568@hust.edu.cn>
Link: https://lore.kernel.org/r/20230505141259.7730-1-u202114568@hust.edu.cn
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Niklas Cassel <nks@flawful.org> says:
This series adds support for Command Duration Limits.
The series is based on linux tag: v6.4-rc1
The series can also be found in git: https://github.com/floatious/linux/commits/cdl-v7
=================
CDL in ATA / SCSI
=================
Command Duration Limits is defined in:
T13 ATA Command Set - 5 (ACS-5) and
T10 SCSI Primary Commands - 6 (SPC-6) respectively
(a simpler version of CDL is defined in T10 SPC-5).
CDL defines Duration Limits Descriptors (DLD).
7 DLDs for read commands and 7 DLDs for write commands.
Simply put, a DLD contains a limit and a policy.
A command can specify that a certain limit should be applied by setting
the DLD index field (3 bits, so 0-7) in the command itself.
The DLD index points to one of the 7 DLDs.
DLD index 0 means no descriptor, so no limit.
DLD index 1-7 means DLD 1-7.
A DLD can have a few different policies, but the two major ones are:
-Policy 0xF (abort), command will be completed with command aborted error
(ATA) or status CHECK CONDITION (SCSI), with sense data indicating that
the command timed out.
-Policy 0xD (complete-unavailable), command will be completed without
error (ATA) or status GOOD (SCSI), with sense data indicating that the
command timed out. Note that the command will not have transferred any
data to/from the device when the command timed out, even though the
command returned success.
Regardless of the CDL policy, in case of a CDL timeout, the I/O will
result in a -ETIME error to user-space.
The DLDs are defined in the CDL log page(s) and are readable and writable.
Reading and writing the CDL DLDs are outside the scope of the kernel.
If a user wants to read or write the descriptors, they can do so using a
user-space application that sends passthrough commands, such as cdl-tools:
https://github.com/westerndigitalcorporation/cdl-tools
================================
The introduction of ioprio hints
================================
What the kernel does provide, is a method to let I/O use one of the CDL DLDs
defined in the device. Note that the kernel will simply forward the DLD index
to the device, so the kernel currently does not know, nor does it need to know,
how the DLDs are defined inside the device.
The way that the CDL DLD index is supplied to the kernel is by introducing a
new 10 bit "ioprio hint" field within the existing 16 bit ioprio definition.
Currently, only 6 out of the 16 ioprio bits are in use, the remaining 10 bits
are unused, and are currently explicitly disallowed to be set by the kernel.
For now, we only add ioprio hints representing CDL DLD index 1-7. Additional
ioprio hints for other QoS features could be defined in the future.
A theoretical future work could be to make an I/O scheduler aware of these
hints. E.g. for CDL, an I/O scheduler could make use of the duration limit
in each descriptor, and take that information into account while scheduling
commands. Right now, the ioprio hints will be ignored by the I/O schedulers.
==============================
How to use CDL from user-space
==============================
Since CDL is mutually exclusive with NCQ priority
(see ncq_prio_enable and sas_ncq_prio_enable in
Documentation/ABI/testing/sysfs-block-device),
CDL has to be explicitly enabled using:
echo 1 > /sys/block/$bdev/device/cdl_enable
Since the ioprio hints are supplied through the existing I/O priority API,
it should be simple for an application to make use of the ioprio hints.
It simply has to reuse one of the new macros defined in
include/uapi/linux/ioprio.h: IOPRIO_PRIO_HINT() or IOPRIO_PRIO_VALUE_HINT(),
and supply one of the new hints defined in include/uapi/linux/ioprio.h:
IOPRIO_HINT_DEV_DURATION_LIMIT_[1-7], which indicates that the I/O should
use the corresponding CDL DLD index 1-7.
By reusing the I/O priority API, the user can both define a DLD to use per
AIO (io_uring sqe->ioprio or libaio iocb->aio_reqprio) or per-thread
(ioprio_set()).
=======
Testing
=======
With the following fio patches:
https://github.com/floatious/fio/commits/cdl
fio adds support for ioprio hints, such that CDL can be tested using e.g.:
fio --ioengine=io_uring --cmdprio_percentage=10 --cmdprio_hint=DLD_index
A simple way to test is to use a DLD with a very short duration limit,
and send large reads. Regardless of the CDL policy, in case of a CDL
timeout, the I/O will result in a -ETIME error to user-space.
We also provide a CDL test suite located in the cdl-tools repo, see:
https://github.com/westerndigitalcorporation/cdl-tools#testing-a-system-command-duration-limits-support
We have tested this patch series using:
-real hardware
-the following QEMU implementation:
https://github.com/floatious/qemu/tree/cdl
(NOTE: the QEMU implementation requires you to define the CDL policy at compile
time, so you currently need to recompile QEMU when switching between policies.)
===================
Further information
===================
For further information about CDL, see Damien's slides:
Presented at SDC 2021:
https://www.snia.org/sites/default/files/SDC/2021/pdfs/SNIA-SDC21-LeMoal-Be-On-Time-command-duration-limits-Feature-Support-in%20Linux.pdf
Presented at Lund Linux Con 2022:
https://drive.google.com/file/d/1I6ChFc0h4JY9qZdO1bY5oCAdYCSZVqWw/view?usp=sharing
================
Changes since V6
================
-Rebased series on v6.4-rc1.
-Picked up Reviewed-by tags from Hannes (Thank you Hannes!)
-Picked up Reviewed-by tag from Christoph (Thank you Christoph!)
-Changed KernelVersion from 6.4 to 6.5 for new sysfs attributes.
For older change logs, see previous patch series versions:
https://lore.kernel.org/linux-scsi/20230406113252.41211-1-nks@flawful.org/https://lore.kernel.org/linux-scsi/20230404182428.715140-1-nks@flawful.org/https://lore.kernel.org/linux-scsi/20230309215516.3800571-1-niklas.cassel@wdc.com/https://lore.kernel.org/linux-scsi/20230124190308.127318-1-niklas.cassel@wdc.com/https://lore.kernel.org/linux-scsi/20230112140412.667308-1-niklas.cassel@wdc.com/https://lore.kernel.org/linux-scsi/20221208105947.2399894-1-niklas.cassel@wdc.com/
Link: https://lore.kernel.org/r/20230511011356.227789-1-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commands using a duration limit descriptor that has limit policies set to a
value other than 0x0 may be failed by the device if one of the limits are
exceeded. For such commands, since the failure is the result of the user
duration limit configuration and workload, the commands should not be
retried and terminated immediately. Furthermore, to allow the user to
differentiate these "soft" failures from hard errors due to hardware
problem, a different error code than EIO should be returned.
There are 2 cases to consider:
(1) The failure is due to a limit policy failing the command with a check
condition sense key, that is, any limit policy other than 0xD. For this
case, scsi_check_sense() is modified to detect failures with the ABORTED
COMMAND sense key and the COMMAND TIMEOUT BEFORE PROCESSING or COMMAND
TIMEOUT DURING PROCESSING or COMMAND TIMEOUT DURING PROCESSING DUE TO ERROR
RECOVERY additional sense code. For these failures, a SUCCESS disposition
is returned so that scsi_finish_command() is called to terminate the
command.
(2) The failure is due to a limit policy set to 0xD, which result in the
command being terminated with a GOOD status, COMPLETED sense key, and DATA
CURRENTLY UNAVAILABLE additional sense code. To handle this case, the
scsi_check_sense() is modified to return a SUCCESS disposition so that
scsi_finish_command() is called to terminate the command. In addition,
scsi_decide_disposition() has to be modified to see if a command being
terminated with GOOD status has sense data. This is as defined in SCSI
Primary Commands - 6 (SPC-6), so all according to spec, even if GOOD status
commands were not checked before.
If scsi_check_sense() detects sense data representing a duration limit,
scsi_check_sense() will set the newly introduced SCSI ML byte
SCSIML_STAT_DL_TIMEOUT. This SCSI ML byte is checked in scsi_noretry_cmd(),
so that a command that failed because of a CDL timeout cannot be
retried. The SCSI ML byte is also checked in scsi_result_to_blk_status() to
complete the command request with the BLK_STS_DURATION_LIMIT status, which
result in the user seeing ETIME errors for the failed commands.
Co-developed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-12-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Introduce the command duration limits helper function sd_cdl_dld() to set
the DLD bits of READ/WRITE 16 and READ/WRITE 32 commands to indicate to the
device the command duration limit descriptor to apply to the commands.
When command duration limits are enabled, sd_cdl_dld() obtains the index of
the descriptor to apply to the command using the hints field of the request
IO priority value (hints IOPRIO_HINT_DEV_DURATION_LIMIT_1 to
IOPRIO_HINT_DEV_DURATION_LIMIT_7).
If command duration limits is disabled (which is the default), the limit
index "0" is always used to indicate "no limit" for a command.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-11-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add the sysfs scsi_device attribute cdl_enable to allow a user to enable or
disable a device command duration limits feature. CDL is disabled by
default. This feature must be explicitly enabled by a user by setting the
cdl_enable attribute to 1.
The new function scsi_cdl_enable() does not do anything beside setting the
cdl_enable field of struct scsi_device in the case of a (real) SCSI device
(e.g. a SAS HDD). For ATA devices, the command duration limits feature
needs to be enabled/disabled using the ATA feature sub-page of the control
mode page. To do so, the scsi_cdl_enable() function checks if this mode
page is supported using scsi_mode_sense(). If it is, scsi_mode_select() is
used to enable and disable CDL.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-10-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Introduce the function scsi_cdl_check() to detect if a device supports
command duration limits (CDL). Support for the READ 16, WRITE 16, READ 32
and WRITE 32 commands are checked using the function scsi_report_opcode()
to probe the rwcdlp and cdlp bits as they indicate the mode page defining
the command duration limits descriptors that apply to the command being
tested.
If any of these commands support CDL, the field cdl_supported of struct
scsi_device is set to 1 to indicate that the device supports CDL.
Support for CDL for a device is advertizes through sysfs using the new
cdl_supported device attribute. This attribute value is 1 for a device
supporting CDL and 0 otherwise.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Co-developed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-9-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The REPORT_SUPPORTED_OPERATION_CODES command allows checking for support of
commands that have the same opcode but different service actions, such as
READ 32 and WRITE 32. However, the current implementation of
scsi_report_opcode() only allows checking an operation code without a
service action differentiation.
Add the "sa" argument to scsi_report_opcode() to allow passing a service
action. If a non-zero service action is specified, the reporting options
field value is set to 3 to have the service action field taken into account
by the device. If no service action field is specified (zero), the
reporting options field is set to 1 as before.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-8-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Allow scsi_mode_sense() to retrieve sub-pages of mode pages by adding the
subpage argument. Change all the current caller sites to specify the
subpage 0.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-7-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
SCSI has two different getters:
- get_XXX_byte() (in scsi_cmnd.h) which takes a struct scsi_cmnd *, and
- XXX_byte() (in scsi.h) which takes a scmd->result.
The proper name for get_scsi_ml_byte() should thus be without the get_
prefix, as it takes a scmd->result. Rename the function to rectify this.
(This change was suggested by Mike Christie.)
Additionally, move get_scsi_ml_byte() to scsi_priv.h since both scsi_lib.c
and scsi_error.c will need to use this helper in a follow-up patch.
Cc: Mike Christie <michael.christie@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-6-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In SCSI, we get the sense data as part of the completion, for ATA however,
we need to fetch the sense data as an extra step. For an aborted ATA
command the sense data is fetched via libata's ->eh_strategy_handler().
For Command Duration Limits policy 0xD:
The device shall complete the command without error with the additional
sense code set to DATA CURRENTLY UNAVAILABLE.
In order to handle this policy in libata, we intend to send a successful
command via SCSI EH, and let libata's ->eh_strategy_handler() fetch the
sense data for the good command. This is similar to how we handle an
aborted ATA command, just that we need to read the Successful NCQ Commands
log instead of the NCQ Command Error log.
When we get a SATA completion with successful commands, ATA_SENSE will be
set, indicating that some commands in the completion have sense data.
The sense_valid bitmask in the Sense Data for Successful NCQ Commands log
will inform exactly which commands that had sense data, which might be a
subset of all the commands that was completed in the same completion. (Yet
all will have ATA_SENSE set, since the status is per completion.)
The successful commands that have e.g. a "DATA CURRENTLY UNAVAILABLE" sense
data will have a SCSI ML byte set, so scsi_eh_flush_done_q() will not set
the scmd->result to DID_TIME_OUT for these commands. However, the
successful commands that did not have sense data, must not get their result
marked as DID_TIME_OUT by SCSI EH.
Add a new flag SCMD_FORCE_EH_SUCCESS, which tells SCSI EH to not mark a
command as DID_TIME_OUT, even if it has scmd->result == SAM_STAT_GOOD.
This will be used by libata in a subsequent commit.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Niklas Cassel <niklas.cassel@wdc.com>
Link: https://lore.kernel.org/r/20230511011356.227789-5-nks@flawful.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Mike Christie <michael.christie@oracle.com> says:
The patches in this thread allow us to use the block pr_ops with LIO's
target_core_iblock module to support cluster applications in VMs. They
were built over Linus's tree. They also apply over linux-next and
Martin's tree and Jens's trees.
Currently, to use windows clustering or linux clustering (pacemaker +
cluster labs scsi fence agents) in VMs with LIO and vhost-scsi, you
have to use tcmu or pscsi or use a cluster aware FS/framework for the
LIO pr file. Setting up a cluster FS/framework is pain and waste when
your real backend device is already a distributed device, and pscsi
and tcmu are nice for specific use cases, but iblock gives you the
best performance and allows you to use stacked devices like
dm-multipath. So these patches allow iblock to work like pscsi/tcmu
where they can pass a PR command to the backend module. And then
iblock will use the pr_ops to pass the PR command to the real devices
similar to what we do for unmap today.
The patches are separated in the following groups:
Patch 1 - 2:
- Add block layer callouts for reading reservations and rename reservation
error code.
Patch 3 - 5:
- SCSI support for new callouts.
Patch 6:
- DM support for new callouts.
Patch 7 - 13:
- NVMe support for new callouts.
Patch 14 - 18:
- LIO support for new callouts.
This patchset has been tested with the libiscsi PGR ops and with
window's failover cluster verification test. Note that for scsi
backend devices we need this patchset:
https://lore.kernel.org/linux-scsi/20230123221046.125483-1-michael.christie@oracle.com/T/#m4834a643ffb5bac2529d65d40906d3cfbdd9b1b7
to handle UAs. To reduce the size of this patchset that's being done
separately to make reviewing easier. And to make merging easier this
patchset and the one above do not have any conflicts so can be merged
in different trees.
Link: https://lore.kernel.org/r/20230407200551.12660-1-michael.christie@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Xiang Chen <chenxiang66@hisilicon.com> says:
This series contains some fixes including:
- Configure initial value of some registers according to HBA model
- Change DMA setup lock timeout from 100ms to 2.5s
- Fix warnings detected by sparse
Link: https://lore.kernel.org/r/1684118481-95908-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
DMA setup lock timeout protection is added when DMA setup frames are
received. It's a function outside the protocol and used to prevent SATA
disk I/Os from being delivered for a long time. The default value is 100ms,
it's too strict and easily triggered timeout when the disk is overloaded or
faulty. Based on the average I/O latency of 300 disks, we adjust the value
to 2.5s.
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1684118481-95908-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For SAS HBAs of 920 and previous version, we use init_reg_v3_hw() to set
some registers which are related to HW boards. For SAS HBAs of 920B and
later version, those HW registers are set through firmware. And different
HBA models are distinguished through pci_dev->revision.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1684118481-95908-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the ongoing effort to replace all fake flexible arrays with true
flexible arrays, replace the sge32, sge64, and sge_skinny members of union
megasas_sgl with true flexible arrays. No binary differences are seen after
this change; sizes were already being manually calculated using the member
struct sizes directly.
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: Shivasharan S <shivasharan.srikanteshwara@broadcom.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: megaraidlinux.pdl@broadcom.com
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Kees Cook <keescook@chromium.org>
Link: https://lore.kernel.org/r/20230511220957.never.919-kees@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace <don.brace@microchip.com> says:
These patches are based on Martin Petersen's 6.4/scsi-queue tree
https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
6.4/scsi-queue
This set of changes consists of:
* Map entire BAR 0. The driver was mapping up to and including the
controller registers, but not all of BAR 0.
* Add PCI IDs to support new controllers.
* Clean up some code by removing unnecessary NULL checks. This cleanup is
a result of a Coverity report.
* Correct a rare memory leak whenever pqi_sas_port_add_rhpy() returns an
error. This was Suggested by: Yang Yingliang <yangyingliang@huawei.com>
* Remove atomic operations on variable raid_bypass_cnt. Accuracy is not
required for driver operation. Change type from atomic_t to unsigned
int.
* Correct a rare drive hot-plug removal issue where we get a NULL
io_request. We added a check for this condition.
* Turn on NCQ priority for AIO requests to disks comprising RAID devices.
* Correct byte aligned writew() operations on some ARM servers. Changed
the writew() to two writeb() operations.
* Change how the driver checks for a sanitize operation in progress. We
were using TEST UNIT READY. We removed the TEST UNIT READY code and are
now using the controller's firmware information in order to avoid issues
caused by drives failing to complete TEST UNIT READY.
* Some customers have been requesting that we add the NUMA node to
/sys/block/sd<scsi device>/device like the nvme driver does.
* Update the copyright information to match the current year.
* Bump the driver version to 2.1.22-040.
Link: https://lore.kernel.org/r/20230428153712.297638-1-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pranav Prasad <pranavpp@google.com> says:
This patch series enhances debug logs for pm80xx HW events, and provides a
minor fix in the case of a hard reset. The log enhancement involves changing
the log severity level to enable logging for HW events which consequently
help debug disk discovery issues.
1. Changed log severity level from MSG to EVENT for HW events. Enhanced
the HW event logs by adding the phyid.
2. Enabled INIT logging.
3. Log portid along with the PHY_UP event.
4. Print phyid and portid sent as part of device registration request.
5. Log port state during HW events.
6. Update phy_state and phy_attached to correct values after a hard reset.
Link: https://lore.kernel.org/r/20230418190101.696345-1-pranavpp@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nilesh Javali <njavali@marvell.com> says:
Please apply the qla2xxx driver enhancement and bug fixes to the scsi tree
at your earliest convenience.
Link: https://lore.kernel.org/r/20230428075339.32551-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee <justintee8345@gmail.com> says:
Update lpfc to revision 14.2.0.12
This patch set contains fixes flagged by code analyzer tools, introduces a
new CQE status to handle DMA errors, and replaces the usage of blk
interrupts with threaded interrupts.
The patches were cut against Martin's 6.4/scsi-queue tree.
Link: https://lore.kernel.org/r/20230417191558.83100-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Smatch reported:
drivers/scsi/qedf/qedf_main.c:3056 qedf_alloc_global_queues()
warn: missing unwind goto?
At this point in the function, nothing has been allocated so we can return
directly. In particular the "qedf->global_queues" have not been allocated
so calling qedf_free_global_queues() will lead to a NULL dereference when
we check if (!gl[i]) and "gl" is NULL.
Fixes: 61d8658b4a ("scsi: qedf: Add QLogic FastLinQ offload FCoE driver framework.")
Signed-off-by: Jinhong Zhu <jinhongzhu@hust.edu.cn>
Link: https://lore.kernel.org/r/20230502140022.2852-1-jinhongzhu@hust.edu.cn
Reviewed-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Gerry Morong <gerry.morong@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-13-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update copyright to current year.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-12-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Although NUMA node is a PCIe device level attribute, it was requested the
NUMA node be added for each exposed device similar to NVMe disks.
Example for NVMe:
/sys/block/nvme1c1n1/device/numa_node
Example for smartpqi:
/sys/block/sdh/device/numa_node
cat /sys/block/sdh/device/numa_node
0
Reviewed-by: David Strahan <david.strahan@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-11-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Stop sending driver-initiated TURs to physical devices during driver
load/rescan.
Note: This does not affect SML initiated TURs.
Some Linux kernels can cause lengthy delays in OS boot if the kernel
detects that a drive is being sanitized/erased. We were using TURs to
detect if a sanitize/erase was in progress.
Some devices do not return the TUR in a timely manner, causing driver
load/rescan stalls.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-10-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Correct OOPs on ARM servers during driver init.
The driver attempts to update FW with max_feature_supported value using a
writew() kernel call using a byte aligned address. This fails on some ARM
systems.
Change the writew() to two writeb() calls to update this value.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-9-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Enable NCQ priority feature for the RAID path when AIO path is disabled.
Move function pqi_is_io_high_priority() up to avoid adding a prototype.
Remove unused argument ctrl_info.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Gilbert Wu <Gilbert.Wu@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-8-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Prevent OS crashes when a drive is hot removed during I/O stress test.
The I/O request pointer can be invalid if block layer provides incorrect
multi-queue host tag. This can lead to invalid I/O request pointer
dereference.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-7-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reduce CPU contention when incrementing variable raid_bypass_cnt.
Remove the atomic operations for this variable by changing the atomic to an
unsigned int and replace atomic operations with standard operations. The
value is only checked that it is increasing and accuracy is not required.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-6-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Free rphy when pqi_sas_port_add_rphy() returns an error.
If pqi_sas_port_add_rphy() returns an error, the 'rphy' allocated in
sas_end_device_alloc() needs to be freed.
It should be noted that no issues were ever reported.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Suggested-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-5-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove an unnecessary check for a NULL pointer. This unnecessary check was
flagged by Coverity.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-4-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Map full length of PCI BAR 0 at driver init.
During driver initialization, the driver must make a kernel call to map the
controller registers into kernel address space. A parameter to this call
is the length of the memory to be mapped. The driver was specifying the
wrong length.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230428153712.297638-2-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update version to 10.02.08.300-k.
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-8-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
System crash due to use after free.
Current code allows terminate_rport_io to exit before making
sure all IOs has returned. For FCP-2 device, IO's can hang
on in HW because driver has not tear down the session in FW at
first sign of cable pull. When dev_loss_tmo timer pops,
terminate_rport_io is called and upper layer is about to
free various resources. Terminate_rport_io trigger qla to do
the final cleanup, but the cleanup might not be fast enough where it
leave qla still holding on to the same resource.
Wait for IO's to return to upper layer before resources are freed.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-7-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
System crash, where driver is accessing scsi layer's
memory (scsi_cmnd->device->host) to search for a well known internal
pointer (vha). The scsi_cmnd was released back to upper layer which
could be freed, but the driver is still accessing it.
7 [ffffa8e8d2c3f8d0] page_fault at ffffffff86c010fe
[exception RIP: __qla2x00_eh_wait_for_pending_commands+240]
RIP: ffffffffc0642350 RSP: ffffa8e8d2c3f988 RFLAGS: 00010286
RAX: 0000000000000165 RBX: 0000000000000002 RCX: 00000000000036d8
RDX: 0000000000000000 RSI: ffff9c5c56535188 RDI: 0000000000000286
RBP: ffff9c5bf7aa4a58 R8: ffff9c589aecdb70 R9: 00000000000003d1
R10: 0000000000000001 R11: 0000000000380000 R12: ffff9c5c5392bc78
R13: ffff9c57044ff5c0 R14: ffff9c56b5a3aa00 R15: 00000000000006db
ORIG_RAX: ffffffffffffffff CS: 0010 SS: 0018
8 [ffffa8e8d2c3f9c8] qla2x00_eh_wait_for_pending_commands at ffffffffc0646dd5 [qla2xxx]
9 [ffffa8e8d2c3fa00] __qla2x00_async_tm_cmd at ffffffffc0658094 [qla2xxx]
Remove access of freed memory. Currently the driver was checking to see if
scsi_done was called by seeing if the sp->type has changed. Instead,
check to see if the command has left the oustanding_cmds[] array as
sign of scsi_done was called.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Task management command hangs where a side
band chip reset failed to nudge the TMF
from it's current send path.
Add additional error check to block TMF
from entering during chip reset and along
the TMF path to cause it to bail out, skip
over abort of marker.
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-5-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Task management cmd failed with status 30h which means
FW is not able to finish processing one task management
before another task management for the same lun.
Hence add wait for completion of marker to space it out.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304271802.uCZfwQC1-lkp@intel.com/
Cc: stable@vger.kernel.org
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230428075339.32551-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com <mailto:himanshu.madhani@oracle.com>>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To be consistent with sas_check_edge_expander_topo(), factor out
sas_check_fanout_expander_topo(). And remove the comment since we are not
spilling over 80 colums now.
Signed-off-by: Jason Yan <yanaijie@huawei.com>
Link: https://lore.kernel.org/r/20230421093744.1583609-4-yanaijie@huawei.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>