Commit aa3998dbeb ("ata: libata-scsi: Disable scsi device
manage_system_start_stop") change setting the manage_system_start_stop
flag to false for libata managed disks to enable libata internal
management of disk suspend/resume. However, a side effect of this change
is that on system shutdown, disks are no longer being stopped (set to
standby mode with the heads unloaded). While this is not a critical
issue, this unclean shutdown is not recommended and shows up with
increased smart counters (e.g. the unexpected power loss counter
"Unexpect_Power_Loss_Ct").
Instead of defining a shutdown driver method for all ATA adapter
drivers (not all of them define that operation), this patch resolves
this issue by further refining the sd driver start/stop control of disks
using the new flag manage_shutdown. If this new flag is set to true by
a low level driver, the function sd_shutdown() will issue a
START STOP UNIT command with the start argument set to 0 when a disk
needs to be powered off (suspended) on system power off, that is, when
system_state is equal to SYSTEM_POWER_OFF.
Similarly to the other manage_xxx flags, the new manage_shutdown flag is
exposed through sysfs as a read-write device attribute.
To avoid any confusion between manage_shutdown and
manage_system_start_stop, the comments describing these flags in
include/scsi/scsi.h are also improved.
Fixes: aa3998dbeb ("ata: libata-scsi: Disable scsi device manage_system_start_stop")
Cc: stable@vger.kernel.org
Closes: https://bugzilla.kernel.org/show_bug.cgi?id=218038
Link: https://lore.kernel.org/all/cd397c88-bf53-4768-9ab8-9d107df9e613@gmail.com/
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: James Bottomley <James.Bottomley@HansenPartnership.com>
Acked-by: Martin K. Petersen <martin.petersen@oracle.com>
Found by Smatch.
Fixes: 5bcd3bfbda ("scsi: megaraid: Pass in NULL scb for host reset")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231023073021.21954-1-hare@suse.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Found by smatch.
Fixes: c67e638004 ("scsi: aic79xx: Do not reference SCSI command when resetting device")
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231023073014.21438-1-hare@suse.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The retry loop continues to iterate until the count reaches 30, even after
receiving the correct value. Exit loop when a non-zero value is read.
Fixes: 4ca10f3e31 ("scsi: mpt3sas: Perform additional retries if doorbell read returns 0")
Cc: stable@vger.kernel.org
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20231020105849.6350-1-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Return error code directly to save space and be more clear.
Signed-off-by: Su Hui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20231020023326.43898-1-suhui@nfschina.com
Reviewed-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is just a cleanup for scsi_dev_queue_ready() to avoid a redundant goto
and if statement. No functional change.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231018113746.1940197-2-haowenchao2@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When breaking out of an shost_for_each_device() loop one needs to do an
explicit scsi_device_put().
Fixes: c2a14ab3b9 ("scsi: pmcraid: Select device in pmcraid_eh_target_reset_handler()")
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231023072957.20191-1-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix kernel-doc comment to silence the warnings:
drivers/scsi/pmcraid.c:2697: warning: Excess function parameter 'scsi_cmd' description in 'pmcraid_reset_device'
drivers/scsi/pmcraid.c:2697: warning: Function parameter or member 'scsi_dev' not described in 'pmcraid_reset_device'
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=6843
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20231017025853.67562-1-yang.lee@linux.alibaba.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Two small fixes, both in drivers. The mptsas one is really fixing an
error path issue where it can leave the misc driver loaded even though
the sas driver fails to initialize.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZTLCuiYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishU6oAP45oLIr
P3d4AXt2XGBwsaqyWaQx2OOe8fGps0+jyF3OdQEAzFXKXp7lNmHlA573JVMjH6Sz
JqAdi+dYksRssC0GMZk=
=iXXi
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Two small fixes, both in drivers.
The mptsas one is really fixing an error path issue where it can leave
the misc driver loaded even though the sas driver fails to initialize"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: qla2xxx: Fix double free of dsd_list during driver load
scsi: mpt3sas: Fix in error path
The default handling of the NOT READY sense key is to wait for the device
to become ready. The "wait" is assumed to be relatively short. However
there is a sub-class of NOT READY that have the "... in progress" phrase in
their additional sense code and these can take much longer. Following on
from commit 505aa4b6a8 ("scsi: sd: Defer spinning up drive while SANITIZE
is in progress") we now have element depopulation and restoration that can
take a long time. For example, over 24 hours for a 20 TB, 7200 rpm hard
disk to depopulate 1 of its 20 elements.
Add handling of ASC/ASCQ: 0x4,0x24 (depopulation in progress)
and ASC/ASCQ: 0x4,0x25 (depopulation restoration in progress)
to sd.c . The scsi_lib.c has incomplete handling of these
two messages, so complete it.
Signed-off-by: Douglas Gilbert <dgilbert@interlog.com>
Link: https://lore.kernel.org/r/20231015050650.131145-1-dgilbert@interlog.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Wenchao Hao <haowenchao2@huawei.com> says:
The original error injection mechanism was based on scsi_host which
could not inject fault for a single SCSI device.
This patchset provides the ability to inject errors for a single SCSI
device. Now we support inject timeout errors, queuecommand errors, and
hostbyte, driverbyte, statusbyte, and sense data for specific SCSI
Command. Two new error injection is defined to make abort command or
reset LUN failed.
Besides error injection for single device, this patchset add a new
interface to make reset target failed for each scsi_target.
The first two patch add a debugfs interface to add and inquiry single
device's error injection info; the third patch defined how to remove
an injection which has been added. The following 5 patches use the
injection info and generate the related error type. The last two just
add a new interface to make reset target failed and control
scsi_device's allow_restart flag.
Link: https://lore.kernel.org/r/20231010092051.608007-1-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add new module param "allow_restart" to control scsi_device's allow_restart
flag. This flag determines if EH is triggered after a command completes
with sense_key 0x6, ASC 0x4 and ASCQ 0x2. EH would be triggered if
allow_restart=1 in this condition.
The new param can be used with the error injection capability to test how
commands completing with sense_key 0x6, ASC 0x4 and ASCQ 0x2 are handled.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-11-haowenchao2@huawei.com
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The interface is found at
/sys/kernel/debug/scsi_debug/target<h:c:t>/fail_reset where <h:c:t>
identifies the target to inject errors on. It's a simple bool type
interface which would make this target's reset fail if set to 'Y'.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-10-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add error injection type 4 to make scsi_debug_device_reset() return FAILED.
Fail abort command format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description |
+--------+------+-------------------------------------------------------+
| 1 | u8 | Error type, fixed to 0x4 |
+--------+------+-------------------------------------------------------+
| 2 | s32 | Error count |
| | | 0: this rule will be ignored |
| | | positive: the rule will always take effect |
| | | negative: the rule takes effect n times where -n is |
| | | the value given. Ignored after n times |
+--------+------+-------------------------------------------------------+
| 3 | x8 | SCSI command opcode, 0xff for all commands |
+--------+------+-------------------------------------------------------+
Examples:
error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
echo "4 -10 0x12" > ${error}
will make the device return FAILED when trying to reset LUN with inquiry
command 10 times.
error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
echo "4 -10 0xff" > ${error}
will make the device return FAILED when trying to reset LUN 10 times.
Usually we do not care about what command it is when trying to perform
reset LUN, so 0xff could be applied.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-9-haowenchao2@huawei.com
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add error injection type 3 to make scsi_debug_abort() return FAILED. Fail
abort command format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description |
+--------+------+-------------------------------------------------------+
| 1 | u8 | Error type, fixed to 0x3 |
+--------+------+-------------------------------------------------------+
| 2 | s32 | Error count |
| | | 0: this rule will be ignored |
| | | positive: the rule will always take effect |
| | | negative: the rule takes effect n times where -n is |
| | | the value given. Ignored after n times |
+--------+------+-------------------------------------------------------+
| 3 | x8 | SCSI command opcode, 0xff for all commands |
+--------+------+-------------------------------------------------------+
Examples:
error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
echo "3 -10 0x12" > ${error}
will make the device return FAILED when aborting inquiry command 10 times.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-8-haowenchao2@huawei.com
Tested-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a fail command error is injected, set the command's status and sense
data then finish this SCSI command.
Set SCSI command's status and sense data format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description |
+--------+------+-------------------------------------------------------+
| 1 | u8 | Error type, fixed to 0x2 |
+--------+------+-------------------------------------------------------+
| 2 | s32 | Error Count |
| | | 0: the rule will be ignored |
| | | positive: the rule will always take effect |
| | | negative: the rule takes effect n times where -n is |
| | | the value given. Ignored after n times |
+--------+------+-------------------------------------------------------+
| 3 | x8 | SCSI command opcode, 0xff for all commands |
+--------+------+-------------------------------------------------------+
| 4 | x8 | Host byte in scsi_cmd::status |
| | | [scsi_cmd::status has 32 bits holding these 3 bytes] |
+--------+------+-------------------------------------------------------+
| 5 | x8 | Driver byte in scsi_cmd::status |
+--------+------+-------------------------------------------------------+
| 6 | x8 | SCSI Status byte in scsi_cmd::status |
+--------+------+-------------------------------------------------------+
| 7 | x8 | SCSI Sense Key in scsi_cmnd |
+--------+------+-------------------------------------------------------+
| 8 | x8 | SCSI ASC in scsi_cmnd |
+--------+------+-------------------------------------------------------+
| 9 | x8 | SCSI ASCQ in scsi_cmnd |
+--------+------+-------------------------------------------------------+
Examples:
error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
echo "2 -10 0x88 0 0 0x2 0x3 0x11 0x0" >${error}
will make device's read command return with media error with additional
sense of "Unrecovered read error" (UNC):
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-7-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a fail queuecommand error is injected, return the failed value defined
in the rule from queuecommand.
Make queuecommand return format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description |
+--------+------+-------------------------------------------------------+
| 1 | u8 | Error type, fixed to 0x1 |
+--------+------+-------------------------------------------------------+
| 2 | s32 | Error count |
| | | 0: this rule will be ignored |
| | | positive: the rule will always take effect |
| | | negative: the rule takes effect n times where -n is |
| | | the value given. Ignored after n times |
+--------+------+-------------------------------------------------------+
| 3 | x8 | SCSI command opcode, 0xff for all commands |
+--------+------+-------------------------------------------------------+
| 4 | x32 | The queuecommand() return value we want |
+--------+------+-------------------------------------------------------+
Examples:
error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
echo "1 1 0x12 0x1055" > ${error}
will make each INQUIRY command sent to that device return 0x1055
(SCSI_MLQUEUE_HOST_BUSY).
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-6-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If a timeout error is injected, return 0 from scsi_debug_queuecommand to
make the command time out.
Time out SCSI command format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description |
+--------+------+-------------------------------------------------------+
| 1 | u8 | Error type, fixed to 0x0 |
+--------+------+-------------------------------------------------------+
| 2 | s32 | Error count |
| | | 0: this rule will be ignored |
| | | positive: the rule will always take effect |
| | | negative: the rule takes effect n times where -n is |
| | | the value given. Ignored after n times |
+--------+------+-------------------------------------------------------+
| 3 | x8 | SCSI command opcode, 0xff for all commands |
+--------+------+-------------------------------------------------------+
Examples:
error=/sys/kernel/debug/scsi_debug/0:0:0:1/error
echo "0 -10 0x12" > ${error}
will make the device's inquiry command time out 10 times.
echo "0 1 0x12" > ${error}
will make the device's inquiry time out each time it is invoked on this
device.
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-5-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The grammar to remove error injection is a line with fixed 3 columns
separated by spaces.
First column is fixed to "-". It tells this is a removal operation. Second
column is the error code to match. Third column is the scsi command to
match.
For example the following command would remove timeout injection of inquiry
command:
echo "- 0 0x12" > /sys/kernel/debug/scsi_debug/0:0:0:1/error
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-4-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This new facility uses the debugfs pseudo file system which is typically
mounted under the /sys/kernel/debug directory and requires root permissions
to access.
The interface file is found at /sys/kernel/debug/scsi_debug/<h:c:t:l>/error
where <h:c:t:l> identifies the device (logical unit (LU)) to inject errors
on.
For the following description the ${error} environment variable is assumed
to be set to/sys/kernel/debug/scsi_debug/1:0:0:0/error where 1:0:0:0 is a
pseudo device (LU) owned by the scsi_debug driver. Rules are written to
${error} in the normal sysfs fashion (e.g. 'echo "0 -2 0x12" > ${error}').
More than one rule can be active on a device at a time and inactive rules
(i.e. those whose error count is 0) remain in the rule listing. The
existing rules can be read with 'cat ${error}' with oneline output for each
rule.
The interface format is line-by-line, each line is an error injection rule.
Each rule contains integers separated by spaces, the first three columns
correspond to "Error code", "Error count" and "SCSI command", other
columns depend on Error code.
General rule format:
+--------+------+-------------------------------------------------------+
| Column | Type | Description |
+--------+------+-------------------------------------------------------+
| 1 | u8 | Error code |
| | | 0: timeout SCSI command |
| | | 1: fail queuecommand, make queuecommand return |
| | | given value |
| | | 2: fail command, finish command with SCSI status, |
| | | sense key and ASC/ASCQ values |
| | | 3: make abort commands for specific command fail |
| | | 4: make reset lun for specific command fail |
+--------+------+-------------------------------------------------------+
| 2 | s32 | Error count |
| | | 0: this rule will be ignored |
| | | positive: the rule will always take effect |
| | | negative: the rule takes effect n times where -n is |
| | | the value given. Ignored after n times |
+--------+------+-------------------------------------------------------+
| 3 | x8 | SCSI command opcode, 0xff for all commands |
+--------+------+-------------------------------------------------------+
| ... | xxx | Error type specific fields |
+--------+------+-------------------------------------------------------+
Notes:
- When multiple error inject rules are added for the same SCSI command,
the one with smaller error code will take effect (and the others will be
ignored).
- If the same error (i.e. same Error code and SCSI command) is added, the
older one will be overwritten..
- Currently, the basic types are (u8/u16/u32/u64/s8/s16/s32/s64) and the
hexadecimal types (x8/x16/x32/x64).
- Where a hexadecimal value is expected (e.g. Column 3: SCSI command
opcode) the "0x" prefix is optional on the value (e.g. the INQUIRY
opcode can be given as '0x12' or '12').
- When the Error count is negative, reading ${error} will show that value
incrementing, stopping when it gets to 0.
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-3-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Create directory scsi_debug in the root of the debugfs filesystem. Prepare
to add interface for manage error injection.
Acked-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231010092051.608007-2-haowenchao2@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee <justintee8345@gmail.com> says:
Update lpfc to revision 14.2.0.15
This patch set contains error handling fixes, ELS bug fixes, and
logging improvements.
The patches were cut against Martin's 6.7/scsi-queue tree.
Link: https://lore.kernel.org/r/20231009161812.97232-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The preexisting LOG_NODE message flag frequently spams a subset of the same
log messages during normal FC driver operations. When analyzing driver
logs, this sometimes leads to difficulty in troubleshooting.
Because LOG_IP log message flag is unused, convert it to a new
LOG_NODE_VERBOSE flag. The LOG_NODE_VERBOSE shall specifically be used for
diagnosing issues that require precise ndlp tracking detail.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231009161812.97232-6-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A WCQE success completion status does not guarantee valid LS_ACC receipt
for ELS commands. So, introduce a small helper routine that validates ELS
LS_ACC frames in ELS cmpl routines.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231009161812.97232-5-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, NPIV ports send PRLI_ACC to all received unsolicited PRLI
requests. For an NPIV port, there is no point to PRLI_ACC if the received
PRLI request has the initiator function bit set and the target function bit
unset. Modify the lpfc_rcv_prli_support_check() routine to send a PRLI_RJT
in such cases. NPIV ports are expected to send PRLI_ACC only if the Target
function bit is set.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231009161812.97232-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During receipt of a hardware error attention ACQE, IOERR_SLI_DOWN status is
set by the driver for all outstanding I/Os.
In such hardware error attention cases, we can treat the situation exactly
the same as pci_channel_offline. Thus, add IOERR_SLI_DOWN status to the
same category as pci_channel_offline handling in lpfc_nvme_io_cmd_cmpl.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231009161812.97232-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In order to enter the !rc if statement block in question, rc had to have
been zero to begin with. Thus, the rc = 0 assignment is unnecessary and
can be removed.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231009161812.97232-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver now includes the print message 'IO is completed, no reset is
required' when a reset is requested but not issued. This message is
displayed only when pending SCSI IO is completed before issuing the reset.
Signed-off-by: Chandrakanth patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Link: https://lore.kernel.org/r/20231003110021.168862-3-chandrakanth.patil@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In BMC environments with concurrent access to multiple registers, certain
registers occasionally yield a value of 0 even after 3 retries due to
hardware errata. As a fix, we have extended the retry count from 3 to 30.
The same errata applies to the mpt3sas driver, and a similar patch has
been accepted. Please find more details in the mpt3sas patch reference
link.
Link: https://lore.kernel.org/r/20230829090020.5417-2-ranjan.kumar@broadcom.com
Fixes: 272652fcbf ("scsi: megaraid_sas: add retry logic in megasas_readl")
Cc: stable@vger.kernel.org
Signed-off-by: Chandrakanth patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Link: https://lore.kernel.org/r/20231003110021.168862-2-chandrakanth.patil@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Mike Christie <michael.christie@oracle.com> says:
The following patches were made over Linus tree (Martin's 6.7 branch
was missing some changes to sd.c). They only contain the sshdr and
rdac retry fixes from the "Allow scsi_execute users to control
retries" patchset.
The patches in this set are reviewed and tested but the changes to how
we do retries will take a little longer and require more testing, so I
broke up the series to make them easier to review.
Link: https://lore.kernel.org/r/20231004210013.5601-1-michael.christie@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we
shouldn't access the sshdr. If it returns 0, then the cmd executed
successfully, so there is no need to check the sshdr. This has us access
the sshdr when we get a return value > 0.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20231004210013.5601-13-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we
shouldn't access the sshdr. If it returns 0, then the cmd executed
successfully, so there is no need to check the sshdr. This has us access
the sshdr when we get a return value > 0.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20231004210013.5601-12-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we
shouldn't access the sshdr. If it returns 0, then the cmd executed
successfully, so there is no need to check the sshdr. This has us access
the sshdr when we get a return value > 0.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20231004210013.5601-11-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we
shouldn't access the sshdr. If it returns 0, then the cmd executed
successfully, so there is no need to check the sshdr. This has us access
the sshdr when we get a return value > 0.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20231004210013.5601-10-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The sshdr passed into scsi_execute_cmd is only initialized if
scsi_execute_cmd returns >= 0, and scsi_mode_sense will convert all non
good statuses like check conditions to -EIO. This has scsi_mode_sense
callers that were possibly accessing an uninitialized sshdrs to only
access it if we got -EIO.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20231004210013.5601-9-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we
shouldn't access the sshdr. If it returns 0, then the cmd executed
successfully, so there is no need to check the sshdr. This has us access
the sshdr when we get a return value > 0.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20231004210013.5601-7-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we
shouldn't access the sshdr. If it returns 0, then the cmd executed
successfully, so there is no need to check the sshdr. This has us access
the sshdr when we get a return value > 0.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20231004210013.5601-6-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If send_mode_select retries scsi_execute_cmd it will leave err set to
SCSI_DH_RETRY/SCSI_DH_IMM_RETRY. If on the retry, the command is
successful, then SCSI_DH_RETRY/SCSI_DH_IMM_RETRY will be returned to the
scsi_dh activation caller. On the retry, we will then detect the previous
MODE SELECT had worked, and so we will return success.
This patch has us return the correct return value, so we can avoid the
extra scsi_dh activation call and to avoid failures if the caller had hit
its activation retry limit and does not end up retrying.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20231004210013.5601-5-michael.christie@oracle.com
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we
shouldn't access the sshdr. If it returns 0, then the cmd executed
successfully, so there is no need to check the sshdr. This has us access
the sshdr when we get a return value > 0.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20231004210013.5601-4-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we
shouldn't access the sshdr. If it returns 0, then the cmd executed
successfully, so there is no need to check the sshdr. This has us access
the sshdr when we get a return value > 0.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20231004210013.5601-3-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If scsi_execute_cmd returns < 0, it doesn't initialize the sshdr, so we
shouldn't access the sshdr. If it returns 0, then the cmd executed
successfully, so there is no need to check the sshdr. This has us access
the sshdr when we get a return value > 0.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20231004210013.5601-2-michael.christie@oracle.com
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Mike Christie <michael.christie@oracle.com> says:
The following patches were made over Linus's tree but apply over
Martin's branches. They allow userspace to configure how fabric
drivers submit cmds to backend drivers.
Right now loop and vhost use a worker thread, and the other drivers
submit from the contexts they receive/process the cmd from. For
multiple LUN cases where the target can queue more cmds than the
backend can handle then deferring to a worker thread is safest because
the backend driver can block when doing things like waiting for a free
request/tag. Deferring also helps when the target has to handle
transport level requests from the recv context.
For cases where the backend devices can queue everything the target
sends, then there is no need to defer to a workqueue and you can see a
perf boost of up to 26% for small IO workloads. For a nvme device and
vhost-scsi I can see with 4K IOs:
fio jobs 1 2 4 8 10
--------------------------------------------------
workqueue
submit 94K 190K 394K 770K 890K
direct
submit 128K 252K 488K 950K -
Link: https://lore.kernel.org/r/1b1f7a5c-0988-45f9-b103-dfed2c0405b1@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In some cases, like with multiple LUN targets or where the target has to
respond to transport level requests from the receiving context it can be
better to defer cmd submission to a helper thread. If the backend driver
blocks on something like request/tag allocation it can block the entire
target submission path and other LUs and transport IO on that session.
In other cases like single LUN targets with storage that can support all
the commands that the target can queue, then it's best to submit the cmd
to the backend from the target's cmd receiving context.
Subsequent commits will allow the user to config what they prefer, but
drivers like loop can't directly submit because they can be called from a
context that can't sleep. And, drivers like vhost-scsi can support direct
submission, but need to keep their default behavior of deferring execution
to avoid possible regressions where the backend can block.
Make the drivers tell LIO core if they support direct submissions and their
current default, so we can prevent users from misconfiguring the system and
initialize devices correctly.
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Link: https://lore.kernel.org/r/20230928020907.5730-2-michael.christie@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Hannes Reinecke <hare@suse.de> says:
Hi all,
(taking up an old thread:) here's the first batch of patches for my EH
rework. It modifies the reset callbacks for SCSI drivers such that
the final conversion to drop the 'struct scsi_cmnd' argument and use
the entity in question (host, bus, target, device) as the argument to
the SCSI EH callbacks becomes possible. The first part covers drivers
which just requires minor tweaks.
Link: https://lore.kernel.org/r/20231002154328.43718-1-hare@suse.de
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
SCSI EH host reset is the final callback in the escalation chain; once we
reach this we need to reset the controller. As such it defeats the purpose
to skip controller reset if no I/Os are pending and the RAID device is to
be reset; especially after kexec there might be stale commands pending in
firmware for which we have no reference whatsoever. So this patch splits
off the check for pending I/O into a 'bus_reset' function, and leaves the
actual controller reset to the host reset.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-19-hare@suse.de
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The reset code requires a device to be selected, but we shouldn't rely on
the command to provide a device for us. So select the first device on the
target when sending down a target reset.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-18-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The reset code requires a device to be selected, but we shouldn't rely on
the command to provide a device for us. So select the first device on the
bus when sending down a bus reset.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-17-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There's not much in common between host reset and all other error handlers,
so use a separate function here.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-16-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Split off the combined abort and device reset handling into distinct
functions. And rename the current device reset handler into a target reset
handler, seeing that it really is a target reset.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-15-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current handler does both, bus reset and host reset. So split them off
into two distinct functions.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-14-hare@suse.de
Cc: Matthew Wilcox <willy@infradead.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The code for aborting an outstanding command is a copy of the functionality
from command abort. As we already have called this function once we reach
host reset there's no point in trying to do so again.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-13-hare@suse.de
Cc: Adaptec OEM Raid Solutions <aacraid@microsemi.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When calling a host reset we shouldn't rely on the command triggering the
reset, so allow megaraid_abort_and_reset() to be called with a NULL scb.
And drop the pointless 'bus_reset' and 'target_reset' handlers, which just
call the same function as host_reset.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-12-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For target reset we need a device to send the target reset to, so open-code
the loop in target reset to send the target reset TMF to the correct
device.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-11-hare@suse.de
Cc: Tyrel Datwyler <tyreld@linux.ibm.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When sending a device reset we should not take a reference to the SCSI
command.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-10-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Convert BUILD_SCSIID() into a function and add a scsi_device argument.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-9-hare@suse.de
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When sending a device reset we should not take a reference to the SCSI
command.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-8-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Convert BUILD_SCSIID() into a function and add a scsi_device argument.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-7-hare@suse.de
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a LUN or target reset is issued, we should not rely on a SCSI command
to be present; we'll have to reset the entire device or target anyway.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-6-hare@suse.de
Cc: Saurav Kashyap <skashyap@marvell.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When sending a TMF we're only concerned with the rport and the LUN ID, so
use struct fc_rport as argument for qedf_initiate_tmf().
Signed-off-by: Hannes Reinecke <hare@suse.de>
Link: https://lore.kernel.org/r/20231002154328.43718-5-hare@suse.de
Cc: Saurav Kashyap <skashyap@marvell.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Clang warns (or errors with CONFIG_WERROR=y) several times along the
lines of:
drivers/scsi/ibmvscsi/ibmvfc.c:650:17: warning: implicit truncation from 'int' to a one-bit wide bit-field changes value from 1 to -1 [-Wsingle-bit-bitfield-constant-conversion]
650 | vhost->reinit = 1;
| ^ ~
A single-bit signed integer bitfield only has possible values of -1 and
0, not 0 and 1 like an unsigned one would. No context appears to check
the actual value of these bitfields, just whether or not it is zero.
However, it is easy enough to change the type of the fields to 'unsigned
int', which keeps the same size in memory and resolves the warning.
Fixes: 5144905884 ("scsi: ibmvfc: Use a bitfield for boolean flags")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20231010-ibmvfc-fix-bitfields-type-v1-1-37e95b5a60e5@kernel.org
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
fc_lport_ptp_setup() did not check the return value of fc_rport_create()
which can return NULL and would cause a NULL pointer dereference. Address
this issue by checking return value of fc_rport_create() and log error
message on fc_rport_create() failed.
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20231011130350.819571-1-haowenchao2@huawei.com
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This commit comes at the tail end of a greater effort to remove the
empty elements at the end of the ctl_table arrays (sentinels) which
will reduce the overall build time size of the kernel and run time
memory bloat by ~64 bytes per sentinel (further information Link :
https://lore.kernel.org/all/ZO5Yx5JFogGi%2FcBo@bombadil.infradead.org/)
Remove sentinel from scsi_table and sg_sysctls.
Signed-off-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Luis Chamberlain <mcgrof@kernel.org>
Instead of "if" conditions with line splits, use the usual error handling
pattern with a separate variable to improve readability.
No functional changes intended.
Link: https://lore.kernel.org/r/20230911125354.25501-7-ilpo.jarvinen@linux.intel.com
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Signed-off-by: Bjorn Helgaas <bhelgaas@google.com>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Rename it to pkg_id which is the terminology used in the kernel.
No functional change.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Tested-by: Juergen Gross <jgross@suse.com>
Tested-by: Sohil Mehta <sohil.mehta@intel.com>
Tested-by: Michael Kelley <mikelley@microsoft.com>
Tested-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230814085112.329006989@linutronix.de
Commit ff48b37802 ("scsi: Do not attempt to rescan suspended devices")
modified scsi_rescan_device() to avoid attempting rescanning a suspended
device. However, the modification added a check to verify that a SCSI
device is in the running state without checking if the device request
queue (in the case of block device) is also running, thus allowing the
exectuion of internal requests. Without checking the device request
queue, commit ff48b37802 fix is incomplete and deadlocks on resume can
still happen. Use blk_queue_pm_only() to check if the device request
queue allows executing commands in addition to checking the SCSI device
state.
Reported-by: Petr Tesarik <petr@tesarici.cz>
Fixes: ff48b37802 ("scsi: Do not attempt to rescan suspended devices")
Cc: stable@vger.kernel.org
Tested-by: Petr Tesarik <petr@tesarici.cz>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Tyrel Datwyler <tyreld@linux.ibm.com> says:
This series includes a couple minor fixes, generalization of some code
that is not protocol specific, and a reworking of the way event pool
buffers are accounted for by the driver. This is a precursor to a
series to follow that introduces support for NVMeoF protocol with
ibmvfc.
Link: https://lore.kernel.org/r/20230921225435.3537728-1-tyreld@linux.ibm.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use DEFINE_SHOW_STORE_ATTRIBUTE() helper for read-write file to reduce some
duplicated code.
Link: https://lkml.kernel.org/r/20230905024835.43219-4-yangxingui@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Co-developed-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Animesh Manna <animesh.manna@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Xiang Chen <chenxiang66@hisilicon.com>
Cc: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Use DEFINE_SHOW_STORE_ATTRIBUTE() helper for read-write file to reduce some
duplicated code.
Link: https://lkml.kernel.org/r/20230905024835.43219-3-yangxingui@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Co-developed-by: Xingui Yang <yangxingui@huawei.com>
Signed-off-by: Xingui Yang <yangxingui@huawei.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Al Viro <viro@zeniv.linux.org.uk>
Cc: Animesh Manna <animesh.manna@intel.com>
Cc: Anshuman Gupta <anshuman.gupta@intel.com>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Felipe Balbi <felipe.balbi@linux.intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Himanshu Madhani <himanshu.madhani@cavium.com>
Cc: James Bottomley <jejb@linux.ibm.com>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: Uma Shankar <uma.shankar@intel.com>
Cc: Xiang Chen <chenxiang66@hisilicon.com>
Cc: Zeng Tao <prime.zeng@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Three fixes, all in drivers. The fnic one is the most extensive
because the little used user initiated device reset path never tagged
the command and adding a tag is rather involved. The other two fixes
are smaller and more obvious.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZRwOQCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishZvqAQC95+Aa
ir6B5iAr5dYXgn31l8LfWuXC0Og4ZhU3o7T/1AEA+nwTu6Jqa+HGbS6ntu3LfEtP
J6WaEXlUraHKdf4+Iac=
=GNsn
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three fixes, all in drivers.
The fnic one is the most extensive because the little used user
initiated device reset path never tagged the command and adding a tag
is rather involved. The other two fixes are smaller and more obvious"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: zfcp: Fix a double put in zfcp_port_enqueue()
scsi: fnic: Fix sg_reset success path
scsi: target: core: Fix deadlock due to recursive locking
The scsi device flag no_start_on_resume is not set by any scsi low
level driver. Remove it. This reverts the changes introduced by commit
0a85890559 ("ata,scsi: do not issue START STOP UNIT on resume").
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Chia-Lin Kao (AceLan) <acelan.kao@canonical.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
A larger than usual set of fixes for 6.6-rc4 due to the unexpected
number of fixes needed to address ATA disks suspend/resume issues.
In more details:
- Add missing additionalProperties on child nodes to the pata-common DT
bindings (Rob).
- Fix handling of the REPORT SUPPORTED OPERATION CODES command to
ignore reserved bits (Niklas).
- Increase port multiplier soft reset timeout to accomodate slow
devices and avoid issues on wakeup (Matthias).
- A couple of minor code fixes to avoid compilation warnings in
libata-core and libata-eh (me).
- Many patches from me to address suspend/resume issues, and in
particular a potential deadlock on resume due to the SCSI disk driver
resume operation not being synchronized with libata EH port resume
handling. This is addressed by changing the scsi disk driver disk
start/stop control to allow libata to execute disk suspend (spin
down) and resume (spin up) on its own during system suspend/resume.
Runtime suspend/resume control remains with the SCSI disk driver.
Other fixes include:
- Fix libata power management request issuing to avoid races.
- Establish a link between ATA ports and SCSI devices to order PM
operations.
- Fix device removal to avoid issues with driver rmmod removal.
- Fix synchronization of libata device rescan and SCSI disk resume
operation.
- Remove libsas PM operations as suspend/resume is handled directly
by the sas controller resume.
- Fix the SCSI disk driver to not issue commands to suspended disks,
thus avoiding potential system lock-up on resume.
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCZRbR0gAKCRDdoc3SxdoY
dkArAP9PFTRgsXEwfE7arBXCwQqXj/W0R2KgKug7Fno+SoQLnAD/ZKe2TR50uwxr
9mwYROdMgi50T9ax1RX1jWA0npGXmQg=
=cFzG
-----END PGP SIGNATURE-----
Merge tag 'ata-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ATA fixes from Damien Le Moal:
"A larger than usual set of fixes for 6.6-rc4 due to the unexpected
number of fixes needed to address ATA disks suspend/resume issues.
In more detail:
- Add missing additionalProperties on child nodes to the pata-common
DT bindings (Rob)
- Fix handling of the REPORT SUPPORTED OPERATION CODES command to
ignore reserved bits (Niklas)
- Increase port multiplier soft reset timeout to accomodate slow
devices and avoid issues on wakeup (Matthias)
- A couple of minor code fixes to avoid compilation warnings in
libata-core and libata-eh (me)
- Many patches from me to address suspend/resume issues, and in
particular a potential deadlock on resume due to the SCSI disk
driver resume operation not being synchronized with libata EH port
resume handling.
This is addressed by changing the scsi disk driver disk start/stop
control to allow libata to execute disk suspend (spin down) and
resume (spin up) on its own during system suspend/resume. Runtime
suspend/resume control remains with the SCSI disk driver.
Other fixes include:
- Fix libata power management request issuing to avoid races
- Establish a link between ATA ports and SCSI devices to order PM
operations
- Fix device removal to avoid issues with driver rmmod removal
- Fix synchronization of libata device rescan and SCSI disk resume
operation
- Remove libsas PM operations as suspend/resume is handled
directly by the sas controller resume
- Fix the SCSI disk driver to not issue commands to suspended
disks, thus avoiding potential system lock-up on resume"
* tag 'ata-6.6-rc4' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata:
ata: libata-eh: Fix compilation warning in ata_eh_link_report()
ata: libata-core: Fix compilation warning in ata_dev_config_ncq()
scsi: sd: Do not issue commands to suspended disks on shutdown
ata: libata-core: Do not register PM operations for SAS ports
ata: libata-scsi: Fix delayed scsi_rescan_device() execution
scsi: Do not attempt to rescan suspended devices
ata: libata-scsi: Disable scsi device manage_system_start_stop
scsi: sd: Differentiate system and runtime start/stop management
ata: libata-scsi: link ata port and scsi device
ata: libata-core: Fix port and device removal
ata: libata-core: Fix ata_port_request_pm() locking
ata: libata-sata: increase PMP SRST timeout to 10s
ata: libata-scsi: ignore reserved bits for REPORT SUPPORTED OPERATION CODES
dt-bindings: ata: pata-common: Add missing additionalProperties on child nodes
If an error occurs when resuming a host adapter before the devices
attached to the adapter are resumed, the adapter low level driver may
remove the scsi host, resulting in a call to sd_remove() for the
disks of the host. This in turn results in a call to sd_shutdown() which
will issue a synchronize cache command and a start stop unit command to
spindown the disk. sd_shutdown() issues the commands only if the device
is not already runtime suspended but does not check the power state for
system-wide suspend/resume. That is, the commands may be issued with the
device in a suspended state, which causes PM resume to hang, forcing a
reset of the machine to recover.
Fix this by tracking the suspended state of a disk by introducing the
suspended boolean field in the scsi_disk structure. This flag is set to
true when the disk is suspended is sd_suspend_common() and resumed with
sd_resume(). When suspended is true, sd_shutdown() is not executed from
sd_remove().
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi_rescan_device() takes a scsi device lock before executing a device
handler and device driver rescan methods. Waiting for the completion of
any command issued to the device by these methods will thus be done with
the device lock held. As a result, there is a risk of deadlocking within
the power management code if scsi_rescan_device() is called to handle a
device resume with the associated scsi device not yet resumed.
Avoid such situation by checking that the target scsi device is in the
running state, that is, fully capable of executing commands, before
proceeding with the rescan and bailout returning -EWOULDBLOCK otherwise.
With this error return, the caller can retry rescaning the device after
a delay.
The state check is done with the device lock held and is thus safe
against incoming suspend power management operations.
Fixes: 6aa0365a3c ("ata: libata-scsi: Avoid deadlock on rescan after device resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
The underlying device and driver of a SCSI disk may have different
system and runtime power mode control requirements. This is because
runtime power management affects only the SCSI disk, while system level
power management affects all devices, including the controller for the
SCSI disk.
For instance, issuing a START STOP UNIT command when a SCSI disk is
runtime suspended and resumed is fine: the command is translated to a
STANDBY IMMEDIATE command to spin down the ATA disk and to a VERIFY
command to wake it up. The SCSI disk runtime operations have no effect
on the ata port device used to connect the ATA disk. However, for
system suspend/resume operations, the ATA port used to connect the
device will also be suspended and resumed, with the resume operation
requiring re-validating the device link and the device itself. In this
case, issuing a VERIFY command to spinup the disk must be done before
starting to revalidate the device, when the ata port is being resumed.
In such case, we must not allow the SCSI disk driver to issue START STOP
UNIT commands.
Allow a low level driver to refine the SCSI disk start/stop management
by differentiating system and runtime cases with two new SCSI device
flags: manage_system_start_stop and manage_runtime_start_stop. These new
flags replace the current manage_start_stop flag. Drivers setting the
manage_start_stop are modifed to set both new flags, thus preserving the
existing start/stop management behavior. For backward compatibility, the
old manage_start_stop sysfs device attribute is kept as a read-only
attribute showing a value of 1 for devices enabling both new flags and 0
otherwise.
Fixes: 0a85890559 ("ata,scsi: do not issue START STOP UNIT on resume")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Single fix for libata: older devices don't support command duration
limits (CDL) and some don't support report opcodes, meaning there's no
way to tell if they support the command or not. Reduce the problems of
incorrectly using CDL commands on older devices by checking SCSI spec
compliance at SPC-5 (the spec which introduced the command) before
turning on CDL.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZRQglCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishYJdAP4rjE8a
/X3Vs7C0PoFDl6HlkN3w4Eeq54vLMmxNez2tywEA6cB3RdwG58g34p8wBt7Lb6UI
1HAIhRub2mpHZyQH0/U=
=qq6Q
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"A single fix for libata: older devices don't support command duration
limits (CDL) and some don't support report opcodes, meaning there's no
way to tell if they support the command or not.
Reduce the problems of incorrectly using CDL commands on older devices
by checking SCSI spec compliance at SPC-5 (the spec which introduced
the command) before turning on CDL"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: core: ata: Do no try to probe for CDL on old drives
sg_reset performs a target or LUN reset. Since the command is issued by the
user, it does not come into the driver with a tag or a queue id. Fix the
fnic driver to create an io_req and use a SCSI command tag. Fix the ITMF
path to special case the sg_reset response.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Tested-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20230919182436.6895-1-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a per target protocol field so target code can determine correct
protocol specific actions as well as identify the correct channel group
target list.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20230921225435.3537728-12-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The target discovery buffer that the VIOS populates with targets is
currently a host adapter field. To facilitate the discovery of NVMe targets
as well as SCSI another discovery buffer is required. Move the discovery
buffer out of the host struct and into the ibmvfc_channels struct so that
each channels instance for a given protocol has its own discovery buffer.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20230921225435.3537728-11-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are cases in the generic code where protocol specific configuration
or actions may need to be taken. Add a protocol field to struct
ibmvfc_channels and initial IBMVFC_PROTO_[SCSI/NVME] definitions.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20230921225435.3537728-10-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
With the coming of NVMeoF support the driver will need to also allocate
channels for NVMe. Implement generic channel allocation wrappers that can
be used for both SCSI and NVMeoF protocol setup.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20230921225435.3537728-9-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add fields for desired and max number of queues to ibmvfc_channels. With
support for NVMeoF protocol coming these sorts of values should be tracked
in the protocol specific channel struct instead of the overarching host
adapter.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20230921225435.3537728-8-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is nothing scsi specific about the ibmvfc_scsi_channels struct. It is
meant to encapsulate a set of channels regardless of protocol.
Remove _scsi from the struct name to reflect this genric nature.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20230921225435.3537728-7-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are currently 9 binary flag fields in the ibmvfc host
structure. Converting each of these to a single bitfield reduces the foot
print of the structure by 32 bytes.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20230921225435.3537728-6-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 0217a272fe ("scsi: ibmvfc: Store return code of H_FREE_SUB_CRQ
during cleanup") wrongly changed the busy loop check to use
rtas_busy_delay() instead of H_BUSY and H_IS_LONG_BUSY(). The busy return
codes for RTAS and hypercalls are not the same.
Fix this issue by restoring the use of H_BUSY and H_IS_LONG_BUSY().
Fixes: 0217a272fe ("scsi: ibmvfc: Store return code of H_FREE_SUB_CRQ during cleanup")
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20230921225435.3537728-5-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
An LPAR could potentially be configured with a small logical cpu count that
is less then the default hardware queue max. Ensure that we don't allocate
more hw queues than available cpus.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20230921225435.3537728-4-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Extend ibmvfc_queue, ibmvfc_event, and ibmvfc_event_pool to provide queue
depths for general I/O commands and reserved commands as well as proper
accounting of the free events of each type from the general event
pool. Further, calculate the negotiated max command limit with the VIOS at
NPIV login time as a function of the number of queues times their total
queue depth (general and reserved depths combined).
This does away with the legacy max_request value, and allows the driver to
better manage and track it resources.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20230921225435.3537728-3-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In practice the driver should never send more commands than are allocated
to a queue's event pool. In the unlikely event that this happens, the code
asserts a BUG_ON, and in the case that the kernel is not configured to
crash on panic returns a junk event pointer from the empty event list
causing things to spiral from there. This BUG_ON is a historical artifact
of the ibmvfc driver first being upstreamed, and it is well known now that
the use of BUG_ON is bad practice except in the most unrecoverable
scenario. There is nothing about this scenario that prevents the driver
from recovering and carrying on.
Remove the BUG_ON in question from ibmvfc_get_event() and return a NULL
pointer in the case of an empty event pool. Update all call sites to
ibmvfc_get_event() to check for a NULL pointer and perfrom the appropriate
failure or recovery action.
Signed-off-by: Tyrel Datwyler <tyreld@linux.ibm.com>
Link: https://lore.kernel.org/r/20230921225435.3537728-2-tyreld@linux.ibm.com
Reviewed-by: Brian King <brking@linux.vnet.ibm.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, if CONFIG_SCSI_HISI_SAS_DEBUGFS_DEFAULT_ENABLE is enabled, the
memory space used by DFX is allocated during device initialization, which
occupies a large number of memory resources. The memory usage before and
after the driver is loaded is as follows:
Memory usage before the driver is loaded:
$ free -m
total used free shared buff/cache available
Mem: 867352 2578 864037 11 735 861681
Swap: 4095 0 4095
Memory usage after the driver which include 4 HBAs is loaded:
$ insmod hisi_sas_v3_hw.ko
$ free -m
total used free shared buff/cache available
Mem: 867352 4760 861848 11 743 859495
Swap: 4095 0 4095
The driver with 4 HBAs connected will allocate about 110 MB of memory
without enabling debugfs.
Therefore, to avoid wasting memory resources, DFX memory is allocated
during dump triggering. The dump may fail due to memory allocation
failure. After this change, each dump costs about 10 MB of memory, and each
dump lasts about 100 ms.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1694571327-78697-4-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, register information dump is performed via workqueue, regardless
of the trigger mode (automatic or manual). There is a delay in dumping
register through workqueue, the exact register information at trigger time
cannot be obtained.
Call register snapshot directly instead of through a workqueue.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1694571327-78697-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Damien Le Moal <dlemoal@kernel.org> says:
The first patch of this series fixes an issue with IRQ setup which
prevents the controller from resuming after a system suspend. The
following patches are code cleanup without any functional changes.
[mkp: The first patch went into v6.6-rc2 and thus this merge
constitutes the remaining patches of the series]
Link: https://lore.kernel.org/r/20230911232745.325149-1-dlemoal@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove the macro PM8001_READ_VPD used to define if a controller WWN should
be retrieved from the device. Instead, define the better named boolean
module parameter "read_wwn" to control this.
The code to set a fixed address for a phy device address when read_wwn is
set to false is simplified and fixed to avoid sparse warnings.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230911232745.325149-11-dlemoal@kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove the macro PM8001_USE_TASKLET used to conditionally use tasklets for
MSI-X interrupts handling and replace it with the boolean module parameter
pm8001_use_tasklet. This parameter defaults to true and can be true only if
pm8001_use_msix is also true.
Code conditionnaly defined with PM8001_USE_TASKLET is modified to instead
use the parameter pm8001_use_tasklet.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230911232745.325149-10-dlemoal@kernel.org
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The pm8001 driver does not compile if PM8001_USE_MSIX is not defined in
pm8001_sas.h because various fields and functions conditionally defined are
used unconditionally without a "#ifdef PM8001_USE_MSIX" protection. This
macro is rather useless anyway and not convenient as diabling MSI-X use
requires recompiling the driver.
Remove this macro and replace it with the bool module parameter "use_msix"
which defaults to true. The use of MSI-X interrupts for an adapter is gated
by this module parameter for adapters that actually support MSI-X. The
"use_msix" boolean field is added to struct pm8001_hba_info and all code
defined depending on PM8001_USE_MSIX is modified to rely on
pm8001_hba_info->use_msix instead.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230911232745.325149-9-dlemoal@kernel.org
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove the functions pm80xx_chip_intx_interrupt_enable() and
pm80xx_chip_intx_interrupt_disable() and open code them respectively in
pm80xx_chip_interrupt_enable() and pm80xx_chip_interrupt_disable().
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230911232745.325149-8-dlemoal@kernel.org
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
pm8001_chip_msix_interrupt_enable() and
pm8001_chip_msix_interrupt_disable() are always cold with the vector
argument equal to 0. This allows simplifying the code for these
functions. With this change, the functions are simple enough and can be
removed by open coding them directly in pm8001_chip_interrupt_enable() and
pm8001_chip_interrupt_disable(). Also do the same for the functions
pm8001_chip_intx_interrupt_enable() and
pm8001_chip_intx_interrupt_disable() and remove these functions.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230911232745.325149-7-dlemoal@kernel.org
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Factor out the common code of pm8001_interrupt_handler_msix and of
pm8001_interrupt_handler_intx() into the new function pm8001_handle_irq()
and use this new helper in these two functions to simplify the code.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230911232745.325149-6-dlemoal@kernel.org
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Factor out the identical code for killing tasklets in pm8001_pci_remove()
and pm8001_pci_suspend() and instead use the new function
pm8001_kill_tasklet().
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230911232745.325149-5-dlemoal@kernel.org
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Factor out the identical code for initializing tasklets in
pm8001_pci_alloc() and pm8001_pci_resume() and instead use the new function
pm8001_init_tasklet().
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230911232745.325149-4-dlemoal@kernel.org
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of repeating the same code twice in pm8001_pci_remove() and
pm8001_pci_suspend() to free IRQs, introduuce the function
pm8001_free_irq() to do that.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230911232745.325149-3-dlemoal@kernel.org
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some old drives (e.g. an Ultra320 SCSI disk as reported by John) do not
seem to execute MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES
commands correctly and hang when a non-zero service action is specified
(one command format with service action case in scsi_report_opcode()).
Currently, CDL probing with scsi_cdl_check_cmd() is the only caller using a
non zero service action for scsi_report_opcode(). To avoid issues with
these old drives, do not attempt CDL probe if the device reports support
for an SPC version lower than 5 (CDL was introduced in SPC-5). To keep
things working with ATA devices which probe for the CDL T2A and T2B pages
introduced with SPC-6, modify ata_scsiop_inq_std() to claim SPC-6 version
compatibility for ATA drives supporting CDL.
SPC-6 standard version number is defined as Dh (= 13) in SPC-6 r09. Fix
scsi_probe_lun() to correctly capture this value by changing the bit mask
for the second byte of the INQUIRY response from 0x7 to 0xf.
include/scsi/scsi.h is modified to add the definition SCSI_SPC_6 with the
value 14 (Dh + 1). The missing definitions for the SCSI_SPC_4 and
SCSI_SPC_5 versions are also added.
Reported-by: John David Anglin <dave.anglin@bell.net>
Fixes: 624885209f ("scsi: core: Detect support for command duration limits")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230915022034.678121-1-dlemoal@kernel.org
Tested-by: David Gow <david@davidgow.net>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Current release - regressions:
- bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE
- netfilter: fix entries val in rule reset audit log
- eth: stmmac: fix incorrect rxq|txq_stats reference
Previous releases - regressions:
- ipv4: fix null-deref in ipv4_link_failure
- netfilter:
- fix several GC related issues
- fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
- eth: team: fix null-ptr-deref when team device type is changed
- eth: i40e: fix VF VLAN offloading when port VLAN is configured
- eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB
Previous releases - always broken:
- core: fix ETH_P_1588 flow dissector
- mptcp: fix several connection hang-up conditions
- bpf:
- avoid deadlock when using queue and stack maps from NMI
- add override check to kprobe multi link attach
- hsr: properly parse HSRv1 supervisor frames.
- eth: igc: fix infinite initialization loop with early XDP redirect
- eth: octeon_ep: fix tx dma unmap len values in SG
- eth: hns3: fix GRE checksum offload issue
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmUMFG8SHHBhYmVuaUBy
ZWRoYXQuY29tAAoJECkkeY3MjxOksHAP+QE2eNf5yxo86dIS+3RQnOQ8kFBnNbEn
04lrheGnzG7PpNnGoCoTZna+xYQPYVLgbmmip2/CFnQvnQIsKyLQfCui85sfV2V9
KjUeE/kTgeC+jUQOWNDyz3zDP/MPC2LmiK8Gwyggvm9vFYn5tVZXC36aPZBZ7Vok
/DUW6iXyl31SeVGOOEKakcwn0GIYJSABhVFNsjrDe4tV+leUwvf8obAq3ZWxOGaU
D94ez28lSXgfOSWfQQ/l1rHI/yC0fr8HYyWJ60dNG2uS3fNEqT8LyqZfAUK24kVz
XbAGZa+GA7CDq3cVsU7vCWNWbB5fO+kXtmGOwPtuKtJQM5LPo4X77CuSHlpzdyvq
TuW0vxeVfdzAYVb3Zg+2QgWxDJjY0B8ujwdDWrnnKTPu4Ylhn6HLISXIlkMBoGwT
1/47TCnmn9t+lGagkMADppRRnJotHWObQG5wkzksqVa2CUB0HTESgbrm4rsxe6Ku
JiZhHbTiiPWy7LgY6EFtj/YGPvLs0CSltvh4QUsd+QtDTM/EN7y3HcHqkv88ropG
bSvJIh6WXdEJkwfSUdA0LECXSC6dizzZW2Y1glnT+7FMlhE1jVY4gruNJ37mCYMb
0gh9Zr76c2KYLA5vljGp6uo3j3A7wARJTdLfRFVcaFoz6NQmuFf9ZdBfDNDcymxs
AGvO3j55JAZf
=AoVg
-----END PGP SIGNATURE-----
Merge tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Pull networking fixes from Paolo Abeni:
"Including fixes from netfilter and bpf.
Current release - regressions:
- bpf: adjust size_index according to the value of KMALLOC_MIN_SIZE
- netfilter: fix entries val in rule reset audit log
- eth: stmmac: fix incorrect rxq|txq_stats reference
Previous releases - regressions:
- ipv4: fix null-deref in ipv4_link_failure
- netfilter:
- fix several GC related issues
- fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
- eth: team: fix null-ptr-deref when team device type is changed
- eth: i40e: fix VF VLAN offloading when port VLAN is configured
- eth: ionic: fix 16bit math issue when PAGE_SIZE >= 64KB
Previous releases - always broken:
- core: fix ETH_P_1588 flow dissector
- mptcp: fix several connection hang-up conditions
- bpf:
- avoid deadlock when using queue and stack maps from NMI
- add override check to kprobe multi link attach
- hsr: properly parse HSRv1 supervisor frames.
- eth: igc: fix infinite initialization loop with early XDP redirect
- eth: octeon_ep: fix tx dma unmap len values in SG
- eth: hns3: fix GRE checksum offload issue"
* tag 'net-6.6-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (87 commits)
sfc: handle error pointers returned by rhashtable_lookup_get_insert_fast()
igc: Expose tx-usecs coalesce setting to user
octeontx2-pf: Do xdp_do_flush() after redirects.
bnxt_en: Flush XDP for bnxt_poll_nitroa0()'s NAPI
net: ena: Flush XDP packets on error.
net/handshake: Fix memory leak in __sock_create() and sock_alloc_file()
net: hinic: Fix warning-hinic_set_vlan_fliter() warn: variable dereferenced before check 'hwdev'
netfilter: ipset: Fix race between IPSET_CMD_CREATE and IPSET_CMD_SWAP
netfilter: nf_tables: fix memleak when more than 255 elements expired
netfilter: nf_tables: disable toggling dormant table state more than once
vxlan: Add missing entries to vxlan_get_size()
net: rds: Fix possible NULL-pointer dereference
team: fix null-ptr-deref when team device type is changed
net: bridge: use DEV_STATS_INC()
net: hns3: add 5ms delay before clear firmware reset irq source
net: hns3: fix fail to delete tc flower rules during reset issue
net: hns3: only enable unicast promisc when mac table full
net: hns3: fix GRE checksum offload issue
net: hns3: add cmdq check for vf periodic service task
net: stmmac: fix incorrect rxq|txq_stats reference
...
Nothing prevents iscsi_sw_tcp_conn_bind() to receive file descriptor
pointing to non TCP socket (af_unix for example).
Return -EINVAL if this is attempted, instead of crashing the kernel.
Fixes: 7ba2471389 ("[SCSI] open-iscsi/linux-iscsi-5 Initiator: Initiator code")
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Lee Duncan <lduncan@suse.com>
Cc: Chris Leech <cleech@redhat.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: open-iscsi@googlegroups.com
Cc: linux-scsi@vger.kernel.org
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Fix in the imm driver the same problem that was fixed in the ppa driver by
commit 68a4f84a17 ("scsi: ppa: Add a module parameter for the transfer
mode").
Tested and confirmed working with an Iomega Z250P zip drive and a StarTech
PEX1P2 AX99100 PCIe parallel port.
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Link: https://lore.kernel.org/r/20230831054620.515611-1-alexhenrie24@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
sas_discover_end_dev() is defined and used only in sas_discover.c. Define
this function as static.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230912230551.454357-4-dlemoal@kernel.org
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
sas_set_phy_speed() is used only within sas_init.c. Declare this function
as static.
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230912230551.454357-3-dlemoal@kernel.org
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Move the declarations of functions used only within libsas from
include/scsi/libsas.h to drivers/scsi/libsas/sas_internal.h
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230912230551.454357-2-dlemoal@kernel.org
Reviewed-by: John Garry <john.g.garry@oracle.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use FIELD_GET() to extract PCIe capability registers field instead of
custom masking and shifting.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20230913122748.29530-9-ilpo.jarvinen@linux.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use FIELD_GET() to extract PCIe capability register fields instead of
custom masking and shifting. Also remove the unnecessary cast to u8, the
value in those fields always fits to u8.
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Link: https://lore.kernel.org/r/20230913122748.29530-8-ilpo.jarvinen@linux.intel.com
Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During rmmod, when dev_loss_tmo callback is called, an ndlp kref count is
decremented twice. Once for SCSI transport registration and second to
remove the initial node allocation kref. If there is also an NVMe
transport registration, another reference count decrement is expected in
lpfc_nvme_unregister_port().
Race conditions between the NVMe transport remoteport_delete and
dev_loss_tmo callbacks sometimes results in premature ndlp object release
resulting in use-after-free issues.
Fix by not dropping the ndlp object in dev_loss_tmo callback with an
outstanding NVMe transport registration. Inversely, mark the final
NLP_DROPPED flag in lpfc_nvme_unregister_port when rmmod flag is set.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230908211923.37603-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a dev_loss_tmo event occurs, an ndlp lock is taken before checking
nlp_flag for NLP_DROPPED. There is an attempt to restore the ndlp lock
when exiting the if statement, but the nlp_put kref could be the final
decrement causing a use-after-free memory access on a released ndlp object.
Instead of trying to reacquire the ndlp lock after checking nlp_flag, just
return after calling nlp_put.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20230908211852.37576-1-justintee8345@gmail.com
Reviewed-by: "Ewan D. Milne" <emilne@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since debugfs_create_file() returns ERR_PTR and never NULL, use IS_ERR() to
check the return value.
Fixes: 2fcbc569b9 ("scsi: lpfc: Make debugfs ktime stats generic for NVME and SCSI")
Fixes: 4c47efc140 ("scsi: lpfc: Move SCSI and NVME Stats to hardware queue structures")
Fixes: 6a828b0f61 ("scsi: lpfc: Support non-uniform allocation of MSIX vectors to hardware queues")
Fixes: 95bfc6d8ad ("scsi: lpfc: Make FW logging dynamically configurable")
Fixes: 9f77870870 ("scsi: lpfc: Add debugfs support for cm framework buffers")
Fixes: c490850a09 ("scsi: lpfc: Adapt partitioned XRI lists to efficient sharing")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20230906030809.2847970-1-ruanjinjie@huawei.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The function pm8001_pci_resume() only calls pm8001_request_irq() without
calling pm8001_setup_irq(). This causes the IRQ allocation to fail, which
leads all drives being removed from the system.
Fix this issue by integrating the code for pm8001_setup_irq() directly
inside pm8001_request_irq() so that MSI-X setup is performed both during
normal initialization and resume operations.
Fixes: dbf9bfe615 ("[SCSI] pm8001: add SAS/SATA HBA driver")
Cc: stable@vger.kernel.org
Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20230911232745.325149-2-dlemoal@kernel.org
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Tags allocated for OPC_INB_SET_CONTROLLER_CONFIG command need to be freed
when we receive the response.
Signed-off-by: Michal Grzedzicki <mge@meta.com>
Link: https://lore.kernel.org/r/20230911170340.699533-2-mge@meta.com
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Mostly small stragglers that missed the initial merge. Driver updates
are qla2xxx and smartpqi (mp3sas has a high diffstat due to the
volatile qualifier removal, fnic due to unused function removal and
sd.c has a lot of code shuffling to remove forward declarations).
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZPyD4CYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishZWGAQDlh/3q
5YJp7f8sIqmgdOiKl3bln3API9Y0MPsC3z5TsAEAv9LYQZH3He4XvxMy/v5FioEs
8IWIoBsUZtcgoK6mI4w=
=8Aj8
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"Mostly small stragglers that missed the initial merge.
Driver updates are qla2xxx and smartpqi (mp3sas has a high diffstat
due to the volatile qualifier removal, fnic due to unused function
removal and sd.c has a lot of code shuffling to remove forward
declarations)"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (38 commits)
scsi: ufs: core: No need to update UPIU.header.flags and lun in advanced RPMB handler
scsi: ufs: core: Add advanced RPMB support where UFSHCI 4.0 does not support EHS length in UTRD
scsi: mpt3sas: Remove volatile qualifier
scsi: mpt3sas: Perform additional retries if doorbell read returns 0
scsi: libsas: Simplify sas_queue_reset() and remove unused code
scsi: ufs: Fix the build for the old ARM OABI
scsi: qla2xxx: Fix unused variable warning in qla2xxx_process_purls_pkt()
scsi: fnic: Remove unused functions fnic_scsi_host_start/end_tag()
scsi: qla2xxx: Fix spelling mistake "tranport" -> "transport"
scsi: fnic: Replace sgreset tag with max_tag_id
scsi: qla2xxx: Remove unused variables in qla24xx_build_scsi_type_6_iocbs()
scsi: qla2xxx: Fix nvme_fc_rcv_ls_req() undefined error
scsi: smartpqi: Change driver version to 2.1.24-046
scsi: smartpqi: Enhance error messages
scsi: smartpqi: Enhance controller offline notification
scsi: smartpqi: Enhance shutdown notification
scsi: smartpqi: Simplify lun_number assignment
scsi: smartpqi: Rename pciinfo to pci_info
scsi: smartpqi: Rename MACRO to clarify purpose
scsi: smartpqi: Add abort handler
...
- Fix OF include file for ata platform drivers (Rob).
- Simplify various ahci, sata and pata platform drivers using the
function devm_platform_ioremap_resource() (Yangtao).
- Cleanup libata time related argument types (e.g. timeouts values)
(Sergey).
- Cleanup libata code around error handling as all ata drivers now
define a error_handler operation (Hannes and Niklas).
- Remove functions intended for libsas that are in fact unused
(Niklas).
- Change the remove device callback of platform drivers to a null
function (Uwe).
- Simplify the pata_imx driver using devm_clk_get_enabled() (Li).
- Remove old and uinused remnants of the ide code in arm, parisc,
powerpc, sparc and m68k architectures and associated drivers
(pata_buddha, pata_falcon and pata_gayle) (Geert).
- Add missing MODULE_DESCRIPTION() in the sata_gemini and pata_ftide010
drivers (me).
- Several fixes for the pata_ep93xx and pata_falcon drivers (Nikita,
Michael).
- Add Elkhart Lake AHCI controller support to the ahci driver (Werner).
- Disable NCQ trim on Micron 1100 drives (Pawel).
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQSRPv8tYSvhwAzJdzjdoc3SxdoYdgUCZPbUdQAKCRDdoc3SxdoY
du6NAP9G10EhjgXDNIM8f1AN8gHrZk2RGSxSuqIKKROxl2i+ugD+Mt8/UjRpf0JO
nVdKTfgeXaLfCXHN/fmkIYNmk+I7Twg=
=9Ety
-----END PGP SIGNATURE-----
Merge tag 'ata-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata
Pull ata updates from Damien Le Moal:
- Fix OF include file for ata platform drivers (Rob)
- Simplify various ahci, sata and pata platform drivers using the
function devm_platform_ioremap_resource() (Yangtao)
- Cleanup libata time related argument types (e.g. timeouts values)
(Sergey)
- Cleanup libata code around error handling as all ata drivers now
define a error_handler operation (Hannes and Niklas)
- Remove functions intended for libsas that are in fact unused (Niklas)
- Change the remove device callback of platform drivers to a null
function (Uwe)
- Simplify the pata_imx driver using devm_clk_get_enabled() (Li)
- Remove old and uinused remnants of the ide code in arm, parisc,
powerpc, sparc and m68k architectures and associated drivers
(pata_buddha, pata_falcon and pata_gayle) (Geert)
- Add missing MODULE_DESCRIPTION() in the sata_gemini and pata_ftide010
drivers (me)
- Several fixes for the pata_ep93xx and pata_falcon drivers (Nikita,
Michael)
- Add Elkhart Lake AHCI controller support to the ahci driver (Werner)
- Disable NCQ trim on Micron 1100 drives (Pawel)
* tag 'ata-6.6-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/dlemoal/libata: (60 commits)
ata: libata-core: Disable NCQ_TRIM on Micron 1100 drives
ata: ahci: Add Elkhart Lake AHCI controller
ata: pata_falcon: add data_swab option to byte-swap disk data
ata: pata_falcon: fix IO base selection for Q40
ata: pata_ep93xx: use soc_device_match for UDMA modes
ata: pata_ep93xx: fix error return code in probe
ata: sata_gemini: Add missing MODULE_DESCRIPTION
ata: pata_ftide010: Add missing MODULE_DESCRIPTION
m68k: Remove <asm/ide.h>
ata: pata_gayle: Remove #include <asm/ide.h>
ata: pata_falcon: Remove #include <asm/ide.h>
ata: pata_buddha: Remove #include <asm/ide.h>
asm-generic: Remove ide_iops.h
sparc: Remove <asm/ide.h>
powerpc: Remove <asm/ide.h>
parisc: Remove <asm/ide.h>
ARM: Remove <asm/ide.h>
ata: pata_imx: Use helper function devm_clk_get_enabled()
ata: sata_rcar: Convert to platform remove callback returning void
ata: sata_mv: Convert to platform remove callback returning void
...
Avoid race condition between I/O completion and abort processing by
protecting the cmd_type with the rport lock.
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Link: https://lore.kernel.org/r/20230901060646.27885-1-skashyap@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since both debugfs_create_dir() and debugfs_create_file() return ERR_PTR
and never NULL, use IS_ERR() instead of checking for NULL.
Fixes: 1e98fb0f92 ("scsi: qla2xxx: Setup debugfs entries for remote ports")
Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Link: https://lore.kernel.org/r/20230831140930.3166359-1-ruanjinjie@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
rqstlen and rsplen were changed to __le32 to fix sparse warnings:
drivers/scsi/qla2xxx/qla_nvme.c:402:30: warning: incorrect type in assignment (different base types)
drivers/scsi/qla2xxx/qla_nvme.c:402:30: expected restricted __le32 [usertype] cmd_len
drivers/scsi/qla2xxx/qla_nvme.c:402:30: got unsigned short [usertype] rsplen
drivers/scsi/qla2xxx/qla_nvme.c:507:30: warning: incorrect type in assignment (different base types)
drivers/scsi/qla2xxx/qla_nvme.c:507:30: expected restricted __le32 [usertype] cmd_len
drivers/scsi/qla2xxx/qla_nvme.c:507:30: got unsigned int [usertype] rqstlen
drivers/scsi/qla2xxx/qla_nvme.c:508:30: warning: incorrect type in assignment (different base types)
drivers/scsi/qla2xxx/qla_nvme.c:508:30: expected restricted __le32 [usertype] rsp_len
drivers/scsi/qla2xxx/qla_nvme.c:508:30: got unsigned int [usertype] rsplen
Correct the endianness in qla2xxx driver thus avoiding changes in
nvme-fc-driver.h.
Fixes: 875386b988 ("scsi: qla2xxx: Add Unsolicited LS Request and Response Support for NVMe")
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230831112146.32595-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The conditions were correct in the ppa_in() function but not in the
ppa_out() function.
Fixes: 68a4f84a17 ("scsi: ppa: Add a module parameter for the transfer mode")
Signed-off-by: Alex Henrie <alexhenrie24@gmail.com>
Link: https://lore.kernel.org/r/20230831051945.515476-1-alexhenrie24@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following processes run into a deadlock. CPU 41 was waiting for CPU 29
to handle a CSD request while holding spinlock "crashdump_lock", but CPU 29
was hung by that spinlock with IRQs disabled.
PID: 17360 TASK: ffff95c1090c5c40 CPU: 41 COMMAND: "mrdiagd"
!# 0 [ffffb80edbf37b58] __read_once_size at ffffffff9b871a40 include/linux/compiler.h:185:0
!# 1 [ffffb80edbf37b58] atomic_read at ffffffff9b871a40 arch/x86/include/asm/atomic.h:27:0
!# 2 [ffffb80edbf37b58] dump_stack at ffffffff9b871a40 lib/dump_stack.c:54:0
# 3 [ffffb80edbf37b78] csd_lock_wait_toolong at ffffffff9b131ad5 kernel/smp.c:364:0
# 4 [ffffb80edbf37b78] __csd_lock_wait at ffffffff9b131ad5 kernel/smp.c:384:0
# 5 [ffffb80edbf37bf8] csd_lock_wait at ffffffff9b13267a kernel/smp.c:394:0
# 6 [ffffb80edbf37bf8] smp_call_function_many at ffffffff9b13267a kernel/smp.c:843:0
# 7 [ffffb80edbf37c50] smp_call_function at ffffffff9b13279d kernel/smp.c:867:0
# 8 [ffffb80edbf37c50] on_each_cpu at ffffffff9b13279d kernel/smp.c:976:0
# 9 [ffffb80edbf37c78] flush_tlb_kernel_range at ffffffff9b085c4b arch/x86/mm/tlb.c:742:0
#10 [ffffb80edbf37cb8] __purge_vmap_area_lazy at ffffffff9b23a1e0 mm/vmalloc.c:701:0
#11 [ffffb80edbf37ce0] try_purge_vmap_area_lazy at ffffffff9b23a2cc mm/vmalloc.c:722:0
#12 [ffffb80edbf37ce0] free_vmap_area_noflush at ffffffff9b23a2cc mm/vmalloc.c:754:0
#13 [ffffb80edbf37cf8] free_unmap_vmap_area at ffffffff9b23bb3b mm/vmalloc.c:764:0
#14 [ffffb80edbf37cf8] remove_vm_area at ffffffff9b23bb3b mm/vmalloc.c:1509:0
#15 [ffffb80edbf37d18] __vunmap at ffffffff9b23bb8a mm/vmalloc.c:1537:0
#16 [ffffb80edbf37d40] vfree at ffffffff9b23bc85 mm/vmalloc.c:1612:0
#17 [ffffb80edbf37d58] megasas_free_host_crash_buffer [megaraid_sas] at ffffffffc020b7f2 drivers/scsi/megaraid/megaraid_sas_fusion.c:3932:0
#18 [ffffb80edbf37d80] fw_crash_state_store [megaraid_sas] at ffffffffc01f804d drivers/scsi/megaraid/megaraid_sas_base.c:3291:0
#19 [ffffb80edbf37dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
#20 [ffffb80edbf37dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
#21 [ffffb80edbf37de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
#22 [ffffb80edbf37e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
#23 [ffffb80edbf37ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
#24 [ffffb80edbf37ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
#25 [ffffb80edbf37ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
#26 [ffffb80edbf37f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
#27 [ffffb80edbf37f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0
PID: 17355 TASK: ffff95c1090c3d80 CPU: 29 COMMAND: "mrdiagd"
!# 0 [ffffb80f2d3c7d30] __read_once_size at ffffffff9b0f2ab0 include/linux/compiler.h:185:0
!# 1 [ffffb80f2d3c7d30] native_queued_spin_lock_slowpath at ffffffff9b0f2ab0 kernel/locking/qspinlock.c:368:0
# 2 [ffffb80f2d3c7d58] pv_queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/paravirt.h:674:0
# 3 [ffffb80f2d3c7d58] queued_spin_lock_slowpath at ffffffff9b0f244b arch/x86/include/asm/qspinlock.h:53:0
# 4 [ffffb80f2d3c7d68] queued_spin_lock at ffffffff9b8961a6 include/asm-generic/qspinlock.h:90:0
# 5 [ffffb80f2d3c7d68] do_raw_spin_lock_flags at ffffffff9b8961a6 include/linux/spinlock.h:173:0
# 6 [ffffb80f2d3c7d68] __raw_spin_lock_irqsave at ffffffff9b8961a6 include/linux/spinlock_api_smp.h:122:0
# 7 [ffffb80f2d3c7d68] _raw_spin_lock_irqsave at ffffffff9b8961a6 kernel/locking/spinlock.c:160:0
# 8 [ffffb80f2d3c7d88] fw_crash_buffer_store [megaraid_sas] at ffffffffc01f8129 drivers/scsi/megaraid/megaraid_sas_base.c:3205:0
# 9 [ffffb80f2d3c7dc0] dev_attr_store at ffffffff9b56dd7b drivers/base/core.c:758:0
#10 [ffffb80f2d3c7dd0] sysfs_kf_write at ffffffff9b326acf fs/sysfs/file.c:144:0
#11 [ffffb80f2d3c7de0] kernfs_fop_write at ffffffff9b325fd4 fs/kernfs/file.c:316:0
#12 [ffffb80f2d3c7e20] __vfs_write at ffffffff9b29418a fs/read_write.c:480:0
#13 [ffffb80f2d3c7ea8] vfs_write at ffffffff9b294462 fs/read_write.c:544:0
#14 [ffffb80f2d3c7ee8] SYSC_write at ffffffff9b2946ec fs/read_write.c:590:0
#15 [ffffb80f2d3c7ee8] SyS_write at ffffffff9b2946ec fs/read_write.c:582:0
#16 [ffffb80f2d3c7f30] do_syscall_64 at ffffffff9b003ca9 arch/x86/entry/common.c:298:0
#17 [ffffb80f2d3c7f58] entry_SYSCALL_64 at ffffffff9ba001b1 arch/x86/entry/entry_64.S:238:0
The lock is used to synchronize different sysfs operations, it doesn't
protect any resource that will be touched by an interrupt. Consequently
it's not required to disable IRQs. Replace the spinlock with a mutex to fix
the deadlock.
Signed-off-by: Junxiao Bi <junxiao.bi@oracle.com>
Link: https://lore.kernel.org/r/20230828221018.19471-1-junxiao.bi@oracle.com
Reviewed-by: Mike Christie <michael.christie@oracle.com>
Cc: stable@vger.kernel.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver retries certain register reads 3 times if the returned value is
0. This was done because the controller could return 0 for certain
registers if other registers were being accessed concurrently by the BMC.
In certain systems with increased BMC interactions, the register values
returned can be 0 for longer than 3 retries. Change the retry count from 3
to 30 for the affected registers to prevent problems with out-of-band
management.
Fixes: b899202901 ("scsi: mpt3sas: Add separate function for aero doorbell reads")
Cc: stable@vger.kernel.org
Signed-off-by: Ranjan Kumar <ranjan.kumar@broadcom.com>
Link: https://lore.kernel.org/r/20230829090020.5417-2-ranjan.kumar@broadcom.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
sas_queue_reset() is always called with param "wait" set to 0, so remove it
from this function's parameter list. Also remove unused function
sas_wait_eh().
Signed-off-by: Wenchao Hao <haowenchao2@huawei.com>
Link: https://lore.kernel.org/r/20230729102451.2452826-1-haowenchao2@huawei.com
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When CONFIG_NVME_FC is not set, fcport is unused:
drivers/scsi/qla2xxx/qla_nvme.c: In function 'qla2xxx_process_purls_pkt':
drivers/scsi/qla2xxx/qla_nvme.c:1183:20: warning: unused variable 'fcport' [-Wunused-variable]
1183 | fc_port_t *fcport = uctx->fcport;
| ^~~~~~
While this preprocessor usage could be converted to a normal if
statement to allow the compiler to always see fcport as used, it is
equally easy to just eliminate the fcport variable and use uctx->fcport
directly.
Fixes: 27177862de ("scsi: qla2xxx: Fix nvme_fc_rcv_ls_req() undefined error")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Closes: https://lore.kernel.org/linux-next/20230828131304.269a2a40@canb.auug.org.au/
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308290833.sKkoSSeO-lkp@intel.com/
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Link: https://lore.kernel.org/r/20230829-qla_nvme-fix-unused-fcport-v1-1-51c7560ecaee@kernel.org
Acked-by: Nilesh Javali <njavali@marvell.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The functions fnic_scsi_host_start_tag() and fnic_scsi_host_end_tag() are
not used anywhere, so remove them.
silence the warnings:
drivers/scsi/fnic/fnic_scsi.c:2175:1: warning: unused function 'fnic_scsi_host_start_tag'
drivers/scsi/fnic/fnic_scsi.c:2196:1: warning: unused function 'fnic_scsi_host_end_tag'
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Link: https://lore.kernel.org/r/20230829010222.33393-1-yang.lee@linux.alibaba.com
Acked-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a spelling mistake in a ql_dbg message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20230828213101.758609-1-colin.i.king@gmail.com
Acked-by: Nilesh Javali <njavali@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pull in the fixes tree for a commit that missed 6.5. Also resolve a
trivial merge conflict in fnic.
* 6.5/scsi-fixes: (36 commits)
scsi: storvsc: Handle additional SRB status values
scsi: snic: Fix double free in snic_tgt_create()
scsi: core: raid_class: Remove raid_component_add()
scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW major version > 5
scsi: ufs: mcq: Fix the search/wrap around logic
scsi: qedf: Fix firmware halt over suspend and resume
scsi: qedi: Fix firmware halt over suspend and resume
scsi: qedi: Fix potential deadlock on &qedi_percpu->p_work_lock
scsi: lpfc: Remove reftag check in DIF paths
scsi: ufs: renesas: Fix private allocation
scsi: snic: Fix possible memory leak if device_add() fails
scsi: core: Fix possible memory leak if device_add() fails
scsi: core: Fix legacy /proc parsing buffer overflow
scsi: 53c700: Check that command slot is not NULL
scsi: fnic: Replace return codes in fnic_clean_pending_aborts()
scsi: storvsc: Fix handling of virtual Fibre Channel timeouts
scsi: pm80xx: Fix error return code in pm8001_pci_probe()
scsi: zfcp: Defer fc_rport blocking until after ADISC response
scsi: storvsc: Limit max_sectors for virtual Fibre Channel devices
scsi: sg: Fix checking return value of blk_get_queue()
...
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmTs08EQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpqa4EACu/zKE+omGXBV0Q7kEpVsChjp0ElGtSDIJ
tJfTuvnWqQjrqRv4ksmZvGdx8SkqFuXri4/7oBXlsaqeUVbIQdWJUpLErBye6nxa
lUb6nXOFWwyG94cMRYs71lN0loosjb7aiVw7oVLAIhntq3p3doFl/cyy3ndMZrUE
pZbsrWSt4QiOKhcO0TtIjfAwsr31AN51qFiNNITEiZl3UjXfkGRCK81X0yM2N8zZ
7Y0h1ldPBsZ/olNWeRyaW1uB64nKM0buR7/nDxCV/NI05nndJ34bIgo/JIj4xy0v
SiBj2+y86+oMJZt17yYENwOQdtX3hbyESGuVm9dCrO0t9/byVQxkUk0OMm65BM/l
l2d+gmMQZTbHziqfLlgq9i3i9+B4C2hsb7iBpuo7SW/FPbM45POgi3lpiZycaZyu
krQo1qwL4KSGXzGN9CabEuKDcJcXqLxqMDOyEDA3R5Kz06V9tNuM+Di/mr4vuZHK
sVHUfHuWBO9ionLlGPdc3fH/CuMqic8SHjumiAm2menBZV6cSzRDxpm6H4CyLt7y
tWmw7BNU7dfHFGd+Jw0Ld49sAuEybszEXq6qYv5uYBVfJNqDvOvEeVoQp0RN2jJA
AG30hymcZgxn9n7gkIgkPQDgIGUjnzUR8B2mE2UFU1CYVHXYXAXU55CCI5oeTkbs
d0Y/zCZf1A==
=p1bd
-----END PGP SIGNATURE-----
Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
"Pretty quiet round for this release. This contains:
- Add support for zoned storage to ublk (Andreas, Ming)
- Series improving performance for drivers that mark themselves as
needing a blocking context for issue (Bart)
- Cleanup the flush logic (Chengming)
- sed opal keyring support (Greg)
- Fixes and improvements to the integrity support (Jinyoung)
- Add some exports for bcachefs that we can hopefully delete again in
the future (Kent)
- deadline throttling fix (Zhiguo)
- Series allowing building the kernel without buffer_head support
(Christoph)
- Sanitize the bio page adding flow (Christoph)
- Write back cache fixes (Christoph)
- MD updates via Song:
- Fix perf regression for raid0 large sequential writes (Jan)
- Fix split bio iostat for raid0 (David)
- Various raid1 fixes (Heinz, Xueshi)
- raid6test build fixes (WANG)
- Deprecate bitmap file support (Christoph)
- Fix deadlock with md sync thread (Yu)
- Refactor md io accounting (Yu)
- Various non-urgent fixes (Li, Yu, Jack)
- Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li,
Ming, Nitesh, Ruan, Tejun, Thomas, Xu)"
* tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits)
block: use strscpy() to instead of strncpy()
block: sed-opal: keyring support for SED keys
block: sed-opal: Implement IOC_OPAL_REVERT_LSP
block: sed-opal: Implement IOC_OPAL_DISCOVERY
blk-mq: prealloc tags when increase tagset nr_hw_queues
blk-mq: delete redundant tagset map update when fallback
blk-mq: fix tags leak when shrink nr_hw_queues
ublk: zoned: support REQ_OP_ZONE_RESET_ALL
md: raid0: account for split bio in iostat accounting
md/raid0: Fix performance regression for large sequential writes
md/raid0: Factor out helper for mapping and submitting a bio
md raid1: allow writebehind to work on any leg device set WriteMostly
md/raid1: hold the barrier until handle_read_error() finishes
md/raid1: free the r1bio before waiting for blocked rdev
md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io()
blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init
drivers/rnbd: restore sysfs interface to rnbd-client
md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid()
raid6: test: only check for Altivec if building on powerpc hosts
raid6: test: make sure all intermediate and artifact files are .gitignored
...
Three small driver fixes and one larger unused function set removal in
the raid class (so no external impact).
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZOr0iCYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishaV8AQDtFvZp
KI3GW2x6XjZeXVW3buQVmwLmdBfIIx0yDZGLqAEAm8qZGROMsMhBCvK/iizjsrir
KerWJQ1LU9oMcjbesuk=
=z12E
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three small driver fixes and one larger unused function set removal in
the raid class (so no external impact)"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: snic: Fix double free in snic_tgt_create()
scsi: core: raid_class: Remove raid_component_add()
scsi: ufs: ufs-qcom: Clear qunipro_g4_sel for HW major version > 5
scsi: ufs: mcq: Fix the search/wrap around logic
sgreset is issued with a SCSI command pointer. The device reset code
assumes that it was issued on a hardware queue, and calls block multiqueue
layer. However, the assumption is broken, and there is no hardware queue
associated with the sgreset, and this leads to a crash due to a null
pointer exception.
Fix the code to use the max_tag_id as a tag which does not overlap with the
other tags issued by mid layer.
Tested by running FC traffic for a few minutes, and by issuing sgreset on
the device in parallel. Without the fix, the crash is observed right away.
With this fix, no crash is observed.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Tested-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20230817182146.229059-1-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Testing of virtual Fibre Channel devices under Hyper-V has shown additional
SRB status values being returned for various error cases. Because these
SRB status values are not recognized by storvsc, the I/O operations are not
flagged as an error. Requests are treated as if they completed normally but
with zero data transferred, which can cause a flood of retries.
Add definitions for these SRB status values and handle them like other
error statuses from the Hyper-V host.
Signed-off-by: Michael Kelley <mikelley@microsoft.com>
Link: https://lore.kernel.org/r/1692984084-95105-1-git-send-email-mikelley@microsoft.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Nilesh Javali <njavali@marvell.com> says:
Martin,
Please apply the qla2xxx driver miscellaneous features and bug fixes
to the scsi tree at your earliest convenience.
Link: https://lore.kernel.org/r/20230821130045.34850-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Sparse warning reported,
drivers/scsi/qla2xxx/qla_iocb.c: In function 'qla24xx_build_scsi_type_6_iocbs':
>> drivers/scsi/qla2xxx/qla_iocb.c:594:29: warning: variable 'ha' set but not used [-Wunused-but-set-variable]
594 | struct qla_hw_data *ha;
| ^~
Remove unused variables 'vha' and 'ha'.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202308230757.VKMIztAB-lkp@intel.com/
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230825070017.46066-1-njavali@marvell.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Don Brace <don.brace@microchip.com> says:
cat smartpqi_6.6_cover_letter
These patches are based on Martin Petersen's 6.6/scsi-queue tree
https://git.kernel.org/pub/scm/linux/kernel/git/mkp/scsi.git
6.6/scsi-queue
The biggest functional change to smartpqi is the addition of an abort
handler. Some customers were complaining about I/O stalls to all
devices when only one device is reset. Adding an abort handler helps
to prevent I/O stalls to all devices.
All of the reset of the patches are small changes to logging messages,
MACRO and variable name changes, and one minor change for LUN
assignments.
This set of changes consists of:
* smartpqi-add-abort-handler
When a device reset occurs, the SML pauses I/O to all devices presented
by a controller instance causing some performance issues.
To only affect device with a problematic request, we added an abort handler.
The abort handler is implemented by using a device reset, but I/O to the
other devices is no longer affected.
* smartpqi-refactor-rename-MACRO-to-clarify-purpose
The MACRO SOP_RC_INCORRECT_LOGICAL_UNIT was used to check for a condition
where a TMF was sent an incorrect LUN. We renamed this MACRO to
SOP_TMF_INCORRECT_LOGICAL_UNIT for clarity.
* smartpqi-refactor-rename-pciinfo-to-pci_info
Change the pciinfo variable to pci_info to make more readable code.
No functional changes.
* smartpqi-simplify-lun_number-assignment
We simplified the conditional expression used to populate LUN numbers
for requests.
* smartpqi-enhance-shutdown-notification
Clarify controller cache flush errors. We added in more precise information
to the cache flush informational message.
No functional changes.
* smartpqi-enhance-controller-offline-notification
The driver can offline a controller for multiple reasons. We added
a description of why these rare offline actions are taken. And a function
to provide the specific details of the shutdown.
* smartpqi-enhance-error-messages
We added host🚌target:lun to messages emitted in our reset/abort handlers.
No functional changes.
* smartpqi-change-driver-version-to-2.1.24-046
Link: https://lore.kernel.org/r/20230824155812.789913-1-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Gerry Morong <gerry.morong@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230824155812.789913-9-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add more detail to some TMF messages.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Mahesh Rajashekhara <Mahesh.Rajashekhara@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230824155812.789913-8-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a description for the reason the controller has been taken off-line.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: David Strahan <David.Strahan@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230824155812.789913-7-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Provide more detailed information about cache flush errors during shutdown.
Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: David Strahan <David.Strahan@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230824155812.789913-6-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Simplify lun_number assignment. lun_number assignment is only required for
non-AIO requests.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: David Strahan <David.Strahan@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230824155812.789913-5-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Make pci device structure names consistent and readable.
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230824155812.789913-4-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rename SOP_RC_INCORRECT_LOGICAL_UNIT to SOP_TMF_INCORRECT_LOGICAL_UNIT to
clarify the intended purpose.
Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230824155812.789913-3-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement aborts as resets.
Avoid I/O stalls across all devices attached to a controller when device
I/O requests time out.
Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Signed-off-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20230824155812.789913-2-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 41320b18a0 ("scsi: snic: Fix possible memory leak if device_add()
fails") fixed the memory leak caused by dev_set_name() when device_add()
failed. However, it did not consider that 'tgt' has already been released
when put_device(&tgt->dev) is called. Remove kfree(tgt) in the error path
to avoid double free of 'tgt' and move put_device(&tgt->dev) after the
removed kfree(tgt) to avoid a use-after-free.
Fixes: 41320b18a0 ("scsi: snic: Fix possible memory leak if device_add() fails")
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Link: https://lore.kernel.org/r/20230819083941.164365-1-wangzhu9@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Move the sd_pm_ops and sd_template data structures to just above init_sd()
such that the number of forward function declarations can be reduced.
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230823210628.523244-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Many tape devices will automatically rewind following a poweron/reset.
This can result in data loss as other operations in the driver can write to
the tape when the position is unknown. E.g. MTEOM can write a filemark at
the beginning of the tape. This patch adds code to detect poweron/reset
unit attentions and prevents the driver from writing to the tape when the
position could be unknown.
Customer reported problem description:
We have experienced an issue with the SCSI tape driver (st) which has led
to data loss for us on two separate occasions in production, as well as in
a third case in which we were able to reproduce the failure in our test
environment.
The tape device involved is an Amazon Tape Gateway, a virtual tape library
(VTL) appliance which presents as a series of iSCSI targets (multiple tape
drives and a changer) and is backed by storage in Amazon S3. The problem is
a general one and not limited to any particular SCSI transport or tape
device, though the nature of both iSCSI and the VTL make data loss somewhat
more likely with this combination than with a physical tape drive.
The observed behavior occurs when an error causes the VTL tape gateway
process (on the appliance) to crash and restart. This interrupts the iSCSI
TCP connections and, when it occurs during a write, causes the write to
fail with EIO. However, we then find that the virtual tape in question is
now completely blank. We raised this issue with AWS support, thinking this
must be a bug in the VTL appliance, but that turns out not to be the case.
Per AWS support, when the gateway crashes in this manner, its notion of the
current tape position is reset to the beginning of the tape. It also sets a
unit attention condition, such that the next request results in a CHECK
CONDITION status with sense key UNIT ATTENTION and asc/ascq indicating a
device reset. According to their logs the next command being sent is WRITE
FILEMARK, which results in writing an FM at the beginning of the tape,
effectively discarding its contents.
In fact, once the write fails with EIO, our software attempts to recover by
rewinding and repositioning the tape, then resuming operation. If this
fails, it attempts to rewind and reposition again, write a marker at the
end of the tape, and then unmount. It does not under any circumstances
write either data or filemarks without having successfully positioned the
tape to a known point.
What actually happens is that, since the last operation was a write, the
kernel executes an implied MTWEOF operation (which translate to a Write
Filemarks command) before the rewind that was actually requested. This
seems not entirely unreasonable, provided the tape position is known.
However, once this request fails (due to the unit attention condition), our
next rewind attempt also triggers an implied MTWEOF, which does _not_ fail
(the unit attention condition persists only until the initiator has been
notified); this is the command that unexpectedly erases the tape.
Our analysis is that the st driver is in fact completely ignoring the UNIT
ATTENTION and associated reset notification from the device. This is not a
condition that can be detected in the transport or mid-layer, as it occurs
entirely within the target and is reported only via the UNIT ATTENTION
sense key. The upper driver (i.e. st) needs to detect this indication and
reset its internal model of the device to an unknown state.
Suggested-by: Jeffrey Hutzelman <jhutz@cmu.edu>
Signed-off-by: John Meneghini <jmeneghi@redhat.com>
Link: https://lore.kernel.org/r/20230822181413.1210647-1-jmeneghi@redhat.com
Acked-by: Kai Mäkisara <kai.makisara@kolumbus.fi>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Provide information in debugfs about SCSI error handling to make it easier
to debug the SCSI error handler. Additionally, report the maximum number of
retries in debugfs (.allowed).
Reviewed-by: John Garry <john.g.garry@oracle.com>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230822163811.219569-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Most callers of scsi_rescan_device() have the scsi_device pointer readily
available. Pass a struct scsi_device pointer to scsi_rescan_device()
instead of a struct device pointer. This change prevents that a pointer to
another struct device would be passed accidentally to scsi_rescan_device().
Remove the scsi_rescan_device() declaration from the scsi_priv.h header
file since it duplicates the declaration in <scsi/scsi_host.h>.
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Cc: Mike Christie <michael.christie@oracle.com>
Cc: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Link: https://lore.kernel.org/r/20230822153043.4046244-1-bvanassche@acm.org
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is no need to check whether shost_priv() returns a non-NULL value, as
the pointer returned is just an offset to the passed in parameter.
While at it replace an open coded shost_priv() instance.
Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
Signed-off-by: Juergen Gross <jgross@suse.com>
Link: https://lore.kernel.org/r/20230822064817.27257-1-jgross@suse.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The raid_component_add() function was added to the kernel tree via patch
"[SCSI] embryonic RAID class" (2005). Remove this function since it never
has had any callers in the Linux kernel. And also raid_component_release()
is only used in raid_component_add(), so it is also removed.
Signed-off-by: Zhu Wang <wangzhu9@huawei.com>
Link: https://lore.kernel.org/r/20230822015254.184270-1-wangzhu9@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Fixes: 04b5b5cb01 ("scsi: core: Fix possible memory leak if device_add() fails")
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
John Garry <john.g.garry@oracle.com> says:
This series tidies-up libsas a bit, including:
- delete structure(s) with only one member
- delete structure members which are only ever set
- delete structure members which are never set and code which relies on
that member being set
This conflicts with the following series:
https://lore.kernel.org/linux-scsi/20230809132249.37948-1-yuehaibing@huawei.com/
Any conflict should be trivial to resolve.
Based on mkp-scsi staging at a18e81d17a ("scsi: ufs: ufs-pci: Add support for QEMU")
Link: https://lore.kernel.org/r/20230815115156.343535-1-john.g.garry@oracle.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Igor Pylypiv <ipylypiv@google.com> says:
This patch series plumbs libata's request for a result taskfile
(ATA_QCFLAG_RESULT_TF) through libsas to pm80xx LLDD. Other libsas LLDDs
can start using the newly added return_fis_on_success as well, if needed.
For Command Duration Limits policy 0xD (command completes without an
error) libata needs FIS in order to detect the ATA_SENSE bit and read
the Sense Data for Successful NCQ Commands log (0Fh). pm80xx HBAs do
not return FIS on success by default, hence, the driver is updated to
set the RETFIS bit (Return FIS on good completion) when requested by
libsas.
Link: https://lore.kernel.org/r/20230819213040.1101044-1-ipylypiv@google.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since libsas was introduced in commit 2908d778ab ("[SCSI] aic94xx: new
driver"), sas_ata_task.retry_count is never set, so delete it and the
reference in asd_build_ata_ascb().
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230815115156.343535-11-john.g.garry@oracle.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since libsas was introduced in commit 2908d778ab ("[SCSI] aic94xx: new
driver"), sas_ata_task.stp_affil_pol is never set, so delete it and the
reference in asd_build_ata_ascb().
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230815115156.343535-10-john.g.garry@oracle.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since libsas was introduced in commit 2908d778ab ("[SCSI] aic94xx: new
driver"), sas_ata_task.set_affil_pol is never set, so delete it and the
reference in asd_build_ata_ascb().
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230815115156.343535-9-john.g.garry@oracle.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since libsas was introduced in commit 2908d778ab ("[SCSI] aic94xx: new
driver"), sas_ssp_task.task_prio is never set, so delete it and any
references which depend on it being set (all of them).
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230815115156.343535-8-john.g.garry@oracle.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since libsas was introduced in commit 2908d778ab ("[SCSI] aic94xx: new
driver"), sas_ssp_task.enable_first_burst is never set, so delete it and
any references.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230815115156.343535-7-john.g.garry@oracle.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since libsas was introduced in commit 2908d778ab ("[SCSI] aic94xx: new
driver"), sas_ssp_task.retry_count is only ever set, so delete it.
The aic94xx driver also had its own retry_count definition in struct scb
sub-structs, which may have caused a mix-up.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230815115156.343535-6-john.g.garry@oracle.com
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since commit 79855d1785 ("libsas: remove task_collector mode"), struct
scsi_core only contains a reference to the shost. struct scsi_core is only
used in sas_ha_struct.core, so delete scsi_core and replace with a
reference to the shost there.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230815115156.343535-5-john.g.garry@oracle.com
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
enum sas_phy_type is used for asd_sas_phy.type, which is only ever set, so
delete this member and the enum.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230815115156.343535-4-john.g.garry@oracle.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
enum sas_class prob would have been useful if function sas_show_class() was
ever implemented, which it wasn't.
enum sas_class is used as asd_sas_port.class and asd_sas_phy.class, which
are only ever set, so delete these members and the enum.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230815115156.343535-3-john.g.garry@oracle.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since libsas was introduced in commit 2908d778ab ("[SCSI] aic94xx: new
driver"), sas_ha_struct.lldd_module has only ever been set, so remove it.
Struct scsi_host_template already has a reference to the LLD driver
module as to stop the driver being removed unexpectedly.
Signed-off-by: John Garry <john.g.garry@oracle.com>
Link: https://lore.kernel.org/r/20230815115156.343535-2-john.g.garry@oracle.com
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Jason Yan <yanaijie@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
User accidently passed module parameter ql2xenabledif=1 which is
unsupported. However, driver still initialized which lead to guard tag
errors during device discovery.
Remove unsupported ql2xenabledif=1 option and validate the user input.
Cc: stable@vger.kernel.org
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-7-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
TMF was returned with an error code. The error code was not preserved to be
returned to upper layer. Instead, the error code from the Marker was
returned.
Preserve error code from TMF and return it to upper layer.
Cc: stable@vger.kernel.org
Fixes: da7c21b72a ("scsi: qla2xxx: Fix command flush during TMF")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-6-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add logs for SFP Temperature Alert async event to check if laser is
enabled/disabled.
Signed-off-by: Bikash Hazarika <bhazarika@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-5-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The storage was not draining I/Os and the work load was not spread out
across different CPUs evenly. This led to firmware resource counters
getting overrun on the busy CPU. This overrun prevented error recovery from
happening in a timely manner.
By switching the counter to atomic, it allows the count to be little more
accurate to prevent the overrun.
Cc: stable@vger.kernel.org
Fixes: da7c21b72a ("scsi: qla2xxx: Fix command flush during TMF")
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-4-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix race condition between Interrupt thread and Chip reset thread in trying
to flush the same mailbox. With the race condition, the "ha->mbx_intr_comp"
will get an extra complete() call. The extra complete call create erroneous
mailbox timeout condition when the next mailbox is sent where the mailbox
call does not wait for interrupt to arrive. Instead, it advances without
waiting.
Add lock protection around the check for mailbox completion.
Cc: stable@vger.kernel.org
Fixes: b2000805a9 ("scsi: qla2xxx: Flush mailbox commands on chip reset")
Signed-off-by: Quinn Tran <quinn.tran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Introduce infrastructure in the driver to support the processing of
unsolicited LS (Link Service) requests. This will involve the utilization
of a new pass-up of unsolicited FC-NVMe request IOCB interface. Unsolicited
requests will be submitted to the NVMe transport layer through
nvme_fc_rcv_ls_req(). Any received LS responses, which are sent using
xmt_ls_rsp(), will be forwarded to the firmware through the existing
Pass-Through IOCB interface, responsible for sending FC-NVMe Link Service
requests and responses.
Signed-off-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230821130045.34850-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
System crashes when a 32-byte CDB was sent to a non T10 PI disk:
[ 177.143279] ? qla2xxx_dif_start_scsi_mq+0xcd8/0xce0 [qla2xxx]
[ 177.149165] ? internal_add_timer+0x42/0x70
[ 177.153372] qla2xxx_mqueuecommand+0x207/0x2b0 [qla2xxx]
[ 177.158730] scsi_queue_rq+0x2b7/0xc00
[ 177.162501] blk_mq_dispatch_rq_list+0x3ea/0x7e0
Current code attempted to use CRC IOCB to send the command but failed.
Instead, type 6 IOCB should be used to send the I/O.
Clone existing type 6 IOCB code with addition of MQ support to allow
32-byte CDBs to go through.
Signed-off-by: Quinn Tran <qutran@marvell.com>
Cc: Laurence Oberman <loberman@redhat.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230817063132.21900-3-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
dsd_list contains a list of dsd buffer resources allocated during traffic
time. It resides in the qla_hw_data location where some of the code is not
reusable.
Move this list to qpair to allow reuse by either single queue or multi
queue adapter / code.
Signed-off-by: Quinn Tran <qutran@marvell.com>
Signed-off-by: Nilesh Javali <njavali@marvell.com>
Link: https://lore.kernel.org/r/20230817063132.21900-2-njavali@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The lpfc_vmid_host_uuid is not defined as uuid_t and its usage is not the
same as for uuid_t operations (like exporting or importing). Hence replace
call to uuid_is_null() by respective memchr_inv() without abusing casting.
With that, replace LPFC_COMPRESS_VMID_SIZE with plain number and respective
sizeof() to make code robust to changes in the future, if any.
Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Link: https://lore.kernel.org/r/20230818155452.875781-1-andriy.shevchenko@linux.intel.com
Reviewed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 4fcf812ca3 ("[SCSI] libsas: export sas_alloc_task()") removed
these implementations but not the declarations.
Signed-off-by: Yue Haibing <yuehaibing@huawei.com>
Link: https://lore.kernel.org/r/20230818124700.49724-1-yuehaibing@huawei.com
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is a long call chain that &fip->ctlr_lock is acquired by isr
fnic_isr_msix_wq_copy() under hard IRQ context. Thus other process context
code acquiring the lock should disable IRQ, otherwise deadlock could happen
if the IRQ preempts the execution while the lock is held in process context
on the same CPU.
[ISR]
fnic_isr_msix_wq_copy()
-> fnic_wq_copy_cmpl_handler()
-> fnic_fcpio_cmpl_handler()
-> fnic_fcpio_flogi_reg_cmpl_handler()
-> fnic_flush_tx()
-> fnic_send_frame()
-> fcoe_ctlr_els_send()
-> spin_lock_bh(&fip->ctlr_lock)
[Process Context]
1. fcoe_ctlr_timer_work()
-> fcoe_ctlr_flogi_send()
-> spin_lock_bh(&fip->ctlr_lock)
2. fcoe_ctlr_recv_work()
-> fcoe_ctlr_recv_handler()
-> fcoe_ctlr_recv_els()
-> fcoe_ctlr_announce()
-> spin_lock_bh(&fip->ctlr_lock)
3. fcoe_ctlr_recv_work()
-> fcoe_ctlr_recv_handler()
-> fcoe_ctlr_recv_els()
-> fcoe_ctlr_flogi_retry()
-> spin_lock_bh(&fip->ctlr_lock)
4. -> fcoe_xmit()
-> fcoe_ctlr_els_send()
-> spin_lock_bh(&fip->ctlr_lock)
spin_lock_bh() is not enough since fnic_isr_msix_wq_copy() is a
hardirq.
These flaws were found by an experimental static analysis tool I am
developing for irq-related deadlock.
The patch fix the potential deadlocks by spin_lock_irqsave() to disable
hard irq.
Fixes: 794d98e77f ("[SCSI] libfcoe: retry rejected FLOGI to another FCF if possible")
Signed-off-by: Chengfeng Ye <dg573847474@gmail.com>
Link: https://lore.kernel.org/r/20230817074708.7509-1-dg573847474@gmail.com
Reviewed-by: Davidlohr Bueso <dave@stgolabs.net>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In the function sli_xmit_bls_rsp64_wqe(), the 'if' and 'else' conditions
evaluates the same expression and give the same output. Also, params->s_id
shall not be equal to U32_MAX. Remove the unused code.
This fixes coccinelle warning such as:
drivers/scsi/elx/libefc_sli/sli4.c:2320:2-4: WARNING: possible
condition with no effect (if == else)
Signed-off-by: Rajeshwar R Shinde <coolrrsh@gmail.com>
Link: https://lore.kernel.org/r/20230817114301.17601-1-coolrrsh@gmail.com
Reviewed-by: Ram Vegesna <ram.vegesna@broadcom.com>
Acked-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
One-element and zero-length arrays are deprecated. So, replace one-element
array in struct fc_rscn_pl_s with flexible-array member.
This results in no differences in binary output.
Link: https://github.com/KSPP/linux/issues/339
Signed-off-by: "Gustavo A. R. Silva" <gustavoars@kernel.org>
Link: https://lore.kernel.org/r/ZN0VTpDBOSVHGayb@work
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
PCI core API pci_dev_id() can be used to get the BDF number for a PCI
device. We don't need to compose it manually. Use pci_dev_id() to simplify
the code a little bit.
Signed-off-by: Zheng Zengkai <zhengzengkai@huawei.com>
Link: https://lore.kernel.org/r/20230811111310.32364-1-zhengzengkai@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
By default PM80xx HBAs return FIS only when a drive reports an error.
The RETFIS bit forces the controller to populate FIS even when a drive
reports no error.
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Link: https://lore.kernel.org/r/20230819213040.1101044-3-ipylypiv@google.com
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Set return_fis_on_success when libata requests result taskfile.
For Command Duration Limits policy 0xD (command completes without
an error) libata needs FIS in order to detect the ATA_SENSE bit and
read the Sense Data for Successful NCQ Commands log (0Fh).
Signed-off-by: Igor Pylypiv <ipylypiv@google.com>
Link: https://lore.kernel.org/r/20230819213040.1101044-2-ipylypiv@google.com
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
PCI core API pci_dev_id() can be used to get the BDF number for a PCI
device. We don't need to compose it manually. Use pci_dev_id() to simplify
the code a little bit.
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Link: https://lore.kernel.org/r/20230815025419.3523236-4-zhangjialin11@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
PCI core API pci_dev_id() can be used to get the BDF number for a PCI
device. We don't need to compose it manually. Use pci_dev_id() to simplify
the code a little bit.
Signed-off-by: Jialin Zhang <zhangjialin11@huawei.com>
Link: https://lore.kernel.org/r/20230815025419.3523236-3-zhangjialin11@huawei.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>