Add a periodic check for issuing of QFPA command and VMID timeout in the
worker thread. The inactivity timeout check is added via the timer
function.
Link: https://lore.kernel.org/r/20210608043556.274139-13-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement timeout functionality for the VMID. After the set time period of
inactivity, the VMID is deregistered from the switch.
Link: https://lore.kernel.org/r/20210608043556.274139-12-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add the VMID in wqe before sending out the request. The type of VMID
depends on the configured type and is checked before being appended.
Link: https://lore.kernel.org/r/20210608043556.274139-11-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement CT commands for registering and deregistering the appid for the
application. Also, a small change in decrementing the ndlp ref counter has
been added.
Link: https://lore.kernel.org/r/20210608043556.274139-10-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement routines to save, retrieve, and remove the VMIDs from the data
structure. A hash table is used to save the VMIDs and the corresponding
UUIDs associated with the application/VMs.
Link: https://lore.kernel.org/r/20210608043556.274139-9-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement ELS commands QFPA and UVEM for the priority tagging appid
support.
Link: https://lore.kernel.org/r/20210608043556.274139-8-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add supporting datastructures for mailbox command which helps in
determining if the firmware supports appid. Allocate resources for VMID at
initialization time and clean them up on removal.
Link: https://lore.kernel.org/r/20210608043556.274139-7-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Initialize parameters such as type of VMID, max number of VMIDs supported
and timeout value for the VMID registration based on the user input.
Link: https://lore.kernel.org/r/20210608043556.274139-6-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add the primary datastructures needed to implement VMID in the lpfc
driver. Maintain the capability, current state, and hash table for the
vmid/appid along with other information. This implementation supports the
two versions of vmid implementation (app header and priority tagging).
Link: https://lore.kernel.org/r/20210608043556.274139-5-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a unique application identifier (i.e fc_app_id member) in blkcg. This
allows identification of traffic belonging to an specific both on the host
and in the fabric infrastructure. As an example, this allows the storage
stack to uniquely identify traffic belong to particular virtual machine.
Link: https://lore.kernel.org/r/20210608043556.274139-3-muneendra.kumar@broadcom.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A couple of ibmvscsi files are missing the inclusion of linux/of.h
and linux/irqdomain.h, relying on transitive inclusion from another
file.
As we are about to break this dependency, make sure these dependencies
are explicit.
Signed-off-by: Marc Zyngier <maz@kernel.org>
Add all the attributes for FDMI.
Fall back mechanism is added in between FDMI V2 and FDMI V1 attributes. In
case FDMI get fails for V2 attributes we fall back to V1 attributes.
Link: https://lore.kernel.org/r/20210603121623.10084-5-jhasan@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add all attributes for RHBA and RPA registration.
Fallback mechanism is added between RBHA V2 and RHBA V1 attributes. In case
RHBA get fails for V2 attributes we fall back to V1 attribute registration.
Link: https://lore.kernel.org/r/20210603121623.10084-4-jhasan@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Initialize RHBA and RPA attributes.
Link: https://lore.kernel.org/r/20210603121623.10084-2-jhasan@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Incorrect condition check was leading to data corruption.
Link: https://lore.kernel.org/r/20210603101404.7841-3-jhasan@marvell.com
Fixes: 8fd9efca86 ("scsi: libfc: Work around -Warray-bounds warning")
CC: stable@vger.kernel.org
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If an internal task abort timeout occurs, the controller has developed a
fault, and needs to be reset to be recovered.
When this occurs during error handling, the current policy is to allow
error handling to continue, and the inevitable nexus ha reset will handle
the required reset.
However various steps of error handling need to taken before this happens.
These also involve some level of HW interaction, which will also fail with
various timeouts.
Speed up this process by recording a HW fault bit for an internal abort
timeout - when this is set, just automatically error any HW interaction,
and essentially go straight to clear nexus ha (to reset the controller).
Link: https://lore.kernel.org/r/1623058179-80434-6-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If an internal task abort timeout occurs, the controller has developed a
fault, and needs to be reset to be recovered. However if a timeout occurs
during SCSI error handling, issuing a controller reset immediately may
conflict with the error handling.
To handle internal abort in these two scenarios, only queue the reset when
not in an error handling function. In the case of a timeout during error
handling, do nothing and rely on the inevitable ha nexus reset to reset the
controller.
Link: https://lore.kernel.org/r/1623058179-80434-5-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For a clear nexus reset operation, the I_T nexus resets are executed
serially for each device. For devices attached through an expander, this
may take 2s per device; so, in total, could take a long time.
Reduce the total time by running the I_T nexus resets in parallel through
async operations.
Link: https://lore.kernel.org/r/1623058179-80434-3-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If an OOB event is received but the phy still fails to come up, a link
reset will be issued repeatedly at an interval of 20s until the phy comes
up.
Set a limit for link reset issue retries to avoid printing the timeout
message endlessly.
Link: https://lore.kernel.org/r/1623058179-80434-2-git-send-email-john.garry@huawei.com
Signed-off-by: Luo Jiaxing <luojiaxing@huawei.com>
Signed-off-by: John Garry <john.garry@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
qedi_clear_session_ctx() could race with the in-kernel or userspace driven
recovery/removal and we could access a NULL conn or do a double free.
We should be using iscsi_host_remove() to start the removal process from
the driver. It will start the in-kernel recovery and notify userspace that
the driver's scsi_hosts are being removed. iscsid will then drive the
session removal like is done when the logout command is run. When the
sessions are removed, iscsi_host_remove() will return so qedi can finish
knowing there are no running sessions and no new sessions will be allowed.
This also fixes an issue where we check for a NULL conn after already
accessing it introduced in commit 27e986289e ("scsi: iscsi: Drop suspend
calls from ep_disconnect") by just removing the function completely.
Link: https://lore.kernel.org/r/20210609192709.5094-1-michael.christie@oracle.com
Fixes: 27e986289e ("scsi: iscsi: Drop suspend calls from ep_disconnect")
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The pci_alloc_irq_vectors_affinity() function returns negative error codes
or it returns a number between the minimum vectors (1 in this case) and
max_vectors. It won't return zero. Because "i" is a u16 then the error
handling won't work. And also if it did work the error code was not set.
Really "max_vectors" can be an int as well because we're doing a min_t() on
int type. The other change is that it's better to remove unnecessary
initialization so that static checkers can warn us if there are ever
uninitialized variable bugs introduced in the future.
I changed the error code from -1 (-EPERM) if the kmalloc() failed to
-ENOMEM. And on success path I changed it from "return retval;" to "return
0;" which shouldn't affect the compiled code but makes it more readable.
Link: https://lore.kernel.org/r/YMCJcnmSI4kOIyv/@mwanda
Fixes: 824a156633 ("scsi: mpi3mr: Base driver code")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The "mrioc->intr_info" pointer can't be NULL, but if it could then the
second iteration through the loop would Oops. Let's delete the confusing
and impossible NULL check.
Link: https://lore.kernel.org/r/YMCJKgykDYtyvY44@mwanda
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix a double free, scsi_tgt_priv_data will be freed in
mpi3mr_target_destroy() so remove the kfree() from mpi3mr_target_alloc().
I've also removed few unneeded initialisations.
Link: https://lore.kernel.org/r/20210608145712.16386-1-thenzl@redhat.com
Acked-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Tomas Henzl <thenzl@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In ufshcd_exec_dev_cmd(), if error happens before lrpb is initialized, then
we should bail out instead of letting trace record the error.
Link: https://lore.kernel.org/r/1623227044-22635-1-git-send-email-cang@codeaurora.org
Fixes: a45f937110 ("scsi: ufs: Optimize host lock on transfer requests send/compl paths")
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Since devman_upiu_cmd() is not COMMAND UPIU, and doesn't have CDB, it is
better to use UPIU query trace, which provides more helpful information for
issue troubleshooting.
Link: https://lore.kernel.org/r/20210531104308.391842-5-huobean@gmail.com
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For the query request, we already have query_trace, but in
ufshcd_send_command(), there will add two more redundant traces. Since
lrbp->cmd is NULL in the query request, the two trace events below provide
nothing except the tag and DB. Instead of letting them take up the limited
trace ring buffer, it’s better not to print these traces in case of cmd ==
NULL.
ufshcd_command: send_req: ff3b0000.ufs: tag: 28, DB: 0x0, size: -1, IS: 0, LBA: 18446744073709551615, opcode: 0x0 (0x0), group_id: 0x0
ufshcd_command: dev_complete: ff3b0000.ufs: tag: 28, DB: 0x0, size: -1, IS: 0, LBA: 18446744073709551615, opcode: 0x0 (0x0), group_id: 0x0
Link: https://lore.kernel.org/r/20210531104308.391842-4-huobean@gmail.com
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current UPIU completion event trace still prints the COMMAND UPIU
header, rather than the RSP UPIU header. This makes UPIU command trace
useless in problem shooting in case we receive a trace log from the
customer/field.
There are two important fields in RSP UPIU:
1. The response field, which indicates the UFS defined overall success or
failure of the series of Command, Data and RESPONSE UPIU’s that make up
the execution of a task.
2. The Status field, which contains the command set specific status for a
specific command issued by the initiator device.
Before this commit, the UPIU paired trace events:
ufshcd_upiu: send_req: fe3b0000.ufs: HDR:01 20 00 1c 00 00 00 00 00 00 00 00, CDB:3b e1 00 00 00 00 00 00 30 00 00 00 00 00 00 00
ufshcd_upiu: complete_rsp: fe3b0000.ufs: HDR:01 20 00 1c 00 00 00 00 00 00 00 00, CDB:3b e1 00 00 00 00 00 00 30 00 00 00 00 00 00 00
After this commit:
ufshcd_upiu: send_req: fe3b0000.ufs: HDR:01 20 00 1c 00 00 00 00 00 00 00 00, CDB:3b e1 00 00 00 00 00 00 30 00 00 00 00 00 00 00
ufshcd_upiu: complete_rsp: fe3b0000.ufs: HDR:21 00 00 1c 00 00 00 00 00 00 00 00, CDB:3b e1 00 00 00 00 00 00 30 00 00 00 00 00 00 00
Link: https://lore.kernel.org/r/20210531104308.391842-3-huobean@gmail.com
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
To consistent with trace event print, convert the value of the variable
'lba' from a block layer sector address to a logical block adress.
Link: https://lore.kernel.org/r/20210531104308.391842-2-huobean@gmail.com
Suggested-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix a
fall-through warning by explicitly adding a break statement instead of just
letting the code fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Link: https://lore.kernel.org/r/20210604023530.GA180997@embeddedor
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix a
fall-through warning by replacing a /* fallthrough */ comment with the new
pseudo-keyword macro fallthrough;
Link: https://github.com/KSPP/linux/issues/115
Link: https://lore.kernel.org/r/20210604022752.GA168289@embeddedor
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
By reading the UTP Transfer Request List Completion Notification Register,
which is added in UFSHCI Ver 3.0, SW can easily get the compeleted transfer
requests. Thus, SW can get rid of host lock, which is used to synchronize
the tr_doorbell and outstanding_reqs, on transfer requests dispatch and
completion paths. This can further benefit random read/write performance.
Link: https://lore.kernel.org/r/1621845419-14194-4-git-send-email-cang@codeaurora.org
Cc: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Co-developed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Current UFS IRQ handler is completely wrapped by host lock, and because
ufshcd_send_command() is also protected by host lock, when IRQ handler
fires, not only the CPU running the IRQ handler cannot send new requests,
the rest CPUs can neither. Move the host lock wrapping the IRQ handler into
specific branches, i.e., ufshcd_uic_cmd_compl(), ufshcd_check_errors(),
ufshcd_tmc_handler() and ufshcd_transfer_req_compl(). Meanwhile, to further
reduce occpuation of host lock in ufshcd_transfer_req_compl(), host lock is
no longer required to call __ufshcd_transfer_req_compl(). As per test, the
optimization can bring considerable gain to random read/write performance.
Link: https://lore.kernel.org/r/1621845419-14194-3-git-send-email-cang@codeaurora.org
Cc: Stanley Chu <stanley.chu@mediatek.com>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Co-developed-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Asutosh Das <asutoshd@codeaurora.org>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
ufshcd_host_reset_and_restore() anyways completes all pending requests
before starts re-probing, so there is no need to complete the command on
the highest bit in tr_doorbell in advance.
Link: https://lore.kernel.org/r/1621845419-14194-2-git-send-email-cang@codeaurora.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Reviewed-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Can Guo <cang@codeaurora.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
get_device(shost->shost_gendev.parent) is called after host state has
switched to SHOST_RUNNING. scsi_host_dev_release() shouldn't release the
parent device if host state is still SHOST_CREATED.
Link: https://lore.kernel.org/r/20210602133029.2864069-5-ming.lei@redhat.com
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Hannes Reinecke <hare@suse.de>
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi_host_dev_release() only frees dev_name when host state is
SHOST_CREATED. After host state has changed to SHOST_RUNNING,
scsi_host_dev_release() no longer cleans up.
Fix this by doing a put_device(&shost->shost_dev) in the failure path when
host state is SHOST_RUNNING. Move get_device(&shost->shost_gendev) before
device_add(&shost->shost_dev) so that scsi_host_cls_release() can do a put
on this reference.
Link: https://lore.kernel.org/r/20210602133029.2864069-4-ming.lei@redhat.com
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Reported-by: John Garry <john.garry@huawei.com>
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When scsi_add_host_with_dma() returns failure, the caller will call
scsi_host_put(shost) to release everything allocated for this host
instance. Consequently we can't also free allocated stuff in
scsi_add_host_with_dma(), otherwise we will end up with a double free.
Strictly speaking, host resource allocations should have been done in
scsi_host_alloc(). However, the allocations may need information which is
not yet provided by the driver when that function is called. So leave the
allocations where they are but rely on host device's release handler to
free resources.
Link: https://lore.kernel.org/r/20210602133029.2864069-3-ming.lei@redhat.com
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Hannes Reinecke <hare@suse.de>
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After device is initialized via device_initialize(), or its name is set via
dev_set_name(), the device has to be freed via put_device(). Otherwise
device name will be leaked because it is allocated dynamically in
dev_set_name().
Fix the leak by replacing kfree() with put_device(). Since
scsi_host_dev_release() properly handles IDA and kthread removal, remove
special-casing these from the error handling as well.
Link: https://lore.kernel.org/r/20210602133029.2864069-2-ming.lei@redhat.com
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: John Garry <john.garry@huawei.com>
Cc: Hannes Reinecke <hare@suse.de>
Tested-by: John Garry <john.garry@huawei.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The "retval" variable needs to be signed for the error handling to work.
Link: https://lore.kernel.org/r/YLjMEAFNxOas1mIp@mwanda
Fixes: 7e26e3ea02 ("scsi: scsi_dh_alua: Check for negative result value")
Reviewed-by: Martin Wilck <mwilck@suse.com>
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove all references to the description of __ufshcd_wl_{suspend,resume} as
no such description exist.
Fixes: b294ff3e34 (scsi: ufs: core: Enable power management for wlun)
Link: https://lore.kernel.org/r/20210603122209.635799-1-avri.altman@wdc.com
Signed-off-by: Avri Altman <avri.altman@wdc.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy() avoid using an inline const buffer
argument and instead just statically initialize the destination array
directly.
Link: https://lore.kernel.org/r/20210602180000.3326448-1-keescook@chromium.org
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
host->max_id defines the maximum target id that the SCSI midlayer will
attempt to manually scan. The default is 8. Update the value to the max
sessions the driver supports.
[mkp: applied by hand]
Link: https://lore.kernel.org/r/20210602104653.17278-1-jhasan@marvell.com
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Saurav Kashyap <skashyap@marvell.com>
Signed-off-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Five small and fairly minor fixes, all in drivers.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCYLzsbiYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishRkoAQD07kLp
JSHpsn97DOdDpCYu+GoLtHz9uJ9Keh+61hbv+gEAoruwy+STPC3MiKP6IW4b1i/R
U66kS0NWYkGqOITA2Xs=
=Wqvj
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Five small and fairly minor fixes, all in drivers"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: scsi_devinfo: Add blacklist entry for HPE OPEN-V
scsi: ufs: ufs-mediatek: Fix HCI version in some platforms
scsi: qedf: Do not put host in qedf_vport_create() unconditionally
scsi: lpfc: Fix failure to transmit ABTS on FC link
scsi: target: core: Fix warning on realtime kernels
Allow the compiler to verify the type of the second argument passed to
scsi_host_complete_all_commands().
Link: https://lore.kernel.org/r/20210524025457.11299-4-bvanassche@acm.org
Cc: Hannes Reinecke <hare@suse.com>
Cc: John Garry <john.garry@huawei.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Make it possible for the compiler to verify whether SAM and host
status codes are used correctly.
[mkp: resolve conflicts with Hannes' SCSI result series]
Link: https://lore.kernel.org/r/20210524025457.11299-3-bvanassche@acm.org
Cc: Hannes Reinecke <hare@suse.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch prepares for converting SAM status codes into an enum. Without
this patch converting SAM status codes into an enumeration type would
trigger complaints about enum type mismatches for the SAS code.
Link: https://lore.kernel.org/r/20210524025457.11299-2-bvanassche@acm.org
Cc: Hannes Reinecke <hare@suse.com>
Cc: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Cc: Jason Yan <yanaijie@huawei.com>
Reviewed-by: John Garry <john.garry@huawei.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Acked-by: Jack Wang <jinpu.wang@ionos.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Include Hannes' SCSI command result rework in the staging branch.
[mkp: remove DRIVER_SENSE from mpi3mr]
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If we got a response then we should always wake up the conn. For both the
cmd_cleanup_req == 0 or cmd_cleanup_req > 0, we shouldn't dig into
iscsi_itt_to_task because we don't know what the upper layers are doing.
We can also remove the qedi_clear_task_idx call here because once we signal
success libiscsi will loop over the affected commands and end up calling
the cleanup_task callout which will release it.
Link: https://lore.kernel.org/r/20210525181821.7617-29-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We need to make sure that abort and reset completion work has completed
before ep_disconnect returns. After ep_disconnect we can't manipulate
cmds because libiscsi will call conn_stop and take onwership.
We are trying to make sure abort work and reset completion work has
completed before we do the cmd clean up in ep_disconnect. The problem is
that:
1. the work function sets the QEDI_CONN_FW_CLEANUP bit, so if the work was
still pending we would not see the bit set. We need to do this before
the work is queued.
2. If we had multiple works queued then we could break from the loop in
qedi_ep_disconnect early because when abort work 1 completes it could
clear QEDI_CONN_FW_CLEANUP. qedi_ep_disconnect could then see that
before work 2 has run.
3. A TMF reset completion work could run after ep_disconnect starts
cleaning up cmds via qedi_clearsq. ep_disconnect's call to qedi_clearsq
-> qedi_cleanup_all_io would might think it's done cleaning up cmds,
but the reset completion work could still be running. We then return
from ep_disconnect while still doing cleanup.
This replaces the bit with a counter to track the number of queued TMF
works, and adds a bool to prevent new works from starting from the
completion path once a ep_disconnect starts.
Link: https://lore.kernel.org/r/20210525181821.7617-28-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
qedi_abort_work knows what task to abort so just pass it to send_iscsi_tmf.
Link: https://lore.kernel.org/r/20210525181821.7617-27-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Drivers shouldn't be calling block/unblock session for cmd cleanup because
the functions can change the session state from under libiscsi. This adds
a new a driver level bit so it can block all I/O the host while it drains
the card.
Link: https://lore.kernel.org/r/20210525181821.7617-26-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Drivers shouldn't be calling block/unblock session for tmf handling because
the functions can change the session state from under libiscsi.
iscsi_queuecommand's call to iscsi_prep_scsi_cmd_pdu->
iscsi_check_tmf_restrictions will prevent new cmds from being sent to qedi
after we've started handling a TMF. So we don't need to try and block it in
the driver, and we can remove these block calls.
Link: https://lore.kernel.org/r/20210525181821.7617-25-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We run from a workqueue with no locks held so use GFP_NOIO.
Link: https://lore.kernel.org/r/20210525181821.7617-24-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
qedi_iscsi_abort_work and qedi_tmf_work both allocate a tid then call
qedi_send_iscsi_tmf which also allocates a tid. This removes the tid
allocation from the callers.
Link: https://lore.kernel.org/r/20210525181821.7617-23-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If qedi_tmf_work's qedi_wait_for_cleanup_request call times out we will
also force the clean up of the qedi_work_map but
qedi_process_cmd_cleanup_resp could still be accessing the qedi_cmd.
To fix this issue we extend where we hold the tmf_work_lock and back_lock
so the qedi_process_cmd_cleanup_resp access is serialized with the cleanup
done in qedi_tmf_work and any completion handling for the iscsi_task.
Link: https://lore.kernel.org/r/20210525181821.7617-22-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the SCSI cmd completes after qedi_tmf_work calls iscsi_itt_to_task then
the qedi qedi_cmd->task_id could be freed and used for another cmd. If we
then call qedi_iscsi_cleanup_task with that task_id we will be cleaning up
the wrong cmd.
Wait to release the task_id until the last put has been done on the
iscsi_task. Because libiscsi grabs a ref to the task when sending the
abort, we know that for the non-abort timeout case that the task_id we are
referencing is for the cmd that was supposed to be aborted.
A latter commit will fix the case where the abort times out while we are
running qedi_tmf_work.
Link: https://lore.kernel.org/r/20210525181821.7617-21-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If qedi_process_cmd_cleanup_resp finds the cmd it frees the work and sets
list_tmf_work to NULL, so qedi_tmf_work should check if list_tmf_work is
non-NULL when it wants to force cleanup.
Link: https://lore.kernel.org/r/20210525181821.7617-20-michael.christie@oracle.com
Reviewed-by: Manish Rangankar <mrangankar@marvell.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This doesn't fix any bugs, but it makes more sense to free the pool after
we have removed the session. At that time we know nothing is touching any
of the session fields, because all devices have been removed and scans are
stopped.
Link: https://lore.kernel.org/r/20210525181821.7617-19-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For aborts, qedi needs to cleanup the FW then send the TMF from a worker
thread. While it's doing these the cmd could complete normally and the TMF
could time out. libiscsi would then complete the iscsi_task which will call
into the driver to cleanup the driver level resources while it still might
be accessing them for the cleanup/abort.
This has iscsi_eh_abort keep the iscsi_task ref if the TMF times out, so
qedi does not have to worry about if the task is being freed while in use
and does not need to get its own ref.
Link: https://lore.kernel.org/r/20210525181821.7617-18-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We set the max_active iSCSI EH works to 1, so all work is going to execute
in order by default. However, userspace can now override this in sysfs. If
max_active > 1, we can end up with the block_work on CPU1 and
iscsi_unblock_session running the unblock_work on CPU2 and the session and
target/device state will end up out of sync with each other.
This adds a flush of the block_work in iscsi_unblock_session.
Link: https://lore.kernel.org/r/20210525181821.7617-17-michael.christie@oracle.com
Fixes: 1d726aa6ef ("scsi: iscsi: Optimize work queue flush use")
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We have a ref to the task being aborted, so SCp.ptr will never be NULL. We
need to use iscsi_task_is_completed to check for the completed state.
Link: https://lore.kernel.org/r/20210525181821.7617-16-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The iscsi offload drivers are setting the shost->max_id to the max number
of sessions they support. The problem is that max_id is not the max number
of targets but the highest identifier the targets can have. To use it to
limit the number of targets we need to set it to max sessions - 1, or we
can end up with a session we might not have preallocated resources for.
Link: https://lore.kernel.org/r/20210525181821.7617-15-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If we haven't done a unbind target call we can race where
iscsi_conn_teardown wakes up the EH thread and then frees the conn while
those threads are still accessing the conn ehwait.
We can only do one TMF per session so this just moves the TMF fields from
the conn to the session. We can then rely on the
iscsi_session_teardown->iscsi_remove_session->__iscsi_unbind_session call
to remove the target and it's devices, and know after that point there is
no device or scsi-ml callout trying to access the session.
Link: https://lore.kernel.org/r/20210525181821.7617-14-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The comment in iscsi_eh_session_reset is wrong and we don't wait for the
EH to complete before tearing down the conn. This has us get a ref to the
conn when we are not holding the eh_mutex/frwd_lock so it does not get
freed from under us.
Link: https://lore.kernel.org/r/20210525181821.7617-13-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If SCSI midlayer is aborting a task when we are tearing down the conn we
could free the conn while the abort thread is accessing the conn. This has
the abort handler get a ref to the conn so it won't be freed from under it.
Note: this is not needed for device/target reset because we are holding the
eh_mutex when accessing the conn.
Link: https://lore.kernel.org/r/20210525181821.7617-12-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are a couple places where we could free the iscsi_cls_conn while it's
still in use. This adds some helpers to get/put a refcount on the struct
and converts an exiting user. Subsequent commits will then use the helpers
to fix 2 bugs in the eh code.
Link: https://lore.kernel.org/r/20210525181821.7617-11-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Make sure the conn socket shutdown starts before we start the timer to fail
commands to upper layers.
Link: https://lore.kernel.org/r/20210525181821.7617-10-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Userspace (open-iscsi based tools at least) sets no linger on the socket to
prevent stale data from being sent. However, with the in-kernel cleanup if
userspace is not up the sockfd_put will release the socket without having
set that sockopt.
iscsid sets that opt at socket close time, but it seems ok to set this at
setup time in the kernel for all tools.
Link: https://lore.kernel.org/r/20210525181821.7617-9-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Commit 0ab710458d ("scsi: iscsi: Perform connection failure entirely in
kernel space") has the following regressions/bugs that this patch fixes:
1. It can return cmds to upper layers like dm-multipath where that can
retry them. After they are successful the fs/app can send new I/O to the
same sectors, but we've left the cmds running in FW or in the net layer.
We need to be calling ep_disconnect if userspace is not up.
This patch only fixes the issue for offload drivers. iscsi_tcp will be
fixed in separate commit because it doesn't have a ep_disconnect call.
2. The drivers that implement ep_disconnect expect that it's called before
conn_stop. Besides crashes, if the cleanup_task callout is called before
ep_disconnect it might free up driver/card resources for session1 then they
could be allocated for session2. But because the driver's ep_disconnect is
not called it has not cleaned up the firmware so the card is still using
the resources for the original cmd.
3. The stop_conn_work_fn can run after userspace has done its recovery and
we are happily using the session. We will then end up with various bugs
depending on what is going on at the time.
We may also run stop_conn_work_fn late after userspace has called stop_conn
and ep_disconnect and is now going to call start/bind conn. If
stop_conn_work_fn runs after bind but before start, we would leave the conn
in a unbound but sort of started state where IO might be allowed even
though the drivers have been set in a state where they no longer expect
I/O.
4. Returning -EAGAIN in iscsi_if_destroy_conn if we haven't yet run the in
kernel stop_conn function is breaking userspace. We should have been doing
this for the caller.
Link: https://lore.kernel.org/r/20210525181821.7617-8-michael.christie@oracle.com
Fixes: 0ab710458d ("scsi: iscsi: Perform connection failure entirely in kernel space")
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Subsequent commits allow the kernel to do ep_disconnect. In that case we
will have to get a proper refcount on the ep so one thread does not delete
it from under another.
Link: https://lore.kernel.org/r/20210525181821.7617-7-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use the system_unbound_wq for async session destruction. We don't need a
dedicated workqueue for async session destruction because:
1. perf does not seem to be an issue since we only allow 1 active work.
2. it does not have deps with other system works and we can run them in
parallel with each other.
Link: https://lore.kernel.org/r/20210525181821.7617-6-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If the system is not up, we can just fail immediately since iscsid is not
going to ever answer our netlink events. We are already setting the
recovery_tmo to 0, but by passing stop_conn STOP_CONN_TERM we never will
block the session and start the recovery timer, because for that flag
userspace will do the unbind and destroy events which would remove the
devices and wake up and kill the eh.
Since the conn is dead and the system is going dowm this just has us use
STOP_CONN_RECOVER with recovery_tmo=0 so we fail immediately. However, if
the user has set the recovery_tmo=-1 we let the system hang like they
requested since they might have used that setting for specific reasons
(one known reason is for buggy cluster software).
Link: https://lore.kernel.org/r/20210525181821.7617-5-michael.christie@oracle.com
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
libiscsi will now suspend the send/tx queue for the drivers so we can drop
it from the drivers ep_disconnect.
Link: https://lore.kernel.org/r/20210525181821.7617-4-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During ep_disconnect we have been doing iscsi_suspend_tx/queue to block new
I/O but every driver except cxgbi and iscsi_tcp can still get I/O from
__iscsi_conn_send_pdu() if we haven't called iscsi_conn_failure() before
ep_disconnect. This could happen if we were terminating the session, and
the logout timed out before it was even sent to libiscsi.
Fix the issue by adding a helper which reverses the bind_conn call that
allows new I/O to be queued. Drivers implementing ep_disconnect can use this
to make sure new I/O is not queued to them when handling the disconnect.
Link: https://lore.kernel.org/r/20210525181821.7617-3-michael.christie@oracle.com
Reviewed-by: Lee Duncan <lduncan@suse.com>
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While reenabling the IRQ after IRQ poll there may be a small window for the
firmware to post the replies with interrupts raised. In that case the
driver will not see the interrupts which leads to I/O timeout.
This issue only happens when there are many I/O completions on a single
reply queue. This forces the driver to switch between the interrupt and IRQ
context.
Make the driver process the reply queue one more time after enabling the
IRQ.
Link: https://lore.kernel.org/linux-scsi/20201102072746.27410-1-sreekanth.reddy@broadcom.com/
Link: https://lore.kernel.org/r/20210528131307.25683-5-chandrakanth.patil@broadcom.com
Cc: Tomas Henzl <thenzl@redhat.com>
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Consider the case where a VD is deleted and the targetID of that VD is
assigned to a newly created VD. If the sequence of deletion/addition of VD
happens very quickly there is a possibility that second event (VD add)
occurs even before the driver processes the first event (VD delete). As
event processing is done in deferred context the device list remains the
same (but targetID is re-used) so driver will not learn the VD
deletion/additon. I/Os meant for the older VD will be directed to new VD
which may lead to data corruption.
Make driver detect the deleted VD as soon as possible based on the RaidMap
update and block further I/O to that device.
Link: https://lore.kernel.org/r/20210528131307.25683-4-chandrakanth.patil@broadcom.com
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver doesn't clean up all the allocated resources properly when
scsi_add_host(), megasas_start_aen() function fails during the PCI device
probe.
Clean up all those resources.
Link: https://lore.kernel.org/r/20210528131307.25683-3-chandrakanth.patil@broadcom.com
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver issues all non-ReadWrite I/Os for TYPE_ENCLOSURE devices through
the fast path with invalid dev handle. Fast path in turn directs all the
I/Os to the firmware. As firmware stopped handling those I/Os from SAS3.5
generation of controllers (Ventura generation and onwards) this will lead
to I/O failures.
Switch the driver to issue all the non-ReadWrite I/Os for TYPE_ENCLOSURE
devices directly to firmware for SAS3.5 generation of controllers and
later.
Link: https://lore.kernel.org/r/20210528131307.25683-2-chandrakanth.patil@broadcom.com
Cc: <stable@vger.kernel.org> # v5.11+
Signed-off-by: Chandrakanth Patil <chandrakanth.patil@broadcom.com>
Signed-off-by: Sumit Saxena <sumit.saxena@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Read PCI_EXT_CAP_ID_DSN to query security status.
The driver will throw a warning message when a non-secure type controller
is detected. The purpose of this interface is to avoid interacting with any
firmware which is not secured/signed by Broadcom. Any tampering on
firmware component will be detected by hardware and it will be communicated
to the driver to avoid any further interaction with that component.
Link: https://lore.kernel.org/r/20210520152545.2710479-23-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Wait for host I/O completion (default 180 seconds) if I/O timeout is
detected on VDs.
Link: https://lore.kernel.org/r/20210520152545.2710479-21-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Unlock the host diagnostic register, write the specific reset type to that
and wait for reset acknowledgment from the controller. If the reset is not
successful retry for the predefined number of times
Link: https://lore.kernel.org/r/20210520152545.2710479-19-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Register driver for threaded interrupts.
By default the driver will attempt I/O completion from interrupt context
(primary handler). Since the driver tracks per reply queue outstanding
I/Os, it will schedule threaded ISR if there are any outstanding I/Os
expected on that particular reply queue.
Threaded ISR (secondary handler) will loop for I/O completion as long as
there are outstanding I/Os (speculative method using same per reply queue
outstanding counter) or it has completed some X amount of commands
(something like budget).
Link: https://lore.kernel.org/r/20210520152545.2710479-18-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The controller hardware can not handle certain UNMAP commands for NVMe
drives. Add support in the driver for checking those commands and handle
them appropriately.
Link: https://lore.kernel.org/r/20210520152545.2710479-17-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of driver returning DID_NO_CONNECT during driver unload allow SSU
and Sync Cache commands to be sent to the controller to flush any cached
data from the drive.
Link: https://lore.kernel.org/r/20210520152545.2710479-16-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This operation requests that the IOC update the TimeStamp.
When the I/O Unit is powered on it sets the TimeStamp field value to
0x0000_0000_0000_0000 and increments the current value every millisecond.
A host driver sets the TimeStamp field to the current time by using an
IOCInit request. The TimeStamp field is periodically updated by the host
driver.
Link: https://lore.kernel.org/r/20210520152545.2710479-11-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Detection of firmware fault or any kind of unresponsiveness in the
controller (any admin command which times out) results in resetting the
controller. The primary reset mechanisms used are either soft reset or diag
fault reset. A reset is performed if the host sets the ResetAction field in
the HostDiagnostic register to either 001b (soft reset) or 007b (diag fault
reset). After successfully resetting the controller the driver
reinitializes the controller by going through start of the day
initialization procedure. Pending I/Os during the reset are returned back
to the SCSI midlayer for retry.
Link: https://lore.kernel.org/r/20210520152545.2710479-10-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.co
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement support for handling the following MPI events:
- MPI3_EVENT_SAS_BROADCAST_PRIMITIVE
- MPI3_EVENT_CABLE_MGMT
- MPI3_EVENT_ENERGY_PACK_CHANGE
Link: https://lore.kernel.org/r/20210520152545.2710479-9-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement support for the following PCIe-related MPI events:
- MPI3_EVENT_PCIE_TOPOLOGY_CHANGE_LIST
- MPI3_EVENT_PCIE_ENUMERATION
Link: https://lore.kernel.org/r/20210520152545.2710479-8-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Firmware can report various MPI Events. Enable support for processing the
following events related to device addition/removal to the driver:
- MPI3_EVENT_DEVICE_ADDED
- MPI3_EVENT_DEVICE_INFO_CHANGED
- MPI3_EVENT_DEVICE_STATUS_CHANGE
- MPI3_EVENT_ENCL_DEVICE_STATUS_CHANGE
- MPI3_EVENT_SAS_TOPOLOGY_CHANGE_LIST
- MPI3_EVENT_SAS_DISCOVERY
- MPI3_EVENT_SAS_DEVICE_DISCOVERY_ERROR
Link: https://lore.kernel.org/r/20210520152545.2710479-7-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The watchdog thread is the driver's internal thread which does a few things
such as detecting firmware faults, resetting the controller, performing
timestamp sync, etc.
Link: https://lore.kernel.org/r/20210520152545.2710479-6-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Send Port Enable Request to FW for Device Discovery. As part of port
enable completion driver calls scan_start and scan_finished hooks. SCSI
layer references like sdev, starget, etc. are added but actual device
discovery will be supported once driver adds complete event process
handling.
Link: https://lore.kernel.org/r/20210520152545.2710479-5-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Cc: hare@suse.de
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Create operational request and reply queue pair.
The MPI3 transport interface consists of an Administrative Request Queue,
an Administrative Reply Queue, and Operational Messaging Queues. The
Operational Messaging Queues are the primary communication mechanism
between the host and the I/O Controller (IOC). Request messages, allocated
in host memory, identify I/O operations to be performed by the IOC. These
operations are queued on an Operational Request Queue by the host driver.
Reply descriptors track I/O operations as they complete. The IOC queues
these completions in an Operational Reply Queue.
To fulfil large contiguous memory requirement, driver creates multiple
segments and provide the list of segments. Each segment size should be 4K
which is a hardware requirement. An element array is contiguous or
segmented. A contiguous element array is located in contiguous physical
memory. A contiguous element array must be aligned on an element size
boundary. An element's physical address within the array may be directly
calculated from the base address, the Producer/Consumer index, and the
element size.
Expected phased identifier bit is used to find out valid entry on reply
queue. Driver sets <ephase> bit and IOC inverts the value of this bit on
each pass.
Link: https://lore.kernel.org/r/20210520152545.2710479-4-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement basic pci device driver requirements: Device probing, memory
allocation, mapping system registers, allocate irq lines, etc.
Source is managed in mainly three different files:
- mpi3mr_fw.c: Common code which interacts with underlying fw/hw.
- mpi3mr_os.c: Common code which interacts with SCSI midlayer.
- mpi3mr_app.c: Common code which interacts with application/ioctl.
This is currently work in progress.
Link: https://lore.kernel.org/r/20210520152545.2710479-3-kashyap.desai@broadcom.com
Cc: sathya.prakash@broadcom.com
Cc: bvanassche@acm.org
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Tomas Henzl <thenzl@redhat.com>
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix the following W=1 kernel build warning:
drivers/scsi/ufs/ufshcd.c:9773: warning: This comment starts with '/**', but isn't a kernel-doc comment. Refer Documentation/doc-guide/kernel-doc.rst
[mkp: upcase abbreviations]
Link: https://lore.kernel.org/r/20210531163122.451375-1-huobean@gmail.com
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Bean Huo <beanhuo@micron.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), avoid intentionally writing across
neighboring array fields.
Switch from rsp_ui to resp_buf, since resp_ui isn't SSP_RESP_IU_MAX_SIZE
bytes in length. This avoids future compile-time warnings.
Link: https://lore.kernel.org/r/20210528181337.792268-4-keescook@chromium.org
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In preparation for FORTIFY_SOURCE performing compile-time and run-time
field bounds checking for memcpy(), avoid intentionally writing across
neighboring array fields.
Remove old-style 1-byte array in favor of a flexible array[1] to avoid
future false-positive cross-field memcpy() warning in:
esas2r_vda.c:
memcpy(vi->cmd.gsv.version_info, esas2r_vdaioctl_versions, ...)
The change in struct size doesn't change other structure sizes (it is
already maxed out to 256 bytes, for example here:
union {
struct atto_ioctl_vda_scsi_cmd scsi;
struct atto_ioctl_vda_flash_cmd flash;
struct atto_ioctl_vda_diag_cmd diag;
struct atto_ioctl_vda_cli_cmd cli;
struct atto_ioctl_vda_smp_cmd smp;
struct atto_ioctl_vda_cfg_cmd cfg;
struct atto_ioctl_vda_mgt_cmd mgt;
struct atto_ioctl_vda_gsv_cmd gsv;
u8 cmd_info[256];
} cmd;
No sizes are calculated using the enclosing structure, so no other
updates are needed.
Link: https://lore.kernel.org/r/20210528181337.792268-3-keescook@chromium.org
Reviewed-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The BusLogic driver has build errors on ia64 due to a name collision (in
the #included FlashPoint.c file). Rename the struct field in struct
sccb_mgr_info from si_flags to si_mflags (manager flags) to mend the build.
This is the first problem. There are 50+ others after this one:
In file included from ../include/uapi/linux/signal.h:6,
from ../include/linux/signal_types.h:10,
from ../include/linux/sched.h:29,
from ../include/linux/hardirq.h:9,
from ../include/linux/interrupt.h:11,
from ../drivers/scsi/BusLogic.c:27:
../arch/ia64/include/uapi/asm/siginfo.h:15:27: error: expected ':', ',', ';', '}' or '__attribute__' before '.' token
15 | #define si_flags _sifields._sigfault._flags
| ^
../drivers/scsi/FlashPoint.c:43:6: note: in expansion of macro 'si_flags'
43 | u16 si_flags;
| ^~~~~~~~
In file included from ../drivers/scsi/BusLogic.c:51:
../drivers/scsi/FlashPoint.c: In function 'FlashPoint_ProbeHostAdapter':
../drivers/scsi/FlashPoint.c:1076:11: error: 'struct sccb_mgr_info' has no member named '_sifields'
1076 | pCardInfo->si_flags = 0x0000;
| ^~
../drivers/scsi/FlashPoint.c:1079:12: error: 'struct sccb_mgr_info' has no member named '_sifields'
Link: https://lore.kernel.org/r/20210529234857.6870-1-rdunlap@infradead.org
Fixes: 391e2f2560 ("[SCSI] BusLogic: Port driver to 64-bit.")
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Khalid Aziz <khalid.aziz@oracle.com>
Cc: Khalid Aziz <khalid@gonehiking.org>
Reported-by: kernel test robot <lkp@intel.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In preparation to enable -Wimplicit-fallthrough for Clang, fix a couple
of warnings by explicitly adding break statements instead of just letting
the code fall through to the next case.
Link: https://github.com/KSPP/linux/issues/115
Link: https://lore.kernel.org/r/20210528200828.GA39349@embeddedor
Reviewed-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Pass in fcport->vha to ql_log() in order to add the PCI address to the log.
Currently NULL is passed in which gives this confusing log entry:
> qla2xxx [0000:00:00.0]-2112: : qla_nvme_unregister_remote_port: unregister remoteport on 0000000009d6a2e9 50000973981648c7
Link: https://lore.kernel.org/r/20210531122444.116655-1-dwagner@suse.de
Reviewed-by: Himanshu Madhani <himanshu.madhani@oracle.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
MediaTek ufshci needs to be disabled before HW reset to avoid potential
issues.
Link: https://lore.kernel.org/r/20210528033624.12170-3-alice.chao@mediatek.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Alice.Chao <alice.chao@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Export ufshcd_hba_stop() to allow vendors to disable HCI in variant ops.
Link: https://lore.kernel.org/r/20210528033624.12170-2-alice.chao@mediatek.com
Reviewed-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Alice.Chao <alice.chao@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Some MediaTek SoC platforms with UFSHCI version below 3.0 have incorrect
UFSHCI versions showed in register map.
Fix the version by referring to UniPro version which is always correct.
Link: https://lore.kernel.org/r/20210531062642.12642-1-stanley.chu@mediatek.com
Reviewed-by: Alim Akhtar <alim.akhtar@samsung.com>
Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Do not drop reference count on vn_port->host in qedf_vport_create()
unconditionally. Instead drop the reference count in qedf_vport_destroy().
Link: https://lore.kernel.org/r/20210521143440.84816-1-dwagner@suse.de
Reported-by: Javed Hasan <jhasan@marvell.com>
Signed-off-by: Daniel Wagner <dwagner@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Replace the per-block device bd_mutex with a per-gendisk open_mutex,
thus simplifying locking wherever we deal with partitions.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Acked-by: Roger Pau Monné <roger.pau@citrix.com>
Link: https://lore.kernel.org/r/20210525061301.2242282-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Originally the SCSI subsystem has been using 'special' SCSI status codes,
which were the SAM-specified ones but shifted by 1. As most drivers have
now been modified to use the SAM-specified ones, having two nearly
identical sets of definitions only causes confusion.
The Linux-specifed SCSI status codes have been marked obsolete for several
years so drop them and use the SAM-specified status codes throughout.
Link: https://lore.kernel.org/r/20210427083046.31620-41-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Douglas Gilbert <dgilbert@interlog.com>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The abort_cmd_ia flag in an abort wqe describes whether an ABTS basic link
service should be transmitted on the FC link or not. Code added in
lpfc_sli4_issue_abort_iotag() set the abort_cmd_ia flag incorrectly,
surpressing ABTS transmission.
A previous LPFC change to build an abort wqe inverted prior logic that
determined whether an ABTS was to be issued on the FC link.
Revert this logic to its proper state.
Link: https://lore.kernel.org/r/20210528212240.11387-1-jsmart2021@gmail.com
Fixes: db7531d2b3 ("scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers")
Cc: <stable@vger.kernel.org> # v5.11+
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The nsp_cs driver stores the SAM status values in SCp.Status, so we need to
use the non-shifted version SAM_STAT_CHECK_CONDITION.
Link: https://lore.kernel.org/r/20210527072217.117126-1-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove last vestiges of SCSI status message bytes.
Link: https://lore.kernel.org/r/20210427083046.31620-39-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The message byte is now unused, so we can drop the helper to set the
message byte and the check for message bytes during error recovery.
Link: https://lore.kernel.org/r/20210427083046.31620-38-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of setting the message byte translate it to the appropriate host
byte. As error recovery would return DID_ERROR for any non-zero message
byte the translation doesn't change the error handling.
Link: https://lore.kernel.org/r/20210427083046.31620-37-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Set the SCSI host status before calling fdomain_finish_cmd() and drop the
last argument to that function.
Link: https://lore.kernel.org/r/20210427083046.31620-36-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver should be using the standard SAM_STAT_ values, and not the
Linux-specific ones.
Link: https://lore.kernel.org/r/20210427083046.31620-34-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of setting the message byte translate it to the appropriate host
byte. As error recovery would return DID_ERROR for any non-zero message
byte the translation doesn't change the error handling.
[mkp: zeroday bug report: s/SCpnt->result/SCpnt/]
Link: https://lore.kernel.org/r/20210427083046.31620-33-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The host byte in the SCSI status takes precedence during error recovery, so
there is no point in setting the message byte in addition to a host byte
which is not DID_OK.
Link: https://lore.kernel.org/r/20210427083046.31620-32-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The done() function is called with a host_byte indicating the actual error
when the message byte is set. As the host byte takes precedence during
error recovery we can drop setting the message byte if the host byte is
set, too. The only other case is when the host byte is DID_OK, but in that
case the message byte is always COMMAND_COMPLETE (i.e. 0), so we can drop
it there, too.
Link: https://lore.kernel.org/r/20210427083046.31620-31-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of passing in the combined SCSI result values, split them off into
separate status, message, and host byte values.
Link: https://lore.kernel.org/r/20210427083046.31620-30-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of setting the message byte translate it to the appropriate host
byte. As error recovery would return DID_ERROR for any non-zero message
byte the translation doesn't change the error handling. And use SCSI
result accessors while we're at it.
Link: https://lore.kernel.org/r/20210427083046.31620-29-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of setting the message byte translate it to a host byte status. As
the error recovery would map it to DID_ERROR anyway the translation doesn't
change the SCSI error handling.
Link: https://lore.kernel.org/r/20210427083046.31620-27-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of setting the message byte translate it to the appropriate host
byte. As error recovery would return DID_ERROR for any non-zero message
byte the translation doesn't change the error handling.
Link: https://lore.kernel.org/r/20210427083046.31620-26-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The message byte always devolves to COMMAND_COMPLETE, so there is no point
in setting it.
Link: https://lore.kernel.org/r/20210427083046.31620-25-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Make ql_pcmd() a void function and set the SCSI result directly.
[mkp: fix zeroday 'result' warning]
Link: https://lore.kernel.org/r/20210427083046.31620-21-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
fix
Drop message byte setting if the host byte is already set, and translate
message bytes into the related host bytes when evaluating an overrun or
underrun.
Link: https://lore.kernel.org/r/20210427083046.31620-20-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Use standard macros to set the SCSI result and drop the internal ones.
Link: https://lore.kernel.org/r/20210427083046.31620-19-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The message byte can take only two values, COMMAND_COMPLETE and ABORT. So
we can easily map ABORT to DID_ABORT and not set the message byte.
Link: https://lore.kernel.org/r/20210427083046.31620-16-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver_byte field in the result is now unused, so we can drop the
definitions.
Link: https://lore.kernel.org/r/20210427083046.31620-15-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The Xen guest might run against arbitrary backends, so the driver might
receive a status with driver_byte set. Map these errors to DID_ERROR to be
consistent with recent changes.
Link: https://lore.kernel.org/r/20210427083046.31620-14-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Set DID_TIME_OUT instead of DRIVER_TIMEOUT when a command
is finally marked as failed after error recovery.
Link: https://lore.kernel.org/r/20210427083046.31620-12-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There is no point in returning DID_ABORT together with DRIVER_INVALID, as
the caller couldn't care less where the abort originated. So drop the use
of DRIVER_INVALID.
Link: https://lore.kernel.org/r/20210427083046.31620-11-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Replace the check for DRIVER_SENSE with a check for
scsi_status_is_check_condition().
Audit all callsites to ensure the SAM status is set correctly. For
backwards compability move the DRIVER_SENSE definition to sg.h, and update
sg, bsg, and scsi_ioctl to set the DRIVER_SENSE driver_status whenever
SAM_STAT_CHECK_CONDITION is present.
[mkp: fix zeroday srp warning]
Link: https://lore.kernel.org/r/20210427083046.31620-10-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
fix
Add a helper function scsi_status_is_check_condition() to encapsulate the
frequent checks for SAM_STAT_CHECK_CONDITION.
Link: https://lore.kernel.org/r/20210427083046.31620-9-hare@suse.de
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Introduce scsi_build_sense() as a wrapper around scsi_build_sense_buffer()
to format the buffer and set the correct SCSI status.
Link: https://lore.kernel.org/r/20210427083046.31620-8-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Return the actual error code in __scsi_execute() (which, according to the
documentation, should have happened anyway). And audit all callers to cope
with negative return values from __scsi_execute() and friends.
[mkp: resolve conflict and return bool]
Link: https://lore.kernel.org/r/20210427083046.31620-7-hare@suse.de
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
scsi_execute() will now return a negative error if there was an error prior
to command submission; evaluate that instead if checking for DRIVER_ERROR.
[mkp: build fix]
Link: https://lore.kernel.org/r/20210427083046.31620-6-hare@suse.de
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>