linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2025-01-27 08:14:35 +08:00

Author	SHA1	Message	Date
James Smart	a4691038b4	scsi: lpfc: Fix unload hang after back to back PCI EEH faults When injecting EEH errors the port is getting hung up waiting on the node list to empty, message number 0233. The driver is stuck at this point and also can't unload. The driver makes transport remoteport delete calls which try to abort I/O's, but the EEH daemon has already called the driver to detach and the detachment has set the global FC_UNLOADING flag. There are several code paths that will avoid I/O cleanup if the FC_UNLOADING flag is set, resulting in transports waiting for I/O while the driver is waiting on transports to clean up. Additionally, during study of the list, a locking issue was found in lpfc_sli_abort_iocb_ring that could corrupt the list. A special case was added to the lpfc_cleanup() routine to call lpfc_sli_flush_rings() if the driver is FC_UNLOADING and if the pci-slot is offline (e.g. EEH). The SLI4 part of lpfc_sli_abort_iocb_ring() is changed to use the ring_lock. Also added code to cancel the I/Os if the pci-slot is offline and added checks and returns for the FC_UNLOADING and HBA_IOQ_FLUSH flags to prevent trying to send an I/O that we cannot handle. Link: https://lore.kernel.org/r/20220317032737.45308-3-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-03-29 23:19:38 -04:00
James Smart	35ed9613d8	scsi: lpfc: Improve PCI EEH Error and Recovery Handling Following EEH errors, the driver can crash or hang when deleting the localport or when attempting to unload. The EEH handlers in the driver did not notify the NVMe-FC transport before tearing the driver down. This was delayed until the resume steps. This worked for SCSI because lpfc_block_scsi() would notify the scsi_fc_transport that the target was not available but it would not clean up all the references to the ndlp. The SLI3 prep for dev reset handler did the lpfc_offline_prep() and lpfc_offline() calls to get the port stopped before restarting. The SLI4 version of the prep for dev reset just destroyed the queues and did not stop NVMe from continuing. Also because the port was not really stopped the localport destroy would hang because the transport was still waiting for I/O. Additionally, a devloss tmo can fire and post events to a stopped worker thread creating another hang condition. lpfc_sli4_prep_dev_for_reset() is modified to call lpfc_offline_prep() and lpfc_offline() rather than just lpfc_scsi_dev_block() to ensure both SCSI and NVMe transports are notified to block I/O to the driver. Logic is added to devloss handler and worker thread to clean up ndlp references and quiesce appropriately. Link: https://lore.kernel.org/r/20220317032737.45308-2-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-03-29 23:19:37 -04:00
James Smart	f45775bf56	scsi: lpfc: Copyright updates for 14.2.0.0 patches Update copyrights to 2022 for files modified in the 14.2.0.0 patch set. Link: https://lore.kernel.org/r/20220225022308.16486-18-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-03-15 13:51:50 -04:00
James Smart	2d1928c57d	scsi: lpfc: SLI path split: Refactor misc ELS paths This patch refactors the remaining ELS paths to use SLI-4 as the primary interface. Paths include RRQ, RSCN, unsolicited ELS RQST and RSP paths, ELS timeouts, etc.: - Remove unused routines lpfc_sli4_bpl2sgl and lpfc_sli4_iocb2wqe - Conversion away from using SLI-3 iocb structures to set/access fields in common routines. Use the new generic get/set routines that were added. This move changes code from indirect structure references to using local variables with the generic routines. - Refactor routines when setting non-generic fields, to have both SLI3 and SLI4 specific sections. This replaces the set-as-SLI3 then translate to SLI4 behavior of the past. Link: https://lore.kernel.org/r/20220225022308.16486-12-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2022-03-15 13:51:49 -04:00
James Smart	7dd2e2a923	scsi: lpfc: Trigger SLI4 firmware dump before doing driver cleanup Extraneous teardown routines are present in the firmware dump path causing altered states in firmware captures. When a firmware dump is requested via sysfs, trigger the dump immediately without tearing down structures and changing adapter state. The driver shall rely on pre-existing firmware error state clean up handlers to restore the adapter. Link: https://lore.kernel.org/r/20211204002644.116455-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:35:36 -05:00
James Smart	8ed190a919	scsi: lpfc: Fix NPIV port deletion crash The driver is calling schedule_timeout after the DA_ID nameserver request and LOGO commands are issued to the fabric by the initiator virtual endport. These fixed delay functions are causing long delays in the driver's worker thread when processing discovery I/Os in a serialized fashion, which is then triggering mailbox timeout errors artificially. To fix this, don't wait on the DA_ID request to complete and call wait_event_timeout to allow the vport delete thread to make progress on an event driven basis rather than fixing the wait time. Link: https://lore.kernel.org/r/20211204002644.116455-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-12-06 22:35:36 -05:00
James Smart	af984c8729	scsi: lpfc: Allow fabric node recovery if recovery is in progress before devloss A link bounce to a slow fabric may observe FDISC response delays lasting longer than devloss tmo. Current logic decrements the final fabric node kref during a devloss tmo event. This results in a NULL ptr dereference crash if the FDISC completes for that fabric node after devloss tmo. Fix by adding the NLP_IN_RECOV_POST_DEV_LOSS flag, which is set when devloss tmo triggers and we've noticed that fabric node recovery has already started or finished in between the time lpfc_dev_loss_tmo_callbk queues lpfc_dev_loss_tmo_handler. If fabric node recovery succeeds, then the driver reverses the devloss tmo marked kref put with a kref get. If fabric node recovery fails, then the final kref put relies on the ELS timing out or the REG_LOGIN cmpl routine. Link: https://lore.kernel.org/r/20211020211417.88754-8-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-10-20 23:33:46 -04:00
James Smart	79b20becce	scsi: lpfc: Fix use-after-free in lpfc_unreg_rpi() routine An error is detected with the following report when unloading the driver: "KASAN: use-after-free in lpfc_unreg_rpi+0x1b1b" The NLP_REG_LOGIN_SEND nlp_flag is set in lpfc_reg_fab_ctrl_node(), but the flag is not cleared upon completion of the login. This allows a second call to lpfc_unreg_rpi() to proceed with nlp_rpi set to LPFC_RPI_ALLOW_ERROR. This results in a use after free access when used as an rpi_ids array index. Fix by clearing the NLP_REG_LOGIN_SEND nlp_flag in lpfc_mbx_cmpl_fc_reg_login(). Link: https://lore.kernel.org/r/20211020211417.88754-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-10-20 23:33:45 -04:00
James Smart	25ac2c970b	scsi: lpfc: Fix EEH support for NVMe I/O Injecting errors on the PCI slot while the driver is handling NVMe I/O will cause crashes and hangs. There are several rather difficult scenarios occurring. The main issue is that the adapter can report a PCI error before or simultaneously to the PCI subsystem reporting the error. Both paths have different entry points and currently there is no interlock between them. Thus multiple teardown paths are competing and all heck breaks loose. Complicating things is the NVMs path. To a large degree, I/O was able to be shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a similar call. At best, it works on a per-controller basis, but even at the controller level, it's a controller "reset" call. All of which means I/O is still flowing on different CPUs with reset paths expecting hw access (mailbox commands) to execute properly. The following modifications are made: - A new flag is set in PCI error entrypoints so the driver can track being called by that path. - An interlock is added in the SLI hw error path and the PCI error path such that only one of the paths proceeds with the teardown logic. - RPI cleanup is patched such that RPIs are marked unregistered w/o mbx cmds in cases of hw error. - If entering the SLI port re-init calls, a case where SLI error teardown was quick and beat the PCI calls now reporting error, check whether the SLI port is still live on the PCI bus. - In the PCI reset code to bring the adapter back, recheck the IRQ settings. Different checks for SLI3 vs SLI4. - In I/O completions, that may be called as part of the cleanup or underway just before the hw error, check the state of the adapter. If in error, shortcut handling that would expect further adapter completions as the hw error won't be sending them. - In routines waiting on I/O completions, which may have been in progress prior to the hw error, detect the device is being torn down and abort from their waits and just give up. This points to a larger issue in the driver on ref-counting for data structures, as it doesn't have ref-counting on q and port structures. We'll do this fix for now as it would be a major rework to be done differently. - Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being failed back due to hw error. - In I/O buf allocation, done at the start of new I/Os, check hw state and fail if hw error. Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-09-14 23:33:21 -04:00
James Smart	3a874488d2	scsi: lpfc: Fix rediscovery of tape device after LIP On link up and node discovery, a remote port is registered with the SCSI transport and the driver sets fc4_xpt_flags to track transport registration. A link down event causes the driver to deregister with the SCSI transport, starting the devloss timer, and calls a local unreg routine to clear the login state. Part of the login state is the fc4_xpt_flags. However, with tape devices that support sequence level error recovery, which wants to preserve the login, the local unreg routine is skipped, thus the flags aren't cleared. A subsequent link up, ADISC is performed and the lpfc_nlp_reg_node() routine is called. As the fc4_xpt_flags is not clear, it's believed the node is already registered with the transport. Unfortunately, the registration was already terminated. Eventually the devloss tmo timer expires and tears down the device. Fix by ensuring the tape device, known by the ADISC flag, is always unregistered if the link drops. Link: https://lore.kernel.org/r/20210910233159.115896-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-09-14 23:33:21 -04:00
James Smart	88f7702984	scsi: lpfc: Fix hang on unload due to stuck fport node A test scenario encountered an unload hang while an FLOGI ELS was in flight when a link down condition occurred. The driver fails unload as it never releases the fport node. For most nodes, when the link drops, devloss tmo is started and the timeout will cause the final node release. For the Fport, as it has not yet registered with the SCSI transport, there is no devloss timer to be started, so there is no final release. Additionally, the link down sequence causes ABORTS to be issued for pending ELS's. The completions from the ABORTS perform the release of node references. However, as the adapter is being reset to be unloaded, those completions will never occur. Fix by the following: - In the ELS cleanup, recognize when unloading and place the ELS's on a different list that immediately cleans up/completes the ELS's. It's recognized that this condition primarily affects only the fport, with other ports having normal clean up logic that handles things. - Resolve the devloss issue by, when cleaning up nodes on after link down, recognizing when the fabric node does not have a completed state (its state is UNUSED) and removing a reference so the node can delete after the ELS reference is released. Link: https://lore.kernel.org/r/20210910233159.115896-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-09-14 23:33:21 -04:00
James Smart	982fc3965d	scsi: lpfc: Don't release final kref on Fport node while ABTS outstanding In a rarely executed path, FLOGI failure, there is a refcounting error. If FLOGI completed with an error, typically a timeout, the initial completion handler would remove the job reference. However, the job completion isn't the actual end of the job/exchange as the timeout usually initiates an ABTS, and upon that ABTS completion, a final completion is sent. The driver removes the reference again in the final completion. Thus the imbalance. In the buggy cases, if there was a link bounce while the delayed response is outstanding, the fport node may be referenced again but there was no additional reference as it is already present. The delayed completion then occurs and removes the last reference freeing the node and causing issues in the link up processed that is using the node. Fix this scenario by removing the snippet that removed the reference in the initial FLOGI completion. The bad snippet was poorly trying to identify the FLOGI as OK to do so by realizing the node was not registered with either SCSI or NVMe transport. Link: https://lore.kernel.org/r/20210910233159.115896-3-jsmart2021@gmail.com Fixes: `618e2ee146` ("scsi: lpfc: Fix FLOGI failure due to accessing a freed node") Cc: <stable@vger.kernel.org> # v5.13+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-09-14 23:33:20 -04:00
James Smart	02243836ad	scsi: lpfc: Add support for the CM framework Complete the enablement of the cm framework feature in the adapter. Perform the following: - Detect the presence of the congestion management framework feature. When the cm framework is present: - Issue the SET_FEATURE command to enable the feature. - Register the cm statistics buffer with the adapter. - Read the cm enablement buffer to determine the cm framework state for cm management. When cm management is enabled: - Monitor all FPIN and congestion signalling events, incrementing counters. - Regularly sync with the adapter to communicate congestion events and to receive an rx request limit. - Monitor requests for rx data and ensure that no more than the adapter prescribed limit is issued on the link. If the limit is exceeded, SCSI and/or NVMe traffic is temporarily suspended. - Maintain the minute, hourly, daily statistics buffer. - Monitor for congestion enablement change events, causing a reread of the enablement buffer and acting on any change in enablement. And: - Add teardown logic, including buffer deregistration, on adapter detachment or reset. Link: https://lore.kernel.org/r/20210816162901.121235-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-08-24 22:56:34 -04:00
James Smart	9064aeb2df	scsi: lpfc: Add EDC ELS support When congestion management is enabled, issue EDC ELS to register congestion signaling capabilities with the fabric. The response handling will process the fabric parameters and set the reporting parameters. Similarly, add support for receiving an EDC request from the fabric generating a corresponding response. Implement handlers for congestion signals from the fabric and maintain statistics for them. Link: https://lore.kernel.org/r/20210816162901.121235-6-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-08-24 22:56:33 -04:00
James Smart	bfc477854a	scsi: lpfc: Add 256 Gb link speed support Update routines to support 256 Gb link speed for LPe37000/LPe38000 adapters. 256 Gb speeds can be seen on trunk links. Link: https://lore.kernel.org/r/20210722221721.74388-5-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-07-27 00:06:41 -04:00
James Smart	0614568361	scsi: lpfc: Delay unregistering from transport until GIDFT or ADISC completes On an RSCN event, the nodes specified in RSCN payload and in MAPPED state are moved to NPR state in order to revalidate the login. This triggers an immediate unregister from SCSI/NVMe backend. The assumption is that the node may be missing. The re-registration with the backend happens after either relogin (PLOGI/PRLI; if ADISC is disabled or login truly lost) or when ADISC completes successfully (rediscover with ADISC enabled). However, the NVMe-FC standard provides for an RSCN to be triggered when the remote port supports a discovery controller and there was a change of discovery log content. As the remote port typically also supports storage subsystems, this unregister causes all storage controller connections to fail and require reconnect. Correct by reworking the code to ensure that the unregistration only occurs when a login state is truly terminated, thereby leaving the NVMe storage controllers in place. The changes made are: - Retain node state in ADISC_ISSUE when scheduling ADISC ELS retry. - Do not clear wwpn/wwnn values upon ADISC failure. - Move MAPPED nodes to NPR during RSCN processing, but do not unregister with transport. On GIDFT completion, identify missing nodes (not marked NLP_NPR_2B_DISC) and unregister them. - Perform unregistration for nodes that will go through ADISC processing if ADISC completion fails. - Successful ADISC completion will move node back to MAPPED state. Link: https://lore.kernel.org/r/20210707184351.67872-16-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-07-18 22:30:37 -04:00
Gaurav Srivastava	0c4792c64f	scsi: lpfc: vmid: Add QFPA and VMID timeout check in worker thread Add a periodic check for issuing of QFPA command and VMID timeout in the worker thread. The inactivity timeout check is added via the timer function. Link: https://lore.kernel.org/r/20210608043556.274139-13-muneendra.kumar@broadcom.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-10 10:01:33 -04:00
Gaurav Srivastava	20397179aa	scsi: lpfc: vmid: Timeout implementation for VMID Implement timeout functionality for the VMID. After the set time period of inactivity, the VMID is deregistered from the switch. Link: https://lore.kernel.org/r/20210608043556.274139-12-muneendra.kumar@broadcom.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-06-10 10:01:33 -04:00
James Smart	fe83e3b9b4	scsi: lpfc: Fix node handling for Fabric Controller and Domain Controller During link bounce testing, RPI counts were seen to differ from the number of nodes. For fabric and domain controllers, a temporary RPI is assigned, but the code isn't registering it. If the nodes do go away, such as on link down, the temporary RPI isn't being released. Change the way these two fabric services are managed, make them behave like any other remote port. Register the RPI and register with the transport. Never leave the nodes in a NPR or UNUSED state where their RPI is in limbo. This allows them to follow normal dev_loss_tmo handling, RPI refcounting, and normal removal rules. It also allows fabric I/Os to use the RPI for traffic requests. Note: There is some logic that still has a couple of exceptions when the Domain controller (0xfffcXX). There are cases where the fabric won't have a valid login but will send RDP. Other times, it will it send a LOGO then an RDP. It makes for ad-hoc behavior to manage the node. Exceptions are documented in the code. Link: https://lore.kernel.org/r/20210514195559.119853-7-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-05-21 23:23:28 -04:00
James Smart	01131e7aae	scsi: lpfc: Fix unreleased RPIs when NPIV ports are created While testing NPIV and watching logins and used RPI levels, it was seen the used RPI count was much higher than the number of remote ports discovered. Code inspection showed that remote port removals on any NPIV instance are releasing the RPI, but not performing an UNREG_RPI with the adapter thus the reference counting never fully drops and the RPI is never fully released. This was happening on NPIV nodes due to a log of fabric ELS's to fabric addresses. This lack of UNREG_RPI was introduced by a prior node rework patch that performed the UNREG_RPI as part of node cleanup. To resolve the issue, do the following: - Restore the RPI release code, but move the location to so that it is in line with the new node cleanup design. - NPIV ports now release the RPI and drop the node when the caller sets the NLP_RELEASE_RPI flag. - Set the NLP_RELEASE_RPI flag in node cleanup which will trigger a release of RPI to free pool. - Ensure there's an UNREG_RPI at LOGO completion so that RPI release is completed. - Stop offline_prep from skipping nodes that are UNUSED. The RPI may not have been released. - Stop the default RPI handling in lpfc_cmpl_els_rsp() for SLI4. - Fixed up debugfs RPI displays for better debugging. Fixes: `a70e63eee1` ("scsi: lpfc: Fix NPIV Fabric Node reference counting") Link: https://lore.kernel.org/r/20210514195559.119853-2-jsmart2021@gmail.com Cc: <stable@vger.kernel.org> # v5.11+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-05-21 23:23:27 -04:00
James Smart	f115612528	scsi: lpfc: Standardize discovery object logging format Code inspection showed lpfc was using three different pointer formats when logging discovery object pointers. Standardize the pointer format to x%px. Note: %px use is limited to discovery objects in order to aid core analysis. Link: https://lore.kernel.org/r/20210412013127.2387-14-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-04-13 01:39:14 -04:00
James Smart	a314dec37c	scsi: lpfc: Fix missing FDMI registrations after Mgmt Svc login FDMI registration needs to be performed after every login with the FC Mgmt service. The flag the driver is using to track registration is cleared on link up, but never on Mgmt service logout/re-login. Fix by clearing the flag whenever a new login is completed with the FC Mgmt service. While perusing the flag use, logging was performed as if FDMI registration occurred on vports. However, it is limited to the physical port only. Revise the logging to reflect physical port based. Link: https://lore.kernel.org/r/20210412013127.2387-10-jsmart2021@gmail.com Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-04-13 01:39:14 -04:00
James Smart	078c68b87a	scsi: lpfc: Fix rmmod crash due to bad ring pointers to abort_iotag Rmmod on SLI-4 adapters is sometimes hitting a bad ptr dereference in lpfc_els_free_iocb(). A prior patch refactored the lpfc_sli_abort_iocb() routine. One of the changes was to convert from building/sending an abort within the routine to using a common routine. The reworked routine passes, without modification, the pring ptr to the new common routine. The older routine had logic to check SLI-3 vs SLI-4 and adapt the pring ptr if necessary as callers were passing SLI-3 pointers even when not on an SLI-4 adapter. The new routine is missing this check and adapt, so the SLI-3 ring pointers are being used in SLI-4 paths. Fix by cleaning up the calling routines. In review, there is no need to pass the ring ptr argument to abort_iocb at all. The routine can look at the adapter type itself and reference the proper ring. Link: https://lore.kernel.org/r/20210412013127.2387-2-jsmart2021@gmail.com Fixes: `db7531d2b3` ("scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers") Cc: <stable@vger.kernel.org> # v5.11+ Co-developed-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: Justin Tee <justin.tee@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-04-13 01:39:13 -04:00
Lee Jones	3884ce1539	scsi: lpfc: Fix incorrect naming of __lpfc_update_fcf_record() Fixes the following W=1 kernel build warning(s): drivers/scsi/lpfc/lpfc_hbadisc.c:1505: warning: expecting prototype for lpfc_update_fcf_record(). Prototype was for __lpfc_update_fcf_record() instead Link: https://lore.kernel.org/r/20210303144631.3175331-28-lee.jones@linaro.org Cc: James Smart <james.smart@broadcom.com> Cc: Dick Kennedy <dick.kennedy@broadcom.com> Cc: "James E.J. Bottomley" <jejb@linux.ibm.com> Cc: "Martin K. Petersen" <martin.petersen@oracle.com> Cc: linux-scsi@vger.kernel.org Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-03-15 22:14:53 -04:00
James Smart	67073c69c8	scsi: lpfc: Update copyrights for 12.8.0.7 and 12.8.0.8 changes For the files modified in 2021 via the 12.8.0.7 and 12.8.0.8 patch sets, update the copyright for 2021. Link: https://lore.kernel.org/r/20210301171821.3427-23-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-03-04 17:37:06 -05:00
James Smart	58c36e80ee	scsi: lpfc: Fix vport indices in lpfc_find_vport_by_vpid() Calls to lpfc_find_vport_by_vpid() for the highest indexed vport fails with error, "2936 Could not find Vport mapped to vpi XXX". Our vport indices in the loop and if-clauses were off by one. Correct the vpid range used for vpi lookup to include the highest possible vpid. Link: https://lore.kernel.org/r/20210301171821.3427-3-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-03-04 17:37:03 -05:00
Dan Carpenter	0be310979e	scsi: lpfc: Fix ancient double free The "pmb" pointer is freed at the start of the function and then freed again in the error handling code. Link: https://lore.kernel.org/r/YA6E8rO51hE56SVw@mwanda Fixes: `92d7f7b0cd` ("[SCSI] lpfc: NPIV: add NPIV support on top of SLI-3") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-01-26 22:08:57 -05:00
James Smart	a22d73b655	scsi: lpfc: Implement health checking when aborting I/O Several errors have occurred where the adapter stops or fails but does not raise the register values for the driver to detect failure. Thus driver is unaware of the failure. The failure typically results in I/O timeouts, the I/O timeout handler failing (after several seconds), and the error handler escalating recovery policy and resulting in more errors. Eventually, the driver is in a position where things have spiraled and it can't do recovery because other recovery ops are still outstanding and it becomes unusable. Resolve the situation by having the I/O timeout handler (actually a els, SCSI I/O, NVMe ls, or NVMe I/O timeout), in addition to aborting the I/O, perform a mailbox command and look for a response from the hardware. If the mailbox command fails, it will mark the adapter offline and then invoke the adapter reset handler to clean up. The new I/O timeout test will be limited to a test every 5s. If there are multiple I/O timeouts concurrently, only the 1st I/O timeout will generate the mailbox command. Further testing will only occur once a timeout occurs after a 5s delay from the last mailbox command has expired. Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-01-07 23:02:37 -05:00
James Smart	07aaefdf75	scsi: lpfc: Fix crash when a fabric node is released prematurely The driver's management of the fabric controller (aka pseudo-scsi initiator) node in SLI3 mode is causing this crash. The crash occurs because of a node reference imbalance that frees the fabric controller node while devloss is outstanding from the SCSI transport. This is triggered by an odd behavior where the switch reacts to a rejected RDP request with a PLOGI and nothing else, not even a LOGO. The driver ACKS the PLOGI and after successfully registering the RPI, incorrectly registers the fabric controller node because it has the NLP_FC4_FCP flag still set from the fabric controller PRLI. If a LIP is issued, the driver attempts to cleanup on Link Up and ends up executing too many puts. Fix by detecting the fabric node type and clearing out the nodes internal flags that triggered a SCSI transport registration and subsequence dev_loss event. The driver cannot count on any persistence from fabric controller nodes. Link: https://lore.kernel.org/r/20210104180240.46824-5-jsmart2021@gmail.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2021-01-07 23:02:35 -05:00
James Smart	09b15e3507	scsi: lpfc: Fix set but unused variables in lpfc_dev_loss_tmo_handler() Remove set but not used variable shost in lpfc_dev_loss_tmo_handler(). Link: https://lore.kernel.org/r/20201119203353.121866-1-james.smart@broadcom.com Fixes: `52edb2caf6` ("scsi: lpfc: Remove ndlp when a PLOGI/ADISC/PRLI/REG_RPI ultimately fails") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-11-19 22:21:04 -05:00
James Smart	4a119d8a4c	scsi: lpfc: Fix set but not used warnings from Rework remote port lock handling Remove local variables that are set but not used. Link: https://lore.kernel.org/r/20201119203340.121819-1-james.smart@broadcom.com Fixes: `c6adba1501` ("scsi: lpfc: Rework remote port lock handling") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-11-19 22:20:26 -05:00
James Smart	db7531d2b3	scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers This patch reworks the abort interfaces such that SLI-3 retains the iocb-based formatting and completions and SLI-4 now uses native WQEs and completion routines. The following changes are made: - The code is refactored from a confusing 2 routine sequence of xx_abort_iotag_issue(), which creates/formats and abort cmd, and xx_issue_abort_tag(), which then issues and handles the completion of the abort cmd - into a single interface of xx_issue_abort_iotag(). The new interface will determine whether SLI-3 or SLI-4 and then call the appropriate handler. A completion handler can now be specified to address the differences in completion handling. Note: original code is all iocb based, with SLI-4 converting to SLI-3 for the SCSI/ELS path, and NVMe natively using wqes. - The SLI-3 side is refactored: The older iocb-base lpfc_sli_issue_abort_iotag() routine is combined with the logic of lpfc_sli_abort_iotag_issue() as well as the iocb-specific code in lpfc_abort_handler() and lpfc_sli_abort_iocb() to create the new single SLI-3 abort routine that formats and issues the iocb. - The SLI-4 side is refactored and added to: The native WQE abort code in NVMe is moved to the new SLI-4 issue_abort_iotag() routine. Items in SCSI that set fields not set by NVMe is migrated into the new routine. Thus the routine supports NVMe and SCSI initiators. The nvmet block (target) formats the abort slightly different (like the old NVMe initiator) thus it has its own prep routine stolen from NVMe initiator and it retains the current code it has for issuing the WQE (does not use the commonized routine the initiators do). SLI-4 completion handlers were also added. - lpfc_abort_handler now becomes a wrapper that determines whether SLI-3 or SLI-4 and calls the proper abort handler. Link: https://lore.kernel.org/r/20201115192646.12977-16-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-11-17 00:43:56 -05:00
James Smart	a70e63eee1	scsi: lpfc: Fix NPIV Fabric Node reference counting While testing initiator-side cable swaps with NPIV, oops occur. The reference counts for the Fabric nodes on the NPIV vports isn't balanced, resulting in premature node removal. The following fixes were made: - Removed the FC_LBIT check in lpfc_linkup_port. This removed the special case for vports that didn't have them clean up just like the physical port. - Removed the unreg_rpi call in lpfc_cleanup_node. In this section, the node is being removed in the context of a reference count release and a mailbox command can't be issued at this point. - Remove special case handling in the default mailbox completion handler that allowed the skipping of a node reference. Now, reference counting always requires the removal of the reference. - Move the location of the DEVICE_RM event is done during LOGO handling as the driver has additional work to do on the ndlp before puts/releases can be performed. Link: https://lore.kernel.org/r/20201115192646.12977-10-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-11-17 00:43:55 -05:00
James Smart	b3f2e67cc2	scsi: lpfc: Fix NPIV discovery and Fabric Node detection While testing NPIV and link bounces, the vport would not show a fabric node for the F_Port, would not transition into NPR state during a link fault, or leave the FDMI node untouched during error injection. Cause for this was determined to be an inconsistent manner in which F_Port, Nameserver, and FDMI controller nodes were created and linked. In some cases, the nodes would never be unregistered from the transport, leaving references active. In other cases, the fabric nodes may register with the transport multiple times while still registered. The following changes were made: - Fix the FDISC issue routine, which starts vport (re)creation, to mark the F_Port as a fabric node (NLP_FABRIC) and allow the F_Port node to fully be created and show up in the node list. - When remote ports are cleaned up on vport termination, cleanup the nameserver and FDMI controller nodes on the vport so they unregister from the transport. - On link bounces, don't exclude the NPIV Fabric remote ports from transitioning to the NPR state, allowing them to avoid re-registration if already registered. Link: https://lore.kernel.org/r/20201115192646.12977-9-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-11-17 00:43:55 -05:00
James Smart	52edb2caf6	scsi: lpfc: Remove ndlp when a PLOGI/ADISC/PRLI/REG_RPI ultimately fails When a PLOGI/ADISC/PRLI/REG_RPI fails, the node remains in the nodelist in that state. Although the driver now frees a node when the ref count goes to zero, in this case the ref cnt doesn't reach zero because there isn't a mechanism to release the final reference. Discovery just stops. Fix by calling the node discovery state machine DEVICE_RM event whenever one of these commands fail. This will remove the final reference count and trigger node release. Link: https://lore.kernel.org/r/20201115192646.12977-7-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-11-17 00:43:55 -05:00
James Smart	c6adba1501	scsi: lpfc: Rework remote port lock handling Currently the discovery layers within the driver use the SCSI midlayer host_lock to access node-specific structures. This can contend with the I/O path and is too coarse of a lock. Rework the driver so that it uses a lock specific to the remote port node structure when accessing the structure contents. A few of the changes brought out spots were some slightly reorganized routines worked better. Link: https://lore.kernel.org/r/20201115192646.12977-6-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-11-17 00:43:54 -05:00
James Smart	95f0ef8a83	scsi: lpfc: Fix removal of SCSI transport device get and put on dev structure The lpfc driver is calling get_device and put_device on scsi_fc_transport device structure. When this code was removed, the driver triggered an oops in "scsi_is_host_dev" when the first SCSI target was unregistered from the transport. The reason the calls were necessary is that the driver is calling scsi_remove_host too early, before the target rports are unregistered and the scsi devices disconnected from the scsi_host. The fc_host was torn down during fc_remove_host. Fix by moving the lpfc_pci_remove_one_s3/s4 calls to scsi_remove_host to after the nodes are cleaned up. Remove the get_device and put_device calls and the supporting code. Link: https://lore.kernel.org/r/20201115192646.12977-4-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-11-17 00:43:54 -05:00
James Smart	4430f7fd09	scsi: lpfc: Rework locations of ndlp reference taking Now that the driver has gone to a normal ref interface (with no odd logic) the discovery logic needs to be updated to reworked so that it properly takes references when it should and give them up when it should. Rework the driver for the following get/put model: - Move gets to just before an I/O is issued. Add gets for places where an I/O was issued without one. - Ensure that failures from lpfc_nlp_get() are handled by the driver. - Check and fix the placement of lpfc_nlp_puts relative to io completions. Note: some of these paths may not release the reference on the exact io completion as the reference is held as the code takes another step in the discovery thread and which may cause another io to be issued. - Rearrange some code for error processing and calling lpfc_nlp_put. - Fix some places of incorrect reference freeing that was causing the premature releasing of the structure. - Nvmet plogi handling performs unreg_rpi's. The reference counts were unbalanced resulting in premature node removal. In some cases this caused loss of node discovery. Corrected the reftaking around nvmet plogis. Nodes that experience devloss now get released from the node list now that there is a proper reference taking. Link: https://lore.kernel.org/r/20201115192646.12977-3-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-11-17 00:43:54 -05:00
James Smart	307e338097	scsi: lpfc: Rework remote port ref counting and node freeing When a remote port is disconnected and disappears, its node structure (ndlp) stays allocated and on a vport node list. While on the list it can be matched, thus requires validation checks on state to be added in numerous code paths. If the node comes back, its possible for there to be multiple node structures for the same device on the vport node list. There is no reason to keep the node structure around after it is no longer in existence, and the current implementation creates problems for itself (multiple nodes) and lots of unnecessary code for state validation. Additionally, the reference taking on the node structure didn't follow the normal model used by the kernel kref api. It included lots of odd logic to match state with reference count. The combination of this odd logic plus the way it was implicitly used in the discovery engine made its reference taking implementation suspect and extremely hard to follow. Change the driver such that the reference taking routines are now normal ref increments/decrements and callout on refcount=0. With this in place, the rework can be done such that the node structure is fully removed and deallocated when the remote port no longer exists and all references are removed. This removal logic, and the basic ref counting are intrically tied, thus in a single patch. Link: https://lore.kernel.org/r/20201115192646.12977-2-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-11-17 00:43:54 -05:00
James Smart	8aaa7bcf07	scsi: lpfc: Add FDMI Vendor MIB support Created new attribute lpfc_enable_mi, which by default is enabled. Add command definition bits for SLI-4 parameters that recognize whether the adapter has MIB information support and what revision of MIB data. Using the adapter information, register vendor-specific MIB support with FDMI. The registration will be done every link up. During FDMI registration, encountered a couple of errors when reverting to FDMI rev1. Code needed to exist once reverting. Fixed these. Link: https://lore.kernel.org/r/20201020202719.54726-8-james.smart@broadcom.com Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-10-26 21:42:39 -04:00
James Smart	e7dab164a9	scsi: lpfc: Fix scheduling call while in softirq context in lpfc_unreg_rpi The following call trace was seen during HBA reset testing: BUG: scheduling while atomic: swapper/2/0/0x10000100 ... Call Trace: dump_stack+0x19/0x1b __schedule_bug+0x64/0x72 __schedule+0x782/0x840 __cond_resched+0x26/0x30 _cond_resched+0x3a/0x50 mempool_alloc+0xa0/0x170 lpfc_unreg_rpi+0x151/0x630 [lpfc] lpfc_sli_abts_recover_port+0x171/0x190 [lpfc] lpfc_sli4_abts_err_handler+0xb2/0x1f0 [lpfc] lpfc_sli4_io_xri_aborted+0x256/0x300 [lpfc] lpfc_sli4_sp_handle_abort_xri_wcqe.isra.51+0xa3/0x190 [lpfc] lpfc_sli4_fp_handle_cqe+0x89/0x4d0 [lpfc] __lpfc_sli4_process_cq+0xdb/0x2e0 [lpfc] __lpfc_sli4_hba_process_cq+0x41/0x100 [lpfc] lpfc_cq_poll_hdler+0x1a/0x30 [lpfc] irq_poll_softirq+0xc7/0x100 __do_softirq+0xf5/0x280 call_softirq+0x1c/0x30 do_softirq+0x65/0xa0 irq_exit+0x105/0x110 do_IRQ+0x56/0xf0 common_interrupt+0x16a/0x16a With the conversion to blk_io_poll for better interrupt latency in normal cases, it introduced this code path, executed when I/O aborts or logouts are seen, which attempts to allocate memory for a mailbox command to be issued. The allocation is GFP_KERNEL, thus it could attempt to sleep. Fix by creating a work element that performs the event handling for the remote port. This will have the mailbox commands and other items performed in the work element, not the irq. A much better method as the "irq" routine does not stall while performing all this deep handling code. Ensure that allocation failures are handled and send LOGO on failure. Additionally, enlarge the mailbox memory pool to reduce the possibility of additional allocation in this path. Link: https://lore.kernel.org/r/20201020202719.54726-3-james.smart@broadcom.com Fixes: `317aeb83c9` ("scsi: lpfc: Add blk_io_poll support for latency improvment") Cc: <stable@vger.kernel.org> # v5.9+ Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-10-26 21:42:38 -04:00
Linus Torvalds	a1bffa4874	SCSI fixes on 20200926 Three fixes: one in drivers (lpfc) and two for zoned block devices. The latter also impinges on the block layer but only to introduce a new block API for setting the zone model rather than fiddling with the queue directly in the zoned block driver. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCX29mRyYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishabnAP48vMYD /cjyGAJfq/0k/U/t6pRPc5tUm89LOWcOJz0SjwD/YXcQNz7mx8MxnypAV1jbWXR7 iyWkPMYVc4EJh7oTARE= =SQhI -----END PGP SIGNATURE----- Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI fixes from James Bottomley: "Three fixes: one in drivers (lpfc) and two for zoned block devices. The latter also impinges on the block layer but only to introduce a new block API for setting the zone model rather than fiddling with the queue directly in the zoned block driver" * tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: scsi: sd: sd_zbc: Fix ZBC disk initialization scsi: sd: sd_zbc: Fix handling of host-aware ZBC disks scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supported	2020-09-26 11:18:37 -07:00
James Smart	7f04839ec4	scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supported Initial FLOGIs are failing with the following message: lpfc 0000:13:00.1: 1:(0):0820 FLOGI Failed (x300). BBCredit Not Supported In a prior patch, the READ_SPARAM command was re-ordered to post after CONFIG_LINK as the driver is expected to update the driver's copy of the service parameters for the FLOGI payload. If the bb-credit recovery feature is enabled, this is fine. But on adapters were bb-credit recovery isn't enabled, it would cause the FLOGI to fail. Fix by restoring the original command order (READ_SPARAM before CONFIG_LINK), and after issuing CONFIG_LINK, detect bb-credit recovery support and reissuing READ_SPARAM to obtain the updated service parameters (effectively adding in the fix command order). [mkp: corrected SHA] Link: https://lore.kernel.org/r/20200911200147.110826-1-james.smart@broadcom.com Fixes: `835214f5d5` ("scsi: lpfc: Fix broken Credit Recovery after driver load") CC: <stable@vger.kernel.org> # v5.7+ Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <james.smart@broadcom.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-09-15 18:45:42 -04:00
Gustavo A. R. Silva	df561f6688	treewide: Use fallthrough pseudo-keyword Replace the existing /* fall through */ comments and its variants with the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary fall-through markings when it is the case. [1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2020-08-23 17:36:59 -05:00
Lee Jones	e415f2a2ac	scsi: lpfc: Fix kerneldoc parameter formatting/misnaming/missing issues Fixes the following W=1 kernel build warning(s): drivers/scsi/lpfc/lpfc_hbadisc.c:1209: warning: Function parameter or member 'phba' not described in 'lpfc_sli4_clear_fcf_rr_bmask' drivers/scsi/lpfc/lpfc_hbadisc.c:1309: warning: Function parameter or member 'sw_name' not described in 'lpfc_sw_name_match' drivers/scsi/lpfc/lpfc_hbadisc.c:1309: warning: Excess function parameter 'fab_name' description in 'lpfc_sw_name_match' drivers/scsi/lpfc/lpfc_hbadisc.c:1397: warning: Function parameter or member 'fcf_rec' not described in 'lpfc_copy_fcf_record' drivers/scsi/lpfc/lpfc_hbadisc.c:1397: warning: Excess function parameter 'fcf' description in 'lpfc_copy_fcf_record' drivers/scsi/lpfc/lpfc_hbadisc.c:1956: warning: Cannot understand lpfc_sli4_fcf_record_match - testing new FCF record for matching existing FCF on line 1956 - I thought it was a doc line drivers/scsi/lpfc/lpfc_hbadisc.c:2078: warning: Function parameter or member 'fcf_index' not described in 'lpfc_sli4_fcf_pri_list_del' drivers/scsi/lpfc/lpfc_hbadisc.c:2109: warning: Function parameter or member 'fcf_index' not described in 'lpfc_sli4_set_fcf_flogi_fail' drivers/scsi/lpfc/lpfc_hbadisc.c:2135: warning: Function parameter or member 'fcf_index' not described in 'lpfc_sli4_fcf_pri_list_add' drivers/scsi/lpfc/lpfc_hbadisc.c:2135: warning: Function parameter or member 'new_fcf_record' not described in 'lpfc_sli4_fcf_pri_list_add' Link: https://lore.kernel.org/r/20200723122446.1329773-3-lee.jones@linaro.org Cc: James Smart <james.smart@broadcom.com> Cc: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: Lee Jones <lee.jones@linaro.org> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-24 22:31:54 -04:00
Dick Kennedy	372c187b8a	scsi: lpfc: Add an internal trace log buffer The current logging methods typically end up requesting a reproduction with a different logging level set to figure out what happened. This was mainly by design to not clutter the kernel log messages with things that were typically not interesting and the messages themselves could cause other issues. When looking to make a better system, it was seen that in many cases when more data was wanted was when another message, usually at KERN_ERR level, was logged. And in most cases, what the additional logging that was then enabled was typically. Most of these areas fell into the discovery machine. Based on this summary, the following design has been put in place: The driver will maintain an internal log (256 elements of 256 bytes). The "additional logging" messages that are usually enabled in a reproduction will be changed to now log all the time to the internal log. A new logging level is defined - LOG_TRACE_EVENT. When this level is set (it is not by default) and a message marked as KERN_ERR is logged, all the messages in the internal log will be dumped to the kernel log before the KERN_ERR message is logged. There is a timestamp on each message added to the internal log. However, this timestamp is not converted to wall time when logged. The value of the timestamp is solely to give a crude time reference for the messages. Link: https://lore.kernel.org/r/20200630215001.70793-14-jsmart2021@gmail.com Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-07-02 23:06:49 -04:00
Linus Torvalds	818dbde78e	SCSI misc on 20200605 This series consists of the usual driver updates (qla2xxx, ufs, zfcp, target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host of other minor updates. There are no major core changes in this series apart from a refactoring in scsi_lib.c. Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com> -----BEGIN PGP SIGNATURE----- iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXtq5QyYcamFtZXMuYm90 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXyGAQCipTWx 7kHKHZBCVTU133bADt3+SstLrAm8PKZEXMnP9wEAzu4QkkW8URxEDRrpu7qk5gbA 9M/KyqvfRtTH7+BSK7M= =J6aO -----END PGP SIGNATURE----- Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi Pull SCSI updates from James Bottomley: :This series consists of the usual driver updates (qla2xxx, ufs, zfcp, target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host of other minor updates. There are no major core changes in this series apart from a refactoring in scsi_lib.c" * tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits) scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes scsi: cxgb3i: Fix some leaks in init_act_open() scsi: ibmvscsi: Make some functions static scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim scsi: ufs: Fix WriteBooster flush during runtime suspend scsi: ufs: Fix index of attributes query for WriteBooster feature scsi: ufs: Allow WriteBooster on UFS 2.2 devices scsi: ufs: Remove unnecessary memset for dev_info scsi: ufs-qcom: Fix scheduling while atomic issue scsi: mpt3sas: Fix reply queue count in non RDPQ mode scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd() scsi: vhost: Notify TCM about the maximum sg entries supported per command scsi: qla2xxx: Remove return value from qla_nvme_ls() scsi: qla2xxx: Remove an unused function scsi: iscsi: Register sysfs for iscsi workqueue scsi: scsi_debug: Parser tables and code interaction scsi: core: Refactor scsi_mq_setup_tags function scsi: core: Fix incorrect usage of shost_for_each_device scsi: qla2xxx: Fix endianness annotations in source files ...	2020-06-05 15:11:50 -07:00
James Smart	4c2805aab5	lpfc: nvmet: Add support for NVME LS request hosthandle As the nvmet layer does not have the concept of a remoteport object, which can be used to identify the entity on the other end of the fabric that is to receive an LS, the hosthandle was introduced. The driver passes the hosthandle, a value representative of the remote port, with a ls request receive. The LS request will create the association. The transport will remember the hosthandle for the association, and if there is a need to initiate a LS request to the remote port for the association, the hosthandle will be used. When the driver loses connectivity with the remote port, it needs to notify the transport that the hosthandle is no longer valid, allowing the transport to terminate associations related to the hosthandle. This patch adds support to the driver for the hosthandle. The driver will use the ndlp pointer of the remote port for the hosthandle in calls to nvmet_fc_rcv_ls_req(). The discovery engine is updated to invalidate the hosthandle whenever connectivity with the remote port is lost. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:34 -06:00
James Smart	2a1160a03a	lpfc: Refactor lpfc nvme headers A lot of files in lpfc include nvme headers, building up relationships that require a file to change for its headers when there is no other change necessary. It would be better to localize the nvme headers. There is also no need for separate nvme (initiator) and nvmet (tgt) header files. Refactor the inclusion of nvme headers so that all nvme items are included by lpfc_nvme.h Merge lpfc_nvmet.h into lpfc_nvme.h so that there is a single header used by both the nvme and nvmet sides. This prepares for structure sharing between the two roles. Prep to add shared function prototypes for upcoming shared routines. Signed-off-by: Paul Ely <paul.ely@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-05-09 16:18:34 -06:00
Dick Kennedy	88acb4d9ff	scsi: lpfc: Remove unnecessary lockdep_assert_held calls In an audit of lockdep calls in the driver, there are multiple lockdep checks in successive calling layers. E.g. a routine checks, and then calls a lower routine that also checks, and so on. Calling sequences result in many redundant checks. Refine the code to remove lower-level lockdep checks. Update comments on the lock, correcting a few places where lock object in comment was incorrect. Link: https://lore.kernel.org/r/20200501214310.91713-7-jsmart2021@gmail.com Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com> Signed-off-by: James Smart <jsmart2021@gmail.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>	2020-05-07 22:47:24 -04:00

1 2 3 4 5 ...

354 Commits