When injecting EEH errors the port is getting hung up waiting on the node
list to empty, message number 0233. The driver is stuck at this point and
also can't unload. The driver makes transport remoteport delete calls which
try to abort I/O's, but the EEH daemon has already called the driver to
detach and the detachment has set the global FC_UNLOADING flag. There are
several code paths that will avoid I/O cleanup if the FC_UNLOADING flag is
set, resulting in transports waiting for I/O while the driver is waiting on
transports to clean up.
Additionally, during study of the list, a locking issue was found in
lpfc_sli_abort_iocb_ring that could corrupt the list.
A special case was added to the lpfc_cleanup() routine to call
lpfc_sli_flush_rings() if the driver is FC_UNLOADING and if the pci-slot
is offline (e.g. EEH).
The SLI4 part of lpfc_sli_abort_iocb_ring() is changed to use the
ring_lock. Also added code to cancel the I/Os if the pci-slot is offline
and added checks and returns for the FC_UNLOADING and HBA_IOQ_FLUSH flags
to prevent trying to send an I/O that we cannot handle.
Link: https://lore.kernel.org/r/20220317032737.45308-3-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Following EEH errors, the driver can crash or hang when deleting the
localport or when attempting to unload.
The EEH handlers in the driver did not notify the NVMe-FC transport before
tearing the driver down. This was delayed until the resume steps. This
worked for SCSI because lpfc_block_scsi() would notify the
scsi_fc_transport that the target was not available but it would not clean
up all the references to the ndlp.
The SLI3 prep for dev reset handler did the lpfc_offline_prep() and
lpfc_offline() calls to get the port stopped before restarting. The SLI4
version of the prep for dev reset just destroyed the queues and did not
stop NVMe from continuing. Also because the port was not really stopped
the localport destroy would hang because the transport was still waiting
for I/O. Additionally, a devloss tmo can fire and post events to a stopped
worker thread creating another hang condition.
lpfc_sli4_prep_dev_for_reset() is modified to call lpfc_offline_prep() and
lpfc_offline() rather than just lpfc_scsi_dev_block() to ensure both SCSI
and NVMe transports are notified to block I/O to the driver.
Logic is added to devloss handler and worker thread to clean up ndlp
references and quiesce appropriately.
Link: https://lore.kernel.org/r/20220317032737.45308-2-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update copyrights to 2022 for files modified in the 14.2.0.0 patch set.
Link: https://lore.kernel.org/r/20220225022308.16486-18-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch refactors the remaining ELS paths to use SLI-4 as the primary
interface. Paths include RRQ, RSCN, unsolicited ELS RQST and RSP paths, ELS
timeouts, etc.:
- Remove unused routines lpfc_sli4_bpl2sgl and lpfc_sli4_iocb2wqe
- Conversion away from using SLI-3 iocb structures to set/access fields in
common routines. Use the new generic get/set routines that were added.
This move changes code from indirect structure references to using local
variables with the generic routines.
- Refactor routines when setting non-generic fields, to have both SLI3 and
SLI4 specific sections. This replaces the set-as-SLI3 then translate to
SLI4 behavior of the past.
Link: https://lore.kernel.org/r/20220225022308.16486-12-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Extraneous teardown routines are present in the firmware dump path causing
altered states in firmware captures.
When a firmware dump is requested via sysfs, trigger the dump immediately
without tearing down structures and changing adapter state.
The driver shall rely on pre-existing firmware error state clean up
handlers to restore the adapter.
Link: https://lore.kernel.org/r/20211204002644.116455-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver is calling schedule_timeout after the DA_ID nameserver request
and LOGO commands are issued to the fabric by the initiator virtual
endport. These fixed delay functions are causing long delays in the
driver's worker thread when processing discovery I/Os in a serialized
fashion, which is then triggering mailbox timeout errors artificially.
To fix this, don't wait on the DA_ID request to complete and call
wait_event_timeout to allow the vport delete thread to make progress on an
event driven basis rather than fixing the wait time.
Link: https://lore.kernel.org/r/20211204002644.116455-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A link bounce to a slow fabric may observe FDISC response delays lasting
longer than devloss tmo. Current logic decrements the final fabric node
kref during a devloss tmo event. This results in a NULL ptr dereference
crash if the FDISC completes for that fabric node after devloss tmo.
Fix by adding the NLP_IN_RECOV_POST_DEV_LOSS flag, which is set when
devloss tmo triggers and we've noticed that fabric node recovery has
already started or finished in between the time lpfc_dev_loss_tmo_callbk
queues lpfc_dev_loss_tmo_handler. If fabric node recovery succeeds, then
the driver reverses the devloss tmo marked kref put with a kref get. If
fabric node recovery fails, then the final kref put relies on the ELS
timing out or the REG_LOGIN cmpl routine.
Link: https://lore.kernel.org/r/20211020211417.88754-8-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
An error is detected with the following report when unloading the driver:
"KASAN: use-after-free in lpfc_unreg_rpi+0x1b1b"
The NLP_REG_LOGIN_SEND nlp_flag is set in lpfc_reg_fab_ctrl_node(), but the
flag is not cleared upon completion of the login.
This allows a second call to lpfc_unreg_rpi() to proceed with nlp_rpi set
to LPFC_RPI_ALLOW_ERROR. This results in a use after free access when used
as an rpi_ids array index.
Fix by clearing the NLP_REG_LOGIN_SEND nlp_flag in
lpfc_mbx_cmpl_fc_reg_login().
Link: https://lore.kernel.org/r/20211020211417.88754-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Injecting errors on the PCI slot while the driver is handling NVMe I/O will
cause crashes and hangs.
There are several rather difficult scenarios occurring. The main issue is
that the adapter can report a PCI error before or simultaneously to the PCI
subsystem reporting the error. Both paths have different entry points and
currently there is no interlock between them. Thus multiple teardown paths
are competing and all heck breaks loose.
Complicating things is the NVMs path. To a large degree, I/O was able to be
shutdown for a full FC port on the SCSI stack. But on NVMe, there isn't a
similar call. At best, it works on a per-controller basis, but even at the
controller level, it's a controller "reset" call. All of which means I/O is
still flowing on different CPUs with reset paths expecting hw access
(mailbox commands) to execute properly.
The following modifications are made:
- A new flag is set in PCI error entrypoints so the driver can track being
called by that path.
- An interlock is added in the SLI hw error path and the PCI error path
such that only one of the paths proceeds with the teardown logic.
- RPI cleanup is patched such that RPIs are marked unregistered w/o mbx
cmds in cases of hw error.
- If entering the SLI port re-init calls, a case where SLI error teardown
was quick and beat the PCI calls now reporting error, check whether the
SLI port is still live on the PCI bus.
- In the PCI reset code to bring the adapter back, recheck the IRQ
settings. Different checks for SLI3 vs SLI4.
- In I/O completions, that may be called as part of the cleanup or
underway just before the hw error, check the state of the adapter. If
in error, shortcut handling that would expect further adapter
completions as the hw error won't be sending them.
- In routines waiting on I/O completions, which may have been in progress
prior to the hw error, detect the device is being torn down and abort
from their waits and just give up. This points to a larger issue in the
driver on ref-counting for data structures, as it doesn't have
ref-counting on q and port structures. We'll do this fix for now as it
would be a major rework to be done differently.
- Fix the NVMe cleanup to simulate NVMe I/O completions if I/O is being
failed back due to hw error.
- In I/O buf allocation, done at the start of new I/Os, check hw state and
fail if hw error.
Link: https://lore.kernel.org/r/20210910233159.115896-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On link up and node discovery, a remote port is registered with the SCSI
transport and the driver sets fc4_xpt_flags to track transport
registration.
A link down event causes the driver to deregister with the SCSI transport,
starting the devloss timer, and calls a local unreg routine to clear the
login state. Part of the login state is the fc4_xpt_flags. However, with
tape devices that support sequence level error recovery, which wants to
preserve the login, the local unreg routine is skipped, thus the flags
aren't cleared.
A subsequent link up, ADISC is performed and the lpfc_nlp_reg_node()
routine is called. As the fc4_xpt_flags is not clear, it's believed the
node is already registered with the transport. Unfortunately, the
registration was already terminated. Eventually the devloss tmo timer
expires and tears down the device.
Fix by ensuring the tape device, known by the ADISC flag, is always
unregistered if the link drops.
Link: https://lore.kernel.org/r/20210910233159.115896-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A test scenario encountered an unload hang while an FLOGI ELS was in flight
when a link down condition occurred. The driver fails unload as it never
releases the fport node.
For most nodes, when the link drops, devloss tmo is started and the timeout
will cause the final node release. For the Fport, as it has not yet
registered with the SCSI transport, there is no devloss timer to be
started, so there is no final release. Additionally, the link down
sequence causes ABORTS to be issued for pending ELS's. The completions from
the ABORTS perform the release of node references. However, as the adapter
is being reset to be unloaded, those completions will never occur.
Fix by the following:
- In the ELS cleanup, recognize when unloading and place the ELS's on a
different list that immediately cleans up/completes the ELS's. It's
recognized that this condition primarily affects only the fport, with
other ports having normal clean up logic that handles things.
- Resolve the devloss issue by, when cleaning up nodes on after link down,
recognizing when the fabric node does not have a completed state (its
state is UNUSED) and removing a reference so the node can delete after
the ELS reference is released.
Link: https://lore.kernel.org/r/20210910233159.115896-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In a rarely executed path, FLOGI failure, there is a refcounting error. If
FLOGI completed with an error, typically a timeout, the initial completion
handler would remove the job reference. However, the job completion isn't
the actual end of the job/exchange as the timeout usually initiates an
ABTS, and upon that ABTS completion, a final completion is sent. The driver
removes the reference again in the final completion. Thus the imbalance.
In the buggy cases, if there was a link bounce while the delayed response
is outstanding, the fport node may be referenced again but there was no
additional reference as it is already present. The delayed completion then
occurs and removes the last reference freeing the node and causing issues
in the link up processed that is using the node.
Fix this scenario by removing the snippet that removed the reference in the
initial FLOGI completion. The bad snippet was poorly trying to identify the
FLOGI as OK to do so by realizing the node was not registered with either
SCSI or NVMe transport.
Link: https://lore.kernel.org/r/20210910233159.115896-3-jsmart2021@gmail.com
Fixes: 618e2ee146 ("scsi: lpfc: Fix FLOGI failure due to accessing a freed node")
Cc: <stable@vger.kernel.org> # v5.13+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Complete the enablement of the cm framework feature in the adapter. Perform
the following:
- Detect the presence of the congestion management framework feature.
When the cm framework is present:
- Issue the SET_FEATURE command to enable the feature.
- Register the cm statistics buffer with the adapter.
- Read the cm enablement buffer to determine the cm framework state for cm
management.
When cm management is enabled:
- Monitor all FPIN and congestion signalling events, incrementing
counters.
- Regularly sync with the adapter to communicate congestion events and to
receive an rx request limit.
- Monitor requests for rx data and ensure that no more than the
adapter prescribed limit is issued on the link. If the limit is
exceeded, SCSI and/or NVMe traffic is temporarily suspended.
- Maintain the minute, hourly, daily statistics buffer.
- Monitor for congestion enablement change events, causing a reread of the
enablement buffer and acting on any change in enablement.
And:
- Add teardown logic, including buffer deregistration, on adapter
detachment or reset.
Link: https://lore.kernel.org/r/20210816162901.121235-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When congestion management is enabled, issue EDC ELS to register congestion
signaling capabilities with the fabric. The response handling will process
the fabric parameters and set the reporting parameters.
Similarly, add support for receiving an EDC request from the fabric
generating a corresponding response.
Implement handlers for congestion signals from the fabric and maintain
statistics for them.
Link: https://lore.kernel.org/r/20210816162901.121235-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update routines to support 256 Gb link speed for LPe37000/LPe38000
adapters. 256 Gb speeds can be seen on trunk links.
Link: https://lore.kernel.org/r/20210722221721.74388-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On an RSCN event, the nodes specified in RSCN payload and in MAPPED state
are moved to NPR state in order to revalidate the login. This triggers an
immediate unregister from SCSI/NVMe backend. The assumption is that the
node may be missing. The re-registration with the backend happens after
either relogin (PLOGI/PRLI; if ADISC is disabled or login truly lost) or
when ADISC completes successfully (rediscover with ADISC enabled).
However, the NVMe-FC standard provides for an RSCN to be triggered when
the remote port supports a discovery controller and there was a change
of discovery log content. As the remote port typically also supports
storage subsystems, this unregister causes all storage controller
connections to fail and require reconnect.
Correct by reworking the code to ensure that the unregistration only occurs
when a login state is truly terminated, thereby leaving the NVMe storage
controllers in place.
The changes made are:
- Retain node state in ADISC_ISSUE when scheduling ADISC ELS retry.
- Do not clear wwpn/wwnn values upon ADISC failure.
- Move MAPPED nodes to NPR during RSCN processing, but do not unregister
with transport. On GIDFT completion, identify missing nodes (not marked
NLP_NPR_2B_DISC) and unregister them.
- Perform unregistration for nodes that will go through ADISC processing
if ADISC completion fails.
- Successful ADISC completion will move node back to MAPPED state.
Link: https://lore.kernel.org/r/20210707184351.67872-16-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Add a periodic check for issuing of QFPA command and VMID timeout in the
worker thread. The inactivity timeout check is added via the timer
function.
Link: https://lore.kernel.org/r/20210608043556.274139-13-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement timeout functionality for the VMID. After the set time period of
inactivity, the VMID is deregistered from the switch.
Link: https://lore.kernel.org/r/20210608043556.274139-12-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During link bounce testing, RPI counts were seen to differ from the number
of nodes. For fabric and domain controllers, a temporary RPI is assigned,
but the code isn't registering it. If the nodes do go away, such as on link
down, the temporary RPI isn't being released.
Change the way these two fabric services are managed, make them behave like
any other remote port. Register the RPI and register with the transport.
Never leave the nodes in a NPR or UNUSED state where their RPI is in limbo.
This allows them to follow normal dev_loss_tmo handling, RPI refcounting,
and normal removal rules. It also allows fabric I/Os to use the RPI for
traffic requests.
Note: There is some logic that still has a couple of exceptions when the
Domain controller (0xfffcXX). There are cases where the fabric won't have a
valid login but will send RDP. Other times, it will it send a LOGO then an
RDP. It makes for ad-hoc behavior to manage the node. Exceptions are
documented in the code.
Link: https://lore.kernel.org/r/20210514195559.119853-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While testing NPIV and watching logins and used RPI levels, it was seen the
used RPI count was much higher than the number of remote ports discovered.
Code inspection showed that remote port removals on any NPIV instance are
releasing the RPI, but not performing an UNREG_RPI with the adapter thus
the reference counting never fully drops and the RPI is never fully
released. This was happening on NPIV nodes due to a log of fabric ELS's to
fabric addresses. This lack of UNREG_RPI was introduced by a prior node
rework patch that performed the UNREG_RPI as part of node cleanup.
To resolve the issue, do the following:
- Restore the RPI release code, but move the location to so that it is in
line with the new node cleanup design.
- NPIV ports now release the RPI and drop the node when the caller sets
the NLP_RELEASE_RPI flag.
- Set the NLP_RELEASE_RPI flag in node cleanup which will trigger a
release of RPI to free pool.
- Ensure there's an UNREG_RPI at LOGO completion so that RPI release is
completed.
- Stop offline_prep from skipping nodes that are UNUSED. The RPI may
not have been released.
- Stop the default RPI handling in lpfc_cmpl_els_rsp() for SLI4.
- Fixed up debugfs RPI displays for better debugging.
Fixes: a70e63eee1 ("scsi: lpfc: Fix NPIV Fabric Node reference counting")
Link: https://lore.kernel.org/r/20210514195559.119853-2-jsmart2021@gmail.com
Cc: <stable@vger.kernel.org> # v5.11+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Code inspection showed lpfc was using three different pointer formats when
logging discovery object pointers.
Standardize the pointer format to x%px.
Note: %px use is limited to discovery objects in order to aid core
analysis.
Link: https://lore.kernel.org/r/20210412013127.2387-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
FDMI registration needs to be performed after every login with the FC Mgmt
service. The flag the driver is using to track registration is cleared on
link up, but never on Mgmt service logout/re-login.
Fix by clearing the flag whenever a new login is completed with the FC Mgmt
service.
While perusing the flag use, logging was performed as if FDMI registration
occurred on vports. However, it is limited to the physical port only.
Revise the logging to reflect physical port based.
Link: https://lore.kernel.org/r/20210412013127.2387-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Rmmod on SLI-4 adapters is sometimes hitting a bad ptr dereference in
lpfc_els_free_iocb().
A prior patch refactored the lpfc_sli_abort_iocb() routine. One of the
changes was to convert from building/sending an abort within the routine to
using a common routine. The reworked routine passes, without modification,
the pring ptr to the new common routine. The older routine had logic to
check SLI-3 vs SLI-4 and adapt the pring ptr if necessary as callers were
passing SLI-3 pointers even when not on an SLI-4 adapter. The new routine
is missing this check and adapt, so the SLI-3 ring pointers are being used
in SLI-4 paths.
Fix by cleaning up the calling routines. In review, there is no need to
pass the ring ptr argument to abort_iocb at all. The routine can look at
the adapter type itself and reference the proper ring.
Link: https://lore.kernel.org/r/20210412013127.2387-2-jsmart2021@gmail.com
Fixes: db7531d2b3 ("scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers")
Cc: <stable@vger.kernel.org> # v5.11+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes the following W=1 kernel build warning(s):
drivers/scsi/lpfc/lpfc_hbadisc.c:1505: warning: expecting prototype for lpfc_update_fcf_record(). Prototype was for __lpfc_update_fcf_record() instead
Link: https://lore.kernel.org/r/20210303144631.3175331-28-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For the files modified in 2021 via the 12.8.0.7 and 12.8.0.8 patch sets,
update the copyright for 2021.
Link: https://lore.kernel.org/r/20210301171821.3427-23-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Calls to lpfc_find_vport_by_vpid() for the highest indexed vport fails with
error, "2936 Could not find Vport mapped to vpi XXX". Our vport indices in
the loop and if-clauses were off by one.
Correct the vpid range used for vpi lookup to include the highest possible
vpid.
Link: https://lore.kernel.org/r/20210301171821.3427-3-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The "pmb" pointer is freed at the start of the function and then freed
again in the error handling code.
Link: https://lore.kernel.org/r/YA6E8rO51hE56SVw@mwanda
Fixes: 92d7f7b0cd ("[SCSI] lpfc: NPIV: add NPIV support on top of SLI-3")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Several errors have occurred where the adapter stops or fails but does not
raise the register values for the driver to detect failure. Thus driver is
unaware of the failure. The failure typically results in I/O timeouts, the
I/O timeout handler failing (after several seconds), and the error handler
escalating recovery policy and resulting in more errors. Eventually, the
driver is in a position where things have spiraled and it can't do recovery
because other recovery ops are still outstanding and it becomes unusable.
Resolve the situation by having the I/O timeout handler (actually a els,
SCSI I/O, NVMe ls, or NVMe I/O timeout), in addition to aborting the I/O,
perform a mailbox command and look for a response from the hardware. If
the mailbox command fails, it will mark the adapter offline and then invoke
the adapter reset handler to clean up.
The new I/O timeout test will be limited to a test every 5s. If there are
multiple I/O timeouts concurrently, only the 1st I/O timeout will generate
the mailbox command. Further testing will only occur once a timeout occurs
after a 5s delay from the last mailbox command has expired.
Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver's management of the fabric controller (aka pseudo-scsi
initiator) node in SLI3 mode is causing this crash. The crash occurs
because of a node reference imbalance that frees the fabric controller node
while devloss is outstanding from the SCSI transport. This is triggered by
an odd behavior where the switch reacts to a rejected RDP request with a
PLOGI and nothing else, not even a LOGO. The driver ACKS the PLOGI and
after successfully registering the RPI, incorrectly registers the fabric
controller node because it has the NLP_FC4_FCP flag still set from the
fabric controller PRLI. If a LIP is issued, the driver attempts to cleanup
on Link Up and ends up executing too many puts.
Fix by detecting the fabric node type and clearing out the nodes internal
flags that triggered a SCSI transport registration and subsequence dev_loss
event. The driver cannot count on any persistence from fabric controller
nodes.
Link: https://lore.kernel.org/r/20210104180240.46824-5-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove set but not used variable shost in lpfc_dev_loss_tmo_handler().
Link: https://lore.kernel.org/r/20201119203353.121866-1-james.smart@broadcom.com
Fixes: 52edb2caf6 ("scsi: lpfc: Remove ndlp when a PLOGI/ADISC/PRLI/REG_RPI ultimately fails")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove local variables that are set but not used.
Link: https://lore.kernel.org/r/20201119203340.121819-1-james.smart@broadcom.com
Fixes: c6adba1501 ("scsi: lpfc: Rework remote port lock handling")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch reworks the abort interfaces such that SLI-3 retains the
iocb-based formatting and completions and SLI-4 now uses native WQEs and
completion routines.
The following changes are made:
- The code is refactored from a confusing 2 routine sequence of
xx_abort_iotag_issue(), which creates/formats and abort cmd, and
xx_issue_abort_tag(), which then issues and handles the completion of
the abort cmd - into a single interface of xx_issue_abort_iotag(). The
new interface will determine whether SLI-3 or SLI-4 and then call the
appropriate handler. A completion handler can now be specified to
address the differences in completion handling. Note: original code is
all iocb based, with SLI-4 converting to SLI-3 for the SCSI/ELS path,
and NVMe natively using wqes.
- The SLI-3 side is refactored:
The older iocb-base lpfc_sli_issue_abort_iotag() routine is combined
with the logic of lpfc_sli_abort_iotag_issue() as well as the
iocb-specific code in lpfc_abort_handler() and lpfc_sli_abort_iocb() to
create the new single SLI-3 abort routine that formats and issues the
iocb.
- The SLI-4 side is refactored and added to:
The native WQE abort code in NVMe is moved to the new SLI-4
issue_abort_iotag() routine. Items in SCSI that set fields not set by
NVMe is migrated into the new routine. Thus the routine supports NVMe
and SCSI initiators. The nvmet block (target) formats the abort slightly
different (like the old NVMe initiator) thus it has its own prep routine
stolen from NVMe initiator and it retains the current code it has for
issuing the WQE (does not use the commonized routine the initiators
do). SLI-4 completion handlers were also added.
- lpfc_abort_handler now becomes a wrapper that determines whether
SLI-3 or SLI-4 and calls the proper abort handler.
Link: https://lore.kernel.org/r/20201115192646.12977-16-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While testing initiator-side cable swaps with NPIV, oops occur. The
reference counts for the Fabric nodes on the NPIV vports isn't balanced,
resulting in premature node removal.
The following fixes were made:
- Removed the FC_LBIT check in lpfc_linkup_port. This removed the special
case for vports that didn't have them clean up just like the physical
port.
- Removed the unreg_rpi call in lpfc_cleanup_node. In this section, the
node is being removed in the context of a reference count release and a
mailbox command can't be issued at this point.
- Remove special case handling in the default mailbox completion handler
that allowed the skipping of a node reference. Now, reference counting
always requires the removal of the reference.
- Move the location of the DEVICE_RM event is done during LOGO handling as
the driver has additional work to do on the ndlp before puts/releases
can be performed.
Link: https://lore.kernel.org/r/20201115192646.12977-10-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
While testing NPIV and link bounces, the vport would not show a fabric node
for the F_Port, would not transition into NPR state during a link fault, or
leave the FDMI node untouched during error injection. Cause for this was
determined to be an inconsistent manner in which F_Port, Nameserver, and
FDMI controller nodes were created and linked. In some cases, the nodes
would never be unregistered from the transport, leaving references
active. In other cases, the fabric nodes may register with the transport
multiple times while still registered.
The following changes were made:
- Fix the FDISC issue routine, which starts vport (re)creation, to mark
the F_Port as a fabric node (NLP_FABRIC) and allow the F_Port node to
fully be created and show up in the node list.
- When remote ports are cleaned up on vport termination, cleanup the
nameserver and FDMI controller nodes on the vport so they unregister
from the transport.
- On link bounces, don't exclude the NPIV Fabric remote ports from
transitioning to the NPR state, allowing them to avoid re-registration
if already registered.
Link: https://lore.kernel.org/r/20201115192646.12977-9-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a PLOGI/ADISC/PRLI/REG_RPI fails, the node remains in the nodelist in
that state. Although the driver now frees a node when the ref count goes
to zero, in this case the ref cnt doesn't reach zero because there isn't a
mechanism to release the final reference. Discovery just stops.
Fix by calling the node discovery state machine DEVICE_RM event whenever
one of these commands fail. This will remove the final reference count and
trigger node release.
Link: https://lore.kernel.org/r/20201115192646.12977-7-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently the discovery layers within the driver use the SCSI midlayer
host_lock to access node-specific structures. This can contend with the I/O
path and is too coarse of a lock.
Rework the driver so that it uses a lock specific to the remote port node
structure when accessing the structure contents. A few of the changes
brought out spots were some slightly reorganized routines worked better.
Link: https://lore.kernel.org/r/20201115192646.12977-6-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The lpfc driver is calling get_device and put_device on scsi_fc_transport
device structure. When this code was removed, the driver triggered an oops
in "scsi_is_host_dev" when the first SCSI target was unregistered from the
transport.
The reason the calls were necessary is that the driver is calling
scsi_remove_host too early, before the target rports are unregistered and
the scsi devices disconnected from the scsi_host. The fc_host was torn
down during fc_remove_host.
Fix by moving the lpfc_pci_remove_one_s3/s4 calls to scsi_remove_host to
after the nodes are cleaned up. Remove the get_device and put_device calls
and the supporting code.
Link: https://lore.kernel.org/r/20201115192646.12977-4-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Now that the driver has gone to a normal ref interface (with no odd logic)
the discovery logic needs to be updated to reworked so that it properly
takes references when it should and give them up when it should.
Rework the driver for the following get/put model:
- Move gets to just before an I/O is issued. Add gets for places where an
I/O was issued without one.
- Ensure that failures from lpfc_nlp_get() are handled by the driver.
- Check and fix the placement of lpfc_nlp_puts relative to io completions.
Note: some of these paths may not release the reference on the exact io
completion as the reference is held as the code takes another step in
the discovery thread and which may cause another io to be issued.
- Rearrange some code for error processing and calling lpfc_nlp_put.
- Fix some places of incorrect reference freeing that was causing the
premature releasing of the structure.
- Nvmet plogi handling performs unreg_rpi's. The reference counts were
unbalanced resulting in premature node removal. In some cases this
caused loss of node discovery. Corrected the reftaking around nvmet
plogis.
Nodes that experience devloss now get released from the node list now that
there is a proper reference taking.
Link: https://lore.kernel.org/r/20201115192646.12977-3-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When a remote port is disconnected and disappears, its node structure
(ndlp) stays allocated and on a vport node list. While on the list it can
be matched, thus requires validation checks on state to be added in
numerous code paths. If the node comes back, its possible for there to be
multiple node structures for the same device on the vport node list. There
is no reason to keep the node structure around after it is no longer in
existence, and the current implementation creates problems for itself
(multiple nodes) and lots of unnecessary code for state validation.
Additionally, the reference taking on the node structure didn't follow the
normal model used by the kernel kref api. It included lots of odd logic to
match state with reference count. The combination of this odd logic plus
the way it was implicitly used in the discovery engine made its reference
taking implementation suspect and extremely hard to follow.
Change the driver such that the reference taking routines are now normal
ref increments/decrements and callout on refcount=0.
With this in place, the rework can be done such that the node structure is
fully removed and deallocated when the remote port no longer exists and all
references are removed. This removal logic, and the basic ref counting are
intrically tied, thus in a single patch.
Link: https://lore.kernel.org/r/20201115192646.12977-2-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Created new attribute lpfc_enable_mi, which by default is enabled.
Add command definition bits for SLI-4 parameters that recognize whether the
adapter has MIB information support and what revision of MIB data. Using
the adapter information, register vendor-specific MIB support with FDMI.
The registration will be done every link up.
During FDMI registration, encountered a couple of errors when reverting to
FDMI rev1. Code needed to exist once reverting. Fixed these.
Link: https://lore.kernel.org/r/20201020202719.54726-8-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The following call trace was seen during HBA reset testing:
BUG: scheduling while atomic: swapper/2/0/0x10000100
...
Call Trace:
dump_stack+0x19/0x1b
__schedule_bug+0x64/0x72
__schedule+0x782/0x840
__cond_resched+0x26/0x30
_cond_resched+0x3a/0x50
mempool_alloc+0xa0/0x170
lpfc_unreg_rpi+0x151/0x630 [lpfc]
lpfc_sli_abts_recover_port+0x171/0x190 [lpfc]
lpfc_sli4_abts_err_handler+0xb2/0x1f0 [lpfc]
lpfc_sli4_io_xri_aborted+0x256/0x300 [lpfc]
lpfc_sli4_sp_handle_abort_xri_wcqe.isra.51+0xa3/0x190 [lpfc]
lpfc_sli4_fp_handle_cqe+0x89/0x4d0 [lpfc]
__lpfc_sli4_process_cq+0xdb/0x2e0 [lpfc]
__lpfc_sli4_hba_process_cq+0x41/0x100 [lpfc]
lpfc_cq_poll_hdler+0x1a/0x30 [lpfc]
irq_poll_softirq+0xc7/0x100
__do_softirq+0xf5/0x280
call_softirq+0x1c/0x30
do_softirq+0x65/0xa0
irq_exit+0x105/0x110
do_IRQ+0x56/0xf0
common_interrupt+0x16a/0x16a
With the conversion to blk_io_poll for better interrupt latency in normal
cases, it introduced this code path, executed when I/O aborts or logouts
are seen, which attempts to allocate memory for a mailbox command to be
issued. The allocation is GFP_KERNEL, thus it could attempt to sleep.
Fix by creating a work element that performs the event handling for the
remote port. This will have the mailbox commands and other items performed
in the work element, not the irq. A much better method as the "irq" routine
does not stall while performing all this deep handling code.
Ensure that allocation failures are handled and send LOGO on failure.
Additionally, enlarge the mailbox memory pool to reduce the possibility of
additional allocation in this path.
Link: https://lore.kernel.org/r/20201020202719.54726-3-james.smart@broadcom.com
Fixes: 317aeb83c9 ("scsi: lpfc: Add blk_io_poll support for latency improvment")
Cc: <stable@vger.kernel.org> # v5.9+
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Three fixes: one in drivers (lpfc) and two for zoned block devices.
The latter also impinges on the block layer but only to introduce a
new block API for setting the zone model rather than fiddling with the
queue directly in the zoned block driver.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCX29mRyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishabnAP48vMYD
/cjyGAJfq/0k/U/t6pRPc5tUm89LOWcOJz0SjwD/YXcQNz7mx8MxnypAV1jbWXR7
iyWkPMYVc4EJh7oTARE=
=SQhI
-----END PGP SIGNATURE-----
Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI fixes from James Bottomley:
"Three fixes: one in drivers (lpfc) and two for zoned block devices.
The latter also impinges on the block layer but only to introduce a
new block API for setting the zone model rather than fiddling with the
queue directly in the zoned block driver"
* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: sd: sd_zbc: Fix ZBC disk initialization
scsi: sd: sd_zbc: Fix handling of host-aware ZBC disks
scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supported
Initial FLOGIs are failing with the following message:
lpfc 0000:13:00.1: 1:(0):0820 FLOGI Failed (x300). BBCredit Not Supported
In a prior patch, the READ_SPARAM command was re-ordered to post after
CONFIG_LINK as the driver is expected to update the driver's copy of the
service parameters for the FLOGI payload. If the bb-credit recovery feature
is enabled, this is fine. But on adapters were bb-credit recovery isn't
enabled, it would cause the FLOGI to fail.
Fix by restoring the original command order (READ_SPARAM before
CONFIG_LINK), and after issuing CONFIG_LINK, detect bb-credit recovery
support and reissuing READ_SPARAM to obtain the updated service parameters
(effectively adding in the fix command order).
[mkp: corrected SHA]
Link: https://lore.kernel.org/r/20200911200147.110826-1-james.smart@broadcom.com
Fixes: 835214f5d5 ("scsi: lpfc: Fix broken Credit Recovery after driver load")
CC: <stable@vger.kernel.org> # v5.7+
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes the following W=1 kernel build warning(s):
drivers/scsi/lpfc/lpfc_hbadisc.c:1209: warning: Function parameter or member 'phba' not described in 'lpfc_sli4_clear_fcf_rr_bmask'
drivers/scsi/lpfc/lpfc_hbadisc.c:1309: warning: Function parameter or member 'sw_name' not described in 'lpfc_sw_name_match'
drivers/scsi/lpfc/lpfc_hbadisc.c:1309: warning: Excess function parameter 'fab_name' description in 'lpfc_sw_name_match'
drivers/scsi/lpfc/lpfc_hbadisc.c:1397: warning: Function parameter or member 'fcf_rec' not described in 'lpfc_copy_fcf_record'
drivers/scsi/lpfc/lpfc_hbadisc.c:1397: warning: Excess function parameter 'fcf' description in 'lpfc_copy_fcf_record'
drivers/scsi/lpfc/lpfc_hbadisc.c:1956: warning: Cannot understand lpfc_sli4_fcf_record_match - testing new FCF record for matching existing FCF
on line 1956 - I thought it was a doc line
drivers/scsi/lpfc/lpfc_hbadisc.c:2078: warning: Function parameter or member 'fcf_index' not described in 'lpfc_sli4_fcf_pri_list_del'
drivers/scsi/lpfc/lpfc_hbadisc.c:2109: warning: Function parameter or member 'fcf_index' not described in 'lpfc_sli4_set_fcf_flogi_fail'
drivers/scsi/lpfc/lpfc_hbadisc.c:2135: warning: Function parameter or member 'fcf_index' not described in 'lpfc_sli4_fcf_pri_list_add'
drivers/scsi/lpfc/lpfc_hbadisc.c:2135: warning: Function parameter or member 'new_fcf_record' not described in 'lpfc_sli4_fcf_pri_list_add'
Link: https://lore.kernel.org/r/20200723122446.1329773-3-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The current logging methods typically end up requesting a reproduction with
a different logging level set to figure out what happened. This was mainly
by design to not clutter the kernel log messages with things that were
typically not interesting and the messages themselves could cause other
issues.
When looking to make a better system, it was seen that in many cases when
more data was wanted was when another message, usually at KERN_ERR level,
was logged. And in most cases, what the additional logging that was then
enabled was typically. Most of these areas fell into the discovery machine.
Based on this summary, the following design has been put in place: The
driver will maintain an internal log (256 elements of 256 bytes). The
"additional logging" messages that are usually enabled in a reproduction
will be changed to now log all the time to the internal log. A new logging
level is defined - LOG_TRACE_EVENT. When this level is set (it is not by
default) and a message marked as KERN_ERR is logged, all the messages in
the internal log will be dumped to the kernel log before the KERN_ERR
message is logged.
There is a timestamp on each message added to the internal log. However,
this timestamp is not converted to wall time when logged. The value of the
timestamp is solely to give a crude time reference for the messages.
Link: https://lore.kernel.org/r/20200630215001.70793-14-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This series consists of the usual driver updates (qla2xxx, ufs, zfcp,
target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host
of other minor updates. There are no major core changes in this
series apart from a refactoring in scsi_lib.c.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXtq5QyYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXyGAQCipTWx
7kHKHZBCVTU133bADt3+SstLrAm8PKZEXMnP9wEAzu4QkkW8URxEDRrpu7qk5gbA
9M/KyqvfRtTH7+BSK7M=
=J6aO
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
:This series consists of the usual driver updates (qla2xxx, ufs, zfcp,
target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host
of other minor updates.
There are no major core changes in this series apart from a
refactoring in scsi_lib.c"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits)
scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes
scsi: cxgb3i: Fix some leaks in init_act_open()
scsi: ibmvscsi: Make some functions static
scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim
scsi: ufs: Fix WriteBooster flush during runtime suspend
scsi: ufs: Fix index of attributes query for WriteBooster feature
scsi: ufs: Allow WriteBooster on UFS 2.2 devices
scsi: ufs: Remove unnecessary memset for dev_info
scsi: ufs-qcom: Fix scheduling while atomic issue
scsi: mpt3sas: Fix reply queue count in non RDPQ mode
scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event
scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd()
scsi: vhost: Notify TCM about the maximum sg entries supported per command
scsi: qla2xxx: Remove return value from qla_nvme_ls()
scsi: qla2xxx: Remove an unused function
scsi: iscsi: Register sysfs for iscsi workqueue
scsi: scsi_debug: Parser tables and code interaction
scsi: core: Refactor scsi_mq_setup_tags function
scsi: core: Fix incorrect usage of shost_for_each_device
scsi: qla2xxx: Fix endianness annotations in source files
...
As the nvmet layer does not have the concept of a remoteport object, which
can be used to identify the entity on the other end of the fabric that is
to receive an LS, the hosthandle was introduced. The driver passes the
hosthandle, a value representative of the remote port, with a ls request
receive. The LS request will create the association. The transport will
remember the hosthandle for the association, and if there is a need to
initiate a LS request to the remote port for the association, the
hosthandle will be used. When the driver loses connectivity with the
remote port, it needs to notify the transport that the hosthandle is no
longer valid, allowing the transport to terminate associations related to
the hosthandle.
This patch adds support to the driver for the hosthandle. The driver will
use the ndlp pointer of the remote port for the hosthandle in calls to
nvmet_fc_rcv_ls_req(). The discovery engine is updated to invalidate the
hosthandle whenever connectivity with the remote port is lost.
Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A lot of files in lpfc include nvme headers, building up relationships that
require a file to change for its headers when there is no other change
necessary. It would be better to localize the nvme headers.
There is also no need for separate nvme (initiator) and nvmet (tgt)
header files.
Refactor the inclusion of nvme headers so that all nvme items are
included by lpfc_nvme.h
Merge lpfc_nvmet.h into lpfc_nvme.h so that there is a single header used
by both the nvme and nvmet sides. This prepares for structure sharing
between the two roles. Prep to add shared function prototypes for upcoming
shared routines.
Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In an audit of lockdep calls in the driver, there are multiple lockdep
checks in successive calling layers. E.g. a routine checks, and then calls
a lower routine that also checks, and so on. Calling sequences result in
many redundant checks.
Refine the code to remove lower-level lockdep checks. Update comments on
the lock, correcting a few places where lock object in comment was
incorrect.
Link: https://lore.kernel.org/r/20200501214310.91713-7-jsmart2021@gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>