Commit Graph

342 Commits

Author SHA1 Message Date
James Smart
02243836ad scsi: lpfc: Add support for the CM framework
Complete the enablement of the cm framework feature in the adapter. Perform
the following:

 - Detect the presence of the congestion management framework feature.

When the cm framework is present:

 - Issue the SET_FEATURE command to enable the feature.

 - Register the cm statistics buffer with the adapter.

 - Read the cm enablement buffer to determine the cm framework state for cm
   management.

When cm management is enabled:

 - Monitor all FPIN and congestion signalling events, incrementing
   counters.

 - Regularly sync with the adapter to communicate congestion events and to
   receive an rx request limit.

 - Monitor requests for rx data and ensure that no more than the
   adapter prescribed limit is issued on the link. If the limit is
   exceeded, SCSI and/or NVMe traffic is temporarily suspended.

 - Maintain the minute, hourly, daily statistics buffer.

 - Monitor for congestion enablement change events, causing a reread of the
   enablement buffer and acting on any change in enablement.

And:

 - Add teardown logic, including buffer deregistration, on adapter
   detachment or reset.

Link: https://lore.kernel.org/r/20210816162901.121235-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24 22:56:34 -04:00
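
The rx request limiting described in 02243836ad can be pictured with a small model. This is only a sketch under stated assumptions: the class `RxBudget`, its byte-based accounting, and the `sync()` method are hypothetical names for illustration and are not taken from the lpfc driver.

```python
class RxBudget:
    """Toy model of an adapter-prescribed limit on requested rx data."""

    def __init__(self, limit_bytes):
        self.limit_bytes = limit_bytes  # limit handed down by the adapter (assumed to be in bytes)
        self.requested = 0              # rx bytes requested since the last sync
        self.suspended = False          # True while new traffic is held back

    def request(self, nbytes):
        """Account for an rx request; return False if traffic must be suspended."""
        if self.suspended or self.requested + nbytes > self.limit_bytes:
            self.suspended = True
            return False
        self.requested += nbytes
        return True

    def sync(self, new_limit_bytes):
        """Periodic sync with the adapter: take the new limit and resume traffic."""
        self.limit_bytes = new_limit_bytes
        self.requested = 0
        self.suspended = False
```

In this model a request that would cross the limit suspends traffic until the next sync; the real driver may choose a different suspension point.
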
James Smart
9064aeb2df scsi: lpfc: Add EDC ELS support
When congestion management is enabled, issue EDC ELS to register congestion
signaling capabilities with the fabric. The response handling will process
the fabric parameters and set the reporting parameters.

Similarly, add support for receiving an EDC request from the fabric and
generating a corresponding response.

Implement handlers for congestion signals from the fabric and maintain
statistics for them.

Link: https://lore.kernel.org/r/20210816162901.121235-6-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-08-24 22:56:33 -04:00
James Smart
bfc477854a scsi: lpfc: Add 256 Gb link speed support
Update routines to support 256 Gb link speed for LPe37000/LPe38000
adapters. 256 Gb speeds can be seen on trunk links.

Link: https://lore.kernel.org/r/20210722221721.74388-5-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-27 00:06:41 -04:00
James Smart
0614568361 scsi: lpfc: Delay unregistering from transport until GIDFT or ADISC completes
On an RSCN event, the nodes specified in RSCN payload and in MAPPED state
are moved to NPR state in order to revalidate the login. This triggers an
immediate unregister from SCSI/NVMe backend. The assumption is that the
node may be missing. The re-registration with the backend happens after
either relogin (PLOGI/PRLI; if ADISC is disabled or login truly lost) or
when ADISC completes successfully (rediscover with ADISC enabled).

However, the NVMe-FC standard provides for an RSCN to be triggered when
the remote port supports a discovery controller and there was a change
of discovery log content. As the remote port typically also supports
storage subsystems, this unregister causes all storage controller
connections to fail and require reconnect.

Correct by reworking the code to ensure that the unregistration only occurs
when a login state is truly terminated, thereby leaving the NVMe storage
controllers in place.

The changes made are:

 - Retain node state in ADISC_ISSUE when scheduling ADISC ELS retry.

 - Do not clear wwpn/wwnn values upon ADISC failure.

 - Move MAPPED nodes to NPR during RSCN processing, but do not unregister
   with transport.  On GIDFT completion, identify missing nodes (not marked
   NLP_NPR_2B_DISC) and unregister them.

 - Perform unregistration for nodes that will go through ADISC processing
   if ADISC completion fails.

 - Successful ADISC completion will move node back to MAPPED state.

Link: https://lore.kernel.org/r/20210707184351.67872-16-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-07-18 22:30:37 -04:00
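
The deferred unregistration in 0614568361 can be sketched as a two-step flow. This is a simplified model, not the driver code: the `Node` class, the `to_be_discovered` field (standing in for NLP_NPR_2B_DISC), and both function names are assumptions made for illustration.

```python
MAPPED, NPR = "MAPPED", "NPR"

class Node:
    def __init__(self, wwpn):
        self.wwpn = wwpn
        self.state = MAPPED
        self.registered = True         # registered with the SCSI/NVMe transport
        self.to_be_discovered = False  # stands in for NLP_NPR_2B_DISC

def on_rscn(nodes, affected):
    """Move affected MAPPED nodes to NPR, but leave them registered."""
    for n in nodes:
        if n.wwpn in affected and n.state == MAPPED:
            n.state = NPR

def on_gidft_complete(nodes, reported):
    """Unregister only the NPR nodes that the name server no longer reports."""
    for n in nodes:
        if n.state != NPR:
            continue
        if n.wwpn in reported:
            n.to_be_discovered = True  # still present; login gets revalidated
        else:
            n.registered = False       # truly missing; drop from the transport
```
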
Gaurav Srivastava
0c4792c64f scsi: lpfc: vmid: Add QFPA and VMID timeout check in worker thread
Add a periodic check for issuing of QFPA command and VMID timeout in the
worker thread. The inactivity timeout check is added via the timer
function.

Link: https://lore.kernel.org/r/20210608043556.274139-13-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-10 10:01:33 -04:00
Gaurav Srivastava
20397179aa scsi: lpfc: vmid: Timeout implementation for VMID
Implement timeout functionality for the VMID. After the set time period of
inactivity, the VMID is deregistered from the switch.

Link: https://lore.kernel.org/r/20210608043556.274139-12-muneendra.kumar@broadcom.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Gaurav Srivastava <gaurav.srivastava@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Muneendra Kumar <muneendra.kumar@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-06-10 10:01:33 -04:00
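
The inactivity timeout from 20397179aa, checked periodically as in 0c4792c64f, might look roughly like the following. The names `VmidEntry`, `touch()`, and `expire_idle()` are hypothetical, and time is passed in explicitly to keep the sketch deterministic.

```python
class VmidEntry:
    def __init__(self, vmid, timeout_s, now):
        self.vmid = vmid
        self.timeout_s = timeout_s  # allowed period of inactivity
        self.last_used = now        # time of the most recent I/O for this VMID
        self.registered = True      # registered with the switch

    def touch(self, now):
        """Record activity so the entry is not treated as idle."""
        self.last_used = now

def expire_idle(entries, now):
    """Periodic check: deregister entries idle for at least their timeout."""
    expired = []
    for e in entries:
        if e.registered and now - e.last_used >= e.timeout_s:
            e.registered = False
            expired.append(e.vmid)
    return expired
```
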
James Smart
fe83e3b9b4 scsi: lpfc: Fix node handling for Fabric Controller and Domain Controller
During link bounce testing, RPI counts were seen to differ from the number
of nodes. For fabric and domain controllers, a temporary RPI is assigned,
but the code isn't registering it. If the nodes do go away, such as on link
down, the temporary RPI isn't being released.

Change the way these two fabric services are managed, making them behave
like any other remote port. Register the RPI and register with the transport.
Never leave the nodes in a NPR or UNUSED state where their RPI is in limbo.
This allows them to follow normal dev_loss_tmo handling, RPI refcounting,
and normal removal rules. It also allows fabric I/Os to use the RPI for
traffic requests.

Note: Some logic still has a couple of exceptions for the Domain
controller (0xfffcXX). There are cases where the fabric won't have a
valid login but will send RDP. Other times, it will send a LOGO then an
RDP. It makes for ad-hoc behavior to manage the node. Exceptions are
documented in the code.

Link: https://lore.kernel.org/r/20210514195559.119853-7-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-21 23:23:28 -04:00
James Smart
01131e7aae scsi: lpfc: Fix unreleased RPIs when NPIV ports are created
While testing NPIV and watching logins and used RPI levels, it was seen the
used RPI count was much higher than the number of remote ports discovered.

Code inspection showed that remote port removals on any NPIV instance are
releasing the RPI, but not performing an UNREG_RPI with the adapter, thus
the reference counting never fully drops and the RPI is never fully
released. This was happening on NPIV nodes due to a lot of fabric ELSs to
fabric addresses. This lack of UNREG_RPI was introduced by a prior node
rework patch that performed the UNREG_RPI as part of node cleanup.

To resolve the issue, do the following:

 - Restore the RPI release code, but move its location so that it is in
   line with the new node cleanup design.

 - NPIV ports now release the RPI and drop the node when the caller sets
   the NLP_RELEASE_RPI flag.

 - Set the NLP_RELEASE_RPI flag in node cleanup which will trigger a
   release of RPI to free pool.

 - Ensure there's an UNREG_RPI at LOGO completion so that RPI release is
   completed.

 - Stop offline_prep from skipping nodes that are UNUSED. The RPI may
   not have been released.

 - Stop the default RPI handling in lpfc_cmpl_els_rsp() for SLI4.

 - Fixed up debugfs RPI displays for better debugging.

Fixes: a70e63eee1 ("scsi: lpfc: Fix NPIV Fabric Node reference counting")
Link: https://lore.kernel.org/r/20210514195559.119853-2-jsmart2021@gmail.com
Cc: <stable@vger.kernel.org> # v5.11+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-05-21 23:23:27 -04:00
James Smart
f115612528 scsi: lpfc: Standardize discovery object logging format
Code inspection showed lpfc was using three different pointer formats when
logging discovery object pointers.

Standardize the pointer format to x%px.

Note: %px use is limited to discovery objects in order to aid core
analysis.

Link: https://lore.kernel.org/r/20210412013127.2387-14-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-13 01:39:14 -04:00
James Smart
a314dec37c scsi: lpfc: Fix missing FDMI registrations after Mgmt Svc login
FDMI registration needs to be performed after every login with the FC Mgmt
service. The flag the driver is using to track registration is cleared on
link up, but never on Mgmt service logout/re-login.

Fix by clearing the flag whenever a new login is completed with the FC Mgmt
service.

While perusing the flag use, it was found that logging was performed as if
FDMI registration occurred on vports. However, it is limited to the
physical port only. Revise the logging to reflect that it is physical port
based.

Link: https://lore.kernel.org/r/20210412013127.2387-10-jsmart2021@gmail.com
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-13 01:39:14 -04:00
James Smart
078c68b87a scsi: lpfc: Fix rmmod crash due to bad ring pointers to abort_iotag
Rmmod on SLI-4 adapters is sometimes hitting a bad ptr dereference in
lpfc_els_free_iocb().

A prior patch refactored the lpfc_sli_abort_iocb() routine. One of the
changes was to convert from building/sending an abort within the routine to
using a common routine. The reworked routine passes, without modification,
the pring ptr to the new common routine. The older routine had logic to
check SLI-3 vs SLI-4 and adapt the pring ptr if necessary, as callers were
passing SLI-3 pointers even when on an SLI-4 adapter. The new routine
is missing this check and adaptation, so the SLI-3 ring pointers are being
used in SLI-4 paths.

Fix by cleaning up the calling routines. In review, there is no need to
pass the ring ptr argument to abort_iocb at all. The routine can look at
the adapter type itself and reference the proper ring.

Link: https://lore.kernel.org/r/20210412013127.2387-2-jsmart2021@gmail.com
Fixes: db7531d2b3 ("scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers")
Cc: <stable@vger.kernel.org> # v5.11+
Co-developed-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-04-13 01:39:13 -04:00
Lee Jones
3884ce1539 scsi: lpfc: Fix incorrect naming of __lpfc_update_fcf_record()
Fixes the following W=1 kernel build warning(s):

 drivers/scsi/lpfc/lpfc_hbadisc.c:1505: warning: expecting prototype for lpfc_update_fcf_record(). Prototype was for __lpfc_update_fcf_record() instead

Link: https://lore.kernel.org/r/20210303144631.3175331-28-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Cc: "James E.J. Bottomley" <jejb@linux.ibm.com>
Cc: "Martin K. Petersen" <martin.petersen@oracle.com>
Cc: linux-scsi@vger.kernel.org
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-15 22:14:53 -04:00
James Smart
67073c69c8 scsi: lpfc: Update copyrights for 12.8.0.7 and 12.8.0.8 changes
For the files modified in 2021 via the 12.8.0.7 and 12.8.0.8 patch sets,
update the copyright for 2021.

Link: https://lore.kernel.org/r/20210301171821.3427-23-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:06 -05:00
James Smart
58c36e80ee scsi: lpfc: Fix vport indices in lpfc_find_vport_by_vpid()
Calls to lpfc_find_vport_by_vpid() for the highest indexed vport fail with
error, "2936 Could not find Vport mapped to vpi XXX".  Our vport indices in
the loop and if-clauses were off by one.

Correct the vpid range used for vpi lookup to include the highest possible
vpid.

Link: https://lore.kernel.org/r/20210301171821.3427-3-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-03-04 17:37:03 -05:00
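
The off-by-one fixed in 58c36e80ee is the classic exclusive-versus-inclusive upper bound. The sketch below is not the driver routine: `find_vport_by_vpid`, `vpi_ids`, and `max_vpi` are hypothetical names, and the table layout is an assumption.

```python
def find_vport_by_vpid(vpi_ids, max_vpi, vpid):
    """Return the index whose mapped vpi equals vpid, or None if not found."""
    # The upper bound must be inclusive: range(max_vpi) would stop one short
    # and never examine the highest index.
    for i in range(max_vpi + 1):
        if vpi_ids[i] == vpid:
            return i
    return None
```
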
Dan Carpenter
0be310979e scsi: lpfc: Fix ancient double free
The "pmb" pointer is freed at the start of the function and then freed
again in the error handling code.

Link: https://lore.kernel.org/r/YA6E8rO51hE56SVw@mwanda
Fixes: 92d7f7b0cd ("[SCSI] lpfc: NPIV: add NPIV support on top of SLI-3")
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-26 22:08:57 -05:00
James Smart
a22d73b655 scsi: lpfc: Implement health checking when aborting I/O
Several errors have occurred where the adapter stops or fails but does not
raise the register values for the driver to detect failure. Thus the driver is
unaware of the failure. The failure typically results in I/O timeouts, the
I/O timeout handler failing (after several seconds), and the error handler
escalating recovery policy and resulting in more errors. Eventually, the
driver is in a position where things have spiraled and it can't do recovery
because other recovery ops are still outstanding and it becomes unusable.

Resolve the situation by having the I/O timeout handler (actually an ELS,
SCSI I/O, NVMe ls, or NVMe I/O timeout), in addition to aborting the I/O,
perform a mailbox command and look for a response from the hardware.  If
the mailbox command fails, it will mark the adapter offline and then invoke
the adapter reset handler to clean up.

The new I/O timeout test will be limited to a test every 5s. If there are
multiple I/O timeouts concurrently, only the 1st I/O timeout will generate
the mailbox command. Further testing will only occur once a timeout occurs
after a 5s delay from the last mailbox command has expired.

Link: https://lore.kernel.org/r/20210104180240.46824-14-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:37 -05:00
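
The rate-limited health check in a22d73b655 can be modelled in a few lines. This is a sketch under assumptions: `HealthCheck`, `probe`, and `on_io_timeout()` are hypothetical names, the probe stands in for the mailbox command, and timestamps are passed in rather than read from a clock.

```python
class HealthCheck:
    def __init__(self, probe, interval_s=5.0):
        self.probe = probe            # callable; returns True if the adapter answered
        self.interval_s = interval_s  # minimum spacing between probes
        self.last_probe = None        # time of the most recent probe, if any
        self.offline = False

    def on_io_timeout(self, now):
        """Called from an I/O timeout; returns True if a probe was issued."""
        if self.last_probe is not None and now - self.last_probe < self.interval_s:
            return False              # a recent probe already covers this timeout
        self.last_probe = now
        if not self.probe():
            self.offline = True       # the driver would then invoke its reset handler
        return True
```
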
James Smart
07aaefdf75 scsi: lpfc: Fix crash when a fabric node is released prematurely
The driver's management of the fabric controller (aka pseudo-scsi
initiator) node in SLI3 mode is causing this crash. The crash occurs
because of a node reference imbalance that frees the fabric controller node
while devloss is outstanding from the SCSI transport.  This is triggered by
an odd behavior where the switch reacts to a rejected RDP request with a
PLOGI and nothing else, not even a LOGO.  The driver ACKS the PLOGI and
after successfully registering the RPI, incorrectly registers the fabric
controller node because it has the NLP_FC4_FCP flag still set from the
fabric controller PRLI.  If a LIP is issued, the driver attempts to clean up
on Link Up and ends up executing too many puts.

Fix by detecting the fabric node type and clearing out the node's internal
flags that triggered a SCSI transport registration and subsequent dev_loss
event.  The driver cannot count on any persistence from fabric controller
nodes.

Link: https://lore.kernel.org/r/20210104180240.46824-5-jsmart2021@gmail.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2021-01-07 23:02:35 -05:00
James Smart
09b15e3507 scsi: lpfc: Fix set but unused variables in lpfc_dev_loss_tmo_handler()
Remove set but not used variable shost in lpfc_dev_loss_tmo_handler().

Link: https://lore.kernel.org/r/20201119203353.121866-1-james.smart@broadcom.com
Fixes: 52edb2caf6 ("scsi: lpfc: Remove ndlp when a PLOGI/ADISC/PRLI/REG_RPI ultimately fails")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-19 22:21:04 -05:00
James Smart
4a119d8a4c scsi: lpfc: Fix set but not used warnings from Rework remote port lock handling
Remove local variables that are set but not used.

Link: https://lore.kernel.org/r/20201119203340.121819-1-james.smart@broadcom.com
Fixes: c6adba1501 ("scsi: lpfc: Rework remote port lock handling")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-19 22:20:26 -05:00
James Smart
db7531d2b3 scsi: lpfc: Convert abort handling to SLI-3 and SLI-4 handlers
This patch reworks the abort interfaces such that SLI-3 retains the
iocb-based formatting and completions and SLI-4 now uses native WQEs and
completion routines.

The following changes are made:

 - The code is refactored from a confusing two-routine sequence of
   xx_abort_iotag_issue(), which creates/formats an abort cmd, and
   xx_issue_abort_tag(), which then issues and handles the completion of
   the abort cmd, into a single interface of xx_issue_abort_iotag().  The
   new interface will determine whether SLI-3 or SLI-4 and then call the
   appropriate handler. A completion handler can now be specified to
   address the differences in completion handling.  Note: original code is
   all iocb based, with SLI-4 converting to SLI-3 for the SCSI/ELS path,
   and NVMe natively using wqes.

 - The SLI-3 side is refactored:

   The older iocb-based lpfc_sli_issue_abort_iotag() routine is combined
   with the logic of lpfc_sli_abort_iotag_issue() as well as the
   iocb-specific code in lpfc_abort_handler() and lpfc_sli_abort_iocb() to
   create the new single SLI-3 abort routine that formats and issues the
   iocb.

 - The SLI-4 side is refactored and added to:

   The native WQE abort code in NVMe is moved to the new SLI-4
   issue_abort_iotag() routine. Items in SCSI that set fields not set by
   NVMe are migrated into the new routine. Thus the routine supports NVMe
   and SCSI initiators. The nvmet block (target) formats the abort slightly
   differently (like the old NVMe initiator), thus it has its own prep
   routine stolen from the NVMe initiator and it retains the current code it
   has for issuing the WQE (does not use the commonized routine the
   initiators do). SLI-4 completion handlers were also added.

 - lpfc_abort_handler now becomes a wrapper that determines whether
   SLI-3 or SLI-4 and calls the proper abort handler.

Link: https://lore.kernel.org/r/20201115192646.12977-16-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:56 -05:00
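
The single-entry-point design in db7531d2b3 amounts to a dispatcher in front of two format-specific handlers. The following is only an illustration: the function names, the `sli_rev` argument, and the tuple return values are assumptions and do not reflect the driver's actual signatures.

```python
def _abort_sli3(iotag, cmpl):
    """Hypothetical SLI-3 path: format the abort as an iocb."""
    return ("iocb", iotag, cmpl)

def _abort_sli4(iotag, cmpl):
    """Hypothetical SLI-4 path: format the abort as a native WQE."""
    return ("wqe", iotag, cmpl)

def issue_abort_iotag(sli_rev, iotag, cmpl=None):
    """Single interface: pick the handler from the adapter's SLI revision."""
    if sli_rev == 4:
        return _abort_sli4(iotag, cmpl)
    return _abort_sli3(iotag, cmpl)
```

Callers pass an optional completion handler instead of a ring pointer, so the choice of format stays inside the dispatcher.
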
James Smart
a70e63eee1 scsi: lpfc: Fix NPIV Fabric Node reference counting
While testing initiator-side cable swaps with NPIV, oopses occur.  The
reference counts for the Fabric nodes on the NPIV vports aren't balanced,
resulting in premature node removal.

The following fixes were made:

 - Removed the FC_LBIT check in lpfc_linkup_port. This removed the special
   case that kept vports from cleaning up just like the physical port.

 - Removed the unreg_rpi call in lpfc_cleanup_node. In this section, the
   node is being removed in the context of a reference count release and a
   mailbox command can't be issued at this point.

 - Remove special case handling in the default mailbox completion handler
   that allowed the skipping of a node reference. Now, reference counting
   always requires the removal of the reference.

 - Move the DEVICE_RM event so that it is done during LOGO handling, as
   the driver has additional work to do on the ndlp before puts/releases
   can be performed.

Link: https://lore.kernel.org/r/20201115192646.12977-10-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:55 -05:00
James Smart
b3f2e67cc2 scsi: lpfc: Fix NPIV discovery and Fabric Node detection
While testing NPIV and link bounces, the vport would not show a fabric node
for the F_Port, would not transition into NPR state during a link fault, or
would leave the FDMI node untouched during error injection. The cause was
determined to be an inconsistent manner in which F_Port, Nameserver, and
FDMI controller nodes were created and linked. In some cases, the nodes
would never be unregistered from the transport, leaving references
active. In other cases, the fabric nodes may register with the transport
multiple times while still registered.

The following changes were made:

 - Fix the FDISC issue routine, which starts vport (re)creation, to mark
   the F_Port as a fabric node (NLP_FABRIC) and allow the F_Port node to
   fully be created and show up in the node list.

 - When remote ports are cleaned up on vport termination, clean up the
   nameserver and FDMI controller nodes on the vport so they unregister
   from the transport.

 - On link bounces, don't exclude the NPIV Fabric remote ports from
   transitioning to the NPR state, allowing them to avoid re-registration
   if already registered.

Link: https://lore.kernel.org/r/20201115192646.12977-9-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:55 -05:00
James Smart
52edb2caf6 scsi: lpfc: Remove ndlp when a PLOGI/ADISC/PRLI/REG_RPI ultimately fails
When a PLOGI/ADISC/PRLI/REG_RPI fails, the node remains in the nodelist in
that state.  Although the driver now frees a node when the ref count goes
to zero, in this case the ref cnt doesn't reach zero because there isn't a
mechanism to release the final reference.  Discovery just stops.

Fix by calling the node discovery state machine DEVICE_RM event whenever
one of these commands fails. This will remove the final reference count and
trigger node release.

Link: https://lore.kernel.org/r/20201115192646.12977-7-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:55 -05:00
James Smart
c6adba1501 scsi: lpfc: Rework remote port lock handling
Currently the discovery layers within the driver use the SCSI midlayer
host_lock to access node-specific structures. This can contend with the I/O
path and is too coarse of a lock.

Rework the driver so that it uses a lock specific to the remote port node
structure when accessing the structure contents. A few of the changes
brought out spots where some slightly reorganized routines worked better.

Link: https://lore.kernel.org/r/20201115192646.12977-6-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:54 -05:00
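
The finer-grained locking in c6adba1501 can be pictured as each node guarding its own fields, rather than every node sharing one host-wide lock. A minimal sketch, assuming a hypothetical `RemoteNode` class with an integer `flags` field:

```python
import threading

class RemoteNode:
    def __init__(self):
        self.lock = threading.Lock()  # per-node lock; no shared host-wide lock needed
        self.flags = 0

    def set_flag(self, bit):
        """Set a flag bit while holding only this node's lock."""
        with self.lock:
            self.flags |= bit

    def clear_flag(self, bit):
        """Clear a flag bit while holding only this node's lock."""
        with self.lock:
            self.flags &= ~bit
```

Updates to one node then never wait on updates to another, nor on the I/O path.
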
James Smart
95f0ef8a83 scsi: lpfc: Fix removal of SCSI transport device get and put on dev structure
The lpfc driver is calling get_device and put_device on scsi_fc_transport
device structure. When this code was removed, the driver triggered an oops
in "scsi_is_host_dev" when the first SCSI target was unregistered from the
transport.

The reason the calls were necessary is that the driver is calling
scsi_remove_host too early, before the target rports are unregistered and
the scsi devices disconnected from the scsi_host.  The fc_host was torn
down during fc_remove_host.

Fix by moving the lpfc_pci_remove_one_s3/s4 calls to scsi_remove_host to
after the nodes are cleaned up.  Remove the get_device and put_device calls
and the supporting code.

Link: https://lore.kernel.org/r/20201115192646.12977-4-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:54 -05:00
James Smart
4430f7fd09 scsi: lpfc: Rework locations of ndlp reference taking
Now that the driver has gone to a normal ref interface (with no odd logic),
the discovery logic needs to be reworked so that it properly takes
references when it should and gives them up when it should.

Rework the driver for the following get/put model:

 - Move gets to just before an I/O is issued. Add gets for places where an
   I/O was issued without one.

 - Ensure that failures from lpfc_nlp_get() are handled by the driver.

 - Check and fix the placement of lpfc_nlp_puts relative to io completions.
   Note: some of these paths may not release the reference on the exact io
   completion, as the reference is held while the code takes another step
   in the discovery thread, which may cause another io to be issued.

 - Rearrange some code for error processing and calling lpfc_nlp_put.

 - Fix some places of incorrect reference freeing that were causing the
   premature releasing of the structure.

 - Nvmet plogi handling performs unreg_rpi's. The reference counts were
   unbalanced resulting in premature node removal. In some cases this
   caused loss of node discovery. Corrected the reftaking around nvmet
   plogis.

Nodes that experience devloss now get released from the node list, as
there is proper reference taking.

Link: https://lore.kernel.org/r/20201115192646.12977-3-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:54 -05:00
James Smart
307e338097 scsi: lpfc: Rework remote port ref counting and node freeing
When a remote port is disconnected and disappears, its node structure
(ndlp) stays allocated and on a vport node list. While on the list it can
be matched, thus requires validation checks on state to be added in
numerous code paths. If the node comes back, it's possible for there to be
multiple node structures for the same device on the vport node list. There
is no reason to keep the node structure around after it is no longer in
existence, and the current implementation creates problems for itself
(multiple nodes) and lots of unnecessary code for state validation.

Additionally, the reference taking on the node structure didn't follow the
normal model used by the kernel kref api. It included lots of odd logic to
match state with reference count.  The combination of this odd logic plus
the way it was implicitly used in the discovery engine made its reference
taking implementation suspect and extremely hard to follow.

Change the driver such that the reference taking routines are now normal
ref increments/decrements and callout on refcount=0.

With this in place, the rework can be done such that the node structure is
fully removed and deallocated when the remote port no longer exists and all
references are removed.  This removal logic and the basic ref counting are
intricately tied, and are thus in a single patch.

Link: https://lore.kernel.org/r/20201115192646.12977-2-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-11-17 00:43:54 -05:00
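
The "normal ref increments/decrements and callout on refcount=0" model adopted in 307e338097 can be illustrated as below. This is a generic sketch, not the driver's ndlp code: `RefCounted` and its methods are hypothetical, and refusing a get once the count has reached zero is an assumption of this model.

```python
class RefCounted:
    def __init__(self, release):
        self._refs = 1           # the creator holds the initial reference
        self._release = release  # callout invoked when the count reaches zero

    def get(self):
        """Take a reference; fail if the object is already being released."""
        if self._refs == 0:
            return False
        self._refs += 1
        return True

    def put(self):
        """Drop a reference; return True if this put released the object."""
        assert self._refs > 0, "put without a matching get"
        self._refs -= 1
        if self._refs == 0:
            self._release(self)
            return True
        return False
```
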
James Smart
8aaa7bcf07 scsi: lpfc: Add FDMI Vendor MIB support
Created new attribute lpfc_enable_mi, which by default is enabled.

Add command definition bits for SLI-4 parameters that recognize whether the
adapter has MIB information support and what revision of MIB data.  Using
the adapter information, register vendor-specific MIB support with FDMI.
The registration will be done every link up.

During FDMI registration, a couple of errors were encountered when
reverting to FDMI rev1. Code needed to exit once reverting. Fixed these.

Link: https://lore.kernel.org/r/20201020202719.54726-8-james.smart@broadcom.com
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-10-26 21:42:39 -04:00
James Smart
e7dab164a9 scsi: lpfc: Fix scheduling call while in softirq context in lpfc_unreg_rpi
The following call trace was seen during HBA reset testing:

BUG: scheduling while atomic: swapper/2/0/0x10000100
...
Call Trace:
dump_stack+0x19/0x1b
__schedule_bug+0x64/0x72
__schedule+0x782/0x840
__cond_resched+0x26/0x30
_cond_resched+0x3a/0x50
mempool_alloc+0xa0/0x170
lpfc_unreg_rpi+0x151/0x630 [lpfc]
lpfc_sli_abts_recover_port+0x171/0x190 [lpfc]
lpfc_sli4_abts_err_handler+0xb2/0x1f0 [lpfc]
lpfc_sli4_io_xri_aborted+0x256/0x300 [lpfc]
lpfc_sli4_sp_handle_abort_xri_wcqe.isra.51+0xa3/0x190 [lpfc]
lpfc_sli4_fp_handle_cqe+0x89/0x4d0 [lpfc]
__lpfc_sli4_process_cq+0xdb/0x2e0 [lpfc]
__lpfc_sli4_hba_process_cq+0x41/0x100 [lpfc]
lpfc_cq_poll_hdler+0x1a/0x30 [lpfc]
irq_poll_softirq+0xc7/0x100
__do_softirq+0xf5/0x280
call_softirq+0x1c/0x30
do_softirq+0x65/0xa0
irq_exit+0x105/0x110
do_IRQ+0x56/0xf0
common_interrupt+0x16a/0x16a

The conversion to blk_io_poll for better interrupt latency in normal
cases introduced this code path, executed when I/O aborts or logouts
are seen, which attempts to allocate memory for a mailbox command to be
issued.  The allocation is GFP_KERNEL, thus it could attempt to sleep.

Fix by creating a work element that performs the event handling for the
remote port. This will have the mailbox commands and other items performed
in the work element, not the irq. This is a much better method, as the
"irq" routine does not stall while performing all this deep handling code.

Ensure that allocation failures are handled and send LOGO on failure.

Additionally, enlarge the mailbox memory pool to reduce the possibility of
additional allocation in this path.
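
The deferral described above can be sketched in userspace C. This is only
an illustration of the pattern, not the driver's code: the names
rport_evt, evt_queue, evt_post and evt_drain are hypothetical, and a
fixed-size preallocated queue stands in for the kernel work element.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_EVTS 8

/* Hypothetical event record; the real driver's work element differs. */
struct rport_evt {
	int rpi;                           /* remote port the event refers to */
};

struct evt_queue {
	struct rport_evt slots[MAX_EVTS];  /* preallocated: posting never allocates */
	size_t head, count;
	int handled;                       /* events processed by the worker */
};

/* "Interrupt" side: only records the event. Returns 0, or -1 if full. */
static int evt_post(struct evt_queue *q, int rpi)
{
	assert(q != NULL);
	if (q->count == MAX_EVTS)
		return -1;                 /* caller must handle the failure */
	q->slots[(q->head + q->count) % MAX_EVTS].rpi = rpi;
	q->count++;
	return 0;
}

/* Worker side: runs later, in a context that is allowed to block. */
static int evt_drain(struct evt_queue *q)
{
	int n = 0;

	assert(q != NULL);
	while (q->count > 0) {
		struct rport_evt *e = &q->slots[q->head];

		(void)e->rpi;              /* heavy, possibly sleeping work goes here */
		q->head = (q->head + 1) % MAX_EVTS;
		q->count--;
		q->handled++;
		n++;
	}
	return n;
}
```

Posting only touches preallocated storage, so the posting side never
sleeps; anything that may block is left to evt_drain().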

Link: https://lore.kernel.org/r/20201020202719.54726-3-james.smart@broadcom.com
Fixes: 317aeb83c9 ("scsi: lpfc: Add blk_io_poll support for latency improvment")
Cc: <stable@vger.kernel.org> # v5.9+
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-10-26 21:42:38 -04:00
Linus Torvalds
a1bffa4874 SCSI fixes on 20200926
Three fixes: one in drivers (lpfc) and two for zoned block devices.
 The latter also impinges on the block layer but only to introduce a
 new block API for setting the zone model rather than fiddling with the
 queue directly in the zoned block driver.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCX29mRyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishabnAP48vMYD
 /cjyGAJfq/0k/U/t6pRPc5tUm89LOWcOJz0SjwD/YXcQNz7mx8MxnypAV1jbWXR7
 iyWkPMYVc4EJh7oTARE=
 =SQhI
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "Three fixes: one in drivers (lpfc) and two for zoned block devices.

  The latter also impinges on the block layer but only to introduce a
  new block API for setting the zone model rather than fiddling with the
  queue directly in the zoned block driver"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: sd: sd_zbc: Fix ZBC disk initialization
  scsi: sd: sd_zbc: Fix handling of host-aware ZBC disks
  scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supported
2020-09-26 11:18:37 -07:00
James Smart
7f04839ec4 scsi: lpfc: Fix initial FLOGI failure due to BBSCN not supported
Initial FLOGIs are failing with the following message:

 lpfc 0000:13:00.1: 1:(0):0820 FLOGI Failed (x300). BBCredit Not Supported

In a prior patch, the READ_SPARAM command was re-ordered to post after
CONFIG_LINK as the driver is expected to update the driver's copy of the
service parameters for the FLOGI payload. If the bb-credit recovery feature
is enabled, this is fine. But on adapters were bb-credit recovery isn't
enabled, it would cause the FLOGI to fail.

Fix by restoring the original command order (READ_SPARAM before
CONFIG_LINK), and after issuing CONFIG_LINK, detect bb-credit recovery
support and reissue READ_SPARAM to obtain the updated service parameters
(effectively adding in the command order of the fix).
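
The resulting command order can be modelled as below. This is a sketch of
the sequence only: the enum values and the linkup_sequence() helper are
hypothetical and do not correspond to the driver's mailbox API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the two mailbox commands named above. */
enum mbx_cmd { READ_SPARAM = 1, CONFIG_LINK = 2 };

/*
 * Fill 'out' (room for 3 entries) with the link-up command order and
 * return how many commands were queued.
 */
static size_t linkup_sequence(bool bbcr_supported, enum mbx_cmd out[3])
{
	size_t n = 0;

	assert(out != NULL);
	out[n++] = READ_SPARAM;            /* original order restored */
	out[n++] = CONFIG_LINK;
	if (bbcr_supported)
		out[n++] = READ_SPARAM;    /* re-read the updated parameters */
	return n;
}
```

Without bb-credit recovery the sequence is the original two commands;
with it, a third command re-reads the service parameters.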

[mkp: corrected SHA]

Link: https://lore.kernel.org/r/20200911200147.110826-1-james.smart@broadcom.com
Fixes: 835214f5d5 ("scsi: lpfc: Fix broken Credit Recovery after driver load")
CC: <stable@vger.kernel.org> # v5.7+
Co-developed-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-09-15 18:45:42 -04:00
Gustavo A. R. Silva
df561f6688 treewide: Use fallthrough pseudo-keyword
Replace the existing /* fall through */ comments and their variants with
the new pseudo-keyword macro fallthrough[1]. Also, remove unnecessary
fall-through markings where applicable.

[1] https://www.kernel.org/doc/html/v5.7/process/deprecated.html?highlight=fallthrough#implicit-switch-case-fall-through
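
A minimal userspace sketch of the pseudo-keyword follows. The macro
mirrors the usual attribute-based definition, assuming a compiler that
supports the fallthrough statement attribute (recent GCC or Clang), and
degrades to a no-op otherwise. The stages_from() helper is hypothetical.

```c
#include <assert.h>

/* Assumption: attribute available on recent GCC/Clang; no-op otherwise. */
#if defined(__has_attribute)
# if __has_attribute(__fallthrough__)
#  define fallthrough __attribute__((__fallthrough__))
# endif
#endif
#ifndef fallthrough
# define fallthrough do {} while (0)  /* fallthrough */
#endif

/* Hypothetical helper: count the stages that run when starting at 'stage'. */
static int stages_from(int stage)
{
	int n = 0;

	assert(stage >= 0);
	switch (stage) {
	case 0:
		n++;
		fallthrough;   /* was: fall through comment */
	case 1:
		n++;
		fallthrough;
	case 2:
		n++;
		break;
	default:
		break;
	}
	return n;
}
```

Unlike a comment, the statement form is visible to the compiler, so an
unmarked fall-through can still be reported with -Wimplicit-fallthrough.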

Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>
2020-08-23 17:36:59 -05:00
Lee Jones
e415f2a2ac scsi: lpfc: Fix kerneldoc parameter formatting/misnaming/missing issues
Fixes the following W=1 kernel build warning(s):

 drivers/scsi/lpfc/lpfc_hbadisc.c:1209: warning: Function parameter or member 'phba' not described in 'lpfc_sli4_clear_fcf_rr_bmask'
 drivers/scsi/lpfc/lpfc_hbadisc.c:1309: warning: Function parameter or member 'sw_name' not described in 'lpfc_sw_name_match'
 drivers/scsi/lpfc/lpfc_hbadisc.c:1309: warning: Excess function parameter 'fab_name' description in 'lpfc_sw_name_match'
 drivers/scsi/lpfc/lpfc_hbadisc.c:1397: warning: Function parameter or member 'fcf_rec' not described in 'lpfc_copy_fcf_record'
 drivers/scsi/lpfc/lpfc_hbadisc.c:1397: warning: Excess function parameter 'fcf' description in 'lpfc_copy_fcf_record'
 drivers/scsi/lpfc/lpfc_hbadisc.c:1956: warning: Cannot understand  lpfc_sli4_fcf_record_match - testing new FCF record for matching existing FCF
 on line 1956 - I thought it was a doc line
 drivers/scsi/lpfc/lpfc_hbadisc.c:2078: warning: Function parameter or member 'fcf_index' not described in 'lpfc_sli4_fcf_pri_list_del'
 drivers/scsi/lpfc/lpfc_hbadisc.c:2109: warning: Function parameter or member 'fcf_index' not described in 'lpfc_sli4_set_fcf_flogi_fail'
 drivers/scsi/lpfc/lpfc_hbadisc.c:2135: warning: Function parameter or member 'fcf_index' not described in 'lpfc_sli4_fcf_pri_list_add'
 drivers/scsi/lpfc/lpfc_hbadisc.c:2135: warning: Function parameter or member 'new_fcf_record' not described in 'lpfc_sli4_fcf_pri_list_add'

Link: https://lore.kernel.org/r/20200723122446.1329773-3-lee.jones@linaro.org
Cc: James Smart <james.smart@broadcom.com>
Cc: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: Lee Jones <lee.jones@linaro.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-24 22:31:54 -04:00
Dick Kennedy
372c187b8a scsi: lpfc: Add an internal trace log buffer
The current logging methods typically end up requesting a reproduction with
a different logging level set to figure out what happened. This was mainly
by design to not clutter the kernel log messages with things that were
typically not interesting and the messages themselves could cause other
issues.

When looking to make a better system, it was seen that in many cases the
point at which more data was wanted was when another message, usually at
KERN_ERR level, was logged.  And in most cases, the additional logging that
was then enabled was typical. Most of these areas fell into the discovery
machine.

Based on this summary, the following design has been put in place: The
driver will maintain an internal log (256 elements of 256 bytes).  The
"additional logging" messages that are usually enabled in a reproduction
will be changed to now log all the time to the internal log.  A new logging
level is defined - LOG_TRACE_EVENT.  When this level is set (it is not by
default) and a message marked as KERN_ERR is logged, all the messages in
the internal log will be dumped to the kernel log before the KERN_ERR
message is logged.

There is a timestamp on each message added to the internal log. However,
this timestamp is not converted to wall time when logged. The value of the
timestamp is solely to give a crude time reference for the messages.
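
The design above can be sketched as a small ring buffer. This is an
illustration under assumptions, not the driver's implementation: the
trc_* names are hypothetical, a plain counter stands in for the
timestamp, and the dump_on_err flag stands in for LOG_TRACE_EVENT being
set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define TRC_ENTRIES 256                /* 256 elements ...      */
#define TRC_MSG_LEN 256                /* ... of 256 bytes each */

struct trc_entry {
	unsigned long stamp;           /* crude ordering reference, not wall time */
	char msg[TRC_MSG_LEN];
};

struct trc_log {
	struct trc_entry ring[TRC_ENTRIES];
	unsigned long next;            /* total messages ever added */
	bool dump_on_err;              /* stand-in for LOG_TRACE_EVENT */
};

/* Always record the message internally, overwriting the oldest entry. */
static void trc_add(struct trc_log *t, const char *msg)
{
	struct trc_entry *e;

	assert(t != NULL && msg != NULL);
	e = &t->ring[t->next % TRC_ENTRIES];
	e->stamp = t->next;
	snprintf(e->msg, sizeof(e->msg), "%s", msg);
	t->next++;
}

/* Log an error; dump the buffered trace first when enabled. */
static unsigned long trc_error(struct trc_log *t, const char *msg, FILE *out)
{
	unsigned long dumped = 0;

	assert(t != NULL && msg != NULL && out != NULL);
	if (t->dump_on_err) {
		unsigned long first =
			t->next > TRC_ENTRIES ? t->next - TRC_ENTRIES : 0;

		for (unsigned long i = first; i < t->next; i++, dumped++) {
			const struct trc_entry *e = &t->ring[i % TRC_ENTRIES];

			fprintf(out, "[%lu] %s\n", e->stamp, e->msg);
		}
	}
	fprintf(out, "ERR: %s\n", msg);
	return dumped;                 /* number of trace lines emitted */
}
```

The sketch does not clear the buffer after a dump, so back-to-back
errors print the same history again; whether to reset is a design choice
this description leaves open.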

Link: https://lore.kernel.org/r/20200630215001.70793-14-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-07-02 23:06:49 -04:00
Linus Torvalds
818dbde78e SCSI misc on 20200605
This series consists of the usual driver updates (qla2xxx, ufs, zfcp,
 target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host
 of other minor updates.  There are no major core changes in this
 series apart from a refactoring in scsi_lib.c.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXtq5QyYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishXyGAQCipTWx
 7kHKHZBCVTU133bADt3+SstLrAm8PKZEXMnP9wEAzu4QkkW8URxEDRrpu7qk5gbA
 9M/KyqvfRtTH7+BSK7M=
 =J6aO
 -----END PGP SIGNATURE-----

Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI updates from James Bottomley:
 "This series consists of the usual driver updates (qla2xxx, ufs, zfcp,
  target, scsi_debug, lpfc, qedi, qedf, hisi_sas, mpt3sas) plus a host
  of other minor updates.

  There are no major core changes in this series apart from a
  refactoring in scsi_lib.c"

* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (207 commits)
  scsi: ufs: ti-j721e-ufs: Fix unwinding of pm_runtime changes
  scsi: cxgb3i: Fix some leaks in init_act_open()
  scsi: ibmvscsi: Make some functions static
  scsi: iscsi: Fix deadlock on recovery path during GFP_IO reclaim
  scsi: ufs: Fix WriteBooster flush during runtime suspend
  scsi: ufs: Fix index of attributes query for WriteBooster feature
  scsi: ufs: Allow WriteBooster on UFS 2.2 devices
  scsi: ufs: Remove unnecessary memset for dev_info
  scsi: ufs-qcom: Fix scheduling while atomic issue
  scsi: mpt3sas: Fix reply queue count in non RDPQ mode
  scsi: lpfc: Fix lpfc_nodelist leak when processing unsolicited event
  scsi: target: tcmu: Fix a use after free in tcmu_check_expired_queue_cmd()
  scsi: vhost: Notify TCM about the maximum sg entries supported per command
  scsi: qla2xxx: Remove return value from qla_nvme_ls()
  scsi: qla2xxx: Remove an unused function
  scsi: iscsi: Register sysfs for iscsi workqueue
  scsi: scsi_debug: Parser tables and code interaction
  scsi: core: Refactor scsi_mq_setup_tags function
  scsi: core: Fix incorrect usage of shost_for_each_device
  scsi: qla2xxx: Fix endianness annotations in source files
  ...
2020-06-05 15:11:50 -07:00
James Smart
4c2805aab5 lpfc: nvmet: Add support for NVME LS request hosthandle
As the nvmet layer does not have the concept of a remoteport object, which
can be used to identify the entity on the other end of the fabric that is
to receive an LS, the hosthandle was introduced.  The driver passes the
hosthandle, a value representative of the remote port, with an LS request
receive. The LS request will create the association.  The transport will
remember the hosthandle for the association, and if there is a need to
initiate an LS request to the remote port for the association, the
hosthandle will be used. When the driver loses connectivity with the
remote port, it needs to notify the transport that the hosthandle is no
longer valid, allowing the transport to terminate associations related to
the hosthandle.

This patch adds support to the driver for the hosthandle. The driver will
use the ndlp pointer of the remote port for the hosthandle in calls to
nvmet_fc_rcv_ls_req().  The discovery engine is updated to invalidate the
hosthandle whenever connectivity with the remote port is lost.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:34 -06:00
James Smart
2a1160a03a lpfc: Refactor lpfc nvme headers
A lot of files in lpfc include nvme headers, building up relationships that
require a file to change for its headers when there is no other change
necessary. It would be better to localize the nvme headers.

There is also no need for separate nvme (initiator) and nvmet (tgt)
header files.

Refactor the inclusion of nvme headers so that all nvme items are
included by lpfc_nvme.h

Merge lpfc_nvmet.h into lpfc_nvme.h so that there is a single header used
by both the nvme and nvmet sides. This prepares for structure sharing
between the two roles. Prep to add shared function prototypes for upcoming
shared routines.

Signed-off-by: Paul Ely <paul.ely@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2020-05-09 16:18:34 -06:00
Dick Kennedy
88acb4d9ff scsi: lpfc: Remove unnecessary lockdep_assert_held calls
In an audit of lockdep calls in the driver, there are multiple lockdep
checks in successive calling layers. E.g. a routine checks, and then calls
a lower routine that also checks, and so on. Calling sequences result in
many redundant checks.

Refine the code to remove lower-level lockdep checks.  Update comments on
the lock, correcting a few places where lock object in comment was
incorrect.

Link: https://lore.kernel.org/r/20200501214310.91713-7-jsmart2021@gmail.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-05-07 22:47:24 -04:00
James Smart
df3fe76658 scsi: lpfc: add RDF registration and Link Integrity FPIN logging
This patch modifies lpfc to register for Link Integrity events via the use
of an RDF ELS and to perform Link Integrity FPIN logging.

Specifically, the driver was modified to:

 - Format and issue the RDF ELS immediately following SCR registration.
   This registers the ability of the driver to receive FPIN ELS.

 - Add decoding of the FPIN ELS into the received descriptors, with
   logging of the Link Integrity event information. After decoding, the ELS
   is delivered to the scsi fc transport to be delivered to any user-space
   applications.

 - To aid in logging, simple helpers were added to create enum to name
   string lookup functions that utilize the initialization helpers from the
   fc_els.h header.

 - Note: base header definitions for the ELS's don't populate the
   descriptor payloads. As such, lpfc creates its own version of the
   structures, using the base definitions (mostly headers) and additionally
   declaring the descriptors that will complete the population of the ELS.

Link: https://lore.kernel.org/r/20200210173155.547-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-18 00:08:38 -05:00
James Smart
145e5a8a5c scsi: lpfc: Copyright updates for 12.6.0.4 patches
Update copyrights to 2020 for files modified in the 12.6.0.4 patch set.

Link: https://lore.kernel.org/r/20200128002312.16346-13-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-10 22:46:56 -05:00
James Smart
835214f5d5 scsi: lpfc: Fix broken Credit Recovery after driver load
When driver is set to enable bb credit recovery, the switch displayed the
setting as inactive.  If the link bounces, it switches to Active.

During link up processing, the driver currently does a MBX_READ_SPARAM
followed by a MBX_CONFIG_LINK. These mbox commands are queued to be
executed, one at a time and the completion is processed by the worker
thread.  Since the MBX_READ_SPARAM is done BEFORE the MBX_CONFIG_LINK, the
BB_SC_N bit is never set in the returned values. BB Credit recovery status
only gets set after the driver requests the feature in CONFIG_LINK, which
is done after the link up. Thus the ordering of READ_SPARAM needs to follow
the CONFIG_LINK.

Fix by reordering so that READ_SPARAM is done after CONFIG_LINK.  Added a
HBA_DEFER_FLOGI flag so that any FLOGI handling waits until after the
READ_SPARAM is done so that the proper BB credit value is set in the FLOGI
payload.

Fixes: 6bfb162082 ("scsi: lpfc: Fix configuration of BB credit recovery in service parameters")
Cc: <stable@vger.kernel.org> # v5.4+
Link: https://lore.kernel.org/r/20200128002312.16346-4-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2020-02-10 22:46:55 -05:00
James Smart
e3ba04c9ba scsi: lpfc: Fix Fabric hostname registration if system hostname changes
There are reports of multiple ports on the same system displaying different
hostnames in fabric FDMI displays.

Currently, the driver registers the hostname at initialization and obtains
the hostname via init_utsname()->nodename queried at the time the FC link
comes up. Unfortunately, if the machine hostname is updated after
initialization, such as via DHCP or admin command, the value registered
initially will be incorrect.

Fix by having the driver save the hostname that was registered with FDMI.
The driver then runs a heartbeat action that will check the hostname.  If
the name changes, reregister the FDMI data.

The hostname is used in RSNN_NN, FDMI RPA and FDMI RHBA.
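
The save-and-compare step can be sketched as follows. The fdmi_state
structure and fdmi_heartbeat() are hypothetical names; the current
hostname is passed in as a parameter rather than read from
init_utsname()->nodename, and the 65-byte buffer is an assumption based
on the Linux utsname field size.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define HOSTNAME_MAX 65   /* assumption: Linux utsname field size */

/* Hypothetical per-port record of what was last registered with FDMI. */
struct fdmi_state {
	char registered_name[HOSTNAME_MAX];
	unsigned int reregistrations;
};

/* Periodic check; returns true when a re-registration was triggered. */
static bool fdmi_heartbeat(struct fdmi_state *s, const char *current_name)
{
	assert(s != NULL && current_name != NULL);
	if (strncmp(s->registered_name, current_name, HOSTNAME_MAX) == 0)
		return false;                  /* unchanged: nothing to do */

	/* Remember the new name, then re-register (counted here only). */
	snprintf(s->registered_name, sizeof(s->registered_name), "%s",
		 current_name);
	s->reregistrations++;
	return true;
}
```

Because the saved copy is updated in the same step, a later heartbeat
with the same name does not trigger another registration.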

Link: https://lore.kernel.org/r/20191218235808.31922-5-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-12-21 13:42:42 -05:00
James Smart
b9da814cd5 scsi: lpfc: Clarify FAWNN error message
Current message on FAWWN events is rather cryptic.

Expand the message to clarify its meaning.

Link: https://lore.kernel.org/r/20191105005708.7399-8-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-06 00:04:04 -05:00
James Smart
6bfb162082 scsi: lpfc: Fix configuration of BB credit recovery in service parameters
The driver today is reading service parameters from the firmware and then
overwriting the firmware-provided values with values of its own.  There are
some switch features that require preliminary FLOGI's that are
switch-specific and done prior to the actual fabric FLOGI for traffic.  The
fw will perform those FLOGIs and will revise the service parameters for the
features configured. As the driver later overwrites those values with its
own values, it misconfigures things like BBSCN use by doing so.

Correct by eliminating the driver-overwrite of firmware values. The driver
correctly re-reads the service parameters after each link up to obtain the
latest values from firmware.

Link: https://lore.kernel.org/r/20191105005708.7399-3-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-11-06 00:04:03 -05:00
James Smart
b4b3417cf6 scsi: lpfc: Add additional discovery log messages
When debugging a recent discovery customer problem it was very hard to tell
what was happening with the existing discovery log messages. To fully debug
the issue additional log messages were necessary.

Add or extend log messages so that sufficient information is present for
debugging.

Link: https://lore.kernel.org/r/20191018211832.7917-16-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-10-24 21:02:06 -04:00
James Smart
15498dc1a5 scsi: lpfc: Fix list corruption in lpfc_sli_get_iocbq
After study, it was determined there was a double free of a CT iocb during
execution of lpfc_offline_prep and lpfc_offline.  The prep routine issued
an abort for some CT iocbs, but the aborts did not complete fast enough for
a subsequent routine that waits for completion. Thus the driver proceeded
to lpfc_offline, which releases any pending iocbs. Unfortunately, the
completions for the aborts were then received which re-released the ct
iocbs.

It turns out the reason the aborts didn't complete fast enough was not
their time on the wire/in the adapter. It was the lpfc_work_done routine,
which requires the adapter state to be UP before it calls
lpfc_sli_handle_slow_ring_event() to process the completions. The issue is
that the prep routine takes the link down as part of its processing.

To fix, the following was performed:

 - Prevent the offline routine from releasing iocbs that have had aborts
   issued on them. Defer to the abort completions. Also means the driver
   fully waits for the completions.  Given this change, the recognition of
   "driver-generated" status which then releases the iocb is no longer
   valid. As such, this patch reverts the changes made in
   commit 296012285c ("scsi: lpfc: Fix leak of ELS completions on adapter reset").

 - Modify lpfc_work_done to allow slow path completions so that the abort
   completions aren't ignored.

 - Updated the fdmi path to recognize a CT request that fails due to the
   port being unusable. This stops FDMI retries. FDMI will be restarted on
   next link up.

Link: https://lore.kernel.org/r/20190922035906.10977-14-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30 22:07:10 -04:00
James Smart
3f97aed611 scsi: lpfc: Fix discovery failures when target device connectivity bounces
An issue was seen discovering all SCSI Luns when a target device undergoes
link bounce.

The driver currently does not qualify the FC4 support on the target.
Therefore it will send a SCSI PRLI and an NVMe PRLI. The expectation is
that the target will reject the PRLI if it is not supported. If a PRLI
times out, the driver will retry. The driver will not proceed with the
device until both SCSI and NVMe PRLIs are resolved.  In the failure case,
the device is FCP only and does not respond to the NVMe PRLI, thus
initiating the wait/retry loop in the driver.  During that time, a RSCN is
received (device bounced) causing the driver to issue a GID_FT.  The GID_FT
response comes back before the PRLI mess is resolved and it prematurely
cancels the PRLI retry logic and leaves the device in a STE_PRLI_ISSUE
state. Discovery with the target never completes or resets.

Fix by resetting the node state back to STE_NPR_NODE when GID_FT completes,
thereby restarting the discovery process for the node.

Link: https://lore.kernel.org/r/20190922035906.10977-10-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30 22:07:09 -04:00
James Smart
0f154226d6 scsi: lpfc: Fix device recovery errors after PLOGI failures
When target-side fault injections are made, the driver isn't reconnecting
to the remote port. The driver is logging "2753" error messages which
state:

"PLOGI failure DID:1B2400 Status:x3/xf0240008"

The failure status indicates an Illegal field error, which points to
the Temporary RPI field being used for the ELS. This error typically means
the driver used an RPI that was already registered (shouldn't be registered
if using it in this context).

Study has found that if the driver were in discovery attempts and
encountered an error, it wouldn't flag the temporary rpi in error.  Yet the
rpi was released for reallocation in these error paths and another ELS
could allocate the rpi. In the failure situation a retry was done on an ELS
that had encountered an error, and as the rpi wasn't marked in error, the
ELS reused the rpi it originally allocated. But that rpi had been allocated
by a different ELS issued after the original error and before the retry
attempt. The different ELS had succeeded and the RPI was registered.

Fix by marking the rpi state for the node to be in error, i.e. as needing
reallocation, upon an error in the els processing.  Error state marking is
always done prior to release back to the internal rpi free list, which the
driver wasn't doing in cases prior.

Also enhanced some of the logging to help in the next case of problem
troubleshooting.

Link: https://lore.kernel.org/r/20190922035906.10977-7-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30 22:07:09 -04:00
James Smart
97acd0019d scsi: lpfc: Fix rpi release when deleting vport
A prior use-after-free mailbox fix solved its problem by null'ing a ndlp
pointer.  However, further testing has shown that this change causes a
later state change to occasionally be skipped, which results in a reference
count never being decremented thus the rpi is never released, which causes
a vport delete to never succeed.

Revise the fix in the prior patch to no longer null the ndlp. Instead the
RELEASE_RPI flag is set which will drive the release of the rpi.

Given the new code was added at a deep indentation level, refactor the code
block using a new routine that avoids the indentation issues.

Fixes: 9b16406864 ("scsi: lpfc: Fix use-after-free mailbox cmd completion")
Link: https://lore.kernel.org/r/20190922035906.10977-6-jsmart2021@gmail.com
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-30 22:07:09 -04:00
Sakari Ailus
2d44d165e9 scsi: lpfc: Convert existing %pf users to %ps
Convert the remaining %pf users to %ps to prepare for the removal of the
old %pf conversion specifier support.

Fixes: 3235066449 ("scsi: lpfc: Migrate to %px and %pf in kernel print calls")
Link: https://lore.kernel.org/r/20190904160423.3865-1-sakari.ailus@linux.intel.com
Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Reviewed-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
2019-09-07 16:26:40 -04:00