In a test with high nvme remote port counts connected via a multi-hop FC
switch config where switches were systematically reset (e.g. fabric
partitioning and re-establishment), the nvme remote ports would switch
addresses based on the switch reconfiguration events. The driver would get
into a situation where the nvme port changed address, PLOGI and PRLI would
succeed nvme transport registration occurred, but subsequent LS requests by
the nvme subsystem failed due to a bad ndlp state and connectivity to the
device failed.
The driver hit a race condition on multiple devices that address swapped
simultaneously. In cases where the driver notices the remote port structure
came back as the same value as previously (meaning a nvme_rport structure
was re-enabled and did not go through devloss_tmo/connect_tmo_failures on
all controllers) the driver would unconditionally exit assuming the ndlp
information was correct. But, if the ndlp's had been swapped, the ndlp had
stale port state information, which when used by the LS request commands,
would fail the commands.
Fix by checking whether a node swap had occurred, and only exit if no ndlp
swap had occurred.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In situations where zoning is not being used, thus NVMe initiators see
other NVMe initiators as well as NVMe targets, a link bounce on an
initiator will cause the NVMe initiators to spew "6169" State Error
messages.
The driver is not qualifying whether the remote port is a NVMe targer or
not before calling the lpfc_nvme_rescan_port(), which validates the role
and prints the message if its only an NVMe initiator.
Fix by the following:
- Before calling lpfc_nvme_rescan_port() ensure that the node is a NVMe
storage target or a NVMe discovery controller.
- Clean up implementation of lpfc_nvme_rescan_port. remoteport pointer
will always be NULL if a NVMe initiator only. But, grabbing of
remoteport pointer should be done under lock to coincide with the
registering of the remote port with the fc transport.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
lpfc_nvme_register_port hit a null prev_ndlp pointer in a test with lots of
target ports swapping addresses. The oldport value was stale, thus it's
ndlp (prev_ndlp set to it) was used.
Fix by moving oldrport pointer checks, and if used prev_ndlp pointer
assignment, to be done while the lock is held.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly update of the usual drivers: qla2xxx, hpsa, lpfc, ufs,
mpt3sas, ibmvscsi, megaraid_sas, bnx2fc and hisi_sas as well as the
removal of the osst driver (I heard from Willem privately that he
would like the driver removed because all his test hardware has
failed). Plus number of minor changes, spelling fixes and other
trivia.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXSTl4yYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishdcxAQDCJVbd
fPUX76/V1ldupunF97+3DTharxxbst+VnkOnCwD8D4c0KFFFOI9+F36cnMGCPegE
fjy17dQLvsJ4GsidHy8=
=aS5B
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: qla2xxx, hpsa, lpfc, ufs,
mpt3sas, ibmvscsi, megaraid_sas, bnx2fc and hisi_sas as well as the
removal of the osst driver (I heard from Willem privately that he
would like the driver removed because all his test hardware has
failed). Plus number of minor changes, spelling fixes and other
trivia.
The big merge conflict this time around is the SPDX licence tags.
Following discussion on linux-next, we believe our version to be more
accurate than the one in the tree, so the resolution is to take our
version for all the SPDX conflicts"
Note on the SPDX license tag conversion conflicts: the SCSI tree had
done its own SPDX conversion, which in some cases conflicted with the
treewide ones done by Thomas & co.
In almost all cases, the conflicts were purely syntactic: the SCSI tree
used the old-style SPDX tags ("GPL-2.0" and "GPL-2.0+") while the
treewide conversion had used the new-style ones ("GPL-2.0-only" and
"GPL-2.0-or-later").
In these cases I picked the new-style one.
In a few cases, the SPDX conversion was actually different, though. As
explained by James above, and in more detail in a pre-pull-request
thread:
"The other problem is actually substantive: In the libsas code Luben
Tuikov originally specified gpl 2.0 only by dint of stating:
* This file is licensed under GPLv2.
In all the libsas files, but then muddied the water by quoting GPLv2
verbatim (which includes the or later than language). So for these
files Christoph did the conversion to v2 only SPDX tags and Thomas
converted to v2 or later tags"
So in those cases, where the spdx tag substantially mattered, I took the
SCSI tree conversion of it, but then also took the opportunity to turn
the old-style "GPL-2.0" into a new-style "GPL-2.0-only" tag.
Similarly, when there were whitespace differences or other differences
to the comments around the copyright notices, I took the version from
the SCSI tree as being the more specific conversion.
Finally, in the spdx conversions that had no conflicts (because the
treewide ones hadn't been done for those files), I just took the SCSI
tree version as-is, even if it was old-style. The old-style conversions
are perfectly valid, even if the "-only" and "-or-later" versions are
perhaps more descriptive.
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (185 commits)
scsi: qla2xxx: move IO flush to the front of NVME rport unregistration
scsi: qla2xxx: Fix NVME cmd and LS cmd timeout race condition
scsi: qla2xxx: on session delete, return nvme cmd
scsi: qla2xxx: Fix kernel crash after disconnecting NVMe devices
scsi: megaraid_sas: Update driver version to 07.710.06.00-rc1
scsi: megaraid_sas: Introduce various Aero performance modes
scsi: megaraid_sas: Use high IOPS queues based on IO workload
scsi: megaraid_sas: Set affinity for high IOPS reply queues
scsi: megaraid_sas: Enable coalescing for high IOPS queues
scsi: megaraid_sas: Add support for High IOPS queues
scsi: megaraid_sas: Add support for MPI toolbox commands
scsi: megaraid_sas: Offload Aero RAID5/6 division calculations to driver
scsi: megaraid_sas: RAID1 PCI bandwidth limit algorithm is applicable for only Ventura
scsi: megaraid_sas: megaraid_sas: Add check for count returned by HOST_DEVICE_LIST DCMD
scsi: megaraid_sas: Handle sequence JBOD map failure at driver level
scsi: megaraid_sas: Don't send FPIO to RL Bypass queue
scsi: megaraid_sas: In probe context, retry IOC INIT once if firmware is in fault
scsi: megaraid_sas: Release Mutex lock before OCR in case of DCMD timeout
scsi: megaraid_sas: Call disable_irq from process IRQ poll
scsi: megaraid_sas: Remove few debug counters from IO path
...
This patch updates RSCN receive processing to check for the remote
port being an NVME port, and if so, invoke the nvme_fc callback to
rescan the remote port. The rescan will generate a discovery udev
event.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Reviewed-by: Arun Easi <aeasi@marvell.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Revise a stalled adapter message to also include the number of jobs that
are stalling the thread.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch avoids that the following compiler warning is reported with
CONFIG_NVME_FC=n:
drivers/scsi/lpfc/lpfc_nvme.c:2140:1: warning: 'lpfc_nvme_lport_unreg_wait' defined but not used [-Wunused-function]
lpfc_nvme_lport_unreg_wait(struct lpfc_vport *vport,
^~~~~~~~~~~~~~~~~~~~~~~~~~
Fixes: 3999df75bc ("scsi: lpfc: Declare local functions static")
Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch avoids that a kernel warning appears when smp_processor_id() is
called with preempt debugging enabled.
Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch avoids that the compiler warns about missing fall-through
annotation when building with W=1.
Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This patch avoids that the compiler complains about missing declarations
when building with W=1.
Cc: James Smart <james.smart@broadcom.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
clang -Wuninitialized incorrectly sees a variable being used without
initialization:
drivers/scsi/lpfc/lpfc_nvme.c:2102:37: error: variable 'localport' is uninitialized when used here
[-Werror,-Wuninitialized]
lport = (struct lpfc_nvme_lport *)localport->private;
^~~~~~~~~
drivers/scsi/lpfc/lpfc_nvme.c:2059:38: note: initialize the variable 'localport' to silence this warning
struct nvme_fc_local_port *localport;
^
= NULL
1 error generated.
This is clearly in dead code, as the condition leading up to it is always
false when CONFIG_NVME_FC is disabled, and the variable is always
initialized when nvme_fc_register_localport() got called successfully.
Change the preprocessor conditional to the equivalent C construct, which
makes the code more readable and gets rid of the warning.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Update the 6072 log message string to print the whole 32 bits of the
extended status.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is the final round of mostly small fixes and performance
improvements to our initial submit. The main regression fix is the
ia64 simscsi build failure which was missed in the serial number
elimination conversion.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXIxBayYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pisherpAP4rxLpX
bcUnQnEsvoxys/JyoK08Qfv1JebZo1B2MAZ62wD/VZ7LpOuzVLhsM2KhLFGRrs1/
7D2K4tgtO2dQsFix7H0=
=pcHl
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull more SCSI updates from James Bottomley:
"This is the final round of mostly small fixes and performance
improvements to our initial submit.
The main regression fix is the ia64 simscsi build failure which was
missed in the serial number elimination conversion"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (24 commits)
scsi: ia64: simscsi: use request tag instead of serial_number
scsi: aacraid: Fix performance issue on logical drives
scsi: lpfc: Fix error codes in lpfc_sli4_pci_mem_setup()
scsi: libiscsi: Hold back_lock when calling iscsi_complete_task
scsi: hisi_sas: Change SERDES_CFG init value to increase reliability of HiLink
scsi: hisi_sas: Send HARD RESET to clear the previous affiliation of STP target port
scsi: hisi_sas: Set PHY linkrate when disconnected
scsi: hisi_sas: print PHY RX errors count for later revision of v3 hw
scsi: hisi_sas: Fix a timeout race of driver internal and SMP IO
scsi: hisi_sas: Change return variable type in phy_up_v3_hw()
scsi: qla2xxx: check for kstrtol() failure
scsi: lpfc: fix 32-bit format string warning
scsi: lpfc: fix unused variable warning
scsi: target: tcmu: Switch to bitmap_zalloc()
scsi: libiscsi: fall back to sendmsg for slab pages
scsi: qla2xxx: avoid printf format warning
scsi: lpfc: resolve static checker warning in lpfc_sli4_hba_unset
scsi: lpfc: Correct __lpfc_sli_issue_iocb_s4 lockdep check
scsi: ufs: hisi: fix ufs_hba_variant_ops passing
scsi: qla2xxx: Fix panic in qla_dfs_tgt_counters_show
...
This is mostly update of the usual drivers: arcmsr, qla2xxx, lpfc,
hisi_sas, target/iscsi and target/core. Additionally Christoph
refactored gdth as part of the dma changes. The major mid-layer
change this time is the removal of bidi commands and with them the
whole of the osd/exofs driver and filesystem.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCXIC54SYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishT1GAPwJEV23
ExPiPsnuVgKj49nLTagZ3rILRQcYNbL+MNYqxQEA0cT8FHzSDBfWY5OKPNE+RQ8z
f69LpXGmMpuagKGvvd4=
=Fhy1
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly update of the usual drivers: arcmsr, qla2xxx, lpfc,
hisi_sas, target/iscsi and target/core.
Additionally Christoph refactored gdth as part of the dma changes. The
major mid-layer change this time is the removal of bidi commands and
with them the whole of the osd/exofs driver and filesystem. This is a
major simplification for block and mq in particular"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (240 commits)
scsi: cxgb4i: validate tcp sequence number only if chip version <= T5
scsi: cxgb4i: get pf number from lldi->pf
scsi: core: replace GFP_ATOMIC with GFP_KERNEL in scsi_scan.c
scsi: mpt3sas: Add missing breaks in switch statements
scsi: aacraid: Fix missing break in switch statement
scsi: kill command serial number
scsi: csiostor: drop serial_number usage
scsi: mvumi: use request tag instead of serial_number
scsi: dpt_i2o: remove serial number usage
scsi: st: osst: Remove negative constant left-shifts
scsi: ufs-bsg: Allow reading descriptors
scsi: ufs: Allow reading descriptor via raw upiu
scsi: ufs-bsg: Change the calling convention for write descriptor
scsi: ufs: Remove unused device quirks
Revert "scsi: ufs: disable vccq if it's not needed by UFS device"
scsi: megaraid_sas: Remove a bunch of set but not used variables
scsi: clean obsolete return values of eh_timed_out
scsi: sd: Optimal I/O size should be a multiple of physical block size
scsi: MAINTAINERS: SCSI initiator and target tweaks
scsi: fcoe: make use of fip_mode enum complete
...
The newly introduced 'cpu' variable is only used inside of an optional
block, so we get a warning without CONFIG_SCSI_LPFC_DEBUG_FS:
drivers/scsi/lpfc/lpfc_nvme.c: In function 'lpfc_nvme_io_cmd_wqe_cmpl':
drivers/scsi/lpfc/lpfc_nvme.c:968:30: error: unused variable 'cpu' [-Werror=unused-variable]
uint32_t code, status, idx, cpu;
Move the declaration into the same block to avoid the warning.
Fixes: 63df6d637e ("scsi: lpfc: Adapt cpucheck debugfs logic to Hardware Queues")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
For files modified as part of 12.2.0.0 patches, update copyright to 2019
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
A scsi host lock is taken on every io completion to check whether the abort
handler is waiting on the io completion. This is an expensive lock to take
on all completion when rarely in an abort condition.
Replace scsi host lock with command-specific lock. Synchronize completion
and abort paths by new cmd lock. Ensure all flag changing and nulling of
context pointers taken under lock. When adding lock to task management
abort, realized it was missing other synchronization locks. Added that
synchronization to match normal paths.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
So far MSIX vector allocation assumed it would be 1:1 with hardware
queues. However, there are several reasons why fewer MSIX vectors may be
allocated than hardware queues such as the platform being out of vectors or
adapter limits being less than cpu count.
This patch reworks the MSIX/EQ relationships with the per-cpu hardware
queues so they can function independently. MSIX vectors will be equitably
split been cpu sockets/cores and then the per-cpu hardware queues will be
mapped to the vectors most efficient for them.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Default behavior is to use the information from the upper IO stacks to
select the hardware queue to use for IO submission. Which typically has
good cpu affinity.
However, the driver, when used on some variants of the upstream kernel, has
found queuing information to be suboptimal for FCP or IO completion locked
on particular cpus.
For command submission situations, the lpfc_fcp_io_sched module parameter
can be set to specify a hardware queue selection policy that overrides the
os stack information.
For IO completion situations, rather than queing cq processing based on the
cpu servicing the interrupting event, schedule the cq processing on the cpu
associated with the hardware queue's cq.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The XRI get/put lists were partitioned per hardware queue. However, the
adapter rarely had sufficient resources to give a large number of resources
per queue. As such, it became common for a cpu to encounter a lack of XRI
resource and request the upper io stack to retry after returning a BUSY
condition. This occurred even though other cpus were idle and not using
their resources.
Create as efficient a scheme as possible to move resources to the cpus that
need them. Each cpu maintains a small private pool which it allocates from
for io. There is a watermark that the cpu attempts to keep in the private
pool. The private pool, when empty, pulls from a global pool from the
cpu. When the cpu's global pool is empty it will pull from other cpu's
global pool. As there many cpu global pools (1 per cpu or hardware queue
count) and as each cpu selects what cpu to pull from at different rates and
at different times, it creates a radomizing effect that minimizes the
number of cpu's that will contend with each other when the steal XRI's from
another cpu's global pool.
On io completion, a cpu will push the XRI back on to its private pool. A
watermark level is maintained for the private pool such that when it is
exceeded it will move XRI's to the CPU global pool so that other cpu's may
allocate them.
On NVME, as heartbeat commands are critical to get placed on the wire, a
single expedite pool is maintained. When a heartbeat is to be sent, it will
allocate an XRI from the expedite pool rather than the normal cpu
private/global pools. On any io completion, if a reduction in the expedite
pools is seen, it will be replenished before the XRI is placed on the cpu
private pool.
Statistics are added to aid understanding the XRI levels on each cpu and
their behaviors.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
SLI4 nvme functions are passing the SLI3 ring number when posting wqe to
hardware. This should be indicating the hardware queue to use, not the ring
number.
Replace ring number with the hardware queue that should be used.
Note: SCSI avoided this issue as it utilized an older lfpc_issue_iocb
routine that properly adapts.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Many io statistics were being sampled and saved using adapter-based data
structures. This was creating a lot of contention and cache thrashing in
the I/O path.
Move the statistics to the hardware queue data structures. Given the
per-queue data structures, use of atomic types is lessened.
Add new sysfs and debugfs stat routines to collate the per hardware queue
values and report at an adapter level.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Similar to the io execution path that reports cpu context information, the
debugfs routines for cpu information needs to be aligned with new hardware
queue implementation.
Convert debugfs cnd nvme cpucheck statistics to report information per
Hardware Queue.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Once the IO buff allocations were made shared, there was a single XRI
buffer list shared by all hardware queues. A single list isn't great for
performance when shared across the per-cpu hardware queues.
Create a separate XRI IO buffer get/put list for each Hardware Queue. As
SGLs and associated IO buffers get allocated/posted to the firmware; round
robin their assignment across all available hardware Queues so that there
is an equitable assignment.
Modify SCSI and NVME IO submit code paths to use the Hardware Queue logic
for XRI allocation.
Add a debugfs interface to display hardware queue statistics
Added new empty_io_bufs counter to track if a cpu runs out of XRIs.
Replace common_ variables/names with io_ to make meanings clearer.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, both nvme and fcp each have their own concept of an io_channel,
which is a combination wq/cq and associated msix. Different cpus would
share an io_channel.
The driver is now moving to per-cpu wq/cq pairs and msix vectors. The
driver will still use separate wq/cq pairs per protocol on each cpu, but
the protocols will share the msix vector.
Given the elimination of the nvme and fcp io channels, the module
parameters will be removed. A new parameter, lpfc_hdw_queue is added which
allows the wq/cq pair allocation per cpu to be overridden and allocated to
lesser value. If lpfc_hdw_queue is zero, the number of pairs allocated will
be based on the number of cpus. If non-zero, the parameter specifies the
number of queues to allocate. At this time, the maximum non-zero value is
64.
To manage this new paradigm, a new hardware queue structure is created to
track queue activity and relationships.
As MSIX vector allocation must be known before setting up the
relationships, msix allocation now occurs before queue datastructures are
allocated. If the number of vectors allocated is less than the desired
hardware queues, the hardware queue counts will be reduced to the number of
vectors
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently, both NVME and SCSI get their IO buffers from separate
pools. XRI's are associated 1:1 with IO buffers, so XRI's are also split
between protocols.
Eliminate the independent pools and use a single pool. Each buffer
structure now has a common section and a protocol section. Per protocol
routines for SGL initialization are removed and replaced by common
routines. Initialization of the buffers is only done on the common area.
All other fields, which are protocol specific, are initialized when the
buffer is allocated for use in the per-protocol allocation routine.
In the past, the SCSI side allocated IO buffers as part of slave_alloc
calls until the maximum XRIs for SCSI was reached. As all XRIs are now
common and may be used for either protocol, allocation for everything is
done as part of adapter initialization and the scsi side has no action in
slave alloc.
As XRI's are no longer split, the lpfc_xri_split module parameter is
removed.
Adapters based on SLI3 will continue to use the older scsi_buf_list_get/put
routines. All SLI4 adapters utilize the new IO buffer scheme
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
lpfc_nvme_prep_io_cmd() checks for null pnode, but caller
lpfc_nvme_fcp_io_submit() has already ensured it's non-null.
Remove the pnode null check.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
An hba-wide lock is taken in the nvme io completion routine. The lock
covers null'ing of the nrport pointer in the cmd structure.
The nrport member isn't necessary. After extracting the pointer from the
command, the pointer was dereferenced to get the fc discovery node
pointer. But the fc discovery node pointer is alrady in the command
structure so the dereferrence was unnecessary.
Eliminated the nrport structure member and its use, which also eliminates
the port-wide lock.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We cannot wait on a completion object in the lpfc_nvme_lport structure in
the _destroy_localport() code path because the NVMe/fc transport will free
that structure immediately after the .localport_delete() callback. This
results in a use-after-free, and a hang if slub_debug=FZPU is enabled.
Fix this by putting the completion on the stack.
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Driver is setting bits in word 10 of the SLI4 ABORT WQE (the wqid). The
field was a carry over from a prior SLI revision. The field does not exist
in SLI4, and the action may result in an overlap with future definition of
the WQE.
Remove the setting of WQID in the ABORT WQE.
Also cleaned up WQE field settings - initialize to zero, don't bother to
set fields to zero.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
This is mostly updates of the usual drivers: UFS, esp_scsi, NCR5380,
qla2xxx, lpfc, libsas, hisi_sas. In addition there's a set of mostly
small updates to the target subsystem a set of conversions to the
generic DMA API, which do have some potential for issues in the older
drivers but we'll handle those as case by case fixes. A new myrs for
the DAC960/mylex raid controllers to replace the block based DAC960
which is also being removed by Jens in this merge window. Plus the
usual slew of trivial changes.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCW9BQJSYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishU3MAP41T8yW
UJQDCprj65pCR+9mOUWzgMvgAW/15ouK89x/7AD/XAEQZqoAgpFUbgnoZWGddZkS
LykIzSiLHP4qeDOh1TQ=
=2JMU
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"This is mostly updates of the usual drivers: UFS, esp_scsi, NCR5380,
qla2xxx, lpfc, libsas, hisi_sas.
In addition there's a set of mostly small updates to the target
subsystem a set of conversions to the generic DMA API, which do have
some potential for issues in the older drivers but we'll handle those
as case by case fixes.
A new myrs driver for the DAC960/mylex raid controllers to replace the
block based DAC960 which is also being removed by Jens in this merge
window.
Plus the usual slew of trivial changes"
[ "myrs" stands for "MYlex Raid Scsi". Obviously. Silly of me to even
wonder. There's also a "myrb" driver, where the 'b' stands for
'block'. Truly, somebody has got mad naming skillz. - Linus ]
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (237 commits)
scsi: myrs: Fix the processor absent message in processor_show()
scsi: myrs: Fix a logical vs bitwise bug
scsi: hisi_sas: Fix NULL pointer dereference
scsi: myrs: fix build failure on 32 bit
scsi: fnic: replace gross legacy tag hack with blk-mq hack
scsi: mesh: switch to generic DMA API
scsi: ips: switch to generic DMA API
scsi: smartpqi: fully convert to the generic DMA API
scsi: vmw_pscsi: switch to generic DMA API
scsi: snic: switch to generic DMA API
scsi: qla4xxx: fully convert to the generic DMA API
scsi: qla2xxx: fully convert to the generic DMA API
scsi: qla1280: switch to generic DMA API
scsi: qedi: fully convert to the generic DMA API
scsi: qedf: fully convert to the generic DMA API
scsi: pm8001: switch to generic DMA API
scsi: nsp32: switch to generic DMA API
scsi: mvsas: fully convert to the generic DMA API
scsi: mvumi: switch to generic DMA API
scsi: mpt3sas: switch to generic DMA API
...
The driver currently uses the ndlp to get the local rport which is then used
to get the nvme transport remoteport pointer. There can be cases where a stale
remoteport pointer is obtained as synchronization isn't done through the
different dereferences.
Correct by using locks to synchronize the dereferences.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fixes gcc '-Wunused-but-set-variable' warning:
drivers/scsi/lpfc/lpfc_nvme.c: In function 'lpfc_new_nvme_buf':
drivers/scsi/lpfc/lpfc_nvme.c:2238:24: warning:
variable 'sgl_size' set but not used [-Wunused-but-set-variable]
int bcnt, num_posted, sgl_size;
^
Signed-off-by: YueHaibing <yuehaibing@huawei.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Message 6408 is displayed for each entry in an array, but the cpu and queue
numbers were incorrect for the entry. Message 6001 includes an extraneous
character.
Resolve both issues
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The driver allocates a sg list per io struture based on a fixed maximum
size. When it registers with the protocol transports and indicates the max sg
list size it supports, the driver manipulates the fixed value to report a
lesser amount so that it has reserved space for sg elements that are used for
DIF.
The driver initialization path sets the cfg_sg_seg_cnt field to the
manipulated value for scsi. NVME initialization ran afterward and capped it's
maximum by the manipulated value for SCSI. This erroneously made NVME report
the SCSI-reduce-for-DIF value that reduced the max io size for nvme and wasted
sg elements.
Rework the driver so that cfg_sg_seg_cnt becomes the overall maximum size and
allow the max size to be tunable. A separate (new) scsi sg count is then
setup with the scsi-modified reduced value. NVME then initializes based off
the overall maximum.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Performance is affected when target queue depth is tracked. An atomic
counter is incremented on the submission path which competes with it being
decremented on the completion path. In addition, multiple CPUs can
simultaniously be manipulating this counter for the same ndlp.
Reduce the overhead by only performing the target increment/decrement when
the target queue depth is less than the overall adapter depth, thus is
actually meaningful.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During remote port loss fault testing, the driver crashed with the
following trace:
general protection fault: 0000 [#1] SMP
RIP: ... lpfc_nvme_register_port+0x250/0x480 [lpfc]
Call Trace:
lpfc_nlp_state_cleanup+0x1b3/0x7a0 [lpfc]
lpfc_nlp_set_state+0xa6/0x1d0 [lpfc]
lpfc_cmpl_prli_prli_issue+0x213/0x440
lpfc_disc_state_machine+0x7e/0x1e0 [lpfc]
lpfc_cmpl_els_prli+0x18a/0x200 [lpfc]
lpfc_sli_sp_handle_rspiocb+0x3b5/0x6f0 [lpfc]
lpfc_sli_handle_slow_ring_event_s4+0x161/0x240 [lpfc]
lpfc_work_done+0x948/0x14c0 [lpfc]
lpfc_do_work+0x16f/0x180 [lpfc]
kthread+0xc9/0xe0
ret_from_fork+0x55/0x80
After registering a new remoteport, the driver is pulling an ndlp pointer
from the lpfc rport associated with the private area of a newly registered
remoteport. The private area is uninitialized, so it's garbage.
Correct by pulling the the lpfc rport pointer from the entering ndlp point,
then ndlp value from at rport. Note the entering ndlp may be replacing by
the rport->ndlp due to an address change swap.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
The PBDE optimizations aren't supported in all firmware revs.
Make optimizations configurable in case there's a side effect on old
firmware.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
System crashes when the lpfc module is unloaded after making the port
offline
The nvme queue pointers were freed during port offline, but were later
accessed in pci remove path.
Validate the pointers in pci remove path before accessing them.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
modprobe -r lpfc produces the following:
Call Trace:
__blk_mq_run_hw_queue+0xa2/0xb0
__blk_mq_delay_run_hw_queue+0x9d/0xb0
? blk_mq_hctx_has_pending+0x32/0x80
blk_mq_run_hw_queue+0x50/0xd0
blk_mq_sched_insert_request+0x110/0x1b0
blk_execute_rq_nowait+0x76/0x180
nvme_keep_alive_work+0x8a/0xd0 [nvme_core]
process_one_work+0x17f/0x440
worker_thread+0x126/0x3c0
? manage_workers.isra.24+0x2a0/0x2a0
kthread+0xd1/0xe0
? insert_kthread_work+0x40/0x40
ret_from_fork_nospec_begin+0x21/0x21
? insert_kthread_work+0x40/0x40
However, rmmod lpfc would run correctly.
When an nvme remoteport is unregistered with the host nvme transport, it
needs to set the remoteport->dev_loss_tmo value 0 to indicate an immediate
termination of device loss and prevent any further keep alives to that
rport. The driver was never setting dev_loss_tmo causing the nvme
transport to continue to send the keep alive.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Under large configurations, the driver would start to log message 6065 -
NVME out of buffers (exchanges).
The driver is using the ndlp cmd_qdepth value when determining the max
outstanding ios for an adapter. This value, by default, is set to 65536,
which exceeds the maximum exchange counts supported on an adapter. The ndlp
cmd_qdepth has no relevance and outstanding io count should be capped at
the max exchange count with IO requests beyond that level getting bounced
back with an EBUSY status so that they are retried by the block layer.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix small formatting and wording nits in Broadcom copyright header
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Fix up log messages and add an fcp error stat counter in the IO submit
code path to make diagnosing problems easier
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
I/O submission paths in the lpfc nvme path are rejecting the io with an
error code that reflects back to the callee as a hard io failure. Many
of these conditions are transient and would likely resolve if retried.
Correct by returning -EBUSY, which the FC transport triggers off of to
return busy status codes to the blk-mq layer.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Points referencing local port structures didn't accommodate cases where
the localport may not be registered yet.
Add NULL pointer checks to logic.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
On tests adding and removing a remote port, calls to nvme_info would
eventually show fewer target ports discovered than were present in the
san. Additionally, the following error messages were seen:
6031 RemotePort Registration failed err: -116, DID x471301
There is a race condition that exists between the driver and the nvme
transport on remote port unregister vs the confirmed deletion. It's
possible that the driver may rediscover the remote port and reregister
the remote port before a prior unregister delete callback was made (as
it rebinded to the prior remoteport structure). However, the driver was
coded to expect the callback before seeing the remote port again thus a
new registration. The logic results in the driver having an invalid
remoteport pointer set.
Correct by tracking when waiting for the delete callback. In cases where
the ndlp remoteport pointer is updated, it is only cleared when the wait
has not been superceded by a prior registration.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
During target-side port faults, the driver would not recover all target
port logins. This resulted in a loss of nvme device discovery.
The driver is coded to wait for all GID_FT requests to complete before
restarting discovery. A fault is seen where the outstanding GIT_FT
counts are not properly decremented, thus discovery would never
start. Another fault was found in the clearing of the gidft_inp counter
that would be skipped in this condition. And a third fault found with
lpfc_nvme_register_port that would remove a reverence on the ndlp which
then allows a node swap on a port address change to prematurely remove
the reference and release the ndlp.
The following changes are made:
- Correct the decrementing of the outstanding GID_FT counters.
- In RSCN handling, no longer zero the counter before calling to issue
another GID_FT.
- No longer remove the reference on the dlp when the ndlp->nrport value
is not yet null.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After making remoteport unregister requests, the ndlp nrport pointer was
stale.
Track when waiting for waiting for unregister completion callback and
adjust nldp pointer assignment. Add a few safety checks for NULL
pointer values.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When debugging various issues, per IO channel IO statistics were useful
to understand what was happening. However, many of the stats were on a
port basis rather than an io channel basis.
Move statistics to an io channel basis.
Signed-off-by: Dick Kennedy <dick.kennedy@broadcom.com>
Signed-off-by: James Smart <james.smart@broadcom.com>
Reviewed-by: Hannes Reinecke <hare@suse.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
There are several unions that are local to the source and do not need to
be in global scope, so make them static. Also add in a missing void
parameter to functions lpfc_nvme_cmd_template and
lpfc_nvmet_cmd_template to clean up non-ANSI warning.
Cleans up sparse warnings:
drivers/scsi/lpfc/lpfc_nvme.c:68:19: warning: symbol
'lpfc_iread_cmd_template' was not declared. Should it be static?
drivers/scsi/lpfc/lpfc_nvme.c:69:19: warning: symbol
'lpfc_iwrite_cmd_template' was not declared. Should it be static?
drivers/scsi/lpfc/lpfc_nvme.c:70:19: warning: symbol
'lpfc_icmnd_cmd_template' was not declared. Should it be static?
drivers/scsi/lpfc/lpfc_nvme.c:74:24: warning: non-ANSI function
'lpfc_tsend_cmd_template' was not declared. Should it be static?
drivers/scsi/lpfc/lpfc_nvmet.c:78:19: warning: symbol
'lpfc_treceive_cmd_template' was not declared. Should it be static?
drivers/scsi/lpfc/lpfc_nvmet.c:79:19: warning: symbol
'lpfc_trsp_cmd_template' was not declared. Should it be static?
drivers/scsi/lpfc/lpfc_nvmet.c:83:25: warning: non-ANSI function
declaration of function 'lpfc_nvmet_cmd_template'
Signed-off-by: Colin Ian King <colin.king@canonical.com>
Acked-by: James Smart <james.smart@broadcom.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>