Current code uses the specified ring buffer size (either the default of 128
Kbytes or a module parameter specified value) to encompass the one page
ring buffer header plus the actual ring itself. When the page size is 4K,
carving off one page for the header isn't significant. But when the page
size is 64K on ARM64, only half of the default 128 Kbytes is left for the
actual ring. While this doesn't break anything, the smaller ring size
could be a performance bottleneck.
Fix this by applying the VMBUS_RING_SIZE macro to the specified ring buffer
size. This macro adds a page for the header, and rounds up the size to a
page boundary, using the page size for which the kernel is built. Use this
new size for subsequent ring buffer calculations. For example, on ARM64
with 64K page size and the default ring size, this results in the actual
ring being 128 Kbytes, which is intended.
Cc: stable@vger.kernel.org # 5.15.x
Signed-off-by: Michael Kelley <mhklinux@outlook.com>
Link: https://lore.kernel.org/r/20240122170956.496436-1-mhklinux@outlook.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Inside scsi_eh_wakeup(), scsi_host_busy() is called & checked with host
lock every time for deciding if error handler kthread needs to be waken up.
This can be too heavy in case of recovery, such as:
- N hardware queues
- queue depth is M for each hardware queue
- each scsi_host_busy() iterates over (N * M) tag/requests
If recovery is triggered in case that all requests are in-flight, each
scsi_eh_wakeup() is strictly serialized, when scsi_eh_wakeup() is called
for the last in-flight request, scsi_host_busy() has been run for (N * M -
1) times, and request has been iterated for (N*M - 1) * (N * M) times.
If both N and M are big enough, hard lockup can be triggered on acquiring
host lock, and it is observed on mpi3mr(128 hw queues, queue depth 8169).
Fix the issue by calling scsi_host_busy() outside the host lock. We don't
need the host lock for getting busy count because host the lock never
covers that.
[mkp: Drop unnecessary 'busy' variables pointed out by Bart]
Cc: Ewan Milne <emilne@redhat.com>
Fixes: 6eb045e092 ("scsi: core: avoid host-wide host_busy counter for scsi_mq")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Link: https://lore.kernel.org/r/20240112070000.4161982-1-ming.lei@redhat.com
Reviewed-by: Ewan D. Milne <emilne@redhat.com>
Reviewed-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
Tested-by: Sathya Prakash Veerichetty <safhya.prakash@broadcom.com>
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Final round of fixes that came in too late to send in the first
request. It's 9 bug fixes and one version update (because of a bug
fix) and one set of PCI ID additions. There's one bug fix in the core
which is really a one liner (except that an additional sdev pointer
was added for convenience) and the rest are in drivers.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZavlICYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishSIpAP9Ynm64
sxlwxmaRv9uzf6+Nbadesve0UWCPHHYjIxtNBwD/SV6wgn/WdJqO1rONR8w3RknL
ZFShKw5QmiKXVDmDJTg=
=WSYQ
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"Final round of fixes that came in too late to send in the first
request.
It's nine bug fixes and one version update (because of a bug fix) and
one set of PCI ID additions. There's one bug fix in the core which is
really a one liner (except that an additional sdev pointer was added
for convenience) and the rest are in drivers"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
scsi: target: core: Add TMF to tmr_list handling
scsi: core: Kick the requeue list after inserting when flushing
scsi: fnic: unlock on error path in fnic_queuecommand()
scsi: fcoe: Fix unsigned comparison with zero in store_ctlr_mode()
scsi: mpi3mr: Fix mpi3mr_fw.c kernel-doc warnings
scsi: smartpqi: Bump driver version to 2.1.26-030
scsi: smartpqi: Fix logical volume rescan race condition
scsi: smartpqi: Add new controller PCI IDs
scsi: ufs: qcom: Remove unnecessary goto statement from ufs_qcom_config_esi()
scsi: ufs: core: Remove the ufshcd_hba_exit() call from ufshcd_async_scan()
scsi: ufs: core: Simplify power management during async scan
vdpa/mlx5: support for resumable vqs
virtio_scsi: mq_poll support
3virtio_pmem: support SHMEM_REGION
virtio_balloon: stay awake while adjusting balloon
virtio: support for no-reset virtio PCI PM
Fixes, cleanups.
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
-----BEGIN PGP SIGNATURE-----
iQFDBAABCAAtFiEEXQn9CHHI+FuUyooNKB8NuNKNVGkFAmWmgP8PHG1zdEByZWRo
YXQuY29tAAoJECgfDbjSjVRpcfgH/0RD2S+NFY0ZEJz8BuI6GjykzYnyRW9iyxcw
epTLjPUcoEBttlw8TA+3PiPoNIJGfuU8Q4iKXJ8Jzql081tP9G1UxTIbj0v3Hx+q
0L2DUXfdAMYMLo5WQVl/PADV/10xLgExEh9jMqpU3IJIxVaLE/knD9ghRCDvDbs/
fOo3sSUGaNsSHYZs5bH73Q7cRKKmTLO+MzvHBbavFfz2fQ1b3vwecmJuQtAtK0JC
6JxH6Y38VfOl8jA6IHeEpGIHeF661HABkDDUh4UVEGOeyBl4E6ZcG4fjWSMinZ08
U3TbQLYOq10i8ki2LJKgoZHRv1HkxbM1Ogn0bsIh1hish8dPORM=
=RWjR
-----END PGP SIGNATURE-----
Merge tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost
Pull virtio updates from Michael Tsirkin:
- vdpa/mlx5: support for resumable vqs
- virtio_scsi: mq_poll support
- 3virtio_pmem: support SHMEM_REGION
- virtio_balloon: stay awake while adjusting balloon
- virtio: support for no-reset virtio PCI PM
- Fixes, cleanups
* tag 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mst/vhost:
vdpa/mlx5: Add mkey leak detection
vdpa/mlx5: Introduce reference counting to mrs
vdpa/mlx5: Use vq suspend/resume during .set_map
vdpa/mlx5: Mark vq state for modification in hw vq
vdpa/mlx5: Mark vq addrs for modification in hw vq
vdpa/mlx5: Introduce per vq and device resume
vdpa/mlx5: Allow modifying multiple vq fields in one modify command
vdpa/mlx5: Expose resumable vq capability
vdpa: Block vq property changes in DRIVER_OK
vdpa: Track device suspended state
scsi: virtio_scsi: Add mq_poll support
virtio_pmem: support feature SHMEM_REGION
virtio_balloon: stay awake while adjusting balloon
vdpa: Remove usage of the deprecated ida_simple_xx() API
virtio: Add support for no-reset virtio PCI PM
virtio_net: fix missing dma unmap for resize
vhost-vdpa: account iommu allocations
vdpa: Fix an error handling path in eni_vdpa_probe()
- Allow kernel trace instance creation to specify what events are created
Inside the kernel, a subsystem may create a tracing instance that it can
use to send events to user space. This sub-system may not care about the
thousands of events that exist in eventfs. Allow the sub-system to specify
what sub-systems of events it cares about, and only those events are exposed
to this instance.
- Allow the ring buffer to be broken up into bigger sub-buffers than just the
architecture page size. A new tracefs file called "buffer_subbuf_size_kb"
is created. The user can now specify a minimum size the sub-buffer may be
in kilobytes. Note, that the implementation currently make the sub-buffer
size a power of 2 pages (1, 2, 4, 8, 16, ...) but the user only writes in
kilobyte size, and the sub-buffer will be updated to the next size that
it will can accommodate it. If the user writes in 10, it will change the
size to be 4 pages on x86 (16K), as that is the next available size that
can hold 10K pages.
- Update the debug output when a corrupt time is detected in the ring buffer.
If the ring buffer detects inconsistent timestamps, there's a debug config
options that will dump the contents of the meta data of the sub-buffer that
is used for debugging. Add some more information to this dump that helps
with debugging.
- Add more timestamp debugging checks (only triggers when the config is enabled)
- Increase the trace_seq iterator to 2 page sizes.
- Allow strings written into tracefs_marker to be larger. Up to just under
2 page sizes (based on what trace_seq can hold).
- Increase the trace_maker_raw write to be as big as a sub-buffer can hold.
- Remove 32 bit time stamp logic, now that the rb_time_cmpxchg() has been
removed.
- More selftests were added.
- Some code clean ups as well.
-----BEGIN PGP SIGNATURE-----
iIoEABYIADIWIQRRSw7ePDh/lE+zeZMp5XQQmuv6qgUCZZ8p3BQccm9zdGVkdEBn
b29kbWlzLm9yZwAKCRAp5XQQmuv6ql2GAQDZg/zlFEiJHyTfWbCIE8pA3T5xbzKo
26TNxIZAxJJZpQEAvGFU5Smy14pG6soEoVMp8B6ZOANbqU8VVamhOL+r+Qw=
=0OYG
-----END PGP SIGNATURE-----
Merge tag 'trace-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace
Pull tracing updates from Steven Rostedt:
- Allow kernel trace instance creation to specify what events are
created
Inside the kernel, a subsystem may create a tracing instance that it
can use to send events to user space. This sub-system may not care
about the thousands of events that exist in eventfs. Allow the
sub-system to specify what sub-systems of events it cares about, and
only those events are exposed to this instance.
- Allow the ring buffer to be broken up into bigger sub-buffers than
just the architecture page size.
A new tracefs file called "buffer_subbuf_size_kb" is created. The
user can now specify a minimum size the sub-buffer may be in
kilobytes. Note, that the implementation currently make the
sub-buffer size a power of 2 pages (1, 2, 4, 8, 16, ...) but the user
only writes in kilobyte size, and the sub-buffer will be updated to
the next size that it will can accommodate it. If the user writes in
10, it will change the size to be 4 pages on x86 (16K), as that is
the next available size that can hold 10K pages.
- Update the debug output when a corrupt time is detected in the ring
buffer. If the ring buffer detects inconsistent timestamps, there's a
debug config options that will dump the contents of the meta data of
the sub-buffer that is used for debugging. Add some more information
to this dump that helps with debugging.
- Add more timestamp debugging checks (only triggers when the config is
enabled)
- Increase the trace_seq iterator to 2 page sizes.
- Allow strings written into tracefs_marker to be larger. Up to just
under 2 page sizes (based on what trace_seq can hold).
- Increase the trace_maker_raw write to be as big as a sub-buffer can
hold.
- Remove 32 bit time stamp logic, now that the rb_time_cmpxchg() has
been removed.
- More selftests were added.
- Some code clean ups as well.
* tag 'trace-v6.8' of git://git.kernel.org/pub/scm/linux/kernel/git/trace/linux-trace: (29 commits)
ring-buffer: Remove stale comment from ring_buffer_size()
tracing histograms: Simplify parse_actions() function
tracing/selftests: Remove exec permissions from trace_marker.tc test
ring-buffer: Use subbuf_order for buffer page masking
tracing: Update subbuffer with kilobytes not page order
ringbuffer/selftest: Add basic selftest to test changing subbuf order
ring-buffer: Add documentation on the buffer_subbuf_order file
ring-buffer: Just update the subbuffers when changing their allocation order
ring-buffer: Keep the same size when updating the order
tracing: Stop the tracing while changing the ring buffer subbuf size
tracing: Update snapshot order along with main buffer order
ring-buffer: Make sure the spare sub buffer used for reads has same size
ring-buffer: Do no swap cpu buffers if order is different
ring-buffer: Clear pages on error in ring_buffer_subbuf_order_set() failure
ring-buffer: Read and write to ring buffers with custom sub buffer size
ring-buffer: Set new size of the ring buffer sub page
ring-buffer: Add interface for configuring trace sub buffer size
ring-buffer: Page size per ring buffer
ring-buffer: Have ring_buffer_print_page_header() be able to access ring_buffer_iter
ring-buffer: Check if absolute timestamp goes backwards
...
-----BEGIN PGP SIGNATURE-----
iQJIBAABCgAyFiEEgMe7l+5h9hnxdsnuWYigwDrT+vwFAmWldYsUHGJoZWxnYWFz
QGdvb2dsZS5jb20ACgkQWYigwDrT+vyxUhAAs2ctoK/sMAfTOO2b1UAD/ig7CGGz
DlDt38RezFU4uqeY0Ix4heFs3RIt8YGuns76Fejfyevh1I7SOA9lbhFuMLBfO9j0
LU+KuZeGoXtIe5Kd6hCQIUgVvwISs407yp7JUUzqxFQ2rv7bin64xiDb407ZQGaK
5v4oRsnQn1KBhgZ2wfQ/S+adAma9IroK9F3C/Bm+IJ+mpNxJcbWPqnf9+5ExoxzU
MFyu0azan1crqWA/geJBetL4zVoRJx4qNEve0gqwk06vwLeIKyzB2jPO5dmn9pAb
kfAFCQgtTUGZHvZWyBZMWQcMKEQLSupOLYXU4b2Vf+oR9U0jvevqs3LArBsUceM9
vQw8Vg9RZiWs9lVeVYSQErYQecMhdiHYCXFuteaNH9tvATN4PumXiT2ZM9OsX6uy
jrXW7YLawJbGLIDNsAyrn8JESzY/CsRPpCIUq3JzL2VQdInC3mEl18rTEuKTBeZF
zE/RgwudhWDT58/vceS2LHa5KNd/vAzMTmUHEUwHg1N7TV3qkSgpPaVcvx4KklXv
1nKT2KcfD5K1Yy/InjxUYdGhRPYa7azl+l7W4hJ+NCGxwL+tUCg3knp80+empTJ0
mZm6/VSbc245nKjx3ydLlTbQ/xNMQXgHHDKPW6eO4ezZaydJZG2xkK3x6eF1+i0k
PWHSLjUxrK1AGrg=
=ri0M
-----END PGP SIGNATURE-----
Merge tag 'pci-v6.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci
Pull pci updates from Bjorn Helgaas:
"Enumeration:
- Reserve ECAM so we don't assign it to PCI BARs; this works around
bugs where BIOS included ECAM in a PNP0A03 host bridge window,
didn't reserve it via a PNP0C02 motherboard device, and didn't
allocate space for SR-IOV VF BARs (Bjorn Helgaas)
- Add MMCONFIG/ECAM debug logging (Bjorn Helgaas)
- Rename 'MMCONFIG' to 'ECAM' to match spec usage (Bjorn Helgaas)
- Log device type (Root Port, Switch Port, etc) during enumeration
(Bjorn Helgaas)
- Log bridges before downstream devices so the dmesg order is more
logical (Bjorn Helgaas)
- Log resource names (BAR 0, VF BAR 0, bridge window, etc)
consistently instead of a mix of names and "reg 0x10" (Puranjay
Mohan, Bjorn Helgaas)
- Fix 64GT/s effective data rate calculation to use 1b/1b encoding
rather than the 8b/10b or 128b/130b used by lower rates (Ilpo
Järvinen)
- Use PCI_HEADER_TYPE_* instead of literals in x86, powerpc, SCSI
lpfc (Ilpo Järvinen)
- Clean up open-coded PCIBIOS return code mangling (Ilpo Järvinen)
Resource management:
- Restructure pci_dev_for_each_resource() to avoid computing the
address of an out-of-bounds array element (the bounds check was
performed later so the element was never actually *read*, but it's
nicer to avoid even computing an out-of-bounds address) (Andy
Shevchenko)
Driver binding:
- Convert pci-host-common.c platform .remove() callback to
.remove_new() returning 'void' since it's not useful to return
error codes here (Uwe Kleine-König)
- Convert exynos, keystone, kirin from .remove() to .remove_new(),
which returns void instead of int (Uwe Kleine-König)
- Drop unused struct pci_driver.node member (Mathias Krause)
Virtualization:
- Add ACS quirk for more Zhaoxin Root Ports (LeoLiuoc)
Error handling:
- Log AER errors as "Correctable" (not "Corrected") or
"Uncorrectable" to match spec terminology (Bjorn Helgaas)
- Decode Requester ID when no error info found instead of printing
the raw hex value (Bjorn Helgaas)
Endpoint framework:
- Use a unique test pattern for each BAR in the pci_endpoint_test to
make it easier to debug address translation issues (Niklas Cassel)
Broadcom STB PCIe controller driver:
- Add DT property "brcm,clkreq-mode" and driver support for different
CLKREQ# modes to make ASPM L1.x states possible (Jim Quinlan)
Freescale Layerscape PCIe controller driver:
- Add suspend/resume support for Layerscape LS1043a and LS1021a,
including software-managed PME_Turn_Off and transitions between L0,
L2/L3_Ready Link states (Frank Li)
MediaTek PCIe controller driver:
- Clear MSI interrupt status before handler to avoid missing MSIs
that occur after the handler (qizhong cheng)
MediaTek PCIe Gen3 controller driver:
- Update mediatek-gen3 translation window setup to handle MMIO space
that is not a power of two in size (Jianjun Wang)
Qualcomm PCIe controller driver:
- Increase qcom iommu-map maxItems to accommodate SDX55 (five
entries) and SDM845 (sixteen entries) (Krzysztof Kozlowski)
- Describe qcom,pcie-sc8180x clocks and resets accurately (Krzysztof
Kozlowski)
- Describe qcom,pcie-sm8150 clocks and resets accurately (Krzysztof
Kozlowski)
- Correct the qcom "reset-name" property, previously incorrectly
called "reset-names" (Krzysztof Kozlowski)
- Document qcom,pcie-sm8650, based on qcom,pcie-sm8550 (Neil
Armstrong)
Renesas R-Car PCIe controller driver:
- Replace of_device.h with explicit of.h include to untangle header
usage (Rob Herring)
- Add DT and driver support for optional miniPCIe 1.5v and 3.3v
regulators on KingFisher (Wolfram Sang)
SiFive FU740 PCIe controller driver:
- Convert fu740 CONFIG_PCIE_FU740 dependency from SOC_SIFIVE to
ARCH_SIFIVE (Conor Dooley)
Synopsys DesignWare PCIe controller driver:
- Align iATU mapping for endpoint MSI-X (Niklas Cassel)
- Drop "host_" prefix from struct dw_pcie_host_ops members (Yoshihiro
Shimoda)
- Drop "ep_" prefix from struct dw_pcie_ep_ops members (Yoshihiro
Shimoda)
- Rename struct dw_pcie_ep_ops.func_conf_select() to
.get_dbi_offset() to be more descriptive (Yoshihiro Shimoda)
- Add Endpoint DBI accessors to encapsulate offset lookups (Yoshihiro
Shimoda)
TI J721E PCIe driver:
- Add j721e DT and driver support for 'num-lanes' for devices that
support x1, x2, or x4 Links (Matt Ranostay)
- Add j721e DT compatible strings and driver support for j784s4 (Matt
Ranostay)
- Make TI J721E Kconfig depend on ARCH_K3 since the hardware is
specific to those TI SoC parts (Peter Robinson)
TI Keystone PCIe controller driver:
- Hold power management references to all PHYs while enabling them to
avoid a race when one provides clocks to others (Siddharth
Vadapalli)
Xilinx XDMA PCIe controller driver:
- Remove redundant dev_err(), since platform_get_irq() and
platform_get_irq_byname() already log errors (Yang Li)
- Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq()
(Krzysztof Wilczyński)
- Fix xilinx_pl_dma_pcie_init_irq_domain() error return when
irq_domain_add_linear() fails (Harshit Mogalapalli)
MicroSemi Switchtec management driver:
- Do dma_mrpc cleanup during switchtec_pci_remove() to match its devm
ioremapping in switchtec_pci_probe(). Previously the cleanup was
done in stdev_release(), which used stale pointers if stdev->cdev
happened to be open when the PCI device was removed (Daniel
Stodden)
Miscellaneous:
- Convert interrupt terminology from "legacy" to "INTx" to be more
specific and match spec terminology (Damien Le Moal)
- In dw-xdata-pcie, pci_endpoint_test, and vmd, replace usage of
deprecated ida_simple_*() API with ida_alloc() and ida_free()
(Christophe JAILLET)"
* tag 'pci-v6.8-changes' of git://git.kernel.org/pub/scm/linux/kernel/git/pci/pci: (97 commits)
PCI: Fix kernel-doc issues
PCI: brcmstb: Configure HW CLKREQ# mode appropriate for downstream device
dt-bindings: PCI: brcmstb: Add property "brcm,clkreq-mode"
PCI: mediatek-gen3: Fix translation window size calculation
PCI: mediatek: Clear interrupt status before dispatching handler
PCI: keystone: Fix race condition when initializing PHYs
PCI: xilinx-xdma: Fix error code in xilinx_pl_dma_pcie_init_irq_domain()
PCI: xilinx-xdma: Fix uninitialized symbols in xilinx_pl_dma_pcie_setup_irq()
PCI: rcar-gen4: Fix -Wvoid-pointer-to-enum-cast error
PCI: iproc: Fix -Wvoid-pointer-to-enum-cast warning
PCI: dwc: Add dw_pcie_ep_{read,write}_dbi[2] helpers
PCI: dwc: Rename .func_conf_select to .get_dbi_offset in struct dw_pcie_ep_ops
PCI: dwc: Rename .ep_init to .init in struct dw_pcie_ep_ops
PCI: dwc: Drop host prefix from struct dw_pcie_host_ops members
misc: pci_endpoint_test: Use a unique test pattern for each BAR
PCI: j721e: Make TI J721E depend on ARCH_K3
PCI: j721e: Add TI J784S4 PCIe configuration
PCI/AER: Use explicit register sizes for struct members
PCI/AER: Decode Requester ID when no error info found
PCI/AER: Use 'Correctable' and 'Uncorrectable' spec terms for errors
...
The variable 'rb' is being assigned a value but it isn't being read
afterwards. The assignment is redundant and so 'rb' can be removed.
Cleans up clang scan build warning:
warning: Although the value stored to 'rb' is used in the enclosing
expression, the value is never actually read from 'rb'[deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Link: https://lore.kernel.org/r/20240116112606.2263738-1-colin.i.king@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
virtqueue_enable_cb() will call virtqueue_poll() which will check if queue
is broken at beginning, so remove the virtqueue_is_broken() call
Signed-off-by: Li RongQing <lirongqing@baidu.com>
Link: https://lore.kernel.org/r/20240116045836.12475-1-lirongqing@baidu.com
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Acked-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Clang static complains that Value stored to 'status' is never read. Return
'status' rather than 'SCI_SUCCESS'.
Fixes: f1f52e7593 ("isci: uplevel request infrastructure")
Signed-off-by: Su Hui <suhui@nfschina.com>
Link: https://lore.kernel.org/r/20240112041926.3924315-1-suhui@nfschina.com
Reviewed-by: Artur Paszkiewicz <artur.paszkiewicz@intel.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
When libata calls ata_link_abort() to abort all ata queued commands, it
calls blk_abort_request() on the SCSI command representing each QC.
This causes scsi_timeout() to be called, which calls scsi_eh_scmd_add() for
each SCSI command.
scsi_eh_scmd_add() sets the SCSI host to state recovery, and then adds the
command to shost->eh_cmd_q.
This will wake up the SCSI EH, and eventually the libata EH strategy
handler will be called, which calls scsi_eh_flush_done_q() to either flush
retry or flush finish each failed command.
The commands that are flush retried by scsi_eh_flush_done_q() are done so
using scsi_queue_insert().
Before commit 8b566edbdb ("scsi: core: Only kick the requeue list if
necessary"), __scsi_queue_insert() called blk_mq_requeue_request() with the
second argument set to true, indicating that it should always kick/run the
requeue list after inserting.
After commit 8b566edbdb ("scsi: core: Only kick the requeue list if
necessary"), __scsi_queue_insert() does not kick/run the requeue list after
inserting, if the current SCSI host state is recovery (which is the case in
the libata example above).
This optimization is probably fine in most cases, as I can only assume that
most often someone will eventually kick/run the queues.
However, that is not the case for scsi_eh_flush_done_q(), where we can see
that the request gets inserted to the requeue list, but the queue is never
started after the request has been inserted, leading to the block layer
waiting for the completion of command that never gets to run.
Since scsi_eh_flush_done_q() is called by SCSI EH context, the SCSI host
state is most likely always in recovery when this function is called.
Thus, let scsi_eh_flush_done_q() explicitly kick the requeue list after
inserting a flush retry command, so that scsi_eh_flush_done_q() keeps the
same behavior as before commit 8b566edbdb ("scsi: core: Only kick the
requeue list if necessary").
Simple reproducer for the libata example above:
$ hdparm -Y /dev/sda
$ echo 1 > /sys/class/scsi_device/0\:0\:0\:0/device/delete
Fixes: 8b566edbdb ("scsi: core: Only kick the requeue list if necessary")
Reported-by: Kevin Locke <kevin@kevinlocke.name>
Closes: https://lore.kernel.org/linux-scsi/ZZw3Th70wUUvCiCY@kevinlocke.name/
Signed-off-by: Niklas Cassel <cassel@kernel.org>
Link: https://lore.kernel.org/r/20240111120533.3612509-1-cassel@kernel.org
Reviewed-by: Bart Van Assche <bvanassche@acm.org>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Call spin_unlock_irqrestore(&fnic->wq_copy_lock[hwq], flags) before
returning.
Fixes: c81df08cd2 ("scsi: fnic: Add support for multiqueue (MQ) in fnic driver")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Link: https://lore.kernel.org/r/5360fa20-74bc-4c22-a78e-ea8b18c5410d@moroto.mountain
Reviewed-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
ctlr->mode is of unsigned type, it is never less than zero.
Fix this by using an extra variable called 'res', to store return value
from sysfs_match_string() and assign that to ctlr->mode on the success
path.
Fixes: edc22a7c86 ("scsi: fcoe: Use sysfs_match_string() over fcoe_parse_mode()")
Signed-off-by: Harshit Mogalapalli <harshit.m.mogalapalli@oracle.com>
Link: https://lore.kernel.org/r/20240102085245.600570-1-harshit.m.mogalapalli@oracle.com
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Updates to the usual drivers (ufs, mpi3mr, mpt3sas, lpfc, fnic,
hisi_sas, arcmsr, ) plus the usual assorted minor fixes and updates.
This time around there's only a single line update to the core, so
nothing major and barely anything minor.
Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
-----BEGIN PGP SIGNATURE-----
iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZZ7roSYcamFtZXMuYm90
dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pisha3lAQC5BAGM
7OKk39iOtsKdq8uxYAhYx871sNtwBp8+BSk1FgEAhy1hg7fgCnOvl7chbSNDR6p/
mW11fYi4j2UvxECT2tg=
=/tvT
-----END PGP SIGNATURE-----
Merge tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi
Pull SCSI updates from James Bottomley:
"Updates to the usual drivers (ufs, mpi3mr, mpt3sas, lpfc, fnic,
hisi_sas, arcmsr, ) plus the usual assorted minor fixes and updates.
This time around there's only a single line update to the core, so
nothing major and barely anything minor"
* tag 'scsi-misc' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi: (135 commits)
scsi: ufs: core: Simplify ufshcd_auto_hibern8_update()
scsi: ufs: core: Rename ufshcd_auto_hibern8_enable() and make it static
scsi: ufs: qcom: Fix ESI vector mask
scsi: ufs: host: Fix kernel-doc warning
scsi: hisi_sas: Correct the number of global debugfs registers
scsi: hisi_sas: Rollback some operations if FLR failed
scsi: hisi_sas: Check before using pointer variables
scsi: hisi_sas: Replace with standard error code return value
scsi: hisi_sas: Set .phy_attached before notifing phyup event HISI_PHYE_PHY_UP_PM
scsi: ufs: core: Add sysfs node for UFS RTC update
scsi: ufs: core: Add UFS RTC support
scsi: ufs: core: Add ufshcd_is_ufs_dev_busy()
scsi: ufs: qcom: Remove unused definitions
scsi: ufs: qcom: Use ufshcd_rmwl() where applicable
scsi: ufs: qcom: Remove support for host controllers older than v2.0
scsi: ufs: qcom: Simplify ufs_qcom_{assert/deassert}_reset
scsi: ufs: qcom: Initialize cycles_in_1us variable in ufs_qcom_set_core_clk_ctrl()
scsi: ufs: qcom: Sort includes alphabetically
scsi: ufs: qcom: Remove unused ufs_qcom_hosts struct array
scsi: ufs: qcom: Use dev_err_probe() to simplify error handling of devm_gpiod_get_optional()
...
-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmWcIOIQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpn6hD/9oO7U75PuxUwYYHZ9Uzxpw6gQ0LEmeyJmE
NQYCkfYHVq3IsgOdF7elI9v3qtr6v8V8CdB7cByrnn3DgwsMuiTKZZ0dK7vH37PO
DX+/xn349e8oH7RdRo7f3m95g1YbHfpfnj0Rc4mjTDV72Jr/HlLTVgGTQg8DEnCR
wBIFmeuBHHgeeLh87gsWLAP7ReReiy9V1uqpDFsko2/4BxRAM/8eedkwcAxD8aEy
rd+dT/SBQj2cOdQMUeExT3gWjwzHh6ZHx3f1WCLK5fdck6BogH2hBUeri6F/H98L
HoaXjBZYBTH68hB/mnO5I4g1ZlrVM74Vp7JPa3e1SFFtyEi6lsyrk2J3GoNh0E7r
pXqH5kAcaJwBsBrbRGuvEyGbn9RLTaN5Gvseud0VE4oMruyodTniQaHXuIGackgz
sMavMho4486EUWPaF7gIBdLNK1hO13w+IDZ4+3oBxhudMqdgZbk4iYpOCqQ7QY5G
2vkzAE/sZ+aVNXeaIQOI8dE5clBy8gJ+6+t8dm3DY1r1xdbcnU40iZ8/fri3h69r
vHs9bpQnVWZF0gEyEflY1pkcAPpIkvMmWCR7Ehy5YCkIfa+qfSL05o3dicpWovLP
N+gCtpkhTK2AvmUWsUMypMLRvoSOImyCIiobrr3qNBaUdgRP8xKfUa72RuRp8cGl
Vrj5oAiE3w==
=YAfp
-----END PGP SIGNATURE-----
Merge tag 'for-6.8/block-2024-01-08' of git://git.kernel.dk/linux
Pull block updates from Jens Axboe:
"Pretty quiet round this time around. This contains:
- NVMe updates via Keith:
- nvme fabrics spec updates (Guixin, Max)
- nvme target udpates (Guixin, Evan)
- nvme attribute refactoring (Daniel)
- nvme-fc numa fix (Keith)
- MD updates via Song:
- Fix/Cleanup RCU usage from conf->disks[i].rdev (Yu Kuai)
- Fix raid5 hang issue (Junxiao Bi)
- Add Yu Kuai as Reviewer of the md subsystem
- Remove deprecated flavors (Song Liu)
- raid1 read error check support (Li Nan)
- Better handle events off-by-1 case (Alex Lyakas)
- Efficiency improvements for passthrough (Kundan)
- Support for mapping integrity data directly (Keith)
- Zoned write fix (Damien)
- rnbd fixes (Kees, Santosh, Supriti)
- Default to a sane discard size granularity (Christoph)
- Make the default max transfer size naming less confusing
(Christoph)
- Remove support for deprecated host aware zoned model (Christoph)
- Misc fixes (me, Li, Matthew, Min, Ming, Randy, liyouhong, Daniel,
Bart, Christoph)"
* tag 'for-6.8/block-2024-01-08' of git://git.kernel.dk/linux: (78 commits)
block: Treat sequential write preferred zone type as invalid
block: remove disk_clear_zoned
sd: remove the !ZBC && blk_queue_is_zoned case in sd_read_block_characteristics
drivers/block/xen-blkback/common.h: Fix spelling typo in comment
blk-cgroup: fix rcu lockdep warning in blkg_lookup()
blk-cgroup: don't use removal safe list iterators
block: floor the discard granularity to the physical block size
mtd_blkdevs: use the default discard granularity
bcache: use the default discard granularity
zram: use the default discard granularity
null_blk: use the default discard granularity
nbd: use the default discard granularity
ubd: use the default discard granularity
block: default the discard granularity to sector size
bcache: discard_granularity should not be smaller than a sector
block: remove two comments in bio_split_discard
block: rename and document BLK_DEF_MAX_SECTORS
loop: don't abuse BLK_DEF_MAX_SECTORS
aoe: don't abuse BLK_DEF_MAX_SECTORS
null_blk: don't cap max_hw_sectors to BLK_DEF_MAX_SECTORS
...
This adds polling support to virtio-scsi. It's based on and works similar
to virtblk support where we add a module param to specify the number of
poll queues then subtract to calculate the IO queues.
When using 8 poll queues and a vhost worker per queue we see 4K IOPs
with fio:
fio --filename=/dev/sda --direct=1 --rw=randread --bs=4k \
--ioengine=io_uring --hipri --iodepth=128 --numjobs=$NUM_JOBS
increase like:
jobs base poll
1 207K 296K
2 392K 552K
3 581K 860K
4 765K 1235K
5 936K 1598K
6 1104K 1880K
7 1253K 2095K
8 1311k 2187K
Signed-off-by: Mike Christie <michael.christie@oracle.com>
Message-Id: <20231214052649.57743-1-michael.christie@oracle.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Now that host-aware devices are always treated as conventional this case
can't happen.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Link: https://lore.kernel.org/r/20231228075141.362560-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use correct format for function return values.
Delete blank lines that are reported as "bad line:".
mpi3mr_fw.c:482: warning: No description found for return value of 'mpi3mr_get_reply_desc'
mpi3mr_fw.c:1066: warning: bad line:
mpi3mr_fw.c:1109: warning: bad line:
mpi3mr_fw.c:1249: warning: No description found for return value of 'mpi3mr_revalidate_factsdata'
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Link: https://lore.kernel.org/r/20231221053113.32191-1-rdunlap@infradead.org
Cc: Sathya Prakash Veerichetty <sathya.prakash@broadcom.com>
Cc: Kashyap Desai <kashyap.desai@broadcom.com>
Cc: Sumit Saxena <sumit.saxena@broadcom.com>
Cc: Sreekanth Reddy <sreekanth.reddy@broadcom.com>
Cc: <mpi3mr-linuxdrv.pdl@broadcom.com>
Cc: James E.J. Bottomley <jejb@linux.ibm.com>
Cc: Martin K. Petersen <martin.petersen@oracle.com>
Cc: <linux-scsi@vger.kernel.org>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Reviewed-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Reviewed-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20231219193653.277553-4-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Correct rescan flag race condition.
Multiple conditions are being evaluated before notifying OS to do a rescan.
Driver will skip rescanning the device if any one of the following
conditions are met:
- Devices that have not yet been added to the OS or devices that have been
removed.
- Devices which are already marked for removal or in the phase of removal.
Under very rare conditions, after logical volume size expansion, the OS
still sees the size of the logical volume which was before expansion.
The rescan flag in the driver is used to signal the need for a logical
volume rescan. A race condition can occur in the driver, and it leads to
one thread overwriting the flag inadvertently. As a result, driver is not
notifying the OS SML to rescan the logical volume.
Move device->rescan update into new function pqi_mark_volumes_for_rescan()
and protect with a spin lock.
Move check for device->rescan into new function pqi_volume_rescan_needed()
and protect function call with a spin_lock.
Reviewed-by: Scott Teel <scott.teel@microchip.com>
Reviewed-by: Scott Benesh <scott.benesh@microchip.com>
Reviewed-by: Mike McGowen <mike.mcgowen@microchip.com>
Reviewed-by: Kevin Barnett <kevin.barnett@microchip.com>
Co-developed-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Signed-off-by: Murthy Bhat <Murthy.Bhat@microchip.com>
Signed-off-by: Mahesh Rajashekhara <mahesh.rajashekhara@microchip.com>
Signed-off-by: Don Brace <don.brace@microchip.com>
Link: https://lore.kernel.org/r/20231219193653.277553-3-don.brace@microchip.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
disk_clear_zoned only needs to be called when a device reported zone
managed mode first and we clear it. Add a check so that disk_clear_zoned
isn't called on devices that were never zoned.
This avoids a fairly expensive queue freezing when revalidating
conventional devices.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-6-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Only use disk_set_zoned to actually enable zoned device support.
For clearing it, call disk_clear_zoned, which is renamed from
disk_clear_zone_settings and now directly clears the zoned flag as
well.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When zones were first added the SCSI and ATA specs, two different
models were supported (in addition to the drive managed one that
is invisible to the host):
- host managed where non-conventional zones there is strict requirement
to write at the write pointer, or else an error is returned
- host aware where a write point is maintained if writes always happen
at it, otherwise it is left in an under-defined state and the
sequential write preferred zones behave like conventional zones
(probably very badly performing ones, though)
Not surprisingly this lukewarm model didn't prove to be very useful and
was finally removed from the ZBC and SBC specs (NVMe never implemented
it). Due to to the easily disappearing write pointer host software
could never rely on the write pointer to actually be useful for say
recovery.
Fortunately only a few HDD prototypes shipped using this model which
never made it to mass production. Drop the support before it is too
late. Note that any such host aware prototype HDD can still be used
with Linux as we'll now treat it as a conventional HDD.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20231217165359.604246-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A trace instance may only need to enable specific events. As the eventfs
directory of an instance currently creates all events which adds overhead,
allow internal instances to be created with just the events in systems
that they care about. This currently only deals with systems and not
individual events, but this should bring down the overhead of creating
instances for specific use cases quite bit.
The trace_array_get_by_name() now has another parameter "systems". This
parameter is a const string pointer of a comma/space separated list of
event systems that should be created by the trace_array. (Note if the
trace_array already exists, this parameter is ignored).
The list of systems is saved and if a module is loaded, its events will
not be added unless the system for those events also match the systems
string.
Link: https://lore.kernel.org/linux-trace-kernel/20231213093701.03fddec0@gandalf.local.home
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Sean Paul <seanpaul@chromium.org>
Cc: Arun Easi <aeasi@marvell.com>
Cc: Daniel Wagner <dwagner@suse.de>
Tested-by: Dmytro Maluka <dmaluka@chromium.org>
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>
In commit 8930a6c207 ("scsi: core: add support for request batching") the
block layer bd->last flag was mapped to SCMD_LAST and used as an indicator
to send the batch for the drivers that implement this feature. However, the
error handling code was not updated accordingly.
scsi_send_eh_cmnd() is used to send error handling commands and request
sense. The problem is that request sense comes as a single command that
gets into the batch queue and times out. As a result the device goes
offline after several failed resets. This was observed on virtio_scsi
during a device resize operation.
[ 496.316946] sd 0:0:4:0: [sdd] tag#117 scsi_eh_0: requesting sense
[ 506.786356] sd 0:0:4:0: [sdd] tag#117 scsi_send_eh_cmnd timeleft: 0
[ 506.787981] sd 0:0:4:0: [sdd] tag#117 abort
To fix this always set SCMD_LAST flag in scsi_send_eh_cmnd() and
scsi_reset_ioctl().
Fixes: 8930a6c207 ("scsi: core: add support for request batching")
Cc: <stable@vger.kernel.org>
Signed-off-by: Alexander Atanasov <alexander.atanasov@virtuozzo.com>
Link: https://lore.kernel.org/r/20231215121008.2881653-1-alexander.atanasov@virtuozzo.com
Reviewed-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
skb_share_check() already drops the reference to the skb when returning
NULL. Using kfree_skb() in the error handling path leads to an skb double
free.
Fix this by removing the variable tmp_skb, and return directly when
skb_share_check() returns NULL.
Fixes: 01a4cc4d0c ("bnx2fc: do not add shared skbs to the fcoe_rx_list")
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Link: https://lore.kernel.org/r/20221114110626.526643-1-weiyongjun@huaweicloud.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
chenxiang <chenxiang66@hisilicon.com> says:
This series contain some fixes and cleanups including:
- Set .phy_attached before notifying phyup event HISI_PHYEE_PHY_UP_PM;
- Use standard error code instead of hardcode;
- Check before using pointer variable;
- Rollback some operations if FLR failed;
- Correct the number of global debugfs registers;
Link: https://lore.kernel.org/r/1702525516-51258-1-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In function debugfs_debugfs_snapshot_global_reg_v3_hw() it uses
debugfs_axi_reg.count (which is the number of axi debugfs registers) to
acquire the number of global debugfs registers.
Use debugfs_global_reg.count to acquire the number of global debugfs
registers instead.
Fixes: 623a4b6d5c ("scsi: hisi_sas: Move debugfs code to v3 hw driver")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1702525516-51258-6-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
We obtain the semaphore and set HISI_SAS_RESETTING_BIT in
hisi_sas_reset_prepare_v3_hw(), block the scsi host and set
HISI_SAS_REJECT_CMD_BIT in hisi_sas_controller_reset_prepare(), released
them in hisi_sas_controller_reset_done(). However, if the HW reset failure
in FLR results in early return, the semaphore and flag bits will not be
release.
Rollback some operations including clearing flags / releasing semaphore
when FLR is failed.
Fixes: e5ea48014a ("scsi: hisi_sas: Implement handlers of PCIe FLR for v3 hw")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1702525516-51258-5-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In commit 4b329abc91 ("scsi: hisi_sas: Move slot variable definition in
hisi_sas_abort_task()"), we move the variables slot to the function head.
However, the variable slot may be NULL, we should check it in each branch.
Fixes: 4b329abc91 ("scsi: hisi_sas: Move slot variable definition in hisi_sas_abort_task()")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1702525516-51258-4-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
In function hisi_sas_controller_prereset(), -ENOSYS (Function not
implemented) should be returned if the driver does not support .soft_reset.
Returns -EPERM (Operation not permitted) if HISI_SAS_RESETTING_BIT is
already be set.
In function _suspend_v3_hw(), returns -EPERM (Operation not permitted) if
HISI_SAS_RESETTING_BIT is already be set.
Fixes: 4522204ab2 ("scsi: hisi_sas: tidy host controller reset function a bit")
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1702525516-51258-3-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Currently in directly attached scenario, the phyup event
HISI_PHYE_PHY_UP_PM is notified before .phy_attached is set - this may
cause the phyup work hisi_sas_bytes_dmaed() execution failed and the
attached device will not be found.
To fix it, set .phy_attached before notifing phyup event.
Signed-off-by: Yihang Li <liyihang9@huawei.com>
Signed-off-by: Xiang Chen <chenxiang66@hisilicon.com>
Link: https://lore.kernel.org/r/1702525516-51258-2-git-send-email-chenxiang66@hisilicon.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Instead of copying @buf into a new buffer and carefully managing its
newline/null-terminating status, we can just use sysfs_match_string() as it
uses sysfs_streq() internally which handles newline/null-term:
| /**
| * sysfs_streq - return true if strings are equal, modulo trailing newline
| * @s1: one string
| * @s2: another string
| *
| * This routine returns true iff two strings are equal, treating both
| * NUL and newline-then-NUL as equivalent string terminations. It's
| * geared for use with sysfs input strings, which generally terminate
| * with newlines but are compared against values without newlines.
| */
| bool sysfs_streq(const char *s1, const char *s2)
| ...
Then entirely drop the now unused fcoe_parse_mode(), being careful to
change if condition from checking for FIP_CONN_TYPE_UNKNOWN to < 0 as
sysfs_match_string() can return -EINVAL. Also check explicitly if
ctlr->mode is equal to FIP_CONN_TYPE_UNKNOWN -- this is probably preferred
to "<=" as the behavior is more obvious while maintaining functionality.
To get the compiler not to complain, make fip_conn_type_names const char *
const. Perhaps, this should also be done for fcf_state_names.
This also removes an instance of strncpy() which helps [1].
Link: https://github.com/KSPP/linux/issues/90 [1]
Cc: <linux-hardening@vger.kernel.org>
Signed-off-by: Justin Stitt <justinstitt@google.com>
Link: https://lore.kernel.org/r/20231212-strncpy-drivers-scsi-fcoe-fcoe_sysfs-c-v2-1-1f2d6b2fc409@google.com
Reviewed-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Justin Tee <justintee8345@gmail.com> says:
Update lpfc to revision 14.2.0.17
This patch set contains bug fixes for the VMID feature.
The patches were cut against Martin's 6.8/scsi-queue tree.
Link: https://lore.kernel.org/r/20231207224039.35466-1-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
If priority tagging is set in the service parameters of a FLOGI cmpl, then
we update the vmid_flag. In the current logic, if a follow up FLOGI cmpl
updates its service parameters such that priority tagging is no longer set,
then the vmid_flag ends up keeping stale data.
Fix by ensuring we clear the vmid_flag member during lpfc_reinit_vmid, and
check the priority tagging service parameter after reinitialization of the
vmid data structures.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231207224039.35466-4-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
After a follow up FDISC cmpl, an NPIV's VMID data structures are not
updated.
Fix by calling lpfc_reinit_vmid and copying the physical port's vmid_flag
to the NPIV's vmid_flag in the NPIV registration cmpl code path.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231207224039.35466-3-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
VMID driver support is a load time configuration setting. Thus, change
sysfs attributes to read only.
Signed-off-by: Justin Tee <justin.tee@broadcom.com>
Link: https://lore.kernel.org/r/20231207224039.35466-2-justintee8345@gmail.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Karan Tilak Kumar <kartilak@cisco.com> says:
Hi Martin, reviewers,
This cover letter describes the feature: add support for
multiqueue (MQ) to fnic driver.
Background: The Virtual Interface Card (VIC) firmware exposes several
queues that can be configured for sending IOs and receiving IO
responses. Unified Computing System Manager (UCSM) and Intersight
Manager (IMM) allows users to configure the number of queues to be
used for IOs.
The number of IO queues to be used is stored in a configuration file
by the VIC firmware. The fNIC driver reads the configuration file and
sets the number of queues to be used. Previously, the driver was
hard-coded to use only one queue. With this set of changes, the fNIC
driver will configure itself to use multiple queues. This feature
takes advantage of the block multiqueue layer to parallelize IOs being
sent out of the VIC card.
Here's a brief description of some of the salient patches:
- vnic_scsi.h needs to be in sync with VIC firmware to be able to read
the number of queues from the firmware config file. A patch has been
created for this.
- In an environment with many fnics (like we see in our customer
environments), it is hard to distinguish which fnic is printing logs.
Therefore, an fnic number has been included in the logs.
- read the number of queues from the firmware config file.
- include definitions in fnic.h to support multiqueue.
- modify the interrupt service routines (ISRs) to read from the
correct registers. The numbers that are used here come from discussions
with the VIC firmware team.
- track IO statistics for different queues.
- remove usage of host_lock, and only use fnic_lock in the fnic driver.
- use a hardware queue based spinlock to protect io_req.
- replace the hard-coded zeroth queue with a hardware queue number.
This presents a bulk of the changes.
- modify the definition of fnic_queuecommand to accept multiqueue tags.
- improve log messages, and indicate fnic number and multiqueue tags for
effective debugging.
Even though the patches have been made into a series, some patches are
heavier than others.
But, every effort has been made to keep the purpose of each patch as
a single-purpose, and to compile cleanly.
This patchset has been tested as a whole. Therefore, the tested-by fields
have been added only to two patches
in the set. All the individual patches compile cleanly. However,
I've refrained from adding tested-by to
most of the patches, so as to not mislead the reviewer/reader.
A brief note on the unit tests:
1. Increase number of queues to 64. Load driver. Run IOs via Medusa.
12+ hour run successful.
2. Configure multipathing, and run link flaps on single link.
IOs drop briefly, but pick up as expected.
3. Configure multipathing, and run link flaps on two links, with a
30 second delay in between. IOs drop briefly, but pick up as expected.
Repeat the above tests with 1 queue and 32 queues. All tests were
successful.
Please consider this patch series for the next merge window.
Link: https://lore.kernel.org/r/20231211173617.932990-1-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Improve existing logs by adding fnic number, hardware queue, tag, and mqtag
in the prints. Add logs with the above elements for effective debugging.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Tested-by: Karan Tilak Kumar <kartilak@cisco.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20231211173617.932990-13-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Implement support for MQ in fnic driver:
The block multiqueue layer issues IO to the fnic driver with an MQ tag. Use
the mqtag and derive a tag from it. Derive the hardware queue from the
mqtag and use it in all paths. Modify queuecommand to handle mqtag.
Replace wq and cq indices to support MQ. Replace the zeroth queue with a
hardware queue. Implement spin locks on a per hardware queue basis.
Replace io_lock with per hardware queue spinlock. Implement out of range
tag checks.
Allocate an io_req_table to track status of the io_req.
Test the driver by building it, loading it, and configuring 64 queues in
UCSM. Issue IOs using Medusa on multiple fnics. Enable/disable links to
exercise the abort and clean up path.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310300032.2awCqkfn-lkp@intel.com/
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Tested-by: Karan Tilak Kumar <kartilak@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20231211173617.932990-12-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Set map_queues in the fnic_host_template to fnic_mq_map_queues_cpus.
Define fnic_mq_map_queues_cpus to set cpu assignment to fnic queues.
Refactor code in fnic_probe to enable vnic queues before scsi_add_host.
Modify notify set to the correct index.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: John Garry <john.g.garry@oracle.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20231211173617.932990-11-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Remove usage of host_lock. Replace with fnic_lock, where necessary. fnic
does not use host_lock. fnic uses fnic_lock. Use fnic lock to protect fnic
members in fnic_queuecommand. Add log messages in error cases.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20231211173617.932990-10-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Define an array to track IOs for the different queues, print the IO stats
in fnic get stats data.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20231211173617.932990-9-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Modify interrupt service routines for INTx, MSI, and MSI-x to support
multiqueue. Modify parameter list of fnic_wq_copy_cmpl_handler to take
cq_index. Modify fnic_cleanup function to use the new function call of
fnic_wq_copy_cmpl_handler. Refactor code to set interrupt mode to MSI-x to
a new function. Add a new stat for intx_dummy.
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202310251847.4T8BVZAZ-lkp@intel.com/
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20231211173617.932990-8-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Refactor and re-define values in fnic.h to implement multiqueue (MQ)
functionality.
VIC firmware allows fnic to create up to 64 copy workqueues. Update the
copy workqueue max to 64. Modify the interrupt index to be in sync with
the firmware to support MQ. Add irq number to the MSIX entry. Define a
software workqueue table to track the status of io_reqs. Define a base for
the copy workqueue.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20231211173617.932990-7-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>
Get the copy workqueue count and interrupt mode from the configuration. The
config can be changed via UCSM. Add logs to print the interrupt mode and
copy workqueue count. Add logs to print the vNIC resources.
Reviewed-by: Sesidhar Baddela <sebaddel@cisco.com>
Reviewed-by: Arulprabhu Ponnusamy <arulponn@cisco.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Karan Tilak Kumar <kartilak@cisco.com>
Link: https://lore.kernel.org/r/20231211173617.932990-6-kartilak@cisco.com
Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com>