Commit Graph

3538 Commits

Author SHA1 Message Date
Linus Torvalds
26bb0d3f38 for-6.12/block-20240913
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmbkZhQQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjOKD/0fzd4yOcqxSI9W3OLGd04VrOTJIQa4CRbV
 GmoTq39pOeIDVGug5ekkTpqqHHnuGk+nQhCzD9vsN/eTmC7yZOIr847O2aWzvYEn
 PzFRgmJpoo2E9sr/IsTR5LnJjbaIZhQVkqLH6ZOj9tpKlVwN2SK0nIRVNrAi5zgT
 MaDrto/2OUld+vmA99Rgb23jxM6UBdCPIjuiVa+11Vg9Z3D1tWbBmrsG7OMysyIf
 FbASBeKHqFSO61/ipFCZv6VV1X8zoWEVyT8n4A1yUbbN5rLzPgoQJVbfSqQRXIdr
 cdrKeCbKxl+joSgKS6LKpvnfwRgGF+hgAfpZg4c0vrbZGTQcRhhLFECyh/aVI08F
 p5TOMArhVaX59664gHgSPq4KnGTXOO29dot9N3Jya/ZQnxinjY9r+GVOfLuduPPy
 1B04vab8oAsk4zK7fZbkDxgYUyifwzK/vQ6OqYq2mYdpdIS/AE7T2ou61Bz5mI7I
 /BuucNV0Z96OKlyLEXwXXZjZgNu1TFcq6ARIBJ8L08PY64Fesj5BXabRyXkeNH26
 0exyz9heeJs6OwRGfngXmS24tDSS0k74CeZX3KoePNj69u6KCn346KiU1qgntwwD
 E5F7AEHqCl5FjUEIWB4M1EPlfA8U0MzOL+tkx2xKJAjsU60wAy7jRSyOIcqodpMs
 6UlPcJzgYg==
 =uuLl
 -----END PGP SIGNATURE-----

Merge tag 'for-6.12/block-20240913' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - MD changes via Song:
      - md-bitmap refactoring (Yu Kuai)
      - raid5 performance optimization (Artur Paszkiewicz)
      - Other small fixes (Yu Kuai, Chen Ni)
      - Add a sysfs entry 'new_level' (Xiao Ni)
      - Improve information reported in /proc/mdstat (Mateusz Kusiak)

 - NVMe changes via Keith:
      - Asynchronous namespace scanning (Stuart)
      - TCP TLS updates (Hannes)
      - RDMA queue controller validation (Niklas)
      - Align field names to the spec (Anuj)
      - Metadata support validation (Puranjay)
      - A syntax cleanup (Shen)
      - Fix a Kconfig linking error (Arnd)
      - New queue-depth quirk (Keith)

 - Add missing unplug trace event (Keith)

 - blk-iocost fixes (Colin, Konstantin)

 - t10-pi modular removal and fixes (Alexey)

 - Fix for potential BLKSECDISCARD overflow (Alexey)

 - bio splitting cleanups and fixes (Christoph)

 - Deal with folios rather than rather than pages, speeding up how the
   block layer handles bigger IOs (Kundan)

 - Use spinlocks rather than bit spinlocks in zram (Sebastian, Mike)

 - Reduce zoned device overhead in ublk (Ming)

 - Add and use sendpages_ok() for drbd and nvme-tcp (Ofir)

 - Fix regression in partition error pointer checking (Riyan)

 - Add support for write zeroes and rotational status in nbd (Wouter)

 - Add Yu Kuai as new BFQ maintainer. The scheduler has been
   unmaintained for quite a while.

 - Various sets of fixes for BFQ (Yu Kuai)

 - Misc fixes and cleanups (Alvaro, Christophe, Li, Md Haris, Mikhail,
   Yang)

* tag 'for-6.12/block-20240913' of git://git.kernel.dk/linux: (120 commits)
  nvme-pci: qdepth 1 quirk
  block: fix potential invalid pointer dereference in blk_add_partition
  blk_iocost: make read-only static array vrate_adj_pct const
  block: unpin user pages belonging to a folio at once
  mm: release number of pages of a folio
  block: introduce folio awareness and add a bigger size from folio
  block: Added folio-ized version of bio_add_hw_page()
  block, bfq: factor out a helper to split bfqq in bfq_init_rq()
  block, bfq: remove local variable 'bfqq_already_existing' in bfq_init_rq()
  block, bfq: remove local variable 'split' in bfq_init_rq()
  block, bfq: remove bfq_log_bfqg()
  block, bfq: merge bfq_release_process_ref() into bfq_put_cooperator()
  block, bfq: fix procress reference leakage for bfqq in merge chain
  block, bfq: fix uaf for accessing waker_bfqq after splitting
  blk-throttle: support prioritized processing of metadata
  blk-throttle: remove last_low_overflow_time
  drbd: Add NULL check for net_conf to prevent dereference in state validation
  nvme-tcp: fix link failure for TCP auth
  blk-mq: add missing unplug trace event
  mtip32xx: Remove redundant null pointer checks in mtip_hw_debugfs_init()
  ...
2024-09-16 13:33:06 +02:00
Keith Busch
83bdfcbdbe nvme-pci: qdepth 1 quirk
Another device has been reported to be unreliable if we have more than
one outstanding command. In this new case, data corruption may occur.
Since we have two devices now needing this quirky behavior, make a
generic quirk flag.

The same Apple quirk is clearly not "temporary", so update the comment
while moving it.

Link: https://lore.kernel.org/linux-nvme/191d810a4e3.fcc6066c765804.973611676137075390@collabora.com/
Reported-by: Robert Beckett <bob.beckett@collabora.com>
Reviewed-by: Christoph Hellwig hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-09-13 07:29:30 -07:00
Arnd Bergmann
2d5a333e09 nvme-tcp: fix link failure for TCP auth
The nvme fabric driver calls the nvme_tls_key_lookup() function from
nvmf_parse_key() when the keyring is enabled, but this is broken in a
configuration with CONFIG_NVME_FABRICS=y and CONFIG_NVME_TCP=m because
this leads to the function definition being in a loadable module:

x86_64-linux-ld: vmlinux.o: in function `nvmf_parse_key':
fabrics.c:(.text+0xb1bdec): undefined reference to `nvme_tls_key_lookup'

Move the 'select' up to CONFIG_NVME_FABRICS itself to force this
part to be built-in as well if needed.

Fixes: 5bc46b49c8 ("nvme-tcp: check for invalidated or revoked key")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-09-10 07:41:07 -07:00
Shen Lichuan
389e72c5d1 nvme: Convert comma to semicolon
To ensure code clarity and prevent potential errors, it's advisable
to employ the ';' as a statement separator, except when ',' are
intentionally used for specific purposes.

Signed-off-by: Shen Lichuan <shenlichuan@vivo.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-09-06 14:20:38 -07:00
Maurizio Lombardi
899d2e5a4e nvmet: Identify-Active Namespace ID List command should reject invalid nsid
nsid values of 0xFFFFFFFE and 0XFFFFFFFF should be rejected with
a status code of "Invalid Namespace or Format".
See NVMe Base Specification, Active Namespace ID list (CNS 02h).

Fixes: a07b4970f4 ("nvmet: add a generic NVMe target")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-09-03 10:05:40 -07:00
Christoph Hellwig
28982ad73d nvme: set BLK_FEAT_ZONED for ZNS multipath disks
The new stricter limits validation doesn't like a max_append_sectors value
to be set without BLK_FEAT_ZONED.  Set it before allocation the disk to
fix this instead of just inheriting it later.

Fixes: d690cb8ae1 ("block: add an API to atomically update queue limits")
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-09-03 10:00:10 -07:00
Puranjay Mohan
7c2fd76048 nvme: fix metadata handling in nvme-passthrough
On an NVMe namespace that does not support metadata, it is possible to
send an IO command with metadata through io-passthru. This allows issues
like [1] to trigger in the completion code path.
nvme_map_user_request() doesn't check if the namespace supports metadata
before sending it forward. It also allows admin commands with metadata to
be processed as it ignores metadata when bdev == NULL and may report
success.

Reject an IO command with metadata when the NVMe namespace doesn't
support it and reject an admin command if it has metadata.

[1] https://lore.kernel.org/all/mb61pcylvnym8.fsf@amazon.com/

Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Puranjay Mohan <pjy@amazon.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-30 07:50:29 -07:00
Georg Gottleuber
61aa894e7a nvme-pci: Add sleep quirk for Samsung 990 Evo
On some TUXEDO platforms, a Samsung 990 Evo NVMe leads to a high
power consumption in s2idle sleep (2-3 watts).

This patch applies 'Force No Simple Suspend' quirk to achieve a
sleep with a lower power consumption, typically around 0.5 watts.

Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Cc: <stable@vger.kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-27 12:06:43 -07:00
Keith Busch
6f01bdbfef nvme-pci: allocate tagset on reset if necessary
If a drive is unable to create IO queues on the initial probe, a
subsequent reset will need to allocate the tagset if IO queue creation
is successful. Without this, blk_mq_update_nr_hw_queues will crash on a
bad pointer due to the invalid tagset.

Fixes: eac3ef2629 ("nvme-pci: split the initial probe from the rest path")
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-27 07:55:40 -07:00
Maurizio Lombardi
5572a55a6f nvmet-tcp: fix kernel crash if commands allocation fails
If the commands allocation fails in nvmet_tcp_alloc_cmds()
the kernel crashes in nvmet_tcp_release_queue_work() because of
a NULL pointer dereference.

  nvmet: failed to install queue 0 cntlid 1 ret 6
  Unable to handle kernel NULL pointer dereference at
         virtual address 0000000000000008

Fix the bug by setting queue->nr_cmds to zero in case
nvmet_tcp_alloc_cmd() fails.

Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-26 16:00:52 -07:00
Anuj Gupta
cead0b8991 nvme: rename apptag and appmask to lbat and lbatm
Rename apptag and appmask to lbat and lbatm so that it matches the field
names used in NVMe spec.

Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-26 09:51:32 -07:00
Niklas Cassel
03c3d7c743 nvme-rdma: send cntlid in the RDMA_CM_REQUEST Private Data
When sending a RDMA_CM_REQUEST, the NVMe RDMA Transport Specification
allows you to populate the cntlid field in the RDMA_CM_REQUEST Private
Data.

The cntlid is returned by the target on completion of the first
RDMA_CM_REQUEST command (which creates the admin queue).

The cntlid field can then be populated by the host when the I/O queues
are created (using additional RDMA_CM_REQUEST commands), such that the
target can perform extra validation for additional RDMA_CM_REQUEST
commands.

This additional error code and error message is also added, such that
nvme_rdma_cm_msg() will display the proper error message if the target
fails the RDMA_CM_REQUEST command because of this extra validation.

Signed-off-by: Niklas Cassel <cassel@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-26 08:11:19 -07:00
Keith Busch
5a6d3a638c nvme: use better description for async reset reason
The NVMe AER notification of a persistent internal error triggers a
reset. The existing warning message just says "due to AER", which can be
confused with the unrelated PCIe AER condition. Just say what the event
was instead of the generic overloaded acronym.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-23 09:52:04 -07:00
Jinjie Ruan
f4bd313993 nvmet: Make nvmet_debugfs static
The sparse tool complains as follows:

drivers/nvme/target/debugfs.c:16:15: warning:
	symbol 'nvmet_debugfs' was not declared. Should it be static?

This symbol is not used outside debugfs.c, so marks it static.

Signed-off-by: Jinjie Ruan <ruanjinjie@huawei.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-23 09:50:16 -07:00
Nilay Shroff
fe01751347 nvme: Remove unused field
The "name" field in struct nvme_ctrl is unsued so removing it.
This would help save 12 bytes of space for each nvme_ctrl instance
created.

Signed-off-by: Nilay Shroff <nilay@linux.ibm.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-22 13:28:40 -07:00
Ming Lei
a54a93d0e3 nvme: move stopping keep-alive into nvme_uninit_ctrl()
Commit 4733b65d82 ("nvme: start keep-alive after admin queue setup")
moves starting keep-alive from nvme_start_ctrl() into
nvme_init_ctrl_finish(), but don't move stopping keep-alive into
nvme_uninit_ctrl(), so keep-alive work can be started and keep pending
after failing to start controller, finally use-after-free is triggered if
nvme host driver is unloaded.

This patch fixes kernel panic when running nvme/004 in case that connection
failure is triggered, by moving stopping keep-alive into nvme_uninit_ctrl().

This way is reasonable because keep-alive is now started in
nvme_init_ctrl_finish().

Fixes: 3af755a468 ("nvme: move nvme_stop_keep_alive() back to original position")
Cc: Hannes Reinecke <hare@suse.de>
Cc: Mark O'Donovan <shiftee@posteo.net>
Reported-by: Changhui Zhong <czhong@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-22 13:28:40 -07:00
Hannes Reinecke
ff4a0a4088 nvme-target: do not check authentication status for admin commands twice
nvmet_check_ctrl_status() checks the authentication status, so
we don't need to do that prior to calling it.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-22 13:25:11 -07:00
Hannes Reinecke
bb2df18958 nvmet-auth: allow to clear DH-HMAC-CHAP keys
As we can set DH-HMAC-CHAP keys, we should also be
able to unset them.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-22 13:25:11 -07:00
Hannes Reinecke
02a3688c53 nvme-sysfs: add 'tls_keyring' attribute
Add a 'tls_keyring' attribute to display the contents of the
--keyring option from the connect string. Adding this attribute
allows us to recreate the original connect string from sysfs
settings.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-22 13:25:11 -07:00
Hannes Reinecke
f5eb739747 nvme-sysfs: add 'tls_configured_key' sysfs attribute
There is a difference between the negotiated TLS key (which is
always present for a TLS encrypted connection) and the configured
TLS key (which is specified with the --tls_key command line option).
To differentate between these two add a new sysfs attribute
'tls_configured_key' to hold the specified on the command line.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-22 13:25:11 -07:00
Hannes Reinecke
1e48b34c9b nvme: split off TLS sysfs attributes into a separate group
Split off TLS sysfs attributes into a separate group to improve
readability and to keep all TLS related handling in one section.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-22 13:25:11 -07:00
Hannes Reinecke
c5f2ca52d0 nvme: add a newline to the 'tls_key' sysfs attribute
Print a newline for easier userspace handling.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-22 13:25:11 -07:00
Hannes Reinecke
5bc46b49c8 nvme-tcp: check for invalidated or revoked key
key_lookup() will always return a key, even if that key is revoked
or invalidated. So check for invalid keys before continuing.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-22 13:25:07 -07:00
Hannes Reinecke
363895767f nvme-tcp: sanitize TLS key handling
There is a difference between TLS configured (ie the user has
provisioned/requested a key) and TLS enabled (ie the connection
is encrypted with TLS). This becomes important for secure concatenation,
where the initial authentication is run on an unencrypted connection
(ie with TLS configured, but not enabled), and then the queue is reset to
run over TLS (ie TLS configured _and_ enabled).
So to differentiate between those two states store the generated
key in opts->tls_key (as we're using the same TLS key for all queues),
the key serial of the resulting TLS handshake in ctrl->tls_pskid
(to signal that TLS on the admin queue is enabled), and a simple
flag for the queues to indicated that TLS has been enabled.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-22 13:22:41 -07:00
Hannes Reinecke
79559c7533 nvme-keyring: restrict match length for version '1' identifiers
TP8018 introduced a new TLS PSK identifier version (version 1), which appended
a PSK hash value to the existing identifier (cf NVMe TCP specification v1.1,
section 3.6.1.3 'TLS PSK and PSK Identity Derivation').
An original (version 0) identifier has the form:

NVMe0<type><hmac> <hostnqn> <subsysnqn>

and a version 1 identifier has the form:

NVMe1<type><hmac> <hostnqn> <subsysnqn> <hash>

This patch modifies the lookup algorthm to compare only the first part
of the identifier (excluding the hash value) to handle both version 0 and
version 1 identifiers.
And the spec declares 'version 0' identifiers obsolete, so the lookup
algorithm is modified to prever v1 identifiers.

Signed-off-by: Hannes Reinecke <hare@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-08-22 13:22:41 -07:00
Stuart Hayes
4e893ca811 nvme_core: scan namespaces asynchronously
Use async function calls to make namespace scanning happen in parallel.

Without the patch, NVME namespaces are scanned serially, so it can take
a long time for all of a controller's namespaces to become available,
especially with a slower (TCP) interface with large number of
namespaces.

It is not uncommon to have large numbers (hundreds or thousands) of
namespaces on nvme-of with storage servers.

The time it took for all namespaces to show up after connecting (via
TCP) to a controller with 1002 namespaces was measured on one system:

network latency   without patch   with patch
     0                 6s            1s
    50ms             210s           10s
   100ms             417s           18s

Measurements taken on another system show the effect of the patch on the
time nvme_scan_work() took to complete, when connecting to a linux
nvme-of target with varying numbers of namespaces, on a network of
400us.

namespaces    without patch   with patch
     1            16ms           14ms
     2            24ms           16ms
     4            49ms           22ms
     8           101ms           33ms
    16           207ms           56ms
   100           1.4s           0.6s
  1000          12.9s           2.0s

On the same system, connecting to a local PCIe NVMe drive (a Samsung
PM1733) instead of a network target:

namespaces    without patch   with patch
     1            13ms           12ms
     2            41ms           13ms

Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
2024-08-22 10:05:22 -07:00
Kanchan Joshi
b4c1f33a5d nvme: reorganize nvme_ns_head fields
shuffle few fields to reduce the holes within nvme_ns_head.
On x86_64, the size is reduced to 1104 bytes from 1120 bytes.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-31 07:40:10 -07:00
Kanchan Joshi
73d148ccb9 nvme: change data type of lba_shift
u8 fits the need, so stop using int for it.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-31 07:40:10 -07:00
Kanchan Joshi
6339b7edad nvme: remove a field from nvme_ns_head
pi_offset field is not required to be present in nvme_ns_head.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-31 07:40:10 -07:00
Kanchan Joshi
7ec5bd247a nvme: remove unused parameter
First parameter of nvme_init_integrity() is unused.
Remove it, and modify the callers.

Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-29 07:27:58 -07:00
Ofir Gal
6af7331a70 nvme-tcp: use sendpages_ok() instead of sendpage_ok()
Currently nvme_tcp_try_send_data() use sendpage_ok() in order to disable
MSG_SPLICE_PAGES, it check the first page of the iterator, the iterator
may represent contiguous pages.

MSG_SPLICE_PAGES enables skb_splice_from_iter() which checks all the
pages it sends with sendpage_ok().

When nvme_tcp_try_send_data() sends an iterator that the first page is
sendable, but one of the other pages isn't skb_splice_from_iter() warns
and aborts the data transfer.

Using the new helper sendpages_ok() in order to disable MSG_SPLICE_PAGES
solves the issue.

Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Ofir Gal <ofir.gal@volumez.com>
Link: https://lore.kernel.org/r/20240718084515.3833733-3-ofir.gal@volumez.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-28 16:47:52 -06:00
Jens Axboe
f6bb5254b7 nvme fixes for Linux 6.11
- Fix request without payloads cleanup  (Leon)
  - Use new protection information format (Francis)
  - Improved debug message for lost pci link (Bart)
  - Another apst quirk (Wang)
  - Use appropriate sysfs api for printing chars (Markus)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE3Fbyvv+648XNRdHTPe3zGtjzRgkFAmajrFcACgkQPe3zGtjz
 Rgk4QRAA357EtsOgDKAbodaePFsGhmfhVEVhizCkw+yj+l73jGWMRMDfbpNsnkii
 OTJaXhQgVPrCBGnmbgHn5GehEKJ/VZwW4fFbF42wvetFXWSYFDPL2yqC3YpaPjNp
 aSJvUM0rw4f+JE52tTPQvVzLQ/M9PZFsJ6sUmoozV5MoPLt4eKKUEHKigCXpXV/p
 AOwiejVVk035WbKNq4R8DoQsa05Yk0Tv5zKsFgmXEZjrnorC0dpqWQjT5HH6V9pt
 eHTA2cxKq9qAHBN1Zm/3HUOmxmJZ1GW3AKLxYM+k0ornnfnO7inlQwNJDsQItXXS
 ZNBELiYIIObVoy6COB03NWMSCcS/TrpfSKJ9s+JOdJt/T+AOVCwQkqpIff6aJTaH
 k4ppVjChmaY3+taIkLQ5nC1zecCZr7hY+xL0ZkUGhlznKn9x2a1zOBZ6tUuabul6
 57JztkeXPyTNZ/t1WhYQQpGQ4MCLXnu81gMRzVKfJcmtMOSrBOO1p8eNjb/LgM4M
 Qpu9VKS33OBrmMEBlhnBvhkFIXxHUU1CjZJkQ2MYm4YLLnaO4VmmOk3tcX1mpID4
 R6GrsXAOBlqjtAyqsTGQJ/+Z5BRp1nhOk/E0APiD+sbgJPiRSJ2HNZpDEbOroYOy
 4IiaQwZbyGSjieBWWe0fMMCBbazk2S9ws57eAbVh5/r6L+DSWTc=
 =1Rzv
 -----END PGP SIGNATURE-----

Merge tag 'nvme-6.11-2024-07-26' of git://git.infradead.org/nvme into block-6.11

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.11

 - Fix request without payloads cleanup  (Leon)
 - Use new protection information format (Francis)
 - Improved debug message for lost pci link (Bart)
 - Another apst quirk (Wang)
 - Use appropriate sysfs api for printing chars (Markus)"

* tag 'nvme-6.11-2024-07-26' of git://git.infradead.org/nvme:
  nvme-pci: add missing condition check for existence of mapped data
  nvme-core: choose PIF from QPIF if QPIFS supports and PIF is QTYPE
  nvme-pci: Fix the instructions for disabling power management
  nvme: remove redundant bdev local variable
  nvme-fabrics: Use seq_putc() in __nvmf_concat_opt_tokens()
  nvme/pci: Add APST quirk for Lenovo N60z laptop
2024-07-26 08:06:15 -06:00
Leon Romanovsky
c31fad1470 nvme-pci: add missing condition check for existence of mapped data
nvme_map_data() is called when request has physical segments, hence
the nvme_unmap_data() should have same condition to avoid dereference.

Fixes: 4aedb70543 ("nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data")
Signed-off-by: Leon Romanovsky <leonro@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-25 07:20:54 -07:00
Linus Torvalds
0256994887 for-6.11/block-post-20240722
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmaeY00QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpjPGD/9CPo93+V/ztfzY1J18KhA2CCUh1uuxZIjx
 dLfi07Bo+gyLwB1vaSf0bNy9gM8SzGFSMszSIDTErNq9/F6RvWjXN0CchyQf1Wii
 o2UyQg8JLjT2o1pJSsdJySZQRsG/daWUHzHaX1kD343Cd6OBV2YaVFdYTaXUGg4v
 G1AVh7qFvQhAIg1jV8q2z7QC7PSeuTnvyvY65Z8/iVJe95FayOrtGmDPTaJab8r2
 7uEFiWZk23erzNygVdcSoNIrwWFmRARz5o3IvwJJfEL08hkdoAqu6vD2oCUZspKU
 3g4wU6JrN0QYQpVwIJ9WcwYcoOm6iMm9xwCVMsp8R3KRUU107HjaiEazFDGk4HW4
 ozZTa7leTXnrRqnjVhcQpUvC+1uVLCFN8sSElNY7m2dg0IojnlMz+t3lMiTtaR9N
 Rt6wy5alVQFlb2uhzALuUh6HM1zA98swWySNoP0arTkOT9kjXwwAgn0I+M1s9Uxo
 FaQvM0YnAsb2C8LSpNtZWLaTlRSLTzUsGThLSJMBZueIJ9+BF23i7W7euklCNxjj
 Jl6CykEkEkacOxU6b9PG6qSnUq9JJ+W7gcJVing+ugAFrZDutxy6eJZXVv8wuvCC
 EOxaADpSs2xAaH9V0BMmwO51w0NDWySyGPHB5UBkhNjqOji/oG3FvAITiboQArgS
 FES4jtU1TA==
 =dn4l
 -----END PGP SIGNATURE-----

Merge tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux

Pull block integrity mapping updates from Jens Axboe:
 "A set of cleanups and fixes for the block integrity support.

  Sent separately from the main block changes from last week, as they
  depended on later fixes in the 6.10-rc cycle"

* tag 'for-6.11/block-post-20240722' of git://git.kernel.dk/linux:
  block: don't free the integrity payload in bio_integrity_unmap_free_user
  block: don't free submitter owned integrity payload on I/O completion
  block: call bio_integrity_unmap_free_user from blk_rq_unmap_user
  block: don't call bio_uninit from bio_endio
  block: also return bio_integrity_payload * from stubs
  block: split integrity support out of bio.h
2024-07-22 11:04:09 -07:00
Francis Pravin
415fb383ec nvme-core: choose PIF from QPIF if QPIFS supports and PIF is QTYPE
As per TP4141a:
"If the Qualified Protection Information Format Support(QPIFS) bit is
set to 1 and the Protection Information Format(PIF) field is set to 11b
(i.e., Qualified Type), then the pif is as defined in the Qualified
Protection Information Format (QPIF) field."
So, choose PIF from QPIF if QPIFS supports and PIF is QTYPE.

Signed-off-by: Francis Pravin <francis.p@samsung.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-16 07:55:31 -07:00
Linus Torvalds
3e78198862 for-6.11/block-20240710
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmaOTd8QHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgppqIEACUr8Vv2FtezvT3OfVSlYWHHLXzkRhwEG5s
 vdk0o7Ow6U54sMjfymbHTgLD0ZOJf3uJ6BI95FQuW41jPzDFVbx4Hy8QzqonMkw9
 1D/YQ4zrVL2mOKBzATbKpoGJzMOzGeoXEueFZ1AYPAX7RrDtP4xPQNfrcfkdE2zF
 LycJN70Vp6lrZZMuI9yb9ts1tf7TFzK0HJANxOAKTgSiPmBmxesjkJlhrdUrgkAU
 qDVyjj7u/ssndBJAb9i6Bl95Do8s9t4DeJq5/6wgKqtf5hClMXzPVB8Wy084gr6E
 rTRsCEhOug3qEZSqfAgAxnd3XFRNc/p2KMUe5YZ4mAqux4hpSmIQQDM/5X5K9vEv
 f4MNqUGlqyqntZx+KPyFpf7kLHFYS1qK4ub0FojWJEY4GrbBPNjjncLJ9+ozR0c8
 kNDaFjMNAjalBee1FxNNH8LdVcd28rrCkPxRLEfO/gvBMUmvJf4ZyKmSED0v5DhY
 vZqKlBqG+wg0EXvdiWEHMDh9Y+q/2XBIkS6NN/Bhh61HNu+XzC838ts1X7lR+4o2
 AM5Vapw+v0q6kFBMRP3IcJI/c0UcIU8EQU7axMyzWtvhog8kx8x01hIj1L4UyYYr
 rUdWrkugBVXJbywFuH/QIJxWxS/z4JdSw5VjASJLIrXy+aANmmG9Wonv95eyhpUv
 5iv+EdRSNA==
 =wVi8
 -----END PGP SIGNATURE-----

Merge tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:

 - NVMe updates via Keith:
     - Device initialization memory leak fixes (Keith)
     - More constants defined (Weiwen)
     - Target debugfs support (Hannes)
     - PCIe subsystem reset enhancements (Keith)
     - Queue-depth multipath policy (Redhat and PureStorage)
     - Implement get_unique_id (Christoph)
     - Authentication error fixes (Gaosheng)

 - MD updates via Song
     - sync_action fix and refactoring (Yu Kuai)
     - Various small fixes (Christoph Hellwig, Li Nan, and Ofir Gal, Yu
       Kuai, Benjamin Marzinski, Christophe JAILLET, Yang Li)

 - Fix loop detach/open race (Gulam)

 - Fix lower control limit for blk-throttle (Yu)

 - Add module descriptions to various drivers (Jeff)

 - Add support for atomic writes for block devices, and statx reporting
   for same. Includes SCSI and NVMe (John, Prasad, Alan)

 - Add IO priority information to block trace points (Dongliang)

 - Various zone improvements and tweaks (Damien)

 - mq-deadline tag reservation improvements (Bart)

 - Ignore direct reclaim swap writes in writeback throttling (Baokun)

 - Block integrity improvements and fixes (Anuj)

 - Add basic support for rust based block drivers. Has a dummy null_blk
   variant for now (Andreas)

 - Series converting driver settings to queue limits, and cleanups and
   fixes related to that (Christoph)

 - Cleanup for poking too deeply into the bvec internals, in preparation
   for DMA mapping API changes (Christoph)

 - Various minor tweaks and fixes (Jiapeng, John, Kanchan, Mikulas,
   Ming, Zhu, Damien, Christophe, Chaitanya)

* tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux: (206 commits)
  floppy: add missing MODULE_DESCRIPTION() macro
  loop: add missing MODULE_DESCRIPTION() macro
  ublk_drv: add missing MODULE_DESCRIPTION() macro
  xen/blkback: add missing MODULE_DESCRIPTION() macro
  block/rnbd: Constify struct kobj_type
  block: take offset into account in blk_bvec_map_sg again
  block: fix get_max_segment_size() warning
  loop: Don't bother validating blocksize
  virtio_blk: Don't bother validating blocksize
  null_blk: Don't bother validating blocksize
  block: Validate logical block size in blk_validate_limits()
  virtio_blk: Fix default logical block size fallback
  nvmet-auth: fix nvmet_auth hash error handling
  nvme: implement ->get_unique_id
  block: pass a phys_addr_t to get_max_segment_size
  block: add a bvec_phys helper
  blk-lib: check for kill signal in ioctl BLKZEROOUT
  block: limit the Write Zeroes to manually writing zeroes fallback
  block: refacto blkdev_issue_zeroout
  block: move read-only and supported checks into (__)blkdev_issue_zeroout
  ...
2024-07-15 14:20:22 -07:00
Bart Van Assche
92fc2c469e nvme-pci: Fix the instructions for disabling power management
pcie_aspm=off tells the kernel not to modify the ASPM configuration. This
setting does not guarantee that ASPM (Active State Power Management) is
disabled. Hence add pcie_port_pm=off. This disables power management for
all PCIe ports.

This patch has been tested on a workstation with a Samsung SSD 970 EVO Plus
NVMe SSD.

Fixes: 4641a8e6e1 ("nvme-pci: add trouble shooting steps for timeouts")
Cc: Keith Busch <kbusch@kernel.org>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Bart Van Assche <bvanassche@acm.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-15 13:46:00 -07:00
Israel Rukshin
88c918d1ee nvme: remove redundant bdev local variable
Use disk directly instead of getting it from bdev->bd_disk.

Signed-off-by: Israel Rukshin <israelr@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-15 13:44:59 -07:00
Markus Elfring
1a7812b25e nvme-fabrics: Use seq_putc() in __nvmf_concat_opt_tokens()
Single characters should be put into a sequence.
Thus use the corresponding function “seq_putc”.

This issue was transformed by using the Coccinelle software.

Signed-off-by: Markus Elfring <elfring@users.sourceforge.net>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-15 13:43:41 -07:00
WangYuli
ab091ec536 nvme/pci: Add APST quirk for Lenovo N60z laptop
There is a hardware power-saving problem with the Lenovo N60z
board. When turn it on and leave it for 10 hours, there is a
20% chance that a nvme disk will not wake up until reboot.

Link: https://lore.kernel.org/all/2B5581C46AC6E335+9c7a81f1-05fb-4fd0-9fbb-108757c21628@uniontech.com
Signed-off-by: hmy <huanglin@uniontech.com>
Signed-off-by: Wentao Guan <guanwentao@uniontech.com>
Signed-off-by: WangYuli <wangyuli@uniontech.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-15 13:43:39 -07:00
Jens Axboe
6b43537fae nvme updates for Linux 6.11
- Device initialization memory leak fixes (Keith)
  - More constants defined (Weiwen)
  - Target debugfs support (Hannes)
  - PCIe subsystem reset enhancements (Keith)
  - Queue-depth multipath policy (Redhat and PureStorage)
  - Implement get_unique_id (Christoph)
  - Authentication error fixes (Gaosheng)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE3Fbyvv+648XNRdHTPe3zGtjzRgkFAmaMS6oACgkQPe3zGtjz
 Rgljeg//f47yJNR61V+zePH2prcTUdTA21z2xnEbd1IT40qkpbBIf2mS5R6loBzT
 fm8cGGgd3fdZ3qUnvaMExL//A4aaQCgoKmbtJHLv616KeCu8Iwy8GC0myG2gl05h
 +a96zy8b30FZjRE6H08EBt0JyRZUqjxVcFtvIq1ZlRu2xLWG7f2hg4S7sSKDPKw/
 SZWTeFcY1GRiUupb0tcgaJeqiQ0U7BPuyUuHGnuklMY60ePO5qgtw3D6dmRwLgj+
 pbcRzrp2OWbKCUH1JrW34Ku1gzDbOYFYLL04akAq4rIp4JzsXEgvs7rlyRn0PtDP
 V6gGAvIvb7ktMhD4GvXFQlqT29AoOCYmWr5w2xS6uXAoLZ+t3gK9VBKa+7oRBOeo
 hiZNJy/EdIIuF54nWaFpZ4FBKunNocp7w9BEYkuu/HBm5GW4m9mwLxL66BLLYgPj
 kuC2t4Nc/waO+SrZFrHErtdb+QZNW8IUWIRG3jXLjo6yipGrv+K6lZpOY3HCEXev
 7F0AAOVNFWW+nWv+mEVQkd5lCFrHDjbVX2rRC3z4saKJvDOh69pBaSCKyikZR0bO
 95wz3B//sF2STBa4b/570KMPHJTJfTkKRtaaZvkHPT/0IQi0kmuWM/yKy57Q7NPE
 Ehkk3hfWjLUHU7jNuI5wxky8un7GMZJKArFg3Q1rkQCQ8OxQUXM=
 =0en2
 -----END PGP SIGNATURE-----

Merge tag 'nvme-6.11-2024-07-08' of git://git.infradead.org/nvme into for-6.11/block

Pull NVMe updates from Keith:

"nvme updates for Linux 6.11

 - Device initialization memory leak fixes (Keith)
 - More constants defined (Weiwen)
 - Target debugfs support (Hannes)
 - PCIe subsystem reset enhancements (Keith)
 - Queue-depth multipath policy (Redhat and PureStorage)
 - Implement get_unique_id (Christoph)
 - Authentication error fixes (Gaosheng)"

* tag 'nvme-6.11-2024-07-08' of git://git.infradead.org/nvme: (21 commits)
  nvmet-auth: fix nvmet_auth hash error handling
  nvme: implement ->get_unique_id
  nvme-multipath: implement "queue-depth" iopolicy
  nvme-multipath: prepare for "queue-depth" iopolicy
  nvme-pci: do not directly handle subsys reset fallout
  lpfc_nvmet: implement 'host_traddr'
  nvme-fcloop: implement 'host_traddr'
  nvmet-fc: implement host_traddr()
  nvmet-rdma: implement host_traddr()
  nvmet-tcp: implement host_traddr()
  nvmet: add 'host_traddr' callback for debugfs
  nvmet: add debugfs support
  mailmap: add entry for Weiwen Hu
  nvme: rename CDR/MORE/DNR to NVME_STATUS_*
  nvme: fix status magic numbers
  nvme: rename nvme_sc_to_pr_err to nvme_status_to_pr_err
  nvme: split device add from initialization
  nvme: fc: split controller bringup handling
  nvme: rdma: split controller bringup handling
  nvme: tcp: split controller bringup handling
  ...
2024-07-08 23:57:02 -06:00
Gaosheng Cui
89f58f96d1 nvmet-auth: fix nvmet_auth hash error handling
If we fail to call nvme_auth_augmented_challenge, or fail to kmalloc
for shash, we should free the memory allocation for challenge, so add
err path out_free_challenge to fix the memory leak.

Fixes: 7a277c37d3 ("nvmet-auth: Diffie-Hellman key exchange support")
Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-08 10:28:16 -07:00
Christoph Hellwig
18f03a063d nvme: implement ->get_unique_id
Implement the get_unique_id method to allow pNFS SCSI layout access to
NVMe namespaces.

This is the server side implementation of RFC 9561 "Using the Parallel
NFS (pNFS) SCSI Layout to Access Non-Volatile Memory Express (NVMe)
Storage Devices".

Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Chuck Lever <chuck.lever@oracle.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-08 10:25:39 -07:00
Damien Le Moal
f2a7bea237 block: Remove REQ_OP_ZONE_RESET_ALL emulation
Now that device mapper can handle resetting all zones of a mapped zoned
device using REQ_OP_ZONE_RESET_ALL, all zoned block device drivers
support this operation. With this, the request queue feature
BLK_FEAT_ZONE_RESETALL is not necessary and the emulation code in
blk-zone.c can be removed.

Signed-off-by: Damien Le Moal <dlemoal@kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240704052816.623865-5-dlemoal@kernel.org
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-05 00:42:04 -06:00
Christoph Hellwig
f8924374fd block: call bio_integrity_unmap_free_user from blk_rq_unmap_user
blk_rq_unmap_user always unmaps user space pass-through request.  If such
a request has integrity data attached it must come from a user mapping
as well.  Call bio_integrity_unmap_free_user from blk_rq_unmap_user
and remove the nvme_unmap_bio wrapper in the nvme driver.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240702151047.1746127-5-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-03 10:21:16 -06:00
Christoph Hellwig
da042a3655 block: split integrity support out of bio.h
Split struct bio_integrity_payload and the related prototypes out of
bio.h into a separate bio-integrity.h header so that it is only pulled
in by the few places that need it.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Anuj Gupta <anuj20.g@samsung.com>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20240702151047.1746127-2-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-03 10:21:15 -06:00
Jens Axboe
1a50d14670 Linux 6.10-rc6
-----BEGIN PGP SIGNATURE-----
 
 iQFSBAABCAA8FiEEq68RxlopcLEwq+PEeb4+QwBBGIYFAmaB0NweHHRvcnZhbGRz
 QGxpbnV4LWZvdW5kYXRpb24ub3JnAAoJEHm+PkMAQRiGkvwH/36UJRk/o6wvXnyH
 E6QjCSWo2226APyWks22NjtC3I/8Iqdvkneuh6wG0qL2sXAB078EMjUq5R81bF8H
 wWFBJwetjYTp8GEyLioMEb2wCH/J3R29dLFC4UYTplafXRGP6//xcpJaKmTxcgdR
 31IzvTPXbApZ7L3k1U6rA2bK9PNKcFCOvZlrNMUCuwMrabymHsDfOUt1DqXyg2xp
 zjqiWYBwlklozmgawSWt/mdEgkWuTcAbg+KyqDVQF59s9aj/OOwZ0j+HACq5V8CM
 quTPIAYL6CC9p7uxa69lGr/sgC0Is/BZLPX7RTZAwCgarGvnX+1HUsjDcaFCtrVg
 O6fPUV8=
 =pgUx
 -----END PGP SIGNATURE-----

Merge tag 'v6.10-rc6' into for-6.11/block-post

Pull in v6.10-rc6 to resolve a conflict for the integrity cleanups.

* tag 'v6.10-rc6': (778 commits)
  Linux 6.10-rc6
  ata: ahci: Clean up sysfs file on error
  ata: libata-core: Fix double free on error
  ata,scsi: libata-core: Do not leak memory for ata_port struct members
  ata: libata-core: Fix null pointer dereference on error
  x86-32: fix cmpxchg8b_emu build error with clang
  x86: stop playing stack games in profile_pc()
  i2c: testunit: discard write requests while old command is running
  i2c: testunit: don't erase registers after STOP
  tty: mxser: Remove __counted_by from mxser_board.ports[]
  randomize_kstack: Remove non-functional per-arch entropy filtering
  string: kunit: add missing MODULE_DESCRIPTION() macros
  ata: libata-core: Add ATA_HORKAGE_NOLPM for all Crucial BX SSD1 models
  MAINTAINERS: Update IOMMU tree location
  tools/power turbostat: Add local build_bug.h header for snapshot target
  tools/power turbostat: Fix unc freq columns not showing with '-q' or '-l'
  tools/power turbostat: option '-n' is ambiguous
  drm/drm_file: Fix pid refcounting race
  kallsyms: rework symbol lookup return codes
  gpiolib: cdev: Ignore reconfiguration without direction
  ...

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-03 10:20:05 -06:00
Thomas Song
f227345f0a nvme-multipath: implement "queue-depth" iopolicy
The round-robin path selector is inefficient in cases where there is a
difference in latency between paths.  In the presence of one or more
high latency paths the round-robin selector continues to use the high
latency path equally. This results in a bias towards the highest latency
path and can cause a significant decrease in overall performance as IOs
pile on the highest latency path. This problem is acute with NVMe-oF
controllers.

The queue-depth path selector sends I/O down the path with the lowest
number of requests in its request queue. Paths with lower latency will
clear requests more quickly and have less requests queued compared to
higher latency paths. The goal of this path selector is to make more use
of lower latency paths which will bring down overall IO latency and
increase throughput and performance.

Signed-off-by: Thomas Song <tsong@purestorage.com>
[emilne: commandeered patch developed by Thomas Song @ Pure Storage]
Co-developed-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Co-developed-by: John Meneghini <jmeneghi@redhat.com>
Signed-off-by: John Meneghini <jmeneghi@redhat.com>
Link: https://lore.kernel.org/linux-nvme/20240509202929.831680-1-jmeneghi@redhat.com/
Tested-by: Marco Patalano <mpatalan@redhat.com>
Tested-by: Jyoti Rani <jrani@purestorage.com>
Tested-by: John Meneghini <jmeneghi@redhat.com>
Reviewed-by: Randy Jennings <randyj@purestorage.com>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-07-02 06:47:19 -07:00
Christoph Hellwig
f3bf25d513 nvme: don't set io_opt if NOWS is zero
NOWS is one of the annoying "0's based values" in NVMe, where 0 means one
and we thus can't detect if it isn't set.  Thus a NOWS value of 0 means
that the Namespace Optimal Write Size is a single LBA, which is clearly
bogus.  Ignore the value in that case and don't propagate an io_opt
value to the block layer.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Link: https://lore.kernel.org/r/20240701051800.1245240-4-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2024-07-01 06:52:42 -06:00
Nathan Chancellor
440e2051c5 nvmet-fc: Remove __counted_by from nvmet_fc_tgt_queue.fod[]
Work for __counted_by on generic pointers in structures (not just
flexible array members) has started landing in Clang 19 (current tip of
tree). During the development of this feature, a restriction was added
to __counted_by to prevent the flexible array member's element type from
including a flexible array member itself such as:

  struct foo {
    int count;
    char buf[];
  };

  struct bar {
    int count;
    struct foo data[] __counted_by(count);
  };

because the size of data cannot be calculated with the standard array
size formula:

  sizeof(struct foo) * count

This restriction was downgraded to a warning but due to CONFIG_WERROR,
it can still break the build. The application of __counted_by on the fod
member of 'struct nvmet_fc_tgt_queue' triggers this restriction,
resulting in:

  drivers/nvme/target/fc.c:151:2: error: 'counted_by' should not be applied to an array with element of unknown size because 'struct nvmet_fc_fcp_iod' is a struct type with a flexible array member. This will be an error in a future compiler version [-Werror,-Wbounds-safety-counted-by-elt-type-unknown-size]
    151 |         struct nvmet_fc_fcp_iod         fod[] __counted_by(sqsize);
        |         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
  1 error generated.

Remove this use of __counted_by to fix the warning/error. However,
rather than remove it altogether, leave it commented, as it may be
possible to support this in future compiler releases.

Cc: stable@vger.kernel.org
Closes: https://github.com/ClangBuiltLinux/linux/issues/2027
Fixes: ccd3129aca ("nvmet-fc: Annotate struct nvmet_fc_tgt_queue with __counted_by")
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2024-06-26 10:13:04 -07:00