Commit Graph

3225 Commits

Author SHA1 Message Date
Maurizio Lombardi
2ed3d35328 nvmet-tcp: Fix the H2C expected PDU len calculation
[ Upstream commit 9a1abc2485 ]

The nvmet_tcp_handle_h2c_data_pdu() function should take into
consideration the possibility that the header digest and/or the data
digests are enabled when calculating the expected PDU length, before
comparing it to the value stored in cmd->pdu_len.

Fixes: efa5630590 ("nvmet-tcp: Fix a kernel panic when host sends an invalid H2C PDU length")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25 15:35:55 -08:00
Arnd Bergmann
79e9dfd7f8 nvme: trace: avoid memcpy overflow warning
[ Upstream commit a7de1dea76 ]

A previous patch introduced a struct_group() in nvme_common_command to help
stringop fortification figure out the length of the fields, but one function
is not currently using them:

In file included from drivers/nvme/target/core.c:7:
In file included from include/linux/string.h:254:
include/linux/fortify-string.h:592:4: error: call to '__read_overflow2_field' declared with 'warning' attribute: detected read beyond size of field (2nd parameter); maybe use struct_group()? [-Werror,-Wattribute-warning]
                        __read_overflow2_field(q_size_field, size);
                        ^

Change this one to use the correct field name to avoid the warning.

Fixes: 5c629dc960 ("nvme: use struct group for generic command dwords")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25 15:35:55 -08:00
Arnd Bergmann
4652eb8176 nvmet: re-fix tracing strncpy() warning
[ Upstream commit 4ee7ffeb4c ]

An earlier patch had tried to address a warning about a string copy with
missing zero termination:

drivers/nvme/target/trace.h:52:3: warning: ‘strncpy’ specified bound 32 equals destination size [-Wstringop-truncation]

The new version causes a different warning with some compiler versions, notably
gcc-9 and gcc-10, and also misses the zero padding that was apparently done
intentionally in the original code:

drivers/nvme/target/trace.h:56:2: error: 'strncpy' specified bound depends on the length of the source argument [-Werror=stringop-overflow=]

Change it to use strscpy_pad() with the original length, which will give
a properly padded and zero-terminated string as well as avoiding the warning.

Fixes: d86481e924 ("nvmet: use min of device_path and disk len")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25 15:35:55 -08:00
Maurizio Lombardi
2f00fd8d50 nvmet-tcp: fix a crash in nvmet_req_complete()
[ Upstream commit 0849a54413 ]

in nvmet_tcp_handle_h2c_data_pdu(), if the host sends a data_offset
different from rbytes_done, the driver ends up calling nvmet_req_complete()
passing a status error.
The problem is that at this point cmd->req is not yet initialized,
the kernel will crash after dereferencing a NULL pointer.

Fix the bug by replacing the call to nvmet_req_complete() with
nvmet_tcp_fatal_error().

Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Reviewed-by: Keith Busch <kbsuch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25 15:35:54 -08:00
Maurizio Lombardi
24e0576018 nvmet-tcp: Fix a kernel panic when host sends an invalid H2C PDU length
[ Upstream commit efa5630590 ]

If the host sends an H2CData command with an invalid DATAL,
the kernel may crash in nvmet_tcp_build_pdu_iovec().

Unable to handle kernel NULL pointer dereference at
virtual address 0000000000000000
lr : nvmet_tcp_io_work+0x6ac/0x718 [nvmet_tcp]
Call trace:
  process_one_work+0x174/0x3c8
  worker_thread+0x2d0/0x3e8
  kthread+0x104/0x110

Fix the bug by raising a fatal error if DATAL isn't coherent
with the packet size.
Also, the PDU length should never exceed the MAXH2CDATA parameter which
has been communicated to the host in nvmet_tcp_handle_icreq().

Fixes: 872d26a391 ("nvmet-tcp: add NVMe over TCP target driver")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-25 15:35:54 -08:00
Bitao Hu
c52d545c1e nvme: fix deadlock between reset and scan
[ Upstream commit 839a40d1e7 ]

If controller reset occurs when allocating namespace, both
nvme_reset_work and nvme_scan_work will hang, as shown below.

Test Scripts:

    for ((t=1;t<=128;t++))
    do
    nsid=`nvme create-ns /dev/nvme1 -s 14537724 -c 14537724 -f 0 -m 0 \
    -d 0 | awk -F: '{print($NF);}'`
    nvme attach-ns /dev/nvme1 -n $nsid -c 0
    done
    nvme reset /dev/nvme1

We will find that both nvme_reset_work and nvme_scan_work hung:

    INFO: task kworker/u249:4:17848 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
    message.
    task:kworker/u249:4  state:D stack:    0 pid:17848 ppid:     2
    flags:0x00000028
    Workqueue: nvme-reset-wq nvme_reset_work [nvme]
    Call trace:
    __switch_to+0xb4/0xfc
    __schedule+0x22c/0x670
    schedule+0x4c/0xd0
    blk_mq_freeze_queue_wait+0x84/0xc0
    nvme_wait_freeze+0x40/0x64 [nvme_core]
    nvme_reset_work+0x1c0/0x5cc [nvme]
    process_one_work+0x1d8/0x4b0
    worker_thread+0x230/0x440
    kthread+0x114/0x120
    INFO: task kworker/u249:3:22404 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
    message.
    task:kworker/u249:3  state:D stack:    0 pid:22404 ppid:     2
    flags:0x00000028
    Workqueue: nvme-wq nvme_scan_work [nvme_core]
    Call trace:
    __switch_to+0xb4/0xfc
    __schedule+0x22c/0x670
    schedule+0x4c/0xd0
    rwsem_down_write_slowpath+0x32c/0x98c
    down_write+0x70/0x80
    nvme_alloc_ns+0x1ac/0x38c [nvme_core]
    nvme_validate_or_alloc_ns+0xbc/0x150 [nvme_core]
    nvme_scan_ns_list+0xe8/0x2e4 [nvme_core]
    nvme_scan_work+0x60/0x500 [nvme_core]
    process_one_work+0x1d8/0x4b0
    worker_thread+0x260/0x440
    kthread+0x114/0x120
    INFO: task nvme:28428 blocked for more than 120 seconds.
    "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this
    message.
    task:nvme            state:D stack:    0 pid:28428 ppid: 27119
    flags:0x00000000
    Call trace:
    __switch_to+0xb4/0xfc
    __schedule+0x22c/0x670
    schedule+0x4c/0xd0
    schedule_timeout+0x160/0x194
    do_wait_for_common+0xac/0x1d0
    __wait_for_common+0x78/0x100
    wait_for_completion+0x24/0x30
    __flush_work.isra.0+0x74/0x90
    flush_work+0x14/0x20
    nvme_reset_ctrl_sync+0x50/0x74 [nvme_core]
    nvme_dev_ioctl+0x1b0/0x250 [nvme_core]
    __arm64_sys_ioctl+0xa8/0xf0
    el0_svc_common+0x88/0x234
    do_el0_svc+0x7c/0x90
    el0_svc+0x1c/0x30
    el0_sync_handler+0xa8/0xb0
    el0_sync+0x148/0x180

The reason for the hang is that nvme_reset_work occurs while nvme_scan_work
is still running. nvme_scan_work may add new ns into ctrl->namespaces
list after nvme_reset_work frozen all ns->q in ctrl->namespaces list.
The newly added ns is not frozen, so nvme_wait_freeze will wait forever.
Unfortunately, ctrl->namespaces_rwsem is held by nvme_reset_work, so
nvme_scan_work will also wait forever. Now we are deadlocked!

PROCESS1                         PROCESS2
==============                   ==============
nvme_scan_work
  ...                            nvme_reset_work
  nvme_validate_or_alloc_ns        nvme_dev_disable
    nvme_alloc_ns                    nvme_start_freeze
     down_write                      ...
     nvme_ns_add_to_ctrl_list        ...
     up_write                      nvme_wait_freeze
    ...                              down_read
    nvme_alloc_ns                    blk_mq_freeze_queue_wait
     down_write

Fix by marking the ctrl with say NVME_CTRL_FROZEN flag set in
nvme_start_freeze and cleared in nvme_unfreeze. Then the scan can check
it before adding the new namespace (under the namespaces_rwsem).

Signed-off-by: Bitao Hu <yaoma@linux.alibaba.com>
Reviewed-by: Guixin Liu <kanie@linux.alibaba.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-20 11:51:41 +01:00
Nitesh Shetty
946fd64ba3 nvme: prevent potential spectre v1 gadget
[ Upstream commit 20dc66f2d7 ]

This patch fixes the smatch warning, "nvmet_ns_ana_grpid_store() warn:
potential spectre issue 'nvmet_ana_group_enabled' [w] (local cap)"
Prevent the contents of kernel memory from being leaked to  user space
via speculative execution by using array_index_nospec.

Signed-off-by: Nitesh Shetty <nj.shetty@samsung.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-20 11:51:41 +01:00
Keith Busch
8b2a6a3692 nvme-ioctl: move capable() admin check to the end
[ Upstream commit 7be866b1cf ]

This can be an expensive call on some kernel configs. Move it to the end
after checking the cheaper ways to determine if the command is allowed.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-20 11:51:41 +01:00
Keith Busch
8884a56d21 nvme: ensure reset state check ordering
[ Upstream commit e6e7f7ac03 ]

A different CPU may be setting the ctrl->state value, so ensure proper
barriers to prevent optimizing to a stale state. Normally it isn't a
problem to observe the wrong state as it is merely advisory to take a
quicker path during initialization and error recovery, but seeing an old
state can report unexpected ENETRESET errors when a reset request was in
fact successful.

Reported-by: Minh Hoang <mh2022@meta.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-20 11:51:41 +01:00
Keith Busch
cc5b051eeb nvme: introduce helper function to get ctrl state
[ Upstream commit 5c687c287c ]

The controller state is typically written by another CPU, so reading it
should ensure no optimizations are taken. This is a repeated pattern in
the driver, so start with adding a convenience function that returns the
controller state with READ_ONCE().

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-20 11:51:41 +01:00
Keith Busch
a4848c45a3 nvme-core: check for too small lba shift
[ Upstream commit 74fbc88e16 ]

The block layer doesn't support logical block sizes smaller than 512
bytes. The nvme spec doesn't support that small either, but the driver
isn't checking to make sure the device responded with usable data.
Failing to catch this will result in a kernel bug, either from a
division by zero when stacking, or a zero length bio.

Reviewed-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-20 11:51:39 +01:00
Maurizio Lombardi
75cc56afb2 nvme-core: fix a memory leak in nvme_ns_info_from_identify()
[ Upstream commit e3139cef82 ]

In case of error, free the nvme_id_ns structure that was allocated
by nvme_identify_ns().

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-20 11:51:38 +01:00
Keith Busch
c62b9a2daf Revert "nvme-fc: fix race between error recovery and creating association"
commit d3e8b18587 upstream.

The commit was identified to might sleep in invalid context and is
blocking regression testing.

This reverts commit ee6fdc5055.

Link: https://lore.kernel.org/linux-nvme/hkhl56n665uvc6t5d6h3wtx7utkcorw4xlwi7d2t2bnonavhe6@xaan6pu43ap6/
Link: https://lists.infradead.org/pipermail/linux-nvme/2023-December/043756.html
Reported-by: Daniel Wagner <dwagner@suse.de>
Reported-by: Maurizio Lombardi <mlombard@redhat.com>
Cc: Michael Liang <mliang@purestorage.com>
Tested-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2024-01-05 15:19:43 +01:00
Maurizio Lombardi
fedbc8732f nvme-pci: fix sleeping function called from interrupt context
[ Upstream commit f6fe0b2d35 ]

the nvme_handle_cqe() interrupt handler calls nvme_complete_async_event()
but the latter may call nvme_auth_stop() which is a blocking function.
Sleeping functions can't be called in interrupt context

 BUG: sleeping function called from invalid context
 in_atomic(): 1, irqs_disabled(): 1, non_block: 0, pid: 0, name: swapper/15
  Call Trace:
     <IRQ>
      __cancel_work_timer+0x31e/0x460
      ? nvme_change_ctrl_state+0xcf/0x3c0 [nvme_core]
      ? nvme_change_ctrl_state+0xcf/0x3c0 [nvme_core]
      nvme_complete_async_event+0x365/0x480 [nvme_core]
      nvme_poll_cq+0x262/0xe50 [nvme]

Fix the bug by moving nvme_auth_stop() to fw_act_work
(executed by the nvme_wq workqueue)

Fixes: f50fff73d6 ("nvme: implement In-Band authentication")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2024-01-01 12:42:35 +00:00
Hannes Reinecke
9514925a9a nvme: catch errors from nvme_configure_metadata()
[ Upstream commit cd9aed6060 ]

nvme_configure_metadata() is issuing I/O, so we might incur an I/O
error which will cause the connection to be reset.
But in that case any further probing will race with reset and
cause UAF errors.
So return a status from nvme_configure_metadata() and abort
probing if there was an I/O error.

Signed-off-by: Hannes Reinecke <hare@suse.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-20 17:01:57 +01:00
Mark O'Donovan
89fc9028e8 nvme-auth: set explanation code for failure2 msgs
[ Upstream commit 38ce1570e2 ]

Some error cases were not setting an auth-failure-reason-code-explanation.
This means an AUTH_Failure2 message will be sent with an explanation value
of 0 which is a reserved value.

Signed-off-by: Mark O'Donovan <shiftee@posteo.net>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-20 17:01:57 +01:00
Georg Gottleuber
dd864f6ee0 nvme-pci: Add sleep quirk for Kingston drives
commit 107b4e063d upstream.

Some Kingston NV1 and A2000 are wasting a lot of power on specific TUXEDO
platforms in s2idle sleep if 'Simple Suspend' is used.

This patch applies a new quirk 'Force No Simple Suspend' to achieve a
low power sleep without 'Simple Suspend'.

Signed-off-by: Werner Sembach <wse@tuxedocomputers.com>
Signed-off-by: Georg Gottleuber <ggo@tuxedocomputers.com>
Cc: <stable@vger.kernel.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-13 18:45:20 +01:00
Ewan D. Milne
2bda53caf1 nvme: check for valid nvme_identify_ns() before using it
commit d8b90d600a upstream.

When scanning namespaces, it is possible to get valid data from the first
call to nvme_identify_ns() in nvme_alloc_ns(), but not from the second
call in nvme_update_ns_info_block().  In particular, if the NSID becomes
inactive between the two commands, a storage device may return a buffer
filled with zero as per 4.1.5.1.  In this case, we can get a kernel crash
due to a divide-by-zero in blk_stack_limits() because ns->lba_shift will
be set to zero.

PID: 326      TASK: ffff95fec3cd8000  CPU: 29   COMMAND: "kworker/u98:10"
 #0 [ffffad8f8702f9e0] machine_kexec at ffffffff91c76ec7
 #1 [ffffad8f8702fa38] __crash_kexec at ffffffff91dea4fa
 #2 [ffffad8f8702faf8] crash_kexec at ffffffff91deb788
 #3 [ffffad8f8702fb00] oops_end at ffffffff91c2e4bb
 #4 [ffffad8f8702fb20] do_trap at ffffffff91c2a4ce
 #5 [ffffad8f8702fb70] do_error_trap at ffffffff91c2a595
 #6 [ffffad8f8702fbb0] exc_divide_error at ffffffff928506e6
 #7 [ffffad8f8702fbd0] asm_exc_divide_error at ffffffff92a00926
    [exception RIP: blk_stack_limits+434]
    RIP: ffffffff92191872  RSP: ffffad8f8702fc80  RFLAGS: 00010246
    RAX: 0000000000000000  RBX: ffff95efa0c91800  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: 0000000000000001  RDI: 0000000000000001
    RBP: 00000000ffffffff   R8: ffff95fec7df35a8   R9: 0000000000000000
    R10: 0000000000000000  R11: 0000000000000001  R12: 0000000000000000
    R13: 0000000000000000  R14: 0000000000000000  R15: ffff95fed33c09a8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffffad8f8702fce0] nvme_update_ns_info_block at ffffffffc06d3533 [nvme_core]
 #9 [ffffad8f8702fd18] nvme_scan_ns at ffffffffc06d6fa7 [nvme_core]

This happened when the check for valid data was moved out of nvme_identify_ns()
into one of the callers.  Fix this by checking in both callers.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=218186
Fixes: 0dd6fff2aa ("nvme: bring back auto-removal of deleted namespaces during sequential scan")
Cc: stable@vger.kernel.org
Signed-off-by: Ewan D. Milne <emilne@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2023-12-08 08:52:18 +01:00
Christoph Hellwig
2291653c27 nvmet: nul-terminate the NQNs passed in the connect command
[ Upstream commit 1c22e0295a ]

The host and subsystem NQNs are passed in the connect command payload and
interpreted as nul-terminated strings.  Ensure they actually are
nul-terminated before using them.

Fixes: a07b4970f4 "nvmet: add a generic NVMe target")
Reported-by: Alon Zahavi <zahavi.alon@gmail.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-03 07:33:06 +01:00
Hannes Reinecke
399d76d330 nvme: blank out authentication fabrics options if not configured
[ Upstream commit c7ca9757bd ]

If the config option NVME_HOST_AUTH is not selected we should not
accept the corresponding fabrics options. This allows userspace
to detect if NVMe authentication has been enabled for the kernel.

Cc: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Fixes: f50fff73d6 ("nvme: implement In-Band authentication")
Signed-off-by: Hannes Reinecke <hare@suse.de>
Tested-by: Shin'ichiro Kawasaki <shinichiro.kawasaki@wdc.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-12-03 07:33:06 +01:00
Anuj Gupta
04e446f54f nvme: fix error-handling for io_uring nvme-passthrough
[ Upstream commit 1147dd0503 ]

Driver may return an error before submitting the command to the device.
Ensure that such error is propagated up.

Fixes: 456cba386e ("nvme: wire-up uring-cmd support for io-passthru on char-device.")
Signed-off-by: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Niklas Cassel <niklas.cassel@wdc.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2023-11-20 11:59:35 +01:00
Keith Busch
5c3f406646 nvme-pci: add BOGUS_NID for Intel 0a54 device
These ones claim cmic and nmic capable, so need special consideration to ignore
their duplicate identifiers.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217981
Reported-by: welsh@cassens.com
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-10-18 14:08:39 -07:00
Maurizio Lombardi
f965b281fd nvmet-auth: complete a request only after freeing the dhchap pointers
It may happen that the work to destroy a queue
(for example nvmet_tcp_release_queue_work()) is started while
an auth-send or auth-receive command is still completing.

nvmet_sq_destroy() will block, waiting for all the references
to the sq to be dropped, the last reference is then
dropped when nvmet_req_complete() is called.

When this happens, both nvmet_sq_destroy() and
nvmet_execute_auth_send()/_receive() will free the dhchap pointers by
calling nvmet_auth_sq_free().
Since there isn't any lock, the two threads may race against each other,
causing double frees and memory corruptions, as reported by KASAN.

Reproduced by stress blktests nvme/041 nvme/042 nvme/043

 nvme nvme2: qid 0: authenticated with hash hmac(sha512) dhgroup ffdhe4096
 ==================================================================
 BUG: KASAN: double-free in kfree+0xec/0x4b0

 Call Trace:
  <TASK>
  kfree+0xec/0x4b0
  nvmet_auth_sq_free+0xe1/0x160 [nvmet]
  nvmet_execute_auth_send+0x482/0x16d0 [nvmet]
  process_one_work+0x8e5/0x1510

 Allocated by task 191846:
  __kasan_kmalloc+0x81/0xa0
  nvmet_auth_ctrl_sesskey+0xf6/0x380 [nvmet]
  nvmet_auth_reply+0x119/0x990 [nvmet]

 Freed by task 143270:
  kfree+0xec/0x4b0
  nvmet_auth_sq_free+0xe1/0x160 [nvmet]
  process_one_work+0x8e5/0x1510

Fix this bug by calling nvmet_req_complete() only after freeing the
pointers, so we will prevent the race by holding the sq reference.

V2: remove redundant code

Fixes: db1312dd95 ("nvmet: implement basic In-Band Authentication")
Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-10-18 14:08:39 -07:00
Keith Busch
2b32c76e2b nvme: sanitize metadata bounce buffer for reads
User can request more metadata bytes than the device will write. Ensure
kernel buffer is initialized so we're not leaking unsanitized memory on
the copy-out.

Fixes: 0b7f1f26f9 ("nvme: use the block layer for userspace passthrough metadata")
Reviewed-by: Jens Axboe <axboe@kernel.dk>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Kanchan Joshi <joshi.k@samsung.com>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-10-18 14:08:39 -07:00
Martin Wilck
4ae55a7dce nvme-auth: use chap->s2 to indicate bidirectional authentication
Commit 546dea18c9 ("nvme-auth: check chap ctrl_key once constructed")
replaced the condition "if (ctrl->ctrl_key)" (indicating bidirectional
auth) by "if (chap->ctrl_key)", because ctrl->ctrl_key is a resource shared
with sysfs. But chap->ctrl_key is set in
nvme_auth_process_dhchap_challenge() depending on the DHVLEN in the
DH-HMAC-CHAP Challenge message received from the controller, and will thus
be non-NULL for every DH-HMAC-CHAP exchange, even if unidirectional auth
was requested. This will lead to a protocol violation by sending a Success2
message in the unidirectional case (per NVMe base spec 2.0, the
authentication transaction ends after the Success1 message for
unidirectional auth). Use chap->s2 instead, which is non-zero if and only
if the host requested bi-directional authentication from the controller.

Fixes: 546dea18c9 ("nvme-auth: check chap ctrl_key once constructed")
Signed-off-by: Martin Wilck <mwilck@suse.com>
Reviewed-by: Daniel Wagner <dwagner@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-10-10 08:06:06 -07:00
Sagi Grimberg
d920abd1e7 nvmet-tcp: Fix a possible UAF in queue intialization setup
From Alon:
"Due to a logical bug in the NVMe-oF/TCP subsystem in the Linux kernel,
a malicious user can cause a UAF and a double free, which may lead to
RCE (may also lead to an LPE in case the attacker already has local
privileges)."

Hence, when a queue initialization fails after the ahash requests are
allocated, it is guaranteed that the queue removal async work will be
called, hence leave the deallocation to the queue removal.

Also, be extra careful not to continue processing the socket, so set
queue rcv_state to NVMET_TCP_RECV_ERR upon a socket error.

Cc: stable@vger.kernel.org
Reported-by: Alon Zahavi <zahavi.alon@gmail.com>
Tested-by: Alon Zahavi <zahavi.alon@gmail.com>
Signed-off-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-10-10 08:03:22 -07:00
Maurizio Lombardi
3820c4fdc2 nvme-rdma: do not try to stop unallocated queues
Trying to stop a queue which hasn't been allocated will result
in a warning due to calling mutex_lock() against an uninitialized mutex.

 DEBUG_LOCKS_WARN_ON(lock->magic != lock)
 WARNING: CPU: 4 PID: 104150 at kernel/locking/mutex.c:579

 Call trace:
  RIP: 0010:__mutex_lock+0x1173/0x14a0
  nvme_rdma_stop_queue+0x1b/0xa0 [nvme_rdma]
  nvme_rdma_teardown_io_queues.part.0+0xb0/0x1d0 [nvme_rdma]
  nvme_rdma_delete_ctrl+0x50/0x100 [nvme_rdma]
  nvme_do_delete_ctrl+0x149/0x158 [nvme_core]

Signed-off-by: Maurizio Lombardi <mlombard@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-10-05 09:37:41 -07:00
Jens Axboe
c266ae774e nvme fixes for Linux 6.6
- nvme-tcp iov len fix (Varun)
  - nvme-hwmon const qualifier for safety (Krzysztof)
  - nvme-fc null pointer checks (Nigel)
  - nvme-pci no numa node fix (Pratyush)
  - nvme timeout fix for non-compliant controllers (Keith)
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE3Fbyvv+648XNRdHTPe3zGtjzRgkFAmUDakkACgkQPe3zGtjz
 RgnSyA/+MR9dvwaa9ADcmWtzKaFnpxOy24Yr0CQ97oqfryv89ZVHSOT9gzQGdhNO
 m3Z3hJ72nU+mjjy8/Ixh0+AdSNLOD6sY9ohQQ/7jev+yGjOaSZnQWXpH6pionfuW
 DCfQweFq8VggCpT+wVinnzs7C4S/mhuuNHunefDAa+A7R6jeJENhridbzillokfi
 semNz6GSKcIMBsaQ85JDlTyh+nGfva7KwLVdEzA0g00XjxMw159bBjqPFtm/Dwde
 Jt280wVNQHEa+Jo8CVgAbYvFAhgAmwt5cqr8d22PisVVR324s7aMUf++BQcslmzX
 5WyVTvr3nvKH5uAWtvKJiDV8VNhs6WM3CuCfOOqK2fOaQ6PN18In08y9aGPDQ73e
 3xkkgDdRlJFa2jae48msIY2KnysJoqPJgymEtvHuqbQNYifE1jym39Ml5soWCol2
 a0aEVobGTceeEb0D7MWdVFntS6HmdFrZcinmSGcgj34IAYCYaguHK8ZSPNovJnAq
 /27YDkfXnt3+8ZgbxhZjxxToH2xCaa/uC842Ujh3KTROqKbppw8YBv/2VoLNxGNh
 Kqlu2vApIbrNP19HuCp/oDu10Q+6KPJZY1qqT68ndLl47CYAXEG0XbyB13I4hdwL
 YUues45a9f9X+Q0cHoJ9Hr4rTLfF6dm5gNv0X1cUdyLNTcB1FbE=
 =WTY0
 -----END PGP SIGNATURE-----

Merge tag 'nvme-6.6-2023-09-14' of git://git.infradead.org/nvme into block-6.6

Pull NVMe fixes from Keith:

"nvme fixes for Linux 6.6

 - nvme-tcp iov len fix (Varun)
 - nvme-hwmon const qualifier for safety (Krzysztof)
 - nvme-fc null pointer checks (Nigel)
 - nvme-pci no numa node fix (Pratyush)
 - nvme timeout fix for non-compliant controllers (Keith)"

* tag 'nvme-6.6-2023-09-14' of git://git.infradead.org/nvme:
  nvme: avoid bogus CRTO values
  nvme-pci: do not set the NUMA node of device if it has none
  nvme-fc: Prevent null pointer dereference in nvme_fc_io_getuuid()
  nvme: host: hwmon: constify pointers to hwmon_channel_info
  nvmet-tcp: pass iov_len instead of sg->length to bvec_set_page()
2023-09-14 16:20:31 -06:00
Keith Busch
6cc834ba62 nvme: avoid bogus CRTO values
Some devices are reporting controller ready mode support, but return 0
for CRTO. These devices require a much higher time to ready than that,
so they are failing to initialize after the driver starter preferring
that value over CAP.TO.

The spec requires that CAP.TO match the appropritate CRTO value, or be
set to 0xff if CRTO is larger than that. This means that CAP.TO can be
used to validate if CRTO is reliable, and provides an appropriate
fallback for setting the timeout value if not. Use whichever is larger.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217863
Reported-by: Cláudio Sampaio <patola@gmail.com>
Reported-by: Felix Yan <felixonmars@archlinux.org>
Tested-by: Felix Yan <felixonmars@archlinux.org>
Based-on-a-patch-by: Felix Yan <felixonmars@archlinux.org>
Cc: stable@vger.kernel.org
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-09-14 13:09:52 -07:00
Pratyush Yadav
dad651b2a4 nvme-pci: do not set the NUMA node of device if it has none
If a device has no NUMA node information associated with it, the driver
puts the device in node first_memory_node (say node 0). Not having a
NUMA node and being associated with node 0 are completely different
things and it makes little sense to mix the two.

Signed-off-by: Pratyush Yadav <ptyadav@amazon.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-09-12 09:06:58 -07:00
Linus Torvalds
3d3dfeb3ae for-6.6/block-2023-08-28
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmTs08EQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpqa4EACu/zKE+omGXBV0Q7kEpVsChjp0ElGtSDIJ
 tJfTuvnWqQjrqRv4ksmZvGdx8SkqFuXri4/7oBXlsaqeUVbIQdWJUpLErBye6nxa
 lUb6nXOFWwyG94cMRYs71lN0loosjb7aiVw7oVLAIhntq3p3doFl/cyy3ndMZrUE
 pZbsrWSt4QiOKhcO0TtIjfAwsr31AN51qFiNNITEiZl3UjXfkGRCK81X0yM2N8zZ
 7Y0h1ldPBsZ/olNWeRyaW1uB64nKM0buR7/nDxCV/NI05nndJ34bIgo/JIj4xy0v
 SiBj2+y86+oMJZt17yYENwOQdtX3hbyESGuVm9dCrO0t9/byVQxkUk0OMm65BM/l
 l2d+gmMQZTbHziqfLlgq9i3i9+B4C2hsb7iBpuo7SW/FPbM45POgi3lpiZycaZyu
 krQo1qwL4KSGXzGN9CabEuKDcJcXqLxqMDOyEDA3R5Kz06V9tNuM+Di/mr4vuZHK
 sVHUfHuWBO9ionLlGPdc3fH/CuMqic8SHjumiAm2menBZV6cSzRDxpm6H4CyLt7y
 tWmw7BNU7dfHFGd+Jw0Ld49sAuEybszEXq6qYv5uYBVfJNqDvOvEeVoQp0RN2jJA
 AG30hymcZgxn9n7gkIgkPQDgIGUjnzUR8B2mE2UFU1CYVHXYXAXU55CCI5oeTkbs
 d0Y/zCZf1A==
 =p1bd
 -----END PGP SIGNATURE-----

Merge tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux

Pull block updates from Jens Axboe:
 "Pretty quiet round for this release. This contains:

   - Add support for zoned storage to ublk (Andreas, Ming)

   - Series improving performance for drivers that mark themselves as
     needing a blocking context for issue (Bart)

   - Cleanup the flush logic (Chengming)

   - sed opal keyring support (Greg)

   - Fixes and improvements to the integrity support (Jinyoung)

   - Add some exports for bcachefs that we can hopefully delete again in
     the future (Kent)

   - deadline throttling fix (Zhiguo)

   - Series allowing building the kernel without buffer_head support
     (Christoph)

   - Sanitize the bio page adding flow (Christoph)

   - Write back cache fixes (Christoph)

   - MD updates via Song:
      - Fix perf regression for raid0 large sequential writes (Jan)
      - Fix split bio iostat for raid0 (David)
      - Various raid1 fixes (Heinz, Xueshi)
      - raid6test build fixes (WANG)
      - Deprecate bitmap file support (Christoph)
      - Fix deadlock with md sync thread (Yu)
      - Refactor md io accounting (Yu)
      - Various non-urgent fixes (Li, Yu, Jack)

   - Various fixes and cleanups (Arnd, Azeem, Chengming, Damien, Li,
     Ming, Nitesh, Ruan, Tejun, Thomas, Xu)"

* tag 'for-6.6/block-2023-08-28' of git://git.kernel.dk/linux: (113 commits)
  block: use strscpy() to instead of strncpy()
  block: sed-opal: keyring support for SED keys
  block: sed-opal: Implement IOC_OPAL_REVERT_LSP
  block: sed-opal: Implement IOC_OPAL_DISCOVERY
  blk-mq: prealloc tags when increase tagset nr_hw_queues
  blk-mq: delete redundant tagset map update when fallback
  blk-mq: fix tags leak when shrink nr_hw_queues
  ublk: zoned: support REQ_OP_ZONE_RESET_ALL
  md: raid0: account for split bio in iostat accounting
  md/raid0: Fix performance regression for large sequential writes
  md/raid0: Factor out helper for mapping and submitting a bio
  md raid1: allow writebehind to work on any leg device set WriteMostly
  md/raid1: hold the barrier until handle_read_error() finishes
  md/raid1: free the r1bio before waiting for blocked rdev
  md/raid1: call free_r1bio() before allow_barrier() in raid_end_bio_io()
  blk-cgroup: Fix NULL deref caused by blkg_policy_data being installed before init
  drivers/rnbd: restore sysfs interface to rnbd-client
  md/raid5-cache: fix null-ptr-deref for r5l_flush_stripe_to_raid()
  raid6: test: only check for Altivec if building on powerpc hosts
  raid6: test: make sure all intermediate and artifact files are .gitignored
  ...
2023-08-29 20:21:42 -07:00
Nigel Kirkland
8ae5b3a685 nvme-fc: Prevent null pointer dereference in nvme_fc_io_getuuid()
The nvme_fc_fcp_op structure describing an AEN operation is initialized with a
null request structure pointer. An FC LLDD may make a call to
nvme_fc_io_getuuid passing a pointer to an nvmefc_fcp_req for an AEN operation.

Add validation of the request structure pointer before dereference.

Signed-off-by: Nigel Kirkland <nkirkland2304@gmail.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-08-21 13:29:17 -07:00
Krzysztof Kozlowski
71be868472 nvme: host: hwmon: constify pointers to hwmon_channel_info
Statically allocated array of pointed to hwmon_channel_info can be made
const for safety.

Signed-off-by: Krzysztof Kozlowski <krzysztof.kozlowski@linaro.org>
Acked-by: Christoph Hellwig <hch@lst.de>
Acked-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-08-21 12:54:02 -07:00
Varun Prakash
1f0bbf2894 nvmet-tcp: pass iov_len instead of sg->length to bvec_set_page()
iov_len is the valid data length, so pass iov_len instead of sg->length to
bvec_set_page().

Fixes: 5bfaba275a ("nvmet-tcp: don't map pages which can't come from HIGHMEM")
Signed-off-by: Rakshana Sridhar <rakshanas@chelsio.com>
Signed-off-by: Varun Prakash <varun@chelsio.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-08-21 12:54:02 -07:00
Linus Torvalds
360e694282 block-6.5-2023-08-11
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmTWfLQQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpg3nEACROhaeX6cpeDCSTqDVDW/ontbyn15eX7ep
 tGPLn/TVtKv2AztIobEinS08MdywqBO/VcB7XkxQV9Ov4JqCHIAKhndWI6/HqD9P
 DH3h6tE5JA8RQlNw1aHRrqWWIl1lpDQI6263um1tB2TuaxRa4xuR560jju0VZzAm
 9541ceKlJT8Qc7yG0aiiCv6Bxz+b6Htv3DqCf1mY2yznl3BpN52RQHKhiA0sfnlF
 WKqNsvSJ9/kz3vJbNpFucO7ch8a7W+MzmBx0vf2ickTBpL/3hbhUOrE7dGeKI9rS
 cWh1HaULWqjnKY1uxF9nnapZxm8QoxkT/5T0DgmprKjwuZivfLASAhYpHBc3mT1S
 eQQ0AK8hqx7sPnPeO/kxWtxM2nzRLkeVd19ClbIwux/zDbRrpHWk2/wgnSUUd3/H
 HBbjbgPWbkgLvTOUKhIA5VPBcgkC1efom1+ePzkH/H4TRRuVJwg6s6utGXdgc1PX
 +B4TA8GtXRH/7L0tsblFyJRmd0Y6G7gYE/yy0DYZTMie3oaWrKx3lmz48AQUtEzh
 DG46VRA4wnthHRlw3mkLP7C6z4PJvK9WWBiK11eZ9VfJMF643FNpXQ3/bviR9pfF
 kXdwYXoi1mlnsQ0VUhu2f+JeV4hHalrjwD/VE2H0E8Ogb4ezmJteLyiZKcw5xwaA
 Hmtmbb7Qxw==
 =+1Vt
 -----END PGP SIGNATURE-----

Merge tag 'block-6.5-2023-08-11' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
      - Fixes for request_queue state (Ming)
      - Another uuid quirk (August)

 - RCU poll fix for NVMe (Ming)

 - Fix for an IO stall with polled IO (me)

 - Fix for blk-iocost stats enable/disable accounting (Chengming)

 - Regression fix for large pages for zram (Christoph)

* tag 'block-6.5-2023-08-11' of git://git.kernel.dk/linux:
  nvme: core: don't hold rcu read lock in nvme_ns_chr_uring_cmd_iopoll
  blk-iocost: fix queue stats accounting
  block: don't make REQ_POLLED imply REQ_NOWAIT
  block: get rid of unused plug->nowait flag
  zram: take device and not only bvec offset into account
  nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and 512G
  nvme-rdma: fix potential unbalanced freeze & unfreeze
  nvme-tcp: fix potential unbalanced freeze & unfreeze
  nvme: fix possible hang when removing a controller during error recovery
2023-08-11 12:14:08 -07:00
Ming Lei
a7a7dabb5d nvme: core: don't hold rcu read lock in nvme_ns_chr_uring_cmd_iopoll
Now nvme_ns_chr_uring_cmd_iopoll() has switched to request based io
polling, and the associated NS is guaranteed to be live in case of
io polling, so request is guaranteed to be valid because blk-mq uses
pre-allocated request pool.

Remove the rcu read lock in nvme_ns_chr_uring_cmd_iopoll(), which
isn't needed any more after switching to request based io polling.

Fix "BUG: sleeping function called from invalid context" because
set_page_dirty_lock() from blk_rq_unmap_user() may sleep.

Fixes: 585079b6e4 ("nvme: wire up async polling for io passthrough commands")
Reported-by: Guangwu Zhang <guazhang@redhat.com>
Cc: Kanchan Joshi <joshi.k@samsung.com>
Cc: Anuj Gupta <anuj20.g@samsung.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Guangwu Zhang <guazhang@redhat.com>
Link: https://lore.kernel.org/r/20230809020440.174682-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-11 08:12:32 -06:00
Jinyoung Choi
80814b8e35 bio-integrity: update the payload size in bio_integrity_add_page()
Previously, the bip's bi_size has been set before an integrity pages
were added. If a problem occurs in the process of adding pages for
bip, the bi_size mismatch problem must be dealt with.

When the page is successfully added to bvec, the bi_size is updated.

The parts affected by the change were also contained in this commit.

Cc: Christoph Hellwig <hch@lst.de>
Cc: Martin K. Petersen <martin.petersen@oracle.com>

Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jinyoung Choi <j-young.choi@samsung.com>
Tested-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Reviewed-by: "Martin K. Petersen" <martin.petersen@oracle.com>
Link: https://lore.kernel.org/r/20230803024956epcms2p38186a17392706650c582d38ef3dbcd32@epcms2p3
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2023-08-09 16:05:35 -06:00
August Wikerfors
688b419c57 nvme-pci: add NVME_QUIRK_BOGUS_NID for Samsung PM9B1 256G and 512G
The Samsung PM9B1 512G SSD found in some Lenovo Yoga 7 14ARB7 laptop units
reports eui as 0001000200030004 when resuming from s2idle, causing the
device to be removed with this error in dmesg:

nvme nvme0: identifiers changed for nsid 1

To fix this, add a quirk to ignore namespace identifiers for this device.

Signed-off-by: August Wikerfors <git@augustwikerfors.se>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-08-01 13:28:29 -07:00
Ming Lei
29b434d1e4 nvme-rdma: fix potential unbalanced freeze & unfreeze
Move start_freeze into nvme_rdma_configure_io_queues(), and there is
at least two benefits:

1) fix unbalanced freeze and unfreeze, since re-connection work may
fail or be broken by removal

2) IO during error recovery can be failfast quickly because nvme fabrics
unquiesces queues after teardown.

One side-effect is that !mpath request may timeout during connecting
because of queue topo change, but that looks not one big deal:

1) same problem exists with current code base

2) compared with !mpath, mpath use case is dominant

Fixes: 9f98772ba3 ("nvme-rdma: fix controller reset hang during traffic")
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-21 00:53:33 -07:00
Ming Lei
99dc264014 nvme-tcp: fix potential unbalanced freeze & unfreeze
Move start_freeze into nvme_tcp_configure_io_queues(), and there is
at least two benefits:

1) fix unbalanced freeze and unfreeze, since re-connection work may
fail or be broken by removal

2) IO during error recovery can be failfast quickly because nvme fabrics
unquiesces queues after teardown.

One side-effect is that !mpath request may timeout during connecting
because of queue topo change, but that looks not one big deal:

1) same problem exists with current code base

2) compared with !mpath, mpath use case is dominant

Fixes: 2875b0aeca ("nvme-tcp: fix controller reset hang during traffic")
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Yi Zhang <yi.zhang@redhat.com>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-21 00:53:29 -07:00
Ming Lei
1b95e81791 nvme: fix possible hang when removing a controller during error recovery
Error recovery can be interrupted by controller removal, then the
controller is left as quiesced, and IO hang can be caused.

Fix the issue by unquiescing controller unconditionally when removing
namespaces.

This way is reasonable and safe given forward progress can be made
when removing namespaces.

Reviewed-by: Keith Busch <kbusch@kernel.org>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reported-by: Chunguang Xu <brookxu.cn@gmail.com>
Closes: https://lore.kernel.org/linux-nvme/cover.1685350577.git.chunguang.xu@shopee.com/
Cc: stable@vger.kernel.org
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-21 00:53:26 -07:00
Linus Torvalds
be522ac7cd SCSI fixes on 20230714
This is a bunch of small driver fixes and a larger rework of zone disk
 handling (which reaches into blk and nvme).  The aacraid array-bounds
 fix is now critical since the security people turned on -Werror for
 some build tests, which now fail without it.
 
 Signed-off-by: James E.J. Bottomley <jejb@linux.ibm.com>
 -----BEGIN PGP SIGNATURE-----
 
 iJwEABMIAEQWIQTnYEDbdso9F2cI+arnQslM7pishQUCZLGSiCYcamFtZXMuYm90
 dG9tbGV5QGhhbnNlbnBhcnRuZXJzaGlwLmNvbQAKCRDnQslM7pishd/BAPwO2i4t
 5uzhcWihoYaIZ6x07oEhgOP/o1h5n5mM908AyAEA6s2hQKDoIxjJexqvkS7lPjni
 P8VMcfvOmdsLDCD3nJ4=
 =+10g
 -----END PGP SIGNATURE-----

Merge tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi

Pull SCSI fixes from James Bottomley:
 "This is a bunch of small driver fixes and a larger rework of zone disk
  handling (which reaches into blk and nvme).

  The aacraid array-bounds fix is now critical since the security people
  turned on -Werror for some build tests, which now fail without it"

* tag 'scsi-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/jejb/scsi:
  scsi: storvsc: Handle SRB status value 0x30
  scsi: block: Improve checks in blk_revalidate_disk_zones()
  scsi: block: virtio_blk: Set zone limits before revalidating zones
  scsi: block: nullblk: Set zone limits before revalidating zones
  scsi: nvme: zns: Set zone limits before revalidating zones
  scsi: sd_zbc: Set zone limits before revalidating zones
  scsi: ufs: core: Add support for qTimestamp attribute
  scsi: aacraid: Avoid -Warray-bounds warning
  scsi: ufs: ufs-mediatek: Add dependency for RESET_CONTROLLER
  scsi: ufs: core: Update contact email for monitor sysfs nodes
  scsi: scsi_debug: Remove dead code
  scsi: qla2xxx: Use vmalloc_array() and vcalloc()
  scsi: fnic: Use vmalloc_array() and vcalloc()
  scsi: qla2xxx: Fix error code in qla2x00_start_sp()
  scsi: qla2xxx: Silence a static checker warning
  scsi: lpfc: Fix a possible data race in lpfc_unregister_fcf_rescan()
2023-07-14 19:57:29 -07:00
Linus Torvalds
b3bd86a049 block-6.5-2023-07-14
-----BEGIN PGP SIGNATURE-----
 
 iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmSxpCwQHGF4Ym9lQGtl
 cm5lbC5kawAKCRD301j7KXHgpu1kEACX29tNFgxVYh6hpFlbEfK4eaAEESulgo0n
 ubkD+VLnjI3pEUBH+GA++0/kXjrdi9hsXQ2LcO2P9oB2LN6da7Tf0CvFWRbFwqzd
 Dpt7z/UFEikLNYHnctahQbtB7fsy7PYek6RIhJrCJo/+t7UUnV8RCUVLzeqROxKz
 WA6/B62/ahPW4wD0ZfW/xPDUOpR61jZ+GqD7/F5qXc7+7MqL7nLwUFH71zstwUoX
 zmqiyA9clArlSCjmARBP0ekjK/7wYDz8NHMlD9wja2ZwJkIrw8/fml7MHS+j9cPB
 rYc6zhHSksfQa/T4PZq00uGMEIC38QqtV+zHzeziIvh0i2lvLXRyiXAO0bY5yiB2
 rgyYOdaV1kEV3zuQ8zwPI/ZZJkZBeFzXM0hBeAq8K02QRPT/fHjcKE7+xxIQUy/h
 ZfX/NJcS6cWQhI0NFf18BtZR7N544A1LFhj2U8hM8m3U2+O8/n1Ozb0v0yXBuXW3
 oxgRdrqRBpI9pAywLIIBo5Ro/E3BK+2MqO0fIs0UxpYWPbSw6WMrFvwRkDEzQvhP
 0pZpgr9Y1LhPPW+nysASqwlx1Bf9PJTaE6zH2ze1eu9nauhtivuStGC0Sc5w0+UC
 3UsNLDDAgw98+SRxllQrjFl1Q+0YCseLgQrz54nCLnu7xen6UN3n/HMX/yI0C2ri
 Ufn+F9jOig==
 =ao7r
 -----END PGP SIGNATURE-----

Merge tag 'block-6.5-2023-07-14' of git://git.kernel.dk/linux

Pull block fixes from Jens Axboe:

 - NVMe pull request via Keith:
      - Don't require quirk to use duplicate namespace identifiers
        (Christoph, Sagi)
      - One more BOGUS_NID quirk (Pankaj)
      - IO timeout and error hanlding fixes for PCI (Keith)
      - Enhanced metadata format mask fix (Ankit)
      - Association race condition fix for fibre channel (Michael)
      - Correct debugfs error checks (Minjie)
      - Use PAGE_SECTORS_SHIFT where needed (Damien)
      - Reduce kernel logs for legacy nguid attribute (Keith)
      - Use correct dma direction when unmapping metadata (Ming)

 - Fix for a flush handling regression in this release (Christoph)

 - Fix for batched request time stamping (Chengming)

 - Fix for a regression in the mq-deadline position calculation (Bart)

 - Lockdep fix for blk-crypto (Eric)

 - Fix for a regression in the Amiga partition handling changes
   (Michael)

* tag 'block-6.5-2023-07-14' of git://git.kernel.dk/linux:
  block: queue data commands from the flush state machine at the head
  blk-mq: fix start_time_ns and alloc_time_ns for pre-allocated rq
  nvme-pci: fix DMA direction of unmapping integrity data
  nvme: don't reject probe due to duplicate IDs for single-ported PCIe devices
  block/mq-deadline: Fix a bug in deadline_from_pos()
  nvme: ensure disabling pairs with unquiesce
  nvme-fc: fix race between error recovery and creating association
  nvme-fc: return non-zero status code when fails to create association
  nvme: fix parameter check in nvme_fault_inject_init()
  nvme: warn only once for legacy uuid attribute
  block: remove dead struc request->completion_data field
  nvme: fix the NVME_ID_NS_NVM_STS_MASK definition
  nvmet: use PAGE_SECTORS_SHIFT
  nvme: add BOGUS_NID quirk for Samsung SM953
  blk-crypto: use dynamic lock class for blk_crypto_profile::lock
  block/partition: fix signedness issue for Amiga partitions
2023-07-14 19:52:18 -07:00
Ming Lei
b8f6446b68 nvme-pci: fix DMA direction of unmapping integrity data
DMA direction should be taken in dma_unmap_page() for unmapping integrity
data.

Fix this DMA direction, and reported in Guangwu's test.

Reported-by: Guangwu Zhang <guazhang@redhat.com>
Fixes: 4aedb70543 ("nvme-pci: split metadata handling from nvme_map_data / nvme_unmap_data")
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-13 08:16:07 -07:00
Christoph Hellwig
ac522fc6c3 nvme: don't reject probe due to duplicate IDs for single-ported PCIe devices
While duplicate IDs are still very harmful, including the potential to easily
see changing devices in /dev/disk/by-id, it turn out they are extremely
common for cheap end user NVMe devices.

Relax our check for them for so that it doesn't reject the probe on
single-ported PCIe devices, but prints a big warning instead.  In doubt
we'd still like to see quirk entries to disable the potential for
changing supposed stable device identifier links, but this will at least
allow users how have two (or more) of these devices to use them without
having to manually add a new PCI ID entry with the quirk through sysfs or
by patching the kernel.

Fixes: 2079f41ec6 ("nvme: check that EUI/GUID/UUID are globally unique")
Cc: stable@vger.kernel.org # 6.0+
Co-developed-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-13 08:12:50 -07:00
Keith Busch
71a5bb153b nvme: ensure disabling pairs with unquiesce
If any error handling that disables the controller fails to queue the
reset work, like if the state changed to disconnected inbetween, then
the failed teardown needs to unquiesce the queues since it's no longer
paired with reset_work. Just make sure that the controller can be put
into a resetting state prior to starting the disable so that no other
handling can change the queue states while recovery is happening.

Reported-by: Ming Lei <ming.lei@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-12 09:32:58 -07:00
Michael Liang
ee6fdc5055 nvme-fc: fix race between error recovery and creating association
There is a small race window between nvme-fc association creation and error
recovery. Fix this race condition by protecting accessing to controller
state and ASSOC_FAILED flag under nvme-fc controller lock.

Signed-off-by: Michael Liang <mliang@purestorage.com>
Reviewed-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-12 09:29:51 -07:00
Michael Liang
60e445bdfc nvme-fc: return non-zero status code when fails to create association
Return non-zero status code(-EIO) when needed, so re-connecting or
deleting controller will be triggered properly.

Signed-off-by: Michael Liang <mliang@purestorage.com>
Reviewed-by: Caleb Sander <csander@purestorage.com>
Reviewed-by: James Smart <jsmart2021@gmail.com>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-12 09:29:48 -07:00
Minjie Du
733ba346d9 nvme: fix parameter check in nvme_fault_inject_init()
Make IS_ERR() judge the debugfs_create_dir() function return.

Signed-off-by: Minjie Du <duminjie@vivo.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-12 08:48:33 -07:00
Keith Busch
b718ae835b nvme: warn only once for legacy uuid attribute
Report the legacy fallback behavior for uuid attributes just once
instead of logging repeated warnings for the same condition every time
the attribute is read. The old behavior is too spamy on the kernel logs.

Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reported-by: Breno Leitao <leitao@debian.org>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Keith Busch <kbusch@kernel.org>
2023-07-12 08:48:19 -07:00