linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-11 12:28:41 +08:00

Author	SHA1	Message	Date
Linus Torvalds	3e78198862	for-6.11/block-20240710 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmaOTd8QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgppqIEACUr8Vv2FtezvT3OfVSlYWHHLXzkRhwEG5s vdk0o7Ow6U54sMjfymbHTgLD0ZOJf3uJ6BI95FQuW41jPzDFVbx4Hy8QzqonMkw9 1D/YQ4zrVL2mOKBzATbKpoGJzMOzGeoXEueFZ1AYPAX7RrDtP4xPQNfrcfkdE2zF LycJN70Vp6lrZZMuI9yb9ts1tf7TFzK0HJANxOAKTgSiPmBmxesjkJlhrdUrgkAU qDVyjj7u/ssndBJAb9i6Bl95Do8s9t4DeJq5/6wgKqtf5hClMXzPVB8Wy084gr6E rTRsCEhOug3qEZSqfAgAxnd3XFRNc/p2KMUe5YZ4mAqux4hpSmIQQDM/5X5K9vEv f4MNqUGlqyqntZx+KPyFpf7kLHFYS1qK4ub0FojWJEY4GrbBPNjjncLJ9+ozR0c8 kNDaFjMNAjalBee1FxNNH8LdVcd28rrCkPxRLEfO/gvBMUmvJf4ZyKmSED0v5DhY vZqKlBqG+wg0EXvdiWEHMDh9Y+q/2XBIkS6NN/Bhh61HNu+XzC838ts1X7lR+4o2 AM5Vapw+v0q6kFBMRP3IcJI/c0UcIU8EQU7axMyzWtvhog8kx8x01hIj1L4UyYYr rUdWrkugBVXJbywFuH/QIJxWxS/z4JdSw5VjASJLIrXy+aANmmG9Wonv95eyhpUv 5iv+EdRSNA== =wVi8 -----END PGP SIGNATURE----- Merge tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux Pull block updates from Jens Axboe: - NVMe updates via Keith: - Device initialization memory leak fixes (Keith) - More constants defined (Weiwen) - Target debugfs support (Hannes) - PCIe subsystem reset enhancements (Keith) - Queue-depth multipath policy (Redhat and PureStorage) - Implement get_unique_id (Christoph) - Authentication error fixes (Gaosheng) - MD updates via Song - sync_action fix and refactoring (Yu Kuai) - Various small fixes (Christoph Hellwig, Li Nan, and Ofir Gal, Yu Kuai, Benjamin Marzinski, Christophe JAILLET, Yang Li) - Fix loop detach/open race (Gulam) - Fix lower control limit for blk-throttle (Yu) - Add module descriptions to various drivers (Jeff) - Add support for atomic writes for block devices, and statx reporting for same. Includes SCSI and NVMe (John, Prasad, Alan) - Add IO priority information to block trace points (Dongliang) - Various zone improvements and tweaks (Damien) - mq-deadline tag reservation improvements (Bart) - Ignore direct reclaim swap writes in writeback throttling (Baokun) - Block integrity improvements and fixes (Anuj) - Add basic support for rust based block drivers. Has a dummy null_blk variant for now (Andreas) - Series converting driver settings to queue limits, and cleanups and fixes related to that (Christoph) - Cleanup for poking too deeply into the bvec internals, in preparation for DMA mapping API changes (Christoph) - Various minor tweaks and fixes (Jiapeng, John, Kanchan, Mikulas, Ming, Zhu, Damien, Christophe, Chaitanya) * tag 'for-6.11/block-20240710' of git://git.kernel.dk/linux: (206 commits) floppy: add missing MODULE_DESCRIPTION() macro loop: add missing MODULE_DESCRIPTION() macro ublk_drv: add missing MODULE_DESCRIPTION() macro xen/blkback: add missing MODULE_DESCRIPTION() macro block/rnbd: Constify struct kobj_type block: take offset into account in blk_bvec_map_sg again block: fix get_max_segment_size() warning loop: Don't bother validating blocksize virtio_blk: Don't bother validating blocksize null_blk: Don't bother validating blocksize block: Validate logical block size in blk_validate_limits() virtio_blk: Fix default logical block size fallback nvmet-auth: fix nvmet_auth hash error handling nvme: implement ->get_unique_id block: pass a phys_addr_t to get_max_segment_size block: add a bvec_phys helper blk-lib: check for kill signal in ioctl BLKZEROOUT block: limit the Write Zeroes to manually writing zeroes fallback block: refacto blkdev_issue_zeroout block: move read-only and supported checks into (__)blkdev_issue_zeroout ...	2024-07-15 14:20:22 -07:00
Christoph Hellwig	1122c0c1cc	block: move cache control settings out of queue->flags Move the cache control settings into the queue_limits so that the flags can be set atomically with the device queue frozen. Add new features and flags field for the driver set flags, and internal (usually sysfs-controlled) flags in the block layer. Note that we'll eventually remove enough field from queue_limits to bring it back to the previous size. The disable flag is inverted compared to the previous meaning, which means it now survives a rescan, similar to the max_sectors and max_discard_sectors user limits. The FLUSH and FUA flags are now inherited by blk_stack_limits, which simplified the code in dm a lot, but also causes a slight behavior change in that dm-switch and dm-unstripe now advertise a write cache despite setting num_flush_bios to 0. The I/O path will handle this gracefully, but as far as I can tell the lack of num_flush_bios and thus flush support is a pre-existing data integrity bug in those targets that really needs fixing, after which a non-zero num_flush_bios should be required in dm for targets that map to underlying devices. Signed-off-by: Christoph Hellwig <hch@lst.de> Acked-by: Ulf Hansson <ulf.hansson@linaro.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Link: https://lore.kernel.org/r/20240617060532.127975-14-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-19 07:58:28 -06:00
Christoph Hellwig	70905f8706	block: remove blk_flush_policy Fold blk_flush_policy into the only caller to prepare for pending changes to it. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20240617060532.127975-13-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-19 07:58:28 -06:00
Chengming Zhou	d0321c812d	block: fix request.queuelist usage in flush Friedrich Weber reported a kernel crash problem and bisected to commit `81ada09cc2` ("blk-flush: reuse rq queuelist in flush state machine"). The root cause is that we use "list_move_tail(&rq->queuelist, pending)" in the PREFLUSH/POSTFLUSH sequences. But rq->queuelist.next == xxx since it's popped out from plug->cached_rq in __blk_mq_alloc_requests_batch(). We don't initialize its queuelist just for this first request, although the queuelist of all later popped requests will be initialized. Fix it by changing to use "list_add_tail(&rq->queuelist, pending)" so rq->queuelist doesn't need to be initialized. It should be ok since rq can't be on any list when PREFLUSH or POSTFLUSH, has no move actually. Please note the commit `81ada09cc2` ("blk-flush: reuse rq queuelist in flush state machine") also has another requirement that no drivers would touch rq->queuelist after blk_mq_end_request() since we will reuse it to add rq to the post-flush pending list in POSTFLUSH. If this is not true, we will have to revert that commit IMHO. This updated version adds "list_del_init(&rq->queuelist)" in flush rq callback since the dm layer may submit request of a weird invalid format (REQ_FSEQ_PREFLUSH \| REQ_FSEQ_POSTFLUSH), which causes double list_add if without this "list_del_init(&rq->queuelist)". The weird invalid format problem should be fixed in dm layer. Reported-by: Friedrich Weber <f.weber@proxmox.com> Closes: https://lore.kernel.org/lkml/14b89dfb-505c-49f7-aebb-01c54451db40@proxmox.com/ Closes: https://lore.kernel.org/lkml/c9d03ff7-27c5-4ebd-b3f6-5a90d96f35ba@proxmox.com/ Fixes: `81ada09cc2` ("blk-flush: reuse rq queuelist in flush state machine") Cc: Christoph Hellwig <hch@lst.de> Cc: ming.lei@redhat.com Cc: bvanassche@acm.org Tested-by: Friedrich Weber <f.weber@proxmox.com> Signed-off-by: Chengming Zhou <chengming.zhou@linux.dev> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20240608143115.972486-1-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-06-12 10:58:11 -06:00
Damien Le Moal	af147b740f	block: Fix flush request sector restore Make sure that a request bio is not NULL before trying to restore the request start sector. Reported-by: Yi Zhang <yi.zhang@redhat.com> Fixes: `6f8fd758de` ("block: Restore sector of flush requests") Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20240501110907.96950-9-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-05-01 08:08:43 -06:00
Damien Le Moal	6f8fd758de	block: Restore sector of flush requests On completion of a flush sequence, blk_flush_restore_request() restores the bio of a request to the original submitted BIO. However, the last use of the request in the flush sequence may have been for a POSTFLUSH which does not have a sector. So make sure to restore the request sector using the iter sector of the original BIO. This BIO has not changed yet since the completions of the flush sequence intermediate steps use requeueing of the request until all steps are completed. Restoring the request sector ensures that blk_mq_end_request() will see a valid sector as originally set when the flush BIO was submitted. Signed-off-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: Hans Holmberg <hans.holmberg@wdc.com> Tested-by: Dennis Maisenbacher <dennis.maisenbacher@wdc.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Link: https://lore.kernel.org/r/20240408014128.205141-2-dlemoal@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-04-17 08:44:02 -06:00
Jens Axboe	08420cf70c	block: add blk_time_get_ns() and blk_time_get() helpers Convert any user of ktime_get_ns() to use blk_time_get_ns(), and ktime_get() to blk_time_get(), so we have a unified API for querying the current time in nanoseconds or as ktime. No functional changes intended, this patch just wraps ktime_get_ns() and ktime_get() with a block helper. Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2024-02-05 10:07:22 -07:00
Chengming Zhou	48554df6bf	blk-mq: remove RQF_MQ_INFLIGHT Since the previous patch change to only account active requests when we really allocate the driver tag, the RQF_MQ_INFLIGHT can be removed and no double account problem. 1. none elevator: flush request will use the first pending request's driver tag, won't double account. 2. other elevator: flush request will be accounted when allocate driver tag when issue, and will be unaccounted when it put the driver tag. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230913151616.3164338-3-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-09-22 08:52:13 -06:00
Chengming Zhou	81ada09cc2	blk-flush: reuse rq queuelist in flush state machine Since we don't need to maintain inflight flush_data requests list anymore, we can reuse rq->queuelist for flush pending list. Note in mq_flush_data_end_io(), we need to re-initialize rq->queuelist before reusing it in the state machine when end, since the rq->rq_next also reuse it, may have corrupted rq->queuelist by the driver. This patch decrease the size of struct request by 16 bytes. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20230717040058.3993930-5-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-07-17 08:18:21 -06:00
Chengming Zhou	b175c86739	blk-flush: count inflight flush_data requests The flush state machine use a double list to link all inflight flush_data requests, to avoid issuing separate post-flushes for these flush_data requests which shared PREFLUSH. So we can't reuse rq->queuelist, this is why we need rq->flush.list In preparation of the next patch that reuse rq->queuelist for flush state machine, we change the double linked list to unsigned long counter, which count all inflight flush_data requests. This is ok since we only need to know if there is any inflight flush_data request, so unsigned long counter is good. Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230717040058.3993930-4-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-07-17 08:18:21 -06:00
Chengming Zhou	28b2412374	blk-flush: fix rq->flush.seq for post-flush requests If the policy == (REQ_FSEQ_DATA \| REQ_FSEQ_POSTFLUSH), it means that the data sequence and post-flush sequence need to be done for this request. The rq->flush.seq should record what sequences have been done (or don't need to be done). So in this case, pre-flush doesn't need to be done, we should init rq->flush.seq to REQ_FSEQ_PREFLUSH not REQ_FSEQ_POSTFLUSH. Fixes: `615939a2ae` ("blk-mq: defer to the normal submission path for post-flush requests") Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230717040058.3993930-3-chengming.zhou@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-07-17 08:18:21 -06:00
Christoph Hellwig	9f87fc4d72	block: queue data commands from the flush state machine at the head We used to insert the data commands following a pre-flush to the head of the queue until commit `1e82fadfc6` ("blk-mq: do not do head insertions post-pre-flush commands"). Not doing this seems to cause hangs of such commands on NFS workloads when exported from file systems with SATA SSDs. I have no idea why this would starve these workloads, but doing a semantic revert of this patch (which looks quite different due to various other changes) fixes the hangs. Fixes: `1e82fadfc6` ("blk-mq: do not do head insertions post-pre-flush commands") Reported-by: Chuck Lever <chuck.lever@oracle.com> Signed-off-by: Christoph Hellwig <hch@lst.de> Tested-by: Chuck Lever <chuck.lever@oracle.com> Link: https://lore.kernel.org/r/20230714143014.11879-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-07-14 08:42:58 -06:00
Christoph Hellwig	9a67aa52a4	blk-mq: don't use the requeue list to queue flush commands Currently both requeues of commands that were already sent to the driver and flush commands submitted from the flush state machine share the same requeue_list struct request_queue, despite requeues doing head insertions and flushes not. Switch to using two separate lists instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230519044050.107790-8-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-05-19 19:52:42 -06:00
Christoph Hellwig	1e82fadfc6	blk-mq: do not do head insertions post-pre-flush commands blk_flush_complete_seq currently queues requests that write data after a pre-flush from the flush state machine at the head of the queue. This doesn't really make sense, as the original request bypassed all queue lists by directly diverting to blk_insert_flush from blk_mq_submit_bio. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230519044050.107790-7-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-05-19 19:52:29 -06:00
Christoph Hellwig	615939a2ae	blk-mq: defer to the normal submission path for post-flush requests Requests with the FUA bit on hardware without FUA support need a post flush before returning to the caller, but they can still be sent using the normal I/O path after initializing the flush-related fields and end I/O handler. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230519044050.107790-6-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-05-19 19:52:29 -06:00
Christoph Hellwig	360f264834	blk-mq: defer to the normal submission path for non-flush flush commands If blk_insert_flush decides that a command does not need to use the flush state machine, return false and let blk_mq_submit_bio handle it the normal way (including using an I/O scheduler) instead of doing a bypass insert. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230519044050.107790-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-05-19 19:52:29 -06:00
Christoph Hellwig	c1075e548c	blk-mq: reflow blk_insert_flush Use a switch statement to decide on the disposition of a flush request instead of multiple if statements, out of which one does checks that are more complex than required. Also warn on a malformed request early on instead of doing a BUG_ON later. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230519044050.107790-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-05-19 19:52:29 -06:00
Christoph Hellwig	0b573692f1	blk-mq: factor out a blk_rq_init_flush helper Factor out a helper from blk_insert_flush that initializes the flush machine related fields in struct request, and don't bother with the full memset as there's just a few fields to initialize, and all but one already have explicit initializers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20230519044050.107790-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-05-19 19:52:29 -06:00
Christoph Hellwig	26a42b614e	blk-mq: fix the blk_mq_add_to_requeue_list call in blk_kick_flush Commit `b12e5c6c75` accidentally changes blk_kick_flush to do a head insert into the requeue list, fix this up. Fixes: `b12e5c6c75` ("blk-mq: pass a flags argument to blk_mq_add_to_requeue_list") Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20230416073553.966161-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-16 13:01:43 -06:00
Christoph Hellwig	b12e5c6c75	blk-mq: pass a flags argument to blk_mq_add_to_requeue_list Replace the boolean at_head argument with the same flags that are already passed to blk_mq_insert_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-21-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	2b5976134b	blk-mq: pass a flags argument to blk_mq_request_bypass_insert Replace the boolean at_head argument with the same flags that are already passed to blk_mq_insert_request. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	214a441805	blk-mq: don't kick the requeue_list in blk_mq_add_to_requeue_list blk_mq_add_to_requeue_list takes a bool parameter to control how to kick the requeue list at the end of the function. Move the call to blk_mq_kick_requeue_list to the callers that want it instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-17-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	2394395cd5	blk-mq: don't run the hw_queue from blk_mq_request_bypass_insert blk_mq_request_bypass_insert takes a bool parameter to control how to run the queue at the end of the function. Move the blk_mq_run_hw_queue call to the callers that want it instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-16-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:30 -06:00
Christoph Hellwig	a4fa57ffb7	blk-mq: remove blk_flush_queue_rq Just call blk_mq_add_to_requeue_list directly from the two callers. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-10-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:29 -06:00
Christoph Hellwig	90110e04f2	blk-mq: include <linux/blk-mq.h> in block/blk-mq.h block/blk-mq.h needs various definitions from <linux/blk-mq.h>, include it there instead of relying on the source files to include both. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:29 -06:00
Christoph Hellwig	bebe84ebee	blk-mq: remove blk-mq-tag.h blk-mq-tag.h is always included by blk-mq.h, and causes recursive inclusion hell with further changes. Just merge it into blk-mq.h instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Damien Le Moal <dlemoal@kernel.org> Link: https://lore.kernel.org/r/20230413064057.707578-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2023-04-13 06:52:29 -06:00
Jens Axboe	de671d6116	block: change request end_io handler to pass back a return value Everything is just converted to returning RQ_END_IO_NONE, and there should be no functional changes with this patch. In preparation for allowing the end_io handler to pass ownership back to the block layer, rather than retain ownership of the request. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-30 07:49:09 -06:00
Jens Axboe	e73a625bc2	block: kill deprecated BUG_ON() in the flush handling We've never had any useful reports from this BUG_ON(), and in fact a number of the BUG_ON()'s in the flush handling need to be turned into more graceful handling. In preparation for allowing batched completions of the end_io handling, where we can enter the flush completion with queuelist having been reused for the batch, get rid of this BUG_ON(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-30 07:48:00 -06:00
Bart Van Assche	16458cf3bd	block: Use the new blk_opf_t type Use the new blk_opf_t type for arguments and variables that represent request flags or a bitwise combination of a request operation and request flags. Rename the function arguments and also a structure member that hold a request operation and flags from 'rw' into 'opf'. This patch does not change any functionality. Cc: Christoph Hellwig <hch@lst.de> Cc: Ming Lei <ming.lei@redhat.com> Cc: Hannes Reinecke <hare@suse.de> Cc: Damien Le Moal <damien.lemoal@wdc.com> Cc: Johannes Thumshirn <johannes.thumshirn@wdc.com> Signed-off-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20220714180729.1065367-7-bvanassche@acm.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-07-14 12:14:30 -06:00
Christoph Hellwig	49add4966d	block: pass a block_device and opf to bio_init Pass the block_device that we plan to use this bio for and the operation to bio_init to optimize the assignment. A NULL block_device can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Link: https://lore.kernel.org/r/20220124091107.642561-19-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-02-02 07:49:59 -07:00
Jens Axboe	0a467d0fdd	block: switch to atomic_t for request references refcount_t is not as expensive as it used to be, but it's still more expensive than the io_uring method of using atomic_t and just checking for potential over/underflow. This borrows that same implementation, which in turn is based on the mm implementation from Linus. Reviewed-by: Keith Busch <kbusch@kernel.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-12-03 14:51:29 -07:00
Ye Bin	8a7518931b	block: Fix fsync always failed if once failed We do test with inject error fault base on v4.19, after test some time we found sync /dev/sda always failed. [root@localhost] sync /dev/sda sync: error syncing '/dev/sda': Input/output error scsi log as follows: [19069.812296] sd 0:0:0:0: [sda] tag#64 Send: scmd 0x00000000d03a0b6b [19069.812302] sd 0:0:0:0: [sda] tag#64 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00 [19069.812533] sd 0:0:0:0: [sda] tag#64 Done: SUCCESS Result: hostbyte=DID_OK driverbyte=DRIVER_OK [19069.812536] sd 0:0:0:0: [sda] tag#64 CDB: Synchronize Cache(10) 35 00 00 00 00 00 00 00 00 00 [19069.812539] sd 0:0:0:0: [sda] tag#64 scsi host busy 1 failed 0 [19069.812542] sd 0:0:0:0: Notifying upper driver of completion (result 0) [19069.812546] sd 0:0:0:0: [sda] tag#64 sd_done: completed 0 of 0 bytes [19069.812549] sd 0:0:0:0: [sda] tag#64 0 sectors total, 0 bytes done. [19069.812564] print_req_error: I/O error, dev sda, sector 0 ftrace log as follows: rep-306069 [007] .... 19654.923315: block_bio_queue: 8,0 FWS 0 + 0 [rep] rep-306069 [007] .... 19654.923333: block_getrq: 8,0 FWS 0 + 0 [rep] kworker/7:1H-250 [007] .... 19654.923352: block_rq_issue: 8,0 FF 0 () 0 + 0 [kworker/7:1H] <idle>-0 [007] ..s. 19654.923562: block_rq_complete: 8,0 FF () 18446744073709551615 + 0 [0] <idle>-0 [007] d.s. 19654.923576: block_rq_complete: 8,0 WS () 0 + 0 [-5] As `8d6996630c` introduce 'fq->rq_status', this data only update when 'flush_rq' reference count isn't zero. If flush request once failed and record error code in 'fq->rq_status'. If there is no chance to update 'fq->rq_status',then do fsync will always failed. To address this issue reset 'fq->rq_status' after return error code to upper layer. Fixes: 8d6996630c03("block: fix null pointer dereference in blk_mq_rq_timed_out()") Signed-off-by: Ye Bin <yebin10@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211129012659.1553733-1-yebin10@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-29 06:46:04 -07:00
Christoph Hellwig	f3fa33acca	block: remove the ->rq_disk field in struct request Just use the disk attached to the request_queue instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Link: https://lore.kernel.org/r/20211126121802.2090656-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-29 06:41:29 -07:00
Christoph Hellwig	82d981d423	block: don't include <linux/part_stat.h> in blk.h Not needed, shift it into the source files that need it instead. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-9-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-29 06:38:44 -07:00
Christoph Hellwig	0281ed3cf4	block: move blk_get_flush_queue to blk-flush.c blk_get_flush_queue is only used in blk-flush.c, so move it there. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211123185312.1432157-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-29 06:38:44 -07:00
Ming Lei	2b504bd484	blk-mq: don't insert FUA request with data into scheduler queue We never insert flush request into scheduler queue before. Recently commit `d92ca9d834` ("blk-mq: don't handle non-flush requests in blk_insert_flush") tries to handle FUA data request as normal request. This way has caused warning[1] in mq-deadline dd_exit_sched() or io hang in case of kyber since RQF_ELVPRIV isn't set for flush request, then ->finish_request won't be called. Fix the issue by inserting FUA data request with blk_mq_request_bypass_insert() when the device supports FUA, just like what we did before. [1] https://lore.kernel.org/linux-block/CAHj4cs-_vkTW=dAzbZYGxpEWSpzpcmaNeY1R=vH311+9vMUSdg@mail.gmail.com/ Reported-by: Yi Zhang <yi.zhang@redhat.com> Fixes: `d92ca9d834` ("blk-mq: don't handle non-flush requests in blk_insert_flush") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Bart Van Assche <bvanassche@acm.org> Link: https://lore.kernel.org/r/20211118153041.2163228-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-19 06:28:18 -07:00
Christoph Hellwig	d92ca9d834	blk-mq: don't handle non-flush requests in blk_insert_flush Return to the normal blk_mq_submit_bio flow if the bio did not end up actually being a flush because the device didn't support it. Note that this is basically impossible to hit without special instrumentation given that submit_bio_checks already clears these flags usually, so we'd need a tight race to actually hit this code path. With this the call to blk_mq_run_hw_queue for the flush requests can be removed given that the actual flush requests are always issued via the requeue workqueue which runs the queue unconditionally. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019122553.2467817-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 11:10:09 -06:00
Ming Lei	a9ed27a764	blk-mq: fix is_flush_rq is_flush_rq() is called from bt_iter()/bt_tags_iter(), and runs the following check: hctx->fq->flush_rq == req but the passed hctx from bt_iter()/bt_tags_iter() may be NULL because: 1) memory re-order in blk_mq_rq_ctx_init(): rq->mq_hctx = data->hctx; ... refcount_set(&rq->ref, 1); OR 2) tag re-use and ->rqs[] isn't updated with new request. Fix the issue by re-writing is_flush_rq() as: return rq->end_io == flush_end_io; which turns out simpler to follow and immune to data race since we have ordered WRITE rq->end_io and refcount_set(&rq->ref, 1). Fixes: `2e315dc07d` ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter") Cc: "Blank-Burian, Markus, Dr." <blankburian@uni-muenster.de> Cc: Yufen Yu <yuyufen@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210818010925.607383-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-17 20:17:34 -06:00
Ming Lei	c2da19ed50	blk-mq: fix kernel panic during iterating over flush request For fixing use-after-free during iterating over requests, we grabbed request's refcount before calling ->fn in commit `2e315dc07d` ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter"). Turns out this way may cause kernel panic when iterating over one flush request: 1) old flush request's tag is just released, and this tag is reused by one new request, but ->rqs[] isn't updated yet 2) the flush request can be re-used for submitting one new flush command, so blk_rq_init() is called at the same time 3) meantime blk_mq_queue_tag_busy_iter() is called, and old flush request is retrieved from ->rqs[tag]; when blk_mq_put_rq_ref() is called, flush_rq->end_io may not be updated yet, so NULL pointer dereference is triggered in blk_mq_put_rq_ref(). Fix the issue by calling refcount_set(&flush_rq->ref, 1) after flush_rq->end_io is set. So far the only other caller of blk_rq_init() is scsi_ioctl_reset() in which the request doesn't enter block IO stack and the request reference count isn't used, so the change is safe. Fixes: `2e315dc07d` ("blk-mq: grab rq->refcount before calling ->fn in blk_mq_tagset_busy_iter") Reported-by: "Blank-Burian, Markus, Dr." <blankburian@uni-muenster.de> Tested-by: "Blank-Burian, Markus, Dr." <blankburian@uni-muenster.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/20210811142624.618598-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-08-17 08:33:32 -06:00
Ming Lei	84da7acc3b	block: avoid double io accounting for flush request For flush request, rq->end_io() may be called two times, one is from timeout handling(blk_mq_check_expired()), another is from normal completion(__blk_mq_end_request()). Move blk_account_io_flush() after flush_rq->ref drops to zero, so io accounting can be done just once for flush request. Fixes: `b686631865` ("block: add iostat counters for flush requests") Reviewed-by: Bart Van Assche <bvanassche@acm.org> Reviewed-by: Christoph Hellwig <hch@lst.de> Tested-by: John Garry <john.garry@huawei.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20210511152236.763464-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-05-24 06:47:21 -06:00
Christoph Hellwig	c6bf3f0e25	block: use an on-stack bio in blkdev_issue_flush There is no point in allocating memory for a synchronous flush. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com> Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com> Acked-by: Damien Le Moal <damien.lemoal@wdc.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-01-27 09:51:48 -07:00
Linus Torvalds	ac7ac4618c	for-5.11/block-2020-12-14 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl/Xec8QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpoLbEACzXypgZWwMdfgRckA/Vt333rXHtbhUV+hK 2XP+P81iRvr9Esi31UPbRp82vrgcDO0cpI1QmQojS5U5TIQP88BfXptfRZZu48eb wT5RDDNQ34HItqAh/yEuYsv9yUKcxeIrB99tBVvM+4UmQg9zTdIW3mg6PvCBdbhV N38jI0tCF/PJatjfRuphT/nXonQLPWBlVDmZk06KZQFOwQe9ep1vUi1+nbiRPuo3 geFBpTh1Kp6Vl1B3n4RpECs6Y7I0RRuJdaH2sDizICla1/BW91F9fQwHimNnUxUq e1Q1kMuh6ftcQGkYlHSYcPhuv6CvorldTZCO5arPxWpcwvxriTSMRPWAgUr5pEiF fhiGhqeDu9e6vl9vS31wUD1B30hy+jFz9wyjRrDwJ3cPHH1JVBjTzvdX+cIh/1ku IbIwUMteUtvUrzqAv/DzbGhedp7xWtOFaVo8j0QFYh9zkjd6b8yDOF/yztwX2gjY Xt1cd+KpDSiN449ZRaoMI0sCJAxqzhMa6nsWlb0L7KuNyWKAbvKQBm9Rb47FLV9A Vx70KC+zkFoyw23capvIahmQazerriUJ5PGe0lVm6ROgmIFdCpXTPDjnrvq/6RZ/ GEpD7gTW9atGJ7EuEE8686sAfKD5kneChWLX5EHXf0d0AG5Mr2lKsluiGp5LpPJg Q1Xqs6xwww== =zo4w -----END PGP SIGNATURE----- Merge tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block Pull block updates from Jens Axboe: "Another series of killing more code than what is being added, again thanks to Christoph's relentless cleanups and tech debt tackling. This contains: - blk-iocost improvements (Baolin Wang) - part0 iostat fix (Jeffle Xu) - Disable iopoll for split bios (Jeffle Xu) - block tracepoint cleanups (Christoph Hellwig) - Merging of struct block_device and hd_struct (Christoph Hellwig) - Rework/cleanup of how block device sizes are updated (Christoph Hellwig) - Simplification of gendisk lookup and removal of block device aliasing (Christoph Hellwig) - Block device ioctl cleanups (Christoph Hellwig) - Removal of bdget()/blkdev_get() as exported API (Christoph Hellwig) - Disk change rework, avoid ->revalidate_disk() (Christoph Hellwig) - sbitmap improvements (Pavel Begunkov) - Hybrid polling fix (Pavel Begunkov) - bvec iteration improvements (Pavel Begunkov) - Zone revalidation fixes (Damien Le Moal) - blk-throttle limit fix (Yu Kuai) - Various little fixes" * tag 'for-5.11/block-2020-12-14' of git://git.kernel.dk/linux-block: (126 commits) blk-mq: fix msec comment from micro to milli seconds blk-mq: update arg in comment of blk_mq_map_queue blk-mq: add helper allocating tagset->tags Revert "block: Fix a lockdep complaint triggered by request queue flushing" nvme-loop: use blk_mq_hctx_set_fq_lock_class to set loop's lock class blk-mq: add new API of blk_mq_hctx_set_fq_lock_class block: disable iopoll for split bio block: Improve blk_revalidate_disk_zones() checks sbitmap: simplify wrap check sbitmap: replace CAS with atomic and sbitmap: remove swap_lock sbitmap: optimise sbitmap_deferred_clear() blk-mq: skip hybrid polling if iopoll doesn't spin blk-iocost: Factor out the base vrate change into a separate function blk-iocost: Factor out the active iocgs' state check into a separate function blk-iocost: Move the usage ratio calculation to the correct place blk-iocost: Remove unnecessary advance declaration blk-iocost: Fix some typos in comments blktrace: fix up a kerneldoc comment block: remove the request_queue to argument request based tracepoints ...	2020-12-16 12:57:51 -08:00
Ming Lei	7aa390ec2d	Revert "block: Fix a lockdep complaint triggered by request queue flushing" This reverts commit `b3c6a59975`. Now we can avoid nvme-loop lockdep warning of 'lockdep possible recursive locking' by nvme-loop's lock class, no need to apply dynamically allocated lock class key, so revert commit b3c6a5997541("block: Fix a lockdep complaint triggered by request queue flushing"). This way fixes horrible SCSI probe delay issue on megaraid_sas, and it is reported the whole probe may take more than half an hour. Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Reported-by: Qian Cai <cai@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: John Garry <john.garry@huawei.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-07 20:30:19 -07:00
Ming Lei	fb01a2932e	blk-mq: add new API of blk_mq_hctx_set_fq_lock_class flush_end_io() may be called recursively from some driver, such as nvme-loop, so lockdep may complain 'possible recursive locking'. Commit b3c6a5997541("block: Fix a lockdep complaint triggered by request queue flushing") tried to address this issue by assigning dynamically allocated per-flush-queue lock class. This solution adds synchronize_rcu() for each hctx's release handler, and causes horrible SCSI MQ probe delay(more than half an hour on megaraid sas). Add new API of blk_mq_hctx_set_fq_lock_class() for these drivers, so we just need to use driver specific lock class for avoiding the lockdep warning of 'possible recursive locking'. Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Reported-by: Qian Cai <cai@redhat.com> Cc: Sumit Saxena <sumit.saxena@broadcom.com> Cc: John Garry <john.garry@huawei.com> Cc: Kashyap Desai <kashyap.desai@broadcom.com> Cc: Bart Van Assche <bvanassche@acm.org> Cc: Hannes Reinecke <hare@suse.de> Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-07 20:30:19 -07:00
Christoph Hellwig	8446fe9255	block: switch partition lookup to use struct block_device Use struct block_device to lookup partitions on a disk. This removes all usage of struct hd_struct from the I/O path. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Acked-by: Coly Li <colyli@suse.de> [bcache] Acked-by: Chao Yu <yuchao0@huawei.com> [f2fs] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:40 -07:00
Christoph Hellwig	cb8432d650	block: allocate struct hd_struct as part of struct bdev_inode Allocate hd_struct together with struct block_device to pre-load the lifetime rule changes in preparation of merging the two structures. Note that part0 was previously embedded into struct gendisk, but is a separate allocation now, and already points to the block_device instead of the hd_struct. The lifetime of struct gendisk is still controlled by the struct device embedded in the part0 hd_struct. Signed-off-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Jan Kara <jack@suse.cz> Reviewed-by: Hannes Reinecke <hare@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-12-01 14:53:40 -07:00
Ming Lei	9f16a66733	block: mark flush request as IDLE when it is really finished For avoiding use-after-free on flush request, we call its .end_io() from both timeout code path and __blk_mq_end_request(). When flush request's ref doesn't drop to zero, it is still used, we can't mark it as IDLE, so fix it by marking IDLE when its refcount drops to zero really. Fixes: `65ff5cd045` ("blk-mq: mark flush request as IDLE in flush_end_io()") Signed-off-by: Ming Lei <ming.lei@redhat.com> Cc: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-11-13 14:24:16 -07:00
Ming Lei	65ff5cd045	blk-mq: mark flush request as IDLE in flush_end_io() Mark flush request as IDLE in its .end_io(), aligning it with how normal requests behave. The flush request stays in in-flight tags if we're not using an IO scheduler, so we need to change its state into IDLE. Otherwise, we will hang in blk_mq_tagset_wait_completed_request() during error recovery because flush the request state is kept as COMPLETED. Reported-by: Yi Zhang <yi.zhang@redhat.com> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Yi Zhang <yi.zhang@redhat.com> Cc: Chao Leng <lengchao@huawei.com> Cc: Sagi Grimberg <sagi@grimberg.me> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-10-30 08:33:49 -06:00
Ming Lei	c1e2b8422b	block: fix double account of flush request's driver tag In case of none scheduler, we share data request's driver tag for flush request, so have to mark the flush request as INFLIGHT for avoiding double account of this driver tag. Fixes: `568f270065` ("blk-mq: centralise related handling into blk_mq_get_driver_tag") Reported-by: Matthew Wilcox <willy@infradead.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> Tested-by: Matthew Wilcox <willy@infradead.org> Cc: Christoph Hellwig <hch@infradead.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-08-11 13:53:32 -06:00
Yufen Yu	b5718d6c00	block: defer flush request no matter whether we have elevator Commit `7520872c0c` ("block: don't defer flushes on blk-mq + scheduling") tried to fix deadlock for cycled wait between flush requests and data request into flush_data_in_flight. The former holded all driver tags and wait for data request completion, but the latter can not complete for waiting free driver tags. After commit `923218f616` ("blk-mq: don't allocate driver tag upfront for flush rq"), flush requests will not get driver tag before queuing into flush queue. * With elevator, flush request just get sched_tags before inserting flush queue. It will not get driver tag until issue them to driver. data request on list fq->flush_data_in_flight will complete in the end. * Without elevator, each flush request will get a driver tag when allocate request. Then data request on fq->flush_data_in_flight don't worry about lacking driver tag. In both of these cases, cycled wait cannot be true. So we may allow to defer flush request. Signed-off-by: Yufen Yu <yuyufen@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2020-07-17 07:14:28 -06:00

1 2 3 4

161 Commits