linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-21 11:44:01 +08:00

Author	SHA1	Message	Date
Jens Axboe	c98cb5bbda	block: make bio_queue_enter() fast-path available inline Just a prep patch for shifting the queue enter logic. This moves the expected fast path inline, and leaves __bio_queue_enter() as an out-of-line function call. We don't want to inline the latter, as it's mostly slow path code. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-04 12:54:33 -06:00
Jens Axboe	71539717c1	block: split request allocation components into helpers This is in preparation for a fix, but serves as a cleanup as well moving the cached vs regular alloc logic out of blk_mq_submit_bio(). Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-04 12:50:51 -06:00
Jens Axboe	c5fc7b9317	block: have plug stored requests hold references to the queue Requests that were stored in the cache deliberately didn't hold an enter reference to the queue, instead we grabbed one every time we pulled a request out of there. That made for awkward logic on freeing the remainder of the cached list, if needed, where we had to artificially raise the queue usage count before each free. Grab references up front for cached plug requests. That's safer, and also more efficient. Fixes: `47c122e35d` ("block: pre-allocate requests if plug is started and is a batch") Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-04 12:50:46 -06:00
Ming Lei	3b87c6ea67	blk-mq: update hctx->nr_active in blk_mq_end_request_batch() In case of shared tags and none io sched, batched completion still may be run into, and hctx->nr_active is accounted when getting driver tag, so it has to be updated in blk_mq_end_request_batch(). Otherwise, hctx->nr_active may become same with queue depth, then hctx_may_queue() always return false, then io hang is caused. Fixes the issue by updating the counter in batched way. Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Fixes: `f794f3351f` ("block: add support for blk_mq_end_request_batch()") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211102153619.3627505-4-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-03 09:27:57 -06:00
Ming Lei	62ba0c008f	blk-mq: add RQF_ELV debug entry Looks it is missed so add it. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211102133502.3619184-3-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-03 09:27:57 -06:00
Ming Lei	a1cb65377e	blk-mq: only try to run plug merge if request has same queue with incoming bio It is obvious that io merge can't be done between two different queues, so just try to run io merge in case of same queue. Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211102133502.3619184-2-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-03 09:27:57 -06:00
Jens Axboe	781dd830ec	block: move RQF_ELV setting into allocators It's not safe to do this before blk_queue_enter(), as the scheduler state could have changed in between. Hence move the RQF_ELV setting into the allocators, where we know the queue is already entered. Suggested-by: Ming Lei <ming.lei@redhat.com> Reported-by: Yi Zhang <yi.zhang@redhat.com> Reported-by: Steffen Maier <maier@linux.ibm.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-03 09:26:26 -06:00
Jens Axboe	b22809092c	block: replace always false argument with 'false' A previous commit fixed up the condition for doing direct issue, but that left the 'from_schedule' argument dead inside the branch. Replace it with 'false'. Fixes: `ff1552232b` ("blk-mq: don't issue request directly in case that current is to be blocked") Reviewed-by: Ming Lei <ming.lei@redhat.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-02 06:57:27 -06:00
Jens Axboe	a22c00be90	block: assign correct tag before doing prefetch of request Ensure that current tag is correctly assigned before attempting to prefetch the first cacheline of the request. Fixes: `92aff191cc` ("block: prefetch request to be initialized") Reported-and-tested-by: syzbot+cd20829ac44b92bf6ed0@syzkaller.appspotmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-11-02 06:57:20 -06:00
Jean Sacren	ef1661ba6d	blk-mq: fix redundant check of !e expression In the if branch, e is checked. In the else branch, ->dispatch_busy is merely a number and has no effect on !e. We should remove the check of !e since it is always true. Signed-off-by: Jean Sacren <sakiwit@gmail.com> Link: https://lore.kernel.org/r/20211029202945.3052-1-sakiwit@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-30 09:34:14 -06:00
John Garry	9b84c629c9	blk-mq-debugfs: Show active requests per queue for shared tags Currently we show the hctx.active value for the per-hctx "active" file. However this is not maintained for shared tags, and we instead keep a record of the number active requests per request queue - see commit `f1b49fdc1c` ("blk-mq: Record active_queues_shared_sbitmap per tag_set for when using shared sbitmap). Change for the case of shared tags to show the active requests per request queue by using __blk_mq_active_requests() helper. Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1635496823-33515-1-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-29 06:53:34 -06:00
Jens Axboe	02f7eab009	block: improve readability of blk_mq_end_request_batch() It's faster and easier to read if we tolerate cur_hctx being NULL in the "when to flush" condition. Rename last_hctx to cur_hctx while at it, as it better describes the role of that variable. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-28 12:08:34 -06:00
Jens Axboe	c7b84d4226	block: re-flow blk_mq_rq_ctx_init() Now that we have flags passed in, we can do a final re-arrange of the flow of blk_mq_rq_ctx_init() so we're always writing request in the order in which it is laid out. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20211019153300.623322-5-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-27 08:43:15 -06:00
Jens Axboe	92aff191cc	block: prefetch request to be initialized Now we have the tags available in __blk_mq_alloc_requests_batch(), we can start fetching the first request cacheline before calling into the request initialization. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20211019153300.623322-4-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-27 08:43:15 -06:00
Jens Axboe	fe6134f669	block: pass in blk_mq_tags to blk_mq_rq_ctx_init() Instead of getting this from data for every invocation of request initialization, pass it in as an argument instead. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20211019153300.623322-3-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-27 08:43:15 -06:00
Jens Axboe	56f8da642b	block: add rq_flags to struct blk_mq_alloc_data There's a hole here we can use, and it's faster to set this earlier rather than need to check q->elevator multiple times. Signed-off-by: Jens Axboe <axboe@kernel.dk> Link: https://lore.kernel.org/r/20211019153300.623322-2-axboe@kernel.dk Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-27 08:43:15 -06:00
Pavel Begunkov	842e39b013	block: add async version of bio_set_polled If we know that a iocb is async we can optimise bio_set_polled() a bit, add a new helper bio_set_polled_async(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8fa137885164a5d05fadcff4c3521da8d5a83d00.1635337135.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-27 06:54:58 -06:00
Pavel Begunkov	e71aa913e2	block: kill DIO_MULTI_BIO Now __blkdev_direct_IO() serves only multi-bio I/O, thus remove not used anymore single bio refcounting optimisations. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/88eb488aae9ed4852a30f3a7132f296f56e43b80.1635337135.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-27 06:54:58 -06:00
Pavel Begunkov	25d207dc22	block: kill unused polling bits in __blkdev_direct_IO() With addition of __blkdev_direct_IO_async(), __blkdev_direct_IO() now serves only multio-bio I/O, which we don't poll. Now we can remove anything related to I/O polling from it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b8c597a6b7ee612df394853bfd24726aee5b898e.1635337135.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-27 06:54:58 -06:00
Pavel Begunkov	1bb6b81029	block: avoid extra iter advance with async iocb Nobody cares about iov iterators state if we return -EIOCBQUEUED, so as the we now have __blkdev_direct_IO_async(), which gets pages only once, we can skip expensive iov_iter_advance(). It's around 1-2% of all CPU spent. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a6158edfbfa2ae3bc24aed29a72f035df18fad2f.1635337135.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-27 06:54:58 -06:00
Damien Le Moal	a2247f19ee	block: Add independent access ranges support The Concurrent Positioning Ranges VPD page (for SCSI) and data log page (for ATA) contain parameters describing the set of contiguous LBAs that can be served independently by a single LUN multi-actuator hard-disk. Similarly, a logically defined block device composed of multiple disks can in some cases execute requests directed at different sector ranges in parallel. A dm-linear device aggregating 2 block devices together is an example. This patch implements support for exposing a block device independent access ranges to the user through sysfs to allow optimizing device accesses to increase performance. To describe the set of independent sector ranges of a device (actuators of a multi-actuator HDDs or table entries of a dm-linear device), The type struct blk_independent_access_ranges is introduced. This structure describes the sector ranges using an array of struct blk_independent_access_range structures. This range structure defines the start sector and number of sectors of the access range. The ranges in the array cannot overlap and must contain all sectors within the device capacity. The function disk_set_independent_access_ranges() allows a device driver to signal to the block layer that a device has multiple independent access ranges. In this case, a struct blk_independent_access_ranges is attached to the device request queue by the function disk_set_independent_access_ranges(). The function disk_alloc_independent_access_ranges() is provided for drivers to allocate this structure. struct blk_independent_access_ranges contains kobjects (struct kobject) to expose to the user through sysfs the set of independent access ranges supported by a device. When the device is initialized, sysfs registration of the ranges information is done from blk_register_queue() using the block layer internal function disk_register_independent_access_ranges(). If a driver calls disk_set_independent_access_ranges() for a registered queue, e.g. when a device is revalidated, disk_set_independent_access_ranges() will execute disk_register_independent_access_ranges() to update the sysfs attribute files. The sysfs file structure created starts from the independent_access_ranges sub-directory and contains the start sector and number of sectors of each range, with the information for each range grouped in numbered sub-directories. E.g. for a dual actuator HDD, the user sees: $ tree /sys/block/sdk/queue/independent_access_ranges/ /sys/block/sdk/queue/independent_access_ranges/ \|-- 0 \| \|-- nr_sectors \| `-- sector `-- 1 \|-- nr_sectors `-- sector For a regular device with a single access range, the independent_access_ranges sysfs directory does not exist. Device revalidation may lead to changes to this structure and to the attribute values. When manipulated, the queue sysfs_lock and sysfs_dir_lock mutexes are held for atomicity, similarly to how the blk-mq and elevator sysfs queue sub-directories are protected. The code related to the management of independent access ranges is added in the new file block/blk-ia-ranges.c. Signed-off-by: Damien Le Moal <damien.lemoal@wdc.com> Reviewed-by: Hannes Reinecke <hare@suse.de> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Reviewed-by: Keith Busch <kbusch@kernel.org> Link: https://lore.kernel.org/r/20211027022223.183838-2-damien.lemoal@wdc.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-26 20:36:47 -06:00
Ming Lei	ff1552232b	blk-mq: don't issue request directly in case that current is to be blocked When flushing plug list in case that current will be blocked, we can't issue request directly because ->queue_rq() may sleep, otherwise scheduler may complain. Fixes: `dc5fc361d8` ("block: attempt direct issue of plug list") Signed-off-by: Ming Lei <ming.lei@redhat.com> Link: https://lore.kernel.org/r/20211026082257.2889890-1-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-26 08:38:11 -06:00
Yu Kuai	0c9d338c84	blk-cgroup: synchronize blkg creation against policy deactivation Our test reports a null pointer dereference: [ 168.534653] ================================================================== [ 168.535614] Disabling lock debugging due to kernel taint [ 168.536346] BUG: kernel NULL pointer dereference, address: 0000000000000008 [ 168.537274] #PF: supervisor read access in kernel mode [ 168.537964] #PF: error_code(0x0000) - not-present page [ 168.538667] PGD 0 P4D 0 [ 168.539025] Oops: 0000 [#1] PREEMPT SMP KASAN [ 168.539656] CPU: 13 PID: 759 Comm: bash Tainted: G B 5.15.0-rc2-next-202100 [ 168.540954] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ?-20190727_0738364 [ 168.542736] RIP: 0010:bfq_pd_init+0x88/0x1e0 [ 168.543318] Code: 98 00 00 00 e8 c9 e4 5b ff 4c 8b 65 00 49 8d 7c 24 08 e8 bb e4 5b ff 4d0 [ 168.545803] RSP: 0018:ffff88817095f9c0 EFLAGS: 00010002 [ 168.546497] RAX: 0000000000000001 RBX: ffff888101a1c000 RCX: 0000000000000000 [ 168.547438] RDX: 0000000000000003 RSI: 0000000000000002 RDI: ffff888106553428 [ 168.548402] RBP: ffff888106553400 R08: ffffffff961bcaf4 R09: 0000000000000001 [ 168.549365] R10: ffffffffa2e16c27 R11: fffffbfff45c2d84 R12: 0000000000000000 [ 168.550291] R13: ffff888101a1c098 R14: ffff88810c7a08c8 R15: ffffffffa55541a0 [ 168.551221] FS: 00007fac75227700(0000) GS:ffff88839ba80000(0000) knlGS:0000000000000000 [ 168.552278] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 168.553040] CR2: 0000000000000008 CR3: 0000000165ce7000 CR4: 00000000000006e0 [ 168.554000] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 168.554929] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 168.555888] Call Trace: [ 168.556221] <TASK> [ 168.556510] blkg_create+0x1c0/0x8c0 [ 168.556989] blkg_conf_prep+0x574/0x650 [ 168.557502] ? stack_trace_save+0x99/0xd0 [ 168.558033] ? blkcg_conf_open_bdev+0x1b0/0x1b0 [ 168.558629] tg_set_conf.constprop.0+0xb9/0x280 [ 168.559231] ? kasan_set_track+0x29/0x40 [ 168.559758] ? kasan_set_free_info+0x30/0x60 [ 168.560344] ? tg_set_limit+0xae0/0xae0 [ 168.560853] ? do_sys_openat2+0x33b/0x640 [ 168.561383] ? do_sys_open+0xa2/0x100 [ 168.561877] ? __x64_sys_open+0x4e/0x60 [ 168.562383] ? __kasan_check_write+0x20/0x30 [ 168.562951] ? copyin+0x48/0x70 [ 168.563390] ? _copy_from_iter+0x234/0x9e0 [ 168.563948] tg_set_conf_u64+0x17/0x20 [ 168.564467] cgroup_file_write+0x1ad/0x380 [ 168.565014] ? cgroup_file_poll+0x80/0x80 [ 168.565568] ? __mutex_lock_slowpath+0x30/0x30 [ 168.566165] ? pgd_free+0x100/0x160 [ 168.566649] kernfs_fop_write_iter+0x21d/0x340 [ 168.567246] ? cgroup_file_poll+0x80/0x80 [ 168.567796] new_sync_write+0x29f/0x3c0 [ 168.568314] ? new_sync_read+0x410/0x410 [ 168.568840] ? __handle_mm_fault+0x1c97/0x2d80 [ 168.569425] ? copy_page_range+0x2b10/0x2b10 [ 168.570007] ? _raw_read_lock_bh+0xa0/0xa0 [ 168.570622] vfs_write+0x46e/0x630 [ 168.571091] ksys_write+0xcd/0x1e0 [ 168.571563] ? __x64_sys_read+0x60/0x60 [ 168.572081] ? __kasan_check_write+0x20/0x30 [ 168.572659] ? do_user_addr_fault+0x446/0xff0 [ 168.573264] __x64_sys_write+0x46/0x60 [ 168.573774] do_syscall_64+0x35/0x80 [ 168.574264] entry_SYSCALL_64_after_hwframe+0x44/0xae [ 168.574960] RIP: 0033:0x7fac74915130 [ 168.575456] Code: 73 01 c3 48 8b 0d 58 ed 2c 00 f7 d8 64 89 01 48 83 c8 ff c3 66 0f 1f 444 [ 168.577969] RSP: 002b:00007ffc3080e288 EFLAGS: 00000246 ORIG_RAX: 0000000000000001 [ 168.578986] RAX: ffffffffffffffda RBX: 0000000000000009 RCX: 00007fac74915130 [ 168.579937] RDX: 0000000000000009 RSI: 000056007669f080 RDI: 0000000000000001 [ 168.580884] RBP: 000056007669f080 R08: 000000000000000a R09: 00007fac75227700 [ 168.581841] R10: 000056007655c8f0 R11: 0000000000000246 R12: 0000000000000009 [ 168.582796] R13: 0000000000000001 R14: 00007fac74be55e0 R15: 00007fac74be08c0 [ 168.583757] </TASK> [ 168.584063] Modules linked in: [ 168.584494] CR2: 0000000000000008 [ 168.584964] ---[ end trace 2475611ad0f77a1a ]--- This is because blkg_alloc() is called from blkg_conf_prep() without holding 'q->queue_lock', and elevator is exited before blkg_create(): thread 1 thread 2 blkg_conf_prep spin_lock_irq(&q->queue_lock); blkg_lookup_check -> return NULL spin_unlock_irq(&q->queue_lock); blkg_alloc blkcg_policy_enabled -> true pd = ->pd_alloc_fn blkg->pd[i] = pd blk_mq_exit_sched bfq_exit_queue blkcg_deactivate_policy spin_lock_irq(&q->queue_lock); __clear_bit(pol->plid, q->blkcg_pols); spin_unlock_irq(&q->queue_lock); q->elevator = NULL; spin_lock_irq(&q->queue_lock); blkg_create if (blkg->pd[i]) ->pd_init_fn -> q->elevator is NULL spin_unlock_irq(&q->queue_lock); Because blkcg_deactivate_policy() requires queue to be frozen, we can grab q_usage_counter to synchoronize blkg_conf_prep() against blkcg_deactivate_policy(). Fixes: `e21b7a0b98` ("block, bfq: add full hierarchical scheduling and cgroups support") Signed-off-by: Yu Kuai <yukuai3@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20211020014036.2141723-1-yukuai3@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-25 08:06:27 -06:00
Pavel Begunkov	fa5fa8ec60	block: refactor bio_iov_bvec_set() Combine bio_iov_bvec_set() and bio_iov_bvec_set_append() and let the caller to do iov_iter_advance(). Also get rid of __bio_iov_bvec_set(), which was duplicated in the final binary, and replace a weird iov_iter_truncate() of a temporal iter copy with min() better reflecting the intention. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/bcf1ac36fce769a514e19475f3623cd86a1d8b72.1635006010.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-25 08:00:48 -06:00
Pavel Begunkov	54a88eb838	block: add single bio async direct IO helper As with __blkdev_direct_IO_simple(), we can implement direct IO more efficiently if there is only one bio. Add __blkdev_direct_IO_async() and blkdev_bio_end_io_async(). This patch brings me from 4.45-4.5 MIOPS with nullblk to 4.7+. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f0ae4109b7a6934adede490f84d188d53b97051b.1635006010.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-25 08:00:48 -06:00
John Garry	8bdf7b3fe1	blk-mq-sched: Don't reference queue tagset in blk_mq_sched_tags_teardown() We should not reference the queue tagset in blk_mq_sched_tags_teardown() (see function comment) for the blk-mq flags, so use the passed flags instead. This solves a use-after-free, similarly fixed earlier (and since broken again) in commit `f0c1c4d286` ("blk-mq: fix use-after-free in blk_mq_exit_sched"). Reported-by: Linux Kernel Functional Testing <lkft@linaro.org> Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Tested-by: Anders Roxell <anders.roxell@linaro.org> Fixes: `e155b0c238` ("blk-mq: Use shared tags for shared sbitmap support") Signed-off-by: John Garry <john.garry@huawei.com> Link: https://lore.kernel.org/r/1634890340-15432-1-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-22 09:16:12 -06:00
Pavel Begunkov	297db73184	block: fix req_bio_endio append error handling Shinichiro Kawasaki reports that there is a bug in a recent req_bio_endio() patch causing problems with zonefs. As Shinichiro suggested, inverse the condition in zone append path to resemble how it was before: fail when it's not fully completed. Fixes: `478eb72b81` ("block: optimise req_bio_endio()") Reported-by: Shinichiro Kawasaki <shinichiro.kawasaki@wdc.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/344ea4e334aace9148b41af5f2426da38c8aa65a.1634914228.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-22 09:12:37 -06:00
Eric Biggers	cb77cb5abe	blk-crypto: rename blk_keyslot_manager to blk_crypto_profile blk_keyslot_manager is misnamed because it doesn't necessarily manage keyslots. It actually does several different things: - Contains the crypto capabilities of the device. - Provides functions to control the inline encryption hardware. Originally these were just for programming/evicting keyslots; however, new functionality (hardware-wrapped keys) will require new functions here which are unrelated to keyslots. Moreover, device-mapper devices already (ab)use "keyslot_evict" to pass key eviction requests to their underlying devices even though device-mapper devices don't have any keyslots themselves (so it really should be "evict_key", not "keyslot_evict"). - Sometimes (but not always!) it manages keyslots. Originally it always did, but device-mapper devices don't have keyslots themselves, so they use a "passthrough keyslot manager" which doesn't actually manage keyslots. This hack works, but the terminology is unnatural. Also, some hardware doesn't have keyslots and thus also uses a "passthrough keyslot manager" (support for such hardware is yet to be upstreamed, but it will happen eventually). Let's stop having keyslot managers which don't actually manage keyslots. Instead, rename blk_keyslot_manager to blk_crypto_profile. This is a fairly big change, since for consistency it also has to update keyslot manager-related function names, variable names, and comments -- not just the actual struct name. However it's still a fairly straightforward change, as it doesn't change any actual functionality. Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20211018180453.40441-4-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 10:49:32 -06:00
Eric Biggers	1e8d44bddf	blk-crypto: rename keyslot-manager files to blk-crypto-profile In preparation for renaming struct blk_keyslot_manager to struct blk_crypto_profile, rename the keyslot-manager.h and keyslot-manager.c source files. Renaming these files separately before making a lot of changes to their contents makes it easier for git to understand that they were renamed. Acked-by: Ulf Hansson <ulf.hansson@linaro.org> # For MMC Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20211018180453.40441-3-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 10:49:32 -06:00
Eric Biggers	eebcafaebb	blk-crypto-fallback: properly prefix function and struct names For clarity, avoid using just the "blk_crypto_" prefix for functions and structs that are specific to blk-crypto-fallback. Instead, use "blk_crypto_fallback_". Some places already did this, but others didn't. This is also a prerequisite for using "struct blk_crypto_keyslot" to mean a generic blk-crypto keyslot (which is what it sounds like). Rename the fallback one to "struct blk_crypto_fallback_keyslot". No change in behavior. Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Mike Snitzer <snitzer@redhat.com> Reviewed-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20211018180453.40441-2-ebiggers@kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 10:49:32 -06:00
Xie Yongji	f059a1d2e2	block: Add invalidate_disk() helper to invalidate the gendisk To hide internal implementation and simplify some driver code, this adds a helper to invalidate the gendisk. It will clean the gendisk's associated buffer/page caches and reset its internal states. Signed-off-by: Xie Yongji <xieyongji@bytedance.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20210922123711.187-2-xieyongji@bytedance.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 10:12:41 -06:00
Pavel Begunkov	e94f68527a	block: kill extra rcu lock/unlock in queue enter blk_try_enter_queue() already takes rcu_read_lock/unlock, so we can avoid the second pair in percpu_ref_tryget_live(), use a newly added percpu_ref_tryget_live_rcu(). As rcu_read_lock/unlock imply barrier()s, it's pretty noticeable, especially for for !CONFIG_PREEMPT_RCU (default for some distributions), where __rcu_read_lock/unlock() are not inlined. 3.20% io_uring [kernel.vmlinux] [k] __rcu_read_unlock 3.05% io_uring [kernel.vmlinux] [k] __rcu_read_lock 2.52% io_uring [kernel.vmlinux] [k] __rcu_read_unlock 2.28% io_uring [kernel.vmlinux] [k] __rcu_read_lock Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6b11c67ea495ed9d44f067622d852de4a510ce65.1634822969.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 08:37:26 -06:00
Pavel Begunkov	6549a874fb	block: convert fops.c magic constants to SHIFT_SECTOR Don't use shifting by a magic number 9 but replace with a more descriptive SHIFT_SECTOR. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/068782b9f7e97569fb59a99529b23bb17ea4c5e2.1634755800.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 08:27:17 -06:00
Pavel Begunkov	179ae84f7e	block: clean up blk_mq_submit_bio() merging Combine blk_mq_sched_bio_merge() and blk_attempt_plug_merge() under a common if, so we don't check it twice. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/daedc90d4029a5d1d73344771632b1faca3aaf81.1634755800.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 08:27:17 -06:00
Pavel Begunkov	6450fe1f66	block: optimise boundary blkdev_read_iter's checks Combine pos and len checks and mark unlikely. Also, don't reexpand if it's not truncated. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/fff34e613aeaae1ad12977dc4592cb1a1f5d3190.1634755800.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 08:27:17 -06:00
Jackie Liu	057178cf51	fs: bdev: fix conflicting comment from lookup_bdev We switched to directly use dev_t to get block device, lookup changed the meaning of use, now we fix this conflicting comment. Fixes: `4e7b5671c6` ("block: remove i_bdev") Cc: Jens Axboe <axboe@kernel.dk> Cc: Christoph Hellwig <hch@lst.de> Cc: linux-fsdevel@vger.kernel.org Signed-off-by: Jackie Liu <liuyun01@kylinos.cn> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211021071344.1600362-1-liu.yun@linux.dev Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 08:24:14 -06:00
John Garry	0994c64eb4	blk-mq: Fix blk_mq_tagset_busy_iter() for shared tags Since it is now possible for a tagset to share a single set of tags, the iter function should not re-iter the tags for the count of #hw queues in that case. Rather it should just iter once. Fixes: `e155b0c238` ("blk-mq: Use shared tags for shared sbitmap support") Reported-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: John Garry <john.garry@huawei.com> Reviewed-by: Ming Lei <ming.lei@redhat.com> Tested-by: Kashyap Desai <kashyap.desai@broadcom.com> Link: https://lore.kernel.org/r/1634550083-202815-1-git-send-email-john.garry@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-21 08:21:52 -06:00
Christoph Hellwig	008f75a20e	block: cleanup the flush plug helpers Consolidate the various helpers into a single blk_flush_plug helper that takes a plk_plug and the from_scheduler bool and switch all callsites to call it directly. Checks that the plug is non-NULL must be performed by the caller, something that most already do anyway. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-5-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 09:56:11 -06:00
Pavel Begunkov	b600455d84	block: optimise blk_flush_plug_list Don't call flush_plug_callbacks if there are no plug callbacks. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> [hch: split from a larger patch] Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-4-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 09:56:11 -06:00
Christoph Hellwig	dbb6f764a0	blk-mq: move blk_mq_flush_plug_list to block/blk-mq.h This helper is internal to the block layer. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-3-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 09:56:11 -06:00
Christoph Hellwig	a214b949d8	blk-mq: only flush requests from the plug in blk_mq_submit_bio Replace the call to blk_flush_plug_list in blk_mq_submit_bio with a direct call to blk_mq_flush_plug_list. This means we do not flush plug callback from stackable devices, which doesn't really help with the accumulated requests anyway, and it also means the cached requests aren't freed here as they can still be used later on. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211020144119.142582-2-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 09:56:11 -06:00
Jens Axboe	037057a5a9	block: remove inaccurate requeue check This check is meant to catch cases where a requeue is attempted on a request that is still inserted. It's never really been useful to catch any misuse, and now it's actively wrong. Outside of that, this should not be a BUG_ON() to begin with. Remove the check as it's now causing active harm, as requeue off the plug path will trigger it even though the request state is just fine. Reported-by: Yi Zhang <yi.zhang@redhat.com> Link: https://lore.kernel.org/linux-block/CAHj4cs80zAUc2grnCZ015-2Rvd-=gXRfB_dFKy=RTm+wRo09HQ@mail.gmail.com/ Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 08:21:40 -06:00
Pavel Begunkov	c809084ab0	block: inline a part of bio_release_pages() Inline BIO_NO_PAGE_REF check of bio_release_pages() to avoid function call. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 08:08:07 -06:00
Pavel Begunkov	1497a51a32	block: don't bloat enter_queue with percpu_ref percpu_ref_put() are inlined for performance and bloat the binary, we don't care about the fail case of blk_try_enter_queue(), so we can replace it with a call to blk_queue_exit(). Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 08:08:06 -06:00
Pavel Begunkov	478eb72b81	block: optimise req_bio_endio() First, get rid of an extra branch and chain error checks. Also reshuffle it with bio_advance(), so it goes closer to the final check, with that the compiler loads rq->rq_flags only once, and also doesn't reload bio->bi_iter.bi_size if bio_advance() didn't actually advanced the iter. Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 08:08:05 -06:00
Pavel Begunkov	859897c3fb	block: convert leftovers to bdev_get_queue Convert bdev->bd_disk->queue to bdev_get_queue(), which is faster. Apparently, there are a few such spots in block that got lost during rebases. Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-20 08:08:03 -06:00
Ming Lei	e70feb8b3e	blk-mq: support concurrent queue quiesce/unquiesce blk_mq_quiesce_queue() has been used a bit wide now, so far we don't support concurrent/nested quiesce. One biggest issue is that unquiesce can happen unexpectedly in case that quiesce/unquiesce are run concurrently from more than one context. This patch introduces q->mq_quiesce_depth to deal concurrent quiesce, and we only unquiesce queue when it is the last/outer-most one of all contexts. Several kernel panic issue has been reported[1][2][3] when running stress quiesce test. And this patch has been verified in these reports. [1] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m1fc52431fad7f33b1ffc3f12c4450e4238540787 [2] https://lore.kernel.org/linux-block/9b21c797-e505-3821-4f5b-df7bf9380328@huawei.com/T/#m10ad90afeb9c8cc318334190a7c24c8b5c5e0722 [3] https://listman.redhat.com/archives/dm-devel/2021-September/msg00189.html Signed-off-by: Ming Lei <ming.lei@redhat.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211014081710.1871747-7-ming.lei@redhat.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 18:27:58 -06:00
Zheng Liang	2fc428f6b7	block, bfq: fix UAF problem in bfqg_stats_init() In bfq_pd_alloc(), the function bfqg_stats_init() init bfqg. If blkg_rwstat_init() init bfqg_stats->bytes successful and init bfqg_stats->ios failed, bfqg_stats_init() return failed, bfqg will be freed. But blkg_rwstat->cpu_cnt is not deleted from the list of percpu_counters. If we traverse the list of percpu_counters, It will have UAF problem. we should use blkg_rwstat_exit() to cleanup bfqg_stats bytes in the above scenario. Fixes: commit `fd41e60331` ("bfq-iosched: stop using blkg->stat_bytes and ->stat_ios") Signed-off-by: Zheng Liang <zhengliang6@huawei.com> Acked-by: Tejun Heo <tj@kernel.org> Link: https://lore.kernel.org/r/20211018024225.1493938-1-zhengliang6@huawei.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 15:18:30 -06:00
Jens Axboe	a808a9d545	block: inline fast path of driver tag allocation If we don't use an IO scheduler or have shared tags, then we don't need to call into this external function at all. This saves ~2% for such a setup. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 15:18:25 -06:00
Christoph Hellwig	d92ca9d834	blk-mq: don't handle non-flush requests in blk_insert_flush Return to the normal blk_mq_submit_bio flow if the bio did not end up actually being a flush because the device didn't support it. Note that this is basically impossible to hit without special instrumentation given that submit_bio_checks already clears these flags usually, so we'd need a tight race to actually hit this code path. With this the call to blk_mq_run_hw_queue for the flush requests can be removed given that the actual flush requests are always issued via the requeue workqueue which runs the queue unconditionally. Signed-off-by: Christoph Hellwig <hch@lst.de> Link: https://lore.kernel.org/r/20211019122553.2467817-1-hch@lst.de Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-10-19 11:10:09 -06:00

1 2 3 4 5 ...

5859 Commits