linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-15 00:04:15 +08:00

Author	SHA1	Message	Date
Pavel Begunkov	dac7a09864	io_uring: add helper flushing locked_free_list Add a new helper io_flush_cached_locked_reqs() that splices locked_free_list to free_list, and does it right doing all sync and invariant reinit. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:59 -06:00
Pavel Begunkov	a05432fb49	io_uring: refactor io_free_req_deferred() We don't care about ret value in io_free_req_deferred(), make the code a bit more concise. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:59 -06:00
Pavel Begunkov	0d85035a73	io_uring: inline io_put_req and friends One big omission is that io_put_req() haven't been marked inline, and at least gcc 9 doesn't inline it, not to mention that it's really hot and extra function call is intolerable, especially when it doesn't put a final ref. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:59 -06:00
Pavel Begunkov	8dd03afe61	io_uring: refactor rsrc refnode allocation There are two problems: 1) we always allocate refnodes in advance and free them if those haven't been used. It's expensive, takes two allocations, where one of them is percpu. And it may be pretty common not actually using them. 2) Current API with allocating a refnode and setting some of the fields is error prone, we don't ever want to have a file node runninng fixed buffer callback... Solve both with pre-init/get API. Pre-init just leaves the node for later if not used, and for get (i.e. io_rsrc_refnode_get()), you need to explicitly pass all arguments setting callbacks/etc., so it's more resilient. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	dd78f49260	io_uring: refactor io_flush_cached_reqs() Emphasize that return value of io_flush_cached_reqs() depends on number of requests in the cache. It looks nicer and might help tools from false-negative analyses. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	1840038e11	io_uring: optimise success case of __io_queue_sqe Move the case of successfully issued request by doing that check first. It's not much of a difference, just generates slightly better code for me. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	de968c182b	io_uring: inline __io_queue_linked_timeout() Inline __io_queue_linked_timeout(), we don't need it Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	966706579a	io_uring: keep io_req_free_batch() call locality Don't do a function call (io_dismantle_req()) in the middle and place it to near other function calls, otherwise may lead to excessive register spilling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	cf27f3b149	io_uring: optimise tctx node checks/alloc First of all, w need to set tctx->sqpoll only when we add a new entry into ->xa, so move it from the hot path. Also extract a hot path for io_uring_add_task_file() as an inline helper. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	33f993da98	io_uring: optimise io_uring_enter() Add unlikely annotations, because my compiler pretty much mispredicts every first check, and apart jumping around in the fast path, it also generates extra instructions, like in advance setting ret value. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	493f3b158a	io_uring: don't take ctx refs in task_work handler __tctx_task_work() guarantees that ctx won't be killed while running task_works, so we can remove now unnecessary ctx pinning for internally armed polling. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Jens Axboe	45ab03b19e	io_uring: transform ret == 0 for poll cancelation completions We can set canceled == true and complete out-of-line, ensure that we catch that and correctly return -ECANCELED if the poll operation got canceled. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Jens Axboe	b9b0e0d39c	io_uring: correct comment on poll vs iopoll The correct function is io_iopoll_complete(), which deals with completions of IOPOLL requests, not io_poll_complete(). Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Jens Axboe	7b29f92da3	io_uring: cache async and regular file state for fixed files We have to dig quite deep to check for particularly whether or not a file supports a fast-path nonblock attempt. For fixed files, we can do this lookup once and cache the state instead. This adds two new bits to track whether we support async read/write attempt, and lines up the REQ_F_ISREG bit with those two. The file slot re-uses the last 3 (or 2, for 32-bit) of the file pointer to cache that state, and then we mask it in when we go and use a fixed file. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Jens Axboe	d44f554e10	io_uring: don't check for io_uring_fops for fixed files We don't allow them at registration time, so limit the check for needing inflight tracking in io_file_get() to the non-fixed path. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	c9dca27dc7	io_uring: simplify io_sqd_update_thread_idle() Use a more comprehensible() max instead of hand coding it with ifs in io_sqd_update_thread_idle(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Jens Axboe	abc54d6343	io_uring: switch to atomic_t for io_kiocb reference count io_uring manipulates references twice for each request, and hence is very sensitive to performance of the reference count. This commit borrows a trick from: commit `f958d7b528` Author: Linus Torvalds <torvalds@linux-foundation.org> Date: Thu Apr 11 10:06:20 2019 -0700 mm: make page ref count overflow check tighter and more explicit and switches to atomic_t for references, while still retaining overflow and underflow checks. This is good for a 2-3% increase in peak IOPS on a single core. Before: IOPS=2970879, IOS/call=31/31, inflight=128 (128) IOPS=2952597, IOS/call=31/31, inflight=128 (128) IOPS=2943904, IOS/call=31/31, inflight=128 (128) IOPS=2930006, IOS/call=31/31, inflight=96 (96) and after: IOPS=3054354, IOS/call=31/31, inflight=128 (128) IOPS=3059038, IOS/call=31/31, inflight=128 (128) IOPS=3060320, IOS/call=31/31, inflight=128 (128) IOPS=3068256, IOS/call=31/31, inflight=96 (96) Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Jens Axboe	de9b4ccad7	io_uring: wrap io_kiocb reference count manipulation in helpers No functional changes in this patch, just in preparation for handling the references a bit more efficiently. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	179ae0d15e	io_uring: simplify io_resubmit_prep() If not for async_data NULL check, io_resubmit_prep() is already an rw specific version of io_req_prep_async(), but slower because 1) it always goes through io_import_iovec() even if following io_setup_async_rw() the result 2) instead of initialising iovec/iter in-place it does it on-stack and then copies with io_setup_async_rw(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	b7e298d265	io_uring: merge defer_prep() and prep_async() Merge two function and do renaming in favour of the second one, it relays the meaning better. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	26f0505a9c	io_uring: rethink def->needs_async_data needs_async_data controls allocation of async_data, and used in two cases. 1) when async setup requires it (by io_req_prep_async() or handler themselves), and 2) when op always needs additional space to operate, like timeouts do. Opcode preps already don't bother about the second case and do allocation unconditionally, restrict needs_async_data to the first case only and rename it into needs_async_setup. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> [axboe: update for IOPOLL fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	6cb78689fa	io_uring: untie alloc_async_data and needs_async_data All opcode handlers pretty well know whether they need async data or not, and can skip testing for needs_async_data. The exception is rw the generic path, but those test the flag by hand anyway. So, check the flag and make io_alloc_async_data() allocating unconditionally. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	2e052d443d	io_uring: refactor out send/recv async setup IORING_OP_[SEND,RECV] don't need async setup neither will get into io_req_prep_async(). Remove them from io_req_prep_async() and remove needs_async_data checks from the related setup functions. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	8c3f9cd160	io_uring: use better types for cflags __io_cqring_fill_event() takes cflags as long to squeeze it into u32 in an CQE, awhile all users pass int or unsigned. Replace it with unsigned int and store it as u32 in struct io_completion to match CQE. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	9fb8cb49c7	io_uring: refactor provide/remove buffer locking Always complete request holding the mutex instead of doing that strange dancing with conditional ordering. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	f41db2732d	io_uring: add a helper failing not issued requests Add a simple helper doing CQE posting, marking request for link-failure, and putting both submission and completion references. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	dafecf19e2	io_uring: further deduplicate file slot selection io_fixed_file_slot() and io_file_from_index() behave pretty similarly, DRY and call one from another. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:58 -06:00
Pavel Begunkov	2c4b8eb643	io_uring: reuse io_req_task_queue_fail() Use io_req_task_queue_fail() on the fail path of io_req_task_queue(). It's unlikely to happen, so don't care about additional overhead, but allows to keep all the req->result invariant in a single function. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:57 -06:00
Pavel Begunkov	e83acd7d37	io_uring: avoid taking ctx refs for task-cancel Don't bother to take a ctx->refs for io_req_task_cancel() because it take uring_lock before putting a request, and the context is promised to stay alive until unlock happens. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-11 17:41:57 -06:00
Linus Torvalds	7d90072491	for-5.12-rc6-tag -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEE8rQSAMVO+zA4DBdWxWXV+ddtWDsFAmBy9DoACgkQxWXV+ddt WDtqdxAAnK4zx79k5ok6nlj8JlOfReimX4wPYYigiiKGY40cfQUZ1YUqbDscvrt+ cbzvqJuMU/V/UVaPW/CLmNi5XpNlSmj0229iwy59BIcpXfgtAMTsa1zsY4teZ/AT 3noNuT15CTeybwii0nT++AkqJbCbwXc5ItccGh9ZMOQwXuA5IUVTAzKrulUJoxXN zt23lX/ivtSfUH+pMMIG6wMVG2eGIP5m9drw+2n0yK08gt+oprLYnaAaE389mXgb TIRBafeBY7UA1YEcA4JDBDMNa0L8yWSV+XiMhxw7Ear7KoROAunKNbsG8USll6zb zBftfO+Gzv86wVvvPXg2KR8Qs9vyJMw2bOROFKzOnd+wQQ76v0XefOhNUUN98E6g tLTmCH+M1B1Qm1j2hVyOect/PMY51xqJA9xwlTtAbqIcz4qyOtfTR9KqqlWxVKJW 9pAEMII063xEKVxgv2khOhewEjOgqa4v9YFQjVXdcHPKvGTAYBeoJA735+WnQ1HZ okPC5k3DoEcVZEkUPvespEsAqm+RoBufNxWmQ7hq5N3IwZAXsIwTlhysgrXQWyc9 aTigWBq6rQ/bMz/57vI626+MAMh3StL+UOxlWiT+GToInpjZwoxZ0lgQdD6vUfUm T90T2930+PTkykQM9sNdQygGiH0J5FzkvneYvpkOYJ/+vphsRiA= =MuRt -----END PGP SIGNATURE----- Merge tag 'for-5.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux Pull btrfs fix from David Sterba: "One more patch that we'd like to get to 5.12 before release. It's changing where and how the superblock is stored in the zoned mode. It is an on-disk format change but so far there are no implications for users as the proper mkfs support hasn't been merged and is waiting for the kernel side to settle. Until now, the superblocks were derived from the zone index, but zone size can differ per device. This is changed to be based on fixed offset values, to make it independent of the device zone size. The work on that got a bit delayed, we discussed the exact locations to support potential device sizes and usecases. (Partially delayed also due to my vacation.) Having that in the same release where the zoned mode is declared usable is highly desired, there are userspace projects that need to be updated to recognize the feature. Pushing that to the next release would make things harder to test" * tag 'for-5.12-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/kdave/linux: btrfs: zoned: move superblock logging zone location	2021-04-11 11:53:36 -07:00
Naohiro Aota	53b74fa990	btrfs: zoned: move superblock logging zone location Moves the location of the superblock logging zones. The new locations of the logging zones are now determined based on fixed block addresses instead of on fixed zone numbers. The old placement method based on fixed zone numbers causes problems when one needs to inspect a file system image without access to the drive zone information. In such case, the super block locations cannot be reliably determined as the zone size is unknown. By locating the superblock logging zones using fixed addresses, we can scan a dumped file system image without the zone information since a super block copy will always be present at or after the fixed known locations. Introduce the following three pairs of zones containing fixed offset locations, regardless of the device zone size. - primary superblock: offset 0B (and the following zone) - first copy: offset 512G (and the following zone) - Second copy: offset 4T (4096G, and the following zone) If a logging zone is outside of the disk capacity, we do not record the superblock copy. The first copy position is much larger than for a non-zoned filesystem, which is at 64M. This is to avoid overlapping with the log zones for the primary superblock. This higher location is arbitrary but allows supporting devices with very large zone sizes, plus some space around in between. Such large zone size is unrealistic and very unlikely to ever be seen in real devices. Currently, SMR disks have a zone size of 256MB, and we are expecting ZNS drives to be in the 1-4GB range, so this limit gives us room to breathe. For now, we only allow zone sizes up to 8GB. The maximum zone size that would still fit in the space is 256G. The fixed location addresses are somewhat arbitrary, with the intent of maintaining superblock reliability for smaller and larger devices, with the preference for the latter. For this reason, there are two superblocks under the first 1T. This should cover use cases for physical devices and for emulated/device-mapper devices. The superblock logging zones are reserved for superblock logging and never used for data or metadata blocks. Note that we only reserve the two zones per primary/copy actually used for superblock logging. We do not reserve the ranges of zones possibly containing superblocks with the largest supported zone size (0-16GB, 512G-528GB, 4096G-4112G). The zones containing the fixed location offsets used to store superblocks on a non-zoned volume are also reserved to avoid confusion. Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com> Signed-off-by: David Sterba <dsterba@suse.com>	2021-04-10 12:13:16 +02:00
Linus Torvalds	adb2c4174f	Merge branch 'akpm' (patches from Andrew) Merge misc fixes from Andrew Morton: "14 patches. Subsystems affected by this patch series: mm (kasan, gup, pagecache, and kfence), MAINTAINERS, mailmap, nds32, gcov, ocfs2, ia64, and lib" * emailed patches from Andrew Morton <akpm@linux-foundation.org>: lib: fix kconfig dependency on ARCH_WANT_FRAME_POINTERS kfence, x86: fix preemptible warning on KPTI-enabled systems lib/test_kasan_module.c: suppress unused var warning kasan: fix conflict with page poisoning fs: direct-io: fix missing sdio->boundary ia64: fix user_stack_pointer() for ptrace() ocfs2: fix deadlock between setattr and dio_end_io_write gcov: re-fix clang-11+ support nds32: flush_dcache_page: use page_mapping_file to avoid races with swapoff mm/gup: check page posion status for coredump. .mailmap: fix old email addresses mailmap: update email address for Jordan Crouse treewide: change my e-mail address, fix my name MAINTAINERS: update CZ.NIC's Turris information	2021-04-09 17:06:32 -07:00
Linus Torvalds	3b9784350f	io_uring-5.12-2021-04-09 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBwXtgQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgprCVEAC8ZqiwYFIaCODC4NouBtMjZsInyGwetwBa sezn2KUIsP3umjbgZMO5KrI/6zGWO5j+7O7j1eW05FoN/yE1gM477EfPVgYeupgV tefAOskEAwPtMrTkQJt7kTxnv2IHNhTV1LEvv5Co3ueFoIA4Fbqso1oTsn87D3JK x5oZ+c5tDC7VendkZ/5SX4ioysFwabZv1qUR8GxRHCx4Kk5prrdUmejb8QeUOglG aJ6hd0UIFNmQAW+3ujOD6wlhn3MyKVfdZpGbhQfFQz3GV88maxAxNm/5dcEuEDan W8khgdJHZ5mr7/oDkjmJyjp9MNvamuPw52UzrQwmMOf3RvaKanM74/mJWXcVz+J9 tf9UsqiWwMmCpXwszA2KYBy+NIFZ3QqAy0Ed1pj0WLeYOzw0zPb34S/O9SnWy597 /T0I2jIBsOl478Wo6U+kmvTubJoRpGz9kgXbpHLve0rLb1BEWj1hQsZZrMJf/29b b5zULnIQqhztFoaYdg08LFrhePIp1oVTuY8XO8/k84C4VpOFQjBKfGMtK+HalRTA bX2h/K3xCQPwK4VWIaM1ucvhrMTTReajfKXesQwuBmwETbs2p4tRIjL8KFw7d1rC 2jB4UvKVnWuxE64LZ+VK86btC9eAjYmDTTEPRUVFE4ZFroXcr+MK2Qke5o2FNReN UOVy2Xlelw== =EbJ/ -----END PGP SIGNATURE----- Merge tag 'io_uring-5.12-2021-04-09' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Two minor fixups for the reissue logic, and one for making sure that unbounded work is canceled on io-wq exit" * tag 'io_uring-5.12-2021-04-09' of git://git.kernel.dk/linux-block: io-wq: cancel unbounded works on io-wq destroy io_uring: fix rw req completion io_uring: clear F_REISSUE right after getting it	2021-04-09 15:06:52 -07:00
Jack Qiu	df41872b68	fs: direct-io: fix missing sdio->boundary I encountered a hung task issue, but not a performance one. I run DIO on a device (need lba continuous, for example open channel ssd), maybe hungtask in below case: DIO: Checkpoint: get addr A(at boundary), merge into BIO, no submit because boundary missing flush dirty data(get addr A+1), wait IO(A+1) writeback timeout, because DIO(A) didn't submit get addr A+2 fail, because checkpoint is doing dio_send_cur_page() may clear sdio->boundary, so prevent it from missing a boundary. Link: https://lkml.kernel.org/r/20210322042253.38312-1-jack.qiu@huawei.com Fixes: `b1058b9812` ("direct-io: submit bio after boundary buffer is added to it") Signed-off-by: Jack Qiu <jack.qiu@huawei.com> Reviewed-by: Jan Kara <jack@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-04-09 14:54:23 -07:00
Wengang Wang	90bd070aae	ocfs2: fix deadlock between setattr and dio_end_io_write The following deadlock is detected: truncate -> setattr path is waiting for pending direct IO to be done (inode->i_dio_count become zero) with inode->i_rwsem held (down_write). PID: 14827 TASK: ffff881686a9af80 CPU: 20 COMMAND: "ora_p005_hrltd9" #0 __schedule at ffffffff818667cc #1 schedule at ffffffff81866de6 #2 inode_dio_wait at ffffffff812a2d04 #3 ocfs2_setattr at ffffffffc05f322e [ocfs2] #4 notify_change at ffffffff812a5a09 #5 do_truncate at ffffffff812808f5 #6 do_sys_ftruncate.constprop.18 at ffffffff81280cf2 #7 sys_ftruncate at ffffffff81280d8e #8 do_syscall_64 at ffffffff81003949 #9 entry_SYSCALL_64_after_hwframe at ffffffff81a001ad dio completion path is going to complete one direct IO (decrement inode->i_dio_count), but before that it hung at locking inode->i_rwsem: #0 __schedule+700 at ffffffff818667cc #1 schedule+54 at ffffffff81866de6 #2 rwsem_down_write_failed+536 at ffffffff8186aa28 #3 call_rwsem_down_write_failed+23 at ffffffff8185a1b7 #4 down_write+45 at ffffffff81869c9d #5 ocfs2_dio_end_io_write+180 at ffffffffc05d5444 [ocfs2] #6 ocfs2_dio_end_io+85 at ffffffffc05d5a85 [ocfs2] #7 dio_complete+140 at ffffffff812c873c #8 dio_aio_complete_work+25 at ffffffff812c89f9 #9 process_one_work+361 at ffffffff810b1889 #10 worker_thread+77 at ffffffff810b233d #11 kthread+261 at ffffffff810b7fd5 #12 ret_from_fork+62 at ffffffff81a0035e Thus above forms ABBA deadlock. The same deadlock was mentioned in upstream commit `28f5a8a7c0` ("ocfs2: should wait dio before inode lock in ocfs2_setattr()"). It seems that that commit only removed the cluster lock (the victim of above dead lock) from the ABBA deadlock party. End-user visible effects: Process hang in truncate -> ocfs2_setattr path and other processes hang at ocfs2_dio_end_io_write path. This is to fix the deadlock itself. It removes inode_lock() call from dio completion path to remove the deadlock and add ip_alloc_sem lock in setattr path to synchronize the inode modifications. [wen.gang.wang@oracle.com: remove the "had_alloc_lock" as suggested] Link: https://lkml.kernel.org/r/20210402171344.1605-1-wen.gang.wang@oracle.com Link: https://lkml.kernel.org/r/20210331203654.3911-1-wen.gang.wang@oracle.com Signed-off-by: Wengang Wang <wen.gang.wang@oracle.com> Reviewed-by: Joseph Qi <joseph.qi@linux.alibaba.com> Cc: Mark Fasheh <mark@fasheh.com> Cc: Joel Becker <jlbec@evilplan.org> Cc: Junxiao Bi <junxiao.bi@oracle.com> Cc: Changwei Ge <gechangwei@live.cn> Cc: Gang He <ghe@suse.com> Cc: Jun Piao <piaojun@huawei.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>	2021-04-09 14:54:23 -07:00
Linus Torvalds	17e7124aad	3 cifs/smb3 fixes, 2 for stable, includes a reconnetct fix and fix for display of devnames with special characters -----BEGIN PGP SIGNATURE----- iQGzBAABCgAdFiEE6fsu8pdIjtWE/DpLiiy9cAdyT1EFAmBvsOIACgkQiiy9cAdy T1G90Qv+IEnZZClHENPxoefXE9hdMYflrZ0FgR1xAD0JJZDTjj0OIsilCfJxpDq5 Wz4oXZT0WDuwOishMljGkdIUR+xdH8fILWsXhUJoa38zCzF4OZ7ld1zbBDblHLSw dR0bZJ5FmfXJmqMmrdl2ebLysQ2Xn0qCEn/6FiHABCKgoEvUcYJ95TMVnc6xyc+j x1dZbCmL0lQjRsUE+V918fFyAHJqWvlJC3dfEPl15ARgksEM/14f4o1Tp3tI3jZ1 aVgPMsCb/ZC4Cwjr1NB7g66ymLKZZODl66wxM6zgNXQj72Ay2Sr2KeXT4WH6jspK mJQED27i20VtganGTBcZaULsupqd5+378G5Or1TDqEsDDq+Xg4+B+BBgojhBqSh6 Czp1iZgZGHNaw/40t4ikeiNTqzQN3WNaiTAptsAew9MS2yvM4wsu1T2D70r0wQOI FytBASx/u60mu5BnomTaOSgvkOw4LJ9LJproqREdgyNcvSwqnqc3HkCRoWDxxcQc TCxr/i4a =rxNZ -----END PGP SIGNATURE----- Merge tag '5.12-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6 Pull cifs fixes from Steve French: "Three cifs/smb3 fixes, two for stable: a reconnect fix and a fix for display of devnames with special characters" * tag '5.12-rc6-smb3' of git://git.samba.org/sfrench/cifs-2.6: cifs: escape spaces in share names fs: cifs: Remove unnecessary struct declaration cifs: On cifs_reconnect, resolve the hostname again.	2021-04-08 18:57:47 -07:00
Pavel Begunkov	c60eb049f4	io-wq: cancel unbounded works on io-wq destroy WARNING: CPU: 5 PID: 227 at fs/io_uring.c:8578 io_ring_exit_work+0xe6/0x470 RIP: 0010:io_ring_exit_work+0xe6/0x470 Call Trace: process_one_work+0x206/0x400 worker_thread+0x4a/0x3d0 kthread+0x129/0x170 ret_from_fork+0x22/0x30 INFO: task lfs-openat:2359 blocked for more than 245 seconds. task:lfs-openat state:D stack: 0 pid: 2359 ppid: 1 flags:0x00000004 Call Trace: ... wait_for_completion+0x8b/0xf0 io_wq_destroy_manager+0x24/0x60 io_wq_put_and_exit+0x18/0x30 io_uring_clean_tctx+0x76/0xa0 __io_uring_files_cancel+0x1b9/0x2e0 do_exit+0xc0/0xb40 ... Even after io-wq destroy has been issued io-wq worker threads will continue executing all left work items as usual, and may hang waiting for I/O that won't ever complete (aka unbounded). [<0>] pipe_read+0x306/0x450 [<0>] io_iter_do_read+0x1e/0x40 [<0>] io_read+0xd5/0x330 [<0>] io_issue_sqe+0xd21/0x18a0 [<0>] io_wq_submit_work+0x6c/0x140 [<0>] io_worker_handle_work+0x17d/0x400 [<0>] io_wqe_worker+0x2c0/0x330 [<0>] ret_from_fork+0x22/0x30 Cancel all unbounded I/O instead of executing them. This changes the user visible behaviour, but that's inevitable as io-wq is not per task. Suggested-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cd4b543154154cba055cf86f351441c2174d7f71.1617842918.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-08 13:33:17 -06:00
Pavel Begunkov	9728463737	io_uring: fix rw req completion WARNING: at fs/io_uring.c:8578 io_ring_exit_work.cold+0x0/0x18 As reissuing is now passed back by REQ_F_REISSUE and kiocb_done() internally uses __io_complete_rw(), it may stop after setting the flag so leaving a dangling request. There are tricky edge cases, e.g. reading beyound file, boundary, so the easiest way is to hand code reissue in kiocb_done() as __io_complete_rw() was doing for us before. Fixes: `230d50d448` ("io_uring: move reissue into regular IO path") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/f602250d292f8a84cca9a01d747744d1e797be26.1617842918.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-08 13:32:59 -06:00
Linus Torvalds	4ea51e0e37	for-linus-2021-04-08 -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQRAhzRXHqcMeLMyaSiRxhvAZXjcogUCYG7FBgAKCRCRxhvAZXjc osGSAQCW7V8zPhZ7Zwll3QeUk0xAqD6e6T3Uv3EoQPKCVcc00gEA/hQtDJYSGZWI 22hPAffU2YOKeYDXq7SIu+eJ1y/ShQ0= =xfme -----END PGP SIGNATURE----- Merge tag 'for-linus-2021-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux Pull close_range() fix from Christian Brauner: "Syzbot reported a bug in close_range. Debugging this showed we didn't recalculate the current maximum fd number for CLOSE_RANGE_UNSHARE \| CLOSE_RANGE_CLOEXEC after we unshared the file descriptors table. As a result, max_fd could exceed the current fdtable maximum causing us to set excessive bits. As a concrete example, let's say the user requested everything from fd 4 to ~0UL to be closed and their current fdtable size is 256 with their highest open fd being 4. With CLOSE_RANGE_UNSHARE the caller will end up with a new fdtable which has room for 64 file descriptors since that is the lowest fdtable size we accept. But now max_fd will still point to 255 and needs to be adjusted. Fix this by retrieving the correct maximum fd value in __range_cloexec(). I've carried this fix for a little while but since there was no linux-next release over easter I waited until now. With this change close_range() can be further simplified but imho we are in no hurry to do that and so I'll defer this for the 5.13 merge window" * tag 'for-linus-2021-04-08' of git://git.kernel.org/pub/scm/linux/kernel/git/brauner/linux: file: fix close_range() for unshare+cloexec	2021-04-08 08:46:53 -07:00
Linus Torvalds	035d80695f	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull umount fix from Al Viro: "Brown paperbag time: dumb braino in the series that went into 5.7 broke the 'don't step into ->d_weak_revalidate() when umount(2) looks the victim up' behaviour. Spotted only now - saw if (!err && unlikely(nd->flags & LOOKUP_MOUNTPOINT)) { err = handle_lookup_down(nd); nd->flags &= ~LOOKUP_JUMPED; // no d_weak_revalidate(), please... } and went "why do we clear that flag here - nothing below that point is going to check it anyway" / "wait a minute, what is it doing after complete_walk() (which is where we check that flag and call ->d_weak_revalidate())" / "how could that possibly _not_ break?", followed by reproducing the breakage and verifying that the obvious fix of that braino does, indeed, fix it. The reproducer is (assuming that $DIR exists and is exported r/w to localhost) mkdir $DIR/a mkdir /tmp/foo mount --bind /tmp/foo /tmp/foo mkdir /tmp/foo/a mkdir /tmp/foo/b mount -t nfs4 localhost:$DIR/a /tmp/foo/a mount -t nfs4 localhost:$DIR /tmp/foo/b rmdir /tmp/foo/b/a umount /tmp/foo/b umount /tmp/foo/a umount -l /tmp/foo # will get everything under /tmp/foo, no matter what Correct behaviour is successful umount; broken kernels (5.7-rc1 and later) get umount.nfs4: /tmp/foo/a: Stale file handle Note that bind mount is there to be able to recover - on broken kernels we'd get stuck with impossible-to-umount filesystem if not for that. FWIW, that braino had been posted for review back then, at least twice. Unfortunately, the call of complete_walk() was outside of diff context, so the bogosity hadn't been immediately obvious from the patch alone ;-/" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late	2021-04-08 08:26:06 -07:00
Pavel Begunkov	6ad7f2332e	io_uring: clear F_REISSUE right after getting it There are lots of ways r/w request may continue its path after getting REQ_F_REISSUE, it's not necessarily io-wq and can be, e.g. apoll, and submitted via io_async_task_func() -> __io_req_task_submit() Clear the flag right after getting it, so the next attempt is well prepared regardless how the request will be executed. Fixes: `230d50d448` ("io_uring: move reissue into regular IO path") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/11dcead939343f4e27cab0074d34afcab771bfa4.1617842918.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-07 22:10:19 -06:00
Maciek Borzecki	0fc9322ab5	cifs: escape spaces in share names Commit `653a5efb84` ("cifs: update super_operations to show_devname") introduced the display of devname for cifs mounts. However, when mounting a share which has a whitespace in the name, that exact share name is also displayed in mountinfo. Make sure that all whitespace is escaped. Signed-off-by: Maciek Borzecki <maciek.borzecki@gmail.com> CC: <stable@vger.kernel.org> # 5.11+ Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-04-07 21:30:27 -05:00
Wan Jiabing	d135be0a7f	fs: cifs: Remove unnecessary struct declaration struct cifs_readdata is declared twice. One is declared at 208th line. And struct cifs_readdata is defined blew. The declaration here is not needed. Remove the duplicate. Signed-off-by: Wan Jiabing <wanjiabing@vivo.com> Reviewed-by: Shyam Prasad N <sprasad@microsoft.com> Signed-off-by: Steve French <stfrench@microsoft.com>	2021-04-07 21:30:27 -05:00
Shyam Prasad N	4e456b30f7	cifs: On cifs_reconnect, resolve the hostname again. On cifs_reconnect, make sure that DNS resolution happens again. It could be the cause of connection to go dead in the first place. This also contains the fix for a build issue identified by Intel bot. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Shyam Prasad N <sprasad@microsoft.com> Reviewed-by: Paulo Alcantara (SUSE) <pc@cjr.nz> Reviewed-by: Pavel Shilovsky <pshilov@microsoft.com> CC: <stable@vger.kernel.org> # 5.11+ Signed-off-by: Steve French <stfrench@microsoft.com>	2021-04-07 21:29:36 -05:00
Al Viro	4f0ed93fb9	LOOKUP_MOUNTPOINT: we are cleaning "jumped" flag too late That (and traversals in case of umount .) should be done before complete_walk(). Either a braino or mismerge damage on queue reorders - either way, I should've spotted that much earlier. Fucked-up-by: Al Viro <viro@zeniv.linux.org.uk> X-Paperbag: Brown Fixes: `161aff1d93` "LOOKUP_MOUNTPOINT: fold path_mountpointat() into path_lookupat()" Cc: stable@vger.kernel.org # v5.7+ Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-04-06 20:33:00 -04:00
Linus Torvalds	2d74366078	Merge branch 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs Pull fs fixes from Al Viro: "Fairly old hostfs bug (in setups that are not used by anyone, apparently) + fix for this cycle regression: extra dput/mntput in LOOKUP_CACHED failure handling" * 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs: Make sure nd->path.mnt and nd->path.dentry are always valid pointers hostfs: fix memory handling in follow_link()	2021-04-06 12:52:49 -07:00
Al Viro	7d01ef7585	Make sure nd->path.mnt and nd->path.dentry are always valid pointers Initialize them in set_nameidata() and make sure that terminate_walk() clears them once the pointers become potentially invalid (i.e. we leave RCU mode or drop them in non-RCU one). Currently we have "path_init() always initializes them and nobody accesses them outside of path_init()/terminate_walk() segments", which is asking for trouble. With that change we would have nd->path.{mnt,dentry} 1) always valid - NULL or pointing to currently allocated objects. 2) non-NULL while we are successfully walking 3) NULL when we are not walking at all 4) contributing to refcounts whenever non-NULL outside of RCU mode. Fixes: `6c6ec2b0a3` ("fs: add support for LOOKUP_CACHED") Reported-by: syzbot+c88a7030da47945a3cc3@syzkaller.appspotmail.com Tested-by: Christian Brauner <christian.brauner@ubuntu.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>	2021-04-06 12:33:07 -04:00
Linus Torvalds	d83e98f9d8	io_uring-5.12-2021-04-03 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmBoyXQQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpiOXD/4wnXFHOmt5HMHmEXU4bB5b46snh3iCU4QM w7pqgPMgS8uRZZ16FWpzDwccC2NZDoFRHhFxDrOqdKMO1DQlzk1yfSeKjNs9h2Oe mJ6JuuaX6bEk2RRmjqqM5aMCazE2+tWtDydveL9OrJO6uiPcZYFPiRDwafDzQo4n YPhUhPm+dqn/4Li6Fap2ieCeLqXNEUKpA0/Bd9QQV0Q/oZq5oLdJefIk8EMXH/tf eKH2muZDjOV0FYdG8lPsNAF0c5qJ/aID+jlhyUz8Bkn31lOS96d5rzXoFq+AonsC gVwLbaMcAibHrDjNxIQGcEU0VSjvvfy9GAfjJ3uSuyjpE4dNMe2fuU/B3rF6xb6E upEfAik+frvzfFuZx11SZh/JwNBatJh35DVZZczI48YKk+s2MI0q9+lNINLtL6bD 3J287jnZbETU54WCMruiRwjQ3J1YWOj2pfxrPT9J0NZdQ0r7MXsDJecGNu+nL0X3 Ry7IpXUqabCf+4+XrGZ2NG6/kd5D/smatc5FqTkyeih3mw94iprNuWanzBlUZYQV 8ybrVtW37caAd1vyx0/p2I1LV5Y0eNGpSWz/WTDj0FINPCPjLq/VmPsIYgihmvz4 joWEzGnBKqds/CAvSneyz2d9MVlQ1083Az/Vi9g4w0IHG/p3ekQGBPDFBvQyszq9 pNkMpQ9Wxw== =edGl -----END PGP SIGNATURE----- Merge tag 'io_uring-5.12-2021-04-03' of git://git.kernel.dk/linux-block POull io_uring fix from Jens Axboe: "Just fixing a silly braino in a previous patch, where we'd end up failing to compile if CONFIG_BLOCK isn't enabled. Not that a lot of people do that, but kernel bot spotted it and it's probably prudent to just flush this out now before -rc6. Sorry about that, none of my test compile configs have !CONFIG_BLOCK" * tag 'io_uring-5.12-2021-04-03' of git://git.kernel.dk/linux-block: io_uring: fix !CONFIG_BLOCK compilation failure	2021-04-03 14:26:47 -07:00
Linus Torvalds	8e29be3468	Two more gfs2 fixes -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEEJZs3krPW0xkhLMTc1b+f6wMTZToFAmBomgAUHGFncnVlbmJh QHJlZGhhdC5jb20ACgkQ1b+f6wMTZTqWkA//bGV+XgrDas0mBjAQiGBcqC9M1Ts1 cffrJ2E9E5J/Cnn+/wAmkGYZU3oHpdmNI4DytCneHhfY9YamSlg6B9MDxebF3Iuu eXiWB4rXEf7l5AG5ywUloXl171TTxOQDo5bbOlvYeDWPZ/WYEBsdZ/rxF9yIent/ RS5HMt9I9x6c2WQGTlBvo8D618WyfxzkX5AjvhojITgX5UItZg6bOKkcuqcQ9PNG 5MGnHCCJU2Zh3Y6gGZjp3rQnxRAzpBhFdVrXaZgTAtKLycsGHMcjqfmST6N2si8l cCDgujRhRfIOpQZs0vOdcJVtpNBfDxgOO5JlTdkY6Grh/STYoN9dFo7V0SnYhMJE FdBgfHwNyyBEo/QV3gDXrvITxw+xypACjS/znffArxFySNzSfv2oWPCxOJvbel0L H4y+gbJ4R+QUzkUuvnXjxjl7c70jK+flLbzUxXxeSQBmOHtCiHZJK43UnCq+fpmZ hOrUaYHvCV1iC/9OeAy1N8MlXicUHnpmu/7q7GEGaRTV4zN85MjddOiRKAz6RiAl 2nB64GbLDFjY3HF8+/giEwAWikcYPk2W8S0WmX9Bjn+UdMYuH1cWF/TLwWrSmQJX MQmYlwLOj+UKg8Ku0+Klh2c8oMxo7C8o9t9BjBJio5od0U/sen9v/08DUobk4jwN hhjW+as1ZV/+Tpc= =TP8w -----END PGP SIGNATURE----- Merge tag 'gfs2-v5.12-rc2-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2 Pull gfs2 fixes from Andreas Gruenbacher: "Two more gfs2 fixes" * tag 'gfs2-v5.12-rc2-fixes2' of git://git.kernel.org/pub/scm/linux/kernel/git/gfs2/linux-gfs2: gfs2: report "already frozen/thawed" errors gfs2: Flag a withdraw if init_threads() fails	2021-04-03 12:15:01 -07:00
Jens Axboe	e82ad48539	io_uring: fix !CONFIG_BLOCK compilation failure kernel test robot correctly pinpoints a compilation failure if CONFIG_BLOCK isn't set: fs/io_uring.c: In function '__io_complete_rw': >> fs/io_uring.c:2509:48: error: implicit declaration of function 'io_rw_should_reissue'; did you mean 'io_rw_reissue'? [-Werror=implicit-function-declaration] 2509 \| if ((res == -EAGAIN \|\| res == -EOPNOTSUPP) && io_rw_should_reissue(req)) { \| ^~~~~~~~~~~~~~~~~~~~ \| io_rw_reissue cc1: some warnings being treated as errors Ensure that we have a stub declaration of io_rw_should_reissue() for !CONFIG_BLOCK. Fixes: `230d50d448` ("io_uring: move reissue into regular IO path") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2021-04-02 19:45:34 -06:00

1 2 3 4 5 ...

69698 Commits