linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-14 15:54:15 +08:00

Author	SHA1	Message	Date
Jens Axboe	4f731705cc	io_uring/fdinfo: get rid of unnecessary is_cqe32 variable We already have the cq_shift, just use that to tell if we have doubly sized CQEs or not. While in there, cleanup the CQE32 vs normal CQE size printing. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 13:15:02 -06:00
Pavel Begunkov	c0dc995eb2	io_uring: remove unused return from io_disarm_next We removed conditional io_commit_cqring_flush() guarding against spurious eventfd and the io_disarm_next()'s return value is not used anymore, just void it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9a441c9a32a58bcc586076fa9a7d0dc33f1fb3cb.1662652536.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 13:15:01 -06:00
Pavel Begunkov	7924fdfeea	io_uring: add fast path for io_run_local_work() We'll grab uring_lock and call __io_run_local_work() with several atomics inside even if there are no task works. Skip it if ->work_llist is empty. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/f6a885f372bad2d77d9cd87341b0a86a4000c0ff.1662652536.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 13:15:01 -06:00
Pavel Begunkov	1f8d5bbe98	io_uring/iopoll: unify tw breaking logic Let's keep checks for whether to break the iopoll loop or not same for normal and defer tw, this includes ->cached_cq_tail checks guarding against polling more than asked for. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d2fa8a44f8114f55a4807528da438cde93815360.1662652536.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 13:15:01 -06:00
Pavel Begunkov	9d54bd6a3b	io_uring/iopoll: fix unexpected returns We may propagate a positive return value of io_run_task_work() out of io_iopoll_check(), which breaks our tests. io_run_task_work() doesn't return anything useful for us, ignore the return value. Fixes: `c0e0d6ba25` ("io_uring: add IORING_SETUP_DEFER_TASKRUN") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/c442bb87f79cea10b3f857cbd4b9a4f0a0493fa3.1662652536.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 13:14:59 -06:00
Pavel Begunkov	6567506b68	io_uring: disallow defer-tw run w/ no submitters We try to restrict CQ waiters when IORING_SETUP_DEFER_TASKRUN is set, but if nothing has been submitted yet it'll allow any waiter, which violates the contract. Fixes: `c0e0d6ba25` ("io_uring: add IORING_SETUP_DEFER_TASKRUN") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/b4f0d3f14236d7059d08c5abe2661ef0b78b5528.1662652536.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 13:14:55 -06:00
Pavel Begunkov	76de6749d1	io_uring: further limit non-owner defer-tw cq waiting In case of DEFER_TASK_WORK we try to restrict waiters to only one task, which is also the only submitter; however, we don't do it reliably, which might be very confusing and backfire in the future. E.g. we currently allow multiple tasks in io_iopoll_check(). Fixes: `c0e0d6ba25` ("io_uring: add IORING_SETUP_DEFER_TASKRUN") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/94c83c0a7fe468260ee2ec31bdb0095d6e874ba2.1662652536.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 13:14:46 -06:00
Pavel Begunkov	ac9e5784bb	io_uring/net: use io_sr_msg for sendzc Reuse struct io_sr_msg for zerocopy sends, which is handy. There is only one zerocopy specific field, namely .notif, and we have enough space for it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/408c5b1b2d8869e1a12da5f5a78ed72cac112149.1662639236.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:43 -06:00
Pavel Begunkov	0b048557db	io_uring/net: refactor io_sr_msg types In preparation for using struct io_sr_msg for zerocopy sends, clean up types. First, flags can be u16 as it's provided by the userspace in u16 ioprio, as well as addr_len. This saves us 4 bytes. Also use unsigned for size and done_io, both are as well limited to u32. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/42c2639d6385b8b2181342d2af3a42d3b1c5bcd2.1662639236.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:43 -06:00
Pavel Begunkov	cd9021e88f	io_uring/net: add non-bvec sg chunking callback Add a sg_from_iter() for when we initiate non-bvec zerocopy sends, which helps us to remove some extra steps from io_sg_from_iter(). The only thing the new function has to do before giving control away to __zerocopy_sg_from_iter() is to check if the skb has managed frags and downgrade them if so. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/cda3dea0d36f7931f63a70f350130f085ac3f3dd.1662639236.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:43 -06:00
Pavel Begunkov	6bf8ad25fc	io_uring/net: io_async_msghdr caches for sendzc We already keep io_async_msghdr caches for normal send/recv requests, use them also for zerocopy send. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/42fa615b6e0be25f47a685c35d7b5e4f1b03d348.1662639236.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:43 -06:00
Pavel Begunkov	858c293e5d	io_uring/net: use async caches for async prep send/recv have async_data caches but there are only used from within issue handlers. Extend their use also to ->prep_async, should be handy with links and IOSQE_ASYNC. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/b9a2264b807582a97ed606c5bfcdc2399384e8a5.1662639236.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:43 -06:00
Pavel Begunkov	95eafc74be	io_uring/net: reshuffle error handling We should prioritise send/recv retry cases over failures, they're more important. Shuffle -ERESTARTSYS after we handled retries. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d9059691b30d0963b7269fa4a0c81ee7720555e6.1662639236.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:43 -06:00
Pavel Begunkov	e9a8842854	io_uring: use io_cq_lock consistently There is one place when we forgot to change hand coded spin locking with io_cq_lock(), change it to be more consistent. Note, the unlock part is already __io_cq_unlock_post(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/91699b9a00a07128f7ca66136bdbbfc67a64659e.1662639236.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:43 -06:00
Pavel Begunkov	385c609f9b	io_uring: kill an outdated comment Request referencing has changed a while ago and there is no notion left of submission/completion references, kill an outdated comment. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/38902e7229d68cecd62702436d627d4858b0d9d4.1662639236.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:43 -06:00
Dylan Yudaken	4ab9d46507	io_uring: allow buffer recycling in READV In commit `934447a603` ("io_uring: do not recycle buffer in READV") a temporary fix was put in io_kbuf_recycle to simply never recycle READV buffers. Instead of that, rather treat READV with REQ_F_BUFFER_SELECTED the same as a READ with REQ_F_BUFFER_SELECTED. Since READV requires iov_len of 1 they are essentially the same. In order to do this inside io_prep_rw() add some validation to check that it is in fact only length 1, and also extract the length of the buffer at prep time. This allows removal of the io_iov_buffer_select codepaths as they are only used from the READV op. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220907165152.994979-1-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:43 -06:00
Jens Axboe	de97fcb303	fs: add batch and poll flags to the uring_cmd_iopoll() handler We need the poll_flags to know how to poll for the IO, and we should have the batch structure in preparation for supporting batched completions with iopoll. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:43 -06:00
Jens Axboe	dac6a0eae7	io_uring: ensure iopoll runs local task work as well Combine the two checks we have for task_work running and whether or not we need to shuffle the mutex into one, so we unify how task_work is run in the iopoll loop. This helps ensure that local task_work is run when needed, and also optimizes that path to avoid a mutex shuffle if it's not needed. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:43 -06:00
Jens Axboe	8ac5d85a89	io_uring: add local task_work run helper that is entered locked We have a few spots that drop the mutex just to run local task_work, which immediately tries to grab it again. Add a helper that just passes in whether we're locked already. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:43 -06:00
Jens Axboe	a1119fb071	io_uring: cleanly separate request types for iopoll After the addition of iopoll support for passthrough, there's a bit of a mixup here. Clean it up and get rid of the casting for the passthrough command type. Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:42 -06:00
Kanchan Joshi	5756a3a7e7	io_uring: add iopoll infrastructure for io_uring_cmd Put this up in the same way as iopoll is done for regular read/write IO. Make place for storing a cookie into struct io_uring_cmd on submission. Perform the completion using the ->uring_cmd_iopoll handler. Signed-off-by: Kanchan Joshi <joshi.k@samsung.com> Signed-off-by: Pankaj Raghav <p.raghav@samsung.com> Link: https://lore.kernel.org/r/20220823161443.49436-3-joshi.k@samsung.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:42 -06:00
Dylan Yudaken	f75d5036d0	io_uring: trace local task work run Add tracing for io_run_local_task_work Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220830125013.570060-8-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:42 -06:00
Dylan Yudaken	21a091b970	io_uring: signal registered eventfd to process deferred task work Some workloads rely on a registered eventfd (via io_uring_register_eventfd(3)) in order to wake up and process the io_uring. In the case of a ring setup with IORING_SETUP_DEFER_TASKRUN, that eventfd also needs to be signalled when there are tasks to run. This changes an old behaviour which assumed 1 eventfd signal implied at least 1 CQE, however only when this new flag is set (and so old users will not notice). This should be expected with the IORING_SETUP_DEFER_TASKRUN flag as it is not guaranteed that every task will result in a CQE. Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220830125013.570060-7-dylany@fb.com [axboe: fold in call_rcu() serialization fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:42 -06:00
Dylan Yudaken	d8e9214f11	io_uring: move io_eventfd_put Non functional change: move this function above io_eventfd_signal so it can be used from there Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220830125013.570060-6-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:42 -06:00
Dylan Yudaken	c0e0d6ba25	io_uring: add IORING_SETUP_DEFER_TASKRUN Allow deferring async tasks until the user calls io_uring_enter(2) with the IORING_ENTER_GETEVENTS flag. Enable this mode with a flag at io_uring_setup time. This functionality requires that the later io_uring_enter will be called from the same submission task, and therefore restrict this flag to work only when IORING_SETUP_SINGLE_ISSUER is also set. Being able to hand pick when tasks are run prevents the problem where there is current work to be done, however task work runs anyway. For example, a common workload would obtain a batch of CQEs, and process each one. Interrupting this to additional taskwork would add latency but not gain anything. If instead task work is deferred to just before more CQEs are obtained then no additional latency is added. The way this is implemented is by trying to keep task work local to a io_ring_ctx, rather than to the submission task. This is required, as the application will want to wake up only a single io_ring_ctx at a time to process work, and so the lists of work have to be kept separate. This has some other benefits like not having to check the task continually in handle_tw_list (and potentially unlocking/locking those), and reducing locks in the submit & process completions path. There are networking cases where using this option can reduce request latency by 50%. For example a contrived example using [1] where the client sends 2k data and receives the same data back while doing some system calls (to trigger task work) shows this reduction. The reason ends up being that if sending responses is delayed by processing task work, then the client side sits idle. Whereas reordering the sends first means that the client runs it's workload in parallel with the local task work. [1]: Using https://github.com/DylanZA/netbench/tree/defer_run Client: ./netbench --client_only 1 --control_port 10000 --host <host> --tx "epoll --threads 16 --per_thread 1 --size 2048 --resp 2048 --workload 1000" Server: ./netbench --server_only 1 --control_port 10000 --rx "io_uring --defer_taskrun 0 --workload 100" --rx "io_uring --defer_taskrun 1 --workload 100" Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220830125013.570060-5-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:42 -06:00
Dylan Yudaken	2327337b88	io_uring: do not run task work at the start of io_uring_enter This is not needed, and it is normally better to wait for task work until after submissions. This will allow greater batching if either work arrives in the meanwhile, or if the submissions cause task work to be queued up. For SQPOLL this also no longer runs task work, but this is handled inside the SQPOLL loop anyway. For IOPOLL io_iopoll_check will run task work anyway And otherwise io_cqring_wait will run task work Suggested-by: Pavel Begunkov <asml.silence@gmail.com> Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220830125013.570060-4-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:42 -06:00
Dylan Yudaken	b4c98d59a7	io_uring: introduce io_has_work This will be used later to know if the ring has outstanding work. Right now just if there is overflow CQEs to copy to the main CQE ring, but later will include deferred tasks Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220830125013.570060-3-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:42 -06:00
Dylan Yudaken	32d91f0590	io_uring: remove unnecessary variable 'running' is set once and read once, so can easily just remove it Signed-off-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/20220830125013.570060-2-dylany@fb.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-21 10:30:42 -06:00
Linus Torvalds	38eddeedbb	io_uring-6.0-2022-09-18 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmMnFlcQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgphm1D/0ZXgihejm59WTef8UktYzXT1B0SbN9TT1r CQm/5BVSTWkz5UOmpPxtiL2wT0Lj+D1i4xtKEvPS3L9nwWHgz5dM6AmdIk9jXKUz 09Y8XnZqtjr228mxRxZ33x3YaUaJv3b/AAgdL12rzN/9Crr4V1z+vAFuW1LQpFhN DxXSMi+tQzyNBjD503h/buQ4eOpdkKOW/EpjqePHsz+OqSpjgoy+ddTVS7jhakun 9B6BrDUVEMwyCzT///1Zi+TjkdiZOub26CSn38TXaQAWBkGDRo3B1Jq6D9MH8VK5 MlHWgrkz6OSqoJw79bvLKjWR/WNA8EM4e5Myd1QGsesMa7BRPBCp/V0ooVtHeHtb lrN8CmGFXxt5uKRxzP0F6IxrRxo9hYxTTbH+Qy5K7c9JNNeyl6bxSP4DXtTNzLfy Apl343BiZFqdbFHlR6CCFcx+4YESr9UhSF5h3MFgX5TZQWwqNH/GDBYZtZ/qjg2W YNznGYx/xBphCeC08/LgHTdy+EhGy9WjLBP/KAzVs6rRwpiPLpn/PBAKrNHqskIa T6QmcTmSgfzKJtKg8ZQwkzp8QELwudNfYOyasSeHD0nY855j9zvnfnKdPHhzkx33 Gt4goE94xas968SoQuQVF966L72JeZoAx48gMk+WTyP/3nMbwEDwtYX3cdOCte8z m8s04p1SQg== =02l7 -----END PGP SIGNATURE----- Merge tag 'io_uring-6.0-2022-09-18' of git://git.kernel.dk/linux Pull io_uring fixes from Jens Axboe: "Nothing really major here, but figured it'd be nicer to just get these flushed out for -rc6 so that the 6.1 branch will have them as well. That'll make our lives easier going forward in terms of development, and avoid trivial conflicts in this area. - Simple trace rename so that the returned opcode name is consistent with the enum definition (Stefan) - Send zc rsrc request vs notification lifetime fix (Pavel)" * tag 'io_uring-6.0-2022-09-18' of git://git.kernel.dk/linux: io_uring/opdef: rename SENDZC_NOTIF to SEND_ZC io_uring/net: fix zc fixed buf lifetime	2022-09-18 09:25:27 -07:00
Stefan Metzmacher	9bd3f72822	io_uring/opdef: rename SENDZC_NOTIF to SEND_ZC It's confusing to see the string SENDZC_NOTIF in ftrace output when using IORING_OP_SEND_ZC. Fixes: `b48c312be0` ("io_uring/net: simplify zerocopy send user API") Signed-off-by: Stefan Metzmacher <metze@samba.org> Cc: Pavel Begunkov <asml.silence@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Cc: io-uring@vger.kernel.org Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8e5cd8616919c92b6c3c7b6ea419fdffd5b97f3c.1663363798.git.metze@samba.org Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-18 06:59:13 -06:00
Pavel Begunkov	e3366e0234	io_uring/net: fix zc fixed buf lifetime Notifications usually outlive requests, so we need to pin buffers with it by assigning a rsrc to it instead of the request. Fixed: `b48c312be0` ("io_uring/net: simplify zerocopy send user API") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/dd6406ff8a90887f2b36ed6205dac9fda17c1f35.1663366886.git.asml.silence@gmail.com Reviewed-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-18 05:07:51 -06:00
Linus Torvalds	0158137d81	io_uring-6.0-2022-09-16 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmMkPA4QHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpjojD/9wzCmmIa1KF/oKJ+ee7UG7fMykuuBc2S0k sdPGILmBy8K57hQZ1emyyfJqxMrhsvV0KHVCMsvV4xvE2lMbxydga/3EqrjiU8ad YnZdAeAyFEdAqiM2SL8bQSdiTMVUCN7cddGjZL4kY7CzWPP115phNt0XneFGr1Cd exsTRNjfPABUlGGm2q7G/ii3QCg2nbnl9wn5kbwfsHx7yKedkiO4BJ5Yl8ynL1i3 jzLG/pcSxia9Bp8ZSLF+THFNqgYOJPvEbgygcn8tUt+F9DNE9lsEQ52/rp5TVEQz mcYUXzhaEbgblDaZGeTY0WI2Pa9M9f4AOlIhJ5du6rX9z3u2nIrumE4Y062VmWJP 9Cr47Nf/bAmzvbKrZ2ZRWYaA4dMufqLvUwrFz60BxLMYX4Z6ZWkyg7altzjTbqlf zrODI+fDY77NNhMPFoPknUl3RpKUYSzL6N4Qod4qL3xj3PW02HNZn8GqIMnDNeT8 5jWA1Arvqf420QOOvxiumIHiF9EgGWHoRIzg4UXnNiuRh08rJxgVVSAy0GuDA3L0 n296x55DCGwprJtT91oB7Vbv/qg6Twetcqzw0VLASsxnUKZkcTI4R/MTFdQe4tif qja3A5Okq8tefqSTk90zguFVnEnHH0pWyQQsEMwUPLE4lqrur/viKgVxZQdyQ+ak D+kSGs3aNg== =BSte -----END PGP SIGNATURE----- Merge tag 'io_uring-6.0-2022-09-16' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: "Two small patches: - Fix using an unsigned type for the return value, introduced in this release (Pavel) - Stable fix for a missing check for a fixed file on put (me)" * tag 'io_uring-6.0-2022-09-16' of git://git.kernel.dk/linux-block: io_uring/msg_ring: check file type before putting io_uring/rw: fix error'ed retry return values	2022-09-16 06:50:25 -07:00
Jens Axboe	fc7222c3a9	io_uring/msg_ring: check file type before putting If we're invoked with a fixed file, follow the normal rules of not calling io_fput_file(). Fixed files are permanently registered to the ring, and do not need putting separately. Cc: stable@vger.kernel.org Fixes: `aa184e8671` ("io_uring: don't attempt to IOPOLL for MSG_RING requests") Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-15 11:44:35 -06:00
Pavel Begunkov	62bb0647b1	io_uring/rw: fix error'ed retry return values Kernel test robot reports that we test negativity of an unsigned in io_fixup_rw_res() after a recent change, which masks error codes and messes up the return value in case I/O is re-retried and failed with an error. Fixes: `4d9cb92ca4` ("io_uring/rw: fix short rw error handling") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9754a0970af1861e7865f9014f735c70dc60bf79.1663071587.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-13 07:47:11 -06:00
Linus Torvalds	d2b768c3d4	io_uring-6.0-2022-09-09 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmMbdjoQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpnupD/9r6uN+ZnxYs3lYHZxi5NNFJ+CHzFFfxOVK fTs/JQwJbKey05Rk8F77jt9np5RBoTOsrUXV3Z2B48Mr+8jp3g2tMqb/PbULL+pa kt5RZ1rxmD1IWARbd6nWweJh1CJbqB5Z/wijLnU4ZBmjtN76RLlJRZfEd6FTRkwY DPFOdmmKks2io+QvDABundj80z2Hrk15KH7dN5yt+4DPNofhMUk5XOQD/CM7tnWa KKVvH4MjewrXzdS6OgbBv2Z6L+nJcQniOjAJdKSC3X4DEsRlN8EkF7tPhcVksss9 kmpilwfh0Um5ZyrxN8igzRqZ78l+lqkjG285rN8+MKsJar/X1KFfsZlG3ErfOe9T wEc9KHiPrGfDUJoCt6WzPnYQcTEmzqu8x4ip62UjaJ+c4wL7TIWTmcNRpxzqTto5 2iouoBzC6nh/xrYPFSwRopglfXZZhJhM8X+yYpKUQHJ+uS/JGcsCJSihM3qRzJWy 47IQ61GKpTFfnoJuftU8YjjjgUdYh1fAtxDhx5j6qI52Zflz3UKqmWC44yJXUuZM EHNzMHD2PMoSYDu6kmOcj+jTIlK1dHrA00riGySHDsjLh4MztXyVPc7pKGshneS7 Qrq8AcqYmnlCmxfAJcmNQ3BsT1aCCJqnzfCAzpObUU/7VFMiU0Y3vSJSOKhViPxS e2O/H2VBBw== =8R3K -----END PGP SIGNATURE----- Merge tag 'io_uring-6.0-2022-09-09' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: - Removed function that became unused after last week's merge (Jiapeng) - Two small fixes for kbuf recycling (Pavel) - Include address copy for zc send for POLLFIRST (Pavel) - Fix for short IO handling in the normal read/write path (Pavel) * tag 'io_uring-6.0-2022-09-09' of git://git.kernel.dk/linux-block: io_uring/rw: fix short rw error handling io_uring/net: copy addr for zc on POLL_FIRST io_uring: recycle kbuf recycle on tw requeue io_uring/kbuf: fix not advancing READV kbuf ring io_uring/notif: Remove the unused function io_notif_complete()	2022-09-09 14:57:18 -04:00
Pavel Begunkov	4d9cb92ca4	io_uring/rw: fix short rw error handling We have a couple of problems, first reports of unexpected link breakage for reads when cqe->res indicates that the IO was done in full. The reason here is partial IO with retries. TL;DR; we compare the result in __io_complete_rw_common() against req->cqe.res, but req->cqe.res doesn't store the full length but rather the length left to be done. So, when we pass the full corrected result via kiocb_done() -> __io_complete_rw_common(), it fails. The second problem is that we don't try to correct res in io_complete_rw(), which, for instance, might be a problem for O_DIRECT but when a prefix of data was cached in the page cache. We also definitely don't want to pass a corrected result into io_rw_done(). The fix here is to leave __io_complete_rw_common() alone, always pass not corrected result into it and fix it up as the last step just before actually finishing the I/O. Cc: stable@vger.kernel.org Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://github.com/axboe/liburing/issues/643 Reported-by: Beld Zhang <beldzhang@gmail.com> Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-09 08:57:57 -06:00
Pavel Begunkov	3c8400532d	io_uring/net: copy addr for zc on POLL_FIRST Every time we return from an issue handler and expect the request to be retried we should also setup it for async exec ourselves. Do that when we return on IORING_RECVSEND_POLL_FIRST in io_sendzc(), otherwise it'll re-read the address, which might be a surprise for the userspace. Fixes: `092aeedb75` ("io_uring: allow to pass addr into sendzc") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/ab1d0657890d6721339c56d2e161a4bba06f85d0.1662642013.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-08 08:28:38 -06:00
Pavel Begunkov	336d28a8f3	io_uring: recycle kbuf recycle on tw requeue When we queue a request via tw for execution it's not going to be executed immediately, so when io_queue_async() hits IO_APOLL_READY and queues a tw but doesn't try to recycle/consume the buffer some other request may try to use the the buffer. Fixes: `c7fb19428d` ("io_uring: add support for ring mapped supplied buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/a19bc9e211e3184215a58e129b62f440180e9212.1662480490.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-07 10:36:10 -06:00
Pavel Begunkov	df6d3422d3	io_uring/kbuf: fix not advancing READV kbuf ring When we don't recycle a selected ring buffer we should advance the head of the ring, so don't just skip io_kbuf_recycle() for IORING_OP_READV but adjust the ring. Fixes: `934447a603` ("io_uring: do not recycle buffer in READV") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Reviewed-by: Dylan Yudaken <dylany@fb.com> Link: https://lore.kernel.org/r/a6d85e2611471bcb5d5dcd63a8342077ddc2d73d.1662480490.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-07 10:36:10 -06:00
Jiapeng Chong	4fa07edbb7	io_uring/notif: Remove the unused function io_notif_complete() The function io_notif_complete() is defined in the notif.c file, but not called elsewhere, so delete this unused function. io_uring/notif.c:24:20: warning: unused function 'io_notif_complete' [-Wunused-function]. Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2047 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com> Reviewed-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/20220905020436.51894-1-jiapeng.chong@linux.alibaba.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-05 11:42:39 -06:00
Linus Torvalds	cec53f4c8d	io_uring-6.0-2022-09-02 -----BEGIN PGP SIGNATURE----- iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAmMSIfUQHGF4Ym9lQGtl cm5lbC5kawAKCRD301j7KXHgpmWyD/9vhN1GTy41KS+GoU4ljBCNx3ubigp9FqXc baCFaDEQlsUuYyuaG/DAWghs/dDJc/Qk51cpii+QOrU2yVw6vvOO32P2rbSJYW+l qlqxNm3Mmpr+amMc94RcL26r69av6l+6UA4HTgziWia2LR4QklvthmC+G65HqQRt dbkWRb7KPqlmtFyeTQSzy0mgwwra66zyUUEpcif4Fhu1puV64MD7IGvR9INsS50r Mey2Uo+ANvEKtqhhAFBPL7/LH49IyYJtLoJiH5PdCJzS0UTIUkb5FQoiu2gQTeF2 /2ihT6wqe4T75Zo2Ofm8qK7OQF2+vM2KQWBc7ytiBNat32ypT/MGS3YtRd1p3zqc XTtfrtkzu7eoYGKCZfgZrRwKzfJuV9SjOwZEtfLgV42gTLnNkPZQCGsSPfPnVNeU Lc4GCxmf95blNvlkJK5SCtZ+SQQ+zOFm9sPuiDtmjJ5yMxUd28n52dzxTXqfG8su 75Zgh+tH5rFK2Vi0tn8mWsm8/ZLwEdMoyG1OCi9eXWyfxXfR2iYj7PK9a/T9aTXn cr5JwuGbfW6pnnoTq9CSKFknJSar4FiJMCkQuhtzBMmhl23gpjJifqs4PXWQpX+s e2rL5B0/9DOKXmagTqPYtfBLvvmnNRgZ7wE+N3gLbPbtRPjb6XupOv3nudLA+rgR keJh/vVhqQ== =yXSa -----END PGP SIGNATURE----- Merge tag 'io_uring-6.0-2022-09-02' of git://git.kernel.dk/linux-block Pull io_uring fixes from Jens Axboe: - A single fix for over-eager retries for networking (Pavel) - Revert the notification slot support for zerocopy sends. It turns out that even after more than a year or development and testing, there's not full agreement on whether just using plain ordered notifications is Good Enough to avoid the complexity of using the notifications slots. Because of that, we decided that it's best left to a future final decision. We can always bring back this feature, but we can't really change it or remove it once we've released 6.0 with it enabled. The reverts leave the usual CQE notifications as the primary interface for knowing when data was sent, and when it was acked. (Pavel) * tag 'io_uring-6.0-2022-09-02' of git://git.kernel.dk/linux-block: selftests/net: return back io_uring zc send tests io_uring/net: simplify zerocopy send user API io_uring/notif: remove notif registration Revert "io_uring: rename IORING_OP_FILES_UPDATE" Revert "io_uring: add zc notification flush requests" selftests/net: temporarily disable io_uring zc test io_uring/net: fix overexcessive retries	2022-09-02 16:37:01 -07:00
Pavel Begunkov	b48c312be0	io_uring/net: simplify zerocopy send user API Following user feedback, this patch simplifies zerocopy send API. One of the main complaints is that the current API is difficult with the userspace managing notification slots, and then send retries with error handling make it even worse. Instead of keeping notification slots change it to the per-request notifications model, which posts both completion and notification CQEs for each request when any data has been sent, and only one CQE if it fails. All notification CQEs will have IORING_CQE_F_NOTIF set and IORING_CQE_F_MORE in completion CQEs indicates whether to wait a notification or not. IOSQE_CQE_SKIP_SUCCESS is disallowed with zerocopy sends for now. This is less flexible, but greatly simplifies the user API and also the kernel implementation. We reuse notif helpers in this patch, but in the future there won't be need for keeping two requests. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/95287640ab98fc9417370afb16e310677c63e6ce.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-01 09:13:33 -06:00
Pavel Begunkov	57f332246a	io_uring/notif: remove notif registration We're going to remove the userspace exposed zerocopy notification API, remove notification registration. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/6ff00b97be99869c386958a990593c9c31cf105b.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-01 09:13:33 -06:00
Pavel Begunkov	d9808ceb31	Revert "io_uring: rename IORING_OP_FILES_UPDATE" This reverts commit `4379d5f15b`. We removed notification flushing, also cleanup uapi preparation changes to not pollute it. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/89edc3905350f91e1b6e26d9dbf42ee44fd451a2.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-01 09:13:33 -06:00
Pavel Begunkov	23c12d5fc0	Revert "io_uring: add zc notification flush requests" This reverts commit `492dddb4f6`. Soon we won't have the very notion of notification flushing, so remove notification flushing requests. Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/8850334ca56e65b413cb34fd158db81d7b2865a3.1662027856.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-09-01 09:13:33 -06:00
Linus Torvalds	9c9d1896fa	lsm/stable-6.0 PR 20220829 -----BEGIN PGP SIGNATURE----- iQJIBAABCAAyFiEES0KozwfymdVUl37v6iDy2pc3iXMFAmMNEC8UHHBhdWxAcGF1 bC1tb29yZS5jb20ACgkQ6iDy2pc3iXN6uA//Wvoj5l33ngi5p6CNAfxrZiOeeki7 ylMO9NF4BZY+BOKtWDcrUvpZoLCEEEtLihQ8vz7Iyedtpd34KBzI+H+36JDC9jei dWZiXYzzmaN6JVQ2pIGWr9kTfRPbbE4X91bI2jhDOBv64zCqZu2qDoXshud5WHU1 XhMMtAsQHKrdZa29y6nj6xHYuVA/fkpL5rg5LDrFDYwS7fV+g02ATmRnEsGefRNu JbjrapAnl6lWO6peRuyLNzf6NNgLLsXAmYOdyJGERKx23TSwqVMGhK6eODYBttiH E9OfFDz3oqbLfVrL6uBlr30T1lnns+WyRWdRvAP36L9wbQ/0o24mGsf5E20wo1T9 rwPNsFelI66Eu2S1v/DQWtGtzeaed5IrWMtQc93x4I1PQIxwMSP4znWEKg/2zDNQ tBVVjs6bIzWHbeYozmKK9xvtqL08F5H6t+cS7BDVWPfb8nAfiXvyrwgCRY36xHfO LJWb125lbDflkPRiIgf81IAE6SZLH/PFLowNXZUSAo0CTALhlGZXmhNr6Oz7Xr2A NIwKvuFNqGav0Rcsk+Qy0ir6jRKOj9854U4y3kAVOAhPSyBVZAoN1Y3wtiOpmdI0 taLNKv9W46ZxQtqQNOm31/py3N4bZl0y2JvS4lvwbDMqCjCqVE7236GjQ0vtYQQi 8thpb268VJTby8Y= =/7Pp -----END PGP SIGNATURE----- Merge tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm Pull LSM support for IORING_OP_URING_CMD from Paul Moore: "Add SELinux and Smack controls to the io_uring IORING_OP_URING_CMD. These are necessary as without them the IORING_OP_URING_CMD remains outside the purview of the LSMs (Luis' LSM patch, Casey's Smack patch, and my SELinux patch). They have been discussed at length with the io_uring folks, and Jens has given his thumbs-up on the relevant patches (see the commit descriptions). There is one patch that is not strictly necessary, but it makes testing much easier and is very trivial: the /dev/null IORING_OP_URING_CMD patch." * tag 'lsm-pr-20220829' of git://git.kernel.org/pub/scm/linux/kernel/git/pcmoore/lsm: Smack: Provide read control for io_uring_cmd /dev/null: add IORING_OP_URING_CMD support selinux: implement the security_uring_cmd() LSM hook lsm,io_uring: add LSM hooks for the new uring_cmd file op	2022-08-31 09:23:16 -07:00
Pavel Begunkov	dfb58b1796	io_uring/net: fix overexcessive retries Length parameter of io_sg_from_iter() can be smaller than the iterator's size, as it's with TCP, so when we set from->count at the end of the function we truncate the iterator forcing TCP to return preliminary with a short send. It affects zerocopy sends with large payload sizes and leads to retries and possible request failures. Fixes: `3ff1a0d395` ("io_uring: enable managed frags with register buffers") Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/0bc0d5179c665b4ef5c328377c84c7a1f298467e.1661530037.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-26 10:31:42 -06:00
Luis Chamberlain	2a58401240	lsm,io_uring: add LSM hooks for the new uring_cmd file op io-uring cmd support was added through `ee692a21e9` ("fs,io_uring: add infrastructure for uring-cmd"), this extended the struct file_operations to allow a new command which each subsystem can use to enable command passthrough. Add an LSM specific for the command passthrough which enables LSMs to inspect the command details. This was discussed long ago without no clear pointer for something conclusive, so this enables LSMs to at least reject this new file operation. [0] https://lkml.kernel.org/r/8adf55db-7bab-f59d-d612-ed906b948d19@schaufler-ca.com Cc: stable@vger.kernel.org Fixes: `ee692a21e9` ("fs,io_uring: add infrastructure for uring-cmd") Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> Acked-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Paul Moore <paul@paul-moore.com>	2022-08-26 11:19:43 -04:00
Pavel Begunkov	581711c466	io_uring/net: save address for sendzc async execution We usually copy all bits that a request needs from the userspace for async execution, so the userspace can keep them on the stack. However, send zerocopy violates this pattern for addresses and may reloads it e.g. from io-wq. Save the address if any in ->async_data as usual. Reported-by: Stefan Metzmacher <metze@samba.org> Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/d7512d7aa9abcd36e9afe1a4d292a24cb2d157e5.1661342812.git.asml.silence@gmail.com [axboe: fold in incremental fix] Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-25 07:52:30 -06:00
Pavel Begunkov	5916943943	io_uring: conditional ->async_data allocation There are opcodes that need ->async_data only in some cases and allocation it unconditionally may hurt performance. Add an option to opdef to make move the allocation part from the core io_uring to opcode specific code. Note, we can't just set opdef->async_size to zero because there are other helpers that rely on it, e.g. io_alloc_async_data(). Signed-off-by: Pavel Begunkov <asml.silence@gmail.com> Link: https://lore.kernel.org/r/9dc62be9e88dd0ed63c48365340e8922d2498293.1661342812.git.asml.silence@gmail.com Signed-off-by: Jens Axboe <axboe@kernel.dk>	2022-08-24 08:57:28 -06:00

1 2 3 4 5 ...

264 Commits