Commit Graph

1089717 Commits

Author SHA1 Message Date
Jens Axboe
9f010507bb io_uring: set task_work notify method at init time
While doing so, switch SQPOLL to TWA_SIGNAL_NO_IPI as well, as that
just does a task wakeup and then we can remove the special wakeup we
have in task_work_add.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-5-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-30 08:39:54 -06:00
Jens Axboe
6cf5862e3c io-wq: use __set_notify_signal() to wake workers
The only difference between set_notify_signal() and __set_notify_signal()
is that the former checks if it needs to deliver an IPI to force a
reschedule. As the io-wq workers never leave the kernel, and IPI is never
needed, they simply need a wakeup.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-4-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-30 08:39:54 -06:00
Jens Axboe
3a4b89a25c io_uring: serialize ctx->rings->sq_flags with atomic_or/and
Rather than require ctx->completion_lock for ensuring that we don't
clobber the flags, use the atomic bitop helpers instead. This removes
the need to grab the completion_lock, in preparation for needing to set
or clear sq_flags when we don't know the status of this lock.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/20220426014904.60384-3-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-30 08:39:54 -06:00
Jens Axboe
e788be95a5 task_work: allow TWA_SIGNAL without a rescheduling IPI
Some use cases don't always need an IPI when sending a TWA_SIGNAL
notification. Add TWA_SIGNAL_NO_IPI, which is just like TWA_SIGNAL, except
it doesn't send an IPI to the target task. It merely sets
TIF_NOTIFY_SIGNAL and wakes up the task.

This can be useful in avoiding a forceful transition to the kernel if the
task is running in userspace. Depending on the task_work in question, it
may be quite fine waiting for the next reschedule or kernel enter anyway,
or the use case may even have other mechanisms for hinting to the task
that a transition may be useful. This can drive more cooperative
scheduling of task_work.

Reviewed-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/821f42b6-7d91-8074-8212-d34998097de4@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-30 08:39:32 -06:00
Jens Axboe
69cc1b6fa5 io_uring: fix compile warning for 32-bit builds
If IO_URING_SCM_ALL isn't set, as it would not be on 32-bit builds,
then we trigger a warning:

fs/io_uring.c: In function '__io_sqe_files_unregister':
fs/io_uring.c:8992:13: warning: unused variable 'i' [-Wunused-variable]
 8992 |         int i;
      |             ^

Move the ifdef up to include the 'i' variable declaration.

Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Fixes: 5e45690a1c ("io_uring: store SCM state in io_fixed_file->file_ptr")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-25 16:43:45 -06:00
Dylan Yudaken
155bc9505d io_uring: return an error when cqe is dropped
Right now io_uring will not actively inform userspace if a CQE is
dropped. This is extremely rare, requiring a CQ ring overflow, as well as
a GFP_ATOMIC kmalloc failure. However the consequences could cause for
example applications to go into an undefined state, possibly waiting for a
CQE that never arrives.

Return an error code (EBADR) in these cases. Since this is expected to be
incredibly rare, try and avoid as much as possible affecting the hot code
paths, and so it only is returned lazily and when there is no other
available CQEs.

Once the error is returned, reset the error condition assuming the user is
either ok with it or will clean up appropriately.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-6-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:18 -06:00
Dylan Yudaken
10988a0a67 io_uring: use constants for cq_overflow bitfield
Prepare to use this bitfield for more flags by using constants instead of
magic value 0

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-5-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:18 -06:00
Dylan Yudaken
3e813c9026 io_uring: rework io_uring_enter to simplify return value
io_uring_enter returns the count submitted preferrably over an error
code. In some code paths this check is not required, so reorganise the
code so that the check is only done as needed.
This is also a prep for returning error codes only in waiting scenarios.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-4-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:18 -06:00
Dylan Yudaken
08dcd0288f io_uring: trace cqe overflows
Trace cqe overflows in io_uring. Print ocqe before the check, so if it is
NULL it indicates that it has been dropped.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-3-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:18 -06:00
Dylan Yudaken
47894438e9 io_uring: add trace support for CQE overflow
Add trace function for overflowing CQ ring.

Signed-off-by: Dylan Yudaken <dylany@fb.com>
Link: https://lore.kernel.org/r/20220421091345.2115755-2-dylany@fb.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:18 -06:00
Jens Axboe
10c873334f io_uring: allow re-poll if we made progress
We currently check REQ_F_POLLED before arming async poll for a
notification to retry. If it's set, then we don't allow poll and will
punt to io-wq instead. This is done to prevent a situation where a buggy
driver will repeatedly return that there's space/data available yet we
get -EAGAIN.

However, if we already transferred data, then it should be safe to rely
on poll again. Gate the check on whether or not REQ_F_PARTIAL_IO is
also set.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:18 -06:00
Jens Axboe
4c3c09439c io_uring: support MSG_WAITALL for IORING_OP_SEND(MSG)
Like commit 7ba89d2af1 for recv/recvmsg, support MSG_WAITALL for the
send side. If this flag is set and we do a short send, retry for a
stream of seqpacket socket.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:18 -06:00
Jens Axboe
970f256edb io_uring: add support for IORING_ASYNC_CANCEL_ANY
Rather than match on a specific key, be it user_data or file, allow
canceling any request that we can lookup. Works like
IORING_ASYNC_CANCEL_ALL in that it cancels multiple requests, but it
doesn't key off user_data or the file.

Can't be set with IORING_ASYNC_CANCEL_FD, as that's a key selector.
Only one may be used at the time.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-6-axboe@kernel.dk
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:18 -06:00
Jens Axboe
4bf94615b8 io_uring: allow IORING_OP_ASYNC_CANCEL with 'fd' key
Currently sqe->addr must contain the user_data of the request being
canceled. Introduce the IORING_ASYNC_CANCEL_FD flag, which tells the
kernel that we're keying off the file fd instead for cancelation. This
allows canceling any request that a) uses a file, and b) was assigned the
file based on the value being passed in.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-5-axboe@kernel.dk
2022-04-24 18:18:18 -06:00
Jens Axboe
8e29da69fe io_uring: add support for IORING_ASYNC_CANCEL_ALL
The current cancelation will lookup and cancel the first request it
finds based on the key passed in. Add a flag that allows to cancel any
request that matches they key. It completes with the number of requests
found and canceled, or res < 0 if an error occured.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-4-axboe@kernel.dk
2022-04-24 18:18:18 -06:00
Jens Axboe
b21432b4d5 io_uring: pass in struct io_cancel_data consistently
In preparation for being able to not only key cancel off the user_data,
pass in the io_cancel_data struct for the various functions that deal
with request cancelation.

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-3-axboe@kernel.dk
2022-04-24 18:18:18 -06:00
Jens Axboe
98d3dcc8be io_uring: remove dead 'poll_only' argument to io_poll_cancel()
It's only called from one location, and it always passes in 'false'.
Kill the argument, and just pass in 'false' to io_poll_find().

Signed-off-by: Jens Axboe <axboe@kernel.dk>
Link: https://lore.kernel.org/r/20220418164402.75259-2-axboe@kernel.dk
2022-04-24 18:18:18 -06:00
Pavel Begunkov
81ec803b4e io_uring: refactor io_disarm_next() locking
Split timeout handling into removal + failing, so we can reduce
spinlocking time and remove another instance of triple nested locking.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/0f00d115f9d4c5749028f19623708ad3695512d6.1650458197.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:17 -06:00
Pavel Begunkov
3645c2000a io_uring: move timeout locking in io_timeout_cancel()
Move ->timeout_lock grabbing inside of io_timeout_cancel(), so
we can do io_req_task_queue_fail() outside of the lock. It's much nicer
than relying on triple nested locking.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cde758c2897930d31e205ed8f476d4ec879a8849.1650458197.git.asml.silence@gmail.com
[axboe: drop now wrong timeout_lock annotation]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:17 -06:00
Jens Axboe
5e45690a1c io_uring: store SCM state in io_fixed_file->file_ptr
A previous commit removed SCM accounting for non-unix sockets, as those
are the only ones that can cause a fixed file reference. While that is
true, it also means we're now dereferencing the file as part of the
workqueue driven __io_sqe_files_unregister() after the process has
exited. This isn't safe for SCM files, as unix gc may have already
reaped them when the process exited. KASAN complains about this:

[   12.307040] Freed by task 0:
[   12.307592]  kasan_save_stack+0x28/0x4c
[   12.308318]  kasan_set_track+0x28/0x38
[   12.309049]  kasan_set_free_info+0x24/0x44
[   12.309890]  ____kasan_slab_free+0x108/0x11c
[   12.310739]  __kasan_slab_free+0x14/0x1c
[   12.311482]  slab_free_freelist_hook+0xd4/0x164
[   12.312382]  kmem_cache_free+0x100/0x1dc
[   12.313178]  file_free_rcu+0x58/0x74
[   12.313864]  rcu_core+0x59c/0x7c0
[   12.314675]  rcu_core_si+0xc/0x14
[   12.315496]  _stext+0x30c/0x414
[   12.316287]
[   12.316687] Last potentially related work creation:
[   12.317885]  kasan_save_stack+0x28/0x4c
[   12.318845]  __kasan_record_aux_stack+0x9c/0xb0
[   12.319976]  kasan_record_aux_stack_noalloc+0x10/0x18
[   12.321268]  call_rcu+0x50/0x35c
[   12.322082]  __fput+0x2fc/0x324
[   12.322873]  ____fput+0xc/0x14
[   12.323644]  task_work_run+0xac/0x10c
[   12.324561]  do_notify_resume+0x37c/0xe74
[   12.325420]  el0_svc+0x5c/0x68
[   12.326050]  el0t_64_sync_handler+0xb0/0x12c
[   12.326918]  el0t_64_sync+0x164/0x168
[   12.327657]
[   12.327976] Second to last potentially related work creation:
[   12.329134]  kasan_save_stack+0x28/0x4c
[   12.329864]  __kasan_record_aux_stack+0x9c/0xb0
[   12.330735]  kasan_record_aux_stack+0x10/0x18
[   12.331576]  task_work_add+0x34/0xf0
[   12.332284]  fput_many+0x11c/0x134
[   12.332960]  fput+0x10/0x94
[   12.333524]  __scm_destroy+0x80/0x84
[   12.334213]  unix_destruct_scm+0xc4/0x144
[   12.334948]  skb_release_head_state+0x5c/0x6c
[   12.335696]  skb_release_all+0x14/0x38
[   12.336339]  __kfree_skb+0x14/0x28
[   12.336928]  kfree_skb_reason+0xf4/0x108
[   12.337604]  unix_gc+0x1e8/0x42c
[   12.338154]  unix_release_sock+0x25c/0x2dc
[   12.338895]  unix_release+0x58/0x78
[   12.339531]  __sock_release+0x68/0xec
[   12.340170]  sock_close+0x14/0x20
[   12.340729]  __fput+0x18c/0x324
[   12.341254]  ____fput+0xc/0x14
[   12.341763]  task_work_run+0xac/0x10c
[   12.342367]  do_notify_resume+0x37c/0xe74
[   12.343086]  el0_svc+0x5c/0x68
[   12.343510]  el0t_64_sync_handler+0xb0/0x12c
[   12.344086]  el0t_64_sync+0x164/0x168

We have an extra bit we can use in file_ptr on 64-bit, use that to store
whether this file is SCM'ed or not, avoiding the need to look at the
file contents itself. This does mean that 32-bit will be stuck with SCM
for all registered files, just like 64-bit did before the referenced
commit.

Fixes: 1f59bc0f18 ("io_uring: don't scm-account for non af_unix sockets")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:18:06 -06:00
Pavel Begunkov
7ac1edc4a9 io_uring: kill ctx arg from io_req_put_rsrc
The ctx argument of io_req_put_rsrc() is not used, kill it.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/bb51bf3ff02775b03e6ea21bc79c25d7870d1644.1650311386.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:52 -06:00
Pavel Begunkov
25a15d3c66 io_uring: add a helper for putting rsrc nodes
Add a simple helper to encapsulating dropping rsrc nodes references,
it's cleaner and will help if we'd change rsrc refcounting or play with
percpu_ref_put() [no]inlining.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/63fdd953ac75898734cd50e8f69e95e6664f46fe.1650311386.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:52 -06:00
Pavel Begunkov
c1bdf8ed1e io_uring: store rsrc node in req instead of refs
req->fixed_rsrc_refs keeps a pointer to rsrc node pcpu references, but
it's more natural just to store rsrc node directly. There were some
reasons for that in the past but not anymore.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/cee1c86ec9023f3e4f6ce8940d58c017ef8782f4.1650311386.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:52 -06:00
Pavel Begunkov
772f5e002b io_uring: refactor io_assign_file error path
All io_assign_file() callers do error handling themselves,
req_set_fail() in the io_assign_file()'s fail path needlessly bloats the
kernel and is not the best abstraction to have. Simplify the error path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/eff77fb1eac2b6a90cca5223813e6a396ffedec0.1650311386.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:52 -06:00
Pavel Begunkov
93f052cb39 io_uring: use right helpers for file assign locking
We have io_ring_submit_[un]lock() functions helping us with conditional
->uring_lock locking, use them in io_file_get_fixed() instead of hand
coding.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c9c9ff1e046f6eb68da0a251962a697f8a2275fa.1650311386.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:52 -06:00
Pavel Begunkov
a6d97a8a77 io_uring: add data_race annotations
We have several racy reads, mark them with data_race() to demonstrate
this fact.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7e56e750d294c70b2a56938bd733386f19f0eb53.1650056133.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:52 -06:00
Pavel Begunkov
17b147f6c1 io_uring: inline io_req_complete_fail_submit()
Inline io_req_complete_fail_submit(), there is only one caller and the
name doesn't tell us much.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fe5851af01dcd39fc84b71b8539c7cbe4658fb6d.1650056133.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:52 -06:00
Pavel Begunkov
924a07e482 io_uring: refactor io_submit_sqe()
Remove one extra if for non-linked path of io_submit_sqe().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/03183199d1bf494b4a72eca16d792c8a5945acb4.1650056133.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:52 -06:00
Pavel Begunkov
df3becde8d io_uring: refactor lazy link fail
Remove the lazy link fail logic from io_submit_sqe() and hide it into a
helper. It simplifies the code and will be needed in next patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/6a68aca9cf4492132da1d7c8a09068b74aba3c65.1650056133.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:49 -06:00
Pavel Begunkov
da1a08c5b2 io_uring: introduce IO_REQ_LINK_FLAGS
Add a macro for all link request flags to avoid duplication.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/df38b883e31e7e0ca4e364d25a0743862961b180.1650056133.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:49 -06:00
Pavel Begunkov
7bfa9badc7 io_uring: refactor io_queue_sqe()
io_queue_sqe() is a part of the submission path and we try hard to keep
it inlined, so shed some extra bytes from it by moving the error
checking part into io_queue_sqe_arm_apoll() and renaming it accordingly.

note: io_queue_sqe_arm_apoll() is not inlined, thus the patch doesn't
change the number of function calls for the apoll path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/9b79edd246336decfaca79b949a15ac69123490d.1650056133.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:49 -06:00
Pavel Begunkov
77955efbc4 io_uring: rename io_queue_async_work()
Rename io_queue_async_work(). The name is pretty old but now doesn't
reflect well what the function is doing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/5d4b25c54cccf084f9f2fd63bd4e4fa4515e998e.1650056133.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:49 -06:00
Pavel Begunkov
cbc2e20388 io_uring: inline io_queue_sqe()
Inline io_queue_sqe() as there is only one caller left, and rename
__io_queue_sqe().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d5742683b7a7caceb1c054e91e5b9135b0f3b858.1650056133.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:49 -06:00
Pavel Begunkov
cb2d344c75 io_uring: helper for prep+queuing linked timeouts
We try to aggresively inline the submission path, so it's a good idea to
not pollute it with colder code. One of them is linked timeout
preparation + queue, which can be extracted into a function.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ecf74df7ac77389b6d9211211ec4954e91de98ba.1650056133.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:49 -06:00
Pavel Begunkov
f5c6cf2a31 io_uring: inline io_free_req()
Inline io_free_req() into its only user and remove an underscore prefix
from __io_free_req().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ed114edef5c256a644f4839bb372df70d8df8e3f.1650056133.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:49 -06:00
Pavel Begunkov
4e118cd9e9 io_uring: kill io_put_req_deferred()
We have several spots where a call to io_fill_cqe_req() is immediately
followed by io_put_req_deferred(). Replace them with
__io_req_complete_post() and get rid of io_put_req_deferred() and
io_fill_cqe_req().

> size ./fs/io_uring.o
   text    data     bss     dec     hex filename
  86942   13734       8  100684   1894c ./fs/io_uring.o
> size ./fs/io_uring.o
   text    data     bss     dec     hex filename
  86438   13654       8  100100   18704 ./fs/io_uring.o

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/10672a538774ac8986bee6468d960527af59169d.1650056133.git.asml.silence@gmail.com
[axboe: fold in followup fix]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:49 -06:00
Pavel Begunkov
971cf9c19e io_uring: minor refactoring for some tw handlers
Get rid of some useless local variables

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/7798327b684b7015f7e4300420142ddfcd317297.1650056133.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:49 -06:00
Pavel Begunkov
f22190570b io_uring: clean poll tw PF_EXITING handling
When we meet PF_EXITING in io_poll_check_events(), don't overcomplicate
the code with io_poll_mark_cancelled() but just return -ECANCELED and
the callers will deal with the rest.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f0cc981af82a5b193658f8f44397eeb3bf838b7b.1650056133.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:49 -06:00
Pavel Begunkov
d8da428b7a io_uring: optimise io_get_cqe()
io_get_cqe() is expensive because of a bunch of loads, masking, etc.
However, most of the time we should have enough of entries in the CQ,
so we can cache two pointers representing a range of contiguous CQE
memory we can use. When the range is exhausted we'll go through a slower
path to set up a new range. When there are no CQEs avaliable, pointers
will naturally point to the same address.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/487eeef00f3146537b3d9c1a9cef2fc0b9a86f81.1649771823.git.asml.silence@gmail.com
[axboe: santinel -> sentinel]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:46 -06:00
Pavel Begunkov
1cd15904b6 io_uring: optimise submission left counting
Considering all inlining io_submit_sqe() is huge and usually ends up
calling some other functions.

We decrement @left in io_submit_sqes() just before calling
io_submit_sqe() and use it later after the call. Considering how huge
io_submit_sqe() is, there is not much hope @left will be treated
gracefully by compilers.

Decrement it after the call, not only it's easier on register spilling
and probably saves stack write/read, but also at least for x64 uses
CPU flags set by the dec instead of doing (read/write and tests).

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/807f9a276b54ee8ff4e42e2b78721484f1c71743.1649771823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:46 -06:00
Pavel Begunkov
8e6971a819 io_uring: optimise submission loop invariant
Instead of keeping @submitted in io_submit_sqes(), which for each
iteration requires comparison with the initial number of SQEs, store the
number of SQEs left to submit. We'll need nr only for when we're done
with SQE handling.

note: if we can't allocate a req for the first SQE we always has been
returning -EAGAIN to the userspace, save this behaviour by looking into
the cache in a slow path.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/c3b3df9aeae4c2f7a53fd8386385742e4e261e77.1649771823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:46 -06:00
Pavel Begunkov
fa05457a60 io_uring: add helper to return req to cache list
Don't hand code wq_stack_add_head() to ->free_list, which serves for
recycling io_kiocb, add a helper doing it for us.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/f206f575486a8dd3d52f074ab37ed146b2d215b7.1649771823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:46 -06:00
Pavel Begunkov
88ab95be7e io_uring: helper for empty req cache checks
Add io_req_cache_empty(), which checks if there are requests in the
inline req cache or not. It'll be needed in the future, but also nicely
cleans up a few spots poking into ->free_list directly.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/b18662389f3fb483d0bd07906647f65f6037475a.1649771823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:46 -06:00
Pavel Begunkov
23a5c43b2f io_uring: inline io_flush_cached_reqs
io_flush_cached_reqs() isn't descriptive and has only one caller, inline
it into __io_alloc_req_refill().

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ec38abe65a883d9fe6b169793119ce86806655a4.1649771823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:46 -06:00
Pavel Begunkov
e126391c09 io_uring: shrink final link flush
All good users should not set IOSQE_IO_*LINK flags for the last request
of a link. io_uring flushes collected links at the end of submission,
but it's not the optimal way and so we don't care too much about it.
Replace io_queue_sqe() call with io_queue_sqe_fallback() as the former
one is inlined and will generate a bunch of extra code. This will also
help compilers with the submission path inlining.

> size ./fs/io_uring.o
   text    data     bss     dec     hex filename
  87265   13734       8  101007   18a8f ./fs/io_uring.o
> size ./fs/io_uring.o
   text    data     bss     dec     hex filename
  87073   13734       8  100815   189cf ./fs/io_uring.o

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/01fb5e417ef49925d544a0b0bae30409845ed2b4.1649771823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:46 -06:00
Pavel Begunkov
90e7c35fb8 io_uring: memcpy CQE from req
We can do CQE filling a bit more efficiently when req->cqe is fully
filled by memcpy()'ing it to the userspace instead of doing it field by
field. It's easier on register spilling, removes a couple of extra
loads/stores and write combines two u32 memory writes.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/ee3f514ff28b1fe3347a8eca93a9d91647f2eaad.1649771823.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 18:02:46 -06:00
Pavel Begunkov
cef216fc32 io_uring: explicitly keep a CQE in io_kiocb
We already have req->{result,user_data,cflags}, which mimic struct
io_uring_cqe and are intended to store CQE data. Combine them into a
struct io_uring_cqe field.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/e1efe65d5005cd6a9ec3440767eb15a9fa9351cf.1649771823.git.asml.silence@gmail.com
[axboe: add mirror cqe to cater to fd union]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 17:37:06 -06:00
Pavel Begunkov
8b3171bdf5 io_uring: rename io_sqe_file_register
Rename io_sqe_file_register(), so the name better reflects what the
function is doing.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/d5091518883786969e244d2f0854a47bbdaa5061.1649334991.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 17:34:32 -06:00
Pavel Begunkov
73b25d3bad io_uring: deduplicate SCM accounting
Merge io_sqe_file_register() and io_sqe_file_register(). The only
real difference left between them is from where we get an skb.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/dddda3039c71fcbec24b3465cbe8c7e7ae7bb0e8.1649334991.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 17:34:19 -06:00
Pavel Begunkov
e390510af0 io_uring: don't pass around fixed index for scm
There is an old API nuisance where io_uring's SCM accounting functions
traverse fixed file tables and so requires them to be set in advance,
which leads to some implicit rules of how io_sqe_file_register() should
be used.

__io_sqe_files_scm() now works with only one file at a time, pass a file
directly and get rid of all fixed table dereferencing inside. Clean
io_sqe_file_register() callers.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Link: https://lore.kernel.org/r/fb32031d892e61a7748c70da7999725d5e798671.1649334991.git.asml.silence@gmail.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
2022-04-24 17:34:19 -06:00