-----BEGIN PGP SIGNATURE-----
iQJEBAABCAAuFiEEwPw5LcreJtl1+l5K99NY+ylx4KYFAl7VP+kQHGF4Ym9lQGtl
cm5lbC5kawAKCRD301j7KXHgpuK4D/0XsSG/Yirbba1rrbqw/qpw9xcAs9oyN0tS
8SmmGN27ghrkVSsGBXNcG+PSTu3pkkLjYZ6TQtKamrya9G+lRAsKRsQ+Yq+7Qv4e
N6lCUlLJ99KqTMtwvIoxSpA1tz3ENHucOw2cJrw3kd9G0kil7GvDkIOBasd+kmwn
ak+mnMJZzRhqSM7M5lKQOk8l92gKBHGbPy4xKb0st3dQkYptDvit0KcNSAuevtOp
sRZpdbXaT3FA6xa5iEgggI6vZQGVmK1EaGoQqZ8vgVo75aovkjZyQWWiFVVOlEqr
QjUCCQuixcbMRbZjgpojqva5nmLhFVhLCfoSH2XgttEQZhmTwypdRwM2/IlxV5q2
xCofrDkhYOfIgHkuP6p68ukIPIfQ+4jotvsmXZ/HeD/xbx3TRyJRZadISr6wiuLm
7zRXWaGCYomUIPJOOrpBQ9FsCglkaN63oB6VGuGKTg3g7kE2QrZ2/aGuexP+FAdh
OrA8BlzxZzpqMKhjQVKOl9r6FU928MZn8nIAkMdQ/Ia1mOpb4rrPo4qCdf+tbhPO
pmKtQPQjbszQ3UfTgShvfvDk43BeRim1DxZPFTauSu1FMpqWBCwQgXMynPFrf5TR
HXF61G+jw5swDW6uJgW7bXdm7hHr15vRqQr54MgGS+T0OOa1df9MR0dJB5CGklfI
ycLU6AAT+A==
=A/qA
-----END PGP SIGNATURE-----
Merge tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block
Pull io_uring updates from Jens Axboe:
"A relatively quiet round, mostly just fixes and code improvements. In
particular:
- Make statx just use the generic statx handler, instead of open
coding it. We don't need that anymore, as we always call it async
safe (Bijan)
- Enable closing of the ring itself. Also fixes O_PATH closure (me)
- Properly name completion members (me)
- Batch reap of dead file registrations (me)
- Allow IORING_OP_POLL with double waitqueues (me)
- Add tee(2) support (Pavel)
- Remove double off read (Pavel)
- Fix overflow cancellations (Pavel)
- Improve CQ timeouts (Pavel)
- Async defer drain fixes (Pavel)
- Add support for enabling/disabling notifications on a registered
eventfd (Stefano)
- Remove dead state parameter (Xiaoguang)
- Disable SQPOLL submit on dying ctx (Xiaoguang)
- Various code cleanups"
* tag 'for-5.8/io_uring-2020-06-01' of git://git.kernel.dk/linux-block: (29 commits)
io_uring: fix overflowed reqs cancellation
io_uring: off timeouts based only on completions
io_uring: move timeouts flushing to a helper
statx: hide interfaces no longer used by io_uring
io_uring: call statx directly
statx: allow system call to be invoked from io_uring
io_uring: add io_statx structure
io_uring: get rid of manual punting in io_close
io_uring: separate DRAIN flushing into a cold path
io_uring: don't re-read sqe->off in timeout_prep()
io_uring: simplify io_timeout locking
io_uring: fix flush req->refs underflow
io_uring: don't submit sqes when ctx->refs is dying
io_uring: async task poll trigger cleanup
io_uring: add tee(2) support
splice: export do_tee()
io_uring: don't repeat valid flag list
io_uring: rename io_file_put()
io_uring: remove req->needs_fixed_files
io_uring: cleanup io_poll_remove_one() logic
...
Overflowed requests in io_uring_cancel_files() should be shed only of
inflight and overflowed refs. All other left references are owned by
someone else.
If refcount_sub_and_test() fails, it will go further and put put extra
ref, don't do that. Also, don't need to do io_wq_cancel_work()
for overflowed reqs, they will be let go shortly anyway.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Offset timeouts wait not for sqe->off non-timeout CQEs, but rather
sqe->off + number of prior inflight requests. Wait exactly for
sqe->off non-timeout completions
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Separate flushing offset timeouts io_commit_cqring() by moving it into a
helper. Just a preparation, makes following patches clearer.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Calling statx directly both simplifies the interface and avoids potential
incompatibilities between sync and async invokations.
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Separate statx data from open in io_kiocb. No functional changes.
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_close() was punting async manually to skip grabbing files. Use
REQ_F_NO_FILE_TABLE instead, and pass it through the generic path
with -EAGAIN.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_commit_cqring() assembly doesn't look good with extra code handling
drained requests. IOSQE_IO_DRAIN is slow and discouraged to be used in
a hot path, so try to minimise its impact by putting it into a helper
and doing a fast check.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
SQEs are user writable, don't read sqe->off twice in io_timeout_prep()
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Move spin_lock_irq() earlier to have only 1 call site of it in
io_timeout(). It makes the flow easier.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In io_uring_cancel_files(), after refcount_sub_and_test() leaves 0
req->refs, it calls io_put_req(), which would also put a ref. Call
io_free_req() instead.
Cc: stable@vger.kernel.org
Fixes: 2ca10259b4 ("io_uring: prune request from overflow list on flush")
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When IORING_SETUP_SQPOLL is enabled, io_ring_ctx_wait_and_kill() will wait
for sq thread to idle by busy loop:
while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait))
cond_resched();
Above loop isn't very CPU friendly, it may introduce a short cpu burst on
the current cpu.
If ctx->refs is dying, we forbid sq_thread from submitting any further
SQEs. Instead they just get discarded when we exit.
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
In io_sq_thread(), currently if we get an -EBUSY error and go to sleep,
we will won't clear it again, which will result in io_sq_thread() will
never have a chance to submit sqes again. Below test program test.c
can reveal this bug:
int main(int argc, char *argv[])
{
struct io_uring ring;
int i, fd, ret;
struct io_uring_sqe *sqe;
struct io_uring_cqe *cqe;
struct iovec *iovecs;
void *buf;
struct io_uring_params p;
if (argc < 2) {
printf("%s: file\n", argv[0]);
return 1;
}
memset(&p, 0, sizeof(p));
p.flags = IORING_SETUP_SQPOLL;
ret = io_uring_queue_init_params(4, &ring, &p);
if (ret < 0) {
fprintf(stderr, "queue_init: %s\n", strerror(-ret));
return 1;
}
fd = open(argv[1], O_RDONLY | O_DIRECT);
if (fd < 0) {
perror("open");
return 1;
}
iovecs = calloc(10, sizeof(struct iovec));
for (i = 0; i < 10; i++) {
if (posix_memalign(&buf, 4096, 4096))
return 1;
iovecs[i].iov_base = buf;
iovecs[i].iov_len = 4096;
}
ret = io_uring_register_files(&ring, &fd, 1);
if (ret < 0) {
fprintf(stderr, "%s: register %d\n", __FUNCTION__, ret);
return ret;
}
for (i = 0; i < 10; i++) {
sqe = io_uring_get_sqe(&ring);
if (!sqe)
break;
io_uring_prep_readv(sqe, 0, &iovecs[i], 1, 0);
sqe->flags |= IOSQE_FIXED_FILE;
ret = io_uring_submit(&ring);
sleep(1);
printf("submit %d\n", i);
}
for (i = 0; i < 10; i++) {
io_uring_wait_cqe(&ring, &cqe);
printf("receive: %d\n", i);
if (cqe->res != 4096) {
fprintf(stderr, "ret=%d, wanted 4096\n", cqe->res);
ret = 1;
}
io_uring_cqe_seen(&ring, cqe);
}
close(fd);
io_uring_queue_exit(&ring);
return 0;
}
sudo ./test testfile
above command will hang on the tenth request, to fix this bug, when io
sq_thread is waken up, we reset the variable 'ret' to be zero.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We normally disable any commands that aren't specifically poll commands
for a ring that is setup for polling, but we do allow buffer provide and
remove commands to support buffer selection for polled IO. Once a
request is issued, we add it to the poll list to poll for completion. But
we should not do that for non-IO commands, as those request complete
inline immediately and aren't pollable. If we do, we can leave requests
on the iopoll list after they are freed.
Fixes: ddf0322db7 ("io_uring: add IORING_OP_PROVIDE_BUFFERS")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
kiocb.private is used in iomap_dio_rw() so store buf_index separately.
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Move 'buf_index' to a hole in io_kiocb.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We currently move it to the io_wqe_manager for execution, but we cannot
safely do so as we may lack some of the state to execute it out of
context. As we cancel work anyway when the ring/task exits, just mark
this request as canceled and io_async_task_func() will do the right
thing.
Fixes: aa96bf8a9e ("io_uring: use io-wq manager as backup task if task is exiting")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If the request is still hashed in io_async_task_func(), then it cannot
have been canceled and it's pointless to check. So save that check.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We checked for 'force_nonblock' higher up, so it's definitely false
at this point. Kill the check, it's a remnant of when we tried to do
inline splice without always punting to async context.
Fixes: 2fb3e82284 ("io_uring: punt splice async because of inode mutex")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Add IORING_OP_TEE implementing tee(2) support. Almost identical to
splice bits, but without offsets.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
req->flags stores all sqe->flags. After checking that sqe->flags are
valid set if IOSQE* flags, no need to double check it, just forward them
all.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_file_put() deals with flushing state's file refs, adding "state" to
its name makes it a bit clearer. Also, avoid double check of
state->file in __io_file_get() in some cases.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
A submission is "async" IIF it's done by SQPOLL thread. Instead of
passing @async flag into io_submit_sqes(), deduce it from ctx->flags.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We only need apoll in the one section, do the juggling with the work
restoration there. This removes a special case further down as well.
No functional changes in this patch.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
As for other not inlined requests, alloc req->io for FORCE_ASYNC reqs,
so they can be prepared properly.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If req->io is not NULL, it's already prepared. Don't do it again,
it's dangerous.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
There's no point in using list_del_init() on entries that are going
away, and the associated lock is always used in process context so
let's not use the IRQ disabling+saving variant of the spinlock.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This new flag should be set/clear from the application to
disable/enable eventfd notifications when a request is completed
and queued to the CQ ring.
Before this patch, notifications were always sent if an eventfd is
registered, so IORING_CQ_EVENTFD_DISABLED is not set during the
initialization.
It will be up to the application to set the flag after initialization
if no notifications are required at the beginning.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
This patch adds the new 'cq_flags' field that should be written by
the application and read by the kernel.
This new field is available to the userspace application through
'cq_off.flags'.
We are using 4-bytes previously reserved and set to zero. This means
that if the application finds this field to zero, then the new
functionality is not supported.
In the next patch we will introduce the first flag available.
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Some file descriptors use separate waitqueues for their f_ops->poll()
handler, most commonly one for read and one for write. The io_uring
poll implementation doesn't work with that, as the 2nd poll_wait()
call will cause the io_uring poll request to -EINVAL.
This affects (at least) tty devices and /dev/random as well. This is a
big problem for event loops where some file descriptors work, and others
don't.
With this fix, io_uring handles multiple waitqueues.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We currently embed and queue a work item per fixed_file_ref_node that
we update, but if the workload does a lot of these, then the associated
kworker-events overhead can become quite noticeable.
Since we rarely need to wait on these, batch them at 1 second intervals
instead. If we do need to wait for them, we just flush the pending
delayed work.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We used to have three completions, now we just have two. With the two,
let's not allocate them dynamically, just embed then in the ctx and
name them appropriately.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When we changed the file registration handling, it became important to
iterate the bulk request freeing list for fixed files as well, or we
miss dropping the fixed file reference. If not, we're leaking references,
and we'll get a kworker stuck waiting for file references to disappear.
This also means we can remove the special casing of fixed vs non-fixed
files, we need to iterate for both and we can just rely on
__io_req_aux_free() doing io_put_file() instead of doing it manually.
Fixes: 0558955373 ("io_uring: refactor file register/unregister/update handling")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Remove duplicate semicolon at the end of line in io_file_from_index()
Signed-off-by: Xiaoming Ni <nixiaoming@huawei.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
do_splice() doesn't expect len to be 0. Just always return 0 in this
case as splice(2) does.
Fixes: 7d67af2c01 ("io_uring: add splice(2) support")
Reported-by: Jann Horn <jannh@google.com>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The "struct io_submit_state *state" parameter is not used, remove it.
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The attempt protecting us from closing the ring itself wasn't really
complete, and we actually don't need it. The referencing of requests
themselve, and the references they hold on the ring, ensures that the
life time of the ring is sane. With the check removed, we can also
remove the need to have the close operation fget() the file.
Reported-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We currently make some guesses as when to open this fd, but in reality
we have no business (or need) to do so at all. In fact, it makes certain
things fail, like O_PATH.
Remove the fd lookup from these opcodes, we're just passing the 'fd' to
generic helpers anyway. With that, we can also remove the special casing
of fd values in io_req_needs_file(), and the 'fd_non_neg' check that
we have. And we can ensure that we only read sqe->fd once.
This fixes O_PATH usage with openat/openat2, and ditto statx path side
oddities.
Cc: stable@vger.kernel.org: # v5.6
Reported-by: Max Kellermann <mk@cm4all.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
If copy_to_user() in io_uring_setup() failed, we'll leak many kernel
resources, which will be recycled until process terminates. This bug
can be reproduced by using mprotect to set params to PROT_READ. To fix
this issue, refactor io_uring_create() a bit to add a new 'struct
io_uring_params __user *params' parameter and move the copy_to_user()
in io_uring_setup() to io_uring_setup(), if copy_to_user() failed,
we can free kernel resource properly.
Suggested-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
The prepare_to_wait() and finish_wait() calls in io_uring_cancel_files()
are mismatched. Currently I don't see any issues related this bug, just
find it by learning codes.
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Nonblocking do_splice() still may wait for some time on an inode mutex.
Let's play safe and always punt it async.
Reported-by: Jens Axboe <axboe@kernel.dk>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
io_req_defer() do double-checked locking. Use proper helpers for that,
i.e. list_empty_careful().
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
While working on to make io_uring sqpoll mode support syscalls that need
struct files_struct, I got cpu soft lockup in io_ring_ctx_wait_and_kill(),
while (ctx->sqo_thread && !wq_has_sleeper(&ctx->sqo_wait))
cpu_relax();
above loop never has an chance to exit, it's because preempt isn't enabled
in the kernel, and the context calling io_ring_ctx_wait_and_kill() and
io_sq_thread() run in the same cpu, if io_sq_thread calls a cond_resched()
yield cpu and another context enters above loop, then io_sq_thread() will
always in runqueue and never exit.
Use cond_resched() can fix this issue.
Reported-by: syzbot+66243bb7126c410cefe6@syzkaller.appspotmail.com
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Use ctx->fallback_req address for test_and_set_bit_lock() and
clear_bit_unlock().
Signed-off-by: Bijan Mottahedeh <bijan.mottahedeh@oracle.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We do blocking retry from our poll handler, if the file supports polled
notifications. Only mark the request as needing an async worker if we
can't poll for it.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
We can have files like eventfd where it's perfectly fine to do poll
based retry on them, right now io_file_supports_async() doesn't take
that into account.
Pass in data direction and check the f_op instead of just always needing
an async worker.
Signed-off-by: Jens Axboe <axboe@kernel.dk>
Clay reports that OP_STATX fails for a test case with a valid fd
and empty path:
-- Test 0: statx:fd 3: SUCCEED, file mode 100755
-- Test 1: statx:path ./uring_statx: SUCCEED, file mode 100755
-- Test 2: io_uring_statx:fd 3: FAIL, errno 9: Bad file descriptor
-- Test 3: io_uring_statx:path ./uring_statx: SUCCEED, file mode 100755
This is due to statx not grabbing the process file table, hence we can't
lookup the fd in async context. If the fd is valid, ensure that we grab
the file table so we can grab the file from async context.
Cc: stable@vger.kernel.org # v5.6
Reported-by: Clay Harris <bugs@claycon.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When testing io_uring IORING_FEAT_FAST_POLL feature, I got below panic:
BUG: kernel NULL pointer dereference, address: 0000000000000030
PGD 0 P4D 0
Oops: 0000 [#1] SMP PTI
CPU: 5 PID: 2154 Comm: io_uring_echo_s Not tainted 5.6.0+ #359
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS rel-1.11.1-0-g0551a4be2c-prebuilt.qemu-project.org 04/01/2014
RIP: 0010:io_wq_submit_work+0xf/0xa0
Code: ff ff ff be 02 00 00 00 e8 ae c9 19 00 e9 58 ff ff ff 66 0f 1f
84 00 00 00 00 00 0f 1f 44 00 00 41 54 49 89 fc 55 53 48 8b 2f <8b>
45 30 48 8d 9d 48 ff ff ff 25 01 01 00 00 83 f8 01 75 07 eb 2a
RSP: 0018:ffffbef543e93d58 EFLAGS: 00010286
RAX: ffffffff84364f50 RBX: ffffa3eb50f046b8 RCX: 0000000000000000
RDX: ffffa3eb0efc1840 RSI: 0000000000000006 RDI: ffffa3eb50f046b8
RBP: 0000000000000000 R08: 00000000fffd070d R09: 0000000000000000
R10: 0000000000000000 R11: 0000000000000000 R12: ffffa3eb50f046b8
R13: ffffa3eb0efc2088 R14: ffffffff85b69be0 R15: ffffa3eb0effa4b8
FS: 00007fe9f69cc4c0(0000) GS:ffffa3eb5ef40000(0000) knlGS:0000000000000000
CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000000030 CR3: 0000000020410000 CR4: 00000000000006e0
Call Trace:
task_work_run+0x6d/0xa0
do_exit+0x39a/0xb80
? get_signal+0xfe/0xbc0
do_group_exit+0x47/0xb0
get_signal+0x14b/0xbc0
? __x64_sys_io_uring_enter+0x1b7/0x450
do_signal+0x2c/0x260
? __x64_sys_io_uring_enter+0x228/0x450
exit_to_usermode_loop+0x87/0xf0
do_syscall_64+0x209/0x230
entry_SYSCALL_64_after_hwframe+0x49/0xb3
RIP: 0033:0x7fe9f64f8df9
Code: Bad RIP value.
task_work_run calls io_wq_submit_work unexpectedly, it's obvious that
struct callback_head's func member has been changed. After looking into
codes, I found this issue is still due to the union definition:
union {
/*
* Only commands that never go async can use the below fields,
* obviously. Right now only IORING_OP_POLL_ADD uses them, and
* async armed poll handlers for regular commands. The latter
* restore the work, if needed.
*/
struct {
struct callback_head task_work;
struct hlist_node hash_node;
struct async_poll *apoll;
};
struct io_wq_work work;
};
When task_work_run has multiple work to execute, the work that calls
io_poll_remove_all() will do req->work restore for non-poll request
always, but indeed if a non-poll request has been added to a new
callback_head, subsequent callback will call io_async_task_func() to
handle this request, that means we should not do the restore work
for such non-poll request. Meanwhile in io_async_task_func(), we should
drop submit ref when req has been canceled.
Fix both issues.
Fixes: b1f573bd15 ("io_uring: restore req->work when canceling poll request")
Signed-off-by: Xiaoguang Wang <xiaoguang.wang@linux.alibaba.com>
Use io_double_put_req()
Signed-off-by: Jens Axboe <axboe@kernel.dk>
When checking for draining with __req_need_defer(), it tries to match
how many requests were sent before a current one with number of already
completed. Dropped SQEs are included in req->sequence, and they won't
ever appear in CQ. To compensate for that, __req_need_defer() substracts
ctx->cached_sq_dropped.
However, what it should really use is number of SQEs dropped __before__
the current one. In other words, any submitted request shouldn't
shouldn't affect dequeueing from the drain queue of previously submitted
ones.
Instead of saving proper ctx->cached_sq_dropped in each request,
substract from req->sequence it at initialisation, so it includes number
of properly submitted requests.
note: it also changes behaviour of timeouts, but
1. it's already diverge from the description because of using SQ
2. the description is ambiguous regarding dropped SQEs
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>