[ Upstream commit 6f769f2282 ]
Stop rxrpc from sending a DUP ACK in response to a PING RESPONSE ACK on a
dead call. We may have initiated the ping but the call may have beaten the
response to completion.
Fixes: 18bfeba50d ("rxrpc: Perform terminal call ACK/ABORT retransmission from conn processor")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit bad1a11c0f ]
rxrpc_find_service_conn_rcu() should make the "seq" counter odd on the
second pass, otherwise read_seqbegin_or_lock() never takes the lock.
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20231117164846.GA10410@redhat.com/
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d098d83c5 ]
The hard call timeout is specified in the RXRPC_SET_CALL_TIMEOUT cmsg in
seconds, so fix the point at which sendmsg() applies it to the call to
convert to jiffies from seconds, not milliseconds.
Fixes: a158bdd324 ("rxrpc: Fix timeout of a call that hasn't yet been granted a channel")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: "David S. Miller" <davem@davemloft.net>
cc: Eric Dumazet <edumazet@google.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: Paolo Abeni <pabeni@redhat.com>
cc: linux-afs@lists.infradead.org
cc: netdev@vger.kernel.org
cc: linux-kernel@vger.kernel.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4feb2c4462 ]
One of the error paths in rxrpc_do_sendmsg() doesn't unlock the call mutex
before returning. Fix it to do this.
Note that this still doesn't get rid of the checker warning:
../net/rxrpc/sendmsg.c:617:5: warning: context imbalance in 'rxrpc_do_sendmsg' - wrong count at exit
I think the interplay between the socket lock and the call's user_mutex may
be too complicated for checker to analyse, especially as
rxrpc_new_client_call_for_sendmsg(), which it calls, returns with the
call's user_mutex if successful but unconditionally drops the socket lock.
Fixes: e754eba685 ("rxrpc: Provide a cmsg to specify the amount of Tx data for a call")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8889a711f9 ]
ack.bufferSize should be set to 0 when generating an ack.
Fixes: 8d94aa381d ("rxrpc: Calls shouldn't hold socket refs")
Reported-by: Jeffrey Altman <jaltman@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 214a9dc7d8 ]
Fix the calculation of the resend age to add a microsecond value as
microseconds, not nanoseconds.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d3d863036d ]
If the local processor work item for the rxrpc local endpoint gets requeued
by an event (such as an incoming packet) between it getting scheduled for
destruction and the UDP socket being closed, the rxrpc_local_destroyer()
function can get run twice. The second time it can hang because it can end
up waiting for cleanup events that will never happen.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 0d40f728e2 ]
rxkad_verify_packet_2() has a small stack-allocated sglist of 4 elements,
but if that isn't sufficient for the number of fragments in the socket
buffer, we try to allocate an sglist large enough to hold all the
fragments.
However, for large packets with a lot of fragments, this isn't sufficient
and we need at least one additional fragment.
The problem manifests as skb_to_sgvec() returning -EMSGSIZE and this then
getting returned by userspace. Most of the time, this isn't a problem as
rxrpc sets a limit of 5692, big enough for 4 jumbo subpackets to be glued
together; occasionally, however, the server will ignore the reported limit
and give a packet that's a lot bigger - say 19852 bytes with ->nr_frags
being 7. skb_to_sgvec() then tries to return a "zeroth" fragment that
seems to occur before the fragments counted by ->nr_frags and we hit the
end of the sglist too early.
Note that __skb_to_sgvec() also has an skb_walk_frags() loop that is
recursive up to 24 deep. I'm not sure if I need to take account of that
too - or if there's an easy way of counting those frags too.
Fix this by counting an extra frag and allocating a larger sglist based on
that.
Fixes: d0d5c0cd1e ("rxrpc: Use skb_unshare() rather than skb_cow_data()")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9a3dedcf18 ]
Fix the decision on when to generate an IDLE ACK by keeping a count of the
number of packets we've received, but not yet soft-ACK'd, and the number of
packets we've processed, but not yet hard-ACK'd, rather than trying to keep
track of which DATA sequence numbers correspond to those points.
We then generate an ACK when either counter exceeds 2. The counters are
both cleared when we transcribe the information into any sort of ACK packet
for transmission. IDLE and DELAY ACKs are skipped if both counters are 0
(ie. no change).
Fixes: 805b21b929 ("rxrpc: Send an ACK after every few DATA packets we receive")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 81524b6312 ]
The previousPacket field in the rx ACK packet should never go backwards -
it's now the highest DATA sequence number received, not the last on
received (it used to be used for out of sequence detection).
Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 8940ba3cfe ]
Fix accidental overlapping of Rx-phase ACK accounting with Tx-phase ACK
accounting through variables shared between the two. call->acks_* members
refer to ACKs received in the Tx phase and call->ackr_* members to ACKs
sent/to be sent during the Rx phase.
Fixes: 1a2391c30c ("rxrpc: Fix detection of out of order acks")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Jeffrey Altman <jaltman@auristor.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 114af61f88 ]
rxrpc has a timer to trigger resending of unacked data packets in a call.
This is not cancelled when a client call switches to the receive phase on
the basis that most calls don't last long enough for it to ever expire.
However, if it *does* expire after we've started to receive the reply, we
shouldn't then go into trying to retransmit or pinging the server to find
out if an ack got lost.
Fix this by skipping the resend code if we're into receiving the reply to a
client call.
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 88e2215975 ]
AF_RXRPC's listen() handler lets you set the backlog up to 32 (if you bump
up the sysctl), but whilst the preallocation circular buffers have 32 slots
in them, one of them has to be a dead slot because we're using CIRC_CNT().
This means that listen(rxrpc_sock, 32) will cause an oops when the socket
is closed because rxrpc_service_prealloc_one() allocated one too many calls
and rxrpc_discard_prealloc() won't then be able to get rid of them because
it'll think the ring is empty. rxrpc_release_calls_on_socket() then tries
to abort them, but oopses because call->peer isn't yet set.
Fix this by setting the maximum backlog to RXRPC_BACKLOG_MAX - 1 to match
the ring capacity.
BUG: kernel NULL pointer dereference, address: 0000000000000086
...
RIP: 0010:rxrpc_send_abort_packet+0x73/0x240 [rxrpc]
Call Trace:
<TASK>
? __wake_up_common_lock+0x7a/0x90
? rxrpc_notify_socket+0x8e/0x140 [rxrpc]
? rxrpc_abort_call+0x4c/0x60 [rxrpc]
rxrpc_release_calls_on_socket+0x107/0x1a0 [rxrpc]
rxrpc_release+0xc9/0x1c0 [rxrpc]
__sock_release+0x37/0xa0
sock_close+0x11/0x20
__fput+0x89/0x240
task_work_run+0x59/0x90
do_exit+0x319/0xaa0
Fixes: 00e907127e ("rxrpc: Preallocate peers, conns and calls for incoming service requests")
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: https://lists.infradead.org/pipermail/linux-afs/2022-March/005079.html
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 4ba68c5192 ]
If at the end of rxrpc sendmsg() or rxrpc_kernel_send_data() the call that
was being given data was aborted remotely or otherwise failed, return an
error rather than returning the amount of data buffered for transmission.
The call (presumably) did not complete, so there's not much point
continuing with it. AF_RXRPC considers it "complete" and so will be
unwilling to do anything else with it - and won't send a notification for
it, deeming the return from sendmsg sufficient.
Not returning an error causes afs to incorrectly handle a StoreData
operation that gets interrupted by a change of address due to NAT
reconfiguration.
This doesn't normally affect most operations since their request parameters
tend to fit into a single UDP packet and afs_make_call() returns before the
server responds; StoreData is different as it involves transmission of a
lot of data.
This can be triggered on a client by doing something like:
dd if=/dev/zero of=/afs/example.com/foo bs=1M count=512
at one prompt, and then changing the network address at another prompt,
e.g.:
ifconfig enp6s0 inet 192.168.6.2 && route add 192.168.6.1 dev enp6s0
Tracing packets on an Auristor fileserver looks something like:
192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001
192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538)
192.168.6.3 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(64538) (64538)
192.168.6.1 -> 192.168.6.3 RX 107 ACK Idle Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001
<ARP exchange for 192.168.6.2>
192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0)
192.168.6.2 -> 192.168.6.1 AFS (RX) 1482 FS Request: Unknown(0) (0)
192.168.6.1 -> 192.168.6.2 RX 107 ACK Exceeds Window Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001
192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 0 Call: 4 Source Port: 7000 Destination Port: 7001
192.168.6.1 -> 192.168.6.2 RX 74 ABORT Seq: 29321 Call: 4 Source Port: 7000 Destination Port: 7001
The Auristor fileserver logs code -453 (RXGEN_SS_UNMARSHAL), but the abort
code received by kafs is -5 (RX_PROTOCOL_ERROR) as the rx layer sees the
condition and generates an abort first and the unmarshal error is a
consequence of that at the application layer.
Reported-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: linux-afs@lists.infradead.org
Link: http://lists.infradead.org/pipermail/linux-afs/2021-December/004810.html # v1
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit ee3b0826b4 ]
A recent patch[1] from Eric Dumazet flipped the order in which the
keepalive timer and the keepalive worker were cancelled in order to fix a
syzbot reported issue[2]. Unfortunately, this enables the mirror image bug
whereby the timer races with rxrpc_exit_net(), restarting the worker after
it has been cancelled:
CPU 1 CPU 2
=============== =====================
if (rxnet->live)
<INTERRUPT>
rxnet->live = false;
cancel_work_sync(&rxnet->peer_keepalive_work);
rxrpc_queue_work(&rxnet->peer_keepalive_work);
del_timer_sync(&rxnet->peer_keepalive_timer);
Fix this by restoring the removed del_timer_sync() so that we try to remove
the timer twice. If the timer runs again, it should see ->live == false
and not restart the worker.
Fixes: 1946014ca3 ("rxrpc: fix a race in rxrpc_exit_net()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Eric Dumazet <edumazet@google.com>
cc: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/20220404183439.3537837-1-eric.dumazet@gmail.com/ [1]
Link: https://syzkaller.appspot.com/bug?extid=724378c4bb58f703b09a [2]
Reviewed-by: Eric Dumazet <edumazet@google.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2c13c05c5f ]
Improve retransmission backoff by only backing off when we retransmit data
packets rather than when we set the lost ack timer.
To this end:
(1) In rxrpc_resend(), use rxrpc_get_rto_backoff() when setting the
retransmission timer and only tell it that we are retransmitting if we
actually have things to retransmit.
Note that it's possible for the retransmission algorithm to race with
the processing of a received ACK, so we may see no packets needing
retransmission.
(2) In rxrpc_send_data_packet(), don't bump the backoff when setting the
ack_lost_at timer, as it may then get bumped twice.
With this, when looking at one particular packet, the retransmission
intervals were seen to be 1.5ms, 2ms, 3ms, 5ms, 9ms, 17ms, 33ms, 71ms,
136ms, 264ms, 544ms, 1.088s, 2.1s, 4.2s and 8.3s.
Fixes: c410bf0193 ("rxrpc: Fix the excessive initial retransmission timeout")
Suggested-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Tested-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/r/164138117069.2023386.17446904856843997127.stgit@warthog.procyon.org.uk/
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit beacff50ed upstream.
Need to call rxrpc_put_local() for peer candidate before kfree() as it
holds a ref to rxrpc_local.
[DH: v2: Changed to abstract the peer freeing code out into a function]
Fixes: 9ebeddef58 ("rxrpc: rxrpc_peer needs to hold a ref on the rxrpc_local record")
Signed-off-by: Eiichi Tsukata <eiichi.tsukata@nutanix.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
cc: linux-afs@lists.infradead.org
Link: https://lore.kernel.org/all/20211121041608.133740-2-eiichi.tsukata@nutanix.com/ # v1
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit acde891c24 ]
Directly using _usecs_to_jiffies() might be unsafe, so it's
better to use usecs_to_jiffies() instead.
Because we can see that the result of _usecs_to_jiffies()
could be larger than MAX_JIFFY_OFFSET values without the
check of the input.
Fixes: c410bf0193 ("Fix the excessive initial retransmission timeout")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 7b5eab57ca upstream.
At the end of rxrpc_release_call(), rxrpc_cleanup_ring() is called to clear
the Rx/Tx skbuff ring, but this doesn't lock the ring whilst it's accessing
it. Unfortunately, rxrpc_resend() might be trying to retransmit a packet
concurrently with this - and whilst it does lock the ring, this isn't
protection against rxrpc_cleanup_call().
Fix this by removing the call to rxrpc_cleanup_ring() from
rxrpc_release_call(). rxrpc_cleanup_ring() will be called again anyway
from rxrpc_cleanup_call(). The earlier call is just an optimisation to
recycle skbuffs more quickly.
Alternative solutions include rxrpc_release_call() could try to cancel the
work item or wait for it to complete or rxrpc_cleanup_ring() could lock
when accessing the ring (which would require a bh lock).
This can produce a report like the following:
BUG: KASAN: use-after-free in rxrpc_send_data_packet+0x19b4/0x1e70 net/rxrpc/output.c:372
Read of size 4 at addr ffff888011606e04 by task kworker/0:0/5
...
Workqueue: krxrpcd rxrpc_process_call
Call Trace:
...
kasan_report.cold+0x79/0xd5 mm/kasan/report.c:413
rxrpc_send_data_packet+0x19b4/0x1e70 net/rxrpc/output.c:372
rxrpc_resend net/rxrpc/call_event.c:266 [inline]
rxrpc_process_call+0x1634/0x1f60 net/rxrpc/call_event.c:412
process_one_work+0x98d/0x15f0 kernel/workqueue.c:2275
...
Allocated by task 2318:
...
sock_alloc_send_pskb+0x793/0x920 net/core/sock.c:2348
rxrpc_send_data+0xb51/0x2bf0 net/rxrpc/sendmsg.c:358
rxrpc_do_sendmsg+0xc03/0x1350 net/rxrpc/sendmsg.c:744
rxrpc_sendmsg+0x420/0x630 net/rxrpc/af_rxrpc.c:560
...
Freed by task 2318:
...
kfree_skb+0x140/0x3f0 net/core/skbuff.c:704
rxrpc_free_skb+0x11d/0x150 net/rxrpc/skbuff.c:78
rxrpc_cleanup_ring net/rxrpc/call_object.c:485 [inline]
rxrpc_release_call+0x5dd/0x860 net/rxrpc/call_object.c:552
rxrpc_release_calls_on_socket+0x21c/0x300 net/rxrpc/call_object.c:579
rxrpc_release_sock net/rxrpc/af_rxrpc.c:885 [inline]
rxrpc_release+0x263/0x5a0 net/rxrpc/af_rxrpc.c:916
__sock_release+0xcd/0x280 net/socket.c:597
...
The buggy address belongs to the object at ffff888011606dc0
which belongs to the cache skbuff_head_cache of size 232
Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Reported-by: syzbot+174de899852504e4a74a@syzkaller.appspotmail.com
Reported-by: syzbot+3d1c772efafd3c38d007@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
Link: https://lore.kernel.org/r/161234207610.653119.5287360098400436976.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 5399d52233 ]
AF_RXRPC sockets use UDP ports in encap mode. This causes socket and dst
from an incoming packet to get stolen and attached to the UDP socket from
whence it is leaked when that socket is closed.
When a network namespace is removed, the wait for dst records to be cleaned
up happens before the cleanup of the rxrpc and UDP socket, meaning that the
wait never finishes.
Fix this by moving the rxrpc (and, by dependence, the afs) private
per-network namespace registrations to the device group rather than subsys
group. This allows cached rxrpc local endpoints to be cleared and their
UDP sockets closed before we try waiting for the dst records.
The symptom is that lines looking like the following:
unregister_netdevice: waiting for lo to become free
get emitted at regular intervals after running something like the
referenced syzbot test.
Thanks to Vadim for tracking this down and work out the fix.
Reported-by: syzbot+df400f2f24a1677cd7e0@syzkaller.appspotmail.com
Reported-by: Vadim Fedorenko <vfedorenko@novek.ru>
Fixes: 5271953cad ("rxrpc: Use the UDP encap_rcv hook")
Signed-off-by: David Howells <dhowells@redhat.com>
Acked-by: Vadim Fedorenko <vfedorenko@novek.ru>
Link: https://lore.kernel.org/r/161196443016.3868642.5577440140646403533.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d52e419ac8 ]
Clang static analysis reports the following:
net/rxrpc/key.c:657:11: warning: Assigned value is garbage or undefined
toksize = toksizes[tok++];
^ ~~~~~~~~~~~~~~~
rxrpc_read() contains two consecutive loops. The first loop calculates the
token sizes and stores the results in toksizes[] and the second one uses
the array. When there is an error in identifying the token in the first
loop, the token is skipped, no change is made to the toksizes[] array.
When the same error happens in the second loop, the token is not skipped.
This will cause the toksizes[] array to be out of step and will overrun
past the calculated sizes.
Fix this by making both loops log a message and return an error in this
case. This should only happen if a new token type is incompletely
implemented, so it should normally be impossible to trigger this.
Fixes: 9a059cd5ca ("rxrpc: Downgrade the BUG() for unsupported token type in rxrpc_read()")
Reported-by: Tom Rix <trix@redhat.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/161046503122.2445787.16714129930607546635.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit a95d25dd7b ]
The call state may be changed at any time by the data-ready routine in
response to received packets, so if the call state is to be read and acted
upon several times in a function, READ_ONCE() must be used unless the call
state lock is held.
As it happens, we used READ_ONCE() to read the state a few lines above the
unmarked read in rxrpc_input_data(), so use that value rather than
re-reading it.
Fixes: a158bdd324 ("rxrpc: Fix call timeouts")
Signed-off-by: Baptiste Lepers <baptiste.lepers@gmail.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Link: https://lore.kernel.org/r/161046715522.2450566.488819910256264150.stgit@warthog.procyon.org.uk
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 38b1dc47a3 ]
If someone calls setsockopt() twice to set a server key keyring, the first
keyring is leaked.
Fix it to return an error instead if the server key keyring is already set.
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fea9911124 ]
The keyring containing the server's tokens isn't network-namespaced, so it
shouldn't be looked up with a network namespace. It is expected to be
owned specifically by the server, so namespacing is unnecessary.
Fixes: a58946c158 ("keys: Pass the network namespace into request_key mechanism")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit fa1d113a0f ]
conn->state_lock may be taken in softirq mode, but a previous patch
replaced an outer lock in the response-packet event handling code, and lost
the _bh from that when doing so.
Fix this by applying the _bh annotation to the state_lock locking.
Fixes: a1399f8bb0 ("rxrpc: Call channels should have separate call number spaces")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 9a059cd5ca ]
If rxrpc_read() (which allows KEYCTL_READ to read a key), sees a token of a
type it doesn't recognise, it can BUG in a couple of places, which is
unnecessary as it can easily get back to userspace.
Fix this to print an error message instead.
Fixes: 99455153d0 ("RxRPC: Parse security index 5 keys (Kerberos 5)")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 56305118e0 ]
The session key should be encoded with just the 8 data bytes and
no length; ENCODE_DATA precedes it with a 4 byte length, which
confuses some existing tools that try to parse this format.
Add an ENCODE_BYTES macro that does not include a length, and use
it for the key. Also adjust the expected length.
Note that commit 774521f353 ("rxrpc: Fix an assertion in
rxrpc_read()") had fixed a BUG by changing the length rather than
fixing the encoding. The original length was correct.
Fixes: 99455153d0 ("RxRPC: Parse security index 5 keys (Kerberos 5)")
Signed-off-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 1d4adfaf65 ]
Fix rxrpc_kernel_get_srtt() to indicate the validity of the returned
smoothed RTT. If we haven't had any valid samples yet, the SRTT isn't
useful.
Fixes: c410bf0193 ("rxrpc: Fix the excessive initial retransmission timeout")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 68528d937d ]
Keep the ACK serial number in a variable in rxrpc_input_ack() as it's used
frequently.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 65550098c1 ]
There's a race between rxrpc_sendmsg setting up a call, but then failing to
send anything on it due to an error, and recvmsg() seeing the call
completion occur and trying to return the state to the user.
An assertion fails in rxrpc_recvmsg() because the call has already been
released from the socket and is about to be released again as recvmsg deals
with it. (The recvmsg_q queue on the socket holds a ref, so there's no
problem with use-after-free.)
We also have to be careful not to end up reporting an error twice, in such
a way that both returns indicate to userspace that the user ID supplied
with the call is no longer in use - which could cause the client to
malfunction if it recycles the user ID fast enough.
Fix this by the following means:
(1) When sendmsg() creates a call after the point that the call has been
successfully added to the socket, don't return any errors through
sendmsg(), but rather complete the call and let recvmsg() retrieve
them. Make sendmsg() return 0 at this point. Further calls to
sendmsg() for that call will fail with ESHUTDOWN.
Note that at this point, we haven't send any packets yet, so the
server doesn't yet know about the call.
(2) If sendmsg() returns an error when it was expected to create a new
call, it means that the user ID wasn't used.
(3) Mark the call disconnected before marking it completed to prevent an
oops in rxrpc_release_call().
(4) recvmsg() will then retrieve the error and set MSG_EOR to indicate
that the user ID is no longer known by the kernel.
An oops like the following is produced:
kernel BUG at net/rxrpc/recvmsg.c:605!
...
RIP: 0010:rxrpc_recvmsg+0x256/0x5ae
...
Call Trace:
? __init_waitqueue_head+0x2f/0x2f
____sys_recvmsg+0x8a/0x148
? import_iovec+0x69/0x9c
? copy_msghdr_from_user+0x5c/0x86
___sys_recvmsg+0x72/0xaa
? __fget_files+0x22/0x57
? __fget_light+0x46/0x51
? fdget+0x9/0x1b
do_recvmmsg+0x15e/0x232
? _raw_spin_unlock+0xa/0xb
? vtime_delta+0xf/0x25
__x64_sys_recvmmsg+0x2c/0x2f
do_syscall_64+0x4c/0x78
entry_SYSCALL_64_after_hwframe+0x44/0xa9
Fixes: 357f5ef646 ("rxrpc: Call rxrpc_release_call() on error in rxrpc_new_client_call()")
Reported-by: syzbot+b54969381df354936d96@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
Reviewed-by: Marc Dionne <marc.dionne@auristor.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 639f181f0e ]
rxrpc_sendmsg() returns EPIPE if there's an outstanding error, such as if
rxrpc_recvmsg() indicating ENODATA if there's nothing for it to read.
Change rxrpc_recvmsg() to return EAGAIN instead if there's nothing to read
as this particular error doesn't get stored in ->sk_err by the networking
core.
Also change rxrpc_sendmsg() so that it doesn't fail with delayed receive
errors (there's no way for it to report which call, if any, the error was
caused by).
Fixes: 17926a7932 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit 02c28dffb1 ]
Commit 2ad6691d98, which moved the modification of the status annotation
for a packet in the Tx buffer prior to the retransmission moved the state
clearance, but managed to lose the bit that set it to UNACK.
Consequently, if a retransmission occurs, the packet is accidentally
changed to the ACK state (ie. 0) by masking it off, which means that the
packet isn't counted towards the tally of newly-ACK'd packets if it gets
hard-ACK'd. This then prevents the congestion control algorithm from
recovering properly.
Fix by reinstating the change of state to UNACK.
Spotted by the generic/460 xfstest.
Fixes: 2ad6691d98 ("rxrpc: Fix race between incoming ACK parser and retransmitter")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 2ad6691d98 ]
There's a race between the retransmission code and the received ACK parser.
The problem is that the retransmission loop has to drop the lock under
which it is iterating through the transmission buffer in order to transmit
a packet, but whilst the lock is dropped, the ACK parser can crank the Tx
window round and discard the packets from the buffer.
The retransmission code then updated the annotations for the wrong packet
and a later retransmission thought it had to retransmit a packet that
wasn't there, leading to a NULL pointer dereference.
Fix this by:
(1) Moving the annotation change to before we drop the lock prior to
transmission. This means we can't vary the annotation depending on
the outcome of the transmission, but that's fine - we'll retransmit
again later if it failed now.
(2) Skipping the packet if the skb pointer is NULL.
The following oops was seen:
BUG: kernel NULL pointer dereference, address: 000000000000002d
Workqueue: krxrpcd rxrpc_process_call
RIP: 0010:rxrpc_get_skb+0x14/0x8a
...
Call Trace:
rxrpc_resend+0x331/0x41e
? get_vtime_delta+0x13/0x20
rxrpc_process_call+0x3c0/0x4ac
process_one_work+0x18f/0x27f
worker_thread+0x1a3/0x247
? create_worker+0x17d/0x17d
kthread+0xe6/0xeb
? kthread_delayed_work_timer_fn+0x83/0x83
ret_from_fork+0x1f/0x30
Fixes: 248f219cb8 ("rxrpc: Rewrite the data and ack handling code")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit a2ad7c21ad ]
The handling of the receive window size (rwind) from a received ACK packet
is not correct. The rxrpc_input_ackinfo() function currently checks the
current Tx window size against the rwind from the ACK to see if it has
changed, but then limits the rwind size before storing it in the tx_winsize
member and, if it increased, wake up the transmitting process. This means
that if rwind > RXRPC_RXTX_BUFF_SIZE - 1, this path will always be
followed.
Fix this by limiting rwind before we compare it to tx_winsize.
The effect of this can be seen by enabling the rxrpc_rx_rwind_change
tracepoint.
Fixes: 702f2ac87a ("rxrpc: Wake up the transmitter if Rx window size increases on the peer")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 32f71aa497 ]
The user ID value isn't actually much use - and leaks a kernel pointer or a
userspace value - so replace it with the call debug ID, which appears in trace
points.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 441fdee1ea ]
The Rx protocol has a "previousPacket" field in it that is not handled in
the same way by all protocol implementations. Sometimes it contains the
serial number of the last DATA packet received, sometimes the sequence
number of the last DATA packet received and sometimes the highest sequence
number so far received.
AF_RXRPC is using this to weed out ACKs that are out of date (it's possible
for ACK packets to get reordered on the wire), but this does not work with
OpenAFS which will just stick the sequence number of the last packet seen
into previousPacket.
The issue being seen is that big AFS FS.StoreData RPC (eg. of ~256MiB) are
timing out when partly sent. A trace was captured, with an additional
tracepoint to show ACKs being discarded in rxrpc_input_ack(). Here's an
excerpt showing the problem.
52873.203230: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 0002449c q=00024499 fl=09
A DATA packet with sequence number 00024499 has been transmitted (the "q="
field).
...
52873.243296: rxrpc_rx_ack: c=000004ae 00012a2b DLY r=00024499 f=00024497 p=00024496 n=0
52873.243376: rxrpc_rx_ack: c=000004ae 00012a2c IDL r=0002449b f=00024499 p=00024498 n=0
52873.243383: rxrpc_rx_ack: c=000004ae 00012a2d OOS r=0002449d f=00024499 p=0002449a n=2
The Out-Of-Sequence ACK indicates that the server didn't see DATA sequence
number 00024499, but did see seq 0002449a (previousPacket, shown as "p=",
skipped the number, but firstPacket, "f=", which shows the bottom of the
window is set at that point).
52873.252663: rxrpc_retransmit: c=000004ae q=24499 a=02 xp=14581537
52873.252664: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244bc q=00024499 fl=0b *RETRANS*
The packet has been retransmitted. Retransmission recurs until the peer
says it got the packet.
52873.271013: rxrpc_rx_ack: c=000004ae 00012a31 OOS r=000244a1 f=00024499 p=0002449e n=6
More OOS ACKs indicate that the other packets that are already in the
transmission pipeline are being received. The specific-ACK list is up to 6
ACKs and NAKs.
...
52873.284792: rxrpc_rx_ack: c=000004ae 00012a49 OOS r=000244b9 f=00024499 p=000244b6 n=30
52873.284802: rxrpc_retransmit: c=000004ae q=24499 a=0a xp=63505500
52873.284804: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244c2 q=00024499 fl=0b *RETRANS*
52873.287468: rxrpc_rx_ack: c=000004ae 00012a4a OOS r=000244ba f=00024499 p=000244b7 n=31
52873.287478: rxrpc_rx_ack: c=000004ae 00012a4b OOS r=000244bb f=00024499 p=000244b8 n=32
At this point, the server's receive window is full (n=32) with presumably 1
NAK'd packet and 31 ACK'd packets. We can't transmit any more packets.
52873.287488: rxrpc_retransmit: c=000004ae q=24499 a=0a xp=61327980
52873.287489: rxrpc_tx_data: c=000004ae DATA ed1a3584:00000002 000244c3 q=00024499 fl=0b *RETRANS*
52873.293850: rxrpc_rx_ack: c=000004ae 00012a4c DLY r=000244bc f=000244a0 p=00024499 n=25
And now we've received an ACK indicating that a DATA retransmission was
received. 7 packets have been processed (the occupied part of the window
moved, as indicated by f= and n=).
52873.293853: rxrpc_rx_discard_ack: c=000004ae r=00012a4c 000244a0<00024499 00024499<000244b8
However, the DLY ACK gets discarded because its previousPacket has gone
backwards (from p=000244b8, in the ACK at 52873.287478 to p=00024499 in the
ACK at 52873.293850).
We then end up in a continuous cycle of retransmit/discard. kafs fails to
update its window because it's discarding the ACKs and can't transmit an
extra packet that would clear the issue because the window is full.
OpenAFS doesn't change the previousPacket value in the ACKs because no new
DATA packets are received with a different previousPacket number.
Fix this by altering the discard check to only discard an ACK based on
previousPacket if there was no advance in the firstPacket. This allows us
to transmit a new packet which will cause previousPacket to advance in the
next ACK.
The check, however, needs to allow for the possibility that previousPacket
may actually have had the serial number placed in it instead - in which
case it will go outside the window and we should ignore it.
Fixes: 1a2391c30c ("rxrpc: Fix detection of out of order acks")
Reported-by: Dave Botsch <botsch@cnf.cornell.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit d1f129470e ]
Add a tracepoint to track received ACKs that are discarded due to being
outside of the Tx window.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit f45d01f4f3 upstream.
A ticket was not released after a call of the function
"rxkad_decrypt_ticket" failed. Thus replace the jump target
"temporary_error_free_resp" by "temporary_error_free_ticket".
Fixes: 8c2f826dc3 ("rxrpc: Don't put crypto buffers on the stack")
Signed-off-by: Qiushi Wu <wu000273@umn.edu>
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Markus Elfring <Markus.Elfring@web.de>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit c410bf0193 upstream.
rxrpc currently uses a fixed 4s retransmission timeout until the RTT is
sufficiently sampled. This can cause problems with some fileservers with
calls to the cache manager in the afs filesystem being dropped from the
fileserver because a packet goes missing and the retransmission timeout is
greater than the call expiry timeout.
Fix this by:
(1) Copying the RTT/RTO calculation code from Linux's TCP implementation
and altering it to fit rxrpc.
(2) Altering the various users of the RTT to make use of the new SRTT
value.
(3) Replacing the use of rxrpc_resend_timeout to use the calculated RTO
value instead (which is needed in jiffies), along with a backoff.
Notes:
(1) rxrpc provides RTT samples by matching the serial numbers on outgoing
DATA packets that have the RXRPC_REQUEST_ACK set and PING ACK packets
against the reference serial number in incoming REQUESTED ACK and
PING-RESPONSE ACK packets.
(2) Each packet that is transmitted on an rxrpc connection gets a new
per-connection serial number, even for retransmissions, so an ACK can
be cross-referenced to a specific trigger packet. This allows RTT
information to be drawn from retransmitted DATA packets also.
(3) rxrpc maintains the RTT/RTO state on the rxrpc_peer record rather than
on an rxrpc_call because many RPC calls won't live long enough to
generate more than one sample.
(4) The calculated SRTT value is in units of 8ths of a microsecond rather
than nanoseconds.
The (S)RTT and RTO values are displayed in /proc/net/rxrpc/peers.
Fixes: 17926a7932 ([AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both"")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 0e631eee17 upstream.
Fix the DATA packet transmission to disable nofrag for UDPv4 on an AF_INET6
socket as well as UDPv6 when trying to transmit fragmentably.
Without this, packets filled to the normal size used by the kernel AFS
client of 1412 bytes be rejected by udp_sendmsg() with EMSGSIZE
immediately. The ->sk_error_report() notification hook is called, but
rxrpc doesn't generate a trace for it.
This is a temporary fix; a more permanent solution needs to involve
changing the size of the packets being filled in accordance with the MTU,
which isn't currently done in AF_RXRPC. The reason for not doing so was
that, barring the last packet in an rx jumbo packet, jumbos can only be
assembled out of 1412-byte packets - and the plan was to construct jumbos
on the fly at transmission time.
Also, there's no point turning on IPV6_MTU_DISCOVER, since IPv6 has to
engage in this anyway since fragmentation is only done by the sender. We
can then condense the switch-statement in rxrpc_send_data_packet().
Fixes: 75b54cb57c ("rxrpc: Add IPv6 support")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit e138aa7d32 ]
Fix the interruptibility of kernel-initiated client calls so that they're
either only interruptible when they're waiting for a call slot to come
available or they're not interruptible at all. Either way, they're not
interruptible during transmission.
This should help prevent StoreData calls from being interrupted when
writeback is in progress. It doesn't, however, handle interruption during
the receive phase.
Userspace-initiated calls are still interruptable. After the signal has
been handled, sendmsg() will return the amount of data copied out of the
buffer and userspace can perform another sendmsg() call to continue
transmission.
Fixes: bc5e3a546d ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
[ Upstream commit 158fe66653 ]
Abstract out the calculation of there being sufficient Tx buffer space.
This is reproduced several times in the rxrpc sendmsg code.
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
commit 498b577660 upstream.
Fix the handling of sendmsg() with MSG_WAITALL for userspace to round the
timeout for when a signal occurs up to at least two jiffies as a 1 jiffy
timeout may end up being effectively 0 if jiffies wraps at the wrong time.
Fixes: bc5e3a546d ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 7d7587db0d upstream.
Fix the handling of signals in client rxrpc calls made by the afs
filesystem. Ignore signals completely, leaving call abandonment or
connection loss to be detected by timeouts inside AF_RXRPC.
Allowing a filesystem call to be interrupted after the entire request has
been transmitted and an abort sent means that the server may or may not
have done the action - and we don't know. It may even be worse than that
for older servers.
Fixes: bc5e3a546d ("rxrpc: Use MSG_WAITALL to tell sendmsg() to temporarily ignore signals")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
commit 963485d436 upstream.
rxrpc_rcu_destroy_call(), which is called as an RCU callback to clean up a
put call, calls rxrpc_put_connection() which, deep in its bowels, takes a
number of spinlocks in a non-BH-safe way, including rxrpc_conn_id_lock and
local->client_conns_lock. RCU callbacks, however, are normally called from
softirq context, which can cause lockdep to notice the locking
inconsistency.
To get lockdep to detect this, it's necessary to have the connection
cleaned up on the put at the end of the last of its calls, though normally
the clean up is deferred. This can be induced, however, by starting a call
on an AF_RXRPC socket and then closing the socket without reading the
reply.
Fix this by having rxrpc_rcu_destroy_call() punt the destruction to a
workqueue if in softirq-mode and defer the destruction to process context.
Note that another way to fix this could be to add a bunch of bh-disable
annotations to the spinlocks concerned - and there might be more than just
those two - but that means spending more time with BHs disabled.
Note also that some of these places were covered by bh-disable spinlocks
belonging to the rxrpc_transport object, but these got removed without the
_bh annotation being retained on the next lock in.
Fixes: 999b69f892 ("rxrpc: Kill the client connection bundle concept")
Reported-by: syzbot+d82f3ac8d87e7ccbb2c9@syzkaller.appspotmail.com
Reported-by: syzbot+3f1fd6b8cbf8702d134e@syzkaller.appspotmail.com
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hillf Danton <hdanton@sina.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
[ Upstream commit b39a934ec7 ]
The recent patch that substituted a flag on an rxrpc_call for the
connection pointer being NULL as an indication that a call was disconnected
puts the set_bit in the wrong place for service calls. This is only a
problem if a call is implicitly terminated by a new call coming in on the
same connection channel instead of a terminating ACK packet.
In such a case, rxrpc_input_implicit_end_call() calls
__rxrpc_disconnect_call(), which is now (incorrectly) setting the
disconnection bit, meaning that when rxrpc_release_call() is later called,
it doesn't call rxrpc_disconnect_call() and so the call isn't removed from
the peer's error distribution list and the list gets corrupted.
KASAN finds the issue as an access after release on a call, but the
position at which it occurs is confusing as it appears to be related to a
different call (the call site is where the latter call is being removed
from the error distribution list and either the next or pprev pointer
points to a previously released call).
Fix this by moving the setting of the flag from __rxrpc_disconnect_call()
to rxrpc_disconnect_call() in the same place that the connection pointer
was being cleared.
Fixes: 5273a191dc ("rxrpc: Fix NULL pointer deref due to call->conn being cleared on disconnect")
Signed-off-by: David Howells <dhowells@redhat.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
Signed-off-by: Sasha Levin <sashal@kernel.org>