linux/net/core
Kuniyuki Iwashima a8df9d0428 udp: Update reuse->has_conns under reuseport_lock.
[ Upstream commit 69421bf984 ]

When we call connect() for a UDP socket in a reuseport group, we have
to update sk->sk_reuseport_cb->has_conns to 1.  Otherwise, the kernel
could wrongly select an unconnected socket for packets sent to the
connected socket.

However, the current way of setting has_conns is invalid and can
trigger that problem.  reuseport_has_conns() changes has_conns under
rcu_read_lock(), which turns the RCU reader into an updater.  The
update must then be done under the updater's lock, reuseport_lock, but
currently it is not.

For this reason, the race below can occur: we fail to set has_conns,
which results in the wrong socket being selected.  To avoid the race,
let's split the reader and the updater and use proper locking.
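
The reader/updater split described above can be sketched in userspace C.
This is only an illustration under stated assumptions, not the kernel
code: struct group, group_has_conns(), group_has_conns_set() and
group_grow() are hypothetical names, a pthread mutex stands in for
reuseport_lock, and C11 atomics stand in for the RCU accessors.  The old
group is handed back to the caller, where the kernel would use
kfree_rcu().

```c
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical userspace stand-ins, not the kernel's definitions. */
struct group {
	atomic_uint has_conns;
	unsigned int max_socks;
};

struct sock {
	_Atomic(struct group *) cb;	/* stands in for sk->sk_reuseport_cb */
};

/* Stands in for reuseport_lock (assumption: one global updater lock). */
static pthread_mutex_t group_lock = PTHREAD_MUTEX_INITIALIZER;

/* Reader: only loads, never stores, so it needs no updater lock. */
static bool group_has_conns(struct sock *sk)
{
	struct group *g = atomic_load(&sk->cb);

	return g && atomic_load(&g->has_conns);
}

/* Updater: the store happens while holding the updater's lock. */
static void group_has_conns_set(struct sock *sk)
{
	pthread_mutex_lock(&group_lock);
	struct group *g = atomic_load(&sk->cb);
	if (g)
		atomic_store(&g->has_conns, 1);
	pthread_mutex_unlock(&group_lock);
}

/*
 * Grow: copy the state and publish the new group under the same lock,
 * so a concurrent set lands either before the copy or on the new group.
 * Returns the old group (caller frees it), or NULL on failure.
 */
static struct group *group_grow(struct sock *sk)
{
	struct group *more = calloc(1, sizeof(*more));

	if (!more)
		return NULL;

	pthread_mutex_lock(&group_lock);
	struct group *old = atomic_load(&sk->cb);
	if (!old) {
		pthread_mutex_unlock(&group_lock);
		free(more);
		return NULL;
	}
	more->max_socks = old->max_socks * 2;
	atomic_init(&more->has_conns, atomic_load(&old->has_conns));
	atomic_store(&sk->cb, more);
	pthread_mutex_unlock(&group_lock);

	return old;
}
```

Because both the set and the copy-and-publish step take the same lock,
the flag cannot be written to a group that has already been copied but
not yet replaced.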

 cpu1                               cpu2
+----+                             +----+

__ip[46]_datagram_connect()        reuseport_grow()
.                                  .
|- reuseport_has_conns(sk, true)   |- more_reuse = __reuseport_alloc(more_socks_size)
|  .                               |
|  |- rcu_read_lock()
|  |- reuse = rcu_dereference(sk->sk_reuseport_cb)
|  |
|  |                               |  /* reuse->has_conns == 0 here */
|  |                               |- more_reuse->has_conns = reuse->has_conns
|  |- reuse->has_conns = 1         |  /* more_reuse->has_conns SHOULD BE 1 HERE */
|  |                               |
|  |                               |- rcu_assign_pointer(reuse->socks[i]->sk_reuseport_cb,
|  |                               |                     more_reuse)
|  `- rcu_read_unlock()            `- kfree_rcu(reuse, rcu)
|
|- sk->sk_state = TCP_ESTABLISHED
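
The interleaving in the diagram can be replayed step by step on a single
thread to show the lost update.  This is a hedged sketch with
hypothetical stand-in types (struct group, struct sock) and a
hypothetical function name, not the kernel code; each statement is
labelled with the CPU that performs it in the diagram.

```c
/* Hypothetical stand-ins for the reuseport group and the socket. */
struct group {
	unsigned int has_conns;
};

struct sock {
	struct group *cb;	/* stands in for sk->sk_reuseport_cb */
};

/*
 * Replays the unlocked interleaving in program order and returns the
 * has_conns value visible through the socket afterwards.
 */
static unsigned int replay_unlocked_interleaving(void)
{
	struct group old = { .has_conns = 0 };
	struct group more = { .has_conns = 0 };
	struct sock sk = { .cb = &old };

	struct group *reuse = sk.cb;	/* cpu1: rcu_dereference() */
	more.has_conns = old.has_conns;	/* cpu2: copy, still 0 here */
	reuse->has_conns = 1;		/* cpu1: store hits the old group */
	sk.cb = &more;			/* cpu2: publish the new group */

	return sk.cb->has_conns;	/* the store from cpu1 is lost */
}
```

In this replay the function returns 0: the store went to the group that
was about to be replaced, so the connected state never reaches the
group that later lookups see.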

Note that the likely(reuse) test in reuseport_has_conns_set() is always
true, but we put it there for ease of review.  [0]

For the record, sk_reuseport_cb is usually changed under lock_sock().
The only exception is the reuseport_grow() & TCP reqsk migration case:

  1) shutdown() TCP listener, which is moved into the latter part of
     reuse->socks[] to migrate reqsk.

  2) New listen() overflows reuse->socks[] and calls reuseport_grow().

  3) reuse->max_socks overflows u16 with the new listener.

  4) reuseport_grow() pops the old shutdown()ed listener from the array
     and updates its sk->sk_reuseport_cb to NULL without lock_sock().

A shutdown()ed TCP socket's sk->sk_reuseport_cb can be changed without
lock_sock(), but reuseport_has_conns_set() is called only for UDP under
lock_sock(), so likely(reuse) is never false in
reuseport_has_conns_set().
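
Step 3 above can be pictured as a bounds check on the new array size.
The following is a minimal sketch, assuming the array size is doubled
on each grow (the text only says that max_socks overflows u16); the
helper name grow_would_overflow_u16() is hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: would doubling the socks[] capacity exceed what
 * a 16-bit max_socks field can hold?  Doubling is an assumption here.
 */
static bool grow_would_overflow_u16(uint16_t max_socks)
{
	/* Compute in unsigned int so the product itself cannot wrap. */
	unsigned int more_socks_size = max_socks * 2U;

	return more_socks_size > UINT16_MAX;
}
```

With a capacity of 32768 the doubled size is 65536, which no longer
fits in 16 bits, so under this assumption the group cannot simply grow
and an old shutdown()ed listener is popped instead, as in step 4.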

[0]: https://lore.kernel.org/netdev/CANn89iLja=eQHbsM_Ta2sQF0tOGU8vAGrh_izRuuHjuO1ouUag@mail.gmail.com/

Fixes: acdcecc612 ("udp: correct reuseport selection with connected sockets")
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20221014182625.89913-1-kuniyu@amazon.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
Signed-off-by: Sasha Levin <sashal@kernel.org>
2022-10-29 10:12:56 +02:00
bpf_sk_storage.c net: Fix data-races around sysctl_optmem_max. 2022-08-31 17:16:43 +02:00
datagram.c tcp: TX zerocopy should not sense pfmemalloc status 2022-09-15 11:30:05 +02:00
datagram.h
dev_addr_lists.c net: dev_addr_list: handle first address in __hw_addr_add_ex 2021-09-30 13:29:09 +01:00
dev_ioctl.c net: core: don't call SIOCBRADD/DELIF for non-bridge devices 2021-08-05 11:36:59 +01:00
dev.c bpf: Don't redirect packets with invalid pkt_len 2022-09-05 10:30:07 +02:00
devlink.c devlink: Fix use-after-free after a failed reload 2022-08-25 11:40:06 +02:00
drop_monitor.c net: skb: introduce kfree_skb_reason() 2022-07-29 17:25:15 +02:00
dst_cache.c wireguard: device: reset peer src endpoint when netns exits 2021-12-08 09:04:46 +01:00
dst.c net: Remove redundant if statements 2021-08-05 13:27:50 +01:00
failover.c
fib_notifier.c
fib_rules.c ipv6: fix memory leak in fib6_rule_suppress 2021-12-08 09:04:43 +01:00
filter.c net: Fix data-races around sysctl_optmem_max. 2022-08-31 17:16:43 +02:00
flow_dissector.c net: core: fix flow symmetric hash 2022-09-28 11:11:47 +02:00
flow_offload.c netfilter: nf_tables: bail out early if hardware offload is not supported 2022-06-14 18:36:17 +02:00
gen_estimator.c net_sched: gen_estimator: support large ewma log 2021-01-15 18:11:06 -08:00
gen_stats.c docs: networking: convert gen_stats.txt to ReST 2020-04-28 14:39:46 -07:00
gro_cells.c net: Fix data-races around netdev_max_backlog. 2022-08-31 17:16:42 +02:00
hwbm.c
link_watch.c net: Write lock dev_base_lock without disabling bottom halves. 2022-06-29 09:03:22 +02:00
lwt_bpf.c bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook 2022-05-09 09:14:35 +02:00
lwtunnel.c lwtunnel: Validate RTA_ENCAP_TYPE attribute length 2022-01-11 15:35:14 +01:00
Makefile of: net: move of_net under net/ 2022-03-08 19:12:41 +01:00
neighbour.c net: neigh: don't call kfree_skb() under spin_lock_irqsave() 2022-09-05 10:30:13 +02:00
net_namespace.c net: initialize init_net earlier 2022-04-13 20:59:03 +02:00
net-procfs.c net-procfs: show net devices bound packet types 2022-02-01 17:27:08 +01:00
net-sysfs.c net: fix data-race in dev_isalive() 2022-06-29 09:03:22 +02:00
net-sysfs.h net-sysfs: add netdev_change_owner() 2020-02-26 20:07:25 -08:00
net-traces.c tcp: add tracepoint for checksum errors 2021-05-14 15:26:03 -07:00
netclassid_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2021-09-13 16:35:58 -07:00
netevent.c net: core: Correct function name netevent_unregister_notifier() in the kerneldoc 2021-03-28 17:56:56 -07:00
netpoll.c asm-generic/unaligned: Unify asm/unaligned.h around struct helper 2021-07-02 12:43:40 -07:00
netprio_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2021-09-13 16:35:58 -07:00
of_net.c of: net: move of_net under net/ 2022-03-08 19:12:41 +01:00
page_pool.c page_pool: use relaxed atomic for release side accounting 2021-08-24 10:46:31 +01:00
pktgen.c pktgen: remove unused variable 2021-09-03 11:48:28 +01:00
ptp_classifier.c bpf: Refactor BPF_PROG_RUN into a function 2021-08-17 00:45:07 +02:00
request_sock.c
rtnetlink.c net: Write lock dev_base_lock without disabling bottom halves. 2022-06-29 09:03:22 +02:00
scm.c memcg: enable accounting for scm_fp_list objects 2021-07-20 06:00:38 -07:00
secure_seq.c tcp: Fix data-races around sysctl knobs related to SYN option. 2022-07-29 17:25:22 +02:00
selftests.c net: selftests: add MTU test 2021-07-22 00:52:04 -07:00
skbuff.c net/core/skbuff: Check the return value of skb_copy_bits() 2022-09-15 11:30:01 +02:00
skmsg.c skmsg: Schedule psock work if the cached skb exists on the psock 2022-10-26 12:34:46 +02:00
sock_destructor.h skb_expand_head() adjust skb->truesize incorrectly 2021-10-22 12:35:51 -07:00
sock_diag.c bpf, net: Rework cookie generator as per-cpu one 2020-09-30 11:50:35 -07:00
sock_map.c bpf: Acquire map uref in .init_seq_private for sock{map,hash} iterator 2022-08-25 11:40:03 +02:00
sock_reuseport.c udp: Update reuse->has_conns under reuseport_lock. 2022-10-29 10:12:56 +02:00
sock.c net: Fix a data-race around sysctl_net_busy_read. 2022-08-31 17:16:43 +02:00
stream.c net: If sock is dead don't access sock's sk_wq in sk_stream_wait_memory 2022-10-26 12:35:37 +02:00
sysctl_net_core.c net: Fix data-races around weight_p and dev_weight_[rt]x_bias. 2022-08-31 17:16:42 +02:00
timestamping.c
tso.c net: tso: add UDP segmentation support 2020-06-18 20:46:23 -07:00
utils.c net: Fix skb->csum update in inet_proto_csum_replace16(). 2020-01-24 20:54:30 +01:00
xdp.c xdp: Move the rxq_info.mem clearing to unreg_mem_model() 2021-06-28 23:07:59 +02:00