linux/tools
Lorenz Bauer 9c02bec959 bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign
Currently the bpf_sk_assign helper in tc BPF context refuses SO_REUSEPORT
sockets. This means we can't use the helper to steer traffic to Envoy,
which configures SO_REUSEPORT on its sockets. In turn, we're blocked
from removing TPROXY from our setup.

The reason that bpf_sk_assign refuses such sockets is that the
bpf_sk_lookup helpers don't execute SK_REUSEPORT programs. Instead,
one of the reuseport sockets is selected by hash. This could cause
dispatch to the "wrong" socket:

    sk = bpf_sk_lookup_tcp(...) // select SO_REUSEPORT by hash
    bpf_sk_assign(skb, sk) // SK_REUSEPORT wasn't executed

Fixing this isn't as simple as invoking SK_REUSEPORT from the lookup
helpers unfortunately. In the tc context, L2 headers are at the start
of the skb, while SK_REUSEPORT expects L3 headers instead.

Instead, we execute the SK_REUSEPORT program when the assigned socket
is pulled out of the skb, further up the stack. This creates some
trickiness with regards to refcounting as bpf_sk_assign will put both
refcounted and RCU freed sockets in skb->sk. reuseport sockets are RCU
freed. We can infer that the sk_assigned socket is RCU freed if the
reuseport lookup succeeds, but convincing yourself of this fact isn't
straight forward. Therefore we defensively check refcounting on the
sk_assign sock even though it's probably not required in practice.

Fixes: 8e368dc72e ("bpf: Fix use of sk->sk_reuseport from sk_assign")
Fixes: cf7fbe660f ("bpf: Add socket assign support")
Co-developed-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Cc: Joe Stringer <joe@cilium.io>
Link: https://lore.kernel.org/bpf/CACAyw98+qycmpQzKupquhkxbvWK4OFyDuuLMBNROnfWMZxUWeA@mail.gmail.com/
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Signed-off-by: Lorenz Bauer <lmb@isovalent.com>
Link: https://lore.kernel.org/r/20230720-so-reuseport-v6-7-7021b683cdae@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
2023-07-25 13:55:55 -07:00
..
accounting
arch tools headers arm64: Sync arm64's cputype.h with the kernel sources 2023-07-14 10:36:16 -03:00
bootconfig
bpf bpftool: Extend net dump with tcx progs 2023-07-19 10:07:28 -07:00
build Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-07-20 15:52:55 -07:00
certs
cgroup
counter
debugging
edid
firewire
firmware
gpio
hv
iio
include bpf, net: Support SO_REUSEPORT sockets with bpf_sk_assign 2023-07-25 13:55:55 -07:00
io_uring
kvm/kvm_stat
laptop
leds
lib Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-07-20 15:52:55 -07:00
memory-model
mm
net/ynl tools: ynl: add display-hint support to ynl 2023-06-24 15:45:49 -07:00
objtool objtool: initialize all of struct elf 2023-07-10 09:52:28 +02:00
pci
pcmcia
perf perf test task_exit: No need for a cycles event to check if we get an PERF_RECORD_EXIT 2023-07-17 10:27:44 -03:00
power platform-drivers-x86 for v6.5-1 2023-06-30 14:50:00 -07:00
rcu
scripts
spi spi: spidev_test Add three missing spi mode bits 2023-05-30 15:20:12 +01:00
testing connector/cn_proc: Selftest for proc connector 2023-07-23 11:34:22 +01:00
thermal
time
tracing rtla/timerlat_hist: Add timerlat user-space support 2023-06-13 16:41:14 -04:00
usb usbip: Use _FORTIFY_SOURCE=2 instead of (implicitly) =1 2023-05-29 15:11:30 +01:00
verification
virtio tools/virtio: fix build break for aarch64 2023-06-27 10:47:08 -04:00
wmi
workqueue
Makefile