linux/net/ipv4
John Fastabend 78fa0d61d9 bpf, sockmap: Pass skb ownership through read_skb
The read_skb hook currently calls consume_skb(), which means that if the
recv_actor program wants to keep using the skb it must increment the
refcount so that consume_skb() doesn't kfree the sk_buff.

This is problematic because in some error cases under memory pressure
we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue().
Then we get this,

 skb_linearize()
   __pskb_pull_tail()
     pskb_expand_head()
       BUG_ON(skb_shared(skb))

Because sk_psock_verdict_recv() incremented the users refcount, the
skb is seen as shared (refcnt > 1) and the BUG_ON trips.

To fix this, let's simply pass ownership of the sk_buff through the
read_skb call. Then we can drop the consume_skb() from the read_skb
handlers and assume the verdict recv does any required kfree.
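
The ownership change can be pictured with a small userspace model. This is only a sketch under stated assumptions: `struct toy_skb` and every `toy_*`, `actor_*` and `read_skb_*` helper below are hypothetical stand-ins, not kernel APIs, and the "linearize" step returns -1 where the kernel would hit BUG_ON(skb_shared(skb)).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct sk_buff: only the refcount matters here. */
struct toy_skb {
	int users;
};

static struct toy_skb *toy_alloc(void)
{
	struct toy_skb *skb = malloc(sizeof(*skb));

	if (skb)
		skb->users = 1;
	return skb;
}

static void toy_get(struct toy_skb *skb)
{
	skb->users++;
}

/* Drop one reference and free on the last one. Returns true if freed. */
static bool toy_put(struct toy_skb *skb)
{
	assert(skb->users > 0);
	if (--skb->users == 0) {
		free(skb);
		return true;
	}
	return false;
}

/* Models the shared check: fails (-1) where the kernel would BUG. */
static int toy_linearize(const struct toy_skb *skb)
{
	return skb->users > 1 ? -1 : 0;
}

/* Old model: the actor takes an extra reference because the handler
 * will drop its own reference afterwards. The toy actor releases its
 * reference right away so that the example does not leak. */
static int actor_old(struct toy_skb *skb)
{
	int ret;

	toy_get(skb);
	ret = toy_linearize(skb);	/* users == 2: treated as shared */
	toy_put(skb);
	return ret;
}

static int read_skb_old(struct toy_skb *skb)
{
	int ret = actor_old(skb);

	toy_put(skb);			/* handler consumes the skb */
	return ret;
}

/* New model: ownership is passed to the actor, which does the free. */
static int actor_new(struct toy_skb *skb)
{
	int ret = toy_linearize(skb);	/* users == 1: not shared */

	toy_put(skb);
	return ret;
}

static int read_skb_new(struct toy_skb *skb)
{
	return actor_new(skb);		/* no consume in the handler */
}
```

Calling read_skb_old() on a freshly allocated toy skb reports the shared condition, while read_skb_new() linearizes cleanly; in both cases the skb is freed exactly once.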

Bug found while testing in our CI which runs in VMs that hit memory
constraints rather regularly. William tested TCP read_skb handlers.

[  106.536188] ------------[ cut here ]------------
[  106.536197] kernel BUG at net/core/skbuff.c:1693!
[  106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1
[  106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014
[  106.537467] RIP: 0010:pskb_expand_head+0x269/0x330
[  106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202
[  106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20
[  106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8
[  106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000
[  106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8
[  106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8
[  106.540568] FS:  00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[  106.540954] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0
[  106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  106.542255] Call Trace:
[  106.542383]  <IRQ>
[  106.542487]  __pskb_pull_tail+0x4b/0x3e0
[  106.542681]  skb_ensure_writable+0x85/0xa0
[  106.542882]  sk_skb_pull_data+0x18/0x20
[  106.543084]  bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9
[  106.543536]  ? migrate_disable+0x66/0x80
[  106.543871]  sk_psock_verdict_recv+0xe2/0x310
[  106.544258]  ? sk_psock_write_space+0x1f0/0x1f0
[  106.544561]  tcp_read_skb+0x7b/0x120
[  106.544740]  tcp_data_queue+0x904/0xee0
[  106.544931]  tcp_rcv_established+0x212/0x7c0
[  106.545142]  tcp_v4_do_rcv+0x174/0x2a0
[  106.545326]  tcp_v4_rcv+0xe70/0xf60
[  106.545500]  ip_protocol_deliver_rcu+0x48/0x290
[  106.545744]  ip_local_deliver_finish+0xa7/0x150

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Reported-by: William Findlay <will@isovalent.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-2-john.fastabend@gmail.com
2023-05-23 16:09:47 +02:00
bpfilter
netfilter xtables: move icmp/icmpv6 logic to xt_tcpudp 2023-03-22 21:48:59 +01:00
af_inet.c tcp: add annotations around sk->sk_shutdown accesses 2023-05-10 10:27:31 +01:00
ah4.c net: ipv4: Remove completion function scaffolding 2023-02-13 18:35:15 +08:00
arp.c neighbour: annotate lockless accesses to n->nud_state 2023-03-15 00:37:32 -07:00
bpf_tcp_ca.c bpf: Remove unused arguments from btf_struct_access(). 2023-04-04 16:57:10 -07:00
cipso_ipv4.c cipso_ipv4: use iph_set_totlen in skbuff_setattr 2023-02-01 20:54:27 -08:00
datagram.c Networking fixes for 6.1-rc2, including fixes from netfilter 2022-10-20 17:24:59 -07:00
devinet.c net: ipv4: Allow changing IPv4 address protocol 2023-03-23 08:32:52 +00:00
esp4_offload.c xfrm: replay: Fix ESN wrap around for GSO 2022-10-19 09:00:53 +02:00
esp4.c net: ipv4: Remove completion function scaffolding 2023-02-13 18:35:15 +08:00
fib_frontend.c ipv4: Fix incorrect table ID in IOCTL path 2023-03-16 17:26:31 -07:00
fib_lookup.h Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-02-17 11:44:20 -08:00
fib_notifier.c net: ipv4: remove superfluous header files from fib_notifier.c 2021-09-28 17:32:56 -07:00
fib_rules.c ipv4: remove unnecessary type castings 2022-04-30 15:12:58 +01:00
fib_semantics.c neighbour: switch to standard rcu, instead of rcu_bh 2023-03-21 21:32:18 -07:00
fib_trie.c ipv4: Fix error return code in fib_table_insert() 2022-11-22 20:18:20 -08:00
fou_bpf.c bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
fou_core.c bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
fou_nl.c ynl: broaden the license even more 2023-03-16 21:20:32 -07:00
fou_nl.h ynl: broaden the license even more 2023-03-16 21:20:32 -07:00
gre_demux.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
gre_offload.c net: gro: skb_gro_header helper function 2022-08-25 10:33:21 +02:00
icmp.c icmp: guard against too small mtu 2023-03-31 21:37:06 -07:00
igmp.c ipv4: constify ip_mc_sf_allow() socket argument 2023-03-17 08:56:37 +00:00
inet_connection_sock.c net/ulp: Remove redundant ->clone() test in inet_clone_ulp(). 2023-02-20 16:31:49 -08:00
inet_diag.c net: inet: Retire port only listening_hash 2022-05-12 16:52:18 -07:00
inet_fragment.c net: dropreason: add SKB_DROP_REASON_FRAG_REASM_TIMEOUT 2022-10-31 20:14:27 -07:00
inet_hashtables.c ipv6: Remove in6addr_any alternatives. 2023-03-29 08:22:52 +01:00
inet_timewait_sock.c net: no longer support SOCK_REFCNT_DEBUG feature 2023-02-15 10:25:21 +00:00
inetpeer.c inetpeer: Fix data-races around sysctl. 2022-07-08 12:10:33 +01:00
ip_forward.c ip: Fix data-races around sysctl_ip_fwd_update_priority. 2022-07-15 11:49:55 +01:00
ip_fragment.c net: dropreason: add SKB_DROP_REASON_FRAG_TOO_FAR 2022-10-31 20:14:27 -07:00
ip_gre.c erspan: do not use skb_mac_header() in ndo_start_xmit() 2023-03-21 21:16:26 -07:00
ip_input.c net: add support for ipv4 big tcp 2023-02-01 20:54:27 -08:00
ip_options.c ipv4: drop fragmentation code from ip_options_build() 2022-01-29 17:53:07 +00:00
ip_output.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-26 10:17:46 +02:00
ip_sockglue.c inet: Add IP_LOCAL_PORT_RANGE socket option 2023-01-25 22:45:00 -08:00
ip_tunnel_core.c net: Add helper function to parse netlink msg of ip_tunnel_parm 2022-10-03 07:59:06 +01:00
ip_tunnel.c bpf-next-for-netdev 2023-04-13 16:43:38 -07:00
ip_vti.c ipv4: tunnels: use DEV_STATS_INC() 2022-11-16 12:48:44 +00:00
ipcomp.c xfrm: ipcomp: add extack to ipcomp{4,6}_init_state 2022-09-29 07:18:00 +02:00
ipconfig.c Driver core / kernfs changes for 6.0-rc1 2022-08-04 11:31:20 -07:00
ipip.c ipip,ip_tunnel,sit: Add FOU support for externally controlled ipip devices 2023-04-12 16:40:39 -07:00
ipmr_base.c ipmr: adopt rcu_read_lock() in mr_dump() 2022-06-24 11:34:38 +01:00
ipmr.c treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
Kconfig tcp: configurable source port perturb table size 2022-11-16 13:02:04 +00:00
Makefile bpf,fou: Add bpf_skb_{set,get}_fou_encap kfuncs 2023-04-12 16:40:39 -07:00
metrics.c ipv4: prevent potential spectre v1 gadget in ip_metrics_convert() 2023-01-23 21:37:25 -08:00
netfilter.c netfilter: Use l3mdev flow key when re-routing mangled packets 2022-05-16 13:03:29 +02:00
netlink.c
nexthop.c neighbour: switch to standard rcu, instead of rcu_bh 2023-03-21 21:32:18 -07:00
ping.c ping: Fix potentail NULL deref for /proc/net/icmp. 2023-04-04 18:56:58 -07:00
proc.c icmp: Add counters for rate limits 2023-01-26 10:52:18 +01:00
protocol.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
raw_diag.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-06 12:01:20 -07:00
raw.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-06 12:01:20 -07:00
route.c net: dst: fix missing initialization of rt_uncached 2023-04-21 20:26:56 -07:00
syncookies.c mptcp: remove MPTCP 'ifdef' in TCP SYN cookies 2022-12-12 13:11:24 -08:00
sysctl_net_ipv4.c tcp: restrict net.ipv4.tcp_app_win 2023-04-07 08:19:11 +01:00
tcp_bbr.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_bic.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_bpf.c net: deal with most data-races in sk_wait_event() 2023-05-10 10:03:32 +01:00
tcp_cdg.c Random number generator fixes for Linux 6.1-rc1. 2022-10-16 15:27:07 -07:00
tcp_cong.c net: Update an existing TCP congestion control algorithm. 2023-03-22 22:53:00 -07:00
tcp_cubic.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_dctcp.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
tcp_dctcp.h
tcp_diag.c tcp: Access &tcp_hashinfo via net. 2022-09-20 10:21:49 -07:00
tcp_fastopen.c tcp: Make SYN ACK RTO tunable by BPF programs with TFO 2022-08-17 10:19:22 +01:00
tcp_highspeed.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_htcp.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_hybla.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_illinois.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_input.c tcp: add annotations around sk->sk_shutdown accesses 2023-05-10 10:27:31 +01:00
tcp_ipv4.c tcp: fix possible sk_priority leak in tcp_v4_send_reset() 2023-05-12 10:05:50 +01:00
tcp_lp.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_metrics.c genetlink: start to validate reserved header bytes 2022-08-29 12:47:15 +01:00
tcp_minisocks.c tcp: preserve const qualifier in tcp_sk() 2023-03-18 12:23:34 +00:00
tcp_nv.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_offload.c gro: add support of (hw)gro packets to gro stack 2022-10-03 12:38:34 +01:00
tcp_output.c tcp: preserve const qualifier in tcp_sk() 2023-03-18 12:23:34 +00:00
tcp_plb.c prandom: remove prandom_u32_max() 2022-12-20 03:13:45 +01:00
tcp_rate.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-04-28 13:02:01 -07:00
tcp_recovery.c tcp: preserve const qualifier in tcp_sk() 2023-03-18 12:23:34 +00:00
tcp_scalable.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_timer.c tcp: annotate lockless access to sk->sk_err 2023-03-17 08:25:05 +00:00
tcp_ulp.c net/ulp: use consistent error code when blocking ULP 2023-01-19 09:26:16 -08:00
tcp_vegas.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_vegas.h
tcp_veno.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_westwood.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp_yeah.c tcp: add accessors to read/set tp->snd_cwnd 2022-04-06 12:05:41 -07:00
tcp.c bpf, sockmap: Pass skb ownership through read_skb 2023-05-23 16:09:47 +02:00
tunnel4.c net: Remove the member netns_ok 2021-05-17 15:29:35 -07:00
udp_bpf.c bpf, sockmap: Fix an infinite loop error when len is 0 in tcp_bpf_recvmsg_parser() 2023-03-03 17:25:15 +01:00
udp_diag.c udp: Access &udp_table via net. 2022-11-16 09:43:35 +00:00
udp_impl.h net: remove noblock parameter from recvmsg() entities 2022-04-12 15:00:25 +02:00
udp_offload.c udp: allow header check for dodgy GSO_UDP_L4 packets. 2022-12-12 09:29:56 +00:00
udp_tunnel_core.c net/tunnel: wait until all sk_user_data reader finish before releasing the sock 2022-12-12 09:51:52 +00:00
udp_tunnel_nic.c udp_tunnel: Add checks for nla_nest_start() in __udp_tunnel_nic_dump_write() 2022-11-29 08:44:24 -08:00
udp_tunnel_stub.c
udp.c bpf, sockmap: Pass skb ownership through read_skb 2023-05-23 16:09:47 +02:00
udplite.c tcp/udp: Call inet6_destroy_sock() in IPv6 sk->sk_destruct(). 2022-10-12 17:50:37 -07:00
xfrm4_input.c
xfrm4_output.c
xfrm4_policy.c net: dst: fix missing initialization of rt_uncached 2023-04-21 20:26:56 -07:00
xfrm4_protocol.c net: xfrm: unexport __init-annotated xfrm4_protocol_init() 2022-06-08 10:10:13 -07:00
xfrm4_state.c
xfrm4_tunnel.c xfrm: tunnel: add extack to ipip_init_state, xfrm6_tunnel_init_state 2022-09-29 07:18:00 +02:00