linux/net/core
John Fastabend 78fa0d61d9 bpf, sockmap: Pass skb ownership through read_skb
The read_skb hook calls consume_skb() now, but this means that if the
recv_actor program wants to use the skb it needs to inc the ref cnt
so that the consume_skb() doesn't kfree the sk_buff.

This is problematic because in some error cases under memory pressure
we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue().
Then we get this,

 skb_linearize()
   __pskb_pull_tail()
     pskb_expand_head()
       BUG_ON(skb_shared(skb))

Because we incremented users refcnt from sk_psock_verdict_recv() we
hit the bug on with refcnt > 1 and trip it.

To fix lets simply pass ownership of the sk_buff through the skb_read
call. Then we can drop the consume from read_skb handlers and assume
the verdict recv does any required kfree.

Bug found while testing in our CI which runs in VMs that hit memory
constraints rather regularly. William tested TCP read_skb handlers.

[  106.536188] ------------[ cut here ]------------
[  106.536197] kernel BUG at net/core/skbuff.c:1693!
[  106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1
[  106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014
[  106.537467] RIP: 0010:pskb_expand_head+0x269/0x330
[  106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202
[  106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20
[  106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8
[  106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000
[  106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8
[  106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8
[  106.540568] FS:  00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[  106.540954] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0
[  106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  106.542255] Call Trace:
[  106.542383]  <IRQ>
[  106.542487]  __pskb_pull_tail+0x4b/0x3e0
[  106.542681]  skb_ensure_writable+0x85/0xa0
[  106.542882]  sk_skb_pull_data+0x18/0x20
[  106.543084]  bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9
[  106.543536]  ? migrate_disable+0x66/0x80
[  106.543871]  sk_psock_verdict_recv+0xe2/0x310
[  106.544258]  ? sk_psock_write_space+0x1f0/0x1f0
[  106.544561]  tcp_read_skb+0x7b/0x120
[  106.544740]  tcp_data_queue+0x904/0xee0
[  106.544931]  tcp_rcv_established+0x212/0x7c0
[  106.545142]  tcp_v4_do_rcv+0x174/0x2a0
[  106.545326]  tcp_v4_rcv+0xe70/0xf60
[  106.545500]  ip_protocol_deliver_rcu+0x48/0x290
[  106.545744]  ip_local_deliver_finish+0xa7/0x150

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Reported-by: William Findlay <will@isovalent.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-2-john.fastabend@gmail.com
2023-05-23 16:09:47 +02:00
..
bpf_sk_storage.c bpf: Teach verifier that certain helpers accept NULL pointer. 2023-04-04 16:57:16 -07:00
datagram.c net: datagram: fix data-races in datagram_poll() 2023-05-10 19:06:49 -07:00
dev_addr_lists_test.c kunit: Use KUNIT_EXPECT_MEMEQ macro 2022-10-27 02:40:14 -06:00
dev_addr_lists.c net: extract a few internals from netdevice.h 2022-04-07 20:32:09 -07:00
dev_ioctl.c net: dsa: replace NETDEV_PRE_CHANGE_HWTSTAMP notifier with a stub 2023-04-09 15:35:49 +01:00
dev.c net: add vlan_get_protocol_and_depth() helper 2023-05-10 10:25:55 +01:00
dev.h net-sysctl: factor-out rpm mask manipulation helpers 2023-02-09 17:45:55 -08:00
drop_monitor.c net: extend drop reasons for multiple subsystems 2023-04-20 20:20:49 -07:00
dst_cache.c wireguard: device: reset peer src endpoint when netns exits 2021-11-29 19:50:45 -08:00
dst.c net: dst: fix missing initialization of rt_uncached 2023-04-21 20:26:56 -07:00
failover.c net: failover: use IFF_NO_ADDRCONF flag to prevent ipv6 addrconf 2022-12-12 15:18:25 -08:00
fib_notifier.c
fib_rules.c fib: expand fib_rule_policy 2021-12-16 07:18:35 -08:00
filter.c bpf: minimal support for programs hooked into netfilter framework 2023-04-21 11:34:14 -07:00
flow_dissector.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-11-29 13:04:52 -08:00
flow_offload.c net: flow_offload: add support for ARP frame matching 2022-11-14 11:24:16 +00:00
gen_estimator.c treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
gen_stats.c net: Remove the obsolte u64_stats_fetch_*_irq() users (net). 2022-10-28 20:13:54 -07:00
gro_cells.c net: drop the weight argument from netif_napi_add 2022-09-28 18:57:14 -07:00
gro.c net: skbuff: update and rename __kfree_skb_defer() 2023-04-20 19:25:08 -07:00
hwbm.c
link_watch.c net: linkwatch: only report IF_OPER_LOWERLAYERDOWN if iflink is actually down 2022-11-16 09:45:00 +00:00
lwt_bpf.c bpf, lwt: Fix crash when using bpf_skb_set_tunnel_key() from bpf_xmit lwt hook 2022-04-22 17:45:25 +02:00
lwtunnel.c xfrm: lwtunnel: squelch kernel warning in case XFRM encap type is not available 2022-10-12 10:45:51 +02:00
Makefile netdev-genl: create a simple family for netdev stuff 2023-02-02 20:48:23 -08:00
neighbour.c neighbour: switch to standard rcu, instead of rcu_bh 2023-03-21 21:32:18 -07:00
net_namespace.c kill the last remaining user of proc_ns_fget() 2023-04-20 22:55:35 -04:00
net-procfs.c net-sysfs: display two backlog queue len separately 2023-03-22 12:03:52 +01:00
net-sysfs.c net: make default_rps_mask a per netns attribute 2023-02-20 11:22:54 +00:00
net-sysfs.h
net-traces.c net: bridge: Add a tracepoint for MDB overflows 2023-02-06 08:48:25 +00:00
netclassid_cgroup.c core: Variable type completion 2022-08-31 09:40:34 +01:00
netdev-genl-gen.c tools: ynl: skip the explicit op array size when not needed 2023-03-21 21:45:31 -07:00
netdev-genl-gen.h ynl: broaden the license even more 2023-03-16 21:20:32 -07:00
netdev-genl.c netdev-genl: create a simple family for netdev stuff 2023-02-02 20:48:23 -08:00
netevent.c net: core: Correct function name netevent_unregister_notifier() in the kerneldoc 2021-03-28 17:56:56 -07:00
netpoll.c net: don't let netpoll invoke NAPI if in xmit context 2023-04-02 13:26:21 +01:00
netprio_cgroup.c bpf, cgroups: Fix cgroup v2 fallback on v1/v2 mixed mode 2021-09-13 16:35:58 -07:00
of_net.c of: net: export of_get_mac_address_nvmem() 2022-11-29 10:45:53 +01:00
page_pool.c page_pool: unlink from napi during destroy 2023-04-20 19:13:37 -07:00
pktgen.c treewide: use get_random_u32_inclusive() when possible 2022-11-18 02:18:02 +01:00
ptp_classifier.c ptp: Add generic PTP is_sync() function 2022-03-07 11:31:34 +00:00
request_sock.c
rtnetlink.c bridge: Allow setting per-{Port, VLAN} neighbor suppression state 2023-04-21 08:25:50 +01:00
scm.c net: Ensure ->msg_control_user is used for user buffers 2023-04-14 11:09:27 +01:00
secure_seq.c tcp: Fix data-races around sysctl knobs related to SYN option. 2022-07-20 10:14:49 +01:00
selftests.c net: core: constify mac addrs in selftests 2021-10-24 13:59:44 +01:00
skbuff.c net: skb_partial_csum_set() fix against transport header magic value 2023-05-07 14:55:37 +01:00
skmsg.c bpf, sockmap: Pass skb ownership through read_skb 2023-05-23 16:09:47 +02:00
sock_destructor.h skb_expand_head() adjust skb->truesize incorrectly 2021-10-22 12:35:51 -07:00
sock_diag.c net: fix __sock_gen_cookie() 2022-11-21 20:36:30 -08:00
sock_map.c bpf, sockmap: Revert buggy deadlock fix in the sockhash and sockmap 2023-04-13 20:36:32 +02:00
sock_reuseport.c soreuseport: Fix socket selection for SO_INCOMING_CPU. 2022-10-25 11:35:16 +02:00
sock.c net: make SO_BUSY_POLL available to all users 2023-04-07 20:07:07 -07:00
stream.c net: deal with most data-races in sk_wait_event() 2023-05-10 10:03:32 +01:00
sysctl_net_core.c net/sysctl: Rename kvfree_rcu() to kvfree_rcu_mightsleep() 2023-04-05 13:48:04 +00:00
timestamping.c
tso.c net: tso: inline tso_count_descs() 2022-12-12 15:04:39 -08:00
utils.c net: core: inet[46]_pton strlen len types 2022-11-01 21:14:39 -07:00
xdp.c bpf-next-for-netdev 2023-04-13 16:43:38 -07:00