linux/net
John Fastabend 78fa0d61d9 bpf, sockmap: Pass skb ownership through read_skb
The read_skb hook calls consume_skb() now, but this means that if the
recv_actor program wants to use the skb it needs to inc the ref cnt
so that the consume_skb() doesn't kfree the sk_buff.

This is problematic because in some error cases under memory pressure
we may need to linearize the sk_buff from sk_psock_skb_ingress_enqueue().
Then we get this,

 skb_linearize()
   __pskb_pull_tail()
     pskb_expand_head()
       BUG_ON(skb_shared(skb))

Because we incremented users refcnt from sk_psock_verdict_recv() we
hit the bug on with refcnt > 1 and trip it.

To fix lets simply pass ownership of the sk_buff through the skb_read
call. Then we can drop the consume from read_skb handlers and assume
the verdict recv does any required kfree.

Bug found while testing in our CI which runs in VMs that hit memory
constraints rather regularly. William tested TCP read_skb handlers.

[  106.536188] ------------[ cut here ]------------
[  106.536197] kernel BUG at net/core/skbuff.c:1693!
[  106.536479] invalid opcode: 0000 [#1] PREEMPT SMP PTI
[  106.536726] CPU: 3 PID: 1495 Comm: curl Not tainted 5.19.0-rc5 #1
[  106.537023] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS ArchLinux 1.16.0-1 04/01/2014
[  106.537467] RIP: 0010:pskb_expand_head+0x269/0x330
[  106.538585] RSP: 0018:ffffc90000138b68 EFLAGS: 00010202
[  106.538839] RAX: 000000000000003f RBX: ffff8881048940e8 RCX: 0000000000000a20
[  106.539186] RDX: 0000000000000002 RSI: 0000000000000000 RDI: ffff8881048940e8
[  106.539529] RBP: ffffc90000138be8 R08: 00000000e161fd1a R09: 0000000000000000
[  106.539877] R10: 0000000000000018 R11: 0000000000000000 R12: ffff8881048940e8
[  106.540222] R13: 0000000000000003 R14: 0000000000000000 R15: ffff8881048940e8
[  106.540568] FS:  00007f277dde9f00(0000) GS:ffff88813bd80000(0000) knlGS:0000000000000000
[  106.540954] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  106.541227] CR2: 00007f277eeede64 CR3: 000000000ad3e000 CR4: 00000000000006e0
[  106.541569] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  106.541915] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  106.542255] Call Trace:
[  106.542383]  <IRQ>
[  106.542487]  __pskb_pull_tail+0x4b/0x3e0
[  106.542681]  skb_ensure_writable+0x85/0xa0
[  106.542882]  sk_skb_pull_data+0x18/0x20
[  106.543084]  bpf_prog_b517a65a242018b0_bpf_skskb_http_verdict+0x3a9/0x4aa9
[  106.543536]  ? migrate_disable+0x66/0x80
[  106.543871]  sk_psock_verdict_recv+0xe2/0x310
[  106.544258]  ? sk_psock_write_space+0x1f0/0x1f0
[  106.544561]  tcp_read_skb+0x7b/0x120
[  106.544740]  tcp_data_queue+0x904/0xee0
[  106.544931]  tcp_rcv_established+0x212/0x7c0
[  106.545142]  tcp_v4_do_rcv+0x174/0x2a0
[  106.545326]  tcp_v4_rcv+0xe70/0xf60
[  106.545500]  ip_protocol_deliver_rcu+0x48/0x290
[  106.545744]  ip_local_deliver_finish+0xa7/0x150

Fixes: 04919bed94 ("tcp: Introduce tcp_read_skb()")
Reported-by: William Findlay <will@isovalent.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: William Findlay <will@isovalent.com>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/bpf/20230523025618.113937-2-john.fastabend@gmail.com
2023-05-23 16:09:47 +02:00
..
6lowpan 6lowpan: Remove redundant initialisation. 2023-03-29 08:22:52 +01:00
9p Including fixes from netfilter. 2023-05-05 19:12:01 -07:00
802 treewide: Convert del_timer*() to timer_shutdown*() 2022-12-25 13:38:09 -08:00
8021q vlan: Add MACsec offload operations for VLAN interface 2023-04-21 08:22:14 +01:00
appletalk
atm Networking changes for 6.4. 2023-04-26 16:07:23 -07:00
ax25
batman-adv net: vlan: introduce skb_vlan_eth_hdr() 2023-04-23 14:16:44 +01:00
bluetooth Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
bpf bpf: add test_run support for netfilter program type 2023-04-21 11:34:50 -07:00
bpfilter
bridge net: add vlan_get_protocol_and_depth() helper 2023-05-10 10:25:55 +01:00
caif net: caif: Fix use-after-free in cfusbl_device_notify() 2023-03-02 22:22:07 -08:00
can Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-06 12:01:20 -07:00
ceph Networking changes for 6.3. 2023-02-21 18:24:12 -08:00
core bpf, sockmap: Pass skb ownership through read_skb 2023-05-23 16:09:47 +02:00
dcb net: dcb: add helper functions to retrieve PCP and DSCP rewrite maps 2023-01-20 09:33:22 +00:00
dccp netfilter: keep conntrack reference until IPsecv6 policy checks are done 2023-03-22 21:50:23 +01:00
devlink devlink: change per-devlink netdev notifier to static one 2023-05-11 18:06:50 -07:00
dns_resolver
dsa net: dsa: tag_ocelot: call only the relevant portion of __skb_vlan_pop() on TX 2023-04-23 14:16:45 +01:00
ethernet
ethtool ethtool: Fix uninitialized number of lanes 2023-05-03 09:13:20 +01:00
handshake net/handshake: Fix section mismatch in handshake_exit 2023-04-21 20:24:57 -07:00
hsr hsr: ratelimit only when errors are printed 2023-03-16 21:11:03 -07:00
ieee802154 net: ieee802154: remove an unnecessary null pointer check 2023-03-17 09:13:53 +01:00
ife
ipv4 bpf, sockmap: Pass skb ownership through read_skb 2023-05-23 16:09:47 +02:00
ipv6 erspan: get the proto with the md version for collect_md 2023-05-13 16:58:58 +01:00
iucv net/iucv: Fix size of interrupt data 2023-03-16 17:34:40 -07:00
kcm net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
key af_key: Fix heap information leak 2023-02-13 09:30:14 +00:00
l2tp l2tp: generate correct module alias strings 2023-03-31 09:25:12 +01:00
l3mdev
lapb
llc net: deal with most data-races in sk_wait_event() 2023-05-10 10:03:32 +01:00
mac80211 wireless-next patches for v6.4 2023-04-21 07:35:51 -07:00
mac802154 mac802154: Rename kfree_rcu() to kvfree_rcu_mightsleep() 2023-04-05 13:48:04 +00:00
mctp mctp: remove MODULE_LICENSE in non-modules 2023-03-09 23:06:21 -08:00
mpls net: mpls: fix stale pointer if allocation fails during device rename 2023-02-15 10:26:37 +00:00
mptcp Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2023-04-20 16:29:51 -07:00
ncsi net/ncsi: clear Tx enable mode when handling a Config required AEN 2023-04-28 09:35:33 +01:00
netfilter netfilter: conntrack: fix possible bug_on with enable_hooks=1 2023-05-10 08:50:39 +02:00
netlabel
netlink netlink: annotate accesses to nlk->cb_running 2023-05-10 09:28:38 +01:00
netrom netrom: Fix use-after-free caused by accept on already connected socket 2023-01-30 07:30:47 +00:00
nfc nfc: change order inside nfc_se_io error path 2023-03-07 13:37:05 -08:00
nsh
openvswitch net: openvswitch: fix race on port output 2023-04-07 19:42:53 -07:00
packet net: add vlan_get_protocol_and_depth() helper 2023-05-10 10:25:55 +01:00
phonet net/sock: Introduce trace_sk_data_ready() 2023-01-23 11:26:50 +00:00
psample
qrtr net: qrtr: Fix an uninit variable access bug in qrtr_tx_resume() 2023-04-13 09:35:30 +02:00
rds rds: rds_rm_zerocopy_callback() correct order for list_add_tail() 2023-02-13 09:33:39 +00:00
rfkill net: rfkill-gpio: Add explicit include for of.h 2023-04-06 20:36:27 +02:00
rose net/rose: Fix to not accept on connected socket 2023-01-28 00:19:57 -08:00
rxrpc Including fixes from netfilter. 2023-05-05 19:12:01 -07:00
sched net/sched: flower: fix error handler on replace 2023-05-05 10:01:31 +01:00
sctp sctp: delete the nested flexible array hmac 2023-04-21 08:19:30 +01:00
smc net: deal with most data-races in sk_wait_event() 2023-05-10 10:03:32 +01:00
strparser
sunrpc NFSD 6.4 Release Notes 2023-04-29 11:04:14 -07:00
switchdev
tipc net: deal with most data-races in sk_wait_event() 2023-05-10 10:03:32 +01:00
tls net: deal with most data-races in sk_wait_event() 2023-05-10 10:03:32 +01:00
unix bpf, sockmap: Pass skb ownership through read_skb 2023-05-23 16:09:47 +02:00
vmw_vsock bpf, sockmap: Pass skb ownership through read_skb 2023-05-23 16:09:47 +02:00
wireless Driver core changes for 6.4-rc1 2023-04-27 11:53:57 -07:00
x25 net/x25: Fix to not accept on connected socket 2023-01-25 09:51:04 +00:00
xdp bpf-next-for-netdev 2023-04-13 16:43:38 -07:00
xfrm ipsec-next-2023-04-19 2023-04-19 18:46:17 -07:00
compat.c net/compat: Update msg_control_is_user when setting a kernel pointer 2023-04-14 11:09:27 +01:00
devres.c
Kconfig net/handshake: Add Kunit tests for the handshake consumer API 2023-04-19 18:48:48 -07:00
Kconfig.debug
Makefile net/handshake: Create a NETLINK service for handling handshake requests 2023-04-19 18:48:48 -07:00
socket.c net: annotate sk->sk_err write from do_recvmmsg() 2023-05-10 09:58:29 +01:00
sysctl_net.c