linux/kernel
Yonghong Song a2baf4e8bb bpf: Fix potentially incorrect results with bpf_get_local_storage()
Commit b910eaaaa4 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage()
helper") fixed a bug for bpf_get_local_storage() helper so different tasks
won't mess up with each other's percpu local storage.

The percpu data contains 8 slots so it can hold up to 8 contexts (same or
different tasks), for 8 different program runs, at the same time. This in
general is sufficient. But our internal testing showed the following warning
multiple times:

  [...]
  warning: WARNING: CPU: 13 PID: 41661 at include/linux/bpf-cgroup.h:193
     __cgroup_bpf_run_filter_sock_ops+0x13e/0x180
  RIP: 0010:__cgroup_bpf_run_filter_sock_ops+0x13e/0x180
  <IRQ>
   tcp_call_bpf.constprop.99+0x93/0xc0
   tcp_conn_request+0x41e/0xa50
   ? tcp_rcv_state_process+0x203/0xe00
   tcp_rcv_state_process+0x203/0xe00
   ? sk_filter_trim_cap+0xbc/0x210
   ? tcp_v6_inbound_md5_hash.constprop.41+0x44/0x160
   tcp_v6_do_rcv+0x181/0x3e0
   tcp_v6_rcv+0xc65/0xcb0
   ip6_protocol_deliver_rcu+0xbd/0x450
   ip6_input_finish+0x11/0x20
   ip6_input+0xb5/0xc0
   ip6_sublist_rcv_finish+0x37/0x50
   ip6_sublist_rcv+0x1dc/0x270
   ipv6_list_rcv+0x113/0x140
   __netif_receive_skb_list_core+0x1a0/0x210
   netif_receive_skb_list_internal+0x186/0x2a0
   gro_normal_list.part.170+0x19/0x40
   napi_complete_done+0x65/0x150
   mlx5e_napi_poll+0x1ae/0x680
   __napi_poll+0x25/0x120
   net_rx_action+0x11e/0x280
   __do_softirq+0xbb/0x271
   irq_exit_rcu+0x97/0xa0
   common_interrupt+0x7f/0xa0
   </IRQ>
   asm_common_interrupt+0x1e/0x40
  RIP: 0010:bpf_prog_1835a9241238291a_tw_egress+0x5/0xbac
   ? __cgroup_bpf_run_filter_skb+0x378/0x4e0
   ? do_softirq+0x34/0x70
   ? ip6_finish_output2+0x266/0x590
   ? ip6_finish_output+0x66/0xa0
   ? ip6_output+0x6c/0x130
   ? ip6_xmit+0x279/0x550
   ? ip6_dst_check+0x61/0xd0
  [...]

Using drgn [0] to dump the percpu buffer contents showed that on this CPU
slot 0 is still available, but slots 1-7 are occupied and those tasks in
slots 1-7 mostly don't exist any more. So we might have issues in
bpf_cgroup_storage_unset().

Further debugging confirmed that there is a bug in bpf_cgroup_storage_unset().
Currently, it tries to unset "current" slot with searching from the start.
So the following sequence is possible:

  1. A task is running and claims slot 0
  2. Running BPF program is done, and it checked slot 0 has the "task"
     and ready to reset it to NULL (not yet).
  3. An interrupt happens, another BPF program runs and it claims slot 1
     with the *same* task.
  4. The unset() in interrupt context releases slot 0 since it matches "task".
  5. Interrupt is done, the task in process context reset slot 0.

At the end, slot 1 is not reset and the same process can continue to occupy
slots 2-7 and finally, when the above step 1-5 is repeated again, step 3 BPF
program won't be able to claim an empty slot and a warning will be issued.

To fix the issue, for unset() function, we should traverse from the last slot
to the first. This way, the above issue can be avoided.

The same reverse traversal should also be done in bpf_get_local_storage() helper
itself. Otherwise, incorrect local storage may be returned to BPF program.

  [0] https://github.com/osandov/drgn

Fixes: b910eaaaa4 ("bpf: Fix NULL pointer dereference in bpf_get_local_storage() helper")
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20210810010413.1976277-1-yhs@fb.com
2021-08-10 10:27:16 +02:00
..
bpf bpf: Fix potentially incorrect results with bpf_get_local_storage() 2021-08-10 10:27:16 +02:00
cgroup cgroup1: fix leaked context root causing sporadic NULL deref in LTP 2021-07-21 06:39:20 -10:00
configs drivers/char: remove /dev/kmem for good 2021-05-07 00:26:34 -07:00
debug kernel: debug: Fix unreachable code in gdb_serial_stub() 2021-07-12 11:03:35 -05:00
dma dma-mapping: handle vmalloc addresses in dma_common_{mmap,get_sgtable} 2021-07-16 11:30:26 +02:00
entry tick/nohz: Only check for RCU deferred wakeup on user/guest entry when needed 2021-05-31 10:14:49 +02:00
events Merge branch 'akpm' (patches from Andrew) 2021-06-29 17:29:11 -07:00
gcov Kconfig: Introduce ARCH_WANTS_NO_INSTR and CC_HAS_NO_PROFILE_FN_ATTR 2021-06-22 11:07:18 -07:00
irq irqchip fixes for 5.14, take #1 2021-07-09 15:35:13 +02:00
kcsan Merge branch 'kcsan.2021.05.18a' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu 2021-07-04 12:29:16 -07:00
livepatch Livepatching changes for 5.13 2021-04-27 18:14:38 -07:00
locking Locking fixes: 2021-07-11 11:06:09 -07:00
power PM: hibernate: disable when there are active secretmem users 2021-07-08 11:48:21 -07:00
printk printk changes for 5.14 2021-06-29 12:07:18 -07:00
rcu rcu: Fix pr_info() formats and values in show_rcu_gp_kthreads() 2021-07-06 15:53:12 -07:00
sched Three fixes: 2021-07-11 11:13:57 -07:00
time timers: Fix get_next_timer_interrupt() with no timers pending 2021-07-15 01:23:54 +02:00
trace bpf: Add lockdown check for probe_write_user helper 2021-08-10 10:10:10 +02:00
.gitignore .gitignore: prefix local generated files with a slash 2021-05-02 00:43:35 +09:00
acct.c kernel/acct.c: use #elif instead of #end and #elif 2020-12-15 22:46:15 -08:00
async.c kernel/async.c: remove async_unregister_domain() 2021-05-07 00:26:33 -07:00
audit_fsnotify.c audit_alloc_mark(): don't open-code ERR_CAST() 2021-02-23 10:25:27 -05:00
audit_tree.c audit: Use list_move instead of list_del/list_add 2021-06-08 22:18:35 -04:00
audit_watch.c fsnotify: generalize handle_inode_event() 2020-12-03 14:58:35 +01:00
audit.c lsm: separate security_task_getsecid() into subjective and objective variants 2021-03-22 15:23:32 -04:00
audit.h audit: remove trailing spaces and tabs 2021-06-10 20:59:05 -04:00
auditfilter.c lsm: separate security_task_getsecid() into subjective and objective variants 2021-03-22 15:23:32 -04:00
auditsc.c audit: remove trailing spaces and tabs 2021-06-10 20:59:05 -04:00
backtracetest.c
bounds.c
capability.c capability: handle idmapped mounts 2021-01-24 14:27:16 +01:00
cfi.c add support for Clang CFI 2021-04-08 16:04:20 -07:00
compat.c
configs.c
context_tracking.c
cpu_pm.c
cpu.c A fix for the CPU hotplug and cpusets interaction: 2021-06-29 12:23:02 -07:00
crash_core.c kdump: use vmlinux_build_id to simplify 2021-07-08 11:48:22 -07:00
crash_dump.c
cred.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-06-28 20:39:26 -07:00
delayacct.c delayacct: Add sysctl to enable at runtime 2021-05-12 11:43:25 +02:00
dma.c
exec_domain.c
exit.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-06-28 20:39:26 -07:00
extable.c
fail_function.c fault-injection: handle EI_ETYPE_TRUE 2020-12-15 22:46:19 -08:00
fork.c Merge branch 'akpm' (patches from Andrew) 2021-06-29 17:29:11 -07:00
freezer.c sched: Add get_current_state() 2021-06-18 11:43:08 +02:00
futex.c Locking changes for this cycle: 2021-06-28 11:45:29 -07:00
gen_kheaders.sh kbuild: clean up ${quiet} checks in shell scripts 2021-05-27 04:01:50 +09:00
groups.c groups: simplify struct group_info allocation 2021-02-26 09:41:03 -08:00
hung_task.c Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
iomem.c
irq_work.c irq_work: Make irq_work_queue() NMI-safe again 2021-06-10 10:00:08 +02:00
jump_label.c jump_label: Fix jump_label_text_reserved() vs __init 2021-07-05 10:46:20 +02:00
kallsyms.c module: add printk formats to add module build ID to stacktraces 2021-07-08 11:48:22 -07:00
kcmp.c Merge branch 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:36:48 -08:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt sched/core: Disable CONFIG_SCHED_CORE by default 2021-06-28 22:43:05 +02:00
kcov.c kernel: make kcov_common_handle consider the current context 2020-11-02 18:00:20 -08:00
kexec_core.c kernel.h: split out panic and oops helpers 2021-07-01 11:06:04 -07:00
kexec_elf.c
kexec_file.c kernel: kexec_file: fix error return code of kexec_calculate_store_digests() 2021-05-07 00:26:32 -07:00
kexec_internal.h kexec: move machine_kexec_post_load() to public interface 2021-02-22 12:33:26 +00:00
kexec.c
kheaders.c
kmod.c modules: add CONFIG_MODPROBE_PATH 2021-05-07 00:26:33 -07:00
kprobes.c Locking fixes: 2021-07-11 11:06:09 -07:00
ksysfs.c
kthread.c Merge branch 'akpm' (patches from Andrew) 2021-06-29 17:29:11 -07:00
latencytop.c
Makefile kbuild: update config_data.gz only when the content of .config is changed 2021-05-02 00:43:35 +09:00
module_signature.c module: harden ELF info handling 2021-01-19 10:24:45 +01:00
module_signing.c module: harden ELF info handling 2021-01-19 10:24:45 +01:00
module-internal.h
module.c module: add printk formats to add module build ID to stacktraces 2021-07-08 11:48:22 -07:00
notifier.c
nsproxy.c fixes-v5.11 2020-12-14 16:40:27 -08:00
padata.c
panic.c kernel.h: split out panic and oops helpers 2021-07-01 11:06:04 -07:00
params.c Modules updates for v5.11 2020-12-17 13:01:31 -08:00
pid_namespace.c fixes-v5.11 2020-12-14 16:40:27 -08:00
pid.c Merge branch 'exec-update-lock-for-v5.11' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2020-12-15 19:36:48 -08:00
profile.c kernel: Initialize cpumask before parsing 2021-04-10 13:35:54 +02:00
ptrace.c sched: Change task_struct::state 2021-06-18 11:43:09 +02:00
range.c
reboot.c reboot: Add hardware protection power-off 2021-06-21 13:08:36 +01:00
regset.c
relay.c relay: allow the use of const callback structs 2020-12-15 22:46:18 -08:00
resource_kunit.c resource: provide meaningful MODULE_LICENSE() in test suite 2020-11-25 18:52:35 +01:00
resource.c kernel/resource: fix return code check in __request_free_mem_region 2021-05-14 19:41:32 -07:00
rseq.c rseq: Optimise rseq_get_rseq_cs() and clear_rseq_cs() 2021-04-14 18:04:09 +02:00
scftorture.c scftorture: Avoid false-positive warnings in scftorture_invoker() 2021-07-06 12:37:55 -07:00
scs.c scs: switch to vmapped shadow stacks 2020-12-01 10:30:28 +00:00
seccomp.c seccomp: Support atomic "addfd + send reply" 2021-06-28 12:49:52 -07:00
signal.c Fix UCOUNT_RLIMIT_SIGPENDING counter leak 2021-07-08 11:43:24 -07:00
smp.c smp: Fix smp_call_function_single_async prototype 2021-05-06 15:33:49 +02:00
smpboot.c smpboot: fix duplicate and misplaced inlining directive 2021-07-25 11:06:37 -07:00
smpboot.h
softirq.c sched: Introduce task_is_running() 2021-06-18 11:43:07 +02:00
stackleak.c
stacktrace.c
static_call.c static_call: Fix static_call_text_reserved() vs __init 2021-07-05 10:46:33 +02:00
stop_machine.c stop_machine: Add caller debug info to queue_stop_cpus_work 2021-03-23 16:01:58 +01:00
sys_ni.c mm: introduce memfd_secret system call to create "secret" memory areas 2021-07-08 11:48:21 -07:00
sys.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-06-28 20:39:26 -07:00
sysctl-test.c kernel/sysctl-test: Remove some casts which are no-longer required 2021-06-23 16:41:24 -06:00
sysctl.c Merge branch 'akpm' (patches from Andrew) 2021-07-02 12:08:10 -07:00
task_work.c kasan: record task_work_add() call stack 2021-04-30 11:20:42 -07:00
taskstats.c treewide: rename nla_strlcpy to nla_strscpy. 2020-11-16 08:08:54 -08:00
test_kprobes.c
torture.c torture: Replace torture_init_begin string with %s 2021-03-08 14:22:28 -08:00
tracepoint.c tracepoints: Update static_call before tp_funcs when adding a tracepoint 2021-07-23 08:46:22 -04:00
tsacct.c
ucount.c ucounts: Fix race condition between alloc_ucounts and put_ucounts 2021-07-28 12:31:51 -05:00
uid16.c
uid16.h
umh.c kernel/umh.c: fix some spelling mistakes 2021-05-07 00:26:34 -07:00
up.c A set of locking related fixes and updates: 2021-05-09 13:07:03 -07:00
user_namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2021-06-28 20:39:26 -07:00
user-return-notifier.c
user.c Reimplement RLIMIT_MEMLOCK on top of ucounts 2021-04-30 14:14:02 -05:00
usermode_driver.c Merge branch 'work.namei' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2021-07-03 11:41:14 -07:00
utsname_sysctl.c
utsname.c
watch_queue.c watch_queue: rectify kernel-doc for init_watch() 2021-01-26 11:16:34 +00:00
watchdog_hld.c
watchdog.c kernel: watchdog: modify the explanation related to watchdog thread 2021-06-29 10:53:46 -07:00
workqueue_internal.h
workqueue.c workqueue: fix UAF in pwq_unbound_release_workfn() 2021-07-21 06:42:31 -10:00