2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2025-01-23 14:13:58 +08:00
linux-next/kernel/bpf
Song Liu 39d8f0d102 bpf: Use raw_spin_trylock() for pcpu_freelist_push/pop in NMI
Recent improvements in LOCKDEP highlighted a potential A-A deadlock with
pcpu_freelist in NMI:

./tools/testing/selftests/bpf/test_progs -t stacktrace_build_id_nmi

[   18.984807] ================================
[   18.984807] WARNING: inconsistent lock state
[   18.984808] 5.9.0-rc6-01771-g1466de1330e1 #2967 Not tainted
[   18.984809] --------------------------------
[   18.984809] inconsistent {INITIAL USE} -> {IN-NMI} usage.
[   18.984810] test_progs/1990 [HC2[2]:SC0[0]:HE0:SE1] takes:
[   18.984810] ffffe8ffffc219c0 (&head->lock){....}-{2:2}, at: __pcpu_freelist_pop+0xe3/0x180
[   18.984813] {INITIAL USE} state was registered at:
[   18.984814]   lock_acquire+0x175/0x7c0
[   18.984814]   _raw_spin_lock+0x2c/0x40
[   18.984815]   __pcpu_freelist_pop+0xe3/0x180
[   18.984815]   pcpu_freelist_pop+0x31/0x40
[   18.984816]   htab_map_alloc+0xbbf/0xf40
[   18.984816]   __do_sys_bpf+0x5aa/0x3ed0
[   18.984817]   do_syscall_64+0x2d/0x40
[   18.984818]   entry_SYSCALL_64_after_hwframe+0x44/0xa9
[   18.984818] irq event stamp: 12
[...]
[   18.984822] other info that might help us debug this:
[   18.984823]  Possible unsafe locking scenario:
[   18.984823]
[   18.984824]        CPU0
[   18.984824]        ----
[   18.984824]   lock(&head->lock);
[   18.984826]   <Interrupt>
[   18.984826]     lock(&head->lock);
[   18.984827]
[   18.984828]  *** DEADLOCK ***
[   18.984828]
[   18.984829] 2 locks held by test_progs/1990:
[...]
[   18.984838]  <NMI>
[   18.984838]  dump_stack+0x9a/0xd0
[   18.984839]  lock_acquire+0x5c9/0x7c0
[   18.984839]  ? lock_release+0x6f0/0x6f0
[   18.984840]  ? __pcpu_freelist_pop+0xe3/0x180
[   18.984840]  _raw_spin_lock+0x2c/0x40
[   18.984841]  ? __pcpu_freelist_pop+0xe3/0x180
[   18.984841]  __pcpu_freelist_pop+0xe3/0x180
[   18.984842]  pcpu_freelist_pop+0x17/0x40
[   18.984842]  ? lock_release+0x6f0/0x6f0
[   18.984843]  __bpf_get_stackid+0x534/0xaf0
[   18.984843]  bpf_prog_1fd9e30e1438d3c5_oncpu+0x73/0x350
[   18.984844]  bpf_overflow_handler+0x12f/0x3f0

This is because pcpu_freelist_head.lock is accessed in both NMI and
non-NMI context. Fix this issue by using raw_spin_trylock() in NMI.

Since NMI interrupts non-NMI context, when NMI context tries to lock the
raw_spinlock, non-NMI context of the same CPU may already have locked a
lock and is blocked from unlocking the lock. For a system with N CPUs,
there could be N NMIs at the same time, and they may block N non-NMI
raw_spinlocks. This is tricky for pcpu_freelist_push(), where unlike
_pop(), failing _push() means leaking memory. This issue is more likely to
trigger in non-SMP system.

Fix this issue with an extra list, pcpu_freelist.extralist. The extralist
is primarily used to take _push() when raw_spin_trylock() failed on all
the per CPU lists. It should be empty most of the time. The following
table summarizes the behavior of pcpu_freelist in NMI and non-NMI:

non-NMI pop(): 	use _lock(); check per CPU lists first;
                if all per CPU lists are empty, check extralist;
                if extralist is empty, return NULL.

non-NMI push(): use _lock(); only push to per CPU lists.

NMI pop():    use _trylock(); check per CPU lists first;
              if all per CPU lists are locked or empty, check extralist;
              if extralist is locked or empty, return NULL.

NMI push():   use _trylock(); check per CPU lists first;
              if all per CPU lists are locked; try push to extralist;
              if extralist is also locked, keep trying on per CPU lists.

Reported-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Song Liu <songliubraving@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Martin KaFai Lau <kafai@fb.com>
Link: https://lore.kernel.org/bpf/20201005165838.3735218-1-songliubraving@fb.com
2020-10-06 00:04:11 +02:00
..
preload bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach 2020-09-29 13:09:23 -07:00
arraymap.c bpf: Introduce BPF_F_PRESERVE_ELEMS for perf event array 2020-09-30 23:18:12 -07:00
bpf_inode_storage.c bpf: Allow specifying a BTF ID per argument in function protos 2020-09-21 15:00:40 -07:00
bpf_iter.c bpf: Bump iter seq size to support BTF representation of large data structures 2020-09-28 18:26:58 -07:00
bpf_local_storage.c bpf: Use hlist_add_head_rcu when linking to local_storage 2020-09-19 01:12:35 +02:00
bpf_lru_list.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 206 2019-05-30 11:29:53 -07:00
bpf_lru_list.h bpf: Fix a typo "inacitve" -> "inactive" 2020-04-06 21:54:10 +02:00
bpf_lsm.c bpf: Change bpf_sk_storage_*() to accept ARG_PTR_TO_BTF_ID_SOCK_COMMON 2020-09-25 13:58:01 -07:00
bpf_struct_ops_types.h bpf: tcp: Support tcp_congestion_ops in bpf 2020-01-09 08:46:18 -08:00
bpf_struct_ops.c bpf: Move btf_resolve_size into __btf_resolve_size 2020-08-25 15:37:41 -07:00
btf.c bpf: Introduce bpf_per_cpu_ptr() 2020-10-02 15:00:49 -07:00
cgroup.c treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
core.c bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach 2020-09-29 13:09:23 -07:00
cpumap.c bpf, cpumap: Remove rcpu pointer from cpu_map_build_skb signature 2020-09-28 23:30:42 +02:00
devmap.c bpf: {cpu,dev}map: Change various functions return type from int to void 2020-09-01 15:45:58 +02:00
disasm.c treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295 2019-06-05 17:36:38 +02:00
disasm.h treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 295 2019-06-05 17:36:38 +02:00
dispatcher.c bpf: Remove bpf_image tree 2020-03-13 12:49:52 -07:00
hashtab.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
helpers.c bpf: Introducte bpf_this_cpu_ptr() 2020-10-02 15:00:49 -07:00
inode.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2020-09-22 16:45:34 -07:00
local_storage.c bpf/local_storage: Fix build without CONFIG_CGROUP 2020-07-25 20:16:36 -07:00
lpm_trie.c bpf: Add map_meta_equal map ops 2020-08-28 15:41:30 +02:00
Makefile bpf: Implement bpf_local_storage for inodes 2020-08-25 15:00:04 -07:00
map_in_map.c bpf: Relax max_entries check for most of the inner map types 2020-08-28 15:41:30 +02:00
map_in_map.h bpf: Add map_meta_equal map ops 2020-08-28 15:41:30 +02:00
map_iter.c bpf: Implement link_query callbacks in map element iterators 2020-08-21 14:01:39 -07:00
net_namespace.c bpf: Add support for forced LINK_DETACH command 2020-08-01 20:38:28 -07:00
offload.c bpf, offload: Replace bitwise AND by logical AND in bpf_prog_offload_info_fill 2020-02-17 16:53:49 +01:00
percpu_freelist.c bpf: Use raw_spin_trylock() for pcpu_freelist_push/pop in NMI 2020-10-06 00:04:11 +02:00
percpu_freelist.h bpf: Use raw_spin_trylock() for pcpu_freelist_push/pop in NMI 2020-10-06 00:04:11 +02:00
prog_iter.c bpf: Refactor bpf_iter_reg to have separate seq_info member 2020-07-25 20:16:32 -07:00
queue_stack_maps.c bpf: Add map_meta_equal map ops 2020-08-28 15:41:30 +02:00
reuseport_array.c bpf, net: Rework cookie generator as per-cpu one 2020-09-30 11:50:35 -07:00
ringbuf.c bpf: Add map_meta_equal map ops 2020-08-28 15:41:30 +02:00
stackmap.c bpf: Allow specifying a BTF ID per argument in function protos 2020-09-21 15:00:40 -07:00
syscall.c bpf: Deref map in BPF_PROG_BIND_MAP when it's already used 2020-10-02 19:21:25 -07:00
sysfs_btf.c bpf: Support llvm-objcopy for vmlinux BTF 2020-03-19 12:32:38 +01:00
task_iter.c bpf: Avoid iterating duplicated files for task_file iterator 2020-09-02 16:40:33 +02:00
tnum.c bpf: Verifier, do explicit ALU32 bounds tracking 2020-03-30 14:59:53 -07:00
trampoline.c bpf: Move prog->aux->linked_prog and trampoline into bpf_link on attach 2020-09-29 13:09:23 -07:00
verifier.c bpf, verifier: Use fallthrough pseudo-keyword 2020-10-05 15:52:36 +02:00