linux/kernel/bpf
Daniel Borkmann 10ec8ca8ec bpf: Adjust insufficient default bpf_jit_limit
We've seen recent AWS EKS (Kubernetes) user reports like the following:

  After upgrading EKS nodes from v20230203 to v20230217 on our 1.24 EKS
  clusters after a few days a number of the nodes have containers stuck
  in ContainerCreating state or liveness/readiness probes reporting the
  following error:

    Readiness probe errored: rpc error: code = Unknown desc = failed to
    exec in container: failed to start exec "4a11039f730203ffc003b7[...]":
    OCI runtime exec failed: exec failed: unable to start container process:
    unable to init seccomp: error loading seccomp filter into kernel:
    error loading seccomp filter: errno 524: unknown

  However, we had not been seeing this issue on previous AMIs and it only
  started to occur on v20230217 (following the upgrade from kernel 5.4 to
  5.10) with no other changes to the underlying cluster or workloads.

  We tried the suggestions from that issue (sysctl net.core.bpf_jit_limit=452534528)
  which helped to immediately allow containers to be created and probes to
  execute but after approximately a day the issue returned and the value
  returned by cat /proc/vmallocinfo | grep bpf_jit | awk '{s+=$2} END {print s}'
  was steadily increasing.

I tested bpf tree to observe bpf_jit_charge_modmem, bpf_jit_uncharge_modmem
their sizes passed in as well as bpf_jit_current under tcpdump BPF filter,
seccomp BPF and native (e)BPF programs, and the behavior all looks sane
and expected, that is nothing "leaking" from an upstream perspective.

The bpf_jit_limit knob was originally added in order to avoid a situation
where unprivileged applications loading BPF programs (e.g. seccomp BPF
policies) consuming all the module memory space via BPF JIT such that loading
of kernel modules would be prevented. The default limit was defined back in
2018 and while good enough back then, we are generally seeing far more BPF
consumers today.

Adjust the limit for the BPF JIT pool from originally 1/4 to now 1/2 of the
module memory space to better reflect today's needs and avoid more users
running into potentially hard to debug issues.

Fixes: fdadd04931 ("bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K")
Reported-by: Stephen Haynes <sh@synk.net>
Reported-by: Lefteris Alexakis <lefteris.alexakis@kpn.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://github.com/awslabs/amazon-eks-ami/issues/1179
Link: https://github.com/awslabs/amazon-eks-ami/issues/1219
Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Link: https://lore.kernel.org/r/20230320143725.8394-1-daniel@iogearbox.net
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-03-21 12:43:05 -07:00
..
preload bpf: iterators: Split iterators.lskel.h into little- and big- endian versions 2023-01-28 12:45:15 -08:00
arraymap.c bpf: Do btf_record_free outside map_free callback 2022-11-17 19:11:31 -08:00
bloom_filter.c treewide: use get_random_u32() when possible 2022-10-11 17:42:58 -06:00
bpf_cgrp_storage.c bpf: Fix a compilation failure with clang lto build 2022-11-30 17:13:25 -08:00
bpf_inode_storage.c bpf: Fix a compilation failure with clang lto build 2022-11-30 17:13:25 -08:00
bpf_iter.c bpf: Initialize the bpf_run_ctx in bpf_iter_run_prog() 2022-08-18 17:06:13 -07:00
bpf_local_storage.c bpf: use bpf_map_kvcalloc in bpf_local_storage 2023-02-10 18:59:56 -08:00
bpf_lru_list.c bpf_lru_list: Read double-checked variable once without lock 2021-02-10 15:54:26 -08:00
bpf_lru_list.h printk: stop including cache.h from printk.h 2022-05-13 07:20:07 -07:00
bpf_lsm.c bpf: Fix the kernel crash caused by bpf_setsockopt(). 2023-01-26 23:26:40 -08:00
bpf_struct_ops_types.h bpf: Add dummy BPF STRUCT_OPS for test purpose 2021-11-01 14:10:00 -07:00
bpf_struct_ops.c mm: Introduce set_memory_rox() 2022-12-15 10:37:26 -08:00
bpf_task_storage.c bpf: Fix a compilation failure with clang lto build 2022-11-30 17:13:25 -08:00
btf.c btf: fix resolving BTF_KIND_VAR after ARRAY, STRUCT, UNION, PTR 2023-03-06 11:44:13 -08:00
cgroup_iter.c bpf: Pin the start cgroup in cgroup_iter_seq_init() 2022-11-21 17:40:42 +01:00
cgroup.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net 2022-10-03 17:44:18 -07:00
core.c bpf: Adjust insufficient default bpf_jit_limit 2023-03-21 12:43:05 -07:00
cpumap.c net: skbuff: drop the word head from skb cache 2023-02-10 09:10:28 +00:00
cpumask.c bpf: Add __bpf_kfunc tag to all kfuncs 2023-02-02 00:25:14 +01:00
devmap.c bpf: devmap: check XDP features in __xdp_enqueue routine 2023-02-02 20:48:24 -08:00
disasm.c bpf: Relicense disassembler as GPL-2.0-only OR BSD-2-Clause 2021-09-02 14:49:23 +02:00
disasm.h bpf: Relicense disassembler as GPL-2.0-only OR BSD-2-Clause 2021-09-02 14:49:23 +02:00
dispatcher.c bpf: Synchronize dispatcher update with bpf_dispatcher_xdp_func 2022-12-14 12:02:14 -08:00
hashtab.c bpf: Zeroing allocated object from slab in bpf memory allocator 2023-02-15 15:40:06 -08:00
helpers.c bpf: Add bpf_rbtree_{add,remove,first} kfuncs 2023-02-13 19:40:48 -08:00
inode.c fs: port inode_init_owner() to mnt_idmap 2023-01-19 09:24:28 +01:00
Kconfig rcu: Make the TASKS_RCU Kconfig option be selected 2022-04-20 16:52:58 -07:00
link_iter.c bpf: Add bpf_link iterator 2022-05-10 11:20:45 -07:00
local_storage.c bpf: Consolidate spin_lock, timer management into btf_record 2022-11-03 22:19:40 -07:00
lpm_trie.c bpf: Use bpf_map_area_alloc consistently on bpf map creation 2022-08-10 11:50:43 -07:00
Makefile bpf: Enable cpumasks to be queried and used as kptrs 2023-01-25 07:57:49 -08:00
map_in_map.c bpf: Add comments for map BTF matching requirement for bpf_list_head 2022-11-17 19:22:14 -08:00
map_in_map.h bpf: Add map_meta_equal map ops 2020-08-28 15:41:30 +02:00
map_iter.c bpf: Introduce MEM_RDONLY flag 2021-12-18 13:27:41 -08:00
memalloc.c bpf: Zeroing allocated object from slab in bpf memory allocator 2023-02-15 15:40:06 -08:00
mmap_unlock_work.h bpf: Introduce helper bpf_find_vma 2021-11-07 11:54:51 -08:00
net_namespace.c net: Add includes masked by netdevice.h including uapi/bpf.h 2021-12-29 20:03:05 -08:00
offload.c bpf: Drop always true do_idr_lock parameter to bpf_map_free_id 2023-02-02 20:26:12 -08:00
percpu_freelist.c bpf: Initialize same number of free nodes for each pcpu_freelist 2022-11-11 12:05:14 -08:00
percpu_freelist.h bpf: Use raw_spin_trylock() for pcpu_freelist_push/pop in NMI 2020-10-06 00:04:11 +02:00
prog_iter.c bpf: Refactor bpf_iter_reg to have separate seq_info member 2020-07-25 20:16:32 -07:00
queue_stack_maps.c bpf: Remove unneeded memset in queue_stack_map creation 2022-08-10 11:48:22 -07:00
reuseport_array.c net: Fix suspicious RCU usage in bpf_sk_reuseport_detach() 2022-08-17 16:42:59 -07:00
ringbuf.c mm: replace vma->vm_flags direct modifications with modifier calls 2023-02-09 16:51:39 -08:00
stackmap.c perf/bpf: Always use perf callchains if exist 2022-09-13 15:03:22 +02:00
syscall.c - Daniel Verkamp has contributed a memfd series ("mm/memfd: add 2023-02-23 17:09:35 -08:00
sysfs_btf.c bpf: Load and verify kernel module BTFs 2020-11-10 15:25:53 -08:00
task_iter.c bpf: keep a reference to the mm, in case the task is dead. 2022-12-28 14:11:48 -08:00
tnum.c bpf, tnums: Provably sound, faster, and more precise algorithm for tnum_mul 2021-06-01 13:34:15 +02:00
trampoline.c bpf: Fix panic due to wrong pageattr of im->image 2022-12-28 13:46:28 -08:00
verifier.c bpf: Allow reads from uninit stack 2023-02-22 12:34:50 -08:00