linux/kernel/bpf
Daniel Borkmann fdadd04931 bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K
Michael and Sandipan report:

  Commit ede95a63b5 introduced a bpf_jit_limit tuneable to limit BPF
  JIT allocations. At compile time it defaults to PAGE_SIZE * 40000,
  and is adjusted again at init time if MODULES_VADDR is defined.

  For ppc64 kernels, MODULES_VADDR isn't defined, so we're stuck with
  the compile-time default at boot-time, which is 0x9c400000 when
  using 64K page size. This overflows the signed 32-bit bpf_jit_limit
  value:

  root@ubuntu:/tmp# cat /proc/sys/net/core/bpf_jit_limit
  -1673527296

  and can cause various unexpected failures throughout the network
  stack. In one case `strace dhclient eth0` reported:

  setsockopt(5, SOL_SOCKET, SO_ATTACH_FILTER, {len=11, filter=0x105dd27f8},
             16) = -1 ENOTSUPP (Unknown error 524)

  and similar failures can be seen with tools like tcpdump. This doesn't
  always reproduce however, and I'm not sure why. The more consistent
  failure I've seen is an Ubuntu 18.04 KVM guest booted on a POWER9
  host would time out on systemd/netplan configuring a virtio-net NIC
  with no noticeable errors in the logs.
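
The arithmetic in the report can be checked with a small userspace sketch. The helper names below (jit_limit_default, as_signed_32, jit_limit_selfcheck) are hypothetical and not kernel symbols; only the PAGE_SIZE * 40000 formula and the two reported values come from the report above.

```c
#include <assert.h>
#include <stdint.h>

/* Compile-time default described in the report: PAGE_SIZE * 40000. */
static uint64_t jit_limit_default(uint64_t page_size)
{
    return page_size * 40000ULL;
}

/*
 * Reinterpret the low 32 bits of a value as a signed 32-bit integer,
 * which models how an int-sized sysctl would display it.
 * The two's complement wrap is computed by hand to avoid relying on
 * implementation-defined narrowing conversions.
 */
static int32_t as_signed_32(uint64_t value)
{
    uint32_t low = (uint32_t)value;

    if (low <= (uint32_t)INT32_MAX)
        return (int32_t)low;
    return (int32_t)((int64_t)low - 4294967296LL);
}

/* Checks the numbers quoted in the report for a 64K page size. */
static void jit_limit_selfcheck(void)
{
    assert(jit_limit_default(65536) == 0x9c400000ULL);
    assert(as_signed_32(jit_limit_default(65536)) == -1673527296);
}
```

With a 4K page size the same default is 163840000, which still fits in a signed 32-bit value, so the wrap only shows up with large pages.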

Given this, and given that in the near future some architectures like
arm64 will have a custom area for BPF JIT image allocations, we should
get rid of the BPF_JIT_LIMIT_DEFAULT fallback / default entirely. For
4.21, we have an overridable bpf_jit_alloc_exec() and bpf_jit_free_exec(),
so add another overridable bpf_jit_alloc_exec_limit() helper function
which returns the possible size of the memory area for deriving the
default heuristic in bpf_jit_charge_init().

Like bpf_jit_alloc_exec() and bpf_jit_free_exec(), the new
bpf_jit_alloc_exec_limit() assumes that module_alloc() is the default
JIT memory provider. Therefore, where archs implement their own
module_alloc(), we use MODULES_{END,VADDR} for the limits; otherwise, for
vmalloc_exec() cases like ppc64, we use VMALLOC_{END,START}.
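
The selection described above can be modelled in userspace roughly as follows. This is a sketch under stated assumptions, not the kernel code: struct arch_layout and all function names are hypothetical, the has_modules_area flag stands in for "MODULES_VADDR is defined", and taking one quarter of the area as the default is an assumption, since the message only says the size is used to derive a heuristic.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an architecture's address-space layout. */
struct arch_layout {
    int has_modules_area;           /* models "MODULES_VADDR is defined" */
    uint64_t modules_vaddr, modules_end;
    uint64_t vmalloc_start, vmalloc_end;
};

/* Size of the area the JIT allocator can draw from. */
static uint64_t jit_alloc_exec_limit(const struct arch_layout *a)
{
    if (a->has_modules_area)
        return a->modules_end - a->modules_vaddr;
    return a->vmalloc_end - a->vmalloc_start;
}

/* Round v up to a multiple of page_size, which must be a power of two. */
static uint64_t round_up_pow2(uint64_t v, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (v + page_size - 1) & ~(page_size - 1);
}

/*
 * Assumed heuristic: a quarter of the area, page aligned, and capped
 * so the result fits the type that stores the limit.
 */
static uint64_t jit_limit_from_area(uint64_t area, uint64_t page_size,
                                    uint64_t cap)
{
    uint64_t limit = round_up_pow2(area >> 2, page_size);

    return limit < cap ? limit : cap;
}
```

Because the default is derived from the size of the area rather than from PAGE_SIZE, it no longer grows with the page size.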

Additionally, for archs supporting large page sizes, the sysctl should
be handled as a long so that it does not run into sysctl restrictions
in the future.
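
A minimal sketch of why a long-typed knob avoids the negative value: the byte count is clamped to LONG_MAX before it is stored, so it can never wrap. The function name jit_limit_as_long is hypothetical, and the clamp is an illustration rather than a description of the kernel's sysctl handler.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Store a byte count in a long, saturating at LONG_MAX instead of wrapping. */
static long jit_limit_as_long(uint64_t bytes)
{
    long result = bytes > (uint64_t)LONG_MAX ? LONG_MAX : (long)bytes;

    /* The stored limit is never negative, whatever the width of long. */
    assert(result >= 0);
    return result;
}
```

On platforms where long is 64 bits wide, the 0x9c400000 value from the report is stored unchanged; where long is 32 bits wide, it saturates at LONG_MAX.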

Fixes: ede95a63b5 ("bpf: add bpf_jit_limit knob to restrict unpriv allocations")
Reported-by: Sandipan Das <sandipan@linux.ibm.com>
Reported-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Tested-by: Michael Roth <mdroth@linux.vnet.ibm.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2018-12-11 19:12:21 -08:00
..
arraymap.c bpf: return EOPNOTSUPP when map lookup isn't supported 2018-10-09 21:52:20 -07:00
bpf_lru_list.c bpf: lru: Lower the PERCPU_NR_SCANS from 16 to 4 2017-04-17 13:55:52 -04:00
bpf_lru_list.h bpf: Only set node->ref = 1 if it has not been set 2017-09-01 09:57:39 -07:00
btf.c bpf: btf: check name validity for various types 2018-11-28 16:03:04 -08:00
cgroup.c bpf: add cg_skb_is_valid_access for BPF_PROG_TYPE_CGROUP_SKB 2018-10-19 13:49:34 -07:00
core.c bpf: fix bpf_jit_limit knob for PAGE_SIZE >= 64K 2018-12-11 19:12:21 -08:00
cpumap.c bpf: fix redirect to map under tail calls 2018-08-17 15:56:23 -07:00
devmap.c bpf: devmap: fix wrong interface selection in notifier_call 2018-10-26 00:32:21 +02:00
disasm.c bpf: Remove struct bpf_verifier_env argument from print_bpf_insn 2018-03-23 17:38:57 +01:00
disasm.h bpf: Remove struct bpf_verifier_env argument from print_bpf_insn 2018-03-23 17:38:57 +01:00
hashtab.c bpf: add bpffs pretty print for percpu arraymap/hash/lru_hash 2018-08-30 14:03:53 +02:00
helpers.c bpf: fix direct packet write into pop/peek helpers 2018-10-25 17:02:06 -07:00
inode.c bpf: decouple btf from seq bpf fs dump and enable more maps 2018-08-13 00:52:45 +02:00
local_storage.c bpf: allocate local storage buffers using GFP_ATOMIC 2018-11-16 21:17:49 -08:00
lpm_trie.c bpf: decouple btf from seq bpf fs dump and enable more maps 2018-08-13 00:52:45 +02:00
Makefile bpf: add queue and stack maps 2018-10-19 13:24:31 -07:00
map_in_map.c bpf: don't allow create maps of per-cpu cgroup local storages 2018-10-01 16:18:33 +02:00
map_in_map.h bpf: Add syscall lookup support for fd array and htab 2017-06-29 13:13:25 -04:00
offload.c bpf: add verifier callback to get stack usage info for offloaded progs 2018-10-08 10:24:12 +02:00
percpu_freelist.c bpf: fix lockdep splat 2017-11-15 19:46:32 +09:00
percpu_freelist.h bpf: introduce percpu_freelist 2016-03-08 15:28:31 -05:00
queue_stack_maps.c bpf: fix integer overflow in queue_stack_map 2018-11-22 21:29:40 +01:00
reuseport_array.c bpf: Introduce BPF_MAP_TYPE_REUSEPORT_SOCKARRAY 2018-08-11 01:58:46 +02:00
stackmap.c bpf: rename stack trace map operations 2018-10-19 13:24:30 -07:00
syscall.c bpf: fix bpf_prog_get_info_by_fd to return 0 func_lens for unpriv 2018-11-02 13:51:15 -07:00
tnum.c bpf/verifier: improve register value range tracking with ARSH 2018-04-29 08:45:53 -07:00
verifier.c bpf: add per-insn complexity limit 2018-12-04 17:22:02 +01:00
xskmap.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net 2018-10-19 11:03:06 -07:00