linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-16 08:44:21 +08:00

Author	SHA1	Message	Date
Andrey Ignatov	d7af7e497f	bpf: Fix possible out of bound write in narrow load handling Fix a verifier bug found by smatch static checker in [0]. This problem has never been seen in prod to my best knowledge. Fixing it still seems to be a good idea since it's hard to say for sure whether it's possible or not to have a scenario where a combination of convert_ctx_access() and a narrow load would lead to an out of bound write. When narrow load is handled, one or two new instructions are added to insn_buf array, but before it was only checked that cnt >= ARRAY_SIZE(insn_buf) And it's safe to add a new instruction to insn_buf[cnt++] only once. The second try will lead to out of bound write. And this is what can happen if `shift` is set. Fix it by making sure that if the BPF_RSH instruction has to be added in addition to BPF_AND then there is enough space for two more instructions in insn_buf. The full report [0] is below: kernel/bpf/verifier.c:12304 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array kernel/bpf/verifier.c:12311 convert_ctx_accesses() warn: offset 'cnt' incremented past end of array kernel/bpf/verifier.c 12282 12283 insn->off = off & ~(size_default - 1); 12284 insn->code = BPF_LDX \| BPF_MEM \| size_code; 12285 } 12286 12287 target_size = 0; 12288 cnt = convert_ctx_access(type, insn, insn_buf, env->prog, 12289 &target_size); 12290 if (cnt == 0 \|\| cnt >= ARRAY_SIZE(insn_buf) \|\| ^^^^^^^^^^^^^^^^^^^^^^^^^^^ Bounds check. 12291 (ctx_field_size && !target_size)) { 12292 verbose(env, "bpf verifier is misconfigured\n"); 12293 return -EINVAL; 12294 } 12295 12296 if (is_narrower_load && size < target_size) { 12297 u8 shift = bpf_ctx_narrow_access_offset( 12298 off, size, size_default) * 8; 12299 if (ctx_field_size <= 4) { 12300 if (shift) 12301 insn_buf[cnt++] = BPF_ALU32_IMM(BPF_RSH, ^^^^^ increment beyond end of array 12302 insn->dst_reg, 12303 shift); --> 12304 insn_buf[cnt++] = BPF_ALU32_IMM(BPF_AND, insn->dst_reg, ^^^^^ out of bounds write 12305 (1 << size * 8) - 1); 12306 } else { 12307 if (shift) 12308 insn_buf[cnt++] = BPF_ALU64_IMM(BPF_RSH, 12309 insn->dst_reg, 12310 shift); 12311 insn_buf[cnt++] = BPF_ALU64_IMM(BPF_AND, insn->dst_reg, ^^^^^^^^^^^^^^^ Same. 12312 (1ULL << size * 8) - 1); 12313 } 12314 } 12315 12316 new_prog = bpf_patch_insn_data(env, i + delta, insn_buf, cnt); 12317 if (!new_prog) 12318 return -ENOMEM; 12319 12320 delta += cnt - 1; 12321 12322 /* keep walking new program and skip insns we just inserted */ 12323 env->prog = new_prog; 12324 insn = new_prog->insnsi + i + delta; 12325 } 12326 12327 return 0; 12328 } [0] https://lore.kernel.org/bpf/20210817050843.GA21456@kili/ v1->v2: - clarify that problem was only seen by static checker but not in prod; Fixes: `46f53a65d2` ("bpf: Allow narrow loads with offset > 0") Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Andrey Ignatov <rdna@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820163935.1902398-1-rdna@fb.com	2021-08-24 14:32:26 -07:00
Alexei Starovoitov	f63693e3ae	Merge branch 'bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_SK_MSG' Xu Liu says: ==================== We'd like to be able to identify netns from sk_msg hooks to accelerate local process communication form different netns. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-08-24 14:17:53 -07:00
Xu Liu	6cbca1ee0d	selftests/bpf: Test for get_netns_cookie Add test to use get_netns_cookie() from BPF_PROG_TYPE_SK_MSG. Signed-off-by: Xu Liu <liuxu623@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820071712.52852-3-liuxu623@gmail.com	2021-08-24 14:17:53 -07:00
Xu Liu	fab60e29fc	bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_SK_MSG We'd like to be able to identify netns from sk_msg hooks to accelerate local process communication form different netns. Signed-off-by: Xu Liu <liuxu623@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820071712.52852-2-liuxu623@gmail.com	2021-08-24 14:17:53 -07:00
Alexei Starovoitov	8c0bb89e8e	Merge branch 'selftests/bpf: minor fixups' Li Zhijian says: ==================== Fix a few issues reported by 0Day/LKP during runing selftests/bpf. Changelog: V2: - folded previous similar standalone patch to [1/5], and add acked tag from Song Liu - add acked tag to [2/5], [3/5] from Song Liu - [4/5]: move test_bpftool.py to TEST_PROGS_EXTENDED, files in TEST_GEN_PROGS_EXTENDED are generated by make. Otherwise, it will break out-of-tree install: 'make O=/kselftest-build SKIP_TARGETS= V=1 -C tools/testing/selftests install INSTALL_PATH=/kselftest-install' - [5/5]: new patch ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-08-24 14:01:11 -07:00
Li Zhijian	00e1116031	selftests/bpf: Exit with KSFT_SKIP if no Makefile found This would happend when we run the tests after install kselftests root@lkp-skl-d01 ~# /kselftests/run_kselftest.sh -t bpf:test_doc_build.sh TAP version 13 1..1 # selftests: bpf: test_doc_build.sh perl: warning: Setting locale failed. perl: warning: Please check that your locale settings: LANGUAGE = (unset), LC_ALL = (unset), LC_ADDRESS = "en_US.UTF-8", LC_NAME = "en_US.UTF-8", LC_MONETARY = "en_US.UTF-8", LC_PAPER = "en_US.UTF-8", LC_IDENTIFICATION = "en_US.UTF-8", LC_TELEPHONE = "en_US.UTF-8", LC_MEASUREMENT = "en_US.UTF-8", LC_TIME = "en_US.UTF-8", LC_NUMERIC = "en_US.UTF-8", LANG = "en_US.UTF-8" are supported and installed on your system. perl: warning: Falling back to the standard locale ("C"). # skip: bpftool files not found! # ok 1 selftests: bpf: test_doc_build.sh # SKIP Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820025549.28325-1-lizhijian@cn.fujitsu.com	2021-08-24 14:01:10 -07:00
Li Zhijian	404bd9ff5d	selftests/bpf: Add missing files required by test_bpftool.sh for installing test_bpftool.sh relies on bpftool and test_bpftool.py. 'make install' will install bpftool to INSTALL_PATH/bpf/bpftool, and export it to PATH so that it can be used after installing. Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210820015556.23276-5-lizhijian@cn.fujitsu.com	2021-08-24 14:01:10 -07:00
Li Zhijian	7a3bdca20b	selftests/bpf: Add default bpftool built by selftests to PATH For 'make run_tests': selftests will build bpftool into tools/testing/selftests/bpf/tools/sbin/bpftool by default. ================== root@lkp-skl-d01 /opt/rootfs/v5.14-rc4# make -C tools/testing/selftests/bpf run_tests make: Entering directory '/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf' MKDIR include MKDIR libbpf MKDIR bpftool [...] GEN /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/profiler.skel.h CC /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/prog.o GEN /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/pid_iter.skel.h CC /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/pids.o LINK /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/tools/build/bpftool/bpftool INSTALL bpftool GEN vmlinux.h [...] # test_feature_dev_json (test_bpftool.TestBpftool) ... ERROR # test_feature_kernel (test_bpftool.TestBpftool) ... ERROR # test_feature_kernel_full (test_bpftool.TestBpftool) ... ERROR # test_feature_kernel_full_vs_not_full (test_bpftool.TestBpftool) ... ERROR # test_feature_macros (test_bpftool.TestBpftool) ... Error: bug: failed to retrieve CAP_BPF status: Invalid argument # ERROR # # ====================================================================== # ERROR: test_feature_dev_json (test_bpftool.TestBpftool) # ---------------------------------------------------------------------- # Traceback (most recent call last): # File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 57, in wrapper # return f(args, iface, kwargs) # File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 82, in test_feature_dev_json # res = bpftool_json(["feature", "probe", "dev", iface]) # File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 42, in bpftool_json # res = _bpftool(args) # File "/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/test_bpftool.py", line 34, in _bpftool # return subprocess.check_output(_args) # File "/usr/lib/python3.7/subprocess.py", line 395, in check_output # *kwargs).stdout # File "/usr/lib/python3.7/subprocess.py", line 487, in run # output=stdout, stderr=stderr) # subprocess.CalledProcessError: Command '['bpftool', '-j', 'feature', 'probe', 'dev', 'dummy0']' returned non-zero exit status 255. # ================== Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210820015556.23276-4-lizhijian@cn.fujitsu.com	2021-08-24 14:01:10 -07:00
Li Zhijian	5a980b5baf	selftests/bpf: Make test_doc_build.sh work from script directory Previously, it fails as below: ------------- root@lkp-skl-d01 /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf# ./test_doc_build.sh ++ realpath --relative-to=/opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf ./test_doc_build.sh + SCRIPT_REL_PATH=test_doc_build.sh ++ dirname test_doc_build.sh + SCRIPT_REL_DIR=. ++ realpath /opt/rootfs/v5.14-rc4/tools/testing/selftests/bpf/./../../../../ + KDIR_ROOT_DIR=/opt/rootfs/v5.14-rc4 + cd /opt/rootfs/v5.14-rc4 + for tgt in docs docs-clean + make -s -C /opt/rootfs/v5.14-rc4/. docs make: * No rule to make target 'docs'. Stop. + for tgt in docs docs-clean + make -s -C /opt/rootfs/v5.14-rc4/. docs-clean make: * No rule to make target 'docs-clean'. Stop. ----------- Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210820015556.23276-3-lizhijian@cn.fujitsu.com	2021-08-24 14:01:10 -07:00
Li Zhijian	2d82d73da3	selftests/bpf: Enlarge select() timeout for test_maps 0Day robot observed that it's easily timeout on a heavy load host. ------------------- # selftests: bpf: test_maps # Fork 1024 tasks to 'test_update_delete' # Fork 1024 tasks to 'test_update_delete' # Fork 100 tasks to 'test_hashmap' # Fork 100 tasks to 'test_hashmap_percpu' # Fork 100 tasks to 'test_hashmap_sizes' # Fork 100 tasks to 'test_hashmap_walk' # Fork 100 tasks to 'test_arraymap' # Fork 100 tasks to 'test_arraymap_percpu' # Failed sockmap unexpected timeout not ok 3 selftests: bpf: test_maps # exit=1 # selftests: bpf: test_lru_map # nr_cpus:8 ------------------- Since this test will be scheduled by 0Day to a random host that could have only a few cpus(2-8), enlarge the timeout to avoid a false NG report. In practice, i tried to pin it to only one cpu by 'taskset 0x01 ./test_maps', and knew 10S is likely enough, but i still perfer to a larger value 30. Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210820015556.23276-2-lizhijian@cn.fujitsu.com	2021-08-24 14:01:10 -07:00
Yucong Sun	a6258837c8	selftests/bpf: Reduce flakyness in timer_mim This patch extends wait time in timer_mim. As observed in slow CI environment, it is possible to have interrupt/preemption long enough to cause the test to fail, almost 1 failure in 5 runs. Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210823213629.3519641-1-fallentree@fb.com	2021-08-23 18:01:47 -07:00
Alexei Starovoitov	4ed589a278	Merge branch 'Refactor cgroup_bpf internals to use more specific attach_type' Dave Marchevsky says: ==================== The cgroup_bpf struct has a few arrays (effective, progs, and flags) of size MAX_BPF_ATTACH_TYPE. These are meant to separate progs by their attach type, currently represented by the bpf_attach_type enum. There are some bpf_attach_type values which are not valid attach types for cgroup bpf programs. Programs with these attach types will never be handled by cgroup_bpf_{attach,detach} and thus will never be held in cgroup_bpf structs. Even if such programs did make it into their reserved slot in those arrays, they would never be executed. Accordingly we can migrate to a new internal cgroup_bpf-specific enum for these arrays, saving some bytes per cgroup and making it more obvious which BPF programs belong there. netns_bpf_attach_type is an existing example of this pattern, let's do similar for cgroup_bpf. v1->v2: Address Daniel's comments * Reverse xmas tree ordering for def changes * Helper macro to reduce to_cgroup_bpf_attach_type boilerplate * checkpatch.pl complains: "ERROR: Macros with complex values should be enclosed in parentheses". Found some existing macros (do 'git grep "define case"') which get same complaint. Think it's fine to keep as-is since it's immediately undef'd. * Remove CG_BPF_ prefix from cgroup_bpf_attach_type * Although I agree that the prefix is redundant, the de-prefixed names feel a bit too 'general' given the internal use of the enum. e.g. when someone sees CGROUP_INET6_BIND it's not obvious that it should only be used in certain ways internally. * Don't feel strongly about this, just my thoughts as a noob to the internals. * Rebase onto latest bpf-next/master * No significant conflicts, some small boilerplate adjustments needed to catch up to Andrii's "bpf: Refactor BPF_PROG_RUN_ARRAY family of macros into functions" change ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2021-08-23 17:50:24 -07:00
Dave Marchevsky	6fc88c354f	bpf: Migrate cgroup_bpf to internal cgroup_bpf_attach_type enum Add an enum (cgroup_bpf_attach_type) containing only valid cgroup_bpf attach types and a function to map bpf_attach_type values to the new enum. Inspired by netns_bpf_attach_type. Then, migrate cgroup_bpf to use cgroup_bpf_attach_type wherever possible. Functionality is unchanged as attach_type_to_prog_type switches in bpf/syscall.c were preventing non-cgroup programs from making use of the invalid cgroup_bpf array slots. As a result struct cgroup_bpf uses 504 fewer bytes relative to when its arrays were sized using MAX_BPF_ATTACH_TYPE. bpf_cgroup_storage is notably not migrated as struct bpf_cgroup_storage_key is part of uapi and contains a bpf_attach_type member which is not meant to be opaque. Similarly, bpf_cgroup_link continues to report its bpf_attach_type member to userspace via fdinfo and bpf_link_info. To ease disambiguation, bpf_attach_type variables are renamed from 'type' to 'atype' when changed to cgroup_bpf_attach_type. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20210819092420.1984861-2-davemarchevsky@fb.com	2021-08-23 17:50:24 -07:00
Jiang Wang	d359902d5c	af_unix: Fix NULL pointer bug in unix_shutdown Commit `94531cfcbe` ("af_unix: Add unix_stream_proto for sockmap") introduced a bug for af_unix SEQPACKET type. In unix_shutdown, the unhash function will call prot->unhash(), which is NULL for SEQPACKET. And kernel will panic. On ARM32, it will show following messages: (it likely affects x86 too). Fix the bug by checking the prot->unhash is NULL or not first. Kernel log: <--- cut here --- Unable to handle kernel NULL pointer dereference at virtual address 00000000 pgd = 2fba1ffb *pgd=00000000 Internal error: Oops: 80000005 [#1] PREEMPT SMP THUMB2 Modules linked in: CPU: 1 PID: 1999 Comm: falkon Tainted: G W 5.14.0-rc5-01175-g94531cfcbe79-dirty #9240 Hardware name: NVIDIA Tegra SoC (Flattened Device Tree) PC is at 0x0 LR is at unix_shutdown+0x81/0x1a8 pc : [<00000000>] lr : [<c08f3311>] psr: 600f0013 sp : e45aff70 ip : e463a3c0 fp : beb54f04 r10: 00000125 r9 : e45ae000 r8 : c4a56664 r7 : 00000001 r6 : c4a56464 r5 : 00000001 r4 : c4a56400 r3 : 00000000 r2 : c5a6b180 r1 : 00000000 r0 : c4a56400 Flags: nZCv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none Control: 50c5387d Table: 05aa804a DAC: 00000051 Register r0 information: slab PING start c4a56400 pointer offset 0 Register r1 information: NULL pointer Register r2 information: slab task_struct start c5a6b180 pointer offset 0 Register r3 information: NULL pointer Register r4 information: slab PING start c4a56400 pointer offset 0 Register r5 information: non-paged memory Register r6 information: slab PING start c4a56400 pointer offset 100 Register r7 information: non-paged memory Register r8 information: slab PING start c4a56400 pointer offset 612 Register r9 information: non-slab/vmalloc memory Register r10 information: non-paged memory Register r11 information: non-paged memory Register r12 information: slab filp start e463a3c0 pointer offset 0 Process falkon (pid: 1999, stack limit = 0x9ec48895) Stack: (0xe45aff70 to 0xe45b0000) ff60: e45ae000 c5f26a00 00000000 00000125 ff80: c0100264 c07f7fa3 beb54f04 fffffff7 00000001 e6f3fc0e b5e5e9ec beb54ec4 ffa0: b5da0ccc c010024b b5e5e9ec beb54ec4 0000000f 00000000 00000000 beb54ebc ffc0: b5e5e9ec beb54ec4 b5da0ccc 00000125 beb54f58 00785238 beb5529c beb54f04 ffe0: b5da1e24 beb54eac b301385c b62b6ee8 600f0030 0000000f 00000000 00000000 [<c08f3311>] (unix_shutdown) from [<c07f7fa3>] (__sys_shutdown+0x2f/0x50) [<c07f7fa3>] (__sys_shutdown) from [<c010024b>] (__sys_trace_return+0x1/0x16) Exception stack(0xe45affa8 to 0xe45afff0) Fixes: `94531cfcbe` ("af_unix: Add unix_stream_proto for sockmap") Reported-by: Dmitry Osipenko <digetx@gmail.com> Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Dmitry Osipenko <digetx@gmail.com> Acked-by: Kuniyuki Iwashima <kuniyu@amazon.co.jp> Link: https://lore.kernel.org/bpf/20210821180738.1151155-1-jiang.wang@bytedance.com	2021-08-23 14:56:01 +02:00
Prankur Gupta	f2a6ee924d	selftests/bpf: Add tests for {set\|get} socket option from setsockopt BPF Adding selftests for the newly added functionality to call bpf_setsockopt() and bpf_getsockopt() from setsockopt BPF programs. Test Details: 1. BPF Program Checks for changes in IPV6_TCLASS(SOL_IPV6) via setsockopt If the cca for the socket is not cubic do nothing If the newly set value for IPV6_TCLASS is 45 (0x2d) (as per our use-case) then change the cc from cubic to reno 2. User Space Program Creates an AF_INET6 socket and set the cca for that to be "cubic" Attach the program and set the IPV6_TCLASS to 0x2d using setsockopt Verify the cca for the socket changed to reno Signed-off-by: Prankur Gupta <prankgup@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210817224221.3257826-3-prankgup@fb.com	2021-08-20 01:10:01 +02:00
Prankur Gupta	2c531639de	bpf: Add support for {set\|get} socket options from setsockopt BPF Add logic to call bpf_setsockopt() and bpf_getsockopt() from setsockopt BPF programs. An example use case is when the user sets the IPV6_TCLASS socket option, we would also like to change the tcp-cc for that socket. We don't have any use case for calling bpf_setsockopt() from supposedly read- only sys_getsockopt(), so it is made available to BPF_CGROUP_SETSOCKOPT only at this point. Signed-off-by: Prankur Gupta <prankgup@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210817224221.3257826-2-prankgup@fb.com	2021-08-20 01:04:52 +02:00
Stanislav Fomichev	44779a4b85	bpf: Use kvmalloc for map keys in syscalls Same as previous patch but for the keys. memdup_bpfptr is renamed to kvmemdup_bpfptr (and converted to kvmalloc). Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210818235216.1159202-2-sdf@google.com	2021-08-20 00:09:49 +02:00
Stanislav Fomichev	f0dce1d9b7	bpf: Use kvmalloc for map values in syscall Use kvmalloc/kvfree for temporary value when manipulating a map via syscall. kmalloc might not be sufficient for percpu maps where the value is big (and further multiplied by hundreds of CPUs). Can be reproduced with netcnt test on qemu with "-smp 255". Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210818235216.1159202-1-sdf@google.com	2021-08-20 00:09:38 +02:00
Yucong Sun	3666b167ea	selftests/bpf: Adding delay in socketmap_listen to reduce flakyness This patch adds a 1ms delay to reduce flakyness of the test. Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210819163609.2583758-1-fallentree@fb.com	2021-08-19 12:28:20 -07:00
Yonghong Song	594286b757	bpf: Fix NULL event->prog pointer access in bpf_overflow_handler Andrii reported that libbpf CI hit the following oops when running selftest send_signal: [ 1243.160719] BUG: kernel NULL pointer dereference, address: 0000000000000030 [ 1243.161066] #PF: supervisor read access in kernel mode [ 1243.161066] #PF: error_code(0x0000) - not-present page [ 1243.161066] PGD 0 P4D 0 [ 1243.161066] Oops: 0000 [#1] PREEMPT SMP NOPTI [ 1243.161066] CPU: 1 PID: 882 Comm: new_name Tainted: G O 5.14.0-rc5 #1 [ 1243.161066] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.13.0-1ubuntu1.1 04/01/2014 [ 1243.161066] RIP: 0010:bpf_overflow_handler+0x9a/0x1e0 [ 1243.161066] Code: 5a 84 c0 0f 84 06 01 00 00 be 66 02 00 00 48 c7 c7 6d 96 07 82 48 8b ab 18 05 00 00 e8 df 55 eb ff 66 90 48 8d 75 48 48 89 e7 <ff> 55 30 41 89 c4 e8 fb c1 f0 ff 84 c0 0f 84 94 00 00 00 e8 6e 0f [ 1243.161066] RSP: 0018:ffffc900000c0d80 EFLAGS: 00000046 [ 1243.161066] RAX: 0000000000000002 RBX: ffff8881002e0dd0 RCX: 00000000b4b47cf8 [ 1243.161066] RDX: ffffffff811dcb06 RSI: 0000000000000048 RDI: ffffc900000c0d80 [ 1243.161066] RBP: 0000000000000000 R08: 0000000000000000 R09: 1a9d56bb00000000 [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: 0000000000000000 [ 1243.161066] R13: ffffc900000c0e00 R14: ffffc900001c3c68 R15: 0000000000000082 [ 1243.161066] FS: 00007fc0be2d3380(0000) GS:ffff88813bd00000(0000) knlGS:0000000000000000 [ 1243.161066] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 1243.161066] CR2: 0000000000000030 CR3: 0000000104f8e000 CR4: 00000000000006e0 [ 1243.161066] Call Trace: [ 1243.161066] <IRQ> [ 1243.161066] __perf_event_overflow+0x4f/0xf0 [ 1243.161066] perf_swevent_hrtimer+0x116/0x130 [ 1243.161066] ? __lock_acquire+0x378/0x2730 [ 1243.161066] ? __lock_acquire+0x372/0x2730 [ 1243.161066] ? lock_is_held_type+0xd5/0x130 [ 1243.161066] ? find_held_lock+0x2b/0x80 [ 1243.161066] ? lock_is_held_type+0xd5/0x130 [ 1243.161066] ? perf_event_groups_first+0x80/0x80 [ 1243.161066] ? perf_event_groups_first+0x80/0x80 [ 1243.161066] __hrtimer_run_queues+0x1a3/0x460 [ 1243.161066] hrtimer_interrupt+0x110/0x220 [ 1243.161066] __sysvec_apic_timer_interrupt+0x8a/0x260 [ 1243.161066] sysvec_apic_timer_interrupt+0x89/0xc0 [ 1243.161066] </IRQ> [ 1243.161066] asm_sysvec_apic_timer_interrupt+0x12/0x20 [ 1243.161066] RIP: 0010:finish_task_switch+0xaf/0x250 [ 1243.161066] Code: 31 f6 68 90 2a 09 81 49 8d 7c 24 18 e8 aa d6 03 00 4c 89 e7 e8 12 ff ff ff 4c 89 e7 e8 ca 9c 80 00 e8 35 af 0d 00 fb 4d 85 f6 <58> 74 1d 65 48 8b 04 25 c0 6d 01 00 4c 3b b0 a0 04 00 00 74 37 f0 [ 1243.161066] RSP: 0018:ffffc900001c3d18 EFLAGS: 00000282 [ 1243.161066] RAX: 000000000000031f RBX: ffff888104cf4980 RCX: 0000000000000000 [ 1243.161066] RDX: 0000000000000000 RSI: ffffffff82095460 RDI: ffffffff820adc4e [ 1243.161066] RBP: ffffc900001c3d58 R08: 0000000000000001 R09: 0000000000000001 [ 1243.161066] R10: 0000000000000001 R11: 0000000000080000 R12: ffff88813bd2bc80 [ 1243.161066] R13: ffff8881002e8000 R14: ffff88810022ad80 R15: 0000000000000000 [ 1243.161066] ? finish_task_switch+0xab/0x250 [ 1243.161066] ? finish_task_switch+0x70/0x250 [ 1243.161066] __schedule+0x36b/0xbb0 [ 1243.161066] ? _raw_spin_unlock_irqrestore+0x2d/0x50 [ 1243.161066] ? lockdep_hardirqs_on+0x79/0x100 [ 1243.161066] schedule+0x43/0xe0 [ 1243.161066] pipe_read+0x30b/0x450 [ 1243.161066] ? wait_woken+0x80/0x80 [ 1243.161066] new_sync_read+0x164/0x170 [ 1243.161066] vfs_read+0x122/0x1b0 [ 1243.161066] ksys_read+0x93/0xd0 [ 1243.161066] do_syscall_64+0x35/0x80 [ 1243.161066] entry_SYSCALL_64_after_hwframe+0x44/0xae The oops can also be reproduced with the following steps: ./vmtest.sh -s # at qemu shell cd /root/bpf && while true; do ./test_progs -t send_signal Further analysis showed that the failure is introduced with commit `b89fbfbb85` ("bpf: Implement minimal BPF perf link"). With the above commit, the following scenario becomes possible: cpu1 cpu2 hrtimer_interrupt -> bpf_overflow_handler (due to closing link_fd) bpf_perf_link_release -> perf_event_free_bpf_prog -> perf_event_free_bpf_handler -> WRITE_ONCE(event->overflow_handler, event->orig_overflow_handler) event->prog = NULL bpf_prog_run(event->prog, &ctx) In the above case, the event->prog is NULL for bpf_prog_run, hence causing oops. To fix the issue, check whether event->prog is NULL or not. If it is, do not call bpf_prog_run. This seems working as the above reproducible step runs more than one hour and I didn't see any failures. Fixes: `b89fbfbb85` ("bpf: Implement minimal BPF perf link") Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210819155209.1927994-1-yhs@fb.com	2021-08-19 11:49:19 -07:00
Daniel Borkmann	f9dabe016b	bpf: Undo off-by-one in interpreter tail call count limit The BPF interpreter as well as x86-64 BPF JIT were both in line by allowing up to 33 tail calls (however odd that number may be!). Recently, this was changed for the interpreter to reduce it down to 32 with the assumption that this should have been the actual limit "which is in line with the behavior of the x86 JITs" according to `b61a28cf11` ("bpf: Fix off-by-one in tail call count limiting"). Paul recently reported: I'm a bit surprised by this because I had previously tested the tail call limit of several JIT compilers and found it to be 33 (i.e., allowing chains of up to 34 programs). I've just extended a test program I had to validate this again on the x86-64 JIT, and found a limit of 33 tail calls again [1]. Also note we had previously changed the RISC-V and MIPS JITs to allow up to 33 tail calls [2, 3], for consistency with other JITs and with the interpreter. We had decided to increase these two to 33 rather than decrease the other JITs to 32 for backward compatibility, though that probably doesn't matter much as I'd expect few people to actually use 33 tail calls. [1] `ae78874829` [2] `96bc4432f5` ("bpf, riscv: Limit to 33 tail calls") [3] `e49e6f6db0` ("bpf, mips: Limit to 33 tail calls") Therefore, revert `b61a28cf11` to re-align interpreter to limit a maximum of 33 tail calls. While it is unlikely to hit the limit for the vast majority, programs in the wild could one way or another depend on this, so lets rather be a bit more conservative, and lets align the small remainder of JITs to 33. If needed in future, this limit could be slightly increased, but not decreased. Fixes: `b61a28cf11` ("bpf: Fix off-by-one in tail call count limiting") Reported-by: Paul Chaignon <paul@cilium.io> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Johan Almbladh <johan.almbladh@anyfinetworks.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/CAO5pjwTWrC0_dzTbTHFPSqDwA56aVH+4KFGVqdq8=ASs0MqZGQ@mail.gmail.com	2021-08-19 18:33:37 +02:00
Xu Liu	374e74de96	selftests/bpf: Test for get_netns_cookie Add test to use get_netns_cookie() from BPF_PROG_TYPE_SOCK_OPS. Signed-off-by: Xu Liu <liuxu623@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210818105820.91894-3-liuxu623@gmail.com	2021-08-19 00:30:14 +02:00
Xu Liu	6cf1770d63	bpf: Allow bpf_get_netns_cookie in BPF_PROG_TYPE_SOCK_OPS We'd like to be able to identify netns from sockops hooks to accelerate local process communication form different netns. Signed-off-by: Xu Liu <liuxu623@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210818105820.91894-2-liuxu623@gmail.com	2021-08-19 00:30:01 +02:00
Grant Seltzer	d20b41115a	libbpf: Rename libbpf documentation index file This patch renames a documentation libbpf.rst to index.rst. In order for readthedocs.org to pick this file up and properly build the documentation site. It also changes the title type of the ABI subsection in the naming convention doc. This is so that readthedocs.org doesn't treat this section as a separate document. Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210818151313.49992-1-grantseltzer@gmail.com	2021-08-18 08:45:25 -07:00
Colin Ian King	8cacfc85b6	bpf: Remove redundant initialization of variable allow The variable allow is being initialized with a value that is never read, it is being updated later on. The assignment is redundant and can be removed. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817170842.495440-1-colin.king@canonical.com	2021-08-17 14:09:12 -07:00
Andrii Nakryiko	04d2319467	Merge branch 'selftests/bpf: fix flaky send_signal test' Yonghong Song says: ==================== The bpf selftest send_signal() is flaky for its subtests trying to send signals in softirq/nmi context. To reduce flakiness, the signal-targetted process priority is boosted, which should minimize preemption of that process and improve the possibility that the underlying task in softirq/nmi context is the bpf_send_signal() wanted task. Patch #1 did a refactoring to use ASSERT_* instead of old CHECK macros. Patch #2 did actual change of boosting priority. Changelog: v1 -> v2: remove skip logic where the underlying task in interrupt context is not the intended one. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2021-08-17 14:08:31 -07:00
Yonghong Song	b16ac5bf73	selftests/bpf: Fix flaky send_signal test libbpf CI has reported send_signal test is flaky although I am not able to reproduce it in my local environment. But I am able to reproduce with on-demand libbpf CI ([1]). Through code analysis, the following is possible reason. The failed subtest runs bpf program in softirq environment. Since bpf_send_signal() only sends to a fork of "test_progs" process. If the underlying current task is not "test_progs", bpf_send_signal() will not be triggered and the subtest will fail. To reduce the chances where the underlying process is not the intended one, this patch boosted scheduling priority to -20 (highest allowed by setpriority() call). And I did 10 runs with on-demand libbpf CI with this patch and I didn't observe any failures. [1] https://github.com/libbpf/libbpf/actions/workflows/ondemand.yml Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817190923.3186725-1-yhs@fb.com	2021-08-17 14:08:30 -07:00
Yonghong Song	6f6cc42645	selftests/bpf: Replace CHECK with ASSERT_* macros in send_signal.c Replace CHECK in send_signal.c with ASSERT_* macros as ASSERT_* macros are generally preferred. There is no funcitonality change. Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817190918.3186400-1-yhs@fb.com	2021-08-17 14:08:30 -07:00
Andrii Nakryiko	87bb11ccfe	Merge branch 'selftests/bpf: Improve the usability of test_progs' Yucong Sun says: ==================== This short series adds two new switches to test_progs, "-a" and "-d", adding support for both exact string matching, as well as '*' wildcards. It also cleans up the output to make it possible to generate allowlist/denylist using common cli tools. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2021-08-17 11:31:52 -07:00
Yucong Sun	74339a8f86	selftests/bpf: Support glob matching for test selector. This patch adds '-a' and '-d' arguments supporting both exact string match as well as using '' wildcard in test/subtests selection. '-a' and '-t' can co-exists, same as '-d' and '-b', in which case they just add to the list of allowed or denied test selectors. Caveat: Same as the current substring matching mechanism, test and subtest selector applies independently, 'a/b' will execute all tests matching "a", and with subtest name matching "b", but tests matching "a" that has no subtests will also be executed. Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817044732.3263066-5-fallentree@fb.com	2021-08-17 11:31:29 -07:00
Yucong Sun	99c4fd8b92	selftests/bpf: Also print test name in subtest status message This patch add test name in subtest status message line, making it possible to grep ':OK' in the output to generate a list of passed test+subtest names, which can be processed to generate argument list to be used with "-a", "-d" exact string matching. Example: #1/1 align/mov:OK .. #1/12 align/pointer variable subtraction:OK #1 align:OK Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817044732.3263066-4-fallentree@fb.com	2021-08-17 11:17:07 -07:00
Yucong Sun	f667d1d667	selftests/bpf: Correctly display subtest skip status In skip_account(), test->skip_cnt is set to 0 at the end, this makes next print statement never display SKIP status for the subtest. This patch moves the accounting logic after the print statement, fixing the issue. This patch also added SKIP status display for normal tests. Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817044732.3263066-3-fallentree@fb.com	2021-08-17 11:16:53 -07:00
Yucong Sun	26d82640d5	selftests/bpf: Skip loading bpf_testmod when using -l to list tests. When using "-l", test_progs often is executed as non-root user, load_bpf_testmod() will fail and output errors. This patch skips loading bpf testmod when "-l" is specified, making output cleaner. Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817044732.3263066-2-fallentree@fb.com	2021-08-17 11:16:27 -07:00
Yucong Sun	857f75ea84	selftests/bpf: Add exponential backoff to map_delete_retriable in test_maps Using a fixed delay of 1 microsecond has proven flaky in slow CPU environment, e.g. Github Actions CI system. This patch adds exponential backoff with a cap of 50ms to reduce the flakiness of the test. Initial delay is chosen at random in the range [0ms, 5ms). Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210817045713.3307985-1-fallentree@fb.com	2021-08-17 08:17:40 -07:00
Yucong Sun	3c3bd542ff	selftests/bpf: Add exponential backoff to map_update_retriable in test_maps Using a fixed delay of 1 microsecond has proven flaky in slow CPU environment, e.g. Github Actions CI system. This patch adds exponential backoff with a cap of 50ms to reduce the flakiness of the test. Initial delay is chosen at random in the range [0ms, 5ms). Signed-off-by: Yucong Sun <fallentree@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20210816175250.296110-1-fallentree@fb.com	2021-08-16 19:13:20 -07:00
Andrii Nakryiko	1e1e49df02	Merge branch 'sockmap: add sockmap support for unix stream socket' Jiang Wang says: ==================== This patch series add support for unix stream type for sockmap. Sockmap already supports TCP, UDP, unix dgram types. The unix stream support is similar to unix dgram. Also add selftests for unix stream type in sockmap tests. ==================== Signed-off-by: Andrii Nakryiko <andrii@kernel.org>	2021-08-16 18:44:19 -07:00
Jiang Wang	31c50aeed5	selftest/bpf: Add new tests in sockmap for unix stream to tcp. Add two new test cases in sockmap tests, where unix stream is redirected to tcp and vice versa. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-6-jiang.wang@bytedance.com	2021-08-16 18:44:09 -07:00
Jiang Wang	75e0e27db6	selftest/bpf: Change udp to inet in some function names This is to prepare for adding new unix stream tests. Mostly renames, also pass the socket types as an argument. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-5-jiang.wang@bytedance.com	2021-08-16 18:43:55 -07:00
Jiang Wang	9b03152bd4	selftest/bpf: Add tests for sockmap with unix stream type. Add two tests for unix stream to unix stream redirection in sockmap tests. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-4-jiang.wang@bytedance.com	2021-08-16 18:43:46 -07:00
Jiang Wang	94531cfcbe	af_unix: Add unix_stream_proto for sockmap Previously, sockmap for AF_UNIX protocol only supports dgram type. This patch add unix stream type support, which is similar to unix_dgram_proto. To support sockmap, dgram and stream cannot share the same unix_proto anymore, because they have different implementations, such as unhash for stream type (which will remove closed or disconnected sockets from the map), so rename unix_proto to unix_dgram_proto and add a new unix_stream_proto. Also implement stream related sockmap functions. And add dgram key words to those dgram specific functions. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-3-jiang.wang@bytedance.com	2021-08-16 18:43:39 -07:00
Jiang Wang	77462de14a	af_unix: Add read_sock for stream socket types To support sockmap for af_unix stream type, implement read_sock, which is similar to the read_sock for unix dgram sockets. Signed-off-by: Jiang Wang <jiang.wang@bytedance.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Cong Wang <cong.wang@bytedance.com> Acked-by: Jakub Sitnicki <jakub@cloudflare.com> Acked-by: John Fastabend <john.fastabend@gmail.com> Link: https://lore.kernel.org/bpf/20210816190327.2739291-2-jiang.wang@bytedance.com	2021-08-16 18:42:31 -07:00
Hengqi Chen	edce1a2486	selftests/bpf: Test btf__load_vmlinux_btf/btf__load_module_btf APIs Add test for btf__load_vmlinux_btf/btf__load_module_btf APIs. The test loads bpf_testmod module BTF and check existence of a symbol which is known to exist. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210815081035.205879-1-hengqi.chen@gmail.com	2021-08-16 18:38:52 -07:00
grantseltzer	bb57164920	bpf: Reconfigure libbpf docs to remove unversioned API This removes the libbpf_api.rst file from the kernel documentation. The intention for this file was to pull documentation from comments above API functions in libbpf. However, due to limitations of the kernel documentation system, this API documentation could not be versioned, which is counterintuative to how users expect to use it. There is also currently no doc comments, making this a blank page. Once the kernel comment documentation is actually contributed, it will still exist in the kernel repository, just in the code itself. A seperate site is being spun up to generate documentaiton from those comments in a way in which it can be versioned properly. This also reconfigures the bpf documentation index page to make it easier to sync to the previously mentioned documentaiton site. Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20210810020508.280639-1-grantseltzer@gmail.com	2021-08-16 17:25:03 -07:00
Daniel Borkmann	3a4ce01b24	Merge branch 'bpf-perf-link' Andrii Nakryiko says: ==================== This patch set implements an ability for users to specify custom black box u64 value for each BPF program attachment, bpf_cookie, which is available to BPF program at runtime. This is a feature that's critically missing for cases when some sort of generic processing needs to be done by the common BPF program logic (or even exactly the same BPF program) across multiple BPF hooks (e.g., many uniformly handled kprobes) and it's important to be able to distinguish between each BPF hook at runtime (e.g., for additional configuration lookup). The choice of restricting this to a fixed-size 8-byte u64 value is an explicit design decision. Making this configurable by users adds unnecessary complexity (extra memory allocations, extra complications on the verifier side to validate accesses to variable-sized data area) while not really opening up new possibilities. If user's use case requires storing more data per attachment, it's possible to use either global array, or ARRAY/HASHMAP BPF maps, where bpf_cookie would be used as an index into respective storage, populated by user-space code before creating BPF link. This gives user all the flexibility and control while keeping BPF verifier and BPF helper API simple. Currently, similar functionality can only be achieved through: - code-generation and BPF program cloning, which is very complicated and unmaintainable; - on-the-fly C code generation and further runtime compilation, which is what BCC uses and allows to do pretty simply. The big downside is a very heavy-weight Clang/LLVM dependency and inefficient memory usage (due to many BPF program clones and the compilation process itself); - in some cases (kprobes and sometimes uprobes) it's possible to do function IP lookup to get function-specific configuration. This doesn't work for all the cases (e.g., when attaching uprobes to shared libraries) and has higher runtime overhead and additional programming complexity due to BPF_MAP_TYPE_HASHMAP lookups. Up until recently, before bpf_get_func_ip() BPF helper was added, it was also very complicated and unstable (API-wise) to get traced function's IP from fentry/fexit and kretprobe. With libbpf and BPF CO-RE, runtime compilation is not an option, so to be able to build generic tracing tooling simply and efficiently, ability to provide additional bpf_cookie value for each attachment (as opposed to each BPF program) is extremely important. Two immediate users of this functionality are going to be libbpf-based USDT library (currently in development) and retsnoop ([0]), but I'm sure more applications will come once users get this feature in their kernels. To achieve above described, all perf_event-based BPF hooks are made available through a new BPF_LINK_TYPE_PERF_EVENT BPF link, which allows to use common LINK_CREATE command for program attachments and generally brings perf_event-based attachments into a common BPF link infrastructure. With that, LINK_CREATE gets ability to pass throught bpf_cookie value during link creation (BPF program attachment) time. bpf_get_attach_cookie() BPF helper is added to allow fetching this value at runtime from BPF program side. BPF cookie is stored either on struct perf_event itself and fetched from the BPF program context, or is passed through ambient BPF run context, added in `c7603cfa04` ("bpf: Add ambient BPF runtime context stored in current"). On the libbpf side of things, BPF perf link is utilized whenever is supported by the kernel instead of using PERF_EVENT_IOC_SET_BPF ioctl on perf_event FD. All the tracing attach APIs are extended with OPTS and bpf_cookie is passed through corresponding opts structs. Last part of the patch set adds few self-tests utilizing new APIs. There are also a few refactorings along the way to make things cleaner and easier to work with, both in kernel (BPF_PROG_RUN and BPF_PROG_RUN_ARRAY), and throughout libbpf and selftests. Follow-up patches will extend bpf_cookie to fentry/fexit programs. While adding uprobe_opts, also extend it with ref_ctr_offset for specifying USDT semaphore (reference counter) offset. Update attach_probe selftests to validate its functionality. This is another feature (along with bpf_cookie) required for implementing libbpf-based USDT solution. [0] https://github.com/anakryiko/retsnoop v4->v5: - rebase on latest bpf-next to resolve merge conflict; - add ref_ctr_offset to uprobe_opts and corresponding selftest; v3->v4: - get rid of BPF_PROG_RUN macro in favor of bpf_prog_run() (Daniel); - move #ifdef CONFIG_BPF_SYSCALL check into bpf_set_run_ctx (Daniel); v2->v3: - user_ctx -> bpf_cookie, bpf_get_user_ctx -> bpf_get_attach_cookie (Peter); - fix BPF_LINK_TYPE_PERF_EVENT value fix (Jiri); - use bpf_prog_run() from bpf_prog_run_pin_on_cpu() (Yonghong); v1->v2: - fix build failures on non-x86 arches by gating on CONFIG_PERF_EVENTS. ==================== Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2021-08-17 00:45:17 +02:00
Andrii Nakryiko	4bd11e08e0	selftests/bpf: Add ref_ctr_offset selftests Extend attach_probe selftests to specify ref_ctr_offset for uprobe/uretprobe and validate that its value is incremented from zero. Turns out that once uprobe is attached with ref_ctr_offset, uretprobe for the same location/function has to use ref_ctr_offset as well, otherwise perf_event_open() fails with -EINVAL. So this test uses ref_ctr_offset for both uprobe and uretprobe, even though for the purpose of test uprobe would be enough. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-17-andrii@kernel.org	2021-08-17 00:45:08 +02:00
Andrii Nakryiko	5e3b8356de	libbpf: Add uprobe ref counter offset support for USDT semaphores When attaching to uprobes through perf subsystem, it's possible to specify offset of a so-called USDT semaphore, which is just a reference counted u16, used by kernel to keep track of how many tracers are attached to a given location. Support for this feature was added in [0], so just wire this through uprobe_opts. This is important to enable implementing USDT attachment and tracing through libbpf's bpf_program__attach_uprobe_opts() API. [0] `a6ca88b241` ("trace_uprobe: support reference counter in fd-based uprobe") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-16-andrii@kernel.org	2021-08-17 00:45:08 +02:00
Andrii Nakryiko	0a80cf67f3	selftests/bpf: Add bpf_cookie selftests for high-level APIs Add selftest with few subtests testing proper bpf_cookie usage. Kprobe and uprobe subtests are pretty straightforward and just validate that the same BPF program attached with different bpf_cookie will be triggered with those different bpf_cookie values. Tracepoint subtest is a bit more interesting, as it is the only perf_event-based BPF hook that shares bpf_prog_array between multiple perf_events internally. This means that the same BPF program can't be attached to the same tracepoint multiple times. So we have 3 identical copies. This arrangement allows to test bpf_prog_array_copy()'s handling of bpf_prog_array list manipulation logic when programs are attached and detached. The test validates that bpf_cookie isn't mixed up and isn't lost during such list manipulations. Perf_event subtest validates that two BPF links can be created against the same perf_event (but not at the same time, only one BPF program can be attached to perf_event itself), and that for each we can specify different bpf_cookie value. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-15-andrii@kernel.org	2021-08-17 00:45:08 +02:00
Andrii Nakryiko	a549aaa673	selftests/bpf: Extract uprobe-related helpers into trace_helpers.{c,h} Extract two helpers used for working with uprobes into trace_helpers.{c,h} to be re-used between multiple uprobe-using selftests. Also rename get_offset() into more appropriate get_uprobe_offset(). Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-14-andrii@kernel.org	2021-08-17 00:45:08 +02:00
Andrii Nakryiko	f36d3557a1	selftests/bpf: Test low-level perf BPF link API Add tests utilizing low-level bpf_link_create() API to create perf BPF link. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-13-andrii@kernel.org	2021-08-17 00:45:08 +02:00
Andrii Nakryiko	47faff3717	libbpf: Add bpf_cookie to perf_event, kprobe, uprobe, and tp attach APIs Wire through bpf_cookie for all attach APIs that use perf_event_open under the hood: - for kprobes, extend existing bpf_kprobe_opts with bpf_cookie field; - for perf_event, uprobe, and tracepoint APIs, add their _opts variants and pass bpf_cookie through opts. For kernel that don't support BPF_LINK_CREATE for perf_events, and thus bpf_cookie is not supported either, return error and log warning for user. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20210815070609.987780-12-andrii@kernel.org	2021-08-17 00:45:08 +02:00

1 2 3 4 5 ...

1032021 Commits