linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-28 05:24:47 +08:00

Author	SHA1	Message	Date
Kui-Feng Lee	a7b0fa352e	bpftool: Generated shadow variables for struct_ops maps. Declares and defines a pointer of the shadow type for each struct_ops map. The code generator will create an anonymous struct type as the shadow type for each struct_ops map. The shadow type is translated from the original struct type of the map. The user of the skeleton use pointers of them to access the values of struct_ops maps. However, shadow types only supports certain types of fields, including scalar types and function pointers. Any fields of unsupported types are translated into an array of characters to occupy the space of the original field. Function pointers are translated into pointers of the struct bpf_program. Additionally, padding fields are generated to occupy the space between two consecutive fields. The pointers of shadow types of struct_osp maps are initialized when *__open_opts() in skeletons are called. For a map called FOO, the user can access it through the pointer at skel->struct_ops.FOO. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/20240229064523.2091270-4-thinker.li@gmail.com	2024-02-29 14:23:53 -08:00
Kui-Feng Lee	69e4a9d2b3	libbpf: Convert st_ops->data to shadow type. Convert st_ops->data to the shadow type of the struct_ops map. The shadow type of a struct_ops type is a variant of the original struct type providing a way to access/change the values in the maps of the struct_ops type. bpf_map__initial_value() will return st_ops->data for struct_ops types. The skeleton is going to use it as the pointer to the shadow type of the original struct type. One of the main differences between the original struct type and the shadow type is that all function pointers of the shadow type are converted to pointers of struct bpf_program. Users can replace these bpf_program pointers with other BPF programs. The st_ops->progs[] will be updated before updating the value of a map to reflect the changes made by users. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240229064523.2091270-3-thinker.li@gmail.com	2024-02-29 14:23:52 -08:00
Kui-Feng Lee	3644d28546	libbpf: Set btf_value_type_id of struct bpf_map for struct_ops. For a struct_ops map, btf_value_type_id is the type ID of it's struct type. This value is required by bpftool to generate skeleton including pointers of shadow types. The code generator gets the type ID from bpf_map__btf_value_type_id() in order to get the type information of the struct type of a map. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240229064523.2091270-2-thinker.li@gmail.com	2024-02-29 14:23:52 -08:00
Kees Cook	896880ff30	bpf: Replace bpf_lpm_trie_key 0-length array with flexible array Replace deprecated 0-length array in struct bpf_lpm_trie_key with flexible array. Found with GCC 13: ../kernel/bpf/lpm_trie.c:207:51: warning: array subscript i is outside array bounds of 'const __u8[0]' {aka 'const unsigned char[]'} [-Warray-bounds=] 207 \| (__be16 )&key->data[i]); \| ^~~~~~~~~~~~~ ../include/uapi/linux/swab.h:102:54: note: in definition of macro '__swab16' 102 \| #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x)) \| ^ ../include/linux/byteorder/generic.h:97:21: note: in expansion of macro '__be16_to_cpu' 97 \| #define be16_to_cpu __be16_to_cpu \| ^~~~~~~~~~~~~ ../kernel/bpf/lpm_trie.c:206:28: note: in expansion of macro 'be16_to_cpu' 206 \| u16 diff = be16_to_cpu((__be16 )&node->data[i] ^ \| ^~~~~~~~~~~ In file included from ../include/linux/bpf.h:7: ../include/uapi/linux/bpf.h:82:17: note: while referencing 'data' 82 \| __u8 data[0]; /* Arbitrary size / \| ^~~~ And found at run-time under CONFIG_FORTIFY_SOURCE: UBSAN: array-index-out-of-bounds in kernel/bpf/lpm_trie.c:218:49 index 0 is out of range for type '__u8 []' Changing struct bpf_lpm_trie_key is difficult since has been used by userspace. For example, in Cilium: struct egress_gw_policy_key { struct bpf_lpm_trie_key lpm_key; __u32 saddr; __u32 daddr; }; While direct references to the "data" member haven't been found, there are static initializers what include the final member. For example, the "{}" here: struct egress_gw_policy_key in_key = { .lpm_key = { 32 + 24, {} }, .saddr = CLIENT_IP, .daddr = EXTERNAL_SVC_IP & 0Xffffff, }; To avoid the build time and run time warnings seen with a 0-sized trailing array for struct bpf_lpm_trie_key, introduce a new struct that correctly uses a flexible array for the trailing bytes, struct bpf_lpm_trie_key_u8. As part of this, include the "header" portion (which is just the "prefixlen" member), so it can be used by anything building a bpf_lpr_trie_key that has trailing members that aren't a u8 flexible array (like the self-test[1]), which is named struct bpf_lpm_trie_key_hdr. Unfortunately, C++ refuses to parse the __struct_group() helper, so it is not possible to define struct bpf_lpm_trie_key_hdr directly in struct bpf_lpm_trie_key_u8, so we must open-code the union directly. Adjust the kernel code to use struct bpf_lpm_trie_key_u8 through-out, and for the selftest to use struct bpf_lpm_trie_key_hdr. Add a comment to the UAPI header directing folks to the two new options. Reported-by: Mark Rutland <mark.rutland@arm.com> Signed-off-by: Kees Cook <keescook@chromium.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Gustavo A. R. Silva <gustavoars@kernel.org> Closes: https://paste.debian.net/hidden/ca500597/ Link: https://lore.kernel.org/all/202206281009.4332AA33@keescook/ [1] Link: https://lore.kernel.org/bpf/20240222155612.it.533-kees@kernel.org	2024-02-29 22:52:43 +01:00
Alexei Starovoitov	b9a6299848	Merge branch 'bpf-arm64-use-bpf-prog-pack-allocator-in-bpf-jit' Puranjay Mohan says: ==================== bpf, arm64: use BPF prog pack allocator in BPF JIT Changes in V8 => V9: V8: https://lore.kernel.org/bpf/20240221145106.105995-1-puranjay12@gmail.com/ 1. Rebased on bpf-next/master 2. Added Acked-by: Catalin Marinas <catalin.marinas@arm.com> Changes in V7 => V8: V7: https://lore.kernel.org/bpf/20240125133159.85086-1-puranjay12@gmail.com/ 1. Rebase on bpf-next/master 2. Fix __text_poke() by removing usage of 'ret' that was never set. Changes in V6 => V7: V6: https://lore.kernel.org/all/20240124164917.119997-1-puranjay12@gmail.com/ 1. Rebase on bpf-next/master. Changes in V5 => V6: V5: https://lore.kernel.org/all/20230908144320.2474-1-puranjay12@gmail.com/ 1. Implement a text poke api to reduce code repeatition. 2. Use flush_icache_range() in place of caches_clean_inval_pou() in the functions that modify code. 3. Optimize the bpf_jit_free() by not copying the all instructions on the rw image to the ro_image Changes in V4 => v5: 1. Remove the patch for making prog pack allocator portable as it will come through the RISCV tree[1]. 2. Add a new function aarch64_insn_set() to be used in bpf_arch_text_invalidate() for putting illegal instructions after a program is removed. The earlier implementation of bpf_arch_text_invalidate() was calling aarch64_insn_patch_text_nosync() in a loop and making it slow because each call invalidated the cache. Here is test_tag now: [root@ip-172-31-6-176 bpf]# time ./test_tag test_tag: OK (40945 tests) real 0m19.695s user 0m1.514s sys 0m17.841s test_tag without these patches: [root@ip-172-31-6-176 bpf]# time ./test_tag test_tag: OK (40945 tests) real 0m21.487s user 0m1.647s sys 0m19.106s test_tag in the previous version was really slow > 2 minutes. see [2] 3. Add cache invalidation in aarch64_insn_copy() so other users can call the function without worrying about the cache. Currently only bpf_arch_text_copy() is using it, but there might be more users in the future. Chanes in V3 => V4: Changes only in 3rd patch 1. Fix the I-cache maintenance: Clean the data cache and invalidate the i-Cache only after the instructions have been copied to the ROX region. Chanes in V2 => V3: Changes only in 3rd patch 1. Set prog = orig_prog; in the failure path of bpf_jit_binary_pack_finalize() call. 2. Add comments explaining the usage of the offsets in the exception table. Changes in v1 => v2: 1. Make the naming consistent in the 3rd patch: ro_image and image ro_header and header ro_image_ptr and image_ptr 2. Use names dst/src in place of addr/opcode in second patch. 3. Add Acked-by: Song Liu <song@kernel.org> in 1st and 2nd patch. BPF programs currently consume a page each on ARM64. For systems with many BPF programs, this adds significant pressure to instruction TLB. High iTLB pressure usually causes slow down for the whole system. Song Liu introduced the BPF prog pack allocator[3] to mitigate the above issue. It packs multiple BPF programs into a single huge page. It is currently only enabled for the x86_64 BPF JIT. This patch series enables the BPF prog pack allocator for the ARM64 BPF JIT. ==================================================== Performance Analysis of prog pack allocator on ARM64 ==================================================== To test the performance of the BPF prog pack allocator on ARM64, a stresser tool[4] was built. This tool loads 8 BPF programs on the system and triggers 5 of them in an infinite loop by doing system calls. The runner script starts 20 instances of the above which loads 820=160 BPF programs on the system, 520=100 of which are being constantly triggered. In the above environment we try to build Python-3.8.4 and try to find different iTLB metrics for the compilation done by gcc-12.2.0. The source code[5] is configured with the following command: ./configure --enable-optimizations --with-ensurepip=install Then the runner script is executed with the following command: ./run.sh "perf stat -e ITLB_WALK,L1I_TLB,INST_RETIRED,iTLB-load-misses -a make -j32" This builds Python while 160 BPF programs are loaded and 100 are being constantly triggered and measures iTLB related metrics. The output of the above command is discussed below before and after enabling the BPF prog pack allocator. The tests were run on qemu-system-aarch64 with 32 cpus, 4G memory, -machine virt, -cpu host, and -enable-kvm. Results ------- Before enabling prog pack allocator: ------------------------------------ Performance counter stats for 'system wide': 333278635 ITLB_WALK 6762692976558 L1I_TLB 25359571423901 INST_RETIRED 15824054789 iTLB-load-misses 189.029769053 seconds time elapsed After enabling prog pack allocator: ----------------------------------- Performance counter stats for 'system wide': 190333544 ITLB_WALK 6712712386528 L1I_TLB 25278233304411 INST_RETIRED 5716757866 iTLB-load-misses 185.392650561 seconds time elapsed Improvements in metrics ----------------------- Compilation time ---> 1.92% faster iTLB-load-misses/Sec (Less is better) ---> 63.16% decrease ITLB_WALK/1000 INST_RETIRED (Less is better) ---> 42.71% decrease ITLB_Walk/L1I_TLB (Less is better) ---> 42.47% decrease [1] https://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux.git/commit/?h=for-next&id=20e490adea279d49d57b800475938f5b67926d98 [2] https://lore.kernel.org/all/CANk7y0gcP3dF2mESLp5JN1+9iDfgtiWRFGqLkCgZD6wby1kQOw@mail.gmail.com/ [3] https://lore.kernel.org/bpf/20220204185742.271030-1-song@kernel.org/ [4] https://github.com/puranjaymohan/BPF-Allocator-Bench [5] https://www.python.org/ftp/python/3.8.4/Python-3.8.4.tgz ==================== Link: https://lore.kernel.org/r/20240228141824.119877-1-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-28 13:44:48 -08:00
Puranjay Mohan	1dad391dae	bpf, arm64: use bpf_prog_pack for memory management Use bpf_jit_binary_pack_alloc for memory management of JIT binaries in ARM64 BPF JIT. The bpf_jit_binary_pack_alloc creates a pair of RW and RX buffers. The JIT writes the program into the RW buffer. When the JIT is done, the program is copied to the final RX buffer with bpf_jit_binary_pack_finalize. Implement bpf_arch_text_copy() and bpf_arch_text_invalidate() for ARM64 JIT as these functions are required by bpf_jit_binary_pack allocator. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Acked-by: Song Liu <song@kernel.org> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20240228141824.119877-3-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-28 13:44:47 -08:00
Puranjay Mohan	451c3cab9a	arm64: patching: implement text_poke API The text_poke API is used to implement functions like memcpy() and memset() for instruction memory (RO+X). The implementation is similar to the x86 version. This will be used by the BPF JIT to write and modify BPF programs. There could be more users of this in the future. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20240228141824.119877-2-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-28 13:44:47 -08:00
Alexei Starovoitov	e59997d905	Merge branch 'bpf-arm64-support-exceptions' Puranjay Mohan says: ==================== bpf, arm64: Support Exceptions Changes in V2->V3: V2: https://lore.kernel.org/all/20230917000045.56377-1-puranjay12@gmail.com/ - Use unwinder from stacktrace.c rather than open coding the unwind logic. - Fix a bug in the prologue related to BPF_FP (Xu Kuohai) Changes in V1->V2: V1: https://lore.kernel.org/all/20230912233942.6734-1-puranjay12@gmail.com/ - Remove exceptions from DENYLIST.aarch64 as they are supported now. The base support for exceptions was merged with [1] and it was enabled for x86-64. This patch set enables the support on ARM64, all sefltests are passing: # ./test_progs -a exceptions #74/1 exceptions/exception_throw_always_1:OK #74/2 exceptions/exception_throw_always_2:OK #74/3 exceptions/exception_throw_unwind_1:OK #74/4 exceptions/exception_throw_unwind_2:OK #74/5 exceptions/exception_throw_default:OK #74/6 exceptions/exception_throw_default_value:OK #74/7 exceptions/exception_tail_call:OK #74/8 exceptions/exception_ext:OK #74/9 exceptions/exception_ext_mod_cb_runtime:OK #74/10 exceptions/exception_throw_subprog:OK #74/11 exceptions/exception_assert_nz_gfunc:OK #74/12 exceptions/exception_assert_zero_gfunc:OK #74/13 exceptions/exception_assert_neg_gfunc:OK #74/14 exceptions/exception_assert_pos_gfunc:OK #74/15 exceptions/exception_assert_negeq_gfunc:OK #74/16 exceptions/exception_assert_poseq_gfunc:OK #74/17 exceptions/exception_assert_nz_gfunc_with:OK #74/18 exceptions/exception_assert_zero_gfunc_with:OK #74/19 exceptions/exception_assert_neg_gfunc_with:OK #74/20 exceptions/exception_assert_pos_gfunc_with:OK #74/21 exceptions/exception_assert_negeq_gfunc_with:OK #74/22 exceptions/exception_assert_poseq_gfunc_with:OK #74/23 exceptions/exception_bad_assert_nz_gfunc:OK #74/24 exceptions/exception_bad_assert_zero_gfunc:OK #74/25 exceptions/exception_bad_assert_neg_gfunc:OK #74/26 exceptions/exception_bad_assert_pos_gfunc:OK #74/27 exceptions/exception_bad_assert_negeq_gfunc:OK #74/28 exceptions/exception_bad_assert_poseq_gfunc:OK #74/29 exceptions/exception_bad_assert_nz_gfunc_with:OK #74/30 exceptions/exception_bad_assert_zero_gfunc_with:OK #74/31 exceptions/exception_bad_assert_neg_gfunc_with:OK #74/32 exceptions/exception_bad_assert_pos_gfunc_with:OK #74/33 exceptions/exception_bad_assert_negeq_gfunc_with:OK #74/34 exceptions/exception_bad_assert_poseq_gfunc_with:OK #74/35 exceptions/exception_assert_range:OK #74/36 exceptions/exception_assert_range_with:OK #74/37 exceptions/exception_bad_assert_range:OK #74/38 exceptions/exception_bad_assert_range_with:OK #74/39 exceptions/non-throwing fentry -> exception_cb:OK #74/40 exceptions/throwing fentry -> exception_cb:OK #74/41 exceptions/non-throwing fexit -> exception_cb:OK #74/42 exceptions/throwing fexit -> exception_cb:OK #74/43 exceptions/throwing extension (with custom cb) -> exception_cb:OK #74/44 exceptions/throwing extension -> global func in exception_cb:OK #74/45 exceptions/exception_ext_mod_cb_runtime:OK #74/46 exceptions/throwing extension (with custom cb) -> global func in exception_cb:OK #74/47 exceptions/exception_ext:OK #74/48 exceptions/non-throwing fentry -> non-throwing subprog:OK #74/49 exceptions/throwing fentry -> non-throwing subprog:OK #74/50 exceptions/non-throwing fentry -> throwing subprog:OK #74/51 exceptions/throwing fentry -> throwing subprog:OK #74/52 exceptions/non-throwing fexit -> non-throwing subprog:OK #74/53 exceptions/throwing fexit -> non-throwing subprog:OK #74/54 exceptions/non-throwing fexit -> throwing subprog:OK #74/55 exceptions/throwing fexit -> throwing subprog:OK #74/56 exceptions/non-throwing fmod_ret -> non-throwing subprog:OK #74/57 exceptions/non-throwing fmod_ret -> non-throwing global subprog:OK #74/58 exceptions/non-throwing extension -> non-throwing subprog:OK #74/59 exceptions/non-throwing extension -> throwing subprog:OK #74/60 exceptions/non-throwing extension -> non-throwing subprog:OK #74/61 exceptions/non-throwing extension -> throwing global subprog:OK #74/62 exceptions/throwing extension -> throwing global subprog:OK #74/63 exceptions/throwing extension -> non-throwing global subprog:OK #74/64 exceptions/non-throwing extension -> main subprog:OK #74/65 exceptions/throwing extension -> main subprog:OK #74/66 exceptions/reject_exception_cb_type_1:OK #74/67 exceptions/reject_exception_cb_type_2:OK #74/68 exceptions/reject_exception_cb_type_3:OK #74/69 exceptions/reject_exception_cb_type_4:OK #74/70 exceptions/reject_async_callback_throw:OK #74/71 exceptions/reject_with_lock:OK #74/72 exceptions/reject_subprog_with_lock:OK #74/73 exceptions/reject_with_rcu_read_lock:OK #74/74 exceptions/reject_subprog_with_rcu_read_lock:OK #74/75 exceptions/reject_with_rbtree_add_throw:OK #74/76 exceptions/reject_with_reference:OK #74/77 exceptions/reject_with_cb_reference:OK #74/78 exceptions/reject_with_cb:OK #74/79 exceptions/reject_with_subprog_reference:OK #74/80 exceptions/reject_throwing_exception_cb:OK #74/81 exceptions/reject_exception_cb_call_global_func:OK #74/82 exceptions/reject_exception_cb_call_static_func:OK #74/83 exceptions/reject_multiple_exception_cb:OK #74/84 exceptions/reject_exception_throw_cb:OK #74/85 exceptions/reject_exception_throw_cb_diff:OK #74/86 exceptions/reject_set_exception_cb_bad_ret1:OK #74/87 exceptions/reject_set_exception_cb_bad_ret2:OK #74/88 exceptions/check_assert_eq_int_min:OK #74/89 exceptions/check_assert_eq_int_max:OK #74/90 exceptions/check_assert_eq_zero:OK #74/91 exceptions/check_assert_eq_llong_min:OK #74/92 exceptions/check_assert_eq_llong_max:OK #74/93 exceptions/check_assert_lt_pos:OK #74/94 exceptions/check_assert_lt_zero:OK #74/95 exceptions/check_assert_lt_neg:OK #74/96 exceptions/check_assert_le_pos:OK #74/97 exceptions/check_assert_le_zero:OK #74/98 exceptions/check_assert_le_neg:OK #74/99 exceptions/check_assert_gt_pos:OK #74/100 exceptions/check_assert_gt_zero:OK #74/101 exceptions/check_assert_gt_neg:OK #74/102 exceptions/check_assert_ge_pos:OK #74/103 exceptions/check_assert_ge_zero:OK #74/104 exceptions/check_assert_ge_neg:OK #74/105 exceptions/check_assert_range_s64:OK #74/106 exceptions/check_assert_range_u64:OK #74/107 exceptions/check_assert_single_range_s64:OK #74/108 exceptions/check_assert_single_range_u64:OK #74/109 exceptions/check_assert_generic:OK #74/110 exceptions/check_assert_with_return:OK #74 exceptions:OK Summary: 1/110 PASSED, 0 SKIPPED, 0 FAILED [1] https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git/commit/?h=for-next&id=ec6f1b4db95b7eedb3fe85f4f14e08fa0e9281c3 ==================== Link: https://lore.kernel.org/r/20240201125225.72796-1-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-27 13:54:18 -08:00
Puranjay Mohan	22fc0e80ae	bpf, arm64: support exceptions The prologue generation code has been modified to make the callback program use the stack of the program marked as exception boundary where callee-saved registers are already pushed. As the bpf_throw function never returns, if it clobbers any callee-saved registers, they would remain clobbered. So, the prologue of the exception-boundary program is modified to push R23 and R24 as well, which the callback will then recover in its epilogue. The Procedure Call Standard for the Arm 64-bit Architecture[1] states that registers r19 to r28 should be saved by the callee. BPF programs on ARM64 already save all callee-saved registers except r23 and r24. This patch adds an instruction in prologue of the program to save these two registers and another instruction in the epilogue to recover them. These extra instructions are only added if bpf_throw() is used. Otherwise the emitted prologue/epilogue remains unchanged. [1] https://github.com/ARM-software/abi-aa/blob/main/aapcs64/aapcs64.rst Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Link: https://lore.kernel.org/r/20240201125225.72796-3-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-27 13:54:17 -08:00
Puranjay Mohan	e74cb1b422	arm64: stacktrace: Implement arch_bpf_stack_walk() for the BPF JIT This will be used by bpf_throw() to unwind till the program marked as exception boundary and run the callback with the stack of the main program. This is required for supporting BPF exceptions on ARM64. Signed-off-by: Puranjay Mohan <puranjay12@gmail.com> Acked-by: Catalin Marinas <catalin.marinas@arm.com> Link: https://lore.kernel.org/r/20240201125225.72796-2-puranjay12@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-27 13:54:17 -08:00
Benjamin Tissoires	2ab256e932	bpf: add is_async_callback_calling_insn() helper Currently we have a special case for BPF_FUNC_timer_set_callback, let's introduce a helper we can extend for the kfunc that will come in a later patch Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-3-1fb378ca6301@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-22 17:48:53 -08:00
Benjamin Tissoires	dfe6625df4	bpf: introduce in_sleepable() helper No code change, but it'll allow to have only one place to change everything when we add in_sleepable in cur_state. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-2-1fb378ca6301@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-22 17:47:15 -08:00
Benjamin Tissoires	55bad79e33	bpf: allow more maps in sleepable bpf programs These 2 maps types are required for HID-BPF when a user wants to do IO with a device from a sleepable tracing point. Allowing BPF_MAP_TYPE_QUEUE (and therefore BPF_MAP_TYPE_STACK) allows for a BPF program to prepare from an IRQ the list of HID commands to send back to the device and then these commands can be retrieved from the sleepable trace point. Signed-off-by: Benjamin Tissoires <bentiss@kernel.org> Link: https://lore.kernel.org/r/20240221-hid-bpf-sleepable-v3-1-1fb378ca6301@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-22 17:42:23 -08:00
Martin KaFai Lau	63c7049ef9	Merge branch 'Check cfi_stubs before registering a struct_ops type.' Kui-Feng Lee says: ==================== Recently, st_ops->cfi_stubs was introduced. However, the upcoming new struct_ops support (e.g. sched_ext) is not aware of this and does not provide its own cfi_stubs. The kernel ends up NULL dereferencing the st_ops->cfi_stubs. Considering struct_ops supports kernel module now, this NULL check is necessary. This patch set is to reject struct_ops registration that does not provide a cfi_stubs. Changes from v4: - Remove changes of check_member. - Remove checks of the pointers in cfi_stubs[]. Changes from v3: - Remove CFI stub function for get_info. - Allow passing NULL prog arg to check_member of struct bpf_struct_ops type. - Call check_member to determines if a CFI stub function should be defined for an operator. Changes from v2: - Add a stub function for get_info of struct tcp_congestion_ops. Changes from v1: - Check (void *)(cfi_stubs + moff) to make sure stub functions are provided for every operator. - Add a test case to ensure that struct_ops rejects incomplete cfi_stub. v4: https://lore.kernel.org/all/20240221075213.2071454-1-thinker.li@gmail.com/ v3: https://lore.kernel.org/all/20240216193434.735874-1-thinker.li@gmail.com/ v2: https://lore.kernel.org/all/20240216020350.2061373-1-thinker.li@gmail.com/ v1: https://lore.kernel.org/all/20240215022401.1882010-1-thinker.li@gmail.com/ ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-22 12:26:42 -08:00
Kui-Feng Lee	e9bbda13a7	selftests/bpf: Test case for lacking CFI stub functions. Ensure struct_ops rejects the registration of struct_ops types without proper CFI stub functions. bpf_test_no_cfi.ko is a module that attempts to register a struct_ops type called "bpf_test_no_cfi_ops" with cfi_stubs of NULL and non-NULL value. The NULL one should fail, and the non-NULL one should succeed. The module can only be loaded successfully if these registrations yield the expected results. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240222021105.1180475-3-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-22 12:26:41 -08:00
Kui-Feng Lee	3e0008336a	bpf: Check cfi_stubs before registering a struct_ops type. Recently, st_ops->cfi_stubs was introduced. However, the upcoming new struct_ops support (e.g. sched_ext) is not aware of this and does not provide its own cfi_stubs. The kernel ends up NULL dereferencing the st_ops->cfi_stubs. Considering struct_ops supports kernel module now, this NULL check is necessary. This patch is to reject struct_ops registration that does not provide a cfi_stubs. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240222021105.1180475-2-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-22 12:26:40 -08:00
Martin Kelly	58fd62e0aa	bpf: Clarify batch lookup/lookup_and_delete semantics The batch lookup and lookup_and_delete APIs have two parameters, in_batch and out_batch, to facilitate iterative lookup/lookup_and_deletion operations for supported maps. Except NULL for in_batch at the start of these two batch operations, both parameters need to point to memory equal or larger than the respective map key size, except for various hashmaps (hash, percpu_hash, lru_hash, lru_percpu_hash) where the in_batch/out_batch memory size should be at least 4 bytes. Document these semantics to clarify the API. Signed-off-by: Martin Kelly <martin.kelly@crowdstrike.com> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240221211838.1241578-1-martin.kelly@crowdstrike.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-22 10:24:38 -08:00
Dave Thaler	89ee838130	bpf, docs: specify which BPF_ABS and BPF_IND fields were zero Specifying which fields were unused allows IANA to only list as deprecated instructions that were actually used, leaving the rest as unassigned and possibly available for future use for something else. Signed-off-by: Dave Thaler <dthaler1968@gmail.com> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20240221175419.16843-1-dthaler1968@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-22 09:11:49 -08:00
Dave Thaler	c1bb68f6b2	bpf, docs: Fix typos in instruction-set.rst * "BPF ADD" should be "BPF_ADD". * "src" should be "src_reg" in several places. The latter is the field name in the instruction. The former refers to the value of the register, or the immediate. * Add '' around field names in one sentence, for consistency with the rest of the document. Signed-off-by: Dave Thaler <dthaler1968@gmail.com> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/r/20240221173535.16601-1-dthaler1968@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-22 09:07:37 -08:00
Alexei Starovoitov	8425b6eb51	Merge branch 'selftests-bpf-reduce-tcp_custom_syncookie-verification-complexity' Eduard Zingerman says: ==================== selftests/bpf: reduce tcp_custom_syncookie verification complexity Thread [0] discusses a fix for bpf_loop() handling bug. That change makes tcp_custom_syncookie test too complex to verify. The fix discussed in [0] would be sent via 'bpf' tree, tcp_custom_syncookie test is not in 'bpf' tree yet. As agreed in [0] I'm sending syncookie test update separately. [0] https://lore.kernel.org/bpf/20240216150334.31937-1-eddyz87@gmail.com/ ==================== Link: https://lore.kernel.org/r/20240222150300.14909-1-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-22 08:46:15 -08:00
Eduard Zingerman	b546b57526	selftests/bpf: update tcp_custom_syncookie to use scalar packet offset This commit updates tcp_custom_syncookie.c:tcp_parse_option() to use explicit packet offset (ctx->off) for packet access instead of ever moving pointer (ctx->ptr), this reduces verification complexity: - the tcp_parse_option() is passed as a callback to bpf_loop(); - suppose a checkpoint is created each time at function entry; - the ctx->ptr is tracked by verifier as PTR_TO_PACKET; - the ctx->ptr is incremented in tcp_parse_option(), thus umax_value field tracked for it is incremented as well; - on each next iteration of tcp_parse_option() checkpoint from a previous iteration can't be reused for state pruning, because PTR_TO_PACKET registers are considered equivalent only if old->umax_value >= cur->umax_value; - on the other hand, the ctx->off is a SCALAR, subject to widen_imprecise_scalars(); - it's exact bounds are eventually forgotten and it is tracked as unknown scalar at entry to tcp_parse_option(); - hence checkpoints created at the start of the function eventually converge. The change is similar to one applied in [0] to xdp_synproxy_kern.c. Comparing before and after with veristat yields following results: File Insns (A) Insns (B) Insns (DIFF) ------------------------------- --------- --------- ----------------- test_tcp_custom_syncookie.bpf.o 466657 12423 -454234 (-97.34%) [0] commit `977bc146d4` ("selftests/bpf: track tcp payload offset as scalar in xdp_synproxy") Acked-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> Link: https://lore.kernel.org/r/20240222150300.14909-2-eddyz87@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-22 08:46:15 -08:00
Alexei Starovoitov	a3c70a3cf1	bpf: Shrink size of struct bpf_map/bpf_array. Back in 2018 the commit `be95a845cc` ("bpf: avoid false sharing of map refcount with max_entries") added ____cacheline_aligned to "struct bpf_map" to make sure that fields like refcnt don't share a cache line with max_entries that is used to bounds check map access. That was done to make spectre style attacks harder. The main mitigation is done via code similar to array_index_nospec(), of course. This was an additional precaution. It increased the size of "struct bpf_map" a little, but it's affect on all other maps (like array) is significant, since "struct bpf_map" is typically the first member in other map types. Undo this ____cacheline_aligned tag. Instead move freeze_mutex field around, so that refcnt and max_entries are still in different cache lines. The main effect is seen in sizeof(struct bpf_array) that reduces from 320 to 248 bytes. BEFORE: struct bpf_map { const struct bpf_map_ops * ops; /* 0 8 / ... char name[16]; / 96 16 / / XXX 16 bytes hole, try to pack / / --- cacheline 2 boundary (128 bytes) --- / atomic64_t refcnt __attribute__((__aligned__(64))); / 128 8 / ... / size: 256, cachelines: 4, members: 30 / / sum members: 232, holes: 1, sum holes: 16 / / padding: 8 / / paddings: 1, sum paddings: 2 / } __attribute__((__aligned__(64))); struct bpf_array { struct bpf_map map; / 0 256 / ... / size: 320, cachelines: 5, members: 5 / / padding: 48 / / paddings: 1, sum paddings: 8 / } __attribute__((__aligned__(64))); AFTER: struct bpf_map { / size: 232, cachelines: 4, members: 30 / / paddings: 1, sum paddings: 2 / / last cacheline: 40 bytes / }; struct bpf_array { / size: 248, cachelines: 4, members: 5 / / last cacheline: 56 bytes */ }; Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240220235001.57411-1-alexei.starovoitov@gmail.com	2024-02-21 13:49:14 -08:00
Alexei Starovoitov	01dbd7d872	selftests/bpf: Remove intermediate test files. The test of linking process creates several intermediate files. Remove them once the build is over. This reduces the number of files in selftests/bpf/ directory from ~4400 to ~2600. Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240220231102.49090-1-alexei.starovoitov@gmail.com	2024-02-21 13:49:14 -08:00
Marcos Paulo de Souza	7648f0c91e	selftests/bpf: Remove empty TEST_CUSTOM_PROGS Commit `f04a32b2c5` ("selftests/bpf: Do not use sign-file as testcase") removed the TEST_CUSTOM_PROGS assignment, and removed it from being used on TEST_GEN_FILES. Remove two leftovers from that cleanup. Found by inspection. Signed-off-by: Marcos Paulo de Souza <mpdesouza@suse.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Cc: Alexey Gladkov <legion@kernel.org> Link: https://lore.kernel.org/bpf/20240216-bpf-selftests-custom-progs-v1-1-f7cf281a1fda@suse.com	2024-02-16 18:08:26 +01:00
Yonghong Song	682158ab53	bpf: Fix test verif_scale_strobemeta_subprogs failure due to llvm19 With latest llvm19, I hit the following selftest failures with $ ./test_progs -j libbpf: prog 'on_event': BPF program load failed: Permission denied libbpf: prog 'on_event': -- BEGIN PROG LOAD LOG -- combined stack size of 4 calls is 544. Too large verification time 1344153 usec stack depth 24+440+0+32 processed 51008 insns (limit 1000000) max_states_per_insn 19 total_states 1467 peak_states 303 mark_read 146 -- END PROG LOAD LOG -- libbpf: prog 'on_event': failed to load: -13 libbpf: failed to load object 'strobemeta_subprogs.bpf.o' scale_test:FAIL:expect_success unexpected error: -13 (errno 13) #498 verif_scale_strobemeta_subprogs:FAIL The verifier complains too big of the combined stack size (544 bytes) which exceeds the maximum stack limit 512. This is a regression from llvm19 ([1]). In the above error log, the original stack depth is 24+440+0+32. To satisfy interpreter's need, in verifier the stack depth is adjusted to 32+448+32+32=544 which exceeds 512, hence the error. The same adjusted stack size is also used for jit case. But the jitted codes could use smaller stack size. $ egrep -r stack_depth \| grep round_up arm64/net/bpf_jit_comp.c: ctx->stack_size = round_up(prog->aux->stack_depth, 16); loongarch/net/bpf_jit.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); powerpc/net/bpf_jit_comp.c: cgctx.stack_size = round_up(fp->aux->stack_depth, 16); riscv/net/bpf_jit_comp32.c: round_up(ctx->prog->aux->stack_depth, STACK_ALIGN); riscv/net/bpf_jit_comp64.c: bpf_stack_adjust = round_up(ctx->prog->aux->stack_depth, 16); s390/net/bpf_jit_comp.c: u32 stack_depth = round_up(fp->aux->stack_depth, 8); sparc/net/bpf_jit_comp_64.c: stack_needed += round_up(stack_depth, 16); x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xEC, round_up(stack_depth, 8)); x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8); x86/net/bpf_jit_comp.c: round_up(stack_depth, 8)); x86/net/bpf_jit_comp.c: int tcc_off = -4 - round_up(stack_depth, 8); x86/net/bpf_jit_comp.c: EMIT3_off32(0x48, 0x81, 0xC4, round_up(stack_depth, 8)); In the above, STACK_ALIGN in riscv/net/bpf_jit_comp32.c is defined as 16. So stack is aligned in either 8 or 16, x86/s390 having 8-byte stack alignment and the rest having 16-byte alignment. This patch calculates total stack depth based on 16-byte alignment if jit is requested. For the above failing case, the new stack size will be 32+448+0+32=512 and no verification failure. llvm19 regression will be discussed separately in llvm upstream. The verifier change caused three test failures as these tests compared messages with stack size. More specifically, - test_global_funcs/global_func1: fail with interpreter mode and success with jit mode. Adjusted stack sizes so both jit and interpreter modes will fail. - async_stack_depth/{pseudo_call_check, async_call_root_check}: since jit and interpreter will calculate different stack sizes, the failure msg is adjusted to omit those specific stack size numbers. [1] https://lore.kernel.org/bpf/32bde0f0-1881-46c9-931a-673be566c61d@linux.dev/ Suggested-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240214232951.4113094-1-yonghong.song@linux.dev Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-15 13:45:27 -08:00
Andrii Nakryiko	57354f5fde	bpf: improve duplicate source code line detection Verifier log avoids printing the same source code line multiple times when a consecutive block of BPF assembly instructions are covered by the same original (C) source code line. This greatly improves verifier log legibility. Unfortunately, this check is imperfect and in production applications it quite often happens that verifier log will have multiple duplicated source lines emitted, for no apparently good reason. E.g., this is excerpt from a real-world BPF application (with register states omitted for clarity): BEFORE ====== ; for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { @ strobemeta_probe.bpf.c:394 5369: (07) r8 += 2 ; 5370: (07) r7 += 16 ; ; for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { @ strobemeta_probe.bpf.c:394 5371: (07) r9 += 1 ; 5372: (79) r4 = (u64 )(r10 -32) ; ; for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { @ strobemeta_probe.bpf.c:394 5373: (55) if r9 != 0xf goto pc+2 ; if (i >= map->cnt) @ strobemeta_probe.bpf.c:396 5376: (79) r1 = (u64 )(r10 -40) ; 5377: (79) r1 = (u64 )(r1 +8) ; ; if (i >= map->cnt) @ strobemeta_probe.bpf.c:396 5378: (dd) if r1 s<= r9 goto pc-5 ; ; descr->key_lens[i] = 0; @ strobemeta_probe.bpf.c:398 5379: (b4) w1 = 0 ; 5380: (6b) (u16 )(r8 -30) = r1 ; ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); @ strobemeta_probe.bpf.c:400 5381: (79) r3 = (u64 )(r7 -8) ; 5382: (7b) (u64 )(r10 -24) = r6 ; ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); @ strobemeta_probe.bpf.c:400 5383: (bc) w6 = w6 ; ; barrier_var(payload_off); @ strobemeta_probe.bpf.c:280 5384: (bf) r2 = r6 ; 5385: (bf) r1 = r4 ; As can be seen, line 394 is emitted thrice, 396 is emitted twice, and line 400 is duplicated as well. Note that there are no intermingling other lines of source code in between these duplicates, so the issue is not compiler reordering assembly instruction such that multiple original source code lines are in effect. It becomes more obvious what's going on if we look at full original line info information (using btfdump for this, [0]): #2764: line: insn #5363 --> 394:3 @ ./././strobemeta_probe.bpf.c for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { #2765: line: insn #5373 --> 394:21 @ ./././strobemeta_probe.bpf.c for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { #2766: line: insn #5375 --> 394:47 @ ./././strobemeta_probe.bpf.c for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { #2767: line: insn #5377 --> 394:3 @ ./././strobemeta_probe.bpf.c for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { #2768: line: insn #5378 --> 414:10 @ ./././strobemeta_probe.bpf.c return off; We can see that there are four line info records covering instructions #5363 through #5377 (instruction indices are shifted due to subprog instruction being appended to main program), all of them are pointing to the same C source code line #394. But each of them points to a different part of that line, which is denoted by differing column numbers (3, 21, 47, 3). But verifier log doesn't distinguish between parts of the same source code line and doesn't emit this column number information, so for end user it's just a repetitive visual noise. So let's improve the detection of repeated source code line and avoid this. With the changes in this patch, we get this output for the same piece of BPF program log: AFTER ===== ; for (int i = 0; i < STROBE_MAX_MAP_ENTRIES; ++i) { @ strobemeta_probe.bpf.c:394 5369: (07) r8 += 2 ; 5370: (07) r7 += 16 ; 5371: (07) r9 += 1 ; 5372: (79) r4 = (u64 )(r10 -32) ; 5373: (55) if r9 != 0xf goto pc+2 ; if (i >= map->cnt) @ strobemeta_probe.bpf.c:396 5376: (79) r1 = (u64 )(r10 -40) ; 5377: (79) r1 = (u64 )(r1 +8) ; 5378: (dd) if r1 s<= r9 goto pc-5 ; ; descr->key_lens[i] = 0; @ strobemeta_probe.bpf.c:398 5379: (b4) w1 = 0 ; 5380: (6b) (u16 )(r8 -30) = r1 ; ; task, data, off, STROBE_MAX_STR_LEN, map->entries[i].key); @ strobemeta_probe.bpf.c:400 5381: (79) r3 = (u64 )(r7 -8) ; 5382: (7b) (u64 )(r10 -24) = r6 ; 5383: (bc) w6 = w6 ; ; barrier_var(payload_off); @ strobemeta_probe.bpf.c:280 5384: (bf) r2 = r6 ; 5385: (bf) r1 = r4 ; All the duplication is gone and the log is cleaner and less distracting. [0] https://github.com/anakryiko/btfdump Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240214174100.2847419-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-15 13:00:48 -08:00
Andrii Nakryiko	a4561f5afe	bpf: Use O(log(N)) binary search to find line info record Real-world BPF applications keep growing in size. Medium-sized production application can easily have 50K+ verified instructions, and its line info section in .BTF.ext has more than 3K entries. When verifier emits log with log_level>=1, it annotates assembly code with matched original C source code. Currently it uses linear search over line info records to find a match. As complexity of BPF applications grows, this O(K * N) approach scales poorly. So, let's instead of linear O(N) search for line info record use faster equivalent O(log(N)) binary search algorithm. It's not a plain binary search, as we don't look for exact match. It's an upper bound search variant, looking for rightmost line info record that starts at or before given insn_off. Some unscientific measurements were done before and after this change. They were done in VM and fluctuate a bit, but overall the speed up is undeniable. BASELINE ======== File Program Duration (us) Insns -------------------------------- ---------------- ------------- ------ katran.bpf.o balancer_ingress 2497130 343552 pyperf600.bpf.linked3.o on_event 12389611 627288 strobelight_pyperf_libbpf.o on_py_event 387399 52445 -------------------------------- ---------------- ------------- ------ BINARY SEARCH ============= File Program Duration (us) Insns -------------------------------- ---------------- ------------- ------ katran.bpf.o balancer_ingress 2339312 343552 pyperf600.bpf.linked3.o on_event 5602203 627288 strobelight_pyperf_libbpf.o on_py_event 294761 52445 -------------------------------- ---------------- ------------- ------ While Katran's speed up is pretty modest (about 105ms, or 6%), for production pyperf BPF program (on_py_event) it's much greater already, going from 387ms down to 295ms (23% improvement). Looking at BPF selftests's biggest pyperf example, we can see even more dramatic improvement, shaving more than 50% of time, going from 12.3s down to 5.6s. Different amount of improvement is the function of overall amount of BPF assembly instructions in .bpf.o files (which contributes to how much line info records there will be and thus, on average, how much time linear search will take), among other things: $ llvm-objdump -d katran.bpf.o \| wc -l 3863 $ llvm-objdump -d strobelight_pyperf_libbpf.o \| wc -l 6997 $ llvm-objdump -d pyperf600.bpf.linked3.o \| wc -l 87854 Granted, this only applies to debugging cases (e.g., using veristat, or failing verification in production), but seems worth doing to improve overall developer experience anyways. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240214002311.2197116-1-andrii@kernel.org	2024-02-14 23:53:42 +01:00
Matt Bobrowski	1159d27852	libbpf: Make remark about zero-initializing bpf_*_info structs In some situations, if you fail to zero-initialize the bpf_{prog,map,btf,link}_info structs supplied to the set of LIBBPF helpers bpf_{prog,map,btf,link}_get_info_by_fd(), you can expect the helper to return an error. This can possibly leave people in a situation where they're scratching their heads for an unnnecessary amount of time. Make an explicit remark about the requirement of zero-initializing the supplied bpf_{prog,map,btf,link}_info structs for the respective LIBBPF helpers. Internally, LIBBPF helpers bpf_{prog,map,btf,link}_get_info_by_fd() call into bpf_obj_get_info_by_fd() where the bpf(2) BPF_OBJ_GET_INFO_BY_FD command is used. This specific command is effectively backed by restrictions enforced by the bpf_check_uarg_tail_zero() helper. This function ensures that if the size of the supplied bpf_{prog,map,btf,link}_info structs are larger than what the kernel can handle, trailing bits are zeroed. This can be a problem when compiling against UAPI headers that don't necessarily match the sizes of the same underlying types known to the kernel. Signed-off-by: Matt Bobrowski <mattbobrowski@google.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/ZcyEb8x4VbhieWsL@google.com	2024-02-14 09:48:46 -08:00
Andrii Nakryiko	7cc13adbd0	bpf: emit source code file name and line number in verifier log As BPF applications grow in size and complexity and are separated into multiple .bpf.c files that are statically linked together, it becomes harder and harder to match verifier's BPF assembly level output to original C code. While often annotated C source code is unique enough to be able to identify the file it belongs to, quite often this is actually problematic as parts of source code can be quite generic. Long story short, it is very useful to see source code file name and line number information along with the original C code. Verifier already knows this information, we just need to output it. This patch extends verifier log with file name and line number information, emitted next to original (presumably C) source code, annotating BPF assembly output, like so: ; <original C code> @ <filename>.bpf.c:<line> If file name has directory names in it, they are stripped away. This should be fine in practice as file names tend to be pretty unique with C code anyways, and keeping log size smaller is always good. In practice this might look something like below, where some code is coming from application files, while others are from libbpf's usdt.bpf.h header file: ; if (STROBEMETA_READ( @ strobemeta_probe.bpf.c:534 5592: (79) r1 = (u64 )(r10 -56) ; R1_w=mem_or_null(id=1589,sz=7680) R10=fp0 5593: (7b) (u64 )(r10 -56) = r1 ; R1_w=mem_or_null(id=1589,sz=7680) R10=fp0 5594: (79) r3 = (u64 )(r10 -8) ; R3_w=scalar() R10=fp0 fp-8=mmmmmmmm ... 170: (71) r1 = (u8 )(r8 +15) ; frame1: R1_w=scalar(...) R8_w=map_value(map=__bpf_usdt_spec,ks=4,vs=208) 171: (67) r1 <<= 56 ; frame1: R1_w=scalar(...) 172: (c7) r1 s>>= 56 ; frame1: R1_w=scalar(smin=smin32=-128,smax=smax32=127) ; val <<= arg_spec->arg_bitshift; @ usdt.bpf.h:183 173: (67) r1 <<= 32 ; frame1: R1_w=scalar(...) 174: (77) r1 >>= 32 ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) 175: (79) r2 = (u64 )(r10 -8) ; frame1: R2_w=scalar() R10=fp0 fp-8=mmmmmmmm 176: (6f) r2 <<= r1 ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R2_w=scalar() 177: (7b) (u64 )(r10 -8) = r2 ; frame1: R2_w=scalar(id=61) R10=fp0 fp-8_w=scalar(id=61) ; if (arg_spec->arg_signed) @ usdt.bpf.h:184 178: (bf) r3 = r2 ; frame1: R2_w=scalar(id=61) R3_w=scalar(id=61) 179: (7f) r3 >>= r1 ; frame1: R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R3_w=scalar() ; if (arg_spec->arg_signed) @ usdt.bpf.h:184 180: (71) r4 = (u8 )(r8 +14) 181: safe log_fixup tests needed a minor adjustment as verifier log output increased a bit and that test is quite sensitive to such changes. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240212235944.2816107-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-13 18:51:32 -08:00
Alexei Starovoitov	96adbf7125	Merge branch 'fix-global-subprog-ptr_to_ctx-arg-handling' Andrii Nakryiko says: ==================== Fix global subprog PTR_TO_CTX arg handling Fix confusing and incorrect inference of PTR_TO_CTX argument type in BPF global subprogs. For some program types (iters, tracepoint, any program type that doesn't have fixed named "canonical" context type) when user uses (in a correct and valid way) a pointer argument to user-defined anonymous struct type, verifier will incorrectly assume that it has to be PTR_TO_CTX argument. While it should be just a PTR_TO_MEM argument with allowed size calculated from user-provided (even if anonymous) struct. This did come up in practice and was very confusing to users, so let's prevent this going forward. We had to do a slight refactoring of btf_get_prog_ctx_type() to make it easy to support a special s390x KPROBE use cases. See details in respective patches. v1->v2: - special-case typedef bpf_user_pt_regs_t handling for KPROBE programs, fixing s390x after changes in patch #2. ==================== Link: https://lore.kernel.org/r/20240212233221.2575350-1-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-13 18:46:47 -08:00
Andrii Nakryiko	63d5a33fb4	selftests/bpf: add anonymous user struct as global subprog arg test Add tests validating that kernel handles pointer to anonymous struct argument as PTR_TO_MEM case, not as PTR_TO_CTX case. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240212233221.2575350-5-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-13 18:46:47 -08:00
Andrii Nakryiko	879bbe7aa4	bpf: don't infer PTR_TO_CTX for programs with unnamed context type For program types that don't have named context type name (e.g., BPF iterator programs or tracepoint programs), ctx_tname will be a non-NULL empty string. For such programs it shouldn't be possible to have PTR_TO_CTX argument for global subprogs based on type name alone. arg:ctx tag is the only way to have PTR_TO_CTX passed into global subprog for such program types. Fix this loophole, which currently would assume PTR_TO_CTX whenever user uses a pointer to anonymous struct as an argument to their global subprogs. This happens in practice with the following (quite common, in practice) approach: typedef struct { /* anonymous / int x; } my_type_t; int my_subprog(my_type_t arg) { ... } User's intent is to have PTR_TO_MEM argument for `arg`, but verifier will complain about expecting PTR_TO_CTX. This fix also closes unintended s390x-specific KPROBE handling of PTR_TO_CTX case. Selftest change is necessary to accommodate this. Fixes: `91cc1a9974` ("bpf: Annotate context types") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240212233221.2575350-4-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-13 18:46:47 -08:00
Andrii Nakryiko	824c58fb10	bpf: handle bpf_user_pt_regs_t typedef explicitly for PTR_TO_CTX global arg Expected canonical argument type for global function arguments representing PTR_TO_CTX is `bpf_user_pt_regs_t *ctx`. This currently works on s390x by accident because kernel resolves such typedef to underlying struct (which is anonymous on s390x), and erroneously accepting it as expected context type. We are fixing this problem next, which would break s390x arch, so we need to handle `bpf_user_pt_regs_t` case explicitly for KPROBE programs. Fixes: `91cc1a9974` ("bpf: Annotate context types") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240212233221.2575350-3-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-13 18:46:47 -08:00
Andrii Nakryiko	fb5b86cfd4	bpf: simplify btf_get_prog_ctx_type() into btf_is_prog_ctx_type() Return result of btf_get_prog_ctx_type() is never used and callers only check NULL vs non-NULL case to determine if given type matches expected PTR_TO_CTX type. So rename function to `btf_is_prog_ctx_type()` and return a simple true/false. We'll use this simpler interface to handle kprobe program type's special typedef case in the next patch. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20240212233221.2575350-2-andrii@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2024-02-13 18:46:46 -08:00
Oliver Crumrine	32e18e7688	bpf: remove check in __cgroup_bpf_run_filter_skb Originally, this patch removed a redundant check in BPF_CGROUP_RUN_PROG_INET_EGRESS, as the check was already being done in the function it called, __cgroup_bpf_run_filter_skb. For v2, it was reccomended that I remove the check from __cgroup_bpf_run_filter_skb, and add the checks to the other macro that calls that function, BPF_CGROUP_RUN_PROG_INET_INGRESS. To sum it up, checking that the socket exists and that it is a full socket is now part of both macros BPF_CGROUP_RUN_PROG_INET_EGRESS and BPF_CGROUP_RUN_PROG_INET_INGRESS, and it is no longer part of the function they call, __cgroup_bpf_run_filter_skb. v3->v4: Fixed weird merge conflict. v2->v3: Sent to bpf-next instead of generic patch v1->v2: Addressed feedback about where check should be removed. Signed-off-by: Oliver Crumrine <ozlinuxc@gmail.com> Acked-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/7lv62yiyvmj5a7eozv2iznglpkydkdfancgmbhiptrgvgan5sy@3fl3onchgdz3 Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-13 15:41:17 -08:00
Martin KaFai Lau	2c21a0f67c	Merge branch 'Support PTR_MAYBE_NULL for struct_ops arguments.' Kui-Feng Lee says: ==================== Allow passing null pointers to the operators provided by a struct_ops object. This is an RFC to collect feedbacks/opinions. The function pointers that are passed to struct_ops operators (the function pointers) are always considered reliable until now. They cannot be null. However, in certain scenarios, it should be possible to pass null pointers to these operators. For instance, sched_ext may pass a null pointer in the struct task type to an operator that is provided by its struct_ops objects. The proposed solution here is to add PTR_MAYBE_NULL annotations to arguments and create instances of struct bpf_ctx_arg_aux (arg_info) for these arguments. These arg_infos will be installed at prog->aux->ctx_arg_info and will be checked by the BPF verifier when loading the programs. When a struct_ops program accesses arguments in the ctx, the verifier will call btf_ctx_access() (through bpf_verifier_ops->is_valid_access) to verify the access. btf_ctx_access() will check arg_info and use the information of the matched arg_info to properly set reg_type. For nullable arguments, this patch sets an arg_info to label them with PTR_TO_BTF_ID \| PTR_TRUSTED \| PTR_MAYBE_NULL. This enforces the verifier to check programs and ensure that they properly check the pointer. The programs should check if the pointer is null before reading/writing the pointed memory. The implementer of a struct_ops should annotate the arguments that can be null. The implementer should define a stub function (empty) as a placeholder for each defined operator. The name of a stub function should be in the pattern "<st_op_type>__<operator name>". For example, for test_maybe_null of struct bpf_testmod_ops, it's stub function name should be "bpf_testmod_ops__test_maybe_null". You mark an argument nullable by suffixing the argument name with "__nullable" at the stub function. Here is the example in bpf_testmod.c. static int bpf_testmod_ops__test_maybe_null(int dummy, struct task_struct *task__nullable) { return 0; } This means that the argument 1 (2nd) of bpf_testmod_ops->test_maybe_null, which is a function pointer that can be null. With this annotation, the verifier will understand how to check programs using this arguments. A BPF program that implement test_maybe_null should check the pointer to make sure it is not null before using it. For example, if (task__nullable) save_tgid = task__nullable->tgid Without the check, the verifier will reject the program. Since we already has stub functions for kCFI, we just reuse these stub functions with the naming convention mentioned earlier. These stub functions with the naming convention is only required if there are nullable arguments to annotate. For functions without nullable arguments, stub functions are not necessary for the purpose of this patch. --- Major changes from v7: - Update a comment that is out of date. Major changes from v6: - Remove "len" from bpf_struct_ops_desc_release(). - Rename arg_info(s) to info, and rename all_arg_info to arg_info in prepare_arg_info(). - Rename arg_info to info in struct bpf_struct_ops_arg_info. Major changes from v5: - Rename all member_arg_info variables. - Refactor to bpf_struct_ops_desc_release() to share code between btf_free_struct_ops_tab() and bpf_struct_ops_desc_init(). - Refactor to btf_param_match_suffix(). (Add a new patch as the part 2.) - Clean up the commit log and remaining code in the patch of test cases. - Update a comment in struct_ops_maybe_null.c. Major changes from v4: - Remove the support of pointers to types other than struct types. That would be a separate patchset. - Remove the patch about extending PTR_TO_BTF_ID. - Remove the test against various pointer types from selftests. - Remove the patch "bpf: Remove an unnecessary check" and send that patch separately. - Remove member_arg_info_cnt from struct bpf_struct_ops_desc. - Use btf_id from FUNC_PROTO of a function pointer instead of a stub function. Major changes from v3: - Move the code collecting argument information to prepare_arg_info() called in the loop in bpf_struct_ops_desc_init(). - Simplify the memory allocation by having separated arg_info for each member of a struct_ops type. - Extend PTR_TO_BTF_ID to pointers to scalar types and array types, not only to struct types. Major changes from v2: - Remove dead code. - Add comments to explain the code itself. Major changes from v1: - Annotate arguments by suffixing argument names with "__nullable" at stub functions. v7: https://lore.kernel.org/all/20240209020053.1132710-1-thinker.li@gmail.com/ v6: https://lore.kernel.org/all/20240208065103.2154768-1-thinker.li@gmail.com/ v5: https://lore.kernel.org/all/20240206063833.2520479-1-thinker.li@gmail.com/ v4: https://lore.kernel.org/all/20240202220516.1165466-1-thinker.li@gmail.com/ v3: https://lore.kernel.org/all/20240122212217.1391878-1-thinker.li@gmail.com/ v2: https://lore.kernel.org/all/20240118224922.336006-1-thinker.li@gmail.com/ ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-13 15:16:44 -08:00
Kui-Feng Lee	00f239eccf	selftests/bpf: Test PTR_MAYBE_NULL arguments of struct_ops operators. Test if the verifier verifies nullable pointer arguments correctly for BPF struct_ops programs. "test_maybe_null" in struct bpf_testmod_ops is the operator defined for the test cases here. A BPF program should check a pointer for NULL beforehand to access the value pointed by the nullable pointer arguments, or the verifier should reject the programs. The test here includes two parts; the programs checking pointers properly and the programs not checking pointers beforehand. The test checks if the verifier accepts the programs checking properly and rejects the programs not checking at all. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240209023750.1153905-5-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-13 15:16:44 -08:00
Kui-Feng Lee	1611603537	bpf: Create argument information for nullable arguments. Collect argument information from the type information of stub functions to mark arguments of BPF struct_ops programs with PTR_MAYBE_NULL if they are nullable. A nullable argument is annotated by suffixing "__nullable" at the argument name of stub function. For nullable arguments, this patch sets a struct bpf_ctx_arg_aux to label their reg_type with PTR_TO_BTF_ID \| PTR_TRUSTED \| PTR_MAYBE_NULL. This makes the verifier to check programs and ensure that they properly check the pointer. The programs should check if the pointer is null before accessing the pointed memory. The implementer of a struct_ops type should annotate the arguments that can be null. The implementer should define a stub function (empty) as a placeholder for each defined operator. The name of a stub function should be in the pattern "<st_op_type>__<operator name>". For example, for test_maybe_null of struct bpf_testmod_ops, it's stub function name should be "bpf_testmod_ops__test_maybe_null". You mark an argument nullable by suffixing the argument name with "__nullable" at the stub function. Since we already has stub functions for kCFI, we just reuse these stub functions with the naming convention mentioned earlier. These stub functions with the naming convention is only required if there are nullable arguments to annotate. For functions having not nullable arguments, stub functions are not necessary for the purpose of this patch. This patch will prepare a list of struct bpf_ctx_arg_aux, aka arg_info, for each member field of a struct_ops type. "arg_info" will be assigned to "prog->aux->ctx_arg_info" of BPF struct_ops programs in check_struct_ops_btf_id() so that it can be used by btf_ctx_access() later to set reg_type properly for the verifier. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240209023750.1153905-4-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-13 15:16:44 -08:00
Kui-Feng Lee	6115a0aeef	bpf: Move __kfunc_param_match_suffix() to btf.c. Move __kfunc_param_match_suffix() to btf.c and rename it as btf_param_match_suffix(). It can be reused by bpf_struct_ops later. Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240209023750.1153905-3-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-13 15:16:44 -08:00
Kui-Feng Lee	77c0208e19	bpf: add btf pointer to struct bpf_ctx_arg_aux. Enable the providers to use types defined in a module instead of in the kernel (btf_vmlinux). Signed-off-by: Kui-Feng Lee <thinker.li@gmail.com> Link: https://lore.kernel.org/r/20240209023750.1153905-2-thinker.li@gmail.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-13 15:16:44 -08:00
Dave Thaler	dc8543b597	bpf, docs: Update ISA document title * Use "Instruction Set Architecture (ISA)" instead of "Instruction Set Specification" * Remove version number As previously discussed on the mailing list at https://mailarchive.ietf.org/arch/msg/bpf/SEpn3OL9TabNRn-4rDX9A6XVbjM/ Signed-off-by: Dave Thaler <dthaler1968@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: David Vernet <void@manifault.com> Link: https://lore.kernel.org/bpf/20240208221449.12274-1-dthaler1968@gmail.com	2024-02-13 23:14:15 +01:00
Cupertino Miranda	12bbcf8e84	libbpf: Add support to GCC in CORE macro definitions Due to internal differences between LLVM and GCC the current implementation for the CO-RE macros does not fit GCC parser, as it will optimize those expressions even before those would be accessible by the BPF backend. As examples, the following would be optimized out with the original definitions: - As enums are converted to their integer representation during parsing, the IR would not know how to distinguish an integer constant from an actual enum value. - Types need to be kept as temporary variables, as the existing type casts of the 0 address (as expanded for LLVM), are optimized away by the GCC C parser, never really reaching GCCs IR. Although, the macros appear to add extra complexity, the expanded code is removed from the compilation flow very early in the compilation process, not really affecting the quality of the generated assembly. Signed-off-by: Cupertino Miranda <cupertino.miranda@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240213173543.1397708-1-cupertino.miranda@oracle.com	2024-02-13 11:28:12 -08:00
Jose E. Marchesi	52dbd67dff	bpf: Abstract loop unrolling pragmas in BPF selftests [Changes from V1: - Avoid conflict by rebasing with latest master.] Some BPF tests use loop unrolling compiler pragmas that are clang specific and not supported by GCC. These pragmas, along with their GCC equivalences are: #pragma clang loop unroll_count(N) #pragma GCC unroll N #pragma clang loop unroll(full) #pragma GCC unroll 65534 #pragma clang loop unroll(disable) #pragma GCC unroll 1 #pragma unroll [aka #pragma clang loop unroll(enable)] There is no GCC equivalence to this pragma. It enables unrolling on loops that the compiler would not ordinarily unroll even with -O2\|-funroll-loops, but it is not equivalent to full unrolling either. This patch adds a new header progs/bpf_compiler.h that defines the following macros, which correspond to each pair of compiler-specific pragmas above: __pragma_loop_unroll_count(N) __pragma_loop_unroll_full __pragma_loop_no_unroll __pragma_loop_unroll The selftests using loop unrolling pragmas are then changed to include the header and use these macros in place of the explicit pragmas. Tested in bpf-next master. No regressions. Signed-off-by: Jose E. Marchesi <jose.marchesi@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/20240208203612.29611-1-jose.marchesi@oracle.com	2024-02-13 11:17:30 -08:00
Yonghong Song	fc1c9e40da	selftests/bpf: Ensure fentry prog cannot attach to bpf_spin_{lock,unlcok}() Add two tests to ensure fentry programs cannot attach to bpf_spin_{lock,unlock}() helpers. The tracing_failure.c files can be used in the future for other tracing failure cases. Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20240207070107.335341-1-yonghong.song@linux.dev	2024-02-13 11:11:25 -08:00
Yonghong Song	178c54666f	bpf: Mark bpf_spin_{lock,unlock}() helpers with notrace correctly Currently tracing is supposed not to allow for bpf_spin_{lock,unlock}() helper calls. This is to prevent deadlock for the following cases: - there is a prog (prog-A) calling bpf_spin_{lock,unlock}(). - there is a tracing program (prog-B), e.g., fentry, attached to bpf_spin_lock() and/or bpf_spin_unlock(). - prog-B calls bpf_spin_{lock,unlock}(). For such a case, when prog-A calls bpf_spin_{lock,unlock}(), a deadlock will happen. The related source codes are below in kernel/bpf/helpers.c: notrace BPF_CALL_1(bpf_spin_lock, struct bpf_spin_lock , lock) notrace BPF_CALL_1(bpf_spin_unlock, struct bpf_spin_lock , lock) notrace is supposed to prevent fentry prog from attaching to bpf_spin_{lock,unlock}(). But actually this is not the case and fentry prog can successfully attached to bpf_spin_lock(). Siddharth Chintamaneni reported the issue in [1]. The following is the macro definition for above BPF_CALL_1: #define BPF_CALL_x(x, name, ...) \ static __always_inline \ u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \ typedef u64 (*btf_##name)(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)); \ u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__)); \ u64 name(__BPF_REG(x, __BPF_DECL_REGS, __BPF_N, __VA_ARGS__)) \ { \ return ((btf_##name)____##name)(__BPF_MAP(x,__BPF_CAST,__BPF_N,__VA_ARGS__));\ } \ static __always_inline \ u64 ____##name(__BPF_MAP(x, __BPF_DECL_ARGS, __BPF_V, __VA_ARGS__)) #define BPF_CALL_1(name, ...) BPF_CALL_x(1, name, __VA_ARGS__) The notrace attribute is actually applied to the static always_inline function ____bpf_spin_{lock,unlock}(). The actual callback function bpf_spin_{lock,unlock}() is not marked with notrace, hence allowing fentry prog to attach to two helpers, and this may cause the above mentioned deadlock. Siddharth Chintamaneni actually has a reproducer in [2]. To fix the issue, a new macro NOTRACE_BPF_CALL_1 is introduced which will add notrace attribute to the original function instead of the hidden always_inline function and this fixed the problem. [1] https://lore.kernel.org/bpf/CAE5sdEigPnoGrzN8WU7Tx-h-iFuMZgW06qp0KHWtpvoXxf1OAQ@mail.gmail.com/ [2] https://lore.kernel.org/bpf/CAE5sdEg6yUc_Jz50AnUXEEUh6O73yQ1Z6NV2srJnef0ZrQkZew@mail.gmail.com/ Fixes: `d83525ca62` ("bpf: introduce bpf_spin_lock") Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/bpf/20240207070102.335167-1-yonghong.song@linux.dev	2024-02-13 11:11:25 -08:00
Daniel Xu	5b268d1ebc	bpf: Have bpf_rdonly_cast() take a const pointer Since `20d59ee551` ("libbpf: add bpf_core_cast() macro"), libbpf is now exporting a const arg version of bpf_rdonly_cast(). This causes the following conflicting type error when generating kfunc prototypes from BTF: In file included from skeleton/pid_iter.bpf.c:5: /home/dxu/dev/linux/tools/bpf/bpftool/bootstrap/libbpf/include/bpf/bpf_core_read.h:297:14: error: conflicting types for 'bpf_rdonly_cast' extern void bpf_rdonly_cast(const void obj__ign, __u32 btf_id__k) __ksym __weak; ^ ./vmlinux.h:135625:14: note: previous declaration is here extern void bpf_rdonly_cast(void obj__ign, u32 btf_id__k) __weak __ksym; This is b/c the kernel defines bpf_rdonly_cast() with non-const arg. Since const arg is more permissive and thus backwards compatible, we change the kernel definition as well to avoid conflicting type errors. Signed-off-by: Daniel Xu <dxu@dxuuu.xyz> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/bpf/dfd3823f11ffd2d4c838e961d61ec9ae8a646773.1707080349.git.dxu@dxuuu.xyz	2024-02-13 11:05:26 -08:00
Marco Elver	68bc61c26c	bpf: Allow compiler to inline most of bpf_local_storage_lookup() In various performance profiles of kernels with BPF programs attached, bpf_local_storage_lookup() appears as a significant portion of CPU cycles spent. To enable the compiler generate more optimal code, turn bpf_local_storage_lookup() into a static inline function, where only the cache insertion code path is outlined Notably, outlining cache insertion helps avoid bloating callers by duplicating setting up calls to raw_spin_{lock,unlock}_irqsave() (on architectures which do not inline spin_lock/unlock, such as x86), which would cause the compiler produce worse code by deciding to outline otherwise inlinable functions. The call overhead is neutral, because we make 2 calls either way: either calling raw_spin_lock_irqsave() and raw_spin_unlock_irqsave(); or call __bpf_local_storage_insert_cache(), which calls raw_spin_lock_irqsave(), followed by a tail-call to raw_spin_unlock_irqsave() where the compiler can perform TCO and (in optimized uninstrumented builds) turns it into a plain jump. The call to __bpf_local_storage_insert_cache() can be elided entirely if cacheit_lockit is a false constant expression. Based on results from './benchs/run_bench_local_storage.sh' (21 trials, reboot between each trial; x86 defconfig + BPF, clang 16) this produces improvements in throughput and latency in the majority of cases, with an average (geomean) improvement of 8%: +---- Hashmap Control -------------------- \| \| + num keys: 10 \| : <before> \| <after> \| +-+ hashmap (control) sequential get +----------------------+---------------------- \| +- hits throughput \| 14.789 M ops/s \| 14.745 M ops/s ( ~ ) \| +- hits latency \| 67.679 ns/op \| 67.879 ns/op ( ~ ) \| +- important_hits throughput \| 14.789 M ops/s \| 14.745 M ops/s ( ~ ) \| \| + num keys: 1000 \| : <before> \| <after> \| +-+ hashmap (control) sequential get +----------------------+---------------------- \| +- hits throughput \| 12.233 M ops/s \| 12.170 M ops/s ( ~ ) \| +- hits latency \| 81.754 ns/op \| 82.185 ns/op ( ~ ) \| +- important_hits throughput \| 12.233 M ops/s \| 12.170 M ops/s ( ~ ) \| \| + num keys: 10000 \| : <before> \| <after> \| +-+ hashmap (control) sequential get +----------------------+---------------------- \| +- hits throughput \| 7.220 M ops/s \| 7.204 M ops/s ( ~ ) \| +- hits latency \| 138.522 ns/op \| 138.842 ns/op ( ~ ) \| +- important_hits throughput \| 7.220 M ops/s \| 7.204 M ops/s ( ~ ) \| \| + num keys: 100000 \| : <before> \| <after> \| +-+ hashmap (control) sequential get +----------------------+---------------------- \| +- hits throughput \| 5.061 M ops/s \| 5.165 M ops/s (+2.1%) \| +- hits latency \| 198.483 ns/op \| 194.270 ns/op (-2.1%) \| +- important_hits throughput \| 5.061 M ops/s \| 5.165 M ops/s (+2.1%) \| \| + num keys: 4194304 \| : <before> \| <after> \| +-+ hashmap (control) sequential get +----------------------+---------------------- \| +- hits throughput \| 2.864 M ops/s \| 2.882 M ops/s ( ~ ) \| +- hits latency \| 365.220 ns/op \| 361.418 ns/op (-1.0%) \| +- important_hits throughput \| 2.864 M ops/s \| 2.882 M ops/s ( ~ ) \| +---- Local Storage ---------------------- \| \| + num_maps: 1 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 33.005 M ops/s \| 39.068 M ops/s (+18.4%) \| +- hits latency \| 30.300 ns/op \| 25.598 ns/op (-15.5%) \| +- important_hits throughput \| 33.005 M ops/s \| 39.068 M ops/s (+18.4%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 37.151 M ops/s \| 44.926 M ops/s (+20.9%) \| +- hits latency \| 26.919 ns/op \| 22.259 ns/op (-17.3%) \| +- important_hits throughput \| 37.151 M ops/s \| 44.926 M ops/s (+20.9%) \| \| + num_maps: 10 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 32.288 M ops/s \| 38.099 M ops/s (+18.0%) \| +- hits latency \| 30.972 ns/op \| 26.248 ns/op (-15.3%) \| +- important_hits throughput \| 3.229 M ops/s \| 3.810 M ops/s (+18.0%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 34.473 M ops/s \| 41.145 M ops/s (+19.4%) \| +- hits latency \| 29.010 ns/op \| 24.307 ns/op (-16.2%) \| +- important_hits throughput \| 12.312 M ops/s \| 14.695 M ops/s (+19.4%) \| \| + num_maps: 16 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 32.524 M ops/s \| 38.341 M ops/s (+17.9%) \| +- hits latency \| 30.748 ns/op \| 26.083 ns/op (-15.2%) \| +- important_hits throughput \| 2.033 M ops/s \| 2.396 M ops/s (+17.9%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 34.575 M ops/s \| 41.338 M ops/s (+19.6%) \| +- hits latency \| 28.925 ns/op \| 24.193 ns/op (-16.4%) \| +- important_hits throughput \| 11.001 M ops/s \| 13.153 M ops/s (+19.6%) \| \| + num_maps: 17 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 28.861 M ops/s \| 32.756 M ops/s (+13.5%) \| +- hits latency \| 34.649 ns/op \| 30.530 ns/op (-11.9%) \| +- important_hits throughput \| 1.700 M ops/s \| 1.929 M ops/s (+13.5%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 31.529 M ops/s \| 36.110 M ops/s (+14.5%) \| +- hits latency \| 31.719 ns/op \| 27.697 ns/op (-12.7%) \| +- important_hits throughput \| 9.598 M ops/s \| 10.993 M ops/s (+14.5%) \| \| + num_maps: 24 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 18.602 M ops/s \| 19.937 M ops/s (+7.2%) \| +- hits latency \| 53.767 ns/op \| 50.166 ns/op (-6.7%) \| +- important_hits throughput \| 0.776 M ops/s \| 0.831 M ops/s (+7.2%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 21.718 M ops/s \| 23.332 M ops/s (+7.4%) \| +- hits latency \| 46.047 ns/op \| 42.865 ns/op (-6.9%) \| +- important_hits throughput \| 6.110 M ops/s \| 6.564 M ops/s (+7.4%) \| \| + num_maps: 32 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 14.118 M ops/s \| 14.626 M ops/s (+3.6%) \| +- hits latency \| 70.856 ns/op \| 68.381 ns/op (-3.5%) \| +- important_hits throughput \| 0.442 M ops/s \| 0.458 M ops/s (+3.6%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 17.111 M ops/s \| 17.906 M ops/s (+4.6%) \| +- hits latency \| 58.451 ns/op \| 55.865 ns/op (-4.4%) \| +- important_hits throughput \| 4.776 M ops/s \| 4.998 M ops/s (+4.6%) \| \| + num_maps: 100 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 5.281 M ops/s \| 5.528 M ops/s (+4.7%) \| +- hits latency \| 192.398 ns/op \| 183.059 ns/op (-4.9%) \| +- important_hits throughput \| 0.053 M ops/s \| 0.055 M ops/s (+4.9%) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 6.265 M ops/s \| 6.498 M ops/s (+3.7%) \| +- hits latency \| 161.436 ns/op \| 152.877 ns/op (-5.3%) \| +- important_hits throughput \| 1.636 M ops/s \| 1.697 M ops/s (+3.7%) \| \| + num_maps: 1000 \| : <before> \| <after> \| +-+ local_storage cache sequential get +----------------------+---------------------- \| +- hits throughput \| 0.355 M ops/s \| 0.354 M ops/s ( ~ ) \| +- hits latency \| 2826.538 ns/op \| 2827.139 ns/op ( ~ ) \| +- important_hits throughput \| 0.000 M ops/s \| 0.000 M ops/s ( ~ ) \| : \| : <before> \| <after> \| +-+ local_storage cache interleaved get +----------------------+---------------------- \| +- hits throughput \| 0.404 M ops/s \| 0.403 M ops/s ( ~ ) \| +- hits latency \| 2481.190 ns/op \| 2487.555 ns/op ( ~ ) \| +- important_hits throughput \| 0.102 M ops/s \| 0.101 M ops/s ( ~ ) The on_lookup test in {cgrp,task}_ls_recursion.c is removed because the bpf_local_storage_lookup is no longer traceable and adding tracepoint will make the compiler generate worse code: https://lore.kernel.org/bpf/ZcJmok64Xqv6l4ZS@elver.google.com/ Signed-off-by: Marco Elver <elver@google.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Acked-by: Yonghong Song <yonghong.song@linux.dev> Link: https://lore.kernel.org/r/20240207122626.3508658-1-elver@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-11 14:06:24 -08:00
Martin KaFai Lau	a7170d81e0	Merge branch 'bpf, btf: Add DEBUG_INFO_BTF checks for __register_bpf_struct_ops' Geliang Tang says: ==================== bpf: Add DEBUG_INFO_BTF checks for __register_bpf_struct_ops This patch set avoids module loading failure when the module trying to register a struct_ops and the module has its btf section stripped. This will then work similarly as module kfunc registration in commit `3de4d22cc9` ("bpf, btf: Warn but return no error for NULL btf from __register_btf_kfunc_id_set()") v5: - drop CONFIG_MODULE_ALLOW_BTF_MISMATCH check as Martin suggested. v4: - add a new patch to fix error checks for btf_get_module_btf. - rename the helper to check_btf_kconfigs. v3: - fix this build error: kernel/bpf/btf.c:7750:11: error: incomplete definition of type 'struct module' Reported-by: kernel test robot <lkp@intel.com> Closes: https://lore.kernel.org/oe-kbuild-all/202402040934.Fph0XeEo-lkp@intel.com/ v2: - add register_check_missing_btf helper as Jiri suggested. ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-08 11:40:18 -08:00
Geliang Tang	947e56f82f	bpf, btf: Check btf for register_bpf_struct_ops Similar to the handling in the functions __register_btf_kfunc_id_set() and register_btf_id_dtor_kfuncs(), this patch uses the newly added helper check_btf_kconfigs() to handle module with its btf section stripped. While at it, the patch also adds the missed IS_ERR() check to fix the commit `f6be98d199` ("bpf, net: switch to dynamic registration") Fixes: `f6be98d199` ("bpf, net: switch to dynamic registration") Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Link: https://lore.kernel.org/r/69082b9835463fe36f9e354bddf2d0a97df39c2b.1707373307.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-08 11:37:49 -08:00
Geliang Tang	9e60b0e025	bpf, btf: Add check_btf_kconfigs helper This patch extracts duplicate code on error path when btf_get_module_btf() returns NULL from the functions __register_btf_kfunc_id_set() and register_btf_id_dtor_kfuncs() into a new helper named check_btf_kconfigs() to check CONFIG_DEBUG_INFO_BTF and CONFIG_DEBUG_INFO_BTF_MODULES in it. Signed-off-by: Geliang Tang <tanggeliang@kylinos.cn> Acked-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/fa5537fc55f1e4d0bfd686598c81b7ab9dbd82b7.1707373307.git.tanggeliang@kylinos.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2024-02-08 11:22:56 -08:00

1 2 3 4 5 ...

1249305 Commits