linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-27 04:54:41 +08:00

Author	SHA1	Message	Date
Stanislav Fomichev	a4aeb9d656	bpf: Document XDP RX metadata Document all current use-cases and assumptions. Cc: John Fastabend <john.fastabend@gmail.com> Cc: David Ahern <dsahern@gmail.com> Cc: Martin KaFai Lau <martin.lau@linux.dev> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Willem de Bruijn <willemb@google.com> Cc: Jesper Dangaard Brouer <brouer@redhat.com> Cc: Anatoly Burakov <anatoly.burakov@intel.com> Cc: Alexander Lobakin <alexandr.lobakin@intel.com> Cc: Magnus Karlsson <magnus.karlsson@gmail.com> Cc: Maryam Tahhan <mtahhan@redhat.com> Cc: xdp-hints@xdp-project.net Cc: netdev@vger.kernel.org Acked-by: David Vernet <void@manifault.com> Signed-off-by: Stanislav Fomichev <sdf@google.com> Link: https://lore.kernel.org/r/20230119221536.3349901-2-sdf@google.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-01-23 09:38:10 -08:00
Alexei Starovoitov	84150795a4	Merge branch 'Dynptr fixes' Kumar Kartikeya Dwivedi says: ==================== This is part 2 of https://lore.kernel.org/bpf/20221018135920.726360-1-memxor@gmail.com. Changelog: ---------- v4 -> v5 v5: https://lore.kernel.org/bpf/20230120070355.1983560-1-memxor@gmail.com * Add comments, tests from Joanne * Add Joanne's acks v3 -> v4 v3: https://lore.kernel.org/bpf/20230120034314.1921848-1-memxor@gmail.com * Adopt BPF ASM tests to more readable style (Alexei) v2 -> v3 v2: https://lore.kernel.org/bpf/20230119021442.1465269-1-memxor@gmail.com * Fix slice invalidation logic for unreferenced dynptrs (Joanne) * Add selftests for precise slice invalidation on destruction * Add Joanne's acks v1 -> v2 v1: https://lore.kernel.org/bpf/20230101083403.332783-1-memxor@gmail.com * Return error early in case of overwriting referenced dynptr slots (Andrii, Joanne) * Rename destroy_stack_slots_dynptr to destroy_if_dynptr_stack_slot (Joanne) * Invalidate dynptr slices associated with dynptr in destroy_if_dynptr_stack_slot (Joanne) * Combine both dynptr_get_spi and is_spi_bounds_valid (Joanne) * Compute spi once in process_dynptr_func and pass it as parameter instead of recomputing (Joanne) * Add comments expanding REG_LIVE_WRITTEN marking in unmark_stack_slots_dynptr (Joanne) * Add comments explaining why destroy_if_dynptr_stack_slot call needs to be done for both spi and spi - 1 (Joanne) * Port BPF assembly tests from test_verifier to test_progs framework (Andrii) * Address misc feedback, rebase to bpf-next Old v1 -> v1 Old v1: https://lore.kernel.org/bpf/20221018135920.726360-1-memxor@gmail.com * Allow overwriting dynptr stack slots from dynptr init helpers * Fix a bug in alignment check where reg->var_off.value was still not included * Address other minor nits Eduard Zingerman (1): selftests/bpf: convenience macro for use with 'asm volatile' blocks ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-20 17:55:04 -08:00
Kumar Kartikeya Dwivedi	ae8e354c49	selftests/bpf: Add dynptr helper tests First test that we allow overwriting dynptr slots and reinitializing them in unreferenced case, and disallow overwriting for referenced case. Include tests to ensure slices obtained from destroyed dynptrs are being invalidated on their destruction. The destruction needs to be scoped, as in slices of dynptr A should not be invalidated when dynptr B is destroyed. Next, test that MEM_UNINIT doesn't allow writing dynptr stack slots. Acked-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230121002241.2113993-13-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-20 17:55:04 -08:00
Kumar Kartikeya Dwivedi	011edc8e49	selftests/bpf: Add dynptr partial slot overwrite tests Try creating a dynptr, then overwriting second slot with first slot of another dynptr. Then, the first slot of first dynptr should also be invalidated, but without our fix that does not happen. As a consequence, the unfixed case allows passing first dynptr (as the kernel check only checks for slot_type and then first_slot == true). Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230121002241.2113993-12-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-20 17:55:03 -08:00
Kumar Kartikeya Dwivedi	ef48101353	selftests/bpf: Add dynptr var_off tests Ensure that variable offset is handled correctly, and verifier takes both fixed and variable part into account. Also ensures that only constant var_off is allowed. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230121002241.2113993-11-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-20 17:55:03 -08:00
Kumar Kartikeya Dwivedi	f4d24edf1b	selftests/bpf: Add dynptr pruning tests Add verifier tests that verify the new pruning behavior for STACK_DYNPTR slots, and ensure that state equivalence takes into account changes to the old and current verifier state correctly. Also ensure that the stacksafe changes are actually enabling pruning in case states are equivalent from pruning PoV. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230121002241.2113993-10-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-20 17:55:03 -08:00
Eduard Zingerman	91b875a5e4	selftests/bpf: convenience macro for use with 'asm volatile' blocks A set of macros useful for writing naked BPF functions using inline assembly. E.g. as follows: struct map_struct { ... } map SEC(".maps"); SEC(...) __naked int foo_test(void) { asm volatile( "r0 = 0;" "(u64)(r10 - 8) = r0;" "r1 = %[map] ll;" "r2 = r10;" "r2 += -8;" "call %[bpf_map_lookup_elem];" "r0 = 0;" "exit;" : : __imm(bpf_map_lookup_elem), __imm_addr(map) : __clobber_all); } Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Eduard Zingerman <eddyz87@gmail.com> [ Kartikeya: Add acks, include __clobber_common from Andrii ] Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230121002241.2113993-9-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-20 17:55:03 -08:00
Kumar Kartikeya Dwivedi	1ee72bcbe4	bpf: Avoid recomputing spi in process_dynptr_func Currently, process_dynptr_func first calls dynptr_get_spi and then is_dynptr_reg_valid_init and is_dynptr_reg_valid_uninit have to call it again to obtain the spi value. Instead of doing this twice, reuse the already obtained value (which is by default 0, and is only set for PTR_TO_STACK, and only used in that case in aforementioned functions). The input value for these two functions will either be -ERANGE or >= 1, and can either be permitted or rejected based on the respective check. Suggested-by: Joanne Koong <joannelkoong@gmail.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230121002241.2113993-8-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-20 17:55:03 -08:00
Kumar Kartikeya Dwivedi	f5b625e5f8	bpf: Combine dynptr_get_spi and is_spi_bounds_valid Currently, a check on spi resides in dynptr_get_spi, while others checking its validity for being within the allocated stack slots happens in is_spi_bounds_valid. Almost always barring a couple of cases (where being beyond allocated stack slots is not an error as stack slots need to be populated), both are used together to make checks. Hence, subsume the is_spi_bounds_valid check in dynptr_get_spi, and return -ERANGE to specially distinguish the case where spi is valid but not within allocated slots in the stack state. The is_spi_bounds_valid function is still kept around as it is a generic helper that will be useful for other objects on stack similar to dynptr in the future. Suggested-by: Joanne Koong <joannelkoong@gmail.com> Acked-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230121002241.2113993-7-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-20 17:55:03 -08:00
Kumar Kartikeya Dwivedi	379d4ba831	bpf: Allow reinitializing unreferenced dynptr stack slots Consider a program like below: void prog(void) { { struct bpf_dynptr ptr; bpf_dynptr_from_mem(...); } ... { struct bpf_dynptr ptr; bpf_dynptr_from_mem(...); } } Here, the C compiler based on lifetime rules in the C standard would be well within in its rights to share stack storage for dynptr 'ptr' as their lifetimes do not overlap in the two distinct scopes. Currently, such an example would be rejected by the verifier, but this is too strict. Instead, we should allow reinitializing over dynptr stack slots and forget information about the old dynptr object. The destroy_if_dynptr_stack_slot function already makes necessary checks to avoid overwriting referenced dynptr slots. This is done to present a better error message instead of forgetting dynptr information on stack and preserving reference state, leading to an inevitable but undecipherable error at the end about an unreleased reference which has to be associated back to its allocating call instruction to make any sense to the user. Acked-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230121002241.2113993-6-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-20 17:55:03 -08:00
Kumar Kartikeya Dwivedi	f8064ab90d	bpf: Invalidate slices on destruction of dynptrs on stack The previous commit implemented destroy_if_dynptr_stack_slot. It destroys the dynptr which given spi belongs to, but still doesn't invalidate the slices that belong to such a dynptr. While for the case of referenced dynptr, we don't allow their overwrite and return an error early, we still allow it and destroy the dynptr for unreferenced dynptr. To be able to enable precise and scoped invalidation of dynptr slices in this case, we must be able to associate the source dynptr of slices that have been obtained using bpf_dynptr_data. When doing destruction, only slices belonging to the dynptr being destructed should be invalidated, and nothing else. Currently, dynptr slices belonging to different dynptrs are indistinguishible. Hence, allocate a unique id to each dynptr (CONST_PTR_TO_DYNPTR and those on stack). This will be stored as part of reg->id. Whenever using bpf_dynptr_data, transfer this unique dynptr id to the returned PTR_TO_MEM_OR_NULL slice pointer, and store it in a new per-PTR_TO_MEM dynptr_id register state member. Finally, after establishing such a relationship between dynptrs and their slices, implement precise invalidation logic that only invalidates slices belong to the destroyed dynptr in destroy_if_dynptr_stack_slot. Acked-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230121002241.2113993-5-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-20 17:55:03 -08:00
Kumar Kartikeya Dwivedi	ef8fc7a07c	bpf: Fix partial dynptr stack slot reads/writes Currently, while reads are disallowed for dynptr stack slots, writes are not. Reads don't work from both direct access and helpers, while writes do work in both cases, but have the effect of overwriting the slot_type. While this is fine, handling for a few edge cases is missing. Firstly, a user can overwrite the stack slots of dynptr partially. Consider the following layout: spi: [d][d][?] 2 1 0 First slot is at spi 2, second at spi 1. Now, do a write of 1 to 8 bytes for spi 1. This will essentially either write STACK_MISC for all slot_types or STACK_MISC and STACK_ZERO (in case of size < BPF_REG_SIZE partial write of zeroes). The end result is that slot is scrubbed. Now, the layout is: spi: [d][m][?] 2 1 0 Suppose if user initializes spi = 1 as dynptr. We get: spi: [d][d][d] 2 1 0 But this time, both spi 2 and spi 1 have first_slot = true. Now, when passing spi 2 to dynptr helper, it will consider it as initialized as it does not check whether second slot has first_slot == false. And spi 1 should already work as normal. This effectively replaced size + offset of first dynptr, hence allowing invalid OOB reads and writes. Make a few changes to protect against this: When writing to PTR_TO_STACK using BPF insns, when we touch spi of a STACK_DYNPTR type, mark both first and second slot (regardless of which slot we touch) as STACK_INVALID. Reads are already prevented. Second, prevent writing to stack memory from helpers if the range may contain any STACK_DYNPTR slots. Reads are already prevented. For helpers, we cannot allow it to destroy dynptrs from the writes as depending on arguments, helper may take uninit_mem and dynptr both at the same time. This would mean that helper may write to uninit_mem before it reads the dynptr, which would be bad. PTR_TO_MEM: [?????dd] Depending on the code inside the helper, it may end up overwriting the dynptr contents first and then read those as the dynptr argument. Verifier would only simulate destruction when it does byte by byte access simulation in check_helper_call for meta.access_size, and fail to catch this case, as it happens after argument checks. The same would need to be done for any other non-trivial objects created on the stack in the future, such as bpf_list_head on stack, or bpf_rb_root on stack. A common misunderstanding in the current code is that MEM_UNINIT means writes, but note that writes may also be performed even without MEM_UNINIT in case of helpers, in that case the code after handling meta && meta->raw_mode will complain when it sees STACK_DYNPTR. So that invalid read case also covers writes to potential STACK_DYNPTR slots. The only loophole was in case of meta->raw_mode which simulated writes through instructions which could overwrite them. A future series sequenced after this will focus on the clean up of helper access checks and bugs around that. Fixes: `97e03f5210` ("bpf: Add verifier support for dynptrs") Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230121002241.2113993-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-20 17:55:03 -08:00
Kumar Kartikeya Dwivedi	79168a669d	bpf: Fix missing var_off check for ARG_PTR_TO_DYNPTR Currently, the dynptr function is not checking the variable offset part of PTR_TO_STACK that it needs to check. The fixed offset is considered when computing the stack pointer index, but if the variable offset was not a constant (such that it could not be accumulated in reg->off), we will end up a discrepency where runtime pointer does not point to the actual stack slot we mark as STACK_DYNPTR. It is impossible to precisely track dynptr state when variable offset is not constant, hence, just like bpf_timer, kptr, bpf_spin_lock, etc. simply reject the case where reg->var_off is not constant. Then, consider both reg->off and reg->var_off.value when computing the stack pointer index. A new helper dynptr_get_spi is introduced to hide over these details since the dynptr needs to be located in multiple places outside the process_dynptr_func checks, hence once we know it's a PTR_TO_STACK, we need to enforce these checks in all places. Note that it is disallowed for unprivileged users to have a non-constant var_off, so this problem should only be possible to trigger from programs having CAP_PERFMON. However, its effects can vary. Without the fix, it is possible to replace the contents of the dynptr arbitrarily by making verifier mark different stack slots than actual location and then doing writes to the actual stack address of dynptr at runtime. Fixes: `97e03f5210` ("bpf: Add verifier support for dynptrs") Acked-by: Joanne Koong <joannelkoong@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230121002241.2113993-3-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-20 17:55:02 -08:00
Kumar Kartikeya Dwivedi	d6fefa1105	bpf: Fix state pruning for STACK_DYNPTR stack slots The root of the problem is missing liveness marking for STACK_DYNPTR slots. This leads to all kinds of problems inside stacksafe. The verifier by default inside stacksafe ignores spilled_ptr in stack slots which do not have REG_LIVE_READ marks. Since this is being checked in the 'old' explored state, it must have already done clean_live_states for this old bpf_func_state. Hence, it won't be receiving any more liveness marks from to be explored insns (it has received REG_LIVE_DONE marking from liveness point of view). What this means is that verifier considers that it's safe to not compare the stack slot if was never read by children states. While liveness marks are usually propagated correctly following the parentage chain for spilled registers (SCALAR_VALUE and PTR_* types), the same is not the case for STACK_DYNPTR. clean_live_states hence simply rewrites these stack slots to the type STACK_INVALID since it sees no REG_LIVE_READ marks. The end result is that we will never see STACK_DYNPTR slots in explored state. Even if verifier was conservatively matching !REG_LIVE_READ slots, very next check continuing the stacksafe loop on seeing STACK_INVALID would again prevent further checks. Now as long as verifier stores an explored state which we can compare to when reaching a pruning point, we can abuse this bug to make verifier prune search for obviously unsafe paths using STACK_DYNPTR slots thinking they are never used hence safe. Doing this in unprivileged mode is a bit challenging. add_new_state is only set when seeing BPF_F_TEST_STATE_FREQ (which requires privileges) or when jmps_processed difference is >= 2 and insn_processed difference is >= 8. So coming up with the unprivileged case requires a little more work, but it is still totally possible. The test case being discussed below triggers the heuristic even in unprivileged mode. However, it no longer works since commit `8addbfc7b3` ("bpf: Gate dynptr API behind CAP_BPF"). Let's try to study the test step by step. Consider the following program (C style BPF ASM): 0 r0 = 0; 1 r6 = &ringbuf_map; 3 r1 = r6; 4 r2 = 8; 5 r3 = 0; 6 r4 = r10; 7 r4 -= -16; 8 call bpf_ringbuf_reserve_dynptr; 9 if r0 == 0 goto pc+1; 10 goto pc+1; 11 *(r10 - 16) = 0xeB9F; 12 r1 = r10; 13 r1 -= -16; 14 r2 = 0; 15 call bpf_ringbuf_discard_dynptr; 16 r0 = 0; 17 exit; We know that insn 12 will be a pruning point, hence if we force add_new_state for it, it will first verify the following path as safe in straight line exploration: 0 1 3 4 5 6 7 8 9 -> 10 -> (12) 13 14 15 16 17 Then, when we arrive at insn 12 from the following path: 0 1 3 4 5 6 7 8 9 -> 11 (12) We will find a state that has been verified as safe already at insn 12. Since register state is same at this point, regsafe will pass. Next, in stacksafe, for spi = 0 and spi = 1 (location of our dynptr) is skipped seeing !REG_LIVE_READ. The rest matches, so stacksafe returns true. Next, refsafe is also true as reference state is unchanged in both states. The states are considered equivalent and search is pruned. Hence, we are able to construct a dynptr with arbitrary contents and use the dynptr API to operate on this arbitrary pointer and arbitrary size + offset. To fix this, first define a mark_dynptr_read function that propagates liveness marks whenever a valid initialized dynptr is accessed by dynptr helpers. REG_LIVE_WRITTEN is marked whenever we initialize an uninitialized dynptr. This is done in mark_stack_slots_dynptr. It allows screening off mark_reg_read and not propagating marks upwards from that point. This ensures that we either set REG_LIVE_READ64 on both dynptr slots, or none, so clean_live_states either sets both slots to STACK_INVALID or none of them. This is the invariant the checks inside stacksafe rely on. Next, do a complete comparison of both stack slots whenever they have STACK_DYNPTR. Compare the dynptr type stored in the spilled_ptr, and also whether both form the same first_slot. Only then is the later path safe. Fixes: `97e03f5210` ("bpf: Add verifier support for dynptrs") Acked-by: Eduard Zingerman <eddyz87@gmail.com> Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230121002241.2113993-2-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-20 17:55:02 -08:00
Alexei Starovoitov	00b8f39f1d	Merge branch 'kallsyms: Optimize the search for module symbols by livepatch and bpf' Jiri Olsa says: ==================== hi, sending new version of [1] patchset posted originally by Zhen Lei. It contains 2 changes that improove search performance for livepatch and bpf. v3 changes: - fixed off by 1 issue, simplified condition, added acks [Song] - added module attach as subtest [Andrii] v2 changes: - reworked the bpf change and meassured the performance - adding new selftest to benchmark kprobe multi module attachment - skipping patch 3 as requested by Zhen Lei - added Reviewed-by for patch 1 [Petr Mladek] thanks, jirka [1] https://lore.kernel.org/bpf/20221230112729.351-1-thunder.leizhen@huawei.com/ --- Jiri Olsa (2): selftests/bpf: Add serial_test_kprobe_multi_bench_attach_kernel/module tests bpf: Change modules resolving for kprobe multi link ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-19 17:07:50 -08:00
Jiri Olsa	6a5f2d6ee8	bpf: Change modules resolving for kprobe multi link We currently use module_kallsyms_on_each_symbol that iterates all modules/symbols and we try to lookup each such address in user provided symbols/addresses to get list of used modules. This fix instead only iterates provided kprobe addresses and calls __module_address on each to get list of used modules. This turned out to be simpler and also bit faster. On my setup with workload (executed 10 times): # test_progs -t kprobe_multi_bench_attach/modules Current code: Performance counter stats for './test.sh' (5 runs): 76,081,161,596 cycles:k ( +- 0.47% ) 18.3867 +- 0.0992 seconds time elapsed ( +- 0.54% ) With the fix: Performance counter stats for './test.sh' (5 runs): 74,079,889,063 cycles:k ( +- 0.04% ) 17.8514 +- 0.0218 seconds time elapsed ( +- 0.12% ) Signed-off-by: Jiri Olsa <jolsa@kernel.org> Reviewed-by: Zhen Lei <thunder.leizhen@huawei.com> Reviewed-by: Petr Mladek <pmladek@suse.com> Link: https://lore.kernel.org/r/20230116101009.23694-4-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-19 17:07:40 -08:00
Jiri Olsa	edac4b5b18	selftests/bpf: Add serial_test_kprobe_multi_bench_attach_kernel/module tests Add bench test for module portion of the symbols as well. # ./test_progs -v -t kprobe_multi_bench_attach_module bpf_testmod.ko is already unloaded. Loading bpf_testmod.ko... Successfully loaded bpf_testmod.ko. test_kprobe_multi_bench_attach:PASS:get_syms 0 nsec test_kprobe_multi_bench_attach:PASS:kprobe_multi_empty__open_and_load 0 nsec test_kprobe_multi_bench_attach:PASS:bpf_program__attach_kprobe_multi_opts 0 nsec test_kprobe_multi_bench_attach: found 26620 functions test_kprobe_multi_bench_attach: attached in 0.182s test_kprobe_multi_bench_attach: detached in 0.082s #96 kprobe_multi_bench_attach_module:OK Summary: 1/0 PASSED, 0 SKIPPED, 0 FAILED Successfully unloaded bpf_testmod.ko. It's useful for testing kprobe multi link modules resolving. Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230116101009.23694-3-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-19 17:07:15 -08:00
Zhen Lei	07cc2c931e	livepatch: Improve the search performance of module_kallsyms_on_each_symbol() Currently we traverse all symbols of all modules to find the specified function for the specified module. But in reality, we just need to find the given module and then traverse all the symbols in it. Let's add a new parameter 'const char modname' to function module_kallsyms_on_each_symbol(), then we can compare the module names directly in this function and call hook 'fn' after matching. If 'modname' is NULL, the symbols of all modules are still traversed for compatibility with other usage cases. Phase1: mod1-->mod2..(subsequent modules do not need to be compared) \| Phase2: -->f1-->f2-->f3 Assuming that there are m modules, each module has n symbols on average, then the time complexity is reduced from O(m n) to O(m) + O(n). Reviewed-by: Petr Mladek <pmladek@suse.com> Acked-by: Song Liu <song@kernel.org> Signed-off-by: Zhen Lei <thunder.leizhen@huawei.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Miroslav Benes <mbenes@suse.cz> Reviewed-by: Luis Chamberlain <mcgrof@kernel.org> Link: https://lore.kernel.org/r/20230116101009.23694-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-19 17:07:15 -08:00
Tiezhu Yang	92afc5329a	selftests/bpf: Fix build errors if CONFIG_NF_CONNTRACK=m If CONFIG_NF_CONNTRACK=m, there are no definitions of NF_NAT_MANIP_SRC and NF_NAT_MANIP_DST in vmlinux.h, build test_bpf_nf.c failed. $ make -C tools/testing/selftests/bpf/ CLNG-BPF [test_maps] test_bpf_nf.bpf.o progs/test_bpf_nf.c:160:42: error: use of undeclared identifier 'NF_NAT_MANIP_SRC' bpf_ct_set_nat_info(ct, &saddr, sport, NF_NAT_MANIP_SRC); ^ progs/test_bpf_nf.c:163:42: error: use of undeclared identifier 'NF_NAT_MANIP_DST' bpf_ct_set_nat_info(ct, &daddr, dport, NF_NAT_MANIP_DST); ^ 2 errors generated. Copy the definitions in include/net/netfilter/nf_nat.h to test_bpf_nf.c, in order to avoid redefinitions if CONFIG_NF_CONNTRACK=y, rename them with ___local suffix. This is similar with commit `1058b6a78d` ("selftests/bpf: Do not fail build if CONFIG_NF_CONNTRACK=m/n"). Fixes: `b06b45e82b` ("selftests/bpf: add tests for bpf_ct_set_nat_info kfunc") Signed-off-by: Tiezhu Yang <yangtiezhu@loongson.cn> Acked-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/r/1674028604-7113-1-git-send-email-yangtiezhu@loongson.cn Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-01-18 13:33:45 -08:00
Jiri Olsa	c0f264e4ed	bpf/selftests: Add verifier tests for loading sleepable programs Adding verifier tests for loading all types od allowed sleepable programs plus reject for tp_btf type. Acked-by: Song Liu <song@kernel.org> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230117223705.440975-2-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-17 16:56:04 -08:00
Jiri Olsa	700e6f853e	bpf: Do not allow to load sleepable BPF_TRACE_RAW_TP program Currently we allow to load any tracing program as sleepable, but BPF_TRACE_RAW_TP can't sleep. Making the check explicit for tracing programs attach types, so sleepable BPF_TRACE_RAW_TP will fail to load. Updating the verifier error to mention iter programs as well. Acked-by: Song Liu <song@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20230117223705.440975-1-jolsa@kernel.org Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-17 16:56:04 -08:00
Martin Rodriguez Reboredo	c1177979af	btf, scripts: Exclude Rust CUs with pahole Version 1.24 of pahole has the capability to exclude compilation units (CUs) of specific languages [1] [2]. Rust, as of writing, is not currently supported by pahole and if it's used with a build that has BTF debugging enabled it results in malformed kernel and module binaries [3]. So it's better for pahole to exclude Rust CUs until support for it arrives. Co-developed-by: Eric Curtin <ecurtin@redhat.com> Signed-off-by: Eric Curtin <ecurtin@redhat.com> Signed-off-by: Martin Rodriguez Reboredo <yakoyoku@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Eric Curtin <ecurtin@redhat.com> Reviewed-by: Neal Gompa <neal@gompa.dev> Acked-by: Miguel Ojeda <ojeda@kernel.org> Acked-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=49358dfe2aaae4e90b072332c3e324019826783f [1] Link: https://git.kernel.org/pub/scm/devel/pahole/pahole.git/commit/?id=8ee363790b7437283c53090a85a9fec2f0b0fbc4 [2] Link: https://github.com/Rust-for-Linux/linux/issues/735 [3] Link: https://lore.kernel.org/bpf/20230111152050.559334-1-yakoyoku@gmail.com	2023-01-17 17:29:42 +01:00
Alexei Starovoitov	dfff86f8eb	Merge branch 'samples/bpf: modernize BPF functionality test programs' "Daniel T. Lee" says: ==================== Currently, there are many programs under samples/bpf to test the various functionality of BPF that have been developed for a long time. However, the kernel (BPF) has changed a lot compared to the 2016 when some of these test programs were first introduced. Therefore, some of these programs use the deprecated function of BPF, and some programs no longer work normally due to changes in the API. To list some of the kernel changes that this patch set is focusing on, - legacy BPF map declaration syntax support had been dropped [1] - bpf_trace_printk() always append newline at the end [2] - deprecated styled BPF section header (bpf_load style) [3] - urandom_read tracepoint is removed (used for testing overhead) [4] - ping sends packet with SOCK_DGRAM instead of SOCK_RAW [5]* - use "vmlinux.h" instead of including individual headers In addition to this, this patchset tries to modernize the existing testing scripts a bit. And for network-related testing programs, a separate header file was created and applied. (To use the Endianness conversion function from xdp_sample and bunch of constants) [1]: https://github.com/libbpf/libbpf/issues/282 [2]: commit `ac5a72ea5c` ("bpf: Use dedicated bpf_trace_printk event instead of trace_printk()") [3]: commit `ceb5dea565` ("samples: bpf: Remove bpf_load loader completely") [4]: commit `14c174633f` ("random: remove unused tracepoints") [5]: https://lwn.net/Articles/422330/ *: This is quite old, but I'm not sure why the code was initially developed to filter only SOCK_RAW. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-15 13:32:46 -08:00
Daniel T. Lee	e04946f54c	samples/bpf: change _kern suffix to .bpf with BPF test programs This commit changes the _kern suffix to .bpf with the BPF test programs. With this modification, test programs will inherit the benefit of the new CLANG-BPF compile target. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230115071613.125791-11-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-15 13:32:45 -08:00
Daniel T. Lee	e8acf8f47a	samples/bpf: use vmlinux.h instead of implicit headers in BPF test program This commit applies vmlinux.h to BPF functionality testing program. Macros that were not defined despite migration to "vmlinux.h" were defined separately in individual files. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230115071613.125791-10-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-15 13:32:45 -08:00
Daniel T. Lee	c2f4f5593e	samples/bpf: replace BPF programs header with net_shared.h This commit applies "net_shared.h" to BPF programs to remove existing network related header dependencies. Also, this commit removes unnecessary headers before applying "vmlinux.h" to the BPF programs. Mostly, endianness conversion function has been applied to the source. In addition, several macros have been defined to fulfill the INET, TC-related constants. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230115071613.125791-9-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-15 13:32:45 -08:00
Daniel T. Lee	e69fe84595	samples/bpf: split common macros to net_shared.h Currently, many programs under sample/bpf often include individual macros by directly including the header under "linux/" rather than using the "vmlinux.h" header. However, there are some problems with migrating to "vmlinux.h" because there is no definition for utility functions such as endianness conversion (ntohs/htons). Fortunately, the xdp_sample program already has a function that can be replaced to solve this problem. Therefore, this commit attempts to separate these functions into a file called net_shared.h to make them universally available. Additionally, this file includes network-related macros that are not defined in "vmlinux.h". (inspired by 'selftests' bpf_tracing_net.h) Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230115071613.125791-8-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-15 13:32:45 -08:00
Daniel T. Lee	a1f93c8fb0	samples/bpf: replace legacy map with the BTF-defined map With libbpf 1.0 release, support for legacy BPF map declaration syntax had been dropped. If you run a program using legacy BPF in the latest libbpf, the following error will be output. libbpf: map 'lwt_len_hist_map' (legacy): legacy map definitions are deprecated, use BTF-defined maps instead libbpf: Use of BPF_ANNOTATE_KV_PAIR is deprecated, use BTF-defined maps in .maps section instead This commit replaces legacy map with the BTF-defined map. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230115071613.125791-7-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-15 13:32:45 -08:00
Daniel T. Lee	58e975d014	samples/bpf: replace broken overhead microbenchmark with fib_table_lookup The test_overhead bpf program is designed to compare performance between tracepoint and kprobe. Initially it used task_rename and urandom_read tracepoint. However, commit `14c174633f` ("random: remove unused tracepoints") removed urandom_read tracepoint, and for this reason the test_overhead got broken. This commit introduces new microbenchmark using fib_table_lookup. This microbenchmark sends UDP packets to localhost in order to invoke fib_table_lookup. In a nutshell: fd = socket(AF_INET, SOCK_DGRAM, IPPROTO_UDP); addr.sin_addr.s_addr = inet_addr(DUMMY_IP); addr.sin_port = htons(DUMMY_PORT); for() { sendto(fd, buf, strlen(buf), 0, (struct sockaddr *)&addr, sizeof(addr)); } on 4 cpus in parallel: lookup per sec base (no tracepoints, no kprobes) 381k with kprobe at fib_table_lookup() 325k with tracepoint at fib:fib_table_lookup 330k with raw_tracepoint at fib:fib_table_lookup 365k Fixes: `14c174633f` ("random: remove unused tracepoints") Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230115071613.125791-6-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-15 13:32:45 -08:00
Daniel T. Lee	31b12a4159	samples/bpf: fix broken cgroup socket testing Currently, executing test_cgrp2_sock2 fails due to wrong section header. This 'cgroup/sock1' style section is previously used at 'samples/bpf_load' (deprecated) BPF loader. Because this style isn't supported in libbpf, this commit fixes this problem by correcting the section header. $ sudo ./test_cgrp2_sock2.sh libbpf: prog 'bpf_prog1': missing BPF prog type, check ELF section name 'cgroup/sock1' libbpf: prog 'bpf_prog1': failed to load: -22 libbpf: failed to load object './sock_flags_kern.o' ERROR: loading BPF object file failed In addition, this BPF program filters ping packets by comparing whether the socket type uses SOCK_RAW. However, after the ICMP socket[1] was developed, ping sends ICMP packets using SOCK_DGRAM. Therefore, in this commit, the packet filtering is changed to use SOCK_DGRAM instead of SOCK_RAW. $ strace --trace socket ping -6 -c1 -w1 ::1 socket(AF_INET6, SOCK_DGRAM, IPPROTO_ICMPV6) = 3 [1]: https://lwn.net/Articles/422330/ Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230115071613.125791-5-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-15 13:32:45 -08:00
Daniel T. Lee	dac808c907	samples/bpf: fix broken lightweight tunnel testing The test_lwt_bpf is a script that tests the functionality of BPF through the output of the ftrace with bpf_trace_printk. Currently, this program is not operating normally for several reasons. First of all, this test script can't parse the ftrace results properly. GNU sed tries to be as greedy as possible when attempting pattern matching. Due to this, cutting metadata (such as timestamp) from the log entry of ftrace doesn't work properly, and also desired log isn't extracted properly. To make sed stripping clearer, 'nocontext-info' option with the ftrace has been used to remove metadata from the log. Also, instead of using unclear pattern matching, this commit specifies an explicit parse pattern. Also, unlike before when this test was introduced, the way bpf_trace_printk behaves has changed[1]. The previous bpf_trace_printk had to always have '\n' in order to print newline, but now that the bpf_trace_printk call includes newline by default, so '\n' is no longer needed. Lastly with the lwt ENCAP_BPF out, the context information with the sk_buff protocol is preserved. Therefore, this commit changes the previous test result from 'protocol 0' to 'protocol 8', which means ETH_P_IP. [1]: commit `ac5a72ea5c` ("bpf: Use dedicated bpf_trace_printk event instead of trace_printk()") Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230115071613.125791-4-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-15 13:32:45 -08:00
Daniel T. Lee	f20f064e84	samples/bpf: refactor BPF functionality testing scripts Currently, some test scripts are experiencing minor errors related to executing tests. $ sudo ./test_cgrp2_sock.sh ./test_cgrp2_sock.sh: 22: test_cgrp2_sock: not found This problem occurs because the path to the execution target is not properly specified. Therefore, this commit solves this problem by specifying a relative path to its executables. This commit also makes a concise refactoring of hard-coded BPF program names. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230115071613.125791-3-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-15 13:32:45 -08:00
Daniel T. Lee	d982a2e306	samples/bpf: ensure ipv6 is enabled before running tests Currently, a few of BPF tests use ipv6 functionality. The problem here is that if ipv6 is disabled, these tests will fail, and even if the test fails, it will not tell you why it failed. $ sudo ./test_cgrp2_sock2.sh RTNETLINK answers: Permission denied In order to fix this, this commit ensures ipv6 is enabled prior to running tests. Signed-off-by: Daniel T. Lee <danieltimlee@gmail.com> Link: https://lore.kernel.org/r/20230115071613.125791-2-danieltimlee@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-15 13:32:45 -08:00
Martin KaFai Lau	81bbbb6974	Merge branch 'bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room()' Ziyang Xuan says: ==================== Add ipip6 and ip6ip decap support for bpf_skb_adjust_room(). Main use case is for using cls_bpf on ingress hook to decapsulate IPv4 over IPv6 and IPv6 over IPv4 tunnel packets. And add ipip6 and ip6ip decap testcases to verify that bpf_skb_adjust_room() correctly decapsulate ipip6 and ip6ip tunnel packets. $./test_tc_tunnel.sh ipip encap 192.168.1.1 to 192.168.1.2, type ipip, mac none len 100 test basic connectivity 0 test bpf encap without decap (expect failure) Ncat: TIMEOUT. 1 test bpf encap with tunnel device decap 0 test bpf encap with bpf decap 0 OK ipip6 encap 192.168.1.1 to 192.168.1.2, type ipip6, mac none len 100 test basic connectivity 0 test bpf encap without decap (expect failure) Ncat: TIMEOUT. 1 test bpf encap with tunnel device decap 0 test bpf encap with bpf decap 0 OK ip6ip6 encap fd::1 to fd::2, type ip6tnl, mac none len 100 test basic connectivity 0 test bpf encap without decap (expect failure) Ncat: TIMEOUT. 1 test bpf encap with tunnel device decap 0 test bpf encap with bpf decap 0 OK sit encap fd::1 to fd::2, type sit, mac none len 100 test basic connectivity 0 test bpf encap without decap (expect failure) Ncat: TIMEOUT. 1 test bpf encap with tunnel device decap 0 test bpf encap with bpf decap 0 OK ... OK. All tests passed v3: - Fix compilation failure of selftests/bpf. - Combine two new branches in bpf_skb_adjust_room(). - Simplify description for new flags BPF_F_ADJ_ROOM_DECAP_L3_IP*. v2: - Use decap flags to indicate the new IP header. Do not rely on skb->encapsulation. ==================== Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-01-15 12:56:17 -08:00
Ziyang Xuan	7105f76fb5	selftests/bpf: add ipip6 and ip6ip decap to test_tc_tunnel Add ipip6 and ip6ip decap testcases. Verify that bpf_skb_adjust_room() correctly decapsulate ipip6 and ip6ip tunnel packets. Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/dfd2d8cfdf9111bd129170d4345296f53bee6a67.1673574419.git.william.xuanziyang@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-01-15 12:56:17 -08:00
Ziyang Xuan	d219df60a7	bpf: Add ipip6 and ip6ip decap support for bpf_skb_adjust_room() Add ipip6 and ip6ip decap support for bpf_skb_adjust_room(). Main use case is for using cls_bpf on ingress hook to decapsulate IPv4 over IPv6 and IPv6 over IPv4 tunnel packets. Add two new flags BPF_F_ADJ_ROOM_DECAP_L3_IPV{4,6} to indicate the new IP header version after decapsulating the outer IP header. Suggested-by: Willem de Bruijn <willemb@google.com> Signed-off-by: Ziyang Xuan <william.xuanziyang@huawei.com> Reviewed-by: Willem de Bruijn <willemb@google.com> Link: https://lore.kernel.org/r/b268ec7f0ff9431f4f43b1b40ab856ebb28cb4e1.1673574419.git.william.xuanziyang@huawei.com Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>	2023-01-15 12:56:17 -08:00
Roberto Valenzuela	1c48391bc6	selftests/bpf: Fix missing space error Add the missing space after 'dest' variable assignment. This change will resolve the following checkpatch.pl script error: ERROR: spaces required around that '+=' (ctx:VxW) Signed-off-by: Roberto Valenzuela <valenzuelarober@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20230113180257.39769-1-valenzuelarober@gmail.com	2023-01-13 14:23:02 -08:00
Menglong Dong	2fa0745365	libbpf: Replace '.' with '_' in legacy kprobe event name '.' is not allowed in the event name of kprobe. Therefore, we will get a EINVAL if the kernel function name has a '.' in legacy kprobe attach case, such as 'icmp_reply.constprop.0'. In order to adapt this case, we need to replace the '.' with other char in gen_kprobe_legacy_event_name(). And I use '_' for this propose. Signed-off-by: Menglong Dong <imagedong@tencent.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20230113093427.1666466-1-imagedong@tencent.com	2023-01-13 14:05:47 -08:00
Holger Hoffstätte	878625e1c7	bpftool: Always disable stack protection for BPF objects When the clang toolchain has stack protection enabled in order to be consistent with gcc - which just happens to be the case on Gentoo - the bpftool build fails: [...] clang \ -I. \ -I/tmp/portage/dev-util/bpftool-6.0.12/work/linux-6.0/tools/include/uapi/ \ -I/tmp/portage/dev-util/bpftool-6.0.12/work/linux-6.0/tools/bpf/bpftool/bootstrap/libbpf/include \ -g -O2 -Wall -target bpf -c skeleton/pid_iter.bpf.c -o pid_iter.bpf.o clang \ -I. \ -I/tmp/portage/dev-util/bpftool-6.0.12/work/linux-6.0/tools/include/uapi/ \ -I/tmp/portage/dev-util/bpftool-6.0.12/work/linux-6.0/tools/bpf/bpftool/bootstrap/libbpf/include \ -g -O2 -Wall -target bpf -c skeleton/profiler.bpf.c -o profiler.bpf.o skeleton/profiler.bpf.c:40:14: error: A call to built-in function '__stack_chk_fail' is not supported. int BPF_PROG(fentry_XXX) ^ skeleton/profiler.bpf.c:94:14: error: A call to built-in function '__stack_chk_fail' is not supported. int BPF_PROG(fexit_XXX) ^ 2 errors generated. [...] Since stack-protector makes no sense for the BPF bits just unconditionally disable it. Bug: https://bugs.gentoo.org/890638 Signed-off-by: Holger Hoffstätte <holger@applied-asynchrony.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/bpf/74cd9d2e-6052-312a-241e-2b514a75c92c@applied-asynchrony.com	2023-01-13 16:44:21 +01:00
Alexei Starovoitov	db473df289	Merge branch 'selftests/xsk: speed-ups, fixes, and new XDP programs' Magnus Karlsson says: ==================== This is a patch set of various performance improvements, fixes, and the introduction of more than one XDP program to the xsk selftests framework so we can test more things in the future such as upcoming multi-buffer and metadata support for AF_XDP. The new programs just reuse the framework that all the other eBPF selftests use. The new feature is used to implement one new test that does XDP_DROP on every other packet. More tests using this will be added in future commits. Contents: * The run-time of the test suite is cut by 10x when executing the tests on a real NIC, by only attaching the XDP program once per mode tested, instead of once per test program. * Over 700 lines of code have been removed. The xsk.c control file was moved straight over from libbpf when the xsk support was deprecated there. As it is now not used as library code that has to work with all kinds of versions of Linux, a lot of code could be dropped or simplified. * Add a new command line option "-d" that can be used when a test fails and you want to debug it with gdb or some other debugger. The option creates the two veth netdevs and prints them to the screen without deleting them afterwards. This way these veth netdevs can be used when running xskxceiver in a debugger. * Implemented the possibility to load external XDP programs so we can have more than the default one. This feature is used to implement a test where every other packet is dropped. Good exercise for the recycling mechanism of the xsk buffer pool used in zero-copy mode. * Various clean-ups and small fixes in patches 1 to 5. None of these fixes has any impact on the correct execution of the tests when they pass, though they can be irritating when a test fails. IMHO, they do not need to go to bpf as they will not fix anything there. The first version of patches 1, 2, and 4 where previously sent to bpf, but has now been included here. v2 -> v3: * Fixed compilation error for llvm [David] * Made the function xsk_is_in_drv_mode(ifobj) more generic by changing it to xsk_is_in_mode(ifobj, xdp_mode) [Maciej] * Added Maciej's acks to all the patches v1 -> v2: * Fixed spelling error in commit message of patch #6 [Björn] * Added explanation on why it is safe to use C11 atomics in patch #7 [Daniel] * Put all XDP programs in the same file so that adding more XDP programs to xskxceiver.c becomes more scalable in patches #11 and #12 [Maciej] * Removed more dead code in patch #8 [Maciej] * Removed stale %s specifier in error print, patch #9 [Maciej] * Changed name of XDP_CONSUMES_SOME_PACKETS to XDP_DROP_HALF to hopefully make it clearer [Maciej] * ifobj_rx and ifobj_tx name changes in patch #13 [Maciej] * Simplified XDP attachment code in patch #15 [Maciej] Patches: 1-5: Small fixes and clean-ups 6: New convenient debug option when using a debugger such as gdb 7-8: Removal of unnecessary code 9: Add the ability to load external XDP programs 10-11: Removal of more unnecessary code 12: Implement a new test where every other packet is XDP_DROP:ed 13: Unify the thread dispatching code 14-15: Simplify the way tests are written when using custom packet_streams or custom XDP programs Thanks: Magnus ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-11 18:16:53 -08:00
Magnus Karlsson	7d8319a7cc	selftests/xsk: automatically switch XDP programs Implement automatic switching of XDP programs and execution modes if needed by a test. This makes it much simpler to write a test as it only has to say what XDP program it needs if it is not the default one. This also makes it possible to remove the initial explicit attachment of the XDP program as well as the explicit mode switch in the code. These are now handled by the same code that just checks if a switch is necessary, so no special cases are needed. The default XDP program for all tests is one that sends all packets to the AF_XDP socket. If you need another one, please use the new function test_spec_set_xdp_prog() to specify what XDP programs and maps to use for this test. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230111093526.11682-16-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-11 18:16:52 -08:00
Magnus Karlsson	e67b2554f3	selftests/xsk: automatically restore packet stream Automatically restore the default packet stream if needed at the end of each test. This so that test writers do not forget to do this. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230111093526.11682-15-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-11 18:16:52 -08:00
Magnus Karlsson	7f88198407	selftests/xsk: merge dual and single thread dispatchers Make the thread dispatching code common by unifying the dual and single thread dispatcher code. This so we do not have to add code in two places in upcoming commits. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230111093526.11682-14-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-11 18:16:52 -08:00
Magnus Karlsson	80bea9acab	selftests/xsk: add test when some packets are XDP_DROPed Add a new test where some of the packets are not passed to the AF_XDP socket and instead get a XDP_DROP verdict. This is important as it tests the recycling mechanism of the buffer pool. If a packet is not sent to the AF_XDP socket, the buffer the packet resides in is instead recycled so it can be used again without the round-trip to user space. The test introduces a new XDP program that drops every other packet. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230111093526.11682-13-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-11 18:16:52 -08:00
Magnus Karlsson	f0a249df1b	selftests/xsk: get rid of built-in XDP program Get rid of the built-in XDP program that was part of the old libbpf code in xsk.c and replace it with an eBPF program build using the framework by all the other bpf selftests. This will form the base for adding more programs in later commits. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230111093526.11682-12-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-11 18:16:52 -08:00
Magnus Karlsson	6b3c0821ca	selftests/xsk: remove unnecessary code in control path Remove unnecessary code in the control path. This is located in the file xsk.c that was moved from libbpf when the xsk support there was deprecated. Some of the code there is not needed anymore as the selftests are only guaranteed to run on the kernel it is shipped with. Therefore, all the code that has to deal with compatibility of older kernels can be dropped and also any other function that is not of any use for the tests. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230111093526.11682-11-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-11 18:16:52 -08:00
Magnus Karlsson	aa61d81f39	selftests/xsk: load and attach XDP program only once per mode Load and attach the XDP program only once per XDP mode that is being executed. Today, the XDP program is loaded and attached for every test, then unloaded, which takes a long time on real NICs, since they have to reconfigure their HW, in contrast to veth. The test suite now completes in 21 seconds, instead of 207 seconds previously on my machine. This is a speed-up of around 10x. This is accomplished by moving the XDP loading from the worker threads to the main thread and replacing the XDP loading interfaces of xsk.c that was taken from the xsk support in libbpf, with something more explicit that is more useful for these tests. Instead, the relevant file descriptors and ifindexes are just passed down to the new functions. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230111093526.11682-10-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-11 18:16:52 -08:00
Magnus Karlsson	64aef77d75	selftests/xsk: remove namespaces Remove the namespaces used as they fill no function. This will simplify the code for speeding up the tests in the following commits. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230111093526.11682-9-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-11 18:16:52 -08:00
Magnus Karlsson	efe620e5ba	selftests/xsk: replace asm acquire/release implementations Replace our own homegrown assembly store/release and load/acquire implementations with the HW agnositic atomic APIs C11 offers. This to make the code more portable, easier to read, and reduce the maintenance burden. The original code used load-acquire and store-release barriers hand-coded in assembly. Since C11, these kind of operations are offered as built-ins in gcc and llvm. The load-acquire operation prevents hoisting of non-atomic memory operations to before this operation and it corresponds to the __ATOMIC_ACQUIRE operation in the built-in atomics. The store-release operation prevents hoisting of non-atomic memory operations to after this operation and it corresponds to the __ATOMIC_RELEASE operation in the built-in atomics. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230111093526.11682-8-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-11 18:16:52 -08:00
Magnus Karlsson	703bfd3710	selftests/xsk: add debug option for creating netdevs Add a new option to the test_xsk.sh script that only creates the two veth netdevs and the extra namespace, then exits without running any tests. The failed test can then be executed in the debugger without having to create the netdevs and namespace manually. For ease-of-use, the veth netdevs to use are printed so they can be copied into the debugger. Here is an example how to use it: > sudo ./test_xsk.sh -d veth10 veth11 > gdb xskxceiver In gdb: run -i veth10 -i veth11 And now the test cases can be debugged with gdb. If you want to debug the test suite on a real NIC in loopback mode, there is no need to use this feature as you already know the netdev of your NIC. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Acked-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230111093526.11682-7-magnus.karlsson@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-01-11 18:16:51 -08:00

1 2 3 4 5 ...

1153871 Commits