Do a sanity check that the provided file-to-be-pinned is actually a BPF
object (prog, map, btf) before calling the security_path_mknod LSM hook. If
it's not, the LSM hook doesn't have to be triggered, as the operation has no
chance of succeeding anyway.
Suggested-by: Christian Brauner <brauner@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Christian Brauner <brauner@kernel.org>
Link: https://lore.kernel.org/bpf/20230522232917.2454595-2-andrii@kernel.org
Add ability to specify a network interface used to resolve XDP hints
kfuncs when loading program through bpftool.
Usage:
bpftool prog load [...] xdpmeta_dev <ifname>
Writing just 'dev <ifname>' instead of 'xdpmeta_dev' is a very probable
mistake that results in not very descriptive errors,
so 'bpftool prog load [...] dev <ifname>' syntax becomes deprecated,
followed by 'bpftool map create [...] dev <ifname>' for consistency.
Now, to offload program, execute:
bpftool prog load [...] offload_dev <ifname>
To offload map:
bpftool map create [...] offload_dev <ifname>
'dev <ifname>' still performs offloading in the commands above, but now
triggers a warning and is excluded from bash completion.
'xdpmeta_dev' and 'offload_dev' are mutually exclusive options, because
'xdpmeta_dev' basically makes a program device-bound without loading it
onto the said device. For now, offloaded programs cannot use XDP hints [0],
but if this changes, using 'offload_dev <ifname>' should cover this case.
[0] https://lore.kernel.org/bpf/a5a636cc-5b03-686f-4be0-000383b05cfc@linux.dev
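The keyword handling described above could be sketched roughly as follows. This is a minimal illustration, not bpftool's actual parser: the `dev_mode` enum, the `parse_dev_keyword()` helper, and the warning text are hypothetical names chosen for this example, and rejecting any second device keyword is an assumption of the sketch.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical parse result; these are not bpftool's real types. */
enum dev_mode { DEV_NONE, DEV_OFFLOAD, DEV_XDPMETA, DEV_ERROR };

/*
 * Classify one device keyword given the mode selected so far.
 * 'dev' is accepted as a deprecated alias for 'offload_dev' and warns.
 */
static enum dev_mode parse_dev_keyword(const char *kw, enum dev_mode cur)
{
	enum dev_mode want;

	assert(kw != NULL);
	if (strcmp(kw, "offload_dev") == 0) {
		want = DEV_OFFLOAD;
	} else if (strcmp(kw, "dev") == 0) {
		fprintf(stderr, "Warning: 'dev' is deprecated, use 'offload_dev'\n");
		want = DEV_OFFLOAD;
	} else if (strcmp(kw, "xdpmeta_dev") == 0) {
		want = DEV_XDPMETA;
	} else {
		return DEV_ERROR;	/* not a device keyword */
	}

	/* Assumption of this sketch: a second device keyword is always rejected. */
	if (cur != DEV_NONE)
		return DEV_ERROR;
	return want;
}
```

Keeping 'dev' as an alias that only warns preserves existing scripts while steering users toward the unambiguous spelling.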
Signed-off-by: Larysa Zaremba <larysa.zaremba@intel.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/20230517160103.1088185-1-larysa.zaremba@intel.com
Aditi Ghag says:
====================
This patch set adds the capability to destroy sockets in BPF. We plan to
use the capability in Cilium to force client sockets to reconnect when
their remote load-balancing backends are deleted. The other use case is
on-the-fly policy enforcement where existing socket connections
prevented by policies need to be terminated.
The use cases, and more details around
the selected approach were presented at LPC 2022 -
https://lpc.events/event/16/contributions/1358/.
RFC discussion -
https://lore.kernel.org/netdev/CABG=zsBEh-P4NXk23eBJw7eajB5YJeRS7oPXnTAzs=yob4EMoQ@mail.gmail.com/T/#u.
v8 patch series -
https://lore.kernel.org/bpf/20230517175359.527917-1-aditi.ghag@isovalent.com/
v9 highlights:
Address review comments:
Martin:
- Rearranged the kfunc filter patch, and added the missing break
statement.
- Squashed the extended selftest/bpf patch.
Yonghong:
- Revised commit message for patch 1.
(Below notes are same as v8 patch series that are still relevant. Refer to
earlier patch series versions for other notes.)
- I hit a snag while writing the kfunc where verifier complained about the
`sock_common` type passed from TCP iterator. With kfuncs, there don't
seem to be any options available to pass BTF type hints to the verifier
(equivalent of `ARG_PTR_TO_BTF_ID_SOCK_COMMON`, as was the case with the
helper). As a result, I changed the argument type of the sock_destroy
kfunc to `sock_common`.
====================
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The test cases for destroying sockets mirror the intended usages of the
bpf_sock_destroy kfunc using iterators.
The destroy helpers set `ECONNABORTED` error code that we can validate
in the test code with client sockets. But UDP sockets have an overriding
error code from `disconnect()` called during abort, so the error code
validation is only done for TCP sockets.
The failure test cases validate that the `bpf_sock_destroy` kfunc is not
allowed from program attach types other than BPF trace iterator, and
such programs fail to load.
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-10-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The helper will be used to programmatically retrieve
and pass ports in userspace and kernel selftest programs.
Suggested-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-9-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
The socket destroy kfunc is used to forcefully terminate sockets from
certain BPF contexts. We plan to use the capability in Cilium
load-balancing to terminate client sockets that continue to connect to
deleted backends. The other use case is on-the-fly policy enforcement
where existing socket connections prevented by policies need to be
forcefully terminated. The kfunc also allows terminating sockets that may
or may not be actively sending traffic.
The kfunc can currently be called only from BPF TCP and UDP iterators
where users can filter, and terminate selected sockets. More
specifically, it can only be called from BPF contexts that ensure
socket locking in order to allow synchronous execution of protocol
specific `diag_destroy` handlers. The previous commit that batches UDP
sockets during iteration facilitated a synchronous invocation of the UDP
destroy callback from BPF context by skipping socket locks in
`udp_abort`. TCP iterator already supported batching of sockets being
iterated. To that end, `tracing_iter_filter` callback filter is added so
that verifier can restrict the kfunc to programs with `BPF_TRACE_ITER`
attach type, and reject other programs.
The kfunc takes a `sock_common` type argument, even though it expects a
`sock` pointer and casts the argument to one. This enables the verifier to allow the
sock_destroy kfunc to be called for TCP with `sock_common` and UDP with
`sock` structs. Furthermore, as `sock_common` only has a subset of
certain fields of `sock`, casting pointer to the latter type might not
always be safe for certain sockets like request sockets, but these have a
special handling in the diag_destroy handlers.
Additionally, the kfunc is defined with `KF_TRUSTED_ARGS` flag to avoid the
cases where a `PTR_TO_BTF_ID` sk is obtained by following another pointer,
e.g. getting an sk pointer (which may even be NULL) by following another sk
pointer. The pointer socket argument passed in TCP and UDP iterators is
tagged as `PTR_TRUSTED` in {tcp,udp}_reg_info. The TRUSTED arg changes
are contributed by Martin KaFai Lau <martin.lau@kernel.org>.
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-8-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This commit adds the ability to filter kfuncs to certain BPF program
types. This is required to limit bpf_sock_destroy kfunc implemented in
follow-up commits to programs with attach type 'BPF_TRACE_ITER'.
The commit adds a callback filter to 'struct btf_kfunc_id_set'. The
filter has access to the `bpf_prog` construct including its properties
such as `expected_attach_type`.
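The filter idea can be modeled in plain userspace C as below. This is only a sketch under stated assumptions: `struct prog`, `struct kfunc_id_set`, `iter_only_filter()` and `kfunc_allowed()` are simplified, hypothetical stand-ins and do not reproduce the kernel's `struct bpf_prog` or `struct btf_kfunc_id_set` definitions.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for kernel types; values are illustrative only. */
enum attach_type { ATTACH_OTHER = 0, ATTACH_TRACE_ITER = 1 };

struct prog {
	enum attach_type expected_attach_type;
};

/* A kfunc ID set carrying an optional per-program filter callback. */
struct kfunc_id_set {
	unsigned int restricted_kfunc_id;
	int (*filter)(const struct kfunc_id_set *set,
		      const struct prog *prog, unsigned int kfunc_id);
};

/* Reject the restricted kfunc unless the program is an iterator program. */
static int iter_only_filter(const struct kfunc_id_set *set,
			    const struct prog *prog, unsigned int kfunc_id)
{
	if (kfunc_id == set->restricted_kfunc_id &&
	    prog->expected_attach_type != ATTACH_TRACE_ITER)
		return -EACCES;
	return 0;
}

/* Lookup path: a kfunc is usable only if the set's filter (if any) allows it. */
static int kfunc_allowed(const struct kfunc_id_set *set,
			 const struct prog *prog, unsigned int kfunc_id)
{
	assert(set != NULL && prog != NULL);
	if (set->filter && set->filter(set, prog, kfunc_id))
		return 0;
	return 1;
}
```

Because the callback receives the whole program description, the same mechanism can express restrictions on properties other than the attach type.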
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-7-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Batch UDP sockets from BPF iterator that allows for overlapping locking
semantics in BPF/kernel helpers executed in BPF programs. This facilitates
BPF socket destroy kfunc (introduced by follow-up patches) to execute from
BPF iterator programs.
Previously, BPF iterators acquired the sock lock and sockets hash table
bucket lock while executing BPF programs. This prevented BPF helpers that
acquire these locks again from being executed from BPF iterators. With the
batching approach, we acquire a bucket lock, batch all the bucket sockets,
and then release the bucket lock. This enables BPF or kernel helpers to
skip sock locking when invoked in the supported BPF contexts.
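The batching approach can be illustrated with a toy model. Everything here is an assumption made for the sketch: a plain flag stands in for the bucket spinlock, sockets are plain integers, and the names `struct bucket`, `batch_bucket()`, `run_on_batch()` and `relock_prog()` are hypothetical. Reference counting and batch resizing, which a real implementation needs, are left out.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_MAX 8

/* Toy bucket: a flag stands in for the bucket spinlock. */
struct bucket {
	int locked;
	int socks[BATCH_MAX];	/* socket identifiers stored in this bucket */
	size_t nr_socks;
};

/* Copy all sockets of a bucket into 'batch' while holding the bucket lock. */
static size_t batch_bucket(struct bucket *b, int *batch)
{
	size_t i;

	assert(!b->locked && b->nr_socks <= BATCH_MAX);
	b->locked = 1;			/* take the bucket lock */
	for (i = 0; i < b->nr_socks; i++)
		batch[i] = b->socks[i];
	b->locked = 0;			/* drop it before running any program */
	return i;
}

/* Run 'prog' on every batched socket; the bucket lock is not held here. */
static int run_on_batch(struct bucket *b, int (*prog)(struct bucket *, int))
{
	int batch[BATCH_MAX];
	size_t n = batch_bucket(b, batch);
	int ret = 0;

	for (size_t i = 0; i < n; i++)
		ret += prog(b, batch[i]);
	return ret;
}

/* Example program: it can take the bucket lock itself without deadlocking. */
static int relock_prog(struct bucket *b, int sk)
{
	assert(!b->locked);
	b->locked = 1;
	b->locked = 0;
	return sk;
}
```

The point of the model is the ordering: the lock covers only the copy into the batch, so code run per socket afterwards is free to take the same lock.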
The batching logic is similar to the logic implemented in TCP iterator:
https://lore.kernel.org/bpf/20210701200613.1036157-1-kafai@fb.com/.
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-6-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This is a preparatory commit to remove the field. The field was
previously shared between proc fs and BPF UDP socket iterators. As the
follow-up commits will decouple the implementation for the iterators,
remove the field. As for BPF socket iterator, filtering of sockets is
expected to be done in BPF programs.
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-5-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This is a preparatory commit that encapsulates the logic
to get udp table in iterator inside udp_get_table_afinfo, and
renames the function to `udp_get_table_seq` accordingly.
Suggested-by: Martin KaFai Lau <martin.lau@kernel.org>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-4-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This is a preparatory commit to refactor code that matches socket
attributes in iterators to a helper function, and use it in the
proc fs iterator.
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-3-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
This is a preparatory commit to replace `lock_sock_fast` with
`lock_sock`, and facilitate BPF programs executed from the TCP sockets
iterator to be able to destroy TCP sockets using the bpf_sock_destroy
kfunc (implemented in follow-up commits).
Previously, BPF TCP iterator was acquiring the sock lock with BH
disabled. This led to scenarios where the sockets hash table bucket lock
can be acquired with BH enabled in some path versus disabled in other.
In such a situation, the kernel issued a warning since it thinks that in the
BH enabled path the same bucket lock *might* be acquired again in the
softirq context (BH disabled), which will lead to a potential deadlock.
Since bpf_sock_destroy also happens in a process context, the potential
deadlock warning is likely a false alarm.
Here is a snippet of annotated stack trace that motivated this change:
```
Possible interrupt unsafe locking scenario:
CPU0 CPU1
---- ----
lock(&h->lhash2[i].lock);
local_bh_disable();
lock(&h->lhash2[i].lock);
kernel imagined possible scenario:
local_bh_disable(); /* Possible softirq */
lock(&h->lhash2[i].lock);
*** Potential Deadlock ***
process context:
lock_acquire+0xcd/0x330
_raw_spin_lock+0x33/0x40
------> Acquire (bucket) lhash2.lock with BH enabled
__inet_hash+0x4b/0x210
inet_csk_listen_start+0xe6/0x100
inet_listen+0x95/0x1d0
__sys_listen+0x69/0xb0
__x64_sys_listen+0x14/0x20
do_syscall_64+0x3c/0x90
entry_SYSCALL_64_after_hwframe+0x72/0xdc
bpf_sock_destroy run from iterator:
lock_acquire+0xcd/0x330
_raw_spin_lock+0x33/0x40
------> Acquire (bucket) lhash2.lock with BH disabled
inet_unhash+0x9a/0x110
tcp_set_state+0x6a/0x210
tcp_abort+0x10d/0x200
bpf_prog_6793c5ca50c43c0d_iter_tcp6_server+0xa4/0xa9
bpf_iter_run_prog+0x1ff/0x340
------> lock_sock_fast that acquires sock lock with BH disabled
bpf_iter_tcp_seq_show+0xca/0x190
bpf_seq_read+0x177/0x450
```
Also, Yonghong reported a deadlock for non-listening TCP sockets that
this change resolves. Previously, `lock_sock_fast` held the sock spin
lock with BH disabled, and the same lock was then acquired again in `tcp_abort`:
```
watchdog: BUG: soft lockup - CPU#0 stuck for 86s! [test_progs:2331]
RIP: 0010:queued_spin_lock_slowpath+0xd8/0x500
Call Trace:
<TASK>
_raw_spin_lock+0x84/0x90
tcp_abort+0x13c/0x1f0
bpf_prog_88539c5453a9dd47_iter_tcp6_client+0x82/0x89
bpf_iter_run_prog+0x1aa/0x2c0
? preempt_count_sub+0x1c/0xd0
? from_kuid_munged+0x1c8/0x210
bpf_iter_tcp_seq_show+0x14e/0x1b0
bpf_seq_read+0x36c/0x6a0
bpf_iter_tcp_seq_show
lock_sock_fast
__lock_sock_fast
spin_lock_bh(&sk->sk_lock.slock);
/*
 * Fast path return with bottom halves disabled and
 * sock::sk_lock.slock held.
 */
...
tcp_abort
local_bh_disable();
spin_lock(&((sk)->sk_lock.slock)); // from bh_lock_sock(sk)
```
With the switch to `lock_sock`, it calls `spin_unlock_bh` before returning:
```
lock_sock
lock_sock_nested
spin_lock_bh(&sk->sk_lock.slock);
:
spin_unlock_bh(&sk->sk_lock.slock);
```
Acked-by: Yonghong Song <yhs@meta.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
Link: https://lore.kernel.org/r/20230519225157.760788-2-aditi.ghag@isovalent.com
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Yafang Shao says:
====================
The target_btf_id can help us understand which kernel function is
linked by a tracing prog. The target_btf_id and target_obj_id have
already been exposed to userspace, so we just need to show them.
For some other link types like perf_event and kprobe_multi, it is not
easy to find which functions are attached either. We may support
->fill_link_info for them in the future.
v1->v2:
- Skip showing them in the plain output for the old kernels. (Quentin)
- Coding improvement. (Andrii)
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The target_btf_id can help us understand which kernel function is
linked by a tracing prog. The target_btf_id and target_obj_id have
already been exposed to userspace, so we just need to show them.
The result is as follows:
$ tools/bpf/bpftool/bpftool link show
2: tracing prog 13
prog_type tracing attach_type trace_fentry
target_obj_id 1 target_btf_id 13964
pids trace(10673)
$ tools/bpf/bpftool/bpftool link show -j
[{"id":2,"type":"tracing","prog_id":13,"prog_type":"tracing","attach_type":"trace_fentry","target_obj_id":1,"target_btf_id":13964,"pids":[{"pid":10673,"comm":"trace"}]}]
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230517103126.68372-3-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
The target_btf_id can help us understand which kernel function is
linked by a tracing prog. The target_btf_id and target_obj_id have
already been exposed to userspace, so we just need to show them.
The result is as follows:
$ cat /proc/10673/fdinfo/10
pos: 0
flags: 02000000
mnt_id: 15
ino: 2094
link_type: tracing
link_id: 2
prog_tag: a04f5eef06a7f555
prog_id: 13
attach_type: 24
target_obj_id: 1
target_btf_id: 13964
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230517103126.68372-2-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently kernel kfunc bpf_dynptr_is_rdonly() has prototype ...
__bpf_kfunc bool bpf_dynptr_is_rdonly(struct bpf_dynptr_kern *ptr)
... while selftests bpf_kfuncs.h has:
extern int bpf_dynptr_is_rdonly(const struct bpf_dynptr *ptr) __ksym;
Such a mismatch might cause problems although currently it is okay in
selftests. Fix it to prevent future potential surprise.
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230517040409.4024618-1-yhs@fb.com
With latest llvm17, dynptr/test_dynptr_is_null subtest failed in my testing
VM. The failure log looks like below:
All error logs:
tester_init:PASS:tester_log_buf 0 nsec
process_subtest:PASS:obj_open_mem 0 nsec
process_subtest:PASS:Can't alloc specs array 0 nsec
verify_success:PASS:dynptr_success__open 0 nsec
verify_success:PASS:bpf_object__find_program_by_name 0 nsec
verify_success:PASS:dynptr_success__load 0 nsec
verify_success:PASS:bpf_program__attach 0 nsec
verify_success:FAIL:err unexpected err: actual 4 != expected 0
#65/9 dynptr/test_dynptr_is_null:FAIL
The error happens for bpf prog test_dynptr_is_null in dynptr_success.c:
if (bpf_dynptr_is_null(&ptr2)) {
err = 4;
goto exit;
}
The bpf_dynptr_is_null(&ptr) unexpectedly returned a non-zero value and
the control went to the error path. Digging further, I found the root cause
is due to function signature difference between kernel and user space.
In kernel, we have ...
__bpf_kfunc bool bpf_dynptr_is_null(struct bpf_dynptr_kern *ptr)
... while in bpf_kfuncs.h we have:
extern int bpf_dynptr_is_null(const struct bpf_dynptr *ptr) __ksym;
The kernel bpf_dynptr_is_null disasm code:
ffffffff812f1a90 <bpf_dynptr_is_null>:
ffffffff812f1a90: f3 0f 1e fa endbr64
ffffffff812f1a94: 0f 1f 44 00 00 nopl (%rax,%rax)
ffffffff812f1a99: 53 pushq %rbx
ffffffff812f1a9a: 48 89 fb movq %rdi, %rbx
ffffffff812f1a9d: e8 ae 29 17 00 callq 0xffffffff81464450 <__asan_load8_noabort>
ffffffff812f1aa2: 48 83 3b 00 cmpq $0x0, (%rbx)
ffffffff812f1aa6: 0f 94 c0 sete %al
ffffffff812f1aa9: 5b popq %rbx
ffffffff812f1aaa: c3 retq
Note that only 1-byte register %al is set and the other 7-bytes are not
touched. In bpf program, the asm code for the above bpf_dynptr_is_null(&ptr2):
266: 85 10 00 00 ff ff ff ff call -0x1
267: b4 01 00 00 04 00 00 00 w1 = 0x4
268: 16 00 03 00 00 00 00 00 if w0 == 0x0 goto +0x3 <LBB9_8>
Basically, the 4-byte subregister is tested. This might cause an error as the
bytes above the lowest byte might not be 0.
This patch fixed the issue by using the identical func prototype across kernel
and selftest user space. The fixed bpf asm code:
267: 85 10 00 00 ff ff ff ff call -0x1
268: 54 00 00 00 01 00 00 00 w0 &= 0x1
269: b4 01 00 00 04 00 00 00 w1 = 0x4
270: 16 00 03 00 00 00 00 00 if w0 == 0x0 goto +0x3 <LBB9_8>
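The effect of the prototype mismatch can be modeled in portable C. This is a simulation, not the actual calling convention code: `call_returning_bool()`, `caller_sees_int()` and `caller_sees_bool()` are hypothetical helpers that mimic a callee writing only the low byte of the return register and callers testing either the full 32-bit subregister or the masked value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model the return register after a callee that only writes its low byte
 * (as 'sete %al' does): the upper bits keep whatever was there before.
 */
static uint64_t call_returning_bool(uint64_t stale_reg, int result)
{
	assert(result == 0 || result == 1);
	return (stale_reg & ~(uint64_t)0xff) | (uint64_t)result;
}

/* Caller built against an 'int' prototype: tests the 32-bit subregister. */
static int caller_sees_int(uint64_t r0)
{
	return (uint32_t)r0 != 0;
}

/* Caller built against a 'bool' prototype: masks before testing. */
static int caller_sees_bool(uint64_t r0)
{
	return ((uint32_t)r0 & 0x1) != 0;
}
```

With stale upper bits in the register, the `int` view reports "true" for a callee that returned false, while the masked view stays correct.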
Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20230517040404.4023912-1-yhs@fb.com
Update the documentation regarding shift operations to explain the
use of a mask, since otherwise shifting by a value out of range
(like negative) is undefined.
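A small C sketch of masked shifts follows. It assumes the masks are 0x3f for 64-bit operations and 0x1f for 32-bit operations; the helper names `lsh64()`, `lsh32()` and `check_shift_masks()` are made up for this example.

```c
#include <assert.h>
#include <stdint.h>

/* 64-bit left shift with the shift amount masked to the range 0..63. */
static uint64_t lsh64(uint64_t dst, uint64_t src)
{
	return dst << (src & 0x3f);
}

/* 32-bit left shift with the shift amount masked to the range 0..31. */
static uint32_t lsh32(uint32_t dst, uint32_t src)
{
	return dst << (src & 0x1f);
}

/* Usage: out-of-range shift amounts wrap instead of being undefined. */
static void check_shift_masks(void)
{
	assert(lsh64(1, 65) == 2);	/* 65 & 0x3f == 1 */
	assert(lsh32(1, 33) == 2);	/* 33 & 0x1f == 1 */
}
```

Masking makes the result well defined for every possible source value, including one that would be negative if read as a signed number.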
Signed-off-by: Dave Thaler <dthaler@microsoft.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/bpf/20230509180845.1236-1-dthaler1968@googlemail.com
Currently, when using prog loadall and the pin path is a bpffs mountpoint,
bpffs will be repeatedly mounted to the parent directory of the bpffs
mountpoint path. For example, a `bpftool prog loadall test.o /sys/fs/bpf`
will trigger this.
Signed-off-by: Pengcheng Yang <yangpc@wangsu.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Quentin Monnet <quentin@isovalent.com>
Link: https://lore.kernel.org/bpf/1683342439-3677-1-git-send-email-yangpc@wangsu.com
The sign-file utility (from scripts/) is used in prog_tests/verify_pkcs7_sig.c,
but the utility should not be called as a test. Executing this utility produces
the following error:
selftests: /linux/tools/testing/selftests/bpf: urandom_read
ok 16 selftests: /linux/tools/testing/selftests/bpf: urandom_read
selftests: /linux/tools/testing/selftests/bpf: sign-file
not ok 17 selftests: /linux/tools/testing/selftests/bpf: sign-file # exit=2
Also, urandom_read is mistakenly used as a test. It does not lead to an error,
but should be moved over to TEST_GEN_FILES as well. The empty TEST_CUSTOM_PROGS
can then be removed.
Fixes: fc97590668 ("selftests/bpf: Add test for bpf_verify_pkcs7_signature() kfunc")
Signed-off-by: Alexey Gladkov <legion@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Roberto Sassu <roberto.sassu@huawei.com>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/bpf/ZEuWFk3QyML9y5QQ@example.org
Link: https://lore.kernel.org/bpf/88e3ab23029d726a2703adcf6af8356f7a2d3483.1684316821.git.legion@kernel.org
It's trivial for user to trigger "verifier log line truncated" warning,
as verifier has a fixed-sized buffer of 1024 bytes (as of now), and there are at
least two pieces of user-provided information that can be output through
this buffer, and both can be arbitrarily sized by user:
- BTF names;
- BTF.ext source code lines strings.
Verifier log buffer should be properly sized for typical verifier state
output. But it's sort-of expected that this buffer won't be long enough
in some circumstances. So let's drop the check. In any case code will
work correctly, at worst truncating a part of a single line output.
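The "truncate instead of warn" behavior can be sketched in userspace with `vsnprintf()`. This is an illustration only: `log_line()` is a hypothetical helper, and standard `vsnprintf()` is used as a stand-in for whatever formatting routine the verifier uses internally.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/*
 * Format one log line into a fixed-size buffer. Returns the number of bytes
 * actually stored (excluding the NUL); over-long input is silently cut.
 */
static size_t log_line(char *buf, size_t size, const char *fmt, ...)
{
	va_list args;
	int n;

	assert(buf != NULL && size > 0);
	va_start(args, fmt);
	n = vsnprintf(buf, size, fmt, args);	/* never writes past 'size' */
	va_end(args);

	if (n < 0) {
		buf[0] = '\0';			/* formatting error: store nothing */
		return 0;
	}
	if ((size_t)n >= size)
		return size - 1;		/* truncated, but not treated as an error */
	return (size_t)n;
}
```

Since the formatter is bounded by the buffer size, an over-long name or source line costs at most the tail of that one line.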
Reported-by: syzbot+8b2a08dfbd25fd933d75@syzkaller.appspotmail.com
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230516180409.3549088-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Magnus Karlsson says:
====================
Prepare the AF_XDP selftests test framework code for the upcoming
multi-buffer support in AF_XDP. This is so that the multi-buffer patch
set does not become way too large. In that upcoming patch set, we are
only including the multi-buffer tests together with any framework
code that depends on the new options bit introduced in the AF_XDP
multi-buffer implementation itself.
Currently, the test framework is based on the premise that a packet
consists of a single fragment and thus occupies a single buffer and a
single descriptor. Multi-buffer breaks this assumption, as that is the
whole purpose of it. Now, a packet can consist of multiple buffers and
therefore consume multiple descriptors.
The patch set starts with some clean-ups and simplifications followed
by patches that make sure that the current code works even when a
packet occupies multiple buffers. The actual code for sending and
receiving multi-buffer packets will be included in the AF_XDP
multi-buffer patch set as it depends on a new bit being used in the
options field of the descriptor.
Patch set anatomy:
1: The XDP program was unnecessarily changed many times. Fix this.
2: There is no reason to generate a full UDP/IPv4 packet as it is
never used. Simplify the code by just generating a valid Ethernet
frame.
3: Introduce a more complicated payload pattern that can detect
fragments out of bounds in a multi-buffer packet and other errors
found in single-fragment packets.
4: As a convenience, dump the content of the faulty packet at error.
5: To simplify the code, make the usage of the packet stream for Tx
and Rx more similar.
6: Store the offset of the packet in the buffer in the struct pkt
definition instead of the address in the umem itself and introduce
a simple buffer allocator. The address only made sense when all
packets consumed a single buffer. Now, we do not know beforehand
how many buffers a packet will consume, so we instead just allocate
a buffer from the allocator and specify the offset within that
buffer.
7: Test for huge pages only once instead of before each test that needs it.
8: Populate the fill ring based on how many frags are needed for each
packet.
9: Change the data generation code so it can generate data for
multi-buffer packets too.
10: Adjust the packet pacing algorithm so that it can cope with
multi-buffer packets. The pacing algorithm is present so that Tx
does not send too many packets/frames to Rx that it starts to drop
packets. That would ruin the tests.
v1 -> v2:
* Fixed spelling error in patch #6 [Simon]
* Fixed compilation error with llvm in patch #7 [Daniel]
Thanks: Magnus
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Modify the packet pacing algorithm so that it works with multi-buffer
packets. This algorithm makes sure we do not send too many buffers to
the receiving thread so that packets have to be dropped. The previous
algorithm made the assumption that each packet only consumes one
buffer, but that is not true anymore when multi-buffer support gets
added. Instead, we find out what the largest packet size is in the
packet stream and assume that each packet will consume this many
buffers. This is conservative and overly cautious as there might be
smaller packets in the stream that need fewer buffers per packet. But
it keeps the algorithm simple.
Also simplify it by removing the pthread conditional and just test if
there is enough space in the Rx thread before trying to send one more
batch. This also makes the tests run faster.
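The conservative pacing estimate could look roughly like this. It is a sketch under assumptions: the helper names, the parameter `frame_size`, and the rule that a zero-length packet still occupies one buffer are all choices made for this example rather than details taken from the selftest code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Buffers occupied by a packet of 'len' bytes; assumed to be at least one. */
static uint32_t bufs_for_len(uint32_t len, uint32_t frame_size)
{
	assert(frame_size > 0);
	if (len == 0)
		return 1;
	return (len + frame_size - 1) / frame_size;	/* round up */
}

/* Worst case per packet, derived from the largest packet in the stream. */
static uint32_t max_bufs_per_pkt(const uint32_t *pkt_lens, size_t nr_pkts,
				 uint32_t frame_size)
{
	uint32_t max_len = 0;

	for (size_t i = 0; i < nr_pkts; i++)
		if (pkt_lens[i] > max_len)
			max_len = pkt_lens[i];
	return bufs_for_len(max_len, frame_size);
}

/* Send another batch only if the receiver has room for its worst case. */
static int can_send_batch(uint32_t bufs_in_flight, uint32_t batch_size,
			  uint32_t bufs_per_pkt, uint32_t rx_capacity)
{
	uint64_t needed = (uint64_t)bufs_in_flight +
			  (uint64_t)batch_size * bufs_per_pkt;

	return needed <= rx_capacity;
}
```

Charging every packet the worst-case buffer count wastes some capacity when packets are small, but a single comparison is then enough to decide whether to send.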
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-11-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add the ability to generate data in the packets that are correct for
multi-buffer packets. The ethernet header should only go into the
first fragment followed by data and the others should only have
data. We also need to modify the pkt_dump function so that it knows
what fragment has an ethernet header so it can print this.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-10-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Populate the fill ring based on the number of frags a packet
needs. With multi-buffer support, a packet might require more than a
single fragment/buffer, so the function xsk_populate_fill_ring() needs
to consider how many buffers a packet will consume, and put that many
buffers on the fill ring for each packet it should receive. As we are
still not sending any multi-buffer packets, the function will only
produce one buffer per packet at the moment.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-9-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Test for hugepages only once at the beginning of the execution of the
whole test suite, instead of before each test that needs huge
pages. These are the tests that use unaligned mode. As more unaligned
tests will be added, the current system just does not scale.
With this change, there are now three possible outcomes of a test run:
fail, pass, or skip. To simplify the handling of this, the function
testapp_validate_traffic() now returns this value to the main loop. As
this function is used by nearly all tests, it meant a small change to
most of them.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-8-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Store the offset in struct pkt instead of the address. This is
important since address is only meaningful in the context of a packet
that is stored in a single umem buffer and thus a single Tx
descriptor. If the packet, in contrast, needs to be represented by
multiple buffers in the umem, storing the address makes no sense since
the packet will consist of multiple buffers in the umem at various
addresses. This change is in preparation for the upcoming
multi-buffer support in AF_XDP and the corresponding tests.
So instead of indicating the address, we indicate the offset
of the packet in the first buffer. The actual address of the buffer is
allocated from the umem with a new function called
umem_alloc_buffer(). This also means we can get rid of the
use_fill_for_addr flag as the addresses fed into the fill ring will
always be the offset from the pkt specification in the packet stream
plus the address of the allocated buffer from the umem. No special
casing needed.
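A minimal model of "allocated buffer address plus packet offset" is shown below. The allocator here is a hypothetical round-robin one named `sketch_alloc_buffer()`; it is not the selftests' umem_alloc_buffer() implementation, and `struct umem_sketch` and `fill_addr()` are likewise invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical umem description: equally sized frames, handed out in order. */
struct umem_sketch {
	uint64_t next_buffer;	/* index of the next frame to hand out */
	uint64_t num_frames;
	uint64_t frame_size;
};

/* Return the base address of the next frame, wrapping around at the end. */
static uint64_t sketch_alloc_buffer(struct umem_sketch *umem)
{
	uint64_t addr;

	assert(umem->num_frames > 0);
	addr = umem->next_buffer * umem->frame_size;
	umem->next_buffer = (umem->next_buffer + 1) % umem->num_frames;
	return addr;
}

/* Address for the fill ring: allocated frame base plus the packet's offset. */
static uint64_t fill_addr(struct umem_sketch *umem, uint64_t pkt_offset)
{
	return sketch_alloc_buffer(umem) + pkt_offset;
}
```

Because the packet description stores only an offset, the same description works no matter which frame the allocator returns.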
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-7-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Convert the current variable rx_pkt_nb to an iterator that can be used
for both Rx and Tx. This simplifies the code and makes Tx more like
Rx, which already has this feature.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-6-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Dump the content of the packet when a test finds that packets are
received out of order, the length is wrong, or some other packet
error. Use the already existing pkt_dump function for this and call it
when the above errors are detected. Get rid of the command line option
for dumping packets as it is not useful to print out thousands of
good packets followed by the faulty one you would like to see.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-5-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Add a varying payload pattern within the packet. Instead of having
just a packet number that is the same for all words in a packet, make
each word different in the packet. The upper 16 bits are set to the
packet number and the lower 16 bits are the sequence number of the
words in this packet. So the 3rd packet's 5th 32-bit word of data will
contain the number (2<<16) | 4 as they are numbered from 0.
This will make it easier to detect fragments that are out of order
when starting to test multi-buffer support.
The member payload in the packet is renamed pkt_nb to reflect that it
is now only a pkt_nb, not the real payload as seen above.
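The word layout, with the packet number in the upper 16 bits and the word sequence number in the lower 16 bits, can be sketched as follows. The helper names `payload_word()` and `fill_payload()` are hypothetical, and byte-order conversion is deliberately ignored in this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build one 32-bit payload word from a packet number and a word index. */
static uint32_t payload_word(uint32_t pkt_nb, uint32_t seq)
{
	assert(pkt_nb <= 0xffff && seq <= 0xffff);
	return (pkt_nb << 16) | seq;
}

/* Fill a packet's payload so that every word in it is distinct. */
static void fill_payload(uint32_t *words, size_t nr_words, uint32_t pkt_nb)
{
	for (size_t i = 0; i < nr_words; i++)
		words[i] = payload_word(pkt_nb, (uint32_t)i);
}
```

A receiver can recompute the expected word at any position, so a fragment that arrives out of order shows up as a jump in the lower 16 bits.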
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-4-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Implement support for generating pkts with variable length. Before
this patch, they were all 64 bytes, except for some packets of zero
length and some that were too large. This feature will be used to test
multi-buffer support for which large packets are needed.
The packets are also made simpler, just a valid Ethernet header
followed by a sequence number. This is so that it will become easier to
implement packet generation when each packet consists of multiple
fragments. There is also a maintenance burden associated with carrying
all this code for generating proper UDP/IP packets, especially since
they are not needed.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-3-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Do not change the XDP program for the Tx thread when not needed. It
was erroneously compared to the XDP program for the Rx thread, which
is always going to be different, which meant that the code made
unnecessary switches to the same program it had before. This did not
affect functionality, just performance.
Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com>
Link: https://lore.kernel.org/r/20230516103109.3066-2-magnus.karlsson@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Jiri Olsa says:
====================
hi,
I noticed several times in discussions that we should move test kfuncs
into kernel module, now perhaps even more pressing with all the kfunc
effort. This patchset moves all the test kfuncs into bpf_testmod.
I added bpf_testmod/bpf_testmod_kfunc.h header that is shared between
bpf_testmod kernel module and BPF programs.
v4 changes:
- s390 supports long calls [1] now, so it can now call kfuncs from a module [Ilya]
- added acks [David]
- cleanups for ptr_to_u64 function [David]
- use relative path for bpf_testmod_kfunc.h include [Andrii]
- new libbpf fix (patch 1) for gen_loader
v3 changes:
- added acks [David]
- added bpf_testmod.ko make dependency for bpf test progs [David]
- better handling of __ksym and refcount_t in bpf_testmod_kfunc.h [David]
- removed 'extern' from kfuncs declarations [David]
- typo in header guard macro [David]
- use only stdout in un/load_bpf_testmod
v2 changes:
- add 74bc3a5acc into bpf-next/master CI, so the test would pass
https://github.com/kernel-patches/vmtest/pull/192
- remove extra externs [Artem]
- using un/load_bpf_testmod in other tests
- rebased
thanks,
jirka
[1] 1cf3bfc60f bpf: Support 64-bit pointers to kfuncs
====================
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Moving kernel test kfuncs into bpf_testmod kernel module, and adding
necessary init calls and BTF IDs records.
We need to keep following structs in kernel:
struct prog_test_ref_kfunc
struct prog_test_member (embedded in prog_test_ref_kfunc)
The reason is that they need to be marked as rcu safe (check test
prog mark_ref_as_untrusted_or_null) and such objects are currently
required to be defined in the kernel (see rcu_safe_kptr check
in kernel).
We need to keep also dtor functions for both objects in kernel:
bpf_kfunc_call_test_release
bpf_kfunc_call_memb_release
We also keep a copy of these structs in bpf_testmod_kfunc.h, because
other test functions use them. This is unfortunate, but it is just a
temporary solution until we are able to move these structs to bpf_testmod
completely.
As suggested by David adding bpf_testmod.ko make dependency for
bpf programs, so they are rebuilt if we change the bpf_testmod.ko
module.
Also adding missing __bpf_kfunc to bpf_kfunc_call_test4 functions.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: David Vernet <void@manifault.com>
Link: https://lore.kernel.org/r/20230515133756.1658301-11-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
There's no need to keep the extern in kfuncs declarations.
Suggested-by: David Vernet <void@manifault.com>
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-10-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Currently the test_verifier allows test to specify kfunc symbol
and search for it in the kernel BTF.
Adding the possibility to search for kfunc also in bpf_testmod
module when it's not found in kernel BTF.
To find bpf_testmod btf we need to get back SYS_ADMIN cap.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-9-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Loading bpf_testmod kernel module for verifier test. We will
move all the test kfuncs into bpf_testmod in following change.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-8-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Now that we have un/load_bpf_testmod helpers in testing_helpers.h,
we can use it in other tests and save some lines.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-7-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Do not unload bpf_testmod in load_bpf_testmod, instead call
unload_bpf_testmod separately.
This way we will be able to use un/load_bpf_testmod functions
in other tests that un/load bpf_testmod module.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-6-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
We are about to use un/load_bpf_testmod functions in a couple of tests
and it's better to print output to stdout, so it's aligned with
tests ASSERT macros output, which use stdout as well.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-5-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Moving test_progs helpers to testing_helpers object so they can be
used from test_verifier in following changes.
Also adding missing ifndef header guard to testing_helpers.h header.
Using stderr instead of env.stderr because un/load_bpf_testmod helpers
will be used outside test_progs. Also at the point of calling them
in test_progs the std files are not hijacked yet and stderr is the
same as env.stderr.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-4-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Move all kfunc exports into separate bpf_testmod_kfunc.h header file
and include it in tests that need it.
We will move all test kfuncs into bpf_testmod in following change,
so it's convenient to have declarations in single place.
The bpf_testmod_kfunc.h is included by both bpf_testmod and bpf
programs that use test kfuncs.
As suggested by David, the bpf_testmod_kfunc.h includes vmlinux.h
and bpf/bpf_helpers.h for bpf programs build, so the declarations
have proper __ksym attribute and we can resolve all the structs.
Note in kfunc_call_test_subprog.c we can no longer use the sk_state
define from bpf_tcp_helpers.h (because it clashed with vmlinux.h)
and we need to address __sk_common.skc_state field directly.
Acked-by: David Vernet <void@manifault.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-3-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
When moving some of the test kfuncs to bpf_testmod I hit an issue
when some of the kfuncs that object uses are in module and some
in vmlinux.
The problem is that both vmlinux and module kfuncs get allocated
btf_fd_idx index into fd_array, but we store to it the BTF fd value
only for module's kfunc, not vmlinux's one (because it's zero).
Then after the program is loaded we check if fd_array[btf_fd_idx] != 0
and close the fd.
When the object has kfuncs from both vmlinux and module, the fd from
fd_array[btf_fd_idx] from previous load will be stored in there for
vmlinux's kfunc, so we close unrelated fd (of the program we just
loaded in my case).
Fixing this by storing zero to fd_array[btf_fd_idx] for vmlinux
kfuncs, so that we won't close a stale fd.
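The stale-fd scenario described above can be modeled in a few lines of
user-space C. This is only a sketch under stated assumptions: struct
kfunc_desc, fill_fd_array() and count_closed() are hypothetical names made
up for illustration, not libbpf's gen_loader code, and the fd numbers are
arbitrary.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each kfunc owns one slot in a shared fd_array. */
struct kfunc_desc {
	int btf_fd_idx;	/* slot index into fd_array */
	int btf_fd;	/* module BTF fd, or 0 for a vmlinux kfunc */
};

/*
 * Fill fd_array before a load. With store_zero_for_vmlinux == false this
 * mimics the behaviour described in the text: slots that belong to
 * vmlinux kfuncs keep whatever value a previous load left there.
 */
static void fill_fd_array(int *fd_array, const struct kfunc_desc *descs,
			  size_t n, bool store_zero_for_vmlinux)
{
	for (size_t i = 0; i < n; i++) {
		if (descs[i].btf_fd != 0)
			fd_array[descs[i].btf_fd_idx] = descs[i].btf_fd;
		else if (store_zero_for_vmlinux)
			fd_array[descs[i].btf_fd_idx] = 0;
	}
}

/* After the load every non-zero slot would be closed; count those slots. */
static int count_closed(const int *fd_array, const struct kfunc_desc *descs,
			size_t n)
{
	int closed = 0;

	for (size_t i = 0; i < n; i++)
		if (fd_array[descs[i].btf_fd_idx] != 0)
			closed++;
	return closed;
}
```

With one module kfunc and one vmlinux kfunc, and a leftover fd in the
vmlinux slot, the model without the explicit zero store reports two closes
(one of them for the unrelated fd). With the zero store it reports one.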
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/r/20230515133756.1658301-2-jolsa@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
llvm patch [1] enabled cross-function optimization for func arguments
(ArgumentPromotion) at -O2 level. And this caused s390 sock_fields
test failure ([2]). The failure is gone right now as patch [1] was
reverted in [3]. But it is possible that patch [3] will be reverted
again and then the test failure in [2] will show up again. So it is
desirable to fix the failure regardless.
The following is an analysis of why the sock_fields test fails with
llvm patch [1].
The main problem is in
  static __noinline bool sk_dst_port__load_word(struct bpf_sock *sk)
  {
      __u32 *word = (__u32 *)&sk->dst_port;
      return word[0] == bpf_htons(0xcafe);
  }
  static __noinline bool sk_dst_port__load_half(struct bpf_sock *sk)
  {
      __u16 *half = (__u16 *)&sk->dst_port;
      return half[0] == bpf_htons(0xcafe);
  }
  ...
  int read_sk_dst_port(struct __sk_buff *skb)
  {
      ...
      sk = skb->sk;
      ...
      if (!sk_dst_port__load_word(sk))
          RET_LOG();
      if (!sk_dst_port__load_half(sk))
          RET_LOG();
      ...
  }
Through cross-function optimization (ArgumentPromotion), the
compiler effectively produces:
  static __noinline bool sk_dst_port__load_word(__u32 word_val)
  {
      return word_val == bpf_htons(0xcafe);
  }
  static __noinline bool sk_dst_port__load_half(__u16 half_val)
  {
      return half_val == bpf_htons(0xcafe);
  }
  ...
  int read_sk_dst_port(struct __sk_buff *skb)
  {
      ...
      sk = skb->sk;
      ...
      __u32 *word = (__u32 *)&sk->dst_port;
      __u32 word_val = word[0];
      ...
      if (!sk_dst_port__load_word(word_val))
          RET_LOG();
      __u16 half_val = word_val >> 16;
      if (!sk_dst_port__load_half(half_val))
          RET_LOG();
      ...
  }
In current uapi bpf.h, we have
  struct bpf_sock {
      ...
      __be16 dst_port;    /* network byte order */
      __u16 :16;          /* zero padding */
      ...
  };
But in the old kernel (e.g., 5.6) we have
  struct bpf_sock {
      ...
      __u32 dst_port;     /* network byte order */
      ...
  };
So for backward compatibility reasons, a 4-byte load of
dst_port is converted to a 2-byte load internally.
Specifically, 'word_val = word[0]' is replaced by 2-byte load
by the verifier and this caused the trouble for later
sk_dst_port__load_half() where half_val becomes 0.
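The arithmetic behind "half_val becomes 0" can be checked with a small
user-space model. It is a sketch only: the three helper names are
hypothetical, and it assumes that the narrowed load zero-extends the 16-bit
port into the 32-bit result, which is what the observed half_val of 0
implies on a big-endian target such as s390.

```c
#include <stdint.h>

/*
 * Model of the verifier-narrowed load: only the 16-bit port is read and
 * the result is zero-extended to 32 bits (assumption, see lead-in).
 */
static uint32_t narrowed_word_load(uint16_t dst_port)
{
	return dst_port;
}

/*
 * What a genuine 4-byte big-endian load of { dst_port, 16 bits of zero
 * padding } would produce: the port sits in the upper half of the word.
 */
static uint32_t full_word_load_be(uint16_t dst_port)
{
	return (uint32_t)dst_port << 16;
}

/* How the optimized code derives the half from the already loaded word. */
static uint16_t half_from_word(uint32_t word_val)
{
	return (uint16_t)(word_val >> 16);
}
```

For a port of 0xcafe, half_from_word(full_word_load_be(0xcafe)) yields
0xcafe, while half_from_word(narrowed_word_load(0xcafe)) yields 0, so the
second comparison in the test fails.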
A typical user program won't have such a code pattern triggering
the above bug, so let us fix the test failure with a source
code change. Adding an empty asm volatile statement seems
enough to prevent the undesired transformation.
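A minimal sketch of what such a source change could look like, written as
plain user-space C so it compiles on its own. struct sock_like and
load_half_matches() are hypothetical stand-ins for struct bpf_sock and the
test helper, and the exact position of the asm statement is an assumption
rather than the layout of the actual patch. Whether the statement blocks a
given optimization depends on the compiler version.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the relevant part of struct bpf_sock. */
struct sock_like {
	uint16_t dst_port;
	uint16_t pad;	/* models the 16 bits of zero padding */
};

__attribute__((noinline))
static bool load_half_matches(struct sock_like *sk, uint16_t want)
{
	uint16_t *half;

	/*
	 * Empty volatile asm: it emits no instructions, but the compiler
	 * treats it as an opaque statement inside the function body. The
	 * intent is to keep the load below inside this function instead of
	 * having it promoted into the caller.
	 */
	__asm__ volatile ("");
	half = &sk->dst_port;
	return half[0] == want;
}
```

The function still takes the pointer and performs its own 2-byte load, so
its result does not depend on how a caller obtained a wider value.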
[1] https://reviews.llvm.org/D148269
[2] https://lore.kernel.org/bpf/e7f2c5e8-a50c-198d-8f95-388165f1e4fd@meta.com/
[3] https://reviews.llvm.org/rG141be5c062ecf22bd287afffd310e8ac4711444a
Tested-by: Ilya Leoshkevich <iii@linux.ibm.com>
Signed-off-by: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20230516214945.1013578-1-yhs@fb.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Change netcnt to demand at least 10K packets, as we frequently see some
stray packet arriving during the test in BPF CI. It seems more important
to make sure we haven't lost any packet than enforcing exact number of
packets.
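The difference between the two checks can be sketched as follows. The
macro and function names are hypothetical, the threshold is the 10K from
the text, and the exact-match variant is inferred from the wording above
rather than copied from the test.

```c
#include <stdbool.h>

/* "10K packets" from the text; the macro name is made up for this sketch. */
#define MIN_EXPECTED_PACKETS 10000UL

/* Strict variant: any stray packet makes the check fail. */
static bool packets_exact_ok(unsigned long packets)
{
	return packets == MIN_EXPECTED_PACKETS;
}

/* Relaxed variant: extra packets are tolerated, lost packets are not. */
static bool packets_at_least_ok(unsigned long packets)
{
	return packets >= MIN_EXPECTED_PACKETS;
}
```

A count of 10001 fails the strict variant and passes the relaxed one, while
a count of 9999 fails both.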
Cc: Stanislav Fomichev <sdf@google.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Stanislav Fomichev <sdf@google.com>
Link: https://lore.kernel.org/r/20230515204833.2832000-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
-----BEGIN PGP SIGNATURE-----
iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZGKqEAAKCRDbK58LschI
g6LYAQDp1jAszCOkmJ8VUA0ZyC5NAFDv+7y9Nd1toYWYX1btzAEAkf8+5qBJ1qmI
P5M0hjMTbH4MID9Aql10ZbMHheyOBAo=
=NUQM
-----END PGP SIGNATURE-----
Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next
Daniel Borkmann says:
====================
pull-request: bpf-next 2023-05-16
We've added 57 non-merge commits during the last 19 day(s) which contain
a total of 63 files changed, 3293 insertions(+), 690 deletions(-).
The main changes are:
1) Add precision propagation to verifier for subprogs and callbacks,
from Andrii Nakryiko.
2) Improve BPF's {g,s}etsockopt() handling with wrong option lengths,
from Stanislav Fomichev.
3) Utilize pahole v1.25 for the kernel's BTF generation to filter out
inconsistent function prototypes, from Alan Maguire.
4) Various dyn-pointer verifier improvements to relax restrictions,
from Daniel Rosenberg.
5) Add a new bpf_task_under_cgroup() kfunc for designated task,
from Feng Zhou.
6) Unblock tests for arm64 BPF CI after ftrace supporting direct call,
from Florent Revest.
7) Add XDP hint kfunc metadata for RX hash/timestamp for igc,
from Jesper Dangaard Brouer.
8) Add several new dyn-pointer kfuncs to ease their usability,
from Joanne Koong.
9) Add in-depth LRU internals description and dot function graph,
from Joe Stringer.
10) Fix KCSAN report on bpf_lru_list when accessing node->ref,
from Martin KaFai Lau.
11) Only dump unprivileged_bpf_disabled log warning upon write,
from Kui-Feng Lee.
12) Extend test_progs to allow directly passing allow/denylist file,
from Stephen Veiss.
13) Fix BPF trampoline memleak upon failure attaching to fentry,
from Yafang Shao.
14) Fix emitting struct bpf_tcp_sock type in vmlinux BTF,
from Yonghong Song.
* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (57 commits)
bpf: Fix memleak due to fentry attach failure
bpf: Remove bpf trampoline selector
bpf, arm64: Support struct arguments in the BPF trampoline
bpftool: JIT limited misreported as negative value on aarch64
bpf: fix calculation of subseq_idx during precision backtracking
bpf: Remove anonymous union in bpf_kfunc_call_arg_meta
bpf: Document EFAULT changes for sockopt
selftests/bpf: Correctly handle optlen > 4096
selftests/bpf: Update EFAULT {g,s}etsockopt selftests
bpf: Don't EFAULT for {g,s}setsockopt with wrong optlen
libbpf: fix offsetof() and container_of() to work with CO-RE
bpf: Address KCSAN report on bpf_lru_list
bpf: Add --skip_encoding_btf_inconsistent_proto, --btf_gen_optimized to pahole flags for v1.25
selftests/bpf: Accept mem from dynptr in helper funcs
bpf: verifier: Accept dynptr mem as mem in helpers
selftests/bpf: Check overflow in optional buffer
selftests/bpf: Test allowing NULL buffer in dynptr slice
bpf: Allow NULL buffers in bpf_dynptr_slice(_rw)
selftests/bpf: Add testcase for bpf_task_under_cgroup
bpf: Add bpf_task_under_cgroup() kfunc
...
====================
Link: https://lore.kernel.org/r/20230515225603.27027-1-daniel@iogearbox.net
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
Bagas Sanjaya says:
====================
SPDX conversion for bonding, 8390, and i825xx drivers
This series is SPDX conversion for bonding, 8390, and i825xx driver
subsystems. It is split from v2 of my SPDX conversion series in
response to Didi's GPL full name fixes [1] to make it easily
digestible.
The conversion in this series is divided by each subsystem and by
license type.
[1]: https://lore.kernel.org/linux-spdx/20230512100620.36807-1-bagasdotme@gmail.com/
====================
Link: https://lore.kernel.org/r/20230515060714.621952-1-bagasdotme@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>