Commit Graph

38790 Commits

Author SHA1 Message Date
John Fastabend
8c1b382a55 bpf: sockmap, add tests for proto updates many to single map
Add test with a single map where each socket is inserted multiple
times. Test protocols: TCP, UDP, stream af_unix and dgram af_unix.

Signed-off-by: John Fastabend <john.fastabend@gmail.com>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Reviewed-by: Jakub Sitnicki <jakub@cloudflare.com>
Link: https://lore.kernel.org/r/20231221232327.43678-4-john.fastabend@gmail.com
2024-01-03 16:50:19 -08:00
Alexei Starovoitov
7e3811cb99 selftests/bpf: Convert profiler.c to bpf_cmp.
Convert profiler[123].c to "volatile compare" to compare barrier_var() approach vs bpf_cmp_likely() vs bpf_cmp_unlikely().

bpf_cmp_unlikely() produces correct code, but takes much longer to verify:

./veristat -C -e prog,insns,states before after_with_unlikely
Program                               Insns (A)  Insns (B)  Insns       (DIFF)  States (A)  States (B)  States     (DIFF)
------------------------------------  ---------  ---------  ------------------  ----------  ----------  -----------------
kprobe__proc_sys_write                     1603      19606  +18003 (+1123.08%)         123        1678  +1555 (+1264.23%)
kprobe__vfs_link                          11815      70305   +58490 (+495.05%)         971        4967   +3996 (+411.53%)
kprobe__vfs_symlink                        5464      42896   +37432 (+685.07%)         434        3126   +2692 (+620.28%)
kprobe_ret__do_filp_open                   5641      44578   +38937 (+690.25%)         446        3162   +2716 (+608.97%)
raw_tracepoint__sched_process_exec         2770      35962  +33192 (+1198.27%)         226        3121  +2895 (+1280.97%)
raw_tracepoint__sched_process_exit         1526       2135      +609 (+39.91%)         133         208      +75 (+56.39%)
raw_tracepoint__sched_process_fork          265        337       +72 (+27.17%)          19          24       +5 (+26.32%)
tracepoint__syscalls__sys_enter_kill      18782     140407  +121625 (+647.56%)        1286       12176  +10890 (+846.81%)

bpf_cmp_likely() is equivalent to barrier_var():

./veristat -C -e prog,insns,states before after_with_likely
Program                               Insns (A)  Insns (B)  Insns   (DIFF)  States (A)  States (B)  States (DIFF)
------------------------------------  ---------  ---------  --------------  ----------  ----------  -------------
kprobe__proc_sys_write                     1603       1663    +60 (+3.74%)         123         127    +4 (+3.25%)
kprobe__vfs_link                          11815      12090   +275 (+2.33%)         971         971    +0 (+0.00%)
kprobe__vfs_symlink                        5464       5448    -16 (-0.29%)         434         426    -8 (-1.84%)
kprobe_ret__do_filp_open                   5641       5739    +98 (+1.74%)         446         446    +0 (+0.00%)
raw_tracepoint__sched_process_exec         2770       2608   -162 (-5.85%)         226         216   -10 (-4.42%)
raw_tracepoint__sched_process_exit         1526       1526     +0 (+0.00%)         133         133    +0 (+0.00%)
raw_tracepoint__sched_process_fork          265        265     +0 (+0.00%)          19          19    +0 (+0.00%)
tracepoint__syscalls__sys_enter_kill      18782      18970   +188 (+1.00%)        1286        1286    +0 (+0.00%)
kprobe__proc_sys_write                     2700       2809   +109 (+4.04%)         107         109    +2 (+1.87%)
kprobe__vfs_link                          12238      12366   +128 (+1.05%)         267         269    +2 (+0.75%)
kprobe__vfs_symlink                        7139       7365   +226 (+3.17%)         167         175    +8 (+4.79%)
kprobe_ret__do_filp_open                   7264       7070   -194 (-2.67%)         180         182    +2 (+1.11%)
raw_tracepoint__sched_process_exec         3768       3453   -315 (-8.36%)         211         199   -12 (-5.69%)
raw_tracepoint__sched_process_exit         3138       3138     +0 (+0.00%)          83          83    +0 (+0.00%)
raw_tracepoint__sched_process_fork          265        265     +0 (+0.00%)          19          19    +0 (+0.00%)
tracepoint__syscalls__sys_enter_kill      26679      24327  -2352 (-8.82%)        1067        1037   -30 (-2.81%)
kprobe__proc_sys_write                     1833       1833     +0 (+0.00%)         157         157    +0 (+0.00%)
kprobe__vfs_link                           9995      10127   +132 (+1.32%)         803         803    +0 (+0.00%)
kprobe__vfs_symlink                        5606       5672    +66 (+1.18%)         451         451    +0 (+0.00%)
kprobe_ret__do_filp_open                   5716       5782    +66 (+1.15%)         462         462    +0 (+0.00%)
raw_tracepoint__sched_process_exec         3042       3042     +0 (+0.00%)         278         278    +0 (+0.00%)
raw_tracepoint__sched_process_exit         1680       1680     +0 (+0.00%)         146         146    +0 (+0.00%)
raw_tracepoint__sched_process_fork          299        299     +0 (+0.00%)          25          25    +0 (+0.00%)
tracepoint__syscalls__sys_enter_kill      18372      18372     +0 (+0.00%)        1558        1558    +0 (+0.00%)

default (mcpu=v3), no_alu32, cpuv4 have similar differences.

Note one place where bpf_nop_mov() is used to workaround the verifier lack of link
between the scalar register and its spill to stack.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231226191148.48536-7-alexei.starovoitov@gmail.com
2024-01-03 11:08:23 -08:00
Alexei Starovoitov
0bcc62aa98 bpf: Add bpf_nop_mov() asm macro.
bpf_nop_mov(var) asm macro emits nop register move: rX = rX.
If 'var' is a scalar and not a fixed constant the verifier will assign ID to it.
If it's later spilled the stack slot will carry that ID as well.
Hence the range refining comparison "if rX < const" will update all copies
including spilled slot.
This macro is a temporary workaround until the verifier gets smarter.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231226191148.48536-6-alexei.starovoitov@gmail.com
2024-01-03 11:08:23 -08:00
Alexei Starovoitov
907dbd3ede selftests/bpf: Remove bpf_assert_eq-like macros.
Since the last user was converted to bpf_cmp, remove bpf_assert_eq/ne/... macros.

__bpf_assert_op() macro is kept for experiments, since it's slightly more efficient
than bpf_assert(bpf_cmp_unlikely()) until LLVM is fixed.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20231226191148.48536-5-alexei.starovoitov@gmail.com
2024-01-03 11:08:23 -08:00
Alexei Starovoitov
624cd2a176 selftests/bpf: Convert exceptions_assert.c to bpf_cmp
Convert exceptions_assert.c to bpf_cmp_unlikely() macro.

Since

bpf_assert(bpf_cmp_unlikely(var, ==, 100));
other code;

will generate assembly code:

  if r1 == 100 goto L2;
  r0 = 0
  call bpf_throw
L1:
  other code;
  ...

L2: goto L1;

LLVM generates redundant basic block with extra goto. LLVM will be fixed eventually.
Right now it's less efficient than __bpf_assert(var, ==, 100) macro that produces:
  if r1 == 100 goto L1;
  r0 = 0
  call bpf_throw
L1:
  other code;

But extra goto doesn't hurt the verification process.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20231226191148.48536-4-alexei.starovoitov@gmail.com
2024-01-03 11:08:23 -08:00
Alexei Starovoitov
a8b242d77b bpf: Introduce "volatile compare" macros
Compilers optimize conditional operators at will, but often bpf programmers
want to force compilers to keep the same operator in asm as it's written in C.
Introduce bpf_cmp_likely/unlikely(var1, conditional_op, var2) macros that can be used as:

-               if (seen >= 1000)
+               if (bpf_cmp_unlikely(seen, >=, 1000))

The macros take advantage of BPF assembly that is C like.

The macros check the sign of variable 'seen' and emits either
signed or unsigned compare.

For example:
int a;
bpf_cmp_unlikely(a, >, 0) will be translated to 'if rX s> 0 goto' in BPF assembly.

unsigned int a;
bpf_cmp_unlikely(a, >, 0) will be translated to 'if rX > 0 goto' in BPF assembly.

C type conversions coupled with comparison operator are tricky.
  int i = -1;
  unsigned int j = 1;
  if (i < j) // this is false.

  long i = -1;
  unsigned int j = 1;
  if (i < j) // this is true.

Make sure BPF program is compiled with -Wsign-compare then the macros will catch
the mistake.

The macros check LHS (left hand side) only to figure out the sign of compare.

'if 0 < rX goto' is not allowed in the assembly, so the users
have to use a variable on LHS anyway.

The patch updates few tests to demonstrate the use of the macros.

The macro allows to use BPF_JSET in C code, since LLVM doesn't generate it at
present. For example:

if (i & j) compiles into r0 &= r1; if r0 == 0 goto

while

if (bpf_cmp_unlikely(i, &, j)) compiles into if r0 & r1 goto

Note that the macros has to be careful with RHS assembly predicate.
Since:
u64 __rhs = 1ull << 42;
asm goto("if r0 < %[rhs] goto +1" :: [rhs] "ri" (__rhs));
LLVM will silently truncate 64-bit constant into s32 imm.

Note that [lhs] "r"((short)LHS) the type cast is a workaround for LLVM issue.
When LHS is exactly 32-bit LLVM emits redundant <<=32, >>=32 to zero upper 32-bits.
When LHS is 64 or 16 or 8-bit variable there are no shifts.
When LHS is 32-bit the (u64) cast doesn't help. Hence use (short) cast.
It does _not_ truncate the variable before it's assigned to a register.

Traditional likely()/unlikely() macros that use __builtin_expect(!!(x), 1 or 0)
have no effect on these macros, hence macros implement the logic manually.
bpf_cmp_unlikely() macro preserves compare operator as-is while
bpf_cmp_likely() macro flips the compare.

Consider two cases:
A.
  for() {
    if (foo >= 10) {
      bar += foo;
    }
    other code;
  }

B.
  for() {
    if (foo >= 10)
       break;
    other code;
  }

It's ok to use either bpf_cmp_likely or bpf_cmp_unlikely macros in both cases,
but consider that 'break' is effectively 'goto out_of_the_loop'.
Hence it's better to use bpf_cmp_unlikely in the B case.
While 'bar += foo' is better to keep as 'fallthrough' == likely code path in the A case.

When it's written as:
A.
  for() {
    if (bpf_cmp_likely(foo, >=, 10)) {
      bar += foo;
    }
    other code;
  }

B.
  for() {
    if (bpf_cmp_unlikely(foo, >=, 10))
       break;
    other code;
  }

The assembly will look like:
A.
  for() {
    if r1 < 10 goto L1;
      bar += foo;
  L1:
    other code;
  }

B.
  for() {
    if r1 >= 10 goto L2;
    other code;
  }
  L2:

The bpf_cmp_likely vs bpf_cmp_unlikely changes basic block layout, hence it will
greatly influence the verification process. The number of processed instructions
will be different, since the verifier walks the fallthrough first.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20231226191148.48536-3-alexei.starovoitov@gmail.com
2024-01-03 10:58:42 -08:00
Alexei Starovoitov
495d2d8133 selftests/bpf: Attempt to build BPF programs with -Wsign-compare
GCC's -Wall includes -Wsign-compare while clang does not.
Since BPF programs are built with clang we need to add this flag explicitly
to catch problematic comparisons like:

  int i = -1;
  unsigned int j = 1;
  if (i < j) // this is false.

  long i = -1;
  unsigned int j = 1;
  if (i < j) // this is true.

C standard for reference:

- If either operand is unsigned long the other shall be converted to unsigned long.

- Otherwise, if one operand is a long int and the other unsigned int, then if a
long int can represent all the values of an unsigned int, the unsigned int
shall be converted to a long int; otherwise both operands shall be converted to
unsigned long int.

- Otherwise, if either operand is long, the other shall be converted to long.

- Otherwise, if either operand is unsigned, the other shall be converted to unsigned.

Unfortunately clang's -Wsign-compare is very noisy.
It complains about (s32)a == (u32)b which is safe and doen't have surprising behavior.

This patch fixes some of the issues. It needs a follow up to fix the rest.

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Link: https://lore.kernel.org/bpf/20231226191148.48536-2-alexei.starovoitov@gmail.com
2024-01-03 10:41:22 -08:00
Andrei Matei
72187506de bpf: Add a possibly-zero-sized read test
This patch adds a test for the condition that the previous patch mucked
with - illegal zero-sized helper memory access. As opposed to existing
tests, this new one uses a size whose lower bound is zero, as opposed to
a known-zero one.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231221232225.568730-3-andreimatei1@gmail.com
2024-01-03 10:37:56 -08:00
Andrei Matei
8a021e7fa1 bpf: Simplify checking size of helper accesses
This patch simplifies the verification of size arguments associated to
pointer arguments to helpers and kfuncs. Many helpers take a pointer
argument followed by the size of the memory access performed to be
performed through that pointer. Before this patch, the handling of the
size argument in check_mem_size_reg() was confusing and wasteful: if the
size register's lower bound was 0, then the verification was done twice:
once considering the size of the access to be the lower-bound of the
respective argument, and once considering the upper bound (even if the
two are the same). The upper bound checking is a super-set of the
lower-bound checking(*), except: the only point of the lower-bound check
is to handle the case where zero-sized-accesses are explicitly not
allowed and the lower-bound is zero. This static condition is now
checked explicitly, replacing a much more complex, expensive and
confusing verification call to check_helper_mem_access().

Error messages change in this patch. Before, messages about illegal
zero-size accesses depended on the type of the pointer and on other
conditions, and sometimes the message was plain wrong: in some tests
that changed you'll see that the old message was something like "R1 min
value is outside of the allowed memory range", where R1 is the pointer
register; the error was wrongly claiming that the pointer was bad
instead of the size being bad. Other times the information that the size
came for a register with a possible range of values was wrong, and the
error presented the size as a fixed zero. Now the errors refer to the
right register. However, the old error messages did contain useful
information about the pointer register which is now lost; recovering
this information was deemed not important enough.

(*) Besides standing to reason that the checks for a bigger size access
are a super-set of the checks for a smaller size access, I have also
mechanically verified this by reading the code for all types of
pointers. I could convince myself that it's true for all but
PTR_TO_BTF_ID (check_ptr_to_btf_access). There, simply looking
line-by-line does not immediately prove what we want. If anyone has any
qualms, let me know.

Signed-off-by: Andrei Matei <andreimatei1@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20231221232225.568730-2-andreimatei1@gmail.com
2024-01-03 10:37:56 -08:00
Jamal Hadi Salim
33241dca48 net/sched: Remove uapi support for CBQ qdisc
Commit 051d442098 ("net/sched: Retire CBQ qdisc") retired the CBQ qdisc.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 14:25:51 +00:00
Jamal Hadi Salim
26cc8714fc net/sched: Remove uapi support for ATM qdisc
Commit fb38306ceb ("net/sched: Retire ATM qdisc") retired the ATM qdisc.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 14:25:51 +00:00
Jamal Hadi Salim
fe3b739a54 net/sched: Remove uapi support for dsmark qdisc
Commit bbe77c14ee ("net/sched: Retire dsmark qdisc") retired the dsmark
classifier. Remove UAPI support for it.
Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 14:25:51 +00:00
Jamal Hadi Salim
82b2545ed9 net/sched: Remove uapi support for tcindex classifier
commit 8c710f7525 ("net/sched: Retire tcindex classifier") retired the TC
tcindex classifier.
Remove UAPI for it.  Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 14:25:51 +00:00
Jamal Hadi Salim
41bc3e8fc1 net/sched: Remove uapi support for rsvp classifier
commit 265b4da82d ("net/sched: Retire rsvp classifier") retired the TC RSVP
classifier.
Remove UAPI for it. Iproute2 will sync by equally removing it from user space.

Reviewed-by: Victor Nogueira <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 14:25:51 +00:00
Geliang Tang
81ab772819 selftests: mptcp: diag: check CURRESTAB counters
This patch adds a new helper chk_msk_cestab() to check the current
established connections counter MIB_CURRESTAB in diag.sh. Invoke it
to check the counter during the connection after every chk_msk_inuse().

Signed-off-by: Geliang Tang <geliang.tang@linux.dev>
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 13:33:33 +00:00
Geliang Tang
0bd962dd86 selftests: mptcp: join: check CURRESTAB counters
This patch adds a new helper chk_cestab_nr() to check the current
established connections counter MIB_CURRESTAB. Set the newly added
variables cestab_ns1 and cestab_ns2 to indicate how many connections
are expected in ns1 or ns2.

Invoke check_cestab() to check the counter during the connection in
do_transfer() and invoke chk_cestab_nr() to re-check it when the
connection closed. These checks are embedded in add_tests().

Signed-off-by: Geliang Tang <geliang.tang@linux.dev>
Acked-by: Paolo Abeni <pabeni@redhat.com>
Reviewed-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 13:33:33 +00:00
Dmitry Safonov
80057b2080 selftest/tcp-ao: Work on namespace-ified sysctl_optmem_max
Since commit f5769faeec ("net: Namespace-ify sysctl_optmem_max")
optmem_max is per-netns, so need of switching to root namespace.
It seems trivial to keep the old logic working, so going to keep it for
a while (at least, until kernel with netns-optmem_max will be release).

Currently, there is a test that checks that optmem_max limit applies to
TCP-AO keys and a little benchmark that measures linked-list TCP-AO keys
scaling, those are fixed by this.

Cc: Eric Dumazet <edumazet@google.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 13:27:48 +00:00
Dmitry Safonov
72cd9f8d5a selftest/tcp-ao: Set routes in a proper VRF table id
In unsigned-md5 selftests ip_route_add() is not needed in
client_add_ip(): the route was pre-setup in __test_init() => link_init()
for subnet, rather than a specific ip-address.

Currently, __ip_route_add() mistakenly always sets VRF table
to RT_TABLE_MAIN - this seems to have sneaked in during unsigned-md5
tests debugging. That also explains, why ip_route_add_vrf() ignored
EEXIST, returned by fib6.

Yet, keep EEXIST ignoring in bench-lookups selftests as it's expected
that those selftests may add the same (duplicate) routes.

Reported-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 13:27:48 +00:00
Jamal Hadi Salim
ba24ea1291 net/sched: Retire ipt action
The tc ipt action was intended to run all netfilter/iptables target.
Unfortunately it has not benefitted over the years from proper updates when
netfilter changes, and for that reason it has remained rudimentary.
Pinging a bunch of people that i was aware were using this indicates that
removing it wont affect them.
Retire it to reduce maintenance efforts. Buh-bye.

Reviewed-by: Victor Noguiera <victor@mojatatu.com>
Reviewed-by: Pedro Tammela <pctammela@mojatatu.com>
Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-02 12:41:16 +00:00
David S. Miller
109bf4cfe1 netfilter pull request 23-12-22
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmWFd9oACgkQ1V2XiooU
 IOTBog//auEj3LR5v2b5et6CLsuMESsCFkVko1AFdzzpu94sJ8TTkqlrW7SPSfPW
 CMNYz1EYFaRGK1GjfH0a49EUofeU/kPD/TmffmiYwuJlGyEIXp17ZIUQnGDrM6Mz
 UBHX8VrW73McJYuZJbxM4HX6Q3aFyshKy8iOPpffdFAQcCD5XtkYbW24Mzw6V4sI
 8fBz6C/iT+5Jq1wGtvM2xmGkNYuMLJxbf9qOpinRZCXQV/z5IWda/bFnv3kpJ9mR
 x2ZYXHnzFEr2gVFIhZ7G6FvMgQAkkKGVQE+n9zVZ9V78czqBk9depdeLaq/RYGyx
 8VYqJyaXHKQm7FqsBwSSxU964aQhi/zNrjF9SeJSfcRNhTmXgzdERndVWVjUPe9O
 rFLyQVoQ6JyKy5++1lL3+63cd6ZeGw8gUw8MCw9DX8rwamnNFunUsixKv4SLzQHA
 jfFgmAUq+HVWHXI4xmj+SDO8g+s7O3BpbZM09E3si6lwy4/LiQqHhV4sNgEVN/It
 hf76t/U6GRYZ0Hwy2dwQl8SZg1BxMSGFmzB7vyEZyGXaqxRZb1Jym4h9slOMKXWl
 gS5y4y0ZVTmzRCqJR+jUrMAKPCw/R964LrslDYwQpsQeP7nuUPeU5jH1VWD6kAbS
 Vc36uDRnfojoNiRcmAfuhvZIseM1+hI6UXCIsqYR6cekCFlBad8=
 =+Uvv
 -----END PGP SIGNATURE-----

Merge tag 'nf-next-23-12-22' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf-next

Pablo Neira Ayuso says:

====================
netfilter pull request 23-12-22

The following patchset contains Netfilter updates for net-next:

1) Add locking for NFT_MSG_GETSETELEM_RESET requests, to address a
   race scenario with two concurrent processes running a dump-and-reset
   which exposes negative counters to userspace, from Phil Sutter.

2) Use GFP_KERNEL in pipapo GC, from Florian Westphal.

3) Reorder nf_flowtable struct members, place the read-mostly parts
   accessed by the datapath first. From Florian Westphal.

4) Set on dead flag for NFT_MSG_NEWSET in abort path,
   from Florian Westphal.

5) Support filtering zone in ctnetlink, from Felix Huettner.

6) Bail out if user tries to redefine an existing chain with different
   type in nf_tables.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01 16:15:40 +00:00
David S. Miller
240436c06c bpf-next-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZYVEqQAKCRDbK58LschI
 gzH6AP9hVXLpHFTWMT0+2GK2lx69VX8zW1C0SmN7WHaxUbPN9QEAwzGnELfKk00P
 0IKRHSl5abhVMX7JOM3sSOhCILeKjQg=
 =wRLJ
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next

Daniel Borkmann says:

====================
bpf-next-for-netdev
The following pull-request contains BPF updates for your *net-next* tree.

We've added 22 non-merge commits during the last 3 day(s) which contain
a total of 23 files changed, 652 insertions(+), 431 deletions(-).

The main changes are:

1) Add verifier support for annotating user's global BPF subprogram arguments
   with few commonly requested annotations for a better developer experience,
   from Andrii Nakryiko.

   These tags are:
     - Ability to annotate a special PTR_TO_CTX argument
     - Ability to annotate a generic PTR_TO_MEM as non-NULL

2) Support BPF verifier tracking of BPF_JNE which helps cases when the compiler
   transforms (unsigned) "a > 0" into "if a == 0 goto xxx" and the like, from
   Menglong Dong.

3) Fix a warning in bpf_mem_cache's check_obj_size() as reported by LKP, from Hou Tao.

4) Re-support uid/gid options when mounting bpffs which had to be reverted with
   the prior token series revert to avoid conflicts, from Daniel Borkmann.

5) Fix a libbpf NULL pointer dereference in bpf_object__collect_prog_relos() found
   from fuzzing the library with malformed ELF files, from Mingyi Zhang.

6) Skip DWARF sections in libbpf's linker sanity check given compiler options to
   generate compressed debug sections can trigger a rejection due to misalignment,
   from Alyssa Ross.

7) Fix an unnecessary use of the comma operator in BPF verifier, from Simon Horman.

8) Fix format specifier for unsigned long values in cpustat sample, from Colin Ian King.
====================

Signed-off-by: David S. Miller <davem@davemloft.net>
2024-01-01 14:45:21 +00:00
Maxim Galaganov
122db5e363 selftests/net: add MPTCP coverage for IP_LOCAL_PORT_RANGE
Since previous commit, MPTCP has support for IP_BIND_ADDRESS_NO_PORT and
IP_LOCAL_PORT_RANGE sockopts.

Add ip4_mptcp and ip6_mptcp fixture variants to ip_local_port_range
selftest to provide selftest coverage for these sockopts.

Acked-by: Mat Martineau <martineau@kernel.org>
Signed-off-by: Maxim Galaganov <max@internet.ru>
Signed-off-by: Matthieu Baerts <matttbe@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-26 22:33:21 +00:00
Vladimir Oltean
c8659bd9d1 selftests: forwarding: ethtool_mm: fall back to aggregate if device does not report pMAC stats
Some devices do not support individual 'pmac' and 'emac' stats.
For such devices, resort to 'aggregate' stats.

Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23 01:01:19 +00:00
Vladimir Oltean
2491d66ae6 selftests: forwarding: ethtool_mm: support devices with higher rx-min-frag-size
Some devices have errata due to which they cannot report ETH_ZLEN (60)
in the rx-min-frag-size. This was foreseen of course, and lldpad has
logic that when we request it to advertise addFragSize 0, it will round
it up to the lowest value that is _actually_ supported by the hardware.

The problem is that the selftest expects lldpad to report back to us the
same value as we requested.

Make the selftest smarter by figuring out on its own what is a
reasonable value to expect.

Cc: Shuah Khan <shuah@kernel.org>
Signed-off-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Tested-by: Roger Quadros <rogerq@kernel.org>
Signed-off-by: Roger Quadros <rogerq@kernel.org>
Tested-by: Vladimir Oltean <vladimir.oltean@nxp.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23 01:01:19 +00:00
Hangbin Liu
9d0b4ad82d kselftest/runner.sh: add netns support
Add a variable RUN_IN_NETNS if the user wants to run all the selected tests
in namespace in parallel. With this, we can save a lot of testing time.

Note that some tests may not fit to run in namespace, e.g.
net/drop_monitor_tests.sh, as the dwdump needs to be run in init ns.

I also added another parameter -p to make all the logs reported separately
instead of mixing them in the stdout or output.log.

Nit: the NUM in run_one is not used, rename it to test_num.

Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23 00:26:32 +00:00
Hangbin Liu
378f082eaf selftests/net: convert pmtu.sh to run it in unique namespace
pmtu test use /bin/sh, so we need to source ./lib.sh instead of lib.sh
Here is the test result after conversion.

 # ./pmtu.sh
 TEST: ipv4: PMTU exceptions                                         [ OK ]
 TEST: ipv4: PMTU exceptions - nexthop objects                       [ OK ]
 TEST: ipv6: PMTU exceptions                                         [ OK ]
 TEST: ipv6: PMTU exceptions - nexthop objects                       [ OK ]
 ...
 TEST: ipv4: list and flush cached exceptions - nexthop objects      [ OK ]
 TEST: ipv6: list and flush cached exceptions                        [ OK ]
 TEST: ipv6: list and flush cached exceptions - nexthop objects      [ OK ]
 TEST: ipv4: PMTU exception w/route replace                          [ OK ]
 TEST: ipv4: PMTU exception w/route replace - nexthop objects        [ OK ]
 TEST: ipv6: PMTU exception w/route replace                          [ OK ]
 TEST: ipv6: PMTU exception w/route replace - nexthop objects        [ OK ]

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23 00:26:32 +00:00
Hangbin Liu
4416c5f53b selftests/net: use unique netns name for setup_loopback.sh setup_veth.sh
The setup_loopback and setup_veth use their own way to create namespace.
So let's just re-define server_ns/client_ns to unique name.
At the same time update the namespace name in gro.sh and toeplitz.sh.
As I don't have env to run toeplitz.sh. Here is only the gro test result.

 # ./gro.sh
 running test ipv4 data
 Expected {200 }, Total 1 packets
 Received {200 }, Total 1 packets.
 ...
 Gro::large test passed.
 All Tests Succeeded!

Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23 00:26:32 +00:00
Hangbin Liu
976fd1fe4f selftests/net: convert xfrm_policy.sh to run it in unique namespace
Here is the test result after conversion.

 # ./xfrm_policy.sh
 PASS: policy before exception matches
 PASS: ping to .254 bypassed ipsec tunnel (exceptions)
 PASS: direct policy matches (exceptions)
 PASS: policy matches (exceptions)
 PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies)
 PASS: direct policy matches (exceptions and block policies)
 PASS: policy matches (exceptions and block policies)
 PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies after hresh changes)
 PASS: direct policy matches (exceptions and block policies after hresh changes)
 PASS: policy matches (exceptions and block policies after hresh changes)
 PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies after hthresh change in ns3)
 PASS: direct policy matches (exceptions and block policies after hthresh change in ns3)
 PASS: policy matches (exceptions and block policies after hthresh change in ns3)
 PASS: ping to .254 bypassed ipsec tunnel (exceptions and block policies after htresh change to normal)
 PASS: direct policy matches (exceptions and block policies after htresh change to normal)
 PASS: policy matches (exceptions and block policies after htresh change to normal)
 PASS: policies with repeated htresh change

Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23 00:26:32 +00:00
Hangbin Liu
098f1ce08b selftests/net: convert stress_reuseport_listen.sh to run it in unique namespace
Here is the test result after conversion.

 # ./stress_reuseport_listen.sh
 listen 24000 socks took 0.47714

Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23 00:26:32 +00:00
Hangbin Liu
d3b6b11161 selftests/net: convert rtnetlink.sh to run it in unique namespace
When running the test in namespace, the debugfs may not load automatically.
So add a checking to make sure debugfs loaded. Here is the test result
after conversion.

 # ./rtnetlink.sh
 PASS: policy routing
 PASS: route get
 ...
 PASS: address proto IPv4
 PASS: address proto IPv6

Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23 00:26:32 +00:00
Hangbin Liu
f6476dedf0 selftests/net: convert netns-name.sh to run it in unique namespace
This test will move the device to netns 1. Add a new test_ns to do this.
Here is the test result after conversion.

 # ./netns-name.sh
 netns-name.sh                           [  OK  ]

Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23 00:26:32 +00:00
Hangbin Liu
b84c2faeb9 selftests/net: convert gre_gso.sh to run it in unique namespace
Here is the test result after conversion.

 # ./gre_gso.sh
     TEST: GREv6/v4 - copy file w/ TSO                                   [ OK ]
     TEST: GREv6/v4 - copy file w/ GSO                                   [ OK ]
     TEST: GREv6/v6 - copy file w/ TSO                                   [ OK ]
     TEST: GREv6/v6 - copy file w/ GSO                                   [ OK ]

 Tests passed:   4
 Tests failed:   0

Acked-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23 00:26:32 +00:00
Jiapeng Chong
6530b29f77 selftests/net: remove unneeded semicolon
No functional modification involved.

./tools/testing/selftests/net/tcp_ao/setsockopt-closed.c:121:2-3: Unneeded semicolon.

Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Closes: https://bugzilla.openanolis.cn/show_bug.cgi?id=7771
Signed-off-by: Jiapeng Chong <jiapeng.chong@linux.alibaba.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-23 00:23:30 +00:00
Dmitry Safonov
826eb9bcc1 selftest/tcp-ao: Rectify out-of-tree build
Trivial fix for out-of-tree build that I wasn't testing previously:

1. Create a directory for library object files, fixes:
> gcc lib/kconfig.c -Wall -O2 -g -D_GNU_SOURCE -fno-strict-aliasing -I ../../../../../usr/include/ -iquote /tmp/kselftest/kselftest/net/tcp_ao/lib -I ../../../../include/  -o /tmp/kselftest/kselftest/net/tcp_ao/lib/kconfig.o -c
> Assembler messages:
> Fatal error: can't create /tmp/kselftest/kselftest/net/tcp_ao/lib/kconfig.o: No such file or directory
> make[1]: *** [Makefile:46: /tmp/kselftest/kselftest/net/tcp_ao/lib/kconfig.o] Error 1

2. Include $(KHDR_INCLUDES) that's exported by selftests/Makefile, fixes:
> In file included from lib/kconfig.c:6:
> lib/aolib.h:320:45: warning: ‘struct tcp_ao_add’ declared inside parameter list will not be visible outside of this definition or declaration
>   320 | extern int test_prepare_key_sockaddr(struct tcp_ao_add *ao, const char *alg,
>       |                                             ^~~~~~~~~~
...

3. While at here, clean-up $(KSFT_KHDR_INSTALL): it's not needed anymore
   since commit f2745dc0ba ("selftests: stop using KSFT_KHDR_INSTALL")

4. Also, while at here, drop .DEFAULT_GOAL definition: that has a
   self-explaining comment, that was valid when I made these selftests
   compile on local v4.19 kernel, but not needed since
   commit 8ce72dc325 ("selftests: fix headers_install circular dependency")

Fixes: cfbab37b3d ("selftests/net: Add TCP-AO library")
Reported-by: kernel test robot <lkp@intel.com>
Closes: https://lore.kernel.org/oe-kbuild-all/202312190645.q76MmHyq-lkp@intel.com/
Signed-off-by: Dmitry Safonov <dima@arista.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22 23:26:37 +00:00
Colin Ian King
67f440c05d selftests/net: Fix various spelling mistakes in TCP-AO tests
There are a handful of spelling mistakes in test messages in the
TCP-AIO selftests. Fix these.

Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Reviewed-by: Dmitry Safonov <dima@arista.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-22 22:13:00 +00:00
Felix Huettner
eff3c558bb netfilter: ctnetlink: support filtering by zone
conntrack zones are heavily used by tools like openvswitch to run
multiple virtual "routers" on a single machine. In this context each
conntrack zone matches to a single router, thereby preventing
overlapping IPs from becoming issues.
In these systems it is common to operate on all conntrack entries of a
given zone, e.g. to delete them when a router is deleted. Previously this
required these tools to dump the full conntrack table and filter out the
relevant entries in userspace potentially causing performance issues.

To do this we reuse the existing CTA_ZONE attribute. This was previous
parsed but not used during dump and flush requests. Now if CTA_ZONE is
set we filter these operations based on the provided zone.
However this means that users that previously passed CTA_ZONE will
experience a difference in functionality.

Alternatively CTA_FILTER could have been used for the same
functionality. However it is not yet supported during flush requests and
is only available when using AF_INET or AF_INET6.

Co-developed-by: Luca Czesla <luca.czesla@mail.schwarz>
Signed-off-by: Luca Czesla <luca.czesla@mail.schwarz>
Co-developed-by: Max Lamprecht <max.lamprecht@mail.schwarz>
Signed-off-by: Max Lamprecht <max.lamprecht@mail.schwarz>
Signed-off-by: Felix Huettner <felix.huettner@mail.schwarz>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2023-12-22 12:15:20 +01:00
Paolo Abeni
56794e5358 Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

Adjacent changes:

drivers/net/ethernet/broadcom/bnxt/bnxt_xdp.c
  23c93c3b62 ("bnxt_en: do not map packet buffers twice")
  6d1add9553 ("bnxt_en: Modify TX ring indexing logic.")

tools/testing/selftests/net/Makefile
  2258b66648 ("selftests: add vlan hw filter tests")
  a0bc96c0cd ("selftests: net: verify fq per-band packet limit")

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21 22:17:23 +01:00
Linus Torvalds
7c5e046bdc Including fixes from WiFi and bpf.
Current release - regressions:
 
   - bpf: syzkaller found null ptr deref in unix_bpf proto add
 
   - eth: i40e: fix ST code value for clause 45
 
 Previous releases - regressions:
 
   - core: return error from sk_stream_wait_connect() if sk_wait_event() fails
 
   - ipv6: revert remove expired routes with a separated list of routes
 
   - wifi rfkill:
     - set GPIO direction
     - fix crash with WED rx support enabled
 
   - bluetooth:
     - fix deadlock in vhci_send_frame
     - fix use-after-free in bt_sock_recvmsg
 
   - eth: mlx5e: fix a race in command alloc flow
 
   - eth: ice: fix PF with enabled XDP going no-carrier after reset
 
   - eth: bnxt_en: do not map packet buffers twice
 
 Previous releases - always broken:
 
   - core:
     - check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev()
     - check dev->gso_max_size in gso_features_check()
 
   - mptcp: fix inconsistent state on fastopen race
 
   - phy: skip LED triggers on PHYs on SFP modules
 
   - eth: mlx5e:
     - fix double free of encap_header
     - fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list()
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmWELNYSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkL0MQALYng9iIz5m/iX0pjRpI4HxruflMkdWa
 +FMRnWp0OE5ak8l/UcffgYRNozXAcA/PSJhanZU3gT21IeJ+X78paivWyEPUhpqN
 d1nhRRsnx37fTK3lbS/wSGJUN+x4g49kHQn5mw8vfi/RGHuc/vbLO23iWsawB92/
 7YI0rEzZh2b1FKytvqF9t2lLtJw5ucwQtdm3d/tg4iuL44Lq8dA69dln4wZx3t28
 lobsW0eQW2JRh2YwrREb1oUD0CcUNk+XGsWVyUXqs31OflkqYMEzI41Yxs/lHJs0
 0Lmt3/F2Ls+H+vEYElJ0wsNPFZr4TDhAsV5KMxZdoBfWhTN8ordloBXGHre1IVSK
 SQtja5IqT01dLbDoL7tLpyEsGLp1A+HPH+BVxt582srSMoXWmFYOZcRczKJ85C1W
 qaohCGeEO537ExrAMHbJ0CxR3oSawyOBszjTYGdbI3xiFj5q1n48YyJSep//rgvP
 PewzqtMpPPapPIiJbvRjN8Mn56Y2832TSbPOVZ2KJuBpx+i/mjXyIK97FMb+Jdvu
 3ACH3BmsUfvXXpXNSZIgtc35tP03GSeV9B2vzlhjFwxB2vV4wuX9NbI5OIWi7ZM1
 03jkC2wQf6jVby45IM5kMuEKL3hMXUx9t0nZg0szJ3T31+OQ6e5Hlv1Aqp4Ihn5Q
 N+fxo6lpm+Aq
 =sEmi
 -----END PGP SIGNATURE-----

Merge tag 'net-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from WiFi and bpf.

  Current release - regressions:

   - bpf: syzkaller found null ptr deref in unix_bpf proto add

   - eth: i40e: fix ST code value for clause 45

  Previous releases - regressions:

   - core: return error from sk_stream_wait_connect() if sk_wait_event()
     fails

   - ipv6: revert remove expired routes with a separated list of routes

   - wifi rfkill:
       - set GPIO direction
       - fix crash with WED rx support enabled

   - bluetooth:
       - fix deadlock in vhci_send_frame
       - fix use-after-free in bt_sock_recvmsg

   - eth: mlx5e: fix a race in command alloc flow

   - eth: ice: fix PF with enabled XDP going no-carrier after reset

   - eth: bnxt_en: do not map packet buffers twice

  Previous releases - always broken:

   - core:
       - check vlan filter feature in vlan_vids_add_by_dev() and
         vlan_vids_del_by_dev()
       - check dev->gso_max_size in gso_features_check()

   - mptcp: fix inconsistent state on fastopen race

   - phy: skip LED triggers on PHYs on SFP modules

   - eth: mlx5e:
       - fix double free of encap_header
       - fix slab-out-of-bounds in mlx5_query_nic_vport_mac_list()"

* tag 'net-6.7-rc7' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (69 commits)
  net: check dev->gso_max_size in gso_features_check()
  kselftest: rtnetlink.sh: use grep_fail when expecting the cmd fail
  net/ipv6: Revert remove expired routes with a separated list of routes
  net: avoid build bug in skb extension length calculation
  net: ethernet: mtk_wed: fix possible NULL pointer dereference in mtk_wed_wo_queue_tx_clean()
  net: stmmac: fix incorrect flag check in timestamp interrupt
  selftests: add vlan hw filter tests
  net: check vlan filter feature in vlan_vids_add_by_dev() and vlan_vids_del_by_dev()
  net: hns3: add new maintainer for the HNS3 ethernet driver
  net: mana: select PAGE_POOL
  net: ks8851: Fix TX stall caused by TX buffer overrun
  ice: Fix PF with enabled XDP going no-carrier after reset
  ice: alter feature support check for SRIOV and LAG
  ice: stop trashing VF VSI aggregator node ID information
  mailmap: add entries for Geliang Tang
  mptcp: fill in missing MODULE_DESCRIPTION()
  mptcp: fix inconsistent state on fastopen race
  selftests: mptcp: join: fix subflow_send_ack lookup
  net: phy: skip LED triggers on PHYs on SFP modules
  bpf: Add missing BPF_LINK_TYPE invocations
  ...
2023-12-21 09:15:37 -08:00
Paolo Abeni
74769d810e bpf-for-netdev
-----BEGIN PGP SIGNATURE-----
 
 iHUEABYIAB0WIQTFp0I1jqZrAX+hPRXbK58LschIgwUCZYQVjgAKCRDbK58LschI
 g/NfAP9xMBCASd22+KPu44FtPPO5DKcdG7hATXZMpb/cygF8GQEAojcZ4jztx42S
 F1+4RPEoxrn31oVYdtEGFY9q85ruzgA=
 =2XhN
 -----END PGP SIGNATURE-----

Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Daniel Borkmann says:

====================
pull-request: bpf 2023-12-21

Hi David, hi Jakub, hi Paolo, hi Eric,

The following pull-request contains BPF updates for your *net* tree.

We've added 3 non-merge commits during the last 5 day(s) which contain
a total of 4 files changed, 45 insertions(+).

The main changes are:

1) Fix a syzkaller splat which triggered an oob issue in bpf_link_show_fdinfo(),
   from Jiri Olsa.

2) Fix another syzkaller-found issue which triggered a NULL pointer dereference
   in BPF sockmap for unconnected unix sockets, from John Fastabend.

bpf-for-netdev

* tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: Add missing BPF_LINK_TYPE invocations
  bpf: sockmap, test for unconnected af_unix sock
  bpf: syzkaller found null ptr deref in unix_bpf proto add
====================

Link: https://lore.kernel.org/r/20231221104844.1374-1-daniel@iogearbox.net
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21 12:27:29 +01:00
Mingyi Zhang
fc3a5534e2 libbpf: Fix NULL pointer dereference in bpf_object__collect_prog_relos
An issue occurred while reading an ELF file in libbpf.c during fuzzing:

	Program received signal SIGSEGV, Segmentation fault.
	0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206
	4206 in libbpf.c
	(gdb) bt
	#0 0x0000000000958e97 in bpf_object.collect_prog_relos () at libbpf.c:4206
	#1 0x000000000094f9d6 in bpf_object.collect_relos () at libbpf.c:6706
	#2 0x000000000092bef3 in bpf_object_open () at libbpf.c:7437
	#3 0x000000000092c046 in bpf_object.open_mem () at libbpf.c:7497
	#4 0x0000000000924afa in LLVMFuzzerTestOneInput () at fuzz/bpf-object-fuzzer.c:16
	#5 0x000000000060be11 in testblitz_engine::fuzzer::Fuzzer::run_one ()
	#6 0x000000000087ad92 in tracing::span::Span::in_scope ()
	#7 0x00000000006078aa in testblitz_engine::fuzzer::util::walkdir ()
	#8 0x00000000005f3217 in testblitz_engine::entrypoint::main::{{closure}} ()
	#9 0x00000000005f2601 in main ()
	(gdb)

scn_data was null at this code(tools/lib/bpf/src/libbpf.c):

	if (rel->r_offset % BPF_INSN_SZ || rel->r_offset >= scn_data->d_size) {

The scn_data is derived from the code above:

	scn = elf_sec_by_idx(obj, sec_idx);
	scn_data = elf_sec_data(obj, scn);

	relo_sec_name = elf_sec_str(obj, shdr->sh_name);
	sec_name = elf_sec_name(obj, scn);
	if (!relo_sec_name || !sec_name)// don't check whether scn_data is NULL
		return -EINVAL;

In certain special scenarios, such as reading a malformed ELF file,
it is possible that scn_data may be a null pointer

Signed-off-by: Mingyi Zhang <zhangmingyi5@huawei.com>
Signed-off-by: Xin Liu <liuxin350@huawei.com>
Signed-off-by: Changye Wu <wuchangye@huawei.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20231221033947.154564-1-liuxin350@huawei.com
2023-12-21 10:05:42 +01:00
Alyssa Ross
812d8bf876 libbpf: Skip DWARF sections in linker sanity check
clang can generate (with -g -Wa,--compress-debug-sections) 4-byte
aligned DWARF sections that declare themselves to be 8-byte aligned in
the section header.  Since DWARF sections are dropped during linking
anyway, just skip running the sanity checks on them.

Reported-by: Sergei Trofimovich <slyich@gmail.com>
Suggested-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Signed-off-by: Alyssa Ross <hi@alyssa.is>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Closes: https://lore.kernel.org/bpf/ZXcFRJVKbKxtEL5t@nz.home/
Link: https://lore.kernel.org/bpf/20231219110324.8989-1-hi@alyssa.is
2023-12-21 10:05:15 +01:00
Hangbin Liu
b8056f2ce0 kselftest: rtnetlink.sh: use grep_fail when expecting the cmd fail
run_cmd_grep_fail should be used when expecting the cmd fail, or the ret
will be set to 1, and the total test return 1 when exiting. This would cause
the result report to fail if run via run_kselftest.sh.

Before fix:
 # ./rtnetlink.sh -t kci_test_addrlft
 PASS: preferred_lft addresses have expired
 # echo $?
 1

After fix:
 # ./rtnetlink.sh -t kci_test_addrlft
 PASS: preferred_lft addresses have expired
 # echo $?
 0

Fixes: 9c2a19f715 ("kselftest: rtnetlink.sh: add verbose flag")
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20231219065737.1725120-1-liuhangbin@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2023-12-21 09:32:13 +01:00
Hou Tao
69ff403d87 selftests/bpf: Remove tests for zeroed-array kptr
bpf_mem_alloc() doesn't support zero-sized allocation, so removing these
tests from test_bpf_ma test. After the removal, there will no definition
for bin_data_8, so remove 8 from data_sizes array and adjust the index
of data_btf_ids array in all test cases accordingly.

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231216131052.27621-3-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-20 13:25:46 -08:00
Ido Schimmel
c3e87a7fcd selftests: vxlan_mdb: Add MDB bulk deletion test
Add test cases to verify the behavior of the MDB bulk deletion
functionality in the VXLAN driver.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Petr Machata <petrm@nvidia.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20 11:27:21 +00:00
Ido Schimmel
bd2dcb94c8 selftests: bridge_mdb: Add MDB bulk deletion test
Add test cases to verify the behavior of the MDB bulk deletion
functionality in the bridge driver.

Signed-off-by: Ido Schimmel <idosch@nvidia.com>
Acked-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Hangbin Liu <liuhangbin@gmail.com>
Acked-by: Nikolay Aleksandrov <razor@blackwall.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
2023-12-20 11:27:21 +00:00
Hou Tao
441c725ed5 selftests/bpf: Close cgrp fd before calling cleanup_cgroup_environment()
There is error log when htab-mem benchmark completes. The error log
looks as follows:

$ ./bench htab-mem -d1
Setting up benchmark 'htab-mem'...
Benchmark 'htab-mem' started.
......
(cgroup_helpers.c:353: errno: Device or resource busy) umount cgroup2

Fix it by closing cgrp fd before invoking cleanup_cgroup_environment().

Signed-off-by: Hou Tao <houtao1@huawei.com>
Link: https://lore.kernel.org/r/20231219135727.2661527-1-houtao@huaweicloud.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19 18:22:11 -08:00
Andrii Nakryiko
f0a5056222 selftests/bpf: add freplace of BTF-unreliable main prog test
Add a test validating that freplace'ing another main (entry) BPF program
fails if the target BPF program doesn't have valid/expected func proto BTF.

We extend fexit_bpf2bpf test to allow to specify expected log message
for negative test cases (where freplace program is expected to fail to
load).

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231215011334.2307144-11-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19 18:06:47 -08:00
Andrii Nakryiko
0a0ffcac92 selftests/bpf: add global subprog annotation tests
Add test cases to validate semantics of global subprog argument
annotations:
  - non-null pointers;
  - context argument;
  - const dynptr passing;
  - packet pointers (data, metadata, end).

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231215011334.2307144-10-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19 18:06:47 -08:00
Andrii Nakryiko
aae9c25dda libbpf: add __arg_xxx macros for annotating global func args
Add a set of __arg_xxx macros which can be used to augment BPF global
subprogs/functions with extra information for use by BPF verifier.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231215011334.2307144-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19 18:06:47 -08:00
Andrii Nakryiko
f18c3d88de bpf: reuse subprog argument parsing logic for subprog call checks
Remove duplicated BTF parsing logic when it comes to subprog call check.
Instead, use (potentially cached) results of btf_prepare_func_args() to
abstract away expectations of each subprog argument in generic terms
(e.g., "this is pointer to context", or "this is a pointer to memory of
size X"), and then use those simple high-level argument type
expectations to validate actual register states to check if they match
expectations.

Acked-by: Eduard Zingerman <eddyz87@gmail.com>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20231215011334.2307144-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-12-19 18:06:46 -08:00