Commit Graph

1264777 Commits

Author SHA1 Message Date
Martin KaFai Lau
42e4ebd390 bpf: Remove CONFIG_X86 and CONFIG_DYNAMIC_FTRACE guard from the tcp-cc kfuncs
The commit 7aae231ac9 ("bpf: tcp: Limit calling some tcp cc functions to CONFIG_DYNAMIC_FTRACE")
added CONFIG_DYNAMIC_FTRACE guard because pahole was only generating
btf for ftrace-able functions. The ftrace filter had already been
removed from pahole, so the CONFIG_DYNAMIC_FTRACE guard can be
removed.

The commit 569c484f99 ("bpf: Limit static tcp-cc functions in the .BTF_ids list to x86")
has added CONFIG_X86 guard because it failed the powerpc arch which
prepended a "." to the local static function, so "cubictcp_init" becomes
".cubictcp_init". "__bpf_kfunc" has been added to kfunc
since then and it uses the __unused compiler attribute.
There is an existing
"__bpf_kfunc static u32 bpf_kfunc_call_test_static_unused_arg(u32 arg, u32 unused)"
test in bpf_testmod.c to cover the static kfunc case.

cross compile on ppc64 with CONFIG_DYNAMIC_FTRACE disabled:
> readelf -s vmlinux | grep cubictcp_
56938: c00000000144fd00   184 FUNC    LOCAL  DEFAULT    2 cubictcp_cwnd_event 	    [<localentry>: 8]
56939: c00000000144fdb8   200 FUNC    LOCAL  DEFAULT    2 cubictcp_recalc_[...]   [<localentry>: 8]
56940: c00000000144fe80   296 FUNC    LOCAL  DEFAULT    2 cubictcp_init 	    [<localentry>: 8]
56941: c00000000144ffa8   228 FUNC    LOCAL  DEFAULT    2 cubictcp_state 	    [<localentry>: 8]
56942: c00000000145008c  1908 FUNC    LOCAL  DEFAULT    2 cubictcp_cong_avoid  [<localentry>: 8]
56943: c000000001450800  1644 FUNC    LOCAL  DEFAULT    2 cubictcp_acked 	    [<localentry>: 8]

> bpftool btf dump file vmlinux | grep cubictcp_
[51540] FUNC 'cubictcp_acked' type_id=38137 linkage=static
[51541] FUNC 'cubictcp_cong_avoid' type_id=38122 linkage=static
[51543] FUNC 'cubictcp_cwnd_event' type_id=51542 linkage=static
[51544] FUNC 'cubictcp_init' type_id=9186 linkage=static
[51545] FUNC 'cubictcp_recalc_ssthresh' type_id=35021 linkage=static
[51547] FUNC 'cubictcp_state' type_id=38141 linkage=static

The patch removed both config guards.

Cc: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
Link: https://lore.kernel.org/r/20240322191433.4133280-1-martin.lau@linux.dev
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:31:40 -07:00
Yafang Shao
ee3bad033d bpf: Mitigate latency spikes associated with freeing non-preallocated htab
Following the recent upgrade of one of our BPF programs, we encountered
significant latency spikes affecting other applications running on the same
host. After thorough investigation, we identified that these spikes were
primarily caused by the prolonged duration required to free a
non-preallocated htab with approximately 2 million keys.

Notably, our kernel configuration lacks the presence of CONFIG_PREEMPT. In
scenarios where kernel execution extends excessively, other threads might
be starved of CPU time, resulting in latency issues across the system. To
mitigate this, we've adopted a proactive approach by incorporating
cond_resched() calls within the kernel code. This ensures that during
lengthy kernel operations, the scheduler is invoked periodically to provide
opportunities for other threads to execute.

Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Acked-by: Yonghong Song <yonghong.song@linux.dev>
Link: https://lore.kernel.org/r/20240327032022.78391-1-laoar.shao@gmail.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:31:40 -07:00
Alexei Starovoitov
a461a51e51 Merge branch 'bench-fast-in-kernel-triggering-benchmarks'
Andrii Nakryiko says:

====================
bench: fast in-kernel triggering benchmarks

Remove "legacy" triggering benchmarks which rely on syscalls (and thus syscall
overhead is a noticeable part of benchmark, unfortunately). Replace them with
faster versions that rely on triggering BPF programs in-kernel through another
simple "driver" BPF program. See patch #2 with comparison results.

raw_tp/tp/fmodret benchmarks required adding a simple kfunc in kernel to be
able to trigger a simple tracepoint from BPF program (plus it is also allowed
to be replaced by fmod_ret programs). This limits raw_tp/tp/fmodret benchmarks
to new kernels only, but it keeps bench tool itself very portable and most of
other benchmarks will still work on wide variety of kernels without the need
to worry about building and deploying custom kernel module. See patches #5
and #6 for details.

v1->v2:
  - move new TP closer to BPF test run code;
  - rename/move kfunc and register it for fmod_rets (Alexei);
  - limit --trig-batch-iters param to [1, 1000] (Alexei).
====================

Link: https://lore.kernel.org/r/20240326162151.3981687-1-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:31:40 -07:00
Andrii Nakryiko
985d0681b4 selftests/bpf: add batched tp/raw_tp/fmodret tests
Utilize bpf_modify_return_test_tp() kfunc to have a fast way to trigger
tp/raw_tp/fmodret programs from another BPF program, which gives us
comparable batched benchmarks to (batched) kprobe/fentry benchmarks.

We don't switch kprobe/fentry batched benchmarks to this kfunc to make
bench tool usable on older kernels as well.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240326162151.3981687-7-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:31:40 -07:00
Andrii Nakryiko
3124591f68 bpf: add bpf_modify_return_test_tp() kfunc triggering tracepoint
Add a simple bpf_modify_return_test_tp() kfunc, available to all program
types, that is useful for various testing and benchmarking scenarios, as
it allows to trigger most tracing BPF program types from BPF side,
allowing to do complex testing and benchmarking scenarios.

It is also attachable to for fmod_ret programs, making it a good and
simple way to trigger fmod_ret program under test/benchmark.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240326162151.3981687-6-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:31:40 -07:00
Andrii Nakryiko
b4ccf9158f selftests/bpf: lazy-load trigger bench BPF programs
Instead of front-loading all possible benchmarking BPF programs for
trigger benchmarks, explicitly specify which BPF programs are used by
specific benchmark and load only it.

This allows to be more flexible in supporting older kernels, where some
program types might not be possible to load (e.g., those that rely on
newly added kfunc).

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240326162151.3981687-5-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:31:40 -07:00
Andrii Nakryiko
208c439120 selftests/bpf: remove syscall-driven benchs, keep syscall-count only
Remove "legacy" benchmarks triggered by syscalls in favor of newly added
in-kernel/batched benchmarks. Drop -batched suffix now as well.
Next patch will restore "feature parity" by adding back
tp/raw_tp/fmodret benchmarks based on in-kernel kfunc approach.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240326162151.3981687-4-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:31:40 -07:00
Andrii Nakryiko
7df4e597ea selftests/bpf: add batched, mostly in-kernel BPF triggering benchmarks
Existing kprobe/fentry triggering benchmarks have 1-to-1 mapping between
one syscall execution and BPF program run. While we use a fast
get_pgid() syscall, syscall overhead can still be non-trivial.

This patch adds kprobe/fentry set of benchmarks significantly amortizing
the cost of syscall vs actual BPF triggering overhead. We do this by
employing BPF_PROG_TEST_RUN command to trigger "driver" raw_tp program
which does a tight parameterized loop calling cheap BPF helper
(bpf_get_numa_node_id()), to which kprobe/fentry programs are
attached for benchmarking.

This way 1 bpf() syscall causes N executions of BPF program being
benchmarked. N defaults to 100, but can be adjusted with
--trig-batch-iters CLI argument.

For comparison we also implement a new baseline program that instead of
triggering another BPF program just does N atomic per-CPU counter
increments, establishing the limit for all other types of program within
this batched benchmarking setup.

Taking the final set of benchmarks added in this patch set (including
tp/raw_tp/fmodret, added in later patch), and keeping for now "legacy"
syscall-driven benchmarks, we can capture all triggering benchmarks in
one place for comparison, before we remove the legacy ones (and rename
xxx-batched into just xxx).

$ benchs/run_bench_trigger.sh
usermode-count       :   79.500 ± 0.024M/s
kernel-count         :   49.949 ± 0.081M/s
syscall-count        :    9.009 ± 0.007M/s

fentry-batch         :   31.002 ± 0.015M/s
fexit-batch          :   20.372 ± 0.028M/s
fmodret-batch        :   21.651 ± 0.659M/s
rawtp-batch          :   36.775 ± 0.264M/s
tp-batch             :   19.411 ± 0.248M/s
kprobe-batch         :   12.949 ± 0.220M/s
kprobe-multi-batch   :   15.400 ± 0.007M/s
kretprobe-batch      :    5.559 ± 0.011M/s
kretprobe-multi-batch:    5.861 ± 0.003M/s

fentry-legacy        :    8.329 ± 0.004M/s
fexit-legacy         :    6.239 ± 0.003M/s
fmodret-legacy       :    6.595 ± 0.001M/s
rawtp-legacy         :    8.305 ± 0.004M/s
tp-legacy            :    6.382 ± 0.001M/s
kprobe-legacy        :    5.528 ± 0.003M/s
kprobe-multi-legacy  :    5.864 ± 0.022M/s
kretprobe-legacy     :    3.081 ± 0.001M/s
kretprobe-multi-legacy:   3.193 ± 0.001M/s

Note how xxx-batch variants are measured with significantly higher
throughput, even though it's exactly the same in-kernel overhead. As
such, results can be compared only between benchmarks of the same kind
(syscall vs batched):

fentry-legacy        :    8.329 ± 0.004M/s
fentry-batch         :   31.002 ± 0.015M/s

kprobe-multi-legacy  :    5.864 ± 0.022M/s
kprobe-multi-batch   :   15.400 ± 0.007M/s

Note also that syscall-count is setting a theoretical limit for
syscall-triggered benchmarks, while kernel-count is setting similar
limits for batch variants. usermode-count is a happy and unachievable
case of user space counting without doing any syscalls, and is mostly
the measure of CPU speed for such a trivial benchmark.

As was mentioned, tp/raw_tp/fmodret require kernel-side kfunc to produce
similar benchmark, which we address in a separate patch.

Note that run_bench_trigger.sh allows to override a list of benchmarks
to run, which is very useful for performance work.

Cc: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240326162151.3981687-3-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:31:40 -07:00
Andrii Nakryiko
1175f8dea3 selftests/bpf: rename and clean up userspace-triggered benchmarks
Rename uprobe-base to more precise usermode-count (it will match other
baseline-like benchmarks, kernel-count and syscall-count). Also use
BENCH_TRIG_USERMODE() macro to define all usermode-based triggering
benchmarks, which include usermode-count and uprobe/uretprobe benchmarks.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20240326162151.3981687-2-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:31:39 -07:00
Haiyue Wang
55fc888ded bpf,arena: Use helper sizeof_field in struct accessors
Use the well defined helper sizeof_field() to calculate the size of a
struct member, instead of doing custom calculations.

Signed-off-by: Haiyue Wang <haiyue.wang@intel.com>
Link: https://lore.kernel.org/r/20240327065334.8140-1-haiyue.wang@intel.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:31:36 -07:00
Mykyta Yatsenko
786bf0e7e2 bpf: improve error message for unsupported helper
BPF verifier emits "unknown func" message when given BPF program type
does not support BPF helper. This message may be confusing for users, as
important context that helper is unknown only to current program type is
not provided.

This patch changes message to "program of this type cannot use helper "
and aligns dependent code in libbpf and tests. Any suggestions on
improving/changing this message are welcome.

Signed-off-by: Mykyta Yatsenko <yatsenko@meta.com>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: Quentin Monnet <qmo@kernel.org>
Link: https://lore.kernel.org/r/20240325152210.377548-1-yatsenko@meta.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:30:53 -07:00
Anton Protopopov
59b418c706 bpf: Add a check for struct bpf_fib_lookup size
The struct bpf_fib_lookup should not grow outside of its 64 bytes.
Add a static assert to validate this.

Suggested-by: David Ahern <dsahern@kernel.org>
Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240326101742.17421-4-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:30:53 -07:00
Anton Protopopov
6efec2cb06 selftests/bpf: Add BPF_FIB_LOOKUP_MARK tests
This patch extends the fib_lookup test suite by adding a few test
cases for each IP family to test the new BPF_FIB_LOOKUP_MARK flag
to the bpf_fib_lookup:

  * Test destination IP address selection with and without a mark
    and/or the BPF_FIB_LOOKUP_MARK flag set

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240326101742.17421-3-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:30:53 -07:00
Anton Protopopov
5311591fbb bpf: Add support for passing mark with bpf_fib_lookup
Extend the bpf_fib_lookup() helper by making it to utilize mark if
the BPF_FIB_LOOKUP_MARK flag is set. In order to pass the mark the
four bytes of struct bpf_fib_lookup are used, shared with the
output-only smac/dmac fields.

Signed-off-by: Anton Protopopov <aspsk@isovalent.com>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: David Ahern <dsahern@kernel.org>
Acked-by: Daniel Borkmann <daniel@iogearbox.net>
Link: https://lore.kernel.org/bpf/20240326101742.17421-2-aspsk@isovalent.com
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2024-03-28 18:30:52 -07:00
Jakub Kicinski
c602f4ca13 Merge branch 'ravb-support-describing-the-mdio-bus'
Niklas Söderlund says:

====================
ravb: Support describing the MDIO bus

This series adds support to the binding and driver of the Renesas
Ethernet AVB to described the MDIO bus. Currently the driver uses
the OF node of the device itself when registering the MDIO bus.
This forces any MDIO bus properties the MDIO core should react on
to be set on the device OF node. This is confusing and none of
the MDIO bus properties are described in the Ethernet AVB bindings.

Patch 1/2 extends the bindings with an optional mdio child-node
to the device that can be used to contain the MDIO bus settings.
While patch 2/2 changes the driver to use this node (if present)
when registering the MDIO bus.

If the new optional mdio child-node is not present the driver
fallback to the old behavior and uses the device OF node like before.
This change is fully backward compatible with existing usage
of the bindings.
====================

Link: https://lore.kernel.org/r/20240325153451.2366083-1-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:17:54 -07:00
Niklas Söderlund
2c60c4c008 ravb: Add support for an optional MDIO mode
The driver used the DT node of the device itself when registering the
MDIO bus. While this works, it creates a problem: it forces any MDIO bus
properties to also be set on the devices DT node. This mixes the
properties of two distinctly different things and is confusing.

This change adds support for an optional mdio node to be defined as a
child to the device DT node. The child node can then be used to describe
MDIO bus properties that the MDIO core can act on when registering the
bus.

If no mdio child node is found the driver fallback to the old behavior
and register the MDIO bus using the device DT node. This change is
backward compatible with old bindings in use.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240325153451.2366083-3-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:17:52 -07:00
Niklas Söderlund
a87590c45c dt-bindings: net: renesas,etheravb: Add optional MDIO bus node
The Renesas Ethernet AVB bindings do not allow the MDIO bus to be
described. This has not been needed as only a single PHY is
supported and no MDIO bus properties have been needed.

Add an optional mdio node to the binding which allows the MDIO bus to be
described and allow bus properties to be set.

Signed-off-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se>
Reviewed-by: Sergey Shtylyov <s.shtylyov@omp.ru>
Reviewed-by: Rob Herring <robh@kernel.org>
Link: https://lore.kernel.org/r/20240325153451.2366083-2-niklas.soderlund+renesas@ragnatech.se
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:17:52 -07:00
Jakub Kicinski
fb984d17fd Merge branch 'doc-netlink-specs-add-vlan-support'
Hangbin Liu says:

====================
doc/netlink/specs: Add vlan support

Add vlan support in rt_link spec.
====================

Link: https://lore.kernel.org/r/20240327123130.1322921-1-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:07:10 -07:00
Hangbin Liu
782c1084b9 doc/netlink/specs: Add vlan attr in rt_link spec
With command:
 # ./tools/net/ynl/cli.py \
 --spec Documentation/netlink/specs/rt_link.yaml \
 --do getlink --json '{"ifname": "eno1.2"}' --output-json | \
 jq -C '.linkinfo'

Before:
Exception: No message format for 'vlan' in sub-message spec 'linkinfo-data-msg'

After:
 {
   "kind": "vlan",
   "data": {
     "protocol": "8021q",
     "id": 2,
     "flag": {
       "flags": [
         "reorder-hdr"
       ],
       "mask": "0xffffffff"
     },
     "egress-qos": {
       "mapping": [
         {
           "from": 1,
           "to": 2
         },
         {
           "from": 4,
           "to": 4
         }
       ]
     }
   }
 }

Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Link: https://lore.kernel.org/r/20240327123130.1322921-3-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:07:08 -07:00
Hangbin Liu
b334f5ed3d ynl: support hex display_hint for integer
Some times it would be convenient to read the integer as hex, like
mask values.

Suggested-by: Donald Hunter <donald.hunter@gmail.com>
Reviewed-by: Donald Hunter <donald.hunter@gmail.com>
Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
Link: https://lore.kernel.org/r/20240327123130.1322921-2-liuhangbin@gmail.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:07:08 -07:00
Jakub Kicinski
51cf49f626 Merge branch 'selftests-fixes-for-kernel-ci'
Petr Machata says:

====================
selftests: Fixes for kernel CI

As discussed on the bi-weekly call on Jan 30, and in mailing around
kernel CI effort, some changes are desirable in the suite of forwarding
selftests the better to work with the CI tooling. Namely:

- The forwarding selftests use a configuration file where names of
  interfaces are defined and various variables can be overridden. There
  is also forwarding.config.sample that users can use as a template to
  refer to when creating the config file. What happens a fair bit is
  that users either do not know about this at all, or simply forget, and
  are confused by cryptic failures about interfaces that cannot be
  created.

  In patches #1 - #3 have lib.sh just be the single source of truth with
  regards to which variables exist. That includes the topology variables
  which were previously only in the sample file, and any "tweak
  variables", such as what tools to use, sleep times, etc.

  forwarding.config.sample then becomes just a placeholder with a couple
  examples. Unless specific HW should be exercised, or specific tools
  used, the defaults are usually just fine.

- Several net/forwarding/ selftests (and one net/ one) cannot be run on
  veth pairs, they need an actual HW interface to run on. They are
  generic in the sense that any capable HW should pass them, which is
  why they have been put to net/forwarding/ as opposed to drivers/net/,
  but they do not generalize to veth. The fact that these tests are in
  net/forwarding/, but still complaining when run, is confusing.

  In patches #4 - #6 move these tests to a new directory
  drivers/net/hw.

- The following patches extend the codebase to handle well test results
  other than pass and fail.

  Patch #7 is preparatory. It converts several log_test_skip to XFAIL,
  so that tests do not spuriously end up returning non-0 when they
  are not supposed to.

  In patches #8 - #10, introduce some missing ksft constants, then support
  having those constants in RET, and then finally in EXIT_STATUS.

- The traffic scheduler tests generate a large amount of network traffic
  to test the behavior of the scheduler. This demands a relatively
  high-performance computer. On slow machines, such as with a debugging
  kernel, the test would spuriously fail.

  It can still be useful to "go through the motions" though, to possibly
  catch bugs in setup of the scheduler graph and passing packets around.
  Thus we still want to run the tests, just with lowered demands.

  To that end, in patches #11 - #12, introduce an environment variable
  KSFT_MACHINE_SLOW, with obvious meaning. Tests can then make checks
  more lenient, such as mark failures as XFAIL. A helper, xfail_on_slow,
  is provided to mark performance-sensitive parts of the selftest.

- In patch #13, use a similar mechanism to mark a NH group stats
  selftest to XFAIL HW stats tests when run on VETH pairs.

- All these changes complicate the hitherto straightforward logging and
  checking logic, so in patch #14, add a selftest that checks this
  functionality in lib.sh.

v1 (vs. an RFC circulated through linux-kselftest):
- Patch #9:
    - Clarify intended usage by s/set_ret/ret_set_ksft_status/,
      s/nret/ksft_status/
====================

Link: https://lore.kernel.org/r/cover.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:46 -07:00
Petr Machata
8ff2d7abfb selftests: forwarding: Add a test for testing lib.sh functionality
Rerunning various scenarios to make sure lib.sh changes do not impact the
observable behavior is no fun. Add a selftest at least for the bare basics
-- the mechanics of setting RET, retmsg, and EXIT_STATUS.

Since the selftest itself uses lib.sh, it would be possible to break lib.sh
in such a way that invalidates result of the selftest. Since the metatest
only uses the bare basics (just pass/fail), hopefully such fundamental
breakages would be noticed.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/6d25cedbf2d4b83614944809a34fe023fbe8db38.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:43 -07:00
Petr Machata
6db870bbf7 selftests: forwarding: router_mpath_nh_lib: Don't skip, xfail on veth
When the NH group stats tests are currently run on a veth topology, the
HW-stats leg of each test is SKIP'ped. But kernel networking CI interprets
skips as a sign that tooling is missing, and prompts maintainer
investigation. Lack of capability to pass a test should be expressed as
XFAIL.

Selftests that require HW should normally be put in drivers/net/hw, but
doing so for the NH counter selftests would just lead to a lot of
duplicity.

So instead, introduce a helper, xfail_on_veth(), which can be used to mark
selftests that should XFAIL instead of FAILing when run on a veth topology.
On non-veth topology, they don't do anything.

Use the helper in the HW-stats part of router_mpath_nh_lib selftest.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/15f0ab9637aa0497f164ec30e83c1c8f53d53719.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:43 -07:00
Petr Machata
e10391092a selftests: forwarding: Mark performance-sensitive tests
When run on a slow machine, the scheduler traffic tests can be expected to
fail, and should be reported as XFAIL in that case. Therefore run these
tests through the perf_sensitive wrapper.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/9a357f8cf34f5ececac08d43a3eb023008996035.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:43 -07:00
Petr Machata
e16a8d209c selftests: forwarding: Support for performance sensitive tests
Several tests in the suite use large amounts of traffic to e.g. cause
congestion and evaluate RED or shaper performance. These tests will not run
well on a slow machine, be it one with heavy debug kernel, or a VM, or e.g.
a single-board computer. Allow users to specify an environment variable,
KSFT_MACHINE_SLOW=yes, to indicate that the tests are being run on one such
machine.

Performance sensitive tests can then use a new helper, xfail_on_slow(), to
mark parts of the test that are sensitive to low-performance machines.
The helper can be used to just mark the whole suite, like so:

	xfail_on_slow tests_run

... or, on the other side of the granularity spectrum, to override
individual checks:

	xfail_on_slow check_err $? "Expected much, got little."

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/99a376a2d2ffdaeee7752b1910cb0c3ea5d80fbe.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:43 -07:00
Petr Machata
a923af1cee selftests: forwarding: Convert log_test() to recognize RET values
In a previous patch, the interpretation of RET value was changed to mean
the kselftest framework constant with the test outcome: $ksft_pass,
$ksft_xfail, etc.

Update log_test() to recognize the various possible RET values.

Then have EXIT_STATUS track the RET value of the current test. This differs
subtly from the way RET tracks the value: while for RET we want to
recognize XFAIL as a separate status, for purposes of exit code, we want to
to conflate XFAIL and PASS, because they both communicate non-failure. Thus
add a new helper, ksft_exit_status_merge().

With this log_test_skip() and log_test_xfail() can be reexpressed as thin
wrappers around log_test.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/e5f807cb5476ab795fd14ac74da53a731a9fc432.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:43 -07:00
Petr Machata
596c8819cb selftests: forwarding: Have RET track kselftest framework constants
The variable RET keeps track of whether the test under execution has so far
failed or not. Currently it works in binary fashion: zero means everything
is fine, non-zero means something failed. log_test() then uses the value to
given a human-readable message.

In order to allow log_test() to report skips and xfails, the semantics of
RET need to be more fine-grained. Therefore have RET value be one of
kselftest framework constants: $ksft_fail, $ksft_xfail, etc.

The current logic in check_err() is such that first non-zero value of RET
trumps all those that follow. But that is not right when RET has more
fine-grained value semantics. Different outcomes have different weights.

The results of PASS and XFAIL are mostly the same: they both communicate a
test that did not go wrong. SKIP communicates lack of tooling, which the
user should go and try to fix, and as such should not be overridden by the
passes. So far, the higher-numbered statuses can be considered weightier.
But FAIL should be the weightiest.

Add a helper, ksft_status_merge(), which merges two statuses in a way that
respects the above conditions. Express it in a generic manner, because exit
status merge is subtly different, and we want to reuse the same logic.

Use the new helper when setting RET in check_err().

Re-express check_fail() in terms of check_err() to avoid duplication.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/7dfff51cc925c7a3ac879b9050a0d6a327c8d21f.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:42 -07:00
Petr Machata
51ccf267be selftests: lib: Define more kselftest exit codes
The following patches will operate with more exit codes besides
ksft_skip. Add them here.

Additionally, move a duplicated skip exit code definition from
forwarding/tc_tunnel_key.sh. Keep a similar duplicate in
forwarding/devlink_lib.sh, because even though lib.sh will have
been sourced in all cases where devlink_lib is, the inclusion is not
visible in the file itself, and relying on it would be confusing.

Cc: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/545a03046c7aca0628a51a389a9b81949ab288ce.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:42 -07:00
Petr Machata
677f394956 selftests: forwarding: Change inappropriate log_test_skip() calls
The SKIP return should be used for cases where tooling of the machine under
test is lacking. For cases where HW is lacking, the appropriate outcome is
XFAIL.

This is the case with ethtool_rmon and mlxsw_lib. For these, introduce a
new helper, log_test_xfail().

Do the same for router_mpath_nh_lib. Note that it will be fixed using a
more reusable way in a following patch.

For the two resource_scale selftests, the log should simply not be written,
because there is no problem.

Cc: Tobias Waldekranz <tobias@waldekranz.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/3d668d8fb6fa0d9eeb47ce6d9e54114348c7c179.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:42 -07:00
Petr Machata
0c499a3517 selftests: forwarding: Ditch skip_on_veth()
Since the selftests that are not supposed to run on veth pairs are now in
their own dedicated directory, the skip_on_veth logic can go away. Drop it
from the selftests, and from lib.sh.

Cc: Danielle Ratson <danieller@nvidia.com>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/63b470e10d65270571ee7de709b31672ce314872.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:42 -07:00
Petr Machata
40d269c000 selftests: forwarding: Move several selftests
The tests in net/forwarding are generally expected to be HW-independent.
There are however several tests that, while not depending on any HW in
particular, nevertheless depend on being used on HW interfaces. Placing
these selftests to net/forwarding is confusing, because the selftest will
just report it can't be run on veth pairs. At the same time, placing them
to a particular driver's selftests subdirectory would be wrong.

Instead, add a new directory, drivers/net/hw, where these generic but HW
independent selftests should be placed. Move over several such tests
including one helper library.

Since typically these tests will not be expected to run, omit the directory
drivers/net/hw from the TARGETS list in selftests/Makefile. Retain a
Makefile in the new directory itself, so that a user can make -C into that
directory and act on those tests explicitly.

Cc: Roger Quadros <rogerq@kernel.org>
Cc: Tobias Waldekranz <tobias@waldekranz.com>
Cc: Danielle Ratson <danieller@nvidia.com>
Cc: Davide Caratti <dcaratti@redhat.com>
Cc: Johannes Nixdorf <jnixdorf-oss@avm.de>
Suggested-by: Jakub Kicinski <kuba@kernel.org>
Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/e11dae1f62703059e9fc2240004288ac7cc15756.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:42 -07:00
Petr Machata
0faa565bc4 selftests: forwarding: ipip_lib: Do not import lib.sh
This library is always sourced in the context where lib.sh has already been
sourced as well. Therefore drop the explicit sourcing and expect the client
to already have done it. This will simplify moving some of the clients to a
different directory.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Link: https://lore.kernel.org/r/a4da5e9cd42a34cbace917a048ca71081719d6ac.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:42 -07:00
Petr Machata
0cb862871f selftests: forwarding: README: Document customization
That any sort of customization is possible at all, let alone how it should
be done, is currently not at all clear. Document the whats and hows in
README.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com>
Link: https://lore.kernel.org/r/e819623af6aaeea49e9dc36cecd95694fad73bb8.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:42 -07:00
Petr Machata
fd36fd26ae selftests: forwarding.config.sample: Move overrides to lib.sh
forwarding.config.sample, net/lib.sh and net/forwarding/lib.sh contain
definitions and redefinitions of some of the same variables. The overlap
between net/forwarding/lib.sh and forwarding.config.sample is especially
large. This duplication is a potential source of confusion and problems.

It would be overall less error prone if each variable were defined in one
place only. In this patch set, that place is the library itself. Therefore
move all comments from forwarding.config.sample to net/forwarding/lib.sh.

Move over also a definition of TC_FLAG, which was missing from lib.sh
entirely.

Additionally, add to lib.sh a default definition of the topology variables.
The logic behind this is that forgetting to specify forwarding.config was a
frequent source of frustration for the selftest users. But really, most of
the time the default veth based topology is just fine. We considered just
sourcing forwarding.config.sample instead if forwarding.config is not
available, but this is a cleaner solution.

That means the syntax of the forwarding.config.sample override has to
change to an array assignment, so that the whole variable is overwritten,
not just individual keys, which could leave the value of some keys
unchanged. Do the same in lib.sh for any cut'n'pasters out there.

The config file is then given a sort of carte blanche to redefine whatever
variables it sees fit from the libraries. This is described in a comment in
the file. Only a handful of variables are left behind, to illustrate the
customization.

The fact that the variables are now missing from forwarding.config.sample,
and therefore would miss from forwarding.config derived from that file as
well, should not change anything. This is just the sample file. Users that
keep their own forwarding.config would retain it as before.

The only observable change is introduction of TC_FLAG to lib.sh, because
now the filters would not be attempted to install to HW datapath. For veth
pairs this does not change anything. For HW deployments, users presumably
have forwarding.config with this value overridden.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com>
Link: https://lore.kernel.org/r/b9b8a11a22821a7aa532211ff461a34f596e26bf.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:41 -07:00
Petr Machata
fa61e9aec9 selftests: net: libs: Change variable fallback syntax
The current syntax of X=${X:=X} first evaluates the ${X:=Y} expression,
which either uses the existing value of $X if there is one, or uses the
value of "Y" as a fallback, and assigns it to X. The expression is then
replaced with the now-current value of $X. Assigning that value to X once
more is meaningless.

So avoid the outer X=... bit, and instead express the same idea though the
do-nothing ":" built-in as : "${X:=Y}". This also cleans up the block
nicely and makes it more readable.

Signed-off-by: Petr Machata <petrm@nvidia.com>
Reviewed-by: Benjamin Poirier <bpoirier@nvidia.com>
Link: https://lore.kernel.org/r/1890ddc58420c2c0d5ba3154c87ecc6d9faf6947.1711464583.git.petrm@nvidia.com
Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 18:03:41 -07:00
Jakub Kicinski
5e47fbe5ce Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net
Cross-merge networking fixes after downstream PR.

No conflicts, or adjacent changes.

Signed-off-by: Jakub Kicinski <kuba@kernel.org>
2024-03-28 17:25:57 -07:00
Linus Torvalds
50108c352d Including fixes from bpf, WiFi and netfilter.
Current release - regressions:
 
  - ipv6: fix address dump when IPv6 is disabled on an interface
 
 Current release - new code bugs:
 
  - bpf: temporarily disable atomic operations in BPF arena
 
  - nexthop: fix uninitialized variable in nla_put_nh_group_stats()
 
 Previous releases - regressions:
 
  - bpf: protect against int overflow for stack access size
 
  - hsr: fix the promiscuous mode in offload mode
 
  - wifi: don't always use FW dump trig
 
  - tls: adjust recv return with async crypto and failed copy to userspace
 
  - tcp: properly terminate timers for kernel sockets
 
  - ice: fix memory corruption bug with suspend and rebuild
 
  - at803x: fix kernel panic with at8031_probe
 
  - qeth: handle deferred cc1
 
 Previous releases - always broken:
 
  - bpf: fix bug in BPF_LDX_MEMSX
 
  - netfilter: reject table flag and netdev basechain updates
 
  - inet_defrag: prevent sk release while still in use
 
  - wifi: pick the version of SESSION_PROTECTION_NOTIF
 
  - wwan: t7xx: split 64bit accesses to fix alignment issues
 
  - mlxbf_gige: call request_irq() after NAPI initialized
 
  - hns3: fix kernel crash when devlink reload during pf initialization
 
 Signed-off-by: Paolo Abeni <pabeni@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEEg1AjqC77wbdLX2LbKSR5jcyPE6QFAmYFezkSHHBhYmVuaUBy
 ZWRoYXQuY29tAAoJECkkeY3MjxOkdAUP/3SYNsFNIkh0/jwQqO9qBLJfI4suFjYG
 +s8jOGdCiA7n7aSgzv/RgGZ7XNqOegW3mpPRHecVVZcDu5I9y9N4AOhTDQG84TM/
 65YatgWpiZJT74oVEpoA8zcnmb4CCGYdWAxJCQZUKXoLjMAMPWelU4ee6VwonxGy
 GJ97+a4AxTXGvmQTi3rz0HLrSHQaizA+D7YP7YD8JczkG7I7kcAIR+SUWVKLSuw0
 VJnbko7RPIe3vdFFlMFypPgpZASjnO0O8g60s+eruazarEpMZE2+RqPfyz0nEg+u
 IK3W9zRw7r0PMkKqk9PoSaRjsIaNqIZBJR2Smh2cLMIpEB4CUvEFLi7WAshIdyUC
 +LBN9um3Ep3vLYh4nyuU3FzAyqdsqEo6+ayJCTRKq91xv9LrLmIN16IQpAqaRikb
 LJAuiaASwIpyu1FxBuTv41mLEUKtpm7ooziomHTJ7KbtzSf4QevRMBtorrB5t7VH
 l4yvp9ymcwHE79q8nrak1JH1JI/kCT5ZEPSqcOU5UNKSf6INjWqUTJedqZdVa5wB
 WiSZBixAmsc7DgZzARWKotRkgBEDyGeeHwrNLo/2kS8rS+hUCf6mSafpTZiPI/kL
 e+SVh+9RA8elFIF3sBV0VPcyt35G+if8o1NG1/2OTDPvZEkIz21eJhJgGyxRMHCD
 cpVSRBkU+np3
 =HbtI
 -----END PGP SIGNATURE-----

Merge tag 'net-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net

Pull networking fixes from Paolo Abeni:
 "Including fixes from bpf, WiFi and netfilter.

  Current release - regressions:

   - ipv6: fix address dump when IPv6 is disabled on an interface

  Current release - new code bugs:

   - bpf: temporarily disable atomic operations in BPF arena

   - nexthop: fix uninitialized variable in nla_put_nh_group_stats()

  Previous releases - regressions:

   - bpf: protect against int overflow for stack access size

   - hsr: fix the promiscuous mode in offload mode

   - wifi: don't always use FW dump trig

   - tls: adjust recv return with async crypto and failed copy to
     userspace

   - tcp: properly terminate timers for kernel sockets

   - ice: fix memory corruption bug with suspend and rebuild

   - at803x: fix kernel panic with at8031_probe

   - qeth: handle deferred cc1

  Previous releases - always broken:

   - bpf: fix bug in BPF_LDX_MEMSX

   - netfilter: reject table flag and netdev basechain updates

   - inet_defrag: prevent sk release while still in use

   - wifi: pick the version of SESSION_PROTECTION_NOTIF

   - wwan: t7xx: split 64bit accesses to fix alignment issues

   - mlxbf_gige: call request_irq() after NAPI initialized

   - hns3: fix kernel crash when devlink reload during pf
     initialization"

* tag 'net-6.9-rc2' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (81 commits)
  inet: inet_defrag: prevent sk release while still in use
  Octeontx2-af: fix pause frame configuration in GMP mode
  net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips
  net: bcmasp: Remove phy_{suspend/resume}
  net: bcmasp: Bring up unimac after PHY link up
  net: phy: qcom: at803x: fix kernel panic with at8031_probe
  netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c
  netfilter: nf_tables: skip netdev hook unregistration if table is dormant
  netfilter: nf_tables: reject table flag and netdev basechain updates
  netfilter: nf_tables: reject destroy command to remove basechain hooks
  bpf: update BPF LSM designated reviewer list
  bpf: Protect against int overflow for stack access size
  bpf: Check bloom filter map value size
  bpf: fix warning for crash_kexec
  selftests: netdevsim: set test timeout to 10 minutes
  net: wan: framer: Add missing static inline qualifiers
  mlxbf_gige: call request_irq() after NAPI initialized
  tls: get psock ref after taking rxlock to avoid leak
  selftests: tls: add test with a partially invalid iov
  tls: adjust recv return with async crypto and failed copy to userspace
  ...
2024-03-28 13:09:37 -07:00
Florian Westphal
18685451fc inet: inet_defrag: prevent sk release while still in use
ip_local_out() and other functions can pass skb->sk as function argument.

If the skb is a fragment and reassembly happens before such function call
returns, the sk must not be released.

This affects skb fragments reassembled via netfilter or similar
modules, e.g. openvswitch or ct_act.c, when run as part of tx pipeline.

Eric Dumazet made an initial analysis of this bug.  Quoting Eric:
  Calling ip_defrag() in output path is also implying skb_orphan(),
  which is buggy because output path relies on sk not disappearing.

  A relevant old patch about the issue was :
  8282f27449 ("inet: frag: Always orphan skbs inside ip_defrag()")

  [..]

  net/ipv4/ip_output.c depends on skb->sk being set, and probably to an
  inet socket, not an arbitrary one.

  If we orphan the packet in ipvlan, then downstream things like FQ
  packet scheduler will not work properly.

  We need to change ip_defrag() to only use skb_orphan() when really
  needed, ie whenever frag_list is going to be used.

Eric suggested to stash sk in fragment queue and made an initial patch.
However there is a problem with this:

If skb is refragmented again right after, ip_do_fragment() will copy
head->sk to the new fragments, and sets up destructor to sock_wfree.
IOW, we have no choice but to fix up sk_wmem accouting to reflect the
fully reassembled skb, else wmem will underflow.

This change moves the orphan down into the core, to last possible moment.
As ip_defrag_offset is aliased with sk_buff->sk member, we must move the
offset into the FRAG_CB, else skb->sk gets clobbered.

This allows to delay the orphaning long enough to learn if the skb has
to be queued or if the skb is completing the reasm queue.

In the former case, things work as before, skb is orphaned.  This is
safe because skb gets queued/stolen and won't continue past reasm engine.

In the latter case, we will steal the skb->sk reference, reattach it to
the head skb, and fix up wmem accouting when inet_frag inflates truesize.

Fixes: 7026b1ddb6 ("netfilter: Pass socket pointer down through okfn().")
Diagnosed-by: Eric Dumazet <edumazet@google.com>
Reported-by: xingwei lee <xrivendell7@gmail.com>
Reported-by: yue sun <samsun1006219@gmail.com>
Reported-by: syzbot+e5167d7144a62715044c@syzkaller.appspotmail.com
Signed-off-by: Florian Westphal <fw@strlen.de>
Reviewed-by: Eric Dumazet <edumazet@google.com>
Link: https://lore.kernel.org/r/20240326101845.30836-1-fw@strlen.de
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-28 12:06:22 +01:00
Hariprasad Kelam
40d4b4807c Octeontx2-af: fix pause frame configuration in GMP mode
The Octeontx2 MAC block (CGX) has separate data paths (SMU and GMP) for
different speeds, allowing for efficient data transfer.

The previous patch which added pause frame configuration has a bug due
to which pause frame feature is not working in GMP mode.

This patch fixes the issue by configurating appropriate registers.

Fixes: f7e086e754 ("octeontx2-af: Pause frame configuration at cgx")
Signed-off-by: Hariprasad Kelam <hkelam@marvell.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240326052720.4441-1-hkelam@marvell.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-28 11:56:47 +01:00
Raju Lakkaraju
e4a58989f5 net: lan743x: Add set RFE read fifo threshold for PCI1x1x chips
PCI11x1x Rev B0 devices might drop packets when receiving back to back frames
at 2.5G link speed. Change the B0 Rev device's Receive filtering Engine FIFO
threshold parameter from its hardware default of 4 to 3 dwords to prevent the
problem. Rev C0 and later hardware already defaults to 3 dwords.

Fixes: bb4f6bffe3 ("net: lan743x: Add PCI11010 / PCI11414 device IDs")
Signed-off-by: Raju Lakkaraju <Raju.Lakkaraju@microchip.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Link: https://lore.kernel.org/r/20240326065805.686128-1-Raju.Lakkaraju@microchip.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-28 11:36:10 +01:00
Paolo Abeni
eb67cdb33f Merge branch 'net-bcmasp-phy-managements-fixes'
Justin Chen says:

====================
net: bcmasp: phy managements fixes

Fix two issues.

- The unimac may be put in a bad state if PHY RX clk doesn't exist
  during reset. Work around this by bringing the unimac out of reset
  during phy up.

- Remove redundant phy_{suspend/resume}
====================

Link: https://lore.kernel.org/r/20240325193025.1540737-1-justin.chen@broadcom.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-28 10:46:39 +01:00
Justin Chen
4494c10e00 net: bcmasp: Remove phy_{suspend/resume}
phy_{suspend/resume} is redundant. It gets called from phy_{stop/start}.

Fixes: 490cb41200 ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-28 10:46:38 +01:00
Justin Chen
dfd222e2ae net: bcmasp: Bring up unimac after PHY link up
The unimac requires the PHY RX clk during reset or it may be put
into a bad state. Bring up the unimac after link up to ensure the
PHY RX clk exists.

Fixes: 490cb41200 ("net: bcmasp: Add support for ASP2.0 Ethernet controller")
Signed-off-by: Justin Chen <justin.chen@broadcom.com>
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-28 10:46:38 +01:00
Christian Marangi
6a4aee2777 net: phy: qcom: at803x: fix kernel panic with at8031_probe
On reworking and splitting the at803x driver, in splitting function of
at803x PHYs it was added a NULL dereference bug where priv is referenced
before it's actually allocated and then is tried to write to for the
is_1000basex and is_fiber variables in the case of at8031, writing on
the wrong address.

Fix this by correctly setting priv local variable only after
at803x_probe is called and actually allocates priv in the phydev struct.

Reported-by: William Wortel <wwortel@dorpstraat.com>
Cc: <stable@vger.kernel.org>
Fixes: 25d2ba9400 ("net: phy: at803x: move specific at8031 probe mode check to dedicated probe")
Signed-off-by: Christian Marangi <ansuelsmth@gmail.com>
Reviewed-by: Andrew Lunn <andrew@lunn.ch>
Link: https://lore.kernel.org/r/20240325190621.2665-1-ansuelsmth@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-28 10:42:22 +01:00
Paolo Abeni
005e528c24 netfilter pull request 24-03-28
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEN9lkrMBJgcdVAPub1V2XiooUIOQFAmYE3H8ACgkQ1V2XiooU
 IOR/2xAAiN+XhXMFiplQvR/6CEVirEriFIUUT8IR+fwIBUsFc6QAMpdNwCHo7j49
 2gFxHNER8oD2IynVCvqYM/O2Ukz8S3FclMicgOb03HZ3cenCUjW1l0el8dYhb8Do
 twVs2Trt217QyiwNkM0B68+5lo3wDo3uB70p2abdmcJ1tgCPncpDPR2Pl3DBPswP
 kMrO1aohYBTn4SFyaVbCLzkOlS1T6Bf4yqQMcL+zgIdd9+kLkYRqHlvMiwM/vgwp
 JJk7mnQx9h73y8sx9EAgaaf+63rNJ1JcDsKAhAAqJa9lJMVOPFTChaDOGp4aInvD
 qYBUIqCRC/FWN7BEnq4Hj6NvLmbUPm+9YkMnE7nCfZXJVCVUyfwBFtv1FVKvD5YU
 Ybi7Nf66bh9kYhy23TmhdQXEjzO9rIrJQEADz7qmGYydx27c5rwWUu8u7c0SRQ/V
 l/1rmT39Dr3vyZIBguIXaeU8hSwVvtlhavEVQ73wKGdMGgdg6QUB2zUrYvb/IU74
 v9PSmIoXKdcG4wwQj/ijRGgsZx556ifzehwbHhvFpCOn2THprMLnIaXC/mLKyiJb
 TwBhTxyYBqGecEa214VQaxUndzwT9txZ7ExO6ZzEZEip0RERX6LoSvlvy4xo715U
 ndq2s/yCSjRprmvEZSkgBwva0LEXjrFl9gormfkM+ejnDC4eAcQ=
 =sQm5
 -----END PGP SIGNATURE-----

Merge tag 'nf-24-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf

Pablo Neira Ayuso says:

====================
Netfilter fixes for net

The following patchset contains Netfilter fixes for net:

Patch #1 reject destroy chain command to delete device hooks in netdev
         family, hence, only delchain commands are allowed.

Patch #2 reject table flag update interference with netdev basechain
	 hook updates, this can leave hooks in inconsistent
	 registration/unregistration state.

Patch #3 do not unregister netdev basechain hooks if table is dormant.
	 Otherwise, splat with double unregistration is possible.

Patch #4 fixes Kconfig to allow to restore IP_NF_ARPTABLES,
	 from Kuniyuki Iwashima.

There are a more fixes still in progress on my side that need more work.

* tag 'nf-24-03-28' of git://git.kernel.org/pub/scm/linux/kernel/git/netfilter/nf:
  netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c
  netfilter: nf_tables: skip netdev hook unregistration if table is dormant
  netfilter: nf_tables: reject table flag and netdev basechain updates
  netfilter: nf_tables: reject destroy command to remove basechain hooks
====================

Link: https://lore.kernel.org/r/20240328031855.2063-1-pablo@netfilter.org
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-28 10:23:03 +01:00
Paolo Abeni
7e6f4b2af5 for-net
-----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmYExnMACgkQ6rmadz2v
 bTofuA/8CVtNs4vcBfHSDaz9SzcSOp5pGhmUFHpwXkE5NPyi6tTRFRxfCkEK9/UG
 1z9J54U6I7HB/6zbrhf1PP9c7ZbD9awPYTXude1cQaN9lgyxnfl5rfMDj4H5+5S7
 TlmXxtFXUDlhcl8Hayxxe8UEZd6VPbfTP0/b7BRsesrT+G3+FxVf1Mh43NjEllYQ
 Fn/s/4UpYxz0YJCuud97fL+Vd04Dpx33ZihhIXU0hQ85ieyRMozat9o8n2bTsUGv
 7K9Jsp9SzLpELeS/ScbzCqgU5mAJYfQWaXtt7tRNOpetvmL3/HQGAM3JRmPlOtna
 KDjZFO8ihIxSpqxXxwLjy3Z9SgzwqfVn6SP4cA+vhK2Nbk1vItAD/BvPkxsX1Zl+
 Q8zSHQGNtoz+dMPlQtU1nEjVdk8YxQ/R9OI807CuiifY6590V13SfiNnxgoC213A
 tduI8q/EBFvAnuA8IJlutfVasHRuqCPmn0PXYWnlaWJP9tExE3shjCJG2Qmy3+bC
 z8RHeswujidR22VL8vDLxRKtlDl3mOclBqSJa+Cz5gH3oEBlvMfD0UU8CFeiEM4p
 ngryIc2dtd4Jd7eDKw2caNq+rgaTXpUjFi34deR0T0jO+YEwHGw6Kr/JYvU4UovY
 /YgGIeQXNMoO5eI72nNyDIeZNwENZLnt2P618vjIPDL+Pqau7go=
 =Sz5u
 -----END PGP SIGNATURE-----

Merge tag 'for-net' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf

Alexei Starovoitov says:

====================
pull-request: bpf 2024-03-27

The following pull-request contains BPF updates for your *net* tree.

We've added 4 non-merge commits during the last 1 day(s) which contain
a total of 5 files changed, 26 insertions(+), 3 deletions(-).

The main changes are:

1) Fix bloom filter value size validation and protect the verifier
   against such mistakes, from Andrei.

2) Fix build due to CONFIG_KEXEC_CORE/CRASH_DUMP split, from Hari.

3) Update bpf_lsm maintainers entry, from Matt.

* tag 'for-net' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf:
  bpf: update BPF LSM designated reviewer list
  bpf: Protect against int overflow for stack access size
  bpf: Check bloom filter map value size
  bpf: fix warning for crash_kexec
====================

Link: https://lore.kernel.org/r/20240328012938.24249-1-alexei.starovoitov@gmail.com
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
2024-03-28 10:08:00 +01:00
Linus Torvalds
8d025e2092 Changes since last update:
- Add a new reviewer Sandeep Dhavale to build a healthier community;
 
  - Drop experimental warning for FSDAX.
 -----BEGIN PGP SIGNATURE-----
 
 iQJFBAABCgAvFiEEQ0A6bDUS9Y+83NPFUXZn5Zlu5qoFAmYE1ucRHHhpYW5nQGtl
 cm5lbC5vcmcACgkQUXZn5Zlu5qoSnw//Q35Sc7fIQb6YfuOOQ2VQX7TVE4jTrLej
 +3hIQUz3BA8wRwrBYdeJLcvjda4y+0gNxlpw2ycNS6S4vxAiVEAuwPJYO6oAC0Il
 ETa6opc1vpTMeXgwwtbXC7ACnWas9EQC61Z4E8W5zeVcNqQZnZyInMJ9Rkjqs/iJ
 VjJH2wWR5MIgWJdEPqchPx/28nqbBOcztc7ARJqpujyZEvu+OBVIPv8P/7N0a5yG
 bpHkDrzoelBMkpktpuvrkv84ymyCDC7LH9mq+Wk6dY5wRFxa2BtoZVh6YcpcjrpR
 75GZVg3BN3Ph41JCwYHqxyRGpoLO11dSYi6DNDVngxOGtkTNRVGrJ1FYEapWV+o8
 1MnEbl0vZSHUkjrIFbfZTFSqpvW2XSfEOa3heNDFknmzT/ISobSGENULp9cggcYI
 jhS6wtVG4bl5bCiCKCZluByr8/J8TCQc/5t5f5bQLy2MWqlyjaSx82uWuDpzO1Hh
 +q+p+MB+ketMUwxaIUAuTNgzhFPFT4Na/ni9WqP7Ri3GJY6pdjDUUtrtoIBK4oPQ
 ajUWhPlOk5zMwLq9Jl4MiG1ostBI9P37ZerjsdaLDZYElGhTjwjPu/xlh7p26Inq
 Ufq3QaQH2wai+oAVS6Sli3dJfb399XVJhmT2WFMH+0DmQW6JzvsGTdUX2fMgv5sb
 I7dVfuceTs0=
 =+ZW9
 -----END PGP SIGNATURE-----

Merge tag 'erofs-for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs

Pull erofs fixes from Gao Xiang:

 - Add a new reviewer Sandeep Dhavale to build a healthier community

 - Drop experimental warning for FSDAX

* tag 'erofs-for-6.9-rc2-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/xiang/erofs:
  MAINTAINERS: erofs: add myself as reviewer
  erofs: drop experimental warning for FSDAX
2024-03-27 20:24:09 -07:00
Kuniyuki Iwashima
15fba562f7 netfilter: arptables: Select NETFILTER_FAMILY_ARP when building arp_tables.c
syzkaller started to report a warning below [0] after consuming the
commit 4654467dc7 ("netfilter: arptables: allow xtables-nft only
builds").

The change accidentally removed the dependency on NETFILTER_FAMILY_ARP
from IP_NF_ARPTABLES.

If NF_TABLES_ARP is not enabled on Kconfig, NETFILTER_FAMILY_ARP will
be removed and some code necessary for arptables will not be compiled.

  $ grep -E "(NETFILTER_FAMILY_ARP|IP_NF_ARPTABLES|NF_TABLES_ARP)" .config
  CONFIG_NETFILTER_FAMILY_ARP=y
  # CONFIG_NF_TABLES_ARP is not set
  CONFIG_IP_NF_ARPTABLES=y

  $ make olddefconfig

  $ grep -E "(NETFILTER_FAMILY_ARP|IP_NF_ARPTABLES|NF_TABLES_ARP)" .config
  # CONFIG_NF_TABLES_ARP is not set
  CONFIG_IP_NF_ARPTABLES=y

So, when nf_register_net_hooks() is called for arptables, it will
trigger the splat below.

Now IP_NF_ARPTABLES is only enabled by IP_NF_ARPFILTER, so let's
restore the dependency on NETFILTER_FAMILY_ARP in IP_NF_ARPFILTER.

[0]:
WARNING: CPU: 0 PID: 242 at net/netfilter/core.c:316 nf_hook_entry_head+0x1e1/0x2c0 net/netfilter/core.c:316
Modules linked in:
CPU: 0 PID: 242 Comm: syz-executor.0 Not tainted 6.8.0-12821-g537c2e91d354 #10
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.0-0-gd239552ce722-prebuilt.qemu.org 04/01/2014
RIP: 0010:nf_hook_entry_head+0x1e1/0x2c0 net/netfilter/core.c:316
Code: 83 fd 04 0f 87 bc 00 00 00 e8 5b 84 83 fd 4d 8d ac ec a8 0b 00 00 e8 4e 84 83 fd 4c 89 e8 5b 5d 41 5c 41 5d c3 e8 3f 84 83 fd <0f> 0b e8 38 84 83 fd 45 31 ed 5b 5d 4c 89 e8 41 5c 41 5d c3 e8 26
RSP: 0018:ffffc90000b8f6e8 EFLAGS: 00010293
RAX: 0000000000000000 RBX: 0000000000000003 RCX: ffffffff83c42164
RDX: ffff888106851180 RSI: ffffffff83c42321 RDI: 0000000000000005
RBP: 0000000000000000 R08: 0000000000000005 R09: 000000000000000a
R10: 0000000000000003 R11: ffff8881055c2f00 R12: ffff888112b78000
R13: 0000000000000000 R14: ffff8881055c2f00 R15: ffff8881055c2f00
FS:  00007f377bd78800(0000) GS:ffff88811b000000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: 0000000000496068 CR3: 000000011298b003 CR4: 0000000000770ef0
PKRU: 55555554
Call Trace:
 <TASK>
 __nf_register_net_hook+0xcd/0x7a0 net/netfilter/core.c:428
 nf_register_net_hook+0x116/0x170 net/netfilter/core.c:578
 nf_register_net_hooks+0x5d/0xc0 net/netfilter/core.c:594
 arpt_register_table+0x250/0x420 net/ipv4/netfilter/arp_tables.c:1553
 arptable_filter_table_init+0x41/0x60 net/ipv4/netfilter/arptable_filter.c:39
 xt_find_table_lock+0x2e9/0x4b0 net/netfilter/x_tables.c:1260
 xt_request_find_table_lock+0x2b/0xe0 net/netfilter/x_tables.c:1285
 get_info+0x169/0x5c0 net/ipv4/netfilter/arp_tables.c:808
 do_arpt_get_ctl+0x3f9/0x830 net/ipv4/netfilter/arp_tables.c:1444
 nf_getsockopt+0x76/0xd0 net/netfilter/nf_sockopt.c:116
 ip_getsockopt+0x17d/0x1c0 net/ipv4/ip_sockglue.c:1777
 tcp_getsockopt+0x99/0x100 net/ipv4/tcp.c:4373
 do_sock_getsockopt+0x279/0x360 net/socket.c:2373
 __sys_getsockopt+0x115/0x1e0 net/socket.c:2402
 __do_sys_getsockopt net/socket.c:2412 [inline]
 __se_sys_getsockopt net/socket.c:2409 [inline]
 __x64_sys_getsockopt+0xbd/0x150 net/socket.c:2409
 do_syscall_x64 arch/x86/entry/common.c:52 [inline]
 do_syscall_64+0x4f/0x110 arch/x86/entry/common.c:83
 entry_SYSCALL_64_after_hwframe+0x46/0x4e
RIP: 0033:0x7f377beca6fe
Code: 1f 44 00 00 48 8b 15 01 97 0a 00 f7 d8 64 89 02 b8 ff ff ff ff eb b8 0f 1f 44 00 00 f3 0f 1e fa 49 89 ca b8 37 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 0a c3 66 0f 1f 84 00 00 00 00 00 48 8b 15 c9
RSP: 002b:00000000005df728 EFLAGS: 00000246 ORIG_RAX: 0000000000000037
RAX: ffffffffffffffda RBX: 00000000004966e0 RCX: 00007f377beca6fe
RDX: 0000000000000060 RSI: 0000000000000000 RDI: 0000000000000003
RBP: 000000000042938a R08: 00000000005df73c R09: 00000000005df800
R10: 00000000004966e8 R11: 0000000000000246 R12: 0000000000000003
R13: 0000000000496068 R14: 0000000000000003 R15: 00000000004bc9d8
 </TASK>

Fixes: 4654467dc7 ("netfilter: arptables: allow xtables-nft only builds")
Reported-by: syzkaller <syzkaller@googlegroups.com>
Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
Reviewed-by: Simon Horman <horms@kernel.org>
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-28 03:54:02 +01:00
Pablo Neira Ayuso
216e7bf740 netfilter: nf_tables: skip netdev hook unregistration if table is dormant
Skip hook unregistration when adding or deleting devices from an
existing netdev basechain. Otherwise, commit/abort path try to
unregister hooks which not enabled.

Fixes: b9703ed44f ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Fixes: 7d937b1071 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-28 03:54:01 +01:00
Pablo Neira Ayuso
1e1fb6f00f netfilter: nf_tables: reject table flag and netdev basechain updates
netdev basechain updates are stored in the transaction object hook list.
When setting on the table dormant flag, it iterates over the existing
hooks in the basechain. Thus, skipping the hooks that are being
added/deleted in this transaction, which leaves hook registration in
inconsistent state.

Reject table flag updates in combination with netdev basechain updates
in the same batch:

- Update table flags and add/delete basechain: Check from basechain update
  path if there are pending flag updates for this table.
- add/delete basechain and update table flags: Iterate over the transaction
  list to search for basechain updates from the table update path.

In both cases, the batch is rejected. Based on suggestion from Florian Westphal.

Fixes: b9703ed44f ("netfilter: nf_tables: support for adding new devices to an existing netdev chain")
Fixes: 7d937b1071 ("netfilter: nf_tables: support for deleting devices in an existing netdev chain")
Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
2024-03-28 03:54:01 +01:00