linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-09-22 12:44:11 +08:00

Author	SHA1	Message	Date
Anjali Kulkarni	73a29531f4	connector/cn_proc: Selftest for proc connector Run as ./proc_filter -f to run new filter code. Run without "-f" to run usual proc connector code without the new filtering code. Signed-off-by: Anjali Kulkarni <anjali.k.kulkarni@oracle.com> Reviewed-by: Liam R. Howlett <Liam.Howlett@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-23 11:34:22 +01:00
Petr Machata	d7eb1f1751	selftests: mlxsw: rtnetlink: Drop obsolete tests Support for enslaving ports to LAGs with uppers will be added in the following patches. Selftests to make sure it actually does the right thing are ready and will be sent as a follow-up. Similarly, ordering of MACVLAN creation and RIF creation will be relaxed and it will be permitted to create a MACVLAN first. Thus these two tests are obsolete. Drop them. Signed-off-by: Petr Machata <petrm@nvidia.com> Reviewed-by: Danielle Ratson <danieller@nvidia.com> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-21 08:54:04 +01:00
Benjamin Poirier	c7e95bbda8	selftests: net: Add test cases for nexthop groups with invalid neighbors Add test cases for hash threshold (multipath) nexthop groups with invalid neighbors. Check that a nexthop with invalid neighbor is not selected when there is another nexthop with a valid neighbor. Check that there is no crash when there is no nexthop with a valid neighbor. The first test fails before the previous commit in this series. Signed-off-by: Benjamin Poirier <bpoirier@nvidia.com> Reviewed-by: David Ahern <dsahern@kernel.org> Link: https://lore.kernel.org/r/20230719-nh_select-v2-4-04383e89f868@nvidia.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-20 20:23:20 -07:00
Jakub Kicinski	59be3baa8d	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Cross-merge networking fixes after downstream PR. No conflicts or adjacent changes. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-20 15:52:55 -07:00
Linus Torvalds	57f1f9dd3a	Including fixes from BPF, netfilter, bluetooth and CAN. Current release - regressions: - eth: r8169: multiple fixes for PCIe ASPM-related problems - vrf: fix RCU lockdep splat in output path Previous releases - regressions: - gso: fall back to SW segmenting with GSO_UDP_L4 dodgy bit set - dsa: mv88e6xxx: do a final check before timing out when polling - nf_tables: fix sleep in atomic in nft_chain_validate Previous releases - always broken: - sched: fix undoing tcf_bind_filter() in multiple classifiers - bpf, arm64: fix BTI type used for freplace attached functions - can: gs_usb: fix time stamp counter initialization - nft_set_pipapo: fix improper element removal (leading to UAF) Misc: - net: support STP on bridge in non-root netns, STP prevents packet loops so not supporting it results in freezing systems of unsuspecting users, and in turn very upset noises being made - fix kdoc warnings - annotate various bits of TCP state to prevent data races Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmS5pp0ACgkQMUZtbf5S IrtudA/9Ep+URprI3tpv+VHOQMWtMd7lzz+wwEUDQSo2T6xdMcYbd1E4ZWWOPw/y jTIIVF3qde4nuI/MZtzGhvCD8v4bzhw10uRm4f4vhC2i+CzXr/UdOQSMqeZmJZgN vndixvRjHJKYxogOa+DjXgOiuQTQfuSfSnaai0kvw3zZzi4tev/Bdj6KZmFW+UK+ Q7uQZ5n8tdE4UvUdj8Jek23SZ4kL+HtQOIdAAqyduQnYnax5L5sbep0TjuCjjkpK 26rvmwYFJmEab4mC2T3Y7VDaXYM9M2f/EuFBMBVEohE3KPTTdT12WzLfJv7TTKTl hymfXgfmCXiZElzoQTJ69bFGbhqFaCJwhCUHFwYqkqj0bW9cXYJD2achpi3nVgnn CV8vfqJtkzdgh2bV2faG+1wmAm1wzHSURmT5NlnFaX6a6BYypaN7CERn7BnIdLM/ YA2wud39bL0EJsic5e3gtlyJdfhtx7iqCMzE7S5FiUZvgOmUhBZ4IWkMs6Aq5PpL FLLgBSHGEIAdLVQGvXLjfQ/LeSrW8JsiSy6deztzR+ZflvvaBIP5y8sC3+KdxAvN 3ybMsMEE5OK3i808aV3l6/8DLeAJ+DWuMc96Ix7Yyt2LXFnnV79DX49zJAEUWrc7 54FnNzkgAO/Q9aEFmmQoFt5qZmoFHuNwcHBOmXARAatQqNCwDqk= =Xifr -----END PGP SIGNATURE----- Merge tag 'net-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net Pull networking fixes from Jakub Kicinski: "Including fixes from BPF, netfilter, bluetooth and CAN. Current release - regressions: - eth: r8169: multiple fixes for PCIe ASPM-related problems - vrf: fix RCU lockdep splat in output path Previous releases - regressions: - gso: fall back to SW segmenting with GSO_UDP_L4 dodgy bit set - dsa: mv88e6xxx: do a final check before timing out when polling - nf_tables: fix sleep in atomic in nft_chain_validate Previous releases - always broken: - sched: fix undoing tcf_bind_filter() in multiple classifiers - bpf, arm64: fix BTI type used for freplace attached functions - can: gs_usb: fix time stamp counter initialization - nft_set_pipapo: fix improper element removal (leading to UAF) Misc: - net: support STP on bridge in non-root netns, STP prevents packet loops so not supporting it results in freezing systems of unsuspecting users, and in turn very upset noises being made - fix kdoc warnings - annotate various bits of TCP state to prevent data races" * tag 'net-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net: (95 commits) net: phy: prevent stale pointer dereference in phy_init() tcp: annotate data-races around fastopenq.max_qlen tcp: annotate data-races around icsk->icsk_user_timeout tcp: annotate data-races around tp->notsent_lowat tcp: annotate data-races around rskq_defer_accept tcp: annotate data-races around tp->linger2 tcp: annotate data-races around icsk->icsk_syn_retries tcp: annotate data-races around tp->keepalive_probes tcp: annotate data-races around tp->keepalive_intvl tcp: annotate data-races around tp->keepalive_time tcp: annotate data-races around tp->tsoffset tcp: annotate data-races around tp->tcp_tx_delay Bluetooth: MGMT: Use correct address for memcpy() Bluetooth: btusb: Fix bluetooth on Intel Macbook 2014 Bluetooth: SCO: fix sco_conn related locking and validity issues Bluetooth: hci_conn: return ERR_PTR instead of NULL when there is no link Bluetooth: hci_sync: Avoid use-after-free in dbg for hci_remove_adv_monitor() Bluetooth: coredump: fix building with coredump disabled Bluetooth: ISO: fix iso_conn related locking and validity issues Bluetooth: hci_event: call disconnect callback before deleting conn ...	2023-07-20 14:46:39 -07:00
Jakub Kicinski	e93165d5e7	for-netdev -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmS4IUIACgkQ6rmadz2v bTrVCw/9GG5A5ebqwoh/DrsFXEzpKDmZFIAWd5wB+Fx2i8y+6Jl/Fw6SjkkAtUnc 215T3YX2u3Xg1WFC5zxY9lYm2OeMq2lPHVwjlqgt/pHE8D6b8cZ44eyN+f0ZSiLy wyx0wHLd3oP4KvMyiqm7/ZmhDjAtBpuqMjY5FNsbUxrIGUUI2ZLC4VFVWhnWmzRA eEOQuUge4e1YD62kfkWlT/GEv710ysqFZD2zs4yhevDfmr/6DAIaA7dhfKMYsM/S hCPoCuuXWVoHiqksm0U1BwpEiAQrqR91Sx8RCAakw5Pyp5hkj9dJc9sLwkgMH/k7 2352IIPXddH8cGKQM+hIBrc/io+6MxMbVk7Pe+1OUIBrvP//zQrHWk0zbssF3D8C z6TbxBLdSzbDELPph3gZu5bNaLSkpuODhNjLcIVGSOeSJ5nsgATCQtXFAAPV0E/Q v2O7Te5aTjTOpFMcIrIK1eWXUS56yRA+YwDa1VuWXAiLrr+Rq0tm4tBqxhof3KlH bfCoqFNa12MfpCJURHICcV7DJo53rWbCtDSJPaYwZXb/jJPd3gPb8EVixoLN2A1M dV/ou9rKEEkJXxsZ4Bctuh7t5YwpqxTq74YSdvnkOJ8P1lBDYST2SfHgQVOayQPv XH9MlMO3Qtb9Sl0ZiI7gHbpK7h6v9RvRuHJcnN2e3wwMEx256xE= =VRCb -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Alexei Starovoitov says: ==================== pull-request: bpf-next 2023-07-19 We've added 45 non-merge commits during the last 3 day(s) which contain a total of 71 files changed, 7808 insertions(+), 592 deletions(-). The main changes are: 1) multi-buffer support in AF_XDP, from Maciej Fijalkowski, Magnus Karlsson, Tirthendu Sarkar. 2) BPF link support for tc BPF programs, from Daniel Borkmann. 3) Enable bpf_map_sum_elem_count kfunc for all program types, from Anton Protopopov. 4) Add 'owner' field to bpf_rb_node to fix races in shared ownership, Dave Marchevsky. 5) Prevent potential skb_header_pointer() misuse, from Alexei Starovoitov. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (45 commits) bpf, net: Introduce skb_pointer_if_linear(). bpf: sync tools/ uapi header with selftests/bpf: Add mprog API tests for BPF tcx links selftests/bpf: Add mprog API tests for BPF tcx opts bpftool: Extend net dump with tcx progs libbpf: Add helper macro to clear opts structs libbpf: Add link-based API for tcx libbpf: Add opts-based attach/detach/query API for tcx bpf: Add fd-based tcx multi-prog infra with link support bpf: Add generic attach/detach/query API for multi-progs selftests/xsk: reset NIC settings to default after running test suite selftests/xsk: add test for too many frags selftests/xsk: add metadata copy test for multi-buff selftests/xsk: add invalid descriptor test for multi-buffer selftests/xsk: add unaligned mode test for multi-buffer selftests/xsk: add basic multi-buffer test selftests/xsk: transmit and receive multi-buffer packets xsk: add multi-buffer documentation i40e: xsk: add TX multi-buffer support ice: xsk: Tx multi-buffer support ... ==================== Link: https://lore.kernel.org/r/20230719175424.75717-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-19 15:02:18 -07:00
Jakub Kicinski	e80698b7f8	for-netdev -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE+soXsSLHKoYyzcli6rmadz2vbToFAmS4IHYACgkQ6rmadz2v bTo4bA/8C38pmtG+eyca/BLd0s/SbPq2b+BPYNHdbgBXp+iK2HMzBq3s19cbhDUk UAXjDEUKOBnPx+6J37Hqq+CSiYaecRn/96Q2WQq6LfJdPSiBSVnudhJwZYFOLVZC fApGAtzWQRYtlJuTLnDPL6rLEBVShz3aCroR/NXYHGqTw79yZFqvW+0VLKgoUSgK 4Px/TC6PvsIQPtIpN+x46ATa/p0DzTbiPH9qn1vz3fXfRSXrA+4dD5pDYkDdNE+L lhBTIsrBHjc6Luz1EY3ac0haZPUAMkKEyDzT8PbsO+DKhNk/fBEgPOo+6iFTaLfE N2Ns09iw5qNnnBgHkTphw1PhabPsDGxCf7oy4uSnTW+7O6KkmyshUIk1eF6NL5hl TTPP0pAS3UJfIRtWdghatF+3ZrkGGwCI+3FzB16Hc8chLW3oyr8x6W5K7bHAJbI1 yg/nLYCkrLipm9+dRMtYjYrx8aoStGgSW0WvGTS0McpndHAJhuhdRLHkf26MFa18 dPus4xJ40njBJn2/f6xiJ24lIemasu37/vrYtHJafywUjP9a4lYkM476/0Cnr1Ek +IMrydijUUH/WeSoeO0OhevdCfzhaPHS3g/7LyV3Lnn/2xl7Qg5GEkEjmHamFamY XLbBDToJc6mWJjOZx4QV0LO3atT3Kaj3SgS0GkY/43X8gyj8GiY= =Dscx -----END PGP SIGNATURE----- Merge tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf Alexei Starovoitov says: ==================== pull-request: bpf 2023-07-19 We've added 4 non-merge commits during the last 1 day(s) which contain a total of 3 files changed, 55 insertions(+), 10 deletions(-). The main changes are: 1) Fix stack depth check in presence of async callbacks, from Kumar Kartikeya Dwivedi. 2) Fix BTI type used for freplace attached functions, from Alexander Duyck. * tag 'for-netdev' of https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf: bpf, arm64: Fix BTI type used for freplace attached functions selftests/bpf: Add more tests for check_max_stack_depth bug bpf: Repeat check_max_stack_depth for async callbacks bpf: Fix subprog idx logic in check_max_stack_depth ==================== Link: https://lore.kernel.org/r/20230719174502.74023-1-alexei.starovoitov@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-19 15:01:10 -07:00
Alan Maguire	41ee0145a4	bpf: sync tools/ uapi header with Seeing the following: Warning: Kernel ABI header at 'tools/include/uapi/linux/bpf.h' differs from latest version at 'include/uapi/linux/bpf.h' ...so sync tools version missing some list_node/rb_tree fields. Fixes: `c3c510ce43` ("bpf: Add 'owner' field to bpf_{list,rb}_node") Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/r/20230719162257.20818-1-alan.maguire@oracle.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 10:13:09 -07:00
Daniel Borkmann	c6d479b334	selftests/bpf: Add mprog API tests for BPF tcx links Add a big batch of test coverage to assert all aspects of the tcx link API: # ./vmtest.sh -- ./test_progs -t tc_links [...] #225 tc_links_after:OK #226 tc_links_append:OK #227 tc_links_basic:OK #228 tc_links_before:OK #229 tc_links_chain_classic:OK #230 tc_links_dev_cleanup:OK #231 tc_links_invalid:OK #232 tc_links_prepend:OK #233 tc_links_replace:OK #234 tc_links_revision:OK Summary: 10/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20230719140858.13224-9-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 10:07:28 -07:00
Daniel Borkmann	cd13c91d92	selftests/bpf: Add mprog API tests for BPF tcx opts Add a big batch of test coverage to assert all aspects of the tcx opts attach, detach and query API: # ./vmtest.sh -- ./test_progs -t tc_opts [...] #238 tc_opts_after:OK #239 tc_opts_append:OK #240 tc_opts_basic:OK #241 tc_opts_before:OK #242 tc_opts_chain_classic:OK #243 tc_opts_demixed:OK #244 tc_opts_detach:OK #245 tc_opts_detach_after:OK #246 tc_opts_detach_before:OK #247 tc_opts_dev_cleanup:OK #248 tc_opts_invalid:OK #249 tc_opts_mixed:OK #250 tc_opts_prepend:OK #251 tc_opts_replace:OK #252 tc_opts_revision:OK Summary: 15/0 PASSED, 0 SKIPPED, 0 FAILED Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20230719140858.13224-8-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 10:07:28 -07:00
Daniel Borkmann	57c61da8bf	bpftool: Extend net dump with tcx progs Add support to dump fd-based attach types via bpftool. This includes both the tc BPF link and attach ops programs. Dumped information contain the attach location, function entry name, program ID and link ID when applicable. Example with tc BPF link: # ./bpftool net xdp: tc: bond0(4) tcx/ingress cil_from_netdev prog_id 784 link_id 10 bond0(4) tcx/egress cil_to_netdev prog_id 804 link_id 11 flow_dissector: netfilter: Example with tc BPF attach ops: # ./bpftool net xdp: tc: bond0(4) tcx/ingress cil_from_netdev prog_id 654 bond0(4) tcx/egress cil_to_netdev prog_id 672 flow_dissector: netfilter: Currently, permanent flags are not yet supported, so 'unknown' ones are dumped via NET_DUMP_UINT_ONLY() and once we do have permanent ones, we dump them as human readable string. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Quentin Monnet <quentin@isovalent.com> Link: https://lore.kernel.org/r/20230719140858.13224-7-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 10:07:28 -07:00
Daniel Borkmann	4e9c2d9af5	libbpf: Add helper macro to clear opts structs Add a small and generic LIBBPF_OPTS_RESET() helper macros which clears an opts structure and reinitializes its .sz member to place the structure size. Additionally, the user can pass option-specific data to reinitialize via varargs. I found this very useful when developing selftests, but it is also generic enough as a macro next to the existing LIBBPF_OPTS() which hides the .sz initialization, too. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20230719140858.13224-6-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 10:07:28 -07:00
Daniel Borkmann	55cc376847	libbpf: Add link-based API for tcx Implement tcx BPF link support for libbpf. The bpf_program__attach_fd() API has been refactored slightly in order to pass bpf_link_create_opts pointer as input. A new bpf_program__attach_tcx() has been added on top of this which allows for passing all relevant data via extensible struct bpf_tcx_opts. The program sections tcx/ingress and tcx/egress correspond to the hook locations for tc ingress and egress, respectively. For concrete usage examples, see the extensive selftests that have been developed as part of this series. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230719140858.13224-5-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 10:07:28 -07:00
Daniel Borkmann	fe20ce3a51	libbpf: Add opts-based attach/detach/query API for tcx Extend libbpf attach opts and add a new detach opts API so this can be used to add/remove fd-based tcx BPF programs. The old-style bpf_prog_detach() and bpf_prog_detach2() APIs are refactored to reuse the new bpf_prog_detach_opts() internally. The bpf_prog_query_opts() API got extended to be able to handle the new link_ids, link_attach_flags and revision fields. For concrete usage examples, see the extensive selftests that have been developed as part of this series. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/r/20230719140858.13224-4-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 10:07:28 -07:00
Daniel Borkmann	e420bed025	bpf: Add fd-based tcx multi-prog infra with link support This work refactors and adds a lightweight extension ("tcx") to the tc BPF ingress and egress data path side for allowing BPF program management based on fds via bpf() syscall through the newly added generic multi-prog API. The main goal behind this work which we also presented at LPC [0] last year and a recent update at LSF/MM/BPF this year [3] is to support long-awaited BPF link functionality for tc BPF programs, which allows for a model of safe ownership and program detachment. Given the rise in tc BPF users in cloud native environments, this becomes necessary to avoid hard to debug incidents either through stale leftover programs or 3rd party applications accidentally stepping on each others toes. As a recap, a BPF link represents the attachment of a BPF program to a BPF hook point. The BPF link holds a single reference to keep BPF program alive. Moreover, hook points do not reference a BPF link, only the application's fd or pinning does. A BPF link holds meta-data specific to attachment and implements operations for link creation, (atomic) BPF program update, detachment and introspection. The motivation for BPF links for tc BPF programs is multi-fold, for example: - From Meta: "It's especially important for applications that are deployed fleet-wide and that don't "control" hosts they are deployed to. If such application crashes and no one notices and does anything about that, BPF program will keep running draining resources or even just, say, dropping packets. We at FB had outages due to such permanent BPF attachment semantics. With fd-based BPF link we are getting a framework, which allows safe, auto-detachable behavior by default, unless application explicitly opts in by pinning the BPF link." [1] - From Cilium-side the tc BPF programs we attach to host-facing veth devices and phys devices build the core datapath for Kubernetes Pods, and they implement forwarding, load-balancing, policy, EDT-management, etc, within BPF. Currently there is no concept of 'safe' ownership, e.g. we've recently experienced hard-to-debug issues in a user's staging environment where another Kubernetes application using tc BPF attached to the same prio/handle of cls_bpf, accidentally wiping all Cilium-based BPF programs from underneath it. The goal is to establish a clear/safe ownership model via links which cannot accidentally be overridden. [0,2] BPF links for tc can co-exist with non-link attachments, and the semantics are in line also with XDP links: BPF links cannot replace other BPF links, BPF links cannot replace non-BPF links, non-BPF links cannot replace BPF links and lastly only non-BPF links can replace non-BPF links. In case of Cilium, this would solve mentioned issue of safe ownership model as 3rd party applications would not be able to accidentally wipe Cilium programs, even if they are not BPF link aware. Earlier attempts [4] have tried to integrate BPF links into core tc machinery to solve cls_bpf, which has been intrusive to the generic tc kernel API with extensions only specific to cls_bpf and suboptimal/complex since cls_bpf could be wiped from the qdisc also. Locking a tc BPF program in place this way, is getting into layering hacks given the two object models are vastly different. We instead implemented the tcx (tc 'express') layer which is an fd-based tc BPF attach API, so that the BPF link implementation blends in naturally similar to other link types which are fd-based and without the need for changing core tc internal APIs. BPF programs for tc can then be successively migrated from classic cls_bpf to the new tc BPF link without needing to change the program's source code, just the BPF loader mechanics for attaching is sufficient. For the current tc framework, there is no change in behavior with this change and neither does this change touch on tc core kernel APIs. The gist of this patch is that the ingress and egress hook have a lightweight, qdisc-less extension for BPF to attach its tc BPF programs, in other words, a minimal entry point for tc BPF. The name tcx has been suggested from discussion of earlier revisions of this work as a good fit, and to more easily differ between the classic cls_bpf attachment and the fd-based one. For the ingress and egress tcx points, the device holds a cache-friendly array with program pointers which is separated from control plane (slow-path) data. Earlier versions of this work used priority to determine ordering and expression of dependencies similar as with classic tc, but it was challenged that for something more future-proof a better user experience is required. Hence this resulted in the design and development of the generic attach/detach/query API for multi-progs. See prior patch with its discussion on the API design. tcx is the first user and later we plan to integrate also others, for example, one candidate is multi-prog support for XDP which would benefit and have the same 'look and feel' from API perspective. The goal with tcx is to have maximum compatibility to existing tc BPF programs, so they don't need to be rewritten specifically. Compatibility to call into classic tcf_classify() is also provided in order to allow successive migration or both to cleanly co-exist where needed given its all one logical tc layer and the tcx plus classic tc cls/act build one logical overall processing pipeline. tcx supports the simplified return codes TCX_NEXT which is non-terminating (go to next program) and terminating ones with TCX_PASS, TCX_DROP, TCX_REDIRECT. The fd-based API is behind a static key, so that when unused the code is also not entered. The struct tcx_entry's program array is currently static, but could be made dynamic if necessary at a point in future. The a/b pair swap design has been chosen so that for detachment there are no allocations which otherwise could fail. The work has been tested with tc-testing selftest suite which all passes, as well as the tc BPF tests from the BPF CI, and also with Cilium's L4LB. Thanks also to Nikolay Aleksandrov and Martin Lau for in-depth early reviews of this work. [0] https://lpc.events/event/16/contributions/1353/ [1] https://lore.kernel.org/bpf/CAEf4BzbokCJN33Nw_kg82sO=xppXnKWEncGTWCTB9vGCmLB6pw@mail.gmail.com [2] https://colocatedeventseu2023.sched.com/event/1Jo6O/tales-from-an-ebpf-programs-murder-mystery-hemanth-malla-guillaume-fournier-datadog [3] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf [4] https://lore.kernel.org/bpf/20210604063116.234316-1-memxor@gmail.com Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Jakub Kicinski <kuba@kernel.org> Link: https://lore.kernel.org/r/20230719140858.13224-3-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 10:07:27 -07:00
Daniel Borkmann	053c8e1f23	bpf: Add generic attach/detach/query API for multi-progs This adds a generic layer called bpf_mprog which can be reused by different attachment layers to enable multi-program attachment and dependency resolution. In-kernel users of the bpf_mprog don't need to care about the dependency resolution internals, they can just consume it with few API calls. The initial idea of having a generic API sparked out of discussion [0] from an earlier revision of this work where tc's priority was reused and exposed via BPF uapi as a way to coordinate dependencies among tc BPF programs, similar as-is for classic tc BPF. The feedback was that priority provides a bad user experience and is hard to use [1], e.g.: I cannot help but feel that priority logic copy-paste from old tc, netfilter and friends is done because "that's how things were done in the past". [...] Priority gets exposed everywhere in uapi all the way to bpftool when it's right there for users to understand. And that's the main problem with it. The user don't want to and don't need to be aware of it, but uapi forces them to pick the priority. [...] Your cover letter [0] example proves that in real life different service pick the same priority. They simply don't know any better. Priority is an unnecessary magic that apps _have_ to pick, so they just copy-paste and everyone ends up using the same. The course of the discussion showed more and more the need for a generic, reusable API where the "same look and feel" can be applied for various other program types beyond just tc BPF, for example XDP today does not have multi- program support in kernel, but also there was interest around this API for improving management of cgroup program types. Such common multi-program management concept is useful for BPF management daemons or user space BPF applications coordinating internally about their attachments. Both from Cilium and Meta side [2], we've collected the following requirements for a generic attach/detach/query API for multi-progs which has been implemented as part of this work: - Support prog-based attach/detach and link API - Dependency directives (can also be combined): - BPF_F_{BEFORE,AFTER} with relative_{fd,id} which can be {prog,link,none} - BPF_F_ID flag as {fd,id} toggle; the rationale for id is so that user space application does not need CAP_SYS_ADMIN to retrieve foreign fds via bpf_*_get_fd_by_id() - BPF_F_LINK flag as {prog,link} toggle - If relative_{fd,id} is none, then BPF_F_BEFORE will just prepend, and BPF_F_AFTER will just append for attaching - Enforced only at attach time - BPF_F_REPLACE with replace_bpf_fd which can be prog, links have their own infra for replacing their internal prog - If no flags are set, then it's default append behavior for attaching - Internal revision counter and optionally being able to pass expected_revision - User space application can query current state with revision, and pass it along for attachment to assert current state before doing updates - Query also gets extension for link_ids array and link_attach_flags: - prog_ids are always filled with program IDs - link_ids are filled with link IDs when link was used, otherwise 0 - {prog,link}_attach_flags for holding {prog,link}-specific flags - Must be easy to integrate/reuse for in-kernel users The uapi-side changes needed for supporting bpf_mprog are rather minimal, consisting of the additions of the attachment flags, revision counter, and expanding existing union with relative_{fd,id} member. The bpf_mprog framework consists of an bpf_mprog_entry object which holds an array of bpf_mprog_fp (fast-path structure). The bpf_mprog_cp (control-path structure) is part of bpf_mprog_bundle. Both have been separated, so that fast-path gets efficient packing of bpf_prog pointers for maximum cache efficiency. Also, array has been chosen instead of linked list or other structures to remove unnecessary indirections for a fast point-to-entry in tc for BPF. The bpf_mprog_entry comes as a pair via bpf_mprog_bundle so that in case of updates the peer bpf_mprog_entry is populated and then just swapped which avoids additional allocations that could otherwise fail, for example, in detach case. bpf_mprog_{fp,cp} arrays are currently static, but they could be converted to dynamic allocation if necessary at a point in future. Locking is deferred to the in-kernel user of bpf_mprog, for example, in case of tcx which uses this API in the next patch, it piggybacks on rtnl. An extensive test suite for checking all aspects of this API for prog-based attach/detach and link API comes as BPF selftests in this series. Thanks also to Andrii Nakryiko for early API discussions wrt Meta's BPF prog management. [0] https://lore.kernel.org/bpf/20221004231143.19190-1-daniel@iogearbox.net [1] https://lore.kernel.org/bpf/CAADnVQ+gEY3FjCR=+DmjDR4gp5bOYZUFJQXj4agKFHT9CQPZBw@mail.gmail.com [2] http://vger.kernel.org/bpfconf2023_material/tcx_meta_netdev_borkmann.pdf Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/r/20230719140858.13224-2-daniel@iogearbox.net Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 10:07:27 -07:00
Maciej Fijalkowski	3666bccab4	selftests/xsk: reset NIC settings to default after running test suite Currently, when running ZC test suite, after finishing first run of test suite and then switching to busy-poll tests within xskxceiver, such errors are observed: libbpf: Kernel error message: ice: MTU is too large for linear frames and XDP prog does not support frags 1..26 libbpf: Kernel error message: Native and generic XDP can't be active at the same time Error attaching XDP program not ok 1 [xskxceiver.c:xsk_reattach_xdp:1568]: ERROR: 17/"File exists" this is because test suite ends with 9k MTU and native xdp program being loaded. Busy-poll tests start non-multi-buffer tests for generic mode. To fix this, let us introduce bash function that will reset NIC settings to default (e.g. 1500 MTU and no xdp progs loaded) so that test suite can continue without interrupts. It also means that after busy-poll tests NIC will have those default settings, whereas right now it is left with 9k MTU and xdp prog loaded in native mode. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-25-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 09:56:51 -07:00
Magnus Karlsson	807bf4da20	selftests/xsk: add test for too many frags Add a test that will exercise maximum number of supported fragments. This number depends on mode of the test - for SKB and DRV it will be 18 whereas for ZC this is defined by a value from NETDEV_A_DEV_XDP_ZC_MAX_SEGS netlink attribute. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # made use of new netlink attribute Link: https://lore.kernel.org/r/20230719132421.584801-24-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 09:56:51 -07:00
Magnus Karlsson	f80ddbec47	selftests/xsk: add metadata copy test for multi-buff Enable the already existing metadata copy test to also run in multi-buffer mode with 9K packets. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-23-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 09:56:50 -07:00
Magnus Karlsson	697604492b	selftests/xsk: add invalid descriptor test for multi-buffer Add a test that produces lots of nasty descriptors testing the corner cases of the descriptor validation. Some of these descriptors are valid and some are not as indicated by the valid flag. For a description of all the test combinations, please see the code. To stress the API, we need to be able to generate combinations of descriptors that make little sense. A new verbatim mode is introduced for the packet_stream to accomplish this. In this mode, all packets in the packet_stream are sent as is. We do not try to chop them up into frames that are of the right size that we know are going to work as we would normally do. The packets are just written into the Tx ring even if we know they make no sense. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> # adjusted valid flags for frags Link: https://lore.kernel.org/r/20230719132421.584801-22-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 09:56:50 -07:00
Magnus Karlsson	1005a226da	selftests/xsk: add unaligned mode test for multi-buffer Add a test for multi-buffer AF_XDP when using unaligned mode. The test sends 4096 9K-buffers. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-21-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 09:56:50 -07:00
Magnus Karlsson	f540d44e05	selftests/xsk: add basic multi-buffer test Add the first basic multi-buffer test that sends a stream of 9K packets and validates that they are received at the other end. In order to enable sending and receiving multi-buffer packets, code that sets the MTU is introduced as well as modifications to the XDP programs so that they signal that they are multi-buffer enabled. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-20-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 09:56:50 -07:00
Magnus Karlsson	17f1034dd7	selftests/xsk: transmit and receive multi-buffer packets Add the ability to send and receive packets that are larger than the size of a umem frame, using the AF_XDP /XDP multi-buffer support. There are three pieces of code that need to be changed to achieve this: the Rx path, the Tx path, and the validation logic. Both the Rx path and Tx could only deal with a single fragment per packet. The Tx path is extended with a new function called pkt_nb_frags() that can be used to retrieve the number of fragments a packet will consume. We then create these many fragments in a loop and fill the N-1 first ones to the max size limit to use the buffer space efficiently, and the Nth one with whatever data that is left. This goes on until we have filled in at the most BATCH_SIZE worth of descriptors and fragments. If we detect that the next packet would lead to BATCH_SIZE number of fragments sent being exceeded, we do not send this packet and finish the batch. This packet is instead sent in the next iteration of BATCH_SIZE fragments. For Rx, we loop over all fragments we receive as usual, but for every descriptor that we receive we call a new validation function called is_frag_valid() to validate the consistency of this fragment. The code then checks if the packet continues in the next frame. If so, it loops over the next packet and performs the same validation. once we have received the last fragment of the packet we also call the function is_pkt_valid() to validate the packet as a whole. If we get to the end of the batch and we are not at the end of the current packet, we back out the partial packet and end the loop. Once we get into the receive loop next time, we start over from the beginning of that packet. This so the code becomes simpler at the cost of some performance. The validation function is_frag_valid() checks that the sequence and packet numbers are correct at the start and end of each fragment. Signed-off-by: Magnus Karlsson <magnus.karlsson@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-19-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 09:56:50 -07:00
Maciej Fijalkowski	13ce2daa25	xsk: add new netlink attribute dedicated for ZC max frags Introduce new netlink attribute NETDEV_A_DEV_XDP_ZC_MAX_SEGS that will carry maximum fragments that underlying ZC driver is able to handle on TX side. It is going to be included in netlink response only when driver supports ZC. Any value higher than 1 implies multi-buffer ZC support on underlying device. Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com> Link: https://lore.kernel.org/r/20230719132421.584801-11-maciej.fijalkowski@intel.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 09:56:49 -07:00
Anton Protopopov	72829b1c1f	bpf: allow any program to use the bpf_map_sum_elem_count kfunc Register the bpf_map_sum_elem_count func for all programs, and update the map_ptr subtest of the test_progs test to test the new functionality. The usage is allowed as long as the pointer to the map is trusted (when using tracing programs) or is a const pointer to map, as in the following example: struct { __uint(type, BPF_MAP_TYPE_HASH); ... } hash SEC(".maps"); ... static inline int some_bpf_prog(void) { struct bpf_map map = (struct bpf_map )&hash; __s64 count; count = bpf_map_sum_elem_count(map); ... } Signed-off-by: Anton Protopopov <aspsk@isovalent.com> Link: https://lore.kernel.org/r/20230719092952.41202-5-aspsk@isovalent.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-19 09:48:53 -07:00
Matthieu Baerts	f589234e1a	selftests: mptcp: userspace_pm: format subtests results in TAP The current selftests infrastructure formats the results in TAP 13. This version doesn't support subtests and only the end result of each selftest is taken into account. It means that a single issue in a subtest of a selftest containing multiple subtests forces the whole selftest to be marked as failed. It also means that subtests results are not tracked by CIs executing selftests. MPTCP selftests run hundreds of various subtests. It is then important to track each of them and not one result per selftest. It is particularly interesting to do that when validating stable kernels with the last version of the test suite: tests might fail because a feature is not supported but the test didn't skip that part. In this case, if subtests are not tracked, the whole selftest will be marked as failed making the other subtests useless because their results are ignored. This patch formats subtests results in TAP in userspace_pm.sh selftest. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	9e86a29779	selftests: mptcp: sockopt: format subtests results in TAP The current selftests infrastructure formats the results in TAP 13. This version doesn't support subtests and only the end result of each selftest is taken into account. It means that a single issue in a subtest of a selftest containing multiple subtests forces the whole selftest to be marked as failed. It also means that subtests results are not tracked by CIs executing selftests. MPTCP selftests run hundreds of various subtests. It is then important to track each of them and not one result per selftest. It is particularly interesting to do that when validating stable kernels with the last version of the test suite: tests might fail because a feature is not supported but the test didn't skip that part. In this case, if subtests are not tracked, the whole selftest will be marked as failed making the other subtests useless because their results are ignored. This patch formats subtests results in TAP in mptcp_sockopt.sh selftest. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	675d99338e	selftests: mptcp: simult flows: format subtests results in TAP The current selftests infrastructure formats the results in TAP 13. This version doesn't support subtests and only the end result of each selftest is taken into account. It means that a single issue in a subtest of a selftest containing multiple subtests forces the whole selftest to be marked as failed. It also means that subtests results are not tracked by CIs executing selftests. MPTCP selftests run hundreds of various subtests. It is then important to track each of them and not one result per selftest. It is particularly interesting to do that when validating stable kernels with the last version of the test suite: tests might fail because a feature is not supported but the test didn't skip that part. In this case, if subtests are not tracked, the whole selftest will be marked as failed making the other subtests useless because their results are ignored. This patch formats subtests results in TAP in simult_flows.sh selftest. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	ce99025736	selftests: mptcp: diag: format subtests results in TAP The current selftests infrastructure formats the results in TAP 13. This version doesn't support subtests and only the end result of each selftest is taken into account. It means that a single issue in a subtest of a selftest containing multiple subtests forces the whole selftest to be marked as failed. It also means that subtests results are not tracked by CIs executing selftests. MPTCP selftests run hundreds of various subtests. It is then important to track each of them and not one result per selftest. It is particularly interesting to do that when validating stable kernels with the last version of the test suite: tests might fail because a feature is not supported but the test didn't skip that part. In this case, if subtests are not tracked, the whole selftest will be marked as failed making the other subtests useless because their results are ignored. This patch formats subtests results in TAP in diag.sh selftest. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	7f117cd37c	selftests: mptcp: join: format subtests results in TAP The current selftests infrastructure formats the results in TAP 13. This version doesn't support subtests and only the end result of each selftest is taken into account. It means that a single issue in a subtest of a selftest containing multiple subtests forces the whole selftest to be marked as failed. It also means that subtests results are not tracked by CIs executing selftests. MPTCP selftests run hundreds of various subtests. It is then important to track each of them and not one result per selftest. It is particularly interesting to do that when validating stable kernels with the last version of the test suite: tests might fail because a feature is not supported but the test didn't skip that part. In this case, if subtests are not tracked, the whole selftest will be marked as failed making the other subtests useless because their results are ignored. This patch formats subtests results in TAP in mptcp_join.sh selftest. In this selftest and before starting each subtest, the 'reset' function is called. We can then check if the previous test has passed, failed or has been skipped from there. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	d85555ac11	selftests: mptcp: pm_netlink: format subtests results in TAP The current selftests infrastructure formats the results in TAP 13. This version doesn't support subtests and only the end result of each selftest is taken into account. It means that a single issue in a subtest of a selftest containing multiple subtests forces the whole selftest to be marked as failed. It also means that subtests results are not tracked by CIs executing selftests. MPTCP selftests run hundreds of various subtests. It is then important to track each of them and not one result per selftest. It is particularly interesting to do that when validating stable kernels with the last version of the test suite: tests might fail because a feature is not supported but the test didn't skip that part. In this case, if subtests are not tracked, the whole selftest will be marked as failed making the other subtests useless because their results are ignored. This patch formats subtests results in TAP in pm_netlink.sh selftest. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	dd350f46e3	selftests: mptcp: connect: format subtests results in TAP The current selftests infrastructure formats the results in TAP 13. This version doesn't support subtests and only the end result of each selftest is taken into account. It means that a single issue in a subtest of a selftest containing multiple subtests forces the whole selftest to be marked as failed. It also means that subtests results are not tracked by CIs executing selftests. MPTCP selftests run hundreds of various subtests. It is then important to track each of them and not one result per selftest. It is particularly interesting to do that when validating stable kernels with the last version of the test suite: tests might fail because a feature is not supported but the test didn't skip that part. In this case, if subtests are not tracked, the whole selftest will be marked as failed making the other subtests useless because their results are ignored. This patch formats subtests results in TAP in mptcp_connect.sh selftest. Link: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	c4192967e6	selftests: mptcp: lib: format subtests results in TAP The current selftests infrastructure formats the results in TAP 13. This version doesn't support subtests and only the end result of each selftest is taken into account. It means that a single issue in a subtest of a selftest containing multiple subtests forces the whole selftest to be marked as failed. It also means that subtests results are not tracked by CIs executing selftests. MPTCP selftests run hundreds of various subtests. It is then important to track each of them and not one result per selftest. It is particularly interesting to do that when validating stable kernels with the last version of the test suite: tests might fail because a feature is not supported but the test didn't skip that part. In this case, if subtests are not tracked, the whole selftest will be marked as failed making the other subtests useless because their results are ignored. This patch adds some helpers in mptcp_lib.sh to be able to easily format subtests results in TAP in the different MPTCP selftests. Closes: https://github.com/multipath-tcp/mptcp_net-next/issues/368 Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	d8463d8165	selftests: mptcp: userspace_pm: reduce dup code around printf In this selftest, "printf" is always used with "stdbuf". With a new helper, it is possible to call "stdbuf" only from one place. This makes the code a bit clearer to read. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	e198ad7592	selftests: mptcp: userspace_pm: uniform results printing There are a few reasons to do that: - When the tabs are not printed as 8 spaces, some results were not properly aligned - Some lines printing the test name were very long due to the use of a lot of spaces/tabs at the end and stdbuf at the beginning. - To reduce duplicated code, e.g. to print what has failed and set the status But by centralising how the test results are printed, this also prepares future commits to avoid more duplicated code and ease the tracking of the different subtests. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	8320b1387a	selftests: mptcp: userspace_pm: fix shellcheck warnings shellcheck recently helped to find an issue where a wrong variable name was used. It is then good to fix the other harmless issues in order to spot "real" ones later. Here, three categories of warnings are ignored: - SC2317: Command appears to be unreachable. The cleanup() function is invoke indirectly via the EXIT trap. - SC2034: Variable appears unused. The check_expected_one() function takes the name of the variable in argument but it ends up reading the content: indirect usage. - SC2086: Double quote to prevent globbing and word splitting. This is recommended but the current usage is correct and there is no need to do all these modifications to be compliant with this rule. One error has been fixed with SC2181: Check exit code directly with e.g. 'if ! mycmd;', not indirectly with $?. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	e141c1e8e4	selftests: mptcp: userspace pm: don't stop if error No more tests were executed after a failure but it is still interesting to get results for all the tests to better understand what's still OK and what's not after a modification. Now we only exit earlier if the two connections cannot be established. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Matthieu Baerts	edbc16c43b	selftests: mptcp: connect: don't stop if error No more tests were executed after a failure but it is still interesting to get results for all the tests to better understand what's still OK and what's not after a modification. Now we only exit earlier if the basic tests are failing: no ping going through namespaces or unable to transfer data on the loopback interface. Acked-by: Paolo Abeni <pabeni@redhat.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 11:10:52 +01:00
Ido Schimmel	b408453053	selftests: net: Add bridge backup port and backup nexthop ID test Add test cases for bridge backup port and backup nexthop ID, testing both good and bad flows. Example truncated output: # ./test_bridge_backup_port.sh [...] Tests passed: 83 Tests failed: 0 Signed-off-by: Ido Schimmel <idosch@nvidia.com> Acked-by: Nikolay Aleksandrov <razor@blackwall.org> Signed-off-by: David S. Miller <davem@davemloft.net>	2023-07-19 10:53:49 +01:00
Mahmoud Maatuq	3645c71b58	selftests/net: replace manual array size calc with ARRAYSIZE macro. fixes coccinelle WARNING: Use ARRAY_SIZE Signed-off-by: Mahmoud Maatuq <mahmoudmatook.mm@gmail.com> Link: https://lore.kernel.org/r/20230716184349.2124858-1-mahmoudmatook.mm@gmail.com Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 17:43:51 -07:00
Dave Marchevsky	f3514a5d67	selftests/bpf: Disable newly-added 'owner' field test until refcount re-enabled The test added in previous patch will fail with bpf_refcount_acquire disabled. Until all races are fixed and bpf_refcount_acquire is re-enabled on bpf-next, disable the test so CI doesn't complain. Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/r/20230718083813.3416104-6-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-18 17:23:10 -07:00
Dave Marchevsky	fdf48dc2d0	selftests/bpf: Add rbtree test exercising race which 'owner' field prevents This patch adds a runnable version of one of the races described by Kumar in [0]. Specifically, this interleaving: (rbtree1 and list head protected by lock1, rbtree2 protected by lock2) Prog A Prog B ====================================== n = bpf_obj_new(...) m = bpf_refcount_acquire(n) kptr_xchg(map, m) m = kptr_xchg(map, NULL) lock(lock2) bpf_rbtree_add(rbtree2, m->r, less) unlock(lock2) lock(lock1) bpf_list_push_back(head, n->l) /* make n non-owning ref / bpf_rbtree_remove(rbtree1, n->r) unlock(lock1) The above interleaving, the node's struct bpf_rb_node r can be used to add it to either rbtree1 or rbtree2, which are protected by different locks. If the node has been added to rbtree2, we should not be allowed to remove it while holding rbtree1's lock. Before changes in the previous patch in this series, the rbtree_remove in the second part of Prog A would succeed as the verifier has no way of knowing which tree owns a particular node at verification time. The addition of 'owner' field results in bpf_rbtree_remove correctly failing. The test added in this patch splits "Prog A" above into two separate BPF programs - A1 and A2 - and uses a second mapval + kptr_xchg to pass n from A1 to A2 similarly to the pass from A1 to B. If the test is run without the fix applied, the remove will succeed. Kumar's example had the two programs running on separate CPUs. This patch doesn't do this as it's not necessary to exercise the broken behavior / validate fixed behavior. [0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2 Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230718083813.3416104-5-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-18 17:23:10 -07:00
Dave Marchevsky	c3c510ce43	bpf: Add 'owner' field to bpf_{list,rb}_node As described by Kumar in [0], in shared ownership scenarios it is necessary to do runtime tracking of {rb,list} node ownership - and synchronize updates using this ownership information - in order to prevent races. This patch adds an 'owner' field to struct bpf_list_node and bpf_rb_node to implement such runtime tracking. The owner field is a void * that describes the ownership state of a node. It can have the following values: NULL - the node is not owned by any data structure BPF_PTR_POISON - the node is in the process of being added to a data structure ptr_to_root - the pointee is a data structure 'root' (bpf_rb_root / bpf_list_head) which owns this node The field is initially NULL (set by bpf_obj_init_field default behavior) and transitions states in the following sequence: Insertion: NULL -> BPF_PTR_POISON -> ptr_to_root Removal: ptr_to_root -> NULL Before a node has been successfully inserted, it is not protected by any root's lock, and therefore two programs can attempt to add the same node to different roots simultaneously. For this reason the intermediate BPF_PTR_POISON state is necessary. For removal, the node is protected by some root's lock so this intermediate hop isn't necessary. Note that bpf_list_pop_{front,back} helpers don't need to check owner before removing as the node-to-be-removed is not passed in as input and is instead taken directly from the list. Do the check anyways and WARN_ON_ONCE in this unexpected scenario. Selftest changes in this patch are entirely mechanical: some BTF tests have hardcoded struct sizes for structs that contain bpf_{list,rb}_node fields, those were adjusted to account for the new sizes. Selftest additions to validate the owner field are added in a further patch in the series. [0]: https://lore.kernel.org/bpf/d7hyspcow5wtjcmw4fugdgyp3fwhljwuscp3xyut5qnwivyeru@ysdq543otzv2 Signed-off-by: Dave Marchevsky <davemarchevsky@fb.com> Suggested-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230718083813.3416104-4-davemarchevsky@fb.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-18 17:23:10 -07:00
Matthieu Baerts	031c99e71f	selftests: tc: add ConnTrack procfs kconfig When looking at the TC selftest reports, I noticed one test was failing because /proc/net/nf_conntrack was not available. not ok 373 3992 - Add ct action triggering DNAT tuple conflict Could not match regex pattern. Verify command output: cat: /proc/net/nf_conntrack: No such file or directory It is only available if NF_CONNTRACK_PROCFS kconfig is set. So the issue can be fixed simply by adding it to the list of required kconfig. Fixes: `e469056413` ("tc-testing: add test for ct DNAT tuple collision") Cc: stable@vger.kernel.org Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [1] Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Tested-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-3-1eb4fd3a96e7@tessares.net Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 16:52:12 -07:00
Matthieu Baerts	719b4774a8	selftests: tc: add 'ct' action kconfig dep When looking for something else in LKFT reports [1], I noticed most of the tests were skipped because the "teardown stage" did not complete successfully. Pedro found out this is due to the fact CONFIG_NF_FLOW_TABLE is required but not listed in the 'config' file. Adding it to the list fixes the issues on LKFT side. CONFIG_NET_ACT_CT is now set to 'm' in the final kconfig. Fixes: `c34b961a24` ("net/sched: act_ct: Create nf flow table per zone") Cc: stable@vger.kernel.org Link: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230711/testrun/18267241/suite/kselftest-tc-testing/test/tc-testing_tdc_sh/log [1] Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [2] Suggested-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Tested-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-2-1eb4fd3a96e7@tessares.net Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 16:52:12 -07:00
Matthieu Baerts	fda05798c2	selftests: tc: set timeout to 15 minutes When looking for something else in LKFT reports [1], I noticed that the TC selftest ended with a timeout error: not ok 1 selftests: tc-testing: tdc.sh # TIMEOUT 45 seconds The timeout had been introduced 3 years ago, see the Fixes commit below. This timeout is only in place when executing the selftests via the kselftests runner scripts. I guess this is not what most TC devs are using and nobody noticed the issue before. The new timeout is set to 15 minutes as suggested by Pedro [2]. It looks like it is plenty more time than what it takes in "normal" conditions. Fixes: `852c8cbf34` ("selftests/kselftest/runner.sh: Add 45 second timeout per test") Cc: stable@vger.kernel.org Link: https://qa-reports.linaro.org/lkft/linux-next-master/build/next-20230711/testrun/18267241/suite/kselftest-tc-testing/test/tc-testing_tdc_sh/log [1] Link: https://lore.kernel.org/netdev/0e061d4a-9a23-9f58-3b35-d8919de332d7@tessares.net/T/ [2] Suggested-by: Pedro Tammela <pctammela@mojatatu.com> Signed-off-by: Matthieu Baerts <matthieu.baerts@tessares.net> Reviewed-by: Zhengchao Shao <shaozhengchao@huawei.com> Link: https://lore.kernel.org/r/20230713-tc-selftests-lkft-v1-1-1eb4fd3a96e7@tessares.net Acked-by: Jamal Hadi Salim <jhs@mojatatu.com> Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2023-07-18 16:52:11 -07:00
Kumar Kartikeya Dwivedi	824adae453	selftests/bpf: Add more tests for check_max_stack_depth bug Another test which now exercies the path of the verifier where it will explore call chains rooted at the async callback. Without the prior fixes, this program loads successfully, which is incorrect. Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/r/20230717161530.1238-4-memxor@gmail.com Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2023-07-18 15:21:09 -07:00
Linus Torvalds	ccff6d117d	perf tools fixes for v6.5: - Don't group events when computing metrics that require more than the maximum number of simultaneously enabled events on AMD systems. - Fix multi CU handling in 'perf probe', add a 'perf test' entry to regress test it. - Make the 'perf test task_exit' stop generating samples by using the 'dummy' event, all it is testing is if a PERF_RECORD_EXIT is generated at the end of a perf session. This makes this perf test to stop sometimes failing on some systems due to a full ring buffer. - Avoid SEGV if PMU lookup fails for legacy cache terms. - Fix libsubcmd SEGV/use-after-free when commands aren't excluded. - Fix OpenCSD (ARM64's CoreSight hardware tracing) library path resolution when specifying CSLIBS= in the make command line. - Fix broken feature check for libtracefs due to external lib changes, use the provided pkgconfig file instead future proof it. - Sync drm, fcntl, kvm, mount, prctl, socket, vhost, asound, arm64's cputype headers with the kernel sources, in some cases this made the tools become aware of new kernel APIs such as ioctls and the cachestat sysctl. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZLbnYwAKCRCyPKLppCJ+ J74PAQD4FOQ1ZPDsE1z5cJZYRnA9T9CVqZtMhr0D0gzTY40ZpAD+JdtX3Qftrpuo rmjgGWzwtAmuQXSqTbFqJ6IsjlCkMw8= =shqX -----END PGP SIGNATURE----- Merge tag 'perf-tools-fixes-for-v6.5-1-2023-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools Pull perf tools fixes from Arnaldo Carvalho de Melo: - Don't group events when computing metrics that require more than the maximum number of simultaneously enabled events on AMD systems. - Fix multi CU handling in 'perf probe', add a 'perf test' entry to regress test it. - Make the 'perf test task_exit' stop generating samples by using the 'dummy' event, all it is testing is if a PERF_RECORD_EXIT is generated at the end of a perf session. This makes this perf test to stop sometimes failing on some systems due to a full ring buffer. - Avoid SEGV if PMU lookup fails for legacy cache terms. - Fix libsubcmd SEGV/use-after-free when commands aren't excluded. - Fix OpenCSD (ARM64's CoreSight hardware tracing) library path resolution when specifying CSLIBS= in the make command line. - Fix broken feature check for libtracefs due to external lib changes, use the provided pkgconfig file instead future proof it. - Sync drm, fcntl, kvm, mount, prctl, socket, vhost, asound, arm64's cputype headers with the kernel sources, in some cases this made the tools become aware of new kernel APIs such as ioctls and the cachestat sysctl. * tag 'perf-tools-fixes-for-v6.5-1-2023-07-18' of git://git.kernel.org/pub/scm/linux/kernel/git/perf/perf-tools: perf test task_exit: No need for a cycles event to check if we get an PERF_RECORD_EXIT tools headers arm64: Sync arm64's cputype.h with the kernel sources tools include UAPI: Sync the sound/asound.h copy with the kernel sources tools include UAPI: Sync linux/vhost.h with the kernel sources perf beauty: Update copy of linux/socket.h with the kernel sources perf parse-events: Avoid SEGV if PMU lookup fails for legacy cache terms libsubcmd: Avoid SEGV/use-after-free when commands aren't excluded tools headers UAPI: Sync linux/prctl.h with the kernel sources perf build: Fix broken feature check for libtracefs due to external lib changes tools include UAPI: Sync linux/mount.h copy with the kernel sources tools headers UAPI: Sync linux/kvm.h with the kernel sources tools headers uapi: Sync linux/fcntl.h with the kernel sources perf vendor events amd: Fix large metrics perf build: Fix library not found error when using CSLIBS tools headers UAPI: Sync files changed by new cachestat syscall with the kernel sources tools headers UAPI: Sync drm/i915_drm.h with the kernel sources perf probe: Read DWARF files from the correct CU perf probe: Add test for regression introduced by switch to die_get_decl_file()	2023-07-18 14:51:29 -07:00
Linus Torvalds	4806364acf	Seven hotfixes, six of which are cc:stable and one of which addresses a post-6.5 issue. -----BEGIN PGP SIGNATURE----- iHUEABYIAB0WIQTTMBEPP41GrTpTJgfdBJ7gKXxAjgUCZLboHQAKCRDdBJ7gKXxA jqwtAP4m3MQNcYzQk8qbV+EQat/csTnrefytyD0ogFRoxcMAFAD/XT784sZzn4SU s/mL1HLk1BsubT/yQmY3lISXHDPuPAo= =5W3V -----END PGP SIGNATURE----- Merge tag 'mm-hotfixes-stable-2023-07-18-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm Pull hotfixes from Andrew Morton: "Seven hotfixes, six of which are cc:stable and one of which addresses a post-6.5 issue" * tag 'mm-hotfixes-stable-2023-07-18-12-28' of git://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm: maple_tree: fix node allocation testing on 32 bit maple_tree: fix 32 bit mas_next testing selftests/mm: mkdirty: fix incorrect position of #endif maple_tree: set the node limit when creating a new root node mm/mlock: fix vma iterator conversion of apply_vma_lock_flags() prctl: move PR_GET_AUXV out of PR_MCE_KILL selftests/mm: give scripts execute permission	2023-07-18 14:19:42 -07:00
Linus Torvalds	74f1456c4a	linux-kselftest-fixes-6.5-rc3 This Kselftest fixes update for Linux 6.5-rc3 consists of fixes to bugs that are interfering with arm64 and risc workflows. This update also includes two fixes to timer and mincore tests that are causing test failures. -----BEGIN PGP SIGNATURE----- iQIzBAABCgAdFiEEPZKym/RZuOCGeA/kCwJExA0NQxwFAmS2qwsACgkQCwJExA0N Qxxk1g//Xe6yQnKYz5OVGZmWOoXn6VbOfnHiZMuTnIIJ25p1uWVbVBws6TX10DTB U/qgqcobs7vTkKywRhfVVnWWK1AIvPsYx7Cuy9bhqPaEfiv3U9KHauZZ42j8rVOW piDCgAKpobwBxE8279zxajPndEWPC7TbugDHPig7gRF0Y+kmU9i6n1sUymegKcVy lDQwsWG5brseb1HEs5S+3HJeV2VXfmjOleqwWqUVT+8xoKeTKteaWpkEkdbZP1Xb P8nn15/sMyWMRtZHgHmG8oR53SX4/6DMWAwL5ZXnOV240Wi4IBl8DnBQQpytrcK4 YIbUNgRbrYQs7YQRYUdejlX2n2KwRBZB6If2kNF4WxZBW63JB1CftnAzbqhHLqwO 8V83PAsK9bMuwPRUUI5ZHaVp7k54VszMHP1MwkbJDAilFPeYqdKR4XJT+KZ23k1F nXdqpbadTBIvmurEOR/lKPvWMr/FFIlUAhsG4YELYfiIIUVy7bCeA6Q/bVaclqnp jdkKSUbpe3rpwmhHxD+vpZP8EAFaljEMih+z9ollEd2/d1h64Ul4UyVScyZLr06M aGoblt7C0iE7sZUek+6Zfi+bxDyx/6EKtFUKrHw2iHMc7QpEu/z8PTviMz9KbOcw uoLthFkv1QUJtSzusAEGGncNc5cPEXG7WsWW5YjDmhoof5P1Z78= =IBin -----END PGP SIGNATURE----- Merge tag 'linux-kselftest-fixes-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest Pull Kselftest fixes from Shuah Khan: "Fixes to bugs that are interfering with arm64 and risc workflows. Also two fixes to timer and mincore tests that are causing test failures" * tag 'linux-kselftest-fixes-6.5-rc3' of git://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest: selftests/arm64: fix build failure during the "emit_tests" step selftests/riscv: fix potential build failure during the "emit_tests" step tools: timers: fix freq average calculation selftests/mincore: fix skip condition for check_huge_pages test	2023-07-18 08:56:02 -07:00

1 2 3 4 5 ...

36589 Commits