linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-25 13:14:07 +08:00

Author	SHA1	Message	Date
Adrian Hunter	4bbac9a1f5	libperf evsel: Factor out perf_evsel__ioctl() Factor out perf_evsel__ioctl() so it can be reused. Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Leo Yan <leo.yan@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lore.kernel.org/lkml/20220422162402.147958-2-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2022-04-24 07:50:38 -03:00
Andrii Nakryiko	8462e0b46f	libbpf: Teach bpf_link_create() to fallback to bpf_raw_tracepoint_open() Teach bpf_link_create() to fallback to bpf_raw_tracepoint_open() on older kernels for programs that are attachable through BPF_RAW_TRACEPOINT_OPEN. This makes bpf_link_create() more unified and convenient interface for creating bpf_link-based attachments. With this approach end users can just use bpf_link_create() for tp_btf/fentry/fexit/fmod_ret/lsm program attachments without needing to care about kernel support, as libbpf will handle this transparently. On the other hand, as newer features (like BPF cookie) are added to LINK_CREATE interface, they will be readily usable though the same bpf_link_create() API without any major refactoring from user's standpoint. bpf_program__attach_btf_id() is now using bpf_link_create() internally as well and will take advantaged of this unified interface when BPF cookie is added for fentry/fexit. Doing proactive feature detection of LINK_CREATE support for fentry/tp_btf/etc is quite involved. It requires parsing vmlinux BTF, determining some stable and guaranteed to be in all kernels versions target BTF type (either raw tracepoint or fentry target function), actually attaching this program and thus potentially affecting the performance of the host kernel briefly, etc. So instead we are taking much simpler "lazy" approach of falling back to bpf_raw_tracepoint_open() call only if initial LINK_CREATE command fails. For modern kernels this will mean zero added overhead, while older kernels will incur minimal overhead with a single fast-failing LINK_CREATE call. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Kui-Feng Lee <kuifeng@fb.com> Link: https://lore.kernel.org/bpf/20220421033945.3602803-3-andrii@kernel.org	2022-04-23 00:37:02 +02:00
Josh Poimboeuf	aa3d60e050	libsubcmd: Fix OPTION_GROUP sorting The OPTION_GROUP option type is a way of grouping certain options together in the printed usage text. It happens to be completely broken, thanks to the fact that the subcmd option sorting just sorts everything, without regard for grouping. Luckily, nobody uses this option anyway, though that will change shortly. Fix it by sorting each group individually. Signed-off-by: Josh Poimboeuf <jpoimboe@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Miroslav Benes <mbenes@suse.cz> Link: https://lkml.kernel.org/r/e167ea3a11e2a9800eb062c1fd0f13e9cd05140c.1650300597.git.jpoimboe@redhat.com	2022-04-22 12:32:01 +02:00
Paolo Abeni	f70925bf99	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net drivers/net/ethernet/microchip/lan966x/lan966x_main.c `d08ed85256` ("net: lan966x: Make sure to release ptp interrupt") `c834963932` ("net: lan966x: Add FDMA functionality") Signed-off-by: Paolo Abeni <pabeni@redhat.com>	2022-04-22 09:56:00 +02:00
Gaosheng Cui	b71a2ebf74	libbpf: Remove redundant non-null checks on obj_elf Obj_elf is already non-null checked at the function entry, so remove redundant non-null checks on obj_elf. Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220421031803.2283974-1-cuigaosheng1@huawei.com	2022-04-21 09:56:26 -07:00
Grant Seltzer	a66ab9a9e6	libbpf: Add documentation to API functions This adds documentation for the following API functions: - bpf_program__set_expected_attach_type() - bpf_program__set_type() - bpf_program__set_attach_target() - bpf_program__attach() - bpf_program__pin() - bpf_program__unpin() Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220420161226.86803-3-grantseltzer@gmail.com	2022-04-21 16:31:07 +02:00
Grant Seltzer	df28671632	libbpf: Update API functions usage to check error This updates usage of the following API functions within libbpf so their newly added error return is checked: - bpf_program__set_expected_attach_type() - bpf_program__set_type() Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220420161226.86803-2-grantseltzer@gmail.com	2022-04-21 16:28:25 +02:00
Grant Seltzer	93442f132b	libbpf: Add error returns to two API functions This adds an error return to the following API functions: - bpf_program__set_expected_attach_type() - bpf_program__set_type() In both cases, the error occurs when the BPF object has already been loaded when the function is called. In this case -EBUSY is returned. Signed-off-by: Grant Seltzer <grantseltzer@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220420161226.86803-1-grantseltzer@gmail.com	2022-04-21 16:28:11 +02:00
Pu Lehui	58ca8b0572	libbpf: Support riscv USDT argument parsing logic Add riscv-specific USDT argument specification parsing logic. riscv USDT argument format is shown below: - Memory dereference case: "size@off(reg)", e.g. "-8@-88(s0)" - Constant value case: "size@val", e.g. "4@5" - Register read case: "size@reg", e.g. "-8@a1" s8 will be marked as poison while it's a reg of riscv, we need to alias it in advance. Both RV32 and RV64 have been tested. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220419145238.482134-3-pulehui@huawei.com	2022-04-19 21:59:35 -07:00
Pu Lehui	5af25a410a	libbpf: Fix usdt_cookie being cast to 32 bits The usdt_cookie is defined as __u64, which should not be used as a long type because it will be cast to 32 bits in 32-bit platforms. Signed-off-by: Pu Lehui <pulehui@huawei.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220419145238.482134-2-pulehui@huawei.com	2022-04-19 21:59:35 -07:00
Andrii Nakryiko	a3820c4811	libbpf: Support opting out from autoloading BPF programs declaratively Establish SEC("?abc") naming convention (i.e., adding question mark in front of otherwise normal section name) that allows to set corresponding program's autoload property to false. This is effectively just a declarative way to do bpf_program__set_autoload(prog, false). Having a way to do this declaratively in BPF code itself is useful and convenient for various scenarios. E.g., for testing, when BPF object consists of multiple independent BPF programs that each needs to be tested separately. Opting out all of them by default and then setting autoload to true for just one of them at a time simplifies testing code (see next patch for few conversions in BPF selftests taking advantage of this new feature). Another real-world use case is in libbpf-tools for cases when different BPF programs have to be picked depending on particulars of the host kernel due to various incompatible changes (like kernel function renames or signature change, or to pick kprobe vs fentry depending on corresponding kernel support for the latter). Marking all the different BPF program candidates as non-autoloaded declaratively makes this more obvious in BPF source code and allows simpler code in user-space code. When BPF program marked as SEC("?abc") it is otherwise treated just like SEC("abc") and bpf_program__section_name() reported will be "abc". Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220419002452.632125-1-andrii@kernel.org	2022-04-19 13:48:20 -07:00
Adrian Hunter	a668cc07f9	perf tools: Fix segfault accessing sample_id xyarray perf_evsel::sample_id is an xyarray which can cause a segfault when accessed beyond its size. e.g. # perf record -e intel_pt// -C 1 sleep 1 Segmentation fault (core dumped) # That is happening because a dummy event is opened to capture text poke events accross all CPUs, however the mmap logic is allocating according to the number of user_requested_cpus. In general, perf sometimes uses the evsel cpus to open events, and sometimes the evlist user_requested_cpus. However, it is not necessary to determine which case is which because the opened event file descriptors are also in an xyarray, the size of whch can be used to correctly allocate the size of the sample_id xyarray, because there is one ID per file descriptor. Note, in the affected code path, perf_evsel fd array is subsequently used to get the file descriptor for the mmap, so it makes sense for the xyarrays to be the same size there. Fixes: `d1a177595b` ("libperf: Adopt perf_evlist__mmap()/munmap() from tools/perf") Fixes: `246eba8e90` ("perf tools: Add support for PERF_RECORD_TEXT_POKE") Signed-off-by: Adrian Hunter <adrian.hunter@intel.com> Acked-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: stable@vger.kernel.org # 5.5+ Link: https://lore.kernel.org/r/20220413114232.26914-1-adrian.hunter@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2022-04-13 22:23:02 -03:00
Alan Maguire	0f8619929c	libbpf: Usdt aarch64 arg parsing support Parsing of USDT arguments is architecture-specific. On aarch64 it is relatively easy since registers used are x[0-31] and sp. Format is slightly different compared to x86_64. Possible forms are: - "size@[reg[,offset]]" for dereferences, e.g. "-8@[sp,76]" and "-4@[sp]"; - "size@reg" for register values, e.g. "-4@x0"; - "size@value" for raw values, e.g. "-8@1". Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1649690496-1902-2-git-send-email-alan.maguire@oracle.com	2022-04-11 15:32:28 -07:00
Runqing Yang	d252a4a499	libbpf: Fix a bug with checking bpf_probe_read_kernel() support in old kernels Background: Libbpf automatically replaces calls to BPF bpf_probe_read_{kernel,user} [_str]() helpers with bpf_probe_read[_str](), if libbpf detects that kernel doesn't support new APIs. Specifically, libbpf invokes the probe_kern_probe_read_kernel function to load a small eBPF program into the kernel in which bpf_probe_read_kernel API is invoked and lets the kernel checks whether the new API is valid. If the loading fails, libbpf considers the new API invalid and replaces it with the old API. static int probe_kern_probe_read_kernel(void) { struct bpf_insn insns[] = { BPF_MOV64_REG(BPF_REG_1, BPF_REG_10), /* r1 = r10 (fp) / BPF_ALU64_IMM(BPF_ADD, BPF_REG_1, -8), / r1 += -8 / BPF_MOV64_IMM(BPF_REG_2, 8), / r2 = 8 / BPF_MOV64_IMM(BPF_REG_3, 0), / r3 = 0 */ BPF_RAW_INSN(BPF_JMP \| BPF_CALL, 0, 0, 0, BPF_FUNC_probe_read_kernel), BPF_EXIT_INSN(), }; int fd, insn_cnt = ARRAY_SIZE(insns); fd = bpf_prog_load(BPF_PROG_TYPE_KPROBE, NULL, "GPL", insns, insn_cnt, NULL); return probe_fd(fd); } Bug: On older kernel versions [0], the kernel checks whether the version number provided in the bpf syscall, matches the LINUX_VERSION_CODE. If not matched, the bpf syscall fails. eBPF However, the probe_kern_probe_read_kernel code does not set the kernel version number provided to the bpf syscall, which causes the loading process alwasys fails for old versions. It means that libbpf will replace the new API with the old one even the kernel supports the new one. Solution: After a discussion in [1], the solution is using BPF_PROG_TYPE_TRACEPOINT program type instead of BPF_PROG_TYPE_KPROBE because kernel does not enfoce version check for tracepoint programs. I test the patch in old kernels (4.18 and 4.19) and it works well. [0] https://elixir.bootlin.com/linux/v4.19/source/kernel/bpf/syscall.c#L1360 [1] Closes: https://github.com/libbpf/libbpf/issues/473 Signed-off-by: Runqing Yang <rainkin1993@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220409144928.27499-1-rainkin1993@gmail.com	2022-04-10 20:15:22 -07:00
Vladimir Isaev	0738599856	libbpf: Add ARC support to bpf_tracing.h Add PT_REGS macros suitable for ARCompact and ARCv2. Signed-off-by: Vladimir Isaev <isaev@synopsys.com> Signed-off-by: Sergey Matyukevich <geomatsi@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220408224442.599566-1-geomatsi@gmail.com	2022-04-10 18:53:37 -07:00
Jakub Kicinski	34ba23b44c	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2022-04-09 We've added 63 non-merge commits during the last 9 day(s) which contain a total of 68 files changed, 4852 insertions(+), 619 deletions(-). The main changes are: 1) Add libbpf support for USDT (User Statically-Defined Tracing) probes. USDTs are an abstraction built on top of uprobes, critical for tracing and BPF, and widely used in production applications, from Andrii Nakryiko. 2) While Andrii was adding support for x86{-64}-specific logic of parsing USDT argument specification, Ilya followed-up with USDT support for s390 architecture, from Ilya Leoshkevich. 3) Support name-based attaching for uprobe BPF programs in libbpf. The format supported is `u[ret]probe/binary_path:[raw_offset\|function[+offset]]`, e.g. attaching to libc malloc can be done in BPF via SEC("uprobe/libc.so.6:malloc") now, from Alan Maguire. 4) Various load/store optimizations for the arm64 JIT to shrink the image size by using arm64 str/ldr immediate instructions. Also enable pointer authentication to verify return address for JITed code, from Xu Kuohai. 5) BPF verifier fixes for write access checks to helper functions, e.g. rd-only memory from bpf__cpu_ptr() must not be passed to helpers that write into passed buffers, from Kumar Kartikeya Dwivedi. 6) Fix overly excessive stack map allocation for its base map structure and buckets which slipped-in from cleanups during the rlimit accounting removal back then, from Yuntao Wang. 7) Extend the unstable CT lookup helpers for XDP and tc/BPF to report netfilter connection tracking tuple direction, from Lorenzo Bianconi. 8) Improve bpftool dump to show BPF program/link type names, Milan Landaverde. 9) Minor cleanups all over the place from various others. https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (63 commits) bpf: Fix excessive memory allocation in stack_map_alloc() selftests/bpf: Fix return value checks in perf_event_stackmap test selftests/bpf: Add CO-RE relos into linked_funcs selftests libbpf: Use weak hidden modifier for USDT BPF-side API functions libbpf: Don't error out on CO-RE relos for overriden weak subprogs samples, bpf: Move routes monitor in xdp_router_ipv4 in a dedicated thread libbpf: Allow WEAK and GLOBAL bindings during BTF fixup libbpf: Use strlcpy() in path resolution fallback logic libbpf: Add s390-specific USDT arg spec parsing logic libbpf: Make BPF-side of USDT support work on big-endian machines libbpf: Minor style improvements in USDT code libbpf: Fix use #ifdef instead of #if to avoid compiler warning libbpf: Potential NULL dereference in usdt_manager_attach_usdt() selftests/bpf: Uprobe tests should verify param/return values libbpf: Improve string parsing for uprobe auto-attach libbpf: Improve library identification for uprobe binary path resolution selftests/bpf: Test for writes to map key from BPF helpers selftests/bpf: Test passing rdonly mem to global func bpf: Reject writes for PTR_TO_MAP_KEY in check_helper_mem_access bpf: Check PTR_TO_MEM \| MEM_RDONLY in check_helper_mem_access ... ==================== Link: https://lore.kernel.org/r/20220408231741.19116-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-04-08 17:07:29 -07:00
Andrii Nakryiko	2fa5b0f290	libbpf: Use weak hidden modifier for USDT BPF-side API functions Use __weak __hidden for bpf_usdt_xxx() APIs instead of much more confusing `static inline __noinline`. This was previously impossible due to libbpf erroring out on CO-RE relocations pointing to eliminated weak subprogs. Now that previous patch fixed this issue, switch back to __weak __hidden as it's a more direct way of specifying the desired behavior. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220408181425.2287230-3-andrii@kernel.org	2022-04-08 22:24:15 +02:00
Andrii Nakryiko	e89d57d938	libbpf: Don't error out on CO-RE relos for overriden weak subprogs During BPF static linking, all the ELF relocations and .BTF.ext information (including CO-RE relocations) are preserved for __weak subprograms that were logically overriden by either previous weak subprogram instance or by corresponding "strong" (non-weak) subprogram. This is just how native user-space linkers work, nothing new. But libbpf is over-zealous when processing CO-RE relocation to error out when CO-RE relocation belonging to such eliminated weak subprogram is encountered. Instead of erroring out on this expected situation, log debug-level message and skip the relocation. Fixes: `db2b8b0642` ("libbpf: Support CO-RE relocations for multi-prog sections") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220408181425.2287230-2-andrii@kernel.org	2022-04-08 22:24:15 +02:00
Andrii Nakryiko	3a06ec0a99	libbpf: Allow WEAK and GLOBAL bindings during BTF fixup During BTF fix up for global variables, global variable can be global weak and will have STB_WEAK binding in ELF. Support such global variables in addition to non-weak ones. This is not the problem when using BPF static linking, as BPF static linker "fixes up" BTF during generation so that libbpf doesn't have to do it anymore during bpf_object__open(), which led to this not being noticed for a while, along with a pretty rare (currently) use of __weak variables and maps. Reported-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220407230446.3980075-2-andrii@kernel.org	2022-04-08 09:16:09 -07:00
Andrii Nakryiko	3c0dfe6e4c	libbpf: Use strlcpy() in path resolution fallback logic Coverity static analyzer complains that strcpy() can cause buffer overflow. Use libbpf_strlcpy() instead to be 100% sure this doesn't happen. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220407230446.3980075-1-andrii@kernel.org	2022-04-08 09:16:09 -07:00
Ilya Leoshkevich	bd022685bd	libbpf: Add s390-specific USDT arg spec parsing logic The logic is superficially similar to that of x86, but the small differences (no need for register table and dynamic allocation of register names, no $ sign before constants) make maintaining a common implementation too burdensome. Therefore simply add a s390x-specific version of parse_usdt_arg(). Note that while bcc supports index registers, this patch does not. This should not be a problem in most cases, since s390 uses a default value "nor" for STAP_SDT_ARG_CONSTRAINT. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220407214411.257260-4-iii@linux.ibm.com	2022-04-08 07:04:20 -07:00
Ilya Leoshkevich	6f403d9d53	libbpf: Make BPF-side of USDT support work on big-endian machines BPF_USDT_ARG_REG_DEREF handling always reads 8 bytes, regardless of the actual argument size. On little-endian the relevant argument bits end up in the lower bits of val, and later on the code that handles all the argument types expects them to be there. On big-endian they end up in the upper bits of val, breaking that expectation. Fix by right-shifting val on big-endian. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220407214411.257260-3-iii@linux.ibm.com	2022-04-07 20:59:10 -07:00
Ilya Leoshkevich	e1b6df598a	libbpf: Minor style improvements in USDT code Fix several typos and references to non-existing headers. Also use __BYTE_ORDER__ instead of __BYTE_ORDER for consistency with the rest of the bpf code - see commit `45f2bebc80` ("libbpf: Fix endianness detection in BPF_CORE_READ_BITFIELD_PROBED()") for rationale). Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220407214411.257260-2-iii@linux.ibm.com	2022-04-07 20:59:10 -07:00
Andrii Nakryiko	ded6dffaed	libbpf: Fix use #ifdef instead of #if to avoid compiler warning As reported by Naresh: perf build errors on i386 [1] on Linux next-20220407 [2] usdt.c:1181:5: error: "__x86_64__" is not defined, evaluates to 0 [-Werror=undef] 1181 \| #if __x86_64__ \| ^~~~~~~~~~ usdt.c:1196:5: error: "__x86_64__" is not defined, evaluates to 0 [-Werror=undef] 1196 \| #if __x86_64__ \| ^~~~~~~~~~ cc1: all warnings being treated as errors Use #ifdef instead of #if to avoid this. Fixes: `4c59e584d1` ("libbpf: Add x86-specific USDT arg spec parsing logic") Reported-by: Naresh Kamboju <naresh.kamboju@linaro.org> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220407203842.3019904-1-andrii@kernel.org	2022-04-07 23:34:15 +02:00
Haowen Bai	e58c5c9717	libbpf: Potential NULL dereference in usdt_manager_attach_usdt() link could be null but still dereference bpf_link__destroy(&link->link) and it will lead to a null pointer access. Signed-off-by: Haowen Bai <baihaowen@meizu.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1649299098-2069-1-git-send-email-baihaowen@meizu.com	2022-04-07 11:46:33 -07:00
Alan Maguire	90db26e6be	libbpf: Improve string parsing for uprobe auto-attach For uprobe auto-attach, the parsing can be simplified for the SEC() name to a single sscanf(); the return value of the sscanf can then be used to distinguish between sections that simply specify "u[ret]probe" (and thus cannot auto-attach), those that specify "u[ret]probe/binary_path:function+offset" etc. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1649245431-29956-3-git-send-email-alan.maguire@oracle.com	2022-04-07 11:42:50 -07:00
Alan Maguire	a1c9d61b19	libbpf: Improve library identification for uprobe binary path resolution In the process of doing path resolution for uprobe attach, libraries are identified by matching a ".so" substring in the binary_path. This matches a lot of patterns that do not conform to library.so[.version] format, so instead match a ".so" _suffix_, and if that fails match a ".so." substring for the versioned library case. Suggested-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1649245431-29956-2-git-send-email-alan.maguire@oracle.com	2022-04-07 11:42:50 -07:00
Colin Ian King	a8d600f6bc	libbpf: Fix spelling mistake "libaries" -> "libraries" There is a spelling mistake in a pr_warn message. Fix it. Signed-off-by: Colin Ian King <colin.i.king@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220406080835.14879-1-colin.i.king@gmail.com	2022-04-06 10:14:27 -07:00
Andrii Nakryiko	4c59e584d1	libbpf: Add x86-specific USDT arg spec parsing logic Add x86/x86_64-specific USDT argument specification parsing. Each architecture will require their own logic, as all this is arch-specific assembly-based notation. Architectures that libbpf doesn't support for USDTs will pr_warn() with specific error and return -ENOTSUP. We use sscanf() as a very powerful and easy to use string parser. Those spaces in sscanf's format string mean "skip any whitespaces", which is pretty nifty (and somewhat little known) feature. All this was tested on little-endian architecture, so bit shifts are probably off on big-endian, which our CI will hopefully prove. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20220404234202.331384-6-andrii@kernel.org	2022-04-05 13:16:08 -07:00
Andrii Nakryiko	999783c8bb	libbpf: Wire up spec management and other arch-independent USDT logic Last part of architecture-agnostic user-space USDT handling logic is to set up BPF spec and, optionally, IP-to-ID maps from user-space. usdt_manager performs a compact spec ID allocation to utilize fixed-sized BPF maps as efficiently as possible. We also use hashmap to deduplicate USDT arg spec strings and map identical strings to single USDT spec, minimizing the necessary BPF map size. usdt_manager supports arbitrary sequences of attachment and detachment, both of the same USDT and multiple different USDTs and internally maintains a free list of unused spec IDs. bpf_link_usdt's logic is extended with proper setup and teardown of this spec ID free list and supporting BPF maps. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20220404234202.331384-5-andrii@kernel.org	2022-04-05 13:16:07 -07:00
Andrii Nakryiko	74cc6311ce	libbpf: Add USDT notes parsing and resolution logic Implement architecture-agnostic parts of USDT parsing logic. The code is the documentation in this case, it's futile to try to succinctly describe how USDT parsing is done in any sort of concreteness. But still, USDTs are recorded in special ELF notes section (.note.stapsdt), where each USDT call site is described separately. Along with USDT provider and USDT name, each such note contains USDT argument specification, which uses assembly-like syntax to describe how to fetch value of USDT argument. USDT arg spec could be just a constant, or a register, or a register dereference (most common cases in x86_64), but it technically can be much more complicated cases, like offset relative to global symbol and stuff like that. One of the later patches will implement most common subset of this for x86 and x86-64 architectures, which seems to handle a lot of real-world production application. USDT arg spec contains a compact encoding allowing usdt.bpf.h from previous patch to handle the above 3 cases. Instead of recording which register might be needed, we encode register's offset within struct pt_regs to simplify BPF-side implementation. USDT argument can be of different byte sizes (1, 2, 4, and 8) and signed or unsigned. To handle this, libbpf pre-calculates necessary bit shifts to do proper casting and sign-extension in a short sequences of left and right shifts. The rest is in the code with sometimes extensive comments and references to external "documentation" for USDTs. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Dave Marchevsky <davemarchevsky@fb.com> Link: https://lore.kernel.org/bpf/20220404234202.331384-4-andrii@kernel.org	2022-04-05 13:16:07 -07:00
Andrii Nakryiko	2e4913e025	libbpf: Wire up USDT API and bpf_link integration Wire up libbpf USDT support APIs without yet implementing all the nitty-gritty details of USDT discovery, spec parsing, and BPF map initialization. User-visible user-space API is simple and is conceptually very similar to uprobe API. bpf_program__attach_usdt() API allows to programmatically attach given BPF program to a USDT, specified through binary path (executable or shared lib), USDT provider and name. Also, just like in uprobe case, PID filter is specified (0 - self, -1 - any process, or specific PID). Optionally, USDT cookie value can be specified. Such single API invocation will try to discover given USDT in specified binary and will use (potentially many) BPF uprobes to attach this program in correct locations. Just like any bpf_program__attach_xxx() APIs, bpf_link is returned that represents this attachment. It is a virtual BPF link that doesn't have direct kernel object, as it can consist of multiple underlying BPF uprobe links. As such, attachment is not atomic operation and there can be brief moment when some USDT call sites are attached while others are still in the process of attaching. This should be taken into consideration by user. But bpf_program__attach_usdt() guarantees that in the case of success all USDT call sites are successfully attached, or all the successfuly attachments will be detached as soon as some USDT call sites failed to be attached. So, in theory, there could be cases of failed bpf_program__attach_usdt() call which did trigger few USDT program invocations. This is unavoidable due to multi-uprobe nature of USDT and has to be handled by user, if it's important to create an illusion of atomicity. USDT BPF programs themselves are marked in BPF source code as either SEC("usdt"), in which case they won't be auto-attached through skeleton's <skel>__attach() method, or it can have a full definition, which follows the spirit of fully-specified uprobes: SEC("usdt/<path>:<provider>:<name>"). In the latter case skeleton's attach method will attempt auto-attachment. Similarly, generic bpf_program__attach() will have enought information to go off of for parameterless attachment. USDT BPF programs are actually uprobes, and as such for kernel they are marked as BPF_PROG_TYPE_KPROBE. Another part of this patch is USDT-related feature probing: - BPF cookie support detection from user-space; - detection of kernel support for auto-refcounting of USDT semaphore. The latter is optional. If kernel doesn't support such feature and USDT doesn't rely on USDT semaphores, no error is returned. But if libbpf detects that USDT requires setting semaphores and kernel doesn't support this, libbpf errors out with explicit pr_warn() message. Libbpf doesn't support poking process's memory directly to increment semaphore value, like BCC does on legacy kernels, due to inherent raciness and danger of such process memory manipulation. Libbpf let's kernel take care of this properly or gives up. Logistically, all the extra USDT-related infrastructure of libbpf is put into a separate usdt.c file and abstracted behind struct usdt_manager. Each bpf_object has lazily-initialized usdt_manager pointer, which is only instantiated if USDT programs are attempted to be attached. Closing BPF object frees up usdt_manager resources. usdt_manager keeps track of USDT spec ID assignment and few other small things. Subsequent patches will fill out remaining missing pieces of USDT initialization and setup logic. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20220404234202.331384-3-andrii@kernel.org	2022-04-05 13:16:07 -07:00
Andrii Nakryiko	d72e2968fb	libbpf: Add BPF-side of USDT support Add BPF-side implementation of libbpf-provided USDT support. This consists of single header library, usdt.bpf.h, which is meant to be used from user's BPF-side source code. This header is added to the list of installed libbpf header, along bpf_helpers.h and others. BPF-side implementation consists of two BPF maps: - spec map, which contains "a USDT spec" which encodes information necessary to be able to fetch USDT arguments and other information (argument count, user-provided cookie value, etc) at runtime; - IP-to-spec-ID map, which is only used on kernels that don't support BPF cookie feature. It allows to lookup spec ID based on the place in user application that triggers USDT program. These maps have default sizes, 256 and 1024, which are chosen conservatively to not waste a lot of space, but handling a lot of common cases. But there could be cases when user application needs to either trace a lot of different USDTs, or USDTs are heavily inlined and their arguments are located in a lot of differing locations. For such cases it might be necessary to size those maps up, which libbpf allows to do by overriding BPF_USDT_MAX_SPEC_CNT and BPF_USDT_MAX_IP_CNT macros. It is an important aspect to keep in mind. Single USDT (user-space equivalent of kernel tracepoint) can have multiple USDT "call sites". That is, single logical USDT is triggered from multiple places in user application. This can happen due to function inlining. Each such inlined instance of USDT invocation can have its own unique USDT argument specification (instructions about the location of the value of each of USDT arguments). So while USDT looks very similar to usual uprobe or kernel tracepoint, under the hood it's actually a collection of uprobes, each potentially needing different spec to know how to fetch arguments. User-visible API consists of three helper functions: - bpf_usdt_arg_cnt(), which returns number of arguments of current USDT; - bpf_usdt_arg(), which reads value of specified USDT argument (by it's zero-indexed position) and returns it as 64-bit value; - bpf_usdt_cookie(), which functions like BPF cookie for USDT programs; this is necessary as libbpf doesn't allow specifying actual BPF cookie and utilizes it internally for USDT support implementation. Each bpf_usdt_xxx() APIs expect struct pt_regs * context, passed into BPF program. On kernels that don't support BPF cookie it is used to fetch absolute IP address of the underlying uprobe. usdt.bpf.h also provides BPF_USDT() macro, which functions like BPF_PROG() and BPF_KPROBE() and allows much more user-friendly way to get access to USDT arguments, if USDT definition is static and known to the user. It is expected that majority of use cases won't have to use bpf_usdt_arg_cnt() and bpf_usdt_arg() directly and BPF_USDT() will cover all their needs. Last, usdt.bpf.h is utilizing BPF CO-RE for one single purpose: to detect kernel support for BPF cookie. If BPF CO-RE dependency is undesirable, user application can redefine BPF_USDT_HAS_BPF_COOKIE to either a boolean constant (or equivalently zero and non-zero), or even point it to its own .rodata variable that can be specified from user's application user-space code. It is important that BPF_USDT_HAS_BPF_COOKIE is known to BPF verifier as static value (thus .rodata and not just .data), as otherwise BPF code will still contain bpf_get_attach_cookie() BPF helper call and will fail validation at runtime, if not dead-code eliminated. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20220404234202.331384-2-andrii@kernel.org	2022-04-05 13:16:07 -07:00
Ilya Leoshkevich	568189310c	libbpf: Support Debian in resolve_full_path() attach_probe selftest fails on Debian-based distros with `failed to resolve full path for 'libc.so.6'`. The reason is that these distros embraced multiarch to the point where even for the "main" architecture they store libc in /lib/<triple>. This is configured in /etc/ld.so.conf and in theory it's possible to replicate the loader's parsing and processing logic in libbpf, however a much simpler solution is to just enumerate the known library paths. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220404225020.51029-1-iii@linux.ibm.com	2022-04-04 16:47:16 -07:00
Yuntao Wang	e93f39998d	libbpf: Don't return -EINVAL if hdr_len < offsetofend(core_relo_len) Since core relos is an optional part of the .BTF.ext ELF section, we should skip parsing it instead of returning -EINVAL if header size is less than offsetofend(struct btf_ext_header, core_relo_len). Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220404005320.1723055-1-ytcoode@gmail.com	2022-04-03 19:56:01 -07:00
Alan Maguire	39f8dc43b7	libbpf: Add auto-attach for uprobes based on section name Now that u[ret]probes can use name-based specification, it makes sense to add support for auto-attach based on SEC() definition. The format proposed is SEC("u[ret]probe/binary:[raw_offset\|[function_name[+offset]]") For example, to trace malloc() in libc: SEC("uprobe/libc.so.6:malloc") ...or to trace function foo2 in /usr/bin/foo: SEC("uprobe//usr/bin/foo:foo2") Auto-attach is done for all tasks (pid -1). prog can be an absolute path or simply a program/library name; in the latter case, we use PATH/LD_LIBRARY_PATH to resolve the full path, falling back to standard locations (/usr/bin:/usr/sbin or /usr/lib64:/usr/lib) if the file is not found via environment-variable specified locations. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1648654000-21758-4-git-send-email-alan.maguire@oracle.com	2022-04-03 19:55:57 -07:00
Alan Maguire	433966e3ae	libbpf: Support function name-based attach uprobes kprobe attach is name-based, using lookups of kallsyms to translate a function name to an address. Currently uprobe attach is done via an offset value as described in [1]. Extend uprobe opts for attach to include a function name which can then be converted into a uprobe-friendly offset. The calcualation is done in several steps: 1. First, determine the symbol address using libelf; this gives us the offset as reported by objdump 2. If the function is a shared library function - and the binary provided is a shared library - no further work is required; the address found is the required address 3. Finally, if the function is local, subtract the base address associated with the object, retrieved from ELF program headers. The resultant value is then added to the func_offset value passed in to specify the uprobe attach address. So specifying a func_offset of 0 along with a function name "printf" will attach to printf entry. The modes of operation supported are then 1. to attach to a local function in a binary; function "foo1" in "/usr/bin/foo" 2. to attach to a shared library function in a shared library - function "malloc" in libc. [1] https://www.kernel.org/doc/html/latest/trace/uprobetracer.html Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1648654000-21758-3-git-send-email-alan.maguire@oracle.com	2022-04-03 18:12:05 -07:00
Alan Maguire	1ce3a60e3c	libbpf: auto-resolve programs/libraries when necessary for uprobes bpf_program__attach_uprobe_opts() requires a binary_path argument specifying binary to instrument. Supporting simply specifying "libc.so.6" or "foo" should be possible too. Library search checks LD_LIBRARY_PATH, then /usr/lib64, /usr/lib. This allows users to run BPF programs prefixed with LD_LIBRARY_PATH=/path2/lib while still searching standard locations. Similarly for non .so files, we check PATH and /usr/bin, /usr/sbin. Path determination will be useful for auto-attach of BPF uprobe programs using SEC() definition. Signed-off-by: Alan Maguire <alan.maguire@oracle.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/1648654000-21758-2-git-send-email-alan.maguire@oracle.com	2022-04-03 18:11:47 -07:00
Ian Rogers	da0bfb9fdf	perf cpumap: More cpu map reuse by merge. perf_cpu_map__merge() will reuse one of its arguments if they are equal or the other argument is NULL. The arguments could be reused if it is known one set of values is a subset of the other. For example, a map of 0-1 and a map of just 0 when merged yields the map of 0-1. Currently a new map is created rather than adding a reference count to the original 0-1 map. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20220328232648.2127340-5-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2022-04-01 16:19:35 -03:00
Ian Rogers	c3ad8d23bc	perf cpumap: Add is_subset function Returns true if the second argument is a subset of the first. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20220328232648.2127340-4-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2022-04-01 16:19:35 -03:00
Ian Rogers	0df6ade711	perf evlist: Rename cpus to user_requested_cpus evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps of all evsels. For non-task targets, cpus is set to be cpus requested from the command line, defaulting to all online cpus if no cpus are specified. For an uncore event, all_cpus may be just CPU 0 or every online CPU. This causes all_cpus to have fewer values than the cpus variable which is confusing given the 'all' in the name. To try to make the behavior clearer, rename cpus to user_requested_cpus and add comments on the two struct variables. Signed-off-by: Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Starovoitov <ast@kernel.org> Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Andrii Nakryiko <andrii@kernel.org> Cc: Daniel Borkmann <daniel@iogearbox.net> Cc: German Gomez <german.gomez@arm.com> Cc: James Clark <james.clark@arm.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Fastabend <john.fastabend@gmail.com> Cc: John Garry <john.garry@huawei.com> Cc: KP Singh <kpsingh@kernel.org> Cc: Kajol Jain <kjain@linux.ibm.com> Cc: Leo Yan <leo.yan@linaro.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Martin KaFai Lau <kafai@fb.com> Cc: Mathieu Poirier <mathieu.poirier@linaro.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Riccardo Mancini <rickyman7@gmail.com> Cc: Song Liu <songliubraving@fb.com> Cc: Stephane Eranian <eranian@google.com> Cc: Suzuki Poulouse <suzuki.poulose@arm.com> Cc: Will Deacon <will@kernel.org> Cc: Yonghong Song <yhs@fb.com> Cc: bpf@vger.kernel.org Cc: coresight@lists.linaro.org Cc: linux-arm-kernel@lists.infradead.org Cc: netdev@vger.kernel.org Link: http://lore.kernel.org/lkml/20220328232648.2127340-3-irogers@google.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2022-04-01 16:19:35 -03:00
Linus Torvalds	b8321ed4a4	Kbuild updates for v5.18 - Add new environment variables, USERCFLAGS and USERLDFLAGS to allow additional flags to be passed to user-space programs. - Fix missing fflush() bugs in Kconfig and fixdep - Fix a minor bug in the comment format of the .config file - Make kallsyms ignore llvm's local labels, .L* - Fix UAPI compile-test for cross-compiling with Clang - Extend the LLVM= syntax to support LLVM=<suffix> form for using a particular version of LLVm, and LLVM=<prefix> form for using custom LLVM in a particular directory path. - Clean up Makefiles -----BEGIN PGP SIGNATURE----- iQJJBAABCgAzFiEEbmPs18K1szRHjPqEPYsBB53g2wYFAmJFGloVHG1hc2FoaXJv eUBrZXJuZWwub3JnAAoJED2LAQed4NsGH0kP/j6Vx5BqEv3tP2Q+UANxLqITleJs IFpbSesz/BhlG7I/IapWmCDSqFbYd5uJTO4ko8CsPmZHcxr6Gw3y+DN5yQACKaG/ p9xiF6GjPyKR8+VdcT2tV50+dVY8ANe/DxCyzKrJd/uyYxgARPKJh0KRMNz+d9lj ixUpCXDhx/XlKzPIlcxrvhhjevKz+NnHmN0fe6rzcOw9KzBGBTsf20Q3PqUuBOKa rWHsRGcBPA8eKLfWT1Us1jjic6cT2g4aMpWjF20YgUWKHgWVKcNHpxYKGXASVo/z ewdDnNfmwo7f7fKMCDDro9iwFWV/BumGtn43U00tnqdBcTpFojPlEOga37UPbZDF nmTblGVUhR0vn4PmfBy8WkAkbW+IpVatKwJGV4J3KjSvdWvZOmVj9VUGLVAR0TXW /YcgRs6EtG8Hn0IlCj0fvZ5wRWoDLbP2DSZ67R/44EP0GaNQPwUe4FI1izEE4EYX oVUAIxcKixWGj4RmdtmtMMdUcZzTpbgS9uloMUmS3u9LK0Ir/8tcWaf2zfMO6Jl2 p4Q31s1dUUKCnFnj0xDKRyKGUkxYebrHLfuBqi0RIc0xRpSlxoXe3Dynm9aHEQoD ZSV0eouQJxnaxM1ck5Bu4AHLgEebHfEGjWVyUHno7jFU5EI9Wpbqpe4pCYEEDTm1 +LJMEpdZO0dFvpF+ =84rW -----END PGP SIGNATURE----- Merge tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild Pull Kbuild updates from Masahiro Yamada: - Add new environment variables, USERCFLAGS and USERLDFLAGS to allow additional flags to be passed to user-space programs. - Fix missing fflush() bugs in Kconfig and fixdep - Fix a minor bug in the comment format of the .config file - Make kallsyms ignore llvm's local labels, .L* - Fix UAPI compile-test for cross-compiling with Clang - Extend the LLVM= syntax to support LLVM=<suffix> form for using a particular version of LLVm, and LLVM=<prefix> form for using custom LLVM in a particular directory path. - Clean up Makefiles * tag 'kbuild-v5.18-v2' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild: kbuild: Make $(LLVM) more flexible kbuild: add --target to correctly cross-compile UAPI headers with Clang fixdep: use fflush() and ferror() to ensure successful write to files arch: syscalls: simplify uapi/kapi directory creation usr/include: replace extra-y with always-y certs: simplify empty certs creation in certs/Makefile certs: include certs/signing_key.x509 unconditionally kallsyms: ignore all local labels prefixed by '.L' kconfig: fix missing '# end of' for empty menu kconfig: add fflush() before ferror() check kbuild: replace $(if A,A,B) with $(or A,B) kbuild: Add environment variables for userprogs flags kbuild: unify cmd_copy and cmd_shipped	2022-03-31 11:59:03 -07:00
Linus Torvalds	7b58b82b86	perf tools changes for v5.18: 1st batch New features: perf ftrace: - Add -n/--use-nsec option to the 'latency' subcommand. Default: usecs: $ sudo perf ftrace latency -T dput -a sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 1 us \| 2098375 \| ############################# \| 1 - 2 us \| 61 \| \| 2 - 4 us \| 33 \| \| 4 - 8 us \| 13 \| \| 8 - 16 us \| 124 \| \| 16 - 32 us \| 123 \| \| 32 - 64 us \| 1 \| \| 64 - 128 us \| 0 \| \| 128 - 256 us \| 1 \| \| 256 - 512 us \| 0 \| \| Better granularity with nsec: $ sudo perf ftrace latency -T dput -a -n sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 1 us \| 0 \| \| 1 - 2 ns \| 0 \| \| 2 - 4 ns \| 0 \| \| 4 - 8 ns \| 0 \| \| 8 - 16 ns \| 0 \| \| 16 - 32 ns \| 0 \| \| 32 - 64 ns \| 0 \| \| 64 - 128 ns \| 1163434 \| ############## \| 128 - 256 ns \| 914102 \| ############# \| 256 - 512 ns \| 884 \| \| 512 - 1024 ns \| 613 \| \| 1 - 2 us \| 31 \| \| 2 - 4 us \| 17 \| \| 4 - 8 us \| 7 \| \| 8 - 16 us \| 123 \| \| 16 - 32 us \| 83 \| \| perf lock: - Add -c/--combine-locks option to merge lock instances in the same class into a single entry. # perf lock report -c Name acquired contended avg wait(ns) total wait(ns) max wait(ns) min wait(ns) rcu_read_lock 251225 0 0 0 0 0 hrtimer_bases.lock 39450 0 0 0 0 0 &sb->s_type->i_l... 10301 1 662 662 662 662 ptlock_ptr(page) 10173 2 701 1402 760 642 &(ei->i_block_re... 8732 0 0 0 0 0 &xa->xa_lock 8088 0 0 0 0 0 &base->lock 6705 0 0 0 0 0 &p->pi_lock 5549 0 0 0 0 0 &dentry->d_lockr... 5010 4 1274 5097 1844 789 &ep->lock 3958 0 0 0 0 0 - Add -F/--field option to customize the list of fields to output: $ perf lock report -F contended,wait_max -k avg_wait Name contended max wait(ns) avg wait(ns) slock-AF_INET6 1 23543 23543 &lruvec->lru_lock 5 18317 11254 slock-AF_INET6 1 10379 10379 rcu_node_1 1 2104 2104 &dentry->d_lockr... 1 1844 1844 &dentry->d_lockr... 1 1672 1672 &newf->file_lock 15 2279 1025 &dentry->d_lockr... 1 792 792 - Add --synth=no option for record, as there is no need to symbolize, lock names comes from the tracepoints. perf record: - Threaded recording, opt-in, via the new --threads command line option. - Improve AMD IBS (Instruction-Based Sampling) error handling messages. perf script: - Add 'brstackinsnlen' field (use it with -F) for branch stacks. - Output branch sample type in 'perf script'. perf report: - Add "addr_from" and "addr_to" sort dimensions. - Print branch stack entry type in 'perf report --dump-raw-trace' - Fix symbolization for chrooted workloads. Hardware tracing: Intel PT: - Add CFE (Control Flow Event) and EVD (Event Data) packets support. - Add MODE.Exec IFLAG bit support. Explanation about these features from the "Intel® 64 and IA-32 architectures software developer’s manual combined volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4" PDF at: https://cdrdv2.intel.com/v1/dl/getContent/671200 At page 3951: <quote> 32.2.4 Event Trace is a capability that exposes details about the asynchronous events, when they are generated, and when their corresponding software event handler completes execution. These include: o Interrupts, including NMI and SMI, including the interrupt vector when defined. o Faults, exceptions including the fault vector. — Page faults additionally include the page fault address, when in context. o Event handler returns, including IRET and RSM. o VM exits and VM entries.¹ — VM exits include the values written to the “exit reason” and “exit qualification” VMCS fields. INIT and SIPI events. o TSX aborts, including the abort status returned for the RTM instructions. o Shutdown. Additionally, it provides indication of the status of the Interrupt Flag (IF), to indicate when interrupts are masked. </quote> ARM CoreSight: - Use advertised caps/min_interval as default sample_period on ARM spe. - Update deduction of TRCCONFIGR register for branch broadcast on ARM's CoreSight ETM. Vendor Events (JSON): Intel: - Update events and metrics for: Alderlake, Broadwell, Broadwell DE, BroadwellX, CascadelakeX, Elkhartlake, Bonnell, Goldmont, GoldmontPlus, Westmere EP-DP, Haswell, HaswellX, Icelake, IcelakeX, Ivybridge, Ivytown, Jaketown, Knights Landing, Nehalem EP, Sandybridge, Silvermont, Skylake, Skylake Server, SkylakeX, Tigerlake, TremontX, Westmere EP-SP, Westmere EX. ARM: - Add support for HiSilicon CPA PMU aliasing. perf stat: - Fix forked applications enablement of counters. - The 'slots' should only be printed on a different order than the one specified on the command line when 'topdown' events are present, fix it. Miscellaneous: - Sync msr-index, cpufeatures header files with the kernel sources. - Stop using some deprecated libbpf APIs in 'perf trace'. - Fix some spelling mistakes. - Refactor the maps pointers usage to pave the way for using refcount debugging. - Only offer the --tui option on perf top, report and annotate when perf was built with libslang. - Don't mention --to-ctf in 'perf data --help' when not linking with the required library, libbabeltrace. - Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by array_size.cocci. - Enhance the matching of sub-commands abbreviations: 'perf c2c rec' -> 'perf c2c record' 'perf c2c recport -> error - Set build-id using build-id header on new mmap records. - Fix generation of 'perf --version' string. perf test: - Add test for the arm_spe event. - Add test to check unwinding using fame-pointer (fp) mode on arm64. - Make metric testing more robust in 'perf test'. - Add error message for unsupported branch stack cases. libperf: - Add API for allocating new thread map array. - Fix typo in perf_evlist__open() failure error messages in libperf tests. perf c2c: - Replace bitmap_weight() with bitmap_empty() where appropriate. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> -----BEGIN PGP SIGNATURE----- iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCYj8viwAKCRCyPKLppCJ+ J8K3AQDpN45P4/TWJxVWhZlvYzJtWDSboXHZJfmBiEd4Xu2zbwD7BFW02f1ATHPr dGBFXxRQQufBIqfE+OQXG59Awp1m8wE= =1l8S -----END PGP SIGNATURE----- Merge tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux Pull perf tools updates from Arnaldo Carvalho de Melo: "New features: perf ftrace: - Add -n/--use-nsec option to the 'latency' subcommand. Default: usecs: $ sudo perf ftrace latency -T dput -a sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 1 us \| 2098375 \| ############################# \| 1 - 2 us \| 61 \| \| 2 - 4 us \| 33 \| \| 4 - 8 us \| 13 \| \| 8 - 16 us \| 124 \| \| 16 - 32 us \| 123 \| \| 32 - 64 us \| 1 \| \| 64 - 128 us \| 0 \| \| 128 - 256 us \| 1 \| \| 256 - 512 us \| 0 \| \| Better granularity with nsec: $ sudo perf ftrace latency -T dput -a -n sleep 1 # DURATION \| COUNT \| GRAPH \| 0 - 1 us \| 0 \| \| 1 - 2 ns \| 0 \| \| 2 - 4 ns \| 0 \| \| 4 - 8 ns \| 0 \| \| 8 - 16 ns \| 0 \| \| 16 - 32 ns \| 0 \| \| 32 - 64 ns \| 0 \| \| 64 - 128 ns \| 1163434 \| ############## \| 128 - 256 ns \| 914102 \| ############# \| 256 - 512 ns \| 884 \| \| 512 - 1024 ns \| 613 \| \| 1 - 2 us \| 31 \| \| 2 - 4 us \| 17 \| \| 4 - 8 us \| 7 \| \| 8 - 16 us \| 123 \| \| 16 - 32 us \| 83 \| \| perf lock: - Add -c/--combine-locks option to merge lock instances in the same class into a single entry. # perf lock report -c Name acquired contended avg wait(ns) total wait(ns) max wait(ns) min wait(ns) rcu_read_lock 251225 0 0 0 0 0 hrtimer_bases.lock 39450 0 0 0 0 0 &sb->s_type->i_l... 10301 1 662 662 662 662 ptlock_ptr(page) 10173 2 701 1402 760 642 &(ei->i_block_re... 8732 0 0 0 0 0 &xa->xa_lock 8088 0 0 0 0 0 &base->lock 6705 0 0 0 0 0 &p->pi_lock 5549 0 0 0 0 0 &dentry->d_lockr... 5010 4 1274 5097 1844 789 &ep->lock 3958 0 0 0 0 0 - Add -F/--field option to customize the list of fields to output: $ perf lock report -F contended,wait_max -k avg_wait Name contended max wait(ns) avg wait(ns) slock-AF_INET6 1 23543 23543 &lruvec->lru_lock 5 18317 11254 slock-AF_INET6 1 10379 10379 rcu_node_1 1 2104 2104 &dentry->d_lockr... 1 1844 1844 &dentry->d_lockr... 1 1672 1672 &newf->file_lock 15 2279 1025 &dentry->d_lockr... 1 792 792 - Add --synth=no option for record, as there is no need to symbolize, lock names comes from the tracepoints. perf record: - Threaded recording, opt-in, via the new --threads command line option. - Improve AMD IBS (Instruction-Based Sampling) error handling messages. perf script: - Add 'brstackinsnlen' field (use it with -F) for branch stacks. - Output branch sample type in 'perf script'. perf report: - Add "addr_from" and "addr_to" sort dimensions. - Print branch stack entry type in 'perf report --dump-raw-trace' - Fix symbolization for chrooted workloads. Hardware tracing: Intel PT: - Add CFE (Control Flow Event) and EVD (Event Data) packets support. - Add MODE.Exec IFLAG bit support. Explanation about these features from the "Intel® 64 and IA-32 architectures software developer’s manual combined volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D, and 4" PDF at: https://cdrdv2.intel.com/v1/dl/getContent/671200 At page 3951: "32.2.4 Event Trace is a capability that exposes details about the asynchronous events, when they are generated, and when their corresponding software event handler completes execution. These include: o Interrupts, including NMI and SMI, including the interrupt vector when defined. o Faults, exceptions including the fault vector. - Page faults additionally include the page fault address, when in context. o Event handler returns, including IRET and RSM. o VM exits and VM entries.¹ - VM exits include the values written to the “exit reason” and “exit qualification” VMCS fields. INIT and SIPI events. o TSX aborts, including the abort status returned for the RTM instructions. o Shutdown. Additionally, it provides indication of the status of the Interrupt Flag (IF), to indicate when interrupts are masked" ARM CoreSight: - Use advertised caps/min_interval as default sample_period on ARM spe. - Update deduction of TRCCONFIGR register for branch broadcast on ARM's CoreSight ETM. Vendor Events (JSON): Intel: - Update events and metrics for: Alderlake, Broadwell, Broadwell DE, BroadwellX, CascadelakeX, Elkhartlake, Bonnell, Goldmont, GoldmontPlus, Westmere EP-DP, Haswell, HaswellX, Icelake, IcelakeX, Ivybridge, Ivytown, Jaketown, Knights Landing, Nehalem EP, Sandybridge, Silvermont, Skylake, Skylake Server, SkylakeX, Tigerlake, TremontX, Westmere EP-SP, and Westmere EX. ARM: - Add support for HiSilicon CPA PMU aliasing. perf stat: - Fix forked applications enablement of counters. - The 'slots' should only be printed on a different order than the one specified on the command line when 'topdown' events are present, fix it. Miscellaneous: - Sync msr-index, cpufeatures header files with the kernel sources. - Stop using some deprecated libbpf APIs in 'perf trace'. - Fix some spelling mistakes. - Refactor the maps pointers usage to pave the way for using refcount debugging. - Only offer the --tui option on perf top, report and annotate when perf was built with libslang. - Don't mention --to-ctf in 'perf data --help' when not linking with the required library, libbabeltrace. - Use ARRAY_SIZE() instead of ad hoc equivalent, spotted by array_size.cocci. - Enhance the matching of sub-commands abbreviations: 'perf c2c rec' -> 'perf c2c record' 'perf c2c recport -> error - Set build-id using build-id header on new mmap records. - Fix generation of 'perf --version' string. perf test: - Add test for the arm_spe event. - Add test to check unwinding using fame-pointer (fp) mode on arm64. - Make metric testing more robust in 'perf test'. - Add error message for unsupported branch stack cases. libperf: - Add API for allocating new thread map array. - Fix typo in perf_evlist__open() failure error messages in libperf tests. perf c2c: - Replace bitmap_weight() with bitmap_empty() where appropriate" * tag 'perf-tools-for-v5.18-2022-03-26' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (143 commits) perf evsel: Improve AMD IBS (Instruction-Based Sampling) error handling messages perf python: Add perf_env stubs that will be needed in evsel__open_strerror() perf tools: Enhance the matching of sub-commands abbreviations libperf tests: Fix typo in perf_evlist__open() failure error messages tools arm64: Import cputype.h perf lock: Add -F/--field option to control output perf lock: Extend struct lock_key to have print function perf lock: Add --synth=no option for record tools headers cpufeatures: Sync with the kernel sources tools headers cpufeatures: Sync with the kernel sources perf stat: Fix forked applications enablement of counters tools arch x86: Sync the msr-index.h copy with the kernel sources perf evsel: Make evsel__env() always return a valid env perf build-id: Fix spelling mistake "Cant" -> "Can't" perf header: Fix spelling mistake "could't" -> "couldn't" perf script: Add 'brstackinsnlen' for branch stacks perf parse-events: Move slots only with topdown perf ftrace latency: Update documentation perf ftrace latency: Add -n/--use-nsec option perf tools: Fix version kernel tag ...	2022-03-27 13:42:32 -07:00
Linus Torvalds	02f9a04d76	memblock: test suite and a small cleanup * A small cleanup of unused variable in __next_mem_pfn_range_in_zone * Initial test suite to simulate memblock behaviour in userspace -----BEGIN PGP SIGNATURE----- iQFHBAABCAAxFiEEeOVYVaWZL5900a/pOQOGJssO/ZEFAmI9bD4THHJwcHRAbGlu dXguaWJtLmNvbQAKCRA5A4Ymyw79kXwhB/wNXR1wUb/eD3eKD+aNa2KMY5+8csjD ghJph8wQmM9U9hsLViv3/M/H5+bY/s0riZNulKYrcmzW2BgIzF2ebcoqgfQ89YGV bLx7lMJGxG/lCglur9m6KnOF89//Owq6Vfk7Jd6jR/F+43JO/3+5siCbTo6NrbVw 3DjT/WzvaICA646foyFTh8WotnIRbB2iYX1k/vIA3gwJ2C6n7WwoKzxU3ulKMUzg hVlhcuTVnaV4mjFBbl23wC7i4l9dgPO9M4ZrTtlEsNHeV6uoFYRObwy6/q/CsBqI avwgV0bQDch+QuCteUXcqIcnBpcUAfGxgiqp2PYX4lXA4gYTbo7plTna =IemP -----END PGP SIGNATURE----- Merge tag 'memblock-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock Pull memblock updates from Mike Rapoport: "Test suite and a small cleanup: - A small cleanup of unused variable in __next_mem_pfn_range_in_zone - Initial test suite to simulate memblock behaviour in userspace" * tag 'memblock-v5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rppt/memblock: (27 commits) memblock tests: Add TODO and README files memblock tests: Add memblock_alloc_try_nid tests for bottom up memblock tests: Add memblock_alloc_try_nid tests for top down memblock tests: Add memblock_alloc_from tests for bottom up memblock tests: Add memblock_alloc_from tests for top down memblock tests: Add memblock_alloc tests for bottom up memblock tests: Add memblock_alloc tests for top down memblock tests: Add simulation of physical memory memblock tests: Split up reset_memblock function memblock tests: Fix testing with 32-bit physical addresses memblock: __next_mem_pfn_range_in_zone: remove unneeded local variable nid memblock tests: Add memblock_free tests memblock tests: Add memblock_add_node test memblock tests: Add memblock_remove tests memblock tests: Add memblock_reserve tests memblock tests: Add memblock_add tests memblock tests: Add memblock reset function memblock tests: Add skeleton of the memblock simulator tools/include: Add debugfs.h stub tools/include: Add pfn.h stub ...	2022-03-27 13:36:06 -07:00
Shunsuke Nakamura	c2eeac9856	libperf tests: Fix typo in perf_evlist__open() failure error messages This patch corrects typos in error messages. I should be "evlist", not "evsel" as the function that fails is perf_evlist__open(). Fixes: `3ce311afb5` ("libperf: Move to tools/lib/perf") Fixes: `a7f3713f6b` ("libperf tests: Add test_stat_multiplexing test") Signed-off-by: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220325043829.224045-2-nakamura.shun@fujitsu.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2022-03-26 10:55:57 -03:00
Linus Torvalds	169e77764a	Networking changes for 5.18. Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup Signed-off-by: Jakub Kicinski <kuba@kernel.org> -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEE6jPA+I1ugmIBA4hXMUZtbf5SIrsFAmI7YBcACgkQMUZtbf5S IrveSBAAmSNJlUK6vPsnNzs7IhsZnfI/AUjm2TCLZnlhKttbpI4A/4Pohk33V7RS FGX7f8kjEfhUwrIiLDgeCnztNHRECrCmk6aZc/jLEvecmTauJ+f6kjShkDY/wix+ AkPHmrZnQeLPAEVuljDdV+sL6ik08+zQL7PazIYHsaSKKC0MGQptRwcri8PLRAKE KPBAhVhleq2rAZ/ntprSN52F4Af6rpFTrPIWuN8Bqdbc9dy5094LT0mpOOWYvgr3 /DLvvAPuLemwyIQkjWknVKBRUAQcmNPC+BY3J8K3LRaiNhekGqOFan46BfqP+k2J 6DWu0Qrp2yWt4BMOeEToZR5rA6v5suUAMIBu8PRZIDkINXQMlIxHfGjZyNm0rVfw 7edNri966yus9OdzwPa32MIG3oC6PnVAwYCJAjjBMNS8sSIkp7wgHLkgWN4UFe2H K/e6z8TLF4UQ+zFM0aGI5WZ+9QqWkTWEDF3R3OhdFpGrznna0gxmkOeV2YvtsgxY cbS0vV9Zj73o+bYzgBKJsw/dAjyLdXoHUGvus26VLQ78S/VGunVKtItwoxBAYmZo krW964qcC89YofzSi8RSKLHuEWtNWZbVm8YXr75u6jpr5GhMBu0CYefLs+BuZcxy dw8c69cGneVbGZmY2J3rBhDkchbuICl8vdUPatGrOJAoaFdYKuw= =ELpe -----END PGP SIGNATURE----- Merge tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next Pull networking updates from Jakub Kicinski: "The sprinkling of SPI drivers is because we added a new one and Mark sent us a SPI driver interface conversion pull request. Core ---- - Introduce XDP multi-buffer support, allowing the use of XDP with jumbo frame MTUs and combination with Rx coalescing offloads (LRO). - Speed up netns dismantling (5x) and lower the memory cost a little. Remove unnecessary per-netns sockets. Scope some lists to a netns. Cut down RCU syncing. Use batch methods. Allow netdev registration to complete out of order. - Support distinguishing timestamp types (ingress vs egress) and maintaining them across packet scrubbing points (e.g. redirect). - Continue the work of annotating packet drop reasons throughout the stack. - Switch netdev error counters from an atomic to dynamically allocated per-CPU counters. - Rework a few preempt_disable(), local_irq_save() and busy waiting sections problematic on PREEMPT_RT. - Extend the ref_tracker to allow catching use-after-free bugs. BPF --- - Introduce "packing allocator" for BPF JIT images. JITed code is marked read only, and used to be allocated at page granularity. Custom allocator allows for more efficient memory use, lower iTLB pressure and prevents identity mapping huge pages from getting split. - Make use of BTF type annotations (e.g. __user, __percpu) to enforce the correct probe read access method, add appropriate helpers. - Convert the BPF preload to use light skeleton and drop the user-mode-driver dependency. - Allow XDP BPF_PROG_RUN test infra to send real packets, enabling its use as a packet generator. - Allow local storage memory to be allocated with GFP_KERNEL if called from a hook allowed to sleep. - Introduce fprobe (multi kprobe) to speed up mass attachment (arch bits to come later). - Add unstable conntrack lookup helpers for BPF by using the BPF kfunc infra. - Allow cgroup BPF progs to return custom errors to user space. - Add support for AF_UNIX iterator batching. - Allow iterator programs to use sleepable helpers. - Support JIT of add, and, or, xor and xchg atomic ops on arm64. - Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info. - Large number of libbpf API improvements, cleanups and deprecations. Protocols --------- - Micro-optimize UDPv6 Tx, gaining up to 5% in test on dummy netdev. - Adjust TSO packet sizes based on min_rtt, allowing very low latency links (data centers) to always send full-sized TSO super-frames. - Make IPv6 flow label changes (AKA hash rethink) more configurable, via sysctl and setsockopt. Distinguish between server and client behavior. - VxLAN support to "collect metadata" devices to terminate only configured VNIs. This is similar to VLAN filtering in the bridge. - Support inserting IPv6 IOAM information to a fraction of frames. - Add protocol attribute to IP addresses to allow identifying where given address comes from (kernel-generated, DHCP etc.) - Support setting socket and IPv6 options via cmsg on ping6 sockets. - Reject mis-use of ECN bits in IP headers as part of DSCP/TOS. Define dscp_t and stop taking ECN bits into account in fib-rules. - Add support for locked bridge ports (for 802.1X). - tun: support NAPI for packets received from batched XDP buffs, doubling the performance in some scenarios. - IPv6 extension header handling in Open vSwitch. - Support IPv6 control message load balancing in bonding, prevent neighbor solicitation and advertisement from using the wrong port. Support NS/NA monitor selection similar to existing ARP monitor. - SMC - improve performance with TCP_CORK and sendfile() - support auto-corking - support TCP_NODELAY - MCTP (Management Component Transport Protocol) - add user space tag control interface - I2C binding driver (as specified by DMTF DSP0237) - Multi-BSSID beacon handling in AP mode for WiFi. - Bluetooth: - handle MSFT Monitor Device Event - add MGMT Adv Monitor Device Found/Lost events - Multi-Path TCP: - add support for the SO_SNDTIMEO socket option - lots of selftest cleanups and improvements - Increase the max PDU size in CAN ISOTP to 64 kB. Driver API ---------- - Add HW counters for SW netdevs, a mechanism for devices which offload packet forwarding to report packet statistics back to software interfaces such as tunnels. - Select the default NIC queue count as a fraction of number of physical CPU cores, instead of hard-coding to 8. - Expose devlink instance locks to drivers. Allow device layer of drivers to use that lock directly instead of creating their own which always runs into ordering issues in devlink callbacks. - Add header/data split indication to guide user space enabling of TCP zero-copy Rx. - Allow configuring completion queue event size. - Refactor page_pool to enable fragmenting after allocation. - Add allocation and page reuse statistics to page_pool. - Improve Multiple Spanning Trees support in the bridge to allow reuse of topologies across VLANs, saving HW resources in switches. - DSA (Distributed Switch Architecture): - replay and offload of host VLAN entries - offload of static and local FDB entries on LAG interfaces - FDB isolation and unicast filtering New hardware / drivers ---------------------- - Ethernet: - LAN937x T1 PHYs - Davicom DM9051 SPI NIC driver - Realtek RTL8367S, RTL8367RB-VB switch and MDIO - Microchip ksz8563 switches - Netronome NFP3800 SmartNICs - Fungible SmartNICs - MediaTek MT8195 switches - WiFi: - mt76: MediaTek mt7916 - mt76: MediaTek mt7921u USB adapters - brcmfmac: Broadcom BCM43454/6 - Mobile: - iosm: Intel M.2 7360 WWAN card Drivers ------- - Convert many drivers to the new phylink API built for split PCS designs but also simplifying other cases. - Intel Ethernet NICs: - add TTY for GNSS module for E810T device - improve AF_XDP performance - GTP-C and GTP-U filter offload - QinQ VLAN support - Mellanox Ethernet NICs (mlx5): - support xdp->data_meta - multi-buffer XDP - offload tc push_eth and pop_eth actions - Netronome Ethernet NICs (nfp): - flow-independent tc action hardware offload (police / meter) - AF_XDP - Other Ethernet NICs: - at803x: fiber and SFP support - xgmac: mdio: preamble suppression and custom MDC frequencies - r8169: enable ASPM L1.2 if system vendor flags it as safe - macb/gem: ZynqMP SGMII - hns3: add TX push mode - dpaa2-eth: software TSO - lan743x: multi-queue, mdio, SGMII, PTP - axienet: NAPI and GRO support - Mellanox Ethernet switches (mlxsw): - source and dest IP address rewrites - RJ45 ports - Marvell Ethernet switches (prestera): - basic routing offload - multi-chain TC ACL offload - NXP embedded Ethernet switches (ocelot & felix): - PTP over UDP with the ocelot-8021q DSA tagging protocol - basic QoS classification on Felix DSA switch using dcbnl - port mirroring for ocelot switches - Microchip high-speed industrial Ethernet (sparx5): - offloading of bridge port flooding flags - PTP Hardware Clock - Other embedded switches: - lan966x: PTP Hardward Clock - qca8k: mdio read/write operations via crafted Ethernet packets - Qualcomm 802.11ax WiFi (ath11k): - add LDPC FEC type and 802.11ax High Efficiency data in radiotap - enable RX PPDU stats in monitor co-exist mode - Intel WiFi (iwlwifi): - UHB TAS enablement via BIOS - band disablement via BIOS - channel switch offload - 32 Rx AMPDU sessions in newer devices - MediaTek WiFi (mt76): - background radar detection - thermal management improvements on mt7915 - SAR support for more mt76 platforms - MBSSID and 6 GHz band on mt7915 - RealTek WiFi: - rtw89: AP mode - rtw89: 160 MHz channels and 6 GHz band - rtw89: hardware scan - Bluetooth: - mt7921s: wake on Bluetooth, SCO over I2S, wide-band-speed (WBS) - Microchip CAN (mcp251xfd): - multiple RX-FIFOs and runtime configurable RX/TX rings - internal PLL, runtime PM handling simplification - improve chip detection and error handling after wakeup" * tag 'net-next-5.18' of git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next: (2521 commits) llc: fix netdevice reference leaks in llc_ui_bind() drivers: ethernet: cpsw: fix panic when interrupt coaleceing is set via ethtool ice: don't allow to run ice_send_event_to_aux() in atomic ctx ice: fix 'scheduling while atomic' on aux critical err interrupt net/sched: fix incorrect vlan_push_eth dest field net: bridge: mst: Restrict info size queries to bridge ports net: marvell: prestera: add missing destroy_workqueue() in prestera_module_init() drivers: net: xgene: Fix regression in CRC stripping net: geneve: add missing netlink policy and size for IFLA_GENEVE_INNER_PROTO_INHERIT net: dsa: fix missing host-filtered multicast addresses net/mlx5e: Fix build warning, detected write beyond size of field iwlwifi: mvm: Don't fail if PPAG isn't supported selftests/bpf: Fix kprobe_multi test. Revert "rethook: x86: Add rethook x86 implementation" Revert "arm64: rethook: Add arm64 rethook implementation" Revert "powerpc: Add rethook support" Revert "ARM: rethook: Add rethook arm implementation" netdevice: add missing dm_private kdoc net: bridge: mst: prevent NULL deref in br_mst_info_size() selftests: forwarding: Use same VRF for port and VLAN upper ...	2022-03-24 13:13:26 -07:00
Linus Torvalds	3ce62cf4dc	flexible-array transformations for 5.18-rc1 Hi Linus, Please, pull the following treewide patch that replaces zero-length arrays with flexible-array members. This patch has been baking in linux-next for a whole development cycle. Thanks -- Gustavo -----BEGIN PGP SIGNATURE----- iQIzBAABCAAdFiEEkmRahXBSurMIg1YvRwW0y0cG2zEFAmI6GIUACgkQRwW0y0cG 2zFLWw/+OB1gZeQD3boKpUMntWnn6wjhUxdrO8CYkpzG+B+8TFECXNjy8HV1CSiw GKKRndYELOyYaD5o/F2vtPe10iPHbrdIlMFRPBRoht0/cvSZgzHlfT8EjWQwerYY dieztUFKjeSj0MXivdNDnKOTm8o9cz8KmCrWFP+My37Fasn/9+nBX8iNVIvAX4xy T+IVmjtDifQUsTs298UGnBvDeuZOiGHhXXU5rq6lIX0Rl554OsWZW94d6jUPj/h7 t1v6jdojNuyaMKn45/xnPj9VvmDiSu3K67m3fjRdzLPDOhISjr2fw4KEUOKdsebh yJ9t5u8IufyPbm9kyI+rZt+T8ZlV2/qt2+mt6QgtDMnWrs+4nU15JY0SHImMSBZQ rBEZcQlrIcGJ+CsNB8Y7jIGYO0SSkhodAvfl0LRA0AbTqLGqq0OkAQS5D52r3H2r uz6xdYb7kG43XaRyaAIPqhZsp/jk2NrXvEvin2tSaXZFR1cxp+oxcV2UajmnOU6i EIBS4PzJnYx2RZRa+h8YbBa/+D4N6+fj/tjmwBawiUBPjjaLAsGFNwUHqvBoD05S bk6oXi654NBwVjsknZ0grVz0TtSvdZ3uJL5FZApTOHITqH8vlxlNefmHri4vZRZO NN7NIQ0yaUCnorzMg+vP8ZtflhQwrMJbjwIS9YD0RHd7MBhYX8k= =xZD2 -----END PGP SIGNATURE----- Merge tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux Pull flexible-array transformations from Gustavo Silva: "Treewide patch that replaces zero-length arrays with flexible-array members. This has been baking in linux-next for a whole development cycle" * tag 'flexible-array-transformations-5.18-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/gustavoars/linux: treewide: Replace zero-length arrays with flexible-array members	2022-03-24 11:39:32 -07:00
Hengqi Chen	d0f325c34c	libbpf: Close fd in bpf_object__reuse_map pin_fd is dup-ed and assigned in bpf_map__reuse_fd. Close it in bpf_object__reuse_map after reuse. Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220319030533.3132250-1-hengqi.chen@gmail.com	2022-03-21 15:36:52 +01:00
Andrii Nakryiko	a8fee96202	libbpf: Avoid NULL deref when initializing map BTF info If BPF object doesn't have an BTF info, don't attempt to search for BTF types describing BPF map key or value layout. Fixes: `262cfb74ff` ("libbpf: Init btf_{key,value}_type_id on internal map open") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220320001911.3640917-1-andrii@kernel.org	2022-03-20 18:53:04 -07:00
Delyan Kratunov	430025e5dc	libbpf: Add subskeleton scaffolding In symmetry with bpf_object__open_skeleton(), bpf_object__open_subskeleton() performs the actual walking and linking of maps, progs, and globals described by bpf_*_skeleton objects. Signed-off-by: Delyan Kratunov <delyank@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/6942a46fbe20e7ebf970affcca307ba616985b15.1647473511.git.delyank@fb.com	2022-03-17 23:11:16 -07:00
Delyan Kratunov	262cfb74ff	libbpf: Init btf_{key,value}_type_id on internal map open For internal and user maps, look up the key and value btf types on open() and not load(), so that `bpf_map_btf_value_type_id` is usable in `bpftool gen`. Signed-off-by: Delyan Kratunov <delyank@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/78dbe4e457b4a05e098fc6c8f50014b680c86e4e.1647473511.git.delyank@fb.com	2022-03-17 23:11:15 -07:00
Delyan Kratunov	bc380eb9d0	libbpf: .text routines are subprograms in strict mode Currently, libbpf considers a single routine in .text to be a program. This is particularly confusing when it comes to library objects - a single routine meant to be used as an extern will instead be considered a bpf_program. This patch hides this compatibility behavior behind the pre-existing SEC_NAME strict mode flag. Signed-off-by: Delyan Kratunov <delyank@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/018de8d0d67c04bf436055270d35d394ba393505.1647473511.git.delyank@fb.com	2022-03-17 23:11:15 -07:00
Jiri Olsa	ddc6b04989	libbpf: Add bpf_program__attach_kprobe_multi_opts function Adding bpf_program__attach_kprobe_multi_opts function for attaching kprobe program to multiple functions. struct bpf_link * bpf_program__attach_kprobe_multi_opts(const struct bpf_program prog, const char pattern, const struct bpf_kprobe_multi_opts opts); User can specify functions to attach with 'pattern' argument that allows wildcards (?' supported) or provide symbols or addresses directly through opts argument. These 3 options are mutually exclusive. When using symbols or addresses, user can also provide cookie value for each symbol/address that can be retrieved later in bpf program with bpf_get_attach_cookie helper. struct bpf_kprobe_multi_opts { size_t sz; const char *syms; const unsigned long addrs; const __u64 cookies; size_t cnt; bool retprobe; size_t :0; }; Symbols, addresses and cookies are provided through opts object (syms/addrs/cookies) as array pointers with specified count (cnt). Each cookie value is paired with provided function address or symbol with the same array index. The program can be also attached as return probe if 'retprobe' is set. For quick usage with NULL opts argument, like: bpf_program__attach_kprobe_multi_opts(prog, "ksys_", NULL) the 'prog' will be attached as kprobe to 'ksys_*' functions. Also adding new program sections for automatic attachment: kprobe.multi/<symbol_pattern> kretprobe.multi/<symbol_pattern> The symbol_pattern is used as 'pattern' argument in bpf_program__attach_kprobe_multi_opts function. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-10-jolsa@kernel.org	2022-03-17 20:17:19 -07:00
Jiri Olsa	5117c26e87	libbpf: Add bpf_link_create support for multi kprobes Adding new kprobe_multi struct to bpf_link_create_opts object to pass multiple kprobe data to link_create attr uapi. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-9-jolsa@kernel.org	2022-03-17 20:17:19 -07:00
Jiri Olsa	85153ac062	libbpf: Add libbpf_kallsyms_parse function Move the kallsyms parsing in internal libbpf_kallsyms_parse function, so it can be used from other places. It will be used in following changes. Signed-off-by: Jiri Olsa <jolsa@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220316122419.933957-8-jolsa@kernel.org	2022-03-17 20:17:19 -07:00
Toke Høiland-Jørgensen	24592ad1ab	libbpf: Support batch_size option to bpf_prog_test_run Add support for setting the new batch_size parameter to BPF_PROG_TEST_RUN to libbpf; just add it as an option and pass it through to the kernel. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20220309105346.100053-4-toke@redhat.com	2022-03-09 14:19:22 -08:00
Guo Zhengkui	04b6de649e	libbpf: Fix array_size.cocci warning Fix the following coccicheck warning: tools/lib/bpf/bpf.c:114:31-32: WARNING: Use ARRAY_SIZE tools/lib/bpf/xsk.c:484:34-35: WARNING: Use ARRAY_SIZE tools/lib/bpf/xsk.c:485:35-36: WARNING: Use ARRAY_SIZE It has been tested with gcc (Debian 8.3.0-6) 8.3.0 on x86_64. Signed-off-by: Guo Zhengkui <guozhengkui@vivo.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220306023426.19324-1-guozhengkui@vivo.com	2022-03-07 22:13:00 -08:00
lic121	9c6e6a80ee	libbpf: Unmap rings when umem deleted xsk_umem__create() does mmap for fill/comp rings, but xsk_umem__delete() doesn't do the unmap. This works fine for regular cases, because xsk_socket__delete() does unmap for the rings. But for the case that xsk_socket__create_shared() fails, umem rings are not unmapped. fill_save/comp_save are checked to determine if rings have already be unmapped by xsk. If fill_save and comp_save are NULL, it means that the rings have already been used by xsk. Then they are supposed to be unmapped by xsk_socket__delete(). Otherwise, xsk_umem__delete() does the unmap. Fixes: `2f6324a393` ("libbpf: Support shared umems between queues and devices") Signed-off-by: Cheng Li <lic121@chinatelecom.cn> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220301132623.GA19995@vscode.7~	2022-03-07 21:56:54 -08:00
Andrii Nakryiko	697f104db8	libbpf: Support custom SEC() handlers Allow registering and unregistering custom handlers for BPF program. This allows user applications and libraries to plug into libbpf's declarative SEC() definition handling logic. This allows to offload complex and intricate custom logic into external libraries, but still provide a great user experience. One such example is USDT handling library, which has a lot of code and complexity which doesn't make sense to put into libbpf directly, but it would be really great for users to be able to specify BPF programs with something like SEC("usdt/<path-to-binary>:<usdt_provider>:<usdt_name>") and have correct BPF program type set (BPF_PROGRAM_TYPE_KPROBE, as it is uprobe) and even support BPF skeleton's auto-attach logic. In some cases, it might be even good idea to override libbpf's default handling, like for SEC("perf_event") programs. With custom library, it's possible to extend logic to support specifying perf event specification right there in SEC() definition without burdening libbpf with lots of custom logic or extra library dependecies (e.g., libpfm4). With current patch it's possible to override libbpf's SEC("perf_event") handling and specify a completely custom ones. Further, it's possible to specify a generic fallback handling for any SEC() that doesn't match any other custom or standard libbpf handlers. This allows to accommodate whatever legacy use cases there might be, if necessary. See doc comments for libbpf_register_prog_handler() and libbpf_unregister_prog_handler() for detailed semantics. This patch also bumps libbpf development version to v0.8 and adds new APIs there. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20220305010129.1549719-3-andrii@kernel.org	2022-03-05 09:38:15 -08:00
Andrii Nakryiko	4fa5bcfe07	libbpf: Allow BPF program auto-attach handlers to bail out Allow some BPF program types to support auto-attach only in subste of cases. Currently, if some BPF program type specifies attach callback, it is assumed that during skeleton attach operation all such programs either successfully attach or entire skeleton attachment fails. If some program doesn't support auto-attachment from skeleton, such BPF program types shouldn't have attach callback specified. This is limiting for cases when, depending on how full the SEC("") definition is, there could either be enough details to support auto-attach or there might not be and user has to use some specific API to provide more details at runtime. One specific example of such desired behavior might be SEC("uprobe"). If it's specified as just uprobe auto-attach isn't possible. But if it's SEC("uprobe/<some_binary>:<some_func>") then there are enough details to support auto-attach. Note that there is a somewhat subtle difference between auto-attach behavior of BPF skeleton and using "generic" bpf_program__attach(prog) (which uses the same attach handlers under the cover). Skeleton allow some programs within bpf_object to not have auto-attach implemented and doesn't treat that as an error. Instead such BPF programs are just skipped during skeleton's (optional) attach step. bpf_program__attach(), on the other hand, is called when user expects auto-attach to work, so if specified program doesn't implement or doesn't support auto-attach functionality, that will be treated as an error. Another improvement to the way libbpf is handling SEC()s would be to not require providing dummy kernel function name for kprobe. Currently, SEC("kprobe/whatever") is necessary even if actual kernel function is determined by user at runtime and bpf_program__attach_kprobe() is used to specify it. With changes in this patch, it's possible to support both SEC("kprobe") and SEC("kprobe/<actual_kernel_function"), while only in the latter case auto-attach will be performed. In the former one, such kprobe will be skipped during skeleton attach operation. Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Alan Maguire <alan.maguire@oracle.com> Reviewed-by: Alan Maguire <alan.maguire@oracle.com> Link: https://lore.kernel.org/bpf/20220305010129.1549719-2-andrii@kernel.org	2022-03-05 09:38:15 -08:00
Yuntao Wang	41332d6e3a	libbpf: Add a check to ensure that page_cnt is non-zero The page_cnt parameter is used to specify the number of memory pages allocated for each per-CPU buffer, it must be non-zero and a power of 2. Currently, the __perf_buffer__new() function attempts to validate that the page_cnt is a power of 2 but forgets checking for the case where page_cnt is zero, we can fix it by replacing 'page_cnt & (page_cnt - 1)' with 'page_cnt == 0 \|\| (page_cnt & (page_cnt - 1))'. If so, we also don't need to add a check in perf_buffer__new_v0_6_0() to make sure that page_cnt is non-zero and the check for zero in perf_buffer__new_raw_v0_6_0() can also be removed. The code will be cleaner and more readable. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220303005921.53436-1-ytcoode@gmail.com	2022-03-03 16:23:22 +01:00
Xu Kuohai	4226961b00	libbpf: Skip forward declaration when counting duplicated type names Currently if a declaration appears in the BTF before the definition, the definition is dumped as a conflicting name, e.g.: $ bpftool btf dump file vmlinux format raw \| grep "'unix_sock'" [81287] FWD 'unix_sock' fwd_kind=struct [89336] STRUCT 'unix_sock' size=1024 vlen=14 $ bpftool btf dump file vmlinux format c \| grep "struct unix_sock" struct unix_sock; struct unix_sock___2 { <--- conflict, the "___2" is unexpected struct unix_sock___2 *unix_sk; This causes a compilation error if the dump output is used as a header file. Fix it by skipping declaration when counting duplicated type names. Fixes: `351131b51c` ("libbpf: add btf_dump API for BTF-to-C conversion") Signed-off-by: Xu Kuohai <xukuohai@huawei.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220301053250.1464204-2-xukuohai@huawei.com	2022-03-01 13:40:57 +01:00
Stijn Tintel	a4fbfdd7a1	libbpf: Fix BPF_MAP_TYPE_PERF_EVENT_ARRAY auto-pinning When a BPF map of type BPF_MAP_TYPE_PERF_EVENT_ARRAY doesn't have the max_entries parameter set, the map will be created with max_entries set to the number of available CPUs. When we try to reuse such a pinned map, map_is_reuse_compat will return false, as max_entries in the map definition differs from max_entries of the existing map, causing the following error: libbpf: couldn't reuse pinned map at '/sys/fs/bpf/m_logging': parameter mismatch Fix this by overwriting max_entries in the map definition. For this to work, we need to do this in bpf_object__create_maps, before calling bpf_object__reuse_map. Fixes: `57a00f4164` ("libbpf: Add auto-pinning of maps when loading BPF objects") Signed-off-by: Stijn Tintel <stijn@linux-ipv6.be> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Song Liu <songliubraving@fb.com> Link: https://lore.kernel.org/bpf/20220225152355.315204-1-stijn@linux-ipv6.be	2022-02-28 17:20:52 +01:00
Yuntao Wang	08894d9c64	libbpf: Simplify the find_elf_sec_sz() function The check in the last return statement is unnecessary, we can just return the ret variable. But we can simplify the function further by returning 0 immediately if we find the section size and -ENOENT otherwise. Thus we can also remove the ret variable. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220223085244.3058118-1-ytcoode@gmail.com	2022-02-23 14:53:21 -08:00
Tzvetomir Stoyanov (VMware)	56dce86819	libperf: Add API for allocating new thread map array The existing API perf_thread_map__new_dummy() allocates new thread map for one thread. I couldn't find a way to reallocate the map with more threads, or to allocate a new map for more than one thread. Having multiple threads in a thread map is essential for some use cases. That's why a new API is proposed, which allocates a new thread map for given number of threads: perf_thread_map__new_array() Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/linux-perf-users/20220221102628.43904-1-tz.stoyanov@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2022-02-23 14:40:23 -03:00
Tzvetomir Stoyanov (VMware)	41415b8a97	libperf: Rename arguments of perf_thread_map APIs The "int thread" input arguments of some perf_thead_map APIs are index of the thread in the thread map. In order to avoid confusion and to make the APIs consistent with perf_cpu_map APIs, those arguments are renamed to "int idx". Suggested-by: Arnaldo Carvalho de Melo <acme@kernel.org> Signed-off-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Ian Rogers <irogers@google.com> Link: https://lore.kernel.org/r/20220221102612.43879-1-tz.stoyanov@gmail.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2022-02-23 14:39:29 -03:00
Yuntao Wang	6966d4c442	libbpf: Remove redundant check in btf_fixup_datasec() The check 't->size && t->size != size' is redundant because if t->size compares unequal to 0, we will just skip straight to sorting variables. Signed-off-by: Yuntao Wang <ytcoode@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220220072750.209215-1-ytcoode@gmail.com	2022-02-22 10:50:54 -08:00
Karolina Drobnik	aa0eab8639	tools: Move gfp.h and slab.h from radix-tree to lib Merge radix-tree definitions from gfp.h and slab.h with these in tools/lib, so they can be used in other test suites. Fix style issues in slab.h. Update radix-tree test files. Signed-off-by: Karolina Drobnik <karolinadrobnik@gmail.com> Signed-off-by: Mike Rapoport <rppt@kernel.org> Link: https://lore.kernel.org/r/b76ddb8a12fdf9870b55c1401213e44f5e0d0da3.1643796665.git.karolinadrobnik@gmail.com	2022-02-20 08:44:37 +02:00
Jakub Kicinski	a3fc4b1d09	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== bpf-next 2022-02-17 We've added 29 non-merge commits during the last 8 day(s) which contain a total of 34 files changed, 1502 insertions(+), 524 deletions(-). The main changes are: 1) Add BTFGen support to bpftool which allows to use CO-RE in kernels without BTF info, from Mauricio Vásquez, Rafael David Tinoco, Lorenzo Fontana and Leonardo Di Donato. (Details: https://lpc.events/event/11/contributions/948/) 2) Prepare light skeleton to be used in both kernel module and user space and convert bpf_preload.ko to use light skeleton, from Alexei Starovoitov. 3) Rework bpftool's versioning scheme and align with libbpf's version number; also add linked libbpf version info to "bpftool version", from Quentin Monnet. 4) Add minimal C++ specific additions to bpftool's skeleton codegen to facilitate use of C skeletons in C++ applications, from Andrii Nakryiko. 5) Add BPF verifier sanity check whether relative offset on kfunc calls overflows desc->imm and reject the BPF program if the case, from Hou Tao. 6) Fix libbpf to use a dynamically allocated buffer for netlink messages to avoid receiving truncated messages on some archs, from Toke Høiland-Jørgensen. 7) Various follow-up fixes to the JIT bpf_prog_pack allocator, from Song Liu. 8) Various BPF selftest and vmtest.sh fixes, from Yucong Sun. 9) Fix bpftool pretty print handling on dumping map keys/values when no BTF is available, from Jiri Olsa and Yinjun Zhang. 10) Extend XDP frags selftest to check for invalid length, from Lorenzo Bianconi. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (29 commits) bpf: bpf_prog_pack: Set proper size before freeing ro_header selftests/bpf: Fix crash in core_reloc when bpftool btfgen fails selftests/bpf: Fix vmtest.sh to launch smp vm. libbpf: Fix memleak in libbpf_netlink_recv() bpftool: Fix C++ additions to skeleton bpftool: Fix pretty print dump for maps without BTF loaded selftests/bpf: Test "bpftool gen min_core_btf" bpftool: Gen min_core_btf explanation and examples bpftool: Implement btfgen_get_btf() bpftool: Implement "gen min_core_btf" logic bpftool: Add gen min_core_btf command libbpf: Expose bpf_core_{add,free}_cands() to bpftool libbpf: Split bpf_core_apply_relo() bpf: Reject kfunc calls that overflow insn->imm selftests/bpf: Add Skeleton templated wrapper as an example bpftool: Add C++-specific open/load/etc skeleton wrappers selftests/bpf: Fix GCC11 compiler warnings in -O2 mode bpftool: Fix the error when lookup in no-btf maps libbpf: Use dynamically allocated buffer when receiving netlink messages libbpf: Fix libbpf.map inheritance chain for LIBBPF_0.7.0 ... ==================== Link: https://lore.kernel.org/r/20220217232027.29831-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-17 17:23:52 -08:00
Arnaldo Carvalho de Melo	859f7e4554	Merge remote-tracking branch 'torvalds/master' into perf/core To pick up fixes from perf/urgent that recently got merged. Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2022-02-17 18:40:54 -03:00
Jakub Kicinski	6b5567b1b2	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-17 11:44:20 -08:00
Andrii Nakryiko	1b8c924a05	libbpf: Fix memleak in libbpf_netlink_recv() Ensure that libbpf_netlink_recv() frees dynamically allocated buffer in all code paths. Fixes: `9c3de619e1` ("libbpf: Use dynamically allocated buffer when receiving netlink messages") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20220217073958.276959-1-andrii@kernel.org	2022-02-17 16:09:07 +01:00
Gustavo A. R. Silva	5224f79096	treewide: Replace zero-length arrays with flexible-array members There is a regular need in the kernel to provide a way to declare having a dynamically sized set of trailing elements in a structure. Kernel code should always use “flexible array members”[1] for these cases. The older style of one-element or zero-length arrays should no longer be used[2]. This code was transformed with the help of Coccinelle: (next-20220214$ spatch --jobs $(getconf _NPROCESSORS_ONLN) --sp-file script.cocci --include-headers --dir . > output.patch) @@ identifier S, member, array; type T1, T2; @@ struct S { ... T1 member; T2 array[ - 0 ]; }; UAPI and wireless changes were intentionally excluded from this patch and will be sent out separately. [1] https://en.wikipedia.org/wiki/Flexible_array_member [2] https://www.kernel.org/doc/html/v5.16/process/deprecated.html#zero-length-and-one-element-arrays Link: https://github.com/KSPP/linux/issues/78 Reviewed-by: Kees Cook <keescook@chromium.org> Signed-off-by: Gustavo A. R. Silva <gustavoars@kernel.org>	2022-02-17 07:00:39 -06:00
Mauricio Vásquez	8de6cae40b	libbpf: Expose bpf_core_{add,free}_cands() to bpftool Expose bpf_core_add_cands() and bpf_core_free_cands() to handle candidates list. Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com> Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co> Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220215225856.671072-3-mauricio@kinvolk.io	2022-02-16 10:05:45 -08:00
Mauricio Vásquez	adb8fa195e	libbpf: Split bpf_core_apply_relo() BTFGen needs to run the core relocation logic in order to understand what are the types involved in a given relocation. Currently bpf_core_apply_relo() calculates and applies a relocation to an instruction. Having both operations in the same function makes it difficult to only calculate the relocation without patching the instruction. This commit splits that logic in two different phases: (1) calculate the relocation and (2) patch the instruction. For the first phase bpf_core_apply_relo() is renamed to bpf_core_calc_relo_insn() who is now only on charge of calculating the relocation, the second phase uses the already existing bpf_core_patch_insn(). bpf_object__relocate_core() uses both of them and the BTFGen will use only bpf_core_calc_relo_insn(). Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Rafael David Tinoco <rafael.tinoco@aquasec.com> Signed-off-by: Lorenzo Fontana <lorenzo.fontana@elastic.co> Signed-off-by: Leonardo Di Donato <leonardo.didonato@elastic.co> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220215225856.671072-2-mauricio@kinvolk.io	2022-02-16 10:05:42 -08:00
Kees Cook	52a9dab6d8	libsubcmd: Fix use-after-free for realloc(..., 0) GCC 12 correctly reports a potential use-after-free condition in the xrealloc helper. Fix the warning by avoiding an implicit "free(ptr)" when size == 0: In file included from help.c:12: In function 'xrealloc', inlined from 'add_cmdname' at help.c:24:2: subcmd-util.h:56:23: error: pointer may be used after 'realloc' [-Werror=use-after-free] 56 \| ret = realloc(ptr, size); \| ^~~~~~~~~~~~~~~~~~ subcmd-util.h:52:21: note: call to 'realloc' here 52 \| void ret = realloc(ptr, size); \| ^~~~~~~~~~~~~~~~~~ subcmd-util.h:58:31: error: pointer may be used after 'realloc' [-Werror=use-after-free] 58 \| ret = realloc(ptr, 1); \| ^~~~~~~~~~~~~~~ subcmd-util.h:52:21: note: call to 'realloc' here 52 \| void ret = realloc(ptr, size); \| ^~~~~~~~~~~~~~~~~~ Fixes: `2f4ce5ec1d` ("perf tools: Finalize subcmd independence") Reported-by: Valdis Klētnieks <valdis.kletnieks@vt.edu> Signed-off-by: Kees Kook <keescook@chromium.org> Tested-by: Valdis Klētnieks <valdis.kletnieks@vt.edu> Tested-by: Justin M. Forbes <jforbes@fedoraproject.org> Acked-by: Josh Poimboeuf <jpoimboe@redhat.com> Cc: linux-hardening@vger.kernel.org Cc: Valdis Klētnieks <valdis.kletnieks@vt.edu> Link: http://lore.kernel.org/lkml/20220213182443.4037039-1-keescook@chromium.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2022-02-16 13:49:25 -03:00
Jiri Olsa	30d1c4d947	libperf: Fix perf_cpu_map__for_each_cpu macro Tzvetomir Stoyanov reported an issue with using macro perf_cpu_map__for_each_cpu using private perf_cpu object. The issue is caused by recent change that wrapped cpu in struct perf_cpu to distinguish it from cpu indexes. We need to make struct perf_cpu public. Add a simple test for using the perf_cpu_map__for_each_cpu macro. Fixes: `6d18804b96` ("perf cpumap: Give CPUs their own type") Reported-by: Tzvetomir Stoyanov (VMware) <tz.stoyanov@gmail.com> Signed-off-by: Jiri Olsa <jolsa@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ian Rogers <irogers@google.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lore.kernel.org/lkml/20220215153713.31395-1-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2022-02-16 13:49:25 -03:00
Rob Herring	096972f558	libperf: Fix 32-bit build for tests uint64_t printf Commit `a7f3713f6b` ("libperf tests: Add test_stat_multiplexing test") added printf's of 64-bit ints using %lu which doesn't work on 32-bit builds: tests/test-evlist.c:529:29: error: format ‘%lu’ expects argument of type \ ‘long unsigned int’, but argument 4 has type ‘uint64_t’ {aka ‘long long unsigned int’} [-Werror=format=] Use PRIu64 instead which works on both 32-bit and 64-bit systems. Fixes: `a7f3713f6b` ("libperf tests: Add test_stat_multiplexing test") Signed-off-by: Rob Herring <robh@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com> Link: https://lore.kernel.org/r/20220201213903.699656-1-robh@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2022-02-16 13:49:24 -03:00
Masahiro Yamada	5c8166419a	kbuild: replace $(if A,A,B) with $(or A,B) $(or ...) is available since GNU Make 3.81, and useful to shorten the code in some places. Covert as follows: $(if A,A,B) --> $(or A,B) This patch also converts: $(if A, A, B) --> $(or A, B) Strictly speaking, the latter is not an equivalent conversion because GNU Make keeps spaces after commas; if A is not empty, $(if A, A, B) expands to " A", while $(or A, B) expands to "A". Anyway, preceding spaces are not significant in the code hunks I touched. Signed-off-by: Masahiro Yamada <masahiroy@kernel.org> Reviewed-by: Nicolas Schier <nicolas@fjasle.eu>	2022-02-15 12:25:56 +09:00
Toke Høiland-Jørgensen	9c3de619e1	libbpf: Use dynamically allocated buffer when receiving netlink messages When receiving netlink messages, libbpf was using a statically allocated stack buffer of 4k bytes. This happened to work fine on systems with a 4k page size, but on systems with larger page sizes it can lead to truncated messages. The user-visible impact of this was that libbpf would insist no XDP program was attached to some interfaces because that bit of the netlink message got chopped off. Fix this by switching to a dynamically allocated buffer; we borrow the approach from iproute2 of using recvmsg() with MSG_PEEK\|MSG_TRUNC to get the actual size of the pending message before receiving it, adjusting the buffer as necessary. While we're at it, also add retries on interrupted system calls around the recvmsg() call. v2: - Move peek logic to libbpf_netlink_recv(), don't double free on ENOMEM. Fixes: `8bbb77b7c7` ("libbpf: Add various netlink helpers") Reported-by: Zhiqian Guan <zhguan@redhat.com> Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Acked-by: Kumar Kartikeya Dwivedi <memxor@gmail.com> Link: https://lore.kernel.org/bpf/20220211234819.612288-1-toke@redhat.com	2022-02-12 07:57:44 -08:00
Andrii Nakryiko	d130e954a0	libbpf: Fix libbpf.map inheritance chain for LIBBPF_0.7.0 Ensure that LIBBPF_0.7.0 inherits everything from LIBBPF_0.6.0. Fixes: `dbdd2c7f8c` ("libbpf: Add API to get/set log_level at per-program level") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20220211205235.2089104-1-andrii@kernel.org	2022-02-11 12:59:28 -08:00
Jakub Kicinski	5b91c5cc0e	Merge git://git.kernel.org/pub/scm/linux/kernel/git/netdev/net No conflicts. Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-10 17:29:56 -08:00
Alexei Starovoitov	6fe65f1b4d	libbpf: Prepare light skeleton for the kernel. Prepare light skeleton to be used in the kernel module and in the user space. The look and feel of lskel.h is mostly the same with the difference that for user space the skel->rodata is the same pointer before and after skel_load operation, while in the kernel the skel->rodata after skel_open and the skel->rodata after skel_load are different pointers. Typical usage of skeleton remains the same for kernel and user space: skel = my_bpf__open(); skel->rodata->my_global_var = init_val; err = my_bpf__load(skel); err = my_bpf__attach(skel); // access skel->rodata->my_global_var; // access skel->bss->another_var; Signed-off-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Acked-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209232001.27490-3-alexei.starovoitov@gmail.com	2022-02-10 23:31:51 +01:00
Alexey Bayduraev	d87c25e8f4	tools lib: Introduce fdarray duplicate function Introduce a function to duplicate an existing file descriptor in the fdarray structure. The function returns the position of the duplicated file descriptor. Reviewed-by: Riccardo Mancini <rickyman7@gmail.com> Signed-off-by: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com> Tested-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Riccardo Mancini <rickyman7@gmail.com> Acked-by: Namhyung Kim <namhyung@gmail.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Alexander Antonov <alexander.antonov@linux.intel.com> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexei Budankov <abudankov@huawei.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: https://lore.kernel.org/r/2891f1def287d5863cc82683a4d5879195c8d90c.1642440724.git.alexey.v.bayduraev@linux.intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>	2022-02-10 16:25:20 -03:00
Jakub Kicinski	1127170d45	Merge https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next Daniel Borkmann says: ==================== pull-request: bpf-next 2022-02-09 We've added 126 non-merge commits during the last 16 day(s) which contain a total of 201 files changed, 4049 insertions(+), 2215 deletions(-). The main changes are: 1) Add custom BPF allocator for JITs that pack multiple programs into a huge page to reduce iTLB pressure, from Song Liu. 2) Add __user tagging support in vmlinux BTF and utilize it from BPF verifier when generating loads, from Yonghong Song. 3) Add per-socket fast path check guarding from cgroup/BPF overhead when used by only some sockets, from Pavel Begunkov. 4) Continued libbpf deprecation work of APIs/features and removal of their usage from samples, selftests, libbpf & bpftool, from Andrii Nakryiko and various others. 5) Improve BPF instruction set documentation by adding byte swap instructions and cleaning up load/store section, from Christoph Hellwig. 6) Switch BPF preload infra to light skeleton and remove libbpf dependency from it, from Alexei Starovoitov. 7) Fix architecture-agnostic macros in libbpf for accessing syscall arguments from BPF progs for non-x86 architectures, from Ilya Leoshkevich. 8) Rework port members in struct bpf_sk_lookup and struct bpf_sock to be of 16-bit field with anonymous zero padding, from Jakub Sitnicki. 9) Add new bpf_copy_from_user_task() helper to read memory from a different task than current. Add ability to create sleepable BPF iterator progs, from Kenny Yu. 10) Implement XSK batching for ice's zero-copy driver used by AF_XDP and utilize TX batching API from XSK buffer pool, from Maciej Fijalkowski. 11) Generate temporary netns names for BPF selftests to avoid naming collisions, from Hangbin Liu. 12) Implement bpf_core_types_are_compat() with limited recursion for in-kernel usage, from Matteo Croce. 13) Simplify pahole version detection and finally enable CONFIG_DEBUG_INFO_DWARF5 to be selected with CONFIG_DEBUG_INFO_BTF, from Nathan Chancellor. 14) Misc minor fixes to libbpf and selftests from various folks. * https://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next: (126 commits) selftests/bpf: Cover 4-byte load from remote_port in bpf_sk_lookup bpf: Make remote_port field in struct bpf_sk_lookup 16-bit wide libbpf: Fix compilation warning due to mismatched printf format selftests/bpf: Test BPF_KPROBE_SYSCALL macro libbpf: Add BPF_KPROBE_SYSCALL macro libbpf: Fix accessing the first syscall argument on s390 libbpf: Fix accessing the first syscall argument on arm64 libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL selftests/bpf: Skip test_bpf_syscall_macro's syscall_arg1 on arm64 and s390 libbpf: Fix accessing syscall arguments on riscv libbpf: Fix riscv register names libbpf: Fix accessing syscall arguments on powerpc selftests/bpf: Use PT_REGS_SYSCALL_REGS in bpf_syscall_macro libbpf: Add PT_REGS_SYSCALL_REGS macro selftests/bpf: Fix an endianness issue in bpf_syscall_macro test bpf: Fix bpf_prog_pack build HPAGE_PMD_SIZE bpf: Fix leftover header->pages in sparc and powerpc code. libbpf: Fix signedness bug in btf_dump_array_data() selftests/bpf: Do not export subtest as standalone test bpf, x86_64: Fail gracefully on bpf_jit_binary_pack_finalize failures ... ==================== Link: https://lore.kernel.org/r/20220209210050.8425-1-daniel@iogearbox.net Signed-off-by: Jakub Kicinski <kuba@kernel.org>	2022-02-09 18:40:56 -08:00
Andrii Nakryiko	dc37dc617f	libbpf: Fix compilation warning due to mismatched printf format On ppc64le architecture __s64 is long int and requires %ld. Cast to ssize_t and use %zd to avoid architecture-specific specifiers. Fixes: `4172843ed4` ("libbpf: Fix signedness bug in btf_dump_array_data()") Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20220209063909.1268319-1-andrii@kernel.org	2022-02-09 14:33:32 +01:00
Hengqi Chen	816ae10955	libbpf: Add BPF_KPROBE_SYSCALL macro Add syscall-specific variant of BPF_KPROBE named BPF_KPROBE_SYSCALL ([0]). The new macro hides the underlying way of getting syscall input arguments. With the new macro, the following code: SEC("kprobe/__x64_sys_close") int BPF_KPROBE(do_sys_close, struct pt_regs regs) { int fd; fd = PT_REGS_PARM1_CORE(regs); / do something with fd / } can be written as: SEC("kprobe/__x64_sys_close") int BPF_KPROBE_SYSCALL(do_sys_close, int fd) { / do something with fd */ } [0] Closes: https://github.com/libbpf/libbpf/issues/425 Signed-off-by: Hengqi Chen <hengqi.chen@gmail.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220207143134.2977852-2-hengqi.chen@gmail.com	2022-02-08 21:45:02 -08:00
Ilya Leoshkevich	1f22a6f9f9	libbpf: Fix accessing the first syscall argument on s390 On s390, the first syscall argument should be accessed via orig_gpr2 (see arch/s390/include/asm/syscall.h). Currently gpr[2] is used instead, leading to bpf_syscall_macro test failure. orig_gpr2 cannot be added to user_pt_regs, since its layout is a part of the ABI. Therefore provide access to it only through PT_REGS_PARM1_CORE_SYSCALL() by using a struct pt_regs flavor. Reported-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209021745.2215452-11-iii@linux.ibm.com	2022-02-08 21:37:50 -08:00
Ilya Leoshkevich	fbca4a2f64	libbpf: Fix accessing the first syscall argument on arm64 On arm64, the first syscall argument should be accessed via orig_x0 (see arch/arm64/include/asm/syscall.h). Currently regs[0] is used instead, leading to bpf_syscall_macro test failure. orig_x0 cannot be added to struct user_pt_regs, since its layout is a part of the ABI. Therefore provide access to it only through PT_REGS_PARM1_CORE_SYSCALL() by using a struct pt_regs flavor. Reported-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209021745.2215452-10-iii@linux.ibm.com	2022-02-08 21:37:50 -08:00
Ilya Leoshkevich	60d16c5ccb	libbpf: Allow overriding PT_REGS_PARM1{_CORE}_SYSCALL arm64 and s390 need a special way to access the first syscall argument. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209021745.2215452-9-iii@linux.ibm.com	2022-02-08 21:37:50 -08:00
Ilya Leoshkevich	cf0b5b2769	libbpf: Fix accessing syscall arguments on riscv riscv does not select ARCH_HAS_SYSCALL_WRAPPER, so its syscall handlers take "unpacked" syscall arguments. Indicate this to libbpf using PT_REGS_SYSCALL_REGS macro. Reported-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209021745.2215452-7-iii@linux.ibm.com	2022-02-08 21:16:14 -08:00
Ilya Leoshkevich	5c101153bf	libbpf: Fix riscv register names riscv registers are accessed via struct user_regs_struct, not struct pt_regs. The program counter member in this struct is called pc, not epc. The frame pointer is called s0, not fp. Fixes: `3cc31d7940` ("libbpf: Normalize PT_REGS_xxx() macro definitions") Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209021745.2215452-6-iii@linux.ibm.com	2022-02-08 21:16:14 -08:00
Ilya Leoshkevich	f07f150346	libbpf: Fix accessing syscall arguments on powerpc powerpc does not select ARCH_HAS_SYSCALL_WRAPPER, so its syscall handlers take "unpacked" syscall arguments. Indicate this to libbpf using PT_REGS_SYSCALL_REGS macro. Reported-by: Heiko Carstens <hca@linux.ibm.com> Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Tested-by: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com> Link: https://lore.kernel.org/bpf/20220209021745.2215452-5-iii@linux.ibm.com	2022-02-08 21:16:14 -08:00
Ilya Leoshkevich	c5a1ffa0da	libbpf: Add PT_REGS_SYSCALL_REGS macro Architectures that select ARCH_HAS_SYSCALL_WRAPPER pass a pointer to struct pt_regs to syscall handlers, others unpack it into individual function parameters. Introduce a macro to describe what a particular arch does. Signed-off-by: Ilya Leoshkevich <iii@linux.ibm.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220209021745.2215452-3-iii@linux.ibm.com	2022-02-08 21:16:14 -08:00
Dan Carpenter	4172843ed4	libbpf: Fix signedness bug in btf_dump_array_data() The btf__resolve_size() function returns negative error codes so "elem_size" must be signed for the error handling to work. Fixes: `920d16af9b` ("libbpf: BTF dumper support for typed data") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220208071552.GB10495@kili	2022-02-08 22:34:44 +01:00
Mauricio Vásquez	e4e835c87b	libbpf: Remove mode check in libbpf_set_strict_mode() libbpf_set_strict_mode() checks that the passed mode doesn't contain extra bits for LIBBPF_STRICT_* flags that don't exist yet. It makes it difficult for applications to disable some strict flags as something like "LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS" is rejected by this check and they have to use a rather complicated formula to calculate it.[0] One possibility is to change LIBBPF_STRICT_ALL to only contain the bits of all existing LIBBPF_STRICT_* flags instead of 0xffffffff. However it's not possible because the idea is that applications compiled against older libbpf_legacy.h would still be opting into latest LIBBPF_STRICT_ALL features.[1] The other possibility is to remove that check so something like "LIBBPF_STRICT_ALL & ~LIBBPF_STRICT_MAP_DEFINITIONS" is allowed. It's what this commit does. [0]: https://lore.kernel.org/bpf/20220204220435.301896-1-mauricio@kinvolk.io/ [1]: https://lore.kernel.org/bpf/CAEf4BzaTWa9fELJLh+bxnOb0P1EMQmaRbJVG0L+nXZdy0b8G3Q@mail.gmail.com/ Fixes: `93b8952d22` ("libbpf: deprecate legacy BPF map definitions") Signed-off-by: Mauricio Vásquez <mauricio@kinvolk.io> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220207145052.124421-2-mauricio@kinvolk.io	2022-02-07 12:12:22 -08:00
Rob Herring	407eb43ae8	libperf: Add arm64 support to perf_mmap__read_self() Add the arm64 variants for read_perf_counter() and read_timestamp(). Unfortunately the counter number is encoded into the instruction, so the code is a bit verbose to enumerate all possible counters. Tested-by: Masayoshi Mizuma <m.mizuma@jp.fujitsu.com> Signed-off-by: Rob Herring <robh@kernel.org> Acked-by: Jiri Olsa <jolsa@redhat.com> Tested-by: John Garry <john.garry@huawei.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com> Link: https://lore.kernel.org/r/20220201214056.702854-1-robh@kernel.org Cc: Mark Rutland <mark.rutland@arm.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Will Deacon <will@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: linux-kernel@vger.kernel.org Cc: linux-perf-users@vger.kernel.org	2022-02-06 09:14:27 -03:00
Yonghong Song	0908a66ad1	libbpf: Fix build issue with llvm-readelf There are cases where clang compiler is packaged in a way readelf is a symbolic link to llvm-readelf. In such cases, llvm-readelf will be used instead of default binutils readelf, and the following error will appear during libbpf build: Warning: Num of global symbols in /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/sharedobjs/libbpf-in.o (367) does NOT match with num of versioned symbols in /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf.so libbpf.map (383). Please make sure all LIBBPF_API symbols are versioned in libbpf.map. --- /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_global_syms.tmp ... +++ /home/yhs/work/bpf-next/tools/testing/selftests/bpf/tools/build/libbpf/libbpf_versioned_syms.tmp ... @@ -324,6 +324,22 @@ btf__str_by_offset btf__type_by_id btf__type_cnt +LIBBPF_0.0.1 +LIBBPF_0.0.2 +LIBBPF_0.0.3 +LIBBPF_0.0.4 +LIBBPF_0.0.5 +LIBBPF_0.0.6 +LIBBPF_0.0.7 +LIBBPF_0.0.8 +LIBBPF_0.0.9 +LIBBPF_0.1.0 +LIBBPF_0.2.0 +LIBBPF_0.3.0 +LIBBPF_0.4.0 +LIBBPF_0.5.0 +LIBBPF_0.6.0 +LIBBPF_0.7.0 libbpf_attach_type_by_name libbpf_find_kernel_btf libbpf_find_vmlinux_btf_id make[2]: * [Makefile:184: check_abi] Error 1 make[1]: * [Makefile:140: all] Error 2 The above failure is due to different printouts for some ABS versioned symbols. For example, with the same libbpf.so, $ /bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so \| grep "LIBBPF" \| grep ABS 134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0 202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0 ... $ /opt/llvm/bin/readelf --dyn-syms --wide tools/lib/bpf/libbpf.so \| grep "LIBBPF" \| grep ABS 134: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.5.0@@LIBBPF_0.5.0 202: 0000000000000000 0 OBJECT GLOBAL DEFAULT ABS LIBBPF_0.6.0@@LIBBPF_0.6.0 ... The binutils readelf doesn't print out the symbol LIBBPF_* version and llvm-readelf does. Such a difference caused libbpf build failure with llvm-readelf. The proposed fix filters out all ABS symbols as they are not part of the comparison. This works for both binutils readelf and llvm-readelf. Reported-by: Delyan Kratunov <delyank@fb.com> Signed-off-by: Yonghong Song <yhs@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220204214355.502108-1-yhs@fb.com	2022-02-04 18:22:56 -08:00
Andrii Nakryiko	227a0713b3	libbpf: Deprecate forgotten btf__get_map_kv_tids() btf__get_map_kv_tids() is in the same group of APIs as btf_ext__reloc_func_info()/btf_ext__reloc_line_info() which were only used by BCC. It was missed to be marked as deprecated in [0]. Fixing that to complete [1]. [0] https://patchwork.kernel.org/project/netdevbpf/patch/20220201014610.3522985-1-davemarchevsky@fb.com/ [1] Closes: https://github.com/libbpf/libbpf/issues/277 Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20220203225017.1795946-1-andrii@kernel.org	2022-02-04 01:07:16 +01:00
Delyan Kratunov	ca33aa4ec5	libbpf: Deprecate priv/set_priv storage Arbitrary storage via bpf_*__set_priv/__priv is being deprecated without a replacement ([1]). perf uses this capability, but most of that is going away with the removal of prologue generation ([2]). perf is already suppressing deprecation warnings, so the remaining cleanup will happen separately. [1]: Closes: https://github.com/libbpf/libbpf/issues/294 [2]: https://lore.kernel.org/bpf/20220123221932.537060-1-jolsa@kernel.org/ Signed-off-by: Delyan Kratunov <delyank@fb.com> Signed-off-by: Andrii Nakryiko <andrii@kernel.org> Link: https://lore.kernel.org/bpf/20220203180032.1921580-1-delyank@fb.com	2022-02-03 12:34:45 -08:00

1 2 3 4 5 ...

2164 Commits