linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-26 05:34:13 +08:00

Author	SHA1	Message	Date
Toke Høiland-Jørgensen	fb9a98e160	libbpf: Fix libbpf_common.h when installing libbpf through 'make install' This fixes two issues with the newly introduced libbpf_common.h file: - The header failed to include <string.h> for the definition of memset() - The new file was not included in the install_headers rule in the Makefile Both of these issues cause breakage when installing libbpf with 'make install' and trying to use it in applications. Fixes: `544402d4b4` ("libbpf: Extract common user-facing helpers") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191217112810.768078-1-toke@redhat.com	2019-12-18 00:19:31 +01:00
Andrii Nakryiko	92f7440ecc	selftests/bpf: More succinct Makefile output Similarly to bpftool/libbpf output, make selftests/bpf output succinct per-item output line. Output is roughly as follows: $ make ... CLANG-LLC [test_maps] pyperf600.o CLANG-LLC [test_maps] strobemeta.o CLANG-LLC [test_maps] pyperf100.o EXTRA-OBJ [test_progs] cgroup_helpers.o EXTRA-OBJ [test_progs] trace_helpers.o BINARY test_align BINARY test_verifier_log GEN-SKEL [test_progs] fexit_bpf2bpf.skel.h GEN-SKEL [test_progs] test_global_data.skel.h GEN-SKEL [test_progs] sendmsg6_prog.skel.h ... To see the actual command invocation, verbose mode can be turned on with V=1 argument: $ make V=1 ... very verbose output ... Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191217061425.2346359-1-andriin@fb.com	2019-12-18 00:14:20 +01:00
Andrii Nakryiko	dbd8f6bae6	libbpf: Add zlib as a dependency in pkg-config template List zlib as another dependency of libbpf in pkg-config template. Verified it is correctly resolved to proper -lz flag: $ make DESTDIR=/tmp/libbpf-install install $ pkg-config --libs /tmp/libbpf-install/usr/local/lib64/pkgconfig/libbpf.pc -L/usr/local/lib64 -lbpf $ pkg-config --libs --static /tmp/libbpf-install/usr/local/lib64/pkgconfig/libbpf.pc -L/usr/local/lib64 -lbpf -lelf -lz Fixes: `166750bc1d` ("libbpf: Support libbpf-provided extern variables") Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Yonghong Song <yhs@fb.com> Cc: Luca Boccassi <bluca@debian.org> Link: https://lore.kernel.org/bpf/20191216183830.3972964-1-andriin@fb.com	2019-12-16 14:55:29 -08:00
Toke Høiland-Jørgensen	dc3a2d2547	libbpf: Print hint about ulimit when getting permission denied error Probably the single most common error newcomers to XDP are stumped by is the 'permission denied' error they get when trying to load their program and 'ulimit -l' is set too low. For examples, see [0], [1]. Since the error code is UAPI, we can't change that. Instead, this patch adds a few heuristics in libbpf and outputs an additional hint if they are met: If an EPERM is returned on map create or program load, and geteuid() shows we are root, and the current RLIMIT_MEMLOCK is not infinity, we output a hint about raising 'ulimit -l' as an additional log line. [0] https://marc.info/?l=xdp-newbies&m=157043612505624&w=2 [1] https://github.com/xdp-project/xdp-tutorial/issues/86 Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: Yonghong Song <yhs@fb.com> Link: https://lore.kernel.org/bpf/20191216181204.724953-1-toke@redhat.com	2019-12-16 14:52:34 -08:00
Toke Høiland-Jørgensen	d50ecc46d1	samples/bpf: Attach XDP programs in driver mode by default When attaching XDP programs, userspace can set flags to request the attach mode (generic/SKB mode, driver mode or hw offloaded mode). If no such flags are requested, the kernel will attempt to attach in driver mode, and then silently fall back to SKB mode if this fails. The silent fallback is a major source of user confusion, as users will try to load a program on a device without XDP support, and instead of an error they will get the silent fallback behaviour, not notice, and then wonder why performance is not what they were expecting. In an attempt to combat this, let's switch all the samples to default to explicitly requesting driver-mode attach. As part of this, ensure that all the userspace utilities have a switch to enable SKB mode. For those that have a switch to request driver mode, keep it but turn it into a no-op. Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Jesper Dangaard Brouer <brouer@redhat.com> Acked-by: David Ahern <dsahern@gmail.com> Link: https://lore.kernel.org/bpf/20191216110742.364456-1-toke@redhat.com	2019-12-16 07:05:38 -08:00
Toke Høiland-Jørgensen	450278977a	samples/bpf: Set -fno-stack-protector when building BPF programs It seems Clang can in some cases turn on stack protection by default, which doesn't work with BPF. This was reported once before[0], but it seems the flag to explicitly turn off the stack protector wasn't added to the Makefile, so do that now. The symptom of this is compile errors like the following: error: <unknown>:0:0: in function bpf_prog1 i32 (%struct.__sk_buff*): A call to built-in function '__stack_chk_fail' is not supported. [0] https://www.spinics.net/lists/netdev/msg556400.html Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191216103819.359535-1-toke@redhat.com	2019-12-16 07:03:12 -08:00
Toke Høiland-Jørgensen	5615ed472d	samples/bpf: Add missing -lz to TPROGS_LDLIBS Since libbpf now links against zlib, this needs to be included in the linker invocation for the userspace programs in samples/bpf that link statically against libbpf. Fixes: `166750bc1d` ("libbpf: Support libbpf-provided extern variables") Signed-off-by: Toke Høiland-Jørgensen <toke@redhat.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Tested-by: Björn Töpel <bjorn.topel@intel.com> Link: https://lore.kernel.org/bpf/20191216102405.353834-1-toke@redhat.com	2019-12-16 06:58:10 -08:00
Prashant Bhole	5984dc6cb5	samples/bpf: Reintroduce missed build targets Add xdp_redirect and per_socket_stats_example in build targets. They got removed from build targets in Makefile reorganization. Fixes: `1d97c6c251` ("samples/bpf: Base target programs rules on Makefile.target") Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191216071619.25479-1-prashantbhole.linux@gmail.com	2019-12-16 06:55:30 -08:00
Paul Chaignon	159ecc002b	bpftool: Fix compilation warning on shadowed variable The ident variable has already been declared at the top of the function and doesn't need to be re-declared. Fixes: `985ead416d` ("bpftool: Add skeleton codegen command") Signed-off-by: Paul Chaignon <paul.chaignon@orange.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20191216112733.GA28366@Omicron	2019-12-16 14:18:48 +01:00
Prashant Bhole	a79ac2d103	libbpf: Fix build by renaming variables In btf__align_of() variable name 't' is shadowed by inner block declaration of another variable with same name. Patch renames variables in order to fix it. CC sharedobjs/btf.o btf.c: In function ‘btf__align_of’: btf.c:303:21: error: declaration of ‘t’ shadows a previous local [-Werror=shadow] 303 \| int i, align = 1, t; \| ^ btf.c:283:25: note: shadowed declaration is here 283 \| const struct btf_type *t = btf__type_by_id(btf, id); \| Fixes: `3d208f4ca1` ("libbpf: Expose btf__align_of() API") Signed-off-by: Prashant Bhole <prashantbhole.linux@gmail.com> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Tested-by: Toke Høiland-Jørgensen <toke@redhat.com> Link: https://lore.kernel.org/bpf/20191216082738.28421-1-prashantbhole.linux@gmail.com	2019-12-16 14:14:16 +01:00
Alexei Starovoitov	0849e10280	Merge branch 'support-flex-arrays' Andrii Nakryiko says: ==================== Add support for flexible array accesses in a relocatable manner in BPF CO-RE. It's a typical pattern in C, and kernel in particular, to provide a fixed-length struct with zero-sized or dimensionless array at the end. In such cases variable-sized array contents follows immediately after the end of a struct. This patch set adds support for such access pattern by allowing accesses to such arrays. Patch #1 adds libbpf support. Patch #2 adds few test cases for validation. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-12-15 16:53:55 -08:00
Andrii Nakryiko	5f2eeceffb	selftests/bpf: Add flexible array relocation tests Add few tests validation CO-RE relocation handling of flexible array accesses. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191215070844.1014385-3-andriin@fb.com	2019-12-15 16:53:51 -08:00
Andrii Nakryiko	1b484b301c	libbpf: Support flexible arrays in CO-RE Some data stuctures in kernel are defined with either zero-sized array or flexible (dimensionless) array at the end of a struct. Actual data of such array follows in memory immediately after the end of that struct, forming its variable-sized "body" of elements. Support such access pattern in CO-RE relocation handling. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191215070844.1014385-2-andriin@fb.com	2019-12-15 16:53:50 -08:00
Alexei Starovoitov	01c6f7aaac	Merge branch 'extern-var-support' Andrii Nakryiko says: ==================== It's often important for BPF program to know kernel version or some specific config values (e.g., CONFIG_HZ to convert jiffies to seconds) and change or adjust program logic based on their values. As of today, any such need has to be resolved by recompiling BPF program for specific kernel and kernel configuration. In practice this is usually achieved by using BCC and its embedded LLVM/Clang. With such set up #ifdef CONFIG_XXX and similar compile-time constructs allow to deal with kernel varieties. With CO-RE (Compile Once – Run Everywhere) approach, this is not an option, unfortunately. All such logic variations have to be done as a normal C language constructs (i.e., if/else, variables, etc), not a preprocessor directives. This patch series add support for such advanced scenarios through C extern variables. These extern variables will be recognized by libbpf and supplied through extra .extern internal map, similarly to global data. This .extern map is read-only, which allows BPF verifier to track its content precisely as constants. That gives an opportunity to have pre-compiled BPF program, which can potentially use BPF functionality (e.g., BPF helpers) or kernel features (types, fields, etc), that are available only on a subset of targeted kernels, while effectively eleminating (through verifier's dead code detection) such unsupported functionality for other kernels (typically, older versions). Patch #3 explicitly tests a scenario of using unsupported BPF helper, to validate the approach. This patch set heavily relies on BTF type information emitted by compiler for each extern variable declaration. Based on specific types, libbpf does strict checks of config data values correctness. See patch #1 for details. Outline of the patch set: - patch #1 does a small clean up of internal map names contants; - patch #2 adds all of the libbpf internal machinery for externs support, including setting up BTF information for .extern data section; - patch #3 adds support for .extern into BPF skeleton; - patch #4 adds externs selftests, as well as enhances test_skeleton.c test to validate mmap()-ed .extern datasection functionality. v3->v4: - clean up copyrights and rebase onto latest skeleton patches (Alexei); v2->v3: - truncate too long strings (Alexei); - clean ups, adding comments (Alexei); v1->v2: - use BTF type information for externs (Alexei); - add strings support; - add BPF skeleton support for .extern. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-12-15 16:44:22 -08:00
Andrii Nakryiko	330a73a7b6	selftests/bpf: Add tests for libbpf-provided externs Add a set of tests validating libbpf-provided extern variables. One crucial feature that's tested is dead code elimination together with using invalid BPF helper. CONFIG_MISSING is not supposed to exist and should always be specified by libbpf as zero, which allows BPF verifier to correctly do branch pruning and not fail validation, when invalid BPF helper is called from dead if branch. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191214014710.3449601-5-andriin@fb.com	2019-12-15 16:41:12 -08:00
Andrii Nakryiko	2ad97d473d	bpftool: Generate externs datasec in BPF skeleton Add support for generation of mmap()-ed read-only view of libbpf-provided extern variables. As externs are not supposed to be provided by user code (that's what .data, .bss, and .rodata is for), don't mmap() it initially. Only after skeleton load is performed, map .extern contents as read-only memory. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191214014710.3449601-4-andriin@fb.com	2019-12-15 16:41:12 -08:00
Andrii Nakryiko	166750bc1d	libbpf: Support libbpf-provided extern variables Add support for extern variables, provided to BPF program by libbpf. Currently the following extern variables are supported: - LINUX_KERNEL_VERSION; version of a kernel in which BPF program is executing, follows KERNEL_VERSION() macro convention, can be 4- and 8-byte long; - CONFIG_xxx values; a set of values of actual kernel config. Tristate, boolean, strings, and integer values are supported. Set of possible values is determined by declared type of extern variable. Supported types of variables are: - Tristate values. Are represented as `enum libbpf_tristate`. Accepted values are strictly 'y', 'n', or 'm', which are represented as TRI_YES, TRI_NO, or TRI_MODULE, respectively. - Boolean values. Are represented as bool (_Bool) types. Accepted values are 'y' and 'n' only, turning into true/false values, respectively. - Single-character values. Can be used both as a substritute for bool/tristate, or as a small-range integer: - 'y'/'n'/'m' are represented as is, as characters 'y', 'n', or 'm'; - integers in a range [-128, 127] or [0, 255] (depending on signedness of char in target architecture) are recognized and represented with respective values of char type. - Strings. String values are declared as fixed-length char arrays. String of up to that length will be accepted and put in first N bytes of char array, with the rest of bytes zeroed out. If config string value is longer than space alloted, it will be truncated and warning message emitted. Char array is always zero terminated. String literals in config have to be enclosed in double quotes, just like C-style string literals. - Integers. 8-, 16-, 32-, and 64-bit integers are supported, both signed and unsigned variants. Libbpf enforces parsed config value to be in the supported range of corresponding integer type. Integers values in config can be: - decimal integers, with optional + and - signs; - hexadecimal integers, prefixed with 0x or 0X; - octal integers, starting with 0. Config file itself is searched in /boot/config-$(uname -r) location with fallback to /proc/config.gz, unless config path is specified explicitly through bpf_object_open_opts' kernel_config_path option. Both gzipped and plain text formats are supported. Libbpf adds explicit dependency on zlib because of this, but this shouldn't be a problem, given libelf already depends on zlib. All detected extern variables, are put into a separate .extern internal map. It, similarly to .rodata map, is marked as read-only from BPF program side, as well as is frozen on load. This allows BPF verifier to track extern values as constants and perform enhanced branch prediction and dead code elimination. This can be relied upon for doing kernel version/feature detection and using potentially unsupported field relocations or BPF helpers in a CO-RE-based BPF program, while still having a single version of BPF program running on old and new kernels. Selftests are validating this explicitly for unexisting BPF helper. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191214014710.3449601-3-andriin@fb.com	2019-12-15 16:41:12 -08:00
Andrii Nakryiko	ac9d138963	libbpf: Extract internal map names into constants Instead of duplicating string literals, keep them in one place and consistent. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191214014710.3449601-2-andriin@fb.com	2019-12-15 16:41:12 -08:00
Alexei Starovoitov	f7c0bbf27e	Merge branch 'bpf-obj-skel' Andrii Nakryiko says: ==================== This patch set introduces an alternative and complimentary to existing libbpf API interface for working with BPF objects, maps, programs, and global data from userspace side. This approach is relying on code generation. bpftool produces a struct (a.k.a. skeleton) tailored and specific to provided BPF object file. It includes hard-coded fields and data structures for every map, program, link, and global data present. Altogether this approach significantly reduces amount of userspace boilerplate code required to open, load, attach, and work with BPF objects. It improves attach/detach story, by providing pre-allocated space for bpf_links, and ensuring they are properly detached on shutdown. It allows to do away with by name/title lookups of maps and programs, because libbpf's skeleton API, in conjunction with generated code from bpftool, is filling in hard-coded fields with actual pointers to corresponding struct bpf_map/bpf_program/bpf_link. Also, thanks to BPF array mmap() support, working with global data (variables) from userspace is now as natural as it is from BPF side: each variable is just a struct field inside skeleton struct. Furthermore, this allows to have a natural way for userspace to pre-initialize global data (including previously impossible to initialize .rodata) by just assigning values to the same per-variable fields. Libbpf will carefully take into account this initialization image, will use it to pre-populate BPF maps at creation time, and will re-mmap() BPF map's contents at exactly the same userspace memory address such that it can continue working with all the same pointers without any interruptions. If kernel doesn't support mmap(), global data will still be successfully initialized, but after map creation global data structures inside skeleton will be NULL-ed out. This allows userspace application to gracefully handle lack of mmap() support, if necessary. A bunch of selftests are also converted to using skeletons, demonstrating significant simplification of userspace part of test and reduction in amount of code necessary. v3->v4: - add OPTS_VALID check to btf_dump__emit_type_decl (Alexei); - expose skeleton as LIBBPF_API functions (Alexei); - copyright clean up, update internal map init refactor (Alexei); v2->v3: - make skeleton part of public API; - expose btf_dump__emit_type_decl and btf__align_of APIs; - move LIBBPF_API and DECLARE_LIBBPF_OPTS into libbpf_common.h for reuse; v1->v2: - checkpatch.pl and reverse Christmas tree styling (Jakub); - sanitize variable names to accomodate in-function static vars; rfc->v1: - runqslower moved out into separate patch set waiting for vmlinux.h improvements; - skeleton generation code deals with unknown internal maps more gracefully. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-12-15 16:16:18 -08:00
Andrii Nakryiko	d9c00c3b16	bpftool: Add `gen skeleton` BASH completions Add BASH completions for gen sub-command. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Cc: Quentin Monnet <quentin.monnet@netronome.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-18-andriin@fb.com	2019-12-15 15:58:06 -08:00
Andrii Nakryiko	197448eaac	selftests/bpf: Add test validating data section to struct convertion layout Add a simple selftests validating datasection-to-struct layour dumping. Global variables are constructed in such a way as to cause both natural and artificial padding (through custom alignment requirement). Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-17-andriin@fb.com	2019-12-15 15:58:05 -08:00
Andrii Nakryiko	dde53c1b76	selftests/bpf: Convert few more selftest to skeletons Convert few more selftests to use generated BPF skeletons as a demonstration on how to use it. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-16-andriin@fb.com	2019-12-15 15:58:05 -08:00
Andrii Nakryiko	f3c926a4df	selftests/bpf: Add BPF skeletons selftests and convert attach_probe.c Add BPF skeleton generation to selftest/bpf's Makefile. Convert attach_probe.c to use skeleton. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-15-andriin@fb.com	2019-12-15 15:58:05 -08:00
Andrii Nakryiko	985ead416d	bpftool: Add skeleton codegen command Add `bpftool gen skeleton` command, which takes in compiled BPF .o object file and dumps a BPF skeleton struct and related code to work with that skeleton. Skeleton itself is tailored to a specific structure of provided BPF object file, containing accessors (just plain struct fields) for every map and program, as well as dedicated space for bpf_links. If BPF program is using global variables, corresponding structure definitions of compatible memory layout are emitted as well, making it possible to initialize and subsequently read/update global variables values using simple and clear C syntax for accessing fields. This skeleton majorly improves usability of opening/loading/attaching of BPF object, as well as interacting with it throughout the lifetime of loaded BPF object. Generated skeleton struct has the following structure: struct <object-name> { /* used by libbpf's skeleton API / struct bpf_object_skeleton skeleton; /* bpf_object for libbpf APIs / struct bpf_object obj; struct { /* for every defined map in BPF object: / struct bpf_map <map-name>; } maps; struct { /* for every program in BPF object: / struct bpf_program <program-name>; } progs; struct { /* for every program in BPF object: / struct bpf_link <program-name>; } links; /* for every present global data section: / struct <object-name>__<one of bss, data, or rodata> { / memory layout of corresponding data section, * with every defined variable represented as a struct field * with exactly the same type, but without const/volatile * modifiers, e.g.: / int my_var_1; ... } <one of bss, data, or rodata>; }; This provides great usability improvements: - no need to look up maps and programs by name, instead just my_obj->maps.my_map or my_obj->progs.my_prog would give necessary bpf_map/bpf_program pointers, which user can pass to existing libbpf APIs; - pre-defined places for bpf_links, which will be automatically populated for program types that libbpf knows how to attach automatically (currently tracepoints, kprobe/kretprobe, raw tracepoint and tracing programs). On tearing down skeleton, all active bpf_links will be destroyed (meaning BPF programs will be detached, if they are attached). For cases in which libbpf doesn't know how to auto-attach BPF program, user can manually create link after loading skeleton and they will be auto-detached on skeleton destruction: my_obj->links.my_fancy_prog = bpf_program__attach_cgroup_whatever( my_obj->progs.my_fancy_prog, <whatever extra param); - it's extremely easy and convenient to work with global data from userspace now. Both for read-only and read/write variables, it's possible to pre-initialize them before skeleton is loaded: skel = my_obj__open(raw_embed_data); my_obj->rodata->my_var = 123; my_obj__load(skel); / 123 will be initialization value for my_var */ After load, if kernel supports mmap() for BPF arrays, user can still read (and write for .bss and .data) variables values, but at that point it will be directly mmap()-ed to BPF array, backing global variables. This allows to seamlessly exchange data with BPF side. From userspace program's POV, all the pointers and memory contents stay the same, but mapped kernel memory changes to point to created map. If kernel doesn't yet support mmap() for BPF arrays, it's still possible to use those data section structs to pre-initialize .bss, .data, and .rodata, but after load their pointers will be reset to NULL, allowing user code to gracefully handle this condition, if necessary. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-14-andriin@fb.com	2019-12-15 15:58:05 -08:00
Andrii Nakryiko	d66562fba1	libbpf: Add BPF object skeleton support Add new set of APIs, allowing to open/load/attach BPF object through BPF object skeleton, generated by bpftool for a specific BPF object file. All the xxx_skeleton() APIs wrap up corresponding bpf_object_xxx() APIs, but additionally also automate map/program lookups by name, global data initialization and mmap()-ing, etc. All this greatly improves and simplifies userspace usability of working with BPF programs. See follow up patches for examples. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-13-andriin@fb.com	2019-12-15 15:58:05 -08:00
Andrii Nakryiko	3f51935314	libbpf: Reduce log level of supported section names dump It's quite spammy. And now that bpf_object__open() is trying to determine program type from its section name, we are getting these verbose messages all the time. Reduce their log level to DEBUG. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-12-andriin@fb.com	2019-12-15 15:58:05 -08:00
Andrii Nakryiko	13acb508ae	libbpf: Postpone BTF ID finding for TRACING programs to load phase Move BTF ID determination for BPF_PROG_TYPE_TRACING programs to a load phase. Performing it at open step is inconvenient, because it prevents BPF skeleton generation on older host kernel, which doesn't contain BTF_KIND_FUNCs information in vmlinux BTF. This is a common set up, though, when, e.g., selftests are compiled on older host kernel, but the test program itself is executed in qemu VM with bleeding edge kernel. Having this BTF searching performed at load time allows to successfully use bpf_object__open() for codegen and inspection of BPF object file. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-11-andriin@fb.com	2019-12-15 15:58:05 -08:00
Andrii Nakryiko	eba9c5f498	libbpf: Refactor global data map initialization Refactor global data map initialization to use anonymous mmap()-ed memory instead of malloc()-ed one. This allows to do a transparent re-mmap()-ing of already existing memory address to point to BPF map's memory after bpf_object__load() step (done in follow up patch). This choreographed setup allows to have a nice and unsurprising way to pre-initialize read-only (and r/w as well) maps by user and after BPF map creation keep working with mmap()-ed contents of this map. All in a way that doesn't require user code to update any pointers: the illusion of working with memory contents is preserved before and after actual BPF map instantiation. Selftests and runqslower example demonstrate this feature in follow up patches. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-10-andriin@fb.com	2019-12-15 15:58:05 -08:00
Andrii Nakryiko	01af3bf067	libbpf: Expose BPF program's function name Add APIs to get BPF program function name, as opposed to bpf_program__title(), which returns BPF program function's section name. Function name has a benefit of being a valid C identifier and uniquely identifies a specific BPF program, while section name can be duplicated across multiple independent BPF programs. Add also bpf_object__find_program_by_name(), similar to bpf_object__find_program_by_title(), to facilitate looking up BPF programs by their C function names. Convert one of selftests to new API for look up. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-9-andriin@fb.com	2019-12-15 15:58:05 -08:00
Andrii Nakryiko	9f81654eeb	libbpf: Expose BTF-to-C type declaration emitting API Expose API that allows to emit type declaration and field/variable definition (if optional field name is specified) in valid C syntax for any provided BTF type. This is going to be used by bpftool when emitting data section layout as a struct. As part of making this API useful in a stand-alone fashion, move initialization of some of the internal btf_dump state to earlier phase. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-8-andriin@fb.com	2019-12-15 15:58:05 -08:00
Andrii Nakryiko	3d208f4ca1	libbpf: Expose btf__align_of() API Expose BTF API that calculates type alignment requirements. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191214014341.3442258-7-andriin@fb.com	2019-12-15 15:58:05 -08:00
Andrii Nakryiko	544402d4b4	libbpf: Extract common user-facing helpers LIBBPF_API and DECLARE_LIBBPF_OPTS are needed in many public libbpf API headers. Extract them into libbpf_common.h to avoid unnecessary interdependency between btf.h, libbpf.h, and bpf.h or code duplication. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191214014341.3442258-6-andriin@fb.com	2019-12-15 15:58:05 -08:00
Andrii Nakryiko	917f6b7b07	libbpf: Add BPF_EMBED_OBJ macro for embedding BPF .o files Add a convenience macro BPF_EMBED_OBJ, which allows to embed other files (typically used to embed BPF .o files) into a hosting userspace programs. To C program it is exposed as struct bpf_embed_data, containing a pointer to raw data and its size in bytes. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-5-andriin@fb.com	2019-12-15 15:58:04 -08:00
Andrii Nakryiko	612d05be25	libbpf: Move non-public APIs from libbpf.h to libbpf_internal.h Few libbpf APIs are not public but currently exposed through libbpf.h to be used by bpftool. Move them to libbpf_internal.h, where intent of being non-stable and non-public is much more obvious. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-4-andriin@fb.com	2019-12-15 15:58:04 -08:00
Andrii Nakryiko	d7a18ea7e8	libbpf: Add generic bpf_program__attach() Generalize BPF program attaching and allow libbpf to auto-detect type (and extra parameters, where applicable) and attach supported BPF program types based on program sections. Currently this is supported for: - kprobe/kretprobe; - tracepoint; - raw tracepoint; - tracing programs (typed raw TP/fentry/fexit). More types support can be trivially added within this framework. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-3-andriin@fb.com	2019-12-15 15:58:04 -08:00
Andrii Nakryiko	0d13bfce02	libbpf: Don't require root for bpf_object__open() Reorganize bpf_object__open and bpf_object__load steps such that bpf_object__open doesn't need root access. This was previously done for feature probing and BTF sanitization. This doesn't have to happen on open, though, so move all those steps into the load phase. This is important, because it makes it possible for tools like bpftool, to just open BPF object file and inspect their contents: programs, maps, BTF, etc. For such operations it is prohibitive to require root access. On the other hand, there is a lot of custom libbpf logic in those steps, so its best avoided for tools to reimplement all that on their own. Signed-off-by: Andrii Nakryiko <andriin@fb.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191214014341.3442258-2-andriin@fb.com	2019-12-15 15:58:04 -08:00
Thadeu Lima de Souza Cascardo	aa915931ac	libbpf: Fix readelf output parsing for Fedora Fedora binutils has been patched to show "other info" for a symbol at the end of the line. This was done in order to support unmaintained scripts that would break with the extra info. [1] [1] `b8265c46f7` This in turn has been done to fix the build of ruby, because of checksec. [2] Thanks Michael Ellerman for the pointer. [2] https://bugzilla.redhat.com/show_bug.cgi?id=1479302 As libbpf Makefile is not unmaintained, we can simply deal with either output format, by just removing the "other info" field, as it always comes inside brackets. Fixes: `3464afdf11` (libbpf: Fix readelf output parsing on powerpc with recent binutils) Reported-by: Justin Forbes <jmforbes@linuxtx.org> Signed-off-by: Thadeu Lima de Souza Cascardo <cascardo@canonical.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Andrii Nakryiko <andriin@fb.com> Cc: Aurelien Jarno <aurelien@aurel32.net> Link: https://lore.kernel.org/bpf/20191213101114.GA3986@calabresa	2019-12-15 09:40:58 -08:00
Alexei Starovoitov	a06ae6acc1	Merge branch 'bpftool-match-by-name' Paul Chaignon says: ==================== When working with frequently modified BPF programs, both the ID and the tag may change. bpftool currently doesn't provide a "stable" way to match such programs. This patchset allows bpftool to match programs and maps by name. When given a tag that matches several programs, bpftool currently only considers the first match. The first patch changes that behavior to either process all matching programs (for the show and dump commands) or error out. The second patch implements program lookup by name, with the same behavior as for tags in case of ambiguity. The last patch implements map lookup by name. Changelogs: Changes in v2: - Fix buffer overflow after realloc. - Add example output to commit message. - Properly close JSON arrays on errors. - Fix style errors (line breaks, for loops, exit labels, type for tagname). - Move do_show code for argc == 2 to do_show_subset functions. - Rebase. ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-12-15 09:32:21 -08:00
Paul Chaignon	99f9863a0c	bpftool: Match maps by name This patch implements lookup by name for maps and changes the behavior of lookups by tag to be consistent with prog subcommands. Similarly to program subcommands, the show and dump commands will return all maps with the given name (or tag), whereas other commands will error out if several maps have the same name (resp. tag). When a map has BTF info, it is dumped in JSON with available BTF info. This patch requires that all matched maps have BTF info before switching the output format to JSON. Signed-off-by: Paul Chaignon <paul.chaignon@orange.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/8de1c9f273860b3ea1680502928f4da2336b853e.1576263640.git.paul.chaignon@gmail.com	2019-12-15 09:03:18 -08:00
Paul Chaignon	a7d22ca2a4	bpftool: Match programs by name When working with frequently modified BPF programs, both the ID and the tag may change. bpftool currently doesn't provide a "stable" way to match such programs. This patch implements lookup by name for programs. The show and dump commands will return all programs with the given name, whereas other commands will error out if several programs have the same name. Signed-off-by: Paul Chaignon <paul.chaignon@orange.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Reviewed-by: Quentin Monnet <quentin.monnet@netronome.com> Link: https://lore.kernel.org/bpf/b5fc1a5dcfaeb5f16fc80295cdaa606dd2d91534.1576263640.git.paul.chaignon@gmail.com	2019-12-15 09:03:18 -08:00
Paul Chaignon	ec2025095c	bpftool: Match several programs with same tag When several BPF programs have the same tag, bpftool matches only the first (in ID order). This patch changes that behavior such that dump and show commands return all matched programs. Commands that require a single program (e.g., pin and attach) will error out if given a tag that matches several. bpftool prog dump will also error out if file or visual are given and several programs have the given tag. In the case of the dump command, a program header is added before each dump only if the tag matches several programs; this patch doesn't change the output if a single program matches. The output when several programs match thus looks as follows. $ ./bpftool prog dump xlated tag 6deef7357e7b4530 3: cgroup_skb tag 6deef7357e7b4530 gpl 0: (bf) r6 = r1 [...] 7: (95) exit 4: cgroup_skb tag 6deef7357e7b4530 gpl 0: (bf) r6 = r1 [...] 7: (95) exit Signed-off-by: Paul Chaignon <paul.chaignon@orange.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/fb1fe943202659a69cd21dd5b907c205af1e1e22.1576263640.git.paul.chaignon@gmail.com	2019-12-15 09:03:18 -08:00
Stanislav Fomichev	a06bf42f5a	selftests/bpf: Test wire_len/gso_segs in BPF_PROG_TEST_RUN Make sure we can pass arbitrary data in wire_len/gso_segs. Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Acked-by: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191213223028.161282-2-sdf@google.com	2019-12-13 15:26:53 -08:00
Stanislav Fomichev	850a88cc40	bpf: Expose __sk_buff wire_len/gso_segs to BPF_PROG_TEST_RUN wire_len should not be less than real len and is capped by GSO_MAX_SIZE. gso_segs is capped by GSO_MAX_SEGS. v2: * set wire_len to skb->len when passed wire_len is 0 (Alexei Starovoitov) Signed-off-by: Stanislav Fomichev <sdf@google.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Cc: Martin KaFai Lau <kafai@fb.com> Link: https://lore.kernel.org/bpf/20191213223028.161282-1-sdf@google.com	2019-12-13 15:26:53 -08:00
Alexei Starovoitov	02620d9e62	Merge branch 'bpf-dispatcher' Björn Töpel says: ==================== Overview ======== This is the 6th iteration of the series that introduces the BPF dispatcher, which is a mechanism to avoid indirect calls. The BPF dispatcher is a multi-way branch code generator, targeted for BPF programs. E.g. when an XDP program is executed via the bpf_prog_run_xdp(), it is invoked via an indirect call. With retpolines enabled, the indirect call has a substantial performance impact. The dispatcher is a mechanism that transform indirect calls to direct calls, and therefore avoids the retpoline. The dispatcher is generated using the BPF JIT, and relies on text poking provided by bpf_arch_text_poke(). The dispatcher hijacks a trampoline function it via the __fentry__ nop of the trampoline. One dispatcher instance currently supports up to 48 dispatch points. This can be extended in the future. In this series, only one dispatcher instance is supported, and the only user is XDP. The dispatcher is updated when an XDP program is attached/detached to/from a netdev. An alternative to this could have been to update the dispatcher at program load point, but as there are usually more XDP programs loaded than attached, so the latter was picked. The XDP dispatcher is always enabled, if available, because it helps even when retpolines are disabled. Please refer to the "Performance" section below. The first patch refactors the image allocation from the BPF trampoline code. Patch two introduces the dispatcher, and patch three adds a dispatcher for XDP, and wires up the XDP control-/ fast-path. Patch four adds the dispatcher to BPF_TEST_RUN. Patch five adds a simple selftest, and the last adds alignment to jump targets. I have rebased the series on commit `679152d3a3` ("libbpf: Fix printf compilation warnings on ppc64le arch"). Generated code, x86-64 ====================== The dispatcher currently has a maximum of 48 entries, where one entry is a unique BPF program. Multiple users of a dispatcher instance using the same BPF program will share that entry. The program/slot lookup is performed by a binary search, O(log n). Let's have a look at the generated code. The trampoline function has the following signature: unsigned int tramp(const void ctx, const struct bpf_insn insnsi, unsigned int (bpf_func)(const void , const struct bpf_insn )) On Intel x86-64 this means that rdx will contain the bpf_func. To, make it easier to read, I've let the BPF programs have the following range: 0xffffffffffffffff (-1) to 0xfffffffffffffff0 (-16). 0xffffffff81c00f10 is the retpoline thunk, in this case __x86_indirect_thunk_rdx. If retpolines are disabled the thunk will be a regular indirect call. The minimal dispatcher will then look like this: ffffffffc0002000: cmp rdx,0xffffffffffffffff ffffffffc0002007: je 0xffffffffffffffff ; -1 ffffffffc000200d: jmp 0xffffffff81c00f10 A 16 entry dispatcher looks like this: ffffffffc0020000: cmp rdx,0xfffffffffffffff7 ; -9 ffffffffc0020007: jg 0xffffffffc0020130 ffffffffc002000d: cmp rdx,0xfffffffffffffff3 ; -13 ffffffffc0020014: jg 0xffffffffc00200a0 ffffffffc002001a: cmp rdx,0xfffffffffffffff1 ; -15 ffffffffc0020021: jg 0xffffffffc0020060 ffffffffc0020023: cmp rdx,0xfffffffffffffff0 ; -16 ffffffffc002002a: jg 0xffffffffc0020040 ffffffffc002002c: cmp rdx,0xfffffffffffffff0 ; -16 ffffffffc0020033: je 0xfffffffffffffff0 ; -16 ffffffffc0020039: jmp 0xffffffff81c00f10 ffffffffc002003e: xchg ax,ax ffffffffc0020040: cmp rdx,0xfffffffffffffff1 ; -15 ffffffffc0020047: je 0xfffffffffffffff1 ; -15 ffffffffc002004d: jmp 0xffffffff81c00f10 ffffffffc0020052: nop DWORD PTR [rax+rax1+0x0] ffffffffc002005a: nop WORD PTR [rax+rax1+0x0] ffffffffc0020060: cmp rdx,0xfffffffffffffff2 ; -14 ffffffffc0020067: jg 0xffffffffc0020080 ffffffffc0020069: cmp rdx,0xfffffffffffffff2 ; -14 ffffffffc0020070: je 0xfffffffffffffff2 ; -14 ffffffffc0020076: jmp 0xffffffff81c00f10 ffffffffc002007b: nop DWORD PTR [rax+rax1+0x0] ffffffffc0020080: cmp rdx,0xfffffffffffffff3 ; -13 ffffffffc0020087: je 0xfffffffffffffff3 ; -13 ffffffffc002008d: jmp 0xffffffff81c00f10 ffffffffc0020092: nop DWORD PTR [rax+rax1+0x0] ffffffffc002009a: nop WORD PTR [rax+rax1+0x0] ffffffffc00200a0: cmp rdx,0xfffffffffffffff5 ; -11 ffffffffc00200a7: jg 0xffffffffc00200f0 ffffffffc00200a9: cmp rdx,0xfffffffffffffff4 ; -12 ffffffffc00200b0: jg 0xffffffffc00200d0 ffffffffc00200b2: cmp rdx,0xfffffffffffffff4 ; -12 ffffffffc00200b9: je 0xfffffffffffffff4 ; -12 ffffffffc00200bf: jmp 0xffffffff81c00f10 ffffffffc00200c4: nop DWORD PTR [rax+rax1+0x0] ffffffffc00200cc: nop DWORD PTR [rax+0x0] ffffffffc00200d0: cmp rdx,0xfffffffffffffff5 ; -11 ffffffffc00200d7: je 0xfffffffffffffff5 ; -11 ffffffffc00200dd: jmp 0xffffffff81c00f10 ffffffffc00200e2: nop DWORD PTR [rax+rax1+0x0] ffffffffc00200ea: nop WORD PTR [rax+rax1+0x0] ffffffffc00200f0: cmp rdx,0xfffffffffffffff6 ; -10 ffffffffc00200f7: jg 0xffffffffc0020110 ffffffffc00200f9: cmp rdx,0xfffffffffffffff6 ; -10 ffffffffc0020100: je 0xfffffffffffffff6 ; -10 ffffffffc0020106: jmp 0xffffffff81c00f10 ffffffffc002010b: nop DWORD PTR [rax+rax1+0x0] ffffffffc0020110: cmp rdx,0xfffffffffffffff7 ; -9 ffffffffc0020117: je 0xfffffffffffffff7 ; -9 ffffffffc002011d: jmp 0xffffffff81c00f10 ffffffffc0020122: nop DWORD PTR [rax+rax1+0x0] ffffffffc002012a: nop WORD PTR [rax+rax1+0x0] ffffffffc0020130: cmp rdx,0xfffffffffffffffb ; -5 ffffffffc0020137: jg 0xffffffffc00201d0 ffffffffc002013d: cmp rdx,0xfffffffffffffff9 ; -7 ffffffffc0020144: jg 0xffffffffc0020190 ffffffffc0020146: cmp rdx,0xfffffffffffffff8 ; -8 ffffffffc002014d: jg 0xffffffffc0020170 ffffffffc002014f: cmp rdx,0xfffffffffffffff8 ; -8 ffffffffc0020156: je 0xfffffffffffffff8 ; -8 ffffffffc002015c: jmp 0xffffffff81c00f10 ffffffffc0020161: nop DWORD PTR [rax+rax1+0x0] ffffffffc0020169: nop DWORD PTR [rax+0x0] ffffffffc0020170: cmp rdx,0xfffffffffffffff9 ; -7 ffffffffc0020177: je 0xfffffffffffffff9 ; -7 ffffffffc002017d: jmp 0xffffffff81c00f10 ffffffffc0020182: nop DWORD PTR [rax+rax1+0x0] ffffffffc002018a: nop WORD PTR [rax+rax1+0x0] ffffffffc0020190: cmp rdx,0xfffffffffffffffa ; -6 ffffffffc0020197: jg 0xffffffffc00201b0 ffffffffc0020199: cmp rdx,0xfffffffffffffffa ; -6 ffffffffc00201a0: je 0xfffffffffffffffa ; -6 ffffffffc00201a6: jmp 0xffffffff81c00f10 ffffffffc00201ab: nop DWORD PTR [rax+rax1+0x0] ffffffffc00201b0: cmp rdx,0xfffffffffffffffb ; -5 ffffffffc00201b7: je 0xfffffffffffffffb ; -5 ffffffffc00201bd: jmp 0xffffffff81c00f10 ffffffffc00201c2: nop DWORD PTR [rax+rax1+0x0] ffffffffc00201ca: nop WORD PTR [rax+rax1+0x0] ffffffffc00201d0: cmp rdx,0xfffffffffffffffd ; -3 ffffffffc00201d7: jg 0xffffffffc0020220 ffffffffc00201d9: cmp rdx,0xfffffffffffffffc ; -4 ffffffffc00201e0: jg 0xffffffffc0020200 ffffffffc00201e2: cmp rdx,0xfffffffffffffffc ; -4 ffffffffc00201e9: je 0xfffffffffffffffc ; -4 ffffffffc00201ef: jmp 0xffffffff81c00f10 ffffffffc00201f4: nop DWORD PTR [rax+rax1+0x0] ffffffffc00201fc: nop DWORD PTR [rax+0x0] ffffffffc0020200: cmp rdx,0xfffffffffffffffd ; -3 ffffffffc0020207: je 0xfffffffffffffffd ; -3 ffffffffc002020d: jmp 0xffffffff81c00f10 ffffffffc0020212: nop DWORD PTR [rax+rax1+0x0] ffffffffc002021a: nop WORD PTR [rax+rax1+0x0] ffffffffc0020220: cmp rdx,0xfffffffffffffffe ; -2 ffffffffc0020227: jg 0xffffffffc0020240 ffffffffc0020229: cmp rdx,0xfffffffffffffffe ; -2 ffffffffc0020230: je 0xfffffffffffffffe ; -2 ffffffffc0020236: jmp 0xffffffff81c00f10 ffffffffc002023b: nop DWORD PTR [rax+rax1+0x0] ffffffffc0020240: cmp rdx,0xffffffffffffffff ; -1 ffffffffc0020247: je 0xffffffffffffffff ; -1 ffffffffc002024d: jmp 0xffffffff81c00f10 The nops are there to align jump targets to 16 B. Performance =========== The tests were performed using the xdp_rxq_info sample program with the following command-line: 1. XDP_DRV: # xdp_rxq_info --dev eth0 --action XDP_DROP 2. XDP_SKB: # xdp_rxq_info --dev eth0 -S --action XDP_DROP 3. xdp-perf, from selftests/bpf: # test_progs -v -t xdp_perf Run with mitigations=auto ------------------------- Baseline: 1. 21.7 Mpps (21736190) 2. 3.8 Mpps (3837582) 3. 15 ns Dispatcher: 1. 30.2 Mpps (30176320) 2. 4.0 Mpps (4015579) 3. 5 ns Dispatcher (full; walk all entries, and fallback): 1. 22.0 Mpps (21986704) 2. 3.8 Mpps (3831298) 3. 17 ns Run with mitigations=off ------------------------ Baseline: 1. 29.9 Mpps (29875135) 2. 4.1 Mpps (4100179) 3. 4 ns Dispatcher: 1. 30.4 Mpps (30439241) 2. 4.1 Mpps (4109350) 1. 4 ns Dispatcher (full; walk all entries, and fallback): 1. 28.9 Mpps (28903269) 2. 4.1 Mpps (4080078) 3. 5 ns xdp-perf runs, aliged vs non-aligned jump targets ------------------------------------------------- In this test dispatchers of different sizes, with and without jump target alignment, were exercised. As outlined above the function lookup is performed via binary search. This means that depending on the pointer value of the function, it can reside in the upper or lower part of the search table. The performed tests were: 1. aligned, mititations=auto, function entry < other entries 2. aligned, mititations=auto, function entry > other entries 3. non-aligned, mititations=auto, function entry < other entries 4. non-aligned, mititations=auto, function entry > other entries 5. aligned, mititations=off, function entry < other entries 6. aligned, mititations=off, function entry > other entries 7. non-aligned, mititations=off, function entry < other entries 8. non-aligned, mititations=off, function entry > other entries The micro benchmarks showed that alignment of jump target has some positive impact. A reply to this cover letter will contain complete data for all runs. Multiple xdp-perf baseline with mitigations=auto ------------------------------------------------ Performance counter stats for './test_progs -v -t xdp_perf' (1024 runs): 16.69 msec task-clock # 0.984 CPUs utilized ( +- 0.08% ) 2 context-switches # 0.123 K/sec ( +- 1.11% ) 0 cpu-migrations # 0.000 K/sec ( +- 70.68% ) 97 page-faults # 0.006 M/sec ( +- 0.05% ) 49,254,635 cycles # 2.951 GHz ( +- 0.09% ) (12.28%) 42,138,558 instructions # 0.86 insn per cycle ( +- 0.02% ) (36.15%) 7,315,291 branches # 438.300 M/sec ( +- 0.01% ) (59.43%) 1,011,201 branch-misses # 13.82% of all branches ( +- 0.01% ) (83.31%) 15,440,788 L1-dcache-loads # 925.143 M/sec ( +- 0.00% ) (99.40%) 39,067 L1-dcache-load-misses # 0.25% of all L1-dcache hits ( +- 0.04% ) 6,531 LLC-loads # 0.391 M/sec ( +- 0.05% ) 442 LLC-load-misses # 6.76% of all LL-cache hits ( +- 0.77% ) <not supported> L1-icache-loads 57,964 L1-icache-load-misses ( +- 0.06% ) 15,442,496 dTLB-loads # 925.246 M/sec ( +- 0.00% ) 514 dTLB-load-misses # 0.00% of all dTLB cache hits ( +- 0.73% ) (40.57%) 130 iTLB-loads # 0.008 M/sec ( +- 2.75% ) (16.69%) <not counted> iTLB-load-misses ( +- 8.71% ) (0.60%) <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 0.0169558 +- 0.0000127 seconds time elapsed ( +- 0.07% ) Multiple xdp-perf dispatcher with mitigations=auto -------------------------------------------------- Note that this includes generating the dispatcher. Performance counter stats for './test_progs -v -t xdp_perf' (1024 runs): 4.80 msec task-clock # 0.953 CPUs utilized ( +- 0.06% ) 1 context-switches # 0.258 K/sec ( +- 1.57% ) 0 cpu-migrations # 0.000 K/sec 97 page-faults # 0.020 M/sec ( +- 0.05% ) 14,185,861 cycles # 2.955 GHz ( +- 0.17% ) (50.49%) 45,691,935 instructions # 3.22 insn per cycle ( +- 0.01% ) (99.19%) 8,346,008 branches # 1738.709 M/sec ( +- 0.00% ) 13,046 branch-misses # 0.16% of all branches ( +- 0.10% ) 15,443,735 L1-dcache-loads # 3217.365 M/sec ( +- 0.00% ) 39,585 L1-dcache-load-misses # 0.26% of all L1-dcache hits ( +- 0.05% ) 7,138 LLC-loads # 1.487 M/sec ( +- 0.06% ) 671 LLC-load-misses # 9.40% of all LL-cache hits ( +- 0.73% ) <not supported> L1-icache-loads 56,213 L1-icache-load-misses ( +- 0.08% ) 15,443,735 dTLB-loads # 3217.365 M/sec ( +- 0.00% ) <not counted> dTLB-load-misses (0.00%) <not counted> iTLB-loads (0.00%) <not counted> iTLB-load-misses (0.00%) <not supported> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 0.00503705 +- 0.00000546 seconds time elapsed ( +- 0.11% ) Revisions ========= v4->v5: [1] * Fixed s/xdp_ctx/ctx/ type-o (Toke) * Marked dispatcher trampoline with noinline attribute (Alexei) v3->v4: [2] * Moved away from doing dispatcher lookup based on the trampoline function, to a model where the dispatcher instance is explicitly passed to the bpf_dispatcher_change_prog() (Alexei) v2->v3: [3] * Removed xdp_call, and instead make the dispatcher available to all XDP users via bpf_prog_run_xdp() and dev_xdp_install(). (Toke) * Always enable the dispatcher, if available (Alexei) * Reuse BPF trampoline image allocator (Alexei) * Make sure the dispatcher is exercised in selftests (Alexei) * Only allow one dispatcher, and wire it to XDP v1->v2: [4] * Fixed i386 build warning (kbuild robot) * Made bpf_dispatcher_lookup() static (kbuild robot) * Make sure xdp_call.h is only enabled for builtins * Add xdp_call() to ixgbe, mlx4, and mlx5 RFC->v1: [5] * Improved error handling (Edward and Andrii) * Explicit cleanup (Andrii) * Use 32B with sext cmp (Alexei) * Align jump targets to 16B (Alexei) * 4 to 16 entries (Toke) * Added stats to xdp_call_run() [1] https://lore.kernel.org/bpf/20191211123017.13212-1-bjorn.topel@gmail.com/ [2] https://lore.kernel.org/bpf/20191209135522.16576-1-bjorn.topel@gmail.com/ [3] https://lore.kernel.org/bpf/20191123071226.6501-1-bjorn.topel@gmail.com/ [4] https://lore.kernel.org/bpf/20191119160757.27714-1-bjorn.topel@gmail.com/ [5] https://lore.kernel.org/bpf/20191113204737.31623-1-bjorn.topel@gmail.com/ ==================== Signed-off-by: Alexei Starovoitov <ast@kernel.org>	2019-12-13 13:15:40 -08:00
Björn Töpel	116eb788f5	bpf, x86: Align dispatcher branch targets to 16B >From Intel 64 and IA-32 Architectures Optimization Reference Manual, 3.4.1.4 Code Alignment, Assembly/Compiler Coding Rule 11: All branch targets should be 16-byte aligned. This commits aligns branch targets according to the Intel manual. The nops used to align branch targets make the dispatcher larger, and therefore the number of supported dispatch points/programs are descreased from 64 to 48. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-7-bjorn.topel@gmail.com	2019-12-13 13:09:32 -08:00
Björn Töpel	e754f5a6e3	selftests: bpf: Add xdp_perf test The xdp_perf is a dummy XDP test, only used to measure the the cost of jumping into a naive XDP program one million times. To build and run the program: $ cd tools/testing/selftests/bpf $ make $ ./test_progs -v -t xdp_perf Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-6-bjorn.topel@gmail.com	2019-12-13 13:09:32 -08:00
Björn Töpel	f23c4b3924	bpf: Start using the BPF dispatcher in BPF_TEST_RUN In order to properly exercise the BPF dispatcher, this commit adds BPF dispatcher usage to BPF_TEST_RUN when executing XDP programs. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-5-bjorn.topel@gmail.com	2019-12-13 13:09:32 -08:00
Björn Töpel	7e6897f959	bpf, xdp: Start using the BPF dispatcher for XDP This commit adds a BPF dispatcher for XDP. The dispatcher is updated from the XDP control-path, dev_xdp_install(), and used when an XDP program is run via bpf_prog_run_xdp(). Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-4-bjorn.topel@gmail.com	2019-12-13 13:09:32 -08:00
Björn Töpel	75ccbef636	bpf: Introduce BPF dispatcher The BPF dispatcher is a multi-way branch code generator, mainly targeted for XDP programs. When an XDP program is executed via the bpf_prog_run_xdp(), it is invoked via an indirect call. The indirect call has a substantial performance impact, when retpolines are enabled. The dispatcher transform indirect calls to direct calls, and therefore avoids the retpoline. The dispatcher is generated using the BPF JIT, and relies on text poking provided by bpf_arch_text_poke(). The dispatcher hijacks a trampoline function it via the __fentry__ nop of the trampoline. One dispatcher instance currently supports up to 64 dispatch points. A user creates a dispatcher with its corresponding trampoline with the DEFINE_BPF_DISPATCHER macro. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-3-bjorn.topel@gmail.com	2019-12-13 13:09:32 -08:00
Björn Töpel	98e8627efc	bpf: Move trampoline JIT image allocation to a function Refactor the image allocation in the BPF trampoline code into a separate function, so it can be shared with the BPF dispatcher in upcoming commits. Signed-off-by: Björn Töpel <bjorn.topel@intel.com> Signed-off-by: Alexei Starovoitov <ast@kernel.org> Link: https://lore.kernel.org/bpf/20191213175112.30208-2-bjorn.topel@gmail.com	2019-12-13 13:09:32 -08:00

1 2 3 4 5 ...

886956 Commits