2017-11-25 04:21:35 +08:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0 */
|
2015-04-01 23:57:44 +08:00
|
|
|
#ifndef __BPF_ELF__
|
|
|
|
#define __BPF_ELF__
|
|
|
|
|
|
|
|
#include <asm/types.h>
|
|
|
|
|
|
|
|
/* Note:
|
|
|
|
*
|
|
|
|
* Below ELF section names and bpf_elf_map structure definition
|
|
|
|
* are not (!) kernel ABI. It's rather a "contract" between the
|
|
|
|
* application and the BPF loader in tc. For compatibility, the
|
|
|
|
* section names should stay as-is. Introduction of aliases, if
|
|
|
|
* needed, are a possibility, though.
|
|
|
|
*/
|
|
|
|
|
|
|
|
/* ELF section names, etc */
|
|
|
|
#define ELF_SECTION_LICENSE "license"
|
|
|
|
#define ELF_SECTION_MAPS "maps"
|
bpf: add initial support for attaching xdp progs
Now that we made the BPF loader generic as a library, reuse it
for loading XDP programs as well. This basically adds a minimal
start of a facility for iproute2 to load XDP programs. There
currently only exists the xdp1_user.c sample code in the kernel
tree that sets up netlink directly and an iovisor/bcc front-end.
Since we have all the necessary infrastructure in place already
from tc side, we can just reuse its loader back-end and thus
facilitate migration and usability among the two for people
familiar with tc/bpf already. Sharing maps, performing tail calls,
etc works the same way as with tc. Naturally, once kernel
configuration API evolves, we will extend new features for XDP
here as well, resp. extend dumping of related netlink attributes.
Minimal example:
clang -target bpf -O2 -Wall -c prog.c -o prog.o
ip [-force] link set dev em1 xdp obj prog.o # attaching
ip [-d] link # dumping
ip link set dev em1 xdp off # detaching
For the dump, intention is that in the first line for each ip
link entry, we'll see "xdp" to indicate that this device has an
XDP program attached. Once we dump some more useful information
via netlink (digest, etc), idea is that 'ip -d link' will then
display additional relevant program information below the "link/
ether [...]" output line for such devices, for example.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
2016-12-06 09:21:57 +08:00
|
|
|
#define ELF_SECTION_PROG "prog"
|
2015-04-01 23:57:44 +08:00
|
|
|
#define ELF_SECTION_CLASSIFIER "classifier"
|
|
|
|
#define ELF_SECTION_ACTION "action"
|
|
|
|
|
|
|
|
#define ELF_MAX_MAPS 64
|
|
|
|
#define ELF_MAX_LICENSE_LEN 128
|
|
|
|
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
/* Object pinning settings */
|
|
|
|
#define PIN_NONE 0
|
|
|
|
#define PIN_OBJECT_NS 1
|
|
|
|
#define PIN_GLOBAL_NS 2
|
|
|
|
|
2015-04-01 23:57:44 +08:00
|
|
|
/* ELF map definition */
|
|
|
|
struct bpf_elf_map {
|
|
|
|
__u32 type;
|
|
|
|
__u32 size_key;
|
|
|
|
__u32 size_value;
|
|
|
|
__u32 max_elem;
|
2016-04-09 06:32:05 +08:00
|
|
|
__u32 flags;
|
2015-04-01 23:57:44 +08:00
|
|
|
__u32 id;
|
2015-11-26 22:38:44 +08:00
|
|
|
__u32 pinning;
|
2017-07-17 23:18:51 +08:00
|
|
|
__u32 inner_id;
|
|
|
|
__u32 inner_idx;
|
2015-04-01 23:57:44 +08:00
|
|
|
};
|
|
|
|
|
bpf: implement btf handling and map annotation
Implement loading of .BTF section from object file and build up
internal table for retrieving key/value id related to maps in
the BPF program. Latter is done by setting up struct btf_type
table.
One of the issues is that there's a disconnect between the data
types used in the map and struct bpf_elf_map, meaning the underlying
types are unknown from the map description. One way to overcome
this is to add a annotation such that the loader will recognize
the relation to both. BPF_ANNOTATE_KV_PAIR(map_foo, struct key,
struct val); has been added to the API that programs can use.
The loader will then pick the corresponding key/value type ids and
attach it to the maps for creation. This can later on be dumped via
bpftool for introspection.
Example with test_xdp_noinline.o from kernel selftests:
[...]
struct ctl_value {
union {
__u64 value;
__u32 ifindex;
__u8 mac[6];
};
};
struct bpf_map_def __attribute__ ((section("maps"), used)) ctl_array = {
.type = BPF_MAP_TYPE_ARRAY,
.key_size = sizeof(__u32),
.value_size = sizeof(struct ctl_value),
.max_entries = 16,
.map_flags = 0,
};
BPF_ANNOTATE_KV_PAIR(ctl_array, __u32, struct ctl_value);
[...]
Above could also further be wrapped in a macro. Compiling through LLVM and
converting to BTF:
# llc --version
LLVM (http://llvm.org/):
LLVM version 7.0.0svn
Optimized build.
Default target: x86_64-unknown-linux-gnu
Host CPU: skylake
Registered Targets:
bpf - BPF (host endian)
bpfeb - BPF (big endian)
bpfel - BPF (little endian)
[...]
# clang [...] -O2 -target bpf -g -emit-llvm -c test_xdp_noinline.c -o - |
llc -march=bpf -mcpu=probe -mattr=dwarfris -filetype=obj -o test_xdp_noinline.o
# pahole -J test_xdp_noinline.o
Checking pahole dump of BPF object file:
# file test_xdp_noinline.o
test_xdp_noinline.o: ELF 64-bit LSB relocatable, *unknown arch 0xf7* version 1 (SYSV), with debug_info, not stripped
# pahole test_xdp_noinline.o
[...]
struct ctl_value {
union {
__u64 value; /* 0 8 */
__u32 ifindex; /* 0 4 */
__u8 mac[0]; /* 0 0 */
}; /* 0 8 */
/* size: 8, cachelines: 1, members: 1 */
/* last cacheline: 8 bytes */
};
Now loading into kernel and dumping the map via bpftool:
# ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:227 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
[...]
# bpftool prog show id 227
227: xdp tag a85e060c275c5616 gpl
loaded_at 2018-07-17T14:41:29+0000 uid 0
xlated 8152B not jited memlock 12288B map_ids 381,385,386,382,384,383
# bpftool map dump id 386
[{
"key": 0,
"value": {
"": {
"value": 0,
"ifindex": 0,
"mac": []
}
}
},{
"key": 1,
"value": {
"": {
"value": 0,
"ifindex": 0,
"mac": []
}
}
},{
[...]
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
2018-07-18 07:31:22 +08:00
|
|
|
#define BPF_ANNOTATE_KV_PAIR(name, type_key, type_val) \
|
|
|
|
struct ____btf_map_##name { \
|
|
|
|
type_key key; \
|
|
|
|
type_val value; \
|
|
|
|
}; \
|
|
|
|
struct ____btf_map_##name \
|
|
|
|
__attribute__ ((section(".maps." #name), used)) \
|
|
|
|
____btf_map_##name = { }
|
|
|
|
|
2015-04-01 23:57:44 +08:00
|
|
|
#endif /* __BPF_ELF__ */
|