/* SPDX-License-Identifier: GPL-2.0 */
/*
 * bpf_util.h	BPF common code
 *
 * Authors:	Daniel Borkmann <daniel@iogearbox.net>
 *		Jiri Pirko <jiri@resnulli.us>
 */

#ifndef __BPF_UTIL__
#define __BPF_UTIL__

#include <linux/bpf.h>
bpf: implement btf handling and map annotation

Implement loading of the .BTF section from the object file and build up an
internal table for retrieving the key/value type IDs related to maps in the
BPF program. The latter is done by setting up a struct btf_type table.

One of the issues is that there is a disconnect between the data types used
in the map and struct bpf_elf_map, meaning the underlying types are unknown
from the map description alone. One way to overcome this is to add an
annotation such that the loader recognizes the relation between both.
BPF_ANNOTATE_KV_PAIR(map_foo, struct key, struct val); has been added to
the API for programs to use. The loader then picks the corresponding
key/value type IDs and attaches them to the maps for creation. This can
later be dumped via bpftool for introspection.

Example with test_xdp_noinline.o from the kernel selftests:

[...]
struct ctl_value {
	union {
		__u64 value;
		__u32 ifindex;
		__u8 mac[6];
	};
};

struct bpf_map_def __attribute__ ((section("maps"), used)) ctl_array = {
	.type		= BPF_MAP_TYPE_ARRAY,
	.key_size	= sizeof(__u32),
	.value_size	= sizeof(struct ctl_value),
	.max_entries	= 16,
	.map_flags	= 0,
};
BPF_ANNOTATE_KV_PAIR(ctl_array, __u32, struct ctl_value);
[...]

The above could also be wrapped further in a macro. Compiling through LLVM
and converting to BTF:

# llc --version
LLVM (http://llvm.org/):
  LLVM version 7.0.0svn
  Optimized build.
  Default target: x86_64-unknown-linux-gnu
  Host CPU: skylake

  Registered Targets:
    bpf    - BPF (host endian)
    bpfeb  - BPF (big endian)
    bpfel  - BPF (little endian)
[...]

# clang [...] -O2 -target bpf -g -emit-llvm -c test_xdp_noinline.c -o - |
  llc -march=bpf -mcpu=probe -mattr=dwarfris -filetype=obj -o test_xdp_noinline.o
# pahole -J test_xdp_noinline.o

Checking the pahole dump of the BPF object file:

# file test_xdp_noinline.o
test_xdp_noinline.o: ELF 64-bit LSB relocatable, *unknown arch 0xf7* version 1 (SYSV), with debug_info, not stripped
# pahole test_xdp_noinline.o
[...]
struct ctl_value {
	union {
		__u64  value;    /* 0 8 */
		__u32  ifindex;  /* 0 4 */
		__u8   mac[0];   /* 0 0 */
	};                       /* 0 8 */

	/* size: 8, cachelines: 1, members: 1 */
	/* last cacheline: 8 bytes */
};

Now loading into the kernel and dumping the map via bpftool:

# ip -force link set dev lo xdp obj test_xdp_noinline.o sec xdp-test
# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 xdpgeneric/id:227 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
[...]
# bpftool prog show id 227
227: xdp  tag a85e060c275c5616  gpl
	loaded_at 2018-07-17T14:41:29+0000  uid 0
	xlated 8152B  not jited  memlock 12288B  map_ids 381,385,386,382,384,383
# bpftool map dump id 386
[{
      "key": 0,
      "value": {
          "": {
              "value": 0,
              "ifindex": 0,
              "mac": []
          }
      }
  },{
      "key": 1,
      "value": {
          "": {
              "value": 0,
              "ifindex": 0,
              "mac": []
          }
      }
  },{
[...]

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: David Ahern <dsahern@gmail.com>
#include <linux/btf.h>
#include <linux/filter.h>

{f,m}_bpf: allow for sharing maps

This larger work addresses one of the bigger remaining issues in tc's eBPF
front end: allowing for persistent file descriptors. Whenever tc parses the
ELF object and extracts and loads maps into the kernel, these file
descriptors are out of reach after the tc instance exits. Meaning, for
simple (unnested) programs which contain one or multiple maps, the kernel
holds a reference, and they live on inside the kernel until the program
holding them is unloaded, but they are out of reach for user space; even
worse with (also multiply nested) tail calls.

For this issue, we introduced the concept of an agent that can receive the
set of file descriptors from the tc instance creating them, in order to
further inspect/update map data for a specific use case. However, while
that is more tied towards specific applications, it still doesn't easily
allow for sharing maps across multiple tc instances and would require a
daemon to be running in the background. E.g. when a map should be shared by
two eBPF programs, one attached to ingress and one to egress, this
currently doesn't work with the tc front end.

This work solves exactly that: if requested, maps can now be _arbitrarily_
shared between object files (PIN_GLOBAL_NS) or within a single object (but
across various program sections, PIN_OBJECT_NS) without losing the file
descriptor set. To make that happen, we use eBPF object pinning introduced
in kernel commit b2197755b263 ("bpf: add support for persistent
maps/progs") for exactly this purpose.

The shipped examples/bpf/bpf_shared.c code from this patch can be easily
applied, for instance, as:

- classifier-classifier shared:

  tc filter add dev foo parent 1: bpf obj shared.o sec egress
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress

- classifier-action shared (here: late binding to a dummy classifier):

  tc actions add action bpf obj shared.o sec egress pass index 42
  tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
  tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
     action bpf index 42

The toy example increments a shared counter on egress and dumps its value
on ingress (had no sharing (PIN_NONE) been chosen, the map value would of
course be 0, due to the two map instances being created):

[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]

... thus if both sections reference the pinned map(s) in question, tc takes
care of fetching the appropriate file descriptor. The patch has been tested
extensively on both the classifier and action sides.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
#include <linux/magic.h>
#include <linux/elf-em.h>
#include <linux/if_alg.h>

#include "utils.h"

tc: built-in eBPF exec proxy

This work follows up on commit 6256f8c9e45f ("tc, bpf: finalize eBPF
support for cls and act front-end") and takes up the idea proposed by
Hannes Frederic Sowa to spawn a shell (or any other command) that holds the
generated eBPF map file descriptors.

File descriptors, based on their id, are fetched from the same unix domain
socket as demonstrated in the bpf_agent, the shell is spawned via
execvpe(2), and the map fds are passed over the environment, and thus made
available to applications in the fashion of std{in,out,err} for read/write
access, for example in the case of iproute2's examples/bpf/:

# env | grep BPF
BPF_NUM_MAPS=3
BPF_MAP1=6 <- BPF_MAP_ID_QUEUE (id 1)
BPF_MAP0=5 <- BPF_MAP_ID_PROTO (id 0)
BPF_MAP2=7 <- BPF_MAP_ID_DROPS (id 2)
# ls -la /proc/self/fd
[...]
lrwx------. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4
lrwx------. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4
lrwx------. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4
[...]
lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map

The advantage (as opposed to direct/native usage) is that the shell is now
the map fd owner, and applications can terminate and easily reattach to the
descriptors without any kernel changes. Moreover, multiple applications can
easily read/write eBPF maps simultaneously.

To further allow users to experiment with this, the next step is to add a
small helper that can get along with simple data types, so that shell
scripts can also make use of the bpf syscall, e.g. to read/write into maps.
Generally, this allows for prepopulating maps, or any runtime alteration
that could influence eBPF program behaviour (e.g. different run-time
classifications, skb modifications, ...), dumping of statistics, etc.

Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
#include "bpf_scm.h"
#define BPF_ENV_UDS "TC_BPF_UDS"
#define BPF_ENV_MNT "TC_BPF_MNT"
#ifndef BPF_MAX_LOG
# define BPF_MAX_LOG	4096
#endif
#define BPF_DIR_GLOBALS "globals"
#ifndef BPF_FS_MAGIC
# define BPF_FS_MAGIC	0xcafe4a11
#endif
#define BPF_DIR_MNT "/sys/fs/bpf"
#ifndef TRACEFS_MAGIC
# define TRACEFS_MAGIC	0x74726163
#endif
#define TRACE_DIR_MNT "/sys/kernel/tracing"
#ifndef AF_ALG
# define AF_ALG		38
#endif
#ifndef EM_BPF
# define EM_BPF		247
#endif
struct bpf_cfg_ops {
	void (*cbpf_cb)(void *nl, const struct sock_filter *ops, int ops_len);
	void (*ebpf_cb)(void *nl, int fd, const char *annotation);
};
enum bpf_mode {
	CBPF_BYTECODE,
	CBPF_FILE,
	EBPF_OBJECT,
	EBPF_PINNED,
	BPF_MODE_MAX,
};
struct bpf_cfg_in {
	const char *object;
	const char *section;
	const char *prog_name;
	const char *uds;
	enum bpf_prog_type type;
	enum bpf_mode mode;
	__u32 ifindex;
	bool verbose;
	int argc;
	char **argv;
	struct sock_filter opcodes[BPF_MAXINSNS];
	union {
		int n_opcodes;
		int prog_fd;
	};
};

/* ALU ops on registers, bpf_add|sub|...: dst_reg += src_reg */

#define BPF_ALU64_REG(OP, DST, SRC)				\
	((struct bpf_insn) {					\
		.code  = BPF_ALU64 | BPF_OP(OP) | BPF_X,	\
		.dst_reg = DST,					\
		.src_reg = SRC,					\
		.off   = 0,					\
		.imm   = 0 })

#define BPF_ALU32_REG(OP, DST, SRC)				\
	((struct bpf_insn) {					\
		.code  = BPF_ALU | BPF_OP(OP) | BPF_X,		\
		.dst_reg = DST,					\
		.src_reg = SRC,					\
		.off   = 0,					\
		.imm   = 0 })

/* ALU ops on immediates, bpf_add|sub|...: dst_reg += imm32 */

#define BPF_ALU64_IMM(OP, DST, IMM)				\
	((struct bpf_insn) {					\
		.code  = BPF_ALU64 | BPF_OP(OP) | BPF_K,	\
		.dst_reg = DST,					\
		.src_reg = 0,					\
		.off   = 0,					\
		.imm   = IMM })

#define BPF_ALU32_IMM(OP, DST, IMM)				\
	((struct bpf_insn) {					\
		.code  = BPF_ALU | BPF_OP(OP) | BPF_K,		\
		.dst_reg = DST,					\
		.src_reg = 0,					\
		.off   = 0,					\
		.imm   = IMM })

/* Short form of mov, dst_reg = src_reg */

#define BPF_MOV64_REG(DST, SRC)					\
	((struct bpf_insn) {					\
		.code  = BPF_ALU64 | BPF_MOV | BPF_X,		\
		.dst_reg = DST,					\
		.src_reg = SRC,					\
		.off   = 0,					\
		.imm   = 0 })

#define BPF_MOV32_REG(DST, SRC)					\
	((struct bpf_insn) {					\
		.code  = BPF_ALU | BPF_MOV | BPF_X,		\
		.dst_reg = DST,					\
		.src_reg = SRC,					\
		.off   = 0,					\
		.imm   = 0 })

/* Short form of mov, dst_reg = imm32 */

#define BPF_MOV64_IMM(DST, IMM)					\
	((struct bpf_insn) {					\
		.code  = BPF_ALU64 | BPF_MOV | BPF_K,		\
		.dst_reg = DST,					\
		.src_reg = 0,					\
		.off   = 0,					\
		.imm   = IMM })

#define BPF_MOV32_IMM(DST, IMM)					\
	((struct bpf_insn) {					\
		.code  = BPF_ALU | BPF_MOV | BPF_K,		\
		.dst_reg = DST,					\
		.src_reg = 0,					\
		.off   = 0,					\
		.imm   = IMM })

/* BPF_LD_IMM64 macro encodes single 'load 64-bit immediate' insn */

#define BPF_LD_IMM64(DST, IMM)					\
	BPF_LD_IMM64_RAW(DST, 0, IMM)

#define BPF_LD_IMM64_RAW(DST, SRC, IMM)				\
	((struct bpf_insn) {					\
		.code  = BPF_LD | BPF_DW | BPF_IMM,		\
		.dst_reg = DST,					\
		.src_reg = SRC,					\
		.off   = 0,					\
		.imm   = (__u32) (IMM) }),			\
	((struct bpf_insn) {					\
		.code  = 0, /* zero is reserved opcode */	\
		.dst_reg = 0,					\
		.src_reg = 0,					\
		.off   = 0,					\
		.imm   = ((__u64) (IMM)) >> 32 })

#ifndef BPF_PSEUDO_MAP_FD
# define BPF_PSEUDO_MAP_FD	1
#endif

/* pseudo BPF_LD_IMM64 insn used to refer to process-local map_fd */
#define BPF_LD_MAP_FD(DST, MAP_FD)				\
	BPF_LD_IMM64_RAW(DST, BPF_PSEUDO_MAP_FD, MAP_FD)

/* Direct packet access, R0 = *(uint *) (skb->data + imm32) */

#define BPF_LD_ABS(SIZE, IMM)					\
	((struct bpf_insn) {					\
		.code  = BPF_LD | BPF_SIZE(SIZE) | BPF_ABS,	\
		.dst_reg = 0,					\
		.src_reg = 0,					\
		.off   = 0,					\
		.imm   = IMM })

/* Memory load, dst_reg = *(uint *) (src_reg + off16) */

#define BPF_LDX_MEM(SIZE, DST, SRC, OFF)			\
	((struct bpf_insn) {					\
		.code  = BPF_LDX | BPF_SIZE(SIZE) | BPF_MEM,	\
		.dst_reg = DST,					\
		.src_reg = SRC,					\
		.off   = OFF,					\
		.imm   = 0 })

/* Memory store, *(uint *) (dst_reg + off16) = src_reg */

#define BPF_STX_MEM(SIZE, DST, SRC, OFF)			\
	((struct bpf_insn) {					\
		.code  = BPF_STX | BPF_SIZE(SIZE) | BPF_MEM,	\
		.dst_reg = DST,					\
		.src_reg = SRC,					\
		.off   = OFF,					\
		.imm   = 0 })

/* Memory store, *(uint *) (dst_reg + off16) = imm32 */

#define BPF_ST_MEM(SIZE, DST, OFF, IMM)				\
	((struct bpf_insn) {					\
		.code  = BPF_ST | BPF_SIZE(SIZE) | BPF_MEM,	\
		.dst_reg = DST,					\
		.src_reg = 0,					\
		.off   = OFF,					\
		.imm   = IMM })

/* Conditional jumps against registers, if (dst_reg 'op' src_reg) goto pc + off16 */

#define BPF_JMP_REG(OP, DST, SRC, OFF)				\
	((struct bpf_insn) {					\
		.code  = BPF_JMP | BPF_OP(OP) | BPF_X,		\
		.dst_reg = DST,					\
		.src_reg = SRC,					\
		.off   = OFF,					\
		.imm   = 0 })

/* Conditional jumps against immediates, if (dst_reg 'op' imm32) goto pc + off16 */

#define BPF_JMP_IMM(OP, DST, IMM, OFF)				\
	((struct bpf_insn) {					\
		.code  = BPF_JMP | BPF_OP(OP) | BPF_K,		\
		.dst_reg = DST,					\
		.src_reg = 0,					\
		.off   = OFF,					\
		.imm   = IMM })

/* Raw code statement block */

#define BPF_RAW_INSN(CODE, DST, SRC, OFF, IMM)			\
	((struct bpf_insn) {					\
		.code  = CODE,					\
		.dst_reg = DST,					\
		.src_reg = SRC,					\
		.off   = OFF,					\
		.imm   = IMM })

/* Program exit */

#define BPF_EXIT_INSN()						\
	((struct bpf_insn) {					\
		.code  = BPF_JMP | BPF_EXIT,			\
		.dst_reg = 0,					\
		.src_reg = 0,					\
		.off   = 0,					\
		.imm   = 0 })
int bpf_parse_common(struct bpf_cfg_in *cfg, const struct bpf_cfg_ops *ops);
int bpf_load_common(struct bpf_cfg_in *cfg, const struct bpf_cfg_ops *ops,
		    void *nl);
int bpf_parse_and_load_common(struct bpf_cfg_in *cfg,
			      const struct bpf_cfg_ops *ops, void *nl);
const char *bpf_prog_to_default_section(enum bpf_prog_type type);

{f,m}_bpf: allow updates on program arrays

Since we have all the infrastructure in place now, allow atomic live
updates on program arrays. This can be very useful, e.g. when programs that
are being tail-called need to be replaced, for example when classifier
functionality needs to be changed, new protocols added/removed during
runtime, etc.

Thus, provide a way for in-place code updates. Minimal example: given is an
object file cls.o that contains the entry point in section 'classifier',
has a globally pinned program array 'jmp' with 2 slots and id of 0, and two
tail-called programs under section '0/0' (prog array key 0) and '0/1' (prog
array key 1); the section encoding for the loader is <id/key>. Adding the
filter loads everything into cls_bpf:

tc filter add dev foo parent ffff: bpf da obj cls.o

Now, the program under section '0/1' needs to be replaced with an updated
version that resides in the same section (the full path to tc's subfolder
of the mount point can also be passed, e.g. /sys/fs/bpf/tc/globals/jmp):

tc exec bpf graft m:globals/jmp obj cls.o sec 0/1

In case the program resides under a different section 'foo', it can also be
injected into the program array like:

tc exec bpf graft m:globals/jmp key 1 obj cls.o sec foo

If the new tail-called classifier program is already available as a pinned
object somewhere (here: /sys/fs/bpf/tc/progs/parser), it can be injected
into the prog array like:

tc exec bpf graft m:globals/jmp key 1 fd m:progs/parser

In the kernel, the program on key 1 is atomically replaced and the old
one's refcount dropped.

Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>

int bpf_graft_map(const char *map_path, uint32_t *key, int argc, char **argv);
int bpf_trace_pipe(void);
tc: full JSON support for 'bpf' actions

Add full JSON output support to the dump of 'act_bpf'.

Example using eBPF:

# tc actions flush action bpf
# tc action add action bpf object bpf/action.o section 'action-ok'
# tc -j action list action bpf | jq
[
  {
    "total acts": 1
  },
  {
    "actions": [
      {
        "order": 0,
        "kind": "bpf",
        "bpf_name": "action.o:[action-ok]",
        "prog": {
          "id": 33,
          "tag": "a04f5eef06a7f555",
          "jited": 1
        },
        "control_action": {
          "type": "pipe"
        },
        "index": 1,
        "ref": 1,
        "bind": 0
      }
    ]
  }
]

Example using cBPF:

# tc actions flush action bpf
# a=$(mktemp)
# tcpdump -ddd not ether proto 0x888e >$a
# tc action add action bpf bytecode-file $a index 42
# rm $a
# tc -j action list action bpf | jq
[
  {
    "total acts": 1
  },
  {
    "actions": [
      {
        "order": 0,
        "kind": "bpf",
        "bytecode": {
          "length": 4,
          "insns": [
            {
              "code": 40,
              "jt": 0,
              "jf": 0,
              "k": 12
            },
            {
              "code": 21,
              "jt": 0,
              "jf": 1,
              "k": 34958
            },
            {
              "code": 6,
              "jt": 0,
              "jf": 0,
              "k": 0
            },
            {
              "code": 6,
              "jt": 0,
              "jf": 0,
              "k": 262144
            }
          ]
        },
        "control_action": {
          "type": "pipe"
        },
        "index": 42,
        "ref": 1,
        "bind": 0
      }
    ]
  }
]

Tested with:
# ./tdc.py -c bpf

Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>

void bpf_print_ops(struct rtattr *bpf_ops, __u16 len);
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both the classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
|
2020-11-23 21:11:58 +08:00
|
|
|
int bpf_prog_load_dev(enum bpf_prog_type type, const struct bpf_insn *insns,
|
|
|
|
size_t size_insns, const char *license, __u32 ifindex,
|
2023-10-27 16:57:06 +08:00
|
|
|
char *log, size_t size_log, bool verbose);
|
2020-11-23 21:11:58 +08:00
|
|
|
int bpf_program_load(enum bpf_prog_type type, const struct bpf_insn *insns,
|
|
|
|
size_t size_insns, const char *license, char *log,
|
2023-10-27 16:57:06 +08:00
|
|
|
size_t size_log, bool verbose);
|
2016-12-12 08:53:09 +08:00
|
|
|
|
2016-12-12 08:53:08 +08:00
|
|
|
int bpf_prog_attach_fd(int prog_fd, int target_fd, enum bpf_attach_type type);
|
|
|
|
int bpf_prog_detach_fd(int target_fd, enum bpf_attach_type type);
|
2020-11-23 21:11:58 +08:00
|
|
|
int bpf_program_attach(int prog_fd, int target_fd, enum bpf_attach_type type);
|
2016-12-12 08:53:08 +08:00
|
|
|
|
2017-09-05 08:24:32 +08:00
|
|
|
int bpf_dump_prog_info(FILE *f, uint32_t id);
|
2017-07-17 23:18:52 +08:00
|
|
|
|
2022-02-07 08:59:24 +08:00
|
|
|
int bpf(int cmd, union bpf_attr *attr, unsigned int size);
|
|
|
|
|
2015-11-13 07:39:29 +08:00
|
|
|
#ifdef HAVE_ELF
|
tc: built-in eBPF exec proxy
This work follows upon commit 6256f8c9e45f ("tc, bpf: finalize eBPF
support for cls and act front-end") and takes up the idea proposed by
Hannes Frederic Sowa to spawn a shell (or any other command) that holds
generated eBPF map file descriptors.
File descriptors are fetched by id from the same unix domain socket
as demonstrated in the bpf_agent; the shell is spawned via execvpe(2)
with the map fds passed through the environment, and thus they are
made available to applications in the fashion of std{in,out,err}
for read/write access, for example in case of iproute2's examples/bpf/:
# env | grep BPF
BPF_NUM_MAPS=3
BPF_MAP1=6 <- BPF_MAP_ID_QUEUE (id 1)
BPF_MAP0=5 <- BPF_MAP_ID_PROTO (id 0)
BPF_MAP2=7 <- BPF_MAP_ID_DROPS (id 2)
# ls -la /proc/self/fd
[...]
lrwx------. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4
lrwx------. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4
lrwx------. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4
[...]
lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map
The advantage (as opposed to the direct/native usage) is that now the
shell is map fd owner and applications can terminate and easily reattach
to descriptors w/o any kernel changes. Moreover, multiple applications
can easily read/write eBPF maps simultaneously.
To further allow users to experiment with that, the next step is to add
a small helper that can get along with simple data types, so that also
shell scripts can make use of the bpf syscall, f.e. to read/write into maps.
Generally, this allows for prepopulating maps, or any runtime altering
which could influence eBPF program behaviour (f.e. different run-time
classifications, skb modifications, ...), dumping of statistics, etc.
Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-04-17 03:20:06 +08:00
|
|
|
int bpf_send_map_fds(const char *path, const char *obj);
|
|
|
|
int bpf_recv_map_fds(const char *path, int *fds, struct bpf_map_aux *aux,
|
|
|
|
unsigned int entries);
|
2020-11-23 21:11:59 +08:00
|
|
|
#ifdef HAVE_LIBBPF
|
|
|
|
int iproute2_bpf_elf_ctx_init(struct bpf_cfg_in *cfg);
|
|
|
|
int iproute2_bpf_fetch_ancillary(void);
|
|
|
|
int iproute2_get_root_path(char *root_path, size_t len);
|
|
|
|
bool iproute2_is_pin_map(const char *libbpf_map_name, char *pathname);
|
|
|
|
bool iproute2_is_map_in_map(const char *libbpf_map_name, struct bpf_elf_map *imap,
|
|
|
|
struct bpf_elf_map *omap, char *omap_name);
|
|
|
|
int iproute2_find_map_name_by_id(unsigned int map_id, char *name);
|
|
|
|
int iproute2_load_libbpf(struct bpf_cfg_in *cfg);
|
|
|
|
#endif /* HAVE_LIBBPF */
|
2015-03-17 02:37:41 +08:00
|
|
|
#else
|
2015-04-17 03:20:06 +08:00
|
|
|
static inline int bpf_send_map_fds(const char *path, const char *obj)
|
2015-04-01 23:57:44 +08:00
|
|
|
{
|
|
|
|
return 0;
|
|
|
|
}
|
2015-04-17 03:20:06 +08:00
|
|
|
|
|
|
|
static inline int bpf_recv_map_fds(const char *path, int *fds,
|
|
|
|
struct bpf_map_aux *aux,
|
|
|
|
unsigned int entries)
|
|
|
|
{
|
|
|
|
return -1;
|
|
|
|
}
|
2020-11-23 21:11:59 +08:00
|
|
|
#ifdef HAVE_LIBBPF
|
|
|
|
static inline int iproute2_load_libbpf(struct bpf_cfg_in *cfg)
|
|
|
|
{
|
|
|
|
fprintf(stderr, "No ELF library support compiled in.\n");
|
|
|
|
return -1;
|
|
|
|
}
|
|
|
|
#endif /* HAVE_LIBBPF */
|
2015-03-17 02:37:41 +08:00
|
|
|
#endif /* HAVE_ELF */
|
2020-11-23 21:11:57 +08:00
|
|
|
|
|
|
|
const char *get_libbpf_version(void);
|
|
|
|
|
2016-11-10 08:20:59 +08:00
|
|
|
#endif /* __BPF_UTIL__ */
|