2023-01-11 11:05:49 +08:00
|
|
|
/* SPDX-License-Identifier: GPL-2.0-or-later */
|
2015-01-19 23:56:30 +08:00
|
|
|
/*
|
2015-04-28 19:37:42 +08:00
|
|
|
* m_bpf.c BPF based action module
|
2015-01-19 23:56:30 +08:00
|
|
|
*
|
|
|
|
* Authors: Jiri Pirko <jiri@resnulli.us>
|
2015-04-01 23:57:44 +08:00
|
|
|
* Daniel Borkmann <daniel@iogearbox.net>
|
2015-01-19 23:56:30 +08:00
|
|
|
*/
|
|
|
|
|
|
|
|
#include <stdio.h>
|
|
|
|
#include <stdlib.h>
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
|
2015-04-01 23:57:44 +08:00
|
|
|
#include <linux/bpf.h>
|
2015-01-19 23:56:30 +08:00
|
|
|
#include <linux/tc_act/tc_bpf.h>
|
|
|
|
|
|
|
|
#include "utils.h"
|
2016-11-10 08:20:59 +08:00
|
|
|
|
2015-01-19 23:56:30 +08:00
|
|
|
#include "tc_util.h"
|
2016-11-10 08:20:59 +08:00
|
|
|
#include "bpf_util.h"
|
2015-01-19 23:56:30 +08:00
|
|
|
|
2015-04-01 23:57:44 +08:00
|
|
|
static const enum bpf_prog_type bpf_type = BPF_PROG_TYPE_SCHED_ACT;
|
|
|
|
|
2015-01-19 23:56:30 +08:00
|
|
|
static void explain(void)
|
|
|
|
{
|
treewide: refactor help messages
Every tool in the iproute2 package have one or more function to show
an help message to the user. Some of these functions print the help
line by line with a series of printf call, e.g. ip/xfrm_state.c does
60 fprintf calls.
If we group all the calls to a single one and just concatenate strings,
we save a lot of libc calls and thus object size. The size difference
of the compiled binaries calculated with bloat-o-meter is:
ip/ip:
add/remove: 0/0 grow/shrink: 5/15 up/down: 103/-4796 (-4693)
Total: Before=672591, After=667898, chg -0.70%
ip/rtmon:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-54 (-54)
Total: Before=48879, After=48825, chg -0.11%
tc/tc:
add/remove: 0/2 grow/shrink: 31/10 up/down: 882/-6133 (-5251)
Total: Before=351912, After=346661, chg -1.49%
bridge/bridge:
add/remove: 0/0 grow/shrink: 0/1 up/down: 0/-459 (-459)
Total: Before=70502, After=70043, chg -0.65%
misc/lnstat:
add/remove: 0/1 grow/shrink: 1/0 up/down: 48/-486 (-438)
Total: Before=9960, After=9522, chg -4.40%
tipc/tipc:
add/remove: 0/0 grow/shrink: 1/1 up/down: 18/-62 (-44)
Total: Before=79182, After=79138, chg -0.06%
While at it, indent some strings which were starting at column 0,
and use tabs where possible, to have a consistent style across helps.
Signed-off-by: Matteo Croce <mcroce@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-05-17 21:38:28 +08:00
|
|
|
fprintf(stderr,
|
|
|
|
"Usage: ... bpf ... [ index INDEX ]\n"
|
|
|
|
"\n"
|
|
|
|
"BPF use case:\n"
|
|
|
|
" bytecode BPF_BYTECODE\n"
|
|
|
|
" bytecode-file FILE\n"
|
|
|
|
"\n"
|
|
|
|
"eBPF use case:\n"
|
|
|
|
" object-file FILE [ section ACT_NAME ] [ export UDS_FILE ]"
|
|
|
|
" [ verbose ]\n"
|
|
|
|
" object-pinned FILE\n"
|
|
|
|
"\n"
|
|
|
|
"Where BPF_BYTECODE := \'s,c t f k,c t f k,c t f k,...\'\n"
|
|
|
|
"c,t,f,k and s are decimals; s denotes number of 4-tuples\n"
|
|
|
|
"\n"
|
|
|
|
"Where FILE points to a file containing the BPF_BYTECODE string,\n"
|
|
|
|
"an ELF file containing eBPF map definitions and bytecode, or a\n"
|
|
|
|
"pinned eBPF program.\n"
|
|
|
|
"\n"
|
|
|
|
"Where ACT_NAME refers to the section name containing the\n"
|
|
|
|
"action (default \'%s\').\n"
|
|
|
|
"\n"
|
|
|
|
"Where UDS_FILE points to a unix domain socket file in order\n"
|
|
|
|
"to hand off control of all created eBPF maps to an agent.\n"
|
|
|
|
"\n"
|
|
|
|
"Where optionally INDEX points to an existing action, or\n"
|
|
|
|
"explicitly specifies an action index upon creation.\n",
|
|
|
|
bpf_prog_to_default_section(bpf_type));
|
2015-01-19 23:56:30 +08:00
|
|
|
}
|
|
|
|
|
2016-11-10 08:20:59 +08:00
|
|
|
static void bpf_cbpf_cb(void *nl, const struct sock_filter *ops, int ops_len)
|
|
|
|
{
|
|
|
|
addattr16(nl, MAX_MSG, TCA_ACT_BPF_OPS_LEN, ops_len);
|
|
|
|
addattr_l(nl, MAX_MSG, TCA_ACT_BPF_OPS, ops,
|
|
|
|
ops_len * sizeof(struct sock_filter));
|
|
|
|
}
|
|
|
|
|
|
|
|
static void bpf_ebpf_cb(void *nl, int fd, const char *annotation)
|
|
|
|
{
|
|
|
|
addattr32(nl, MAX_MSG, TCA_ACT_BPF_FD, fd);
|
|
|
|
addattrstrz(nl, MAX_MSG, TCA_ACT_BPF_NAME, annotation);
|
|
|
|
}
|
|
|
|
|
|
|
|
static const struct bpf_cfg_ops bpf_cb_ops = {
|
|
|
|
.cbpf_cb = bpf_cbpf_cb,
|
|
|
|
.ebpf_cb = bpf_ebpf_cb,
|
|
|
|
};
|
|
|
|
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
static int bpf_parse_opt(struct action_util *a, int *ptr_argc, char ***ptr_argv,
|
|
|
|
int tca_id, struct nlmsghdr *n)
|
2015-01-19 23:56:30 +08:00
|
|
|
{
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
const char *bpf_obj = NULL, *bpf_uds_name = NULL;
|
2017-05-17 01:29:36 +08:00
|
|
|
struct tc_act_bpf parm = {};
|
2016-11-10 08:20:59 +08:00
|
|
|
struct bpf_cfg_in cfg = {};
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
bool seen_run = false;
|
2015-01-19 23:56:30 +08:00
|
|
|
struct rtattr *tail;
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
int argc, ret = 0;
|
|
|
|
char **argv;
|
|
|
|
|
|
|
|
argv = *ptr_argv;
|
|
|
|
argc = *ptr_argc;
|
2015-01-19 23:56:30 +08:00
|
|
|
|
|
|
|
if (matches(*argv, "bpf") != 0)
|
|
|
|
return -1;
|
|
|
|
|
|
|
|
NEXT_ARG();
|
|
|
|
|
2018-01-31 16:15:08 +08:00
|
|
|
tail = addattr_nest(n, MAX_MSG, tca_id);
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
|
2015-01-19 23:56:30 +08:00
|
|
|
while (argc > 0) {
|
|
|
|
if (matches(*argv, "run") == 0) {
|
|
|
|
NEXT_ARG();
|
2017-11-24 10:12:05 +08:00
|
|
|
|
|
|
|
if (seen_run)
|
|
|
|
duparg("run", *argv);
|
2015-04-01 23:57:44 +08:00
|
|
|
opt_bpf:
|
|
|
|
seen_run = true;
|
2017-11-24 10:11:58 +08:00
|
|
|
cfg.type = bpf_type;
|
2016-11-10 08:20:59 +08:00
|
|
|
cfg.argc = argc;
|
|
|
|
cfg.argv = argv;
|
|
|
|
|
2017-11-24 10:12:02 +08:00
|
|
|
if (bpf_parse_and_load_common(&cfg, &bpf_cb_ops, n))
|
2015-01-19 23:56:30 +08:00
|
|
|
return -1;
|
2016-11-10 08:20:59 +08:00
|
|
|
|
|
|
|
argc = cfg.argc;
|
|
|
|
argv = cfg.argv;
|
|
|
|
|
|
|
|
bpf_obj = cfg.object;
|
|
|
|
bpf_uds_name = cfg.uds;
|
2015-01-19 23:56:30 +08:00
|
|
|
} else if (matches(*argv, "help") == 0) {
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
explain();
|
|
|
|
return -1;
|
2015-08-07 17:36:50 +08:00
|
|
|
} else if (matches(*argv, "index") == 0) {
|
|
|
|
break;
|
2015-01-19 23:56:30 +08:00
|
|
|
} else {
|
2015-04-01 23:57:44 +08:00
|
|
|
if (!seen_run)
|
|
|
|
goto opt_bpf;
|
2015-01-19 23:56:30 +08:00
|
|
|
break;
|
|
|
|
}
|
2015-10-08 21:22:05 +08:00
|
|
|
|
|
|
|
NEXT_ARG_FWD();
|
2015-01-19 23:56:30 +08:00
|
|
|
}
|
|
|
|
|
2017-05-17 01:29:36 +08:00
|
|
|
parse_action_control_dflt(&argc, &argv, &parm.action,
|
|
|
|
false, TC_ACT_PIPE);
|
2015-01-19 23:56:30 +08:00
|
|
|
|
|
|
|
if (argc) {
|
|
|
|
if (matches(*argv, "index") == 0) {
|
|
|
|
NEXT_ARG();
|
|
|
|
if (get_u32(&parm.index, *argv, 10)) {
|
|
|
|
fprintf(stderr, "bpf: Illegal \"index\"\n");
|
|
|
|
return -1;
|
|
|
|
}
|
2015-10-08 21:22:05 +08:00
|
|
|
|
|
|
|
NEXT_ARG_FWD();
|
2015-01-19 23:56:30 +08:00
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
addattr_l(n, MAX_MSG, TCA_ACT_BPF_PARMS, &parm, sizeof(parm));
|
2018-01-31 16:15:08 +08:00
|
|
|
addattr_nest_end(n, tail);
|
2015-01-19 23:56:30 +08:00
|
|
|
|
2015-04-01 23:57:44 +08:00
|
|
|
if (bpf_uds_name)
|
tc: built-in eBPF exec proxy
This work follows upon commit 6256f8c9e45f ("tc, bpf: finalize eBPF
support for cls and act front-end") and takes up the idea proposed by
Hannes Frederic Sowa to spawn a shell (or any other command) that holds
generated eBPF map file descriptors.
File descriptors, based on their id, are being fetched from the same
unix domain socket as demonstrated in the bpf_agent, the shell spawned
via execvpe(2) and the map fds passed over the environment, and thus
are made available to applications in the fashion of std{in,out,err}
for read/write access, for example in case of iproute2's examples/bpf/:
# env | grep BPF
BPF_NUM_MAPS=3
BPF_MAP1=6 <- BPF_MAP_ID_QUEUE (id 1)
BPF_MAP0=5 <- BPF_MAP_ID_PROTO (id 0)
BPF_MAP2=7 <- BPF_MAP_ID_DROPS (id 2)
# ls -la /proc/self/fd
[...]
lrwx------. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4
lrwx------. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4
lrwx------. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4
[...]
lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map
lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map
lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map
The advantage (as opposed to the direct/native usage) is that now the
shell is map fd owner and applications can terminate and easily reattach
to descriptors w/o any kernel changes. Moreover, multiple applications
can easily read/write eBPF maps simultaneously.
To further allow users for experimenting with that, next step is to add
a small helper that can get along with simple data types, so that also
shell scripts can make use of bpf syscall, f.e to read/write into maps.
Generally, this allows for prepopulating maps, or any runtime altering
which could influence eBPF program behaviour (f.e. different run-time
classifications, skb modifications, ...), dumping of statistics, etc.
Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860
Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org>
Acked-by: Alexei Starovoitov <ast@plumgrid.com>
2015-04-17 03:20:06 +08:00
|
|
|
ret = bpf_send_map_fds(bpf_uds_name, bpf_obj);
|
2015-04-01 23:57:44 +08:00
|
|
|
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
*ptr_argc = argc;
|
|
|
|
*ptr_argv = argv;
|
|
|
|
|
2015-04-01 23:57:44 +08:00
|
|
|
return ret;
|
2015-01-19 23:56:30 +08:00
|
|
|
}
|
|
|
|
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
static int bpf_print_opt(struct action_util *au, FILE *f, struct rtattr *arg)
|
2015-01-19 23:56:30 +08:00
|
|
|
{
|
|
|
|
struct rtattr *tb[TCA_ACT_BPF_MAX + 1];
|
|
|
|
struct tc_act_bpf *parm;
|
tc: full JSON support for 'bpf' actions
Add full JSON output support in the dump of 'act_bpf'.
Example using eBPF:
# tc actions flush action bpf
# tc action add action bpf object bpf/action.o section 'action-ok'
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bpf_name": "action.o:[action-ok]",
"prog": {
"id": 33,
"tag": "a04f5eef06a7f555",
"jited": 1
},
"control_action": {
"type": "pipe"
},
"index": 1,
"ref": 1,
"bind": 0
}
]
}
]
Example using cBPF:
# tc actions flush action bpf
# a=$(mktemp)
# tcpdump -ddd not ether proto 0x888e >$a
# tc action add action bpf bytecode-file $a index 42
# rm $a
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bytecode": {
"length": 4,
"insns": [
{
"code": 40,
"jt": 0,
"jf": 0,
"k": 12
},
{
"code": 21,
"jt": 0,
"jf": 1,
"k": 34958
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 0
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 262144
}
]
},
"control_action": {
"type": "pipe"
},
"index": 42,
"ref": 1,
"bind": 0
}
]
}
]
Tested with:
# ./tdc.py -c bpf
Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-01 01:58:09 +08:00
|
|
|
int d_ok = 0;
|
2016-03-22 02:48:36 +08:00
|
|
|
|
2020-10-23 22:55:35 +08:00
|
|
|
print_string(PRINT_ANY, "kind", "%s ", "bpf");
|
2015-01-19 23:56:30 +08:00
|
|
|
if (arg == NULL)
|
2020-10-23 22:55:35 +08:00
|
|
|
return 0;
|
2015-01-19 23:56:30 +08:00
|
|
|
|
|
|
|
parse_rtattr_nested(tb, TCA_ACT_BPF_MAX, arg);
|
|
|
|
|
|
|
|
if (!tb[TCA_ACT_BPF_PARMS]) {
|
2019-07-10 05:25:14 +08:00
|
|
|
fprintf(stderr, "Missing bpf parameters\n");
|
2015-01-19 23:56:30 +08:00
|
|
|
return -1;
|
|
|
|
}
|
2015-04-01 23:57:44 +08:00
|
|
|
|
2015-01-19 23:56:30 +08:00
|
|
|
parm = RTA_DATA(tb[TCA_ACT_BPF_PARMS]);
|
2015-04-01 23:57:44 +08:00
|
|
|
|
|
|
|
if (tb[TCA_ACT_BPF_NAME])
|
tc: full JSON support for 'bpf' actions
Add full JSON output support in the dump of 'act_bpf'.
Example using eBPF:
# tc actions flush action bpf
# tc action add action bpf object bpf/action.o section 'action-ok'
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bpf_name": "action.o:[action-ok]",
"prog": {
"id": 33,
"tag": "a04f5eef06a7f555",
"jited": 1
},
"control_action": {
"type": "pipe"
},
"index": 1,
"ref": 1,
"bind": 0
}
]
}
]
Example using cBPF:
# tc actions flush action bpf
# a=$(mktemp)
# tcpdump -ddd not ether proto 0x888e >$a
# tc action add action bpf bytecode-file $a index 42
# rm $a
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bytecode": {
"length": 4,
"insns": [
{
"code": 40,
"jt": 0,
"jf": 0,
"k": 12
},
{
"code": 21,
"jt": 0,
"jf": 1,
"k": 34958
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 0
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 262144
}
]
},
"control_action": {
"type": "pipe"
},
"index": 42,
"ref": 1,
"bind": 0
}
]
}
]
Tested with:
# ./tdc.py -c bpf
Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-01 01:58:09 +08:00
|
|
|
print_string(PRINT_ANY, "bpf_name", "%s ",
|
|
|
|
rta_getattr_str(tb[TCA_ACT_BPF_NAME]));
|
2015-04-01 23:57:44 +08:00
|
|
|
if (tb[TCA_ACT_BPF_OPS] && tb[TCA_ACT_BPF_OPS_LEN]) {
|
tc: full JSON support for 'bpf' actions
Add full JSON output support in the dump of 'act_bpf'.
Example using eBPF:
# tc actions flush action bpf
# tc action add action bpf object bpf/action.o section 'action-ok'
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bpf_name": "action.o:[action-ok]",
"prog": {
"id": 33,
"tag": "a04f5eef06a7f555",
"jited": 1
},
"control_action": {
"type": "pipe"
},
"index": 1,
"ref": 1,
"bind": 0
}
]
}
]
Example using cBPF:
# tc actions flush action bpf
# a=$(mktemp)
# tcpdump -ddd not ether proto 0x888e >$a
# tc action add action bpf bytecode-file $a index 42
# rm $a
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bytecode": {
"length": 4,
"insns": [
{
"code": 40,
"jt": 0,
"jf": 0,
"k": 12
},
{
"code": 21,
"jt": 0,
"jf": 1,
"k": 34958
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 0
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 262144
}
]
},
"control_action": {
"type": "pipe"
},
"index": 42,
"ref": 1,
"bind": 0
}
]
}
]
Tested with:
# ./tdc.py -c bpf
Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-01 01:58:09 +08:00
|
|
|
bpf_print_ops(tb[TCA_ACT_BPF_OPS],
|
2015-01-19 23:56:30 +08:00
|
|
|
rta_getattr_u16(tb[TCA_ACT_BPF_OPS_LEN]));
|
tc: full JSON support for 'bpf' actions
Add full JSON output support in the dump of 'act_bpf'.
Example using eBPF:
# tc actions flush action bpf
# tc action add action bpf object bpf/action.o section 'action-ok'
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bpf_name": "action.o:[action-ok]",
"prog": {
"id": 33,
"tag": "a04f5eef06a7f555",
"jited": 1
},
"control_action": {
"type": "pipe"
},
"index": 1,
"ref": 1,
"bind": 0
}
]
}
]
Example using cBPF:
# tc actions flush action bpf
# a=$(mktemp)
# tcpdump -ddd not ether proto 0x888e >$a
# tc action add action bpf bytecode-file $a index 42
# rm $a
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bytecode": {
"length": 4,
"insns": [
{
"code": 40,
"jt": 0,
"jf": 0,
"k": 12
},
{
"code": 21,
"jt": 0,
"jf": 1,
"k": 34958
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 0
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 262144
}
]
},
"control_action": {
"type": "pipe"
},
"index": 42,
"ref": 1,
"bind": 0
}
]
}
]
Tested with:
# ./tdc.py -c bpf
Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-01 01:58:09 +08:00
|
|
|
print_string(PRINT_FP, NULL, "%s", " ");
|
2015-04-01 23:57:44 +08:00
|
|
|
}
|
2015-01-19 23:56:30 +08:00
|
|
|
|
2017-09-05 08:24:32 +08:00
|
|
|
if (tb[TCA_ACT_BPF_ID])
|
tc: full JSON support for 'bpf' actions
Add full JSON output support in the dump of 'act_bpf'.
Example using eBPF:
# tc actions flush action bpf
# tc action add action bpf object bpf/action.o section 'action-ok'
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bpf_name": "action.o:[action-ok]",
"prog": {
"id": 33,
"tag": "a04f5eef06a7f555",
"jited": 1
},
"control_action": {
"type": "pipe"
},
"index": 1,
"ref": 1,
"bind": 0
}
]
}
]
Example using cBPF:
# tc actions flush action bpf
# a=$(mktemp)
# tcpdump -ddd not ether proto 0x888e >$a
# tc action add action bpf bytecode-file $a index 42
# rm $a
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bytecode": {
"length": 4,
"insns": [
{
"code": 40,
"jt": 0,
"jf": 0,
"k": 12
},
{
"code": 21,
"jt": 0,
"jf": 1,
"k": 34958
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 0
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 262144
}
]
},
"control_action": {
"type": "pipe"
},
"index": 42,
"ref": 1,
"bind": 0
}
]
}
]
Tested with:
# ./tdc.py -c bpf
Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-01 01:58:09 +08:00
|
|
|
d_ok = bpf_dump_prog_info(f,
|
|
|
|
rta_getattr_u32(tb[TCA_ACT_BPF_ID]));
|
|
|
|
if (!d_ok && tb[TCA_ACT_BPF_TAG]) {
|
2017-02-23 20:07:14 +08:00
|
|
|
SPRINT_BUF(b);
|
|
|
|
|
tc: full JSON support for 'bpf' actions
Add full JSON output support in the dump of 'act_bpf'.
Example using eBPF:
# tc actions flush action bpf
# tc action add action bpf object bpf/action.o section 'action-ok'
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bpf_name": "action.o:[action-ok]",
"prog": {
"id": 33,
"tag": "a04f5eef06a7f555",
"jited": 1
},
"control_action": {
"type": "pipe"
},
"index": 1,
"ref": 1,
"bind": 0
}
]
}
]
Example using cBPF:
# tc actions flush action bpf
# a=$(mktemp)
# tcpdump -ddd not ether proto 0x888e >$a
# tc action add action bpf bytecode-file $a index 42
# rm $a
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bytecode": {
"length": 4,
"insns": [
{
"code": 40,
"jt": 0,
"jf": 0,
"k": 12
},
{
"code": 21,
"jt": 0,
"jf": 1,
"k": 34958
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 0
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 262144
}
]
},
"control_action": {
"type": "pipe"
},
"index": 42,
"ref": 1,
"bind": 0
}
]
}
]
Tested with:
# ./tdc.py -c bpf
Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-01 01:58:09 +08:00
|
|
|
print_string(PRINT_ANY, "tag", "tag %s ",
|
|
|
|
hexstring_n2a(RTA_DATA(tb[TCA_ACT_BPF_TAG]),
|
|
|
|
RTA_PAYLOAD(tb[TCA_ACT_BPF_TAG]),
|
|
|
|
b, sizeof(b)));
|
2017-02-23 20:07:14 +08:00
|
|
|
}
|
|
|
|
|
tc: full JSON support for 'bpf' actions
Add full JSON output support in the dump of 'act_bpf'.
Example using eBPF:
# tc actions flush action bpf
# tc action add action bpf object bpf/action.o section 'action-ok'
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bpf_name": "action.o:[action-ok]",
"prog": {
"id": 33,
"tag": "a04f5eef06a7f555",
"jited": 1
},
"control_action": {
"type": "pipe"
},
"index": 1,
"ref": 1,
"bind": 0
}
]
}
]
Example using cBPF:
# tc actions flush action bpf
# a=$(mktemp)
# tcpdump -ddd not ether proto 0x888e >$a
# tc action add action bpf bytecode-file $a index 42
# rm $a
# tc -j action list action bpf | jq
[
{
"total acts": 1
},
{
"actions": [
{
"order": 0,
"kind": "bpf",
"bytecode": {
"length": 4,
"insns": [
{
"code": 40,
"jt": 0,
"jf": 0,
"k": 12
},
{
"code": 21,
"jt": 0,
"jf": 1,
"k": 34958
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 0
},
{
"code": 6,
"jt": 0,
"jf": 0,
"k": 262144
}
]
},
"control_action": {
"type": "pipe"
},
"index": 42,
"ref": 1,
"bind": 0
}
]
}
]
Tested with:
# ./tdc.py -c bpf
Cc: Andrea Claudi <aclaudi@redhat.com>
Signed-off-by: Davide Caratti <dcaratti@redhat.com>
Signed-off-by: David Ahern <dsahern@gmail.com>
2019-02-01 01:58:09 +08:00
|
|
|
print_action_control(f, "default-action ", parm->action, _SL_);
|
|
|
|
print_uint(PRINT_ANY, "index", "\t index %u", parm->index);
|
|
|
|
print_int(PRINT_ANY, "ref", " ref %d", parm->refcnt);
|
|
|
|
print_int(PRINT_ANY, "bind", " bind %d", parm->bindcnt);
|
2015-01-19 23:56:30 +08:00
|
|
|
|
|
|
|
if (show_stats) {
|
|
|
|
if (tb[TCA_ACT_BPF_TM]) {
|
|
|
|
struct tcf_t *tm = RTA_DATA(tb[TCA_ACT_BPF_TM]);
|
2016-03-22 02:48:36 +08:00
|
|
|
|
2015-01-19 23:56:30 +08:00
|
|
|
print_tm(f, tm);
|
|
|
|
}
|
|
|
|
}
|
|
|
|
|
|
|
|
fprintf(f, "\n ");
|
|
|
|
return 0;
|
|
|
|
}
|
|
|
|
|
|
|
|
struct action_util bpf_action_util = {
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
.id = "bpf",
|
|
|
|
.parse_aopt = bpf_parse_opt,
|
|
|
|
.print_aopt = bpf_print_opt,
|
2015-01-19 23:56:30 +08:00
|
|
|
};
|