2015-04-01 23:57:44 +08:00
|
|
|
#ifndef __BPF_ELF__
|
|
|
|
#define __BPF_ELF__
|
|
|
|
|
|
|
|
#include <asm/types.h>
|
|
|
|
|
|
|
|
/* Note:
|
|
|
|
*
|
|
|
|
* Below ELF section names and bpf_elf_map structure definition
|
|
|
|
* are not (!) kernel ABI. It's rather a "contract" between the
|
|
|
|
* application and the BPF loader in tc. For compatibility, the
|
|
|
|
* section names should stay as-is. Introduction of aliases, if
|
|
|
|
* needed, are a possibility, though.
|
|
|
|
*/
|
|
|
|
|
|
|
|
/* ELF section names, etc */
|
|
|
|
#define ELF_SECTION_LICENSE "license"
|
|
|
|
#define ELF_SECTION_MAPS "maps"
|
|
|
|
#define ELF_SECTION_CLASSIFIER "classifier"
|
|
|
|
#define ELF_SECTION_ACTION "action"
|
|
|
|
|
|
|
|
#define ELF_MAX_MAPS 64
|
|
|
|
#define ELF_MAX_LICENSE_LEN 128
|
|
|
|
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
/* Object pinning settings */
|
|
|
|
#define PIN_NONE 0
|
|
|
|
#define PIN_OBJECT_NS 1
|
|
|
|
#define PIN_GLOBAL_NS 2
|
|
|
|
|
2015-04-01 23:57:44 +08:00
|
|
|
/* ELF map definition */
|
|
|
|
struct bpf_elf_map {
|
|
|
|
__u32 type;
|
|
|
|
__u32 size_key;
|
|
|
|
__u32 size_value;
|
|
|
|
__u32 max_elem;
|
|
|
|
__u32 id;
|
{f,m}_bpf: allow for sharing maps
This larger work addresses one of the bigger remaining issues on
tc's eBPF frontend, that is, to allow for persistent file descriptors.
Whenever tc parses the ELF object, extracts and loads maps into the
kernel, these file descriptors will be out of reach after the tc
instance exits.
Meaning, for simple (unnested) programs which contain one or
multiple maps, the kernel holds a reference, and they will live
on inside the kernel until the program holding them is unloaded,
but they will be out of reach for user space, even worse with
(also multiple nested) tail calls.
For this issue, we introduced the concept of an agent that can
receive the set of file descriptors from the tc instance creating
them, in order to be able to further inspect/update map data for
a specific use case. However, while that is more tied towards
specific applications, it still doesn't easily allow for sharing
maps accross multiple tc instances and would require a daemon to
be running in the background. F.e. when a map should be shared by
two eBPF programs, one attached to ingress, one to egress, this
currently doesn't work with the tc frontend.
This work solves exactly that, i.e. if requested, maps can now be
_arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within
a single object (but various program sections, PIN_OBJECT_NS) without
"loosing" the file descriptor set. To make that happen, we use eBPF
object pinning introduced in kernel commit b2197755b263 ("bpf: add
support for persistent maps/progs") for exactly this purpose.
The shipped examples/bpf/bpf_shared.c code from this patch can be
easily applied, for instance, as:
- classifier-classifier shared:
tc filter add dev foo parent 1: bpf obj shared.o sec egress
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
- classifier-action shared (here: late binding to a dummy classifier):
tc actions add action bpf obj shared.o sec egress pass index 42
tc filter add dev foo parent ffff: bpf obj shared.o sec ingress
tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \
action bpf index 42
The toy example increments a shared counter on egress and dumps its
value on ingress (if no sharing (PIN_NONE) would have been chosen,
map value is 0, of course, due to the two map instances being created):
[...]
<idle>-0 [002] ..s. 38264.788234: : map val: 4
<idle>-0 [002] ..s. 38264.788919: : map val: 4
<idle>-0 [002] ..s. 38264.789599: : map val: 5
[...]
... thus if both sections reference the pinned map(s) in question,
tc will take care of fetching the appropriate file descriptor.
The patch has been tested extensively on both, classifier and
action sides.
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2015-11-13 07:39:29 +08:00
|
|
|
__u8 pinning;
|
2015-04-01 23:57:44 +08:00
|
|
|
};
|
|
|
|
|
|
|
|
#endif /* __BPF_ELF__ */
|