iproute2

mirror of https://git.kernel.org/pub/scm/network/iproute2/iproute2.git synced 2024-11-15 22:15:13 +08:00

Author	SHA1	Message	Date
Daniel Borkmann	e42256699c	bpf: make tc's bpf loader generic and move into lib This work moves the bpf loader into the iproute2 library and reworks the tc specific parts into generic code. It's useful as we can then more easily support new program types by just having the same ELF loader backend. Joint work with Thomas Graf. I hacked a rough start of a test suite to make sure nothing breaks [1] and looks all good. [1] https://github.com/borkmann/clsact/blob/master/test_bpf.sh Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: Thomas Graf <tgraf@suug.ch>	2016-11-29 12:35:32 -08:00
Phil Sutter	d17b136f7d	Use C99 style initializers everywhere This big patch was compiled by vimgrepping for memset calls and changing to C99 initializer if applicable. One notable exception is the initialization of union bpf_attr in tc/tc_bpf.c: changing it would break for older gcc versions (at least <=3.4.6). Calls to memset for struct rtattr pointer fields for parse_rtattr*() were just dropped since they are not needed. The changes here allowed the compiler to discover some unused variables, so get rid of them, too. Signed-off-by: Phil Sutter <phil@nwl.cc> Acked-by: David Ahern <dsa@cumulusnetworks.com>	2016-07-20 12:05:24 -07:00
Daniel Borkmann	91d88eeb10	{f,m}_bpf: allow updates on program arrays Since we have all infrastructure in place now, allow atomic live updates on program arrays. This can be very useful e.g. in case programs that are being tail-called need to be replaced, f.e. when classifier functionality needs to be changed, new protocols added/removed during runtime, etc. Thus, provide a way for in-place code updates, minimal example: Given is an object file cls.o that contains the entry point in section 'classifier', has a globally pinned program array 'jmp' with 2 slots and id of 0, and two tail called programs under section '0/0' (prog array key 0) and '0/1' (prog array key 1), the section encoding for the loader is <id/key>. Adding the filter loads everything into cls_bpf: tc filter add dev foo parent ffff: bpf da obj cls.o Now, the program under section '0/1' needs to be replaced with an updated version that resides in the same section (also full path to tc's subfolder of the mount point can be passed, e.g. /sys/fs/bpf/tc/globals/jmp): tc exec bpf graft m:globals/jmp obj cls.o sec 0/1 In case the program resides under a different section 'foo', it can also be injected into the program array like: tc exec bpf graft m:globals/jmp key 1 obj cls.o sec foo If the new tail called classifier program is already available as a pinned object somewhere (here: /sys/fs/bpf/tc/progs/parser), it can be injected into the prog array like: tc exec bpf graft m:globals/jmp key 1 fd m:progs/parser In the kernel, the program on key 1 is being atomically replaced and the old one's refcount dropped. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org>	2015-11-29 11:55:16 -08:00
Daniel Borkmann	32e93fb7f6	{f,m}_bpf: allow for sharing maps This larger work addresses one of the bigger remaining issues on tc's eBPF frontend, that is, to allow for persistent file descriptors. Whenever tc parses the ELF object, extracts and loads maps into the kernel, these file descriptors will be out of reach after the tc instance exits. Meaning, for simple (unnested) programs which contain one or multiple maps, the kernel holds a reference, and they will live on inside the kernel until the program holding them is unloaded, but they will be out of reach for user space, even worse with (also multiple nested) tail calls. For this issue, we introduced the concept of an agent that can receive the set of file descriptors from the tc instance creating them, in order to be able to further inspect/update map data for a specific use case. However, while that is more tied towards specific applications, it still doesn't easily allow for sharing maps accross multiple tc instances and would require a daemon to be running in the background. F.e. when a map should be shared by two eBPF programs, one attached to ingress, one to egress, this currently doesn't work with the tc frontend. This work solves exactly that, i.e. if requested, maps can now be _arbitrarily_ shared between object files (PIN_GLOBAL_NS) or within a single object (but various program sections, PIN_OBJECT_NS) without "loosing" the file descriptor set. To make that happen, we use eBPF object pinning introduced in kernel commit b2197755b263 ("bpf: add support for persistent maps/progs") for exactly this purpose. The shipped examples/bpf/bpf_shared.c code from this patch can be easily applied, for instance, as: - classifier-classifier shared: tc filter add dev foo parent 1: bpf obj shared.o sec egress tc filter add dev foo parent ffff: bpf obj shared.o sec ingress - classifier-action shared (here: late binding to a dummy classifier): tc actions add action bpf obj shared.o sec egress pass index 42 tc filter add dev foo parent ffff: bpf obj shared.o sec ingress tc filter add dev foo parent 1: bpf bytecode '1,6 0 0 4294967295,' \ action bpf index 42 The toy example increments a shared counter on egress and dumps its value on ingress (if no sharing (PIN_NONE) would have been chosen, map value is 0, of course, due to the two map instances being created): [...] <idle>-0 [002] ..s. 38264.788234: : map val: 4 <idle>-0 [002] ..s. 38264.788919: : map val: 4 <idle>-0 [002] ..s. 38264.789599: : map val: 5 [...] ... thus if both sections reference the pinned map(s) in question, tc will take care of fetching the appropriate file descriptor. The patch has been tested extensively on both, classifier and action sides. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>	2015-11-23 16:10:44 -08:00
Daniel Borkmann	4bd624467b	tc: built-in eBPF exec proxy This work follows upon commit `6256f8c9e4` ("tc, bpf: finalize eBPF support for cls and act front-end") and takes up the idea proposed by Hannes Frederic Sowa to spawn a shell (or any other command) that holds generated eBPF map file descriptors. File descriptors, based on their id, are being fetched from the same unix domain socket as demonstrated in the bpf_agent, the shell spawned via execvpe(2) and the map fds passed over the environment, and thus are made available to applications in the fashion of std{in,out,err} for read/write access, for example in case of iproute2's examples/bpf/: # env \| grep BPF BPF_NUM_MAPS=3 BPF_MAP1=6 <- BPF_MAP_ID_QUEUE (id 1) BPF_MAP0=5 <- BPF_MAP_ID_PROTO (id 0) BPF_MAP2=7 <- BPF_MAP_ID_DROPS (id 2) # ls -la /proc/self/fd [...] lrwx------. 1 root root 64 Apr 14 16:46 0 -> /dev/pts/4 lrwx------. 1 root root 64 Apr 14 16:46 1 -> /dev/pts/4 lrwx------. 1 root root 64 Apr 14 16:46 2 -> /dev/pts/4 [...] lrwx------. 1 root root 64 Apr 14 16:46 5 -> anon_inode:bpf-map lrwx------. 1 root root 64 Apr 14 16:46 6 -> anon_inode:bpf-map lrwx------. 1 root root 64 Apr 14 16:46 7 -> anon_inode:bpf-map The advantage (as opposed to the direct/native usage) is that now the shell is map fd owner and applications can terminate and easily reattach to descriptors w/o any kernel changes. Moreover, multiple applications can easily read/write eBPF maps simultaneously. To further allow users for experimenting with that, next step is to add a small helper that can get along with simple data types, so that also shell scripts can make use of bpf syscall, f.e to read/write into maps. Generally, this allows for prepopulating maps, or any runtime altering which could influence eBPF program behaviour (f.e. different run-time classifications, skb modifications, ...), dumping of statistics, etc. Reference: http://thread.gmane.org/gmane.linux.network/357471/focus=357860 Suggested-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Reviewed-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Acked-by: Alexei Starovoitov <ast@plumgrid.com>	2015-04-27 16:39:23 -07:00

5 Commits