bpf: mini eBPF library, test stubs and verifier testsuite
1.
The library includes a trivial set of BPF syscall wrappers:
int bpf_create_map(int key_size, int value_size, int max_entries);
int bpf_update_elem(int fd, void *key, void *value);
int bpf_lookup_elem(int fd, void *key, void *value);
int bpf_delete_elem(int fd, void *key);
int bpf_get_next_key(int fd, void *key, void *next_key);
int bpf_prog_load(enum bpf_prog_type prog_type,
		  const struct sock_filter_int *insns, int insn_len,
		  const char *license);
bpf_prog_load() stores the verifier log in the global bpf_log_buf[] array.
The library also provides BPF_*() macros to build instructions.
2.
Test stubs configure the eBPF infrastructure with 'unspec' map and program types.
These are fake types used only by the user-space test suite.
3.
The verifier test suite loads valid and invalid programs and expects
predefined error log messages from the kernel.
40 tests so far.
$ sudo ./test_verifier
#0 add+sub+mul OK
#1 unreachable OK
#2 unreachable2 OK
#3 out of range jump OK
#4 out of range jump2 OK
#5 test1 ld_imm64 OK
...
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
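Items 1 and 3 can be illustrated together in user space: instructions built with BPF_*()-style macros, and a table-driven test that compares a verifier's log against an expected error string. This is a sketch that never calls into the kernel. struct insn mirrors the layout of the uapi struct bpf_insn, the opcode constants use the uapi values, and toy_verify(), run_tests() and the two test cases are hypothetical stand-ins for the real verifier and test_verifier entries.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Local mirror of the eBPF instruction layout (struct bpf_insn in the
 * kernel uapi header), redefined so the sketch needs no kernel headers. */
struct insn {
	uint8_t code;      /* opcode */
	uint8_t dst_reg:4; /* destination register */
	uint8_t src_reg:4; /* source register */
	int16_t off;       /* signed offset */
	int32_t imm;       /* signed immediate constant */
};

/* Opcode fields, using the same values as the uapi BPF_* constants. */
#define BPF_JMP   0x05
#define BPF_ALU64 0x07
#define BPF_K     0x00
#define BPF_EXIT  0x90
#define BPF_MOV   0xb0

/* Instruction-building macros in the style of the BPF_*() helpers. */
#define BPF_MOV64_IMM(DST, IMM)                                  \
	((struct insn){ .code = BPF_ALU64 | BPF_MOV | BPF_K,      \
			.dst_reg = (DST), .src_reg = 0, .off = 0, .imm = (IMM) })
#define BPF_EXIT_INSN() ((struct insn){ .code = BPF_JMP | BPF_EXIT })

/* Hypothetical stand-in for the kernel verifier: it only rejects programs
 * whose last instruction is not an exit, writing the reason into log. */
static int toy_verify(const struct insn *prog, int len, char *log, size_t log_sz)
{
	assert(log_sz > 0);
	log[0] = '\0';
	if (len == 0 || prog[len - 1].code != (BPF_JMP | BPF_EXIT)) {
		snprintf(log, log_sz, "last insn is not an exit\n");
		return -1;
	}
	return 0;
}

struct test {
	const char *descr;
	struct insn insns[4];
	int len;
	const char *errstr; /* NULL: program must be accepted */
};

/* Runs each case, prints "#N descr OK" or "FAIL", returns the failure count. */
static int run_tests(void)
{
	const struct test tests[] = {
		{ "mov+exit", { BPF_MOV64_IMM(0, 0), BPF_EXIT_INSN() }, 2, NULL },
		{ "no exit", { BPF_MOV64_IMM(0, 0) }, 1, "not an exit" },
	};
	char log[256];
	int fails = 0;

	for (size_t i = 0; i < sizeof(tests) / sizeof(tests[0]); i++) {
		const struct test *t = &tests[i];
		int err = toy_verify(t->insns, t->len, log, sizeof(log));
		int ok = t->errstr ? (err < 0 && strstr(log, t->errstr) != NULL)
				   : (err == 0);

		printf("#%zu %s %s\n", i, t->descr, ok ? "OK" : "FAIL");
		fails += !ok;
	}
	return fails;
}
```

Called from a main(), run_tests() prints "#0 mov+exit OK" and "#1 no exit OK" in the same shape as the test_verifier output above, and returns 0. Keeping the expected error as a substring lets a case match the reason a program was rejected without depending on the rest of the log text.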
2014-09-26 15:17:07 +08:00
# kbuild trick to avoid linker error. Can be omitted if a module is built.
obj- := dummy.o

# List of programs to build
2016-11-12 02:55:11 +08:00
hostprogs-y := test_lru_dist
2014-12-02 07:06:36 +08:00
hostprogs-y += sock_example
bpf: add sample usages for persistent maps/progs
This patch adds a couple of stand-alone examples of how the BPF_OBJ_PIN
and BPF_OBJ_GET commands can be used.
Example with maps:
# ./fds_example -F /sys/fs/bpf/m -P -m -k 1 -v 42
bpf: map fd:3 (Success)
bpf: pin ret:(0,Success)
bpf: fd:3 u->(1:42) ret:(0,Success)
# ./fds_example -F /sys/fs/bpf/m -G -m -k 1
bpf: get fd:3 (Success)
bpf: fd:3 l->(1):42 ret:(0,Success)
# ./fds_example -F /sys/fs/bpf/m -G -m -k 1 -v 24
bpf: get fd:3 (Success)
bpf: fd:3 u->(1:24) ret:(0,Success)
# ./fds_example -F /sys/fs/bpf/m -G -m -k 1
bpf: get fd:3 (Success)
bpf: fd:3 l->(1):24 ret:(0,Success)
# ./fds_example -F /sys/fs/bpf/m2 -P -m
bpf: map fd:3 (Success)
bpf: pin ret:(0,Success)
# ./fds_example -F /sys/fs/bpf/m2 -G -m -k 1
bpf: get fd:3 (Success)
bpf: fd:3 l->(1):0 ret:(0,Success)
# ./fds_example -F /sys/fs/bpf/m2 -G -m
bpf: get fd:3 (Success)
Example with progs:
# ./fds_example -F /sys/fs/bpf/p -P -p
bpf: prog fd:3 (Success)
bpf: pin ret:(0,Success)
bpf sock:4 <- fd:3 attached ret:(0,Success)
# ./fds_example -F /sys/fs/bpf/p -G -p
bpf: get fd:3 (Success)
bpf: sock:4 <- fd:3 attached ret:(0,Success)
# ./fds_example -F /sys/fs/bpf/p2 -P -p -o ./sockex1_kern.o
bpf: prog fd:5 (Success)
bpf: pin ret:(0,Success)
bpf: sock:3 <- fd:5 attached ret:(0,Success)
# ./fds_example -F /sys/fs/bpf/p2 -G -p
bpf: get fd:3 (Success)
bpf: sock:4 <- fd:3 attached ret:(0,Success)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
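Both commands are issued through the bpf(2) syscall. They fill the pathname and bpf_fd fields of union bpf_attr. The following is a minimal sketch that assumes Linux uapi headers defining BPF_OBJ_PIN and BPF_OBJ_GET (kernel 4.4 or later). sys_bpf(), obj_pin() and obj_get() are hypothetical helper names. They are not the functions fds_example itself uses.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/bpf.h>

/* glibc has no bpf() wrapper, so issue the raw syscall. */
static long sys_bpf(int cmd, union bpf_attr *attr, unsigned int size)
{
	return syscall(__NR_bpf, cmd, attr, size);
}

/* Pin a map or program fd at pathname, which must live on a mounted bpf
 * filesystem such as /sys/fs/bpf. Returns 0, or -1 with errno set. */
static int obj_pin(int fd, const char *pathname)
{
	union bpf_attr attr;

	assert(pathname != NULL);
	memset(&attr, 0, sizeof(attr)); /* unused fields must be zero */
	attr.pathname = (uint64_t)(unsigned long)pathname;
	attr.bpf_fd = fd;
	return (int)sys_bpf(BPF_OBJ_PIN, &attr, sizeof(attr));
}

/* Return a new fd for the object pinned at pathname, or -1 with errno set. */
static int obj_get(const char *pathname)
{
	union bpf_attr attr;

	assert(pathname != NULL);
	memset(&attr, 0, sizeof(attr));
	attr.pathname = (uint64_t)(unsigned long)pathname;
	return (int)sys_bpf(BPF_OBJ_GET, &attr, sizeof(attr));
}
```

After the first map run above, obj_get("/sys/fs/bpf/m") should return a new fd that refers to the same pinned map. Note that the example runs are done as root (the # prompt). An unprivileged caller, or a path outside bpffs, should just get -1 with errno set.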
2015-10-29 21:58:10 +08:00
hostprogs-y += fds_example
2014-12-02 07:06:38 +08:00
hostprogs-y += sockex1
2014-12-02 07:06:39 +08:00
hostprogs-y += sockex2
2015-05-20 07:59:06 +08:00
hostprogs-y += sockex3
2015-03-26 03:49:23 +08:00
hostprogs-y += tracex1
2015-03-26 03:49:24 +08:00
hostprogs-y += tracex2
2015-03-26 03:49:25 +08:00
hostprogs-y += tracex3
2015-03-26 03:49:26 +08:00
hostprogs-y += tracex4
samples/bpf: bpf_tail_call example for tracing
kprobe example that demonstrates what future seccomp programs may look like.
It attaches to seccomp_phase1() function and tail-calls other BPF programs
depending on syscall number.
Existing optimized classic BPF seccomp programs generated by Chrome look like:
if (sd.nr < 121) {
  if (sd.nr < 57) {
    if (sd.nr < 22) {
      if (sd.nr < 7) {
        if (sd.nr < 4) {
          if (sd.nr < 1) {
            check sys_read
          } else {
            if (sd.nr < 3) {
              check sys_write and sys_open
            } else {
              check sys_close
            }
          }
        } else {
      } else {
    } else {
  } else {
} else {
}
A future seccomp filter using native eBPF may look like:
bpf_tail_call(&sd, &syscall_jmp_table, sd.nr);
which is simpler, faster and leaves more room for per-syscall checks.
Usage:
$ sudo ./tracex5
<...>-366 [001] d... 4.870033: : read(fd=1, buf=00007f6d5bebf000, size=771)
<...>-369 [003] d... 4.870066: : mmap
<...>-369 [003] d... 4.870077: : syscall=110 (one of get/set uid/pid/gid)
<...>-369 [003] d... 4.870089: : syscall=107 (one of get/set uid/pid/gid)
sh-369 [000] d... 4.891740: : read(fd=0, buf=00000000023d1000, size=512)
sh-369 [000] d... 4.891747: : write(fd=1, buf=00000000023d3000, size=512)
sh-369 [000] d... 4.891747: : read(fd=1, buf=00000000023d3000, size=512)
Signed-off-by: Alexei Starovoitov <ast@plumgrid.com>
Signed-off-by: David S. Miller <davem@davemloft.net>
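Plain user-space C can show the jump-table idea. The syscall number indexes straight into a table of handlers, so no comparison tree is needed, and a missing entry falls through to a default. This is only an analogy. In eBPF the table is a BPF_MAP_TYPE_PROG_ARRAY, and bpf_tail_call() jumps into another program instead of calling a function. The handlers and their checks are hypothetical. Slot numbers read=0, write=1, open=2 and close=3 follow the numbering implied by the tree above. open is deliberately left without a handler.

```c
#include <assert.h>
#include <stddef.h>

/* A handler inspects one syscall argument and returns 1 (allow) or 0 (deny). */
typedef int (*handler_fn)(unsigned long arg0);

/* Hypothetical per-syscall checks, standing in for tail-called programs. */
static int check_read(unsigned long fd)  { return fd < 1024; }
static int check_write(unsigned long fd) { return fd < 1024; }
static int check_close(unsigned long fd) { (void)fd; return 1; }

/* Table indexed by syscall number, playing the role of the prog array map.
 * Unlisted slots stay NULL (x86_64 numbers: read=0, write=1, close=3). */
#define MAX_NR 512
static handler_fn jmp_table[MAX_NR] = {
	[0] = check_read,
	[1] = check_write,
	[3] = check_close,
};

/* One bounds check and one indexed load replace the comparison tree. An
 * out-of-range number or an empty slot falls through to the default,
 * much as bpf_tail_call() continues with the next instruction. */
static int dispatch(unsigned long nr, unsigned long arg0)
{
	if (nr >= MAX_NR || jmp_table[nr] == NULL)
		return 0; /* default verdict: deny */
	return jmp_table[nr](arg0);
}
```

For example, dispatch(0, 3) runs check_read() and allows the call. dispatch(2, 0) finds no handler for open and returns the default. The cost is the same for every syscall number, whereas the tree needs about log2(N) comparisons before it reaches the per-syscall check.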
2015-05-20 07:59:05 +08:00
hostprogs-y += tracex5
2015-08-06 15:02:36 +08:00
hostprogs-y += tracex6
2016-07-25 20:55:02 +08:00
hostprogs-y += test_probe_write_user
2015-10-21 11:02:35 +08:00
hostprogs-y += trace_output
2015-06-19 22:00:44 +08:00
hostprogs-y += lathist
2016-02-18 11:58:59 +08:00
hostprogs-y += offwaketime
2016-03-09 07:07:52 +08:00
hostprogs-y += spintest
2016-03-09 07:07:54 +08:00
hostprogs-y += map_perf_test
2016-04-07 09:43:31 +08:00
hostprogs-y += test_overhead
2016-07-01 01:28:45 +08:00
hostprogs-y += test_cgrp2_array_pin
2016-11-23 23:52:30 +08:00
hostprogs-y += test_cgrp2_attach
Add sample for adding simple drop program to link
Add a sample program that only drops packets at the BPF_PROG_TYPE_XDP_RX
hook of a link. With the drop-only program, the observed single-core rate
is ~20Mpps.
Other tests were also run, for instance without the dropcnt increment or
without reading from the packet header; the packet rate was mostly
unchanged.
$ perf record -a samples/bpf/xdp1 $(</sys/class/net/eth0/ifindex)
proto 17: 20403027 drops/s
./pktgen_sample03_burst_single_flow.sh -i $DEV -d $IP -m $MAC -t 4
Running... ctrl^C to stop
Device: eth4@0
Result: OK: 11791017(c11788327+d2689) usec, 59622913 (60byte,0frags)
5056638pps 2427Mb/sec (2427186240bps) errors: 0
Device: eth4@1
Result: OK: 11791012(c11787906+d3106) usec, 60526944 (60byte,0frags)
5133311pps 2463Mb/sec (2463989280bps) errors: 0
Device: eth4@2
Result: OK: 11791019(c11788249+d2769) usec, 59868091 (60byte,0frags)
5077431pps 2437Mb/sec (2437166880bps) errors: 0
Device: eth4@3
Result: OK: 11795039(c11792403+d2636) usec, 59483181 (60byte,0frags)
5043067pps 2420Mb/sec (2420672160bps) errors: 0
perf report --no-children:
26.05% ksoftirqd/0 [mlx4_en] [k] mlx4_en_process_rx_cq
17.84% ksoftirqd/0 [mlx4_en] [k] mlx4_en_alloc_frags
5.52% ksoftirqd/0 [mlx4_en] [k] mlx4_en_free_frag
4.90% swapper [kernel.vmlinux] [k] poll_idle
4.14% ksoftirqd/0 [kernel.vmlinux] [k] get_page_from_freelist
2.78% ksoftirqd/0 [kernel.vmlinux] [k] __free_pages_ok
2.57% ksoftirqd/0 [kernel.vmlinux] [k] bpf_map_lookup_elem
2.51% swapper [mlx4_en] [k] mlx4_en_process_rx_cq
1.94% ksoftirqd/0 [kernel.vmlinux] [k] percpu_array_map_lookup_elem
1.45% swapper [mlx4_en] [k] mlx4_en_alloc_frags
1.35% ksoftirqd/0 [kernel.vmlinux] [k] free_one_page
1.33% swapper [kernel.vmlinux] [k] intel_idle
1.04% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c5c5
0.96% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c58d
0.93% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c6ee
0.92% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c6b9
0.89% ksoftirqd/0 [kernel.vmlinux] [k] __alloc_pages_nodemask
0.83% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c686
0.83% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c5d5
0.78% ksoftirqd/0 [mlx4_en] [k] mlx4_alloc_pages.isra.23
0.77% ksoftirqd/0 [mlx4_en] [k] 0x000000000001c5b4
0.77% ksoftirqd/0 [kernel.vmlinux] [k] net_rx_action
machine specs:
receiver - Intel E5-1630 v3 @ 3.70GHz
sender - Intel E5645 @ 2.40GHz
Mellanox ConnectX-3 @40G
Signed-off-by: Brenden Blanco <bblanco@plumgrid.com>
Acked-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: David S. Miller <davem@davemloft.net>
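A little arithmetic cross-checks the pktgen numbers. Together the four threads send about 20.3 Mpps, which is in line with the ~20 Mpps drop rate that xdp1 reports. Each bps figure also equals pps × 60-byte frames × 8 bits. total_pps() and bps() are hypothetical helpers written for this check.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Per-thread rates copied from the pktgen output above (eth4@0..eth4@3). */
static const uint64_t thread_pps[] = { 5056638, 5133311, 5077431, 5043067 };

/* Aggregate offered load across all pktgen threads, in packets/s. */
static uint64_t total_pps(void)
{
	uint64_t sum = 0;

	for (size_t i = 0; i < sizeof(thread_pps) / sizeof(thread_pps[0]); i++)
		sum += thread_pps[i];
	return sum;
}

/* Bits per second for a packet rate and frame length in bytes. */
static uint64_t bps(uint64_t pps, uint64_t frame_len)
{
	return pps * frame_len * 8;
}
```

total_pps() returns 20310447. bps(5056638, 60) returns 2427186240, which is exactly what pktgen printed for eth4@0.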
2016-07-20 03:16:51 +08:00
hostprogs-y += xdp1
2016-07-20 03:16:57 +08:00
hostprogs-y += xdp2
2016-08-12 23:57:04 +08:00
hostprogs-y += test_current_task_under_cgroup
2016-09-02 09:37:25 +08:00
hostprogs-y += trace_event
2016-09-02 09:37:26 +08:00
hostprogs-y += sampleip
2016-11-10 07:36:34 +08:00
hostprogs-y += tc_l2_redirect
2014-09-26 15:17:07 +08:00

2016-11-12 02:55:11 +08:00
test_lru_dist-objs := test_lru_dist.o libbpf.o
2014-12-02 07:06:36 +08:00
sock_example-objs := sock_example.o libbpf.o
2015-10-29 21:58:10 +08:00
fds_example-objs := bpf_load.o libbpf.o fds_example.o
2014-12-02 07:06:38 +08:00
sockex1-objs := bpf_load.o libbpf.o sockex1_user.o
2014-12-02 07:06:39 +08:00
sockex2-objs := bpf_load.o libbpf.o sockex2_user.o
2015-05-20 07:59:06 +08:00
sockex3-objs := bpf_load.o libbpf.o sockex3_user.o
2015-03-26 03:49:23 +08:00
tracex1-objs := bpf_load.o libbpf.o tracex1_user.o
2015-03-26 03:49:24 +08:00
tracex2-objs := bpf_load.o libbpf.o tracex2_user.o
2015-03-26 03:49:25 +08:00
tracex3-objs := bpf_load.o libbpf.o tracex3_user.o
2015-03-26 03:49:26 +08:00
tracex4-objs := bpf_load.o libbpf.o tracex4_user.o
2015-05-20 07:59:05 +08:00
tracex5-objs := bpf_load.o libbpf.o tracex5_user.o
2015-08-06 15:02:36 +08:00
tracex6-objs := bpf_load.o libbpf.o tracex6_user.o
2016-07-25 20:55:02 +08:00
test_probe_write_user-objs := bpf_load.o libbpf.o test_probe_write_user_user.o
2015-10-21 11:02:35 +08:00
trace_output-objs := bpf_load.o libbpf.o trace_output_user.o
2015-06-19 22:00:44 +08:00
lathist-objs := bpf_load.o libbpf.o lathist_user.o
2016-02-18 11:58:59 +08:00
offwaketime-objs := bpf_load.o libbpf.o offwaketime_user.o
2016-03-09 07:07:52 +08:00
spintest-objs := bpf_load.o libbpf.o spintest_user.o
2016-03-09 07:07:54 +08:00
map_perf_test-objs := bpf_load.o libbpf.o map_perf_test_user.o
2016-04-07 09:43:31 +08:00
test_overhead-objs := bpf_load.o libbpf.o test_overhead_user.o
2016-07-01 01:28:45 +08:00
test_cgrp2_array_pin-objs := libbpf.o test_cgrp2_array_pin.o
2016-11-23 23:52:30 +08:00
test_cgrp2_attach-objs := libbpf.o test_cgrp2_attach.o
2016-07-20 03:16:51 +08:00
xdp1-objs := bpf_load.o libbpf.o xdp1_user.o
2016-07-20 03:16:57 +08:00
# reuse xdp1 source intentionally
xdp2-objs := bpf_load.o libbpf.o xdp1_user.o
2016-08-12 23:57:04 +08:00
test_current_task_under_cgroup-objs := bpf_load.o libbpf.o \
				       test_current_task_under_cgroup_user.o
2016-09-02 09:37:25 +08:00
trace_event-objs := bpf_load.o libbpf.o trace_event_user.o
2016-09-02 09:37:26 +08:00
sampleip-objs := bpf_load.o libbpf.o sampleip_user.o
2016-11-10 07:36:34 +08:00
tc_l2_redirect-objs := bpf_load.o libbpf.o tc_l2_redirect_user.o
2014-09-26 15:17:07 +08:00

# Tell kbuild to always build the programs
always := $(hostprogs-y)
2014-12-02 07:06:38 +08:00
always += sockex1_kern.o
2014-12-02 07:06:39 +08:00
always += sockex2_kern.o
2015-05-20 07:59:06 +08:00
always += sockex3_kern.o
2015-03-26 03:49:23 +08:00
always += tracex1_kern.o
2015-03-26 03:49:24 +08:00
always += tracex2_kern.o
2015-03-26 03:49:25 +08:00
always += tracex3_kern.o
2015-03-26 03:49:26 +08:00
always += tracex4_kern.o
2015-05-20 07:59:05 +08:00
always += tracex5_kern.o
2015-08-06 15:02:36 +08:00
always += tracex6_kern.o
2016-07-25 20:55:02 +08:00
always += test_probe_write_user_kern.o
2015-10-21 11:02:35 +08:00
always += trace_output_kern.o
2015-04-02 08:12:13 +08:00
always += tcbpf1_kern.o
2016-08-20 02:55:44 +08:00
always += tcbpf2_kern.o
2016-11-10 07:36:34 +08:00
always += tc_l2_redirect_kern.o
2015-06-19 22:00:44 +08:00
always += lathist_kern.o
2016-02-18 11:58:59 +08:00
always += offwaketime_kern.o
2016-03-09 07:07:52 +08:00
always += spintest_kern.o
2016-03-09 07:07:54 +08:00
always += map_perf_test_kern.o
2016-04-07 09:43:31 +08:00
always += test_overhead_tp_kern.o
always += test_overhead_kprobe_kern.o
2016-05-06 10:49:14 +08:00
always += parse_varlen.o parse_simple.o parse_ldabs.o
2016-07-01 01:28:45 +08:00
always += test_cgrp2_tc_kern.o
2016-07-20 03:16:51 +08:00
always += xdp1_kern.o
2016-07-20 03:16:57 +08:00
always += xdp2_kern.o
2016-08-12 23:57:04 +08:00
always += test_current_task_under_cgroup_kern.o
2016-09-02 09:37:25 +08:00
always += trace_event_kern.o
2016-09-02 09:37:26 +08:00
always += sampleip_kern.o
2014-09-26 15:17:07 +08:00

HOSTCFLAGS += -I$(objtree)/usr/include
2014-12-02 07:06:38 +08:00

HOSTCFLAGS_bpf_load.o += -I$(objtree)/usr/include -Wno-unused-variable
2015-10-29 21:58:10 +08:00
HOSTLOADLIBES_fds_example += -lelf
2014-12-02 07:06:38 +08:00
HOSTLOADLIBES_sockex1 += -lelf
2014-12-02 07:06:39 +08:00
HOSTLOADLIBES_sockex2 += -lelf
2015-05-20 07:59:06 +08:00
HOSTLOADLIBES_sockex3 += -lelf
2015-03-26 03:49:23 +08:00
HOSTLOADLIBES_tracex1 += -lelf
2015-03-26 03:49:24 +08:00
HOSTLOADLIBES_tracex2 += -lelf
2015-03-26 03:49:25 +08:00
HOSTLOADLIBES_tracex3 += -lelf
2015-03-26 03:49:26 +08:00
HOSTLOADLIBES_tracex4 += -lelf -lrt
2015-05-20 07:59:05 +08:00
HOSTLOADLIBES_tracex5 += -lelf
2015-08-06 15:02:36 +08:00
HOSTLOADLIBES_tracex6 += -lelf
2016-07-25 20:55:02 +08:00
HOSTLOADLIBES_test_probe_write_user += -lelf
2015-10-21 11:02:35 +08:00
HOSTLOADLIBES_trace_output += -lelf -lrt
2015-06-19 22:00:44 +08:00
HOSTLOADLIBES_lathist += -lelf
2016-02-18 11:58:59 +08:00
HOSTLOADLIBES_offwaketime += -lelf
2016-03-09 07:07:52 +08:00
HOSTLOADLIBES_spintest += -lelf
2016-03-09 07:07:54 +08:00
HOSTLOADLIBES_map_perf_test += -lelf -lrt
2016-04-07 09:43:31 +08:00
HOSTLOADLIBES_test_overhead += -lelf -lrt
2016-07-20 03:16:51 +08:00
HOSTLOADLIBES_xdp1 += -lelf
2016-07-20 03:16:57 +08:00
HOSTLOADLIBES_xdp2 += -lelf
2016-08-12 23:57:04 +08:00
HOSTLOADLIBES_test_current_task_under_cgroup += -lelf
2016-09-02 09:37:25 +08:00
HOSTLOADLIBES_trace_event += -lelf
2016-09-02 09:37:26 +08:00
HOSTLOADLIBES_sampleip += -lelf
2016-11-10 07:36:34 +08:00
HOSTLOADLIBES_tc_l2_redirect += -lelf
2014-12-02 07:06:38 +08:00

2016-04-28 20:21:14 +08:00
# Allows pointing LLC/CLANG to an LLVM backend with bpf support, redefine on cmdline:
#  make samples/bpf/ LLC=~/git/llvm/build/bin/llc CLANG=~/git/llvm/build/bin/clang
2016-04-28 20:20:53 +08:00
LLC ?= llc
2016-04-28 20:21:14 +08:00
CLANG ?= clang
2016-04-28 20:20:53 +08:00

2016-04-28 20:21:09 +08:00
# Trick to allow make to be run from this directory
all:
	$(MAKE) -C ../../ $$PWD/

clean:
	$(MAKE) -C ../../ M=$$PWD clean
	@rm -f *~

2016-04-28 20:21:14 +08:00
# Verify LLVM compiler tools are available and bpf target is supported by llc
.PHONY: verify_cmds verify_target_bpf $(CLANG) $(LLC)
2016-04-28 20:20:58 +08:00

2016-04-28 20:21:14 +08:00
verify_cmds: $(CLANG) $(LLC)
	@for TOOL in $^ ; do \
		if ! (which -- "$${TOOL}" > /dev/null 2>&1); then \
			echo "*** ERROR: Cannot find LLVM tool $${TOOL}" ;\
			exit 1; \
		else true; fi; \
	done
2016-04-28 20:20:58 +08:00

2016-04-28 20:21:14 +08:00
verify_target_bpf: verify_cmds
2016-04-28 20:20:58 +08:00
	@if ! (${LLC} -march=bpf -mattr=help > /dev/null 2>&1); then \
		echo "*** ERROR: LLVM (${LLC}) does not support 'bpf' target" ;\
		echo "   NOTICE: LLVM version >= 3.7.1 required" ;\
		exit 2; \
	else true; fi

$(src)/*.c: verify_target_bpf

2016-04-05 01:01:33 +08:00
# asm/sysreg.h - inline assembly used by it is incompatible with llvm.
# But, there is no easy way to fix it, so just exclude it since it is
2015-11-13 06:07:46 +08:00
# useless for BPF samples.
2015-05-12 12:25:51 +08:00
$(obj)/%.o: $(src)/%.c
2016-04-28 20:21:14 +08:00
	$(CLANG) $(NOSTDINC_FLAGS) $(LINUXINCLUDE) $(EXTRA_CFLAGS) \
2015-11-13 06:07:46 +08:00
		-D__KERNEL__ -D__ASM_SYSREG_H -Wno-unused-value -Wno-pointer-sign \
2016-05-06 10:49:14 +08:00
		-Wno-compare-distinct-pointer-types \
2016-04-28 20:20:53 +08:00
		-O2 -emit-llvm -c $< -o - | $(LLC) -march=bpf -filetype=obj -o $@