linux/kernel
Alexei Starovoitov c4f6699dfc bpf: introduce BPF_RAW_TRACEPOINT
Introduce BPF_PROG_TYPE_RAW_TRACEPOINT bpf program type to access
kernel internal arguments of the tracepoints in their raw form.

>From bpf program point of view the access to the arguments look like:
struct bpf_raw_tracepoint_args {
       __u64 args[0];
};

int bpf_prog(struct bpf_raw_tracepoint_args *ctx)
{
  // program can read args[N] where N depends on tracepoint
  // and statically verified at program load+attach time
}

kprobe+bpf infrastructure allows programs access function arguments.
This feature allows programs access raw tracepoint arguments.

Similar to proposed 'dynamic ftrace events' there are no abi guarantees
to what the tracepoints arguments are and what their meaning is.
The program needs to type cast args properly and use bpf_probe_read()
helper to access struct fields when argument is a pointer.

For every tracepoint __bpf_trace_##call function is prepared.
In assembler it looks like:
(gdb) disassemble __bpf_trace_xdp_exception
Dump of assembler code for function __bpf_trace_xdp_exception:
   0xffffffff81132080 <+0>:     mov    %ecx,%ecx
   0xffffffff81132082 <+2>:     jmpq   0xffffffff811231f0 <bpf_trace_run3>

where

TRACE_EVENT(xdp_exception,
        TP_PROTO(const struct net_device *dev,
                 const struct bpf_prog *xdp, u32 act),

The above assembler snippet is casting 32-bit 'act' field into 'u64'
to pass into bpf_trace_run3(), while 'dev' and 'xdp' args are passed as-is.
All of ~500 of __bpf_trace_*() functions are only 5-10 byte long
and in total this approach adds 7k bytes to .text.

This approach gives the lowest possible overhead
while calling trace_xdp_exception() from kernel C code and
transitioning into bpf land.
Since tracepoint+bpf are used at speeds of 1M+ events per second
this is valuable optimization.

The new BPF_RAW_TRACEPOINT_OPEN sys_bpf command is introduced
that returns anon_inode FD of 'bpf-raw-tracepoint' object.

The user space looks like:
// load bpf prog with BPF_PROG_TYPE_RAW_TRACEPOINT type
prog_fd = bpf_prog_load(...);
// receive anon_inode fd for given bpf_raw_tracepoint with prog attached
raw_tp_fd = bpf_raw_tracepoint_open("xdp_exception", prog_fd);

Ctrl-C of tracing daemon or cmdline tool that uses this feature
will automatically detach bpf program, unload it and
unregister tracepoint probe.

On the kernel side the __bpf_raw_tp_map section of pointers to
tracepoint definition and to __bpf_trace_*() probe function is used
to find a tracepoint with "xdp_exception" name and
corresponding __bpf_trace_xdp_exception() probe function
which are passed to tracepoint_probe_register() to connect probe
with tracepoint.

Addition of bpf_raw_tracepoint doesn't interfere with ftrace and perf
tracepoint mechanisms. perf_event_open() can be used in parallel
on the same tracepoint.
Multiple bpf_raw_tracepoint_open("xdp_exception", prog_fd) are permitted.
Each with its own bpf program. The kernel will execute
all tracepoint probes and all attached bpf programs.

In the future bpf_raw_tracepoints can be extended with
query/introspection logic.

__bpf_raw_tp_map section logic was contributed by Steven Rostedt

Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Acked-by: Steven Rostedt (VMware) <rostedt@goodmis.org>
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
2018-03-28 22:55:19 +02:00
..
bpf bpf: introduce BPF_RAW_TRACEPOINT 2018-03-28 22:55:19 +02:00
cgroup cgroup: fix rule checking for threaded mode switching 2018-02-21 11:39:22 -08:00
configs KVM changes for 4.16 2018-02-10 13:16:35 -08:00
debug signal: Simplify and fix kdb_send_sig 2018-01-03 18:01:08 -06:00
events perf/core: Fix ctx_event_type in ctx_resched() 2018-03-09 08:03:02 +01:00
gcov License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
irq genirq/matrix: Handle CPU offlining proper 2018-02-22 22:05:43 +01:00
livepatch Merge branch 'for-4.16/remove-immediate' into for-linus 2018-01-31 16:36:38 +01:00
locking rtmutex: Make rt_mutex_futex_unlock() safe for irq-off callsites 2018-03-09 11:06:16 +01:00
power x86/power: Fix swsusp_arch_resume prototype 2018-02-02 23:33:50 +01:00
printk Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk 2018-03-01 10:06:39 -08:00
rcu SCSI misc on 20180131 2018-01-31 11:23:28 -08:00
sched Merge branch 'for-4.16-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2018-03-19 15:39:02 -07:00
time timers: Forward timer base before migrating timers 2018-02-28 23:34:33 +01:00
trace bpf: introduce BPF_RAW_TRACEPOINT 2018-03-28 22:55:19 +02:00
.gitignore
acct.c kernel/acct.c: fix the acct->needcheck check in check_free_space() 2018-01-04 16:45:09 -08:00
async.c kernel/async.c: revert "async: simplify lowest_in_progress()" 2018-02-06 18:32:44 -08:00
audit_fsnotify.c
audit_tree.c Merge branch 'fsnotify' of git://git.kernel.org/pub/scm/linux/kernel/git/jack/linux-fs 2017-11-14 14:08:20 -08:00
audit_watch.c audit/stable-4.13 PR 20170816 2017-08-16 16:48:34 -07:00
audit.c net: Convert audit_net_ops 2018-02-13 10:36:06 -05:00
audit.h audit/stable-4.15 PR 20171113 2017-11-15 13:28:48 -08:00
auditfilter.c audit: filter PATH records keyed on filesystem magic 2017-11-10 16:08:56 -05:00
auditsc.c audit/stable-4.15 PR 20171113 2017-11-15 13:28:48 -08:00
backtracetest.c
bounds.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
capability.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
compat.c signals: Move put_compat_sigset to compat.h to silence hardened usercopy 2018-03-02 21:31:55 +00:00
configs.c
context_tracking.c
cpu_pm.c PM / CPU: replace raw_notifier with atomic_notifier 2017-07-31 13:09:49 +02:00
cpu.c Merge branch 'timers-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2017-12-31 12:30:34 -08:00
crash_core.c kdump: write correct address of mem_section into vmcoreinfo 2018-01-13 10:42:48 -08:00
crash_dump.c
cred.c
delayacct.c delayacct: Account blkio completion on the correct task 2018-01-16 03:29:36 +01:00
dma.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
elfcore.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exec_domain.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
exit.c kernel/exit.c: export abort() to modules 2018-01-04 16:45:09 -08:00
extable.c extable: Make init_kernel_text() global 2018-02-21 16:54:06 +01:00
fail_function.c error-injection: Fix to prohibit jump optimization 2018-03-12 16:16:00 +01:00
fork.c include/linux/sched/mm.h: re-inline mmdrop() 2018-02-21 15:35:42 -08:00
freezer.c
futex_compat.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
futex.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
groups.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-14 16:00:49 -08:00
hung_task.c
irq_work.c irq/work: Improve the flag definitions 2018-01-08 19:43:15 +01:00
jump_label.c jump_label: Fix sparc64 warning 2018-03-14 16:35:26 +01:00
kallsyms.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/pmladek/printk 2018-02-01 13:36:15 -08:00
kcmp.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c kcov: detect double association with a single task 2018-02-06 18:32:46 -08:00
kexec_core.c
kexec_file.c resource: Provide resource struct in resource walk callback 2017-11-07 15:35:57 +01:00
kexec_internal.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
kexec.c
kmod.c kmod: move #ifdef CONFIG_MODULES wrapper to Makefile 2017-09-08 18:26:51 -07:00
kprobes.c kprobes: Propagate error from disarm_kprobe_ftrace() 2018-02-16 09:12:58 +01:00
ksysfs.c
kthread.c treewide: Remove TIMER_FUNC_TYPE and TIMER_DATA_TYPE casts 2017-11-21 16:35:54 -08:00
latencytop.c
Makefile error-injection: Support fault injection framework 2018-01-12 17:33:38 -08:00
memremap.c kernel/memremap: Remove stale devres_free() call 2018-03-06 10:58:54 -08:00
module_signing.c
module-internal.h
module.c module: propagate error in modules_open() 2018-03-08 21:58:51 +01:00
notifier.c
nsproxy.c
padata.c padata: add SPDX identifier 2018-01-05 18:43:00 +11:00
panic.c bug: use %pB in BUG and stack protector failure 2018-03-09 16:40:01 -08:00
params.c kernel/params.c: improve STANDARD_PARAM_DEF readability 2017-10-03 17:54:26 -07:00
pid_namespace.c pid: remove pidhash 2017-11-17 16:10:04 -08:00
pid.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
profile.c
ptrace.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
range.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
reboot.c kernel/reboot.c: add devm_register_reboot_notifier() 2017-11-17 16:10:04 -08:00
relay.c kernel/relay.c: limit kmalloc size to KMALLOC_MAX_SIZE 2018-02-21 15:35:43 -08:00
resource.c Merge branch 'akpm' (patches from Andrew) 2018-02-06 22:15:42 -08:00
seccomp.c - Fix seccomp GET_METADATA to deal with field sizes correctly (Tycho Andersen) 2018-02-22 10:50:24 -08:00
signal.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/livepatching 2018-01-31 13:02:18 -08:00
smp.c smp/core: Use lockdep to assert IRQs are disabled/enabled 2017-11-08 11:13:50 +01:00
smpboot.c watchdog/core, powerpc: Lock cpus across reconfiguration 2017-10-04 10:53:54 +02:00
smpboot.h License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
softirq.c softirq: Eliminate cond_resched_rcu_qs() in favor of cond_resched() 2017-12-04 10:28:58 -08:00
stacktrace.c
stop_machine.c
sys_ni.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sys.c fix typo in assignment of fs default overflow gid 2017-12-14 16:01:45 -06:00
sysctl_binary.c License cleanup: add SPDX GPL-2.0 license identifier to files with no license 2017-11-02 11:10:55 +01:00
sysctl.c pipe: reject F_SETPIPE_SZ with size over UINT_MAX 2018-02-06 18:32:47 -08:00
task_work.c locking/barriers: Convert users of lockless_dereference() to READ_ONCE() 2017-12-17 13:57:15 +01:00
taskstats.c pids: introduce find_get_task_by_vpid() helper 2018-02-06 18:32:46 -08:00
test_kprobes.c kprobes: Disable the jprobes test code 2017-10-20 11:02:54 +02:00
torture.c torture: Save a line in stutter_wait(): while -> for 2017-12-11 09:18:30 -08:00
tracepoint.c tracepoint: Remove smp_read_barrier_depends() from comment 2017-12-04 10:52:56 -08:00
tsacct.c
ucount.c
uid16.c kernel: make groups_sort calling a responsibility group_info allocators 2017-12-14 16:00:49 -08:00
umh.c kernel/umh.c: optimize 'proc_cap_handler()' 2017-11-17 16:10:01 -08:00
up.c smp: Avoid using two cache lines for struct call_single_data 2017-08-29 15:14:38 +02:00
user_namespace.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2017-11-16 12:20:15 -08:00
user-return-notifier.c
user.c efivarfs: Limit the rate for non-root to read files 2018-02-22 10:21:02 -08:00
utsname_sysctl.c
utsname.c
watchdog_hld.c Merge branch 'linus' into core/urgent, to pick up dependent commits 2017-11-04 08:53:04 +01:00
watchdog.c Merge branch 'linus' into sched/core, to pick up fixes 2017-11-08 10:17:15 +01:00
workqueue_internal.h Merge branch 'for-4.14-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2017-11-06 12:26:49 -08:00
workqueue.c workqueue: remove unused cancel_work() 2018-03-13 13:37:42 -07:00