linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-12-24 11:34:50 +08:00


Andrii Nakryiko fde2a3882b bpf: support precision propagation in the presence of subprogs

Add support for precision backtracking in the presence of subprogram frames in jump history. This means supporting a few different kinds of subprogram invocation situations, all requiring slightly different handling in the precision backtracking logic:
  - static subprogram calls;
  - global subprogram calls;
  - callback-calling helpers/kfuncs.

For each of those we need to handle a few precision propagation cases:
  - what to do with precision of subprog returns (r0);
  - what to do with precision of input arguments;
  - for all of them, callee-saved registers in the caller function should be propagated, ignoring the subprog/callback part of jump history.

N.B. Async callback-calling helpers (currently only bpf_timer_set_callback()) are transparent to all this because they set up a separate async callback environment, and thus the callback's history is not shared with the main program's history. So as far as the changes in this commit go, such a helper is just a regular helper.

Let's look at all these situations in more detail. Let's start with a static subprogram being called, using an excerpt of a simple main program and its static subprog, indenting the subprog's frame slightly to make everything clear.

      frame 0                 frame 1             precision set
      =======                 =======             =============
     9: r6 = 456;
    10: r1 = 123;                                 fr0: r6
    11: call pc+10;                               fr0: r1, r6
                              22: r0 = r1;        fr0: r6; fr1: r1
                              23: exit            fr0: r6; fr1: r0
    12: r1 = <map_pointer>                        fr0: r0, r6
    13: r1 += r0;                                 fr0: r0, r6
    14: r1 += r6;                                 fr0: r6
    15: exit

As can be seen above, the main function is passing 123 as the single argument to an identity (`return x;`) subprog. The returned value is used to adjust the map pointer offset, which forces r0 to be marked as precise. Then instruction #14 does the same for callee-saved r6, which will have to be backtracked all the way to instruction #9. For brevity, precision sets for instructions #13 and #14 are combined in the diagram above.
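The precision-set column of the static subprog diagram above can be modeled with one register bitmask per frame. The C sketch below is a hypothetical illustration under that simplifying assumption: stack slots, ALU semantics and jump history bookkeeping are left out. All of its names, such as `struct prec_state` and `bt_static_subprog_exit()`, were made up for this example and are not taken from the verifier. It encodes the static-subprog behavior traced in the diagram. r0 moves from the caller into the callee at the subprog's exit, requested r1-r5 move back to the caller at the subprog's entry, and r6-r9 never leave their own frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model of per-frame precision sets, used only
 * to illustrate the rules; this is not the kernel verifier's code.
 * Bit i of regs[f] set means "ri must be precise in frame f".
 */
#define MAX_FRAMES 8
#define REG_BIT(r) (1u << (r))
#define ARG_REGS_MASK 0x3eu /* r1-r5 */

struct prec_state {
	int frame;                 /* frame whose insns are being backtracked */
	uint32_t regs[MAX_FRAMES]; /* one precision set per frame */
};

static void prec_mark(struct prec_state *s, int frame, int reg)
{
	s->regs[frame] |= REG_BIT(reg);
}

static bool prec_has(const struct prec_state *s, int frame, int reg)
{
	return s->regs[frame] & REG_BIT(reg);
}

/* "dst = <constant>": dst's precision is satisfied here. */
static void bt_load_imm(struct prec_state *s, int dst)
{
	s->regs[s->frame] &= ~REG_BIT(dst);
}

/* "dst = src": a precise dst makes src precise instead. */
static void bt_mov(struct prec_state *s, int dst, int src)
{
	if (prec_has(s, s->frame, dst)) {
		s->regs[s->frame] &= ~REG_BIT(dst);
		s->regs[s->frame] |= REG_BIT(src);
	}
}

/*
 * Stepping back from the caller's insn after the call into the static
 * subprog's exit: the caller's r0 becomes the callee's r0. Callee-saved
 * r6-r9 stay in the caller's set.
 */
static void bt_static_subprog_exit(struct prec_state *s)
{
	int caller = s->frame, callee = caller + 1;

	assert(callee < MAX_FRAMES);
	s->regs[callee] = 0;
	if (prec_has(s, caller, 0)) {
		s->regs[caller] &= ~REG_BIT(0);
		s->regs[callee] |= REG_BIT(0);
	}
	s->frame = callee;
}

/*
 * Stepping back from the static subprog's first insn to the call insn:
 * requested argument registers r1-r5 move to the caller's set.
 */
static void bt_static_subprog_entry(struct prec_state *s)
{
	int callee = s->frame, caller = callee - 1;

	assert(caller >= 0);
	/* in this model only r1-r5 carry values into a subprog */
	assert((s->regs[callee] & ~ARG_REGS_MASK) == 0);
	s->regs[caller] |= s->regs[callee];
	s->regs[callee] = 0;
	s->frame = caller;
}
```

You can replay the diagram backwards with these helpers. Mark r0 and r6 in frame 0 for insns #13/#14, then step through insns #12, #23, #22, #11, #10 and #9 in that order. The resulting sets match the diagram: r6 in frame 0 and r0 in frame 1 after insn #23, r1 in frame 1 after insn #22, {r1, r6} in frame 0 after insn #11, and an empty set after insn #9.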
First, for subprog calls, r0 returned from the subprog (in frame 0) has to go into the subprog's frame 1, and should be cleared from frame 0. So we go back into the subprog's frame knowing we need to mark r0 precise. We then see that insn #22 sets r0 from r1, so now we care about marking r1 precise. When we pop up from the subprog's frame back into the caller at insn #11, we keep r1, as it's an argument-passing register, so we eventually find `10: r1 = 123;` and satisfy the precision propagation chain for insn #13.

This example demonstrates two sets of rules:
  - r0 returned after a subprog call has to be moved into the subprog's r0 set;
  - static subprog arguments (r1-r5) are moved back to the caller's precision set.

Let's look at what happens with callee-saved precision propagation. Insn #14 marks r6 as precise. When we get into the subprog's frame, we keep r6 in frame 0's precision set only. The subprog itself has its own set of independent r6-r10 registers and is not affected. When we eventually make our way out of the subprog frame, we keep r6 in the precision set until we reach `9: r6 = 456;`, satisfying propagation. r6-r10 propagation is perhaps the simplest aspect: it always stays in its original frame.

That's pretty much all we have to do to support precision propagation across static subprog invocation.

Let's look at what happens when we have a global subprog invocation.

      frame 0                 frame 1             precision set
      =======                 =======             =============
     9: r6 = 456;
    10: r1 = 123;                                 fr0: r6
    11: call pc+10; # global subprog              fr0: r6
    12: r1 = <map_pointer>                        fr0: r0, r6
    13: r1 += r0;                                 fr0: r0, r6
    14: r1 += r6;                                 fr0: r6;
    15: exit

Starting from insn #13, r0 has to be precise. We backtrack all the way to insn #11 (`call pc+10;`) and see that the subprog is global, so it was already validated in isolation. As opposed to a static subprog, a global subprog always returns an unknown scalar r0, so that satisfies precision propagation and we drop r0 from the precision set. We are done with insn #13.

Now for insn #14. r6 is in the precision set, and we backtrack to `call pc+10;`.
Here we need to recognize that this is effectively both exit and entry to the global subprog, which means we stay in the caller's frame. So we carry on with r6 still in the precision set, until we satisfy it at insn #9. The only hard part with global subprogs is just knowing when it's a global func.

Lastly, callback-calling helpers and kfuncs do simulate subprog calls, so jump history will have subprog instructions in between the caller program's instructions. However, the rules for propagating r0 and r1-r5 differ, because we don't actually call the callback directly. We actually call the helper/kfunc, which at runtime will call the subprog. So the only difference from normal helper/kfunc handling is that we need to make sure to skip the callback-simulating part of jump history. Let's look at an example to make this clearer.

      frame 0                 frame 1             precision set
      =======                 =======             =============
     8: r6 = 456;
     9: r1 = 123;                                 fr0: r6
    10: r2 = &callback;                           fr0: r6
    11: call bpf_loop;                            fr0: r6
                              22: r0 = r1;        fr0: r6 fr1:
                              23: exit            fr0: r6 fr1:
    12: r1 = <map_pointer>                        fr0: r0, r6
    13: r1 += r0;                                 fr0: r0, r6
    14: r1 += r6;                                 fr0: r6;
    15: exit

Again, insn #13 forces r0 to be precise. As soon as we get to `23: exit`, we see that this isn't actually a static subprog call (it's a `call bpf_loop;` helper call instead), so we clear r0 from the precision set.

For callee-saved registers there is no difference: r6 stays in frame 0's precision set. We go through insns #22 and #23, ignoring them until we get back to caller frame 0, and eventually satisfy the precision backtrack logic at insn #8 (`r6 = 456;`).

Assuming the callback needed to set r0 as precise at insn #23, we'd backtrack to insn #22, switching from r0 to r1. Then, at the point when we pop back to frame 0 at insn #11, we'd clear r1-r5 from the precision set. We don't really do a subprog call directly, so there is no input argument precision propagation.

That's pretty much it.
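The global subprog and callback rules fit the same kind of per-frame bitmask model. Again, this is a hypothetical sketch rather than the verifier's implementation, and `bt_opaque_call()`, `bt_callback_exit()` and `bt_callback_entry()` are illustrative names. A global subprog call is backtracked exactly like a helper/kfunc call: it stays in the caller's frame and satisfies r0. For a callback-calling helper, backtracking enters the simulated callback frame at its exit with the caller's r0 dropped rather than moved. It leaves the callback at its first insn with r1-r5 dropped rather than propagated. The assertion in `bt_opaque_call()` reflects the BPF calling convention: a call clobbers r1-r5, so they cannot be requested right after it.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical, simplified model (not the kernel verifier's code) of the
 * rules for global subprog calls and callback-calling helpers/kfuncs.
 * Bit i of regs[f] set means "ri must be precise in frame f".
 */
#define MAX_FRAMES 8
#define REG_BIT(r) (1u << (r))
#define ARG_REGS_MASK 0x3eu /* r1-r5 */

struct prec_state {
	int frame;                 /* frame whose insns are being backtracked */
	uint32_t regs[MAX_FRAMES]; /* one precision set per frame */
};

/* "dst = <constant>": dst's precision is satisfied here. */
static void bt_load_imm(struct prec_state *s, int dst)
{
	s->regs[s->frame] &= ~REG_BIT(dst);
}

/* "dst = src": a precise dst makes src precise instead. */
static void bt_mov(struct prec_state *s, int dst, int src)
{
	uint32_t *m = &s->regs[s->frame];

	if (*m & REG_BIT(dst)) {
		*m &= ~REG_BIT(dst);
		*m |= REG_BIT(src);
	}
}

/*
 * A call that does not open a frame in jump history: a global subprog
 * call, or a helper/kfunc call insn itself. The callee sets r0, so r0's
 * precision is satisfied; r1-r5 are clobbered by the call and cannot be
 * requested here; r6-r9 stay in the caller's set.
 */
static void bt_opaque_call(struct prec_state *s)
{
	uint32_t *m = &s->regs[s->frame];

	assert((*m & ARG_REGS_MASK) == 0);
	*m &= ~REG_BIT(0);
}

/*
 * Stepping back from the insn after a callback-calling helper into the
 * simulated callback's exit: the caller's r0 is the helper's result, not
 * the callback's, so it is dropped rather than moved.
 */
static void bt_callback_exit(struct prec_state *s)
{
	assert(s->frame + 1 < MAX_FRAMES);
	s->regs[s->frame] &= ~REG_BIT(0);
	s->frame++;
	s->regs[s->frame] = 0;
}

/*
 * Stepping back from the callback's first insn to the helper call: the
 * caller did not pass r1-r5 to the callback directly, so they are dropped.
 */
static void bt_callback_entry(struct prec_state *s)
{
	assert(s->frame > 0);
	s->regs[s->frame] &= ~ARG_REGS_MASK;
	assert(s->regs[s->frame] == 0);
	s->frame--;
}
```

Replaying the `call bpf_loop` example this way keeps r6 in frame 0 throughout until insn #8 satisfies it, and drops r0 at insn #23. If the callback's r0 is assumed to be precise, it turns into r1 at insn #22 and is then discarded at insn #11.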
With these changes, it seems like the only still-unsupported situation for precision backpropagation is the case when a program accesses the stack through registers other than r10. This is left as an unsupported (though rare) case for now.

As for results. For selftests, there are a few positive changes for bigger programs, with cls_redirect in the dynptr variant benefitting the most:

    [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results.csv ~/subprog-precise-after-results.csv -f @veristat.cfg -e file,prog,insns -f 'insns_diff!=0'
    File                                      Program        Insns (A)  Insns (B)  Insns (DIFF)
    ----------------------------------------  -------------  ---------  ---------  ----------------
    pyperf600_bpf_loop.bpf.linked1.o          on_event            2060       2002      -58 (-2.82%)
    test_cls_redirect_dynptr.bpf.linked1.o    cls_redirect       15660       2914  -12746 (-81.39%)
    test_cls_redirect_subprogs.bpf.linked1.o  cls_redirect       61620      59088    -2532 (-4.11%)
    xdp_synproxy_kern.bpf.linked1.o           syncookie_tc      109980      86278  -23702 (-21.55%)
    xdp_synproxy_kern.bpf.linked1.o           syncookie_xdp      97716      85147  -12569 (-12.86%)

Cilium programs don't really regress. They don't use subprogs and are mostly unaffected, but some other fixes and improvements could have changed something.
This doesn't appear to be the case:

    [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-cilium.csv ~/subprog-precise-after-results-cilium.csv -e file,prog,insns -f 'insns_diff!=0'
    File           Program                         Insns (A)  Insns (B)  Insns (DIFF)
    -------------  ------------------------------  ---------  ---------  ------------
    bpf_host.o     tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
    bpf_lxc.o      tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
    bpf_overlay.o  tail_nodeport_nat_ingress_ipv6       4983       5003  +20 (+0.40%)
    bpf_xdp.o      tail_handle_nat_fwd_ipv6            12475      12504  +29 (+0.23%)
    bpf_xdp.o      tail_nodeport_nat_ingress_ipv6       6363       6371   +8 (+0.13%)

Looking at (somewhat anonymized) Meta production programs, we see mostly insignificant variation in the number of instructions, with one program (syar_bind6_protect6) benefitting the most at -17%.

    [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-fbcode.csv ~/subprog-precise-after-results-fbcode.csv -e prog,insns -f 'insns_diff!=0'
    Program                   Insns (A)  Insns (B)  Insns (DIFF)
    ------------------------  ---------  ---------  ----------------
    on_request_context_event        597        585      -12 (-2.01%)
    read_async_py_stack           43789      43657     -132 (-0.30%)
    read_sync_py_stack            35041      37599    +2558 (+7.30%)
    rrm_usdt                        946        940       -6 (-0.63%)
    sysarmor_inet6_bind           28863      28249     -614 (-2.13%)
    sysarmor_inet_bind            28845      28240     -605 (-2.10%)
    syar_bind4_protect4          154145     147640    -6505 (-4.22%)
    syar_bind6_protect6          165242     137088  -28154 (-17.04%)
    syar_task_exit_setgid         21289      19720    -1569 (-7.37%)
    syar_task_exit_setuid         21290      19721    -1569 (-7.37%)
    do_uprobe                     19967      19413     -554 (-2.77%)
    tw_twfw_ingress              215877     204833   -11044 (-5.12%)
    tw_twfw_tc_in                215877     204833   -11044 (-5.12%)

But checking duration (wall clock) differences, that is, the actual time taken by the verifier to validate programs, we see sometimes dramatic improvements, up to about 16x:

    [vmuser@archvm bpf]$ ./veristat -C ~/subprog-precise-before-results-meta.csv ~/subprog-precise-after-results-meta.csv -e prog,duration -s duration_diff^ | head -n20
    Program                                   Duration (us) (A)  Duration (us) (B)  Duration (us) (DIFF)
    ----------------------------------------  -----------------  -----------------  --------------------
    tw_twfw_ingress                                     4488374             272836    -4215538 (-93.92%)
    tw_twfw_tc_in                                       4339111             268175    -4070936 (-93.82%)
    tw_twfw_egress                                      3521816             270751    -3251065 (-92.31%)
    tw_twfw_tc_eg                                       3472878             284294    -3188584 (-91.81%)
    balancer_ingress                                     343119             291391      -51728 (-15.08%)
    syar_bind6_protect6                                   78992              64782      -14210 (-17.99%)
    ttls_tc_ingress                                       11739               8176       -3563 (-30.35%)
    kprobe__security_inode_link                           13864              11341       -2523 (-18.20%)
    read_sync_py_stack                                    21927              19442       -2485 (-11.33%)
    read_async_py_stack                                   30444              28136        -2308 (-7.58%)
    syar_task_exit_setuid                                 10256               8440       -1816 (-17.71%)

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/r/20230505043317.3629845-9-andrii@kernel.org
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
2023-05-04 22:35:35 -07:00
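Two kinds of arithmetic appear in the veristat tables above. The DIFF columns are relative changes from run A to run B, and the "about 16x" figure is a ratio of durations. The small C sketch below makes both calculations explicit. The helper names `pct_diff()` and `speedup()` are hypothetical, and this is not veristat's own code. The formula (B - A) / A was inferred from the table values, which it reproduces.

```c
#include <assert.h>
#include <stdint.h>

/* Relative change from A to B in percent, as in the DIFF columns. */
static double pct_diff(int64_t a, int64_t b)
{
	assert(a != 0);
	return (double)(b - a) * 100.0 / (double)a;
}

/* Speedup factor A / B, e.g. for verification duration. */
static double speedup(int64_t a, int64_t b)
{
	assert(b != 0);
	return (double)a / (double)b;
}
```

For example, tw_twfw_ingress goes from 4488374 us to 272836 us. That is a change of about -93.92% and a speedup of roughly 16.45x, consistent with the figures quoted above.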
bpf	bpf: support precision propagation in the presence of subprogs	2023-05-04 22:35:35 -07:00
cgroup	Networking changes for 6.4.	2023-04-26 16:07:23 -07:00
configs	mm/slob: remove CONFIG_SLOB	2023-03-29 10:31:40 +02:00
debug	kdb: use srcu console list iterator	2022-12-02 11:25:00 +01:00
dma	swiotlb: fix a braino in the alignment check fix	2023-04-06 16:45:12 +02:00
entry	ptrace: Provide set/get interface for syscall user dispatch	2023-04-16 14:23:07 +02:00
events	perf/core: Fix the same task check in perf_event_set_output	2023-04-05 09:58:46 +02:00
futex	- Prevent the leaking of a debug timer in futex_waitv()	2023-01-01 11:15:05 -08:00
gcov	gcov: add support for checksum field	2022-12-21 14:31:52 -08:00
irq	genirq: Update affinity of secondary threads	2023-04-15 10:17:16 +02:00
kcsan	Kernel concurrency sanitizer (KCSAN) updates for v6.4	2023-04-24 11:46:53 -07:00
livepatch	livepatch: Make kobj_type structures constant	2023-03-09 11:15:42 +01:00
locking	RCU Changes for 6.4:	2023-04-24 12:16:14 -07:00
module	Networking changes for 6.4.	2023-04-26 16:07:23 -07:00
power	PM: Add sysfs files to represent time spent in hardware sleep state	2023-04-20 19:06:12 +02:00
printk	printk: Remove obsoleted check for non-existent "user" object	2023-04-03 12:05:17 +02:00
rcu	RCU Changes for 6.4:	2023-04-24 12:16:14 -07:00
sched	sched/fair: Fix imbalance overflow	2023-04-12 16:46:30 +02:00
time	Timers and timekeeping updates:	2023-04-25 11:22:46 -07:00
trace	bpf: Add bpf_dynptr_size	2023-04-27 10:40:41 +02:00
.gitignore
acct.c	acct: fix potential integer overflow in encode_comp_t()	2022-11-30 16:13:18 -08:00
async.c	Revert "module, async: async_synchronize_full() on module init iff async is used"	2022-02-03 11:20:34 -08:00
audit_fsnotify.c	audit: fix potential double free on error path from fsnotify_add_inode_mark	2022-08-22 18:50:06 -04:00
audit_tree.c	audit: use fsnotify group lock helpers	2022-04-25 14:37:28 +02:00
audit_watch.c	audit_init_parent(): constify path	2022-09-01 17:39:30 -04:00
audit.c	audit: use time_after to compare time	2022-08-29 19:47:03 -04:00
audit.h	audit: remove selinux_audit_rule_update() declaration	2022-09-07 11:30:15 -04:00
auditfilter.c	audit/stable-5.17 PR 20220110	2022-01-11 13:08:21 -08:00
auditsc.c	capability: just use a 'u64' instead of a 'u32[2]' array	2023-03-01 10:01:22 -08:00
backtracetest.c
bounds.c	mm: multi-gen LRU: minimal implementation	2022-09-26 19:46:09 -07:00
capability.c	capability: just use a 'u64' instead of a 'u32[2]' array	2023-03-01 10:01:22 -08:00
cfi.c	cfi: Switch to -fsanitize=kcfi	2022-09-26 10:13:13 -07:00
compat.c	sched_getaffinity: don't assume 'cpumask_size()' is fully initialized	2023-03-14 19:32:38 -07:00
configs.c
context_tracking.c	context_tracking: Fix noinstr vs KASAN	2023-01-13 11:48:18 +01:00
cpu_pm.c	cpuidle, cpu_pm: Remove RCU fiddling from cpu_pm_{enter,exit}()	2023-01-13 11:48:15 +01:00
cpu.c	cpu/hotplug: Do not bail-out in DYING/STARTING sections	2022-12-02 12:43:02 +01:00
crash_core.c	mm: remove 'First tail page' members from struct page	2023-02-02 22:32:59 -08:00
crash_dump.c
cred.c	cred: Do not default to init_cred in prepare_kernel_cred()	2022-11-01 10:04:52 -07:00
delayacct.c	delayacct: support re-entrance detection of thrashing accounting	2022-09-26 19:46:07 -07:00
dma.c
exec_domain.c
exit.c	arm64 updates for 6.3:	2023-02-21 15:27:48 -08:00
extable.c	context_tracking: Take NMI eqs entrypoints over RCU	2022-07-05 13:32:59 -07:00
fail_function.c	kernel/fail_function: fix memory leak with using debugfs_lookup()	2023-02-08 13:36:22 +01:00
fork.c	v6.4/pidfd.file	2023-04-24 13:03:42 -07:00
freezer.c	freezer,sched: Rewrite core freezer logic	2022-09-07 21:53:50 +02:00
gen_kheaders.sh	kheaders: use standard naming for the temporary directory	2023-01-22 23:43:34 +09:00
groups.c	security: Add LSM hook to setgroups() syscall	2022-07-15 18:21:49 +00:00
hung_task.c	hung_task: print message when hung_task_warnings gets down to zero.	2023-02-09 17:03:20 -08:00
iomem.c
irq_work.c	irq_work: use kasan_record_aux_stack_noalloc() record callstack	2022-04-15 14:49:55 -07:00
jump_label.c	jump_label: Prevent key->enabled int overflow	2022-12-01 15:53:05 -08:00
kallsyms_internal.h	kallsyms: Reduce the memory occupied by kallsyms_seqs_of_names[]	2022-11-12 18:47:36 -08:00
kallsyms_selftest.c	kallsyms: Fix scheduling with interrupts disabled in self-test	2023-01-13 15:09:08 -08:00
kallsyms_selftest.h	kallsyms: Add self-test facility	2022-11-15 00:42:02 -08:00
kallsyms.c	kallsyms: Add self-test facility	2022-11-15 00:42:02 -08:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt	Revert "signal, x86: Delay calling signals in atomic on RT enabled kernels"	2022-03-31 10:36:55 +02:00
kcov.c	mm: replace vma->vm_flags direct modifications with modifier calls	2023-02-09 16:51:39 -08:00
kexec_core.c	There is no particular theme here - mainly quick hits all over the tree.	2023-02-23 17:55:40 -08:00
kexec_elf.c
kexec_file.c	kexec: introduce sysctl parameters kexec_load_limit_*	2023-02-02 22:50:05 -08:00
kexec_internal.h	panic, kexec: make __crash_kexec() NMI safe	2022-09-11 21:55:06 -07:00
kexec.c	kexec: introduce sysctl parameters kexec_load_limit_*	2023-02-02 22:50:05 -08:00
kheaders.c
kmod.c
kprobes.c	x86/kprobes: Fix arch_check_optimized_kprobe check within optimized_kprobe range	2023-02-21 08:49:16 +09:00
ksysfs.c	kernels/ksysfs.c: export kernel address bits	2023-01-20 14:30:45 +01:00
kthread.c	kthread: Pass in the thread's name during creation	2023-03-12 10:54:36 +01:00
latencytop.c	latencytop: use the last element of latency_record of system	2022-09-11 21:55:12 -07:00
Makefile	vhost_task: Allow vhost layer to use copy_process	2023-03-23 12:45:36 +01:00
module_signature.c
notifier.c	kernel/notifier: Remove CONFIG_SRCU	2023-02-02 16:26:06 -08:00
nsproxy.c	convert setns(2) to fdget()/fdput()	2023-04-20 22:55:35 -04:00
padata.c	padata: use alignment when calculating the number of worker threads	2023-03-14 17:06:44 +08:00
panic.c	panic: fix the panic_print NMI backtrace setting	2023-03-02 21:54:23 -08:00
params.c	kernel/params.c: Use kstrtobool() instead of strtobool()	2023-01-25 14:07:21 -08:00
pid_namespace.c	- Daniel Verkamp has contributed a memfd series ("mm/memfd: add	2023-02-23 17:09:35 -08:00
pid_sysctl.h	mm/memfd: add MFD_NOEXEC_SEAL and MFD_EXEC	2023-01-18 17:12:37 -08:00
pid.c	pid: add pidfd_prepare()	2023-04-03 11:16:56 +02:00
profile.c	kernel/profile.c: simplify duplicated code in profile_setup()	2022-09-11 21:55:12 -07:00
ptrace.c	ptrace: Provide set/get interface for syscall user dispatch	2023-04-16 14:23:07 +02:00
range.c
reboot.c	kernel/reboot: Add SYS_OFF_MODE_RESTART_PREPARE mode	2022-10-04 15:59:36 +02:00
regset.c
relay.c	mm: replace vma->vm_flags direct modifications with modifier calls	2023-02-09 16:51:39 -08:00
resource_kunit.c
resource.c	dax/kmem: Fix leak of memory-hotplug resources	2023-02-17 14:58:01 -08:00
rseq.c	rseq: Extend struct rseq with per-memory-map concurrency ID	2022-12-27 12:52:12 +01:00
scftorture.c	scftorture: Fix distribution of short handler delays	2022-04-11 17:07:29 -07:00
scs.c	scs: add support for dynamic shadow call stacks	2022-11-09 18:06:35 +00:00
seccomp.c	seccomp: fix kernel-doc function name warning	2023-01-13 17:01:06 -08:00
signal.c	posix-timers: Prefer delivery of signals to the current thread	2023-04-16 09:00:18 +02:00
smp.c	bitmap patches for v6.1-rc1	2022-10-10 12:49:34 -07:00
smpboot.c	smpboot: use atomic_try_cmpxchg in cpu_wait_death and cpu_report_death	2022-09-11 21:55:10 -07:00
smpboot.h
softirq.c	softirq: Add trace points for tasklet entry/exit	2023-04-15 10:17:16 +02:00
stackleak.c	stackleak: add on/off stack variants	2022-05-08 01:33:09 -07:00
stacktrace.c	uaccess: remove CONFIG_SET_FS	2022-02-25 09:36:06 +01:00
static_call_inline.c	static_call: Add call depth tracking support	2022-10-17 16:41:16 +02:00
static_call.c	static_call: Don't make __static_call_return0 static	2022-04-05 09:59:38 +02:00
stop_machine.c	Scheduler changes in this cycle were:	2022-05-24 11:11:13 -07:00
sys_ni.c	kernel/sys_ni: add compat entry for fadvise64_64	2022-08-20 15:17:45 -07:00
sys.c	kernel/sys.c: fix and improve control flow in __sys_setres[ug]id()	2023-04-18 14:22:12 -07:00
sysctl-test.c	kernel/sysctl-test: use SYSCTL_{ZERO/ONE_HUNDRED} instead of i_{zero/one_hundred}	2022-09-08 16:56:45 -07:00
sysctl.c	sysctl: fix proc_dobool() usability	2023-02-21 13:34:07 -08:00
task_work.c	task_work: use try_cmpxchg in task_work_add, task_work_cancel_match and task_work_run	2022-09-11 21:55:10 -07:00
taskstats.c	genetlink: start to validate reserved header bytes	2022-08-29 12:47:15 +01:00
torture.c	torture: Fix hang during kthread shutdown phase	2023-01-05 12:10:35 -08:00
tracepoint.c	tracepoint: Allow livepatch module add trace event	2023-02-18 14:34:36 -05:00
tsacct.c	taskstats: version 12 with thread group and exe info	2022-04-29 14:38:03 -07:00
ucount.c	ucounts: Split rlimit and ucount values and max values	2022-05-18 18:24:57 -05:00
uid16.c
uid16.h
umh.c	umh: simplify the capability pointer logic	2023-03-03 16:18:19 -08:00
up.c
user_namespace.c	userns: fix a struct's kernel-doc notation	2023-02-02 22:50:04 -08:00
user-return-notifier.c
user.c	kernel/user: Allow user_struct::locked_vm to be usable for iommufd	2022-11-30 20:16:49 -04:00
usermode_driver.c	blob_to_mnt(): kern_unmount() is needed to undo kern_mount()	2022-05-19 23:25:47 -04:00
utsname_sysctl.c	kernel/utsname_sysctl.c: Fix hostname polling	2022-10-23 12:01:01 -07:00
utsname.c
vhost_task.c	vhost_task: Allow vhost layer to use copy_process	2023-03-23 12:45:36 +01:00
watch_queue.c	watch_queue: fix IOC_WATCH_QUEUE_SET_SIZE alloc error paths	2023-03-08 11:44:45 +01:00
watchdog_hld.c	Revert "printk: add functions to prefer direct printing"	2022-06-23 18:41:40 +02:00
watchdog.c	powerpc updates for 6.0	2022-08-06 16:38:17 -07:00
workqueue_internal.h
workqueue.c	workqueue: Fold rebind_worker() within rebind_workers()	2023-01-13 07:50:40 -10:00