linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-14 15:54:15 +08:00


Yonghong Song cf83b2d2e2 bpf: Permit cond_resched for some iterators

Commit e679654a70 ("bpf: Fix a rcu_sched stall issue with bpf task/task_file iterator") tried to fix the RCU stall warning caused by the bpf task_file iterator when running "bpftool prog":

  rcu: INFO: rcu_sched self-detected stall on CPU
  rcu:     7-....: (20999 ticks this GP) idle=302/1/0x4000000000000000 softirq=1508852/1508852 fqs=4913
          (t=21031 jiffies g=2534773 q=179750)
  NMI backtrace for cpu 7
  CPU: 7 PID: 184195 Comm: bpftool Kdump: loaded Tainted: G W 5.8.0-00004-g68bfc7f8c1b4 #6
  Hardware name: Quanta Twin Lakes MP/Twin Lakes Passive MP, BIOS F09_3A17 05/03/2019
  Call Trace:
  <IRQ>
  dump_stack+0x57/0x70
  nmi_cpu_backtrace.cold+0x14/0x53
  ? lapic_can_unplug_cpu.cold+0x39/0x39
  nmi_trigger_cpumask_backtrace+0xb7/0xc7
  rcu_dump_cpu_stacks+0xa2/0xd0
  rcu_sched_clock_irq.cold+0x1ff/0x3d9
  ? tick_nohz_handler+0x100/0x100
  update_process_times+0x5b/0x90
  tick_sched_timer+0x5e/0xf0
  __hrtimer_run_queues+0x12a/0x2a0
  hrtimer_interrupt+0x10e/0x280
  __sysvec_apic_timer_interrupt+0x51/0xe0
  asm_call_on_stack+0xf/0x20
  </IRQ>
  sysvec_apic_timer_interrupt+0x6f/0x80
  ...
  task_file_seq_next+0x52/0xa0
  bpf_seq_read+0xb9/0x320
  vfs_read+0x9d/0x180
  ksys_read+0x5f/0xe0
  do_syscall_64+0x38/0x60
  entry_SYSCALL_64_after_hwframe+0x44/0xa9

That fix limits the number of bpf program runs to one million. This fixed the problem in most cases. However, we also found that under heavy load, which can increase the wall-clock time of bpf_seq_read(), the warning can still appear. For example, when the following bpf_delay() is called in the "while" loop of bpf_seq_read() to introduce an artificial delay, the warning shows up in my qemu run:

  static unsigned q;
  volatile unsigned *p = &q;
  volatile unsigned long long ll;

  static void bpf_delay(void)
  {
          int i, j;

          for (i = 0; i < 10000; i++)
                  for (j = 0; j < 10000; j++)
                          ll += *p;
  }

There are two ways to fix this issue.
One is to reduce the above one-million threshold to, say, 100,000 and hope the RCU warning no longer shows up. The other is to introduce a target feature that lets bpf_seq_read() call cond_resched(). This patch takes the second approach, since the first may cause more -EAGAIN failures for read() syscalls.

Note that not all bpf_iter targets can permit cond_resched() in bpf_seq_read(): for some, e.g. the netlink seq iterator, the RCU read-lock critical section spans seq_ops->next() -> seq_ops->show() -> seq_ops->next().

With the above hack in the kernel, "bpftool p" takes roughly 38 seconds to finish on my VM with 184 bpf program runs. Using the following command, I collected the number of context switches:

  perf stat -e context-switches -- ./bpftool p >& log

  Without this patch: 69 context switches
  With this patch:    75 context switches

This patch adds 6 context switches, roughly one reschedule every 6 seconds, which avoids the long stretches without rescheduling that can trigger the above RCU warning.

Signed-off-by: Yonghong Song <yhs@fb.com>
Signed-off-by: Alexei Starovoitov <ast@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Link: https://lore.kernel.org/bpf/20201028061054.1411116-1-yhs@fb.com		2020-10-28 14:54:31 -07:00
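The read-loop behavior described in the commit message (a cap on program runs, -EAGAIN when the cap is hit before any output exists, and a reschedule point that only opted-in targets get) can be illustrated with a small userspace C sketch. It is not the kernel's bpf_seq_read(): the names ITER_FEAT_RESCHED, struct iter_target and seq_read_sketch() are hypothetical, the cap is a parameter applied per call instead of a fixed one million, the buffer-full check is omitted, and sched_yield() stands in for cond_resched(), which exists only inside the kernel.

```c
#include <errno.h>
#include <sched.h>
#include <stdbool.h>

/* Hypothetical feature bit: the target tolerates rescheduling mid-read. */
#define ITER_FEAT_RESCHED (1u << 0)

/* Hypothetical stand-in for an iterator target such as task_file. */
struct iter_target {
	unsigned int features;
	long nr_objs;            /* objects available to walk */
	bool (*prog)(long idx);  /* "bpf program": true if it emits output */
};

struct read_stats {
	long visited;            /* objects visited during this call */
	long resched_points;     /* times the loop offered the CPU */
};

/*
 * One simulated read(): visit objects until they run out or max_objs is
 * reached. If the cap is reached before any object produced output, fail
 * with -EAGAIN; otherwise return the number of objects that produced
 * output. The CPU is offered after each object only when the target set
 * ITER_FEAT_RESCHED.
 */
static long seq_read_sketch(const struct iter_target *t, long max_objs,
			    struct read_stats *st)
{
	bool can_resched = t->features & ITER_FEAT_RESCHED;
	long emitted = 0;

	st->visited = 0;
	st->resched_points = 0;
	for (long idx = 0; idx < t->nr_objs; idx++) {
		st->visited++;
		if (t->prog(idx))
			emitted++;

		if (st->visited >= max_objs) {
			if (emitted == 0)
				return -EAGAIN;  /* nothing to hand back yet */
			break;                   /* return partial output */
		}

		if (can_resched) {
			sched_yield();           /* userspace stand-in for cond_resched() */
			st->resched_points++;
		}
	}
	return emitted;
}
```

Keeping the reschedule point behind a per-target flag mirrors the constraint noted in the commit message: a target such as the netlink iterator, whose RCU read-lock section spans ->next() and ->show(), must not sleep between objects, so it would leave the flag clear. Lowering the cap instead, as in the rejected first approach, would make the -EAGAIN branch fire more often.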
bpf	bpf: Permit cond_resched for some iterators	2020-10-28 14:54:31 -07:00
cgroup	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
configs	compiler: remove CONFIG_OPTIMIZE_INLINING entirely	2020-04-07 10:43:42 -07:00
debug	kdb: Fix pager search for multi-line strings	2020-10-01 14:44:08 +01:00
dma	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
entry	arch-cleanup-2020-10-22	2020-10-23 10:06:38 -07:00
events	task_work: cleanup notification modes	2020-10-17 15:05:30 -06:00
gcov	gcov: add support for GCC 10.1	2020-09-11 09:33:54 -07:00
irq	task_work: cleanup notification modes	2020-10-17 15:05:30 -06:00
kcsan	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
livepatch	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
locking	Merge tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2020-10-18 14:34:50 -07:00
power	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
printk	Urgent printk fix for 5.10	2020-10-16 12:52:37 -07:00
rcu	Merge tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2020-10-18 14:34:50 -07:00
sched	task_work: cleanup notification modes	2020-10-17 15:05:30 -06:00
time	Merge tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2020-10-18 14:34:50 -07:00
trace	Tracing: Fix mismatch section of adding early trace events	2020-10-16 14:56:52 -07:00
.gitignore	.gitignore: add SPDX License Identifier	2020-03-25 11:50:48 +01:00
acct.c	kernel: acct.c: fix some kernel-doc nits	2020-10-16 11:11:19 -07:00
async.c	treewide: Remove uninitialized_var() usage	2020-07-16 12:35:15 -07:00
audit_fsnotify.c	fsnotify: create method handle_inode_event() in fsnotify_operations	2020-07-27 23:25:50 +02:00
audit_tree.c		2020-08-06 19:29:51 -07:00
audit_watch.c	fsnotify: create method handle_inode_event() in fsnotify_operations	2020-07-27 23:25:50 +02:00
audit.c	audit: Remove redundant null check	2020-08-26 09:10:39 -04:00
audit.h	audit: change unnecessary globals into statics	2020-08-17 20:26:58 -04:00
auditfilter.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
auditsc.c	audit/stable-5.9 PR 20200803	2020-08-04 14:20:26 -07:00
backtracetest.c	treewide: Replace DECLARE_TASKLET() with DECLARE_TASKLET_OLD()	2020-07-30 11:15:58 -07:00
bounds.c
capability.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
compat.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
configs.c
context_tracking.c	context_tracking: Ensure that the critical path cannot be instrumented	2020-06-11 15:14:36 +02:00
cpu_pm.c	notifier: Fix broken error handling pattern	2020-09-01 09:58:03 +02:00
cpu.c	The changes in this cycle are:	2020-06-03 13:06:42 -07:00
crash_core.c	kdump: append kernel build-id string to VMCOREINFO	2020-08-12 10:58:01 -07:00
crash_dump.c	crash_dump: Remove no longer used saved_max_pfn	2020-04-15 11:21:54 +02:00
cred.c	exec: Teach prepare_exec_creds how exec treats uids & gids	2020-05-20 14:44:21 -05:00
delayacct.c
dma.c
elfcore.c
exec_domain.c
exit.c	pid: move pidfd_get_pid() to pid.c	2020-10-18 09:27:10 -07:00
extable.c	kernel/extable.c: use address-of operator on section symbols	2020-04-07 10:43:42 -07:00
fail_function.c
fork.c	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
freezer.c
futex.c	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
gen_kheaders.sh	kbuild: add variables for compression tools	2020-06-06 23:42:01 +09:00
groups.c	mm: remove the pgprot argument to __vmalloc	2020-06-02 10:59:11 -07:00
hung_task.c	kernel/hung_task.c: introduce sysctl to print all traces when a hung task is detected	2020-06-08 11:05:56 -07:00
iomem.c
irq_work.c	irq_work, smp: Allow irq_work on call_single_queue	2020-05-28 10:54:15 +02:00
jump_label.c	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
kallsyms.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
kcmp.c	kernel/kcmp.c: Use new infrastructure to fix deadlocks in execve	2020-03-25 10:04:01 -05:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kcov.c	kcov: make some symbols static	2020-08-12 10:58:02 -07:00
kexec_core.c	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
kexec_elf.c
kexec_file.c	kernel/resource: move and rename IORESOURCE_MEM_DRIVER_MANAGED	2020-10-16 11:11:18 -07:00
kexec_internal.h
kexec.c	LSM: Introduce kernel_post_load_data() hook	2020-10-05 13:37:03 +02:00
kheaders.c
kmod.c	kmod: remove redundant "be an" in the comment	2020-08-12 10:58:01 -07:00
kprobes.c	Updates for tracing and bootconfig:	2020-10-15 15:51:28 -07:00
ksysfs.c
kthread.c	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
latencytop.c	sysctl: pass kernel pointers to ->proc_handler	2020-04-27 02:07:40 -04:00
Makefile	Kbuild updates for v5.10	2020-10-22 13:13:57 -07:00
module_signature.c
module_signing.c
module-internal.h
module.c	Modules updates for v5.10	2020-10-22 13:08:57 -07:00
notifier.c	notifier: Fix broken error handling pattern	2020-09-01 09:58:03 +02:00
nsproxy.c	nsproxy: support CLONE_NEWTIME with setns()	2020-07-08 11:14:22 +02:00
padata.c	padata: fix possible padata_works_lock deadlock	2020-09-04 17:51:55 +10:00
panic.c	panic: dump registers on panic_on_warn	2020-10-16 11:11:22 -07:00
params.c	moduleparams: Add hexint type parameter	2020-07-28 13:44:53 +02:00
pid_namespace.c	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
pid.c	pid: move pidfd_get_pid() to pid.c	2020-10-18 09:27:10 -07:00
profile.c
ptrace.c
range.c	kernel.h: split out min()/max() et al. helpers	2020-10-16 11:11:19 -07:00
reboot.c	arch: remove unicore32 port	2020-07-01 12:09:13 +03:00
regset.c	regset: kill ->get()	2020-07-27 14:31:12 -04:00
relay.c	kernel/relay.c: drop unneeded initialization	2020-10-16 11:11:22 -07:00
resource.c	kernel/resource: make iomem_resource implicit in release_mem_region_adjustable()	2020-10-16 11:11:18 -07:00
rseq.c
scftorture.c	scftorture: Add cond_resched() to test loop	2020-08-24 18:38:38 -07:00
scs.c	mm: memcontrol: account kernel stack per node	2020-08-07 11:33:25 -07:00
seccomp.c	seccomp: Make duplicate listener detection non-racy	2020-10-08 13:17:47 -07:00
signal.c	treewide: Use fallthrough pseudo-keyword	2020-08-23 17:36:59 -05:00
smp.c	Merge tag 'core-rcu-2020-10-12' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip	2020-10-18 14:34:50 -07:00
smpboot.c
smpboot.h
softirq.c	softirq: Add debug check to __raise_softirq_irqoff()	2020-09-16 15:18:56 +02:00
stackleak.c	stackleak: let stack_erasing_sysctl take a kernel pointer buffer	2020-09-19 13:13:39 -07:00
stacktrace.c	stacktrace: Remove reliable argument from arch_stack_walk() callback	2020-09-18 14:24:16 +01:00
static_call.c	static_call: Fix return type of static_call_init	2020-10-02 21:18:25 +02:00
stop_machine.c
sys_ni.c	mm/madvise: introduce process_madvise() syscall: an external memory hinting API	2020-10-18 09:27:10 -07:00
sys.c	kernel/sys.c: replace do_brk with do_brk_flags in comment of prctl_set_mm_map()	2020-10-16 11:11:19 -07:00
sysctl-test.c
sysctl.c	mm: allow a controlled amount of unfairness in the page lock	2020-09-17 10:26:41 -07:00
task_work.c	task_work: cleanup notification modes	2020-10-17 15:05:30 -06:00
taskstats.c	taskstats: move specifying netlink policy back to ops	2020-10-02 19:11:12 -07:00
test_kprobes.c
torture.c	torture: Dump ftrace at shutdown only if requested	2020-06-29 12:01:45 -07:00
tracepoint.c	tracepoint: Fix out of sync data passing by static caller	2020-10-02 21:18:25 +02:00
tsacct.c
ucount.c	ucount: Make sure ucounts in /proc/sys/user don't regress again	2020-04-07 21:51:27 +02:00
uid16.c
uid16.h
umh.c	usermodehelper: reset umask to default before executing user process	2020-10-06 10:31:52 -07:00
up.c
user_namespace.c	kernel/: fix repeated words in comments	2020-10-16 11:11:19 -07:00
user-return-notifier.c
user.c	user.c: make uidhash_table static	2020-06-04 19:06:24 -07:00
usermode_driver.c	umd: Stop using split_argv	2020-07-07 11:58:59 -05:00
utsname_sysctl.c	sysctl: pass kernel pointers to ->proc_handler	2020-04-27 02:07:40 -04:00
utsname.c	nsproxy: add struct nsset	2020-05-09 13:57:12 +02:00
watch_queue.c	watch_queue: Limit the number of watches a user can hold	2020-08-17 09:39:18 -07:00
watchdog_hld.c
watchdog.c	kernel/watchdog.c: convert {soft/hard}lockup boot parameters to sysctl aliases	2020-06-08 11:05:56 -07:00
workqueue_internal.h
workqueue.c	workqueue: fix a kernel-doc warning	2020-10-16 07:28:20 +02:00