linux/kernel
Frederic Weisbecker 76a33061b9 irq_work: Force raised irq work to run on irq work interrupt
The nohz full kick, which restarts the tick when any resource depend
on it, can't be executed anywhere given the operation it does on timers.
If it is called from the scheduler or timers code, chances are that
we run into a deadlock.

This is why we run the nohz full kick from an irq work. That way we make
sure that the kick runs on a virgin context.

However if that's the case when irq work runs in its own dedicated
self-ipi, things are different for the big bunch of archs that don't
support the self triggered way. In order to support them, irq works are
also handled by the timer interrupt as fallback.

Now when irq works run on the timer interrupt, the context isn't blank.
More precisely, they can run in the context of the hrtimer that runs the
tick. But the nohz kick cancels and restarts this hrtimer and cancelling
an hrtimer from itself isn't allowed. This is why we run in an endless
loop:

	Kernel panic - not syncing: Watchdog detected hard LOCKUP on cpu 2
	CPU: 2 PID: 7538 Comm: kworker/u8:8 Not tainted 3.16.0+ #34
	Workqueue: btrfs-endio-write normal_work_helper [btrfs]
	 ffff880244c06c88 000000001b486fe1 ffff880244c06bf0 ffffffff8a7f1e37
	 ffffffff8ac52a18 ffff880244c06c78 ffffffff8a7ef928 0000000000000010
	 ffff880244c06c88 ffff880244c06c20 000000001b486fe1 0000000000000000
	Call Trace:
	 <NMI[<ffffffff8a7f1e37>] dump_stack+0x4e/0x7a
	 [<ffffffff8a7ef928>] panic+0xd4/0x207
	 [<ffffffff8a1450e8>] watchdog_overflow_callback+0x118/0x120
	 [<ffffffff8a186b0e>] __perf_event_overflow+0xae/0x350
	 [<ffffffff8a184f80>] ? perf_event_task_disable+0xa0/0xa0
	 [<ffffffff8a01a4cf>] ? x86_perf_event_set_period+0xbf/0x150
	 [<ffffffff8a187934>] perf_event_overflow+0x14/0x20
	 [<ffffffff8a020386>] intel_pmu_handle_irq+0x206/0x410
	 [<ffffffff8a01937b>] perf_event_nmi_handler+0x2b/0x50
	 [<ffffffff8a007b72>] nmi_handle+0xd2/0x390
	 [<ffffffff8a007aa5>] ? nmi_handle+0x5/0x390
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 [<ffffffff8a008062>] default_do_nmi+0x72/0x1c0
	 [<ffffffff8a008268>] do_nmi+0xb8/0x100
	 [<ffffffff8a7ff66a>] end_repeat_nmi+0x1e/0x2e
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 [<ffffffff8a0cb7f8>] ? match_held_lock+0x8/0x1b0
	 <<EOE><IRQ[<ffffffff8a0ccd2f>] lock_acquired+0xaf/0x450
	 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
	 [<ffffffff8a7fc678>] _raw_spin_lock_irqsave+0x78/0x90
	 [<ffffffff8a0f74c5>] ? lock_hrtimer_base.isra.20+0x25/0x50
	 [<ffffffff8a0f74c5>] lock_hrtimer_base.isra.20+0x25/0x50
	 [<ffffffff8a0f7723>] hrtimer_try_to_cancel+0x33/0x1e0
	 [<ffffffff8a0f78ea>] hrtimer_cancel+0x1a/0x30
	 [<ffffffff8a109237>] tick_nohz_restart+0x17/0x90
	 [<ffffffff8a10a213>] __tick_nohz_full_check+0xc3/0x100
	 [<ffffffff8a10a25e>] nohz_full_kick_work_func+0xe/0x10
	 [<ffffffff8a17c884>] irq_work_run_list+0x44/0x70
	 [<ffffffff8a17c8da>] irq_work_run+0x2a/0x50
	 [<ffffffff8a0f700b>] update_process_times+0x5b/0x70
	 [<ffffffff8a109005>] tick_sched_handle.isra.21+0x25/0x60
	 [<ffffffff8a109b81>] tick_sched_timer+0x41/0x60
	 [<ffffffff8a0f7aa2>] __run_hrtimer+0x72/0x470
	 [<ffffffff8a109b40>] ? tick_sched_do_timer+0xb0/0xb0
	 [<ffffffff8a0f8707>] hrtimer_interrupt+0x117/0x270
	 [<ffffffff8a034357>] local_apic_timer_interrupt+0x37/0x60
	 [<ffffffff8a80010f>] smp_apic_timer_interrupt+0x3f/0x50
	 [<ffffffff8a7fe52f>] apic_timer_interrupt+0x6f/0x80

To fix this we force non-lazy irq works to run on irq work self-IPIs
when available. That ability of the arch to trigger irq work self IPIs
is available with arch_irq_work_has_interrupt().

Reported-by: Catalin Iacob <iacobcatalin@gmail.com>
Reported-by: Dave Jones <davej@redhat.com>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
2014-09-13 18:38:15 +02:00
..
bpf net: filter: split 'struct sk_filter' into socket and bpf parts 2014-08-02 15:03:58 -07:00
debug kdb: Use ktime_get_ts() 2014-06-12 16:18:45 +02:00
events Merge branch 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-08-24 16:16:55 -07:00
gcov kernel/gcov/fs.c: remove unnecessary null test before debugfs_remove 2014-08-08 15:57:24 -07:00
irq irq: Export handle_fasteoi_irq 2014-08-25 21:13:30 +02:00
locking arch, locking: Ciao arch_mutex_cpu_relax() 2014-07-17 12:32:47 +02:00
power PM / sleep: Fix test_suspend= command line option 2014-09-03 01:21:03 +02:00
printk printk: Add function to return log buffer address and size 2014-08-13 15:13:44 +10:00
rcu rcu: Make nocb leader kthreads process pending callbacks after spawning 2014-08-28 05:59:59 -07:00
sched Merge branches 'pm-sleep', 'pm-cpufreq' and 'pm-cpuidle' 2014-08-11 23:19:48 +02:00
time irq_work: Force raised irq work to run on irq work interrupt 2014-09-13 18:38:15 +02:00
trace trace: Fix epoll hang when we race with new entries 2014-08-25 20:18:11 -04:00
.gitignore
acct.c kernel/acct.c: fix coding style warnings and errors 2014-08-08 15:57:27 -07:00
async.c
audit_tree.c
audit_watch.c
audit.c CAPABILITIES: remove undefined caps from all processes 2014-07-24 21:53:47 +10:00
audit.h audit: Use struct net not pid_t to remember the network namespce to reply in 2014-03-20 10:10:53 -04:00
auditfilter.c kernel/auditfilter.c: replace count*size kmalloc by kcalloc 2014-08-06 18:01:12 -07:00
auditsc.c auditsc: audit_krule mask accesses need bounds checking 2014-06-10 08:44:40 -07:00
backtracetest.c kernel/backtracetest.c: replace no level printk by pr_info() 2014-06-04 16:54:14 -07:00
bounds.c page-cgroup: get rid of NR_PCG_FLAGS 2014-08-08 15:57:18 -07:00
capability.c CAPABILITIES: remove undefined caps from all processes 2014-07-24 21:53:47 +10:00
cgroup_freezer.c cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes 2014-07-15 11:05:09 -04:00
cgroup.c Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-08-04 10:11:28 -07:00
compat.c compat: nanosleep: Clarify error handling 2014-09-06 12:58:18 +02:00
configs.c
context_tracking.c x86/kprobes: Fix build errors and blacklist context_track_user 2014-06-14 09:07:44 +02:00
cpu_pm.c
cpu.c sched: Rework check_for_tasks() 2014-07-05 11:17:45 +02:00
cpuset.c Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup 2014-08-04 10:11:28 -07:00
crash_dump.c
cred.c
delayacct.c delayacct: Remove braindamaged type conversions 2014-07-23 10:18:06 -07:00
dma.c
elfcore.c
exec_domain.c kernel/exec_domain.c: code clean-up 2014-06-04 16:54:15 -07:00
exit.c kernel/exit.c: fix coding style warnings and errors 2014-08-08 15:57:22 -07:00
extable.c
fork.c seccomp: Replace BUG(!spin_is_locked()) with assert_spin_lock 2014-08-11 13:29:12 -07:00
freezer.c
futex_compat.c
futex.c futex: Simplify futex_lock_pi_atomic() and make it more robust 2014-06-21 22:26:24 +02:00
groups.c kernel/groups.c: remove return value of set_groups 2014-04-03 16:21:05 -07:00
hung_task.c kernel/hung_task.c: convert simple_strtoul to kstrtouint 2014-06-04 16:54:15 -07:00
irq_work.c irq_work: Force raised irq work to run on irq work interrupt 2014-09-13 18:38:15 +02:00
jump_label.c
kallsyms.c kernel/kallsyms.c: fix %pB when there's no symbol at the address 2014-08-08 15:57:18 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking/rwsem: Add CONFIG_RWSEM_SPIN_ON_OWNER 2014-07-16 14:57:13 +02:00
Kconfig.preempt
kexec.c kexec: create a new config option CONFIG_KEXEC_FILE for new syscall 2014-08-29 16:28:16 -07:00
kmod.c signals: change wait_for_helper() to use kernel_sigaction() 2014-06-06 16:08:12 -07:00
kprobes.c kprobes: Skip kretprobe hit in NMI context to avoid deadlock 2014-08-08 10:38:04 +02:00
ksysfs.c kobject: Make support for uevent_helper optional. 2014-04-25 12:00:49 -07:00
kthread.c kthread_work: wake up worker only when the worker is idle 2014-07-28 14:07:52 -04:00
latencytop.c kernel/latencytop.c: convert seq_printf to seq_puts 2014-06-04 16:54:15 -07:00
Makefile bin2c: move bin2c in scripts/basic 2014-08-08 15:57:32 -07:00
module_signing.c
module-internal.h
module.c module: Clean up ro/nx after early module load failures 2014-08-16 04:47:00 +09:30
notifier.c kprobes, notifier: Use NOKPROBE_SYMBOL macro in notifier 2014-04-24 10:26:39 +02:00
nsproxy.c namespaces: Use task_lock and not rcu to protect nsproxy 2014-07-29 18:08:50 -07:00
padata.c
panic.c panic: add TAINT_SOFTLOCKUP 2014-08-08 15:57:24 -07:00
params.c Add module param type 'ullong' 2014-07-17 22:07:37 +02:00
pid_namespace.c pid_namespace: pidns_get() should check task_active_pid_ns() != NULL 2014-04-02 16:20:21 -07:00
pid.c
profile.c kernel/profile.c: use static const char instead of static char 2014-06-06 16:08:13 -07:00
ptrace.c sched: Remove proliferation of wait_on_bit() action functions 2014-07-16 15:10:39 +02:00
range.c
reboot.c kernel/reboot.c: convert simple_strtoul to kstrtoint 2014-06-04 16:54:15 -07:00
relay.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
res_counter.c kernel/res_counter.c: replace simple_strtoull by kstrtoull 2014-06-04 16:54:15 -07:00
resource.c resource: fix the case of null pointer access 2014-08-29 16:28:15 -07:00
seccomp.c seccomp: Replace BUG(!spin_is_locked()) with assert_spin_lock 2014-08-11 13:29:12 -07:00
signal.c Merge branch 'signal-cleanup' of git://git.kernel.org/pub/scm/linux/kernel/git/rw/misc 2014-08-09 09:58:12 -07:00
smp.c kernel/smp.c:on_each_cpu_cond(): fix warning in fallback path 2014-08-06 18:01:22 -07:00
smpboot.c
smpboot.h
softirq.c Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu 2014-05-22 11:36:10 +02:00
stacktrace.c
stop_machine.c kernel/stop_machine.c: kernel-doc warning fix 2014-06-04 16:54:15 -07:00
sys_ni.c kexec: new syscall kexec_file_load() declaration 2014-08-08 15:57:32 -07:00
sys.c sched: move no_new_privs into new atomic flags 2014-07-18 12:13:38 -07:00
sysctl_binary.c ipv6: Allow accepting RA from local IP addresses. 2014-07-01 12:16:24 -07:00
sysctl.c mm, hugetlb: remove hugetlb_zero and hugetlb_infinity 2014-08-06 18:01:19 -07:00
system_certificates.S
system_keyring.c KEYS: validate certificate trust only with builtin keys 2014-07-17 09:35:17 -04:00
task_work.c
taskstats.c
test_kprobes.c kernel/test_kprobes.c: use current logging functions 2014-08-08 15:57:18 -07:00
torture.c torture: Avoid format string leak to thead name 2014-07-07 10:12:56 -07:00
tracepoint.c tracing: syscall_regfunc() should not skip kernel threads 2014-06-21 00:15:26 -04:00
tsacct.c sched: Make task->start_time nanoseconds based 2014-07-23 10:18:05 -07:00
uid16.c
up.c smp: Rename __smp_call_function_single() to smp_call_function_single_async() 2014-02-24 14:47:15 -08:00
user_namespace.c proc: constify seq_operations 2014-08-08 15:57:22 -07:00
user-return-notifier.c
user.c kernel/user.c: drop unused field 'files' from user_struct 2014-06-04 16:54:16 -07:00
utsname_sysctl.c sysctl: convert use of typedef ctl_table to struct ctl_table 2014-06-06 16:08:16 -07:00
utsname.c namespaces: Use task_lock and not rcu to protect nsproxy 2014-07-29 18:08:50 -07:00
watchdog.c panic: add TAINT_SOFTLOCKUP 2014-08-08 15:57:24 -07:00
workqueue_internal.h workqueue: rename manager_mutex to attach_mutex 2014-05-20 10:59:32 -04:00
workqueue.c Merge branch 'for-3.17' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu 2014-08-04 10:09:27 -07:00