linux/kernel/softlockup.c

207 lines
5.0 KiB
C
Raw Normal View History

/*
* Detect Soft Lockups
*
* started by Ingo Molnar, Copyright (C) 2005, 2006 Red Hat, Inc.
*
* this code detects soft lockups: incidents in where on a CPU
* the kernel does not reschedule for 10 seconds or more.
*/
#include <linux/mm.h>
#include <linux/cpu.h>
#include <linux/init.h>
#include <linux/delay.h>
#include <linux/freezer.h>
#include <linux/kthread.h>
#include <linux/notifier.h>
#include <linux/module.h>
softlockup: improve debug output Improve the debuggability of kernel lockups by enhancing the debug output of the softlockup detector: print the task that causes the lockup and try to print a more intelligent backtrace. The old format was: BUG: soft lockup detected on CPU#1! [<c0105e4a>] show_trace_log_lvl+0x19/0x2e [<c0105f43>] show_trace+0x12/0x14 [<c0105f59>] dump_stack+0x14/0x16 [<c015f6bc>] softlockup_tick+0xbe/0xd0 [<c013457d>] run_local_timers+0x12/0x14 [<c01346b8>] update_process_times+0x3e/0x63 [<c0145fb8>] tick_sched_timer+0x7c/0xc0 [<c0140a75>] hrtimer_interrupt+0x135/0x1ba [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80 [<c0105aa3>] apic_timer_interrupt+0x33/0x38 [<c0104f8a>] syscall_call+0x7/0xb ======================= The new format is: BUG: soft lockup detected on CPU#1! [prctl:2363] Pid: 2363, comm: prctl EIP: 0060:[<c013915f>] CPU: 1 EIP is at sys_prctl+0x24/0x18c EFLAGS: 00000213 Not tainted (2.6.22-cfs-v20 #26) EAX: 00000001 EBX: 000003e7 ECX: 00000001 EDX: f6df0000 ESI: 000003e7 EDI: 000003e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8 CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 000006d0 [<c0105e4a>] show_trace_log_lvl+0x19/0x2e [<c0105f43>] show_trace+0x12/0x14 [<c01040be>] show_regs+0x1ab/0x1b3 [<c015f807>] softlockup_tick+0xef/0x108 [<c013457d>] run_local_timers+0x12/0x14 [<c01346b8>] update_process_times+0x3e/0x63 [<c0145fcc>] tick_sched_timer+0x7c/0xc0 [<c0140a89>] hrtimer_interrupt+0x135/0x1ba [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80 [<c0105aa3>] apic_timer_interrupt+0x33/0x38 [<c0104f8a>] syscall_call+0x7/0xb ======================= Note that in the old format we only knew that some system call locked up, we didnt know _which_. With the new format we know that it's at a specific place in sys_prctl(). [which was where i created an artificial kernel lockup to test the new format.] This is also useful if the lockup happens in user-space - the user-space EIP (and other registers) will be printed too. (such a lockup would either suggest that the task was running at SCHED_FIFO:99 and looping for more than 10 seconds, or that the softlockup detector has a false-positive.) The task name is printed too first, just in case we dont manage to print a useful backtrace. [satyam@infradead.org: fix warning] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Satyam Sharma <satyam@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 14:26:08 +08:00
#include <asm/irq_regs.h>
static DEFINE_SPINLOCK(print_lock);
static DEFINE_PER_CPU(unsigned long, touch_timestamp);
static DEFINE_PER_CPU(unsigned long, print_timestamp);
static DEFINE_PER_CPU(struct task_struct *, watchdog_task);
static int did_panic = 0;
static int
softlock_panic(struct notifier_block *this, unsigned long event, void *ptr)
{
did_panic = 1;
return NOTIFY_DONE;
}
static struct notifier_block panic_block = {
.notifier_call = softlock_panic,
};
Ignore stolen time in the softlockup watchdog The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious. Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged. Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 15:28:02 +08:00
/*
* Returns seconds, approximately. We don't need nanosecond
* resolution, and we don't need to waste time with a big divide when
* 2^30ns == 1.074s.
*/
static unsigned long get_timestamp(int this_cpu)
Ignore stolen time in the softlockup watchdog The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious. Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged. Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 15:28:02 +08:00
{
return cpu_clock(this_cpu) >> 30; /* 2^30 ~= 10^9 */
Ignore stolen time in the softlockup watchdog The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious. Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged. Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 15:28:02 +08:00
}
void touch_softlockup_watchdog(void)
{
int this_cpu = raw_smp_processor_id();
__raw_get_cpu_var(touch_timestamp) = get_timestamp(this_cpu);
}
EXPORT_SYMBOL(touch_softlockup_watchdog);
void touch_all_softlockup_watchdogs(void)
{
int cpu;
/* Cause each CPU to re-update its timestamp rather than complain */
for_each_online_cpu(cpu)
per_cpu(touch_timestamp, cpu) = 0;
}
EXPORT_SYMBOL(touch_all_softlockup_watchdogs);
/*
* This callback runs from the timer interrupt, and checks
* whether the watchdog thread has hung or not:
*/
void softlockup_tick(void)
{
int this_cpu = smp_processor_id();
unsigned long touch_timestamp = per_cpu(touch_timestamp, this_cpu);
Ignore stolen time in the softlockup watchdog The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious. Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged. Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 15:28:02 +08:00
unsigned long print_timestamp;
softlockup: improve debug output Improve the debuggability of kernel lockups by enhancing the debug output of the softlockup detector: print the task that causes the lockup and try to print a more intelligent backtrace. The old format was: BUG: soft lockup detected on CPU#1! [<c0105e4a>] show_trace_log_lvl+0x19/0x2e [<c0105f43>] show_trace+0x12/0x14 [<c0105f59>] dump_stack+0x14/0x16 [<c015f6bc>] softlockup_tick+0xbe/0xd0 [<c013457d>] run_local_timers+0x12/0x14 [<c01346b8>] update_process_times+0x3e/0x63 [<c0145fb8>] tick_sched_timer+0x7c/0xc0 [<c0140a75>] hrtimer_interrupt+0x135/0x1ba [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80 [<c0105aa3>] apic_timer_interrupt+0x33/0x38 [<c0104f8a>] syscall_call+0x7/0xb ======================= The new format is: BUG: soft lockup detected on CPU#1! [prctl:2363] Pid: 2363, comm: prctl EIP: 0060:[<c013915f>] CPU: 1 EIP is at sys_prctl+0x24/0x18c EFLAGS: 00000213 Not tainted (2.6.22-cfs-v20 #26) EAX: 00000001 EBX: 000003e7 ECX: 00000001 EDX: f6df0000 ESI: 000003e7 EDI: 000003e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8 CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 000006d0 [<c0105e4a>] show_trace_log_lvl+0x19/0x2e [<c0105f43>] show_trace+0x12/0x14 [<c01040be>] show_regs+0x1ab/0x1b3 [<c015f807>] softlockup_tick+0xef/0x108 [<c013457d>] run_local_timers+0x12/0x14 [<c01346b8>] update_process_times+0x3e/0x63 [<c0145fcc>] tick_sched_timer+0x7c/0xc0 [<c0140a89>] hrtimer_interrupt+0x135/0x1ba [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80 [<c0105aa3>] apic_timer_interrupt+0x33/0x38 [<c0104f8a>] syscall_call+0x7/0xb ======================= Note that in the old format we only knew that some system call locked up, we didnt know _which_. With the new format we know that it's at a specific place in sys_prctl(). [which was where i created an artificial kernel lockup to test the new format.] This is also useful if the lockup happens in user-space - the user-space EIP (and other registers) will be printed too. (such a lockup would either suggest that the task was running at SCHED_FIFO:99 and looping for more than 10 seconds, or that the softlockup detector has a false-positive.) The task name is printed too first, just in case we dont manage to print a useful backtrace. [satyam@infradead.org: fix warning] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Satyam Sharma <satyam@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 14:26:08 +08:00
struct pt_regs *regs = get_irq_regs();
Ignore stolen time in the softlockup watchdog The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious. Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged. Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 15:28:02 +08:00
unsigned long now;
if (touch_timestamp == 0) {
touch_softlockup_watchdog();
Ignore stolen time in the softlockup watchdog The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious. Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged. Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 15:28:02 +08:00
return;
}
Ignore stolen time in the softlockup watchdog The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious. Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged. Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 15:28:02 +08:00
print_timestamp = per_cpu(print_timestamp, this_cpu);
/* report at most once a second */
if ((print_timestamp >= touch_timestamp &&
print_timestamp < (touch_timestamp + 1)) ||
did_panic || !per_cpu(watchdog_task, this_cpu)) {
return;
}
/* do not print during early bootup: */
if (unlikely(system_state != SYSTEM_RUNNING)) {
touch_softlockup_watchdog();
return;
}
now = get_timestamp(this_cpu);
Ignore stolen time in the softlockup watchdog The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious. Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged. Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 15:28:02 +08:00
/* Wake up the high-prio watchdog task every second: */
Ignore stolen time in the softlockup watchdog The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious. Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged. Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 15:28:02 +08:00
if (now > (touch_timestamp + 1))
wake_up_process(per_cpu(watchdog_task, this_cpu));
/* Warn about unreasonable 10+ seconds delays: */
softlockup: improve debug output Improve the debuggability of kernel lockups by enhancing the debug output of the softlockup detector: print the task that causes the lockup and try to print a more intelligent backtrace. The old format was: BUG: soft lockup detected on CPU#1! [<c0105e4a>] show_trace_log_lvl+0x19/0x2e [<c0105f43>] show_trace+0x12/0x14 [<c0105f59>] dump_stack+0x14/0x16 [<c015f6bc>] softlockup_tick+0xbe/0xd0 [<c013457d>] run_local_timers+0x12/0x14 [<c01346b8>] update_process_times+0x3e/0x63 [<c0145fb8>] tick_sched_timer+0x7c/0xc0 [<c0140a75>] hrtimer_interrupt+0x135/0x1ba [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80 [<c0105aa3>] apic_timer_interrupt+0x33/0x38 [<c0104f8a>] syscall_call+0x7/0xb ======================= The new format is: BUG: soft lockup detected on CPU#1! [prctl:2363] Pid: 2363, comm: prctl EIP: 0060:[<c013915f>] CPU: 1 EIP is at sys_prctl+0x24/0x18c EFLAGS: 00000213 Not tainted (2.6.22-cfs-v20 #26) EAX: 00000001 EBX: 000003e7 ECX: 00000001 EDX: f6df0000 ESI: 000003e7 EDI: 000003e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8 CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 000006d0 [<c0105e4a>] show_trace_log_lvl+0x19/0x2e [<c0105f43>] show_trace+0x12/0x14 [<c01040be>] show_regs+0x1ab/0x1b3 [<c015f807>] softlockup_tick+0xef/0x108 [<c013457d>] run_local_timers+0x12/0x14 [<c01346b8>] update_process_times+0x3e/0x63 [<c0145fcc>] tick_sched_timer+0x7c/0xc0 [<c0140a89>] hrtimer_interrupt+0x135/0x1ba [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80 [<c0105aa3>] apic_timer_interrupt+0x33/0x38 [<c0104f8a>] syscall_call+0x7/0xb ======================= Note that in the old format we only knew that some system call locked up, we didnt know _which_. With the new format we know that it's at a specific place in sys_prctl(). [which was where i created an artificial kernel lockup to test the new format.] This is also useful if the lockup happens in user-space - the user-space EIP (and other registers) will be printed too. (such a lockup would either suggest that the task was running at SCHED_FIFO:99 and looping for more than 10 seconds, or that the softlockup detector has a false-positive.) The task name is printed too first, just in case we dont manage to print a useful backtrace. [satyam@infradead.org: fix warning] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Satyam Sharma <satyam@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 14:26:08 +08:00
if (now <= (touch_timestamp + 10))
return;
softlockup: improve debug output Improve the debuggability of kernel lockups by enhancing the debug output of the softlockup detector: print the task that causes the lockup and try to print a more intelligent backtrace. The old format was: BUG: soft lockup detected on CPU#1! [<c0105e4a>] show_trace_log_lvl+0x19/0x2e [<c0105f43>] show_trace+0x12/0x14 [<c0105f59>] dump_stack+0x14/0x16 [<c015f6bc>] softlockup_tick+0xbe/0xd0 [<c013457d>] run_local_timers+0x12/0x14 [<c01346b8>] update_process_times+0x3e/0x63 [<c0145fb8>] tick_sched_timer+0x7c/0xc0 [<c0140a75>] hrtimer_interrupt+0x135/0x1ba [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80 [<c0105aa3>] apic_timer_interrupt+0x33/0x38 [<c0104f8a>] syscall_call+0x7/0xb ======================= The new format is: BUG: soft lockup detected on CPU#1! [prctl:2363] Pid: 2363, comm: prctl EIP: 0060:[<c013915f>] CPU: 1 EIP is at sys_prctl+0x24/0x18c EFLAGS: 00000213 Not tainted (2.6.22-cfs-v20 #26) EAX: 00000001 EBX: 000003e7 ECX: 00000001 EDX: f6df0000 ESI: 000003e7 EDI: 000003e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8 CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 000006d0 [<c0105e4a>] show_trace_log_lvl+0x19/0x2e [<c0105f43>] show_trace+0x12/0x14 [<c01040be>] show_regs+0x1ab/0x1b3 [<c015f807>] softlockup_tick+0xef/0x108 [<c013457d>] run_local_timers+0x12/0x14 [<c01346b8>] update_process_times+0x3e/0x63 [<c0145fcc>] tick_sched_timer+0x7c/0xc0 [<c0140a89>] hrtimer_interrupt+0x135/0x1ba [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80 [<c0105aa3>] apic_timer_interrupt+0x33/0x38 [<c0104f8a>] syscall_call+0x7/0xb ======================= Note that in the old format we only knew that some system call locked up, we didnt know _which_. With the new format we know that it's at a specific place in sys_prctl(). [which was where i created an artificial kernel lockup to test the new format.] This is also useful if the lockup happens in user-space - the user-space EIP (and other registers) will be printed too. (such a lockup would either suggest that the task was running at SCHED_FIFO:99 and looping for more than 10 seconds, or that the softlockup detector has a false-positive.) The task name is printed too first, just in case we dont manage to print a useful backtrace. [satyam@infradead.org: fix warning] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Satyam Sharma <satyam@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 14:26:08 +08:00
per_cpu(print_timestamp, this_cpu) = touch_timestamp;
spin_lock(&print_lock);
printk(KERN_ERR "BUG: soft lockup detected on CPU#%d!\n", this_cpu);
if (regs)
show_regs(regs);
else
dump_stack();
softlockup: improve debug output Improve the debuggability of kernel lockups by enhancing the debug output of the softlockup detector: print the task that causes the lockup and try to print a more intelligent backtrace. The old format was: BUG: soft lockup detected on CPU#1! [<c0105e4a>] show_trace_log_lvl+0x19/0x2e [<c0105f43>] show_trace+0x12/0x14 [<c0105f59>] dump_stack+0x14/0x16 [<c015f6bc>] softlockup_tick+0xbe/0xd0 [<c013457d>] run_local_timers+0x12/0x14 [<c01346b8>] update_process_times+0x3e/0x63 [<c0145fb8>] tick_sched_timer+0x7c/0xc0 [<c0140a75>] hrtimer_interrupt+0x135/0x1ba [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80 [<c0105aa3>] apic_timer_interrupt+0x33/0x38 [<c0104f8a>] syscall_call+0x7/0xb ======================= The new format is: BUG: soft lockup detected on CPU#1! [prctl:2363] Pid: 2363, comm: prctl EIP: 0060:[<c013915f>] CPU: 1 EIP is at sys_prctl+0x24/0x18c EFLAGS: 00000213 Not tainted (2.6.22-cfs-v20 #26) EAX: 00000001 EBX: 000003e7 ECX: 00000001 EDX: f6df0000 ESI: 000003e7 EDI: 000003e7 EBP: f6df0fb0 DS: 007b ES: 007b FS: 00d8 CR0: 8005003b CR2: 4d8c3340 CR3: 3731d000 CR4: 000006d0 [<c0105e4a>] show_trace_log_lvl+0x19/0x2e [<c0105f43>] show_trace+0x12/0x14 [<c01040be>] show_regs+0x1ab/0x1b3 [<c015f807>] softlockup_tick+0xef/0x108 [<c013457d>] run_local_timers+0x12/0x14 [<c01346b8>] update_process_times+0x3e/0x63 [<c0145fcc>] tick_sched_timer+0x7c/0xc0 [<c0140a89>] hrtimer_interrupt+0x135/0x1ba [<c011bde7>] smp_apic_timer_interrupt+0x6e/0x80 [<c0105aa3>] apic_timer_interrupt+0x33/0x38 [<c0104f8a>] syscall_call+0x7/0xb ======================= Note that in the old format we only knew that some system call locked up, we didnt know _which_. With the new format we know that it's at a specific place in sys_prctl(). [which was where i created an artificial kernel lockup to test the new format.] This is also useful if the lockup happens in user-space - the user-space EIP (and other registers) will be printed too. (such a lockup would either suggest that the task was running at SCHED_FIFO:99 and looping for more than 10 seconds, or that the softlockup detector has a false-positive.) The task name is printed too first, just in case we dont manage to print a useful backtrace. [satyam@infradead.org: fix warning] Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Satyam Sharma <satyam@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-17 14:26:08 +08:00
spin_unlock(&print_lock);
}
/*
* The watchdog thread - runs every second and touches the timestamp.
*/
static int watchdog(void * __bind_cpu)
{
struct sched_param param = { .sched_priority = MAX_RT_PRIO-1 };
sched_setscheduler(current, SCHED_FIFO, &param);
Ignore stolen time in the softlockup watchdog The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious. Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged. Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 15:28:02 +08:00
/* initialize timestamp */
touch_softlockup_watchdog();
/*
* Run briefly once per second to reset the softlockup timestamp.
* If this gets delayed for more than 10 seconds then the
* debug-printout triggers in softlockup_tick().
*/
while (!kthread_should_stop()) {
set_current_state(TASK_INTERRUPTIBLE);
touch_softlockup_watchdog();
schedule();
}
return 0;
}
/*
* Create/destroy watchdog threads as CPUs come and go:
*/
static int __cpuinit
cpu_callback(struct notifier_block *nfb, unsigned long action, void *hcpu)
{
int hotcpu = (unsigned long)hcpu;
struct task_struct *p;
switch (action) {
case CPU_UP_PREPARE:
case CPU_UP_PREPARE_FROZEN:
BUG_ON(per_cpu(watchdog_task, hotcpu));
p = kthread_create(watchdog, hcpu, "watchdog/%d", hotcpu);
if (IS_ERR(p)) {
printk("watchdog for %i failed\n", hotcpu);
return NOTIFY_BAD;
}
Ignore stolen time in the softlockup watchdog The softlockup watchdog is currently a nuisance in a virtual machine, since the whole system could have the CPU stolen from it for a long period of time. While it would be unlikely for a guest domain to be denied timer interrupts for over 10s, it could happen and any softlockup message would be completely spurious. Earlier I proposed that sched_clock() return time in unstolen nanoseconds, which is how Xen and VMI currently implement it. If the softlockup watchdog uses sched_clock() to measure time, it would automatically ignore stolen time, and therefore only report when the guest itself locked up. When running native, sched_clock() returns real-time nanoseconds, so the behaviour would be unchanged. Note that sched_clock() used this way is inherently per-cpu, so this patch makes sure that the per-processor watchdog thread initialized its own timestamp. Signed-off-by: Jeremy Fitzhardinge <jeremy@xensource.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: john stultz <johnstul@us.ibm.com> Cc: Zachary Amsden <zach@vmware.com> Cc: James Morris <jmorris@namei.org> Cc: Dan Hecht <dhecht@vmware.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Prarit Bhargava <prarit@redhat.com> Cc: Chris Lalancette <clalance@redhat.com> Cc: Rick Lindsley <ricklind@us.ibm.com> Cc: Eric Dumazet <dada1@cosmosbay.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-05-08 15:28:02 +08:00
per_cpu(touch_timestamp, hotcpu) = 0;
per_cpu(watchdog_task, hotcpu) = p;
kthread_bind(p, hotcpu);
break;
case CPU_ONLINE:
case CPU_ONLINE_FROZEN:
wake_up_process(per_cpu(watchdog_task, hotcpu));
break;
#ifdef CONFIG_HOTPLUG_CPU
case CPU_UP_CANCELED:
case CPU_UP_CANCELED_FROZEN:
if (!per_cpu(watchdog_task, hotcpu))
break;
/* Unbind so it can run. Fall thru. */
kthread_bind(per_cpu(watchdog_task, hotcpu),
any_online_cpu(cpu_online_map));
case CPU_DEAD:
case CPU_DEAD_FROZEN:
p = per_cpu(watchdog_task, hotcpu);
per_cpu(watchdog_task, hotcpu) = NULL;
kthread_stop(p);
break;
#endif /* CONFIG_HOTPLUG_CPU */
}
return NOTIFY_OK;
}
static struct notifier_block __cpuinitdata cpu_nfb = {
.notifier_call = cpu_callback
};
__init void spawn_softlockup_task(void)
{
void *cpu = (void *)(long)smp_processor_id();
int err = cpu_callback(&cpu_nfb, CPU_UP_PREPARE, cpu);
BUG_ON(err == NOTIFY_BAD);
cpu_callback(&cpu_nfb, CPU_ONLINE, cpu);
register_cpu_notifier(&cpu_nfb);
[PATCH] Notifier chain update: API changes The kernel's implementation of notifier chains is unsafe. There is no protection against entries being added to or removed from a chain while the chain is in use. The issues were discussed in this thread: http://marc.theaimsgroup.com/?l=linux-kernel&m=113018709002036&w=2 We noticed that notifier chains in the kernel fall into two basic usage classes: "Blocking" chains are always called from a process context and the callout routines are allowed to sleep; "Atomic" chains can be called from an atomic context and the callout routines are not allowed to sleep. We decided to codify this distinction and make it part of the API. Therefore this set of patches introduces three new, parallel APIs: one for blocking notifiers, one for atomic notifiers, and one for "raw" notifiers (which is really just the old API under a new name). New kinds of data structures are used for the heads of the chains, and new routines are defined for registration, unregistration, and calling a chain. The three APIs are explained in include/linux/notifier.h and their implementation is in kernel/sys.c. With atomic and blocking chains, the implementation guarantees that the chain links will not be corrupted and that chain callers will not get messed up by entries being added or removed. For raw chains the implementation provides no guarantees at all; users of this API must provide their own protections. (The idea was that situations may come up where the assumptions of the atomic and blocking APIs are not appropriate, so it should be possible for users to handle these things in their own way.) There are some limitations, which should not be too hard to live with. For atomic/blocking chains, registration and unregistration must always be done in a process context since the chain is protected by a mutex/rwsem. Also, a callout routine for a non-raw chain must not try to register or unregister entries on its own chain. (This did happen in a couple of places and the code had to be changed to avoid it.) Since atomic chains may be called from within an NMI handler, they cannot use spinlocks for synchronization. Instead we use RCU. The overhead falls almost entirely in the unregister routine, which is okay since unregistration is much less frequent that calling a chain. Here is the list of chains that we adjusted and their classifications. None of them use the raw API, so for the moment it is only a placeholder. ATOMIC CHAINS ------------- arch/i386/kernel/traps.c: i386die_chain arch/ia64/kernel/traps.c: ia64die_chain arch/powerpc/kernel/traps.c: powerpc_die_chain arch/sparc64/kernel/traps.c: sparc64die_chain arch/x86_64/kernel/traps.c: die_chain drivers/char/ipmi/ipmi_si_intf.c: xaction_notifier_list kernel/panic.c: panic_notifier_list kernel/profile.c: task_free_notifier net/bluetooth/hci_core.c: hci_notifier net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_chain net/ipv4/netfilter/ip_conntrack_core.c: ip_conntrack_expect_chain net/ipv6/addrconf.c: inet6addr_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_chain net/netfilter/nf_conntrack_core.c: nf_conntrack_expect_chain net/netlink/af_netlink.c: netlink_chain BLOCKING CHAINS --------------- arch/powerpc/platforms/pseries/reconfig.c: pSeries_reconfig_chain arch/s390/kernel/process.c: idle_chain arch/x86_64/kernel/process.c idle_notifier drivers/base/memory.c: memory_chain drivers/cpufreq/cpufreq.c cpufreq_policy_notifier_list drivers/cpufreq/cpufreq.c cpufreq_transition_notifier_list drivers/macintosh/adb.c: adb_client_list drivers/macintosh/via-pmu.c sleep_notifier_list drivers/macintosh/via-pmu68k.c sleep_notifier_list drivers/macintosh/windfarm_core.c wf_client_list drivers/usb/core/notify.c usb_notifier_list drivers/video/fbmem.c fb_notifier_list kernel/cpu.c cpu_chain kernel/module.c module_notify_list kernel/profile.c munmap_notifier kernel/profile.c task_exit_notifier kernel/sys.c reboot_notifier_list net/core/dev.c netdev_chain net/decnet/dn_dev.c: dnaddr_chain net/ipv4/devinet.c: inetaddr_chain It's possible that some of these classifications are wrong. If they are, please let us know or submit a patch to fix them. Note that any chain that gets called very frequently should be atomic, because the rwsem read-locking used for blocking chains is very likely to incur cache misses on SMP systems. (However, if the chain's callout routines may sleep then the chain cannot be atomic.) The patch set was written by Alan Stern and Chandra Seetharaman, incorporating material written by Keith Owens and suggestions from Paul McKenney and Andrew Morton. [jes@sgi.com: restructure the notifier chain initialization macros] Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Chandra Seetharaman <sekharan@us.ibm.com> Signed-off-by: Jes Sorensen <jes@sgi.com> Signed-off-by: Andrew Morton <akpm@osdl.org> Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-03-27 17:16:30 +08:00
atomic_notifier_chain_register(&panic_notifier_list, &panic_block);
}