2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-29 07:34:06 +08:00
linux-next/arch/x86/kernel
Prarit Bhargava da6139e49c x86: Add check for number of available vectors before CPU down
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=64791

When a cpu is downed on a system, the irqs on the cpu are assigned to
other cpus.  It is possible, however, that when a cpu is downed there
aren't enough free vectors on the remaining cpus to account for the
vectors from the cpu that is being downed.

This results in an interesting "overflow" condition where irqs are
"assigned" to a CPU but are not handled.

For example, when downing cpus on a 1-64 logical processor system:

<snip>
[  232.021745] smpboot: CPU 61 is now offline
[  238.480275] smpboot: CPU 62 is now offline
[  245.991080] ------------[ cut here ]------------
[  245.996270] WARNING: CPU: 0 PID: 0 at net/sched/sch_generic.c:264 dev_watchdog+0x246/0x250()
[  246.005688] NETDEV WATCHDOG: p786p1 (ixgbe): transmit queue 0 timed out
[  246.013070] Modules linked in: lockd sunrpc iTCO_wdt iTCO_vendor_support sb_edac ixgbe microcode e1000e pcspkr joydev edac_core lpc_ich ioatdma ptp mdio mfd_core i2c_i801 dca pps_core i2c_core wmi acpi_cpufreq isci libsas scsi_transport_sas
[  246.037633] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.12.0+ #14
[  246.044451] Hardware name: Intel Corporation S4600LH ........../SVRBD-ROW_T, BIOS SE5C600.86B.01.08.0003.022620131521 02/26/2013
[  246.057371]  0000000000000009 ffff88081fa03d40 ffffffff8164fbf6 ffff88081fa0ee48
[  246.065728]  ffff88081fa03d90 ffff88081fa03d80 ffffffff81054ecc ffff88081fa13040
[  246.074073]  0000000000000000 ffff88200cce0000 0000000000000040 0000000000000000
[  246.082430] Call Trace:
[  246.085174]  <IRQ>  [<ffffffff8164fbf6>] dump_stack+0x46/0x58
[  246.091633]  [<ffffffff81054ecc>] warn_slowpath_common+0x8c/0xc0
[  246.098352]  [<ffffffff81054fb6>] warn_slowpath_fmt+0x46/0x50
[  246.104786]  [<ffffffff815710d6>] dev_watchdog+0x246/0x250
[  246.110923]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
[  246.119097]  [<ffffffff8106092a>] call_timer_fn+0x3a/0x110
[  246.125224]  [<ffffffff8106280f>] ? update_process_times+0x6f/0x80
[  246.132137]  [<ffffffff81570e90>] ? dev_deactivate_queue.constprop.31+0x80/0x80
[  246.140308]  [<ffffffff81061db0>] run_timer_softirq+0x1f0/0x2a0
[  246.146933]  [<ffffffff81059a80>] __do_softirq+0xe0/0x220
[  246.152976]  [<ffffffff8165fedc>] call_softirq+0x1c/0x30
[  246.158920]  [<ffffffff810045f5>] do_softirq+0x55/0x90
[  246.164670]  [<ffffffff81059d35>] irq_exit+0xa5/0xb0
[  246.170227]  [<ffffffff8166062a>] smp_apic_timer_interrupt+0x4a/0x60
[  246.177324]  [<ffffffff8165f40a>] apic_timer_interrupt+0x6a/0x70
[  246.184041]  <EOI>  [<ffffffff81505a1b>] ? cpuidle_enter_state+0x5b/0xe0
[  246.191559]  [<ffffffff81505a17>] ? cpuidle_enter_state+0x57/0xe0
[  246.198374]  [<ffffffff81505b5d>] cpuidle_idle_call+0xbd/0x200
[  246.204900]  [<ffffffff8100b7ae>] arch_cpu_idle+0xe/0x30
[  246.210846]  [<ffffffff810a47b0>] cpu_startup_entry+0xd0/0x250
[  246.217371]  [<ffffffff81646b47>] rest_init+0x77/0x80
[  246.223028]  [<ffffffff81d09e8e>] start_kernel+0x3ee/0x3fb
[  246.229165]  [<ffffffff81d0989f>] ? repair_env_string+0x5e/0x5e
[  246.235787]  [<ffffffff81d095a5>] x86_64_start_reservations+0x2a/0x2c
[  246.242990]  [<ffffffff81d0969f>] x86_64_start_kernel+0xf8/0xfc
[  246.249610] ---[ end trace fb74fdef54d79039 ]---
[  246.254807] ixgbe 0000:c2:00.0 p786p1: initiating reset due to tx timeout
[  246.262489] ixgbe 0000:c2:00.0 p786p1: Reset adapter
Last login: Mon Nov 11 08:35:14 from 10.18.17.119
[root@(none) ~]# [  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
[  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX
[  246.792676] ixgbe 0000:c2:00.0 p786p1: detected SFP+: 5
[  249.231598] ixgbe 0000:c2:00.0 p786p1: NIC Link is Up 10 Gbps, Flow Control: RX/TX

(last lines keep repeating.  ixgbe driver is dead until module reload.)

If the downed cpu has more vectors than are free on the remaining cpus on the
system, it is possible that some vectors are "orphaned" even though they are
assigned to a cpu.  In this case, since the ixgbe driver had a watchdog, the
watchdog fired and notified that something was wrong.

This patch adds a function, check_vectors(), to compare the number of vectors
on the CPU going down and compares it to the number of vectors available on
the system.  If there aren't enough vectors for the CPU to go down, an
error is returned and propogated back to userspace.

v2: Do not need to look at percpu irqs
v3: Need to check affinity to prevent counting of MSIs in IOAPIC Lowest
    Priority Mode
v4: Additional changes suggested by Gong Chen.
v5/v6/v7/v8: Updated comment text

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Link: http://lkml.kernel.org/r/1389613861-3853-1-git-send-email-prarit@redhat.com
Reviewed-by: Gong Chen <gong.chen@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Michel Lespinasse <walken@google.com>
Cc: Seiji Aguchi <seiji.aguchi@hds.com>
Cc: Yang Zhang <yang.z.zhang@Intel.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Janet Morgan <janet.morgan@intel.com>
Cc: Tony Luck <tony.luck@intel.com>
Cc: Ruiv Wang <ruiv.wang@gmail.com>
Cc: Gong Chen <gong.chen@linux.intel.com>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
Cc: <stable@vger.kernel.org>
2014-01-15 22:24:02 -08:00
..
acpi Merge branch 'acpica' 2013-11-07 19:21:11 +01:00
apic Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-19 10:48:19 -08:00
cpu x86, cpu, amd: Add workaround for family 16h, erratum 793 2014-01-14 16:39:07 -08:00
kprobes Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-09-04 08:42:44 -07:00
.gitignore
alternative.c lockdep, x86/alternatives: Drop ancient lockdep fixup message 2013-09-28 10:09:41 +02:00
amd_gart_64.c x86, mm: use pfn_range_is_mapped() with gart 2012-11-17 11:59:10 -08:00
amd_nb.c x86, amd_nb: Clarify F15h, model 30h GART and L3 support 2013-08-12 15:30:08 +02:00
apb_timer.c intel_mid: Renamed *mrst* to *intel_mid* 2013-10-17 16:40:47 -07:00
aperture_64.c x86/mm/gart: Drop unnecessary check 2013-04-16 10:54:40 +02:00
apm_32.c x86, asmlinkage, apm: Make APM data structure used from assembler visible 2013-08-06 14:20:20 -07:00
asm-offsets_32.c x86: Get rid of ->hard_math and all the FPU asm fu 2013-06-06 14:32:04 -07:00
asm-offsets_64.c x86, gdt, hibernate: Store/load GDT for hibernate path. 2013-05-02 11:27:35 -07:00
asm-offsets.c sched, x86: Provide a per-cpu preempt_count implementation 2013-09-25 14:07:57 +02:00
audit_64.c
bootflag.c
check.c
cpuid.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
crash_dump_32.c
crash_dump_64.c
crash.c x86/apic: Disable I/O APIC before shutdown of the local APIC 2013-11-07 10:12:37 +01:00
devicetree.c Merge remote-tracking branch 'grant/devicetree/next' into for-next 2013-11-07 10:34:46 -06:00
doublefault.c x86: Extend #DF debugging aid to 64-bit 2013-05-13 13:42:44 -07:00
dumpstack_32.c dump_stack: unify debug information printed by show_regs() 2013-04-30 17:04:02 -07:00
dumpstack_64.c dump_stack: unify debug information printed by show_regs() 2013-04-30 17:04:02 -07:00
dumpstack.c x86/dumpstack: Fix printk_address for direct addresses 2013-11-12 21:06:06 +01:00
e820.c x86: avoid remapping data in parse_setup_data() 2013-08-13 23:29:19 -07:00
early_printk.c Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 11:12:22 +09:00
early-quirks.c x86/early quirk: use gen6 stolen detection for VLV 2013-11-14 09:32:11 +01:00
entry_32.S ftrace/x86: Load ftrace_ops in parameter not the variable holding it 2014-01-09 13:24:29 -08:00
entry_64.S ftrace/x86: Load ftrace_ops in parameter not the variable holding it 2014-01-09 13:24:29 -08:00
ftrace.c ftrace/x86: skip over the breakpoint for ftrace caller 2013-11-05 16:01:47 -05:00
head32.c intel_mid: Renamed *mrst* to *intel_mid* 2013-10-17 16:40:47 -07:00
head64.c x86, trace: Register exception handler to trace IDT 2013-11-08 14:15:45 -08:00
head_32.S Merge branch 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-09-04 09:11:16 -07:00
head_64.S x86: Make sure IDT is page aligned 2013-07-16 15:14:48 -07:00
head.c x86: Make sure we can boot in the case the BDA contains pure garbage 2013-02-27 13:38:57 -08:00
hpet.c x86, hpet: Introduce x86_msi_ops.setup_hpet_msi 2013-01-28 10:48:30 +01:00
hw_breakpoint.c ptrace/x86: flush_ptrace_hw_breakpoint() shoule clear the virtual debug registers 2013-07-09 10:33:26 -07:00
i386_ksyms_32.c sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
i387.c x86: move fpu_counter into ARCH specific thread_struct 2013-11-13 12:09:13 +09:00
i8237.c
i8253.c
i8259.c x86/irq: Correct comment about i8259 initialization 2013-09-04 07:46:04 +02:00
io_delay.c
ioport.c x86: get rid of pt_regs argument of iopl(2) 2013-02-03 18:16:24 -05:00
irq_32.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 10:20:12 +09:00
irq_64.c irq: Consolidate do_softirq() arch overriden implementations 2013-10-01 12:53:25 +02:00
irq_work.c x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible 2013-08-06 14:18:23 -07:00
irq.c x86: Add check for number of available vectors before CPU down 2014-01-15 22:24:02 -08:00
irqinit.c KVM: VMX: Register a new IPI for posted interrupt 2013-04-16 16:32:39 -03:00
jump_label.c x86/jump_label: expect default_nop if static_key gets enabled on boot-up 2013-10-19 19:45:35 -04:00
kdebugfs.c arch/x86/kernel/kdebugfs.c: Ensure a consistent return value in error case 2012-07-26 15:07:20 +02:00
kgdb.c kgdb,x86: fix warning about unused variable 2012-10-12 06:37:34 -05:00
kvm.c x86, trace: Register exception handler to trace IDT 2013-11-08 14:15:45 -08:00
kvmclock.c pvclock: detect watchdog reset at pvclock read 2013-11-06 09:48:43 +02:00
ldt.c
machine_kexec_32.c
machine_kexec_64.c x86, kexec, 64bit: Only set ident mapping for ram. 2013-01-29 15:26:35 -08:00
Makefile sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
microcode_amd_early.c x86, microcode, AMD: Fix early microcode loading 2013-08-12 18:32:45 +02:00
microcode_amd.c x86/microcode/amd: Tone down printk(), don't treat a missing firmware file as an error 2013-11-12 22:03:49 +01:00
microcode_core_early.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
microcode_core.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
microcode_intel_early.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
microcode_intel_lib.c x86/microcode_intel_lib.c: Early update ucode on Intel's CPU 2013-01-31 13:19:14 -08:00
microcode_intel.c x86/microcode_intel.h: Define functions and macros for early loading ucode 2013-01-31 13:18:50 -08:00
mmconf-fam10h_64.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
module.c mm/arch: use NUMA_NO_NODE 2013-11-13 12:09:05 +09:00
mpparse.c
msr.c x86, msr: Use file_inode(), not f_mapping->host 2013-10-08 12:01:29 -07:00
nmi_selftest.c
nmi.c perf/x86: Fix NMI measurements 2013-10-29 12:01:20 +01:00
paravirt_patch_32.c
paravirt_patch_64.c
paravirt-spinlocks.c x86, ticketlock: Add slowpath logic 2013-08-09 07:54:00 -07:00
paravirt.c x86, paravirt: Remove duplicate definition for DEF_NATIVE 2013-09-04 09:46:43 -07:00
pci-calgary_64.c
pci-dma.c x86/dma-debug: Bump PREALLOC_DMA_DEBUG_ENTRIES 2013-01-24 17:34:18 +01:00
pci-iommu_table.c
pci-nommu.c
pci-swiotlb.c
pcspeaker.c
perf_regs.c perf: Fix off by one test in perf_reg_value() 2012-09-19 17:08:40 +02:00
preempt.S sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
probe_roms.c x86/pci/probe_roms: Add missing __iomem annotation to pci_map_biosrom() 2012-09-05 10:52:25 +02:00
process_32.c x86: move fpu_counter into ARCH specific thread_struct 2013-11-13 12:09:13 +09:00
process_64.c Merge branch 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:55:56 +09:00
process.c sched, idle: Fix the idle polling state logic 2013-09-25 13:53:10 +02:00
ptrace.c ptrace/x86: cleanup ptrace_set_debugreg() 2013-07-09 10:33:26 -07:00
pvclock.c hung_task: add method to reset detector 2013-11-06 09:49:02 +02:00
quirks.c x86, quirks: Shut-up a long-standing gcc warning 2013-04-02 16:03:34 -07:00
reboot_fixups_32.c
reboot.c x86/apic, doc: Justification for disabling IO APIC before Local APIC 2013-12-04 19:33:21 -08:00
relocate_kernel_32.S x86, asm, cleanup: Replace open-coded control register values with symbolic 2013-06-25 16:26:06 -07:00
relocate_kernel_64.S x86, reloc: Use xorl instead of xorq in relocate_kernel_64.S 2013-06-20 21:30:04 -07:00
resource.c
rtc.c Merge branch 'x86-intel-mid-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-12 11:12:22 +09:00
setup_percpu.c
setup.c x86, acpi, crash, kdump: do reserve_crashkernel() after SRAT is parsed. 2013-11-13 12:09:08 +09:00
signal.c Merge branch 'x86-smap-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-09-04 11:08:32 -07:00
smp.c x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible 2013-08-06 14:18:23 -07:00
smpboot.c x86: Add check for number of available vectors before CPU down 2014-01-15 22:24:02 -08:00
stacktrace.c
step.c ptrace: ensure arch_ptrace/ptrace_request can never race with SIGKILL 2013-01-22 10:08:00 -08:00
sys_x86_64.c x86 get_unmapped_area: Access mmap_legacy_base through mm_struct member 2013-08-22 10:19:35 -07:00
syscall_32.c x86, asmlinkage: Make syscall tables visible 2013-08-06 14:20:18 -07:00
syscall_64.c x86, asmlinkage: Make syscall tables visible 2013-08-06 14:20:18 -07:00
sysfb_efi.c x86: sysfb: move EFI quirks from efifb to sysfb 2013-08-02 16:17:47 -07:00
sysfb_simplefb.c x86/simplefb: Mark framebuffer mem-resources as IORESOURCE_BUSY to avoid bootup warning 2013-10-03 07:51:11 +02:00
sysfb.c x86: sysfb: move EFI quirks from efifb to sysfb 2013-08-02 16:17:47 -07:00
tboot.c x86 / tboot / ACPI: Fail extended mode reduced hardware sleep 2013-07-31 14:25:51 +02:00
tce_64.c
test_nx.c
test_rodata.c
time.c
tls.c make SYSCALL_DEFINE<n>-generated wrappers do asmlinkage_protect 2013-03-03 22:58:33 -05:00
tls.h
topology.c hotplug, powerpc, x86: Remove cpu_hotplug_driver_lock() 2013-09-30 19:55:51 +02:00
trace_clock.c tracing,x86: Add a TSC trace_clock 2012-11-13 15:48:27 -05:00
tracepoint.c x86: Make sure IDT is page aligned 2013-07-16 15:14:48 -07:00
traps.c Merge branch 'x86-trace-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:25:10 +09:00
tsc_sync.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
tsc.c perf/x86: Add ability to calculate TSC from perf sample timestamps 2013-07-23 12:17:45 +02:00
uprobes.c uretprobes/x86: Hijack return address 2013-04-13 15:31:55 +02:00
verify_cpu.S
vm86_32.c x86, vm86: fix VM86 syscalls: use SYSCALL_DEFINEx(...) 2013-05-02 20:36:32 -04:00
vmlinux.lds.S x86: intel-mid: Add section for sfi device table 2013-10-17 16:41:04 -07:00
vsmp_64.c x86/apic/x2apic: Limit the vector reservation to the user specified mask 2012-07-06 11:00:22 +02:00
vsyscall_64.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00
vsyscall_emu_64.S
vsyscall_trace.h
x86_init.c PCI: Add x86_msi.msi_mask_irq() and msix_mask_irq() 2013-11-06 16:32:19 -07:00
x8664_ksyms_64.c sched, x86: Optimize the preempt_schedule() call 2013-09-25 14:23:07 +02:00
xsave.c x86: delete __cpuinit usage from all x86 files 2013-07-14 19:36:56 -04:00