Pull x86 paravirt changes from Ingo Molnar:
"Hypervisor signature detection cleanup and fixes - the goal is to make
KVM guests run better on MS/Hyperv and to generalize and factor out
the code a bit"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86: Correctly detect hypervisor
x86, kvm: Switch to use hypervisor_cpuid_base()
xen: Switch to use hypervisor_cpuid_base()
x86: Introduce hypervisor_cpuid_base()
Pull x86/asmlinkage changes from Ingo Molnar:
"As a preparation for Andi Kleen's LTO patchset (link time
optimizations using GCC's -flto which build time optimization has
steadily increased in quality over the past few years and might
eventually be usable for the kernel too) this tree includes a handful
of preparatory patches that make function calling convention
annotations consistent again:
- Mark every function without arguments (or 64bit only) that is used
by assembly code with asmlinkage()
- Mark every function with parameters or variables that is used by
assembly code as __visible.
For the vanilla kernel this has documentation, consistency and
debuggability advantages, for the time being"
* 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/asmlinkage: Fix warning in xen asmlinkage change
x86, asmlinkage, vdso: Mark vdso variables __visible
x86, asmlinkage, power: Make various symbols used by the suspend asm code visible
x86, asmlinkage: Make dump_stack visible
x86, asmlinkage: Make 64bit checksum functions visible
x86, asmlinkage, paravirt: Add __visible/asmlinkage to xen paravirt ops
x86, asmlinkage, apm: Make APM data structure used from assembler visible
x86, asmlinkage: Make syscall tables visible
x86, asmlinkage: Make several variables used from assembler/linker script visible
x86, asmlinkage: Make kprobes code visible and fix assembler code
x86, asmlinkage: Make various syscalls asmlinkage
x86, asmlinkage: Make 32bit/64bit __switch_to visible
x86, asmlinkage: Make _*_start_kernel visible
x86, asmlinkage: Make all interrupt handlers asmlinkage / __visible
x86, asmlinkage: Change dotraplinkage into __visible on 32bit
x86: Fix sys_call_table type in asm/syscall.h
Current code uses asmlinkage for functions without arguments.
This adds an implicit regparm(0) which creates a warning
when assigning the function to pointers.
Use __visible for the functions without arguments.
This avoids having to add regparm(0) to function pointers.
Since they have no arguments it does not make any difference.
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1377115662-4865-1-git-send-email-andi@firstfloor.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
- On ARM did not have balanced calls to get/put_cpu.
- Fix to make tboot + Xen + Linux correctly.
- Fix events VCPU binding issues.
- Fix a vCPU online race where IPIs are sent to not-yet-online vCPU.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.14 (GNU/Linux)
iQEcBAABAgAGBQJSFMaJAAoJEFjIrFwIi8fJ+/0H/32rLj60FpKXcPDCvID+9p8T
XDGnFNttsxyhuzEzetOAd0aLKYKGnUaTDZBHfgSNipGCxjMLYgz84phRmHAYEj8u
kai1Ag1WjhZilCmImzFvdHFiUwtvKwkeBIL/cZtKr1BetpnuuFsoVnwbH9FVjMpr
TCg6sUwFq7xRyD1azo/cTLZFeiUqq0aQLw8J72YaapdS3SztHPeDHXlPpmLUdb6+
hiSYveJMYp2V0SW8g8eLKDJxVr2QdPEfl9WpBzpLlLK8GrNw8BEU6hSOSLzxB7z/
hDATAuZ5iHiIEi1uGfVjOyDws2ngUhmBKUH5x5iVIZd2P5c/ffLh2ePDVWGO5RI=
=yMuS
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip
Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
- On ARM did not have balanced calls to get/put_cpu.
- Fix to make tboot + Xen + Linux correctly.
- Fix events VCPU binding issues.
- Fix a vCPU online race where IPIs are sent to not-yet-online vCPU.
* tag 'stable/for-linus-3.11-rc6-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/xen/tip:
xen/smp: initialize IPI vectors before marking CPU online
xen/events: mask events when changing their VCPU binding
xen/events: initialize local per-cpu mask for all possible events
x86/xen: do not identity map UNUSABLE regions in the machine E820
xen/arm: missing put_cpu in xen_percpu_init
An older PVHVM guest (v3.0 based) crashed during vCPU hot-plug with:
kernel BUG at drivers/xen/events.c:1328!
RCU has detected that a CPU has not entered a quiescent state within the
grace period. It needs to send the CPU a reschedule IPI if it is not
offline. rcu_implicit_offline_qs() does this check:
/*
* If the CPU is offline, it is in a quiescent state. We can
* trust its state not to change because interrupts are disabled.
*/
if (cpu_is_offline(rdp->cpu)) {
rdp->offline_fqs++;
return 1;
}
Else the CPU is online. Send it a reschedule IPI.
The CPU is in the middle of being hot-plugged and has been marked online
(!cpu_is_offline()). See start_secondary():
set_cpu_online(smp_processor_id(), true);
...
per_cpu(cpu_state, smp_processor_id()) = CPU_ONLINE;
start_secondary() then waits for the CPU bringing up the hot-plugged CPU to
mark it as active:
/*
* Wait until the cpu which brought this one up marked it
* online before enabling interrupts. If we don't do that then
* we can end up waking up the softirq thread before this cpu
* reached the active state, which makes the scheduler unhappy
* and schedule the softirq thread on the wrong cpu. This is
* only observable with forced threaded interrupts, but in
* theory it could also happen w/o them. It's just way harder
* to achieve.
*/
while (!cpumask_test_cpu(smp_processor_id(), cpu_active_mask))
cpu_relax();
/* enable local interrupts */
local_irq_enable();
The CPU being hot-plugged will be marked active after it has been fully
initialized by the CPU managing the hot-plug. In the Xen PVHVM case
xen_smp_intr_init() is called to set up the hot-plugged vCPU's
XEN_RESCHEDULE_VECTOR.
The hot-plugging CPU is marked online, not marked active and does not have
its IPI vectors set up. rcu_implicit_offline_qs() sees the hot-plugging
cpu is !cpu_is_offline() and tries to send it a reschedule IPI:
This will lead to:
kernel BUG at drivers/xen/events.c:1328!
xen_send_IPI_one()
xen_smp_send_reschedule()
rcu_implicit_offline_qs()
rcu_implicit_dynticks_qs()
force_qs_rnp()
force_quiescent_state()
__rcu_process_callbacks()
rcu_process_callbacks()
__do_softirq()
call_softirq()
do_softirq()
irq_exit()
xen_evtchn_do_upcall()
because xen_send_IPI_one() will attempt to use an uninitialized IRQ for
the XEN_RESCHEDULE_VECTOR.
There is at least one other place that has caused the same crash:
xen_smp_send_reschedule()
wake_up_idle_cpu()
add_timer_on()
clocksource_watchdog()
call_timer_fn()
run_timer_softirq()
__do_softirq()
call_softirq()
do_softirq()
irq_exit()
xen_evtchn_do_upcall()
xen_hvm_callback_vector()
clocksource_watchdog() uses cpu_online_mask to pick the next CPU to handle
a watchdog timer:
/*
* Cycle through CPUs to check if the CPUs stay synchronized
* to each other.
*/
next_cpu = cpumask_next(raw_smp_processor_id(), cpu_online_mask);
if (next_cpu >= nr_cpu_ids)
next_cpu = cpumask_first(cpu_online_mask);
watchdog_timer.expires += WATCHDOG_INTERVAL;
add_timer_on(&watchdog_timer, next_cpu);
This resulted in an attempt to send an IPI to a hot-plugging CPU that
had not initialized its reschedule vector. One option would be to make
the RCU code check to not check for CPU offline but for CPU active.
As becoming active is done after a CPU is online (in older kernels).
But Srivatsa pointed out that "the cpu_active vs cpu_online ordering has been
completely reworked - in the online path, cpu_active is set *before* cpu_online,
and also, in the cpu offline path, the cpu_active bit is reset in the CPU_DYING
notification instead of CPU_DOWN_PREPARE." Drilling in this the bring-up
path: "[brought up CPU].. send out a CPU_STARTING notification, and in response
to that, the scheduler sets the CPU in the cpu_active_mask. Again, this mask
is better left to the scheduler alone, since it has the intelligence to use it
judiciously."
The conclusion was that:
"
1. At the IPI sender side:
It is incorrect to send an IPI to an offline CPU (cpu not present in
the cpu_online_mask). There are numerous places where we check this
and warn/complain.
2. At the IPI receiver side:
It is incorrect to let the world know of our presence (by setting
ourselves in global bitmasks) until our initialization steps are complete
to such an extent that we can handle the consequences (such as
receiving interrupts without crashing the sender etc.)
" (from Srivatsa)
As the native code enables the interrupts at some point we need to be
able to service them. In other words a CPU must have valid IPI vectors
if it has been marked online.
It doesn't need to handle the IPI (interrupts may be disabled) but needs
to have valid IPI vectors because another CPU may find it in cpu_online_mask
and attempt to send it an IPI.
This patch will change the order of the Xen vCPU bring-up functions so that
Xen vectors have been set up before start_secondary() is called.
It also will not continue to bring up a Xen vCPU if xen_smp_intr_init() fails
to initialize it.
Orabug 13823853
Signed-off-by Chuck Anderson <chuck.anderson@oracle.com>
Acked-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If there are UNUSABLE regions in the machine memory map, dom0 will
attempt to map them 1:1 which is not permitted by Xen and the kernel
will crash.
There isn't anything interesting in the UNUSABLE region that the dom0
kernel needs access to so we can avoid making the 1:1 mapping and
treat it as RAM.
We only do this for dom0, as that is where tboot case shows up.
A PV domU could have an UNUSABLE region in its pseudo-physical map
and would need to be handled in another patch.
This fixes a boot failure on hosts with tboot.
tboot marks a region in the e820 map as unusable and the dom0 kernel
would attempt to map this region and Xen does not permit unusable
regions to be mapped by guests.
(XEN) 0000000000000000 - 0000000000060000 (usable)
(XEN) 0000000000060000 - 0000000000068000 (reserved)
(XEN) 0000000000068000 - 000000000009e000 (usable)
(XEN) 0000000000100000 - 0000000000800000 (usable)
(XEN) 0000000000800000 - 0000000000972000 (unusable)
tboot marked this region as unusable.
(XEN) 0000000000972000 - 00000000cf200000 (usable)
(XEN) 00000000cf200000 - 00000000cf38f000 (reserved)
(XEN) 00000000cf38f000 - 00000000cf3ce000 (ACPI data)
(XEN) 00000000cf3ce000 - 00000000d0000000 (reserved)
(XEN) 00000000e0000000 - 00000000f0000000 (reserved)
(XEN) 00000000fe000000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 0000000630000000 (usable)
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
[v1: Altered the patch and description with domU's with UNUSABLE regions]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We try to handle the hypervisor compatibility mode by detecting hypervisor
through a specific order. This is not robust, since hypervisors may implement
each others features.
This patch tries to handle this situation by always choosing the last one in the
CPUID leaves. This is done by letting .detect() return a priority instead of
true/false and just re-using the CPUID leaf where the signature were found as
the priority (or 1 if it was found by DMI). Then we can just pick hypervisor who
has the highest priority. Other sophisticated detection method could also be
implemented on top.
Suggested by H. Peter Anvin and Paolo Bonzini.
Acked-by: K. Y. Srinivasan <kys@microsoft.com>
Cc: Haiyang Zhang <haiyangz@microsoft.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: Jeremy Fitzhardinge <jeremy@goop.org>
Cc: Doug Covelli <dcovelli@vmware.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Dan Hecht <dhecht@vmware.com>
Cc: Paul Gortmaker <paul.gortmaker@windriver.com>
Cc: Marcelo Tosatti <mtosatti@redhat.com>
Cc: Gleb Natapov <gleb@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
Link: http://lkml.kernel.org/r/1374742475-2485-4-git-send-email-jasowang@redhat.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
Note that some harmless section mismatch warnings may result, since
notify_cpu_starting() and cpu_up() are arch independent (kernel/cpu.c)
are flagged as __cpuinit -- so if we remove the __cpuinit from
arch specific callers, we will also get section mismatch warnings.
As an intermediate step, we intend to turn the linux/init.h cpuinit
content into no-ops as early as possible, since that will get rid
of these warnings. In any case, they are temporary and harmless.
This removes all the arch/x86 uses of the __cpuinit macros from
all C files. x86 only had the one __CPUINIT used in assembly files,
and it wasn't paired off with a .previous or a __FINIT, so we can
delete it directly w/o any corresponding additional change there.
[1] https://lkml.org/lkml/2013/5/20/589
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: x86@kernel.org
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: H. Peter Anvin <hpa@linux.intel.com>
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Pull timer core updates from Thomas Gleixner:
"The timer changes contain:
- posix timer code consolidation and fixes for odd corner cases
- sched_clock implementation moved from ARM to core code to avoid
duplication by other architectures
- alarm timer updates
- clocksource and clockevents unregistration facilities
- clocksource/events support for new hardware
- precise nanoseconds RTC readout (Xen feature)
- generic support for Xen suspend/resume oddities
- the usual lot of fixes and cleanups all over the place
The parts which touch other areas (ARM/XEN) have been coordinated with
the relevant maintainers. Though this results in an handful of
trivial to solve merge conflicts, which we preferred over nasty cross
tree merge dependencies.
The patches which have been committed in the last few days are bug
fixes plus the posix timer lot. The latter was in akpms queue and
next for quite some time; they just got forgotten and Frederic
collected them last minute."
* 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (59 commits)
hrtimer: Remove unused variable
hrtimers: Move SMP function call to thread context
clocksource: Reselect clocksource when watchdog validated high-res capability
posix-cpu-timers: don't account cpu timer after stopped thread runtime accounting
posix_timers: fix racy timer delta caching on task exit
posix-timers: correctly get dying task time sample in posix_cpu_timer_schedule()
selftests: add basic posix timers selftests
posix_cpu_timers: consolidate expired timers check
posix_cpu_timers: consolidate timer list cleanups
posix_cpu_timer: consolidate expiry time type
tick: Sanitize broadcast control logic
tick: Prevent uncontrolled switch to oneshot mode
tick: Make oneshot broadcast robust vs. CPU offlining
x86: xen: Sync the CMOS RTC as well as the Xen wallclock
x86: xen: Sync the wallclock when the system time is set
timekeeping: Indicate that clock was set in the pvclock gtod notifier
timekeeping: Pass flags instead of multiple bools to timekeeping_update()
xen: Remove clock_was_set() call in the resume path
hrtimers: Support resuming with two or more CPUs online (but stopped)
timer: Fix jiffies wrap behavior of round_jiffies_common()
...
git://git.kernel.org/pub/scm/linux/kernel/git/frederic/linux-dynticks into timers/core
Frederic sayed: "Most of these patches have been hanging around for
several month now, in -mmotm for a significant chunk. They already
missed a few releases."
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* Fix memory leak when CPU hotplugging.
* Compile bugs with various #ifdefs
* Fix state changes in Xen PCI front not dealing well with new toolstack.
* Cleanups in code (use pr_*, fix 80 characters splits, etc)
* Long standing bug in double-reporting the steal time
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEcBAABAgAGBQJR0Xc1AAoJEFjIrFwIi8fJo68H/jZaJJmytDI7exHTyq8fSGXQ
5OERw5YZeM5jZQG55YC5hmGS5oIpKEdBt+aAEpuofYUhrR/ZFqDr0j+QEiqC36bl
cl0/IAnMBGnyyO6FYY4Sut2H+S5BGYQNbwo9YAtgKtZANr2eLABxYUfMU44I/jCW
M7DAojME9OZLBW3ORYZTGf1A0T8hJINhxZIWhtLMrkckCb9AZMieKdMOHJEWq2jl
aPxx78U+2CLTdLquOLIiBEiTO3vAzx2Wt4prD+uCizWna45H9gBj7GFwBYH7p9Ry
TFyzmDc5PufThTilfDyQW1y4yiNLpFDC/67a1wpwObEBn87hstHgwHQQ5INzqwY=
=Ej7M
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.11-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen bugfixes from Konrad Rzeszutek Wilk:
- Fix memory leak when CPU hotplugging.
- Compile bugs with various #ifdefs
- Fix state changes in Xen PCI front not dealing well with new
toolstack.
- Cleanups in code (use pr_*, fix 80 characters splits, etc)
- Long standing bug in double-reporting the steal time
* tag 'stable/for-linus-3.11-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/time: remove blocked time accounting from xen "clockchip"
xen: Convert printks to pr_<level>
xen: ifdef CONFIG_HIBERNATE_CALLBACKS xen_*_suspend
xen/pcifront: Deal with toolstack missing 'XenbusStateClosing' state.
xen/time: Free onlined per-cpu data structure if we want to online it again.
xen/time: Check that the per_cpu data structure has data before freeing.
xen/time: Don't leak interrupt name when offlining.
xen/time: Encapsulate the struct clock_event_device in another structure.
xen/spinlock: Don't leak interrupt name when offlining.
xen/smp: Don't leak interrupt name when offlining.
xen/smp: Set the per-cpu IRQ number to a valid default.
xen/smp: Introduce a common structure to contain the IRQ name and interrupt line.
xen/smp: Coalesce the free_irq calls in one function.
xen-pciback: fix error return code in pcistub_irq_handler_switch()
Pull x86 FPU changes from Ingo Molnar:
"There are two bigger changes in this tree:
- Add an [early-use-]safe static_cpu_has() variant and other
robustness improvements, including the new X86_DEBUG_STATIC_CPU_HAS
configurable debugging facility, motivated by recent obscure FPU
code bugs, by Borislav Petkov
- Reimplement FPU detection code in C and drop the old asm code, by
Peter Anvin."
* 'x86-fpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, fpu: Use static_cpu_has_safe before alternatives
x86: Add a static_cpu_has_safe variant
x86: Sanity-check static_cpu_has usage
x86, cpu: Add a synthetic, always true, cpu feature
x86: Get rid of ->hard_math and all the FPU asm fu
Adjustments to Xen's persistent clock via update_persistent_clock()
don't actually persist, as the Xen wallclock is a software only clock
and modifications to it do not modify the underlying CMOS RTC.
The x86_platform.set_wallclock hook is there to keep the hardware RTC
synchronized. On a guest this is pointless.
On Dom0 we can use the native implementaion which actually updates the
hardware RTC, but we still need to keep the software emulation of RTC
for the guests up to date. The subscription to the pvclock_notifier
allows us to emulate this easily. The notifier is called at every tick
and when the clock was set.
Right now we only use that notifier when the clock was set, but due to
the fact that it is called periodically from the timekeeping update
code, we can utilize it to emulate the NTP driven drift compensation
of update_persistant_clock() for the Xen wall (software) clock.
Add a 11 minutes periodic update to the pvclock_gtod notifier callback
to achieve that. The static variable 'next' which maintains that 11
minutes update cycle is protected by the core code serialization so
there is no need to add a Xen specific serialization mechanism.
[ tglx: Massaged changelog and added a few comments ]
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/1372329348-20841-6-git-send-email-david.vrabel@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Currently the Xen wallclock is only updated every 11 minutes if NTP is
synchronized to its clock source (using the sync_cmos_clock() work).
If a guest is started before NTP is synchronized it may see an
incorrect wallclock time.
Use the pvclock_gtod notifier chain to receive a notification when the
system time has changed and update the wallclock to match.
This chain is called on every timer tick and we want to avoid an extra
(expensive) hypercall on every tick. Because dom0 has historically
never provided a very accurate wallclock and guests do not expect one,
we can do this simply: the wallclock is only updated if the clock was
set.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: <xen-devel@lists.xen.org>
Link: http://lkml.kernel.org/r/1372329348-20841-5-git-send-email-david.vrabel@citrix.com
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
... because the "clock_event_device framework" already accounts for idle
time through the "event_handler" function pointer in
xen_timer_interrupt().
The patch is intended as the completion of [1]. It should fix the double
idle times seen in PV guests' /proc/stat [2]. It should be orthogonal to
stolen time accounting (the removed code seems to be isolated).
The approach may be completely misguided.
[1] https://lkml.org/lkml/2011/10/6/10
[2] http://lists.xensource.com/archives/html/xen-devel/2010-08/msg01068.html
John took the time to retest this patch on top of v3.10 and reported:
"idle time is correctly incremented for pv and hvm for the normal
case, nohz=off and nohz=idle." so lets put this patch in.
CC: stable@vger.kernel.org
Signed-off-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: John Haxby <john.haxby@oracle.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If the per-cpu time data structure has been onlined already and
we are trying to online it again, then free the previous copy
before blindly over-writting it.
A developer naturally should not call this function multiple times
but just in case.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We don't check whether the per_cpu data structure has actually
been freed in the past. This checks it and if it has been freed
in the past then just continues on without double-freeing.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
We don't do any code movement. We just encapsulate the struct clock_event_device
in a new structure which contains said structure and a pointer to
a char *name. The 'name' will be used in 'xen/time: Don't leak interrupt
name when offlining'.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When the user does:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online
kmemleak reports:
kmemleak: 7 new suspected memory leaks (see /sys/kernel/debug/kmemleak)
unreferenced object 0xffff88003fa51260 (size 32):
comm "swapper/0", pid 1, jiffies 4294667339 (age 1027.789s)
hex dump (first 32 bytes):
73 70 69 6e 6c 6f 63 6b 31 00 00 00 00 00 00 00 spinlock1.......
00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ................
backtrace:
[<ffffffff81660721>] kmemleak_alloc+0x21/0x50
[<ffffffff81190aac>] __kmalloc_track_caller+0xec/0x2a0
[<ffffffff812fe1bb>] kvasprintf+0x5b/0x90
[<ffffffff812fe228>] kasprintf+0x38/0x40
[<ffffffff81663789>] xen_init_lock_cpu+0x61/0xbe
[<ffffffff816633a6>] xen_cpu_up+0x66/0x3e8
[<ffffffff8166bbf5>] _cpu_up+0xd1/0x14b
[<ffffffff8166bd48>] cpu_up+0xd9/0xec
[<ffffffff81ae6e4a>] smp_init+0x4b/0xa3
[<ffffffff81ac4981>] kernel_init_freeable+0xdb/0x1e6
[<ffffffff8165ce39>] kernel_init+0x9/0xf0
[<ffffffff8167edfc>] ret_from_fork+0x7c/0xb0
[<ffffffffffffffff>] 0xffffffffffffffff
Instead of doing it like the "xen/smp: Don't leak interrupt name when offlining"
patch did (which has a per-cpu structure which contains both the
IRQ number and char*) we use a per-cpu pointers to a *char.
The reason is that the "__this_cpu_read(lock_kicker_irq);" macro
blows up with "__bad_size_call_parameter()" as the size of the
returned structure is not within the parameters of what it expects
and optimizes for.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When we free it we want to make sure to set it to a default
value of -1 so that we don't double-free it (in case somebody
calls us twice).
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
This patch adds a new structure to contain the common two things
that each of the per-cpu interrupts need:
- an interrupt number,
- and the name of the interrupt (to be added in 'xen/smp: Don't leak
interrupt name when offlining').
This allows us to carry the tuple of the per-cpu interrupt data structure
and expand it as we need in the future.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
There are two functions that do a bunch of 'free_irq' on
the per_cpu IRQ. Instead of having duplicate code just move
it to one function.
This is just code movement.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Reimplement FPU detection code in C and drop old, not-so-recommended
detection method in asm. Move all the relevant stuff into i387.c where
it conceptually belongs. Finally drop cpuinfo_x86.hard_math.
[ hpa: huge thanks to Borislav for taking my original concept patch
and productizing it ]
[ Boris, note to self: do not use static_cpu_has before alternatives! ]
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Link: http://lkml.kernel.org/r/1367244262-29511-2-git-send-email-bp@alien8.de
Link: http://lkml.kernel.org/r/1365436666-9837-2-git-send-email-bp@alien8.de
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>
The xen_play_dead is an undead function. When the vCPU is told to
offline it ends up calling xen_play_dead wherin it calls the
VCPUOP_down hypercall which offlines the vCPU. However, when the
vCPU is onlined back, it resumes execution right after
VCPUOP_down hypercall.
That was OK (albeit the API for play_dead assumes that the CPU
stays dead and never returns) but with commit 4b0c0f294
(tick: Cleanup NOHZ per cpu data on cpu down) that is no longer safe
as said commit resets the ts->inidle which at the start of the
cpu_idle loop was set.
The net effect is that we get this warn:
Broke affinity for irq 16
installing Xen timer for CPU 1
cpu 1 spinlock event irq 48
------------[ cut here ]------------
WARNING: at /home/konrad/linux-linus/kernel/time/tick-sched.c:935 tick_nohz_idle_exit+0x195/0x1b0()
Modules linked in: dm_multipath dm_mod xen_evtchn iscsi_boot_sysfs
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.10.0-rc3upstream-00068-gdcdbe33 #1
Hardware name: BIOSTAR Group N61PB-M2S/N61PB-M2S, BIOS 6.00 PG 09/03/2009
ffffffff8193b448 ffff880039da5e60 ffffffff816707c8 ffff880039da5ea0
ffffffff8108ce8b ffff880039da4010 ffff88003fa8e500 ffff880039da4010
0000000000000001 ffff880039da4000 ffff880039da4010 ffff880039da5eb0
Call Trace:
[<ffffffff816707c8>] dump_stack+0x19/0x1b
[<ffffffff8108ce8b>] warn_slowpath_common+0x6b/0xa0
[<ffffffff8108ced5>] warn_slowpath_null+0x15/0x20
[<ffffffff810e4745>] tick_nohz_idle_exit+0x195/0x1b0
[<ffffffff810da755>] cpu_startup_entry+0x205/0x250
[<ffffffff81661070>] cpu_bringup_and_idle+0x13/0x15
---[ end trace 915c8c486004dda1 ]---
b/c ts_inidle is set to zero. Thomas suggested that we just add a workaround
to call tick_nohz_idle_enter before returning from xen_play_dead() - and
that is what this patch does and fixes the issue.
We also add the stable part b/c git commit 4b0c0f294 is on the stable
tree.
CC: stable@vger.kernel.org
Suggested-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Commit f447d56d36 introduced the
implementation of the PV apic ipi interface. But there were some
odd things (it seems none of which cause really any issue but
maybe they should be cleaned up anyway):
- xen_send_IPI_mask_allbutself (and by that xen_send_IPI_allbutself)
ignore the passed in vector and only use the CALL_FUNCTION_SINGLE
vector. While xen_send_IPI_all and xen_send_IPI_mask use the vector.
- physflat_send_IPI_allbutself is declared unnecessarily. It is never
used.
This patch tries to clean up those things.
Signed-off-by: Stefan Bader <stefan.bader@canonical.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
All the virtualized platforms (KVM, lguest and Xen) have persistent
wallclocks that have more than one second of precision.
read_persistent_wallclock() and update_persistent_wallclock() allow
for nanosecond precision but their implementation on x86 with
x86_platform.get/set_wallclock() only allows for one second precision.
This means guests may see a wallclock time that is off by up to 1
second.
Make set_wallclock() and get_wallclock() take a struct timespec
parameter (which allows for nanosecond precision) so KVM and Xen
guests may start with a more accurate wallclock time and a Xen dom0
can maintain a more accurate wallclock for guests.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Signed-off-by: John Stultz <john.stultz@linaro.org>
- More fixes in the vCPU PVHVM hotplug path.
- Add more documentation.
- Fix various ARM related issues in the Xen generic drivers.
- Updates in the xen-pciback driver per Bjorn's updates.
- Mask the x2APIC feature for PV guests.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEcBAABAgAGBQJRjPL5AAoJEFjIrFwIi8fJdlIIANXawH+B+aFbqsFSKOOh76XN
smgICU78SVzKpW9WAPYK7YFqSdNN4AleGC2Mn2lSkiaqgciRyDb9Yt+OSMMts2Xn
ZVbFkGhEKR+DtZfTKo9YgsGatul/McTiVEkuuli+aN5dql3WXDLAaA+/b9bO3ohh
TCWtWNuSCGmlfDoJET2je+J6CgKvCErH3fvzKNxgYxytcGhAvxoVK/lC4d3pnq/m
wQUAIcF8XYENqC2m1WDR0OGveAB0Me0j9g+UkQS+TzqA8GPmxC4aptjkroFYhOz6
8nZp+LanimmTI6olVioWEXkCr5+dxb058jQKwncQfonFpl58RS0qUrz5zoe3etU=
=h9SX
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen bug-fixes from Konrad Rzeszutek Wilk:
- More fixes in the vCPU PVHVM hotplug path.
- Add more documentation.
- Fix various ARM related issues in the Xen generic drivers.
- Updates in the xen-pciback driver per Bjorn's updates.
- Mask the x2APIC feature for PV guests.
* tag 'stable/for-linus-3.10-rc0-tag-two' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen/pci: Used cached MSI-X capability offset
xen/pci: Use PCI_MSIX_TABLE_BIR, not PCI_MSIX_FLAGS_BIRMASK
xen: clear IRQ_NOAUTOEN and IRQ_NOREQUEST
xen: mask x2APIC feature in PV
xen: SWIOTLB is only used on x86
xen/spinlock: Fix check from greater than to be also be greater or equal to.
xen/smp/pvhvm: Don't point per_cpu(xen_vpcu, 33 and larger) to shared_info
xen/vcpu: Document the xen_vcpu_info and xen_vcpu
xen/vcpu/pvhvm: Fix vcpu hotplugging hanging.
During review of git commit cb9c6f15f3
("xen/spinlock: Check against default value of -1 for IRQ line.")
Stefano pointed out a bug in the patch. Unfortunatly due to vacation
timing the fix was not applied and this patch fixes it up.
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
As it will point to some data, but not event channel data (the
shared_info has an array limited to 32).
This means that for PVHVM guests with more than 32 VCPUs without
the usage of VCPUOP_register_info any interrupts to VCPUs
larger than 32 would have gone unnoticed during early bootup.
That is OK, as during early bootup, in smp_init we end up calling
the hotplug mechanism (xen_hvm_cpu_notify) which makes the
VCPUOP_register_vcpu_info call for all VCPUs and we can receive
interrupts on VCPUs 33 and further.
This is just a cleanup.
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
They are important structures and it is not clear at first
look what they are for.
The xen_vcpu is a pointer. By default it points to the shared_info
structure (at the CPU offset location). However if the
VCPUOP_register_vcpu_info hypercall is implemented we can make the
xen_vcpu pointer point to a per-CPU location.
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
[v1: Added comments from Ian Campbell]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If a user did:
echo 0 > /sys/devices/system/cpu/cpu1/online
echo 1 > /sys/devices/system/cpu/cpu1/online
we would (this a build with DEBUG enabled) get to:
smpboot: ++++++++++++++++++++=_---CPU UP 1
.. snip..
smpboot: Stack at about ffff880074c0ff44
smpboot: CPU1: has booted.
and hang. The RCU mechanism would kick in an try to IPI the CPU1
but the IPIs (and all other interrupts) would never arrive at the
CPU1. At first glance at least. A bit digging in the hypervisor
trace shows that (using xenanalyze):
[vla] d4v1 vec 243 injecting
0.043163027 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
] 0.043163639 --|x d4v1 vmentry cycles 1468
] 0.043164913 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
0.043164913 --|x d4v1 inj_virq vec 243 real
[vla] d4v1 vec 243 injecting
0.043164913 --|x d4v1 intr_window vec 243 src 5(vector) intr f3
] 0.043165526 --|x d4v1 vmentry cycles 1472
] 0.043166800 --|x d4v1 vmexit exit_reason PENDING_INTERRUPT eip ffffffff81673254
0.043166800 --|x d4v1 inj_virq vec 243 real
[vla] d4v1 vec 243 injecting
there is a pending event (subsequent debugging shows it is the IPI
from the VCPU0 when smpboot.c on VCPU1 has done
"set_cpu_online(smp_processor_id(), true)") and the guest VCPU1 is
interrupted with the callback IPI (0xf3 aka 243) which ends up calling
__xen_evtchn_do_upcall.
The __xen_evtchn_do_upcall seems to do *something* but not acknowledge
the pending events. And the moment the guest does a 'cli' (that is the
ffffffff81673254 in the log above) the hypervisor is invoked again to
inject the IPI (0xf3) to tell the guest it has pending interrupts.
This repeats itself forever.
The culprit was the per_cpu(xen_vcpu, cpu) pointer. At the bootup
we set each per_cpu(xen_vcpu, cpu) to point to the
shared_info->vcpu_info[vcpu] but later on use the VCPUOP_register_vcpu_info
to register per-CPU structures (xen_vcpu_setup).
This is used to allow events for more than 32 VCPUs and for performance
optimizations reasons.
When the user performs the VCPU hotplug we end up calling the
the xen_vcpu_setup once more. We make the hypercall which returns
-EINVAL as it does not allow multiple registration calls (and
already has re-assigned where the events are being set). We pick
the fallback case and set per_cpu(xen_vcpu, cpu) to point to the
shared_info->vcpu_info[vcpu] (which is a good fallback during bootup).
However the hypervisor is still setting events in the register
per-cpu structure (per_cpu(xen_vcpu_info, cpu)).
As such when the events are set by the hypervisor (such as timer one),
and when we iterate in __xen_evtchn_do_upcall we end up reading stale
events from the shared_info->vcpu_info[vcpu] instead of the
per_cpu(xen_vcpu_info, cpu) structures. Hence we never acknowledge the
events that the hypervisor has set and the hypervisor keeps on reminding
us to ack the events which we never do.
The fix is simple. Don't on the second time when xen_vcpu_setup is
called over-write the per_cpu(xen_vcpu, cpu) if it points to
per_cpu(xen_vcpu_info).
Acked-by: Stefano Stabellini <stefano.stabellini@eu.citrix.com>
CC: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Pull x86 paravirt update from Ingo Molnar:
"Various paravirtualization related changes - the biggest one makes
guest support optional via CONFIG_HYPERVISOR_GUEST"
* 'x86-paravirt-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, wakeup, sleep: Use pvops functions for changing GDT entries
x86, xen, gdt: Remove the pvops variant of store_gdt.
x86-32, gdt: Store/load GDT for ACPI S3 or hibernation/resume path is not needed
x86-64, gdt: Store/load GDT for ACPI S3 or hibernate/resume path is not needed.
x86: Make Linux guest support optional
x86, Kconfig: Move PARAVIRT_DEBUG into the paravirt menu
Pull perparatory x86 kasrl changes from Ingo Molnar:
"This contains changes from the ongoing KASLR work, by Kees Cook.
The main changes are the use of a read-only IDT on x86 (which
decouples the userspace visible virtual IDT address from the physical
address), and a rework of ELF relocation support, in preparation of
random, boot-time kernel image relocation."
* 'x86-kaslr-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86, relocs: Refactor the relocs tool to merge 32- and 64-bit ELF
x86, relocs: Build separate 32/64-bit tools
x86, relocs: Add 64-bit ELF support to relocs tool
x86, relocs: Consolidate processing logic
x86, relocs: Generalize ELF structure names
x86: Use a read-only IDT alias on all CPUs
Pull SMP/hotplug changes from Ingo Molnar:
"This is a pretty large, multi-arch series unifying and generalizing
the various disjunct pieces of idle routines that architectures have
historically copied from each other and have grown in random, wildly
inconsistent and sometimes buggy directions:
101 files changed, 455 insertions(+), 1328 deletions(-)
this went through a number of review and test iterations before it was
committed, it was tested on various architectures, was exposed to
linux-next for quite some time - nevertheless it might cause problems
on architectures that don't read the mailing lists and don't regularly
test linux-next.
This cat herding excercise was motivated by the -rt kernel, and was
brought to you by Thomas "the Whip" Gleixner."
* 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (40 commits)
idle: Remove GENERIC_IDLE_LOOP config switch
um: Use generic idle loop
ia64: Make sure interrupts enabled when we "safe_halt()"
sparc: Use generic idle loop
idle: Remove unused ARCH_HAS_DEFAULT_IDLE
bfin: Fix typo in arch_cpu_idle()
xtensa: Use generic idle loop
x86: Use generic idle loop
unicore: Use generic idle loop
tile: Use generic idle loop
tile: Enter idle with preemption disabled
sh: Use generic idle loop
score: Use generic idle loop
s390: Use generic idle loop
powerpc: Use generic idle loop
parisc: Use generic idle loop
openrisc: Use generic idle loop
mn10300: Use generic idle loop
mips: Use generic idle loop
microblaze: Use generic idle loop
...
- Populate the boot_params with EDD data.
- Cleanups in the IRQ code.
Bug-fixes:
- CPU hotplug offline/online in PVHVM mode.
- Re-upload processor PM data after ACPI S3 suspend/resume cycle.
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQEcBAABAgAGBQJRcWFDAAoJEFjIrFwIi8fJQScIAJmnsPzoqd/USahqgbOvc0Nb
fJgzvno0Lfuvi5NipMpMDNCpCxRuikoJe6ocoF6E0+poucckRGDFSC3R2KVgLR2O
pXkO7bzggkUYkLehW7gXmM4IlvhLxnsfEofDWxG2VJy6RfWxFk+84v/uimQSIB0D
jI9FB3oUhfA+IT8/3Iofyv2OiPH39zNLlzidAnVXmJ8SoJFDLt3l1IymYqVdq6Lt
AKL0IXa31f+yfj9Vli1mkBsxuEIvIo5tVQNn250B4cMlR8mcWsQBnxIeA2M+8Jhl
d2ereTBwhjohCrFR/TWlXYrMlL+XVucI7J7agf3tF4kmDUqjg+c9hUkszSOPjEc=
=yJWr
-----END PGP SIGNATURE-----
Merge tag 'stable/for-linus-3.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen
Pull Xen updates from Konrad Rzeszutek Wilk:
"Features:
- Populate the boot_params with EDD data.
- Cleanups in the IRQ code.
Bug-fixes:
- CPU hotplug offline/online in PVHVM mode.
- Re-upload processor PM data after ACPI S3 suspend/resume cycle."
And Konrad gets a gold star for sending the pull request early when he
thought he'd be away for the first week of the merge window (but because
of 3.9 dragging out to -rc8 he then re-sent the reminder on the first
day of the merge window anyway)
* tag 'stable/for-linus-3.10-rc0-tag' of git://git.kernel.org/pub/scm/linux/kernel/git/konrad/xen:
xen: resolve section mismatch warnings in xen-acpi-processor
xen: Re-upload processor PM data to hypervisor after S3 resume (v2)
xen/smp: Unifiy some of the PVs and PVHVM offline CPU path
xen/smp/pvhvm: Don't initialize IRQ_WORKER as we are using the native one.
xen/spinlock: Disable IRQ spinlock (PV) allocation on PVHVM
xen/spinlock: Check against default value of -1 for IRQ line.
xen/time: Add default value of -1 for IRQ and check for that.
xen/events: Check that IRQ value passed in is valid.
xen/time: Fix kasprintf splat when allocating timer%d IRQ line.
xen/smp/spinlock: Fix leakage of the spinlock interrupt line for every CPU online/offline
xen/smp: Fix leakage of timer interrupt line for every CPU online/offline.
xen kconfig: fix select INPUT_XEN_KBDDEV_FRONTEND
xen: drop tracking of IRQ vector
x86/xen: populate boot_params with EDD data
There is no need to use the PV version of the IRQ_WORKER mechanism
as under PVHVM we are using the native version. The native
version is using the SMP API.
They just sit around unused:
69: 0 0 xen-percpu-ipi irqwork0
83: 0 0 xen-percpu-ipi irqwork1
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
See git commit f10cd522c5
(xen: disable PV spinlocks on HVM) for details.
But we did not disable it everywhere - which means that when
we boot as PVHVM we end up allocating per-CPU irq line for
spinlock. This fixes that.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
The default (uninitialized) value of the IRQ line is -1.
Check if we already have allocated an spinlock interrupt line
and if somebody is trying to do it again. Also set it to -1
when we offline the CPU.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
If the timer interrupt has been de-init or is just now being
initialized, the default value of -1 should be preset as
interrupt line. Check for that and if something is odd
WARN us.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
When we online the CPU, we get this splat:
smpboot: Booting Node 0 Processor 1 APIC 0x2
installing Xen timer for CPU 1
BUG: sleeping function called from invalid context at /home/konrad/ssd/konrad/linux/mm/slab.c:3179
in_atomic(): 1, irqs_disabled(): 0, pid: 0, name: swapper/1
Pid: 0, comm: swapper/1 Not tainted 3.9.0-rc6upstream-00001-g3884fad #1
Call Trace:
[<ffffffff810c1fea>] __might_sleep+0xda/0x100
[<ffffffff81194617>] __kmalloc_track_caller+0x1e7/0x2c0
[<ffffffff81303758>] ? kasprintf+0x38/0x40
[<ffffffff813036eb>] kvasprintf+0x5b/0x90
[<ffffffff81303758>] kasprintf+0x38/0x40
[<ffffffff81044510>] xen_setup_timer+0x30/0xb0
[<ffffffff810445af>] xen_hvm_setup_cpu_clockevents+0x1f/0x30
[<ffffffff81666d0a>] start_secondary+0x19c/0x1a8
The solution to that is use kasprintf in the CPU hotplug path
that 'online's the CPU. That is, do it in in xen_hvm_cpu_notify,
and remove the call to in xen_hvm_setup_cpu_clockevents.
Unfortunatly the later is not a good idea as the bootup path
does not use xen_hvm_cpu_notify so we would end up never allocating
timer%d interrupt lines when booting. As such add the check for
atomic() to continue.
CC: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
While we don't use the spinlock interrupt line (see for details
commit f10cd522c5 -
xen: disable PV spinlocks on HVM) - we should still do the proper
init / deinit sequence. We did not do that correctly and for the
CPU init for PVHVM guest we would allocate an interrupt line - but
failed to deallocate the old interrupt line.
This resulted in leakage of an irq_desc but more importantly this splat
as we online an offlined CPU:
genirq: Flags mismatch irq 71. 0002cc20 (spinlock1) vs. 0002cc20 (spinlock1)
Pid: 2542, comm: init.late Not tainted 3.9.0-rc6upstream #1
Call Trace:
[<ffffffff811156de>] __setup_irq+0x23e/0x4a0
[<ffffffff81194191>] ? kmem_cache_alloc_trace+0x221/0x250
[<ffffffff811161bb>] request_threaded_irq+0xfb/0x160
[<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
[<ffffffff813a8423>] bind_ipi_to_irqhandler+0xa3/0x160
[<ffffffff81303758>] ? kasprintf+0x38/0x40
[<ffffffff8104c6f0>] ? xen_spin_trylock+0x20/0x20
[<ffffffff810cad35>] ? update_max_interval+0x15/0x40
[<ffffffff816605db>] xen_init_lock_cpu+0x3c/0x78
[<ffffffff81660029>] xen_hvm_cpu_notify+0x29/0x33
[<ffffffff81676bdd>] notifier_call_chain+0x4d/0x70
[<ffffffff810bb2a9>] __raw_notifier_call_chain+0x9/0x10
[<ffffffff8109402b>] __cpu_notify+0x1b/0x30
[<ffffffff8166834a>] _cpu_up+0xa0/0x14b
[<ffffffff816684ce>] cpu_up+0xd9/0xec
[<ffffffff8165f754>] store_online+0x94/0xd0
[<ffffffff8141d15b>] dev_attr_store+0x1b/0x20
[<ffffffff81218f44>] sysfs_write_file+0xf4/0x170
[<ffffffff811a2864>] vfs_write+0xb4/0x130
[<ffffffff811a302a>] sys_write+0x5a/0xa0
[<ffffffff8167ada9>] system_call_fastpath+0x16/0x1b
cpu 1 spinlock event irq -16
smpboot: Booting Node 0 Processor 1 APIC 0x2
And if one looks at the /proc/interrupts right after
offlining (CPU1):
70: 0 0 xen-percpu-ipi spinlock0
71: 0 0 xen-percpu-ipi spinlock1
77: 0 0 xen-percpu-ipi spinlock2
There is the oddity of the 'spinlock1' still being present.
CC: stable@vger.kernel.org
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
In the PVHVM path when we do CPU online/offline path we would
leak the timer%d IRQ line everytime we do a offline event. The
online path (xen_hvm_setup_cpu_clockevents via
x86_cpuinit.setup_percpu_clockev) would allocate a new interrupt
line for the timer%d.
But we would still use the old interrupt line leading to:
kernel BUG at /home/konrad/ssd/konrad/linux/kernel/hrtimer.c:1261!
invalid opcode: 0000 [#1] SMP
RIP: 0010:[<ffffffff810b9e21>] [<ffffffff810b9e21>] hrtimer_interrupt+0x261/0x270
.. snip..
<IRQ>
[<ffffffff810445ef>] xen_timer_interrupt+0x2f/0x1b0
[<ffffffff81104825>] ? stop_machine_cpu_stop+0xb5/0xf0
[<ffffffff8111434c>] handle_irq_event_percpu+0x7c/0x240
[<ffffffff811175b9>] handle_percpu_irq+0x49/0x70
[<ffffffff813a74a3>] __xen_evtchn_do_upcall+0x1c3/0x2f0
[<ffffffff813a760a>] xen_evtchn_do_upcall+0x2a/0x40
[<ffffffff8167c26d>] xen_hvm_callback_vector+0x6d/0x80
<EOI>
[<ffffffff81666d01>] ? start_secondary+0x193/0x1a8
[<ffffffff81666cfd>] ? start_secondary+0x18f/0x1a8
There is also the oddity (timer1) in the /proc/interrupts after
offlining CPU1:
64: 1121 0 xen-percpu-virq timer0
78: 0 0 xen-percpu-virq timer1
84: 0 2483 xen-percpu-virq timer2
This patch fixes it.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
CC: stable@vger.kernel.org
During early setup of a dom0 kernel, populate boot_params with the
Enhanced Disk Drive (EDD) and MBR signature data. This makes
information on the BIOS boot device available in /sys/firmware/edd/.
Signed-off-by: David Vrabel <david.vrabel@citrix.com>
Acked-by: Jan Beulich <jbeulich@suse.com>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Pull x86 fixes from Ingo Molnar:
"Misc fixes"
* 'x86-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
x86/mm: Flush lazy MMU when DEBUG_PAGEALLOC is set
x86/mm/cpa/selftest: Fix false positive in CPA self test
x86/mm/cpa: Convert noop to functional fix
x86, mm: Patch out arch_flush_lazy_mmu_mode() when running on bare metal
x86, mm, paravirt: Fix vmalloc_fault oops during lazy MMU updates
The two use-cases where we needed to store the GDT were during ACPI S3 suspend
and resume. As the patches:
x86/gdt/i386: store/load GDT for ACPI S3 or hibernation/resume path is not needed
x86/gdt/64-bit: store/load GDT for ACPI S3 or hibernate/resume path is not needed.
have demonstrated - there are other mechanism by which the GDT is
saved and reloaded during early resume path.
Hence we do not need to worry about the pvops call-chain for saving the
GDT and can and can eliminate it. The other areas where the store_gdt is
used are never going to be hit when running under the pvops platforms.
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Link: http://lkml.kernel.org/r/1365194544-14648-4-git-send-email-konrad.wilk@oracle.com
Signed-off-by: H. Peter Anvin <hpa@linux.intel.com>