Our selection of interrupts to consider for IRQ migration is sub-
standard. We were potentially including per-CPU interrupts in our
migration strategy, but omitting chained interrupts. This caused
some interrupts to remain on a downed CPU.
We were also trying to migrate interrupts which were not migratable,
resulting in an OOPS.
Instead, iterate over all interrupts, skipping per-CPU interrupts
or interrupts whose affinity does not include the downed CPU, and
attempt to set the affinity for every one else if their chip
implements irq_set_affinity().
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Now that the GIC takes care of selecting a target interrupt from the
affinity mask, we don't need all this complexity in the core code
anymore. Just detect when we need to break affinity.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
irqdesc's node member is supposed to mark the numa node number for the
interrupt. Our use of it is non-standard. Remove this, replacing the
functionality with a test of the affinity mask.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Currently, the documented kernel entry requirements are not
explicit about whether the kernel should be entered in ARM or
Thumb, leading to an ambiguitity about how to enter Thumb-2
kernels. As a result, the kernel is reliant on the zImage
decompressor to enter the kernel proper in the correct instruction
set state.
This patch changes the boot entry protocol for head.S and Image to
be the same as for zImage: in all cases, the kernel is now entered
in ARM.
Documentation/arm/Booting is updated to reflect this new policy.
A different rule will be needed for Cortex-M class CPUs as and when
support for those lands in mainline, since these CPUs don't support
the ARM instruction set at all: a note is added to the effect that
the kernel must be entered in Thumb on such systems.
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Fix a hole in the VFP thread migration. Lets define two threads.
Thread 1, we'll call 'interesting_thread' which is a thread which is
running on CPU0, using VFP (so vfp_current_hw_state[0] =
&interesting_thread->vfpstate) and gets migrated off to CPU1, where
it continues execution of VFP instructions.
Thread 2, we'll call 'new_cpu0_thread' which is the thread which takes
over on CPU0. This has also been using VFP, and last used VFP on CPU0,
but doesn't use it again.
The following code will be executed twice:
cpu = thread->cpu;
/*
* On SMP, if VFP is enabled, save the old state in
* case the thread migrates to a different CPU. The
* restoring is done lazily.
*/
if ((fpexc & FPEXC_EN) && vfp_current_hw_state[cpu]) {
vfp_save_state(vfp_current_hw_state[cpu], fpexc);
vfp_current_hw_state[cpu]->hard.cpu = cpu;
}
/*
* Thread migration, just force the reloading of the
* state on the new CPU in case the VFP registers
* contain stale data.
*/
if (thread->vfpstate.hard.cpu != cpu)
vfp_current_hw_state[cpu] = NULL;
The first execution will be on CPU0 to switch away from 'interesting_thread'.
interesting_thread->cpu will be 0.
So, vfp_current_hw_state[0] points at interesting_thread->vfpstate.
The hardware state will be saved, along with the CPU number (0) that
it was executing on.
'thread' will be 'new_cpu0_thread' with new_cpu0_thread->cpu = 0.
Also, because it was executing on CPU0, new_cpu0_thread->vfpstate.hard.cpu = 0,
and so the thread migration check is not triggered.
This means that vfp_current_hw_state[0] remains pointing at interesting_thread.
The second execution will be on CPU1 to switch _to_ 'interesting_thread'.
So, 'thread' will be 'interesting_thread' and interesting_thread->cpu now
will be 1. The previous thread executing on CPU1 is not relevant to this
so we shall ignore that.
We get to the thread migration check. Here, we discover that
interesting_thread->vfpstate.hard.cpu = 0, yet interesting_thread->cpu is
now 1, indicating thread migration. We set vfp_current_hw_state[1] to
NULL.
So, at this point vfp_current_hw_state[] contains the following:
[0] = &interesting_thread->vfpstate
[1] = NULL
Our interesting thread now executes a VFP instruction, takes a fault
which loads the state into the VFP hardware. Now, through the assembly
we now have:
[0] = &interesting_thread->vfpstate
[1] = &interesting_thread->vfpstate
CPU1 stops due to ptrace (and so saves its VFP state) using the thread
switch code above), and CPU0 calls vfp_sync_hwstate().
if (vfp_current_hw_state[cpu] == &thread->vfpstate) {
vfp_save_state(&thread->vfpstate, fpexc | FPEXC_EN);
BANG, we corrupt interesting_thread's VFP state by overwriting the
more up-to-date state saved by CPU1 with the old VFP state from CPU0.
Fix this by ensuring that we have sane semantics for the various state
describing variables:
1. vfp_current_hw_state[] points to the current owner of the context
information stored in each CPUs hardware, or NULL if that state
information is invalid.
2. thread->vfpstate.hard.cpu always contains the most recent CPU number
which the state was loaded into or NR_CPUS if no CPU owns the state.
So, for a particular CPU to be a valid owner of the VFP state for a
particular thread t, two things must be true:
vfp_current_hw_state[cpu] == &t->vfpstate && t->vfpstate.hard.cpu == cpu.
and that is valid from the moment a CPU loads the saved VFP context
into the hardware. This gives clear and consistent semantics to
interpreting these variables.
This patch also fixes thread copying, ensuring that t->vfpstate.hard.cpu
is invalidated, otherwise CPU0 may believe it was the last owner. The
hole can happen thus:
- thread1 runs on CPU2 using VFP, migrates to CPU3, exits and thread_info
freed.
- New thread allocated from a previously running thread on CPU2, reusing
memory for thread1 and copying vfp.hard.cpu.
At this point, the following are true:
new_thread1->vfpstate.hard.cpu == 2
&new_thread1->vfpstate == vfp_current_hw_state[2]
Lastly, this also addresses thread flushing in a similar way to thread
copying. Hole is:
- thread runs on CPU0, using VFP, migrates to CPU1 but does not use VFP.
- thread calls execve(), so thread flush happens, leaving
vfp_current_hw_state[0] intact. This vfpstate is memset to 0 causing
thread->vfpstate.hard.cpu = 0.
- thread migrates back to CPU0 before using VFP.
At this point, the following are true:
thread->vfpstate.hard.cpu == 0
&thread->vfpstate == vfp_current_hw_state[0]
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
x86 uses _text to mark the start of the kernel image including the
head text, and _stext to mark the start of the .text section. Change
our vmlinux.lds to conform. An audit of the places which use _stext
and _text in arch/arm indicates no users of either symbol are impacted
by this change. It does mean a slight change to /proc/iomem output.
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Place the init sections between the text and data sections. This
means all code is grouped together at the beginning of the kernel
image, and all data is at the end of the image. This avoids problems
with the 24-bit branch instruction relocations becoming invalid with
large initramfs images.
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
RODATA() already handles these sections, so allow it to take care
of them for us.
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Keep the various linker tables as separate output sections rather
than combining them together into one big .init section. This
makes the 'vmlinux' easier to see what is placed where.
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Rather than scattering the discarded sections throughout the linker
file, move them to the start.
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Tested-by: Stephen Boyd <sboyd@codeaurora.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
To get hundredths of MHz the rate needs to be divided by 10'000.
Here is an example:
twd_timer_rate = 123456789
Before the patch:
twd_timer_rate / 1000000 = 123
(twd_timer_rate / 1000000) % 100 = 23
Result: 123.23MHz.
After being fixed:
twd_timer_rate / 1000000 = 123
(twd_timer_rate / 10000) % 100 = 45
Result: 123.45MHz.
Signed-off-by: Vitaly Kuzmichev <vkuzmichev@mvista.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
If an ARM system has multiple cpus in the same socket and the
kernel is booted with maxcpus=1, secondary cpus are possible but
not present due to how platform_smp_prepare_cpus() is called.
Since most typical ARM processors don't actually support physical
hotplug, initialize the present map to be equal to the possible
map in generic ARM SMP code. Also, always call
platform_smp_prepare_cpus() as long as max_cpus is non-zero (0
means no SMP) to allow platform code to do any SMP setup.
After applying this patch it's possible to boot an ARM system
with maxcpus=1 on the command line and then hotplug in secondary
cpus via sysfs. This is more in line with how x86 does things.
Signed-off-by: Stephen Boyd <sboyd@codeaurora.org>
Cc: Paul Mundt <lethal@linux-sh.org>
Cc: Kukjin Kim <kgene.kim@samsung.com>
Cc: David Brown <davidb@codeaurora.org>
Cc: Tony Lindgren <tony@atomide.com>
Cc: Srinidhi Kasagar <srinidhi.kasagar@stericsson.com>
Cc: Linus Walleij <linus.walleij@stericsson.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The scu_power_mode function can be used on UP builds as it drives signals
to an SOC power controller. So make it selectable for !SMP.
Signed-off-by: Rob Herring <rob.herring@calxeda.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
By allowing code to detect whether DTCM or ITCM is present, code paths
involving TCM can be avoided when running on platforms that lack it.
This is good for creating single kernels across several archs, if some
of them utilize TCM but others don't.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The PB11MPCore reports "3" DTCM banks, but anything above 2 is an
"undefined" value, so push this to become 0. Further add some checks
if code is compiled to TCM even if there is no D/ITCM present in the
system, and if we can really fit the compiled code. We don't do the
BUG() since it's not helpful, it's better to deal with non-present
TCM dynamically. If there is nothing compiled to the TCM and no TCM
is detected, it will now just shut up even if TCM support is enabled.
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ensure that the meminfo array is sanity checked before we pass the
memory to memblock. This helps to ensure that memblock and meminfo
agree on the dimensions of memory, especially when more memory is
passed than the kernel can deal with.
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
armpmu_enable can be called in situations where no events are present
(for example, from the event rotation tick after a profiled task has
exited). In this case, we currently start the PMU anyway which may
leave it active inevitably without any events being monitored.
This patch adds a simple check to the enabling code so that we avoid
starting the PMU when no events are present.
Cc: <stable@kernel.org>
Reported-by: Ashwin Chaugle <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The SVC IRQ, prefetch and data abort handlers preserve the SPSR value
via r5 across the exception. Rather than re-loading it from pt_regs,
use the preserved value instead.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tail-call the main C data abort handler code from the per-CPU helper
code. Update the comments in the code wrt the new calling and return
register state.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Tail-call the main C prefetch abort handler code from the per-CPU
helper code. Also note that the helper function becomes ABI
compliant in terms of the registers preserved.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This avoids the irq entry assembly corrupting r5, thereby allowing it
to be preserved through to the svc exit code.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
All handlers now call trace_hardirqs_off, so move this common code into
the (svc|usr)_entry assembler macros.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
As we no longer re-enable interrupts in these exception handlers, add
the irqsoff tracing calls to them so that the kernel tracks the state
more accurately.
Note that these calls are conditional on IRQSOFF_TRACER:
kernel ----------> user ---------> kernel
^ irqs enabled ^ irqs disabled
No kernel code can run on the local CPU until we've re-entered the
kernel through one of the exception handlers - and userspace can not
take any locks etc. So, the kernel doesn't care about the IRQ mask
state while userspace is running unless we're doing IRQ off latency
tracing. So, we can (and do) avoid the overhead of updating the IRQ
mask state on every kernel->user and user->kernel transition.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Add irqtrace function calls to the undefined exception handler, so
that we get sane lockdep traces from locking problems in undefined
exception handlers.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Avoid enabling interrupts if the parent context had interrupts enabled
in the abort handler assembly code, and move this into the breakpoint/
page/alignment fault handlers instead.
This gets rid of some special-casing for the breakpoint fault handlers
from the low level abort handler path.
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
There are SoCs where attempting to enter a low power state is ignored,
and the CPU continues executing instructions with all state preserved.
It is over-complex at that point to disable the MMU just to call the
resume path.
Instead, allow the suspend finisher to return error codes to abort
suspend in this circumstance, where the cpu_suspend internals will then
unwind the saved state on the stack. Also omit the tlb flush as no
changes to the page tables will have happened.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This avoids unnecessary instructions for CPUs which implement the IFAR
(instruction fault address register).
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This allows us to avoid moving registers twice to work around the
clobbered registers when we add calls to trace_hardirqs_{on,off}.
Ensure that all SVC handlers return with SPSR in r5 for consistency.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This patch adds support for platform_device_id tables, allowing new
PMU types to be registered with the correct type, without requiring
new platform_driver shims to provide the type. An single entry for
existing devices is provided.
Macros matching functionality of the of_device_id table macros are
provided for convenience.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jamie Iles <jamie@jamieiles.com>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
This is based on an earlier patch from Rob Herring <rob.herring@calxeda.com>
> Add OF match table to enable OF style driver binding. The dts entry is like
> this:
>
> pmu {
> compatible = "arm,cortex-a9-pmu";
> interrupts = <100 101>;
> };
>
> The use of pdev->id as an index breaks with OF device binding, so set the type
> based on the OF compatible string.
This modification sets the PMU hardware type based on data embedded in the
binding, allowing easy addition of new PMU types in future.
Support for new PMU types not provided by devicetree can be added later using
platform_device_id tables in a similar fashion.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jamie Iles <jamie@jamieiles.com>
Acked-by: Rob Herring <rob.herring@calxeda.com>
Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Currently, the PMU reservation framework allows for multiple PMUs of
the same type to register themselves. This can lead to a bug with the
sequence:
register_pmu(pmu1);
reserve_pmu(pmu_type);
register_pmu(pmu2);
release_pmu(pmu1);
Here, pmu1 cannot be released, and pmu2 cannot be reserved.
This patch modifies register_pmu to reject registrations where a PMU is
already present, preventing this problem. PMUs which can have multiple
instances should not use the PMU reservation framework.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jamie Iles <jamie@jamieiles.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Currently, PMU platform_device reservation relies on some minor abuse
of the platform_device::id field for determining the type of PMU. This
is problematic for device tree based probing, where the ID cannot be
controlled.
This patch removes reliance on the id field, and depends on each PMU's
platform driver to figure out which type it is. As all PMUs handled by
the current platform_driver name "arm-pmu" are CPU PMUs, this
convention is hardcoded. New PMU types can be supported through the use
of {of,platform}_device_id tables
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Jamie Iles <jamie@jamieiles.com>
Acked-by: Will Deacon <will.deacon@arm.com>
Cc: Rob Herring <rob.herring@calxeda.com>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
There's no point checking to see whether IRQs were masked in the parent
context when returning from IRQ handling - the fact that we're handling
an IRQ means that the parent context must have had IRQs unmasked.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
irq_enter() and irq_exit() already take care of the preempt_count
handling for interrupts, which increment and decrement the hardirq
bits of the preempt count. So we can remove the preempt count handing
in our IRQ entry/exit assembly, like x86 did some 9 years ago.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Replace r4 with ip for calling abort helpers - ip is allowed to be
corrupted by called functions in the ABI, so it makes more sense to
use such a register.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The first and second arguments shouldn't concern platform code, so
hide them from each platforms caller.
Tested-by: Kevin Hilman <khilman@ti.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
As we have core code dealing with CPU suspend/resume, we can
re-initialize the CPUs exception banked registers via that code rather
than having platforms deal with that level of detail. So, move the
call to cpu_init() out of platform code into core code.
Tested-by: Kevin Hilman <khilman@ti.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
cpu_suspend() has a weird calling method which makes it only possible to
call from assembly code: it returns with a modified stack pointer to
finish the suspend, but on resume, it 'returns' via a provided pointer.
We can make cpu_suspend() appear to be a normal function merely by
swapping the resume pointer argument and the link register.
Do so, and update all callers to take account of this more traditional
behaviour.
Acked-by: Frank Hofmann <frank.hofmann@tomtom.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Save the suspend function pointer onto the stack for use when returning.
Allocate r2 to pass an argument to the suspend function.
Acked-by: Frank Hofmann <frank.hofmann@tomtom.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Avoid using r2 and r3 in the suspend code, allowing these to be
passed further into the function as arguments.
Acked-by: Frank Hofmann <frank.hofmann@tomtom.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Make cpu_suspend()..return function preserve r4 to r11 across a suspend
cycle. This is in preparation of relieving platform support code from
this task.
Acked-by: Frank Hofmann <frank.hofmann@tomtom.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Very little code is different between these two paths now, so extract
the common code.
Acked-by: Frank Hofmann <frank.hofmann@tomtom.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Move the return address for cpu_resume to the top of stack so that
cpu_resume looks more like a normal function.
Acked-by: Frank Hofmann <frank.hofmann@tomtom.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Eliminate the differences between MULTI_CPU and non-MULTI_CPU resume
paths, making the saved structure identical irrespective of the way
the kernel was configured.
Acked-by: Frank Hofmann <frank.hofmann@tomtom.com>
Tested-by: Kevin Hilman <khilman@ti.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
cpu_proc_init() does processor specific initialization, which we do
at boot time. We have been omitting to do this on resume, which
causes some of this initialization to be skipped. We've also been
skipping this on SMP initialization too.
Ensure that cpu_proc_init() is always called appropriately by
moving it into cpu_init(), and move cpu_init() to a more appropriate
point in the boot initialization.
Tested-by: Kevin Hilman <khilman@ti.com>
Acked-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
When we bring a CPU online, we should wait for it to become active
before entering the idle thread, so we know that the scheduler and
thread migration is going to work.
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
The "Thumb bit" of a symbol is only really meaningful for function
symbols (STT_FUNC).
However, sometimes a branch is relocated against a non-function
symbol; for example, PC-relative branches to anonymous assembler
local symbols are typically fixed up against the start-of-section
symbol, which is not a function symbol. Some inline assembler
generates references of this type, such as fixup code generated by
macros in <asm/uaccess.h>.
The existing relocation code for R_ARM_THM_CALL/R_ARM_THM_JUMP24
interprets this case as an error, because the target symbol appears
to be an ARM symbol; but this is really not the case, since the
target symbol is just a base in these cases. The addend defines
the precise offset to the target location, but since the addend is
encoded in a non-interworking Thumb branch instruction, there is no
explicit Thumb bit in the addend. Because these instructions never
interwork, the implied Thumb bit in the addend is 1, and the
destination is Thumb by definition.
This patch removes the extraneous Thumb bit check for non-function
symbols, enabling modules containing the affected relocation types
to be loaded. No modification to the actual relocation code is
required, since this code does not take bit[0] of the
location->destination offset into account in any case.
Function symbols are always checked for interworking conflicts, as
before.
Signed-off-by: Dave Martin <dave.martin@linaro.org>
Acked-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>