We need to ensure that state is pushed out from the L2 cache when
suspending so that the resume paths can access their data before the
MMU and caches have been re-initialized. Add the necessary calls to
__cpu_suspend_save().
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Convert some of the sleep.S guts to C code, which makes it easier to
use our macros and to add L2 cache handling. We provide a helper
function, __cpu_suspend_save(), which deals with saving the common
state, setting up for resume, and flushing caches.
The remainder left as assembly code is the saving of the CPU general
purpose registers, and allocating space on the stack to save the CPU
specific registers and resume state.
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
We don't require cpu_resume_turn_mmu_on as we can combine the ldr
instruction with the following code provided we ensure that
cpu_resume_mmu is aligned for older CPUs. Note that we also align
to a 32-byte boundary to ensure that the code can't cross a section
boundary.
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Only use the preallocated page table during the resume, not while
suspending. This avoids the overhead of having to switch unnecessarily
to the resume page table in the suspend path.
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Preallocate a page table and setup an identity mapping for the MMU
enable code. This means we don't have to "borrow" a page table to
do this, avoiding complexities with L2 cache coherency.
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
Ensure that the return value from __cpu_suspend is non-zero when
aborting. Zero indicates a successful suspend occurred.
Tested-by: Santosh Shilimkar <santosh.shilimkar@ti.com>
Tested-by: Shawn Guo <shawn.guo@linaro.org>
Tested-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk>
These benchmarks show the basic speed of kprobes and verify the success
of optimisations done to the emulation of typical function entry
instructions (i.e. push/stmdb).
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
This is used to verify that all combinations of CPU instructions
described by the kprobes decoding tables have a test case.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
These check that the bitmask and match value used in the decoding tables
are self consistent.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
The test code will be using kprobes' internal decoding tables so we
need to export these for when then the tests are compiled as a module.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
On ARM we have to simulate/emulate CPU instructions in order to
singlestep them. This patch adds a framework which can be used to
construct test cases for different instruction forms. It is described in
detail in the in-source comments of kprobes-test.c
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
These test that the different kinds of probes can be successfully placed
into ARM and Thumb code and that the handlers are called correctly when
this code is executed.
Signed-off-by: Jon Medhurst <tixy@yxit.co.uk>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Currently, armpmu_enable iterates through the events for a given
counter set, calling armpmu->enable on each before calling
armpmu->start to start the PMU's counters.
As armpmu->enable is called when each event is added, each event is
already configured in hardware. Due to this, calling armpmu->enable
in armpmu_enable is unnecessary and confusing.
This patch removes the unnecessary calls to armpmu->enable.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, struct arm_pmu and related functions are only visible to
{,arch/arm/}/kernel/perf_event.c. This prevents new drivers from using
the framework.
This patch moves declarations to asm/pmu.h, allowing new PMU drivers
to use the framework.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently struct cpu_hw_events stores data on events running on a
PMU associated with a CPU. As this data is general enough to be used
for system PMUs, this name is a misnomer, and may cause confusion when
it is used for system PMUs.
Additionally, 'armpmu' is commonly used as a parameter name for an
instance of struct arm_pmu. The name is also used for a global instance
which represents the CPU's PMU.
As cpu_hw_events is now not tied to CPU PMUs, it is renamed to
pmu_hw_events, with instances of it renamed similarly. As the global
'armpmu' is CPU-specfic, it is renamed to cpu_pmu. This should make it
clearer which code is generic, and which is coupled with the CPU.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently the event accounting data in pmu_hw_events is stored in
fixed-sized arrays within the structure.
This patch refactors the accounting data to allow any number of events
to be managed.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, a single static instance of struct pmu is used when
registering an ARM PMU with the main perf subsystem. This limits
the ARM perf code to supporting a single PMU.
This patch replaces the static struct pmu instance with a member
variable on struct arm_pmu. This provides bidirectional mapping
between the two structs, and therefore allows for support of multiple
PMUs. The function 'to_arm_pmu' is provided for convenience.
PMU-generic functions are also updated to use the new mapping, and
PMU-generic initialisation of the member variables is moved into a new
function: armpmu_init.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently mapping an event type to a hardware configuration value
depends on the data being pointed to from struct arm_pmu. These fields
(cache_map, event_map, raw_event_mask) are currently specific to CPU
PMUs, and do not serve the general case well.
This patch replaces the event map pointers on struct arm_pmu with a new
'map_event' function pointer. Small shim functions are used to reuse
the existing common code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, the ARM perf code assumes all PMUs it will handle are
CPU PMUs, having ARM_PMU_DEVICE_CPU hardcoded when reserving or
releasing hardware. This means that currently, the ARM perf code can't
support system PMUs.
This patch adds a 'type' field to struct arm_pmu, which allows the code
to reserve & release the hardware regardless of the PMU type.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, a single lock serialises access to CPU PMU registers. This
global locking is unnecessary as PMU registers are local to the CPU
they monitor.
This patch replaces the global lock with a per-CPU lock. As the lock is
in struct cpu_hw_events, PMUs providing a single cpu_hw_events instance
can be locked globally.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
As armpmu_disable will call armpmu->stop when the last event has been
removed, this is pointless and simply adds to the noise when debugging.
Additionally, due to this call occurring in a preemptible context, this
is problematic for per-cpu locking of PMU registers (where we will
attempt to access per-cpu spinlock for use with raw_spin_lock_irqsave).
This patch removes the call to armpmu->stop.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, cpu_hw_events is a global per-CPU variable. To enable
support for multiple PMUs, there needs to be a mapping from an instance
of arm_pmu to its cpu_hw_events. Additionally, as system PMUs are not
CPU-affine, they should not have this stored per-CPU.
This patch moves access to the hardware events data behind an accessor
function (arm_pmu::get_hw_events). This allows each instance to have
its own hardware event data, which can be stored per-CPU or globally as
required.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently the ARM perf code supports having a single struct
platform_device to supply IRQ numbers, limiting it to supporting a
single PMU.
This patch makes a platform_device instance variable on struct arm_pmu.
This should allow for multiple PMUs to be supported in future.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch moves the active_events counter into struct arm_pmu, in
preparation for supporting multiple PMUs. This also moves
pmu_reserve_mutex, as it is used to guard accesses to active_events.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, pmu_hw_events::active_mask is used to keep track of which
events are active in hardware. As we can stop counters and their
interrupts, this is unnecessary.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, event group validation compares each event's 'pmu' pointer
against the static 'pmu' pointer. This limits the code to supporting
only 1 PMU.
This patch changes the behaviour to consider an event's group leader's
'pmu' pointer as canonical for validation. This should ease later
generalisation of the code to support multiple PMUs at once.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Currently, an "empty" struct pmu is registered as the CPU PMU,
regardless of whether there is a physical PMU. This burdens the
accessor functions with checks to see whether a PMU is actually
present.
This patch changes initialisation to register a PMU only if there is a
supported PMU present, and removes the checks that this change makes
redundant.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Reviewed-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Ashwin Chaugule <ashwinc@codeaurora.org>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ARM hw_breakpoint backend is currently a bit too noisy when things
start to go awry.
This patch removes a couple of over-zealous WARN_ONCE invocations and
replaces then with pr_warnings instead.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ARM debug registers can only be accessed if the DBGSWENABLE signal
to the core is driven HIGH by the DAP. The architecture does not provide
a way to detect the value of this signal, so the best we can do is
register an undef_hook to trap debug register co-processor accesses and
then fail if the trap is taken.
Signed-off-by: Will Deacon <will.deacon@arm.com>
ARM debug architecture 7.1 mandates that the DFAR is updated on a
watchpoint debug exception to contain the faulting virtual address
of the memory access. This allows us to determine which watchpoints
have fired and therefore report useful information to userspace.
This patch adds support for using the DFAR in the watchpoint handler,
which allows us to support multiple watchpoints on CPUs implementing
the new debug architecture.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The current hw_breakpoint code on ARM reserves 1 breakpoint for each
watchpoint that is available. Since debug architectures prior to 7.1
are restricted to 1 watchpoint anyway, only one breakpoint was ever
reserved.
This patch changes the reservation strategy so that a single breakpoint
is reserved, regardless of the number of watchpoints. This is in
preparation for multiple-watchpoint support on debug architectures
from 7.1 onwards.
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch adds initial support for Cortex-A15 (debug architecture v7.1)
to the hw_breakpoint ARM backend.
Signed-off-by: Will Deacon <will.deacon@arm.com>
The Cortex-A15 PMU implements the PMUv2 specification and therefore
has support for some mode exclusion.
This patch adds support for excluding user, kernel and hypervisor counts
from a given event.
Acked-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Modern PMUs allow for mode exclusion, so we no longer wish to return
-EPERM if it is requested.
This patch provides a hook in the armpmu structure for implementing
mode exclusion. The hw_perf_event initialisation is slightly delayed so
that the backend code can update the structure if required.
Acked-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
ARM PMU code used to use 1-based indices for PMU registers. This caused
several data structures (pmu_hw_events::{active_events, used_mask, events})
to have an unused element at index zero. ARMPMU_MAX_HWEVENTS still takes
this indexing into account, and currently equates to 33.
This patch updates the core ARM perf code to use the 0th index again.
Acked-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Now that the ARMv7 PMU backend indexes event counters from zero, follow
suit and do the same for ARMv6 and Xscale.
Acked-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The current ARMv7 PMU backend indexes event counters from two, with
index zero being reserved and index one being used to represent the
cycle counter.
This patch tidies up the code by indexing from one instead (with zero
for the cycle counter). This allows us to remove many of the accessor
macros along with the counter enumeration and makes the code much more
readable.
Acked-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch ensures that integers are used to represent event indices in
the ARMv7 PMU backend. This ensures consistency between functions and
also with the arm_pmu structure.
Acked-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The ARMv7 perf backend mixes up u32 and unsigned long, which is rather
ugly.
This patch makes the ARMv7 PMU code consistently use the u32 type
instead.
Acked-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Commit 5dfc54e0 ("ARM: GIC: avoid routing interrupts to offline CPUs")
prevents the GIC from setting the affinity of an IRQ to a CPU with
id >= nr_cpu_ids. This was previously abused by perf on some platforms
where more IRQs were registered than possible CPUs.
This patch fixes the problem by using a cpumask_t to keep track of the
active (requested) interrupts in perf. The same effect could be achieved
by limiting the number of IRQs to the number of CPUs, but using a mask
instead will be useful for adding extended CPU hotplug support in the
future.
Acked-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Once upon a time, OProfile and Perf fought hard over who could play with
the PMU. To stop all hell from breaking loose, pmu.c offered an internal
reserve/release API and took care of parsing PMU platform data passed in
from board support code.
Now that Perf has ingested OProfile, let's move the platform device
handling into the Perf driver and out of the PMU locking code.
Unfortunately, the lock has to remain to prevent Perf being bitten by
out-of-tree modules such as LTTng, which still claim a right to the PMU
when Perf isn't looking.
Acked-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch removes const qualifiers from instances of struct arm_pmu,
and functions initialising them, in preparation for generalising
arm_pmu usage to system (AKA uncore) PMUs.
This will allow for dynamically modifiable structures (locks,
struct pmu) to be added as members of struct arm_pmu.
Acked-by: Jamie Iles <jamie@jamieiles.com>
Reviewed-by: Jean Pihet <j-pihet@ti.com>
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>