We mask the event with the CCI_PMU_EVENT_MASK, before passing
the config to pmu_validate_hw_event(), which causes extra bits
to be ignored and qualifies an invalid event code as valid.
e.g,
$ perf stat -a -C 0 -e CCI_400/config=0x1ff,name=cycles/ sleep 1
Performance counter stats for 'system wide':
506951142 cycles
1.013879626 seconds time elapsed
where, cycles has an event coding of 0xff. This patch also removes
the unnecessary 'event' mask in pmu_write_register, since the config_base
is set by the pmu code after the event is validated.
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
This patch separates the PMU driver code from the low level
CCI driver code and enables the PMU driver for ARM64.
Introduces config options for both.
ARM_CCI400_PORT_CTRL - controls the low level driver code for
CCI400 ports.
ARM_CCI400_PMU - controls the PMU driver code
ARM_CCI400_COMMON - Common defintions for CCI400
This patch also changes:
ARM_CCI - common code for probing the CCI devices. This can be
used for adding support for newer CCI versions(e.g, CCI-500).
Cc: Bartlomiej Zolnierkiewicz <b.zolnierkie@samsung.com>
Cc: Kukjin Kim <kgene@kernel.org>
Cc: Abhilash Kesavan <a.kesavan@samsung.com>
Cc: Liviu Dudau <liviu.dudau@arm.com>
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Sudeep Holla <sudeep.holla@arm.com>
Cc: Nicolas Pitre <nicolas.pitre@linaro.org>
Cc: Punit Agrawal <punit.agrawal@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Avoid secure transactions while probing the CCI PMU. The
existing code makes use of the Peripheral ID2 (PID2) register
to determine the revision of the CCI400, which requires a
secure transaction. This puts a limitation on the usage of the
driver on systems running non-secure Linux(e.g, ARM64).
Updated the device-tree binding for cci pmu node to add the explicit
revision number for the compatible field.
The supported strings are :
arm,cci-400-pmu,r0
arm,cci-400-pmu,r1
arm,cci-400-pmu - DEPRECATED. See NOTE below
NOTE: If the revision is not mentioned, we need to probe the cci revision,
which could be fatal on a platform running non-secure. We need a reliable way
to know if we can poke the CCI registers at runtime on ARM32. We depend on
'mcpm_is_available()' when it is available. mcpm_is_available() returns true
only when there is a registered driver for mcpm. Otherwise, we assume that we
don't have secure access, and skips probing the revision number(ARM64 case).
The MCPM should figure out if it is safe to access the CCI. Unfortunately
there isn't a reliable way to indicate the same via dtb. This patch doesn't
address/change the current situation. It only deals with the CCI-PMU, leaving
the assumptions about the secure access as it has been, prior to this patch.
Cc: devicetree@vger.kernel.org
Cc: Punit Agrawal <punit.agrawal@arm.com>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
CCI400 has different event specifications for PMU, for revsion
0 and revision 1. As of now, we check the revision every single
time before using the parameters for the PMU. This patch abstracts
the details of the pmu models in a struct (cci_pmu_model) and
stores the information in cci_pmu at initialisation time, avoiding
multiple probe operations.
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
No functional changes, only code re-arrangements for easier split of the
PMU code vs low level driver code. Extracts the port handling code
to cci_probe_ports().
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
The perf core implicitly rejects events spanning multiple HW PMUs, as in
these cases the event->ctx will differ. However this validation is
performed after pmu::event_init() is called in perf_init_event(), and
thus pmu::event_init() may be called with a group leader from a
different HW PMU.
The CCI PMU driver does not take this fact into account, and assumes
that the any other hardware event belongs to the CCI. There are two
issues with it :
1) It is wrong and we should reject such groups.
2) Validation allocates an temporary idx for this non-cci event, which leads
to wrong calculation of the counter availability, and eventually lesser
number of events in the group.
This patch updates the CCI PMU driver to first test for and reject
events from other PMUs, which is the right thing to do.
Cc: Will Deacon <will.deacon@arm.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Peter Ziljstra (Intel) <peterz@infradead.org>
Signed-off-by: Suzuki K. Poulose <suzuki.poulose@arm.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
printk and friends can now format bitmaps using '%*pb[l]'. cpumask
and nodemask also provide cpumask_pr_args() and nodemask_pr_args()
respectively which can be used to generate the two printf arguments
necessary to format the specified cpu/nodemask.
* Line termination only requires one extra space at the end of the
buffer. Use PAGE_SIZE - 1 instead of PAGE_SIZE - 2 when formatting.
Signed-off-by: Tejun Heo <tj@kernel.org>
Cc: Russell King <linux@arm.linux.org.uk>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The arm-cci driver completes the probe sequence even if the cci node is
marked as disabled. Add a check in the driver to honour the cci status
in the device tree.
Signed-off-by: Abhilash Kesavan <a.kesavan@samsung.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Sudeep Holla <sudeep.holla@arm.com>
Tested-by: Kevin Hilman <khilman@linaro.org>
Signed-off-by: Olof Johansson <olof@lixom.net>
The ARM CPU PMUs and the ARM CCI PMU are using the same framework
despite being substantially different in programming model, which makes
it difficult to handle either particularly well.
This patch migrates the ARM CCI PMU driver away from the arm_pmu
framework, matching the style of the CCN PMU driver and other 'uncore'
PMU drivers. This will enable refactoring of the arm_pmu framework to
better support CPU PMUs. Event context migration on hotplug is not yet
added due to a race on event->ctx in the core perf code.
Signed-off-by: Mark Rutland <mark.rutland@arm.com>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
[will: fix whitespace issues]
Signed-off-by: Will Deacon <will.deacon@arm.com>
In commit ae91d60ba8, a bug was fixed that
involved converting !x & y to !(x & y). The code below shows the same
pattern, and thus should perhaps be fixed in the same way.
The Coccinelle semantic patch that makes this change is as follows:
// <smpl>
@@ expression E1,E2; @@
(
!E1 & !E2
|
- !E1 & E2
+ !(E1 & E2)
)
// </smpl>
Signed-off-by: Himangi Saraogi <himangi774@gmail.com>
Acked-by: Julia Lawall <julia.lawall@lip6.fr>
Acked-by: Punit Agrawal <punit.agrawal@arm.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
The event numbering changed between revision r0 and r1 of the CCI
PMU. Expose this to userspace to allow tooling to handle the
differences in event numbers.
Suggested-by: Drew Richardson <Drew.Richardson@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
The driver queries the CCI IP revision to distinguish between r0 and r1
scheme for event numbers and currently supports upto version r1p2. To
minimise code churn every time there's a new version of the IP, assume
that event numbering doesn't change for revisions > r1p0 (which is
the case).
The driver will still need an update for future revisions that change
the event numbers.
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
This patch fixes a bug/typo in the CCI driver kcalloc usage
that inadvertently swapped the parameters order in the
kcalloc call and went unnoticed.
Reported-by: Xia Feng <xiafeng@allwinnertech.com>
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Olof Johansson <olof@lixom.net>
Pull ARM updates from Russell King:
"Included in this series are:
1. BE8 (modern big endian) changes for ARM from Ben Dooks
2. big.Little support from Nicolas Pitre and Dave Martin
3. support for LPAE systems with all system memory above 4GB
4. Perf updates from Will Deacon
5. Additional prefetching and other performance improvements from Will.
6. Neon-optimised AES implementation fro Ard.
7. A number of smaller fixes scattered around the place.
There is a rather horrid merge conflict in tools/perf - I was never
notified of the conflict because it originally occurred between Will's
tree and other stuff. Consequently I have a resolution which Will
forwarded me, which I'll forward on immediately after sending this
mail.
The other notable thing is I'm expecting some build breakage in the
crypto stuff on ARM only with Ard's AES patches. These were merged
into a stable git branch which others had already pulled, so there's
little I can do about this. The problem is caused because these
patches have a dependency on some code in the crypto git tree - I
tried requesting a branch I can pull to resolve these, and all I got
each time from the crypto people was "we'll revert our patches then"
which would only make things worse since I still don't have the
dependent patches. I've no idea what's going on there or how to
resolve that, and since I can't split these patches from the rest of
this pull request, I'm rather stuck with pushing this as-is or
reverting Ard's patches.
Since it should "come out in the wash" I've left them in - the only
build problems they seem to cause at the moment are with randconfigs,
and since it's a new feature anyway. However, if by -rc1 the
dependencies aren't in, I think it'd be best to revert Ard's patches"
I resolved the perf conflict roughly as per the patch sent by Russell,
but there may be some differences. Any errors are likely mine. Let's
see how the crypto issues work out..
* 'for-linus' of git://git.linaro.org/people/rmk/linux-arm: (110 commits)
ARM: 7868/1: arm/arm64: remove atomic_clear_mask() in "include/asm/atomic.h"
ARM: 7867/1: include: asm: use 'int' instead of 'unsigned long' for 'oldval' in atomic_cmpxchg().
ARM: 7866/1: include: asm: use 'long long' instead of 'u64' within atomic.h
ARM: 7871/1: amba: Extend number of IRQS
ARM: 7887/1: Don't smp_cross_call() on UP devices in arch_irq_work_raise()
ARM: 7872/1: Support arch_irq_work_raise() via self IPIs
ARM: 7880/1: Clear the IT state independent of the Thumb-2 mode
ARM: 7878/1: nommu: Implement dummy early_paging_init()
ARM: 7876/1: clear Thumb-2 IT state on exception handling
ARM: 7874/2: bL_switcher: Remove cpu_hotplug_driver_{lock,unlock}()
ARM: footbridge: fix build warnings for netwinder
ARM: 7873/1: vfp: clear vfp_current_hw_state for dying cpu
ARM: fix misplaced arch_virt_to_idmap()
ARM: 7848/1: mcpm: Implement cpu_kill() to synchronise on powerdown
ARM: 7847/1: mcpm: Factor out logical-to-physical CPU translation
ARM: 7869/1: remove unused XSCALE_PMU Kconfig param
ARM: 7864/1: Handle 64-bit memory in case of 32-bit phys_addr_t
ARM: 7863/1: Let arm_add_memory() always use 64-bit arguments
ARM: 7862/1: pcpu: replace __get_cpu_var_uses
ARM: 7861/1: cacheflush: consolidate single-CPU ARMv7 cache disabling code
...
cci_enable_port_for_self written in asm and it works with h/w
registers that are in little endian format. When run in big
endian mode it needs byteswaped constants before/after it
writes/reads to/from such registers
Signed-off-by: Victor Kamensky <victor.kamensky@linaro.org>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Ben Dooks <ben.dooks@codethink.co.uk>
This patch fix the error handle of function cci_pmu_probe():
- using IS_ERR() instead of NULL test for the return value of
devm_ioremap_resource() since it nerver return NULL.
- remove kfree() for devm_kzalloc allocated memory
- remove dev_warn() since devm_ioremap_resource() has error message
already.
Signed-off-by: Wei Yongjun <yongjun_wei@trendmicro.com.cn>
Signed-off-by: Will Deacon <will.deacon@arm.com>
Extend the existing CCI driver to support the PMU by registering a perf
backend for it.
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Cc: Nicolas Pitre <nico@linaro.org>
Cc: Dave Martin <dave.martin@linaro.org>
Reviewed-by: Will Deacon <will.deacon@arm.com>
Signed-off-by: Punit Agrawal <punit.agrawal@arm.com>
[will: removed broken __init annotations]
Signed-off-by: Will Deacon <will.deacon@arm.com>
Since the CPU device nodes can be retrieved using arch_of_get_cpu_node,
we can use it to avoid parsing the cpus node searching the cpu nodes and
mapping to logical index.
This patch removes parsing DT for cpu nodes by using of_get_cpu_node.
Cc: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Acked-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Sudeep KarkadaNagesha <sudeep.karkadanagesha@arm.com>
When we build a kernel with support for both ARMv6 and ARMv7,
gas is trying to be helpful by pointing out that the arm-cci
driver would not work on ARMv6:
/tmp/ccu1LDeU.s: Assembler messages:
/tmp/ccu1LDeU.s:450: Error: selected processor does not support ARM mode `wfi '
/tmp/ccu1LDeU.s:451: Error: selected processor does not support ARM mode `wfe '
make[4]: *** [drivers/bus/arm-cci.o] Error 1
We know that the driver will only be used on ARMv7, hence we
can annotate the inline assembly listing to allow those instructions.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Nicolas Pitre <nico@linaro.org>
Cc: Dave Martin <dave.martin@linaro.org>
This provides cci_enable_port_for_self(). This is the counterpart to
cci_disable_port_by_cpu(self).
This is meant to be called from the MCPM machine specific power_up_setup
callback code when the appropriate affinity level needs to be initialized.
The code therefore has to be position independent as the MMU is still off
and it cannot rely on any stack space.
Signed-off-by: Nicolas Pitre <nico@linaro.org>
Reviewed-by: Dave Martin <dave.martin@linaro.org>
On ARM multi-cluster systems coherency between cores running on
different clusters is managed by the cache-coherent interconnect (CCI).
It allows broadcasting of TLB invalidates and memory barriers and it
guarantees cache coherency at system level through snooping of slave
interfaces connected to it.
This patch enables the basic infrastructure required in Linux to handle and
programme the CCI component.
Non-local variables used by the CCI management functions called by power
down function calls after disabling the cache must be flushed out to main
memory in advance, otherwise incoherency of those values may occur if they
are sitting in the cache of some other CPU when power down functions
execute. Driver code ensures that relevant data structures are flushed
from inner and outer caches after the driver probe is completed.
CCI slave port resources are linked to set of CPUs through bus masters
phandle properties that link the interface resources to masters node in
the device tree.
Documentation describing the CCI DT bindings is provided with the patch.
Signed-off-by: Lorenzo Pieralisi <lorenzo.pieralisi@arm.com>
Signed-off-by: Nicolas Pitre <nicolas.pitre@linaro.org>