Currently we reject events which have the L3 bank == 1, such as
0x000084918F, because the cache field is non-zero.
However that is incorrect, because although the bank is non-zero, the
value we would write into MMCRC is zero, and so we can count the event.
So fix the check to ignore the bank selector when checking whether the
cache selector is non-zero.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The commit adds a Kconfig option which allows the hv_gpci and hv_24x7
PMUs, added in the preceeding commits, to be built.
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This provides a basic interface between hv_24x7 and perf. Similar to
the one provided for gpci, it lacks transaction support and does not
list any events.
Example usage via perf tool:
perf stat -e 'hv_24x7/domain=2,offset=8,starting_index=0,lpar=0xffffffff/' -r 0 -C 0 -x ' ' sleep 0.1
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This provides a basic link between perf and hv_gpci. Notably, it does
not yet support transactions and does not list any events (they can
still be manually composed).
Example usage via perf tool:
perf stat -e 'hv_gpci/counter_info_version=3,offset=0,length=8,secondary_index=0,starting_index=0xffffffff,request=0x10/' -r 0 -C 0 -x ' ' sleep 0.1
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add two macros which generate functions to extract the relevent bits
from event->attr.config{,1,2}.
EVENT_DEFINE_RANGE() defines an accessor for a range of bits in the
event, as well as a "max" function that gives the maximum value of the
field based on the bit width.
EVENT_DEFINE_RANGE_FORMAT() defines the accessor & max routine and also
a format attribute for use in the PMU's attr_groups.
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
[mpe: move to powerpc, ugly but descriptive macro names]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This exposes a simple way to grab the firmware provided
collect_priveliged, ga, expanded, and lab capability bits. All of these
bits come in from the same gpci request, so we've exposed all of them.
Only the collect_priveliged bit is really used by the hv-gpci/hv-24x7
code, the other bits are simply exposed in sysfs to inform the user.
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
24x7 (also called hv_24x7 or H_24X7) is an interface to obtain
performance counters from the hypervisor. These counters do not have a
fixed format/possition and are instead documented in a "24x7 Catalog",
which is provided by the hypervisor (that interface is also documented
paritialy in the included hv-24x7-catalog.h and fully in at
https://raw.githubusercontent.com/jmesmon/catalog-24x7/master/hv-24x7-catalog.h ).
The 24x7 data access is simply a copy operation into a 4 dimentional
array of 64bit counters (from hypervisor to kernel memory). There is no
interupt triggered on overflow, these are completely disjoint from the
typical power pmu.
This method of obtaining performance counters from the hypervisor is
intended to paritialy replace the gpci interface.
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
"H_GetPerformanceCounterInfo" (refered to as hv_gpci or just gpci from
here on) is an interface to retrieve specific performance counters and
other data from the hypervisor. All outputs have a fixed format. This
header only describes the portions of the interface that we plan on
using in linux at this time.
Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The previous commit added constraint and register handling to allow
processes using EBB (Event Based Branches) to request access to the BHRB
(Branch History Rolling Buffer).
With that in place we can allow processes using EBB to access the BHRB.
This is achieved by setting BHRBA in MMCR0 when we enable EBB access. We
must also clear BHRBA when we are disabling.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We want a way for users of EBB (Event Based Branches) to also access the
BHRB (Branch History Rolling Buffer). EBB does not interoperate with our
existing BHRB support, which is wired into the generic Linux branch
stack sampling support.
To support EBB & BHRB we add three new bits to the event code. The first
bit indicates that the event wants access to the BHRB, and the other two
bits indicate the desired IFM (Instruction Filtering Mode).
We allow multiple events to request access to the BHRB, but they must
agree on the IFM value. Events which are not interested in the BHRB can
also interoperate with events which do.
Finally we program the desired IFM value into MMCRA. Although we do this
for every event, we know that the value will be identical for all events
that request BHRB access.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
We only need to mask the EBB bit out of the event for the check of the
special PMC 5 & 6 events. So use a local to do it just for that code,
rather than changing the event value for the life of the function.
While we're there move the set of mask and value after all the checks.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Rather than using PERF_EVENT_CONFIG_EBB_SHIFT everywhere, add an
EVENT_EBB_SHIFT like every other event and use that.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Although we already block EBB events which request sampling using
sample_period, technically it's possible for an event to set sample_type
but not sample_period.
Nothing terrible will happen if an EBB event does specify sample_type,
but it signals a major confusion on the part of userspace, and so we do
them the favor of rejecting it.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Some power8 revisions have a hardware bug where we can lose a PMU
exception, this commit adds a workaround to detect the bad condition and
rectify the situation.
See the comment in the commit for a full description.
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Currently the sysrq ShowRegs command does not print any PMU registers as
we have an empty definition for perf_event_print_debug(). This patch
defines perf_event_print_debug() to print various PMU registers.
Example output:
CPU: 0 PMU registers, ppmu = POWER7 n_counters = 6
PMC1: 00000000 PMC2: 00000000 PMC3: 00000000 PMC4: 00000000
PMC5: 00000000 PMC6: 00000000 PMC7: deadbeef PMC8: deadbeef
MMCR0: 0000000080000000 MMCR1: 0000000000000000 MMCRA: 0f00000001000000
SIAR: 0000000000000000 SDAR: 0000000000000000 SIER: 0000000000000000
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Fix 32 bit build and rework formatting for compactness]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This patchset adds some missing event list for POWER7 PMU raw
events which are exported through sysfs interface. Also updates
the ABI documentation to add all the sysfs exported raw events.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Right now the config_bhrb() PMU specific call happens after
write_mmcr0(), which actually enables the PMU for event counting and
interrupts. So there is a small window of time where the PMU and BHRB
runs without the required HW branch filter (if any) enabled in BHRB.
This can cause some of the branch samples to be collected through BHRB
without any filter applied and hence affects the correctness of
the results. This patch moves the BHRB config function call before
enabling interrupts.
Here are some data points captured via trace prints which depicts how we
could get PMU interrupts with BHRB filter NOT enabled with a standard
perf record command line (asking for branch record information as well).
$ perf record -j any_call ls
Before the patch:-
ls-1962 [003] d... 2065.299590: .perf_event_interrupt: MMCRA: 40000000000
ls-1962 [003] d... 2065.299603: .perf_event_interrupt: MMCRA: 40000000000
...
All the PMU interrupts before this point did not have the requested
HW branch filter enabled in the MMCRA.
ls-1962 [003] d... 2065.299647: .perf_event_interrupt: MMCRA: 40040000000
ls-1962 [003] d... 2065.299662: .perf_event_interrupt: MMCRA: 40040000000
After the patch:-
ls-1850 [008] d... 190.311828: .perf_event_interrupt: MMCRA: 40040000000
ls-1850 [008] d... 190.311848: .perf_event_interrupt: MMCRA: 40040000000
All the PMU interrupts have the requested HW BHRB branch filter
enabled in MMCRA.
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
[mpe: Fixed up whitespace and cleaned up changelog]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 4df4899 "Add power8 EBB support" included a bug in the handling
of the FAB_CRESP_MATCH and FAB_TYPE_MATCH fields.
These values are pulled out of the event code using EVENT_THR_CTL_SHIFT,
however we were then or'ing that value directly into MMCR1.
This meant we were failing to set the FAB fields correctly, and also
potentially corrupting the value for PMC4SEL. Leading to no counts for
the FAB events and incorrect counts for PMC4.
The fix is simply to shift left the FAB value correctly before or'ing it
with MMCR1.
Reported-by: Sooraj Ravindran Nair <soonair3@in.ibm.com>
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: <stable@vger.kernel.org> # 3.10+
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Pull powerpc updates from Ben Herrenschmidt:
"Here's the powerpc batch for this merge window. Some of the
highlights are:
- A bunch of endian fixes ! We don't have full LE support yet in that
release but this contains a lot of fixes all over arch/powerpc to
use the proper accessors, call the firmware with the right endian
mode, etc...
- A few updates to our "powernv" platform (non-virtualized, the one
to run KVM on), among other, support for bridging the P8 LPC bus
for UARTs, support and some EEH fixes.
- Some mpc51xx clock API cleanups in preparation for a clock API
overhaul
- A pile of cleanups of our old math emulation code, including better
support for using it to emulate optional FP instructions on
embedded chips that otherwise have a HW FPU.
- Some infrastructure in selftest, for powerpc now, but could be
generalized, initially used by some tests for our perf instruction
counting code.
- A pile of fixes for hotplug on pseries (that was seriously
bitrotting)
- The usual slew of freescale embedded updates, new boards, 64-bit
hiberation support, e6500 core PMU support, etc..."
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (146 commits)
powerpc: Correct FSCR bit definitions
powerpc/xmon: Fix printing of set of CPUs in xmon
powerpc/pseries: Move lparcfg.c to platforms/pseries
powerpc/powernv: Return secondary CPUs to firmware on kexec
powerpc/btext: Fix CONFIG_PPC_EARLY_DEBUG_BOOTX on ppc32
powerpc: Cleanup handling of the DSCR bit in the FSCR register
powerpc/pseries: Child nodes are not detached by dlpar_detach_node
powerpc/pseries: Add mising of_node_put in delete_dt_node
powerpc/pseries: Make dlpar_configure_connector parent node aware
powerpc/pseries: Do all node initialization in dlpar_parse_cc_node
powerpc/pseries: Fix parsing of initial node path in update_dt_node
powerpc/pseries: Pack update_props_workarea to map correctly to rtas buffer header
powerpc/pseries: Fix over writing of rtas return code in update_dt_node
powerpc/pseries: Fix creation of loop in device node property list
powerpc: Skip emulating & leave interrupts off for kernel program checks
powerpc: Add more exception trampolines for hypervisor exceptions
powerpc: Fix location and rename exception trampolines
powerpc: Add more trap names to xmon
powerpc/pseries: Add a warning in the case of cross-cpu VPA registration
powerpc: Update the 00-Index in Documentation/powerpc
...
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v2.0.19 (GNU/Linux)
iQEcBAABAgAGBQJSCDSjAAoJEHm+PkMAQRiGDXMIAI7Loae0Oqb1eoeJkvjyZsBS
OJDeeEcn+k58VbxVHyRdc7hGo4yI4tUZm172SpnOaM8sZ/ehPU7zBrwJK2lzX334
/jAM3uvVPfxA2nu0I4paNpkED/NQ8NRRsYE1iTE8dzHXOH6dA3mgp5qfco50rQvx
rvseXpME4KIAJEq4jnyFZF5+nuHiPueM9JftPmSSmJJ3/KY9kY1LESovyWd7ttg1
jYSVPFal9J0E+tl2UQY5g9H16GqhhjYn+39Iei6Q5P4bL4ZubQgTRQTN9nyDc06Z
ezQtGoqZ8kEz/2SyRlkda6PzjSEhgXlc8mCL5J7AW+dMhTHHx2IrosjiCA80kG8=
=c0rK
-----END PGP SIGNATURE-----
Merge tag 'v3.11-rc5' into perf/core
Merge Linux 3.11-rc5, to sync up with the latest upstream fixes since -rc1.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Address some of the trivial sparse warnings in arch/powerpc.
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
e6500 core performance monitors has the following features:
- 6 performance monitor counters
- 512 events supported
- no threshold events
e6500 PMU has more specific events (Data L1 cache misses, Instruction L1
cache misses, etc ) than e500 PMU (which only had Data L1 cache reloads,
etc). Where available, the more specific events have been used which will
produce slightly different results than e500 PMU equivalents.
Signed-off-by: Priyanka Jain <Priyanka.Jain@freescale.com>
Signed-off-by: Lijun Pan <Lijun.Pan@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
There are 6 counters in e6500 core instead of 4 in e500 core.
Signed-off-by: Lijun Pan <Lijun.Pan@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
This change is required after the e6500 perf support has been added.
There are 6 counters in e6500 core instead of 4 in e500 core and
the MAX_HWEVENTS counter should be changed accordingly from 4 to 6.
Added also runtime check for counters overflow.
Signed-off-by: Catalin Udma <catalin.udma@freescale.com>
Signed-off-by: Lijun Pan <Lijun.Pan@freescale.com>
Signed-off-by: Scott Wood <scottwood@freescale.com>
We use bit 63 of the event code for userspace to request that the event
be counted using EBB (Event Based Branches). Export this value, making
it part of the API - though only on processors that support EBB.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
When the task moves around the system, the corresponding cpuhw
per cpu strcuture should be popullated with the BHRB filter
request value so that PMU could be configured appropriately with
that during the next call into power_pmu_enable().
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Completely ignore BHRB privilege state filter request as we are
already configuring that with privilege state filtering attribute
for the accompanying PMU event. This would help achieve cleaner
user space interaction for BHRB.
This patch fixes a situation like this
Before patch:-
------------
./perf record -j any -e branch-misses:k ls
Error:
The sys_perf_event_open() syscall returned with 95 (Operation not
supported) for event (branch-misses:k).
/bin/dmesg may provide additional information.
No CONFIG_PERF_EVENTS=y kernel support configured?
Here 'perf record' actually copies over ':k' filter request into BHRB
privilege state filter config and our previous check in kernel would
fail that.
After patch:-
-------------
./perf record -j any -e branch-misses:k ls
perf perf.data perf.data.old test-mmap-ring
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (~102 samples)]
Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Acked-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The presence or absence of EBB is advertised to userspace via the presence
or absence of PPC_FEATURE2_EBB in cpu_user_features2.
Because the kernel can be built without PMU support, we should only add
PPC_FEATURE2_EBB to cpu_user_features2 when we successfully register the
power8 PMU support.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
. Add missing 'finished_round' event forwarding in 'perf inject', from Adrian Hunter.
. Assorted tidy ups, from Adrian Hunter.
. Fall back to sysfs event names when parsing fails, from Andi Kleen.
. List pmu events in perf list, from Andi Kleen.
. Cleanup some memory allocation/freeing uses, from David Ahern.
. Add option to collapse undesired parts of call graph, from Greg Price.
. Prep work for multi perf data file storage, from Jiri Olsa.
. Add support for more than two files comparision in 'perf diff', from Jiri Olsa
. A few more 'perf test' improvements, from Jiri Olsa
. libtraceevent cleanups, from Namhyung Kim.
. Remove odd build stall in 'perf sched' by moving a large struct initialization
from a local variable to a global one, from Namhyung Kim.
. Add support for callchains in the gtk UI, from Namhyung Kim.
. Do not apply symfs for an absolute vmlinux path, fix from Namhyung Kim.
. Use default include path notation for libtraceevent, from Robert Richter.
. Fix 'make tools/perf', from Robert Richter.
. Make Power7 events available, from Runzhen Wang.
. Add --objdump option to 'perf top', from Sukadev Bhattiprolu.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.13 (GNU/Linux)
iQIcBAABAgAGBQJR6EOuAAoJENZQFvNTUqpAqjEP/1Ist/eR9be8YhljMz8Yxl1o
JXktgxSkMS/n59lRibUuGZrgPKPNxivK6AEbnbZxzZoHDBS8tnAAOXUuUVTtNCoT
YsQurQjCmyXHIvYqwMaYarrhoIv33LdJyshskW3GZ81UfeeC6QoC56he3VTg1dEd
k8snS4F8LJpBQizRJN6s959nF+pyw16wqiGYKJ80G1nhPTsStz8NSSWdCRVbyXl9
fG0S/lLvUfilGT/ixHcvS62ENHiErL4N6jGNV4XeQqoADhrQvCwziDr+BORfJB9K
udbO0PFS5uR4HOGNqZOPZfPxW8cTUXV9cCscLScKEVUghKz9rzHbPTTSDejXna/h
cqLjRW1xpWUmIRY7Y5zSoBJIsh2t3vo4TkZoRNZxhCexoOT/qIUL6bWVZoxqaKzG
xwL4DopOvb/DdUDkb+UB9+9kW4rDoMR1wUb6XXuGx8EqM8LHiA3TAPcGwmNh/IM6
4maUhgyOFad3rm5mcjO7IoCU7NxoWR1dKsjGYteZeZv17X30UfRFwIIH+8l2Ma4o
bpBQ4DKu07jXRUZXcUajOhj7cZO+UmE6c4YoAPpv+CQ1YKVH4/YHYf6RBWdjGLqV
kqOrQuSopVztmaglrh93e+cYkfvNjZAC2EOzhx+UxcPqh9zh1Pil8LE1DwjChW5m
/Y/NPH98nDvXY3N4vlLK
=xk/v
-----END PGP SIGNATURE-----
Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core
Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:
* Add missing 'finished_round' event forwarding in 'perf inject', from Adrian Hunter.
* Assorted tidy ups, from Adrian Hunter.
* Fall back to sysfs event names when parsing fails, from Andi Kleen.
* List pmu events in perf list, from Andi Kleen.
* Cleanup some memory allocation/freeing uses, from David Ahern.
* Add option to collapse undesired parts of call graph, from Greg Price.
* Prep work for multi perf data file storage, from Jiri Olsa.
* Add support for more than two files comparision in 'perf diff', from Jiri Olsa
* A few more 'perf test' improvements, from Jiri Olsa
* libtraceevent cleanups, from Namhyung Kim.
* Remove odd build stall in 'perf sched' by moving a large struct initialization
from a local variable to a global one, from Namhyung Kim.
* Add support for callchains in the gtk UI, from Namhyung Kim.
* Do not apply symfs for an absolute vmlinux path, fix from Namhyung Kim.
* Use default include path notation for libtraceevent, from Robert Richter.
* Fix 'make tools/perf', from Robert Richter.
* Make Power7 events available, from Runzhen Wang.
* Add --objdump option to 'perf top', from Sukadev Bhattiprolu.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Pull perf fixes from Thomas Gleixner:
- fix for do_div() abuse on x86
- locking fix in perf core
- a pile of (build) fixes and cleanups in perf tools
* 'perf-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (27 commits)
perf/x86: Fix incorrect use of do_div() in NMI warning
perf: Fix perf_lock_task_context() vs RCU
perf: Remove WARN_ON_ONCE() check in __perf_event_enable() for valid scenario
perf: Clone child context from parent context pmu
perf script: Fix broken include in Context.xs
perf tools: Fix -ldw/-lelf link test when static linking
perf tools: Revert regression in configuration of Python support
perf tools: Fix perf version generation
perf stat: Fix per-socket output bug for uncore events
perf symbols: Fix vdso list searching
perf evsel: Fix missing increment in sample parsing
perf tools: Update symbol_conf.nr_events when processing attribute events
perf tools: Fix new_term() missing free on error path
perf tools: Fix parse_events_terms() segfault on error path
perf evsel: Fix count parameter to read call in event_format__new
perf tools: fix a typo of a Power7 event name
perf tools: Fix -x/--exclude-other option for report command
perf evlist: Enhance perf_evlist__start_workload()
perf record: Remove -f/--force option
perf record: Remove -A/--append option
...
Power7 supports over 530 different perf events but only a small subset
of these can be specified by name, for the remaining events, we must
specify them by their raw code:
perf stat -e r2003c <application>
This patch makes all the POWER7 events available in sysfs. So we can
instead specify these as:
perf stat -e 'cpu/PM_CMPLU_STALL_DFU/' <application>
where PM_CMPLU_STALL_DFU is the r2003c in previous example.
Before this patch is applied, the size of power7-pmu.o is:
$ size arch/powerpc/perf/power7-pmu.o
text data bss dec hex filename
3073 2720 0 5793 16a1 arch/powerpc/perf/power7-pmu.o
and after the patch is applied, it is:
$ size arch/powerpc/perf/power7-pmu.o
text data bss dec hex filename
15950 31112 0 47062 b7d6 arch/powerpc/perf/power7-pmu.o
For the run time overhead, I use two scripts, one is "event_name.sh",
which contains 50 event names, it looks like:
# ./perf record -e 'cpu/PM_CMPLU_STALL_DFU/' -e ..... /bin/sleep 1
the other one is named "event_code.sh" which use corresponding events
raw
code instead of events names, it looks like:
# ./perf record -e r2003c -e ...... /bin/sleep 1
below is the result.
Using events name:
[root@localhost perf]# time ./event_name.sh
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.002 MB perf.data (~102 samples) ]
real 0m1.192s
user 0m0.028s
sys 0m0.106s
Using events raw code:
[root@localhost perf]# time ./event_code.sh
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.003 MB perf.data (~112 samples) ]
real 0m1.198s
user 0m0.028s
sys 0m0.105s
Signed-off-by: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Cc: icycoder@gmail.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Runzhen Wang <runzhew@clemson.edu>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1372407297-6996-3-git-send-email-runzhen@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In the Power7 PMU guide:
https://www.power.org/documentation/commonly-used-metrics-for-performance-analysis/
PM_BRU_MPRED is referred to as PM_BR_MPRED.
It fixed the typo by changing the name of the event in kernel and
documentation accordingly.
This patch changes the ABI, there are some reasons I think it's ok:
- It is relatively new interface, specific to the Power7 platform.
- No tools that we know of actually use this interface at this point
(none are listed near the interface).
- Users of this interface (eg oprofile users migrating to perf)
would be more used to the "PM_BR_MPRED" rather than "PM_BRU_MPRED".
- These are in the ABI/testing at this point rather than ABI/stable,
so hoping we have some wiggle room.
Signed-off-by: Runzhen Wang <runzhen@linux.vnet.ibm.com>
Acked-by: Michael Ellerman <michael@ellerman.id.au>
Cc: icycoder@gmail.com
Cc: linuxppc-dev@lists.ozlabs.org
Cc: Michael Ellerman <michael@ellerman.id.au>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Runzhen Wang <runzhew@clemson.edu>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1372407297-6996-2-git-send-email-runzhen@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Pull powerpc updates from Ben Herrenschmidt:
"This is the powerpc changes for the 3.11 merge window. In addition to
the usual bug fixes and small updates, the main highlights are:
- Support for transparent huge pages by Aneesh Kumar for 64-bit
server processors. This allows the use of 16M pages as transparent
huge pages on kernels compiled with a 64K base page size.
- Base VFIO support for KVM on power by Alexey Kardashevskiy
- Wiring up of our nvram to the pstore infrastructure, including
putting compressed oopses in there by Aruna Balakrishnaiah
- Move, rework and improve our "EEH" (basically PCI error handling
and recovery) infrastructure. It is no longer specific to pseries
but is now usable by the new "powernv" platform as well (no
hypervisor) by Gavin Shan.
- I fixed some bugs in our math-emu instruction decoding and made it
usable to emulate some optional FP instructions on processors with
hard FP that lack them (such as fsqrt on Freescale embedded
processors).
- Support for Power8 "Event Based Branch" facility by Michael
Ellerman. This facility allows what is basically "userspace
interrupts" for performance monitor events.
- A bunch of Transactional Memory vs. Signals bug fixes and HW
breakpoint/watchpoint fixes by Michael Neuling.
And more ... I appologize in advance if I've failed to highlight
something that somebody deemed worth it."
* 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/benh/powerpc: (156 commits)
pstore: Add hsize argument in write_buf call of pstore_ftrace_call
powerpc/fsl: add MPIC timer wakeup support
powerpc/mpic: create mpic subsystem object
powerpc/mpic: add global timer support
powerpc/mpic: add irq_set_wake support
powerpc/85xx: enable coreint for all the 64bit boards
powerpc/8xx: Erroneous double irq_eoi() on CPM IRQ in MPC8xx
powerpc/fsl: Enable CONFIG_E1000E in mpc85xx_smp_defconfig
powerpc/mpic: Add get_version API both for internal and external use
powerpc: Handle both new style and old style reserve maps
powerpc/hw_brk: Fix off by one error when validating DAWR region end
powerpc/pseries: Support compression of oops text via pstore
powerpc/pseries: Re-organise the oops compression code
pstore: Pass header size in the pstore write callback
powerpc/powernv: Fix iommu initialization again
powerpc/pseries: Inform the hypervisor we are using EBB regs
powerpc/perf: Add power8 EBB support
powerpc/perf: Core EBB support for 64-bit book3s
powerpc/perf: Drop MMCRA from thread_struct
powerpc/perf: Don't enable if we have zero events
...
Pull perf updates from Ingo Molnar:
"Kernel improvements:
- watchdog driver improvements by Li Zefan
- Power7 CPI stack events related improvements by Sukadev Bhattiprolu
- event multiplexing via hrtimers and other improvements by Stephane
Eranian
- kernel stack use optimization by Andrew Hunter
- AMD IOMMU uncore PMU support by Suravee Suthikulpanit
- NMI handling rate-limits by Dave Hansen
- various hw_breakpoint fixes by Oleg Nesterov
- hw_breakpoint overflow period sampling and related signal handling
fixes by Jiri Olsa
- Intel Haswell PMU support by Andi Kleen
Tooling improvements:
- Reset SIGTERM handler in workload child process, fix from David
Ahern.
- Makefile reorganization, prep work for Kconfig patches, from Jiri
Olsa.
- Add automated make test suite, from Jiri Olsa.
- Add --percent-limit option to 'top' and 'report', from Namhyung
Kim.
- Sorting improvements, from Namhyung Kim.
- Expand definition of sysfs format attribute, from Michael Ellerman.
Tooling fixes:
- 'perf tests' fixes from Jiri Olsa.
- Make Power7 CPI stack events available in sysfs, from Sukadev
Bhattiprolu.
- Handle death by SIGTERM in 'perf record', fix from David Ahern.
- Fix printing of perf_event_paranoid message, from David Ahern.
- Handle realloc failures in 'perf kvm', from David Ahern.
- Fix divide by 0 in variance, from David Ahern.
- Save parent pid in thread struct, from David Ahern.
- Handle JITed code in shared memory, from Andi Kleen.
- Fixes for 'perf diff', from Jiri Olsa.
- Remove some unused struct members, from Jiri Olsa.
- Add missing liblk.a dependency for python/perf.so, fix from Jiri
Olsa.
- Respect CROSS_COMPILE in liblk.a, from Rabin Vincent.
- No need to do locking when adding hists in perf report, only 'top'
needs that, from Namhyung Kim.
- Fix alignment of symbol column in in the hists browser (top,
report) when -v is given, from NAmhyung Kim.
- Fix 'perf top' -E option behavior, from Namhyung Kim.
- Fix bug in isupper() and islower(), from Sukadev Bhattiprolu.
- Fix compile errors in bp_signal 'perf test', from Sukadev
Bhattiprolu.
... and more things"
* 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (102 commits)
perf/x86: Disable PEBS-LL in intel_pmu_pebs_disable()
perf/x86: Fix shared register mutual exclusion enforcement
perf/x86/intel: Support full width counting
x86: Add NMI duration tracepoints
perf: Drop sample rate when sampling is too slow
x86: Warn when NMI handlers take large amounts of time
hw_breakpoint: Introduce "struct bp_cpuinfo"
hw_breakpoint: Simplify *register_wide_hw_breakpoint()
hw_breakpoint: Introduce cpumask_of_bp()
hw_breakpoint: Simplify the "weight" usage in toggle_bp_slot() paths
hw_breakpoint: Simplify list/idx mess in toggle_bp_slot() paths
perf/x86/intel: Add mem-loads/stores support for Haswell
perf/x86/intel: Support Haswell/v4 LBR format
perf/x86/intel: Move NMI clearing to end of PMI handler
perf/x86/intel: Add Haswell PEBS support
perf/x86/intel: Add simple Haswell PMU support
perf/x86/intel: Add Haswell PEBS record support
perf/x86/intel: Fix sparse warning
perf/x86/amd: AMD IOMMU Performance Counter PERF uncore PMU implementation
perf/x86/amd: Add IOMMU Performance Counter resource management
...
Add logic to the power8 PMU code to support EBB. Future processors would
also be expected to implement similar constraints. At that time we could
possibly factor these out into common code.
Finally mark the power8 PMU as supporting EBB, which is the actual
enable switch which allows EBBs to be configured.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Add support for EBB (Event Based Branches) on 64-bit book3s. See the
included documentation for more details.
EBBs are a feature which allows the hardware to branch directly to a
specified user space address when a PMU event overflows. This can be
used by programs for self-monitoring with no kernel involvement in the
inner loop.
Most of the logic is in the generic book3s code, primarily to avoid a
proliferation of PMU callbacks.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In power_pmu_enable() we still enable the PMU even if we have zero
events. This should have no effect but doesn't make much sense. Instead
just return after telling the hypervisor that we are not using the PMCs.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In power_pmu_enable() we can use the existing out label to reduce the
number of return paths.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
On Power8 we can freeze PMC5 and 6 if we're not using them. Normally they
run all the time.
As noticed by Anshuman, we should unfreeze them when we disable the PMU
as there are legacy tools which expect them to run all the time.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In pmu_disable() we disable the PMU by setting the FC (Freeze Counters)
bit in MMCR0. In order to do this we have to read/modify/write MMCR0.
It's possible that we read a value from MMCR0 which has PMAO (PMU Alert
Occurred) set. When we write that value back it will cause an interrupt
to occur. We will then end up in the PMU interrupt handler even though
we are supposed to have just disabled the PMU.
We can avoid this by making sure we never write PMAO back. We should not
lose interrupts because when the PMU is re-enabled the overflowed values
will cause another interrupt.
We also reorder the clearing of SAMPLE_ENABLE so that is done after the
PMU is frozen. Otherwise there is a small window between the clearing of
SAMPLE_ENABLE and the setting of FC where we could take an interrupt and
incorrectly see SAMPLE_ENABLE not set. This would for example change the
logic in perf_read_regs().
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A mistake we have made in the past is that we pull out the fields we
need from the event code, but don't check that there are no unknown bits
set. This means that we can't ever assign meaning to those unknown bits
in future.
Although we have once again failed to do this at release, it is still
early days for Power8 so I think we can still slip this in and get away
with it.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
CC: <stable@vger.kernel.org> [v3.10]
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The __cpuinit type of throwaway sections might have made sense
some time ago when RAM was more constrained, but now the savings
do not offset the cost and complications. For example, the fix in
commit 5e427ec2d0 ("x86: Fix bit corruption at CPU resume time")
is a good example of the nasty type of bugs that can be created
with improper use of the various __init prefixes.
After a discussion on LKML[1] it was decided that cpuinit should go
the way of devinit and be phased out. Once all the users are gone,
we can then finally remove the macros themselves from linux/init.h.
This removes all the powerpc uses of the __cpuinit macros. There
are no __CPUINIT users in assembly files in powerpc.
[1] https://lkml.org/lkml/2013/5/20/589
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Josh Boyer <jwboyer@gmail.com>
Cc: Matt Porter <mporter@kernel.crashing.org>
Cc: Kumar Gala <galak@kernel.crashing.org>
Cc: linuxppc-dev@lists.ozlabs.org
Signed-off-by: Paul Gortmaker <paul.gortmaker@windriver.com>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
In commit bc09c21 "Fix finding overflowed PMC in interrupt" we added
a printk() to the PMU exception handler. Unfortunately that is not safe.
The problem is that the PMU exception may run even when interrupts are
soft disabled, aka NMI context. We do this so that we can profile parts
of the kernel that have interrupts soft-disabled.
But by calling printk() from the exception handler, we can potentially
deadlock in the printk code on logbuf_lock, eg:
[c00000038ba575c0] c000000000081928 .vprintk_emit+0xa8/0x540
[c00000038ba576a0] c0000000007bcde8 .printk+0x48/0x58
[c00000038ba57710] c000000000076504 .perf_event_interrupt+0x2d4/0x490
[c00000038ba57810] c00000000001f6f8 .performance_monitor_exception+0x48/0x60
[c00000038ba57880] c0000000000032cc performance_monitor_common+0x14c/0x180
--- Exception: f01 (Performance Monitor) at c0000000007b25d4 ._raw_spin_lock_irq
+0x64/0xc0
[c00000038ba57bf0] c00000000007ed90 .devkmsg_read+0xd0/0x5a0
[c00000038ba57d00] c0000000001c2934 .vfs_read+0xc4/0x1e0
[c00000038ba57d90] c0000000001c2cd8 .SyS_read+0x58/0xd0
[c00000038ba57e30] c000000000009d54 syscall_exit+0x0/0x98
--- Exception: c01 (System Call) at 00001fffffbf6f7c
SP (3ffff6d4de10) is in userspace
Fix it by making sure we only call printk() when we are not in NMI
context.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Cc: <stable@vger.kernel.org> # 3.9
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Commit 8f61aa3 "Add support for SIER" missed updates to siar_valid()
and perf_get_data_addr().
In both cases we need to check the SIER instead of mmcra.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
This is a revert and then some of commit 860aad7 "Add regs_no_sipr()".
This workaround was only needed on early chip versions.
As before NO_SIPR becomes a static flag of the PMU struct.
Signed-off-by: Michael Ellerman <michael@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
A set of Power7 events are often used for Cycles Per Instruction (CPI) stack
analysis. Make these events available in sysfs (/sys/devices/cpu/events/) so
they can be identified using their symbolic names:
perf stat -e 'cpu/PM_CMPLU_STALL_DCACHE_MISS/' /bin/ls
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Acked-by: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: linuxppc-dev@ozlabs.org
Link: http://lkml.kernel.org/r/20130406164803.GA408@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently we only set the "to" address in the branch stack when the CPU
explicitly gives us a value. Unfortunately it only does this for XL form
branches (eg blr, bctr, bctar) and not I and B form branches (eg b, bc).
Fortunately if we read the instruction from memory we can extract the offset of
a branch and calculate the target address.
This adds a function power_pmu_bhrb_to() to calculate the target/to address of
the corresponding I and B form branches. It handles branches in both user and
kernel spaces. It also plumbs this into the perf brhb reading code.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
The current Branch History Rolling Buffer (BHRB) code misinterprets the order
of entries in the hardware buffer. It assumes that a branch target address
will be read _after_ its corresponding branch. In reality the branch target
comes before (lower mfbhrb entry) it's corresponding branch.
This is a rewrite of the code to take this into account.
Signed-off-by: Michael Neuling <mikey@neuling.org>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>