I'm surprised this remained undocumented since at least 2011. And it is
actually a very useful switch, as Steve and I came to realize recently.
Add the text from
2cba3ffb9a ("perf stat: Add -d -d and -d -d -d options to show more CPU events")
which added the incrementing aspect to -d.
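The incrementing behaviour of -d can be illustrated with a small sketch. This is not perf's own option parser (perf is written in C); it is a hypothetical Python analogue built on argparse's counting action, and the names `parse_detail_level` and `detail_level` are assumptions made for the example.

```python
import argparse


def parse_detail_level(argv):
    """Return how many times -d was given (0 when it is absent)."""
    parser = argparse.ArgumentParser(prog="stat-sketch")
    # action="count" increments the stored value each time the flag appears,
    # so "-d -d -d" yields 3.
    parser.add_argument("-d", dest="detail_level", action="count", default=0)
    return parser.parse_args(argv).detail_level


if __name__ == "__main__":
    for argv in ([], ["-d"], ["-d", "-d"], ["-d", "-d", "-d"]):
        print(argv, "->", parse_detail_level(argv))
```

A caller could then enable progressively larger event groups as the returned level grows.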
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Borislav Petkov <bp@suse.de>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Davidlohr Bueso <dbueso@suse.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mel Gorman <mgorman@suse.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Fixes: 2cba3ffb9a ("perf stat: Add -d -d and -d -d -d options to show more CPU events")
Link: http://lkml.kernel.org/r/1457347294-32546-1-git-send-email-bp@alien8.de
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Support hierarchy output for perf-top using --hierarchy option.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1456326830-30456-19-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --hierarchy option shows output in hierarchy mode. It extends
folding/unfolding in the TUI and GTK browsers to support sort items as
well as callchains. Users can toggle the items to see the performance
result at the desired level.
$ perf report --hierarchy --tui
Overhead Command / Shared Object / Symbol
--------------------------------------------------
+ 32.96% gnome-shell
- 15.11% swapper
- 14.97% [kernel.vmlinux]
6.82% [k] intel_idle
0.66% [k] menu_select
0.43% [k] __hrtimer_start_range_ns
...
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Pekka Enberg <penberg@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1456326830-30456-17-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Allow the user to easily switch all events to user or kernel space with
the simple --all-user or --all-kernel options.
This will be handy within the perf mem/c2c wrappers to easily switch
monitoring modes.
Committer note:
Testing it:
# perf record --all-kernel --all-user -a sleep 2
Error: option `all-user' cannot be used with all-kernel
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
--all-user Configure all used events to run in user space.
--all-kernel Configure all used events to run in kernel space.
# perf record --all-user --all-kernel -a sleep 2
Error: option `all-kernel' cannot be used with all-user
Usage: perf record [<options>] [<command>]
or: perf record [<options>] -- <command> [<options>]
--all-kernel Configure all used events to run in kernel space.
--all-user Configure all used events to run in user space.
# perf record --all-user -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.416 MB perf.data (162 samples) ]
# perf report | grep '\[k\]'
# perf record --all-kernel -a sleep 1
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 1.423 MB perf.data (296 samples) ]
# perf report | grep '\[\.\]'
#
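The mutual exclusion tested above can be sketched as follows. This is a hypothetical Python analogue, not perf's C implementation; the function names and the 'user'/'kernel'/'both' return values are assumptions chosen for illustration.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(prog="record-sketch")
    # Options in the same mutually exclusive group cannot be combined.
    group = parser.add_mutually_exclusive_group()
    group.add_argument("--all-user", action="store_true")
    group.add_argument("--all-kernel", action="store_true")
    return parser


def parse_space(argv):
    """Return 'user', 'kernel' or 'both'; raise ValueError on a conflict."""
    try:
        args = build_parser().parse_args(argv)
    except SystemExit:
        # argparse prints a usage message and exits on conflicting options;
        # convert that into an exception the caller can handle.
        raise ValueError("--all-user and --all-kernel are mutually exclusive") from None
    if args.all_user:
        return "user"
    if args.all_kernel:
        return "kernel"
    return "both"
```

As in the transcript, giving both options is rejected regardless of their order.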
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1455525293-8671-2-git-send-email-jolsa@kernel.org
[ Made those options to be mutually exclusive ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The '--system' option means $(sysconfdir)/perfconfig and '--user' means
$HOME/.perfconfig. If none is used, both system and user config files are
read. E.g.:
# perf config [<file-option>] [options]
With a specific config file:
# perf config --user | --system
or both user and system config files:
# perf config
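The file selection described above can be sketched as a small function. This is only an illustration: the function name, the default `sysconfdir` of "/etc", and the order in which the two files are listed are assumptions, not taken from the perf sources.

```python
import os


def config_files(scope=None, sysconfdir="/etc", home=None):
    """Return the config files to read for scope 'system', 'user' or None."""
    if home is None:
        home = os.path.expanduser("~")
    system_file = os.path.join(sysconfdir, "perfconfig")  # $(sysconfdir)/perfconfig
    user_file = os.path.join(home, ".perfconfig")         # $HOME/.perfconfig
    if scope == "system":
        return [system_file]
    if scope == "user":
        return [user_file]
    if scope is None:
        # Neither --system nor --user given: read both files.
        return [system_file, user_file]
    raise ValueError("unknown scope: %r" % (scope,))
```

For example, `config_files("user", home="/home/u")` selects only the per-user file.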
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1455126685-32367-2-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds a --jit/-j option to perf inject.
This option injects MMAP records into the perf.data file to cover the
jitted code mmaps. It also emits ELF images for each function in the
jitdump file. Those images are created where the jitdump file is. The
MMAP records point to that location as well.
Typical flow:
$ perf record -k mono -- java -agentpath:libpjvmti.so java_class
$ perf inject --jit -i perf.data -o perf.data.jitted
$ perf report -i perf.data.jitted
Note that jitdump.h support is not limited to Java; it works with any
jitted environment modified to emit the jitdump file format, including
those where code can be jitted multiple times and moved around.
The jitdump.h format is adapted from the Oprofile project.
The genelf.c (ELF binary generation) depends on MD5 hash encoding for
the buildid. To enable this, libssl-dev must be installed. If not, then
genelf.c defaults to using urandom to generate the buildid, which is not
ideal. The Makefile auto-detects the presence of libssl-dev.
This version mmaps the jitdump file to create a marker MMAP record in
the perf.data file. The marker is used to detect jitdump and cause perf
inject to inject the jitted mmaps and generate ELF images for jitted
functions.
In V8, the following fixes and changes were made among other things:
- the jitdump header format includes a new flags field to be used
to carry information about the configuration of the runtime agent.
Contributed by: Adrian Hunter <adrian.hunter@intel.com>
- Fix mmap pgoff: MMAP event pgoff must be the offset within the ELF file
at which the code resides.
Contributed by: Adrian Hunter <adrian.hunter@intel.com>
- Fix ELF virtual addresses: perf tools expect the ELF virtual addresses of dynamic
objects to match the file offset.
Contributed by: Adrian Hunter <adrian.hunter@intel.com>
- JIT MMAP injection does not obey finished_round semantics: it injects all
MMAP events in one go, so drop the finished_round events from the output
perf.data file.
Contributed by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Carl Love <cel@us.ibm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John McCutchan <johnmccutchan@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Pawel Moll <pawel.moll@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sonny Rao <sonnyrao@chromium.org>
Cc: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Link: http://lkml.kernel.org/r/1448874143-7269-3-git-send-email-eranian@google.com
[ Moved inject.build_ids ordering bits to a separate patch, fixed the NO_LIBELF=1 build ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Explain 'man.viewer' variable and how to add new man viewer tools.
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1454577913-16401-6-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Explain 'report' section's variables:
'percent-limit', 'queue-size' and 'children'.
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1454577913-16401-4-git-send-email-treeze.taeung@gmail.com
[ Fix some grammar issues, add some more info ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Explain 'call-graph' section and its variables:
'record-mode', 'dump-size', 'print-type', 'order', 'sort-key',
'threshold' and 'print-limit'.
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1454577913-16401-3-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This option controls display of column headers (like 'Overhead' and
'Symbol') in 'report' and 'top'. If this option is false, they are
hidden. This option is only applied to TUI.
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1454577913-16401-2-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --percent-limit option was recently changed to apply to callchains as
well as to hist entries, but the documentation was not updated.
Reported-by: Arnaldo Carvalho de Melo <acme@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1454508683-5735-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The description of the memory sort key (used by --mem-mode) was
misplaced. Move it under the --sort option so that it can be referenced
properly.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1454508683-5735-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Explain 'annotate' section and its variables.
'hide_src_code', 'use_offset', 'jump_arrows',
'show_linenr', 'show_nr_jump' and 'show_total_period'.
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1452253193-30502-5-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Explain 'tui' and 'gtk' sections and these variables.
'top', 'report' and 'annotate'
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1452253193-30502-3-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Explain 'colors' section and its variables, which customize the colors
used in the output of 'report', 'top' and 'annotate' in the TUI. Those
are:
'top', 'medium', 'normal', 'selected',
'jump_arrows', 'addr' and 'root'.
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1452253193-30502-2-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use 'jump_arrows' config name instead of 'code' on 'colors' section.
'colors.code' config is only for jump arrows on assembly code listings
i.e.
│ ┌──jmp 1333
│ │ xchg %ax,%ax
│ │ mov %r15,%r10
│ └─→cmp %r15,%r14
But this config name seems unfit.
'jump_arrows' is more descriptive than 'code'.
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1452240971-25418-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --buildid-all option records the build-id of all DSOs in the file.
It might be very costly to postprocess samples to find which DSOs were hit.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1452519429-31779-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently perf report only shows a help message "For a higher level
overview, try: perf report --sort comm,dso" unconditionally (even if
the sort keys were used). Add more help tips and show one at random.
Load tips from ${prefix}/share/doc/perf-tip/tips.txt file.
$ perf report | tail
0.10% swapper [kernel.vmlinux] [k] irq_exit
0.09% swapper [kernel.vmlinux] [k] flush_smp_call_function_queue
0.08% swapper [kernel.vmlinux] [k] native_write_msr_safe
0.03% swapper [kernel.vmlinux] [k] group_sched_in
0.01% perf [kernel.vmlinux] [k] native_write_msr_safe
#
# (Tip: Search options using a keyword: perf report -h <keyword>)
#
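Picking one tip at random from the lines of a tips file can be sketched as below. This is a hypothetical Python illustration rather than the C code added by the patch; the function name and the choice to skip blank lines are assumptions.

```python
import random


def pick_tip(lines, rng=None):
    """Return one non-empty line chosen at random, or None if there is none."""
    rng = rng if rng is not None else random.Random()
    tips = [line.strip() for line in lines if line.strip()]
    if not tips:
        return None
    return rng.choice(tips)


if __name__ == "__main__":
    sample = [
        "For a higher level overview, try: perf report --sort comm,dso\n",
        "Search options using a keyword: perf report -h <keyword>\n",
    ]
    print("# (Tip: %s)" % pick_tip(sample))
```

Passing a seeded `random.Random` instance makes the selection reproducible in tests.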
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ingo Molnar <mingo@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1452166913-27046-1-git-send-email-namhyung@kernel.org
[ Renamed it to perf_tip() and the parameter dirname to dirpath to fix the build on older distros ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Steven Rostedt <rostedt@goodmis.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1451991518-25673-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Allow overriding the recorded aggr_mode. It's possible to use perf stat
like:
$ perf stat report -A
$ perf stat report --per-core
$ perf stat report --per-socket
to customize the recorded aggregate mode regardless of what was used during
the stat record command.
Reported-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1446734469-11352-19-git-send-email-jolsa@kernel.org
[ Renamed 'stat' parameter to 'st' to fix 'already defined' build error with older distros (e.g. RHEL6.7) ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add 'perf stat report' command support. ATM it only processes attr
events and displays nothing.
Reported-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1446734469-11352-12-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add 'perf stat record' command support. It creates simple (header only)
perf.data file ATM.
The record command can be specified anywhere among the stat options. All
stat command options are valid for the stat record command, with the
exception of the '-o' option: if specified for the record command, it
denotes the perf data file name.
Committer note:
Set sample_type to PERF_SAMPLE_IDENTIFIER, which should be harmless
while avoiding that older tools show confusing messages, for instance,
with sample_type = 0, we get:
$ perf stat record usleep 1
Performance counter stats for 'usleep 1':
0.630237 task-clock (msec) # 0.528 CPUs utilized
1 context-switches # 0.002 M/sec
0 cpu-migrations # 0.000 K/sec
52 page-faults # 0.083 M/sec
978,312 cycles # 1.552 GHz
671,931 stalled-cycles-frontend # 68.68% frontend cycles idle
<not supported> stalled-cycles-backend
646,379 instructions # 0.66 insns per cycle
# 1.04 stalled cycles per insn
131,046 branches # 207.931 M/sec
7,073 branch-misses # 5.40% of all branches
0.001193240 seconds time elapsed
$ oldperf evlist
WARNING: The perf.data file's data size field is 0 which is unexpected.
Was the 'perf record' command properly terminated?
non matching sample_type
$
While with sample_type set to PERF_SAMPLE_IDENTIFIER, after we re-run 'perf
stat record usleep' we get:
$ oldperf evlist
WARNING: The perf.data file's data size field is 0 which is unexpected.
Was the 'perf record' command properly terminated?
task-clock
context-switches
cpu-migrations
page-faults
cycles
stalled-cycles-frontend
stalled-cycles-backend
instructions
branches
branch-misses
$
Which at least shows the names of the events in the perf.data file.
Additionally, such files, when passed to 'perf report' will produce:
$ oldperf report --stdio
WARNING: The perf.data file's data size field is 0 which is unexpected.
Was the 'perf record' command properly terminated?
Warning:
Kernel address maps (/proc/{kallsyms,modules}) were restricted.
Check /proc/sys/kernel/kptr_restrict before running 'perf record'.
As no suitable kallsyms nor vmlinux was found, kernel samples
can't be resolved.
Samples in kernel modules can't be resolved as well.
Error:
The perf.data file has no samples!
# To display the perf.data header info, please use --header/--header-only options.
#
$
Which is confusing and can be solved by just adding the kernel mmap record,
which will also remove that warning about the data size field being equal to
zero, after generating the mmap record:
$ perf stat record usleep 1
Performance counter stats for 'usleep 1':
0.600796 task-clock (msec) # 0.478 CPUs utilized
1 context-switches # 0.002 M/sec
0 cpu-migrations # 0.000 K/sec
54 page-faults # 0.090 M/sec
886,844 cycles # 1.476 GHz
582,169 stalled-cycles-frontend # 65.65% frontend cycles idle
<not supported> stalled-cycles-backend
638,344 instructions # 0.72 insns per cycle
# 0.91 stalled cycles per insn
130,204 branches # 216.719 M/sec
7,500 branch-misses # 5.76% of all branches
0.001255897 seconds time elapsed
$ oldperf evlist
task-clock
context-switches
cpu-migrations
page-faults
cycles
stalled-cycles-frontend
stalled-cycles-backend
instructions
branches
branch-misses
$ oldperf report --stdio
Error:
The perf.data file has no samples!
# To display the perf.data header info, please use --header/--header-only options.
#
[acme@zoo linux]$
No warnings, sensible output about which events are in the perf.data file, and also
a "file has no samples" message, which is accurate.
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Kan Liang <kan.liang@intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1446734469-11352-3-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Post processing at 'perf record' takes a long time on big machines.
What it does is to find the build-id of binaries found in the event
stream, so that it can make sure, at 'report' time, that the symtabs (be
it ELF, kallsyms, etc) being used to resolve symbols are the ones
matching the binaries found at 'record' time.
Sometimes we just want to skip this processing of events at the end of
the session to get quicker results, making sure the binaries haven't
changed from 'record' to 'report' time.
Add a new config option to control this behavior.
The record.build-id config variable can have one of the following
values:
- cache: post-process data and save/update the binaries into the
build-id cache (in ~/.debug). This is the default.
- no-cache: post-process the data but do not update the build-id cache.
Same effect as using the -N option.
- skip: skip post-processing and do not update the cache.
Same effect as using the -B option.
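The three values listed above can be summarised as a lookup table. This is a sketch for illustration only: the `BuildIdPolicy` type and the function name are hypothetical and do not correspond to identifiers in the perf sources.

```python
from typing import NamedTuple


class BuildIdPolicy(NamedTuple):
    post_process: bool   # scan the event stream for build-ids after recording
    update_cache: bool   # save/update binaries in the build-id cache


# Behaviour of each record.build-id value as described in the text above.
_POLICIES = {
    "cache": BuildIdPolicy(post_process=True, update_cache=True),      # default
    "no-cache": BuildIdPolicy(post_process=True, update_cache=False),  # like -N
    "skip": BuildIdPolicy(post_process=False, update_cache=False),     # like -B
}


def build_id_policy(value="cache"):
    """Map a record.build-id config value to its behaviour."""
    try:
        return _POLICIES[value]
    except KeyError:
        raise ValueError("invalid record.build-id value: %r" % (value,)) from None
```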
Reported-and-Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Taeung Song <treeze.taeung@gmail.com>
Link: http://lkml.kernel.org/r/1450144196-22957-1-git-send-email-namhyung@kernel.org
[ Added some more text to the documentation ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Make perf-record command support --vmlinux option if BPF_PROLOGUE is on.
'perf record' needs vmlinux as the source of DWARF info to generate
prologue for BPF programs, so path of vmlinux should be specified.
Short name 'k' has been taken by 'clockid'. This patch skips the short
option name and uses '--vmlinux' for vmlinux path.
Documentation is also updated.
Test result:
In a production (or broken) environment:
(by:
# rm -rf ~/.debug/
# mv /lib/modules/`uname -r`/build/vmlinux /tmp/
)
# ./perf record -e ./test_bpf_base.c ls
Failed to find the path for kernel: No such file or directory
event syntax error: './test_bpf_base.c'
\___ You need to check probing points in BPF file
...
# ./perf record --vmlinux /tmp/vmlinux -e ./test_bpf_base.c ls
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data ]
Help messages when built with NO_LIBBPF:
# ./perf record -h
--transaction sample transaction flags (special events only)
--vmlinux <file> vmlinux pathname
(not built-in because NO_LIBBPF=1)
# ./perf record --vmlinux /tmp/vmlinux ls /
Warning: option `vmlinux' is being ignored because NO_LIBBPF=1
...
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.011 MB perf.data (11 samples) ]
Help messages when built with NO_DWARF:
# ./perf record -h
--transaction sample transaction flags (special events only)
--vmlinux <file> vmlinux pathname
(not built-in because NO_DWARF=1)
Signed-off-by: He Kuang <hekuang@huawei.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1450089563-122430-15-git-send-email-wangnan0@huawei.com
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add perf-config document to describe the perf configuration and a
'list' subcommand.
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/63AD9B57-7B8C-46F8-8F18-0FFEB9A6A1BC@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -g/--call-graph option now supports choosing how callchain values are
displayed. Possible values are 'percent', 'period' and 'count'. 'percent'
is the same as before and is the default behavior. 'period' displays the
raw period value rather than the percentage. 'count' displays the number
of occurrences.
$ perf report --no-children --stdio -g percent
...
39.93% swapper [kernel.vmlinux] [k] intel_idle
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--28.63%-- start_secondary
|
--11.30%-- rest_init
$ perf report --no-children --show-total-period --stdio -g period
...
39.93% 13018705 swapper [kernel.vmlinux] [k] intel_idle
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--9334403-- start_secondary
|
--3684302-- rest_init
$ perf report --no-children --show-nr-samples --stdio -g count
...
39.93% 80 swapper [kernel.vmlinux] [k] intel_idle
|
---intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
|
|--57-- start_secondary
|
--23-- rest_init
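The relation between the 'percent' and 'period' outputs above can be checked with a little arithmetic. The sketch below assumes, based only on the sample numbers shown, that a callchain branch's percentage is the entry's overhead scaled by the branch's share of the entry's period; the function name is hypothetical.

```python
def branch_percent(entry_percent, entry_period, branch_period):
    """Scale the entry's overhead by the branch's share of its period."""
    return round(entry_percent * branch_period / entry_period, 2)


if __name__ == "__main__":
    # Numbers taken from the intel_idle entry in the sample outputs above.
    print(branch_percent(39.93, 13018705, 9334403))  # start_secondary
    print(branch_percent(39.93, 13018705, 3684302))  # rest_init
```

With the period values from the example this gives 28.63 and 11.3, matching the 'percent' output. Applying the same formula to the sample counts (57 of 80) gives about 28.45 instead, presumably because individual samples need not carry equal periods.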
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1447047946-1691-6-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add new call chain option (-g) 'folded' to print callchains in a line.
The callchains are separated by semicolons, and preceded by (absolute)
percent values and a space.
For example, the following 20 lines can be printed in 3 lines with the
folded output mode:
$ perf report -g flat --no-children | grep -v ^# | head -20
60.48% swapper [kernel.vmlinux] [k] intel_idle
54.60%
intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
start_secondary
5.88%
intel_idle
cpuidle_enter_state
cpuidle_enter
call_cpuidle
cpu_startup_entry
rest_init
start_kernel
x86_64_start_reservations
x86_64_start_kernel
$ perf report -g folded --no-children | grep -v ^# | head -3
60.48% swapper [kernel.vmlinux] [k] intel_idle
54.60% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;start_secondary
5.88% intel_idle;cpuidle_enter_state;cpuidle_enter;call_cpuidle;cpu_startup_entry;rest_init;start_kernel;x86_64_start_reservations;x86_64_start_kernel
This mode is currently supported only for --stdio and is intended to be
used by scripts such as FlameGraphs[1]. Support for other UIs might be
added later.
[1] http://www.brendangregg.com/FlameGraphs/cpuflamegraphs.html
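A script consuming the folded output might split each callchain line as sketched below. This is a hypothetical helper, not part of perf or FlameGraph; it assumes the line layout described above (a percent value, a space, then semicolon-separated symbols) and is meant for callchain lines only, not for the hist entry line that precedes them.

```python
def parse_folded_line(line):
    """Split '54.60% a;b;c' into (54.6, ['a', 'b', 'c'])."""
    percent_text, _, chain = line.strip().partition(" ")
    if not percent_text.endswith("%") or not chain.strip():
        raise ValueError("not a folded callchain line: %r" % (line,))
    # Drop the trailing '%' and split the chain on semicolons.
    return float(percent_text[:-1]), chain.strip().split(";")


if __name__ == "__main__":
    sample = ("54.60% intel_idle;cpuidle_enter_state;cpuidle_enter;"
              "call_cpuidle;cpu_startup_entry;start_secondary")
    percent, frames = parse_folded_line(sample)
    print(percent, len(frames), frames[-1])
```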
Requested-and-Tested-by: Brendan Gregg <brendan.d.gregg@gmail.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1447047946-1691-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -i flag was incorrectly listed as a short flag for --no-inherit. It
should have only been listed as a short flag for --input.
This documentation error has existed since the --input flag was
introduced in 6810fc915f (perf trace: Add
option to analyze events in a file versus live).
Signed-off-by: Peter Feiner <pfeiner@google.com>
Cc: David Ahern <dsahern@gmail.com>
Link: http://lkml.kernel.org/r/1446657706-14518-1-git-send-email-pfeiner@google.com
Fixes: 6810fc915f ("perf trace: Add option to analyze events in a file versus live")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Although the previous patch allows setting BPF compiler related options in
perfconfig, in some ad-hoc situations it is still necessary to pass options
through the cmdline. This patch introduces 2 options to 'perf record' for
this purpose: --clang-path and --clang-opt.
Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Alexei Starovoitov <ast@plumgrid.com>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: David Ahern <dsahern@gmail.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kaixu Xia <xiakaixu@huawei.com>
Cc: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1444826502-49291-9-git-send-email-wangnan0@huawei.com
[ Add the new options to the 'record' man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch improves perf script by enabling printing of the
branch stack via the 'brstack' and 'brstacksym' arguments to
the field selection option -F. The option is off by default
and operates only if the perf.data file has branch stack content.
The branches are printed in from/to pairs. The most recent branch
is printed first. The number of branch entries varies based on the
underlying hardware and filtering used.
The brstack prints FROM/TO addresses in raw hexadecimal format.
The brstacksym prints FROM/TO addresses in symbolic form wherever
possible.
$ perf script -F ip,brstack
5d3000 0x401aa0/0x5d2000/M/-/-/-/0 ...
$ perf script -F ip,brstacksym
4011e0 noploop+0x0/noploop+0x0/P/-/-/0
The notation F/T/M/X/A/C describes the attributes of the branch.
F=from, T=to, M/P=misprediction/prediction, X=TSX, A=TSX abort, C=cycles (SKL)
Signed-off-by: Stephane Eranian <eranian@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yuanfang Chen <cyfmxc@gmail.com>
Link: http://lkml.kernel.org/r/1441039273-16260-5-git-send-email-eranian@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that it can be more consistent with other --show-* options. The old
name (--showcpuutilization) is provided only for compatibility.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1445701767-12731-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --call-graph option is complex, so we should provide a better guide for
users. Also change the help message to be consistent with the config option
names. Now perf top will show help like below:
$ perf top --call-graph
Error: option `call-graph' requires a value
Usage: perf top [<options>]
--call-graph <record_mode[,record_size],print_type,threshold[,print_limit],order,sort_key[,branch]>
setup and enables call-graph (stack chain/backtrace):
record_mode: call graph recording mode (fp|dwarf|lbr)
record_size: if record_mode is 'dwarf', max size of stack recording (<bytes>)
default: 8192 (bytes)
print_type: call graph printing style (graph|flat|fractal|none)
threshold: minimum call graph inclusion threshold (<percent>)
print_limit: maximum number of call graph entry (<number>)
order: call graph order (caller|callee)
sort_key: call graph sort key (function|address)
branch: include last branch info to call graph (branch)
Default: fp,graph,0.5,caller,function
Requested-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445524112-5201-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the 'perf top --call-graph' option is the same as in 'perf
record'. But 'perf top' also needs to accept the display options of 'perf
report'. To do that, change parse_callchain_report_opt() to allow record
options too.
Now perf top can receive display options like below:
$ perf top --call-graph
Error: option `call-graph' requires a value
Usage: perf top [<options>]
--call-graph
<mode[,dump_size],output_type,min_percent[,print_limit],call_order[,branch]>
setup and enables call-graph (stack chain/backtrace)
recording: fp dwarf lbr, output_type (graph, flat,
fractal, or none), min percent threshold, optional
print limit, callchain order, key (function or
address), add branches
$ perf top --call-graph callee,graph,fp
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Brendan Gregg <brendan.d.gregg@gmail.com>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1445495330-25416-2-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds a new branch type sampling filter to perf record.
It is named 'call' and maps to PERF_SAMPLE_BRANCH_CALL. It samples
direct call branches only, unlike 'any_call' which includes indirect
calls as well.
$ perf record -j call -e cycles .....
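The mapping from filter name to sample bit can be pictured with the
small Python sketch below. It is only an illustration, not the perf
source: branch_sample_type() is a hypothetical helper, only three of the
filter names are listed, and the bit positions are quoted from memory of
include/uapi/linux/perf_event.h, so they should be checked against the
header before being relied on.

```python
# Bit positions assumed from include/uapi/linux/perf_event.h; verify there.
PERF_SAMPLE_BRANCH_ANY_CALL = 1 << 4    # any call branch
PERF_SAMPLE_BRANCH_IND_CALL = 1 << 6    # indirect calls
PERF_SAMPLE_BRANCH_CALL = 1 << 13       # direct calls (the new 'call' filter)

# Subset of the names accepted by 'perf record -j', for illustration only.
BRANCH_FILTERS = {
    "any_call": PERF_SAMPLE_BRANCH_ANY_CALL,
    "ind_call": PERF_SAMPLE_BRANCH_IND_CALL,
    "call": PERF_SAMPLE_BRANCH_CALL,
}


def branch_sample_type(spec):
    """Hypothetical helper: OR together the bits for a 'a,b,c' filter list."""
    mask = 0
    for name in spec.split(","):
        if name not in BRANCH_FILTERS:
            raise ValueError("unknown branch filter: " + name)
        mask |= BRANCH_FILTERS[name]
    return mask
```

Here branch_sample_type("call") yields only the direct-call bit, while
"any_call" selects a different bit that covers indirect calls as well.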
The man page is updated accordingly.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Cc: khandual@linux.vnet.ibm.com
Link: http://lkml.kernel.org/r/1444720151-10275-5-git-send-email-eranian@google.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
So right now there's a somewhat inconsistent mess of the benchmarking
code and options sometimes calling benchmarked functions 'functions',
sometimes calling them 'routines'.
Name them 'functions' consistently.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-14-git-send-email-mingo@kernel.org
[ Updated perf-bench man page, pointed out by David Ahern ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We have three benchmarking subsystems that specify some sort of 'number
of loops' parameter - but all of them do it inconsistently:
numa: -l/--nr_loops
sched messaging: -l/--loops
mem memset/memcpy: -i/--iterations
Harmonize them to -l/--nr_loops by picking the numa variant - which is
also the most likely one to have existing scripting which we don't want
to break.
Plus improve the parameter help texts to indicate the default value for
the nr_loops variable to keep users from guessing ...
Also propagate the naming to internal variables.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-13-git-send-email-mingo@kernel.org
[ Let the harmonisation reach the perf-bench man page as well ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So 'perf bench mem memcpy/memset' consistently uses 'len' and 'length'
for buffer sizes - while it's really a memory buffer size. (strings have
length.)
Rename all affected variables.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-10-git-send-email-mingo@kernel.org
[ Update perf-bench man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So 'perf bench mem memset/memcpy' has a CPU cycles measurement method,
but calls it 'cycle' (singular) throughout the code, which makes it
harder to read.
Rename all related functions, variables and options to a plural 'cycles'
nomenclature.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-8-git-send-email-mingo@kernel.org
[ s/--cycle/--cycles/g in perf-bench man page ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So 'perf bench mem memcpy/memset' has elaborate code to measure
memcpy()/memset() performance both with freshly allocated buffers (which
includes initial page fault overhead) and with preallocated buffers.
But the thing is, the resulting bandwidth results are mostly
meaningless, because page faults dominate so much of the cost.
It might make sense to measure cache cold vs. cache hot performance, but
the code does not do this.
So remove this complication, and always prefault the ranges before using
them.
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1445241870-24854-6-git-send-email-mingo@kernel.org
[ Remove --no-prefault, --only-prefault from docs, noticed by David Ahern ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Which is the most common default found in other similar tools.
Requested-by: Ingo Molnar <mingo@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Chandler Carruth <chandlerc@gmail.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: https://www.youtube.com/watch?v=nXaxk27zwlk
Link: http://lkml.kernel.org/n/tip-v8lq36aispvdwgxdmt9p9jd9@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add handling for the '-h' and '-v' options to invoke the help and
version commands, respectively.
Current behaviour is:
$ perf -v
Unknown option: -v
Usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]
$ perf -h
Unknown option: -h
Usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]
New behaviour:
$ perf -h
usage: perf [--version] [--help] [OPTIONS] COMMAND [ARGS]
The most commonly used perf commands are:
annotate Read perf.data (created by perf record) and display annotated code
archive Create archive with object files with build-ids found in perf.data file
bench General framework for benchmark suites
...
$ perf -v
perf version 4.3.rc3.gc99e32
Updated man page.
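The behaviour change amounts to treating the short options as aliases
for the existing commands. A minimal Python sketch of that idea follows;
it is not the actual C change, and the names COMMAND_ALIASES and
resolve_command() are hypothetical.

```python
# Option spellings that should behave like a built-in command.
COMMAND_ALIASES = {
    "-h": "help",
    "--help": "help",
    "-v": "version",
    "--version": "version",
}


def resolve_command(argv):
    """Hypothetical helper: rewrite a leading alias to its command name."""
    if argv and argv[0] in COMMAND_ALIASES:
        return [COMMAND_ALIASES[argv[0]]] + list(argv[1:])
    # Anything else is passed through untouched, including options
    # that appear after a subcommand (e.g. 'record -v').
    return list(argv)
```

Only the first argument is rewritten, so an option such as '-v' given
after a subcommand keeps whatever meaning that subcommand assigns to it.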
Requested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1444068369-20978-10-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>