Add -M option to report/annotate to pass directly to objdump. This
allows to use -M intel for intel style disassembler syntax, which is
useful for people who are very used to the Intel syntax.
Link: http://lkml.kernel.org/r/1316122302-24306-2-git-send-email-andi@firstfloor.org
[committer note: Add missing Documentation bits, fixup conflicts with 3e6a2a7]
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This perf stat option emulates valgrind's --log-fd option, allowing the
user to send perf results elsewhere, and leaving stderr for use by the
program under test. This complements --output file option, and is
mutually exclusive with it.
3>results perf stat --log-fd 3 -- $cmd
3>>results perf stat --log-fd 3 --append -- $cmd
The perl distro's make test.valgrind target uses valgrind's --log-fd
option, I've adapted it to invoke perf also, and tested this patch
there.
Link: http://lkml.kernel.org/r/1315437244-3788-2-git-send-email-jim.cromie@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jim Cromie <jim.cromie@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Try first reading the build id, validating that it is an ELF file, etc.
Cheap as libelf will bail out as soon as the magic number check fails.
Useful when investigating debuginfo packaging problems like this one:
[root@emilia ~]# perf buildid-list -i /usr/lib/debug/lib/modules/`uname -r`/vmlinux
77bb4ea591a602d455ace759a377c9adfff1aba3
[root@emilia ~]# perf buildid-list -k
07b0c016a2b30004e86132d0239945b1e88f5d75
[root@emilia ~]#
Cc: David Ahern <dsahern@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-4elot9oxwa0rr0d90dshca3a@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds an option (-o) to save the output of perf stat into a
file. You could do this with perf record but not with perf stat.
Instead, you had to fiddle with stderr to save the counts into a
separate file.
The patch also adds the --append option so that results can be
concatenated into a single file across runs. Each run of the tool is
clearly separated by a comment line starting with a hash mark. The -A
option of perf record is already used by perf stat, so we only add a
long option.
$ perf stat -o res.txt date
$ cat res.txt
Performance counter stats for 'date':
0.791306 task-clock # 0.668 CPUs utilized
2 context-switches # 0.003 M/sec
0 CPU-migrations # 0.000 M/sec
197 page-faults # 0.249 M/sec
1878143 cycles # 2.373 GHz
<not supported> stalled-cycles-frontend
<not supported> stalled-cycles-backend
1083367 instructions # 0.58 insns per cycle
193027 branches # 243.935 M/sec
9014 branch-misses # 4.67% of all branches
0.001184746 seconds time elapsed
The option can be combined with -x to make the output file much easier
to parse.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20110815202233.GA18535@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If you have --symfs in perf report, then you also need it for perf
annotate. This allows off-box assembly level analysis of perf.data
samples.
This patch complements:
commit ec5761eab3
Author: David Ahern <daahern@cisco.com>
Date: Thu Dec 9 13:27:07 2010 -0700
perf symbols: Add symfs option for off-box analysis using specified tree
Acked-by: David Ahern <daahern@cisco.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: David Ahern <daahern@cisco.com>
Link: http://lkml.kernel.org/r/20110729232040.GA21838@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds two new options to perf annotate:
- --no-asm-raw : Do not display raw instruction encodings
- --no-source : Do not interleave source code with assembly code
We believe those options make the output of annotate more readable.
Systematically displaying source can make it hard to follow code and
especially optimized code.
Raw encodings are not useful in most cases.
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20110517153207.GA9834@quad
Signed-off-by: Stephane Eranian <eranian@google.com>
[committer note: Use the 'no-' option inverting logic]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Support adding probes on offline kernel modules. This enables
perf-probe to trace kernel-module init functions via perf-probe.
If user gives the path of module with -m option, perf-probe
expects the module is offline.
This feature works with --add, --funcs, and --vars.
E.g)
# perf probe -m /lib/modules/`uname -r`/kernel/fs/btrfs/btrfs.ko \
-a "extent_io_init:5 extent_state_cache"
Add new events:
probe:extent_io_init (on extent_io_init:5 with extent_state_cache)
probe:extent_io_init_1 (on extent_io_init:5 with extent_state_cache)
You can now use it on all perf tools, such as:
perf record -e probe:extent_io_init_1 -aR sleep 1
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Link: http://lkml.kernel.org/r/20110627072751.6528.10230.stgit@fedora15
Signed-off-by: Steven Rostedt <rostedt@goodmis.org>
Add an option to perf report/annotate/script to specify which
CPUs to operate on. This enables us to take a single system wide
profile and analyse each CPU (or group of CPUs) in isolation.
This was useful when profiling a multiprocess workload where the
bottleneck was on one CPU but this was hidden in the overall
profile. Per process and per thread breakdowns didn't help
because multiple processes were running on each CPU and no
single process consumed an entire CPU.
The patch converts the list of CPUs returned by cpu_map__new
into a bitmap for fast lookup. I wanted to use -C to be
consistent with perf top/record/stat, but unfortunately perf
report already uses -C <comms>.
v2: Incorporate suggestions from David Ahern:
- Added -c to perf script
- Check that SAMPLE_CPU is set when -c is used
- Update documentation
v3: Create perf_session__cpu_bitmap()
Signed-off-by: Anton Blanchard <anton@samba.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Paul Mackerras <paulus@samba.org>
Link: http://lkml.kernel.org/r/20110704215750.11647eb9@kryten
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add "caller/callee" option to support inverted butterfly report,
in the inverted report (with caller option), the call graph start
from the callee's ancestor. Users can use such view to catch system's
performance bottleneck from a sysprof like view. Using this option
with specified sort order like pid gives us high level view of call
graph statistics.
Also add "-G" alias for inverted call graph.
Signed-off-by: Sam Liao <phyomh@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: David Ahern <dsahern@gmail.com>
Signed-off-by: Frederic Weisbecker <fweisbec@gmail.com>
Resolve to a function or variable if possible and if the sym option is
enabled.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1306782503-22002-1-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'sym' option displays both the function name and the DSO it comes
from. Split the display of the dso into a separate option. This allows
display of the ip address and symbol without the dso, thus shortening
line lengths - and decluttering the output a bit.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1306528124-25861-3-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently the "sym" output field is used to dump instruction pointers
and callchain stack. Sample addresses can also be converted to symbols,
so the meaning of "sym" needs to be fixed. This patch adds an "ip"
option and if it is selected the user can also opt to dump symbols for
them. If the user opts to dump IP without syms only the address is
shown.
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1306528124-25861-2-git-send-email-dsahern@gmail.com
Signed-off-by: David Ahern <dsahern@gmail.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
* 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip: (60 commits)
sched: Fix and optimise calculation of the weight-inverse
sched: Avoid going ahead if ->cpus_allowed is not changed
sched, rt: Update rq clock when unthrottling of an otherwise idle CPU
sched: Remove unused parameters from sched_fork() and wake_up_new_task()
sched: Shorten the construction of the span cpu mask of sched domain
sched: Wrap the 'cfs_rq->nr_spread_over' field with CONFIG_SCHED_DEBUG
sched: Remove unused 'this_best_prio arg' from balance_tasks()
sched: Remove noop in alloc_rt_sched_group()
sched: Get rid of lock_depth
sched: Remove obsolete comment from scheduler_tick()
sched: Fix sched_domain iterations vs. RCU
sched: Next buddy hint on sleep and preempt path
sched: Make set_*_buddy() work on non-task entities
sched: Remove need_migrate_task()
sched: Move the second half of ttwu() to the remote cpu
sched: Restructure ttwu() some more
sched: Rename ttwu_post_activation() to ttwu_do_wakeup()
sched: Remove rq argument from ttwu_stat()
sched: Remove rq->lock from the first half of ttwu()
sched: Drop rq->lock from sched_exec()
...
* 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
sched: Fix rt_rq runtime leakage bug
Neil Brown pointed out that lock_depth somehow escaped the BKL
removal work. Let's get rid of it now.
Note that the perf scripting utilities still have a bunch of
code for dealing with common_lock_depth in tracepoints; I have
left that in place in case anybody wants to use that code with
older kernels.
Suggested-by: Neil Brown <neilb@suse.de>
Signed-off-by: Jonathan Corbet <corbet@lwn.net>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/20110422111910.456c0e84@bike.lwn.net
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Allow a user to select which fields to print to stdout for event data.
Options include comm (command name), tid (thread id), pid (process id),
time (perf timestamp), cpu, event (for event name), and trace (for
trace data).
Default is set to maintain compatibility with current output; this
feature does alter output format slightly -- no '-' between command
and pid/tid.
Thanks to Frederic Weisbecker for detailed suggestions on this approach.
Examples (output compressed)
1. trace, default format
perf record -ga -e sched:sched_switch
perf script
swapper 0 [000] 537.037184: sched_switch: prev_comm=swapper prev_pid=0...
sshd 1675 [000] 537.037309: sched_switch: prev_comm=sshd prev_pid=1675...
netstat 1692 [001] 537.038664: sched_switch: prev_comm=netstat prev_pid=1692...
2. trace, custom format
perf record -ga -e sched:sched_switch
perf script -f comm,pid,time,trace <--- omitting cpu and event name
swapper 0 537.037184: prev_comm=swapper prev_pid=0 prev_prio=120 ...
sshd 1675 537.037309: prev_comm=sshd prev_pid=1675 prev_prio=120 ...
netstat 1692 537.038664: prev_comm=netstat prev_pid=1692 prev_prio=120 ...
Acked-by: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
LKML-Reference: <1299734608-5223-5-git-send-email-daahern@cisco.com>
Signed-off-by: David Ahern <daahern@cisco.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The perf makefile is nicely complete except for
a) an uninstall option
b) a 'make help' description
This patch implements b)
it also comments out other non-working makefile targets
Signed-off-by: Jesse Brandeburg <jesse.brandeburg@intel.com>
LKML-Reference: <new-submission>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds the ability to filter monitoring based on container groups
(cgroups) for both perf stat and perf record. It is possible to monitor
multiple cgroup in parallel. There is one cgroup per event. The cgroups to
monitor are passed via a new -G option followed by a comma separated list of
cgroup names.
The cgroup filesystem has to be mounted. Given a cgroup name, the perf tool
finds the corresponding directory in the cgroup filesystem and opens it. It
then passes that file descriptor to the kernel.
Example:
$ perf stat -B -a -e cycles:u,cycles:u,cycles:u -G test1,,test2 -- sleep 1
Performance counter stats for 'sleep 1':
2,368,667,414 cycles test1
2,369,661,459 cycles
<not counted> cycles test2
1.001856890 seconds time elapsed
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <4d590290.825bdf0a.7d0a.4890@mx.google.com>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Add filters support for available variable list.
Default filter is "!__k???tab_*&!__crc_*" for filtering out
automatically generated symbols.
The format of filter rule is "[!]GLOBPATTERN", so you can use wild
cards. If the filter rule starts with '!', matched variables are filter
out.
e.g.:
# perf probe -V schedule --externs --filter=cpu*
Available variables at schedule
@<schedule+0>
cpumask_var_t cpu_callout_mask
cpumask_var_t cpu_core_map
cpumask_var_t cpu_isolated_map
cpumask_var_t cpu_sibling_map
int cpu_number
long unsigned int* cpu_bit_bitmap
...
Cc: 2nddept-manager@sdl.hitachi.co.jp
Cc: Chase Douglas <chase.douglas@canonical.com>
Cc: Franck Bui-Huu <fbuihuu@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Srikar Dronamraju <srikar@linux.vnet.ibm.com>
Cc: Steven Rostedt <rostedt@goodmis.org>
LKML-Reference: <20110120141539.25915.43401.stgit@ltc236.sdl.hitachi.co.jp>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
[ committer note: Removed the elf.h include as it was fixed up in e80711c]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes there is a need to use perf in "live-log" mode. The problem
is, for seldom events, actual info output is largely delayed because
perf-record reads sample data in whole pages.
So for such scenarious, add flag for perf-record to go in "nodelay"
mode. To track e.g. what's going on in icmp_rcv while ping is running
Use it with something like this:
(1) $ perf probe -L icmp_rcv | grep -U8 '^ *43\>'
goto error;
}
38 if (!pskb_pull(skb, sizeof(*icmph)))
goto error;
icmph = icmp_hdr(skb);
43 ICMPMSGIN_INC_STATS_BH(net, icmph->type);
/*
* 18 is the highest 'known' ICMP type. Anything else is a mystery
*
* RFC 1122: 3.2.2 Unknown ICMP messages types MUST be silently
* discarded.
*/
50 if (icmph->type > NR_ICMP_TYPES)
goto error;
$ perf probe icmp_rcv:43 'type=icmph->type'
(2) $ cat trace-icmp.py
[...]
def trace_begin():
print "in trace_begin"
def trace_end():
print "in trace_end"
def probe__icmp_rcv(event_name, context, common_cpu,
common_secs, common_nsecs, common_pid, common_comm,
__probe_ip, type):
print_header(event_name, common_cpu, common_secs, common_nsecs,
common_pid, common_comm)
print "__probe_ip=%u, type=%u\n" % \
(__probe_ip, type),
[...]
(3) $ perf record -a -D -e probe:icmp_rcv -o - | \
perf script -i - -s trace-icmp.py
Thanks to Peter Zijlstra for pointing how to do it.
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>, Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Zanussi <tzanussi@gmail.com>
LKML-Reference: <20110112140613.GA11698@tugrik.mns.mnsspb.ru>
Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The symfs argument allows analysis of perf.data file using a locally accessible
filesystem tree with debug symbols - e.g., tree created during image builds,
sshfs mount, loop mounted KVM disk images, USB keys, initrds, etc. Anything
with an OS tree can be analyzed from anywhere without the need to populate a
local data store with build-ids.
Commiter notes:
o Fixed up symfs="/" variants handling.
o prefixed DSO__ORIG_GUEST_KMODULE case with symfs too, avoiding use of files
outside the symfs directory.
LKML-Reference: <1291926427-28846-1-git-send-email-daahern@cisco.com>
Signed-off-by: David Ahern <daahern@cisco.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is useful for analyzing a perf data file on a different system than
the one data was collected on and still include symbols from loaded
kernel modules in the output.
Commiter note: Updated the man page accordingly.
LKML-Reference: <1291775986-16475-1-git-send-email-daahern@cisco.com>
Signed-off-by: David Ahern <daahern@cisco.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This patch adds an option (-x/--field-separator) to print counts using a
CSV-style output. The user can pass a custom separator. This makes it very easy
to import counts directly into your favorite spreadsheet without having to
write scripts.
Example:
$ perf stat --field-separator=, -a -- sleep 1
4009.961740,task-clock-msecs
13,context-switches
2,CPU-migrations
189,page-faults
9596385684,cycles
3493659441,instructions
872897069,branches
41562,branch-misses
22424,cache-references
1289,cache-misses
Works also in non-aggregated mode:
$ perf stat -x , -a -A -- sleep 1
CPU0,1002.526168,task-clock-msecs
CPU1,1002.528365,task-clock-msecs
CPU2,1002.523360,task-clock-msecs
CPU3,1002.519878,task-clock-msecs
CPU0,1,context-switches
CPU1,5,context-switches
CPU2,5,context-switches
CPU3,6,context-switches
CPU0,0,CPU-migrations
CPU1,1,CPU-migrations
CPU2,0,CPU-migrations
CPU3,1,CPU-migrations
CPU0,2,page-faults
CPU1,6,page-faults
CPU2,9,page-faults
CPU3,174,page-faults
CPU0,2399439771,cycles
CPU1,2380369063,cycles
CPU2,2399142710,cycles
CPU3,2373161192,cycles
CPU0,872900618,instructions
CPU1,873030960,instructions
CPU2,872714525,instructions
CPU3,874460580,instructions
CPU0,221556839,branches
CPU1,218134342,branches
CPU2,218161730,branches
CPU3,218284093,branches
CPU0,18556,branch-misses
CPU1,1449,branch-misses
CPU2,3447,branch-misses
CPU3,12714,branch-misses
CPU0,8330,cache-references
CPU1,313844,cache-references
CPU2,47993728,cache-references
CPU3,826481,cache-references
CPU0,272,cache-misses
CPU1,5360,cache-misses
CPU2,1342193,cache-misses
CPU3,13992,cache-misses
This second version adds the ability to name a separator and uses
field-separator as the long option to be consistent with perf report.
Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be
used with it, we need to notice if the user explicitely enabled or disabled -B,
add code to disable big_num if the user didn't explicitely set --big_num when
-x is used.
Cc: David S. Miller <davem@davemloft.net>
Cc: Frederik Weisbecker <fweisbec@gmail.com>
Cc: Ingo Molnar <mingo@elte.hu>
Cc: paulus@samba.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <robert.richter@amd.com>
LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com>
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Merge reason: This is an older commit under testing that was not pushed yet - merge it.
Also fix up the merge in command-list.txt.
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Tom Zanussi <tzanussi@gmail.com>