linux/tools/perf/Documentation/perf-lock.txt
Namhyung Kim 96532a83ee perf lock contention: Allow to change stack depth and skip
It needs stack traces to find the callers of locks.  To minimize the
performance overhead it collects only up to 8 entries for each stack
trace.  It also skips the first 3 entries, as they come from BPF,
tracepoint and lock functions, which are not interesting to most users.

But it turned out that those numbers differ in some configurations.
Using fixed numbers can result in meaningless caller names.  Let's
make them adjustable with the --max-stack and --stack-skip options.

On my setup, the default output is like below:

  # perf lock con -ab -F contended,wait_total sleep 3
   contended   total wait         type   caller

          28      4.55 ms     rwlock:W   __bpf_trace_contention_begin+0xb
          33      1.67 ms     rwlock:W   __bpf_trace_contention_begin+0xb
          12    580.28 us     spinlock   __bpf_trace_contention_begin+0xb
          60    240.54 us      rwsem:R   __bpf_trace_contention_begin+0xb
          27     64.45 us     spinlock   __bpf_trace_contention_begin+0xb

If I change the stack skip to 5, the result will be like:

  # perf lock con -ab -F contended,wait_total --stack-skip 5 sleep 3
   contended   total wait         type   caller

          32    715.45 us     spinlock   folio_lruvec_lock_irqsave+0x61
          26    550.22 us     spinlock   folio_lruvec_lock_irqsave+0x61
          15    486.93 us      rwsem:R   mmap_read_lock+0x13
          12    139.66 us      rwsem:W   vm_mmap_pgoff+0x93
           1      7.04 us     spinlock   tick_do_update_jiffies64+0x25

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220912055314.744552-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04 08:55:22 -03:00


perf-lock(1)
============
NAME
----
perf-lock - Analyze lock events
SYNOPSIS
--------
[verse]
'perf lock' {record|report|script|info|contention}
DESCRIPTION
-----------
You can analyze various lock behaviours
and statistics with this 'perf lock' command.
'perf lock record <command>' records lock events from the start
to the end of <command>. It produces the file "perf.data",
which contains the tracing results of the lock events.
'perf lock report' reports statistical data.
'perf lock script' shows raw lock events.
'perf lock info' shows metadata like threads or addresses
of lock instances.
'perf lock contention' shows contention statistics.
COMMON OPTIONS
--------------
-i::
--input=<file>::
Input file name. (default: perf.data unless stdin is a fifo)
-v::
--verbose::
Be more verbose (show symbol address, etc).
-D::
--dump-raw-trace::
Dump raw trace in ASCII.
-f::
--force::
Don't complain, do it.
--vmlinux=<file>::
vmlinux pathname
--kallsyms=<file>::
kallsyms pathname
REPORT OPTIONS
--------------
-k::
--key=<value>::
Sorting key. Possible values: acquired (default), contended,
avg_wait, wait_total, wait_max, wait_min.
-F::
--field=<value>::
Output fields. By default it shows all the fields; use this option
to select a subset. Possible values: acquired, contended,
avg_wait, wait_total, wait_max, wait_min.
-c::
--combine-locks::
Merge lock instances in the same class (based on name).
-t::
--threads::
The -t option shows per-thread lock statistics, as below:

  $ perf lock report -t -F acquired,contended,avg_wait

                  Name   acquired  contended   avg wait (ns)

                  perf     240569          9            5784
               swapper     106610         19             543
                :15789      17370          2           14538
          ContainerMgr       8981          6             874
                 sleep       5275          1           11281
       ContainerThread       4416          4             944
       RootPressureThr       3215          5            1215
           rcu_preempt       2954          0               0
          ContainerMgr       2560          0               0
               unnamed       1873          0               0
       EventManager_De       1845          1             636
       futex-default-S       1609          0               0
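
As a rough illustration of what -c and -k do, the following Python sketch merges per-instance statistics that share a name and then sorts the result by a chosen key. It is not how perf implements the report: the `LockStat` record, the `combine_locks` and `sort_stats` helpers, and the sample numbers are all hypothetical. It also assumes that avg_wait is wait_total divided by contended and that rows are listed in descending order, and it models only a subset of the fields.

```python
from dataclasses import dataclass


@dataclass
class LockStat:
    """Hypothetical per-lock record; field names mirror the -k/-F values."""
    name: str
    acquired: int = 0
    contended: int = 0
    wait_total: int = 0  # nanoseconds
    wait_max: int = 0    # nanoseconds

    @property
    def avg_wait(self) -> float:
        # Assumption: average wait is total wait over the contention count.
        return self.wait_total / self.contended if self.contended else 0.0


def combine_locks(stats):
    """Merge records with the same name, in the spirit of --combine-locks."""
    merged = {}
    for s in stats:
        m = merged.get(s.name)
        if m is None:
            # Copy so the caller's records are not modified.
            merged[s.name] = LockStat(s.name, s.acquired, s.contended,
                                      s.wait_total, s.wait_max)
            continue
        m.acquired += s.acquired
        m.contended += s.contended
        m.wait_total += s.wait_total
        m.wait_max = max(m.wait_max, s.wait_max)
    return list(merged.values())


SORT_KEYS = ("acquired", "contended", "avg_wait", "wait_total", "wait_max")


def sort_stats(stats, key="acquired"):
    """Sort records by one key, largest first (ordering is an assumption)."""
    if key not in SORT_KEYS:
        raise ValueError(f"unknown sort key: {key}")
    return sorted(stats, key=lambda s: getattr(s, key), reverse=True)


if __name__ == "__main__":
    # Made-up sample data: two instances named "a" and one named "b".
    sample = [
        LockStat("a", 10, 1, 100, 100),
        LockStat("b", 5, 2, 300, 200),
        LockStat("a", 3, 1, 50, 50),
    ]
    for row in sort_stats(combine_locks(sample), key="wait_total"):
        print(row.name, row.acquired, row.contended, row.avg_wait)
```

Merging before sorting matters: a class with many lightly contended instances can rank low instance by instance yet rank high once its counts are summed.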
INFO OPTIONS
------------
-t::
--threads::
dump thread list in perf.data
-m::
--map::
dump map of lock instances (address:name table)
CONTENTION OPTIONS
------------------
-k::
--key=<value>::
Sorting key. Possible values: contended, wait_total (default),
wait_max, wait_min, avg_wait.
-F::
--field=<value>::
Output fields. By default it shows all fields except wait_min;
use this option to customize the selection. Possible values:
contended, wait_total, wait_max, wait_min, avg_wait.
-t::
--threads::
Show per-thread lock contention statistics.
-b::
--use-bpf::
Use BPF program to collect lock contention stats instead of
using the input data.
-a::
--all-cpus::
System-wide collection from all CPUs.
-C::
--cpu::
Collect samples only on the list of CPUs provided. Multiple CPUs can be
provided as a comma-separated list with no space: 0,1. Ranges of CPUs
are specified with -: 0-2. Default is to monitor all CPUs.
-p::
--pid=::
Record events on existing process ID (comma separated list).
--tid=::
Record events on existing thread ID (comma separated list).
--map-nr-entries::
Maximum number of BPF map entries (default: 10240).
--max-stack::
Maximum stack depth when collecting lock contention (default: 8).
--stack-skip::
Number of stack entries to skip when finding a lock caller (default: 3).
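
To illustrate how --max-stack and --stack-skip relate, here is a minimal Python sketch of choosing a caller from a captured stack trace. It is not perf's implementation: the `pick_caller` helper is hypothetical, and it assumes the trace is first truncated to the maximum depth and the leading entries are then skipped. Only the two symbol names taken from the example outputs in the commit message are real; the `internal_*` and `caller_*` frames are placeholders.

```python
def pick_caller(stack, max_stack=8, stack_skip=3):
    """Return the first stack entry left after truncating and skipping.

    stack      -- list of symbol strings, innermost frame first
    max_stack  -- how many entries are collected (cf. --max-stack)
    stack_skip -- how many leading entries are ignored (cf. --stack-skip)
    Returns None when nothing is left to report.
    """
    if max_stack < 0 or stack_skip < 0:
        raise ValueError("max_stack and stack_skip must be non-negative")
    collected = stack[:max_stack]        # entries beyond the depth are never seen
    remaining = collected[stack_skip:]   # drop BPF/tracepoint/lock internals
    return remaining[0] if remaining else None


# Hypothetical trace: placeholders except for the two symbols that appear
# in the example outputs.
EXAMPLE_STACK = [
    "internal_0",
    "internal_1",
    "internal_2",
    "__bpf_trace_contention_begin+0xb",
    "internal_4",
    "folio_lruvec_lock_irqsave+0x61",
    "caller_6",
    "caller_7",
    "caller_8",
]

if __name__ == "__main__":
    print(pick_caller(EXAMPLE_STACK))                # default skip of 3
    print(pick_caller(EXAMPLE_STACK, stack_skip=5))  # skip two more frames
```

Under these assumptions, a skip count at or above the maximum depth leaves no entry to report, so the two values have to be raised together when many internal frames precede the real caller.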
SEE ALSO
--------
linkperf:perf[1]