96532a83ee
It needs stack traces to find callers of locks. To minimize the
performance overhead, it collects only up to 8 entries for each stack
trace, and it skips the first 3 entries because they come from BPF,
tracepoint and lock functions, which are not interesting to most users.

But it turned out that those numbers differ in some configurations, and
using fixed numbers can result in meaningless caller names. Let's make
them adjustable with the --max-stack and --stack-skip options.

On my setup, the default output is like below:

  # /perf lock con -ab -F contended,wait_total sleep 3
   contended   total wait       type   caller

          28      4.55 ms   rwlock:W   __bpf_trace_contention_begin+0xb
          33      1.67 ms   rwlock:W   __bpf_trace_contention_begin+0xb
          12    580.28 us   spinlock   __bpf_trace_contention_begin+0xb
          60    240.54 us    rwsem:R   __bpf_trace_contention_begin+0xb
          27     64.45 us   spinlock   __bpf_trace_contention_begin+0xb

If I change the stack skip to 5, the result will be like:

  # perf lock con -ab -F contended,wait_total --stack-skip 5 sleep 3
   contended   total wait       type   caller

          32    715.45 us   spinlock   folio_lruvec_lock_irqsave+0x61
          26    550.22 us   spinlock   folio_lruvec_lock_irqsave+0x61
          15    486.93 us    rwsem:R   mmap_read_lock+0x13
          12    139.66 us    rwsem:W   vm_mmap_pgoff+0x93
           1      7.04 us   spinlock   tick_do_update_jiffies64+0x25

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220912055314.744552-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

perf-lock(1)
============

NAME
----
perf-lock - Analyze lock events

SYNOPSIS
--------
[verse]
'perf lock' {record|report|script|info|contention}

DESCRIPTION
-----------
You can analyze various lock behaviours
and statistics with this 'perf lock' command.

'perf lock record <command>' records lock events
from the start to the end of <command> and writes
the trace to the file "perf.data".

'perf lock report' reports statistical data.

'perf lock script' shows raw lock events.

'perf lock info' shows metadata like threads or addresses
of lock instances.

'perf lock contention' shows contention statistics.

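The record-then-report workflow described above can also be driven from a script. The following is a minimal sketch, assuming a `perf` binary is available on the PATH; `lock_workflow` is a hypothetical helper (not part of perf) that only builds the two command lines and does not run them.

```python
def lock_workflow(command):
    """Build the argv lists for the record/report workflow.

    `lock_workflow` is a hypothetical helper, not part of perf.
    `command` is the workload to trace, e.g. ["sleep", "1"].
    """
    if not command:
        raise ValueError("a command to trace is required")
    # 'perf lock record <command>' writes lock events to perf.data.
    record = ["perf", "lock", "record", *command]
    # 'perf lock report' reads perf.data and prints statistics.
    report = ["perf", "lock", "report"]
    return [record, report]


if __name__ == "__main__":
    for argv in lock_workflow(["sleep", "1"]):
        print(" ".join(argv))
```

Each returned list can be passed to `subprocess.run` in order; the second command relies on the "perf.data" file written by the first.
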

COMMON OPTIONS
--------------

-i::
--input=<file>::
	Input file name. (default: perf.data unless stdin is a fifo)

-v::
--verbose::
	Be more verbose (show symbol address, etc).

-D::
--dump-raw-trace::
	Dump raw trace in ASCII.

-f::
--force::
	Don't complain, do it.

--vmlinux=<file>::
	vmlinux pathname

--kallsyms=<file>::
	kallsyms pathname


REPORT OPTIONS
--------------

-k::
--key=<value>::
	Sorting key. Possible values: acquired (default), contended,
	avg_wait, wait_total, wait_max, wait_min.

-F::
--field=<value>::
	Output fields. All fields are shown by default; use this option
	to select a subset. Possible values: acquired, contended,
	avg_wait, wait_total, wait_max, wait_min.

-c::
--combine-locks::
	Merge lock instances in the same class (based on name).

-t::
--threads::
	Show per-thread lock statistics, for example:

	  $ perf lock report -t -F acquired,contended,avg_wait

	                Name   acquired  contended   avg wait (ns)

	                perf     240569          9            5784
	             swapper     106610         19             543
	              :15789      17370          2           14538
	        ContainerMgr       8981          6             874
	               sleep       5275          1           11281
	     ContainerThread       4416          4             944
	     RootPressureThr       3215          5            1215
	         rcu_preempt       2954          0               0
	        ContainerMgr       2560          0               0
	             unnamed       1873          0               0
	     EventManager_De       1845          1             636
	     futex-default-S       1609          0               0

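The per-thread report is plain whitespace-separated text, so it is easy to post-process. The sketch below assumes exactly the three-field layout produced by `-F acquired,contended,avg_wait` as shown above; `parse_thread_report` is a hypothetical helper name, and the sample text is a shortened copy of that output.

```python
def parse_thread_report(text):
    """Parse 'perf lock report -t -F acquired,contended,avg_wait' output.

    Hypothetical helper; assumes one thread per line with three numeric
    columns after the name.
    """
    rows = []
    for line in text.splitlines():
        # Split from the right so the name stays in one piece.
        parts = line.rsplit(None, 3)
        # Skip blank lines and the header (its last columns are not numbers).
        if len(parts) != 4 or not all(p.isdigit() for p in parts[1:]):
            continue
        name, acquired, contended, avg_wait = parts
        rows.append({
            "name": name.strip(),
            "acquired": int(acquired),
            "contended": int(contended),
            "avg_wait_ns": int(avg_wait),
        })
    return rows


SAMPLE = """\
                Name   acquired  contended   avg wait (ns)

                perf     240569          9            5784
             swapper     106610         19             543
         rcu_preempt       2954          0               0
"""

if __name__ == "__main__":
    rows = parse_thread_report(SAMPLE)
    most_contended = max(rows, key=lambda r: r["contended"])
    print(most_contended["name"], most_contended["contended"])
```

Splitting from the right keeps the parser independent of the width of the name column, at the cost of assuming that the last three columns are always plain integers.
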

INFO OPTIONS
------------

-t::
--threads::
	dump thread list in perf.data

-m::
--map::
	dump map of lock instances (address:name table)

CONTENTION OPTIONS
------------------

-k::
--key=<value>::
	Sorting key. Possible values: contended, wait_total (default),
	wait_max, wait_min, avg_wait.

-F::
--field=<value>::
	Output fields. By default all fields except wait_min are shown;
	use this option to select a different set. Possible values:
	contended, wait_total, wait_max, wait_min, avg_wait.

-t::
--threads::
	Show per-thread lock contention statistics.

-b::
--use-bpf::
	Use a BPF program to collect lock contention stats instead of
	using the input data.

-a::
--all-cpus::
	System-wide collection from all CPUs.

-C::
--cpu::
	Collect samples only on the list of CPUs provided. Multiple CPUs can be
	provided as a comma-separated list with no space: 0,1. Ranges of CPUs
	are specified with -: 0-2. Default is to monitor all CPUs.

-p::
--pid=::
	Record events on existing process ID (comma separated list).

--tid=::
	Record events on existing thread ID (comma separated list).

--map-nr-entries::
	Maximum number of BPF map entries (default: 10240).

--max-stack::
	Maximum stack depth when collecting lock contention (default: 8).

--stack-skip::
	Number of stack entries to skip when finding a lock caller (default: 3).

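How --max-stack and --stack-skip interact can be pictured with a simplified model: at most `max_stack` entries of a stack trace are kept, and the first `stack_skip` of them are ignored when picking the caller. The sketch below is an illustration under that assumption only, not perf's actual implementation; `pick_caller` and the placeholder frame names are hypothetical.

```python
def pick_caller(stack, max_stack=8, stack_skip=3):
    """Return the frame that would be shown as the lock caller, or None.

    Simplified model (an assumption, not perf's code): keep at most
    `max_stack` entries, then skip the first `stack_skip` of them.
    """
    if max_stack < 0 or stack_skip < 0:
        raise ValueError("max_stack and stack_skip must be non-negative")
    collected = stack[:max_stack]       # entries beyond max_stack are never seen
    if stack_skip >= len(collected):    # nothing is left after skipping
        return None
    return collected[stack_skip]


# Illustrative trace, innermost frame first. Entries marked as
# placeholders are hypothetical names, not real kernel symbols.
SAMPLE_STACK = [
    "bpf_helper_frame",                 # placeholder
    "bpf_program_frame",                # placeholder
    "tracepoint_frame",                 # placeholder
    "__bpf_trace_contention_begin",
    "lock_slowpath_frame",              # placeholder
    "folio_lruvec_lock_irqsave",
    "caller_of_caller_frame",           # placeholder
]

if __name__ == "__main__":
    print(pick_caller(SAMPLE_STACK))                 # default skip of 3
    print(pick_caller(SAMPLE_STACK, stack_skip=5))   # deeper skip
```

In this model a skip value that is too small reports an internal frame as the caller, while a skip value at or beyond the collected depth leaves no caller at all, so the two options need to be chosen together.
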

SEE ALSO
--------
linkperf:perf[1]