Change "perf lock info" argument handling to:
Display both map and thread info (rather than an error) when neither are
specified.
Display both map and thread info (rather than just thread info) when
both are requested.
Signed-off-by: Nick Forrington <nick.forrington@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20240513091413.738537-2-nick.forrington@arm.com
Leak sanitizer complains about the strdup-ed arguments not being freed
and given cmd_record doesn't modify the given strings, remove the
strdups.
Original discussion in this patch:
https://lore.kernel.org/lkml/20240430184156.1824083-1-irogers@google.com/
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Link: https://lore.kernel.org/r/20240509053123.1918093-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
if a strdup-ed string is NULL,the allocated memory needs freeing.
Signed-off-by: zhaimingbing <zhaimingbing@cmss.chinamobile.com>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Li Dong <lidong@vivo.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sean Christopherson <seanjc@google.com>
Link: https://lore.kernel.org/r/20231124092657.10392-1-zhaimingbing@cmss.chinamobile.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If a memory allocation fails then the strdup-ed string needs
freeing. Detected by clang-tidy.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: llvm@lists.linux.dev
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20231009183920.200859-15-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The perf test named "kernel lock contention analysis test"
fails in powerpc system with below error:
[command]# ./perf test 81 -vv
81: kernel lock contention analysis test :
--- start ---
test child forked, pid 2140
Testing perf lock record and perf lock contention
Testing perf lock contention --use-bpf
[Skip] No BPF support
Testing perf lock record and perf lock contention at the same time
Testing perf lock contention --threads
Testing perf lock contention --lock-addr
Testing perf lock contention --type-filter (w/ spinlock)
Testing perf lock contention --lock-filter (w/ tasklist_lock)
Testing perf lock contention --callstack-filter (w/ unix_stream)
[Fail] Recorded result should have a lock from unix_stream:
test child finished with -1
---- end ----
kernel lock contention analysis test: FAILED!
The test is failing because we get an address entry with 0 in
perf lock samples for powerpc, and code for lock contention
option "--callstack-filter" will not check further entries after
address 0.
Below are some of the samples from test generated perf.data file, which
have 0 address in the 2nd entry of callstack:
--------
sched-messaging 3409 [001] 7152.904029: lock:contention_begin: 0xc00000c80904ef00 (flags=SPIN)
c0000000001e926c __traceiter_contention_begin+0x6c ([kernel.kallsyms])
0 [unknown] ([unknown])
c000000000f8a178 native_queued_spin_lock_slowpath+0x1f8 ([kernel.kallsyms])
c000000000f89f44 _raw_spin_lock_irqsave+0x84 ([kernel.kallsyms])
c0000000001d9fd0 prepare_to_wait+0x50 ([kernel.kallsyms])
c000000000c80f50 sock_alloc_send_pskb+0x1b0 ([kernel.kallsyms])
c000000000e82298 unix_stream_sendmsg+0x2b8 ([kernel.kallsyms])
c000000000c78980 sock_sendmsg+0x80 ([kernel.kallsyms])
sched-messaging 3408 [005] 7152.904036: lock:contention_begin: 0xc00000c80904ef00 (flags=SPIN)
c0000000001e926c __traceiter_contention_begin+0x6c ([kernel.kallsyms])
0 [unknown] ([unknown])
c000000000f8a178 native_queued_spin_lock_slowpath+0x1f8 ([kernel.kallsyms])
c000000000f89f44 _raw_spin_lock_irqsave+0x84 ([kernel.kallsyms])
c0000000001d9fd0 prepare_to_wait+0x50 ([kernel.kallsyms])
c000000000c80f50 sock_alloc_send_pskb+0x1b0 ([kernel.kallsyms])
c000000000e82298 unix_stream_sendmsg+0x2b8 ([kernel.kallsyms])
c000000000c78980 sock_sendmsg+0x80 ([kernel.kallsyms])
--------
Based on commit 20002ded4d ("perf_counter: powerpc: Add callchain support"),
incase of powerpc, the callchain saved by kernel always includes first
three entries as the NIP (next instruction pointer), LR (link register), and
the contents of LR save area in the second stack frame. In certain scenarios
its possible to have invalid kernel instruction addresses in either of LR or the
second stack frame's LR. In that case, kernel will store the address as zer0.
Hence, its possible to have 2nd or 3rd callstack entry as 0.
As per the current code in match_callstack_filter function, we skip the callstack
check incase we get 0 address. And hence the test case is failing in powerpc.
Fix this issue by updating the check in match_callstack_filter function,
to not skip callstack check if the 2nd or 3rd entry have 0 address
for powerpc.
Result in powerpc after patch changes:
[command]# ./perf test 81 -vv
81: kernel lock contention analysis test :
--- start ---
test child forked, pid 4570
Testing perf lock record and perf lock contention
Testing perf lock contention --use-bpf
[Skip] No BPF support
Testing perf lock record and perf lock contention at the same time
Testing perf lock contention --threads
Testing perf lock contention --lock-addr
Testing perf lock contention --type-filter (w/ spinlock)
Testing perf lock contention --lock-filter (w/ tasklist_lock)
[Skip] Could not find 'tasklist_lock'
Testing perf lock contention --callstack-filter (w/ unix_stream)
Testing perf lock contention --callstack-filter with task aggregation
Testing perf lock contention CSV output
[Skip] No BPF support
test child finished with 0
---- end ----
kernel lock contention analysis test: Ok
Fixes: ebab291641 ("perf lock contention: Support filters for different aggregation")
Reported-by: Disha Goel <disgoel@linux.vnet.ibm.com>
Tested-by: Disha Goel <disgoel@linux.ibm.com>
Signed-off-by: Kajol Jain <kjain@linux.ibm.com>
Cc: maddy@linux.ibm.com
Cc: atrajeev@linux.vnet.ibm.com
Link: https://lore.kernel.org/r/20231003092113.252380-1-kjain@linux.ibm.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The -G/--cgroup-filter is to limit lock contention collection on the
tasks in the specific cgroups only.
$ sudo ./perf lock con -abt -G /user.slice/.../vte-spawn-52221fb8-b33f-4a52-b5c3-e35d1e6fc0e0.scope \
./perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.174 [sec]
contended total wait max wait avg wait pid comm
4 114.45 us 60.06 us 28.61 us 214847 sched-messaging
2 111.40 us 60.84 us 55.70 us 214848 sched-messaging
2 106.09 us 59.42 us 53.04 us 214837 sched-messaging
1 81.70 us 81.70 us 81.70 us 214709 sched-messaging
68 78.44 us 6.83 us 1.15 us 214633 sched-messaging
69 73.71 us 2.69 us 1.07 us 214632 sched-messaging
4 72.62 us 60.83 us 18.15 us 214850 sched-messaging
2 71.75 us 67.60 us 35.88 us 214840 sched-messaging
2 69.29 us 67.53 us 34.65 us 214804 sched-messaging
2 69.00 us 68.23 us 34.50 us 214826 sched-messaging
...
Export cgroup__new() function as it's needed from outside.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --lock-cgroup option shows lock contention stats break down by
cgroups.
Add LOCK_AGGR_CGROUP mode and use it instead of use_cgroup field.
$ sudo ./perf lock con -ab --lock-cgroup sleep 1
contended total wait max wait avg wait cgroup
8 15.70 us 6.34 us 1.96 us /
2 1.48 us 747 ns 738 ns /user.slice/.../app.slice/app-gnome-google\x2dchrome-6442.scope
1 848 ns 848 ns 848 ns /user.slice/.../session.slice/org.gnome.Shell@x11.service
1 220 ns 220 ns 220 ns /user.slice/.../session.slice/pipewire-pulse.service
For now, the cgroup mode only works with BPF (-b).
Committer notes:
Remove -g as it is used in the other tools with a clear meaning of
collect/show callchains. As agreed with Namhyung off list.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Save cgroup info and display cgroup names if requested. This is a
preparation for the next patch.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230906174903.346486-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
While debugging a segfault on 'perf lock contention' without an
available perf.data file I noticed that it was basically calling:
perf_session__delete(ERR_PTR(-1))
Resulting in:
(gdb) run lock contention
Starting program: /root/bin/perf lock contention
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
failed to open perf.data: No such file or directory (try 'perf record' first)
Initializing perf session failed
Program received signal SIGSEGV, Segmentation fault.
0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858
2858 if (!session->auxtrace)
(gdb) p session
$1 = (struct perf_session *) 0xffffffffffffffff
(gdb) bt
#0 0x00000000005e7515 in auxtrace__free (session=0xffffffffffffffff) at util/auxtrace.c:2858
#1 0x000000000057bb4d in perf_session__delete (session=0xffffffffffffffff) at util/session.c:300
#2 0x000000000047c421 in __cmd_contention (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2161
#3 0x000000000047dc95 in cmd_lock (argc=0, argv=0x7fffffffe200) at builtin-lock.c:2604
#4 0x0000000000501466 in run_builtin (p=0xe597a8 <commands+552>, argc=2, argv=0x7fffffffe200) at perf.c:322
#5 0x00000000005016d5 in handle_internal_command (argc=2, argv=0x7fffffffe200) at perf.c:375
#6 0x0000000000501824 in run_argv (argcp=0x7fffffffe02c, argv=0x7fffffffe020) at perf.c:419
#7 0x0000000000501b11 in main (argc=2, argv=0x7fffffffe200) at perf.c:535
(gdb)
So just set it to NULL after using PTR_ERR(session) to decode the error
as perf_session__delete(NULL) is supported.
Fixes: eef4fee5e5 ("perf lock: Dynamically allocate lockhash_table")
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mamatha Inamdar <mamatha4@linux.vnet.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Ross Zwisler <zwisler@chromium.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/lkml/ZN4R1AYfsD2J8lRs@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To avoid formatting failures for example in CSV output due to debug
messages, add --output option to put the result in a file.
Unfortunately the short -o option was taken by the --owner already.
$ sudo ./perf lock con -ab --output lock-out.txt -v sleep 1
Looking at the vmlinux_path (8 entries long)
symsrc__init: cannot get elf header.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
$ head lock-out.txt
contended total wait max wait avg wait type caller
3 76.79 us 26.89 us 25.60 us rwlock:R ep_poll_callback+0x2d
0xffffffff9a23f4b5 _raw_read_lock_irqsave+0x45
0xffffffff99bbd4dd ep_poll_callback+0x2d
0xffffffff999029f3 __wake_up_common+0x73
0xffffffff99902b82 __wake_up_common_lock+0x82
0xffffffff99fa5b1c sock_def_readable+0x3c
0xffffffff9a11521d unix_stream_sendmsg+0x18d
0xffffffff99f9fc9c sock_sendmsg+0x5c
Suggested-by: Ian Rogers <irogers@google.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628200141.2739587-4-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Sometimes we want to process the output by external programs. Let's add
the -x option to specify the field separator like perf stat.
$ sudo ./perf lock con -ab -x, sleep 1
# output: contended, total wait, max wait, avg wait, type, caller
19, 194232, 21415, 10222, spinlock, process_one_work+0x1f0
15, 162748, 23843, 10849, rwsem:R, do_user_addr_fault+0x40e
4, 86740, 23415, 21685, rwlock:R, ep_poll_callback+0x2d
1, 84281, 84281, 84281, mutex, iwl_mvm_async_handlers_wk+0x135
8, 67608, 27404, 8451, spinlock, __queue_work+0x174
3, 58616, 31125, 19538, rwsem:W, do_mprotect_pkey+0xff
3, 52953, 21172, 17651, rwlock:W, do_epoll_wait+0x248
2, 30324, 19704, 15162, rwsem:R, do_madvise+0x3ad
1, 24619, 24619, 24619, spinlock, rcu_core+0xd4
The first line is a comment that shows the output format. Each line is
separated by the given string ("," in this case). The time is printed
in nsec without the unit so that it can be parsed easily.
The characters can be used in the output like (":", "+" and ".") are not
allowed for the -x option.
$ ./perf lock con -x:
Cannot use the separator that is already used
Usage: perf lock contention [<options>]
-x, --field-separator <separator>
print result in CSV format with custom separator
The stacktraces are printed in the same line separated by ":". The
header is updated to show the stacktrace. Also the debug output is
added at the end as a comment.
$ sudo ./perf lock con -abv -x, -F wait_total sleep 1
Looking at the vmlinux_path (8 entries long)
symsrc__init: cannot get elf header.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
# output: total wait, type, caller, stacktrace
37134, spinlock, rcu_core+0xd4, 0xffffffff9d0401e4 _raw_spin_lock_irqsave+0x44: 0xffffffff9c738114 rcu_core+0xd4: ...
21213, spinlock, raw_spin_rq_lock_nested+0x1b, 0xffffffff9d0407c0 _raw_spin_lock+0x30: 0xffffffff9c6d9cfb raw_spin_rq_lock_nested+0x1b: ...
20506, rwlock:W, ep_done_scan+0x2d, 0xffffffff9c9bc4dd ep_done_scan+0x2d: 0xffffffff9c9bd5f1 do_epoll_wait+0x6d1: ...
18044, rwlock:R, ep_poll_callback+0x2d, 0xffffffff9d040555 _raw_read_lock_irqsave+0x45: 0xffffffff9c9bc81d ep_poll_callback+0x2d: ...
17890, rwlock:W, do_epoll_wait+0x47b, 0xffffffff9c9bd39b do_epoll_wait+0x47b: 0xffffffff9c9be9ef __x64_sys_epoll_wait+0x6d1: ...
12114, spinlock, futex_wait_queue+0x60, 0xffffffff9d0407c0 _raw_spin_lock+0x30: 0xffffffff9d037cae __schedule+0xbe: ...
# debug: total=7, bad=0, bad_task=0, bad_stack=0, bad_time=0, bad_data=0
Also note that some field (like lock symbols) can be empty.
$ sudo ./perf lock con -abl -x, -E 10 sleep 1
# output: contended, total wait, max wait, avg wait, address, symbol, type
6, 275025, 61764, 45837, ffff9dcc9f7d60d0, , spinlock
18, 87716, 11196, 4873, ffff9dc540059000, , spinlock
2, 6472, 5499, 3236, ffff9dcc7f730e00, rq_lock, spinlock
3, 4429, 2341, 1476, ffff9dcc7f7b0e00, rq_lock, spinlock
3, 3974, 1635, 1324, ffff9dcc7f7f0e00, rq_lock, spinlock
4, 3290, 1326, 822, ffff9dc5f4e2cde0, , rwlock
3, 2894, 1023, 964, ffffffff9e0d7700, rcu_state, spinlock
1, 2567, 2567, 2567, ffff9dcc7f6b0e00, rq_lock, spinlock
4, 1259, 596, 314, ffff9dc69c2adde0, , rwlock
1, 934, 934, 934, ffff9dcc7f670e00, rq_lock, spinlock
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628200141.2739587-3-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
The comment was for symbol_conf.sort_by_name which was deleted already.
Let's get rid of the stale comments as well.
Acked-by: Ian Rogers <irogers@google.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230628200141.2739587-2-namhyung@kernel.org
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Previously used to specify symbol_name_rb_node was in use.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Jason Wang <wangborong@cdjrlc.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Link: https://lore.kernel.org/r/20230623054520.4118442-4-irogers@google.com
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Pthread keys are more portable than __thread and allow the association
of a destructor with the key. Use the destructor to clean up TLS
callchain cursors to aid understanding memory leaks.
Committer notes:
Had to fixup a series of unconverted places and also check for the
return of get_tls_callchain_cursor() as it may fail and return NULL.
In that unlikely case we now either print something to a file, if the
caller was expecting to print a callchain, or return an error code to
state that resolving the callchain isn't possible.
In some cases this was made easier because thread__resolve_callchain()
already can fail for other reasons, so this new one (cursor == NULL) can
be added and the callers don't have to explicitely check for this new
condition.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Brian Robbins <brianrob@linux.microsoft.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Fangrui Song <maskray@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Ivan Babrou <ivan@cloudflare.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Liam Howlett <liam.howlett@oracle.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Naveen N. Rao <naveen.n.rao@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Steinar H. Gunderson <sesse@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Wenyu Liu <liuwenyu7@huawei.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Ye Xingchen <ye.xingchen@zte.com.cn>
Cc: Yuan Can <yuancan@huawei.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230608232823.4027869-25-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
lockhash_table is 32,768 bytes in .bss, make it a memory allocation so
that the space is freed for non-lock perf commands.
Signed-off-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20230526183401.2326121-10-irogers@google.com
Cc: K Prateek Nayak <kprateek.nayak@amd.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Ross Zwisler <zwisler@chromium.org>
Cc: Steven Rostedt (Google) <rostedt@goodmis.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu (Google) <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This reverts commit a980755beb.
We need to better polish building with BPF skels, so revert back to
making it an experimental feature that has to be explicitely enabled
using BUILD_BPF_SKEL=1.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Later changes will add reference count checking for struct map, add a
helper function to invoke the map_ip and unmap_ip function pointers. The
helper allows the reference count check to be in fewer places.
Committer notes:
Add missing conversions to:
tools/perf/util/map.c
tools/perf/util/cs-etm.c
tools/perf/util/annotate.c
tools/perf/arch/powerpc/util/sym-handling.c
tools/perf/arch/s390/annotate/instructions.c
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230404205954.2245628-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It needs callstacks for two reasons:
* for stack aggregation mode, the map key is the stack id and it can
also show the full stack traces when -v is used
* for other aggregation modes, the stack filter can be used to limit
lock contentions from known call paths
The -v option is meaningful (in terms of stack trace) only for stack
aggregation mode, so it should not set the save_callstack for other
mode like with -t or -l options.
I've noticed this with the following command line:
$ sudo ./perf lock con -ablv -E 3 -M 16 -- ./perf bench sched messaging
...
contended total wait max wait avg wait address symbol
88 4.59 ms 108.07 us 52.13 us ffff935757f46ec0 (spinlock)
33 905.22 us 73.67 us 27.43 us ffff935757f41700 (spinlock)
28 703.69 us 79.28 us 25.13 us ffff938a3d9b0c80 rq_lock (spinlock)
=== output for debug ===
bad: 12272, total: 12421
bad rate: 98.80 %
histogram of failure reasons
task: 8285
stack: 3987 <---------- here
time: 0
data: 0
It should not have any failure on stacks since it doesn't use it.
No functional change intended.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When -E option is used, it only prints the given number of entries but
the event stat at the end should have the numbers for entire entries.
Likewise, -S option will hide entries that don't have the named
function in the callstack. Also update event stat for them.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's possible to fail to update the data when the lock_stat map is full.
We should check that case and show the number at the end.
$ sudo ./perf lock con -ablv -E3 -- ./perf bench sched messaging
...
contended total wait max wait avg wait address symbol
6157 208.48 ms 69.29 us 33.86 us ffff934c001c1f00 (spinlock)
4030 72.04 ms 61.84 us 17.88 us ffff934c000415c0 (spinlock)
3201 50.30 ms 47.73 us 15.71 us ffff934c2eead850 (spinlock)
=== output for debug ===
bad: 0, total: 13388
bad rate: 0.00 %
histogram of failure reasons
task: 0
stack: 0
time: 0
data: 0 <----- added
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The BPF hash map will align the map size to a power of 2. So 10k would
be 16k anyway. Let's have the actual size to avoid confusions.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Users often want to change the map size, let's add a short option (-M)
for that.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The get_type_flag() should check both str and name fields in the
lock_type_table so that it can find the appropriate flag without retrying
with ':R' or ':W' suffix from the caller.
Also fix a typo in the rt-mutex.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It can fail to collect lock stat from BPF for various reasons. For
example, I've got a report that sometimes time calculation seems wrong
in case of contended spinlocks. I suspect the time delta went negative
for some reason.
Count them separately and show in the output like below:
$ sudo perf lock contention -abE5 sleep 10
contended total wait max wait avg wait type caller
13 785.61 us 79.36 us 60.43 us spinlock remove_wait_queue+0x14
10 469.02 us 87.51 us 46.90 us spinlock prepare_to_wait+0x27
9 289.09 us 69.08 us 32.12 us spinlock finish_wait+0x36
114 251.05 us 8.56 us 2.20 us spinlock try_to_wake_up+0x1f5
132 188.63 us 5.01 us 1.43 us spinlock __wake_up_common_lock+0x62
=== output for debug ===
bad: 1, total: 279
bad rate: 0.36 %
histogram of failure reasons
task: 1
stack: 0
time: 0
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230327225711.245738-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It should not divide if the total number is 0. Otherwise it'd show
NaN in the bad rate output. Also add a whitespace in the "output
for debug" message.
$ sudo perf lock contention -abv true
Looking at the vmlinux_path (8 entries long)
symsrc__init: cannot get elf header.
Using /proc/kcore for kernel data
Using /proc/kallsyms for symbols
contended total wait max wait avg wait type caller
=== output for debug===
bad: 0, total: 0
bad rate: -nan % <------------------------- (here)
histogram of events caused bad sequence
acquire: 0
acquired: 0
contended: 0
release: 0
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230327225711.245738-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Show lock type names after the symbol of locks if any. This can be
useful especially when it doesn't show the lock symbols.
The indentation before the lock type parenthesis is to recognize lock
symbols more easily.
$ sudo ./perf lock con -abl -- sleep 1
contended total wait max wait avg wait address symbol
44 6.13 ms 284.49 us 139.28 us ffffffff92e06080 tasklist_lock (rwlock)
159 983.38 us 12.38 us 6.18 us ffff8cc717c90000 siglock (spinlock)
10 679.90 us 153.35 us 67.99 us ffff8cdc2872aaf8 mmap_lock (rwsem)
9 558.11 us 180.67 us 62.01 us ffff8cd647914038 mmap_lock (rwsem)
78 228.56 us 7.82 us 2.93 us ffff8cc700061c00 (spinlock)
5 41.60 us 16.93 us 8.32 us ffffd853acb41468 (spinlock)
10 37.24 us 5.87 us 3.72 us ffff8cd560b5c200 siglock (spinlock)
4 11.17 us 3.97 us 2.79 us ffff8d053ddf0c80 rq_lock (spinlock)
1 7.86 us 7.86 us 7.86 us ffff8cd64791404c (spinlock)
1 4.13 us 4.13 us 4.13 us ffff8d053d930c80 rq_lock (spinlock)
7 3.98 us 1.67 us 568 ns ffff8ccb92479440 (mutex)
2 2.62 us 2.33 us 1.31 us ffff8cc702e6ede0 (rwlock)
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230313204825.2665483-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Likewise, we can display siglock by following the pointer like
current->sighand->siglock.
$ sudo ./perf lock con -abl -- sleep 1
contended total wait max wait avg wait address symbol
16 2.18 ms 305.35 us 136.34 us ffffffff92e06080 tasklist_lock
28 521.78 us 31.16 us 18.63 us ffff8cc703783ec4
7 119.03 us 23.55 us 17.00 us ffff8ccb92479440
15 88.29 us 10.06 us 5.89 us ffff8cd560b5f380 siglock
7 37.67 us 9.16 us 5.38 us ffff8d053daf0c80
5 8.81 us 4.92 us 1.76 us ffff8d053d6b0c80
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230313204825.2665483-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Sometimes there are severe contentions on the mmap_lock and we want
see it in the -l/--lock-addr output. However it cannot symbolize
the mmap_lock because it's allocated dynamically without symbols.
Stephane and Hao gave me an idea separately to display mmap_lock by
following the current->mm pointer. I added a flag to mark mmap_lock
after comparing the lock address so that it can show them differently.
With this change it can show mmap_lock like below:
$ sudo ./perf lock con -abl -- sleep 10
contended total wait max wait avg wait address symbol
...
16344 312.30 ms 2.22 ms 19.11 us ffff8cc702595640
17686 310.08 ms 1.49 ms 17.53 us ffff8cc7025952c0
3 84.14 ms 45.79 ms 28.05 ms ffff8cc78114c478 mmap_lock
3557 76.80 ms 68.75 us 21.59 us ffff8cc77ca3af58
1 68.27 ms 68.27 ms 68.27 ms ffff8cda745dfd70
9 54.53 ms 7.96 ms 6.06 ms ffff8cc7642a48b8 mmap_lock
14629 44.01 ms 60.00 us 3.01 us ffff8cc7625f9ca0
3481 42.63 ms 140.71 us 12.24 us ffffffff937906ac vmap_area_lock
16194 38.73 ms 42.15 us 2.39 us ffff8cd397cbc560
11 38.44 ms 10.39 ms 3.49 ms ffff8ccd6d12fbb8 mmap_lock
1 5.43 ms 5.43 ms 5.43 ms ffff8cd70018f0d8
1674 5.38 ms 422.93 us 3.21 us ffffffff92e06080 tasklist_lock
581 4.51 ms 130.68 us 7.75 us ffff8cc9b1259058
5 3.52 ms 1.27 ms 703.23 us ffff8cc754510070
112 3.47 ms 56.47 us 31.02 us ffff8ccee38b3120
381 3.31 ms 73.44 us 8.69 us ffffffff93790690 purge_vmap_area_lock
255 3.19 ms 36.35 us 12.49 us ffff8d053ce30c80
Note that mmap_lock was renamed some time ago and it needs to support
old kernels with a different name 'mmap_sem'.
Suggested-by: Hao Luo <haoluo@google.com>
Suggested-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Suren Baghdasaryan <surenb@google.com>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230313204825.2665483-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
BPF skeleton support is now key to a number of perf features. Rather
than making it so that BPF support must be enabled for the build, make
this the default and error if the build lacks a clang and libbpf that
are sufficient. To avoid the error and build without BPF skeletons the
NO_BPF_SKEL=1 flag can be used. Add a build-options flag to 'perf
version' to enable detection of the BPF skeleton support and use this
in the offcpu shell test.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andres Freund <andres@anarazel.de>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Pavithra Gurushankar <gpavithrasha@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Quentin Monnet <quentin@isovalent.com>
Cc: Roberto Sassu <roberto.sassu@huawei.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Tom Rix <trix@redhat.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230311065753.3012826-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When there're many lock contentions in the system, people sometimes want
to know who caused the contention, IOW who's the owner of the locks.
The -o/--lock-owner option tries to follow the lock owners for the
contended mutexes and rwsems from BPF, and then attributes the
contention time to the owner instead of the waiter. It's a best effort
approach to get the owner info at the time of the contention and doesn't
guarantee to have the precise tracking of owners if it's changing over
time.
Currently it only handles mutex and rwsem that have owner field in their
struct and it basically points to a task_struct that owns the lock at
the moment.
Technically its type is atomic_long_t and it comes with some LSB bits
used for other meanings. So it needs to clear them when casting it to a
pointer to task_struct.
Also the atomic_long_t is a typedef of the atomic 32 or 64 bit types
depending on arch which is a wrapper struct for the counter value. I'm
not aware of proper ways to access those kernel atomic types from BPF so
I just read the internal counter value directly. Please let me know if
there's a better way.
When -o/--lock-owner option is used, it goes to the task aggregation
mode like -t/--threads option does. However it cannot get the owner for
other lock types like spinlock and sometimes even for mutex.
$ sudo ./perf lock con -abo -- ./perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 4.766 [sec]
4.766540 usecs/op
209795 ops/sec
contended total wait max wait avg wait pid owner
403 565.32 us 26.81 us 1.40 us -1 Unknown
4 27.99 us 8.57 us 7.00 us 1583145 sched-pipe
1 8.25 us 8.25 us 8.25 us 1583144 sched-pipe
1 2.03 us 2.03 us 2.03 us 5068 chrome
As you can see, the owner is unknown for the most cases. But if we
filter only for the mutex locks, it'd more likely get the onwers.
$ sudo ./perf lock con -abo -Y mutex -- ./perf bench sched pipe
# Running 'sched/pipe' benchmark:
# Executed 1000000 pipe operations between two processes
Total time: 4.910 [sec]
4.910435 usecs/op
203647 ops/sec
contended total wait max wait avg wait pid owner
2 15.50 us 8.29 us 7.75 us 1582852 sched-pipe
7 7.20 us 2.47 us 1.03 us -1 Unknown
1 6.74 us 6.74 us 6.74 us 1582851 sched-pipe
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230207002403.63590-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The previous change missed to set the con->save_callstack for the
LOCK_AGGR_CALLER mode resulting in no caller information.
Fixes: ebab291641 ("perf lock contention: Support filters for different aggregation")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Boqun Feng <boqun.feng@gmail.com>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: Waiman Long <longman@redhat.com>
Cc: Will Deacon <will@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230207002403.63590-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It'd be useful to filter other than the current aggregation mode. For
example, users may want to see callstacks for specific locks only. Or
they may want tasks from a certain callstack.
The tracepoints already collected the information but it needs to check
the condition again when processing the event. And it needs to change
BPF to allow the key combinations.
The lock contentions on 'rcu_state' spinlock can be monitored:
$ sudo perf lock con -abv -L rcu_state sleep 1
...
contended total wait max wait avg wait type caller
4 151.39 us 62.57 us 37.85 us spinlock rcu_core+0xcb
0xffffffff81fd1666 _raw_spin_lock_irqsave+0x46
0xffffffff8172d76b rcu_core+0xcb
0xffffffff822000eb __softirqentry_text_start+0xeb
0xffffffff816a0ba9 __irq_exit_rcu+0xc9
0xffffffff81fc0112 sysvec_apic_timer_interrupt+0xa2
0xffffffff82000e46 asm_sysvec_apic_timer_interrupt+0x16
0xffffffff81d49f78 cpuidle_enter_state+0xd8
0xffffffff81d4a259 cpuidle_enter+0x29
1 30.21 us 30.21 us 30.21 us spinlock rcu_core+0xcb
0xffffffff81fd1666 _raw_spin_lock_irqsave+0x46
0xffffffff8172d76b rcu_core+0xcb
0xffffffff822000eb __softirqentry_text_start+0xeb
0xffffffff816a0ba9 __irq_exit_rcu+0xc9
0xffffffff81fc00c4 sysvec_apic_timer_interrupt+0x54
0xffffffff82000e46 asm_sysvec_apic_timer_interrupt+0x16
1 28.84 us 28.84 us 28.84 us spinlock rcu_accelerate_cbs_unlocked+0x40
0xffffffff81fd1c60 _raw_spin_lock+0x30
0xffffffff81728cf0 rcu_accelerate_cbs_unlocked+0x40
0xffffffff8172da82 rcu_core+0x3e2
0xffffffff822000eb __softirqentry_text_start+0xeb
0xffffffff816a0ba9 __irq_exit_rcu+0xc9
0xffffffff81fc0112 sysvec_apic_timer_interrupt+0xa2
0xffffffff82000e46 asm_sysvec_apic_timer_interrupt+0x16
0xffffffff81d49f78 cpuidle_enter_state+0xd8
...
To see tasks calling 'rcu_core' function:
$ sudo perf lock con -abt -S rcu_core sleep 1
contended total wait max wait avg wait pid comm
19 23.46 us 2.21 us 1.23 us 0 swapper
2 18.37 us 17.01 us 9.19 us 2061859 ThreadPoolForeg
3 5.76 us 1.97 us 1.92 us 3909 pipewire-pulse
1 2.26 us 2.26 us 2.26 us 1809271 MediaSu~isor #2
1 1.97 us 1.97 us 1.97 us 1514882 Chrome_ChildIOT
1 987 ns 987 ns 987 ns 3740 pipewire-pulse
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230203021324.143540-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This is a preparation work to support complex keys of BPF maps. Now it
has single value key according to the aggregation mode like stack_id or
pid. But we want to use a combination of those keys.
Then lock_contention_read() should still aggregate the result based on
the key that was requested by user. The other key info will be used for
filtering.
So instead of creating a lock_stat entry always, Check if it's already
there using lock_stat_find() first.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230203021324.143540-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -S/--callstack-filter is to limit display entries having the given
string in the callstack (not only in the caller in the output).
The following example shows lock contention results if the callstack
has 'net' substring somewhere. Note that the caller '__dev_queue_xmit'
does not match to it, but it has 'inet6_csk_xmit' in the callstack.
This applies even if you don't use -v option to show the full callstack.
$ sudo ./perf lock con -abv -S net sleep 1
...
contended total wait max wait avg wait type caller
5 70.20 us 16.13 us 14.04 us spinlock __dev_queue_xmit+0xb6d
0xffffffffa5dd1c60 _raw_spin_lock+0x30
0xffffffffa5b8f6ed __dev_queue_xmit+0xb6d
0xffffffffa5cd8267 ip6_finish_output2+0x2c7
0xffffffffa5cdac14 ip6_finish_output+0x1d4
0xffffffffa5cdb477 ip6_xmit+0x457
0xffffffffa5d1fd17 inet6_csk_xmit+0xd7
0xffffffffa5c5f4aa __tcp_transmit_skb+0x54a
0xffffffffa5c6467d tcp_keepalive_timer+0x2fd
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230126000936.3017683-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The test case perf lock contention dumps core on s390. Run the following
commands:
# ./perf lock record -- ./perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 2.799 [sec]
[ perf record: Woken up 1 times to write data ]
[ perf record: Captured and wrote 0.073 MB perf.data (100 samples) ]
#
# ./perf lock contention
Segmentation fault (core dumped)
#
The function call stack is lengthy, here are the top 5 functions:
# gdb ./perf core.24048
GNU gdb (GDB) Fedora Linux 12.1-6.fc37
Core was generated by `./perf lock contention'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0 0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28, addr=1789230) at util/machine.c:3356
3356 machine->sched.text_end = kmap->unmap_ip(kmap, sym->start);
(gdb) where
#0 0x00000000011dd25c in machine__is_lock_function (machine=0x3029e28, addr=1789230) at util/machine.c:3356
#1 0x000000000109f244 in callchain_id (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:957
#2 0x000000000109e094 in get_key_by_aggr_mode (key=0x3ffea4f7290, addr=27758136, evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:586
#3 0x000000000109f4d0 in report_lock_contention_begin_event (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:1004
#4 0x00000000010a00ae in evsel__process_contention_begin (evsel=0x30313e0, sample=0x3ffea4f77d0) at builtin-lock.c:1254
#5 0x00000000010a0e14 in process_sample_event (tool=0x3ffea4f8480, event=0x3ff85601ef8, sample=0x3ffea4f77d0, evsel=0x30313e0, machine=0x3029e28) at builtin-lock.c:1464
.....
The issue is in function machine__is_lock_function() in file
./util/machine.c lines 3355:
/* should not fail from here */
sym = machine__find_kernel_symbol_by_name(machine, "__sched_text_end", &kmap);
machine->sched.text_end = kmap->unmap_ip(kmap, sym->start)
On s390 the symbol __sched_text_end is *NOT* in the symbol list and the
resulting pointer sym is set to NULL. The sym->start is then a NULL pointer
access and generates the core dump.
The reason why __sched_text_end is not in the symbol list on s390 is
simple:
When the symbol list is created at perf start up with function calls
dso__load
+--> dso__load_vmlinux_path
+--> dso__load_vmlinux
+--> dso__load_sym
+--> dso__load_sym_internal (reads kernel symbols)
+--> symbols__fixup_end
+--> symbols__fixup_duplicate
The issue is in function symbols__fixup_duplicate(). It deletes all
symbols with have the same address. On s390:
# nm -g ~/linux/vmlinux| fgrep c68390
0000000000c68390 T __cpuidle_text_start
0000000000c68390 T __sched_text_end
#
two symbols have identical addresses and __sched_text_end is considered
duplicate (in ascending sort order) and removed from the symbol list.
Therefore it is missing and an invalid pointer reference occurs. The
code checks for symbol __sched_text_start and when it exists assumes
symbol __sched_text_end is also in the symbol table. However this is not
the case on s390.
Same situation exists for symbol __lock_text_start:
0000000000c68770 T __cpuidle_text_end
0000000000c68770 T __lock_text_start
This symbol is also removed from the symbol table but used in function
machine__is_lock_function().
To fix this and keep duplicate symbols in the symbol table, set
symbol_conf.allow_aliases to true. This prevents the removal of
duplicate symbols in function symbols__fixup_duplicate().
Output After:
# ./perf lock contention
contended total wait max wait avg wait type caller
48 124.39 ms 123.99 ms 2.59 ms rwsem:W unlink_anon_vmas+0x24a
47 83.68 ms 83.26 ms 1.78 ms rwsem:W free_pgtables+0x132
5 41.22 us 10.55 us 8.24 us rwsem:W free_pgtables+0x140
4 40.12 us 20.55 us 10.03 us rwsem:W copy_process+0x1ac8
#
Fixes: 0d2997f750 ("perf lock: Look up callchain for the contended locks")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: https://lore.kernel.org/r/20221230102627.2410847-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -L/--lock-filter option is to filter only given locks. The locks
can be specified by address or name (if exists).
$ sudo ./perf lock record -a sleep 1
$ sudo ./perf lock con -l
contended total wait max wait avg wait address symbol
57 1.11 ms 42.83 us 19.54 us ffff9f4140059000
15 280.88 us 23.51 us 18.73 us ffffffff9d007a40 jiffies_lock
1 20.49 us 20.49 us 20.49 us ffffffff9d0d50c0 rcu_state
1 9.02 us 9.02 us 9.02 us ffff9f41759e9ba0
$ sudo ./perf lock con -L jiffies_lock,rcu_state
contended total wait max wait avg wait type caller
15 280.88 us 23.51 us 18.73 us spinlock tick_sched_do_timer+0x93
1 20.49 us 20.49 us 20.49 us spinlock __softirqentry_text_start+0xeb
$ sudo ./perf lock con -L ffff9f4140059000
contended total wait max wait avg wait type caller
38 779.40 us 42.83 us 20.51 us spinlock worker_thread+0x50
11 216.30 us 39.87 us 19.66 us spinlock queue_work_on+0x39
8 118.13 us 20.51 us 14.77 us spinlock kthread+0xe5
Committer testing:
# uname -a
Linux quaco 6.0.12-200.fc36.x86_64 #1 SMP PREEMPT_DYNAMIC Thu Dec 8 17:15:53 UTC 2022 x86_64 x86_64 x86_64 GNU/Linux
# perf lock record
^C[ perf record: Woken up 1 times to write data ]
# perf lock con -L jiffies_lock,rcu_state
contended total wait max wait avg wait type caller
# perf lock con
contended total wait max wait avg wait type caller
1 9.06 us 9.06 us 9.06 us spinlock call_timer_fn+0x24
# perf lock con -L call
ignore unknown symbol: call
contended total wait max wait avg wait type caller
1 9.06 us 9.06 us 9.06 us spinlock call_timer_fn+0x24
#
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221219201732.460111-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Likewise, add type_filter BPF hash map and check it when user gave a
lock type filter.
$ sudo ./perf lock con -ab -Y rwlock -- ./perf bench sched messaging
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 10 groups == 400 processes run
Total time: 0.203 [sec]
contended total wait max wait avg wait type caller
15 156.19 us 19.45 us 10.41 us rwlock:W do_exit+0x36d
1 11.12 us 11.12 us 11.12 us rwlock:R do_wait+0x8b
1 5.09 us 5.09 us 5.09 us rwlock:W release_task+0x6e
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221219201732.460111-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -Y/--type-filter option is to filter the result for specific lock
types only. It can accept comma-separated values. Note that it would
accept type names like one in the output. spinlock, mutex, rwsem:R and
so on.
For RW-variant lock types, it converts the name to the both variants.
In other words, "rwsem" is same as "rwsem:R,rwsem:W". Also note that
"mutex" has two different encoding - one for sleeping wait, another for
optimistic spinning. Add "mutex-spin" entry for the lock_type_table so
that we can add it for "mutex" under the table.
$ sudo ./perf lock record -a -- ./perf bench sched messaging
$ sudo ./perf lock con -E 5 -Y spinlock
contended total wait max wait avg wait type caller
802 1.26 ms 11.73 us 1.58 us spinlock __wake_up_common_lock+0x62
13 787.16 us 105.44 us 60.55 us spinlock remove_wait_queue+0x14
12 612.96 us 78.70 us 51.08 us spinlock prepare_to_wait+0x27
114 340.68 us 12.61 us 2.99 us spinlock try_to_wake_up+0x1f5
83 226.38 us 9.15 us 2.73 us spinlock folio_lruvec_lock_irqsave+0x5e
Committer notes:
Make get_type_flag() return UINT_MAX for error instad of -1UL, as that
function returns 'unsigned int' and we store the value on a 'unsigned
int' 'flags' variable which makes clang unhappy:
35 98.23 fedora:37 : FAIL clang version 15.0.6 (Fedora 15.0.6-1.fc37)
builtin-lock.c:2012:14: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
if (flags != -1UL) {
~~~~~ ^ ~~~~
builtin-lock.c:2021:14: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
if (flags != -1UL) {
~~~~~ ^ ~~~~
builtin-lock.c:2037:14: error: result of comparison of constant 18446744073709551615 with expression of type 'unsigned int' is always true [-Werror,-Wtautological-constant-out-of-range-compare]
if (flags != -1UL) {
~~~~~ ^ ~~~~
3 errors generated.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221219201732.460111-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move it out of get_type_str() so that we can reuse the table for others
later.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221219201732.460111-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The data type of the verbose variable is integer and can be negative,
replace improperly used cases in a unified manner:
1. if (verbose) => if (verbose > 0)
2. if (!verbose) => if (verbose <= 0)
3. if (XX && verbose) => if (XX && verbose > 0)
4. if (XX && !verbose) => if (XX && verbose <= 0)
Reviewed-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Carsten Haitzler <carsten.haitzler@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <martin.lau@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20221220035702.188413-3-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The -l/--lock-addr option is to implement per-lock-instance contention
stat using LOCK_AGGR_ADDR. It displays lock address and optionally
symbol name if exists.
$ sudo ./perf lock con -abl sleep 1
contended total wait max wait avg wait address symbol
1 36.28 us 36.28 us 36.28 us ffff92615d6448b8
9 10.91 us 1.84 us 1.21 us ffffffffbaed50c0 rcu_state
1 10.49 us 10.49 us 10.49 us ffff9262ac4f0c80
8 4.68 us 1.67 us 585 ns ffffffffbae07a40 jiffies_lock
3 3.03 us 1.45 us 1.01 us ffff9262277861e0
1 924 ns 924 ns 924 ns ffff926095ba9d20
1 436 ns 436 ns 436 ns ffff9260bfda4f60
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221209190727.759804-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The BPF didn't show the per-thread stat properly. Use task's thread id (PID)
as a key instead of stack_id and add a task_data map to save task comm names.
$ sudo ./perf lock con -abt -E 5 sleep 1
contended total wait max wait avg wait pid comm
1 740.66 ms 740.66 ms 740.66 ms 1950 nv_queue
3 305.50 ms 298.19 ms 101.83 ms 1884 nvidia-modeset/
1 25.14 us 25.14 us 25.14 us 2725038 EventManager_De
12 23.09 us 9.30 us 1.92 us 0 swapper
1 20.18 us 20.18 us 20.18 us 2725033 EventManager_De
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Blake Jones <blakejones@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20221209190727.759804-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is used in bpf_lock_contention.c and builtin-lock.c will be made
CONFIG_LIBTRACEEVENT=y conditional, so move it to machine.c, that is
always available.
This makes those 4 global variables for sched and lock text start and
end to move to 'struct machine' too, as conceivably we can have that
info for several machine instances, say some 'perf diff' like tool.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To support live monitoring of kernel lock contention without BPF,
it should support something like below:
# perf lock record -a -o- sleep 1 | perf lock contention -i-
contended total wait max wait avg wait type caller
2 10.27 us 6.17 us 5.13 us spinlock load_balance+0xc03
1 5.29 us 5.29 us 5.29 us rwlock:W ep_scan_ready_list+0x54
1 4.12 us 4.12 us 4.12 us spinlock smpboot_thread_fn+0x116
1 3.28 us 3.28 us 3.28 us mutex pipe_read+0x50
To do that, it needs to handle HEAD_ATTR, HEADER_EVENT_UPDATE and
HEADER_TRACING_DATA which are generated only for the pipe mode.
And setting event handler also should be delayed until it gets the
event information.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221104051440.220989-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The --max-stack option is used to allocate the BPF stack map and stack
trace array in the userspace. Check the value properly before using.
Practically it cannot be greater than the sysctl_perf_event_max_stack.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221028180128.3311491-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use the global quiet variable everywhere so that all tools hide warnings
in quiet mode and update the documentation to reflect this.
'perf probe' claimed that errors are not printed in quiet mode but I
don't see this so remove it from the docs.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221018094137.783081-3-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Like in 'perf report', this option is to suppress header and debug messages.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220924004221.841024-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Like in 'perf top', the -E option can limit number of entries to print.
It can be useful when users want to see top N contended locks only.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220924004221.841024-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It needs stack traces to find callers of locks. To minimize the
performance overhead it only collects up to 8 entries for each stack
trace. And it skips first 3 entries as they came from BPF, tracepoint
and lock functions which are not interested for most users.
But it turned out that those numbers are different in some
configuration. Using fixed number can result in non meaningful caller
names. Let's make them adjustable with --stack-depth and --skip-stack
options.
On my setup, the default output is like below:
# /perf lock con -ab -F contended,wait_total sleep 3
contended total wait type caller
28 4.55 ms rwlock:W __bpf_trace_contention_begin+0xb
33 1.67 ms rwlock:W __bpf_trace_contention_begin+0xb
12 580.28 us spinlock __bpf_trace_contention_begin+0xb
60 240.54 us rwsem:R __bpf_trace_contention_begin+0xb
27 64.45 us spinlock __bpf_trace_contention_begin+0xb
If I change the stack skip to 5, the result will be like:
# perf lock con -ab -F contended,wait_total --stack-skip 5 sleep 3
contended total wait type caller
32 715.45 us spinlock folio_lruvec_lock_irqsave+0x61
26 550.22 us spinlock folio_lruvec_lock_irqsave+0x61
15 486.93 us rwsem:R mmap_read_lock+0x13
12 139.66 us rwsem:W vm_mmap_pgoff+0x93
1 7.04 us spinlock tick_do_update_jiffies64+0x25
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20220912055314.744552-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>