2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-18 02:04:05 +08:00
linux-next/tools/perf
Kan Liang 384b60557b perf tools: Construct LBR call chain
LBR call stack only has user-space callchains. It is output in the
PERF_SAMPLE_BRANCH_STACK data format. For kernel callchains, it's
still in the form of PERF_SAMPLE_CALLCHAIN.

The perf tool has to handle both data sources to construct a
complete callstack.

For the "perf report -D" option, both lbr and fp information will be
displayed.

A new call chain recording option "lbr" is introduced into the perf
tool for LBR call stack. The user can use --call-graph lbr to get
the call stack information from hardware.

Here are some examples.

When profiling bc(1) on Fedora 19:

  echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph lbr bc -l < cmd

If enabling LBR, perf report output looks like:

    50.36%       bc  bc                 [.] bc_divide
                 |
                 --- bc_divide
                     execute
                     run_code
                     yyparse
                     main
                     __libc_start_main
                     _start
    33.66%       bc  bc                 [.] _one_mult
                 |
                 --- _one_mult
                     bc_divide
                     execute
                     run_code
                     yyparse
                     main
                     __libc_start_main
                     _start
     7.62%       bc  bc                 [.] _bc_do_add
                 |
                 --- _bc_do_add
                    |
                    |--99.89%-- 0x2000186a8
                     --0.11%-- [...]
     6.83%       bc  bc                 [.] _bc_do_sub
                 |
                 --- _bc_do_sub
                    |
                    |--99.94%-- bc_add
                    |          execute
                    |          run_code
                    |          yyparse
                    |          main
                    |          __libc_start_main
                    |          _start
                     --0.06%-- [...]
     0.46%       bc  libc-2.17.so       [.] __memset_sse2
                 |
                 --- __memset_sse2
                    |
                    |--54.13%-- bc_new_num
                    |          |
                    |          |--51.00%-- bc_divide
                    |          |          execute
                    |          |          run_code
                    |          |          yyparse
                    |          |          main
                    |          |          __libc_start_main
                    |          |          _start
                    |          |
                    |          |--30.46%-- _bc_do_sub
                    |          |          bc_add
                    |          |          execute
                    |          |          run_code
                    |          |          yyparse
                    |          |          main
                    |          |          __libc_start_main
                    |          |          _start
                    |          |
                    |           --18.55%-- _bc_do_add
                    |                     bc_add
                    |                     execute
                    |                     run_code
                    |                     yyparse
                    |                     main
                    |                     __libc_start_main
                    |                     _start
                    |
                     --45.87%-- bc_divide
                               execute
                               run_code
                               yyparse
                               main
                               __libc_start_main
                               _start

If using FP, perf report output looks like:

  echo 'scale=2000; 4*a(1)' > cmd; perf record --call-graph fp bc -l < cmd

    50.49%       bc  bc                 [.] bc_divide
                 |
                 --- bc_divide
    33.57%       bc  bc                 [.] _one_mult
                 |
                 --- _one_mult
     7.61%       bc  bc                 [.] _bc_do_add
                 |
                 --- _bc_do_add
                     0x2000186a8
     6.88%       bc  bc                 [.] _bc_do_sub
                 |
                 --- _bc_do_sub
     0.42%       bc  libc-2.17.so       [.] __memcpy_ssse3_back
                 |
                 --- __memcpy_ssse3_back

If using LBR, perf report -D output looks like:

3458145275743 0x2fd750 [0xd8]: PERF_RECORD_SAMPLE(IP, 0x2): 9748/9748: 0x408ea8 period: 609644 addr: 0
... LBR call chain: nr:8
.....  0: fffffffffffffe00
.....  1: 0000000000408e50
.....  2: 000000000040a458
.....  3: 000000000040562e
.....  4: 0000000000408590
.....  5: 00000000004022c0
.....  6: 00000000004015dd
.....  7: 0000003d1cc21b43
... FP chain: nr:2
.....  0: fffffffffffffe00
.....  1: 0000000000408ea8
 ... thread: bc:9748
 ...... dso: /usr/bin/bc

The LBR call stack has the following known limitations:

 - Zero length calls are not filtered out by the hardware

 - Exception handing such as setjmp/longjmp will have calls/returns not
   match

 - Pushing different return address onto the stack will have
   calls/returns not match

 - If callstack is deeper than the LBR, only the last entries are
   captured

Tested-by: Jiri Olsa <jolsa@kernel.org>
Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Simon Que <sque@chromium.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1420482185-29830-3-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-02-18 17:16:18 +01:00
..
arch perf tools powerpc: Use dwfl_report_elf() instead of offline. 2015-01-16 17:49:30 -03:00
bench perf tools: Provide stub for missing pthread_attr_setaffinity_np 2015-01-28 12:43:32 -03:00
config perf tools: Provide stub for missing pthread_attr_setaffinity_np 2015-01-28 12:43:32 -03:00
Documentation perf tools: Enable LBR call stack support 2015-02-18 17:16:17 +01:00
python perf python: Remove duplicate TID bit from mask 2013-08-07 17:35:25 -03:00
scripts perf scripting perl: Force to use stdbool 2015-01-21 10:05:00 -03:00
tests Merge branch 'perf/hw_breakpoints' into perf/core 2015-01-28 15:48:59 +01:00
ui perf ui/tui: Show fatal error message only if exists 2015-01-22 17:05:10 -03:00
util perf tools: Construct LBR call chain 2015-02-18 17:16:18 +01:00
.gitignore perf tools: Add perf-read-vdso32 and perf-read-vdsox32 to .gitignore 2014-11-19 12:34:24 -03:00
builtin-annotate.c perf report: Show progress bar for output resorting 2014-12-23 12:01:37 -03:00
builtin-bench.c perf bench: Add --repeat option 2014-06-19 16:13:15 -03:00
builtin-buildid-cache.c perf tools: Remove EOL whitespaces 2015-01-21 13:24:31 -03:00
builtin-buildid-list.c perf session: Separating data file properties from session 2013-10-21 17:33:25 -03:00
builtin-diff.c perf diff: Fix -o/--order option behavior 2015-01-21 13:24:35 -03:00
builtin-evlist.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-help.c perf help: Use strerror_r instead of strerror 2014-08-15 13:08:26 -03:00
builtin-inject.c perf tools: Use perf_data_file__fd() consistently 2015-01-29 16:58:24 -03:00
builtin-kmem.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-kvm.c perf kvm stat live: Mark events as (x86 only) in help output 2014-12-10 12:08:59 -03:00
builtin-list.c perf list: Fix --raw-dump option 2015-01-02 23:26:58 -03:00
builtin-lock.c perf tools: Modify error code for when perf_session__new() fails 2014-09-26 12:32:58 -03:00
builtin-mem.c perf mem: Move the mem_operations global to struct perf_mem 2015-01-21 13:24:31 -03:00
builtin-probe.c perf probe: Add --quiet option to suppress output result message 2014-10-29 10:32:49 -02:00
builtin-record.c perf tools: Enable LBR call stack support 2015-02-18 17:16:17 +01:00
builtin-report.c perf tools: Enable LBR call stack support 2015-02-18 17:16:17 +01:00
builtin-sched.c perf sched: Stop updating hists stats, not used 2014-10-09 11:46:35 -03:00
builtin-script.c perf tools: Export usage string and option table of perf record 2014-10-29 10:32:47 -02:00
builtin-stat.c perf tools: Remove EOL whitespaces 2015-01-21 13:24:31 -03:00
builtin-timechart.c perf tools: Export usage string and option table of perf record 2014-10-29 10:32:47 -02:00
builtin-top.c perf tools: Remove EOL whitespaces 2015-01-21 13:24:31 -03:00
builtin-trace.c tools lib fs debugfs: Introduce debugfs__strerror_open_tp 2015-01-22 17:02:20 -03:00
builtin.h perf tools: Add new mem command for memory access profiling 2013-04-01 12:21:44 -03:00
command-list.txt perf tools: Add new mem command for memory access profiling 2013-04-01 12:21:44 -03:00
CREDITS perf_counter tools: Add CREDITS file for Git contributors 2009-06-24 19:54:29 +02:00
design.txt perf tools: Update some code references in design.txt 2014-03-18 18:17:06 -03:00
Makefile perf tools: Add 'build-test' make target 2014-01-16 16:26:26 -03:00
Makefile.perf tools: Remove bitops/hweight usage of bits in tools/perf 2015-01-16 17:49:29 -03:00
MANIFEST tools: Remove bitops/hweight usage of bits in tools/perf 2015-01-16 17:49:29 -03:00
perf-archive.sh perf archive: Make 'f' the last parameter for tar 2012-09-17 13:10:42 -03:00
perf-completion.sh perf sched: Introduce --list-cmds for use by scripts 2014-04-16 17:16:05 +02:00
perf-read-vdso.c perf tools: Build programs to copy 32-bit compatibility 2014-10-29 10:32:48 -02:00
perf-sys.h perf tools: Avoid build splat for syscall numbers with uclibc 2015-01-16 17:49:29 -03:00
perf-with-kcore.sh perf tools: Add perf-with-kcore script 2014-09-17 17:08:08 -03:00
perf.c perf tools: Add --buildid-dir option to set cache directory 2014-12-09 09:14:35 -03:00
perf.h perf tools: Add core support for sampling intr machine state regs 2014-11-16 11:41:59 +01:00