2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-16 09:13:55 +08:00
linux-next/tools/perf
Namhyung Kim 784e8adda4 perf sort: Fix the 'weight' sort key behavior
Currently, the 'weight' field in the perf sample has latency information
for some instructions like in memory accesses.  And perf tool has 'weight'
and 'local_weight' sort keys to display the info.

But it's somewhat confusing what it shows exactly.  In my understanding,
'local_weight' shows a weight in a single sample, and (global) 'weight'
shows a sum of the weights in the hist_entry.

For example:

  $ perf mem record -t load dd if=/dev/zero of=/dev/null bs=4k count=1M

  $ perf report --stdio -n -s +local_weight
  ...
  #
  # Overhead  Samples  Command  Shared Object     Symbol                     Local Weight
  # ........  .......  .......  ................  .........................  ............
  #
      21.23%      313  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   32
      12.43%      183  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   35
      11.97%      159  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   36
      10.40%      141  dd       [kernel.vmlinux]  [k] lockref_put_return     32
       7.63%      113  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   33
       6.37%       92  dd       [kernel.vmlinux]  [k] lockref_get_not_zero   34
       6.15%       90  dd       [kernel.vmlinux]  [k] lockref_put_return     33
  ...

So let's look at the 'lockref_get_not_zero' symbols.  The top entry
shows that 313 samples were captured with 'local_weight' 32, so the
total weight should be 313 x 32 = 10016.  But it's not the case:

  $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
  ...
  #
  # Overhead  Samples  Command  Shared Object     Local Weight  Weight
  # ........  .......  .......  ................  ............  ......
  #
       1.36%        4  dd       [kernel.vmlinux]  36            144
       0.47%        4  dd       [kernel.vmlinux]  37            148
       0.42%        4  dd       [kernel.vmlinux]  32            128
       0.40%        4  dd       [kernel.vmlinux]  34            136
       0.35%        4  dd       [kernel.vmlinux]  36            144
       0.34%        4  dd       [kernel.vmlinux]  35            140
       0.30%        4  dd       [kernel.vmlinux]  36            144
       0.30%        4  dd       [kernel.vmlinux]  34            136
       0.30%        4  dd       [kernel.vmlinux]  32            128
       0.30%        4  dd       [kernel.vmlinux]  32            128
  ...

With the 'weight' sort key, it's divided to 4 samples even with the same
info ('comm', 'dso', 'sym' and 'local_weight').  I don't think this is
what we want.

I found this because of the way it aggregates the 'weight' value.  Since
it's not a period, we should not add them in the he->stat.  Otherwise,
two 32 'weight' entries will create a 64 'weight' entry.

After that, new 32 'weight' samples don't have a matching entry so it'd
create a new entry and make it a 64 'weight' entry again and again.
Later, they will be merged into 128 'weight' entries during the
hists__collapse_resort() with 4 samples, multiple times like above.

Let's keep the weight and display it differently.  For 'local_weight',
it can show the weight as is, and for (global) 'weight' it can display
the number multiplied by the number of samples.

With this change, I can see the expected numbers.

  $ perf report --stdio -n -s +local_weight,weight -S lockref_get_not_zero
  ...
  #
  # Overhead  Samples  Command  Shared Object     Local Weight  Weight
  # ........  .......  .......  ................  ............  .....
  #
      21.23%      313  dd       [kernel.vmlinux]  32            10016
      12.43%      183  dd       [kernel.vmlinux]  35            6405
      11.97%      159  dd       [kernel.vmlinux]  36            5724
       7.63%      113  dd       [kernel.vmlinux]  33            3729
       6.37%       92  dd       [kernel.vmlinux]  34            3128
       4.17%       59  dd       [kernel.vmlinux]  37            2183
       0.08%        1  dd       [kernel.vmlinux]  269           269
       0.08%        1  dd       [kernel.vmlinux]  38            38

Reviewed-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20211105225617.151364-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-11-18 10:08:07 -03:00
..
arch tools headers UAPI: Sync files changed by new futex_waitv syscall 2021-11-13 18:11:51 -03:00
bench perf bench futex: Fix memory leak of perf_cpu_map__new() 2021-11-13 18:11:51 -03:00
dlfilters perf dlfilter: Add dlfilter-show-cycles 2021-10-27 16:20:30 -03:00
Documentation perf arm-spe: Update --switch-events docs in 'perf record' 2021-11-13 18:11:50 -03:00
examples/bpf perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
include perf build: Move perf_dlfilters.h in the source tree 2021-08-11 09:35:24 -03:00
jvmti perf tools: Fix various typos in comments 2021-03-23 17:13:43 -03:00
pmu-events perf vendor events power10: Add metric events JSON file for power10 platform 2021-11-13 18:11:50 -03:00
python tweewide: Fix most Shebang lines 2020-12-08 23:30:04 +09:00
scripts perf scripts python: Fix passing arguments to stackcollapse report 2021-09-10 11:45:19 -03:00
tests perf tests wp: Remove unused functions on s390 2021-11-18 10:08:07 -03:00
trace perf beauty: Add socket level scnprintf that handles ARCH specific SOL_SOCKET 2021-11-12 10:40:34 -03:00
ui perf annotate: Fix fused instr logic for assembly functions 2021-09-18 17:43:05 -03:00
util perf sort: Fix the 'weight' sort key behavior 2021-11-18 10:08:07 -03:00
.gitignore Add 'tools/perf/libbpf/' to ignored files 2021-11-08 11:33:35 -08:00
Build perf daemon: Add daemon command 2021-02-09 15:42:57 -03:00
builtin-annotate.c perf tools: Check vmlinux/kallsyms arguments in all tools 2021-11-07 12:27:38 -03:00
builtin-bench.c perf bench: Add benchmark for evlist open/close operations 2021-08-10 11:32:37 -03:00
builtin-buildid-cache.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-buildid-list.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-c2c.c perf tools: Check vmlinux/kallsyms arguments in all tools 2021-11-07 12:27:38 -03:00
builtin-config.c
builtin-daemon.c perf daemon: Remove duplicate sys/file.h include 2021-10-08 15:14:50 -03:00
builtin-data.c perf data: Correct -h output 2021-08-31 15:12:00 -03:00
builtin-diff.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-evlist.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-ftrace.c perf ftrace: Fix access to pid in array when setting a pid filter 2021-04-23 15:58:10 -03:00
builtin-help.c
builtin-inject.c perf inject: Add vmlinux and ignore-vmlinux arguments 2021-11-07 12:27:38 -03:00
builtin-kallsyms.c
builtin-kmem.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-kvm.c perf tools: Allow controlling synthesizing PERF_RECORD_ metadata events during record 2021-09-17 08:44:19 -03:00
builtin-list.c perf list: Display hybrid PMU events with cpu type 2021-10-25 13:47:42 -03:00
builtin-lock.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-mem.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-probe.c perf tools: Check vmlinux/kallsyms arguments in all tools 2021-11-07 12:27:38 -03:00
builtin-record.c perf tools: Check vmlinux/kallsyms arguments in all tools 2021-11-07 12:27:38 -03:00
builtin-report.c perf tools: Refactor out kernel symbol argument sanity checking 2021-11-07 12:27:38 -03:00
builtin-sched.c perf tools: Check vmlinux/kallsyms arguments in all tools 2021-11-07 12:27:38 -03:00
builtin-script.c perf tools: Check vmlinux/kallsyms arguments in all tools 2021-11-07 12:27:38 -03:00
builtin-stat.c perf parse-event: Add init and exit to parse_event_error 2021-11-07 15:39:25 -03:00
builtin-timechart.c perf tools: Remove repipe argument from perf_session__new() 2021-08-02 10:06:51 -03:00
builtin-top.c perf tools: Check vmlinux/kallsyms arguments in all tools 2021-11-07 12:27:38 -03:00
builtin-trace.c perf trace: Beautify the 'level' argument of setsockopt 2021-11-12 10:40:34 -03:00
builtin-version.c perf version: Add a feature for libpfm4 2020-11-04 09:42:40 -03:00
builtin.h perf daemon: Add daemon command 2021-02-09 15:42:57 -03:00
check-headers.sh tools lib: Adopt list_sort() from the kernel sources 2021-10-20 10:30:59 -03:00
command-list.txt perf stat: Enable iostat mode for x86 platforms 2021-04-20 08:40:20 -03:00
CREDITS
design.txt perf design.txt: Synchronize the definition of enum perf_hw_id with code 2021-11-13 18:11:50 -03:00
Makefile perf tools: Add a build-test variant to use in builds from a tarball 2021-04-20 08:43:58 -03:00
Makefile.config perf tools: Set COMPAT_NEED_REALLOCARRAY for CONFIG_AUXTRACE=1 2021-11-18 10:08:07 -03:00
Makefile.perf perf beauty socket: Add generator for socket level (SOL_*) string table 2021-11-12 10:40:34 -03:00
MANIFEST perf MANIFEST: Add bpftool files to allow building with BUILD_BPF_SKEL=1 2021-11-07 15:39:28 -03:00
perf-archive.sh perf archive: Fix filtering of empty build-ids 2021-03-06 16:54:31 -03:00
perf-completion.sh
perf-iostat.sh perf stat: Enable iostat mode for x86 platforms 2021-04-20 08:40:20 -03:00
perf-read-vdso.c
perf-sys.h perf tests: Call test_attr__open() directly 2020-09-10 11:55:37 -03:00
perf-with-kcore.sh
perf.c perf debug: Move debug initialization earlier 2021-05-27 13:24:22 -03:00
perf.h