2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-21 19:53:59 +08:00
linux-next/tools/perf/builtin-stat.c

1459 lines
36 KiB
C
Raw Normal View History

/*
* builtin-stat.c
*
* Builtin stat command: Give a precise performance counters summary
* overview about any workload, CPU or specific PID.
*
* Sample output:
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
$ perf stat ./hackbench 10
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
Time: 0.118
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
Performance counter stats for './hackbench 10':
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
1708.761321 task-clock # 11.037 CPUs utilized
41,190 context-switches # 0.024 M/sec
6,735 CPU-migrations # 0.004 M/sec
17,318 page-faults # 0.010 M/sec
5,205,202,243 cycles # 3.046 GHz
3,856,436,920 stalled-cycles-frontend # 74.09% frontend cycles idle
1,600,790,871 stalled-cycles-backend # 30.75% backend cycles idle
2,603,501,247 instructions # 0.50 insns per cycle
# 1.48 stalled cycles per insn
484,357,498 branches # 283.455 M/sec
6,388,934 branch-misses # 1.32% of all branches
0.154822978 seconds time elapsed
*
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
* Copyright (C) 2008-2011, Red Hat Inc, Ingo Molnar <mingo@redhat.com>
*
* Improvements and fixes by:
*
* Arjan van de Ven <arjan@linux.intel.com>
* Yanmin Zhang <yanmin.zhang@intel.com>
* Wu Fengguang <fengguang.wu@intel.com>
* Mike Galbraith <efault@gmx.de>
* Paul Mackerras <paulus@samba.org>
* Jaswinder Singh Rajput <jaswinder@kernel.org>
*
* Released under the GPL v2. (and only v2, not any later version)
*/
#include "perf.h"
#include "builtin.h"
#include "util/cgroup.h"
#include "util/util.h"
#include <subcmd/parse-options.h>
#include "util/parse-events.h"
#include "util/pmu.h"
#include "util/event.h"
#include "util/evlist.h"
#include "util/evsel.h"
#include "util/debug.h"
#include "util/color.h"
#include "util/stat.h"
#include "util/header.h"
perf tools: Fix sparse CPU numbering related bugs At present, the perf subcommands that do system-wide monitoring (perf stat, perf record and perf top) don't work properly unless the online cpus are numbered 0, 1, ..., N-1. These tools ask for the number of online cpus with sysconf(_SC_NPROCESSORS_ONLN) and then try to create events for cpus 0, 1, ..., N-1. This creates problems for systems where the online cpus are numbered sparsely. For example, a POWER6 system in single-threaded mode (i.e. only running 1 hardware thread per core) will have only even-numbered cpus online. This fixes the problem by reading the /sys/devices/system/cpu/online file to find out which cpus are online. The code that does that is in tools/perf/util/cpumap.[ch], and consists of a read_cpu_map() function that sets up a cpumap[] array and returns the number of online cpus. If /sys/devices/system/cpu/online can't be read or can't be parsed successfully, it falls back to using sysconf to ask how many cpus are online and sets up an identity map in cpumap[]. The perf record, perf stat and perf top code then calls read_cpu_map() in the system-wide monitoring case (instead of sysconf) and uses cpumap[] to get the cpu numbers to pass to perf_event_open. Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Anton Blanchard <anton@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Arnaldo Carvalho de Melo <acme@infradead.org> LKML-Reference: <20100310093609.GA3959@brick.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-03-10 17:36:09 +08:00
#include "util/cpumap.h"
#include "util/thread.h"
#include "util/thread_map.h"
#include "util/counts.h"
#include <stdlib.h>
#include <sys/prctl.h>
perf stat: add perf stat -B to pretty print large numbers It is hard to read very large numbers so provide an option to perf stat to separate thousands using a separator. The patch leverages the locale support of stdio. You need to set your LC_NUMERIC appropriately, for instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this feature. This way existing scripts parsing the output do not need to be changed. Here is an example. $ perf stat noploop 2 noploop for 2 seconds Performance counter stats for 'noploop 2': 1998.347031 task-clock-msecs # 0.998 CPUs 61 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 118 page-faults # 0.000 M/sec 4,138,410,900 cycles # 2070.917 M/sec (scaled from 70.01%) 2,062,650,268 instructions # 0.498 IPC (scaled from 70.01%) 2,057,653,466 branches # 1029.678 M/sec (scaled from 70.01%) 40,267 branch-misses # 0.002 % (scaled from 30.04%) 2,055,961,348 cache-references # 1028.831 M/sec (scaled from 30.03%) 53,725 cache-misses # 0.027 M/sec (scaled from 30.02%) 2.001393933 seconds time elapsed $ perf stat -B noploop 2 noploop for 2 seconds Performance counter stats for 'noploop 2': 1998.297883 task-clock-msecs # 0.998 CPUs 59 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 119 page-faults # 0.000 M/sec 4,131,380,160 cycles # 2067.450 M/sec (scaled from 70.01%) 2,059,096,507 instructions # 0.498 IPC (scaled from 70.01%) 2,054,681,303 branches # 1028.216 M/sec (scaled from 70.01%) 25,650 branch-misses # 0.001 % (scaled from 30.05%) 2,056,283,014 cache-references # 1029.017 M/sec (scaled from 30.03%) 47,097 cache-misses # 0.024 M/sec (scaled from 30.02%) 2.001391016 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <4bf28fe8.914ed80a.01ca.fffff5f5@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-18 21:00:01 +08:00
#include <locale.h>
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
#define DEFAULT_SEPARATOR " "
perf stat: clarify unsupported events from uncounted events perf stat continues running even if the event list contains counters that are not supported. The resulting output then contains <not counted> for those events which gets confusing as to which events are supported, but not counted and which are not supported. Before: perf stat -ddd -- sleep 1 Performance counter stats for 'sleep 1': 0.571283 task-clock # 0.001 CPUs utilized 1 context-switches # 0.002 M/sec 0 CPU-migrations # 0.000 M/sec 157 page-faults # 0.275 M/sec 1,037,707 cycles # 1.816 GHz <not counted> stalled-cycles-frontend <not counted> stalled-cycles-backend 654,499 instructions # 0.63 insns per cycle 136,129 branches # 238.286 M/sec <not counted> branch-misses <not counted> L1-dcache-loads <not counted> L1-dcache-load-misses <not counted> LLC-loads <not counted> LLC-load-misses <not counted> L1-icache-loads <not counted> L1-icache-load-misses <not counted> dTLB-loads <not counted> dTLB-load-misses <not counted> iTLB-loads <not counted> iTLB-load-misses <not counted> L1-dcache-prefetches <not counted> L1-dcache-prefetch-misses 1.001004836 seconds time elapsed After: perf stat -ddd -- sleep 1 Performance counter stats for 'sleep 1': 1.350326 task-clock # 0.001 CPUs utilized 2 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 M/sec 157 page-faults # 0.116 M/sec 11,986 cycles # 0.009 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 496,986 instructions # 41.46 insns per cycle 138,065 branches # 102.246 M/sec 7,245 branch-misses # 5.25% of all branches <not counted> L1-dcache-loads <not counted> L1-dcache-load-misses <not counted> LLC-loads <not counted> LLC-load-misses <not counted> L1-icache-loads <not counted> L1-icache-load-misses <not counted> dTLB-loads <not counted> dTLB-load-misses <not counted> iTLB-loads <not counted> iTLB-load-misses <not counted> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 1.002397333 seconds time elapsed v1->v2: changed supported type from int to bool v2->v3 fixed vertical alignment of new struct element Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1306767359-13221-1-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-05-30 22:55:59 +08:00
#define CNTR_NOT_SUPPORTED "<not supported>"
#define CNTR_NOT_COUNTED "<not counted>"
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
static void print_counters(struct timespec *ts, int argc, const char **argv);
2013-01-29 19:47:44 +08:00
/* Default events used for perf stat -T */
static const char *transaction_attrs = {
"task-clock,"
"{"
"instructions,"
"cycles,"
"cpu/cycles-t/,"
"cpu/tx-start/,"
"cpu/el-start/,"
"cpu/cycles-ct/"
"}"
};
/* More limited version when the CPU does not have all events. */
static const char * transaction_limited_attrs = {
"task-clock,"
"{"
"instructions,"
"cycles,"
"cpu/cycles-t/,"
"cpu/tx-start/"
"}"
};
static struct perf_evlist *evsel_list;
static struct target target = {
.uid = UINT_MAX,
};
typedef int (*aggr_get_id_t)(struct cpu_map *m, int cpu);
static int run_count = 1;
static bool no_inherit = false;
static volatile pid_t child_pid = -1;
perf: Fix endianness argument compatibility with OPT_BOOLEAN() and introduce OPT_INCR() Parsing an option from the command line with OPT_BOOLEAN on a bool data type would not work on a big-endian machine due to the manner in which the boolean was being cast into an int and incremented. For example, running 'perf probe --list' on a PowerPC machine would fail to properly set the list_events bool and would therefore print out the usage information and terminate. This patch makes OPT_BOOLEAN work as expected with a bool datatype. For cases where the original OPT_BOOLEAN was intentionally being used to increment an int each time it was passed in on the command line, this patch introduces OPT_INCR with the old behaviour of OPT_BOOLEAN (the verbose variable is currently the only such example of this). I have reviewed every use of OPT_BOOLEAN to verify that a true C99 bool was passed. Where integers were used, I verified that they were only being used for boolean logic and changed them to bools to ensure that they would not be mistakenly used as ints. The major exception was the verbose variable which now uses OPT_INCR instead of OPT_BOOLEAN. Signed-off-by: Ian Munsie <imunsie@au.ibm.com> Acked-by: David S. Miller <davem@davemloft.net> Cc: <stable@kernel.org> # NOTE: wont apply to .3[34].x cleanly, please backport Cc: Git development list <git@vger.kernel.org> Cc: Ian Munsie <imunsie@au1.ibm.com> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: KOSAKI Motohiro <kosaki.motohiro@jp.fujitsu.com> Cc: Hitoshi Mitake <mitake@dcl.info.waseda.ac.jp> Cc: Rusty Russell <rusty@rustcorp.com.au> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Eric B Munson <ebmunson@us.ibm.com> Cc: Valdis.Kletnieks@vt.edu Cc: WANG Cong <amwang@redhat.com> Cc: Thiago Farina <tfransosi@gmail.com> Cc: Masami Hiramatsu <mhiramat@redhat.com> Cc: Xiao Guangrong <xiaoguangrong@cn.fujitsu.com> Cc: Jaswinder Singh Rajput <jaswinderrajput@gmail.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp> Cc: Mike Galbraith <efault@gmx.de> Cc: Tom Zanussi <tzanussi@gmail.com> Cc: Anton Blanchard <anton@samba.org> Cc: John Kacur <jkacur@redhat.com> Cc: Li Zefan <lizf@cn.fujitsu.com> Cc: Steven Rostedt <rostedt@goodmis.org> LKML-Reference: <1271147857-11604-1-git-send-email-imunsie@au.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2010-04-13 16:37:33 +08:00
static bool null_run = false;
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
static int detailed_run = 0;
static bool transaction_run;
static bool big_num = true;
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
static int big_num_opt = -1;
static const char *csv_sep = NULL;
static bool csv_output = false;
static bool group = false;
static const char *pre_cmd = NULL;
static const char *post_cmd = NULL;
static bool sync_run = false;
perf stat: Add support for --initial-delay option When measuring workloads the startup phase -- doing page faults, dynamic linking, opening files -- is often very different from the rest of the workload. Especially with smaller kernels and using counter multiplexing this can give significant measurement errors. Multiplexing assumes that the workload is mostly the same over longer periods. But at startup there is typically some spike of activity which is relatively short. If many groups are multiplexing the one group seeing the spike, and which is then scaled up over the time to run all groups, may see a significant error. Also in general it's often not useful to measure the startup, because it is so different from the rest. One way around this is to use interval mode and discard the first sample, but this can be awkward because interval mode doesn't support intervals of less than 100ms, and also a useful interval is not necessarily the same as a useful startup delay. This patch adds a new --initial-delay / -D option to skip measuring for the startup phase. The time can be specified in ms Here's a simple example: perf stat -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 3,721 page-faults ... If we just wait 20 ms the number of page faults is 1/3 less: perf stat -D 20 -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 2,823 page-faults ... So we filtered out most of the startup noise from bash. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375490473-1503-4-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-08-03 08:41:11 +08:00
static unsigned int initial_delay = 0;
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
static unsigned int unit_width = 4; /* strlen("unit") */
static bool forever = false;
2013-01-29 19:47:44 +08:00
static struct timespec ref_time;
static struct cpu_map *aggr_map;
static aggr_get_id_t aggr_get_id;
static bool append_file;
static const char *output_name;
static int output_fd;
perf stat: add perf stat -B to pretty print large numbers It is hard to read very large numbers so provide an option to perf stat to separate thousands using a separator. The patch leverages the locale support of stdio. You need to set your LC_NUMERIC appropriately, for instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this feature. This way existing scripts parsing the output do not need to be changed. Here is an example. $ perf stat noploop 2 noploop for 2 seconds Performance counter stats for 'noploop 2': 1998.347031 task-clock-msecs # 0.998 CPUs 61 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 118 page-faults # 0.000 M/sec 4,138,410,900 cycles # 2070.917 M/sec (scaled from 70.01%) 2,062,650,268 instructions # 0.498 IPC (scaled from 70.01%) 2,057,653,466 branches # 1029.678 M/sec (scaled from 70.01%) 40,267 branch-misses # 0.002 % (scaled from 30.04%) 2,055,961,348 cache-references # 1028.831 M/sec (scaled from 30.03%) 53,725 cache-misses # 0.027 M/sec (scaled from 30.02%) 2.001393933 seconds time elapsed $ perf stat -B noploop 2 noploop for 2 seconds Performance counter stats for 'noploop 2': 1998.297883 task-clock-msecs # 0.998 CPUs 59 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 119 page-faults # 0.000 M/sec 4,131,380,160 cycles # 2067.450 M/sec (scaled from 70.01%) 2,059,096,507 instructions # 0.498 IPC (scaled from 70.01%) 2,054,681,303 branches # 1028.216 M/sec (scaled from 70.01%) 25,650 branch-misses # 0.001 % (scaled from 30.05%) 2,056,283,014 cache-references # 1029.017 M/sec (scaled from 30.03%) 47,097 cache-misses # 0.024 M/sec (scaled from 30.02%) 2.001391016 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <4bf28fe8.914ed80a.01ca.fffff5f5@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-18 21:00:01 +08:00
static volatile int done = 0;
static struct perf_stat_config stat_config = {
.aggr_mode = AGGR_GLOBAL,
.scale = true,
};
2013-01-29 19:47:44 +08:00
static inline void diff_timespec(struct timespec *r, struct timespec *a,
struct timespec *b)
{
r->tv_sec = a->tv_sec - b->tv_sec;
if (a->tv_nsec < b->tv_nsec) {
r->tv_nsec = a->tv_nsec + 1000000000L - b->tv_nsec;
r->tv_sec--;
} else {
r->tv_nsec = a->tv_nsec - b->tv_nsec ;
}
}
static void perf_stat__reset_stats(void)
{
perf_evlist__reset_stats(evsel_list);
perf_stat__reset_shadow_stats();
}
static int create_perf_stat_counter(struct perf_evsel *evsel)
{
struct perf_event_attr *attr = &evsel->attr;
if (stat_config.scale)
attr->read_format = PERF_FORMAT_TOTAL_TIME_ENABLED |
PERF_FORMAT_TOTAL_TIME_RUNNING;
attr->inherit = !no_inherit;
/*
* Some events get initialized with sample_(period/type) set,
* like tracepoints. Clear it up for counting.
*/
attr->sample_period = 0;
attr->sample_type = 0;
/*
* Disabling all counters initially, they will be enabled
* either manually by us or by kernel via enable_on_exec
* set later.
*/
if (perf_evsel__is_group_leader(evsel)) {
attr->disabled = 1;
/*
* In case of initial_delay we enable tracee
* events manually.
*/
if (target__none(&target) && !initial_delay)
perf stat: Add support for --initial-delay option When measuring workloads the startup phase -- doing page faults, dynamic linking, opening files -- is often very different from the rest of the workload. Especially with smaller kernels and using counter multiplexing this can give significant measurement errors. Multiplexing assumes that the workload is mostly the same over longer periods. But at startup there is typically some spike of activity which is relatively short. If many groups are multiplexing the one group seeing the spike, and which is then scaled up over the time to run all groups, may see a significant error. Also in general it's often not useful to measure the startup, because it is so different from the rest. One way around this is to use interval mode and discard the first sample, but this can be awkward because interval mode doesn't support intervals of less than 100ms, and also a useful interval is not necessarily the same as a useful startup delay. This patch adds a new --initial-delay / -D option to skip measuring for the startup phase. The time can be specified in ms Here's a simple example: perf stat -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 3,721 page-faults ... If we just wait 20 ms the number of page faults is 1/3 less: perf stat -D 20 -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 2,823 page-faults ... So we filtered out most of the startup noise from bash. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375490473-1503-4-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-08-03 08:41:11 +08:00
attr->enable_on_exec = 1;
}
if (target__has_cpu(&target))
return perf_evsel__open_per_cpu(evsel, perf_evsel__cpus(evsel));
return perf_evsel__open_per_thread(evsel, evsel_list->threads);
}
/*
* Does the counter have nsecs as a unit?
*/
static inline int nsec_counter(struct perf_evsel *evsel)
{
if (perf_evsel__match(evsel, SOFTWARE, SW_CPU_CLOCK) ||
perf_evsel__match(evsel, SOFTWARE, SW_TASK_CLOCK))
return 1;
return 0;
}
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
/*
* Read out the results of a single counter:
* do not aggregate counts across CPUs in system-wide mode
*/
static int read_counter(struct perf_evsel *counter)
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
{
int nthreads = thread_map__nr(evsel_list->threads);
int ncpus = perf_evsel__nr_cpus(counter);
int cpu, thread;
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
if (!counter->supported)
return -ENOENT;
if (counter->system_wide)
nthreads = 1;
for (thread = 0; thread < nthreads; thread++) {
for (cpu = 0; cpu < ncpus; cpu++) {
struct perf_counts_values *count;
count = perf_counts(counter->counts, cpu, thread);
if (perf_evsel__read(counter, cpu, thread, count))
return -1;
}
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
}
return 0;
}
static void read_counters(bool close_counters)
2013-01-29 19:47:44 +08:00
{
struct perf_evsel *counter;
evlist__for_each(evsel_list, counter) {
if (read_counter(counter))
pr_debug("failed to read counter %s\n", counter->name);
if (perf_stat_process_counter(&stat_config, counter))
pr_warning("failed to process counter %s\n", counter->name);
if (close_counters) {
perf_evsel__close_fd(counter, perf_evsel__nr_cpus(counter),
thread_map__nr(evsel_list->threads));
2013-01-29 19:47:44 +08:00
}
}
}
static void process_interval(void)
{
struct timespec ts, rs;
read_counters(false);
2013-01-29 19:47:44 +08:00
clock_gettime(CLOCK_MONOTONIC, &ts);
diff_timespec(&rs, &ts, &ref_time);
print_counters(&rs, 0, NULL);
2013-01-29 19:47:44 +08:00
}
static void enable_counters(void)
perf stat: Add support for --initial-delay option When measuring workloads the startup phase -- doing page faults, dynamic linking, opening files -- is often very different from the rest of the workload. Especially with smaller kernels and using counter multiplexing this can give significant measurement errors. Multiplexing assumes that the workload is mostly the same over longer periods. But at startup there is typically some spike of activity which is relatively short. If many groups are multiplexing the one group seeing the spike, and which is then scaled up over the time to run all groups, may see a significant error. Also in general it's often not useful to measure the startup, because it is so different from the rest. One way around this is to use interval mode and discard the first sample, but this can be awkward because interval mode doesn't support intervals of less than 100ms, and also a useful interval is not necessarily the same as a useful startup delay. This patch adds a new --initial-delay / -D option to skip measuring for the startup phase. The time can be specified in ms Here's a simple example: perf stat -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 3,721 page-faults ... If we just wait 20 ms the number of page faults is 1/3 less: perf stat -D 20 -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 2,823 page-faults ... So we filtered out most of the startup noise from bash. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375490473-1503-4-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-08-03 08:41:11 +08:00
{
if (initial_delay)
perf stat: Add support for --initial-delay option When measuring workloads the startup phase -- doing page faults, dynamic linking, opening files -- is often very different from the rest of the workload. Especially with smaller kernels and using counter multiplexing this can give significant measurement errors. Multiplexing assumes that the workload is mostly the same over longer periods. But at startup there is typically some spike of activity which is relatively short. If many groups are multiplexing the one group seeing the spike, and which is then scaled up over the time to run all groups, may see a significant error. Also in general it's often not useful to measure the startup, because it is so different from the rest. One way around this is to use interval mode and discard the first sample, but this can be awkward because interval mode doesn't support intervals of less than 100ms, and also a useful interval is not necessarily the same as a useful startup delay. This patch adds a new --initial-delay / -D option to skip measuring for the startup phase. The time can be specified in ms Here's a simple example: perf stat -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 3,721 page-faults ... If we just wait 20 ms the number of page faults is 1/3 less: perf stat -D 20 -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 2,823 page-faults ... So we filtered out most of the startup noise from bash. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375490473-1503-4-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-08-03 08:41:11 +08:00
usleep(initial_delay * 1000);
/*
* We need to enable counters only if:
* - we don't have tracee (attaching to task or cpu)
* - we have initial delay configured
*/
if (!target__none(&target) || initial_delay)
perf_evlist__enable(evsel_list);
perf stat: Add support for --initial-delay option When measuring workloads the startup phase -- doing page faults, dynamic linking, opening files -- is often very different from the rest of the workload. Especially with smaller kernels and using counter multiplexing this can give significant measurement errors. Multiplexing assumes that the workload is mostly the same over longer periods. But at startup there is typically some spike of activity which is relatively short. If many groups are multiplexing the one group seeing the spike, and which is then scaled up over the time to run all groups, may see a significant error. Also in general it's often not useful to measure the startup, because it is so different from the rest. One way around this is to use interval mode and discard the first sample, but this can be awkward because interval mode doesn't support intervals of less than 100ms, and also a useful interval is not necessarily the same as a useful startup delay. This patch adds a new --initial-delay / -D option to skip measuring for the startup phase. The time can be specified in ms Here's a simple example: perf stat -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 3,721 page-faults ... If we just wait 20 ms the number of page faults is 1/3 less: perf stat -D 20 -e page-faults bash -c 'for i in $(seq 100000) ; do true ; done' ... 2,823 page-faults ... So we filtered out most of the startup noise from bash. Signed-off-by: Andi Kleen <ak@linux.intel.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1375490473-1503-4-git-send-email-andi@firstfloor.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2013-08-03 08:41:11 +08:00
}
static volatile int workload_exec_errno;
/*
* perf_evlist__prepare_workload will send a SIGUSR1
* if the fork fails, since we asked by setting its
* want_signal to true.
*/
static void workload_exec_failed_signal(int signo __maybe_unused, siginfo_t *info,
void *ucontext __maybe_unused)
{
workload_exec_errno = info->si_value.sival_int;
}
static int __run_perf_stat(int argc, const char **argv)
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
{
int interval = stat_config.interval;
char msg[512];
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
unsigned long long t0, t1;
struct perf_evsel *counter;
2013-01-29 19:47:44 +08:00
struct timespec ts;
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
size_t l;
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
int status = 0;
const bool forks = (argc > 0);
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
2013-01-29 19:47:44 +08:00
if (interval) {
ts.tv_sec = interval / 1000;
ts.tv_nsec = (interval % 1000) * 1000000;
} else {
ts.tv_sec = 1;
ts.tv_nsec = 0;
}
if (forks) {
if (perf_evlist__prepare_workload(evsel_list, &target, argv, false,
workload_exec_failed_signal) < 0) {
perror("failed to prepare workload");
return -1;
}
child_pid = evsel_list->workload.pid;
perf_counter tools: Reduce perf stat measurement overhead/skew Vince Weaver reported a 'perf stat' measurement overhead in the count of retired instructions, which can amount to a +6000 instructions inflated count in the reported count. At present, perf stat creates its counters on the perf process. Thus the counters count the fork and various other activity in both the parent and child, such as the resolver overhead for resolving PLT entries for any libc functions that haven't been called before, such as execvp. This reduces the overhead by creating the counters on the child process after the fork, using a couple of pipes to synchronize so that the child process waits until the parent has created the counters before doing the exec. To eliminate the PLT resolution overhead on calling execvp, this does a dummy execvp first which will always fail. With this, the overhead of executing a program goes down from over 4800 instructions to about 90 instructions on powerpc (32-bit). This was measured with a statically-linked program written in assembler which only does the 3 instructions needed to call _exit(0). Before: $ perf stat -e 0:1:u ./three Performance counter stats for './three': 4858 instructions 0.001274523 seconds time elapsed After: $ perf stat -e 0:1:u ./three Performance counter stats for './three': 92 instructions 0.000468153 seconds time elapsed Reported-by: Vince Weaver <vince@deater.net> Signed-off-by: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> LKML-Reference: <19016.41425.814043.870352@cargo.ozlabs.ibm.com> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-29 19:13:21 +08:00
}
perf tools: Enable grouping logic for parsed events This patch adds a functionality that allows to create event groups based on the way they are specified on the command line. Adding functionality to the '{}' group syntax introduced in earlier patch. The current '--group/-g' option behaviour remains intact. If you specify it for record/stat/top command, all the specified events become members of a single group with the first event as a group leader. With the new '{}' group syntax you can create group like: # perf record -e '{cycles,faults}' ls resulting in single event group containing 'cycles' and 'faults' events, with cycles event as group leader. All groups are created with regards to threads and cpus. Thus recording an event group within a 2 threads on server with 4 CPUs will create 8 separate groups. Examples (first event in brackets is group leader): # 1 group (cpu-clock,task-clock) perf record --group -e cpu-clock,task-clock ls perf record -e '{cpu-clock,task-clock}' ls # 2 groups (cpu-clock,task-clock) (minor-faults,major-faults) perf record -e '{cpu-clock,task-clock},{minor-faults,major-faults}' ls # 1 group (cpu-clock,task-clock,minor-faults,major-faults) perf record --group -e cpu-clock,task-clock -e minor-faults,major-faults ls perf record -e '{cpu-clock,task-clock,minor-faults,major-faults}' ls # 2 groups (cpu-clock,task-clock) (minor-faults,major-faults) perf record -e '{cpu-clock,task-clock} -e '{minor-faults,major-faults}' \ -e instructions ls # 1 group # (cpu-clock,task-clock,minor-faults,major-faults,instructions) perf record --group -e cpu-clock,task-clock \ -e minor-faults,major-faults -e instructions ls perf record -e '{cpu-clock,task-clock,minor-faults,major-faults,instructions}' ls It's possible to use standard event modifier for a group, which spans over all events in the group and updates each event modifier settings, for example: # perf record -r '{faults:k,cache-references}:p' resulting in ':kp' modifier being used for 'faults' and ':p' modifier being used for 'cache-references' event. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ulrich Drepper <drepper@gmail.com> Link: http://lkml.kernel.org/n/tip-ho42u0wcr8mn1otkalqi13qp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-08-08 18:22:36 +08:00
if (group)
perf_evlist__set_leader(evsel_list);
perf tools: Enable grouping logic for parsed events This patch adds a functionality that allows to create event groups based on the way they are specified on the command line. Adding functionality to the '{}' group syntax introduced in earlier patch. The current '--group/-g' option behaviour remains intact. If you specify it for record/stat/top command, all the specified events become members of a single group with the first event as a group leader. With the new '{}' group syntax you can create group like: # perf record -e '{cycles,faults}' ls resulting in single event group containing 'cycles' and 'faults' events, with cycles event as group leader. All groups are created with regards to threads and cpus. Thus recording an event group within a 2 threads on server with 4 CPUs will create 8 separate groups. Examples (first event in brackets is group leader): # 1 group (cpu-clock,task-clock) perf record --group -e cpu-clock,task-clock ls perf record -e '{cpu-clock,task-clock}' ls # 2 groups (cpu-clock,task-clock) (minor-faults,major-faults) perf record -e '{cpu-clock,task-clock},{minor-faults,major-faults}' ls # 1 group (cpu-clock,task-clock,minor-faults,major-faults) perf record --group -e cpu-clock,task-clock -e minor-faults,major-faults ls perf record -e '{cpu-clock,task-clock,minor-faults,major-faults}' ls # 2 groups (cpu-clock,task-clock) (minor-faults,major-faults) perf record -e '{cpu-clock,task-clock} -e '{minor-faults,major-faults}' \ -e instructions ls # 1 group # (cpu-clock,task-clock,minor-faults,major-faults,instructions) perf record --group -e cpu-clock,task-clock \ -e minor-faults,major-faults -e instructions ls perf record -e '{cpu-clock,task-clock,minor-faults,major-faults,instructions}' ls It's possible to use standard event modifier for a group, which spans over all events in the group and updates each event modifier settings, for example: # perf record -r '{faults:k,cache-references}:p' resulting in ':kp' modifier being used for 'faults' and ':p' modifier being used for 'cache-references' event. Reviewed-by: Namhyung Kim <namhyung@kernel.org> Signed-off-by: Jiri Olsa <jolsa@redhat.com> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Andi Kleen <andi@firstfloor.org> Cc: Arnaldo Carvalho de Melo <acme@ghostprotocols.net> Cc: Corey Ashford <cjashfor@linux.vnet.ibm.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ulrich Drepper <drepper@gmail.com> Link: http://lkml.kernel.org/n/tip-ho42u0wcr8mn1otkalqi13qp@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-08-08 18:22:36 +08:00
evlist__for_each(evsel_list, counter) {
if (create_perf_stat_counter(counter) < 0) {
/*
* PPC returns ENXIO for HW counters until 2.6.37
* (behavior changed with commit b0a873e).
*/
if (errno == EINVAL || errno == ENOSYS ||
errno == ENOENT || errno == EOPNOTSUPP ||
errno == ENXIO) {
if (verbose)
ui__warning("%s event is not supported by the kernel.\n",
perf_evsel__name(counter));
perf stat: clarify unsupported events from uncounted events perf stat continues running even if the event list contains counters that are not supported. The resulting output then contains <not counted> for those events which gets confusing as to which events are supported, but not counted and which are not supported. Before: perf stat -ddd -- sleep 1 Performance counter stats for 'sleep 1': 0.571283 task-clock # 0.001 CPUs utilized 1 context-switches # 0.002 M/sec 0 CPU-migrations # 0.000 M/sec 157 page-faults # 0.275 M/sec 1,037,707 cycles # 1.816 GHz <not counted> stalled-cycles-frontend <not counted> stalled-cycles-backend 654,499 instructions # 0.63 insns per cycle 136,129 branches # 238.286 M/sec <not counted> branch-misses <not counted> L1-dcache-loads <not counted> L1-dcache-load-misses <not counted> LLC-loads <not counted> LLC-load-misses <not counted> L1-icache-loads <not counted> L1-icache-load-misses <not counted> dTLB-loads <not counted> dTLB-load-misses <not counted> iTLB-loads <not counted> iTLB-load-misses <not counted> L1-dcache-prefetches <not counted> L1-dcache-prefetch-misses 1.001004836 seconds time elapsed After: perf stat -ddd -- sleep 1 Performance counter stats for 'sleep 1': 1.350326 task-clock # 0.001 CPUs utilized 2 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 M/sec 157 page-faults # 0.116 M/sec 11,986 cycles # 0.009 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 496,986 instructions # 41.46 insns per cycle 138,065 branches # 102.246 M/sec 7,245 branch-misses # 5.25% of all branches <not counted> L1-dcache-loads <not counted> L1-dcache-load-misses <not counted> LLC-loads <not counted> LLC-load-misses <not counted> L1-icache-loads <not counted> L1-icache-load-misses <not counted> dTLB-loads <not counted> dTLB-load-misses <not counted> iTLB-loads <not counted> iTLB-load-misses <not counted> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 1.002397333 seconds time elapsed v1->v2: changed supported type from int to bool v2->v3 fixed vertical alignment of new struct element Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1306767359-13221-1-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-05-30 22:55:59 +08:00
counter->supported = false;
if ((counter->leader != counter) ||
!(counter->leader->nr_members > 1))
continue;
}
perf_evsel__open_strerror(counter, &target,
errno, msg, sizeof(msg));
ui__error("%s\n", msg);
if (child_pid != -1)
kill(child_pid, SIGTERM);
return -1;
}
perf stat: clarify unsupported events from uncounted events perf stat continues running even if the event list contains counters that are not supported. The resulting output then contains <not counted> for those events which gets confusing as to which events are supported, but not counted and which are not supported. Before: perf stat -ddd -- sleep 1 Performance counter stats for 'sleep 1': 0.571283 task-clock # 0.001 CPUs utilized 1 context-switches # 0.002 M/sec 0 CPU-migrations # 0.000 M/sec 157 page-faults # 0.275 M/sec 1,037,707 cycles # 1.816 GHz <not counted> stalled-cycles-frontend <not counted> stalled-cycles-backend 654,499 instructions # 0.63 insns per cycle 136,129 branches # 238.286 M/sec <not counted> branch-misses <not counted> L1-dcache-loads <not counted> L1-dcache-load-misses <not counted> LLC-loads <not counted> LLC-load-misses <not counted> L1-icache-loads <not counted> L1-icache-load-misses <not counted> dTLB-loads <not counted> dTLB-load-misses <not counted> iTLB-loads <not counted> iTLB-load-misses <not counted> L1-dcache-prefetches <not counted> L1-dcache-prefetch-misses 1.001004836 seconds time elapsed After: perf stat -ddd -- sleep 1 Performance counter stats for 'sleep 1': 1.350326 task-clock # 0.001 CPUs utilized 2 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 M/sec 157 page-faults # 0.116 M/sec 11,986 cycles # 0.009 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 496,986 instructions # 41.46 insns per cycle 138,065 branches # 102.246 M/sec 7,245 branch-misses # 5.25% of all branches <not counted> L1-dcache-loads <not counted> L1-dcache-load-misses <not counted> LLC-loads <not counted> LLC-load-misses <not counted> L1-icache-loads <not counted> L1-icache-load-misses <not counted> dTLB-loads <not counted> dTLB-load-misses <not counted> iTLB-loads <not counted> iTLB-load-misses <not counted> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 1.002397333 seconds time elapsed v1->v2: changed supported type from int to bool v2->v3 fixed vertical alignment of new struct element Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1306767359-13221-1-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-05-30 22:55:59 +08:00
counter->supported = true;
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
l = strlen(counter->unit);
if (l > unit_width)
unit_width = l;
}
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
if (perf_evlist__apply_filters(evsel_list, &counter)) {
error("failed to set filter \"%s\" on event %s with %d (%s)\n",
counter->filter, perf_evsel__name(counter), errno,
strerror_r(errno, msg, sizeof(msg)));
return -1;
}
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
/*
* Enable counters and exec the command:
*/
t0 = rdclock();
2013-01-29 19:47:44 +08:00
clock_gettime(CLOCK_MONOTONIC, &ref_time);
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
if (forks) {
perf_evlist__start_workload(evsel_list);
enable_counters();
2013-01-29 19:47:44 +08:00
if (interval) {
while (!waitpid(child_pid, &status, WNOHANG)) {
nanosleep(&ts, NULL);
process_interval();
2013-01-29 19:47:44 +08:00
}
}
wait(&status);
if (workload_exec_errno) {
const char *emsg = strerror_r(workload_exec_errno, msg, sizeof(msg));
pr_err("Workload failed: %s\n", emsg);
return -1;
}
if (WIFSIGNALED(status))
psignal(WTERMSIG(status), argv[0]);
} else {
enable_counters();
2013-01-29 19:47:44 +08:00
while (!done) {
nanosleep(&ts, NULL);
if (interval)
process_interval();
2013-01-29 19:47:44 +08:00
}
}
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
t1 = rdclock();
update_stats(&walltime_nsecs_stats, t1 - t0);
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
read_counters(true);
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
return WEXITSTATUS(status);
}
static int run_perf_stat(int argc, const char **argv)
{
int ret;
if (pre_cmd) {
ret = system(pre_cmd);
if (ret)
return ret;
}
if (sync_run)
sync();
ret = __run_perf_stat(argc, argv);
if (ret)
return ret;
if (post_cmd) {
ret = system(post_cmd);
if (ret)
return ret;
}
return ret;
}
static void print_running(u64 run, u64 ena)
{
if (csv_output) {
fprintf(stat_config.output, "%s%" PRIu64 "%s%.2f",
csv_sep,
run,
csv_sep,
ena ? 100.0 * run / ena : 100.0);
} else if (run != ena) {
fprintf(stat_config.output, " (%.2f%%)", 100.0 * run / ena);
}
}
static void print_noise_pct(double total, double avg)
{
double pct = rel_stddev_stats(total, avg);
if (csv_output)
fprintf(stat_config.output, "%s%.2f%%", csv_sep, pct);
else if (pct)
fprintf(stat_config.output, " ( +-%6.2f%% )", pct);
}
static void print_noise(struct perf_evsel *evsel, double avg)
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
{
struct perf_stat_evsel *ps;
if (run_count == 1)
return;
ps = evsel->priv;
print_noise_pct(stddev_stats(&ps->res_stats[0]), avg);
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
}
static void aggr_printout(struct perf_evsel *evsel, int id, int nr)
{
switch (stat_config.aggr_mode) {
case AGGR_CORE:
fprintf(stat_config.output, "S%d-C%*d%s%*d%s",
cpu_map__id_to_socket(id),
csv_output ? 0 : -8,
cpu_map__id_to_cpu(id),
csv_sep,
csv_output ? 0 : 4,
nr,
csv_sep);
break;
case AGGR_SOCKET:
fprintf(stat_config.output, "S%*d%s%*d%s",
csv_output ? 0 : -5,
id,
csv_sep,
csv_output ? 0 : 4,
nr,
csv_sep);
break;
case AGGR_NONE:
fprintf(stat_config.output, "CPU%*d%s",
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
csv_output ? 0 : -4,
perf_evsel__cpus(evsel)->map[id], csv_sep);
break;
perf stat: Introduce --per-thread option Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 17:29:27 +08:00
case AGGR_THREAD:
fprintf(stat_config.output, "%*s-%*d%s",
perf stat: Introduce --per-thread option Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 17:29:27 +08:00
csv_output ? 0 : 16,
thread_map__comm(evsel->threads, id),
csv_output ? 0 : -8,
thread_map__pid(evsel->threads, id),
csv_sep);
break;
case AGGR_GLOBAL:
case AGGR_UNSET:
default:
break;
}
}
static void nsec_printout(int id, int nr, struct perf_evsel *evsel, double avg)
{
FILE *output = stat_config.output;
double msecs = avg / 1e6;
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
const char *fmt_v, *fmt_n;
char name[25];
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
fmt_v = csv_output ? "%.6f%s" : "%18.6f%s";
fmt_n = csv_output ? "%s" : "%-25s";
aggr_printout(evsel, id, nr);
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
scnprintf(name, sizeof(name), "%s%s",
perf_evsel__name(evsel), csv_output ? "" : " (msec)");
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
fprintf(output, fmt_v, msecs, csv_sep);
if (csv_output)
fprintf(output, "%s%s", evsel->unit, csv_sep);
else
fprintf(output, "%-*s%s", unit_width, evsel->unit, csv_sep);
fprintf(output, fmt_n, name);
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
if (evsel->cgrp)
fprintf(output, "%s%s", csv_sep, evsel->cgrp->name);
}
static void abs_printout(int id, int nr, struct perf_evsel *evsel, double avg)
{
FILE *output = stat_config.output;
double sc = evsel->scale;
const char *fmt;
if (csv_output) {
fmt = sc != 1.0 ? "%.2f%s" : "%.0f%s";
} else {
if (big_num)
fmt = sc != 1.0 ? "%'18.2f%s" : "%'18.0f%s";
else
fmt = sc != 1.0 ? "%18.2f%s" : "%18.0f%s";
}
aggr_printout(evsel, id, nr);
fprintf(output, fmt, avg, csv_sep);
if (evsel->unit)
fprintf(output, "%-*s%s",
csv_output ? 0 : unit_width,
evsel->unit, csv_sep);
fprintf(output, "%-*s", csv_output ? 0 : 25, perf_evsel__name(evsel));
if (evsel->cgrp)
fprintf(output, "%s%s", csv_sep, evsel->cgrp->name);
}
static void printout(int id, int nr, struct perf_evsel *counter, double uval)
{
int cpu = cpu_map__id_to_cpu(id);
if (stat_config.aggr_mode == AGGR_GLOBAL)
cpu = 0;
if (nsec_counter(counter))
nsec_printout(id, nr, counter, uval);
else
abs_printout(id, nr, counter, uval);
if (!csv_output && !stat_config.interval)
perf_stat__print_shadow_stats(stat_config.output, counter,
uval, cpu,
stat_config.aggr_mode);
}
static void print_aggr(char *prefix)
{
FILE *output = stat_config.output;
struct perf_evsel *counter;
perf stat: Get correct cpu id for print_aggr print_aggr() fails to print per-core/per-socket statistics after commit 582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events") if events have differnt cpus. Because in print_aggr(), aggr_get_id needs index (not cpu id) to find core/pkg id. Also, evsel cpu maps should be used to get aggregated id. Here is an example: Counting events cycles,uncore_imc_0/cas_count_read/. (Uncore event has cpumask 0,18) $ perf stat -e cycles,uncore_imc_0/cas_count_read/ -C0,18 --per-core sleep 2 Without this patch, it failes to get CPU 18 result. Performance counter stats for 'CPU(s) 0,18': S0-C0 1 7526851 cycles S0-C0 1 1.05 MiB uncore_imc_0/cas_count_read/ S1-C0 0 <not counted> cycles S1-C0 0 <not counted> MiB uncore_imc_0/cas_count_read/ With this patch, it can get both CPU0 and CPU18 result. Performance counter stats for 'CPU(s) 0,18': S0-C0 1 6327768 cycles S0-C0 1 0.47 MiB uncore_imc_0/cas_count_read/ S1-C0 1 330228 cycles S1-C0 1 0.29 MiB uncore_imc_0/cas_count_read/ Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Stephane Eranian <eranian@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: 582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events") Link: http://lkml.kernel.org/r/1435820925-51091-1-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-02 15:08:43 +08:00
int cpu, s, s2, id, nr;
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
double uval;
u64 ena, run, val;
if (!(aggr_map || aggr_get_id))
return;
for (s = 0; s < aggr_map->nr; s++) {
id = aggr_map->map[s];
evlist__for_each(evsel_list, counter) {
val = ena = run = 0;
nr = 0;
for (cpu = 0; cpu < perf_evsel__nr_cpus(counter); cpu++) {
perf stat: Get correct cpu id for print_aggr print_aggr() fails to print per-core/per-socket statistics after commit 582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events") if events have differnt cpus. Because in print_aggr(), aggr_get_id needs index (not cpu id) to find core/pkg id. Also, evsel cpu maps should be used to get aggregated id. Here is an example: Counting events cycles,uncore_imc_0/cas_count_read/. (Uncore event has cpumask 0,18) $ perf stat -e cycles,uncore_imc_0/cas_count_read/ -C0,18 --per-core sleep 2 Without this patch, it failes to get CPU 18 result. Performance counter stats for 'CPU(s) 0,18': S0-C0 1 7526851 cycles S0-C0 1 1.05 MiB uncore_imc_0/cas_count_read/ S1-C0 0 <not counted> cycles S1-C0 0 <not counted> MiB uncore_imc_0/cas_count_read/ With this patch, it can get both CPU0 and CPU18 result. Performance counter stats for 'CPU(s) 0,18': S0-C0 1 6327768 cycles S0-C0 1 0.47 MiB uncore_imc_0/cas_count_read/ S1-C0 1 330228 cycles S1-C0 1 0.29 MiB uncore_imc_0/cas_count_read/ Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Acked-by: Stephane Eranian <eranian@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Fixes: 582ec0829b3d ("perf stat: Fix per-socket output bug for uncore events") Link: http://lkml.kernel.org/r/1435820925-51091-1-git-send-email-kan.liang@intel.com Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-07-02 15:08:43 +08:00
s2 = aggr_get_id(perf_evsel__cpus(counter), cpu);
if (s2 != id)
continue;
val += perf_counts(counter->counts, cpu, 0)->val;
ena += perf_counts(counter->counts, cpu, 0)->ena;
run += perf_counts(counter->counts, cpu, 0)->run;
nr++;
}
if (prefix)
fprintf(output, "%s", prefix);
if (run == 0 || ena == 0) {
aggr_printout(counter, id, nr);
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
fprintf(output, "%*s%s",
csv_output ? 0 : 18,
counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
csv_sep);
fprintf(output, "%-*s%s",
csv_output ? 0 : unit_width,
counter->unit, csv_sep);
fprintf(output, "%*s",
csv_output ? 0 : -25,
perf_evsel__name(counter));
if (counter->cgrp)
fprintf(output, "%s%s",
csv_sep, counter->cgrp->name);
print_running(run, ena);
fputc('\n', output);
continue;
}
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
uval = val * counter->scale;
printout(id, nr, counter, uval);
if (!csv_output)
print_noise(counter, 1.0);
print_running(run, ena);
fputc('\n', output);
}
}
}
perf stat: Introduce --per-thread option Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 17:29:27 +08:00
static void print_aggr_thread(struct perf_evsel *counter, char *prefix)
{
FILE *output = stat_config.output;
perf stat: Introduce --per-thread option Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 17:29:27 +08:00
int nthreads = thread_map__nr(counter->threads);
int ncpus = cpu_map__nr(counter->cpus);
int cpu, thread;
double uval;
for (thread = 0; thread < nthreads; thread++) {
u64 ena = 0, run = 0, val = 0;
for (cpu = 0; cpu < ncpus; cpu++) {
val += perf_counts(counter->counts, cpu, thread)->val;
ena += perf_counts(counter->counts, cpu, thread)->ena;
run += perf_counts(counter->counts, cpu, thread)->run;
}
if (prefix)
fprintf(output, "%s", prefix);
uval = val * counter->scale;
printout(thread, 0, counter, uval);
perf stat: Introduce --per-thread option Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 17:29:27 +08:00
if (!csv_output)
print_noise(counter, 1.0);
print_running(run, ena);
fputc('\n', output);
}
}
/*
* Print out the results of a single counter:
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
* aggregated counts in system-wide mode
*/
2013-01-29 19:47:44 +08:00
static void print_counter_aggr(struct perf_evsel *counter, char *prefix)
{
FILE *output = stat_config.output;
struct perf_stat_evsel *ps = counter->priv;
double avg = avg_stats(&ps->res_stats[0]);
int scaled = counter->counts->scaled;
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
double uval;
double avg_enabled, avg_running;
avg_enabled = avg_stats(&ps->res_stats[1]);
avg_running = avg_stats(&ps->res_stats[2]);
2013-01-29 19:47:44 +08:00
if (prefix)
fprintf(output, "%s", prefix);
if (scaled == -1 || !counter->supported) {
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
fprintf(output, "%*s%s",
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
csv_output ? 0 : 18,
perf stat: clarify unsupported events from uncounted events perf stat continues running even if the event list contains counters that are not supported. The resulting output then contains <not counted> for those events which gets confusing as to which events are supported, but not counted and which are not supported. Before: perf stat -ddd -- sleep 1 Performance counter stats for 'sleep 1': 0.571283 task-clock # 0.001 CPUs utilized 1 context-switches # 0.002 M/sec 0 CPU-migrations # 0.000 M/sec 157 page-faults # 0.275 M/sec 1,037,707 cycles # 1.816 GHz <not counted> stalled-cycles-frontend <not counted> stalled-cycles-backend 654,499 instructions # 0.63 insns per cycle 136,129 branches # 238.286 M/sec <not counted> branch-misses <not counted> L1-dcache-loads <not counted> L1-dcache-load-misses <not counted> LLC-loads <not counted> LLC-load-misses <not counted> L1-icache-loads <not counted> L1-icache-load-misses <not counted> dTLB-loads <not counted> dTLB-load-misses <not counted> iTLB-loads <not counted> iTLB-load-misses <not counted> L1-dcache-prefetches <not counted> L1-dcache-prefetch-misses 1.001004836 seconds time elapsed After: perf stat -ddd -- sleep 1 Performance counter stats for 'sleep 1': 1.350326 task-clock # 0.001 CPUs utilized 2 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 M/sec 157 page-faults # 0.116 M/sec 11,986 cycles # 0.009 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 496,986 instructions # 41.46 insns per cycle 138,065 branches # 102.246 M/sec 7,245 branch-misses # 5.25% of all branches <not counted> L1-dcache-loads <not counted> L1-dcache-load-misses <not counted> LLC-loads <not counted> LLC-load-misses <not counted> L1-icache-loads <not counted> L1-icache-load-misses <not counted> dTLB-loads <not counted> dTLB-load-misses <not counted> iTLB-loads <not counted> iTLB-load-misses <not counted> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 1.002397333 seconds time elapsed v1->v2: changed supported type from int to bool v2->v3 fixed vertical alignment of new struct element Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1306767359-13221-1-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-05-30 22:55:59 +08:00
counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
csv_sep);
fprintf(output, "%-*s%s",
csv_output ? 0 : unit_width,
counter->unit, csv_sep);
fprintf(output, "%*s",
csv_output ? 0 : -25,
perf_evsel__name(counter));
if (counter->cgrp)
fprintf(output, "%s%s", csv_sep, counter->cgrp->name);
print_running(avg_running, avg_enabled);
fputc('\n', output);
return;
}
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
uval = avg * counter->scale;
printout(-1, 0, counter, uval);
print_noise(counter, avg);
print_running(avg_running, avg_enabled);
fprintf(output, "\n");
}
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
/*
* Print out the results of a single counter:
* does not use aggregated count in system-wide
*/
2013-01-29 19:47:44 +08:00
static void print_counter(struct perf_evsel *counter, char *prefix)
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
{
FILE *output = stat_config.output;
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
u64 ena, run, val;
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
double uval;
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
int cpu;
for (cpu = 0; cpu < perf_evsel__nr_cpus(counter); cpu++) {
val = perf_counts(counter->counts, cpu, 0)->val;
ena = perf_counts(counter->counts, cpu, 0)->ena;
run = perf_counts(counter->counts, cpu, 0)->run;
2013-01-29 19:47:44 +08:00
if (prefix)
fprintf(output, "%s", prefix);
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
if (run == 0 || ena == 0) {
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
fprintf(output, "CPU%*d%s%*s%s",
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
csv_output ? 0 : -4,
perf_evsel__cpus(counter)->map[cpu], csv_sep,
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
csv_output ? 0 : 18,
perf stat: clarify unsupported events from uncounted events perf stat continues running even if the event list contains counters that are not supported. The resulting output then contains <not counted> for those events which gets confusing as to which events are supported, but not counted and which are not supported. Before: perf stat -ddd -- sleep 1 Performance counter stats for 'sleep 1': 0.571283 task-clock # 0.001 CPUs utilized 1 context-switches # 0.002 M/sec 0 CPU-migrations # 0.000 M/sec 157 page-faults # 0.275 M/sec 1,037,707 cycles # 1.816 GHz <not counted> stalled-cycles-frontend <not counted> stalled-cycles-backend 654,499 instructions # 0.63 insns per cycle 136,129 branches # 238.286 M/sec <not counted> branch-misses <not counted> L1-dcache-loads <not counted> L1-dcache-load-misses <not counted> LLC-loads <not counted> LLC-load-misses <not counted> L1-icache-loads <not counted> L1-icache-load-misses <not counted> dTLB-loads <not counted> dTLB-load-misses <not counted> iTLB-loads <not counted> iTLB-load-misses <not counted> L1-dcache-prefetches <not counted> L1-dcache-prefetch-misses 1.001004836 seconds time elapsed After: perf stat -ddd -- sleep 1 Performance counter stats for 'sleep 1': 1.350326 task-clock # 0.001 CPUs utilized 2 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 M/sec 157 page-faults # 0.116 M/sec 11,986 cycles # 0.009 GHz <not supported> stalled-cycles-frontend <not supported> stalled-cycles-backend 496,986 instructions # 41.46 insns per cycle 138,065 branches # 102.246 M/sec 7,245 branch-misses # 5.25% of all branches <not counted> L1-dcache-loads <not counted> L1-dcache-load-misses <not counted> LLC-loads <not counted> LLC-load-misses <not counted> L1-icache-loads <not counted> L1-icache-load-misses <not counted> dTLB-loads <not counted> dTLB-load-misses <not counted> iTLB-loads <not counted> iTLB-load-misses <not counted> L1-dcache-prefetches <not supported> L1-dcache-prefetch-misses 1.002397333 seconds time elapsed v1->v2: changed supported type from int to bool v2->v3 fixed vertical alignment of new struct element Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Link: http://lkml.kernel.org/r/1306767359-13221-1-git-send-email-dsahern@gmail.com Signed-off-by: David Ahern <dsahern@gmail.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2011-05-30 22:55:59 +08:00
counter->supported ? CNTR_NOT_COUNTED : CNTR_NOT_SUPPORTED,
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
csv_sep);
fprintf(output, "%-*s%s",
csv_output ? 0 : unit_width,
counter->unit, csv_sep);
fprintf(output, "%*s",
csv_output ? 0 : -25,
perf_evsel__name(counter));
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
if (counter->cgrp)
fprintf(output, "%s%s",
csv_sep, counter->cgrp->name);
print_running(run, ena);
fputc('\n', output);
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
continue;
}
tools/perf/stat: Add event unit and scale support This patch adds perf stat support for handling event units and scales as exported by the kernel. The kernel can export PMU events actual unit and scaling factor via sysfs: $ ls -1 /sys/devices/power/events/energy-* /sys/devices/power/events/energy-cores /sys/devices/power/events/energy-cores.scale /sys/devices/power/events/energy-cores.unit /sys/devices/power/events/energy-pkg /sys/devices/power/events/energy-pkg.scale /sys/devices/power/events/energy-pkg.unit $ cat /sys/devices/power/events/energy-cores.scale 2.3283064365386962890625e-10 $ cat cat /sys/devices/power/events/energy-cores.unit Joules This patch modifies the pmu event alias code to check for the presence of the .unit and .scale files to load the corresponding values. They are then used by perf stat transparently: # perf stat -a -e power/energy-pkg/,power/energy-cores/,cycles -I 1000 sleep 1000 # time counts unit events 1.000214717 3.07 Joules power/energy-pkg/ [100.00%] 1.000214717 0.53 Joules power/energy-cores/ 1.000214717 12965028 cycles [100.00%] 2.000749289 3.01 Joules power/energy-pkg/ 2.000749289 0.52 Joules power/energy-cores/ 2.000749289 15817043 cycles When the event does not have an explicit unit exported by the kernel, nothing is printed. In csv output mode, there will be an empty field. Special thanks to Jiri for providing the supporting code in the parser to trigger reading of the scale and unit files. Signed-off-by: Stephane Eranian <eranian@google.com> Reviewed-by: Jiri Olsa <jolsa@redhat.com> Reviewed-by: Andi Kleen <ak@linux.intel.com> Signed-off-by: Peter Zijlstra <peterz@infradead.org> Cc: zheng.z.yan@intel.com Cc: bp@alien8.de Cc: maria.n.dimakopoulou@gmail.com Cc: acme@redhat.com Link: http://lkml.kernel.org/r/1384275531-10892-3-git-send-email-eranian@google.com Signed-off-by: Ingo Molnar <mingo@kernel.org>
2013-11-13 00:58:49 +08:00
uval = val * counter->scale;
printout(cpu, 0, counter, uval);
if (!csv_output)
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
print_noise(counter, 1.0);
print_running(run, ena);
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
fputc('\n', output);
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
}
}
static void print_interval(char *prefix, struct timespec *ts)
{
FILE *output = stat_config.output;
static int num_print_interval;
sprintf(prefix, "%6lu.%09lu%s", ts->tv_sec, ts->tv_nsec, csv_sep);
if (num_print_interval == 0 && !csv_output) {
switch (stat_config.aggr_mode) {
case AGGR_SOCKET:
fprintf(output, "# time socket cpus counts %*s events\n", unit_width, "unit");
break;
case AGGR_CORE:
fprintf(output, "# time core cpus counts %*s events\n", unit_width, "unit");
break;
case AGGR_NONE:
fprintf(output, "# time CPU counts %*s events\n", unit_width, "unit");
break;
perf stat: Introduce --per-thread option Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 17:29:27 +08:00
case AGGR_THREAD:
fprintf(output, "# time comm-pid counts %*s events\n", unit_width, "unit");
break;
case AGGR_GLOBAL:
default:
fprintf(output, "# time counts %*s events\n", unit_width, "unit");
case AGGR_UNSET:
break;
}
}
if (++num_print_interval == 25)
num_print_interval = 0;
}
static void print_header(int argc, const char **argv)
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
{
FILE *output = stat_config.output;
int i;
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
fflush(stdout);
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
if (!csv_output) {
fprintf(output, "\n");
fprintf(output, " Performance counter stats for ");
if (target.system_wide)
fprintf(output, "\'system wide");
else if (target.cpu_list)
fprintf(output, "\'CPU(s) %s", target.cpu_list);
else if (!target__has_task(&target)) {
fprintf(output, "\'%s", argv[0]);
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
for (i = 1; i < argc; i++)
fprintf(output, " %s", argv[i]);
} else if (target.pid)
fprintf(output, "process id \'%s", target.pid);
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
else
fprintf(output, "thread id \'%s", target.tid);
fprintf(output, "\'");
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
if (run_count > 1)
fprintf(output, " (%d runs)", run_count);
fprintf(output, ":\n\n");
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
}
}
static void print_footer(void)
{
FILE *output = stat_config.output;
if (!null_run)
fprintf(output, "\n");
fprintf(output, " %17.9f seconds time elapsed",
avg_stats(&walltime_nsecs_stats)/1e9);
if (run_count > 1) {
fprintf(output, " ");
print_noise_pct(stddev_stats(&walltime_nsecs_stats),
avg_stats(&walltime_nsecs_stats));
}
fprintf(output, "\n\n");
}
static void print_counters(struct timespec *ts, int argc, const char **argv)
{
int interval = stat_config.interval;
struct perf_evsel *counter;
char buf[64], *prefix = NULL;
if (interval)
print_interval(prefix = buf, ts);
else
print_header(argc, argv);
switch (stat_config.aggr_mode) {
case AGGR_CORE:
case AGGR_SOCKET:
print_aggr(prefix);
break;
perf stat: Introduce --per-thread option Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 17:29:27 +08:00
case AGGR_THREAD:
evlist__for_each(evsel_list, counter)
print_aggr_thread(counter, prefix);
break;
case AGGR_GLOBAL:
evlist__for_each(evsel_list, counter)
print_counter_aggr(counter, prefix);
break;
case AGGR_NONE:
evlist__for_each(evsel_list, counter)
print_counter(counter, prefix);
break;
case AGGR_UNSET:
default:
break;
perf stat: Add no-aggregation mode to -a This patch adds a new -A option to perf stat. If specified then perf stat does not aggregate counts across all monitored CPUs in system-wide mode, i.e., when using -a. This option is not supported in per-thread mode. Being able to get a per-cpu breakdown is useful to detect imbalances between CPUs when running a uniform workload than spans all monitored CPUs. The second version corrects the missing cpumap[] support, so that it works when the -C option is used. The third version fixes a missing cpumap[] in print_counter() and removes a stray patch in builtin-trace.c. Examples on a 4-way system: # perf stat -a -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': 9592808135 cycles 3490380006 instructions # 0.364 IPC 1.001584632 seconds time elapsed # perf stat -a -A -e cycles,instructions -- sleep 1 Performance counter stats for 'sleep 1': CPU0 2398163767 cycles CPU1 2398180817 cycles CPU2 2398217115 cycles CPU3 2398247483 cycles CPU0 872282046 instructions # 0.364 IPC CPU1 873481776 instructions # 0.364 IPC CPU2 872638127 instructions # 0.364 IPC CPU3 872437789 instructions # 0.364 IPC 1.001556052 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4ce257b5.1e07e30a.7b6b.3aa9@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-11-16 17:05:01 +08:00
}
if (!interval && !csv_output)
print_footer();
fflush(stat_config.output);
}
static volatile int signr = -1;
static void skip_signal(int signo)
{
if ((child_pid == -1) || stat_config.interval)
done = 1;
signr = signo;
/*
* render child_pid harmless
* won't send SIGTERM to a random
* process in case of race condition
* and fast PID recycling
*/
child_pid = -1;
}
static void sig_atexit(void)
{
sigset_t set, oset;
/*
* avoid race condition with SIGCHLD handler
* in skip_signal() which is modifying child_pid
* goal is to avoid send SIGTERM to a random
* process
*/
sigemptyset(&set);
sigaddset(&set, SIGCHLD);
sigprocmask(SIG_BLOCK, &set, &oset);
if (child_pid != -1)
kill(child_pid, SIGTERM);
sigprocmask(SIG_SETMASK, &oset, NULL);
if (signr == -1)
return;
signal(signr, SIG_DFL);
kill(getpid(), signr);
}
perf tools: Use __maybe_used for unused variables perf defines both __used and __unused variables to use for marking unused variables. The variable __used is defined to __attribute__((__unused__)), which contradicts the kernel definition to __attribute__((__used__)) for new gcc versions. On Android, __used is also defined in system headers and this leads to warnings like: warning: '__used__' attribute ignored __unused is not defined in the kernel and is not a standard definition. If __unused is included everywhere instead of __used, this leads to conflicts with glibc headers, since glibc has a variables with this name in its headers. The best approach is to use __maybe_unused, the definition used in the kernel for __attribute__((unused)). In this way there is only one definition in perf sources (instead of 2 definitions that point to the same thing: __used and __unused) and it works on both Linux and Android. This patch simply replaces all instances of __used and __unused with __maybe_unused. Signed-off-by: Irina Tirdea <irina.tirdea@intel.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com [ committer note: fixed up conflict with a116e05 in builtin-sched.c ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-11 06:15:03 +08:00
static int stat__set_big_num(const struct option *opt __maybe_unused,
const char *s __maybe_unused, int unset)
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
{
big_num_opt = unset ? 0 : 1;
return 0;
}
static const struct option stat_options[] = {
OPT_BOOLEAN('T', "transaction", &transaction_run,
"hardware transaction statistics"),
OPT_CALLBACK('e', "event", &evsel_list, "event",
"event selector. use 'perf list' to list available events",
parse_events_option),
OPT_CALLBACK(0, "filter", &evsel_list, "filter",
"event filter", parse_filter),
OPT_BOOLEAN('i', "no-inherit", &no_inherit,
"child tasks do not inherit counters"),
OPT_STRING('p', "pid", &target.pid, "pid",
"stat events on existing process id"),
OPT_STRING('t', "tid", &target.tid, "tid",
"stat events on existing thread id"),
OPT_BOOLEAN('a', "all-cpus", &target.system_wide,
"system-wide collection from all CPUs"),
OPT_BOOLEAN('g', "group", &group,
"put the counters into a counter group"),
OPT_BOOLEAN('c', "scale", &stat_config.scale, "scale/normalize counters"),
OPT_INCR('v', "verbose", &verbose,
"be more verbose (show counter open errors, etc)"),
OPT_INTEGER('r', "repeat", &run_count,
"repeat command and print average + stddev (max: 100, forever: 0)"),
OPT_BOOLEAN('n', "null", &null_run,
"null run - dont start any counters"),
OPT_INCR('d', "detailed", &detailed_run,
"detailed run - start a lot of events"),
OPT_BOOLEAN('S', "sync", &sync_run,
"call sync() before starting a run"),
OPT_CALLBACK_NOOPT('B', "big-num", NULL, NULL,
"print large numbers with thousands\' separators",
stat__set_big_num),
OPT_STRING('C', "cpu", &target.cpu_list, "cpu",
"list of cpus to monitor in system-wide"),
OPT_SET_UINT('A', "no-aggr", &stat_config.aggr_mode,
"disable CPU count aggregation", AGGR_NONE),
OPT_STRING('x', "field-separator", &csv_sep, "separator",
"print counts with custom separator"),
OPT_CALLBACK('G', "cgroup", &evsel_list, "name",
"monitor event in cgroup name only", parse_cgroups),
OPT_STRING('o', "output", &output_name, "file", "output file name"),
OPT_BOOLEAN(0, "append", &append_file, "append to the output file"),
OPT_INTEGER(0, "log-fd", &output_fd,
"log output to fd, instead of stderr"),
OPT_STRING(0, "pre", &pre_cmd, "command",
"command to run prior to the measured command"),
OPT_STRING(0, "post", &post_cmd, "command",
"command to run after to the measured command"),
OPT_UINTEGER('I', "interval-print", &stat_config.interval,
"print counts at regular interval in ms (>= 10)"),
OPT_SET_UINT(0, "per-socket", &stat_config.aggr_mode,
"aggregate counts per processor socket", AGGR_SOCKET),
OPT_SET_UINT(0, "per-core", &stat_config.aggr_mode,
"aggregate counts per physical processor core", AGGR_CORE),
OPT_SET_UINT(0, "per-thread", &stat_config.aggr_mode,
"aggregate counts per thread", AGGR_THREAD),
OPT_UINTEGER('D', "delay", &initial_delay,
"ms to wait before starting measurement after program start"),
OPT_END()
};
static int perf_stat__get_socket(struct cpu_map *map, int cpu)
{
return cpu_map__get_socket(map, cpu, NULL);
}
static int perf_stat__get_core(struct cpu_map *map, int cpu)
{
return cpu_map__get_core(map, cpu, NULL);
}
static int cpu_map__get_max(struct cpu_map *map)
{
int i, max = -1;
for (i = 0; i < map->nr; i++) {
if (map->map[i] > max)
max = map->map[i];
}
return max;
}
static struct cpu_map *cpus_aggr_map;
static int perf_stat__get_aggr(aggr_get_id_t get_id, struct cpu_map *map, int idx)
{
int cpu;
if (idx >= map->nr)
return -1;
cpu = map->map[idx];
if (cpus_aggr_map->map[cpu] == -1)
cpus_aggr_map->map[cpu] = get_id(map, idx);
return cpus_aggr_map->map[cpu];
}
static int perf_stat__get_socket_cached(struct cpu_map *map, int idx)
{
return perf_stat__get_aggr(perf_stat__get_socket, map, idx);
}
static int perf_stat__get_core_cached(struct cpu_map *map, int idx)
{
return perf_stat__get_aggr(perf_stat__get_core, map, idx);
}
static int perf_stat_init_aggr_mode(void)
{
int nr;
switch (stat_config.aggr_mode) {
case AGGR_SOCKET:
if (cpu_map__build_socket_map(evsel_list->cpus, &aggr_map)) {
perror("cannot build socket map");
return -1;
}
aggr_get_id = perf_stat__get_socket_cached;
break;
case AGGR_CORE:
if (cpu_map__build_core_map(evsel_list->cpus, &aggr_map)) {
perror("cannot build core map");
return -1;
}
aggr_get_id = perf_stat__get_core_cached;
break;
case AGGR_NONE:
case AGGR_GLOBAL:
perf stat: Introduce --per-thread option Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 17:29:27 +08:00
case AGGR_THREAD:
case AGGR_UNSET:
default:
break;
}
/*
* The evsel_list->cpus is the base we operate on,
* taking the highest cpu number to be the size of
* the aggregation translate cpumap.
*/
nr = cpu_map__get_max(evsel_list->cpus);
cpus_aggr_map = cpu_map__empty_new(nr + 1);
return cpus_aggr_map ? 0 : -ENOMEM;
}
static void perf_stat__exit_aggr_mode(void)
{
cpu_map__put(aggr_map);
cpu_map__put(cpus_aggr_map);
aggr_map = NULL;
cpus_aggr_map = NULL;
}
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
/*
* Add default attributes, if there were no attributes specified or
* if -d/--detailed, -d -d or -d -d -d is used:
*/
static int add_default_attributes(void)
{
struct perf_event_attr default_attrs[] = {
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_TASK_CLOCK },
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CONTEXT_SWITCHES },
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_CPU_MIGRATIONS },
{ .type = PERF_TYPE_SOFTWARE, .config = PERF_COUNT_SW_PAGE_FAULTS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_CPU_CYCLES },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_FRONTEND },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_STALLED_CYCLES_BACKEND },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_INSTRUCTIONS },
{ .type = PERF_TYPE_HARDWARE, .config = PERF_COUNT_HW_BRANCH_MISSES },
};
/*
* Detailed stats (-d), covering the L1 and last level data caches:
*/
struct perf_event_attr detailed_attrs[] = {
{ .type = PERF_TYPE_HW_CACHE,
.config =
PERF_COUNT_HW_CACHE_L1D << 0 |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
{ .type = PERF_TYPE_HW_CACHE,
.config =
PERF_COUNT_HW_CACHE_L1D << 0 |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
{ .type = PERF_TYPE_HW_CACHE,
.config =
PERF_COUNT_HW_CACHE_LL << 0 |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
{ .type = PERF_TYPE_HW_CACHE,
.config =
PERF_COUNT_HW_CACHE_LL << 0 |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
};
/*
* Very detailed stats (-d -d), covering the instruction cache and the TLB caches:
*/
struct perf_event_attr very_detailed_attrs[] = {
{ .type = PERF_TYPE_HW_CACHE,
.config =
PERF_COUNT_HW_CACHE_L1I << 0 |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
{ .type = PERF_TYPE_HW_CACHE,
.config =
PERF_COUNT_HW_CACHE_L1I << 0 |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
{ .type = PERF_TYPE_HW_CACHE,
.config =
PERF_COUNT_HW_CACHE_DTLB << 0 |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
{ .type = PERF_TYPE_HW_CACHE,
.config =
PERF_COUNT_HW_CACHE_DTLB << 0 |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
{ .type = PERF_TYPE_HW_CACHE,
.config =
PERF_COUNT_HW_CACHE_ITLB << 0 |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
{ .type = PERF_TYPE_HW_CACHE,
.config =
PERF_COUNT_HW_CACHE_ITLB << 0 |
(PERF_COUNT_HW_CACHE_OP_READ << 8) |
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
};
/*
* Very, very detailed stats (-d -d -d), adding prefetch events:
*/
struct perf_event_attr very_very_detailed_attrs[] = {
{ .type = PERF_TYPE_HW_CACHE,
.config =
PERF_COUNT_HW_CACHE_L1D << 0 |
(PERF_COUNT_HW_CACHE_OP_PREFETCH << 8) |
(PERF_COUNT_HW_CACHE_RESULT_ACCESS << 16) },
{ .type = PERF_TYPE_HW_CACHE,
.config =
PERF_COUNT_HW_CACHE_L1D << 0 |
(PERF_COUNT_HW_CACHE_OP_PREFETCH << 8) |
(PERF_COUNT_HW_CACHE_RESULT_MISS << 16) },
};
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
/* Set attrs if no event is selected and !null_run: */
if (null_run)
return 0;
if (transaction_run) {
int err;
if (pmu_have_event("cpu", "cycles-ct") &&
pmu_have_event("cpu", "el-start"))
err = parse_events(evsel_list, transaction_attrs, NULL);
else
err = parse_events(evsel_list, transaction_limited_attrs, NULL);
if (err) {
fprintf(stderr, "Cannot set up transaction events\n");
return -1;
}
return 0;
}
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
if (!evsel_list->nr_entries) {
perf stat: Initialize default events wrt exclude_{guest,host} When no event is specified the tools use perf_evlist__add_default(), that will call event_attr_init to initialize the KVM exclusion bits. When the change was made to the tools so that by default guest samples would be excluded, the changes were made just to the parsing routines and to perf_evlist__add_default(), not to perf_evlist__add_attrs, that is used so far just by perf stat to add multiple events, according to the level of detail specified. Recently the tools were changed to reconstruct the event name from all the details in perf_event_attr, not just from .type and .config, but taking into account all the feature bits (.exclude_{guest,host,user,kernel,etc}, .precise_ip, etc). That is when we noticed that the default for perf stat wasn't the one for the rest of the tools, i.e. the .exclude_guest bit wasn't being set. I.e. the default, that doesn't call event_attr_init was showing the :HG modifier: $ perf stat usleep 1 Performance counter stats for 'usleep 1': 0.942119 task-clock # 0.454 CPUs utilized 1 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 K/sec 126 page-faults # 0.134 M/sec 693,193 cycles:HG # 0.736 GHz [40.11%] 407,461 stalled-cycles-frontend:HG # 58.78% frontend cycles idle [72.29%] 365,403 stalled-cycles-backend:HG # 52.71% backend cycles idle 465,982 instructions:HG # 0.67 insns per cycle # 0.87 stalled cycles per insn 89,760 branches:HG # 95.275 M/sec 6,178 branch-misses:HG # 6.88% of all branches 0.002077228 seconds time elapsed While if one explicitely specifies the same events, which will make the parsing code to be called and thus event_attr_init is called: $ perf stat -e task-clock,context-switches,migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses usleep 1 Performance counter stats for 'usleep 1': 1.040349 task-clock # 0.500 CPUs utilized 2 context-switches # 0.002 M/sec 0 CPU-migrations # 0.000 K/sec 127 page-faults # 0.122 M/sec 587,966 cycles # 0.565 GHz [13.18%] 459,167 stalled-cycles-frontend # 78.09% frontend cycles idle 390,249 stalled-cycles-backend # 66.37% backend cycles idle 504,006 instructions # 0.86 insns per cycle # 0.91 stalled cycles per insn 96,455 branches # 92.714 M/sec 6,522 branch-misses # 6.76% of all branches [96.12%] 0.002078681 seconds time elapsed Fix it by introducing a perf_evlist__add_default_attrs method that will call evlist_attr_init in all the perf_event_attr entries before adding the events. Reported-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-4eysr236r0pgiyum9epwxw7s@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31 00:53:54 +08:00
if (perf_evlist__add_default_attrs(evsel_list, default_attrs) < 0)
return -1;
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
}
/* Detailed events get appended to the event list: */
if (detailed_run < 1)
return 0;
/* Append detailed run extra attributes: */
perf stat: Initialize default events wrt exclude_{guest,host} When no event is specified the tools use perf_evlist__add_default(), that will call event_attr_init to initialize the KVM exclusion bits. When the change was made to the tools so that by default guest samples would be excluded, the changes were made just to the parsing routines and to perf_evlist__add_default(), not to perf_evlist__add_attrs, that is used so far just by perf stat to add multiple events, according to the level of detail specified. Recently the tools were changed to reconstruct the event name from all the details in perf_event_attr, not just from .type and .config, but taking into account all the feature bits (.exclude_{guest,host,user,kernel,etc}, .precise_ip, etc). That is when we noticed that the default for perf stat wasn't the one for the rest of the tools, i.e. the .exclude_guest bit wasn't being set. I.e. the default, that doesn't call event_attr_init was showing the :HG modifier: $ perf stat usleep 1 Performance counter stats for 'usleep 1': 0.942119 task-clock # 0.454 CPUs utilized 1 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 K/sec 126 page-faults # 0.134 M/sec 693,193 cycles:HG # 0.736 GHz [40.11%] 407,461 stalled-cycles-frontend:HG # 58.78% frontend cycles idle [72.29%] 365,403 stalled-cycles-backend:HG # 52.71% backend cycles idle 465,982 instructions:HG # 0.67 insns per cycle # 0.87 stalled cycles per insn 89,760 branches:HG # 95.275 M/sec 6,178 branch-misses:HG # 6.88% of all branches 0.002077228 seconds time elapsed While if one explicitely specifies the same events, which will make the parsing code to be called and thus event_attr_init is called: $ perf stat -e task-clock,context-switches,migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses usleep 1 Performance counter stats for 'usleep 1': 1.040349 task-clock # 0.500 CPUs utilized 2 context-switches # 0.002 M/sec 0 CPU-migrations # 0.000 K/sec 127 page-faults # 0.122 M/sec 587,966 cycles # 0.565 GHz [13.18%] 459,167 stalled-cycles-frontend # 78.09% frontend cycles idle 390,249 stalled-cycles-backend # 66.37% backend cycles idle 504,006 instructions # 0.86 insns per cycle # 0.91 stalled cycles per insn 96,455 branches # 92.714 M/sec 6,522 branch-misses # 6.76% of all branches [96.12%] 0.002078681 seconds time elapsed Fix it by introducing a perf_evlist__add_default_attrs method that will call evlist_attr_init in all the perf_event_attr entries before adding the events. Reported-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-4eysr236r0pgiyum9epwxw7s@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31 00:53:54 +08:00
if (perf_evlist__add_default_attrs(evsel_list, detailed_attrs) < 0)
return -1;
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
if (detailed_run < 2)
return 0;
/* Append very detailed run extra attributes: */
perf stat: Initialize default events wrt exclude_{guest,host} When no event is specified the tools use perf_evlist__add_default(), that will call event_attr_init to initialize the KVM exclusion bits. When the change was made to the tools so that by default guest samples would be excluded, the changes were made just to the parsing routines and to perf_evlist__add_default(), not to perf_evlist__add_attrs, that is used so far just by perf stat to add multiple events, according to the level of detail specified. Recently the tools were changed to reconstruct the event name from all the details in perf_event_attr, not just from .type and .config, but taking into account all the feature bits (.exclude_{guest,host,user,kernel,etc}, .precise_ip, etc). That is when we noticed that the default for perf stat wasn't the one for the rest of the tools, i.e. the .exclude_guest bit wasn't being set. I.e. the default, that doesn't call event_attr_init was showing the :HG modifier: $ perf stat usleep 1 Performance counter stats for 'usleep 1': 0.942119 task-clock # 0.454 CPUs utilized 1 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 K/sec 126 page-faults # 0.134 M/sec 693,193 cycles:HG # 0.736 GHz [40.11%] 407,461 stalled-cycles-frontend:HG # 58.78% frontend cycles idle [72.29%] 365,403 stalled-cycles-backend:HG # 52.71% backend cycles idle 465,982 instructions:HG # 0.67 insns per cycle # 0.87 stalled cycles per insn 89,760 branches:HG # 95.275 M/sec 6,178 branch-misses:HG # 6.88% of all branches 0.002077228 seconds time elapsed While if one explicitely specifies the same events, which will make the parsing code to be called and thus event_attr_init is called: $ perf stat -e task-clock,context-switches,migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses usleep 1 Performance counter stats for 'usleep 1': 1.040349 task-clock # 0.500 CPUs utilized 2 context-switches # 0.002 M/sec 0 CPU-migrations # 0.000 K/sec 127 page-faults # 0.122 M/sec 587,966 cycles # 0.565 GHz [13.18%] 459,167 stalled-cycles-frontend # 78.09% frontend cycles idle 390,249 stalled-cycles-backend # 66.37% backend cycles idle 504,006 instructions # 0.86 insns per cycle # 0.91 stalled cycles per insn 96,455 branches # 92.714 M/sec 6,522 branch-misses # 6.76% of all branches [96.12%] 0.002078681 seconds time elapsed Fix it by introducing a perf_evlist__add_default_attrs method that will call evlist_attr_init in all the perf_event_attr entries before adding the events. Reported-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-4eysr236r0pgiyum9epwxw7s@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31 00:53:54 +08:00
if (perf_evlist__add_default_attrs(evsel_list, very_detailed_attrs) < 0)
return -1;
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
if (detailed_run < 3)
return 0;
/* Append very, very detailed run extra attributes: */
perf stat: Initialize default events wrt exclude_{guest,host} When no event is specified the tools use perf_evlist__add_default(), that will call event_attr_init to initialize the KVM exclusion bits. When the change was made to the tools so that by default guest samples would be excluded, the changes were made just to the parsing routines and to perf_evlist__add_default(), not to perf_evlist__add_attrs, that is used so far just by perf stat to add multiple events, according to the level of detail specified. Recently the tools were changed to reconstruct the event name from all the details in perf_event_attr, not just from .type and .config, but taking into account all the feature bits (.exclude_{guest,host,user,kernel,etc}, .precise_ip, etc). That is when we noticed that the default for perf stat wasn't the one for the rest of the tools, i.e. the .exclude_guest bit wasn't being set. I.e. the default, that doesn't call event_attr_init was showing the :HG modifier: $ perf stat usleep 1 Performance counter stats for 'usleep 1': 0.942119 task-clock # 0.454 CPUs utilized 1 context-switches # 0.001 M/sec 0 CPU-migrations # 0.000 K/sec 126 page-faults # 0.134 M/sec 693,193 cycles:HG # 0.736 GHz [40.11%] 407,461 stalled-cycles-frontend:HG # 58.78% frontend cycles idle [72.29%] 365,403 stalled-cycles-backend:HG # 52.71% backend cycles idle 465,982 instructions:HG # 0.67 insns per cycle # 0.87 stalled cycles per insn 89,760 branches:HG # 95.275 M/sec 6,178 branch-misses:HG # 6.88% of all branches 0.002077228 seconds time elapsed While if one explicitely specifies the same events, which will make the parsing code to be called and thus event_attr_init is called: $ perf stat -e task-clock,context-switches,migrations,page-faults,cycles,stalled-cycles-frontend,stalled-cycles-backend,instructions,branches,branch-misses usleep 1 Performance counter stats for 'usleep 1': 1.040349 task-clock # 0.500 CPUs utilized 2 context-switches # 0.002 M/sec 0 CPU-migrations # 0.000 K/sec 127 page-faults # 0.122 M/sec 587,966 cycles # 0.565 GHz [13.18%] 459,167 stalled-cycles-frontend # 78.09% frontend cycles idle 390,249 stalled-cycles-backend # 66.37% backend cycles idle 504,006 instructions # 0.86 insns per cycle # 0.91 stalled cycles per insn 96,455 branches # 92.714 M/sec 6,522 branch-misses # 6.76% of all branches [96.12%] 0.002078681 seconds time elapsed Fix it by introducing a perf_evlist__add_default_attrs method that will call evlist_attr_init in all the perf_event_attr entries before adding the events. Reported-by: Ingo Molnar <mingo@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Namhyung Kim <namhyung@gmail.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/n/tip-4eysr236r0pgiyum9epwxw7s@git.kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-05-31 00:53:54 +08:00
return perf_evlist__add_default_attrs(evsel_list, very_very_detailed_attrs);
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
}
perf tools: Use __maybe_used for unused variables perf defines both __used and __unused variables to use for marking unused variables. The variable __used is defined to __attribute__((__unused__)), which contradicts the kernel definition to __attribute__((__used__)) for new gcc versions. On Android, __used is also defined in system headers and this leads to warnings like: warning: '__used__' attribute ignored __unused is not defined in the kernel and is not a standard definition. If __unused is included everywhere instead of __used, this leads to conflicts with glibc headers, since glibc has a variables with this name in its headers. The best approach is to use __maybe_unused, the definition used in the kernel for __attribute__((unused)). In this way there is only one definition in perf sources (instead of 2 definitions that point to the same thing: __used and __unused) and it works on both Linux and Android. This patch simply replaces all instances of __used and __unused with __maybe_unused. Signed-off-by: Irina Tirdea <irina.tirdea@intel.com> Acked-by: Pekka Enberg <penberg@kernel.org> Cc: David Ahern <dsahern@gmail.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Namhyung Kim <namhyung.kim@lge.com> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/r/1347315303-29906-7-git-send-email-irina.tirdea@intel.com [ committer note: fixed up conflict with a116e05 in builtin-sched.c ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2012-09-11 06:15:03 +08:00
int cmd_stat(int argc, const char **argv, const char *prefix __maybe_unused)
{
const char * const stat_usage[] = {
"perf stat [<options>] [<command>]",
NULL
};
int status = -EINVAL, run_idx;
const char *mode;
FILE *output = stderr;
unsigned int interval;
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
perf stat: add perf stat -B to pretty print large numbers It is hard to read very large numbers so provide an option to perf stat to separate thousands using a separator. The patch leverages the locale support of stdio. You need to set your LC_NUMERIC appropriately, for instance LC_NUMERIC=en_US.UTF8. You need to pass -B to activate this feature. This way existing scripts parsing the output do not need to be changed. Here is an example. $ perf stat noploop 2 noploop for 2 seconds Performance counter stats for 'noploop 2': 1998.347031 task-clock-msecs # 0.998 CPUs 61 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 118 page-faults # 0.000 M/sec 4,138,410,900 cycles # 2070.917 M/sec (scaled from 70.01%) 2,062,650,268 instructions # 0.498 IPC (scaled from 70.01%) 2,057,653,466 branches # 1029.678 M/sec (scaled from 70.01%) 40,267 branch-misses # 0.002 % (scaled from 30.04%) 2,055,961,348 cache-references # 1028.831 M/sec (scaled from 30.03%) 53,725 cache-misses # 0.027 M/sec (scaled from 30.02%) 2.001393933 seconds time elapsed $ perf stat -B noploop 2 noploop for 2 seconds Performance counter stats for 'noploop 2': 1998.297883 task-clock-msecs # 0.998 CPUs 59 context-switches # 0.000 M/sec 0 CPU-migrations # 0.000 M/sec 119 page-faults # 0.000 M/sec 4,131,380,160 cycles # 2067.450 M/sec (scaled from 70.01%) 2,059,096,507 instructions # 0.498 IPC (scaled from 70.01%) 2,054,681,303 branches # 1028.216 M/sec (scaled from 70.01%) 25,650 branch-misses # 0.001 % (scaled from 30.05%) 2,056,283,014 cache-references # 1029.017 M/sec (scaled from 30.03%) 47,097 cache-misses # 0.024 M/sec (scaled from 30.02%) 2.001391016 seconds time elapsed Cc: David S. Miller <davem@davemloft.net> Cc: Frédéric Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Tom Zanussi <tzanussi@gmail.com> LKML-Reference: <4bf28fe8.914ed80a.01ca.fffff5f5@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-05-18 21:00:01 +08:00
setlocale(LC_ALL, "");
evsel_list = perf_evlist__new();
if (evsel_list == NULL)
return -ENOMEM;
argc = parse_options(argc, argv, stat_options, stat_usage,
PARSE_OPT_STOP_AT_NON_OPTION);
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
interval = stat_config.interval;
if (output_name && strcmp(output_name, "-"))
output = NULL;
if (output_name && output_fd) {
fprintf(stderr, "cannot use both --output and --log-fd\n");
parse_options_usage(stat_usage, stat_options, "o", 1);
parse_options_usage(NULL, stat_options, "log-fd", 0);
goto out;
}
if (output_fd < 0) {
fprintf(stderr, "argument to --log-fd must be a > 0\n");
parse_options_usage(stat_usage, stat_options, "log-fd", 0);
goto out;
}
if (!output) {
struct timespec tm;
mode = append_file ? "a" : "w";
output = fopen(output_name, mode);
if (!output) {
perror("failed to create output file");
return -1;
}
clock_gettime(CLOCK_REALTIME, &tm);
fprintf(output, "# started on %s\n", ctime(&tm.tv_sec));
} else if (output_fd > 0) {
mode = append_file ? "a" : "w";
output = fdopen(output_fd, mode);
if (!output) {
perror("Failed opening logfd");
return -errno;
}
}
stat_config.output = output;
if (csv_sep) {
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
csv_output = true;
if (!strcmp(csv_sep, "\\t"))
csv_sep = "\t";
} else
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
csv_sep = DEFAULT_SEPARATOR;
/*
* let the spreadsheet do the pretty-printing
*/
if (csv_output) {
/* User explicitly passed -B? */
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
if (big_num_opt == 1) {
fprintf(stderr, "-B option not supported with -x\n");
parse_options_usage(stat_usage, stat_options, "B", 1);
parse_options_usage(NULL, stat_options, "x", 1);
goto out;
perf stat: Add csv-style output This patch adds an option (-x/--field-separator) to print counts using a CSV-style output. The user can pass a custom separator. This makes it very easy to import counts directly into your favorite spreadsheet without having to write scripts. Example: $ perf stat --field-separator=, -a -- sleep 1 4009.961740,task-clock-msecs 13,context-switches 2,CPU-migrations 189,page-faults 9596385684,cycles 3493659441,instructions 872897069,branches 41562,branch-misses 22424,cache-references 1289,cache-misses Works also in non-aggregated mode: $ perf stat -x , -a -A -- sleep 1 CPU0,1002.526168,task-clock-msecs CPU1,1002.528365,task-clock-msecs CPU2,1002.523360,task-clock-msecs CPU3,1002.519878,task-clock-msecs CPU0,1,context-switches CPU1,5,context-switches CPU2,5,context-switches CPU3,6,context-switches CPU0,0,CPU-migrations CPU1,1,CPU-migrations CPU2,0,CPU-migrations CPU3,1,CPU-migrations CPU0,2,page-faults CPU1,6,page-faults CPU2,9,page-faults CPU3,174,page-faults CPU0,2399439771,cycles CPU1,2380369063,cycles CPU2,2399142710,cycles CPU3,2373161192,cycles CPU0,872900618,instructions CPU1,873030960,instructions CPU2,872714525,instructions CPU3,874460580,instructions CPU0,221556839,branches CPU1,218134342,branches CPU2,218161730,branches CPU3,218284093,branches CPU0,18556,branch-misses CPU1,1449,branch-misses CPU2,3447,branch-misses CPU3,12714,branch-misses CPU0,8330,cache-references CPU1,313844,cache-references CPU2,47993728,cache-references CPU3,826481,cache-references CPU0,272,cache-misses CPU1,5360,cache-misses CPU2,1342193,cache-misses CPU3,13992,cache-misses This second version adds the ability to name a separator and uses field-separator as the long option to be consistent with perf report. Commiter note: Since we enabled --big-num by default in 201e0b0 and -x can't be used with it, we need to notice if the user explicitely enabled or disabled -B, add code to disable big_num if the user didn't explicitely set --big_num when -x is used. Cc: David S. Miller <davem@davemloft.net> Cc: Frederik Weisbecker <fweisbec@gmail.com> Cc: Ingo Molnar <mingo@elte.hu> Cc: paulus@samba.org Cc: Peter Zijlstra <peterz@infradead.org> Cc: Robert Richter <robert.richter@amd.com> LKML-Reference: <4cf68aa7.0fedd80a.5294.1203@mx.google.com> Signed-off-by: Stephane Eranian <eranian@google.com> Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2010-12-02 00:49:05 +08:00
} else /* Nope, so disable big number formatting */
big_num = false;
} else if (big_num_opt == 0) /* User passed --no-big-num */
big_num = false;
if (!argc && target__none(&target))
usage_with_options(stat_usage, stat_options);
if (run_count < 0) {
pr_err("Run count must be a positive number\n");
parse_options_usage(stat_usage, stat_options, "r", 1);
goto out;
} else if (run_count == 0) {
forever = true;
run_count = 1;
}
if ((stat_config.aggr_mode == AGGR_THREAD) && !target__has_task(&target)) {
perf stat: Introduce --per-thread option Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 17:29:27 +08:00
fprintf(stderr, "The --per-thread option is only available "
"when monitoring via -p -t options.\n");
parse_options_usage(NULL, stat_options, "p", 1);
parse_options_usage(NULL, stat_options, "t", 1);
perf stat: Introduce --per-thread option Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 17:29:27 +08:00
goto out;
}
/*
* no_aggr, cgroup are for system-wide only
* --per-thread is aggregated per thread, we dont mix it with cpu mode
*/
if (((stat_config.aggr_mode != AGGR_GLOBAL &&
stat_config.aggr_mode != AGGR_THREAD) || nr_cgroups) &&
!target__has_cpu(&target)) {
fprintf(stderr, "both cgroup and no-aggregation "
"modes only available in system-wide mode\n");
parse_options_usage(stat_usage, stat_options, "G", 1);
parse_options_usage(NULL, stat_options, "A", 1);
parse_options_usage(NULL, stat_options, "a", 1);
goto out;
}
perf stat: Add -d -d and -d -d -d options to show more CPU events Print even more detailed statistics if requested via perf stat -d: -d: detailed events, L1 and LLC data cache -d -d: more detailed events, dTLB and iTLB events -d -d -d: very detailed events, adding prefetch events Full output looks like this now: Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1703.674707 task-clock # 8.709 CPUs utilized ( +- 4.19% ) 49,068 context-switches # 0.029 M/sec ( +- 16.66% ) 8,303 CPU-migrations # 0.005 M/sec ( +- 24.90% ) 17,397 page-faults # 0.010 M/sec ( +- 0.46% ) 2,345,389,239 cycles # 1.377 GHz ( +- 4.61% ) [55.90%] 1,884,503,527 stalled-cycles-frontend # 80.35% frontend cycles idle ( +- 5.67% ) [50.39%] 743,919,737 stalled-cycles-backend # 31.72% backend cycles idle ( +- 8.75% ) [49.91%] 1,314,416,379 instructions # 0.56 insns per cycle # 1.43 stalled cycles per insn ( +- 2.53% ) [60.87%] 272,592,567 branches # 160.003 M/sec ( +- 1.74% ) [56.56%] 3,794,846 branch-misses # 1.39% of all branches ( +- 6.59% ) [58.50%] 449,982,778 L1-dcache-loads # 264.125 M/sec ( +- 2.47% ) [49.88%] 22,404,961 L1-dcache-load-misses # 4.98% of all L1-dcache hits ( +- 6.08% ) [55.05%] 6,204,750 LLC-loads # 3.642 M/sec ( +- 8.91% ) [43.75%] 1,837,411 LLC-load-misses # 1.078 M/sec ( +- 7.27% ) [12.07%] 411,440,421 L1-icache-loads # 241.502 M/sec ( +- 5.60% ) [36.52%] 27,556,832 L1-icache-load-misses # 16.175 M/sec ( +- 7.46% ) [46.72%] 464,067,627 dTLB-loads # 272.392 M/sec ( +- 4.46% ) [54.17%] 10,765,648 dTLB-load-misses # 6.319 M/sec ( +- 3.18% ) [48.68%] 1,273,080,386 iTLB-loads # 747.256 M/sec ( +- 3.38% ) [47.53%] 117,481 iTLB-load-misses # 0.069 M/sec ( +- 14.99% ) [47.01%] 4,590,653 L1-dcache-prefetches # 2.695 M/sec ( +- 4.49% ) [46.19%] 1,712,660 L1-dcache-prefetch-misses # 1.005 M/sec ( +- 3.75% ) [44.82%] 0.195622057 seconds time elapsed ( +- 6.84% ) Also clean up the attribute construction code to be appending, and factor it out into add_default_attributes(). Tweak the coverage percentage printout a bit, so that it's easier to view it alongside the +- sttddev colum. Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Mike Galbraith <efault@gmx.de> Cc: Steven Rostedt <rostedt@goodmis.org> Link: http://lkml.kernel.org/n/tip-to3kgu04449s64062val8b62@git.kernel.org Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-05-19 19:30:56 +08:00
if (add_default_attributes())
goto out;
target__validate(&target);
if (perf_evlist__create_maps(evsel_list, &target) < 0) {
if (target__has_task(&target)) {
pr_err("Problems finding threads of monitor\n");
parse_options_usage(stat_usage, stat_options, "p", 1);
parse_options_usage(NULL, stat_options, "t", 1);
} else if (target__has_cpu(&target)) {
perror("failed to parse CPUs map");
parse_options_usage(stat_usage, stat_options, "C", 1);
parse_options_usage(NULL, stat_options, "a", 1);
}
goto out;
}
perf stat: Introduce --per-thread option Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 17:29:27 +08:00
/*
* Initialize thread_map with comm names,
* so we could print it out on output.
*/
if (stat_config.aggr_mode == AGGR_THREAD)
perf stat: Introduce --per-thread option Currently all the -p option PID arguments tasks values get aggregated and printed as single values. Adding --per-tasks option to print values per task. $ perf stat -e cycles,instructions --per-thread -p 30190,30242 ^C Performance counter stats for process id '30190,30242': cat-30190 0 cycles yes-30242 3,842,525,421 cycles cat-30190 0 instructions yes-30242 10,370,817,010 instructions 1.143155657 seconds time elapsed Also works under interval mode: $ perf stat -e cycles,instructions --per-thread -p 30190,30242 -I 1000 # time comm-pid counts unit events 1.000073435 cat-30190 89,058 cycles 1.000073435 yes-30242 3,360,786,902 cycles (100.00%) 1.000073435 cat-30190 14,066 instructions 1.000073435 yes-30242 9,069,937,462 instructions 2.000204830 cat-30190 0 cycles 2.000204830 yes-30242 3,351,667,626 cycles 2.000204830 cat-30190 0 instructions 2.000204830 yes-30242 9,045,796,885 instructions ^C 2.771286639 cat-30190 0 cycles 2.771286639 yes-30242 2,593,884,166 cycles 2.771286639 cat-30190 0 instructions 2.771286639 yes-30242 7,001,171,191 instructions It works only with -t and -p options, otherwise following error is printed: $ perf stat -e cycles --per-thread -I 1000 ls The --per-thread option is only available when monitoring via -p -t options. -p, --pid <pid> stat events on existing process id -t, --tid <tid> stat events on existing thread id Signed-off-by: Jiri Olsa <jolsa@kernel.org> Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Andi Kleen <ak@linux.intel.com> Cc: David Ahern <dsahern@gmail.com> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Stephane Eranian <eranian@google.com> Link: http://lkml.kernel.org/r/1435310967-14570-23-git-send-email-jolsa@kernel.org Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-06-26 17:29:27 +08:00
thread_map__read_comms(evsel_list->threads);
2013-01-29 19:47:44 +08:00
if (interval && interval < 100) {
perf stat: Reduce min --interval-print to 10ms The --interval-print parameter was limited to 100ms. However, for example, 10ms is required to do sophisticated bandwidth analysis using uncore events. The test shows that the overhead of the system-wide uncore monitoring with 10ms interval is only ~2%. So this patch reduces the minimal interval-print allowd to 10ms. But 10ms may not work well for all cases. For example, when the cpus/threads number is very large, for system-wide core event monitoring the overhead could be high. To handle this issue, a warning will be displayed when the interval-print is set between 10ms to 100ms. So users can make a decision according to their specific cases. # perf stat -e uncore_imc_1/cas_count_read/ -a --interval-print 10 -- sleep 1 print interval < 100ms. The overhead percentage could be high in some cases. Please proceed with caution. # time counts unit events 0.010200451 0.10 MiB uncore_imc_1/cas_count_read/ 0.020475117 0.02 MiB uncore_imc_1/cas_count_read/ 0.030692800 0.01 MiB uncore_imc_1/cas_count_read/ 0.040948161 0.02 MiB uncore_imc_1/cas_count_read/ 0.051159564 0.00 MiB uncore_imc_1/cas_count_read/ Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1443776674-42511-1-git-send-email-kan.liang@intel.com [ Added warning about overhead when using sub 100ms intervals to the man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-02 17:04:34 +08:00
if (interval < 10) {
pr_err("print interval must be >= 10ms\n");
parse_options_usage(stat_usage, stat_options, "I", 1);
perf stat: Reduce min --interval-print to 10ms The --interval-print parameter was limited to 100ms. However, for example, 10ms is required to do sophisticated bandwidth analysis using uncore events. The test shows that the overhead of the system-wide uncore monitoring with 10ms interval is only ~2%. So this patch reduces the minimal interval-print allowd to 10ms. But 10ms may not work well for all cases. For example, when the cpus/threads number is very large, for system-wide core event monitoring the overhead could be high. To handle this issue, a warning will be displayed when the interval-print is set between 10ms to 100ms. So users can make a decision according to their specific cases. # perf stat -e uncore_imc_1/cas_count_read/ -a --interval-print 10 -- sleep 1 print interval < 100ms. The overhead percentage could be high in some cases. Please proceed with caution. # time counts unit events 0.010200451 0.10 MiB uncore_imc_1/cas_count_read/ 0.020475117 0.02 MiB uncore_imc_1/cas_count_read/ 0.030692800 0.01 MiB uncore_imc_1/cas_count_read/ 0.040948161 0.02 MiB uncore_imc_1/cas_count_read/ 0.051159564 0.00 MiB uncore_imc_1/cas_count_read/ Signed-off-by: Kan Liang <kan.liang@intel.com> Acked-by: Jiri Olsa <jolsa@kernel.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: Namhyung Kim <namhyung@kernel.org> Link: http://lkml.kernel.org/r/1443776674-42511-1-git-send-email-kan.liang@intel.com [ Added warning about overhead when using sub 100ms intervals to the man page ] Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-10-02 17:04:34 +08:00
goto out;
} else
pr_warning("print interval < 100ms. "
"The overhead percentage could be high in some cases. "
"Please proceed with caution.\n");
2013-01-29 19:47:44 +08:00
}
if (perf_evlist__alloc_stats(evsel_list, interval))
goto out;
if (perf_stat_init_aggr_mode())
goto out;
/*
* We dont want to block the signals - that would cause
* child tasks to inherit that and Ctrl-C would not work.
* What we want is for Ctrl-C to work in the exec()-ed
* task, but being ignored by perf stat itself:
*/
atexit(sig_atexit);
if (!forever)
signal(SIGINT, skip_signal);
2013-01-29 19:47:44 +08:00
signal(SIGCHLD, skip_signal);
signal(SIGALRM, skip_signal);
signal(SIGABRT, skip_signal);
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
status = 0;
for (run_idx = 0; forever || run_idx < run_count; run_idx++) {
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
if (run_count != 1 && verbose)
fprintf(output, "[ perf stat: executing run #%d ... ]\n",
run_idx + 1);
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
status = run_perf_stat(argc, argv);
if (forever && status != -1) {
print_counters(NULL, argc, argv);
perf_stat__reset_stats();
}
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
}
if (!forever && status != -1 && !interval)
print_counters(NULL, argc, argv);
perf_stat__exit_aggr_mode();
perf_evlist__free_stats(evsel_list);
out:
perf_evlist__delete(evsel_list);
perf stat: Add feature to run and measure a command multiple times Add the --repeat <n> feature to perf stat, which repeats a given command up to a 100 times, collects the stats and calculates an average and a stddev. For example, the following oneliner 'perf stat' command runs hackbench 5 times and prints a tabulated result of all metrics, with averages and noise levels (in percentage) printed: aldebaran:~/linux/linux/tools/perf> ./perf stat --repeat 5 ~/hackbench 10 Time: 0.117 Time: 0.108 Time: 0.089 Time: 0.088 Time: 0.100 Performance counter stats for '/home/mingo/hackbench 10' (5 runs): 1243.989586 task-clock-msecs # 10.460 CPUs ( +- 4.720% ) 47706 context-switches # 0.038 M/sec ( +- 19.706% ) 387 CPU-migrations # 0.000 M/sec ( +- 3.608% ) 17793 page-faults # 0.014 M/sec ( +- 0.354% ) 3770941606 cycles # 3031.329 M/sec ( +- 4.621% ) 1566372416 instructions # 0.415 IPC ( +- 2.703% ) 16783421 cache-references # 13.492 M/sec ( +- 5.202% ) 7128590 cache-misses # 5.730 M/sec ( +- 7.420% ) 0.118924455 seconds time elapsed. The goal of this feature is to allow the reliance on these accurate statistics and to know how many times a command has to be repeated for the noise to go down to an acceptable level. (The -v option can be used to see a line printed out as each run progresses.) Cc: Peter Zijlstra <a.p.zijlstra@chello.nl> Cc: Mike Galbraith <efault@gmx.de> Cc: Paul Mackerras <paulus@samba.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> LKML-Reference: <new-submission> Signed-off-by: Ingo Molnar <mingo@elte.hu>
2009-06-13 20:57:28 +08:00
return status;
}