Commit Graph

507442 Commits

Author SHA1 Message Date
Wang Nan
e59d29e88f perf probe: Fix segfault if passed with ''.
Since parse_perf_probe_point() deals with a user passed argument, we
should not assume it to be a valid string.

Without this patch, passing '' to perf probe causes a segfault:

 $ perf probe -a ''
 Segmentation fault

This patch checks the argument of parse_perf_probe_point() before any
string processing.

After this patch:

 $ perf probe -a ''

  usage: perf probe [<options>] 'PROBEDEF' ['PROBEDEF' ...]
     or: perf probe [<options>] --add 'PROBEDEF' [--add 'PROBEDEF' ...]
     ...
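
The check described above can be sketched as follows. This is a minimal
illustration, not the actual perf code: parse_probe_arg() is a
hypothetical stand-in for the real parser, and returning -EINVAL is an
assumption about how the caller would be told to print the usage text.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for a probe-definition parser: reject a NULL or
 * empty argument before any string processing touches it.
 */
static int parse_probe_arg(const char *arg)
{
	if (arg == NULL || arg[0] == '\0')
		return -EINVAL;	/* caller can fall back to the usage text */

	/* ... real parsing of the probe definition would start here ... */
	return 0;
}
```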

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/1430210769-94177-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-05-05 12:26:52 -03:00
Namhyung Kim
e944ec2ca0 perf report: Fix -T/--threads option to work again
The commit 512ae1bd6a ("perf tools: Consolidate management of default
sort orders") changed default value of the 'sort_order' variable to NULL
indicating that users don't set any sort keys on the command line.

However, that commit did not update a check in
perf_evlist__tty_browse_hists(), so 'perf report -T' cannot show the
per-thread values after the normal output.  This patch fixes it to work
again.

Note that the -T option only works with --stdio, and only when neither
the --sort nor the --parent option is given.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1430309328-28317-1-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-05-01 10:13:30 -03:00
Ingo Molnar
8cc67c3b93 perf/urgent fixes:
User visible:
 
 . Fix a segfault in 'perf top' when kernel map is restricted (Wang Nan)
 
 . Fix hung wakeup tasks after requeueing in 'perf bench futex' (Davidlohr Bueso)
 
 . Fix bug in perf probe global variables handling, missing curly braces on
   an if body (He Kuang)
 
 . 'perf bench numa' fixes (command line help/handling, etc) (Petr Holasek)
 
 Build fixes:
 
 . 'perf kmem' on RHEL6/OL6 (David Ahern)
 
 . libtraceevent on 32-bit arch (Namhyung Kim)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Merge tag 'perf-urgent-for-mingo-2' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf tooling fixes from Arnaldo Carvalho de Melo:

  . Fix a segfault in 'perf top' when kernel map is restricted (Wang Nan)

  . Fix hung wakeup tasks after requeueing in 'perf bench futex' (Davidlohr Bueso)

  . Fix bug in perf probe global variables handling, missing curly braces on
    an if body (He Kuang)

  . 'perf bench numa' fixes (command line help/handling, etc) (Petr Holasek)

  . fix the 'perf kmem' build on RHEL6/OL6 (David Ahern)

  . fix the libtraceevent build on 32-bit arch (Namhyung Kim)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-05-01 08:30:26 +02:00
Petr Holasek
1d90a685eb perf bench numa: Fix immediate meeting of convergence condition
This patch fixes a race at the beginning of a benchmark run: some
threads haven't been assigned curr_cpu yet, so they don't appear in the
nodes-of-process stats and the benchmark concludes that all remaining
threads have already converged.

The race can be reproduced with a small number of threads and a larger
amount of shared process memory, e.g. one process, two threads and 5GB
of process memory.

Signed-off-by: Petr Holasek <pholasek@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1429198699-25039-4-git-send-email-pholasek@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-27 13:57:50 -03:00
Petr Holasek
24f1ced167 perf bench numa: Fixes of --quiet argument
Correct the description and fix the behavior of the --quiet argument.

Signed-off-by: Petr Holasek <pholasek@redhat.com>
Reviewed-by: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Link: http://lkml.kernel.org/r/1429198699-25039-2-git-send-email-pholasek@redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-27 13:57:49 -03:00
Davidlohr Bueso
052b0f6eaf perf bench futex: Fix hung wakeup tasks after requeueing
The futex-requeue benchmark can hang because of missing wakeups once the
benchmark is done, ie:

[Run 1]: Requeued 1024 of 1024 threads in 0.3290 ms
perf: couldn't wakeup all tasks (135/1024)

This bug, while perhaps suggesting missing wakeups in kernel futex code,
is merely a consequence of the crappy FUTEX_CMP_REQUEUE man page,
incorrectly mentioning that the number of requeued tasks is in fact
returned, not the wakeups.

This patch acknowledges this and updates the corresponding futex_wake
code around it.

Signed-off-by: Davidlohr Bueso <dbueso@suse.de>
Cc: Mel Gorman <mgorman@suse.de>
Link: http://lkml.kernel.org/r/1429894848.10273.44.camel@stgolabs.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-27 13:57:49 -03:00
He Kuang
d13855ef18 perf probe: Fix bug with global variables handling
Missing curly braces cause find_variable() to return the wrong value
when probing with global variables.

This problem can be reproduced as follows:

  $ perf probe -v --add='generic_perform_write global_variable_for_test'
  ...
  Try to find probe point from debuginfo.
  Probe point found: generic_perform_write+0
  Searching 'global_variable_for_test' variable in context.
  An error occurred in debuginfo analysis (-2).
    Error: Failed to add events. Reason: No such file or directory (Code: -2)

After this patch:

  $ perf probe -v --add='generic_perform_write global_variable_for_test'
  ...
  Converting variable global_variable_for_test into trace event.
  global_variable_for_test type is int.
  Found 1 probe_trace_events.
  Opening /sys/kernel/debug/tracing/kprobe_events write=1
  Added new event:
  Writing event: p:probe/generic_perform_write _stext+1237464
  global_variable_for_test=@global_variable_for_test+0:s32
    probe:generic_perform_write (on generic_perform_write with
    global_variable_for_test)

  You can now use it in all perf tools, such as:

      perf record -e probe:generic_perform_write -aR sleep 1
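
The class of bug fixed here can be sketched in isolation. The code below
is not the perf source: find_buggy(), find_fixed() and report_missing()
are hypothetical names, and the two-step lookup (local scope first, then
globals) is an assumption made only to show how an unbraced if body lets
the error return run even when the global lookup succeeds.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

static int missing_reports;

static void report_missing(void)
{
	missing_reports++;
}

/* Without braces, only report_missing() is guarded by the inner if. */
static int find_buggy(bool in_scope, bool in_globals)
{
	if (!in_scope) {
		if (!in_globals)
			report_missing();
		return -ENOENT;	/* runs even when the global lookup succeeded */
	}
	return 0;
}

/* With braces, the error return belongs to the failed-lookup branch. */
static int find_fixed(bool in_scope, bool in_globals)
{
	if (!in_scope) {
		if (!in_globals) {
			report_missing();
			return -ENOENT;
		}
	}
	return 0;
}
```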

Signed-off-by: He Kuang <hekuang@huawei.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1429949338-18678-1-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-27 13:57:29 -03:00
Wang Nan
c671835021 perf top: Fix a segfault when kernel map is restricted.
perf top raises a warning if a kernel sample is collected but the kernel
map is restricted. The warning message needs to dereference al.map->dso...

However, the preceding perf_event__preprocess_sample() doesn't always
guarantee al.map != NULL, for example, when the kernel map is restricted.

This patch validates al.map before dereferencing it, avoiding the segfault.

Before this patch:

 $ cat /proc/sys/kernel/kptr_restrict
 1
 $ perf top -p  120183
 perf: Segmentation fault
 -------- backtrace --------
 /path/to/perf[0x509868]
 /lib64/libc.so.6(+0x3545f)[0x7f9a1540045f]
 /path/to/perf[0x448820]
 /path/to/perf(cmd_top+0xe3c)[0x44a5dc]
 /path/to/perf[0x4766a2]
 /path/to/perf(main+0x5f5)[0x42e545]
 /lib64/libc.so.6(__libc_start_main+0xf4)[0x7f9a153ecbd4]
 /path/to/perf[0x42e674]

And gdb call trace:

 Program received signal SIGSEGV, Segmentation fault.
 perf_event__process_sample (machine=0xa44030, sample=0x7fffffffa4c0, evsel=0xa43b00, event=0x7ffff41c3000, tool=0x7fffffffa8a0)
    at builtin-top.c:736
 736				  !RB_EMPTY_ROOT(&al.map->dso->symbols[MAP__FUNCTION]) ?
 (gdb) bt
 #0  perf_event__process_sample (machine=0xa44030, sample=0x7fffffffa4c0, evsel=0xa43b00, event=0x7ffff41c3000, tool=0x7fffffffa8a0)
     at builtin-top.c:736
 #1  perf_top__mmap_read_idx (top=top@entry=0x7fffffffa8a0, idx=idx@entry=0) at builtin-top.c:855
 #2  0x000000000044a5dd in perf_top__mmap_read (top=0x7fffffffa8a0) at builtin-top.c:872
 #3  __cmd_top (top=0x7fffffffa8a0) at builtin-top.c:997
 #4  cmd_top (argc=<optimized out>, argv=<optimized out>, prefix=<optimized out>) at builtin-top.c:1267
 #5  0x00000000004766a3 in run_builtin (p=p@entry=0x8a6ce8 <commands+264>, argc=argc@entry=3, argv=argv@entry=0x7fffffffdf70)
      at perf.c:371
 #6  0x000000000042e546 in handle_internal_command (argv=0x7fffffffdf70, argc=3) at perf.c:430
 #7  run_argv (argv=0x7fffffffdcf0, argcp=0x7fffffffdcfc) at perf.c:474
 #8  main (argc=3, argv=0x7fffffffdf70) at perf.c:589
 (gdb)
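
A guard of the kind the message describes might look like the sketch
below. The structs are pared-down, hypothetical stand-ins for perf's
map, dso and addr_location types, and skipping the warning when no map
was resolved is an assumption about the desired behavior, not a
statement of what the patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the perf types involved. */
struct dso { bool has_symbols; };
struct map { struct dso *dso; };
struct addr_location { struct map *map; };

/* True only when a map was resolved and its dso carries no symbols. */
static bool should_warn_no_symbols(const struct addr_location *al)
{
	if (al->map == NULL)	/* e.g. restricted kernel map: nothing to inspect */
		return false;
	return !al->map->dso->has_symbols;
}
```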

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Zefan Li <lizefan@huawei.com>
Link: http://lkml.kernel.org/r/1429946703-80807-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-27 13:24:32 -03:00
Namhyung Kim
410ceb8f2f tools lib traceevent: Fix build failure on 32-bit arch
In my i386 build, it failed like this:

    CC       event-parse.o
  event-parse.c: In function 'print_str_arg':
  event-parse.c:3868:5: warning: format '%lu' expects argument of type 'long unsigned int',
                        but argument 3 has type 'uint64_t' [-Wformat]

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Javi Merino <javi.merino@arm.com>
Link: http://lkml.kernel.org/r/20150424020218.GF1905@sejong
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-24 12:47:10 -03:00
David Ahern
4ad1f4300e perf kmem: Fix compiles on RHEL6/OL6
Commit 0d68bc92c4 breaks compilation on RHEL6/OL6:
    cc1: warnings being treated as errors
    builtin-kmem.c: In function ‘search_page_alloc_stat’:
    builtin-kmem.c:322: error: declaration of ‘stat’ shadows a global declaration
                            node = &parent->rb_left;
    /usr/include/sys/stat.h:455: error: shadowed declaration is here
    builtin-kmem.c: In function ‘perf_evsel__process_page_alloc_event’:
    builtin-kmem.c:378: error: declaration of ‘stat’ shadows a global declaration
    /usr/include/sys/stat.h:455: error: shadowed declaration is here
    builtin-kmem.c: In function ‘perf_evsel__process_page_free_event’:
    builtin-kmem.c:431: error: declaration of ‘stat’ shadows a global declaration
    /usr/include/sys/stat.h:455: error: shadowed declaration is here

Rename local variable to pstat to avoid the name conflict.
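
The rename can be illustrated with a small sketch. struct page_stat and
count_allocs() here are hypothetical and much simpler than the code in
builtin-kmem.c; the point is only that a local called pstat cannot
collide with the stat() function declared by <sys/stat.h>, which is what
the compilers on RHEL6/OL6 flag in the error output above.

```c
#include <assert.h>
#include <sys/stat.h>	/* declares the global function stat() */

/* Hypothetical per-page record, reduced to a single counter. */
struct page_stat {
	unsigned long nr_alloc;
};

static unsigned long count_allocs(const struct page_stat *stats, int n)
{
	unsigned long total = 0;

	for (int i = 0; i < n; i++) {
		/* Named pstat, not stat, so nothing shadows stat(). */
		const struct page_stat *pstat = &stats[i];

		total += pstat->nr_alloc;
	}
	return total;
}
```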

Signed-off-by: David Ahern <david.ahern@oracle.com>
Link: http://lkml.kernel.org/r/1429033773-31383-1-git-send-email-david.ahern@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-24 12:44:47 -03:00
Bobby Powers
de28c15daf tools lib api: Undefine _FORTIFY_SOURCE before setting it
Some toolchains (like Hardened Gentoo) define _FORTIFY_SOURCE in the
built-in, default args.  This causes perf builds to fail with:

<command-line>:0:0: error: "_FORTIFY_SOURCE" redefined [-Werror]
<built-in>: note: this is the location of the previous definition
cc1: all warnings being treated as errors

To avoid this, undefine _FORTIFY_SOURCE before (possibly re-)defining it
in tools/lib/api.
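
The undefine-then-define idea can be shown at the preprocessor level.
The source does not show the change itself; on a compiler command line
the same idea is usually spelled with -U followed by -D. The sketch
below uses a hypothetical macro, DEMO_FORTIFY_LEVEL, instead of
_FORTIFY_SOURCE so that it compiles regardless of optimization flags.

```c
#include <assert.h>

/* Stand-in for a definition the toolchain injects before the file is read. */
#define DEMO_FORTIFY_LEVEL 1

/* Drop whatever was there, then set the value this build wants. */
#undef DEMO_FORTIFY_LEVEL
#define DEMO_FORTIFY_LEVEL 2

static int fortify_level(void)
{
	return DEMO_FORTIFY_LEVEL;
}
```

Without the #undef, redefining the macro with a different value draws a
"redefined" diagnostic, which -Werror turns into a build failure.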

v2 applies cleanly on top of already pulled kbuild changes for 4.1-rc1.

Signed-off-by: Bobby Powers <bobbypowers@gmail.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Dirk Gouders <dirk@gouders.net>
Cc: Michal Marek <mmarek@suse.cz>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-kbuild@vger.kernel.org
Link: http://lkml.kernel.org/r/1429658381-3039-1-git-send-email-bobbypowers@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-23 17:08:23 -03:00
Will Deacon
6145c259cd perf kmem: Consistently use PRIu64 for printing u64 values
Building the perf tool for 32-bit ARM results in the following build
error due to a combination of an incorrect conversion specifier and
compiling with -Werror:

  builtin-kmem.c: In function ‘print_page_summary’:
  builtin-kmem.c:644:9: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘u64’ [-Werror=format=]
           nr_alloc_freed, (total_alloc_freed_bytes) / 1024);
           ^
  builtin-kmem.c:647:9: error: format ‘%lu’ expects argument of type ‘long unsigned int’, but argument 3 has type ‘u64’ [-Werror=format=]
           (total_page_alloc_bytes - total_alloc_freed_bytes) / 1024);
           ^
  cc1: all warnings being treated as errors

This patch fixes the problem by consistently using PRIu64 for printing
out u64 values.
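
As a minimal sketch of the technique, the helper below formats a 64-bit
byte count with PRIu64 from <inttypes.h>. format_kb() is a hypothetical
name, not a function from builtin-kmem.c, and it uses uint64_t where
perf uses its own u64 typedef.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

/* Write "<n> KB" into buf; returns what snprintf() returns. */
static int format_kb(char *buf, size_t len, uint64_t bytes)
{
	/* PRIu64 expands to the correct conversion on 32- and 64-bit targets. */
	return snprintf(buf, len, "%" PRIu64 " KB", bytes / 1024);
}
```

With "%lu" the same call compiles cleanly only where unsigned long is 64
bits wide, which is why the error shows up on 32-bit builds.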

Signed-off-by: Will Deacon <will.deacon@arm.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1429796437-1790-1-git-send-email-will.deacon@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-23 17:08:22 -03:00
Arnaldo Carvalho de Melo
02ac5421dd perf trace: Disable events and drain events when forked workload ends
We were not checking in the inner event processing loop whether the forked
workload had finished. On a busy system, this could make draining events take a
long time: the loop seemingly never ended, waiting for the system to become idle
enough to drain the buffers.

Fix it by disabling the events when 'done' is true, in the inner loop, to start
draining what is in the buffers.

Now:

[root@ssdandy ~]# time trace --filter-pids 14003 -a sleep 1 | tail
  996.748 ( 0.002 ms): sh/30296 rt_sigprocmask(how: SETMASK, nset: 0x7ffc83418160, sigsetsize: 8) = 0
  996.751 ( 0.002 ms): sh/30296 rt_sigprocmask(how: BLOCK, nset: 0x7ffc834181f0, oset: 0x7ffc83418270, sigsetsize: 8) = 0
  996.755 ( 0.002 ms): sh/30296 rt_sigaction(sig: INT, act: 0x7ffc83417f50, oact: 0x7ffc83417ff0, sigsetsize: 8) = 0
 1004.543 ( 0.362 ms): tail/30198  ... [continued]: read()) = 4096
 1004.548 ( 7.791 ms): sh/30296 wait4(upid: -1, stat_addr: 0x7ffc834181a0) ...
 1004.975 ( 0.427 ms): tail/30198 read(buf: 0x7633f0, count: 8192) = 4096
 1005.390 ( 0.410 ms): tail/30198 read(buf: 0x765410, count: 8192) = 4096
 1005.743 ( 0.348 ms): tail/30198 read(buf: 0x7633f0, count: 8192) = 4096
 1006.197 ( 0.449 ms): tail/30198 read(buf: 0x765410, count: 8192) = 4096
 1006.492 ( 0.290 ms): tail/30198 read(buf: 0x7633f0, count: 8192) = 4096

real	0m1.219s
user	0m0.704s
sys	0m0.331s
[root@ssdandy ~]#
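
The effect of checking 'done' inside the inner loop can be modelled with
a toy event source. Everything in this sketch is hypothetical: struct
event_source, read_one() and drain_after_done() are illustrative names,
and the "every read is matched by a new event" rule is an assumed model
of a busy system, not how the perf ring buffer works.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of an event source on a busy system. */
struct event_source {
	bool enabled;	/* while true, every read is matched by a new event */
	int pending;	/* events currently sitting in the buffer */
};

/* Consume one event if there is one; returns 1 on success, 0 if empty. */
static int read_one(struct event_source *src)
{
	if (src->pending == 0)
		return 0;
	src->pending--;
	if (src->enabled)
		src->pending++;	/* buffer refills as fast as we read */
	return 1;
}

/* Returns the number of events consumed before the buffer ran dry. */
static int drain_after_done(struct event_source *src, const bool *done)
{
	int consumed = 0;

	while (read_one(src)) {
		consumed++;
		/* Checked inside the loop: stop new events once done is set. */
		if (*done && src->enabled)
			src->enabled = false;
	}
	return consumed;
}
```

In this model the loop would never exit if the source stayed enabled;
disabling it from inside the loop is what lets the buffer empty out.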

Reported-by: Michael Petlan <mpetlan@redhat.com>
Suggested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-p6kpn1b26qcbe47pufpw0tex@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-23 17:08:08 -03:00
Arnaldo Carvalho de Melo
cb24d01d21 perf trace: Enable events when doing system wide tracing and starting a workload
commit f7aa222ff3
 Author: Arnaldo Carvalho de Melo <acme@redhat.com>
 Date:   Tue Feb 3 13:25:39 2015 -0300

    perf trace: No need to enable evsels for workload started from perf

The assumption was that whenever a workload is specified, the
attr.enable_on_exec evsel flag would be set, but that is not happening
when perf_record_opts.system_wide is set, for instance.

As a result, perf_evlist__enable() was not called and
attr.enable_on_exec was not set, so the events remained disabled while
the workload ran, producing no output.

Fix it by calling perf_evlist__enable() in the 'trace' tool when
forking and not targeting a workload started from trace.

v2: Test against !target__none(), as suggested by Namhyung Kim; that is
what perf_evsel__config() uses when deciding whether to set the
attr.enable_on_exec flag. More work is needed to cover other cases such
as opts->initial_delay.

Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: David Ahern <dsahern@gmail.com>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-27z7169pvfxgj8upic636syv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-23 17:07:59 -03:00
Sonny Rao
0140e6141e perf/x86/intel/uncore: Move PCI IDs for IMC to uncore driver
This keeps all the related PCI IDs together in the driver where
they are used.

Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Acked-by: Bjorn Helgaas <bhelgaas@google.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1429644791-25724-1-git-send-email-sonnyrao@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-22 08:29:19 +02:00
Sonny Rao
80bcffb376 perf/x86/intel/uncore: Add support for Intel Haswell ULT (lower power Mobile Processor) IMC uncore PMUs
This uncore is the same as the Haswell desktop part but uses a
different PCI ID.

Signed-off-by: Sonny Rao <sonnyrao@chromium.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1429569247-16697-1-git-send-email-sonnyrao@chromium.org
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-22 08:27:43 +02:00
Jiri Olsa
3b6e042188 perf/x86/intel: Add cpu_(prepare|starting|dying) for core_pmu
The core_pmu does not define the cpu_* callbacks, which handle
allocation of 'struct cpu_hw_events::shared_regs' data,
initialization of debug store and PMU_FL_EXCL_CNTRS counters.

While this probably won't happen on bare metal, a virtual CPU can
define x86_pmu.extra_regs together with PMU version 1 and thus
use core_pmu, which then uses shared_regs data without it being
allocated. That could lead to the following panic:

	BUG: unable to handle kernel NULL pointer dereference at (null)
	IP: [<ffffffff8152cd4f>] _spin_lock_irqsave+0x1f/0x40

	SNIP

	 [<ffffffff81024bd9>] __intel_shared_reg_get_constraints+0x69/0x1e0
	 [<ffffffff81024deb>] intel_get_event_constraints+0x9b/0x180
	 [<ffffffff8101e815>] x86_schedule_events+0x75/0x1d0
	 [<ffffffff810586dc>] ? check_preempt_curr+0x7c/0x90
	 [<ffffffff810649fe>] ? try_to_wake_up+0x24e/0x3e0
	 [<ffffffff81064ba2>] ? default_wake_function+0x12/0x20
	 [<ffffffff8109eb16>] ? autoremove_wake_function+0x16/0x40
	 [<ffffffff810577e9>] ? __wake_up_common+0x59/0x90
	 [<ffffffff811a9517>] ? __d_lookup+0xa7/0x150
	 [<ffffffff8119db5f>] ? do_lookup+0x9f/0x230
	 [<ffffffff811a993a>] ? dput+0x9a/0x150
	 [<ffffffff8119c8f5>] ? path_to_nameidata+0x25/0x60
	 [<ffffffff8119e90a>] ? __link_path_walk+0x7da/0x1000
	 [<ffffffff8101d8f9>] ? x86_pmu_add+0xb9/0x170
	 [<ffffffff8101d7a7>] x86_pmu_commit_txn+0x67/0xc0
	 [<ffffffff811b07b0>] ? mntput_no_expire+0x30/0x110
	 [<ffffffff8119c731>] ? path_put+0x31/0x40
	 [<ffffffff8107c297>] ? current_fs_time+0x27/0x30
	 [<ffffffff8117d170>] ? mem_cgroup_get_reclaim_stat_from_page+0x20/0x70
	 [<ffffffff8111b7aa>] group_sched_in+0x13a/0x170
	 [<ffffffff81014a29>] ? sched_clock+0x9/0x10
	 [<ffffffff8111bac8>] ctx_sched_in+0x2e8/0x330
	 [<ffffffff8111bb7b>] perf_event_sched_in+0x6b/0xb0
	 [<ffffffff8111bc36>] perf_event_context_sched_in+0x76/0xc0
	 [<ffffffff8111eb3b>] perf_event_comm+0x1bb/0x2e0
	 [<ffffffff81195ee9>] set_task_comm+0x69/0x80
	 [<ffffffff81195fe1>] setup_new_exec+0xe1/0x2e0
	 [<ffffffff811ea68e>] load_elf_binary+0x3ce/0x1ab0

Add cpu_(prepare|starting|dying) callbacks for core_pmu so that the
shared_regs data is allocated for core_pmu. AFAICS there's no harm
in initializing debug store and PMU_FL_EXCL_CNTRS for core_pmu
either.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20150421152623.GC13169@krava.redhat.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-22 08:24:33 +02:00
Ingo Molnar
0c99241c93 perf/x86/intel/pt: Fix and clean up error handling in pt_event_add()
Dan Carpenter reported that pt_event_add() has buggy
error handling logic: it returns 0 instead of -EBUSY when
it fails to start a newly added event.

Furthermore, the control flow in this function is messy,
with cleanup labels mixed with direct returns.

Fix the bug and clean up the code by converting it to
a straight fast path for the regular non-failing case,
plus a clear sequence of cascading goto labels to do
all cleanup.

NOTE: I materially changed the existing clean up logic in the
pt_event_start() failure case to use the direct
perf_aux_output_end() path, not pt_event_del(), because
perf_aux_output_end() is enough here.
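
The control-flow shape described above, a straight fast path followed by
cascading cleanup labels, can be sketched generically. This is not the
pt_event_add() code: struct demo_event, demo_start() and
demo_event_add() are hypothetical, and the two setup steps are an
assumption chosen to keep the example short.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical event with two pieces of state to set up. */
struct demo_event {
	bool buffer_open;
	bool started;
};

static int demo_start(struct demo_event *ev, bool hw_ok)
{
	ev->started = hw_ok;
	return hw_ok ? 0 : -1;
}

static int demo_event_add(struct demo_event *ev, bool buffer_ok, bool hw_ok)
{
	int ret = -EBUSY;

	if (!buffer_ok)
		goto fail;
	ev->buffer_open = true;

	if (demo_start(ev, hw_ok) < 0)
		goto fail_close_buffer;

	return 0;	/* fast path: nothing to undo */

fail_close_buffer:
	ev->buffer_open = false;	/* undo only what was set up */
fail:
	return ret;	/* every failure path reports -EBUSY, never 0 */
}
```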

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Acked-by: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Julia Lawall <julia.lawall@lip6.fr>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150416103830.GB7847@gmail.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-18 13:31:26 +02:00
Kan Liang
78d504bcd7 perf/x86/intel: Add Broadwell support for the LBR callstack
Like Haswell, Broadwell also supports the LBR callstack.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/1427962377-40955-1-git-send-email-kan.liang@intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-17 09:59:07 +02:00
Jacob Pan
6455239601 perf/x86/intel/rapl: Fix energy counter measurements by supporting per domain energy units
RAPL energy hardware unit can vary within a single CPU package, e.g.
HSW server DRAM has a fixed energy unit of 15.3 uJ (2^-16) whereas
the unit on other domains can be enumerated from power unit MSR.

There might be other variations in the future, so this patch adds a
per-CPU-model quirk to allow special handling of certain CPUs.

hw_unit is also removed from the per-CPU data, since it is not per CPU
and the sampling rate for the energy counter is typically not high.

Without this patch, the DRAM domain on HSW servers is counted
4x higher than the real energy counter.
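
The arithmetic behind the 4x figure can be worked through in a few
lines. The sketch assumes the MSR-enumerated unit is 2^-14 J, which is
the value a 4x discrepancy against the fixed 2^-16 J DRAM unit would
imply; the message itself does not state it. rapl_raw_to_nj() is a
hypothetical helper, not a function from the driver.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Energy in nanojoules for a raw counter whose unit is 2^-shift joules.
 * The multiplication can overflow for very large raw values; that is
 * acceptable for this illustration.
 */
static uint64_t rapl_raw_to_nj(uint64_t raw, unsigned int shift)
{
	return (raw * 1000000000ULL) >> shift;
}
```

One raw count at shift 16 is about 15.3 uJ, matching the DRAM unit
quoted above. Scaling the same raw value with shift 14 instead yields a
result four times larger, which is the overcount the patch removes.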

Signed-off-by: Jacob Pan <jacob.jun.pan@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <andi.kleen@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Vince Weaver <vincent.weaver@maine.edu>
Link: http://lkml.kernel.org/r/1427405325-780-1-git-send-email-jacob.jun.pan@linux.intel.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-17 09:58:56 +02:00
Peter Zijlstra
517e6341fa perf/x86/intel: Fix Core2,Atom,NHM,WSM cycles:pp events
Ingo reported that cycles:pp didn't work for him on some machines.

It turns out that in this commit:

  af4bdcf675 perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events

Andi forgot to explicitly allow that event when he
disabled event flags for PEBS on those uarchs.

Reported-by: Ingo Molnar <mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: af4bdcf675 ("perf/x86/intel: Disallow flags for most Core2/Atom/Nehalem/Westmere events")
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-17 09:58:47 +02:00
Peter Zijlstra
c857eb56e6 perf/x86: Fix hw_perf_event::flags collision
Somehow we ended up with overlapping flags when merging the
RDPMC control flag - this is bad, fix it.
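
A flag collision of this kind can be caught at build time. The flag
names and values below are hypothetical, not the kernel's definitions;
the sketch only shows distinct bits per flag plus a compile-time check
that two of them do not overlap.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag names; each must own a distinct bit. */
#define DEMO_FLAG_LDLAT		0x1
#define DEMO_FLAG_ST		0x2
#define DEMO_FLAG_EXCL		0x4
#define DEMO_FLAG_RDPMC		0x8

static bool flags_overlap(unsigned int a, unsigned int b)
{
	return (a & b) != 0;
}

/* Build-time guard: compilation fails if a later edit reuses a bit. */
_Static_assert((DEMO_FLAG_EXCL & DEMO_FLAG_RDPMC) == 0,
	       "flag bits must not overlap");
```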

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-17 09:50:43 +02:00
Ingo Molnar
5b24e8cf61 perf/core improvements and fixes:
New features:
 
 - Analyze page allocator events also in 'perf kmem' (Namhyung Kim)
 
 User visible fixes:
 
 - Fix retprobe 'perf probe' handling when failing to find needed debuginfo (He Kuang)
 
 - lazy_line probe fixes in 'perf probe' (Naohiro Aota, He Kuang)
 
 Infrastructure:
 
 - Record pfn instead of pointer to struct page in tracepoints (Namhyung Kim)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>

Merge tag 'perf-core-for-mingo-2' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/urgent

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

New features:

  - Analyze page allocator events in 'perf kmem' (Namhyung Kim)

User visible changes:

  - Fix retprobe 'perf probe' handling when failing to find needed debuginfo (He Kuang)

  - lazy_line probe fixes in 'perf probe' (Naohiro Aota, He Kuang)

Infrastructure changes:

  - Record pfn instead of pointer to struct page in tracepoints (Namhyung Kim)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-14 14:10:56 +02:00
He Kuang
f19e80c640 perf probe: Fix segfault when probe with lazy_line to file
The first argument passed to find_probe_point_lazy() should be the CU
die, which is passed to die_walk_lines() when lazy_line matches.
Currently, when we probe a file with a lazy_line pattern and no function
name, a NULL pointer is passed and causes a segmentation fault.

This can be reproduced as follows:

  $ perf probe -k vmlinux --add='fs/super.c;s->s_count=1;'
  [ 1958.984658] perf[1020]: segfault at 10 ip 00007fc6e10d8c71 sp
  00007ffcbfaaf900 error 4 in libdw-0.161.so[7fc6e10ce000+34000]
  Segmentation fault

After this patch:

  $ perf probe -k vmlinux --add='fs/super.c;s->s_count=1;'
  Added new event:
  probe:_stext         (on @fs/super.c)

  You can now use it in all perf tools, such as:
    perf record -e probe:_stext -aR sleep 1

Signed-off-by: He Kuang <hekuang@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1428925290-5623-3-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-13 20:12:21 -03:00
Naohiro Aota
09ed8975c4 perf probe: Find compilation directory path for lazy matching
With lazy matching, perf probe fails to open a source file if the perf
command is invoked outside of the compilation directory:

$ perf probe -a '__schedule;clear_*'
Failed to open kernel/sched/core.c: No such file or directory
  Error: Failed to add events. (-2)

OTOH, other commands like "probe -L" can resolve the source directory by
themselves. Let's make it possible for lazy matching too!

Signed-off-by: Naohiro Aota <naota@elisp.net>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: He Kuang <hekuang@huawei.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/1426223923-1493-1-git-send-email-naota@elisp.net
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-13 20:11:05 -03:00
He Kuang
9d7b45c572 perf probe: Set retprobe flag when probe in address-based alternative mode
When perf probe searched in a debuginfo file and failed, it tried with
an alternative, in function get_alternative_probe_event():

        memcpy(tmp, &pev->point, sizeof(*tmp));
        memset(&pev->point, 0, sizeof(pev->point));

In this case, it drops the retprobe flag and forgets to set it back in
find_alternative_probe_point(), so the event is registered as a regular
probe ('p:') instead of a return probe ('r:').

This can be reproduced as follows:

  $ perf probe -v -k vmlinux --add='sys_write%return'
  ...
  Added new event:
  Writing event: p:probe/sys_write _stext+1584952
    probe:sys_write      (on sys_write%return)

  $ cat /sys/kernel/debug/tracing/kprobe_events
  p:probe/sys_write _stext+1584952

After this patch:

  $ perf probe -v -k vmlinux --add='sys_write%return'
  Added new event:
  Writing event: r:probe/sys_write SyS_write+0
    probe:sys_write      (on sys_write%return)

  $ cat /sys/kernel/debug/tracing/kprobe_events
  r:probe/sys_write SyS_write
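
The save, clear and restore sequence can be sketched with a reduced
struct. struct demo_point and reset_point_keep_retprobe() are
hypothetical; the real perf_probe_point has more fields, and the source
does not show where exactly the flag is restored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical, reduced probe point. */
struct demo_point {
	const char *function;
	unsigned long offset;
	bool retprobe;
};

/* Save the point, wipe it for address-based lookup, keep the flag. */
static void reset_point_keep_retprobe(struct demo_point *point,
				      struct demo_point *saved)
{
	memcpy(saved, point, sizeof(*saved));
	memset(point, 0, sizeof(*point));	/* also clears retprobe */

	point->retprobe = saved->retprobe;	/* the step that was missing */
}
```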

Signed-off-by: He Kuang <hekuang@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1428925290-5623-1-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-13 17:25:22 -03:00
Namhyung Kim
0d68bc92c4 perf kmem: Analyze page allocator events also
The perf kmem command records and analyzes kernel memory allocation only
for SLAB objects.  This patch implements a simple page allocator analyzer
using the kmem:mm_page_alloc and kmem:mm_page_free events.

It adds two new options, --slab and --page.  The --slab option is for
analyzing the SLAB allocator, which is what perf kmem currently does.

The new --page option enables page allocator events and analyzes kernel
memory usage in units of pages.  Currently, only the 'stat --alloc'
subcommand is implemented.

If neither --slab nor --page is specified, --slab is implied.

First run 'perf kmem record' to generate a suitable perf.data file:

  # perf kmem record --page sleep 5

Then run 'perf kmem stat' to postprocess the perf.data file:

  # perf kmem stat --page --alloc --line 10

  -------------------------------------------------------------------------------
   PFN              | Total alloc (KB) | Hits     | Order | Mig.type | GFP flags
  -------------------------------------------------------------------------------
            4045014 |               16 |        1 |     2 |  RECLAIM |  00285250
            4143980 |               16 |        1 |     2 |  RECLAIM |  00285250
            3938658 |               16 |        1 |     2 |  RECLAIM |  00285250
            4045400 |               16 |        1 |     2 |  RECLAIM |  00285250
            3568708 |               16 |        1 |     2 |  RECLAIM |  00285250
            3729824 |               16 |        1 |     2 |  RECLAIM |  00285250
            3657210 |               16 |        1 |     2 |  RECLAIM |  00285250
            4120750 |               16 |        1 |     2 |  RECLAIM |  00285250
            3678850 |               16 |        1 |     2 |  RECLAIM |  00285250
            3693874 |               16 |        1 |     2 |  RECLAIM |  00285250
   ...              | ...              | ...      | ...   | ...      | ...
  -------------------------------------------------------------------------------

  SUMMARY (page allocator)
  ========================
  Total allocation requests     :           44,260   [          177,256 KB ]
  Total free requests           :              117   [              468 KB ]

  Total alloc+freed requests    :               49   [              196 KB ]
  Total alloc-only requests     :           44,211   [          177,060 KB ]
  Total free-only requests      :               68   [              272 KB ]

  Total allocation failures     :                0   [                0 KB ]

  Order     Unmovable   Reclaimable       Movable      Reserved  CMA/Isolated
  -----  ------------  ------------  ------------  ------------  ------------
      0            32             .        44,210             .             .
      1             .             .             .             .             .
      2             .            18             .             .             .
      3             .             .             .             .             .
      4             .             .             .             .             .
      5             .             .             .             .             .
      6             .             .             .             .             .
      7             .             .             .             .             .
      8             .             .             .             .             .
      9             .             .             .             .             .
     10             .             .             .             .             .

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Joonsoo Kim <js1304@gmail.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1428298576-9785-4-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-13 11:44:52 -03:00
Namhyung Kim
9fdd8a875c tracing, mm: Record pfn instead of pointer to struct page
The struct page is opaque for userspace tools, so it'd be better to save
pfn in order to identify page frames.

The textual output of $debugfs/tracing/trace file remains unchanged and
only raw (binary) data format is changed - but thanks to libtraceevent,
userspace tools which deal with the raw data (like perf and trace-cmd)
can parse the format easily.  So impact on the userspace will also be
minimal.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Based-on-patch-by: Joonsoo Kim <js1304@gmail.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: linux-mm@kvack.org
Link: http://lkml.kernel.org/r/1428298576-9785-3-git-send-email-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-13 11:44:52 -03:00
Ingo Molnar
066450be41 perf/x86/intel/pt: Clean up the control flow in pt_pmu_hw_init()
Dan Carpenter pointed out that the control flow in pt_pmu_hw_init()
is a bit messy: for example the kfree(de_attrs) is entirely
superfluous.

Another problem is the inconsistent mixing of label based and
direct return error handling.

Add modern, label based error handling instead and clean up the code
a bit as well.

Note that we'll still do a kfree(NULL) in the normal case - this does
not matter as this is an init path and kfree() returns early if it
sees a NULL.
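
The label-based pattern described above can be sketched in userspace C,
with free() standing in for kfree(). The function init_tables() and its
parameters are hypothetical and are not taken from pt_pmu_hw_init():

```c
#include <stdlib.h>

/* Hypothetical init routine: two allocations, one shared exit label. */
int init_tables(size_t n, int **out_a, int **out_b)
{
	int *a = NULL, *b = NULL;
	int ret = -1;

	a = calloc(n, sizeof(*a));
	if (!a)
		goto fail;

	b = calloc(n, sizeof(*b));
	if (!b)
		goto fail;

	*out_a = a;
	*out_b = b;
	a = b = NULL;	/* ownership handed to the caller */
	ret = 0;
fail:
	/* On success both pointers are NULL, and free(NULL) is a no-op. */
	free(a);
	free(b);
	return ret;
}
```

The success path falls through the same label, so it still calls
free(NULL), which is harmless for the same reason kfree(NULL) is.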

Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20150409090805.GG17605@mwanda
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-12 11:21:15 +02:00
Ingo Molnar
5dafd7cb96 perf/core improvements and fixes:
New user visible features:
 
 - Support multiple probes on different binaries on the same command line (Masami Hiramatsu)
 
 User visible fixes:
 
 - Fix synthesizing fork_event.ppid for non-main thread (David Ahern)
 
 - Fix cross-endian analysis (David Ahern)
 
 - Fix segfault in 'perf buildid-list' when show DSOs with hits (He Kuang)
 
 Infrastructure:
 
 - Fix type for references to data_head/tail (David Ahern)
 
 - Fix error path to do closedir() when synthesizing threads (Arnaldo Carvalho de Melo)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVKEK2AAoJEBpxZoYYoA71KLYIANWmju2EX7H0ShukvAycQVEw
 O+V9PpUsH+5VsN2KlILKg4daYMNlyp8+zDjIeVaaoFq1nVP1I+iqGFiTXue4GtWh
 yDeGWmikFcLw/YMZUxBfXX/siXxi+PdCtQXpAkogn7JrXeJUSZMJxGg41UZjJZZk
 0xKbq5gHrrJ9DfBoww8bZBtEha7als5xHo7oyxGjtngHRPQwVB+euTlLIxps0Hio
 lF/R+231hABsdiDwesD0GsY5pIXnPh2hy/hm7eZNZllmhJxUc03BkvCQX5SzQqlJ
 KfOgKVo6KlU9T5f9CTj2CAtXsvJZcxuyTkTK58R06Me8reYCfbvjJeRQbWjJYn8=
 =5sLi
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

New user visible features:

  - Support multiple probes on different binaries on the same command line (Masami Hiramatsu)

User visible changes:

  - Fix synthesizing fork_event.ppid for non-main thread (David Ahern)

  - Fix cross-endian analysis (David Ahern)

  - Fix segfault in 'perf buildid-list' when show DSOs with hits (He Kuang)

Infrastructure changes:

  - Fix type for references to data_head/tail (David Ahern)

  - Fix error path to do closedir() when synthesizing threads (Arnaldo Carvalho de Melo)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-11 08:31:19 +02:00
David Ahern
7b8283b56d perf evlist: Fix type for references to data_head/tail
The data_head and data_tail fields are defined as __u64 in
linux/perf_event.h, but perf userspace uses int and unsigned int.

Convert all references to u64 for consistency.
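
The commit cites consistency only. As a general illustration of why the
width matters, here is a small sketch with hypothetical helper names:

```c
#include <stdint.h>

/* Narrowing a 64-bit ring-buffer position keeps only the low 32 bits. */
uint32_t narrowed_position(uint64_t position)
{
	return (uint32_t)position;
}

/* With full-width operands the head/tail distance is computed exactly. */
uint64_t unread_bytes(uint64_t head, uint64_t tail)
{
	return head - tail;
}
```

Once a position grows past 4 GiB, the narrowed value no longer matches
the 64-bit value that was published.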

Signed-off-by: David Ahern <dsahern@gmail.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/1428420037-26599-1-git-send-email-dsahern@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-10 11:29:20 -03:00
Masami Hiramatsu
8cb0aa4c2d perf probe: Check the orphaned -x option
To avoid probing in an unintended binary, an orphaned -x option must be
detected and reported.

Without this patch, the following command sets up the probe in the kernel.

  -----
  # perf probe -a strcpy -x ./perf
  Added new event:
    probe:strcpy         (on strcpy)

  You can now use it in all perf tools, such as:

          perf record -e probe:strcpy -aR sleep 1
  -----

But in this case, it seems that the user may want to probe in the perf
binary. With this patch, perf-probe correctly handles the orphaned -x.

  -----
  # perf probe -a strcpy -x ./perf
    Error: -x/-m must follow the probe definitions.
  ...
  -----

Reported-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150401102541.17137.75477.stgit@localhost.localdomain
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-10 10:21:30 -03:00
Masami Hiramatsu
7afb3fab39 perf probe: Support multiple probes on different binaries
Support multiple probes on different binaries with just
one command.

As a result, this example sets up the probes on icmp_rcv in the
kernel, on main and set_target in perf, and on pcspkr_event
in the pcspkr.ko driver.
  -----
  # perf probe -a icmp_rcv -x ./perf -a main -a set_target \
   -m /lib/modules/4.0.0-rc5+/kernel/drivers/input/misc/pcspkr.ko \
   -a pcspkr_event
  Added new event:
    probe:icmp_rcv       (on icmp_rcv)

  You can now use it in all perf tools, such as:

          perf record -e probe:icmp_rcv -aR sleep 1

  Added new event:
    probe_perf:main      (on main in /home/mhiramat/ksrc/linux-3/tools/perf/perf)

  You can now use it in all perf tools, such as:

          perf record -e probe_perf:main -aR sleep 1

  Added new event:
    probe_perf:set_target (on set_target in /home/mhiramat/ksrc/linux-3/tools/perf/perf)

  You can now use it in all perf tools, such as:

          perf record -e probe_perf:set_target -aR sleep 1

  Added new event:
    probe:pcspkr_event   (on pcspkr_event in pcspkr)

  You can now use it in all perf tools, such as:

          perf record -e probe:pcspkr_event -aR sleep 1
  -----

Reported-by: Arnaldo Carvalho de Melo <acme@infradead.org>
Signed-off-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20150401102539.17137.46454.stgit@localhost.localdomain
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-10 10:19:53 -03:00
He Kuang
5e78c69b72 perf buildid-list: Fix segfault when show DSOs with hits
commit: f3b623b849 ("perf tools: Reference count struct thread")
appends every thread->node to dead_threads in machine__remove_thread()
and list_del_init() this node in thread__put().

perf_event__exit_del_thread() releases the thread without using
machine__remove_thread(), and causes a NULL pointer crash when
list_del_init(&thread->node) is called. Fix this by using
machine__remove_thread() instead of calling thread__put() directly.

This problem can be reproduced as follows:

  $ perf record ls
  $ perf buildid-list --with-hits
  [ 3874.195070] perf[1018]: segfault at 0 ip 00000000004b0b15 sp
  00007ffc35b44780 error 6 in perf[400000+166000]
  Segmentation fault

After this patch:
  $ perf record ls
  $ perf buildid-list --with-hits
  bc23e7c3281e542650ba4324421d6acf78f4c23e /proc/kcore
  643324cb0e969f30c56d660f167f84a150845511 [vdso]
  0000000000000000000000000000000000000000 /bin/busybox
  ...

Signed-off-by: He Kuang <hekuang@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1428658500-6483-1-git-send-email-hekuang@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-10 10:13:59 -03:00
David Ahern
1060ab857f perf tools: Fix cross-endian analysis
Trying to analyze a big endian data file on little endian system fails
with the error:

  0xa9b40 [0x70]: failed to process type: 9

The problem is that header parsing is not done correctly because the
file attributes are not swapped. Swap them. With this patch, perf is
able to analyze a sparc64 data file on x86_64.
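
A minimal sketch of what swapping a header field involves. The struct
file_section below and both function names are hypothetical and much
simpler than the real perf.data header structures:

```c
#include <stdint.h>

/* Reverse the byte order of a 64-bit value. */
uint64_t swap64(uint64_t v)
{
	v = ((v & 0x00ff00ff00ff00ffULL) << 8) |
	    ((v >> 8) & 0x00ff00ff00ff00ffULL);
	v = ((v & 0x0000ffff0000ffffULL) << 16) |
	    ((v >> 16) & 0x0000ffff0000ffffULL);
	return (v << 32) | (v >> 32);
}

/* Hypothetical header fragment: where a section lives in the file. */
struct file_section {
	uint64_t offset;
	uint64_t size;
};

/* Every multi-byte field must be swapped before it is interpreted. */
void swap_file_section(struct file_section *sec)
{
	sec->offset = swap64(sec->offset);
	sec->size = swap64(sec->size);
}
```

If a field is left unswapped, the reader uses a wrong offset or size
and misinterprets whatever it finds there.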

Signed-off-by: David Ahern <david.ahern@oracle.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1428610546-178789-1-git-send-email-david.ahern@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-10 10:13:58 -03:00
Arnaldo Carvalho de Melo
d998b73259 perf tools: Fix error path to do closedir() when synthesizing threads
When traversing /proc to synthesize the PERF_RECORD_FORK et al events we
were bailing out on errors without calling closedir(), fix it.
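
A sketch of the general shape of such a fix: every exit taken after a
successful opendir() goes through one closedir(). The function name, the
max_entries parameter, and the "too many entries" error are assumptions
made for illustration and are not perf's code:

```c
#include <dirent.h>
#include <stdlib.h>

/*
 * Count the entries whose names parse fully as integers (PID-like
 * names). Returns the count, or -1 on error.
 */
int count_numeric_entries(const char *path, int max_entries)
{
	DIR *dir = opendir(path);
	struct dirent *ent;
	int count = 0;
	int ret = -1;

	if (dir == NULL)
		return -1;	/* nothing was opened, nothing to close */

	while ((ent = readdir(dir)) != NULL) {
		char *end;

		(void)strtol(ent->d_name, &end, 10);
		if (end == ent->d_name || *end != '\0')
			continue;	/* not a purely numeric name */

		if (count == max_entries)
			goto out;	/* error path: still closes dir */
		count++;
	}
	ret = count;
out:
	closedir(dir);
	return ret;
}
```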

Reported-by: David Ahern <dsahern@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Borislav Petkov <bp@suse.de>
Cc: Don Zickus <dzickus@redhat.com>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/n/tip-vxtp593rfztgbi8noy0m967p@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-10 10:13:58 -03:00
David Ahern
7764a385f6 perf tools: Fix synthesizing fork_event.ppid for non-main thread
Commit ca6c41c59b sets the ppid based on what is read from the
/proc/pid/status file when synthesizing fork events.

This is the correct thing to do for new processes but not for threads
of a process.

Fix ppid for threads to be the main thread when synthesizing fork events
(i.e., assume the main thread spawned all sub-threads in a process).
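
The rule above can be sketched as a small helper. The function name and
parameter names are hypothetical:

```c
#include <sys/types.h>

/*
 * tgid:        thread group id, i.e. the pid of the main thread
 * tid:         id of the thread the fork event is synthesized for
 * status_ppid: parent pid as read from the /proc status file
 */
pid_t synthesized_fork_ppid(pid_t tgid, pid_t tid, pid_t status_ppid)
{
	if (tid != tgid)
		return tgid;	/* sub-thread: parent is the main thread */

	return status_ppid;	/* main thread: parent is the real parent */
}
```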

Reported-by: Arnaldo Carvalho de Melo <arnaldo.melo@gmail.com>
Signed-off-by: David Ahern <david.ahern@oracle.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Don Zickus <dzickus@redhat.com>
Link: http://lkml.kernel.org/r/1428598107-178999-1-git-send-email-david.ahern@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-10 10:10:55 -03:00
Ingo Molnar
51ab7155c0 perf/core improvements and fixes:
- Teach about perf_event_attr.clockid to 'perf record' (Peter Zijlstra)
 
 - perf sched replay improvements for high CPU core count machines (Yunlong Song)
 
 - Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
   cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
   events (Arnaldo Carvalho de Melo)
 
 - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)
 
 - Respect -i option in 'perf kmem' (Jiri Olsa)
 
 Infrastructure:
 
 - Honor operator priority in libtraceevent (Namhyung Kim)
 
 - Merge all perf_event_attr print functions (Peter Zijlstra)
 
 - Check kmaps access to make code more robust (Wang Nan)
 
 - Fix inverted logic in perf_mmap__empty() (He Kuang)
 
 - Fix ARM 32 'perf probe' building error (Wang Nan)
 
 - Fix perf_event_attr tests (Jiri Olsa)
 
 Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQEcBAABAgAGBQJVJTXFAAoJEBpxZoYYoA71GIoIAM86QGhbod5EkFYPQ775LziL
 3ijzNaCsVFIFIsbVe6i7rE3saYR6py3946Wen7pWERdSfy7rgBduGG7if8i9xErs
 aPAXV6u/FKWM+BoRKgtLeSsj9RgWrtyoCrXAAxu2QStd4ML8DZUuXIviA5PFu6QV
 EQyLOSZw4EfkiDhk0G/6mE5FZCjWf7HUoTEdLq2SW9yv+O90nM9aQmIvo20fRL0X
 HEt8Ei4CWqX43viDFe5yml+2HRC2QbxenEWEAUW2Pop5CeGHYgG2wZjsUVuSreHH
 +YBSaKlhAGdt9mJVklB0Y6KzERE2fivozWthaly+oO5qGeoir6MpIbXT6eX8drY=
 =4fD6
 -----END PGP SIGNATURE-----

Merge tag 'perf-core-for-mingo' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux into perf/core

Pull perf/core improvements and fixes from Arnaldo Carvalho de Melo:

User visible changes:

  - Teach 'perf record' about perf_event_attr.clockid (Peter Zijlstra)

  - Improve 'perf sched replay' on high CPU core count machines (Yunlong Song)

  - Consider PERF_RECORD_ events with cpumode == 0 in 'perf top', removing one
    cause of long term memory usage buildup, i.e. not processing PERF_RECORD_EXIT
    events (Arnaldo Carvalho de Melo)

  - Add 'I' event modifier for perf_event_attr.exclude_idle bit (Jiri Olsa)

  - Respect -i option in 'perf kmem' (Jiri Olsa)

Infrastructure changes:

  - Honor operator priority in libtraceevent (Namhyung Kim)

  - Merge all perf_event_attr print functions (Peter Zijlstra)

  - Check kmaps access to make code more robust (Wang Nan)

  - Fix inverted logic in perf_mmap__empty() (He Kuang)

  - Fix ARM 32 'perf probe' building error (Wang Nan)

  - Fix perf_event_attr tests (Jiri Olsa)

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-04-08 17:03:47 +02:00
Jiri Olsa
a1e12da479 perf tools: Add 'I' event modifier for exclude_idle bit
Add the 'I' event modifier to complete the set of modifiers for the
perf_event_attr:exclude_* bits.

Any event specified with 'I' modifier will have the
perf_event_attr:exclude_idle bit set.

  $ perf record -e cycles:I -vv ls 2>&1 | grep exclude_idle
  exclude_hv          0    exclude_idle        1

Adding automated tests.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: William Cohen <wcohen@redhat.com>
Link: http://lkml.kernel.org/r/1428441919-23099-2-git-send-email-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08 11:00:16 -03:00
Wang Nan
f6fcc1433a perf report: Don't call map__kmap if map is NULL.
report__warn_kptr_restrict() calls map__kmap(kernel_map) before checking
kernel_map against NULL.

This is dangerous, since map__kmap() will return an invalid, non-NULL
address.

It will trigger a warning message in map__kmap() after the patch "perf:
kmaps: enforce usage of kmaps to protect futher bugs." was applied.

This patch fixes it by adding the missing checking.

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1428490772-135393-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08 11:00:00 -03:00
Jiri Olsa
54a50f93eb perf tests: Fix attr tests
Following commit:
  1a59413124 perf: Add wakeup watermark control to the AUX area

enlarged perf_event_attr, but did not update the attr tests.

Reported-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <bp@alien8.de>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Kaixu Xia <kaixu.xia@linaro.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Robert Richter <rric@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Markus T Metzger <markus.t.metzger@intel.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Link: http://lkml.kernel.org/n/20150407171715.GA22603@krava.redhat.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08 10:49:53 -03:00
Wang Nan
f6c15621f0 perf probe: Fix ARM 32 building error
Commit 9b118acae3 ("perf probe: Fix to handle aliased symbols in glibc")
uses a fixed '%lx' format to print a u64 argument, which causes a
compile error on 32-bit ARM.

This patch replaces it with PRIx64.
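
A short sketch of the portable form. The helper name format_addr() is
hypothetical. PRIx64 from <inttypes.h> expands to the correct conversion
specifier for a 64-bit value whether 'long' is 32 or 64 bits wide:

```c
#include <inttypes.h>
#include <stdio.h>

/* Format a 64-bit address as lowercase hex without a "0x" prefix. */
int format_addr(char *buf, size_t len, uint64_t addr)
{
	return snprintf(buf, len, "%" PRIx64, addr);
}
```

With '%lx', the same call would only match the argument type on
targets where 'long' is 64 bits wide.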

Signed-off-by: Wang Nan <wangnan0@huawei.com>
Acked-by: Masami Hiramatsu <masami.hiramatsu.pt@hitachi.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Zefan Li <lizefan@huawei.com>
Cc: pi3orama@163.com
Link: http://lkml.kernel.org/r/1428459274-138470-1-git-send-email-wangnan0@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08 10:49:48 -03:00
Peter Zijlstra
2c5e8c52c6 perf tools: Merge all perf_event_attr print functions
Currently there's 3 (that I found) different and incomplete
implementations of printing perf_event_attr.

This is quite silly. Merge the lot.

While this patch does not retain the exact form, all printing that I
found is debug output and thus it should not be critical.

Also, I cannot find a single print_event_desc() caller.

Pre:

 $ perf record -vv -e cycles -- sleep 1
 ------------------------------------------------------------
 perf_event_attr:
  type                0
  size                104
  config              0
  sample_period       4000
  sample_freq         4000
  sample_type         0x107
  read_format         0
  disabled            1    inherit             1
  pinned              0    exclusive           0
  exclude_user        0    exclude_kernel      0
  exclude_hv          0    exclude_idle        0
  mmap                1    comm                1
  mmap2               1    comm_exec           1
  freq                1    inherit_stat        0
  enable_on_exec      1    task                1
  watermark           0    precise_ip          0
  mmap_data           0    sample_id_all       1
  exclude_host        0    exclude_guest       1
  excl.callchain_kern 0    excl.callchain_user 0
  wakeup_events       0
  wakeup_watermark    0
  bp_type             0
  bp_addr             0
  config1             0
  bp_len              0
  config2             0
  branch_sample_type  0
  sample_regs_user    0
  sample_stack_user   0
  sample_regs_intr    0
 ------------------------------------------------------------

 $ perf evlist  -vv
 cycles: sample_freq=4000, size: 104, sample_type: IP|TID|TIME|PERIOD,
 disabled: 1, inherit: 1, mmap: 1, mmap2: 1, comm: 1, comm_exec: 1,
 freq: 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1

 Post:

 $ ./perf record -vv -e cycles -- sleep 1
 ------------------------------------------------------------
 perf_event_attr:
  size                             112
  { sample_period, sample_freq }   4000
  sample_type                      IP|TID|TIME|PERIOD
  disabled                         1
  inherit                          1
  mmap                             1
  comm                             1
  freq                             1
  enable_on_exec                   1
  task                             1
  sample_id_all                    1
  exclude_guest                    1
  mmap2                            1
  comm_exec                        1
------------------------------------------------------------

 $ ./perf evlist  -vv
 cycles: size: 112, { sample_period, sample_freq }: 4000, sample_type:
 IP|TID|TIME|PERIOD, disabled: 1, inherit: 1, mmap: 1, comm: 1, freq:
 1, enable_on_exec: 1, task: 1, sample_id_all: 1, exclude_guest: 1,
 mmap2: 1, comm_exec: 1

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ingo Molnar <mingo@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/20150407091150.644238729@infradead.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08 10:06:28 -03:00
Peter Zijlstra
814c8c38e1 perf record: Add clockid parameter
Teach perf-record about the new perf_event_attr::{use_clockid, clockid}
fields. Add a simple parameter to set the clock (if any) to be used for
the events to be recorded into the data file.

Since we store the entire perf_event_attr in the EVENT_DESC section we
also already store the used clockid in the data file.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: David Ahern <dsahern@gmail.com>
Cc: "H. Peter Anvin" <hpa@zytor.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Yunlong Song <yunlong.song@huawei.com>
Link: http://lkml.kernel.org/r/20150407154851.GR23123@twins.programming.kicks-ass.net
[ Conditionally define CLOCK_BOOTTIME, at least rhel6 doesn't have it - dsahern
  Ditto for CLOCK_MONOTONIC_RAW, sles11sp2 doesn't have it - yunlong.song ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08 10:04:55 -03:00
Yunlong Song
ff5f3bbd40 perf sched replay: Use replay_repeat to calculate the runavg of cpu usage instead of the default value 10
Since sched->replay_repeat is set to 10 as default, the sched->run_avg,
sched->runavg_cpu_usage, and sched->runavg_parent_cpu_usage all use
10 to calculate their value.

However, replay_repeat can be changed to another value with the -r
option, so the calculation above should use replay_repeat instead of
the default value 10 to achieve more accurate results.
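
A sketch of a running average weighted by the repeat count. The weighting
is an assumption inferred from 'perf sched replay' sample output in this
log, where runs of 50.198 and 219.099 with the default repeat of 10 give
ravg values of 50.20 and 67.09. The function name is hypothetical:

```c
/*
 * Fold one new sample into a running average spread over 'repeat'
 * runs. The first sample seeds the average.
 */
double update_run_avg(double avg, double sample, unsigned int repeat)
{
	if (avg == 0.0 || repeat <= 1)
		return sample;

	return (avg * (repeat - 1) + sample) / repeat;
}
```

With repeat fixed at 10, each new sample gets a weight of one tenth
regardless of what was passed to -r.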

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-10-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08 09:07:27 -03:00
Yunlong Song
f0dd330fdf perf sched replay: Support using -f to override perf.data file ownership
Allow using perf.data when it is not owned by the current user or root.

Example:

 $ ls -al perf.data
 -rw------- 1 Yunlong.Song Yunlong.Song 5321918 Mar 25 15:14 perf.data
 $ sudo id
 uid=0(root) gid=0(root) groups=0(root),64(pkcs11)

Before this patch:

 $ sudo perf sched replay -f
 run measurement overhead: 98 nsecs
 sleep measurement overhead: 52909 nsecs
 the run test took 1000015 nsecs
 the sleep test took 1054253 nsecs
 File perf.data not owned by current user or root (use -f to override)

As shown above, the -f option does not work at all.

After this patch:

 $ sudo perf sched replay -f
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 40514 nsecs
 the run test took 1000003 nsecs
 the sleep test took 1056098 nsecs
 nr_run_events:        10
 nr_sleep_events:      1562
 nr_wakeup_events:     5
 task      0 (                  :1:         1), nr_events: 1
 task      1 (                  :2:         2), nr_events: 1
 task      2 (                  :3:         3), nr_events: 1
 ...
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 ------------------------------------------------------------
 #1  : 50.198, ravg: 50.20, cpu: 2335.18 / 2335.18
 #2  : 219.099, ravg: 67.09, cpu: 2835.11 / 2385.17
 #3  : 238.626, ravg: 84.24, cpu: 3278.26 / 2474.48
 #4  : 200.364, ravg: 95.85, cpu: 2977.41 / 2524.77
 #5  : 176.882, ravg: 103.96, cpu: 2801.35 / 2552.43
 #6  : 191.093, ravg: 112.67, cpu: 2813.70 / 2578.56
 #7  : 189.448, ravg: 120.35, cpu: 2809.21 / 2601.62
 #8  : 200.637, ravg: 128.38, cpu: 2849.91 / 2626.45
 #9  : 248.338, ravg: 140.37, cpu: 4380.61 / 2801.87
 #10 : 511.139, ravg: 177.45, cpu: 3077.73 / 2829.45

As shown above, the -f option really works now.

Besides replay, the -f option also works for latency and map.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-9-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08 09:07:26 -03:00
Yunlong Song
939cda521a perf sched replay: Fix the EMFILE error caused by the limitation of the maximum open files
The soft maximum number of open files for a calling process is 1024,
which is defined as INR_OPEN_CUR in include/uapi/linux/fs.h, and the
hard maximum number of open files for a calling process is 4096, which
is defined as INR_OPEN_MAX in include/uapi/linux/fs.h.

Both INR_OPEN_CUR and INR_OPEN_MAX are used to limit the value of
RLIMIT_NOFILE in include/asm-generic/resource.h.

And the soft maximum number finally decides the limitation of the
maximum files which are allowed to be opened.

That is to say, a process can use at most 1024 file descriptors for its
opened files, or an EMFILE error will happen.

This error can be fixed by increasing the soft maximum number, under the
constraint that the soft maximum number can not exceed the hard maximum
number, or both soft and hard maximum number should be increased
simultaneously with privilege.

For perf sched replay, it uses sys_perf_event_open to create the file
descriptor for each of the tasks in order to handle information of perf
events.

That is to say, each task needs a unique file descriptor. On x86_64,
there may be over 1024 or 4096 tasks corresponding to the record in
perf.data, so there are not enough file descriptors available.

As a result, EMFILE error happens and stops the replay process. To solve
this problem, we adaptively increase the soft and hard maximum number of
open files with a '-f' option.
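
A minimal sketch of the unprivileged half of this idea, using the
standard getrlimit()/setrlimit() calls to raise the soft RLIMIT_NOFILE to
the current hard limit. It is not the patch's code: the function name is
hypothetical, and raising the hard limit as well requires privilege and
is not shown:

```c
#include <sys/resource.h>

/*
 * Raise the soft limit on open files up to the hard limit.
 * Returns 0 on success, -1 on failure.
 */
int raise_nofile_soft_limit(void)
{
	struct rlimit lim;

	if (getrlimit(RLIMIT_NOFILE, &lim) != 0)
		return -1;

	if (lim.rlim_cur == lim.rlim_max)
		return 0;	/* already at the ceiling */

	lim.rlim_cur = lim.rlim_max;
	return setrlimit(RLIMIT_NOFILE, &lim) == 0 ? 0 : -1;
}
```

In the test environment above this would lift the soft limit from 1024
to 4096, which is still not enough when more than 4096 tasks each need
a file descriptor.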

Example:

Test environment: x86_64 with 160 cores

 $ cat /proc/sys/kernel/pid_max
 163840
 $ cat /proc/sys/fs/file-max
 6815744
 $ ulimit -Sn
 1024
 $ ulimit -Hn
 4096

Before this patch:

 $ perf sched replay
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)

After this patch:

 $ perf sched replay
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)
 Have a try with -f option

 $ perf sched replay -f
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 ------------------------------------------------------------
 #1  : 54.401, ravg: 54.40, cpu: 3285.21 / 3285.21
 #2  : 199.548, ravg: 68.92, cpu: 4999.65 / 3456.66
 #3  : 170.483, ravg: 79.07, cpu: 1349.94 / 3245.99
 #4  : 192.034, ravg: 90.37, cpu: 1322.88 / 3053.67
 #5  : 182.929, ravg: 99.62, cpu: 1406.51 / 2888.96
 #6  : 152.974, ravg: 104.96, cpu: 1167.54 / 2716.82
 #7  : 155.579, ravg: 110.02, cpu: 2992.53 / 2744.39
 #8  : 130.557, ravg: 112.08, cpu: 1126.43 / 2582.59
 #9  : 138.520, ravg: 114.72, cpu: 1253.22 / 2449.65
 #10 : 134.328, ravg: 116.68, cpu: 1587.95 / 2363.48

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-8-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08 09:07:26 -03:00
Yunlong Song
1aff59be53 perf sched replay: Handle the dead halt of sem_wait when create_tasks() fails for any task
wait_for_tasks() calls sem_wait() for each task, e.g.
sem_wait(&task->work_done_sem).

sem_wait() can continue only when work_done_sem is greater than 0;
otherwise it blocks.

In perf sched replay, one task may sem_post the work_done_sem of
another task, so that the work_done_sem of that task is processed in a
reasonable sequence, e.g. sem_post, sem_wait, sem_wait, sem_post...

This sequence simulates the sched process of the running tasks at the
time when perf sched record runs.

As a result, all the tasks are required and their threads must be
successfully created.

If any one of the tasks (task A) fails to create its thread, then
another task (task B), whose work_done_sem needs a sem_post from the
failed task A, will likely block in sem_wait.

This is a dead halt, since task B's thread_func cannot continue at
all.

To solve this problem, perf sched replay should exit once any task fails
to create its thread.
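
The dependency can be sketched as follows. This is a simplified illustration
under stated assumptions, not the builtin-sched.c code: struct task, its
wakee field and replay_tasks() are hypothetical names, and each task simply
posts the semaphore of the next task. The point is that sem_wait() is called
only when every thread was created.

```c
#include <assert.h>
#include <pthread.h>
#include <semaphore.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for a replayed task. */
struct task {
	pthread_t thread;
	sem_t work_done_sem;
	struct task *wakee;	/* task whose semaphore this one posts */
};

static void *thread_func(void *arg)
{
	struct task *task = arg;

	/* Signal another task, as a replayed wakeup would. */
	sem_post(&task->wakee->work_done_sem);
	return NULL;
}

/* Returns 0 if every task ran, -1 if any thread could not be created. */
int replay_tasks(size_t nr_tasks)
{
	struct task *tasks;
	size_t i, started = 0;
	int err = 0;

	assert(nr_tasks > 0);
	tasks = calloc(nr_tasks, sizeof(*tasks));
	if (!tasks)
		return -1;

	for (i = 0; i < nr_tasks; i++) {
		sem_init(&tasks[i].work_done_sem, 0, 0);
		tasks[i].wakee = &tasks[(i + 1) % nr_tasks];
	}

	for (i = 0; i < nr_tasks; i++) {
		err = pthread_create(&tasks[i].thread, NULL, thread_func, &tasks[i]);
		if (err) {
			fprintf(stderr, "Error: pthread_create() failed: %s\n",
				strerror(err));
			break;
		}
		started++;
	}

	/*
	 * Wait on the semaphores only when every thread exists. With a
	 * missing thread, the semaphore it should have posted stays at 0
	 * and sem_wait() on it would never return.
	 */
	if (!err)
		for (i = 0; i < nr_tasks; i++)
			sem_wait(&tasks[i].work_done_sem);

	/* The started threads only post and return, so joining is safe. */
	for (i = 0; i < started; i++)
		pthread_join(tasks[i].thread, NULL);
	for (i = 0; i < nr_tasks; i++)
		sem_destroy(&tasks[i].work_done_sem);
	free(tasks);

	return err ? -1 : 0;
}
```

A caller would treat a -1 return as fatal and exit instead of waiting for the
remaining tasks.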

Example:

Test environment: x86_64 with 160 cores

Before this patch:

 $ perf sched replay
 ...
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)
 ------------------------------------------------------------    <- dead halt

After this patch:

 $ perf sched replay
 ...
 task   1551 (           <unknown>:         0), nr_events: 10
 Error: sys_perf_event_open() syscall returned with -1 (Too many open
 files)
 $

As shown above, perf sched replay finishes the process after printing an
error message and does not block itself.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-7-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08 09:07:25 -03:00
Yunlong Song
08097abc11 perf sched replay: Fix the segmentation fault problem caused by pr_err in threads
The pr_err in self_open_counters() prints an error message to stderr.
Unlike stdout, stderr uses a memory buffer on the stack of each calling
process.

The pr_err in self_open_counters() works in a thread called thread_func
created in function create_tasks, which concurrently creates
sched->nr_tasks threads.

If the error occurs and pr_err prints the error message in each of
these threads, the stack of the perf process (8192 kbytes by default)
quickly runs out and a segmentation fault follows.

To solve this problem, pr_err with self_open_counters() should be moved
from the newly created threads to the main thread of the perf process.
pr_err then runs in a stable context and the segmentation fault no
longer occurs.

Example:

Test environment: x86_64 with 160 cores

Before this patch:

 $ perf sched replay
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 Segmentation fault

After this patch:

 $ perf sched replay
 ...
 task   1549 (             :163132:    163132), nr_events: 1
 task   1550 (             :163540:    163540), nr_events: 1
 task   1551 (           <unknown>:         0), nr_events: 10
 ...

As shown above, perf sched replay continues without a segmentation fault.

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-6-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08 09:07:24 -03:00
Yunlong Song
3a423a5c36 perf sched replay: Realloc the memory of pid_to_task stepwise to adapt to the different pid_max configurations
Although the memory of pid_to_task can be allocated via calloc according
to the value of /proc/sys/kernel/pid_max, it cannot handle the case when
pid_max is changed after 'perf sched record' has created its perf.data.

If the pid_max in effect for 'perf sched replay' is smaller than the
pid_max that was in effect for 'perf sched record', the assertion in
register_pid fails.

To solve this problem, we realloc the memory of pid_to_task stepwise
once the pid passed to register_pid reaches or exceeds the
current pid_max.
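
A minimal sketch of such stepwise growth, assuming the table is doubled until
the pid fits. The doubling factor, struct pid_table and pid_table_reserve()
are assumptions made for illustration and are not taken from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stdlib.h>
#include <string.h>

struct task_desc;	/* opaque here; only pointers are stored */

/* Hypothetical container for the pid -> task lookup table. */
struct pid_table {
	struct task_desc **pid_to_task;
	unsigned long pid_max;	/* number of slots currently allocated */
};

/*
 * Grow the table by doubling until 'pid' is a valid index.
 * Returns 0 on success, -1 on failure (the old table stays valid).
 */
int pid_table_reserve(struct pid_table *table, unsigned long pid)
{
	unsigned long new_max;
	struct task_desc **grown;

	assert(table != NULL);

	if (pid < table->pid_max)
		return 0;

	new_max = table->pid_max ? table->pid_max : 1;
	while (pid >= new_max) {
		if (new_max > ULONG_MAX / 2)
			return -1;	/* doubling would overflow */
		new_max *= 2;
	}

	grown = realloc(table->pid_to_task, new_max * sizeof(*grown));
	if (!grown)
		return -1;

	/* realloc() leaves the new tail uninitialized; clear it like calloc(). */
	memset(grown + table->pid_max, 0,
	       (new_max - table->pid_max) * sizeof(*grown));

	table->pid_to_task = grown;
	table->pid_max = new_max;
	return 0;
}
```

Calling pid_table_reserve() before indexing pid_to_task[pid] replaces the
assertion with on-demand growth, so a perf.data recorded under a larger
pid_max can still be replayed.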

Example:

Test environment: x86_64 with 160 cores

 $ cat /proc/sys/kernel/pid_max
 163840
 $ perf sched record ls
 $ echo 5000 > /proc/sys/kernel/pid_max
 $ cat /proc/sys/kernel/pid_max
 5000

Before this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55356 nsecs
 the run test took 1000011 nsecs
 the sleep test took 1060940 nsecs
 perf: builtin-sched.c:337: register_pid: Assertion `!(pid >= (unsigned
 long)pid_max)' failed.
 Aborted

After this patch:

 $ perf sched replay
 run measurement overhead: 221 nsecs
 sleep measurement overhead: 55611 nsecs
 the run test took 1000026 nsecs
 the sleep test took 1060486 nsecs
 nr_run_events:        10
 nr_sleep_events:      1562
 nr_wakeup_events:     5
 task      0 (                  :1:         1), nr_events: 1
 task      1 (                  :2:         2), nr_events: 1
 task      2 (                  :3:         3), nr_events: 1
 task      3 (                  :5:         5), nr_events: 1
 ...

Signed-off-by: Yunlong Song <yunlong.song@huawei.com>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/1427809596-29559-5-git-send-email-yunlong.song@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2015-04-08 09:07:23 -03:00