2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-27 06:34:11 +08:00
Commit Graph

7983 Commits

Author SHA1 Message Date
Jiri Olsa
a82bfd041d perf ui progress: Fix progress update
We currently update the 'next' variable only with a single step value.
But it's possible the 'adv' update is bigger than single 'step' value.
This would leave 'next' value under counted and force unnecessary
ui_progress__ops->update calls.

Calculate the amount of steps we need for 'adv' update and increase the
'next' with that amounts of steps.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908120510.22515-3-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-12 12:34:54 -03:00
Jiri Olsa
4d286c89e4 perf ui progress: Make sure we always define step value
Unlikely, but we could have ui_progress__init being called with total <
16, which would set the next and step variables to 0. That would force
unnecessary ui_progress__ops->update calls because 'next' would never
raise.

Forcing the next and step values to be always > 0.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908120510.22515-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-12 12:34:46 -03:00
Jiri Olsa
cd6379ebb5 perf tools: Open perf.data with O_CLOEXEC flag
Do not carry the perf.data file descriptor into the workload process and
close it when perf executes the workload.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170908084621.31595-2-jolsa@kernel.org
[ Add definitions for O_CLOEXEC for older systems ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-12 12:34:23 -03:00
Milian Wolff
df90cc41d6 perf tests: Fix compile when libunwind's unwind.h is available
When cross compiling perf and I want to link against a self-compiled
libunwind, I usually make the custom path where the libunwind headers
exist visible by adding the libunwind prefix to the include path when
compiling perf, i.e.:

~~~~~
$ ls $HOME/projects/compiled/other/include/
libunwind-coredump.h  libunwind.h         libunwind-x86_64.h
libunwind-common.h  libunwind-dynamic.h   libunwind-ptrace.h
unwind.h
$ make EXTRA_CFLAGS="-I$HOME/projects/compiled/other/include/
~~~~~~

Note the `unwind.h` header from libunwind which leads to compile
errors when compiling tests/dwarf-unwind.c, since it shadows perf's
util/unwind.h:

~~~~~
tests/dwarf-unwind.c:41:32: error: ‘struct unwind_entry’ declared inside parameter list will not be visible outside of this definition or declaration [-Werror]
 static int unwind_entry(struct unwind_entry *entry, void *arg)
                                ^~~~~~~~~~~~
tests/dwarf-unwind.c: In function ‘unwind_entry’:
tests/dwarf-unwind.c:44:22: error: dereferencing pointer to incomplete type ‘struct unwind_entry’
  char *symbol = entry->sym ? entry->sym->name : NULL;
                      ^~
tests/dwarf-unwind.c: In function ‘unwind_thread’:
tests/dwarf-unwind.c:92:8: error: implicit declaration of function ‘unwind__get_entries’; did you mean ‘unwind_entry’? [-Werror=implicit-function-declaration]
  err = unwind__get_entries(unwind_entry, &cnt, thread,
        ^~~~~~~~~~~~~~~~~~~
        unwind_entry
tests/dwarf-unwind.c:92:8: error: nested extern declaration of ‘unwind__get_entries’ [-Werror=nested-externs]
~~~~~~

Fix this compile error by specificing an explicit include of perf's
unwind.h in the util folder.

Signed-off-by: Milian Wolff <milian.wolff@kdab.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Yao Jin <yao.jin@linux.intel.com>
Link: http://lkml.kernel.org/r/20170906150209.12579-1-milian.wolff@kdab.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-12 12:34:02 -03:00
Arnaldo Carvalho de Melo
eba9fac017 perf annotate browser: Help for cycling thru hottest instructions with TAB/shift+TAB
The popup help accessed via 'h' wasn't mentioning about TAB and
shift-TAB, just about 'H', which goes to the hottest line, while the
former two are the hotkeys for actually cycling thru the hottest lines.

Reported-by: Flavio Bruno Leitner <fbl@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Taeung Song <treeze.taeung@gmail.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-5ppym6odizfj1ifa4t7neiku@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-01 14:55:40 -03:00
Arnaldo Carvalho de Melo
63ce8449bc perf stat: Only auto-merge events that are PMU aliases
Peter reported that when he explicitely asked for multiple events with
the same name on the command line it got coalesced into just one line,
i.e.:

   # perf stat -e cycles -e cycles -e cycles usleep 1

   Performance counter stats for 'usleep 1':

         3,269,652      cycles

       0.000884123 seconds time elapsed

  #

And while there is the --no-merges option to disable that auto-merging,
this is a blunt change in behaviour for such explicit request, so change
the code so that this auto merging is done only when handling the multi
PMU aliases with the same name that introduced this coalescing,
restoring the previous behaviour for the explicit case:

  # perf stat -e cycles -e cycles -e cycles usleep 1

   Performance counter stats for 'usleep 1':

         1,472,837      cycles
         1,472,837      cycles
         1,472,837      cycles

       0.001764870 seconds time elapsed

  #

Reported-by: Peter Zijlstra <peterz@infradead.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Fixes: 430daf2dc7 ("perf stat: Collapse identically named events")
Link: http://lkml.kernel.org/r/20170831184122.GK4831@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-01 14:48:59 -03:00
Kan Liang
fc33dccba3 perf test: Add test case for PERF_SAMPLE_PHYS_ADDR
Extend sample-parsing test cases to support new sample type
PERF_SAMPLE_PHYS_ADDR.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1504026672-7304-6-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-01 14:46:34 -03:00
Kan Liang
49d58f04eb perf script: Support physical address
Display the physical address at the tail if it is available.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1504026672-7304-5-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-01 14:46:29 -03:00
Kan Liang
c35aeb9dfe perf mem: Support physical address
Add option phys-data in "perf mem" to record/report physical address.
The default mem sort order for physical address is changed accordingly.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1504026672-7304-4-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-01 14:46:23 -03:00
Kan Liang
8780fb25ab perf sort: Add sort option for physical address
Add a new sort option "phys_daddr" for --mem-mode sort.  With this
option applied, perf can sort and report by sample's physical address.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1504026672-7304-3-git-send-email-kan.liang@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-01 14:46:11 -03:00
Kan Liang
3b0a5daa06 perf tools: Support new sample type for physical address
Support new sample type PERF_SAMPLE_PHYS_ADDR for physical address.

Add new option --phys-data to record sample physical address.

Signed-off-by: Kan Liang <kan.liang@intel.com>
Tested-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Stephane Eranian <eranian@google.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1504026672-7304-2-git-send-email-kan.liang@intel.com
[ Added missing printing in evsel.c patch sent by Jiri Olsa ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-01 14:46:00 -03:00
Sukadev Bhattiprolu
2a118e1bd2 perf vendor events powerpc: Remove duplicate events
Some POWER PMU event names have multiple/alternate event codes. These
alternate event codes were listed in the POWER9 JSON files for
reference.

But the perf tool does not seem to handle duplicates cleanly. 'perf
list' shows such duplicate events only once, but 'perf stat' ends up
counting the first event code twice, multiplexing if necessary and we
end up with double the event counts.

Remove the duplicate event codes from the JSON files for now.

Reported-by: Michael Petlan <mpetlan@redhat.com>
Signed-off-by: Sukadev Bhattiprolu <sukadev@linux.vnet.ibm.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: Anton Blanchard <anton@au1.ibm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Link: http://lkml.kernel.org/r/20170830231506.GB20351@us.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-01 14:46:00 -03:00
Jack Henschel
4fb2053920 perf intel-pt: Fix syntax in documentation of config option
As specified in tools/perf/Documentation/perf-config.txt, perf
configuration items must be in 'key = value' format, otherwise the
following error message occurs:

  $ perf record -e intel_pt//u -- ls
  bad config file line 2 in ~/.perfconfig
  $ cat .perfconfig
  [intel-pt]
      mispred-all

Changing to assigning a value to the key 'mispred-all' fixes the issue:

  $ perf record -e intel_pt//u -- ls
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Capured and wrote 0.031 MB perf.data]
  $ cat .perfconfig
  [intel-pt]
      mispred-all = true

Signed-off-by: Jack Henschel <jackdev@mailbox.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/20170831080535.2157-1-jackdev@mailbox.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-01 14:45:59 -03:00
Ravi Bangoria
9a805d8648 perf test powerpc: Fix 'Object code reading' test
'Object code reading' test always fails on powerpc guest. Two reasons
for the failure are:

1. When elf section is too big (size beyond 'unsigned int' max value).
objdump fails to disassemble from such section. This was fixed with
commit 0f6329bd7fc ("binutils/objdump: Fix disassemble for huge elf
sections") in binutils.

2. When the sample is from hypervisor. Hypervisor symbols can not be
resolved within guest and thus thread__find_addr_map() fails for such
symbols. Fix this by ignoring hypervisor symbols in the test.

Signed-off-by: Ravi Bangoria <ravi.bangoria@linux.vnet.ibm.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: linuxppc-dev@lists.ozlabs.org
Link: http://lkml.kernel.org/r/1504170896-7876-1-git-send-email-ravi.bangoria@linux.vnet.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-01 14:45:59 -03:00
Arnaldo Carvalho de Melo
27702bcfe8 perf trace: Support syscall name globbing
So now we can use:

  # perf trace -e pkey_*
   532.784 ( 0.006 ms): pkey/16018 pkey_alloc(init_val: DISABLE_WRITE) = -1 EINVAL Invalid argument
   532.795 ( 0.004 ms): pkey/16018 pkey_mprotect(start: 0x7f380d0a6000, len: 4096, prot: READ|WRITE, pkey: -1) = 0
   532.801 ( 0.002 ms): pkey/16018 pkey_free(pkey: -1                ) = -1 EINVAL Invalid argument
  ^C[root@jouet ~]#

Or '-e epoll*', '-e *msg*', etc.

Combining syscall names with perf events, tracepoints, etc, continues to
be valid, i.e. this is possible:

  # perf probe -L sys_nanosleep
  <SyS_nanosleep@/home/acme/git/linux/kernel/time/hrtimer.c:0>
      0  SYSCALL_DEFINE2(nanosleep, struct timespec __user *, rqtp,
                        struct timespec __user *, rmtp)
         {
                struct timespec64 tu;

      5         if (get_timespec64(&tu, rqtp))
      6                 return -EFAULT;

                if (!timespec64_valid(&tu))
      9                 return -EINVAL;

     11         current->restart_block.nanosleep.type = rmtp ? TT_NATIVE : TT_NONE;
     12         current->restart_block.nanosleep.rmtp = rmtp;
     13         return hrtimer_nanosleep(&tu, HRTIMER_MODE_REL, CLOCK_MONOTONIC);
         }

  # perf probe my_probe="sys_nanosleep:12 rmtp"
  Added new event:
    probe:my_probe       (on sys_nanosleep:12 with rmtp)

  You can now use it in all perf tools, such as:

	perf record -e probe:my_probe -aR sleep 1

  #
  # perf trace -e probe:my_probe/max-stack=5/,*sleep sleep 1
     0.427 ( 0.003 ms): sleep/16690 nanosleep(rqtp: 0x7ffefc245090) ...
     0.430 (         ): probe:my_probe:(ffffffffbd112923) rmtp=0)
                                       sys_nanosleep ([kernel.kallsyms])
                                       do_syscall_64 ([kernel.kallsyms])
                                       return_from_SYSCALL_64 ([kernel.kallsyms])
                                       __nanosleep_nocancel (/usr/lib64/libc-2.25.so)
     0.427 (1000.208 ms): sleep/16690  ... [continued]: nanosleep()) = 0
  #

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-elycoi8wy6y0w9dkj7ox1mzz@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-01 14:45:58 -03:00
Arnaldo Carvalho de Melo
89be3f8ab7 perf syscalltbl: Support glob matching on syscall names
With two new methods, one to find the first match, returning its syscall
id and its index in whatever internal database it keeps the syscall
into, then one to find the next match, if any.

Implemented only on arches where we actually read the syscall table from
the kernel sources, i.e. x86-64 for now, all the others use the libaudit
method for which this returns -1, i.e. just stubs were added, with the
actual implementation using whatever libaudit functions for matching
that may be available.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-i0sj4rxk1a63pfe9gl8z8irs@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-09-01 14:45:48 -03:00
Jin Yao
c4ee06251d perf report: Calculate the average cycles of iterations
The branch history code has a loop detection function. With this, we can
get the number of iterations by calculating the removed loops.

While it would be nice for knowing the average cycles of iterations.
This patch adds up the cycles in branch entries of removed loops and
save the result to the next branch entry (e.g. branch entry A).

Finally it will display the iteration number and average cycles at the
"from" of branch entry A.

For example:
perf record -g -j any,save_type ./div
perf report --branch-history --no-children --stdio

--22.63%--main div.c:42 (RET CROSS_2M)
          compute_flag div.c:28 (cycles:2 iter:173115 avg_cycles:2)
          |
           --10.73%--compute_flag div.c:27 (RET CROSS_2M)
                     rand rand.c:28 (cycles:1)
                     rand rand.c:28 (RET CROSS_2M)
                     __random random.c:298 (cycles:1)
                     __random random.c:297 (COND_BWD CROSS_2M)
                     __random random.c:295 (cycles:1)
                     __random random.c:295 (COND_BWD CROSS_2M)
                     __random random.c:295 (cycles:1)
                     __random random.c:295 (RET CROSS_2M)

Signed-off-by: Yao Jin <yao.jin@linux.intel.com>
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/1502111115-18305-1-git-send-email-yao.jin@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-30 10:03:27 -03:00
Li Bin
b2f7605076 perf symbols: Fix plt entry calculation for ARM and AARCH64
On x86, the plt header size is as same as the plt entry size, and can be
identified from shdr's sh_entsize of the plt.

But we can't assume that the sh_entsize of the plt shdr is always the
plt entry size in all architecture, and the plt header size may be not
as same as the plt entry size in some architecure.

On ARM, the plt header size is 20 bytes and the plt entry size is 12
bytes (don't consider the FOUR_WORD_PLT case) that refer to the binutils
implementation. The plt section is as follows:

Disassembly of section .plt:
000004a0 <__cxa_finalize@plt-0x14>:
 4a0:   e52de004        push    {lr}            ; (str lr, [sp, #-4]!)
 4a4:   e59fe004        ldr     lr, [pc, #4]    ; 4b0 <_init+0x1c>
 4a8:   e08fe00e        add     lr, pc, lr
 4ac:   e5bef008        ldr     pc, [lr, #8]!
 4b0:   00008424        .word   0x00008424

000004b4 <__cxa_finalize@plt>:
 4b4:   e28fc600        add     ip, pc, #0, 12
 4b8:   e28cca08        add     ip, ip, #8, 20  ; 0x8000
 4bc:   e5bcf424        ldr     pc, [ip, #1060]!        ; 0x424

000004c0 <printf@plt>:
 4c0:   e28fc600        add     ip, pc, #0, 12
 4c4:   e28cca08        add     ip, ip, #8, 20  ; 0x8000
 4c8:   e5bcf41c        ldr     pc, [ip, #1052]!        ; 0x41c

On AARCH64, the plt header size is 32 bytes and the plt entry size is 16
bytes.  The plt section is as follows:

Disassembly of section .plt:
0000000000000560 <__cxa_finalize@plt-0x20>:
 560:   a9bf7bf0        stp     x16, x30, [sp,#-16]!
 564:   90000090        adrp    x16, 10000 <__FRAME_END__+0xf8a8>
 568:   f944be11        ldr     x17, [x16,#2424]
 56c:   9125e210        add     x16, x16, #0x978
 570:   d61f0220        br      x17
 574:   d503201f        nop
 578:   d503201f        nop
 57c:   d503201f        nop

0000000000000580 <__cxa_finalize@plt>:
 580:   90000090        adrp    x16, 10000 <__FRAME_END__+0xf8a8>
 584:   f944c211        ldr     x17, [x16,#2432]
 588:   91260210        add     x16, x16, #0x980
 58c:   d61f0220        br      x17

0000000000000590 <__gmon_start__@plt>:
 590:   90000090        adrp    x16, 10000 <__FRAME_END__+0xf8a8>
 594:   f944c611        ldr     x17, [x16,#2440]
 598:   91262210        add     x16, x16, #0x988
 59c:   d61f0220        br      x17

NOTES:

In addition to ARM and AARCH64, other architectures, such as
s390/alpha/mips/parisc/poperpc/sh/sparc/xtensa also need to consider
this issue.

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexis Berlemont <alexis.berlemont@gmail.com>
Cc: David Tolnay <dtolnay@gmail.com>
Cc: Hanjun Guo <guohanjun@huawei.com>
Cc: Hemant Kumar <hemant@linux.vnet.ibm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Milian Wolff <milian.wolff@kdab.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: zhangmengting@huawei.com
Link: http://lkml.kernel.org/r/1496622849-21877-1-git-send-email-huawei.libin@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-29 11:41:27 -03:00
Li Bin
2c29461e27 perf probe: Fix kprobe blacklist checking condition
The commit 9aaf5a5f47 ("perf probe: Check kprobes blacklist when
adding new events"), 'perf probe' supports checking the blacklist of the
fuctions which can not be probed.  But the checking condition is wrong,
that the end_addr of the symbol which is the start_addr of the next
symbol can't be included.

Committer notes:

IOW make it match its kernel counterpart in kernel/kprobes.c:

  bool within_kprobe_blacklist(unsigned long addr)

Each entry have as its end address not its end address, but the first
address _outside_ that symbol, which for related functions, is the first
address of the next symbol, like these from kernel/trace/trace_probe.c:

0xffffffffbd198df0-0xffffffffbd198e40	print_type_u8
0xffffffffbd198e40-0xffffffffbd198e90	print_type_u16
0xffffffffbd198e90-0xffffffffbd198ee0	print_type_u32
0xffffffffbd198ee0-0xffffffffbd198f30	print_type_u64
0xffffffffbd198f30-0xffffffffbd198f80	print_type_s8
0xffffffffbd198f80-0xffffffffbd198fd0	print_type_s16
0xffffffffbd198fd0-0xffffffffbd199020	print_type_s32
0xffffffffbd199020-0xffffffffbd199070	print_type_s64
0xffffffffbd199070-0xffffffffbd1990c0	print_type_x8
0xffffffffbd1990c0-0xffffffffbd199110	print_type_x16
0xffffffffbd199110-0xffffffffbd199160	print_type_x32
0xffffffffbd199160-0xffffffffbd1991b0	print_type_x64

But not always:

0xffffffffbd1997b0-0xffffffffbd1997c0	fetch_kernel_stack_address (kernel/trace/trace_probe.c)
0xffffffffbd1c57f0-0xffffffffbd1c58b0	__context_tracking_enter   (kernel/context_tracking.c)

Signed-off-by: Li Bin <huawei.libin@huawei.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Wang Nan <wangnan0@huawei.com>
Cc: zhangmengting@huawei.com
Fixes: 9aaf5a5f47 ("perf probe: Check kprobes blacklist when adding new events")
Link: http://lkml.kernel.org/r/1504011443-7269-1-git-send-email-huawei.libin@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-29 11:14:12 -03:00
Arnaldo Carvalho de Melo
83bc9c371e perf trace beauty: Beautify pkey_{alloc,free,mprotect} arguments
Reuse 'mprotect' beautifiers for 'pkey_mprotect'.

System wide tracing pkey_alloc, pkey_free and pkey_mprotect calls, with
backtraces:

  # perf trace -e pkey_alloc,pkey_mprotect,pkey_free --max-stack=5
     0.000 ( 0.011 ms): pkey/7818 pkey_alloc(init_val: DISABLE_ACCESS|DISABLE_WRITE) = -1 EINVAL Invalid argument
                                       syscall (/usr/lib64/libc-2.25.so)
                                       pkey_alloc (/home/acme/c/pkey)
     0.022 ( 0.003 ms): pkey/7818 pkey_mprotect(start: 0x7f28c3890000, len: 4096, prot: READ|WRITE, pkey: -1) = 0
                                       syscall (/usr/lib64/libc-2.25.so)
                                       pkey_mprotect (/home/acme/c/pkey)
     0.030 ( 0.002 ms): pkey/7818 pkey_free(pkey: -1                               ) = -1 EINVAL Invalid argument
                                       syscall (/usr/lib64/libc-2.25.so)
                                       pkey_free (/home/acme/c/pkey)

The tools/include/uapi/asm-generic/mman-common.h file is used to find
the access rights defines for the pkey_alloc syscall second argument.

Since we have the detector of changes for the tools/include header files
versus its kernel origin (include/uapi/asm-generic/mman-common.h), we'll
get whatever new flag appears for that argument automatically.

This method should be used in other cases where it is easy to generate
those flags tables because the header has properly namespaced defines
like PKEY_DISABLE_ACCESS and PKEY_DISABLE_WRITE.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/n/tip-3xq5312qlks7wtfzv2sk3nct@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-28 16:44:47 -03:00
David Carrillo-Cisneros
70ff7c6caa perf tools: Pass full path of FEATURES_DUMP
When building with an external FEATURES_DUMP, bpf complains
that features dump file is not found. Fix it by passing full file path.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Paul Turner <pjt@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20170827075442.108534-7-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-28 16:44:46 -03:00
David Carrillo-Cisneros
3866058ef1 perf tools: Robustify detection of clang binary
Prior to this patch, make scripts tested for CLANG with ifeq ($(CC),
clang), failing to detect CLANG binaries with different names. Fix it by
testing for the existence of __clang__ macro in the list of compiler
defined macros.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Paul Turner <pjt@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20170827075442.108534-5-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-28 16:44:46 -03:00
David Carrillo-Cisneros
39a59f1e3e perf tools: Allow external definition of flex and bison binary names
Allow user to define flex and bison binary names by passing FLEX and
BISON variables.

Signed-off-by: David Carrillo-Cisneros <davidcc@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Paul Turner <pjt@google.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lkml.kernel.org/r/20170827075442.108534-3-davidcc@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-28 16:44:45 -03:00
Jiri Olsa
9933183e36 perf report: Group stat values on global event id
There's no big value on displaying counts for every event ID, which is
one per every CPU. Rather than that, displaying the whole sum for the
event.

  $ perf record -c 100000 -e cycles:u -s test
  $ perf report -T

Before:
  #  PID   TID  cycles:u  cycles:u  cycles:u  cycles:u  ... [20 more columns of 'cycles:u']
    3339  3339         0         0         0         0
    3340  3340         0         0         0         0
    3341  3341         0         0         0         0
    3342  3342         0         0         0         0

Now:
  #  PID   TID  cycles:u
    3339  3339     19678
    3340  3340     18744
    3341  3341     17335
    3342  3342     26414

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824162737.7813-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-28 16:44:44 -03:00
Jiri Olsa
a1834fc938 perf values: Zero value buffers
We need to make sure the array of value pointers are zero initialized,
because we use them in realloc later on and uninitialized non zero value
will cause allocation error and aborted execution.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824162737.7813-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-28 16:44:43 -03:00
Jiri Olsa
f4ef3b7c18 perf values: Fix allocation check
Bailing out in case the allocation failed, not the other way round.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824162737.7813-8-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-28 16:44:43 -03:00
Jiri Olsa
64eed1deb6 perf values: Fix thread index bug
We are taking wrong index (+1) for first thread, which leaves thread
with index 0 unused and uninitialized.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824162737.7813-7-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-28 16:44:42 -03:00
Jiri Olsa
dac7f6b7ed perf report: Add dump_read function
Adding dump_read function to gather all the dump output of read
function. Adding output of enabled and running times and id if enabled
(3 new lines with '...' prefix below).

  $ perf record -s ...
  $ perf report -D

  958358311769 0x91f8 [0x40]: PERF_RECORD_READ: 3339 3339 cycles:u 0
  ... time enabled : 958358313731
  ... time running : 958358313731
  ... id           : 80

Committer note:

Do not use 'read' as a variable name as it breaks the build on older
systems, such as RHEL6:

    CC       /tmp/build/perf/util/session.o
  cc1: warnings being treated as errors
  util/session.c: In function 'dump_read':
  util/session.c:1132: error: declaration of 'read' shadows a global declaration
  /usr/include/bits/unistd.h:35: error: shadowed declaration is here
  mv: cannot stat `/tmp/build/perf/util/.session.o.tmp': No such file or directory

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824162737.7813-6-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-28 16:43:50 -03:00
Jiri Olsa
a17f069787 perf record: Set read_format for inherit_stat
Set read_format for what we expect to get from read event generated by
perf_event_attr::inherit_stat.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <andi@firstfloor.org>
Cc: David Ahern <dsahern@gmail.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824162737.7813-5-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-28 11:05:10 -03:00
Jiri Olsa
12c15302dd perf c2c: Fix remote HITM detection for Skylake
Skylake introduced new mem_remote bit in union perf_mem_data_src [1].
It applies to any other memory level to express Remote unknown level, as
is reported by Skylake.

Adding this extra check to c2c_decode_stats to properly decode remote
HITMs on Skylake.

[1] http://lkml.kernel.org/r/20170816222156.19953-4-andi@firstfloor.org

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Andi Kleen <ak@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: David Ahern <dsahern@gmail.com>
Cc: Joe Mario <jmario@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20170824085732.28481-1-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-28 11:05:10 -03:00
Jiri Olsa
6bd76b8fab perf tools: Fix static build with newer toolchains
We can't pass --dynamic-list list into static build anymore, because
compilers starts to scream about that. Fedora 26 started to fail build
with following error:

  $ make LDFLAGS=-static
  ...
  /usr/bin/ld: dynamic STT_GNU_IFUNC symbol `strcmp' with pointer equality in `/usr/lib/gcc/x86_64-redhat-linux/7/../../../../lib64/libc.a(strcmp.o
+)' can not be used when making an executable; recompile with -fPIE and relink with -pie

There's no sense for --dynamic-list in static build, because there's no
.dynsym table in static binary. Consequently the traceevent plugins have
never worked with static build, but it was quietly passed by.

To fix this in future I think we should add support to compile plugins
within the perf binary directly for static build.

Reported-and-Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/n/tip-jeg6a7ff9j9hlqn8k4gllzvv@git.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-28 11:05:09 -03:00
Jack Henschel
726647d052 perf stat: Fix path to PMU formats in documentation
As defined in tools/perf/util/pmu.c, the EVENT_SOURCE_DEVICE_PATH is
/sys/bus/event_source/devices/ (no traling 's' in event_source)

This patch corrects the path in the perf stat documentation

Signed-off-by: Jack Henschel <jackdev@mailbox.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jack Henschel <jackdev@mailbox.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: trivial@kernel.org
Link: http://lkml.kernel.org/r/20170824132022.10934-1-jackdev@mailbox.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-28 11:05:09 -03:00
Konstantin Khlebnikov
60913e005c perf tools: Fix static linking with libunwind
* libunwind-x86_64 must be linked before libunwind
* libunwind requires liblzma
* static libunwind conflicts with static libgcc_eh

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/150322917247.129799.14247751517961953155.stgit@buzz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 13:24:55 -03:00
Konstantin Khlebnikov
ba335df4ea perf tools: Fix static linking with libdw from elfutils
Fix feature test for static libdw: link required dependencies.  Backends
of libebl are not statically linked thus libdl is required.

In Debian/Ubuntu libdw-dev includes libebl.a starting from 0.166-1.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/150322916720.129772.7959925864494283854.stgit@buzz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 13:24:54 -03:00
Konstantin Khlebnikov
ac0bb6b72f perf: Fix documentation for sysctls perf_event_paranoid and perf_event_mlock_kb
Fix misprint CAP_IOC_LOCK -> CAP_IPC_LOCK. This capability have nothing
to do with raw tracepoints. This part is about bypassing mlock limits.

Sysctl kernel.perf_event_paranoid = -1 allows raw and ftrace function
tracepoints without CAP_SYS_ADMIN.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lkml.kernel.org/r/150322916080.129746.11285255474738558340.stgit@buzz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 13:24:54 -03:00
Konstantin Khlebnikov
2826478a66 perf tools: Really install manpages via 'make install-man'
Target install-man builds them but forget to install.

Signed-off-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Borislav Petkov <borislav.petkov@amd.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: af3df2cf17 ("perf tools: Try to build Documentation when installing")
Link: http://lkml.kernel.org/r/150322915300.129715.13645857235229756834.stgit@buzz
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 13:24:53 -03:00
Andi Kleen
3067eaa7ce perf test: Add test cases for new data source encoding
Add some simple tests to perf test to test data source printing.

v2: Make the tests actually checked for the correct name of Forward
v3: Adjust to new encoding

Committer notes:

Avoid the in place declaration to make this build with older compilers,
for instance, in Debian 7 we get:

  tests/mem.c: In function 'test__mem':
  tests/mem.c:30:5: error: missing initializer [-Werror=missing-field-initializers]
  tests/mem.c:30:5: error: (near initialization for '(anonymous).<anonymous>.mem_snoop') [-Werror=missing-field-initializers]

So just zero a struct, then go on building the unions as needed,
reusing settings from the previous test, i.e. local -> remote, etc.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170816222156.19953-5-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 13:23:10 -03:00
Andi Kleen
52839e653b perf tools: Add support for printing new mem_info encodings
Add decoding for the new "lvlx" and "snoopx" meminfo fields added
earlier to the kernel so that "perf mem report" and other tools can
print it properly.

v2: Merge with persistent memory patch.
Switch to new bit encoding for each combination.

v3: Switch to generic lvlnum field.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170816222156.19953-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 12:30:25 -03:00
Andi Kleen
41d3d6db17 perf vendor events: Add Skylake server uncore event list
Add JSON uncore events for Skylake Server to perf.

Based on JSON list V1.01

This is a much fuller list than with earlier uncores, including
more low level (but also harder to understand) events. It does not
include the "experimential" events. The previous
high level metric (LLC_* etc.) are still available when applicable.
C state power events are not included at this point.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/20170816220553.GA19463@tassilo.jf.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 12:25:11 -03:00
Andi Kleen
630171d415 perf vendor events: Add core event list for Skylake Server
Based on JSON list version v1.01

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Link: http://lkml.kernel.org/r/3269ae458a883139110ec82bc895423bd8843d65
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 12:23:52 -03:00
Andi Kleen
d66dccdb13 perf tools: Dedup events in expression parsing
Avoid adding redundant events while parsing an expression.  When we add
an "other" event check first if it already exists.

v2: Fix perf test failure.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170811232634.30465-10-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 12:19:08 -03:00
Andi Kleen
8d3db2b97f perf tools: Increase maximum number of events in expressions
Some of the upcoming metrics need more than 8 events. Increase the maximum
number the parser supports.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170811232634.30465-9-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 12:19:05 -03:00
Andi Kleen
d73bad0685 perf tools: Expression parser enhancements for metrics
Enhance the expression parser for more complex metric formulas.

- Support python style IF ELSE operators
- Add an #SMT_On magic variable for formulas that depend on the SMT
status.

Example: 4 *( CPU_CLK_UNHALTED.THREAD_ANY / 2 ) if #SMT_on else cycles

- Support MIN/MAX operations

Example: min(1 , IDQ.MITE_UOPS / ( UPI * 16 * ( ICACHE.HIT + ICACHE.MISSES ) / 4.0 ) )

This is useful to fix up problems caused by multiplexing.

- Support | & ^ operators
- Minor cleanups and fixes
- Support an \ escape for operators. This allows to specify event names
like c2-residency
- Support @ as an alternative for / to be able to specify pmus without
conflicts with operators (like msr/tsc/ as msr@tsc@)

Example: (cstate_core@c3\\-residency@ / msr@tsc@) * 100

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170811232634.30465-8-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 12:15:53 -03:00
Andi Kleen
de5077c4e3 perf tools: Add utility function to detect SMT status
Add an smt_on() function to return if SMT is enabled or disabled.  Used
in the next patch.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170811232634.30465-7-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 12:09:04 -03:00
Andi Kleen
77d0871c76 perf bpf: Tighten detection of BPF events
perf stat -e cpu/uops_executed.core,cmask=1/

would be detected as a BPF source event because the .c matches the .c
source BPF pattern.

v2:

Originally I tried to use lex lookahead, but it doesn't seem to work.

This now extends the BPF pattern to match longer events, but then does
an extra check in the C code to reject BPF matches that do not end with
.c/.o/.obj

This uses REJECT, which makes the flex scanner slower, but that
shouldn't be a big problem for the perf events.

Committer testing:

  # perf trace -e write -e /home/acme/bpf/tracepoint.c cat /etc/passwd > /dev/null
     0.000 ( 0.006 ms): cat/18485 write(fd: 1, buf: 0x7f59eebe1000, count: 3494                         ) ...
     0.006 (         ): raw_syscalls:sys_enter:NR 1 (1, 7f59eebe1000, da6, 22, 7f59eebe0010, 0))
     0.008 (         ): perf_bpf_probe:_write:(ffffffff9626b2c0))
     0.000 ( 0.010 ms): cat/18485  ... [continued]: write()) = 3494
  #

It continues doing what was expected, i.e. identifying
/home/acme/bpf/tracepoint.c as a BPF event and activates the clang
machinery to build an eBPF object and then uses sys_bpf() to hook it up
to the raw_syscalls:sys_enter tracepoint, etc.

Andi forgot to add Wang to the CC list, fix it.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Wang Nan <wangnan0@huawei.com>
Link: http://lkml.kernel.org/r/20170811232634.30465-4-andi@firstfloor.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 11:56:22 -03:00
Andi Kleen
475fb533fb perf evsel: Fix buffer overflow while freeing events
Fix buffer overflow for:

  % perf stat -e msr/tsc/,cstate_core/c7-residency/ true

that causes glibc free list corruption. For some reason it doesn't
trigger in valgrind, but it is visible in AS:

  =================================================================
  ==32681==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x603000003f5c at pc 0x0000005671ef bp 0x7ffdaaac9ac0 sp 0x7ffdaaac9ab0
  READ of size 4 at 0x603000003f5c thread T0
    #0 0x5671ee in perf_evsel__close_fd util/evsel.c:1196
    #1 0x56c57a in perf_evsel__close util/evsel.c:1717
    #2 0x55ed5f in perf_evlist__close util/evlist.c:1631
    #3 0x4647e1 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:749
    #4 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
    #5 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
    #6 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
    #7 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
    #8 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
    #9 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
    #10 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)
    #11 0x428419 in _start (/home/ak/hle/obj-perf/perf+0x428419)

  0x603000003f5c is located 0 bytes to the right of 28-byte region [0x603000003f40,0x603000003f5c)
  allocated by thread T0 here:
    #0 0x7f0675139020 in calloc (/lib64/libasan.so.3+0xc7020)
    #1 0x648a2d in zalloc util/util.h:23
    #2 0x648a88 in xyarray__new util/xyarray.c:9
    #3 0x566419 in perf_evsel__alloc_fd util/evsel.c:1039
    #4 0x56b427 in perf_evsel__open util/evsel.c:1529
    #5 0x56c620 in perf_evsel__open_per_thread util/evsel.c:1730
    #6 0x461dea in create_perf_stat_counter /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:263
    #7 0x4637d7 in __run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:600
    #8 0x4648e3 in run_perf_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:767
    #9 0x46e1bc in cmd_stat /home/ak/hle/linux-hle-2.6/tools/perf/builtin-stat.c:2785
    #10 0x52f83d in run_builtin /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:296
    #11 0x52fd49 in handle_internal_command /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:348
    #12 0x5300de in run_argv /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:392
    #13 0x5308f3 in main /home/ak/hle/linux-hle-2.6/tools/perf/perf.c:530
    #14 0x7f0672d13400 in __libc_start_main (/lib64/libc.so.6+0x20400)

The event is allocated with cpus == 1, but freed with cpus == real number
When the evsel close function walks the file descriptors it exceeds the
fd xyarray boundaries and reads random memory.

v2:

Now that xyarrays save their original dimensions we can use these to
iterate the two dimensional fd arrays. Fix some users (close, ioctl) in
evsel.c to use these fields directly. This allows simplifying the code
and dropping quite a few function arguments. Adjust all callers by
removing the unneeded arguments.

The actual perf event reading still uses the original values from the
evsel list.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170811232634.30465-2-andi@firstfloor.org
[ Fix up xy_max_[xy]() -> xyarray__max_[xy]() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 11:51:31 -03:00
Andi Kleen
d74be47673 perf xyarray: Save max_x, max_y
Save the original array dimensions in xyarrays, so that users can
retrieve them later. Add some inline functions to access these fields.

Signed-off-by: Andi Kleen <ak@linux.intel.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Link: http://lkml.kernel.org/r/20170811232634.30465-1-andi@firstfloor.org
[ As noticed by Jiri, fix up namespacing: xy__method() -> xyarray__method() ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-22 11:51:28 -03:00
Taeung Song
3a555c7799 perf annotate browser: Circulate percent, total-period and nr-samples view
Using the existing 't' hotkey, support the three views: percent, total
period and number of samples on the annotate TUI browser, circulating
them like below:

  Percent -> Total Period -> Nr Samples -> Percent ...

Committer notes:

Removed new 'e' hotkey, should be resubmitted as a separate patch, with
proper justification for its inclusion.

Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Milian Wolff <milian.wolff@kdab.com>
Link: http://lkml.kernel.org/r/1503046028-5691-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-18 11:23:20 -03:00
Taeung Song
9cef4b0b5b perf annotate browser: Support --show-nr-samples option
Support the --show-nr-samples in the TUI browser.

Committer notes:

Lift the restriction about --tui but leave it for --gtk:

  $ export LD_LIBRARY_PATH=~/lib64
  $ perf annotate --gtk --show-nr-samples --show-nr-samples is not available in --gtk mode at this time
  $

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lkml.kernel.org/r/1503046023-5646-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-18 11:15:09 -03:00
Taeung Song
01c85629f5 perf annotate: Document --show-total-period option
When the --show-total-period option was introduced we forgot to add an
entry in the man page, fix it.

Signed-off-by: Taeung Song <treeze.taeung@gmail.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Martin Liška <mliska@suse.cz>
Fixes: 0c4a5bcea4 ("perf annotate: Display total number of samples with --show-total-period")
Link: http://lkml.kernel.org/r/1503046013-5555-1-git-send-email-treeze.taeung@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2017-08-18 10:34:08 -03:00