Commit Graph

63 Commits

Author SHA1 Message Date
K Prateek Nayak
4b87406a3b perf stat record: Save cache level information
When aggregating based on cache-topology, in addition to the aggregation
mode, knowing the cache level at which data is aggregated is necessary
to ensure consistency when running 'perf stat record' and later 'perf
stat report'.

Save the cache level for aggregation as a part of the env data that can
be later retrieved when running perf stat report.

Suggested-by: Gautham Shenoy <gautham.shenoy@amd.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wen Pu <puwen@hygon.cn>
Link: https://lore.kernel.org/r/20230517172745.5833-4-kprateek.nayak@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-05-23 16:10:13 -03:00
Ian Rogers
e5116f46d4 perf map: Add accessor for start and end
Later changes will add reference count checking for struct map, start
and end are frequently accessed variables. Add an accessor so that the
reference count check is only necessary in one place.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 16:54:11 -03:00
Ian Rogers
63df0e4bc3 perf map: Add accessor for dso
Later changes will add reference count checking for struct map, with
dso being the most frequently accessed variable. Add an accessor so
that the reference count check is only necessary in one place.

Additional changes:
 - add a dso variable to avoid repeated map__dso calls.
 - in builtin-mem.c dump_raw_samples, code only partially tested for
   dso == NULL. Make the possibility of NULL consistent.
 - in thread.c thread__memcpy fix use of spaces and use tabs.

Committer notes:

Did missing conversions on these files:

   tools/perf/arch/powerpc/util/skip-callchain-idx.c
   tools/perf/arch/powerpc/util/sym-handling.c
   tools/perf/ui/browsers/hists.c
   tools/perf/ui/gtk/annotate.c
   tools/perf/util/cs-etm.c
   tools/perf/util/thread.c
   tools/perf/util/unwind-libunwind-local.c
   tools/perf/util/unwind-libunwind.c

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 16:41:57 -03:00
Ian Rogers
ff583dc43d perf maps: Remove rb_node from struct map
struct map is reference counted, having it also be a node in an
red-black tree complicates the reference counting. Switch to having a
map_rb_node which is a red-block tree node but points at the reference
counted struct map. This reference is responsible for a single reference
count.

Committer notes:

Fixed up tools/perf/util/unwind-libunwind-local.c to use map_rb_node as
well.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-04-04 14:06:27 -03:00
Ian Rogers
484b2a8442 perf tools: Ensure evsel name is initialized
Use the evsel__name accessor as otherwise name may be NULL resulting
in a segv. This was observed with the perf stat shell test.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexandre Torgue <alexandre.torgue@foss.st.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Eduard Zingerman <eddyz87@gmail.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Maxime Coquelin <mcoquelin.stm32@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: linux-stm32@st-md-mailman.stormreply.com
Link: https://lore.kernel.org/r/20230219092848.639226-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-02-19 08:00:37 -03:00
Ian Rogers
22e06e6825 perf buildid: Avoid copy of uninitialized memory
build_id__init() only copies the buildid data up to size leaving the
rest of the data array uninitialized. Copying the full array during
synthesis means the written event contains uninitialized memory.

Ensure the size is less that the buffer size and only copy the bytes
that were initialized. This was detected by the Clang/LLVM memory
sanitizer.

v2. Avoids the potential for copying too much as suggested by Arnaldo.

Suggested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230120185828.43231-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2023-01-27 15:00:35 -03:00
Ian Rogers
378ef0f5d9 perf build: Use libtraceevent from the system
Remove the LIBTRACEEVENT_DYNAMIC and LIBTRACEFS_DYNAMIC make command
line variables.

If libtraceevent isn't installed or NO_LIBTRACEEVENT=1 is passed to the
build, don't compile in libtraceevent and libtracefs support.

This also disables CONFIG_TRACE that controls "perf trace".

CONFIG_LIBTRACEEVENT is used to control enablement in Build/Makefiles,
HAVE_LIBTRACEEVENT is used in C code.

Without HAVE_LIBTRACEEVENT tracepoints are disabled and as such the
commands kmem, kwork, lock, sched and timechart are removed.  The
majority of commands continue to work including "perf test".

Committer notes:

Fixed up a tools/perf/util/Build reject and added:

  #include <traceevent/event-parse.h>

to tools/perf/util/scripting-engines/trace-event-perl.c.

Committer testing:

  $ rpm -qi libtraceevent-devel
  Name        : libtraceevent-devel
  Version     : 1.5.3
  Release     : 2.fc36
  Architecture: x86_64
  Install Date: Mon 25 Jul 2022 03:20:19 PM -03
  Group       : Unspecified
  Size        : 27728
  License     : LGPLv2+ and GPLv2+
  Signature   : RSA/SHA256, Fri 15 Apr 2022 02:11:58 PM -03, Key ID 999f7cbf38ab71f4
  Source RPM  : libtraceevent-1.5.3-2.fc36.src.rpm
  Build Date  : Fri 15 Apr 2022 10:57:01 AM -03
  Build Host  : buildvm-x86-05.iad2.fedoraproject.org
  Packager    : Fedora Project
  Vendor      : Fedora Project
  URL         : https://git.kernel.org/pub/scm/libs/libtrace/libtraceevent.git/
  Bug URL     : https://bugz.fedoraproject.org/libtraceevent
  Summary     : Development headers of libtraceevent
  Description :
  Development headers of libtraceevent-libs
  $

Default build:

  $ ldd ~/bin/perf | grep tracee
  	libtraceevent.so.1 => /lib64/libtraceevent.so.1 (0x00007f1dcaf8f000)
  $

  # perf trace -e sched:* --max-events 10
       0.000 migration/0/17 sched:sched_migrate_task(comm: "", pid: 1603763 (perf), prio: 120, dest_cpu: 1)
       0.005 migration/0/17 sched:sched_wake_idle_without_ipi(cpu: 1)
       0.011 migration/0/17 sched:sched_switch(prev_comm: "", prev_pid: 17 (migration/0), prev_state: 1, next_comm: "", next_prio: 120)
       1.173 :0/0 sched:sched_wakeup(comm: "", pid: 3138 (gnome-terminal-), prio: 120)
       1.180 :0/0 sched:sched_switch(prev_comm: "", prev_prio: 120, next_comm: "", next_pid: 3138 (gnome-terminal-), next_prio: 120)
       0.156 migration/1/21 sched:sched_migrate_task(comm: "", pid: 1603763 (perf), prio: 120, orig_cpu: 1, dest_cpu: 2)
       0.160 migration/1/21 sched:sched_wake_idle_without_ipi(cpu: 2)
       0.166 migration/1/21 sched:sched_switch(prev_comm: "", prev_pid: 21 (migration/1), prev_state: 1, next_comm: "", next_prio: 120)
       1.183 :0/0 sched:sched_wakeup(comm: "", pid: 1602985 (kworker/u16:0-f), prio: 120, target_cpu: 1)
       1.186 :0/0 sched:sched_switch(prev_comm: "", prev_prio: 120, next_comm: "", next_pid: 1602985 (kworker/u16:0-f), next_prio: 120)
  #

Had to tweak tools/perf/util/setup.py to make sure the python binding
shared object links with libtraceevent if -DHAVE_LIBTRACEEVENT is
present in CFLAGS.

Building with NO_LIBTRACEEVENT=1 uncovered some more build failures:

- Make building of data-convert-bt.c to CONFIG_LIBTRACEEVENT=y

- perf-$(CONFIG_LIBTRACEEVENT) += scripts/

- bpf_kwork.o needs also to be dependent on CONFIG_LIBTRACEEVENT=y

- The python binding needed some fixups and util/trace-event.c can't be
  built and linked with the python binding shared object, so remove it
  in tools/perf/util/setup.py and exclude it from the list of
  dependencies in the python/perf.so Makefile.perf target.

Building without libtraceevent-devel installed uncovered more build
failures:

- The python binding tools/perf/util/python.c was assuming that
  traceevent/parse-events.h was always available, which was the case
  when we defaulted to using the in-kernel tools/lib/traceevent/ files,
  now we need to enclose it under ifdef HAVE_LIBTRACEEVENT, just like
  the other parts of it that deal with tracepoints.

- We have to ifdef the rules in the Build files with
  CONFIG_LIBTRACEEVENT=y to build builtin-trace.c and
  tools/perf/trace/beauty/ as we only ifdef setting CONFIG_TRACE=y when
  setting NO_LIBTRACEEVENT=1 in the make command line, not when we don't
  detect libtraceevent-devel installed in the system. Simplification here
  to avoid these two ways of disabling builtin-trace.c and not having
  CONFIG_TRACE=y when libtraceevent-devel isn't installed is the clean
  way.

From Athira:

<quote>
tools/perf/arch/powerpc/util/Build
-perf-y += kvm-stat.o
+perf-$(CONFIG_LIBTRACEEVENT) += kvm-stat.o
</quote>

Then, ditto for arm64 and s390, detected by container cross build tests.

- s/390 uses test__checkevent_tracepoint() that is now only available if
  HAVE_LIBTRACEEVENT is defined, enclose the callsite with ifder HAVE_LIBTRACEEVENT.

Also from Athira:

<quote>
With this change, I could successfully compile in these environment:
- Without libtraceevent-devel installed
- With libtraceevent-devel installed
- With “make NO_LIBTRACEEVENT=1”
</quote>

Then, finally rename CONFIG_TRACEEVENT to CONFIG_LIBTRACEEVENT for
consistency with other libraries detected in tools/perf/.

Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Tested-by: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/20221205225940.3079667-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-12-14 11:16:12 -03:00
Namhyung Kim
19030564ab perf inject: Set PERF_RECORD_MISC_BUILD_ID_SIZE
With perf inject -b, it synthesizes build-id event for DSOs.  But it
missed to set the size and resulted in having trailing zeros.

As perf record sets the size in write_build_id(), let's set the size
here as well.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20221119002750.1568027-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-11-23 10:34:40 -03:00
Namhyung Kim
fd643afc8f perf record: Save DSO build-ID for synthesizing
When synthesizing MMAP2 with build-id, it'd read the same file repeatedly as
it has no idea if it's done already.  Maintain a dsos to check that and skip
the file access if possible.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220920222822.2171056-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-06 11:12:14 -03:00
Ian Rogers
c7202d20fb perf cpumap: Add range data encoding
Often cpumaps encode a range of all CPUs, add a compact encoding that
doesn't require a bit mask or list of all CPUs.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220614143353.1559597-7-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04 08:55:21 -03:00
Ian Rogers
d773c999b8 perf events: Prefer union over variable length array
It is possible for casts to introduce alignment issues, prefer a union
for perf_record_event_update.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220614143353.1559597-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-10-04 08:55:21 -03:00
Namhyung Kim
999e4eaa4b perf tools: Honor namespace when synthesizing build-ids
It needs to enter the namespace before reading a file.

Fixes: 4183a8d70a ("perf tools: Allow synthesizing the build id for kernel/modules/tasks in PERF_RECORD_MMAP2")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220920222822.2171056-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-09-21 16:08:00 -03:00
Namhyung Kim
f52679b788 perf tools: Support reading PERF_FORMAT_LOST
The recent kernel added lost count can be read from either read(2) or
ring buffer data with PERF_SAMPLE_READ.  As it's a variable length data
we need to access it according to the format info.

But for perf tools use cases, PERF_FORMAT_ID is always set.  So we can
only check PERF_FORMAT_LOST bit to determine the data format.

Add sample_read_value_size() and next_sample_read_value() helpers to
make it a bit easier to access.  Use them in all places where it reads
the struct sample_read_value.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20220819003644.508916-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-19 15:56:56 -03:00
Ian Rogers
b2f10cd4e8 perf cpumap: Fix alignment for masks in event encoding
A mask encoding of a cpu map is laid out as:

  u16 nr
  u16 long_size
  unsigned long mask[];

However, the mask may be 8-byte aligned meaning there is a 4-byte pad
after long_size. This means 32-bit and 64-bit builds see the mask as
being at different offsets. On top of this the structure is in the byte
data[] encoded as:

  u16 type
  char data[]

This means the mask's struct isn't the required 4 or 8 byte aligned, but
is offset by 2. Consequently the long reads and writes are causing
undefined behavior as the alignment is broken.

Fix the mask struct by creating explicit 32 and 64-bit variants, use a
union to avoid data[] and casts; the struct must be packed so the
layout matches the existing perf.data layout. Taking an address of a
member of a packed struct breaks alignment so pass the packed
perf_record_cpu_map_data to functions, so they can access variables with
the right alignment.

As the 64-bit version has 4 bytes of padding, optimizing writing to only
write the 32-bit version.

Committer notes:

Disable warnings about 'packed' that break the build in some arches like
riscv64, but just around that specific struct.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220614143353.1559597-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-19 15:30:28 -03:00
Ian Rogers
28526478cc perf cpumap: Compute mask size in constant time
perf_cpu_map__max() computes the cpumap's maximum value, no need to
iterate over all values.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220614143353.1559597-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-19 12:26:58 -03:00
Ian Rogers
35ae6f09d8 perf cpumap: Synthetic events and const/static
Make the cpumap arguments const to make it clearer they are in rather
than out arguments. Make two functions static and remove external
declarations.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Colin Ian King <colin.king@intel.com>
Cc: Dave Marchevsky <davemarchevsky@fb.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Kees Kook <keescook@chromium.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20220614143353.1559597-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-08-19 12:26:58 -03:00
Adrian Hunter
b47bb18661 perf tools: Add machine_pid and vcpu to id_index
When injecting events from a guest perf.data file, the events will have
separate sample ID numbers. These ID numbers can then be used to determine
which machine an event belongs to. To facilitate that, add machine_pid and
vcpu to id_index records. For backward compatibility, these are added at
the end of the record, and the length of the record is used to determine
if they are present or not.

Note, this is needed because the events from a guest perf.data file contain
the pid/tid of the process running at that time inside the VM not the
pid/tid of the (QEMU) hypervisor thread. So a way is needed to relate
guest events back to the guest machine and VCPU, and using sample ID
numbers for that is relatively simple and convenient.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20220711093218.10967-11-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-20 11:07:58 -03:00
Adrian Hunter
1ee94463e9 perf tools: Add perf_event__synthesize_id_sample()
Add perf_event__synthesize_id_sample() to enable the synthesis of
ID samples.

This is needed by perf inject. When injecting events from a guest perf.data
file, there is a possibility that the sample ID numbers conflict. In that
case, perf_event__synthesize_id_sample() can be used to re-write the ID
sample.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kvm@vger.kernel.org
Link: https://lore.kernel.org/r/20220711093218.10967-7-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-20 11:07:48 -03:00
Arnaldo Carvalho de Melo
0698461ad2 Merge remote-tracking branch 'torvalds/master' into perf/core
To update the perf/core codebase.

Fix conflict by moving arch__post_evsel_config(evsel, attr) to the end
of evsel__config(), after what was added in:

  49c692b7df ("perf offcpu: Accept allowed sample types only")

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-18 10:36:11 -03:00
Namhyung Kim
ff898552fb perf synthetic-events: Ignore dead threads during event synthesis
When it synthesize various task events, it scans the list of task
first and then accesses later.  There's a window threads can die
between the two and proc entries may not be available.

Instead of bailing out, we can ignore that thread and move on.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220701205458.985106-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-02 09:22:26 -03:00
Namhyung Kim
363afa3aef perf synthetic-events: Don't sort the task scan result from /proc
It should not sort the result as procfs already returns a proper
ordering of tasks.  Actually sorting the order caused problems that it
doesn't guararantee to process the main thread first.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20220701205458.985106-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-07-02 09:21:41 -03:00
Adrian Hunter
6b080312fc perf record: Always record id index
In preparation for recording sideband events in a virtual machine guest so
that they can be injected into a host perf.data file.

Adjust the logic so that if there are IDs then the id index is recorded.

Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20220610113316.6682-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-06-23 11:54:22 -03:00
Ian Rogers
0df6ade711 perf evlist: Rename cpus to user_requested_cpus
evlist contains cpus and all_cpus. all_cpus is the union of the cpu maps
of all evsels.

For non-task targets, cpus is set to be cpus requested from the command
line, defaulting to all online cpus if no cpus are specified.

For an uncore event, all_cpus may be just CPU 0 or every online CPU.

This causes all_cpus to have fewer values than the cpus variable which
is confusing given the 'all' in the name.

To try to make the behavior clearer, rename cpus to user_requested_cpus
and add comments on the two struct variables.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Antonov <alexander.antonov@linux.intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: John Garry <john.garry@huawei.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yonghong Song <yhs@fb.com>
Cc: bpf@vger.kernel.org
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: netdev@vger.kernel.org
Link: http://lore.kernel.org/lkml/20220328232648.2127340-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-04-01 16:19:35 -03:00
Leo Yan
bc9c806e52 perf synthetic-events: Return error if procfs isn't mounted for PID namespaces
For perf recording, it retrieves process info by iterating nodes in proc
fs.  If we run perf in a non-root PID namespace with command:

  # unshare --fork --pid perf record -e cycles -a -- test_program

... in this case, unshare command creates a child PID namespace and
launches perf tool in it, but the issue is the proc fs is not mounted
for the non-root PID namespace, this leads to the perf tool gathering
process info from its parent PID namespace.

We can use below command to observe the process nodes under proc fs:

  # unshare --pid --fork ls /proc
1    137   1968  2128  3    342  48  62   78	     crypto	  kcore        net	      uptime
10   138   2	 2142  30   35	 49  63   8	     devices	  keys	       pagetypeinfo   version
11   139   20	 2143  304  36	 50  64   82	     device-tree  key-users    partitions     vmallocinfo
12   14    2011  22    305  37	 51  65   83	     diskstats	  kmsg	       self	      vmstat
128  140   2038  23    307  39	 52  656  84	     driver	  kpagecgroup  slabinfo       zoneinfo
129  15    2074  24    309  4	 53  67   9	     execdomains  kpagecount   softirqs
13   16    2094  241   31   40	 54  68   asound     fb		  kpageflags   stat
130  164   2096  242   310  41	 55  69   buddyinfo  filesystems  loadavg      swaps
131  17    2098  25    317  42	 56  70   bus	     fs		  locks        sys
132  175   21	 26    32   43	 57  71   cgroups    interrupts   meminfo      sysrq-trigger
133  179   2102  263   329  44	 58  75   cmdline    iomem	  misc	       sysvipc
134  1875  2103  27    330  45	 59  76   config.gz  ioports	  modules      thread-self
135  19    2117  29    333  46	 6   77   consoles   irq	  mounts       timer_list
136  1941  2121  298   34   47	 60  773  cpuinfo    kallsyms	  mtd	       tty

So it shows many existed tasks, since unshared command has not mounted
the proc fs for the new created PID namespace, it still accesses the
proc fs of the root PID namespace.  This leads to two prominent issues:

- Firstly, PID values are mismatched between thread info and samples.
  The gathered thread info are coming from the proc fs of the root PID
  namespace, but samples record its PID from the child PID namespace.

- The second issue is profiled program 'test_program' returns its forked
  PID number from the child PID namespace, perf tool wrongly uses this
  PID number to retrieve the process info via the proc fs of the root
  PID namespace.

To avoid issues, we need to mount proc fs for the child PID namespace
with the option '--mount-proc' when use unshare command:

  # unshare --fork --pid --mount-proc perf record -e cycles -a -- test_program

Conversely, when the proc fs of the root PID namespace is used by child
namespace, perf tool can detect the multiple PID levels and
nsinfo__is_in_root_namespace() returns false, this patch reports error
for this case:

  # unshare --fork --pid perf record -e cycles -a -- test_program
  Couldn't synthesize bpf events.
  Perf runs in non-root PID namespace but it tries to gather process info from its parent PID namespace.
  Please mount the proc file system properly, e.g. add the option '--mount-proc' for unshare command.

Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Link: https://lore.kernel.org/r/20211224124014.2492751-1-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-02-06 09:03:06 -03:00
Ian Rogers
4402869939 perf cpumap: Migrate to libperf cpumap api
Switch from directly accessing the perf_cpu_map to using the appropriate
libperf API when possible. Using the API simplifies the job of
refactoring use of perf_cpu_map.

Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: André Almeida <andrealmeid@collabora.com>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: http://lore.kernel.org/lkml/20220122045811.3402706-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-22 17:08:42 -03:00
Ian Rogers
6d18804b96 perf cpumap: Give CPUs their own type
A common problem is confusing CPU map indices with the CPU, by wrapping
the CPU with a struct then this is avoided. This approach is similar to
atomic_t.

Committer notes:

To make it build with BUILD_BPF_SKEL=1 these files needed the
conversions to 'struct perf_cpu' usage:

  tools/perf/util/bpf_counter.c
  tools/perf/util/bpf_counter_cgroup.c
  tools/perf/util/bpf_ftrace.c

Also perf_env__get_cpu() was removed back in "perf cpumap: Switch
cpu_map__build_map to cpu function".

Additionally these needed to be fixed for the ARM builds to complete:

  tools/perf/arch/arm/util/cs-etm.c
  tools/perf/arch/arm64/util/pmu.c

Suggested-by: John Garry <john.garry@huawei.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Paul Clarke <pc@us.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Vineet Singh <vineet.singh@intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Cc: zhengjun.xing@intel.com
Link: https://lore.kernel.org/r/20220105061351.120843-49-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2022-01-12 14:28:23 -03:00
Namhyung Kim
41b740b6e8 perf record: Add --synth option
Add an option to control the synthesizing behavior.

    --synth <no|all|task|mmap|cgroup>
                      Fine-tune event synthesis: default=all

This can be useful when we know it doesn't need some synthesis like
in a specific usecase and/or when using pipe:

  $ perf record -a --all-cgroups --synth cgroup -o- sleep 1 | \
  > perf report -i- -s cgroup

Committer notes:

Added a clarification to the man page entry for --synth that this is
about pre-existing threads.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https //lore.kernel.org/r/20210811044658.1313391-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-17 08:55:00 -03:00
Namhyung Kim
84111b9c95 perf tools: Allow controlling synthesizing PERF_RECORD_ metadata events during record
Depending on the use case, it might require some kind of synthesizing
and some not.  Make it controllable to turn off heavy operations like
MMAP for all tasks.

Currently all users are converted to enable all the synthesis by
default.  It'll be updated in the later patch.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https //lore.kernel.org/r/20210811044658.1313391-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-09-17 08:44:19 -03:00
Namhyung Kim
c3a057dc3a perf inject: Fix output from a file to a pipe
When the input is a regular file but the output is a pipe, it should
write a pipe header.  But just repiping would write a portion of the
existing header which is different in 'size' value.  So we need to
prevent it and write a new pipe header along with other information
like event attributes and features.

This can handle something like this:

  # perf record -a -B sleep 1

  # perf inject -b -i perf.data | perf report -i -

Factor out perf_event__synthesize_for_pipe() to be shared between perf
record and inject.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/20210719223153.1618812-5-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-08-02 10:14:34 -03:00
Arnaldo Carvalho de Melo
b0a752d43b Merge remote-tracking branch 'torvalds/master' into perf/core
To pick up fixes sent via perf/urgent and in the BPF tools/ directories.

Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-29 10:39:10 -03:00
Ingo Molnar
4d39c89f0b perf tools: Fix various typos in comments
Fix ~124 single-word typos and a few spelling errors in the perf tooling code,
accumulated over the years.

Signed-off-by: Ingo Molnar <mingo@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20210321113734.GA248990@gmail.com
Link: http://lore.kernel.org/lkml/20210323160915.GA61903@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-23 17:13:43 -03:00
Ian Rogers
2a76f6de07 perf synthetic events: Avoid write of uninitialized memory when generating PERF_RECORD_MMAP* records
Account for alignment bytes in the zero-ing memset.

Fixes: 1a853e3687 ("perf record: Allow specifying a pid to record")
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210309234945.419254-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-10 09:20:59 -03:00
Thomas Richter
c3d59cfde9 perf synthetic-events: Fix uninitialized 'kernel_thread' variable
perf build fails on 5.12.0rc2 on s390 with this error message:

util/synthetic-events.c: In function
				‘__event__synthesize_thread.part.0.isra’:
util/synthetic-events.c:787:19: error: ‘kernel_thread’ may be
    used uninitialized in this function [-Werror=maybe-uninitialized]
    787 |   if (_pid == pid && !kernel_thread) {
        |       ~~~~~~~~~~~~^~~~~~~~~~~~~~~~~

The build succeeds using command 'make DEBUG=y'.

The variable kernel_thread is set by this function sequence:

__event__synthesize_thread()
|    defines bool kernel_thread; as local variable and calls
+--> perf_event__prepare_comm(..., &kernel_thread)
     +--> perf_event__get_comm_ids(..., bool *kernel);
          On return of this function variable kernel is always
          set to true or false.

To prevent this compile error, assign variable kernel_thread
a value when it is defined.

Output after:

  [root@m35lp76 perf]# make  util/synthetic-events.o
  ....
   CC       util/synthetic-events.o
  [root@m35lp76 perf]#

Fixes: c1b907953b ("perf tools: Skip PERF_RECORD_MMAP event synthesis for kernel threads")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Sven Schnelle <svens@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Link: http://lore.kernel.org/lkml/20210309110447.834292-1-tmricht@linux.ibm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-03-10 09:16:47 -03:00
Kan Liang
fbefe9c2f8 perf tools: Support arch specific PERF_SAMPLE_WEIGHT_STRUCT processing
For X86, the var2_w field of PERF_SAMPLE_WEIGHT_STRUCT stands for the
instruction latency. Current perf forces the var2_w to the data->ins_lat
in the generic code. It works well for now because X86 is the only
architecture that supports the PERF_SAMPLE_WEIGHT_STRUCT, but it may
bring problems once other architectures support the sample type.  For
example, the var2_w may be used to capture something else on PowerPC.

Create two architecture specific functions to parse and synthesize the
weight related samples. Move the X86 specific codes to the X86 version
functions. Other architectures can implement their own functions later
separately.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Jajeev <atrajeev@linux.vnet.ibm.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: http://lore.kernel.org/lkml/1612540912-6562-1-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-18 16:07:06 -03:00
Ian Rogers
e73b0d586e perf env: Remove unneeded internal/cpumap inclusions
Minor cleanup.

Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20210211183914.4093187-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-12 17:35:48 -03:00
Kan Liang
590db42de0 perf report: Support instruction latency
The instruction latency information can be recorded on some platforms,
e.g., the Intel Sapphire Rapids server. With both memory latency
(weight) and the new instruction latency information, users can easily
locate the expensive load instructions, and also understand the time
spent in different stages. The users can optimize their applications in
different pipeline stages.

The 'weight' field is shared among different architectures. Reusing the
'weight' field may impacts other architectures. Add a new field to store
the instruction latency.

Like the 'weight' support, introduce a 'ins_lat' for the global
instruction latency, and a 'local_ins_lat' for the local instruction
latency version.

Add new sort functions, INSTR Latency and Local INSTR Latency,
accordingly.

Add local_ins_lat to the default_mem_sort_order[].

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-7-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Kan Liang
ea8d0ed6ea perf tools: Support PERF_SAMPLE_WEIGHT_STRUCT
The new sample type, PERF_SAMPLE_WEIGHT_STRUCT, is an alternative of the
PERF_SAMPLE_WEIGHT sample type. Users can apply either the
PERF_SAMPLE_WEIGHT sample type or the PERF_SAMPLE_WEIGHT_STRUCT sample
type to retrieve the sample weight, but they cannot apply both sample
types simultaneously.

The new sample type shares the same space as the PERF_SAMPLE_WEIGHT
sample type. The lower 32 bits are exactly the same for both sample
type. The higher 32 bits may be different for different architecture.

Add arch specific arch_evsel__set_sample_weight() to set the new sample
type for X86. Only store the lower 32 bits for the sample->weight if the
new sample type is applied. In practice, no memory access could last
than 4G cycles. No data will be lost.

If the kernel doesn't support the new sample type. Fall back to the
PERF_SAMPLE_WEIGHT sample type.

There is no impact for other architectures.

Committer notes:

Fixup related to PERF_SAMPLE_CODE_PAGE_SIZE, present in acme/perf/core
but not upstream yet.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jin Yao <yao.jin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/1612296553-21962-6-git-send-email-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-08 16:25:00 -03:00
Namhyung Kim
473f742e18 perf tools: Use scandir() to iterate threads when synthesizing PERF_RECORD_ events
Like in __event__synthesize_thread(), I think it's better to use
scandir() instead of the readdir() loop.  In case some malicious task
continues to create new threads, the readdir() loop will run over and
over to collect tids.  The scandir() also has the problem but the window
is much smaller since it doesn't do much work during the iteration.

Also add filter_task() function as we only care the tasks.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210202090118.2008551-4-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:44 -03:00
Namhyung Kim
c1b907953b perf tools: Skip PERF_RECORD_MMAP event synthesis for kernel threads
To synthesize information to resolve sample IPs, it needs to scan task
and mmap info from the /proc filesystem.  For each process, it opens
(and reads) status and maps file respectively.  But as kernel threads
don't have memory maps so we can skip the maps file.

To find kernel threads, check "VmPeak:" line in /proc/<PID>/status file.
It's about the peak virtual memory usage so only user-level tasks have
that.  Note that it's possible to miss the line due to partial reads.
So we should double-check if it's a really kernel thread when there's no
VmPeak line.

Thus check "Threads:" line (which follows the VmPeak line whether or not
it exists) to be sure it's read enough data - just in case of deeply
nested pid namespaces or large number of supplementary groups are
involved.

This is for user process:

  $ head -40 /proc/1/status
  Name:	systemd
  Umask:	0000
  State:	S (sleeping)
  Tgid:	1
  Ngid:	0
  Pid:	1
  PPid:	0
  TracerPid:	0
  Uid:	0	0	0	0
  Gid:	0	0	0	0
  FDSize:	256
  Groups:
  NStgid:	1
  NSpid:	1
  NSpgid:	1
  NSsid:	1
  VmPeak:	  234192 kB           <-- here
  VmSize:	  169964 kB
  VmLck:	       0 kB
  VmPin:	       0 kB
  VmHWM:	   29528 kB
  VmRSS:	    6104 kB
  RssAnon:	    2756 kB
  RssFile:	    3348 kB
  RssShmem:	       0 kB
  VmData:	   19776 kB
  VmStk:	    1036 kB
  VmExe:	     784 kB
  VmLib:	    9532 kB
  VmPTE:	     116 kB
  VmSwap:	    2400 kB
  HugetlbPages:	       0 kB
  CoreDumping:	0
  THP_enabled:	1
  Threads:	1                     <-- and here
  SigQ:	1/62808
  SigPnd:	0000000000000000
  ShdPnd:	0000000000000000
  SigBlk:	7be3c0fe28014a03
  SigIgn:	0000000000001000

And this is for kernel thread:

  $ head -20 /proc/2/status
  Name:	kthreadd
  Umask:	0000
  State:	S (sleeping)
  Tgid:	2
  Ngid:	0
  Pid:	2
  PPid:	0
  TracerPid:	0
  Uid:	0	0	0	0
  Gid:	0	0	0	0
  FDSize:	64
  Groups:
  NStgid:	2
  NSpid:	2
  NSpgid:	0
  NSsid:	0
  Threads:	1                     <-- here
  SigQ:	1/62808
  SigPnd:	0000000000000000
  ShdPnd:	0000000000000000

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210202090118.2008551-3-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:44 -03:00
Namhyung Kim
30626e0844 perf tools: Use /proc/<PID>/task/<TID>/status for PERF_RECORD_ event synthesis
To save memory usage, it needs to reduce the number of entries in the
proc filesystem.  It's using /proc/<PID>/task directory to traverse
threads in the process and then kernel creates /proc/<PID>/task/<TID>
entries.

After that it checks the thread info using the /proc/<TID>/status file
rather than /proc/<PID>/task/<TID>/status.  As far as I can see, they
are the same and contain all the info we need.

Using the latter eliminates the unnecessary /proc/<TID> entry.  This can
be useful especially a large number of threads are used in the system.
In my experiment around 1KB of memory on average was saved for each
thread (which is not a thread group leader).

To do this, pass both pid and tid to perf_event_prepare_comm() if it
knows them.  In case it doesn't know, passing 0 as pid will do the old
way.

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20210202090118.2008551-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-02-03 13:10:44 -03:00
Kan Liang
c1de7f3d84 perf record: Add support for PERF_SAMPLE_CODE_PAGE_SIZE
Adds the infrastructure to sample the code address page size.

Introduce a new --code-page-size option for perf record.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Originally-by: Stephane Eranian <eranian@google.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/r/20210105195752.43489-4-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2021-01-20 14:34:20 -03:00
Jiri Olsa
4183a8d70a perf tools: Allow synthesizing the build id for kernel/modules/tasks in PERF_RECORD_MMAP2
Adding build id to synthesized mmap2 events for everything -
kernel/modules/tasks, when symbol_conf.buildid_mmap2 is true.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-11-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 10:25:33 -03:00
Jiri Olsa
e0dbf18f65 perf tools: Allow using PERF_RECORD_MMAP2 to synthesize the kernel modules maps
Allow using PERF_RECORD_MMAP2 to synthesize the kernel modules maps so
that we can use PERF_RECORD_MMAP2 to encode the kernel modules build ids
in the following csets.

It's enabled by a new symbol_conf.buildid_mmap2 bool field, which will
be switchable in following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-10-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 10:25:23 -03:00
Jiri Olsa
978410ff99 perf tools: Allow using PERF_RECORD_MMAP2 to synthesize the kernel map
Allow using PERF_RECORD_MMAP2 to synthesize the kernel map so that we
can use PERF_RECORD_MMAP2 to encode the kernel build id in the following
csets.

It's enabled by a new symbol_conf.buildid_mmap2 bool field, which will
be switchable in following changes.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexei Budankov <abudankov@huawei.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Petlan <mpetlan@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <songliubraving@fb.com>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201214105457.543111-9-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-28 10:22:55 -03:00
Kan Liang
542b88fd12 perf record: Support new sample type for data page size
Support new sample type PERF_SAMPLE_DATA_PAGE_SIZE for page size.

Add new option --data-page-size to record sample data page size.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Michael Ellerman <mpe@ellerman.id.au>
Cc: Stephane Eranian <eranian@google.com>
Cc: Will Deacon <will@kernel.org>
Link: http://lore.kernel.org/lkml/20201130172803.2676-3-kan.liang@linux.intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-12-17 14:36:16 -03:00
Arnaldo Carvalho de Melo
3ccf8a7b66 perf evlist: Use the right prefix for 'struct evlist' sample id lookup methods
perf_evlist__ is for 'struct perf_evlist' methods, in tools/lib/perf/,
go on completing this split.

Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-30 14:17:57 -03:00
Namhyung Kim
aa50d953c1 perf record: Synthesize cgroup events only if needed
It didn't check the tool->cgroup_events bit which is set when the
--all-cgroups option is given.  Without it, samples will not have cgroup
info so no reason to synthesize.

We can check the PERF_RECORD_CGROUP records after running perf record
*WITHOUT* the --all-cgroups option:

Before:

  $ perf report -D | grep CGROUP
  0 0 0x8430 [0x38]: PERF_RECORD_CGROUP cgroup: 1 /
          CGROUP events:          1
          CGROUP events:          0
          CGROUP events:          0

After:

  $ perf report -D | grep CGROUP
          CGROUP events:          0
          CGROUP events:          0
          CGROUP events:          0

Committer testing:

Before:

  # perf record -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.208 MB perf.data (10003 samples) ]
  # perf report -D | grep "CGROUP events"
            CGROUP events:        146
            CGROUP events:          0
            CGROUP events:          0
  #

After:

  # perf record -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.208 MB perf.data (10448 samples) ]
  # perf report -D | grep "CGROUP events"
            CGROUP events:          0
            CGROUP events:          0
            CGROUP events:          0
  #

With all-cgroups:

  # perf record --all-cgroups -a sleep 1
  [ perf record: Woken up 1 times to write data ]
  [ perf record: Captured and wrote 2.374 MB perf.data (11526 samples) ]
  # perf report -D | grep "CGROUP events"
            CGROUP events:        146
            CGROUP events:          0
            CGROUP events:          0
  #

Fixes: 8fb4b67939 ("perf record: Add --all-cgroups option")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Jiri Olsa <jolsa@redhat.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: http://lore.kernel.org/lkml/20201127054356.405481-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-11-27 14:26:33 -03:00
Jiri Olsa
0aba7f036a perf tools: Use build_id object in dso
Replace build_id byte array with struct build_id object and all the code
that references it.

The objective is to carry size together with build id array, so it's
better to keep both together.

This is preparatory change for following patches, and there's no
functional change.

Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20201013192441.1299447-2-jolsa@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-10-14 08:44:47 -03:00
Leo Yan
03fca3af51 perf tsc: Move out common functions from x86
Functions perf_read_tsc_conversion() and perf_event__synth_time_conv()
should work as common functions rather than x86 specific, so move these
two functions out from arch/x86 folder and place them into util/tsc.c.

Since the function perf_event__synth_time_conv() will be linked in
util/tsc.c, remove its weak version.

Committer testing:

Before/after:

  # perf test tsc
  70: Convert perf time to TSC                                        : Ok
  #
  # perf test -v tsc
  70: Convert perf time to TSC                                        :
  --- start ---
  test child forked, pid 8520
  mmap size 528384B
  1st event perf time 592110439891 tsc 2317172044331
  rdtsc          time 592110441915 tsc 2317172052010
  2nd event perf time 592110442336 tsc 2317172053605
  test child finished with 0
  ---- end ----
  Convert perf time to TSC: Ok
  #

Signed-off-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Gustavo A. R. Silva <gustavoars@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: John Garry <john.garry@huawei.com>
Cc: Kemeng Shi <shikemeng@huawei.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nick Gasson <nick.gasson@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Remi Bernon <rbernon@codeweavers.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Steve Maclean <steve.maclean@microsoft.com>
Cc: Will Deacon <will@kernel.org>
Cc: Zou Wei <zou_wei@huawei.com>
Link: http://lore.kernel.org/lkml/20200914115311.2201-2-leo.yan@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-09-22 13:38:33 -03:00
Zou Wei
8284bbeab7 perf tools: Remove unneeded semicolons
Fixes coccicheck warnings:

  tools/perf/builtin-diff.c:1565:2-3: Unneeded semicolon
  tools/perf/builtin-lock.c:778:2-3: Unneeded semicolon
  tools/perf/builtin-mem.c:126:2-3: Unneeded semicolon
  tools/perf/util/intel-pt-decoder/intel-pt-pkt-decoder.c:555:2-3: Unneeded semicolon
  tools/perf/util/ordered-events.c:317:2-3: Unneeded semicolon
  tools/perf/util/synthetic-events.c:1131:2-3: Unneeded semicolon
  tools/perf/util/trace-event-read.c:78:2-3: Unneeded semicolon

Reported-by: Hulk Robot <hulkci@huawei.com>
Signed-off-by: Zou Wei <zou_wei@huawei.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: http://lore.kernel.org/lkml/1588065523-71423-1-git-send-email-zou_wei@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
2020-04-30 10:48:32 -03:00