Cache is_pmu_core in the pmu to avoid recomputation.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230527072210.2900565-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
is_arm_pmu_core detects a core PMU via the presence of a "cpus" file
rather than a "cpumask" file. This pattern holds for hybrid PMUs so
rename the function and remove redundant perf_pmu__is_hybrid
tests.
Add a new helper is_pmu_hybrid similar to is_pmu_core.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ali Saidi <alisaidi@amazon.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Dmitrii Dolgov <9erthalion6@gmail.com>
Cc: Huacai Chen <chenhuacai@kernel.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Ming Wang <wangming01@loongson.cn>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230527072210.2900565-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'struct evsel' uses a union for the two lists. This turned out to be
error prone.
For example:
[root@quaco ~]# perf stat --bpf-prog 5246
Error: cpu-clock event does not have sample flags 66c660
failed to set filter "(null)" on event cpu-clock with 2 (No such file or directory)
[root@quaco ~]# perf stat --bpf-prog 5246
Fix this issue by separating the two lists.
Fixes: 56ec9457a4 ("perf bpf filter: Implement event sample filtering")
Signed-off-by: Song Liu <song@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: kernel-team@meta.com
Link: https://lore.kernel.org/r/20230519235757.3613719-1-song@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Copy the kernel version of the header to fix the header diff build
warning. Some new definitions were only added to the tools side header,
but these are only used in Perf so move them to a different header.
Signed-off-by: James Clark <james.clark@arm.com>
Reviewed-by: Mike Leach <mike.leach@linaro.org>
Link: https://lore.kernel.org/r/20230522102604.1081416-1-james.clark@arm.com
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Suzuki K Poulose <suzuki.poulose@arm.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: acme@kernel.org
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Will Deacon <will@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: coresight@lists.linaro.org
Cc: siyanteng@loongson.cn
Cc: John Garry <john.g.garry@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: linux-kernel@vger.kernel.org
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
demangle-cxx.cpp requires a C++ compiler, but feature checks may fail
because of the absence of this. Add a CONFIG_CXX_DEMANGLE so that the
source isn't built if not supported. Copy libbfd and cplus demangle
variants to a weak symbol-elf.c version so they aren't dependent on
C++. These variants are only built with the build option
BUILD_NONDISTRO=1.
Committer note:
This also handles this build break when a C++ compiler isn't available:
CXX /tmp/build/perf/util/demangle-cxx.o
/bin/sh: g++: command not found
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Qi Liu <liuqi115@huawei.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Link: https://lore.kernel.org/r/20230417192546.99923-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
BPF CO-RE requires 3 underscores for the ignored suffix rule but it
mistakenly used only 2. Let's fix it.
Fixes: 3a8b8fc317 ("perf bpf filter: Support pre-5.16 kernels where 'mem_hops' isn't in 'union perf_mem_data_src'")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Andrii Nakryiko <andrii@kernel.org>
Acked-by: John Fastabend <john.fastabend@gmail.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230525000307.3202449-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When aggregating based on cache-topology, in addition to the aggregation
mode, knowing the cache level at which data is aggregated is necessary
to ensure consistency when running 'perf stat record' and later 'perf
stat report'.
Save the cache level for aggregation as a part of the env data that can
be later retrieved when running perf stat report.
Suggested-by: Gautham Shenoy <gautham.shenoy@amd.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wen Pu <puwen@hygon.cn>
Link: https://lore.kernel.org/r/20230517172745.5833-4-kprateek.nayak@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Processors based on chiplet architecture, such as AMD EPYC and Hygon do
not expose the chiplet details in the sysfs CPU topology information.
However, this information can be derived from the per CPU cache level
information from the sysfs.
'perf stat' has already supported aggregation based on topology
information using core ID, socket ID, etc. It'll be useful to aggregate
based on the cache topology to detect problems like imbalance and
cache-to-cache sharing at various cache levels.
This patch lays the foundation for aggregating data in 'perf stat' based
on the processor's cache topology. The cmdline option to aggregate data
based on the cache topology is added in Patch 4 of the series while this
patch sets up all the necessary functions and variables required to
support the new aggregation option.
The patch also adds support to display per-cache aggregation, or save it
as a JSON or CSV, as splitting it into a separate patch would break
builds when compiling with "-Werror=switch-enum" where the compiler will
complain about the lack of handling for the AGGR_CACHE case in the
output functions.
Committer notes:
Don't use perf_stat_config in tools/perf/util/cpumap.c, this would make
code that is in util/, thus not really specific to a single builtin, use
a specific builtin config structure.
Move the functions introduced in this patch from
tools/perf/util/cpumap.c since it needs access to builtin specific
and is not strictly needed to live in the util/ directory.
With this 'perf test python' is back building.
Suggested-by: Gautham Shenoy <gautham.shenoy@amd.com>
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wen Pu <puwen@hygon.cn>
Link: https://lore.kernel.org/r/20230517172745.5833-3-kprateek.nayak@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
build_caches() builds the complete cache topology of the system by
iterating over all CPU, building and comparing cache levels of each CPU,
keeping only the unique ones at the end.
Extract the unit that build the cache levels for a single CPU into a
separate function. Expose this function, and the MAX_CACHE_LVL value to
be used elsewhere in perf too.
Signed-off-by: K Prateek Nayak <kprateek.nayak@amd.com>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Gautham Shenoy <gautham.shenoy@amd.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Wen Pu <puwen@hygon.cn>
Link: https://lore.kernel.org/r/20230517172745.5833-2-kprateek.nayak@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some metrics may not have a metric_group which can result in segvs
with "perf stat --topdown". Add a condition for the no metric_group
case.
Fixes: 1647cd5b88 ("perf stat: Implement --topdown using json metrics")
Reported-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230515224530.671331-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When the source argument of the "mov" instruction looks like below, it
didn't parse the whole operand and just stopped at the first comma.
mov (%rbx,%rax,1),%rcx
Fix it by checking the parentheses and move it to the closing one.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230511062725.514752-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Using -ftree-loop-distribute-patterns and -gno-variable-location-views
in the python feature test when building with clang-16 results in:
16 80.04 clearlinux:latest : FAIL clang version 16.0.1
clang-16: error: unknown argument: '-gno-variable-location-views'
clang-16: error: unknown argument: '-gno-variable-location-views'
clang-16: error: optimization flag '-ftree-loop-distribute-patterns' is not supported [-Werror,-Wignored-optimization-argument]
clang-16: error: optimization flag '-ftree-loop-distribute-patterns' is not supported [-Werror,-Wignored-optimization-argument]
error: command '/usr/sbin/clang' failed with exit code 1
Noticed when building on a docker.io/library/clearlinux:latest container.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move to print-events.c and make static.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-45-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ids/events from a metric are turned into an event string and parsed;
setup_metric_events matches the id back to the parsed evsel. With
hybrid the same event may exist on both PMUs with the same name and be
being used by metrics at the same time. A metric on cpu_core therefore
shouldn't match against evsels on cpu_atom, or the metric will compute
the wrong value. Make the matching sensitive to the PMU being parsed.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-43-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Wire up the --cputype value to limit which metrics are parsed.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-40-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Hybrid systems may define the same metric for different PMUs, this can
cause confusion of events. To avoid this make the referenced metric
searches PMU specific, matching that in the table.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-39-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Bring back the behavior of not auto-merging hybrid events by
delegating to a test in pmu.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-37-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Avoid the parser error:
'''
$ perf stat -e 'cycles/name=l1d/' true
event syntax error: 'cycles/name=l1d/'
\___ parser error
'''
by combining the name and legacy cache cases in the parser.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-36-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Avoid the parser error:
'''
$ perf stat -e 'cycles/name=name/' true
event syntax error: 'cycles/name=name/'
\___ parser error
'''
by turning the term back to a string if it is on the right. Add PMU
and generic parsing tests.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-35-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
An event like "cpu/instructions/" typically parses due to there being
a sysfs event called instructions. On hybrid recursive parsing means
that the hardware event is encoded in the attribute, with the PMU
being placed in the high bits of the config:
'''
$ perf stat -vv -e 'cpu_core/cycles/' true
...
------------------------------------------------------------
perf_event_attr:
size 136
config 0x400000000
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
inherit 1
enable_on_exec 1
exclude_guest 1
------------------------------------------------------------
'''
Make this behavior the default by adding a new term type and token for
hardware events. The token gathers both the numeric config and the
parsed name, so that if the token appears like "cycles/name=cycles/"
then the token can be handled like a name. The numeric value isn't
sufficient to distinguish say "cpu-cycles" from "cycles".
Extend the parse-events test so that all current non-PMU hardware
parsing tests, also test with the PMU cpu - more than half the change.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-34-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rather than limit the --cputype argument for "perf list" and "perf
stat" to hybrid PMUs of just cpu_atom and cpu_core, allow any PMU.
Note, that if cpu_atom isn't mounted but a filter of cpu_atom is
requested, then this will now fail. As such a filter would never
succeed, no events can come from that unmounted PMU, then this
behavior could never have been useful and failing is clearer.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-31-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To support the cputype argument added to "perf stat" for hybrid it is
necessary to filter events during wildcard matching. Add a scanner
argument for the filter and checking it when wildcard matching.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-30-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Use the typed parse_state rather than void* _parse_state when
available.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-29-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The event parser no longer needs to recurse in case of a legacy cache
event in a PMU, the necessary wild card logic has moved to
perf_pmu__supports_legacy_cache and
perf_pmu__supports_wildcard_numeric.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-28-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Legacy raw events like r1a open as PERF_TYPE_RAW on non-hybrid systems
and on each hybrid PMU on hybrid systems. Rather than iterate hybrid
PMUs add a perf_pmu__supports_wildcard_numeric function that says when
a numeric event should be opened upon it. If the parsed event
specifies the type of the PMU then don't wildcard match PMUs, use the
specified PMU type.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-27-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Mirroring parse_events_add_cache, list the legacy name alongside its
alias with the PMU. Remove the now unnecessary hybrid logic.
Note, the alias output removes the event type descriptor, so:
L1-dcache-loads [Hardware cache event]
becomes:
L1-dcache-loads OR cpu/L1-dcache-loads/
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-26-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It is inconsistent that "perf stat -e instructions-retired" wildcard
opens on all PMUs while legacy cache events like "perf stat -e
L1-dcache-load-miss" do not. A behavior introduced by hybrid is that a
legacy cache event like L1-dcache-load-miss should wildcard open on
all hybrid PMUs. Previously hybrid would call to is_event_supported
for each PMU, a failure of which results in the event not being
added. This isn't done in this case as the parser should just create
perf_event_attr and the later open should fail, or the counter give
"<not counted>". If this wants to be avoided then the PMU can be named
with the event.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-25-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Allow a legacy cache event to be both, for example,
"L1-dcache-load-miss" and "cpu/L1-dcache-load-miss/" by introducing a
new legacy cache term type.
The term type is processed in config_term_pmu, setting both the type in
perf_event_attr and the config.
The code to determine the config is factored out of
parse_events_add_cache and shared. If the PMU doesn't support legacy
events, currently just core/hybrid PMUs do, then the term is treated
like a PE_NAME term - as before.
If only terms are being parsed, such as for perf_pmu__new_alias, then
the PE_LEGACY_CACHE token is always parsed as PE_NAME.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-24-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The event parser needs to handle two special cases:
1) legacy events like L1-dcache-load-miss. These event names don't
appear in JSON or sysfs, and lookup tables are used for the config
value.
2) raw events where 'r0xead' is the same as 'read' unless the PMU has
an event called 'read' in which case the event has priority.
The previous parser to handle these cases would scan all PMUs for
components of event names. These components would then be used to
classify in the lexer whether the token should be part of a legacy
event, a raw event or an event. The grammar would handle legacy event
tokens or recombining the tokens back into a regular event name. The
code wasn't PMU specific and had issues around events like AMD's
branch-brs that would fail to parse as it expects brs to be a suffix
on a legacy event style name:
$ perf stat -e branch-brs true
event syntax error: 'branch-brs'
\___ parser error
This change removes processing all PMUs by using the lexer in the form
of a regular expression matcher. The lexer will return the token for
the longest matched sequence of characters, and in the event of a tie
the first. The legacy events are a fixed number of regular
expressions, and by matching these before a name token its possible to
generate an accurate legacy event token with everything else matching
as a name. Because of the lexer change the handling of hyphens in the
grammar can be removed as hyphens just become a part of the name.
To handle raw events and terms the parser is changed to defer trying
to evaluate whether something is a raw event until the PMU is known in
the grammar. Once the PMU is known, the events of the PMU can be
scanned for the 'read' style problem. A new term type is added for
these raw terms, used to enable deferring the evaluation.
While this change is large, it has stats of:
170 insertions(+), 436 deletions(-)
the bulk of the change is deleting the old approach. It isn't possible
to break apart the code added due to the dependencies on how the parts
of the parsing work.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-19-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The strlist in print_hwcache_events holds the event names as they are
generated, and then it is iterated and printed. This is unnecessary
and each event can just be printed as it is processed.
Rename the variable i to res, to be more intention revealing and
consistent with other code.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-18-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Change add_event to always set pmu_name when possible as not all code
checks both pmu->name and evsel->pmu_name, for example,
uniquify_counter in stat-display.c.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-17-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Set attr.type to PMU type early so that later terms can override the
value. Setting the value in perf_pmu__config means that earlier steps,
like config_term_pmu, can override the value.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-16-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Print dso offset only for object files, and in those cases force using the
dso->long_name if the dso->name starts with '[' or the dso is kcore, in
order to avoid special names such as [vdso], or mixing up kcore with
vmlinux.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230424055107.12105-3-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Declare dso const, so that functions can be called with const struct *dso.
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/r/20230424055107.12105-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This adds a helper function map__fprintf_dsoname_dsoff() to print dsoname
with optional dso offset.
Suggested-by: Adrian Hunter <adrian.hunter@intel.com>
Signed-off-by: Changbin Du <changbin.du@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hui Wang <hw.huiwang@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230418031825.1262579-3-changbin.du@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If 'struct rq' isn't defined in lock_contention.bpf.c then the type for
the 'runqueue' variable ends up being a forward declaration
(BTF_KIND_FWD) while the kernel has it defined (BTF_KIND_STRUCT).
This makes libbpf decide it has incompatible types and then fails to
load the BPF skeleton:
# perf lock con -ab sleep 1
libbpf: extern (var ksym) 'runqueues': incompatible types, expected [95] fwd rq, but kernel has [55509] struct rq
libbpf: failed to load object 'lock_contention_bpf'
libbpf: failed to load BPF skeleton 'lock_contention_bpf': -22
Failed to load lock-contention BPF skeleton
lock contention BPF setup failed
#
Add it as an empty struct to satisfy that type verification:
# perf lock con -ab sleep 1
contended total wait max wait avg wait type caller
2 50.64 us 25.38 us 25.32 us spinlock tick_do_update_jiffies64+0x25
1 26.18 us 26.18 us 26.18 us spinlock tick_do_update_jiffies64+0x25
#
Committer notes:
Extracted from a larger patch as Namhyung had already fixed the other
issues in e53de7b65a ("perf lock contention: Fix struct rq lock
access").
Signed-off-by: Jiri Olsa <jolsa@kernel.org>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Changbin Du <changbin.du@huawei.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/lkml/ZFVqeKLssg7uzxzI@krava
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Currently, vmlinux.h uses the bpf.h and perf_event.h header files in the
system path. If the header files in compilation environment are old,
compilation may fail. For example:
/home/yangjihong/linux/tools/perf/util/bpf_skel/.tmp/../vmlinux.h:151:27: error: field has incomplete type 'union perf_sample_weight'
union perf_sample_weight weight;
Use the bpf.h and perf_event.h files in the source code directory to
avoid compilation compatibility problems.
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Rix <trix@redhat.com>
Link: https://lore.kernel.org/r/20230510064401.225051-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If we have a group of {cycles,faults} then we need the faults software
event to appear to be on the same PMU as cycles so that we don't split
the group in parse_events__sort_events_and_fix_groups.
This case is relatively easy as cycles is the leader and will have a PMU
name. In the reverse case, {faults,cycles} we still need faults to
appear to have the PMU name of cycles but the old behavior is just to
return "cpu".
For hybrid this fails as cycles will be on "cpu_core" or "cpu_atom",
causing faults to be split into a different group.
Change the behavior for software events so that the whole group is
searched for the named PMU.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-20-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The pmu_group_name by default returns "cpu" which on non-hybrid/ARM
means that ungrouped software, and hardware events are all going to
sort by the original insertion index.
However, on hybrid and ARM wildcard expansion may mean the PMU name is
set and events will be unnecessarily reordered - triggering the
reordering warning.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some metric groups have metrics that don't have fully overlapping
events, meaning that the group's events become unique event groups that
may need to multiplex with each other. This can be particularly
unfortunate when the groups wouldn't need to multiplex because there are
sufficient hardware counters.
Add a flag so that if recording a metric group then the metrics within
the group needn't use groups for their events. The flag is added to
Intel TopdownL1 and TopdownL2 metrics.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-4-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'perf stat' with no arguments will use default events and metrics. These
events may fail to open even with kernel and hypervisor disabled. When
these fail then the permissions error appears even though they were
implicitly selected. This is particularly a problem with the automatic
selection of the TopdownL1 metric group on certain architectures like
Skylake:
$ perf stat true
Error:
Access to performance monitoring and observability operations is limited.
Consider adjusting /proc/sys/kernel/perf_event_paranoid setting to open
access to performance monitoring and observability operations for processes
without CAP_PERFMON, CAP_SYS_PTRACE or CAP_SYS_ADMIN Linux capability.
More information can be found at 'Perf events and tool security' document:
https://www.kernel.org/doc/html/latest/admin-guide/perf-security.html
perf_event_paranoid setting is 2:
-1: Allow use of (almost) all events by all users
Ignore mlock limit after perf_event_mlock_kb without CAP_IPC_LOCK
>= 0: Disallow raw and ftrace function tracepoint access
>= 1: Disallow CPU event access
>= 2: Disallow kernel profiling
To make the adjusted perf_event_paranoid setting permanent preserve it
in /etc/sysctl.conf (e.g. kernel.perf_event_paranoid = <setting>)
$
This patch adds skippable evsels that when they fail to open won't cause
termination and will appear as "<not supported>" in output. The
TopdownL1 events, from the metric group, are marked as skippable. This
turns the failure above to:
$ perf stat perf bench internals synthesize
Computing performance of single threaded perf event synthesis by
synthesizing events on the perf process itself:
Average synthesis took: 49.287 usec (+- 0.083 usec)
Average num. events: 3.000 (+- 0.000)
Average time per event 16.429 usec
Average data synthesis took: 49.641 usec (+- 0.085 usec)
Average num. events: 11.000 (+- 0.000)
Average time per event 4.513 usec
Performance counter stats for 'perf bench internals synthesize':
1,222.38 msec task-clock:u # 0.993 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
162 page-faults:u # 132.529 /sec
774,445,184 cycles:u # 0.634 GHz (49.61%)
1,640,969,811 instructions:u # 2.12 insn per cycle (59.67%)
302,052,148 branches:u # 247.102 M/sec (59.69%)
1,807,718 branch-misses:u # 0.60% of all branches (59.68%)
5,218,927 CPU_CLK_UNHALTED.REF_XCLK:u # 4.269 M/sec
# 17.3 % tma_frontend_bound
# 56.4 % tma_retiring
# nan % tma_backend_bound
# nan % tma_bad_speculation (60.01%)
536,580,469 IDQ_UOPS_NOT_DELIVERED.CORE:u # 438.965 M/sec (60.33%)
<not supported> INT_MISC.RECOVERY_CYCLES_ANY:u
5,223,936 CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u # 4.274 M/sec (40.31%)
774,127,250 CPU_CLK_UNHALTED.THREAD:u # 633.297 M/sec (50.34%)
1,746,579,518 UOPS_RETIRED.RETIRE_SLOTS:u # 1.429 G/sec (50.12%)
1,940,625,702 UOPS_ISSUED.ANY:u # 1.588 G/sec (49.70%)
1.231055525 seconds time elapsed
0.258327000 seconds user
0.965749000 seconds sys
$
The event INT_MISC.RECOVERY_CYCLES_ANY:u is skipped as it can't be
opened with paranoia 2 on Skylake. With a lower paranoia, or as root,
all events/metrics are computed.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-3-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Division by zero causes expression parsing to fail and no metric to be
generated. This can mean for short running benchmarks metrics are not
shown. Change the behavior to make the value nan, which gets shown like:
'''
$ perf stat -M TopdownL2 true
Performance counter stats for 'true':
1,031,492 INST_RETIRED.ANY # nan % tma_fetch_bandwidth
# nan % tma_heavy_operations
# nan % tma_light_operations
29,304 CPU_CLK_UNHALTED.REF_XCLK # nan % tma_fetch_latency
# nan % tma_branch_mispredicts
# nan % tma_machine_clears
# nan % tma_core_bound
# nan % tma_memory_bound
2,658,319 IDQ_UOPS_NOT_DELIVERED.CORE
11,167 EXE_ACTIVITY.BOUND_ON_STORES
262,058 EXE_ACTIVITY.1_PORTS_UTIL
<not counted> BR_MISP_RETIRED.ALL_BRANCHES (0.00%)
<not counted> INT_MISC.RECOVERY_CYCLES_ANY (0.00%)
<not counted> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE (0.00%)
<not counted> CPU_CLK_UNHALTED.THREAD (0.00%)
<not counted> UOPS_RETIRED.RETIRE_SLOTS (0.00%)
<not counted> CYCLE_ACTIVITY.STALLS_MEM_ANY (0.00%)
<not counted> UOPS_RETIRED.MACRO_FUSED (0.00%)
<not counted> IDQ_UOPS_NOT_DELIVERED.CYCLES_0_UOPS_DELIV.CORE (0.00%)
<not counted> EXE_ACTIVITY.2_PORTS_UTIL (0.00%)
<not counted> CYCLE_ACTIVITY.STALLS_TOTAL (0.00%)
<not counted> MACHINE_CLEARS.COUNT (0.00%)
<not counted> UOPS_ISSUED.ANY (0.00%)
0.002864879 seconds time elapsed
0.003012000 seconds user
0.000000000 seconds sys
'''
When events aren't supported a count of 0 can be confusing and make
metrics look meaningful. Change these to be nan also which, with the
next change, gets shown like:
'''
$ perf stat true
Performance counter stats for 'true':
1.25 msec task-clock:u # 0.387 CPUs utilized
0 context-switches:u # 0.000 /sec
0 cpu-migrations:u # 0.000 /sec
46 page-faults:u # 36.702 K/sec
255,942 cycles:u # 0.204 GHz (88.66%)
123,046 instructions:u # 0.48 insn per cycle
28,301 branches:u # 22.580 M/sec
2,489 branch-misses:u # 8.79% of all branches
4,719 CPU_CLK_UNHALTED.REF_XCLK:u # 3.765 M/sec
# nan % tma_frontend_bound
# nan % tma_retiring
# nan % tma_backend_bound
# nan % tma_bad_speculation
344,855 IDQ_UOPS_NOT_DELIVERED.CORE:u # 275.147 M/sec
<not supported> INT_MISC.RECOVERY_CYCLES_ANY:u
<not counted> CPU_CLK_UNHALTED.ONE_THREAD_ACTIVE:u (0.00%)
<not counted> CPU_CLK_UNHALTED.THREAD:u (0.00%)
<not counted> UOPS_RETIRED.RETIRE_SLOTS:u (0.00%)
<not counted> UOPS_ISSUED.ANY:u (0.00%)
0.003238142 seconds time elapsed
0.000000000 seconds user
0.003434000 seconds sys
'''
Ensure that nan metric values are quoted as nan isn't a valid number
in JSON.
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Kan Liang <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230502223851.2234828-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We need to better polish building with BPF skels, so revert back to
making it an experimental feature that has to be explicitely enabled
using BUILD_BPF_SKEL=1.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
-----BEGIN PGP SIGNATURE-----
iHUEABYKAB0WIQR2GiIUctdOfX2qHhGyPKLppCJ+JwUCZFbCXwAKCRCyPKLppCJ+
J7cHAP97erKY4hBXArjpfzcvpFmboh/oqhbTLntyIpS6TEnOyQEAyervAPGIjQYC
DCo4foyXmOWn3dhNtK9M+YiRl3o2SgQ=
=7G78
-----END PGP SIGNATURE-----
Merge tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux
Pull perf tool updates from Arnaldo Carvalho de Melo:
"Third version of perf tool updates, with the build problems with with
using a 'vmlinux.h' generated from the main build fixed, and the bpf
skeleton build disabled by default.
Build:
- Require libtraceevent to build, one can disable it using
NO_LIBTRACEEVENT=1.
It is required for tools like 'perf sched', 'perf kvm', 'perf
trace', etc.
libtraceevent is available in most distros so installing
'libtraceevent-devel' should be a one-time event to continue
building perf as usual.
Using NO_LIBTRACEEVENT=1 produces tooling that is functional and
sufficient for lots of users not interested in those libtraceevent
dependent features.
- Allow Python support in 'perf script' when libtraceevent isn't
linked, as not all features requires it, for instance Intel PT does
not use tracepoints.
- Error if the python interpreter needed for jevents to work isn't
available and NO_JEVENTS=1 isn't set, preventing a build without
support for JSON vendor events, which is a rare but possible
condition. The two check error messages:
$(error ERROR: No python interpreter needed for jevents generation. Install python or build with NO_JEVENTS=1.)
$(error ERROR: Python interpreter needed for jevents generation too old (older than 3.6). Install a newer python or build with NO_JEVENTS=1.)
- Make libbpf 1.0 the minimum required when building with out of
tree, distro provided libbpf.
- Use libsdtc++'s and LLVM's libcxx's __cxa_demangle, a portable C++
demangler, add 'perf test' entry for it.
- Make binutils libraries opt in, as distros disable building with it
due to licensing, they were used for C++ demangling, for instance.
- Switch libpfm4 to opt-out rather than opt-in, if libpfm-devel (or
equivalent) isn't installed, we'll just have a build warning:
Makefile.config:1144: libpfm4 not found, disables libpfm4 support. Please install libpfm4-dev
- Add a feature test for scandirat(), that is not implemented so far
in musl and uclibc, disabling features that need it, such as
scanning for tracepoints in /sys/kernel/tracing/events.
perf BPF filters:
- New feature where BPF can be used to filter samples, for instance:
$ sudo ./perf record -e cycles --filter 'period > 1000' true
$ sudo ./perf script
perf-exec 2273949 546850.708501: 5029 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708508: 32409 cycles: ffffffff826f9e25 finish_wait+0x5 ([kernel.kallsyms])
perf-exec 2273949 546850.708526: 143369 cycles: ffffffff82b4cdbf xas_start+0x5f ([kernel.kallsyms])
perf-exec 2273949 546850.708600: 372650 cycles: ffffffff8286b8f7 __pagevec_lru_add+0x117 ([kernel.kallsyms])
perf-exec 2273949 546850.708791: 482953 cycles: ffffffff829190de __mod_memcg_lruvec_state+0x4e ([kernel.kallsyms])
true 2273949 546850.709036: 501985 cycles: ffffffff828add7c tlb_gather_mmu+0x4c ([kernel.kallsyms])
true 2273949 546850.709292: 503065 cycles: 7f2446d97c03 _dl_map_object_deps+0x973 (/usr/lib/x86_64-linux-gnu/ld-linux-x86-64.so.2)
- In addition to 'period' (PERF_SAMPLE_PERIOD), the other
PERF_SAMPLE_ can be used for filtering, and also some other sample
accessible values, from tools/perf/Documentation/perf-record.txt:
Essentially the BPF filter expression is:
<term> <operator> <value> (("," | "||") <term> <operator> <value>)*
The <term> can be one of:
ip, id, tid, pid, cpu, time, addr, period, txn, weight, phys_addr,
code_pgsz, data_pgsz, weight1, weight2, weight3, ins_lat, retire_lat,
p_stage_cyc, mem_op, mem_lvl, mem_snoop, mem_remote, mem_lock,
mem_dtlb, mem_blk, mem_hops
The <operator> can be one of:
==, !=, >, >=, <, <=, &
The <value> can be one of:
<number> (for any term)
na, load, store, pfetch, exec (for mem_op)
l1, l2, l3, l4, cxl, io, any_cache, lfb, ram, pmem (for mem_lvl)
na, none, hit, miss, hitm, fwd, peer (for mem_snoop)
remote (for mem_remote)
na, locked (for mem_locked)
na, l1_hit, l1_miss, l2_hit, l2_miss, any_hit, any_miss, walk, fault (for mem_dtlb)
na, by_data, by_addr (for mem_blk)
hops0, hops1, hops2, hops3 (for mem_hops)
perf lock contention:
- Show lock type with address.
- Track and show mmap_lock, siglock and per-cpu rq_lock with address.
This is done for mmap_lock by following the current->mm pointer:
$ sudo ./perf lock con -abl -- sleep 10
contended total wait max wait avg wait address symbol
...
16344 312.30 ms 2.22 ms 19.11 us ffff8cc702595640
17686 310.08 ms 1.49 ms 17.53 us ffff8cc7025952c0
3 84.14 ms 45.79 ms 28.05 ms ffff8cc78114c478 mmap_lock
3557 76.80 ms 68.75 us 21.59 us ffff8cc77ca3af58
1 68.27 ms 68.27 ms 68.27 ms ffff8cda745dfd70
9 54.53 ms 7.96 ms 6.06 ms ffff8cc7642a48b8 mmap_lock
14629 44.01 ms 60.00 us 3.01 us ffff8cc7625f9ca0
3481 42.63 ms 140.71 us 12.24 us ffffffff937906ac vmap_area_lock
16194 38.73 ms 42.15 us 2.39 us ffff8cd397cbc560
11 38.44 ms 10.39 ms 3.49 ms ffff8ccd6d12fbb8 mmap_lock
1 5.43 ms 5.43 ms 5.43 ms ffff8cd70018f0d8
1674 5.38 ms 422.93 us 3.21 us ffffffff92e06080 tasklist_lock
581 4.51 ms 130.68 us 7.75 us ffff8cc9b1259058
5 3.52 ms 1.27 ms 703.23 us ffff8cc754510070
112 3.47 ms 56.47 us 31.02 us ffff8ccee38b3120
381 3.31 ms 73.44 us 8.69 us ffffffff93790690 purge_vmap_area_lock
255 3.19 ms 36.35 us 12.49 us ffff8d053ce30c80
- Update default map size to 16384.
- Allocate single letter option -M for --map-nr-entries, as it is
proving being frequently used.
- Fix struct rq lock access for older kernels with BPF's CO-RE
(Compile once, run everywhere).
- Fix problems found with MSAn.
perf report/top:
- Add inline information when using --call-graph=fp or lbr, as was
already done to the --call-graph=dwarf callchain mode.
- Improve the 'srcfile' sort key performance by really using an
optimization introduced in 6.2 for the 'srcline' sort key that
avoids calling addr2line for comparision with each sample.
perf sched:
- Make 'perf sched latency/map/replay' to use "sched:sched_waking"
instead of "sched:sched_waking", consistent with 'perf record'
since d566a9c2d4 ("perf sched: Prefer sched_waking event when it
exists").
perf ftrace:
- Make system wide the default target for latency subcommand, run the
following command then generate some network traffic and press
control+C:
# perf ftrace latency -T __kfree_skb
^C
DURATION | COUNT | GRAPH |
0 - 1 us | 27 | ############# |
1 - 2 us | 22 | ########### |
2 - 4 us | 8 | #### |
4 - 8 us | 5 | ## |
8 - 16 us | 24 | ############ |
16 - 32 us | 2 | # |
32 - 64 us | 1 | |
64 - 128 us | 0 | |
128 - 256 us | 0 | |
256 - 512 us | 0 | |
512 - 1024 us | 0 | |
1 - 2 ms | 0 | |
2 - 4 ms | 0 | |
4 - 8 ms | 0 | |
8 - 16 ms | 0 | |
16 - 32 ms | 0 | |
32 - 64 ms | 0 | |
64 - 128 ms | 0 | |
128 - 256 ms | 0 | |
256 - 512 ms | 0 | |
512 - 1024 ms | 0 | |
1 - ... s | 0 | |
#
perf top:
- Add --branch-history (LBR: Last Branch Record) option, just like
already available for 'perf record'.
- Fix segfault in thread__comm_len() where thread->comm was being
used outside thread->comm_lock.
perf annotate:
- Allow configuring objdump and addr2line in ~/.perfconfig., so that
you can use alternative binaries, such as llvm's.
perf kvm:
- Add TUI mode for 'perf kvm stat report'.
Reference counting:
- Add reference count checking infrastructure to check for use after
free, done to the 'cpumap', 'namespaces', 'maps' and 'map' structs,
more to come.
To build with it use -DREFCNT_CHECKING=1 in the make command line
to build tools/perf. Documented at:
https://perf.wiki.kernel.org/index.php/Reference_Count_Checking
- The above caught, for instance, fix, present in this series:
- Fix maps use after put in 'perf test "Share thread maps"':
'maps' is copied from leader, but the leader is put on line 79
and then 'maps' is used to read the reference count below - so
a use after put, with the put of maps happening within
thread__put.
Fixed by reversing the order of puts so that the leader is put
last.
- Also several fixes were made to places where reference counts were
not being held.
- Make this one of the tests in 'make -C tools/perf build-test' to
regularly build test it and to make sure no direct access to the
reference counted structs are made, doing that via accessors to
check the validity of the struct pointer.
ARM64:
- Fix 'perf report' segfault when filtering coresight traces by
sparse lists of CPUs.
- Add support for 'simd' as a sort field for 'perf report', to show
ARM's NEON SIMD's predicate flags: "partial" and "empty".
arm64 vendor events:
- Add N1 metrics.
Intel vendor events:
- Add graniterapids, grandridge and sierraforrest events.
- Refresh events for: alderlake, aldernaken, broadwell, broadwellde,
broadwellx, cascadelakx, haswell, haswellx, icelake, icelakex,
jaketown, meteorlake, knightslanding, sandybridge, sapphirerapids,
silvermont, skylake, tigerlake and westmereep-dp
- Refresh metrics for alderlake-n, broadwell, broadwellde,
broadwellx, haswell, haswellx, icelakex, ivybridge, ivytown and
skylakex.
perf stat:
- Implement --topdown using JSON metrics.
- Add TopdownL1 JSON metric as a default if present, but disable it
for now for some Intel hybrid architectures, a series of patches
addressing this is being reviewed and will be submitted for v6.5.
- Use metrics for --smi-cost.
- Update topdown documentation.
Vendor events (JSON) infrastructure:
- Add support for computing and printing metric threshold values. For
instance, here is one found in thesapphirerapids json file:
{
"BriefDescription": "Percentage of cycles spent in System Management Interrupts.",
"MetricExpr": "((msr@aperf@ - cycles) / msr@aperf@ if msr@smi@ > 0 else 0)",
"MetricGroup": "smi",
"MetricName": "smi_cycles",
"MetricThreshold": "smi_cycles > 0.1",
"ScaleUnit": "100%"
},
- Test parsing metric thresholds with the fake PMU in 'perf test
pmu-events'.
- Support for printing metric thresholds in 'perf list'.
- Add --metric-no-threshold option to 'perf stat'.
- Add rand (reverse and) and has_pmem (optane memory) support to
metrics.
- Sort list of input files to avoid depending on the order from
readdir() helping in obtaining reproducible builds.
S/390:
- Add common metrics: - CPI (cycles per instruction), prbstate (ratio
of instructions executed in problem state compared to total number
of instructions), l1mp (Level one instruction and data cache misses
per 100 instructions).
- Add cache metrics for z13, z14, z15 and z16.
- Add metric for TLB and cache.
ARM:
- Add raw decoding for SPE (Statistical Profiling Extension) v1.3 MTE
(Memory Tagging Extension) and MOPS (Memory Operations) load/store.
Intel PT hardware tracing:
- Add event type names UINTR (User interrupt delivered) and UIRET
(Exiting from user interrupt routine), documented in table 32-50
"CFE Packet Type and Vector Fields Details" in the Intel Processor
Trace chapter of The Intel SDM Volume 3 version 078.
- Add support for new branch instructions ERETS and ERETU.
- Fix CYC timestamps after standalone CBR
ARM CoreSight hardware tracing:
- Allow user to override timestamp and contextid settings.
- Fix segfault in dso lookup.
- Fix timeless decode mode detection.
- Add separate decode paths for timeless and per-thread modes.
auxtrace:
- Fix address filter entire kernel size.
Miscellaneous:
- Fix use-after-free and unaligned bugs in the PLT handling routines.
- Use zfree() to reduce chances of use after free.
- Add missing 0x prefix for addresses printed in hexadecimal in 'perf
probe'.
- Suppress massive unsupported target platform errors in the unwind
code.
- Fix return incorrect build_id size in elf_read_build_id().
- Fix 'perf scripts intel-pt-events.py' IPC output for Python 2 .
- Add missing new parameter in kfree_skb tracepoint to the python
scripts using it.
- Add 'perf bench syscall fork' benchmark.
- Add support for printing PERF_MEM_LVLNUM_UNC (Uncached access) in
'perf mem'.
- Fix wrong size expectation for perf test 'Setup struct
perf_event_attr' caused by the patch adding
perf_event_attr::config3.
- Fix some spelling mistakes"
* tag 'perf-tools-for-v6.4-3-2023-05-06' of git://git.kernel.org/pub/scm/linux/kernel/git/acme/linux: (365 commits)
Revert "perf build: Make BUILD_BPF_SKEL default, rename to NO_BPF_SKEL"
Revert "perf build: Warn for BPF skeletons if endian mismatches"
perf metrics: Fix SEGV with --for-each-cgroup
perf bpf skels: Stop using vmlinux.h generated from BTF, use subset of used structs + CO-RE
perf stat: Separate bperf from bpf_profiler
perf test record+probe_libc_inet_pton: Fix call chain match on x86_64
perf test record+probe_libc_inet_pton: Fix call chain match on s390
perf tracepoint: Fix memory leak in is_valid_tracepoint()
perf cs-etm: Add fix for coresight trace for any range of CPUs
perf build: Fix unescaped # in perf build-test
perf unwind: Suppress massive unsupported target platform errors
perf script: Add new parameter in kfree_skb tracepoint to the python scripts using it
perf script: Print raw ip instead of binary offset for callchain
perf symbols: Fix return incorrect build_id size in elf_read_build_id()
perf list: Modify the warning message about scandirat(3)
perf list: Fix memory leaks in print_tracepoint_events()
perf lock contention: Rework offset calculation with BPF CO-RE
perf lock contention: Fix struct rq lock access
perf stat: Disable TopdownL1 on hybrid
perf stat: Avoid SEGV on counter->name
...
Ensure the metric threshold is copied correctly or else a use of
uninitialized memory happens.
Fixes: d0a3052f6f ("perf metric: Compute and print threshold values")
Reported-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230505204119.3443491-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Linus reported a build break due to using a vmlinux without a BTF elf
section to generate the vmlinux.h header with bpftool for use in the BPF
tools in tools/perf/util/bpf_skel/*.bpf.c.
Instead add a vmlinux.h file with the structs needed with the fields the
tools need, marking the structs with __attribute__((preserve_access_index)),
so that libbpf's CO-RE code can fixup the struct field offsets.
In some cases the vmlinux.h file that was being generated by bpftool
from the kernel BTF information was not needed at all, just including
linux/bpf.h, sometimes linux/perf_event.h was enough as non-UAPI
types were not being used.
To keep te patch small, include those UAPI headers from the trimmed down
vmlinux.h file, that then provides the tools with just the structs and
the subset of its fields needed for them.
Testing it:
# perf lock contention -b find / > /dev/null
^C contended total wait max wait avg wait type caller
7 53.59 us 10.86 us 7.66 us rwlock:R start_this_handle+0xa0
2 30.35 us 21.99 us 15.17 us rwsem:R iterate_dir+0x52
1 9.04 us 9.04 us 9.04 us rwlock:W start_this_handle+0x291
1 8.73 us 8.73 us 8.73 us spinlock raw_spin_rq_lock_nested+0x1e
#
# perf lock contention -abl find / > /dev/null
^C contended total wait max wait avg wait address symbol
1 262.96 ms 262.96 ms 262.96 ms ffff8e67502d0170 (mutex)
12 244.24 us 39.91 us 20.35 us ffff8e6af56f8070 mmap_lock (rwsem)
7 30.28 us 6.85 us 4.33 us ffff8e6c865f1d40 rq_lock (spinlock)
3 7.42 us 4.03 us 2.47 us ffff8e6c864b1d40 rq_lock (spinlock)
2 3.72 us 2.19 us 1.86 us ffff8e6c86571d40 rq_lock (spinlock)
1 2.42 us 2.42 us 2.42 us ffff8e6c86471d40 rq_lock (spinlock)
4 2.11 us 559 ns 527 ns ffffffff9a146c80 rcu_state (spinlock)
3 1.45 us 818 ns 482 ns ffff8e674ae8384c (rwlock)
1 870 ns 870 ns 870 ns ffff8e68456ee060 (rwlock)
1 663 ns 663 ns 663 ns ffff8e6c864f1d40 rq_lock (spinlock)
1 573 ns 573 ns 573 ns ffff8e6c86531d40 rq_lock (spinlock)
1 472 ns 472 ns 472 ns ffff8e6c86431740 (spinlock)
1 397 ns 397 ns 397 ns ffff8e67413a4f04 (spinlock)
#
# perf test offcpu
95: perf record offcpu profiling tests : Ok
#
# perf kwork latency --use-bpf
Starting trace, Hit <Ctrl+C> to stop and report
^C
Kwork Name | Cpu | Avg delay | Count | Max delay | Max delay start | Max delay end |
--------------------------------------------------------------------------------------------------------------------------------
(w)flush_memcg_stats_dwork | 0000 | 1056.212 ms | 2 | 2112.345 ms | 550113.229573 s | 550115.341919 s |
(w)toggle_allocation_gate | 0000 | 10.144 ms | 62 | 416.389 ms | 550113.453518 s | 550113.869907 s |
(w)0xffff8e6748e28080 | 0002 | 0.623 ms | 1 | 0.623 ms | 550110.989841 s | 550110.990464 s |
(w)vmstat_shepherd | 0000 | 0.586 ms | 10 | 2.828 ms | 550111.971536 s | 550111.974364 s |
(w)vmstat_update | 0007 | 0.363 ms | 5 | 1.634 ms | 550113.222520 s | 550113.224154 s |
(w)vmstat_update | 0000 | 0.324 ms | 10 | 2.827 ms | 550111.971526 s | 550111.974354 s |
(w)0xffff8e674c5f4a58 | 0002 | 0.102 ms | 5 | 0.134 ms | 550110.989839 s | 550110.989972 s |
(w)psi_avgs_work | 0001 | 0.086 ms | 3 | 0.107 ms | 550114.957852 s | 550114.957959 s |
(w)psi_avgs_work | 0000 | 0.079 ms | 5 | 0.100 ms | 550118.605668 s | 550118.605768 s |
(w)kfree_rcu_monitor | 0006 | 0.079 ms | 1 | 0.079 ms | 550110.925821 s | 550110.925900 s |
(w)psi_avgs_work | 0004 | 0.079 ms | 1 | 0.079 ms | 550109.581835 s | 550109.581914 s |
(w)psi_avgs_work | 0001 | 0.078 ms | 1 | 0.078 ms | 550109.197809 s | 550109.197887 s |
(w)psi_avgs_work | 0002 | 0.077 ms | 5 | 0.086 ms | 550110.669819 s | 550110.669905 s |
<SNIP>
# strace -e bpf -o perf-stat-bpf-counters.output perf stat -e cycles --bpf-counters sleep 1
Performance counter stats for 'sleep 1':
6,197,983 cycles
1.003922848 seconds time elapsed
0.000000000 seconds user
0.002032000 seconds sys
# head -7 perf-stat-bpf-counters.output
bpf(BPF_OBJ_GET, {pathname="/sys/fs/bpf/perf_attr_map", bpf_fd=0, file_flags=0}, 16) = 3
bpf(BPF_OBJ_GET_INFO_BY_FD, {info={bpf_fd=3, info_len=88, info=0x7ffcead64990}}, 16) = 0
bpf(BPF_MAP_LOOKUP_ELEM, {map_fd=3, key=0x24129e0, value=0x7ffcead65a48, flags=BPF_ANY}, 32) = 0
bpf(BPF_LINK_GET_FD_BY_ID, {link_id=1252}, 12) = -1 ENOENT (No such file or directory)
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffcead65780, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0,
+func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0}, 116) = 4
bpf(BPF_PROG_LOAD, {prog_type=BPF_PROG_TYPE_SOCKET_FILTER, insn_cnt=2, insns=0x7ffcead65920, license="GPL", log_level=0, log_size=0, log_buf=NULL, kern_version=KERNEL_VERSION(0, 0, 0), prog_flags=0, prog_name="", prog_ifindex=0, expected_attach_type=BPF_CGROUP_INET_INGRESS, prog_btf_fd=0, func_info_rec_size=0,
+func_info=NULL, func_info_cnt=0, line_info_rec_size=0, line_info=NULL, line_info_cnt=0, attach_btf_id=0, attach_prog_fd=0, fd_array=NULL}, 128) = 4
bpf(BPF_BTF_LOAD, {btf="\237\353\1\0\30\0\0\0\0\0\0\0\20\0\0\0\20\0\0\0\5\0\0\0\1\0\0\0\0\0\0\1"..., btf_log_buf=NULL, btf_size=45, btf_log_size=0, btf_log_level=0}, 28) = 4
#
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Suggested-by: Andrii Nakryiko <andrii@kernel.org>
Tested-by: Namhyung Kim <namhyung@kernel.org>
Tested-by: Song Liu <song@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Co-developed-by: Jiri Olsa <jolsa@kernel.org>
Link: https://lore.kernel.org/lkml/ZFU1PJrn8YtHIqno@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It seems that perf stat -b <prog id> doesn't produce any results:
$ perf stat -e cycles -b 4 -I 10000 -vvv
Control descriptor is not initialized
cycles: 0 0 0
time counts unit events
10.007641640 <not supported> cycles
Looks like this happens because fentry/fexit progs are getting loaded, but the
corresponding perf event is not enabled and not added into the events bpf map.
I think there is some mixing up between two type of bpf support, one for bperf
and one for bpf_profiler. Both are identified via evsel__is_bpf, based on which
perf events are enabled, but for the latter (bpf_profiler) a perf event is
required. Using evsel__is_bperf to check only bperf produces expected results:
$ perf stat -e cycles -b 4 -I 10000 -vvv
Control descriptor is not initialized
------------------------------------------------------------
perf_event_attr:
size 136
sample_type IDENTIFIER
read_format TOTAL_TIME_ENABLED|TOTAL_TIME_RUNNING
disabled 1
exclude_guest 1
------------------------------------------------------------
sys_perf_event_open: pid -1 cpu 0 group_fd -1 flags 0x8 = 3
------------------------------------------------------------
[...perf_event_attr for other CPUs...]
------------------------------------------------------------
cycles: 309426 169009 169009
time counts unit events
10.010091271 309426 cycles
The final numbers correspond (at least in the level of magnitude) to the
same metric obtained via bpftool.
Fixes: 112cb56164 ("perf stat: Introduce config stat.bpf-counter-events")
Reviewed-by: Song Liu <song@kernel.org>
Signed-off-by: Dmitrii Dolgov <9erthalion6@gmail.com>
Tested-by: Song Liu <song@kernel.org>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Song Liu <song@kernel.org>
Link: https://lore.kernel.org/r/20230412182316.11628-1-9erthalion6@gmail.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
1, Better backtraces for humanization;
2, Relay BCE exceptions to userland as SIGSEGV;
3, Provide kernel fpu functions;
4, Optimize memory ops (memset/memcpy/memmove);
5, Optimize checksum and crc32(c) calculation;
6, Add ARCH_HAS_FORTIFY_SOURCE selection;
7, Add function error injection support;
8, Add ftrace with direct call support;
9, Add basic perf tools support.
-----BEGIN PGP SIGNATURE-----
iQJKBAABCAA0FiEEzOlt8mkP+tbeiYy5AoYrw/LiJnoFAmRQlUsWHGNoZW5odWFj
YWlAa2VybmVsLm9yZwAKCRAChivD8uImekCTD/9fc2U+FIXhJOWV5yK9TCjJTRnK
ASvk0JMYIDA60+fnof3C85tDu9Py9M5Mvt/Ec5pBaHErn16irq85AdD74/OmyCc2
V4pRFHbYLu0WBFQN77gfNXH0XErgYXdceZvaMXajVz2H6NlSKSWZOVN/9ut5SLi3
mt0rCwCsyahj92n8+hOjjZeFbDaPfPMCQ/8n9dnadhbBm9iz35fOKY+qIBHJMJ9a
wPfZ2k3wu5DHs/2+ZjFNhlwrlURTp3RlcVQ7QWDcR1LM3Z4/lEkD8tAI/r8sR9gw
rxzoBSaQzo/zscUmYo0jh1BoW2w0n+x/GfH70Pyz3iwZky3jwpdP0nRwnB4h+tnE
wKlpa5K7RfaqUxZExFfGALmlkALtjQgiXPYbORHMsD6l6XwrOMCeyQismm1oo66m
JBlsdXCms5aracYmWhXnVmTlBqGjAgYAxm62ap62uwlmULy4qUv6kFeW0fERn9NJ
5bKgbrkcal/WkMBawQqtG03niRkykqpqFooZ95ubj4Lib4VM0BmEvFrREjgXO7AE
jpLimYsT9ROE3YQJqyWyLYkmc2ShwWj70INTpz2viMtQ2blIRKvRVsxs976bHuwS
mGsZtiiANjhT2bAUhN7bct2Cf13MtPXiuf0etcJbrNSAtoBIFk+3uRRKHH2rM+CK
oKYjO+exPyuQ9nSOBg==
=3aTV
-----END PGP SIGNATURE-----
Merge tag 'loongarch-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson
Pull LoongArch updates from Huacai Chen:
- Better backtraces for humanization
- Relay BCE exceptions to userland as SIGSEGV
- Provide kernel fpu functions
- Optimize memory ops (memset/memcpy/memmove)
- Optimize checksum and crc32(c) calculation
- Add ARCH_HAS_FORTIFY_SOURCE selection
- Add function error injection support
- Add ftrace with direct call support
- Add basic perf tools support
* tag 'loongarch-6.4' of git://git.kernel.org/pub/scm/linux/kernel/git/chenhuacai/linux-loongson: (24 commits)
tools/perf: Add basic support for LoongArch
LoongArch: ftrace: Add direct call trampoline samples support
LoongArch: ftrace: Add direct call support
LoongArch: ftrace: Implement ftrace_find_callable_addr() to simplify code
LoongArch: ftrace: Fix build error if DYNAMIC_FTRACE_WITH_REGS is not set
LoongArch: ftrace: Abstract DYNAMIC_FTRACE_WITH_ARGS accesses
LoongArch: Add support for function error injection
LoongArch: Add ARCH_HAS_FORTIFY_SOURCE selection
LoongArch: crypto: Add crc32 and crc32c hw acceleration
LoongArch: Add checksum optimization for 64-bit system
LoongArch: Optimize memory ops (memset/memcpy/memmove)
LoongArch: Provide kernel fpu functions
LoongArch: Relay BCE exceptions to userland as SIGSEGV with si_code=SEGV_BNDERR
LoongArch: Tweak the BADV and CPUCFG.PRID lines in show_regs()
LoongArch: Humanize the ESTAT line when showing registers
LoongArch: Humanize the ECFG line when showing registers
LoongArch: Humanize the EUEN line when showing registers
LoongArch: Humanize the PRMD line when showing registers
LoongArch: Humanize the CRMD line when showing registers
LoongArch: Fix format of CSR lines during show_regs()
...
When is_valid_tracepoint() returns 1, need to call put_events_file() to
free `dir_path`.
Fixes: 25a7d91427 ("perf parse-events: Use get/put_events_file()")
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230421025953.173826-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When cross-analyzing perf data recorded on an another platform, massive
unsupported target platform errors are printed. So let's show this message
as warning and only once.
Signed-off-by: Changbin Du <changbin.du@huawei.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hui Wang <hw.huiwang@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230426032246.3608596-1-changbin.du@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Before this, the raw ip is printed for non-callchain and dso offset for
callchain. This inconsistent output for address may confuse people. And
mostly what we expect is the raw ip.
'dso offset' is printed in callchain:
$ perf script
...
ls 1341034 2739463.008343: 2162417 cycles:
ffffffff99d657a7 [unknown] ([unknown])
ffffffff99e00b67 [unknown] ([unknown])
235d3 memset+0x53 (/usr/lib/x86_64-linux-gnu/ld-2.31.so) # dso offset
a61b _dl_map_object+0x1bb (/usr/lib/x86_64-linux-gnu/ld-2.31.so)
raw ip is printed for non-callchain:
$ perf script -G
...
ls 1341034 2739463.008876: 2053304 cycles: ffffffffc1596923 [unknown] ([unknown])
ls 1341034 2739463.009381: 1917049 cycles: 14def8e149e6 __strcoll_l+0xd96 (/usr/lib/x86_64-linux-gnu/libc-2.31.so) # raw ip
Let's have consistent output for it. Later I'll add a new field 'dsoff' to
print dso offset.
Signed-off-by: Changbin Du <changbin.du@huawei.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Hui Wang <hw.huiwang@huawei.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230418031825.1262579-2-changbin.du@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In elf_read_build_id(), if gnu build_id is found, should return the size of
the actually copied data. If descsz is greater thanBuild_ID_SIZE,
write_buildid data access may occur.
Fixes: be96ea8ffa ("perf symbols: Fix issue with binaries using 16-bytes buildids (v2)")
Reported-by: Will Ochowicz <Will.Ochowicz@genusplc.com>
Signed-off-by: Yang Jihong <yangjihong1@huawei.com>
Tested-by: Will Ochowicz <Will.Ochowicz@genusplc.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/lkml/CWLP265MB49702F7BA3D6D8F13E4B1A719C649@CWLP265MB4970.GBRP265.PROD.OUTLOOK.COM/T/
Link: https://lore.kernel.org/r/20230427012841.231729-1-yangjihong1@huawei.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It should mention scandirat() instead of scandir().
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230427230502.1526136-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It should free entries (not only the array) filled by scandirat()
after use.
Reviewed-by: Ian Rogers <irogers@google.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230427230502.1526136-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It seems BPF CO-RE reloc doesn't work well with the pattern that gets
the field-offset only. Use offsetof() to make it explicit so that
the compiler would generate the correct code.
Fixes: 0c1228486b ("perf lock contention: Support pre-5.14 kernels")
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Hao Luo <haoluo@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Co-developed-by: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Link: https://lore.kernel.org/r/20230427234833.1576130-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add basic support for LoongArch, which is very similar to the MIPS
version.
Signed-off-by: Ming Wang <wangming01@loongson.cn>
Signed-off-by: Huacai Chen <chenhuacai@loongson.cn>
Switch to use evsel__name() that doesn't return NULL for hardware and
similar events.
Reviewed-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ahmad Yasin <ahmad.yasin@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Caleb Biggers <caleb.biggers@intel.com>
Cc: Edward Baker <edward.baker@intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kang Minchul <tegongkang@gmail.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Perry Taylor <perry.taylor@intel.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Samantha Alt <samantha.alt@intel.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
Cc: Weilin Wang <weilin.wang@intel.com>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: Yang Jihong <yangjihong1@huawei.com>
Link: https://lore.kernel.org/r/20230426070050.1315519-8-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Timeless and per-thread are orthogonal concepts that are currently
treated as if they are the same (per-thread == timeless). This breaks
when you modify the command line or itrace options to something that the
current logic doesn't expect.
For example:
# Force timeless with Z
--itrace=Zi10i
# Or inconsistent record options
-e cs_etm/timestamp=1/ --per-thread
Adding Z for decoding in per-cpu mode is particularly bad because in
per-thread mode trace channel IDs are discarded and all assumed to be 0,
which would mix trace from different CPUs in per-cpu mode.
Although the results might not be perfect in all scenarios, if the user
requests no timestamps, it should still be possible to decode in either
mode. Especially if the relative times of samples in different processes
aren't interesting, quite a bit of space can be saved by turning off
timestamps in per-cpu mode.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Denis Nikitin <denik@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230424134748.228137-8-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Using u8 for boolean values makes the code a bit more difficult to read
so be more explicit by using bool.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Denis Nikitin <denik@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230424134748.228137-7-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Timestamps and context tracking are automatically enabled in per-core
mode and it's impossible to override this. Use the new utility function
to set them conditionally.
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Denis Nikitin <denik@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230424134748.228137-6-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There is some duplicated code to only override config values if they
haven't already been set by the user so make a util function for this.
Signed-off-by: James Clark <james.clark@arm.com>
Acked-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Denis Nikitin <denik@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230424134748.228137-3-james.clark@arm.com
[ Moved evsel__set_config_if_unset() to util/pmu.c to avoid dragging stuff into the python binding ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In this context, timeless refers to the trace data rather than the perf
event data. But when detecting whether there are timestamps in the trace
data or not, the presence of a timestamp flag on any perf event is used.
Since commit f42c0ce573 ("perf record: Always get text_poke events
with --kcore option") timestamps were added to a tracking event when
--kcore is used which breaks this detection mechanism. Fix it by
detecting if trace timestamps exist by looking at the ETM config flags.
This would have always been a more accurate way of doing it anyway.
This fixes the following error message when using --kcore with
Coresight:
$ perf record --kcore -e cs_etm// --per-thread
$ perf report
The perf.data/data data has no samples!
Fixes: f42c0ce573 ("perf record: Always get text_poke events with --kcore option")
Reported-by: Yang Shi <shy828301@gmail.com>
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: denik@google.com
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/lkml/CAHbLzkrJQTrYBtPkf=jf3OpQ-yBcJe7XkvQstX9j2frz4WF-SQ@mail.gmail.com/
Link: https://lore.kernel.org/r/20230424134748.228137-2-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
This makes the logic a bit clear by avoiding the !strcmp() pattern and
also a way to intercept the pointer if we need to do extra validation on
it or to do lazy setting of evsel->name via evsel__name(evsel).
Reviewed-by: "Liang, Kan" <kan.liang@linux.intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZEGLM8VehJbS0gP2@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To fix this confusing warning:
# perf probe -l
Failed to find debug information for address 798240
probe_main:prometheus_new_counter__return (on github.com/prometheus/client_golang/prometheus.NewCounter%return in /home/acme/git/prometheus-uprobes/main with counter)
#
As that 798240 is printed with PRIx64 but has no letters, better print
the 0x prefix to disambiguate.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/lkml/ZEBCyFu2GjTw6qOi@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
There's no strict get/put policy with map that leads to leaks or use
after free. Reference count checking identifies correct pairing of gets
and puts.
Committer notes:
Extracted from a larger patch removing bits that were covered by the use
of pre-existing map__ accessors (e.g. maps__nr_maps()) and new ones
added (map__refcnt() and the maps__set_ ones) to reduce
RC_CHK_ACCESS(maps)-> source code pollution.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Link: https://lore.kernel.org/lkml/20230407230405.2931830-6-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some conversions weren't performed in 4e8db2d752 ("perf map: Add
map__refcnt() accessor to use in the maps test"), fix it.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add reference count checking to make sure of good use of get and put.
Add and use accessors to reduce RC_CHK clutter.
The only significant issue was in tests/thread-maps-share.c where
reference counts were released in the reverse order to acquisition,
leading to a use after put. This was fixed by reversing the put order.
Committer notes:
Extracted from a larger patch removing bits that were covered by the use
of pre-existing maps__ accessors (e.g. maps__nr_maps()) and new ones
added (maps__refcnt()) to reduce RC_CHK_ACCESS(maps)-> source code
pollution.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Link: https://lore.kernel.org/lkml/20230407230405.2931830-5-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To remove one more direct access to 'struct maps' so that we can
intercept accesses to its instantiations and refcount check it to catch
use after free, etc.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
James Clark noticed that the recent 63df0e4bc3 ("perf map: Add
accessor for dso") patch accessed map->dso before the 'map' variable was
NULL checked, which is a change in logic that leads to segmentation
faults, so comb thru that patch to fix similar cases.
Fixes: 63df0e4bc3 ("perf map: Add accessor for dso")
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Link: https://lore.kernel.org/lkml/ZD68RYCVT8hqPuxr@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
map__dso() is called before thread__find_map() which always results in a
null pointer dereference. Fix it by finding first, then checking if it
exists.
Fixes: 63df0e4bc3 ("perf map: Add accessor for dso")
Signed-off-by: James Clark <james.clark@arm.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230418141203.673465-1-james.clark@arm.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add reference count checking controlled by REFCNT_CHECKING ifdef. The
reference count checking interposes an allocated pointer between the
reference counted struct on a get and frees the pointer on a put.
Accesses after a put cause faults and use after free, missed puts are
caughts as leaks and double puts are double frees.
This checking helped resolve a memory leak and use after free:
https://lore.kernel.org/linux-perf-users/CAP-5=fWZH20L4kv-BwVtGLwR=Em3AOOT+Q4QGivvQuYn5AsPRg@mail.gmail.com/
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Link: https://lore.kernel.org/lkml/20230407230405.2931830-4-irogers@google.com
[ Extracted from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We'll need to reference count dso->nsinfo, so reduce the number of
direct accesses by having a shorter form of obtaining a filename with
a chroot (namespace one).
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Link: https://lore.kernel.org/lkml/ZD26ZlqSbgSyH5lX@kernel.org
[ Used nsinfo__pid(dso->nsinfo), as it was already present ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Enabled when REFCNT_CHECKING is defined. The change adds a memory
allocated pointer that is interposed between the reference counted cpu
map at a get and freed by a put. The pointer replaces the original
perf_cpu_map struct, so use of the perf_cpu_map via APIs remains
unchanged. Any use of the cpu map without the API requires two versions,
handled via the RC_CHK_ACCESS macro.
This change is intended to catch:
- use after put: using a cpumap after you have put it will cause a
segv.
- unbalanced puts: two puts for a get will result in a double free
that can be captured and reported by tools like address sanitizer,
including with the associated stack traces of allocation and frees.
- missing puts: if a put is missing then the get turns into a memory
leak that can be reported by leak sanitizer, including the stack
trace at the point the get occurs.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>,
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/lkml/20230407230405.2931830-3-irogers@google.com
[ Extracted from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that we can validate the 'map' instance wrt refcount checking.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Link: https://lore.kernel.org/lkml/20230407230405.2931830-3-irogers@google.com
[ Extracted from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When extracting this patch from Ian's original patch I forgot to remove
the setting of ->nr and ->refcnt, no need to do those initializations
again as those are done in perf_cpu_map__alloc() already, duh.
Cc: Ian Rogers <irogers@google.com>
Fixes: 1f94479edb ("libperf: Make perf_cpu_map__alloc() available as an internal function for tools/perf to use")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Addresses of two data structure members were determined before
corresponding null pointer checks in the implementation of the function
“sort__sym_from_cmp”.
Thus avoid the risk for undefined behaviour by removing extra
initialisations for the local variables “from_l” and “from_r” (also
because they were already reassigned with the same value behind this
pointer check).
This issue was detected by using the Coccinelle software.
Fixes: 1b9e97a2a9 ("perf tools: Fix report -F symbol_from for data without branch info")
Signed-off-by: <elfring@users.sourceforge.net>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Link: https://lore.kernel.org/cocci/54a21fea-64e3-de67-82ef-d61b90ffad05@web.de/
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
One more step to allow for checking reference counting, user after free,
etc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Link: https://lore.kernel.org/ZDb9dycHQ11UIXwx@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We had the open coded equivalent in perf_cpu_map__empty_new(), so reuse
what is in libperf.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Link: https://lore.kernel.org/lkml/20230407230405.2931830-3-irogers@google.com
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
So that we can have a single point where to refcount check 'struct perf_cpu_map'
instances for use after free, etc.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Link: https://lore.kernel.org/lkml/20230407230405.2931830-3-irogers@google.com
[ Split from a larger patch ]
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To remove one more direct access to 'struct map' so that we can intecept
accesses to its instantiations and refcount check it to catch use after
free, etc.
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Link: https://lore.kernel.org/lkml/ZDbRIJknafLnDwtO@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
An audit showed just this one problem with zfree(), fix it.
Fixes: 9fbc61f832 ("perf pmu: Add support for PMU capabilities")
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Do defensive programming by using zfree() to initialize freed pointers
to NULL, so that eventual use after free result in a NULL pointer deref
instead of more subtle behaviour.
This file already used zfree() in other places, so this just plugs some
leftovers.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Do defensive programming by using zfree() to initialize freed pointers
to NULL, so that eventual use after free result in a NULL pointer deref
instead of more subtle behaviour.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Do defensive programming by using zfree() to initialize freed pointers
to NULL, so that eventual use after free result in a NULL pointer deref
instead of more subtle behaviour.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Do defensive programming by using zfree() to initialize freed pointers
to NULL, so that eventual use after free result in a NULL pointer deref
instead of more subtle behaviour.
Also include the missing linux/zalloc.h header directive.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Do defensive programming by using zfree() to initialize freed pointers
to NULL, so that eventual use after free result in a NULL pointer deref
instead of more subtle behaviour.
Also remove one NULL test before free(), as it accepts a NULL arg and we
get one line shaved not doing it explicitely.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Do defensive programming by using zfree() to initialize freed pointers
to NULL, so that eventual use after free result in a NULL pointer deref
instead of more subtle behaviour.
Also remove one NULL test before free(), as it accepts a NULL arg and we
get one line shaved not doing it explicitely.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Do defensive programming by using zfree() to initialize freed pointers
to NULL, so that eventual use after free result in a NULL pointer deref
instead of more subtle behaviour.
Also remove one NULL test before free(), as it accepts a NULL arg and we
get one line shaved not doing it explicitely.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Do defensive programming by using zfree() to initialize freed pointers
to NULL, so that eventual use after free result in a NULL pointer deref
instead of more subtle behaviour.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Do defensive programming by using zfree() to initialize freed pointers
to NULL, so that eventual use after free result in a NULL pointer deref
instead of more subtle behaviour.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Do defensive programming by using zfree() to initialize freed pointers
to NULL, so that eventual use after free result in a NULL pointer deref
instead of more subtle behaviour.
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
If bperf (perf tools that use BPF skels) sets evsel->leader_skel or
evsel->follower_skel then it appears that evsel->bpf_skel is set and can
trigger the following use-after-free:
==13575==ERROR: AddressSanitizer: heap-use-after-free on address 0x60c000014080 at pc 0x55684b939880 bp 0x7ffdfcf30d70 sp 0x7ffdfcf30d68
READ of size 8 at 0x60c000014080 thread T0
#0 0x55684b93987f in sample_filter_bpf__destroy tools/perf/bpf_skel/sample_filter.skel.h:44:11
#1 0x55684b93987f in perf_bpf_filter__destroy tools/perf/util/bpf-filter.c:155:2
#2 0x55684b98f71e in evsel__exit tools/perf/util/evsel.c:1521:2
#3 0x55684b98a352 in evsel__delete tools/perf/util/evsel.c:1547:2
#4 0x55684b981918 in evlist__purge tools/perf/util/evlist.c:148:3
#5 0x55684b981918 in evlist__delete tools/perf/util/evlist.c:169:2
#6 0x55684b887d60 in cmd_stat tools/perf/builtin-stat.c:2598:2
..
0x60c000014080 is located 0 bytes inside of 128-byte region [0x60c000014080,0x60c000014100)
freed by thread T0 here:
#0 0x55684b780e86 in free compiler-rt/lib/asan/asan_malloc_linux.cpp:52:3
#1 0x55684b9462da in bperf_cgroup_bpf__destroy tools/perf/bpf_skel/bperf_cgroup.skel.h:61:2
#2 0x55684b9462da in bperf_cgrp__destroy tools/perf/util/bpf_counter_cgroup.c:282:2
#3 0x55684b944c75 in bpf_counter__destroy tools/perf/util/bpf_counter.c:819:2
#4 0x55684b98f716 in evsel__exit tools/perf/util/evsel.c:1520:2
#5 0x55684b98a352 in evsel__delete tools/perf/util/evsel.c:1547:2
#6 0x55684b981918 in evlist__purge tools/perf/util/evlist.c:148:3
#7 0x55684b981918 in evlist__delete tools/perf/util/evlist.c:169:2
#8 0x55684b887d60 in cmd_stat tools/perf/builtin-stat.c:2598:2
...
previously allocated by thread T0 here:
#0 0x55684b781338 in calloc compiler-rt/lib/asan/asan_malloc_linux.cpp:77:3
#1 0x55684b944e25 in bperf_cgroup_bpf__open_opts tools/perf/bpf_skel/bperf_cgroup.skel.h:73:35
#2 0x55684b944e25 in bperf_cgroup_bpf__open tools/perf/bpf_skel/bperf_cgroup.skel.h:97:9
#3 0x55684b944e25 in bperf_load_program tools/perf/util/bpf_counter_cgroup.c:55:9
#4 0x55684b944e25 in bperf_cgrp__load tools/perf/util/bpf_counter_cgroup.c:178:23
#5 0x55684b889289 in __run_perf_stat tools/perf/builtin-stat.c:713:7
#6 0x55684b889289 in run_perf_stat tools/perf/builtin-stat.c:949:8
#7 0x55684b888029 in cmd_stat tools/perf/builtin-stat.c:2537:12
Resolve by clearing 'evsel->bpf_skel' as part of bpf_counter__destroy().
Suggested-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: http://lore.kernel.org/lkml/20230411051718.267228-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Seen in "perf stat --bpf-counters --for-each-cgroup test" running in a
container:
libbpf: Failed to bump RLIMIT_MEMLOCK (err = -1), you might need to do it explicitly!
libbpf: Error in bpf_object__probe_loading():Operation not permitted(1). Couldn't load trivial BPF program. Make sure your kernel supports BPF (CONFIG_BPF_SYSCALL=y) and/or that RLIMIT_MEMLOCK is set to big enough value.
libbpf: failed to load object 'bperf_cgroup_bpf'
libbpf: failed to load BPF skeleton 'bperf_cgroup_bpf': -1
Failed to load cgroup skeleton
#0 0x55f28a650981 in list_empty tools/include/linux/list.h:189
#1 0x55f28a6593b4 in evsel__exit util/evsel.c:1518
#2 0x55f28a6596af in evsel__delete util/evsel.c:1544
#3 0x55f28a89d166 in bperf_cgrp__destroy util/bpf_counter_cgroup.c:283
#4 0x55f28a899e9a in bpf_counter__destroy util/bpf_counter.c:816
#5 0x55f28a659455 in evsel__exit util/evsel.c:1520
#6 0x55f28a6596af in evsel__delete util/evsel.c:1544
#7 0x55f28a640d4d in evlist__purge util/evlist.c:148
#8 0x55f28a640ea6 in evlist__delete util/evlist.c:169
#9 0x55f28a4efbf2 in cmd_stat tools/perf/builtin-stat.c:2598
#10 0x55f28a6050c2 in run_builtin tools/perf/perf.c:330
#11 0x55f28a605633 in handle_internal_command tools/perf/perf.c:384
#12 0x55f28a6059fb in run_argv tools/perf/perf.c:428
#13 0x55f28a6061d3 in main tools/perf/perf.c:562
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230410205659.3131608-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Some of the IBS_OP_DATA2 bit descriptions were stale (taken from old
version of PPR). Change it according to latest PPR.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230407112459.548-5-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
39 is taken from the length of longest printable new API string:
"Remote socket, same board Any cache hit". Although, using old API
can result into even longer strings, let's not overkill by making
it dynamic length.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230407112459.548-5-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Interpretation of 'union perf_mem_data_src' by perf_mem__lvl_scnprintf()
is non-intuitive. For ex, it ignores 'mem_lvl' when 'mem_hops' is set
but considers it otherwise. It prints both 'mem_lvl_num' and 'mem_lvl'
when 'mem_hops' is not set.
Refactor this function such that it behaves more intuitively: Use new
API 'mem_lvl_num'|'mem_remote'|'mem_hops' if 'mem_lvl_num' contains
value other than PERF_MEM_LVLNUM_NA. Otherwise, fallback to old API
'mem_lvl'. Since new API has no way to indicate MISS, use it from old
api, otherwise don't club old and new APIs while parsing as well as
printing.
Before:
$ sudo ./perf mem report -F sample,mem --stdio
# Samples Memory access
# ............ ........................
#
250097 N/A
188907 L1 hit
4116 L2 hit
3496 Remote Cache (1 hop) hit
3271 Remote Cache (2 hops) hit
873 L3 hit
598 Local RAM hit
438 Remote RAM (1 hop) hit
1 Uncached hit
After:
$ sudo ./perf mem report -F sample,mem --stdio
# Samples Memory access
# ............ .......................................
#
255517 N/A
189989 L1 hit
4541 L2 hit
3363 Remote core, same node Any cache hit
3336 Remote node, same socket Any cache hit
1275 L3 hit
743 RAM hit
545 Remote node, same socket RAM hit
4 Uncached hit
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230407112459.548-5-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add support for printing PERF_MEM_LVLNUM_UNC in perf mem report.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230407112459.548-5-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add PERF_MEM_LVLNUM_NA wherever PERF_MEM_DATA_SRC_NONE is used to set
default values.
Signed-off-by: Ravi Bangoria <ravi.bangoria@amd.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ananth Narayan <ananth.narayan@amd.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Sandipan Das <sandipan.das@amd.com>
Cc: Santosh Shukla <santosh.shukla@amd.com>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230407112459.548-5-ravi.bangoria@amd.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The definitions are in util.c so move the declarations to match.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chengdong Li <chengdongli@tencent.com>
Cc: Denis Nikitin <denik@chromium.org>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raul Silvera <rsilvera@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230410162511.3055900-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'input_name' is the name of the input perf.data file, it is used by data
convert and ui code. Move it to util to make it more consistent with
other global state.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chengdong Li <chengdongli@tencent.com>
Cc: Denis Nikitin <denik@chromium.org>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raul Silvera <rsilvera@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230410162511.3055900-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Move to match the definition in header.c.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chengdong Li <chengdongli@tencent.com>
Cc: Denis Nikitin <denik@chromium.org>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raul Silvera <rsilvera@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230410162511.3055900-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The usage function is part of util.h, move the usage strings there
too.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com>
Cc: Chengdong Li <chengdongli@tencent.com>
Cc: Denis Nikitin <denik@chromium.org>
Cc: Florian Fischer <florian.fischer@muhq.space>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Martin Liška <mliska@suse.cz>
Cc: Mathieu Poirier <mathieu.poirier@linaro.org>
Cc: Mike Leach <mike.leach@linaro.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Raul Silvera <rsilvera@google.com>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Will Deacon <will@kernel.org>
Cc: Xing Zhengjun <zhengjun.xing@linux.intel.com>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230410162511.3055900-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The 'mem_hops' bits were added in 5.16 with no prior equivalent.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230408055208.1283832-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
'struct rq's member '__lock' was renamed from 'lock' in 5.14.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230408055208.1283832-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
ELF is acronym and therefore should be spelled in all caps.
I left one exception at Documentation/arm/nwfpe/nwfpe.rst which looks like
being written in the first person.
Link: https://lkml.kernel.org/r/Y/3wGWQviIOkyLJW@p183
Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
When a pointer to a map exists do a get, when that pointer is
overwritten or freed, put the map. This avoids issues with gets and
puts being inconsistently used causing, use after puts, etc. For
example, the map in struct addr_location is changed to hold a
reference count. Reference count checking and address sanitizer were
used to identify issues.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230404205954.2245628-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
To make it clearer about the ownership of a reference count split the
by-name case from the regular start-address sorted tree. Put the
reference count when maps_by_name is freed, which requires moving a
decrement to nr_maps in maps__remove. Add two missing map puts in
maps__fixup_overlappings in the event maps__insert fails.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230404205954.2245628-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Later changes will add reference count checking for 'struct map'. Add
accessors so that the reference count check is only necessary in one
place.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230404205954.2245628-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Later changes will add reference count checking for 'struct map'. Add an
accessor so that the reference count check is only necessary in one
place.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230404205954.2245628-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Later changes will add reference count checking for struct map, add a
helper function to invoke the map_ip and unmap_ip function pointers. The
helper allows the reference count check to be in fewer places.
Committer notes:
Add missing conversions to:
tools/perf/util/map.c
tools/perf/util/cs-etm.c
tools/perf/util/annotate.c
tools/perf/arch/powerpc/util/sym-handling.c
tools/perf/arch/s390/annotate/instructions.c
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230404205954.2245628-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Add dso to match comment. This avoids a naming conflict with later
added accessor functions for variables in struct map.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230404205954.2245628-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
We may have a lot of copies of a particular uncore PMU, such as
uncore_cha_0 to uncore_cha_59 on Intel sapphirerapids.
The JSON events may match each of PMUs and so the events are copied to
it.
In 'perf list' this means we see the same JSON event 60 times as events
on different PMUs don't have duplicates removed.
There are 284 uncore_cha events on sapphirerapids.
Rather than use the PMU's name to sort and remove duplicates, use the
JSON PMU name.
This reduces the 60 copies back down to 1 and has the side effect of
speeding things like the "perf all PMU test" shell test.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230406235256.2768773-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Improve documentation around perf_pmu_alias pmu_name and on
functions.
Reduce the scope of pmu_uncore_alias_match to just file.
Rename perf_pmu__valid_suffix to the more revealing
perf_pmu__match_ignoring_suffix.
Add a short-cut to perf_pmu__match_ignoring_suffix for PMU names that
don't also have a socket value, and can therefore avoid a memory
allocation.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230406235256.2768773-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
struct pmu_event has const char*s, only unit needs to be non-const for
the sake of passing as an out argument to strtod().
Reduce the const casts from 4 down to 1.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230406235256.2768773-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It doesn't delete data in the task_data and lock_stat maps. The data
is kept there until it's consumed by userspace at the end. But it calls
bpf_map_update_elem() again and again, and the data will be discarded if
the map is full. This is not good.
Worse, in the bpf_map_update_elem(), it keeps trying to get a new node
even if the map was full. I guess it makes sense if it deletes some node
like in the tstamp map (that's why I didn't make the change there).
In a pre-allocated hash map, that means it'd iterate all CPU to check the
freelist. And it has a bad performance impact on large machines.
I've checked it on my 64 CPU machine with this.
$ perf bench sched messaging -g 1000
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 1000 groups == 40000 processes run
Total time: 2.825 [sec]
And I used the task mode, so that it can guarantee the map is full.
The default map entry size is 16K and this workload has 40K tasks.
Before:
$ sudo ./perf lock con -abt -E3 -- perf bench sched messaging -g 1000
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 1000 groups == 40000 processes run
Total time: 11.299 [sec]
contended total wait max wait avg wait pid comm
19284 3.51 s 3.70 ms 181.91 us 1305863 sched-messaging
243 84.09 ms 466.67 us 346.04 us 1336608 sched-messaging
177 66.35 ms 12.08 ms 374.88 us 1220416 node
For some reason, it didn't report the data failures. But you can see the
total time in the workload is increased a lot (2.8 -> 11.3). If it fails
early when the map is full, it goes back to normal.
After:
$ sudo ./perf lock con -abt -E3 -- perf bench sched messaging -g 1000
# Running 'sched/messaging' benchmark:
# 20 sender and receiver processes per group
# 1000 groups == 40000 processes run
Total time: 3.044 [sec]
contended total wait max wait avg wait pid comm
18743 591.92 ms 442.96 us 31.58 us 1431454 sched-messaging
51 210.64 ms 207.45 ms 4.13 ms 1468724 sched-messaging
81 68.61 ms 65.79 ms 847.07 us 1463183 sched-messaging
=== output for debug ===
bad: 1164137, total: 2253341
bad rate: 51.66 %
histogram of failure reasons
task: 0
stack: 0
time: 0
data: 1164137
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It needs callstacks for two reasons:
* for stack aggregation mode, the map key is the stack id and it can
also show the full stack traces when -v is used
* for other aggregation modes, the stack filter can be used to limit
lock contentions from known call paths
The -v option is meaningful (in terms of stack trace) only for stack
aggregation mode, so it should not set the save_callstack for other
mode like with -t or -l options.
I've noticed this with the following command line:
$ sudo ./perf lock con -ablv -E 3 -M 16 -- ./perf bench sched messaging
...
contended total wait max wait avg wait address symbol
88 4.59 ms 108.07 us 52.13 us ffff935757f46ec0 (spinlock)
33 905.22 us 73.67 us 27.43 us ffff935757f41700 (spinlock)
28 703.69 us 79.28 us 25.13 us ffff938a3d9b0c80 rq_lock (spinlock)
=== output for debug ===
bad: 12272, total: 12421
bad rate: 98.80 %
histogram of failure reasons
task: 8285
stack: 3987 <---------- here
time: 0
data: 0
It should not have any failure on stacks since it doesn't use it.
No functional change intended.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When -E option is used, it only prints the given number of entries but
the event stat at the end should have the numbers for entire entries.
Likewise, -S option will hide entries that don't have the named
function in the callstack. Also update event stat for them.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It's possible to fail to update the data when the lock_stat map is full.
We should check that case and show the number at the end.
$ sudo ./perf lock con -ablv -E3 -- ./perf bench sched messaging
...
contended total wait max wait avg wait address symbol
6157 208.48 ms 69.29 us 33.86 us ffff934c001c1f00 (spinlock)
4030 72.04 ms 61.84 us 17.88 us ffff934c000415c0 (spinlock)
3201 50.30 ms 47.73 us 15.71 us ffff934c2eead850 (spinlock)
=== output for debug ===
bad: 0, total: 13388
bad rate: 0.00 %
histogram of failure reasons
task: 0
stack: 0
time: 0
data: 0 <----- added
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The BPF hash map will align the map size to a power of 2. So 10k would
be 16k anyway. Let's have the actual size to avoid confusions.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Juri Lelli <juri.lelli@redhat.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Song Liu <song@kernel.org>
Cc: bpf@vger.kernel.org
Link: https://lore.kernel.org/r/20230406210611.1622492-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Rename the fallthrough attribute to better align with the kernel
version. Copy the definition from include/linux/compiler_attributes.h
including the #else clause. Adding the #else clause allows the tools
compiler.h header to drop the check for a definition entirely and keeps
both definitions together.
Change any __fallthrough statements to fallthrough anywhere it was used
within perf.
This allows other tools to use the same key word as the kernel.
Committer notes:
Did some missing conversions to:
builtin-list.c
Also included gtk.h before the 'fallthrough' definition in:
tools/perf/ui/gtk/hists.c
tools/perf/ui/gtk/helpline.c
tools/perf/ui/gtk/browser.c
As it is the arg name for a macro in glib.h:
/var/home/acme/git/perf-tools-next/tools/include/linux/compiler-gcc.h:16:55: error: missing binary operator before token "("
16 | # define fallthrough __attribute__((__fallthrough__))
| ^
/usr/include/glib-2.0/glib/gmacros.h:637:28: note: in expansion of macro ‘fallthrough’
637 | #if g_macro__has_attribute(fallthrough)
Reviewed-by: Miguel Ojeda <ojeda@kernel.org>
Signed-off-by: Liam Howlett <Liam.Howlett@oracle.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Miguel Ojeda <ojeda@kernel.org>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Cc: Tom Rix <trix@redhat.com>
Cc: linux-sparse@vger.kernel.org <linux-sparse@vger.kernel.org>
Cc: llvm@lists.linux.dev <llvm@lists.linux.dev>
Link: https://lore.kernel.org/r/20221125154947.2163498-1-Liam.Howlett@oracle.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ensure fd is closed on error paths.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230406065224.2553640-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
By default bison uses global state for compatibility with yacc. Make
the parser reentrant so that it may be used in asynchronous and
multithreaded situations.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Gaosheng Cui <cuigaosheng1@huawei.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jing Zhang <renyu.zj@linux.alibaba.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ravi Bangoria <ravi.bangoria@amd.com>
Cc: Rob Herring <robh@kernel.org>
Cc: Sean Christopherson <seanjc@google.com>
Cc: Suzuki Poulouse <suzuki.poulose@arm.com>
Link: https://lore.kernel.org/r/20230406065224.2553640-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Later changes will add reference count checking for struct map, start
and end are frequently accessed variables. Add an accessor so that the
reference count check is only necessary in one place.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Later changes will add reference count checking for struct map, with
dso being the most frequently accessed variable. Add an accessor so
that the reference count check is only necessary in one place.
Additional changes:
- add a dso variable to avoid repeated map__dso calls.
- in builtin-mem.c dump_raw_samples, code only partially tested for
dso == NULL. Make the possibility of NULL consistent.
- in thread.c thread__memcpy fix use of spaces and use tabs.
Committer notes:
Did missing conversions on these files:
tools/perf/arch/powerpc/util/skip-callchain-idx.c
tools/perf/arch/powerpc/util/sym-handling.c
tools/perf/ui/browsers/hists.c
tools/perf/ui/gtk/annotate.c
tools/perf/util/cs-etm.c
tools/perf/util/thread.c
tools/perf/util/unwind-libunwind-local.c
tools/perf/util/unwind-libunwind.c
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Introduce functions to access struct maps. These functions reduce the
number of places reference counting is necessary. While tidying APIs do
some small const-ification, in particlar to unwind_libunwind_ops.
Committer notes:
Fixed up tools/perf/util/unwind-libunwind.c:
- return ops->get_entries(cb, arg, thread, data, max_stack);
+ return ops->get_entries(cb, arg, thread, data, max_stack, best_effort);
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
struct map is reference counted, having it also be a node in an
red-black tree complicates the reference counting. Switch to having a
map_rb_node which is a red-block tree node but points at the reference
counted struct map. This reference is responsible for a single reference
count.
Committer notes:
Fixed up tools/perf/util/unwind-libunwind-local.c to use map_rb_node as
well.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Using a perf map as a list node is only done in symbol. Move the
list_node struct into symbol as a single pointer to the map. This makes
reference count behavior more obvious and easy to check.
Committer notes:
Some changes to reduce the number of lines touched by keeping, for
instance, the 'new_map' variable and setting it to new_node->map, so
that we keep more of the project history in place and keep as much
as possible the value of the 'git blame' tool.
Also use map__zput() when putting a struct members, so that when we free
the container struct we can get use-after-free errors as NULL pointer
derefs sometimes.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Alexey Bayduraev <alexey.v.bayduraev@linux.intel.com>
Cc: Andi Kleen <ak@linux.intel.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Darren Hart <dvhart@infradead.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Dmitriy Vyukov <dvyukov@google.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: German Gomez <german.gomez@arm.com>
Cc: Hao Luo <haoluo@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: James Clark <james.clark@arm.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: John Garry <john.g.garry@oracle.com>
Cc: Kajol Jain <kjain@linux.ibm.com>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Madhavan Srinivasan <maddy@linux.ibm.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Miaoqian Lin <linmq006@gmail.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Riccardo Mancini <rickyman7@gmail.com>
Cc: Shunsuke Nakamura <nakamura.shun@fujitsu.com>
Cc: Song Liu <song@kernel.org>
Cc: Stephane Eranian <eranian@google.com>
Cc: Stephen Brennan <stephen.s.brennan@oracle.com>
Cc: Steven Rostedt (VMware) <rostedt@goodmis.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Thomas Richter <tmricht@linux.ibm.com>
Cc: Yury Norov <yury.norov@gmail.com>
Link: https://lore.kernel.org/r/20230320212248.1175731-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
As reported by leak sanitizer.
Signed-off-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Brian Robbins <brianrob@linux.microsoft.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Yuan Can <yuancan@huawei.com>
Link: https://lore.kernel.org/r/20230403203545.1872196-1-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Ignore SIGPIPEs when addr2line is configured.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230403184033.1836023-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The sentinel value differs for llvm-addr2line. Configure this once and
then detect when reading records.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230403184033.1836023-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Don't wrap stdin and stdout of subprocess with streams, use the api/io
library for buffering.
Signed-off-by: Ian Rogers <irogers@google.com>
Acked-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Nathan Chancellor <nathan@kernel.org>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Tom Rix <trix@redhat.com>
Cc: llvm@lists.linux.dev
Link: https://lore.kernel.org/r/20230403184033.1836023-2-irogers@google.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
These two helpers will also use openat() to reduce the overhead with
relative pathnames. Convert other functions in pmu_lookup() to use
the new helpers.
Committer testing:
Before:
⬢[acme@toolbox perf-tools-next]$ perf bench internals pmu-scan
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
Average PMU scanning took: 2729.040 usec (+- 7.117 usec)
⬢[acme@toolbox perf-tools-next]$
After:
⬢[acme@toolbox perf-tools-next]$ perf bench internals pmu-scan
# Running 'internals/pmu-scan' benchmark:
Computing performance of sysfs PMU event scan for 100 times
Average PMU scanning took: 2419.870 usec (+- 9.057 usec)
⬢[acme@toolbox perf-tools-next]$
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Tested-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Likewise, it needs to traverse the pmu/caps directory, let's use
openat() with the dirfd instead of open() using the absolute path.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: LKML <linux-kernel@vger.kernel.org>
Cc: linux-perf-users@vger.kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
The PMU information is in the kernel sysfs so it needs to scan the
directory to get the whole information like event aliases, formats and
so on. During the traversal, it opens a lot of files and directories
like below:
dir = opendir("/sys/bus/event_source/devices");
while (dentry = readdir(dir)) {
char buf[PATH_MAX];
snprintf(buf, sizeof(buf), "%s/%s",
"/sys/bus/event_source/devices", dentry->d_name);
fd = open(buf, O_RDONLY);
...
}
But this is not good since it needs to copy the string to build the
absolute pathname, and it makes redundant pathname walk (from the /sys)
unnecessarily. We can use openat(2) to open the file in the given
directory. While it's not a problem ususally, it can be a problem when
the kernel has contentions on the sysfs.
Add a couple of new helper to return the file descriptor of PMU
directory so that it can use it with relative paths.
* perf_pmu__event_source_devices_fd()
- returns a fd for the PMU root ("/sys/bus/event_source/devices")
* perf_pmu__pathname_fd()
- returns a fd for "<pmu>/<file>" under the PMU root
Now the above code can be converted something like below:
dirfd = perf_pmu__event_source_devices_fd();
dir = fdopendir(dirfd);
while (dentry = readdir(dir)) {
fd = openat(dirfd, dentry->d_name, O_RDONLY);
...
}
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
It seems there's no function to delete the perf pmu struct. Add the
perf_pmu__destroy() to do the job. While at it, add some more helper
functions to delete pmu aliases and caps.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
In the parse_events_multi_pmu_add() it passes the 'config' variable
twice to parse_events_term__num() - one for config and another for
loc_term. I'm not sure about the second one as it's converted to
YYLTYPE variable. Asan reports it like below:
In function ‘parse_events_term__num’,
inlined from ‘parse_events_multi_pmu_add’ at util/parse-events.c:1602:6:
util/parse-events.c:2653:64: error: array subscript ‘YYLTYPE[0]’ is partly outside
array bounds of ‘char[8]’ [-Werror=array-bounds]
2653 | .err_term = loc_term ? loc_term->first_column : 0,
| ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^~~
util/parse-events.c: In function ‘parse_events_multi_pmu_add’:
util/parse-events.c:1587:15: note: object ‘config’ of size 8
1587 | char *config;
| ^~~~~~
cc1: all warnings being treated as errors
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Committer notes:
Added missing #include <unistd.h> for the close() prototype to fix this
on Alma Linux 8:
1 21.54 almalinux:8 : FAIL gcc version 8.5.0 20210514 (Red Hat 8.5.0-16) (GCC)
util/print-events.c: In function 'print_tracepoint_events':
util/print-events.c:103:4: error: implicit declaration of function 'close'; did you mean 'clone'? [-Werror=implicit-function-declaration]
close(evt_fd);
^~~~~
clone
Also use the newly added scandirat feature test to check if that
function is available, providing a HAVE_SCANDIRAT_SUPPORT conditional
warning to the user if it isn't available.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Acked-by: Ian Rogers <irogers@google.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Kan Liang <kan.liang@linux.intel.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230331202949.810326-2-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
After a standalone CBR (not associated with TSC), update the cycles
reference timestamp and reset the cycle count, so that CYC timestamps
are calculated relative to that point with the new frequency.
Fixes: cc33618619 ("perf tools: Add Intel PT support for decoding CYC packets")
Signed-off-by: Adrian Hunter <adrian.hunter@intel.com>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20230403154831.8651-2-adrian.hunter@intel.com
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
Arm SPEv1.3 adds new load/store operation subclasses for Memory Tagging
Extension (MTE) and memory operations (MOPS). The memory operations
are memcpy and memset. Add support for decoding these new subclasses in
the raw decoding.
Reviewed-by: Leo Yan <leo.yan@linaro.org
Signed-off-by: Rob Herring <robh@kernel.org>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230327162057.4057188-1-robh@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
When using dynamically assigned CoreSight trace IDs the drivers can output
the ID / CPU association as a PERF_RECORD_AUX_OUTPUT_HW_ID packet.
Update cs-etm decoder to handle this packet by setting the CPU/Trace ID
mapping.
Reviewed-by: James Clark <james.clark@arm.com>
Signed-off-by: Mike Leach <mike.leach@linaro.org>
Acked-by: Suzuki Poulouse <suzuki.poulose@arm.com>
Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
Cc: Darren Hart <darren@os.amperecomputing.com>
Cc: Ganapatrao Kulkarni <gankulkarni@os.amperecomputing.com>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Namhyung Kim <namhyung@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Will Deacon <will@kernel.org>
Cc: coresight@lists.linaro.org
Cc: linux-arm-kernel@lists.infradead.org
Link: https://lore.kernel.org/r/20230331055645.26918-2-mike.leach@linaro.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>