mirror of
https://mirrors.bfsu.edu.cn/git/linux.git
synced 2025-01-06 05:44:20 +08:00
3fd7a168bf
There's no field for the cgroup, let's add one. To do that, users need to
specify the --all-cgroups option for perf record to capture the cgroup info.

  $ perf record --all-cgroups -- true

  $ perf script -F comm,pid,cgroup
  true 337112 /user.slice/user-657345.slice/user@657345.service/...
  true 337112 /user.slice/user-657345.slice/user@657345.service/...
  true 337112 /user.slice/user-657345.slice/user@657345.service/...
  true 337112 /user.slice/user-657345.slice/user@657345.service/...

If it's recorded without the --all-cgroups, it'd complain.

  $ perf script -F comm,pid,cgroup
  Samples for 'cycles:u' event do not have CGROUP attribute set. Cannot print 'cgroup' field.
  Hint: run 'perf record --all-cgroups ...'

Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Cc: Adrian Hunter <adrian.hunter@intel.com>
Cc: Ian Rogers <irogers@google.com>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Jiri Olsa <jolsa@kernel.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Stephane Eranian <eranian@google.com>
Link: https://lore.kernel.org/r/20230126213610.3381147-1-namhyung@kernel.org
Signed-off-by: Arnaldo Carvalho de Melo <acme@redhat.com>
522 lines
18 KiB
Plaintext
perf-script(1)
=============

NAME
----
perf-script - Read perf.data (created by perf record) and display trace output

SYNOPSIS
--------
[verse]
'perf script' [<options>]
'perf script' [<options>] record <script> [<record-options>] <command>
'perf script' [<options>] report <script> [script-args]
'perf script' [<options>] <script> <required-script-args> [<record-options>] <command>
'perf script' [<options>] <top-script> [script-args]

DESCRIPTION
-----------
This command reads the input file and displays the trace recorded.

There are several variants of perf script:

  'perf script' to see a detailed trace of the workload that was
  recorded.

  You can also run a set of pre-canned scripts that aggregate and
  summarize the raw trace data in various ways (the list of scripts is
  available via 'perf script -l'). The following variants allow you to
  record and run those scripts:

  'perf script record <script> <command>' to record the events required
  for 'perf script report'. <script> is the name displayed in the
  output of 'perf script --list' i.e. the actual script name minus any
  language extension. If <command> is not specified, the events are
  recorded using the -a (system-wide) 'perf record' option.

  'perf script report <script> [args]' to run and display the results
  of <script>. <script> is the name displayed in the output of 'perf
  script --list' i.e. the actual script name minus any language
  extension. The perf.data output from a previous run of 'perf script
  record <script>' is used and should be present for this command to
  succeed. [args] refers to the (mainly optional) args expected by
  the script.

  'perf script <script> <required-script-args> <command>' to both
  record the events required for <script> and to run the <script>
  using 'live-mode' i.e. without writing anything to disk. <script>
  is the name displayed in the output of 'perf script --list' i.e. the
  actual script name minus any language extension. If <command> is
  not specified, the events are recorded using the -a (system-wide)
  'perf record' option. If <script> has any required args, they
  should be specified before <command>. This mode doesn't allow for
  optional script args to be specified; if optional script args are
  desired, they can be specified using separate 'perf script record'
  and 'perf script report' commands, with the stdout of the record step
  piped to the stdin of the report script, using the '-o -' and '-i -'
  options of the corresponding commands.

  'perf script <top-script>' to both record the events required for
  <top-script> and to run the <top-script> using 'live-mode'
  i.e. without writing anything to disk. <top-script> is the name
  displayed in the output of 'perf script --list' i.e. the actual
  script name minus any language extension; a <top-script> is defined
  as any script name ending with the string 'top'.

  [<record-options>] can be passed to the record steps of 'perf script
  record' and 'live-mode' variants; this isn't possible however for
  <top-script> 'live-mode' or 'perf script report' variants.

  See the 'SEE ALSO' section for links to language-specific
  information on how to write and run your own trace scripts.

OPTIONS
-------
<command>...::
        Any command you can specify in a shell.

-D::
--dump-raw-trace=::
        Display verbose dump of the trace data.

--dump-unsorted-raw-trace=::
        Same as --dump-raw-trace but not sorted in time order.

-L::
--Latency=::
        Show latency attributes (irqs/preemption disabled, etc).

-l::
--list=::
        Display a list of available trace scripts.

-s ['lang']::
--script=::
        Process trace data with the given script ([lang]:script[.ext]).
        If the string 'lang' is specified in place of a script name, a
        list of supported languages will be displayed instead.

-g::
--gen-script=::
        Generate perf-script.[ext] starter script for given language,
        using current perf.data.

--dlfilter=<file>::
        Filter sample events using the given shared object file.
        Refer to linkperf:perf-dlfilter[1].

--dlarg=<arg>::
        Pass 'arg' as an argument to the dlfilter. --dlarg may be repeated
        to add more arguments.

--list-dlfilters::
        Display a list of available dlfilters. Use with option -v (must come
        before option --list-dlfilters) to show long descriptions.

-a::
        Force system-wide collection. Scripts run without a <command>
        normally use -a by default, while scripts run with a <command>
        normally don't - this option allows the latter to be run in
        system-wide mode.

-i::
--input=::
        Input file name. (default: perf.data unless stdin is a fifo)

-d::
--debug-mode::
        Do various checks like samples ordering and lost events.

-F::
--fields::
        Comma separated list of fields to print. Options are:
        comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff,
        srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output,
        brstackinsn, brstackinsnlen, brstackoff, callindent, insn, insnlen, synth,
        phys_addr, metric, misc, srccode, ipc, data_page_size, code_page_size, ins_lat,
        machine_pid, vcpu, cgroup.
        Field list can be prepended with the type, trace, sw or hw,
        to indicate to which event type the field list applies.
        e.g., -F sw:comm,tid,time,ip,sym and -F trace:time,cpu,trace

                perf script -F <fields>

        is equivalent to:

                perf script -F trace:<fields> -F sw:<fields> -F hw:<fields>

        i.e., the specified fields apply to all event types if the type string
        is not given.

        In addition to overriding fields, it is also possible to add or remove
        fields from the defaults. For example

                -F -cpu,+insn

        removes the cpu field and adds the insn field. Adding/removing fields
        cannot be mixed with normal overriding.

        The arguments are processed in the order received. A later usage can
        reset a prior request. e.g.:

                -F trace: -F comm,tid,time,ip,sym

        The first -F suppresses trace events (field list is ""), but then the
        second invocation sets the fields to comm,tid,time,ip,sym. In this case a
        warning is given to the user:

                "Overriding previous field request for all events."

        Alternatively, consider the order:

                -F comm,tid,time,ip,sym -F trace:

        The first -F sets the fields for all events and the second -F
        suppresses trace events. The user is given a warning message about
        the override, and the result of the above is that only S/W and H/W
        events are displayed with the given fields.

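The ordering rule above can be modelled in a few lines of Python. This is only an illustrative sketch, not perf's implementation: the function name `apply_field_args` is hypothetical, only the three type prefixes named in this section (trace, sw, hw) are handled, and the `+field`/`-field` modifiers and the warning message are deliberately left out.

```python
EVENT_TYPES = ("trace", "sw", "hw")  # type prefixes named in the text above


def apply_field_args(args):
    """Apply a sequence of -F arguments in the order received.

    Returns a dict mapping each event type to its field list.
    None means "defaults untouched", [] means "suppressed".
    """
    fields = {etype: None for etype in EVENT_TYPES}
    for arg in args:
        prefix, sep, rest = arg.partition(":")
        if sep and prefix in EVENT_TYPES:
            targets, spec = (prefix,), rest      # typed request, e.g. "trace:"
        else:
            targets, spec = EVENT_TYPES, arg     # untyped request applies to all
        for etype in targets:
            # A later request simply replaces whatever was there before.
            fields[etype] = [name for name in spec.split(",") if name]
    return fields


first = apply_field_args(["trace:", "comm,tid,time,ip,sym"])
second = apply_field_args(["comm,tid,time,ip,sym", "trace:"])
print(first)
print(second)
```

In the first call the untyped request arrives last and overrides the earlier suppression of trace events; in the second call the suppression arrives last, so only the sw and hw entries keep the field list.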
        It's possible to add/remove fields only for a specific event type:

                -Fsw:-cpu,-period

        removes cpu and period from software events.

        For the 'wildcard' option if a user selected field is invalid for an
        event type, a message is displayed to the user that the option is
        ignored for that type. For example:

                $ perf script -F comm,tid,trace
                'trace' not valid for hardware events. Ignoring.
                'trace' not valid for software events. Ignoring.

        Alternatively, if the type is given and an invalid field is specified,
        it is an error. For example:

                perf script -v -F sw:comm,tid,trace
                'trace' not valid for software events.

        At this point usage is displayed, and perf-script exits.

        The flags field is synthesized and may have a value when Instruction
        Trace decoding. The flags are "bcrosyiABExghDt" which stand for branch,
        call, return, conditional, system, asynchronous, interrupt,
        transaction abort, trace begin, trace end, in transaction, VM-Entry,
        VM-Exit, interrupt disabled and interrupt disable toggle respectively.
        Known combinations of flags are printed more nicely e.g.
        "call" for "bc", "return" for "br", "jcc" for "bo", "jmp" for "b",
        "int" for "bci", "iret" for "bri", "syscall" for "bcs", "sysret" for "brs",
        "async" for "by", "hw int" for "bcyi", "tx abrt" for "bA", "tr strt" for "bB",
        "tr end" for "bE", "vmentry" for "bcg", "vmexit" for "bch".
        However the "x", "D" and "t" flags will be displayed separately in those
        cases e.g. "jcc (xD)" for a conditional branch within a transaction
        with interrupts disabled. Note, interrupts becoming disabled is "t",
        whereas interrupts becoming enabled is "Dt".

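For post-processing scripts it can help to have the flag letters and the known combinations listed above in table form. The following Python sketch only restates those two lists; the function names are hypothetical, and returning the raw letters for an unknown combination is an assumption of this sketch, not a description of what perf prints.

```python
# Flag letters in the order given above ("bcrosyiABExghDt").
FLAG_NAMES = {
    "b": "branch", "c": "call", "r": "return", "o": "conditional",
    "s": "system", "y": "asynchronous", "i": "interrupt",
    "A": "transaction abort", "B": "trace begin", "E": "trace end",
    "x": "in transaction", "g": "VM-Entry", "h": "VM-Exit",
    "D": "interrupt disabled", "t": "interrupt disable toggle",
}

# Known combinations and their short names, as listed above.
KNOWN_COMBINATIONS = {
    "bc": "call", "br": "return", "bo": "jcc", "b": "jmp",
    "bci": "int", "bri": "iret", "bcs": "syscall", "brs": "sysret",
    "by": "async", "bcyi": "hw int", "bA": "tx abrt", "bB": "tr strt",
    "bE": "tr end", "bcg": "vmentry", "bch": "vmexit",
}

SEPARATE_FLAGS = "xDt"  # shown separately from the short name


def expand_flags(flags):
    """Return the long name of every flag letter in `flags`."""
    return [FLAG_NAMES[letter] for letter in flags]


def describe_flags(flags):
    """Return the short name plus any of x, D, t in parentheses."""
    base = "".join(ch for ch in flags if ch not in SEPARATE_FLAGS)
    extra = "".join(ch for ch in flags if ch in SEPARATE_FLAGS)
    name = KNOWN_COMBINATIONS.get(base)
    if name is None:
        return flags  # assumption: fall back to the raw letters
    return f"{name} ({extra})" if extra else name


print(describe_flags("boxD"))   # conditional branch, in transaction, irqs off
print(expand_flags("bcs"))
```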
        The callindent field is synthesized and may have a value when
        Instruction Trace decoding. For calls and returns, it will display the
        name of the symbol indented with spaces to reflect the stack depth.

        When doing instruction trace decoding insn and insnlen give the
        instruction bytes and the instruction length of the current
        instruction.

        The synth field is used by synthesized events which may be created when
        Instruction Trace decoding.

        The ipc (instructions per cycle) field is synthesized and may have a value when
        Instruction Trace decoding.

        The machine_pid and vcpu fields are derived from data resulting from using
        perf inject to insert a perf.data file recorded inside a virtual machine into
        a perf.data file recorded on the host at the same time.

        The cgroup field requires samples that carry the cgroup id, which is saved
        when the "--all-cgroups" option is passed to 'perf record'.

        Finally, a user may not set fields to none for all event types.
        i.e., -F "" is not allowed.

        The brstack output includes branch related information with raw addresses using the
        /v/v/v/v/cycles syntax in the following order:
        FROM: branch source instruction
        TO  : branch target instruction
        M/P/-: M=branch target mispredicted or branch direction was mispredicted, P=target predicted or direction predicted, -=not supported
        X/- : X=branch inside a transactional region, -=not in transaction region or not supported
        A/- : A=TSX abort entry, -=not aborted region or not supported
        cycles

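A single brstack entry in the layout described above can be split into its parts as follows. This is a minimal sketch under stated assumptions: the sample entry is made up for illustration, the names `BranchEntry` and `parse_brstack_entry` are hypothetical, and an entry is assumed to consist of exactly the six slash-separated parts listed above.

```python
from typing import NamedTuple, Optional


class BranchEntry(NamedTuple):
    source: int                    # FROM: branch source instruction
    target: int                    # TO: branch target instruction
    mispredicted: Optional[bool]   # True for M, False for P, None for "-"
    in_transaction: bool           # True for X
    tsx_abort: bool                # True for A
    cycles: int


def parse_brstack_entry(text):
    """Parse one FROM/TO/M|P|-/X|-/A|-/cycles entry."""
    source, target, predicted, in_tx, abort, cycles = text.split("/")
    return BranchEntry(
        source=int(source, 16),
        target=int(target, 16),
        mispredicted={"M": True, "P": False}.get(predicted),
        in_transaction=(in_tx == "X"),
        tsx_abort=(abort == "A"),
        cycles=int(cycles),
    )


# Hypothetical entry, written by hand to match the layout above.
entry = parse_brstack_entry("0x4007a0/0x4007b4/P/-/-/3")
print(entry)
```

Note that "-" is kept as None for the prediction field because the text above distinguishes "not supported" from both M and P.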
        The brstacksym is identical to brstack, except that the FROM and TO addresses are printed in a symbolic form if possible.

        When brstackinsn is specified the full assembler sequences of branch sequences for each sample
        are printed. This is the full execution path leading to the sample. This is only supported when the
        sample was recorded with perf record -b or -j any.

        Use brstackinsnlen to print the brstackinsn length. For example, you
        can't know the next sequential instruction after an unconditional branch unless
        you calculate that based on its length.

        The brstackoff field will print an offset into a specific dso/binary.

        With the metric option perf script can compute metrics for
        sampling periods, similar to perf stat. This requires
        specifying a group with multiple events defining metrics with the :S option
        for perf record. perf will sample on the first event, and
        print computed metrics for all the events in the group. Please note
        that the metric computed is averaged over the whole sampling
        period (since the last sample), not just for the sample point.

        For sample events it's possible to display misc field with -F +misc option,
        following letters are displayed for each bit:

          PERF_RECORD_MISC_KERNEL               K
          PERF_RECORD_MISC_USER                 U
          PERF_RECORD_MISC_HYPERVISOR           H
          PERF_RECORD_MISC_GUEST_KERNEL         G
          PERF_RECORD_MISC_GUEST_USER           g
          PERF_RECORD_MISC_MMAP_DATA*           M
          PERF_RECORD_MISC_COMM_EXEC            E
          PERF_RECORD_MISC_SWITCH_OUT           S
          PERF_RECORD_MISC_SWITCH_OUT_PREEMPT   Sp

          $ perf script -F +misc ...
           sched-messaging  1414 K     28690.636582:       4590 cycles ...
           sched-messaging  1407 U     28690.636600:     325620 cycles ...
           sched-messaging  1414 K     28690.636608:      19473 cycles ...
          misc field ___________/

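The letter table above can be turned into a lookup for scripts that read this output. The sketch below is illustrative only: `describe_misc` is a hypothetical helper, and it assumes the column order of the example (comm, then pid/tid, then the misc letters) and a comm without embedded spaces.

```python
# Letter-to-bit table copied from the list above (names as printed there).
MISC_LETTERS = {
    "K": "PERF_RECORD_MISC_KERNEL",
    "U": "PERF_RECORD_MISC_USER",
    "H": "PERF_RECORD_MISC_HYPERVISOR",
    "G": "PERF_RECORD_MISC_GUEST_KERNEL",
    "g": "PERF_RECORD_MISC_GUEST_USER",
    "M": "PERF_RECORD_MISC_MMAP_DATA*",
    "E": "PERF_RECORD_MISC_COMM_EXEC",
    "S": "PERF_RECORD_MISC_SWITCH_OUT",
    "Sp": "PERF_RECORD_MISC_SWITCH_OUT_PREEMPT",
}


def describe_misc(line):
    """Return the bit name for the misc column of one output line.

    Assumes whitespace-separated columns in the order shown in the
    example above: comm, pid/tid, misc, time, ...
    """
    columns = line.split()
    return MISC_LETTERS.get(columns[2], "unknown")


sample = "sched-messaging  1414 K     28690.636582:       4590 cycles"
print(describe_misc(sample))
```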
-k::
--vmlinux=<file>::
        vmlinux pathname

--kallsyms=<file>::
        kallsyms pathname

--symfs=<directory>::
        Look for files with symbols relative to this directory.

-G::
--hide-call-graph::
        When printing symbols do not display call chain.

--stop-bt::
        Stop display of callgraph at these symbols

-C::
--cpu:: Only report samples for the list of CPUs provided. Multiple CPUs can
        be provided as a comma-separated list with no space: 0,1. Ranges of
        CPUs are specified with -: 0-2. Default is to report samples on all
        CPUs.

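The CPU list syntax described for -C (comma-separated entries, with `-` for ranges) expands as in the sketch below. The helper name `expand_cpu_list` is hypothetical and the code does no validation; it only illustrates which CPUs a given list selects.

```python
def expand_cpu_list(spec):
    """Expand a CPU list such as "0,1" or "0-2" into sorted CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            cpus.update(range(int(low), int(high) + 1))  # ranges are inclusive
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(expand_cpu_list("0,1"))
print(expand_cpu_list("0-2"))
print(expand_cpu_list("0-2,5"))
```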
-c::
--comms=::
        Only display events for these comms. CSV that understands
        file://filename entries.

--pid=::
        Only show events for given process ID (comma separated list).

--tid=::
        Only show events for given thread ID (comma separated list).

-I::
--show-info::
        Display extended information about the perf.data file. This adds
        information which may be very large and thus may clutter the display.
        It currently includes: cpu and numa topology of the host system.
        It can only be used with the perf script report mode.

--show-kernel-path::
        Try to resolve the path of [kernel.kallsyms]

--show-task-events::
        Display task related events (e.g. FORK, COMM, EXIT).

--show-mmap-events::
        Display mmap related events (e.g. MMAP, MMAP2).

--show-namespace-events::
        Display namespace events i.e. events of type PERF_RECORD_NAMESPACES.

--show-switch-events::
        Display context switch events i.e. events of type PERF_RECORD_SWITCH or
        PERF_RECORD_SWITCH_CPU_WIDE.

--show-lost-events::
        Display lost events i.e. events of type PERF_RECORD_LOST.

--show-round-events::
        Display finished round events i.e. events of type PERF_RECORD_FINISHED_ROUND.

--show-bpf-events::
        Display bpf events i.e. events of type PERF_RECORD_KSYMBOL and PERF_RECORD_BPF_EVENT.

--show-cgroup-events::
        Display cgroup events i.e. events of type PERF_RECORD_CGROUP.

--show-text-poke-events::
        Display text poke events i.e. events of type PERF_RECORD_TEXT_POKE and
        PERF_RECORD_KSYMBOL.

--demangle::
        Demangle symbol names to human readable form. It's enabled by default,
        disable with --no-demangle.

--demangle-kernel::
        Demangle kernel symbol names to human readable form (for C++ kernels).

--header::
        Show perf.data header.

--header-only::
        Show only perf.data header.

--itrace::
        Options for decoding instruction tracing data. The options are:

include::itrace.txt[]

        To disable decoding entirely, use --no-itrace.

--full-source-path::
        Show the full path for source files for srcline output.

--max-stack::
        Set the stack depth limit when parsing the callchain, anything
        beyond the specified depth will be ignored. This is a trade-off
        between information loss and faster processing especially for
        workloads that can have a very long callchain stack.
        Note that when using the --itrace option the synthesized callchain size
        will override this value if the synthesized callchain size is bigger.

        Default: 127

--ns::
        Use 9 decimal places when displaying time (i.e. show the nanoseconds)

-f::
--force::
        Don't do ownership validation.

--time::
        Only analyze samples within given time window: <start>,<stop>. Times
        have the format seconds.nanoseconds. If start is not given (i.e. time
        string is ',x.y') then analysis starts at the beginning of the file. If
        stop time is not given (i.e. time string is 'x.y,') then analysis goes
        to end of file. Multiple ranges can be separated by spaces, which
        requires the argument to be quoted e.g. --time "1234.567,1234.789 1235,"

        Time percentages with multiple time ranges are also supported. The time
        string is 'a%/n,b%/m,...' or 'a%-b%,c%-d%,...'.

        For example:
        Select the second 10% time slice:

                perf script --time 10%/2

        Select from 0% to 10% time slice:

                perf script --time 0%-10%

        Select the first and second 10% time slices:

                perf script --time 10%/1,10%/2

        Select from 0% to 10% and 30% to 40% slices:

                perf script --time 0%-10%,30%-40%

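To see which absolute time windows the percent forms select, the arithmetic can be sketched in Python. This is an illustration of the two string forms described above, not perf's parser: `percent_ranges` is a hypothetical helper, the trace start and end times are supplied by the caller, and how perf treats the exact window boundaries is not modelled.

```python
def percent_ranges(spec, first, last):
    """Convert 'a%/n,...' or 'a%-b%,...' into (start, stop) time pairs.

    `first` and `last` are the timestamps of the start and end of the
    trace, in seconds.
    """
    span = last - first
    ranges = []
    for item in spec.split(","):
        if "/" in item:
            # 'a%/n': the n-th slice of width a%, counting from 1.
            percent, index = item.split("/")
            width = float(percent.rstrip("%"))
            low, high = width * (int(index) - 1), width * int(index)
        else:
            # 'a%-b%': from a% to b% of the trace.
            start, stop = item.split("-")
            low, high = float(start.rstrip("%")), float(stop.rstrip("%"))
        ranges.append((first + span * low / 100.0, first + span * high / 100.0))
    return ranges


# A made-up trace that runs from t=100s to t=200s.
print(percent_ranges("10%/2", 100.0, 200.0))
print(percent_ranges("0%-10%,30%-40%", 100.0, 200.0))
```

With this made-up 100 second trace, `10%/2` selects the window from 110 to 120 seconds, which is the "second 10% time slice" of the example above.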
--max-blocks::
        Set the maximum number of program blocks to print with brstackinsn for
        each sample.

--reltime::
        Print time stamps relative to trace start.

--deltatime::
        Print time stamps relative to previous event.

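The difference between the two timestamp modes is easiest to see on a short list of timestamps. The Python sketch below is an illustration only: the helper names are hypothetical, the timestamps are integers in microseconds to keep the arithmetic exact, and showing 0 for the first event in delta mode is an assumption of this sketch.

```python
def relative_times(timestamps):
    """Timestamps relative to the first event (the --reltime idea)."""
    start = timestamps[0]
    return [t - start for t in timestamps]


def delta_times(timestamps):
    """Timestamps relative to the previous event (the --deltatime idea).

    Assumption: the first event has no predecessor and is shown as 0.
    """
    deltas = [0]
    for previous, current in zip(timestamps, timestamps[1:]):
        deltas.append(current - previous)
    return deltas


# The three sample times from the misc example, in microseconds.
stamps = [28690636582, 28690636600, 28690636608]
print(relative_times(stamps))
print(delta_times(stamps))
```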
--per-event-dump::
        Create per event files with a "perf.data.EVENT.dump" name instead of
        printing to stdout, useful, for instance, for generating flamegraphs.

--inline::
        If a callgraph address belongs to an inlined function, the inline stack
        will be printed. Each entry has function name and file/line. Enabled by
        default, disable with --no-inline.

--insn-trace::
        Show instruction stream for intel_pt traces. Combine with --xed to
        show disassembly.

--xed::
        Run xed disassembler on output. Requires installing the xed disassembler.

-S::
--symbols=symbol[,symbol...]::
        Only consider the listed symbols. Symbols are typically a name
        but they may also be a hexadecimal address.

        The hexadecimal address may be the start address of a symbol or
        any other address to filter the trace records.

        For example, to select the symbol noploop or the address 0x4007a0:

                perf script --symbols=noploop,0x4007a0

        Support filtering trace records by symbol name, start address of
        symbol, any hexadecimal address and address range.

        The comparison order is:

        1. symbol name comparison
        2. symbol start address comparison.
        3. any hexadecimal address comparison.
        4. address range comparison (see --addr-range).

--addr-range::
        Use with -S or --symbols to list traced records within address range.

        For example, to list the traced records within the address range
        [0x4007a0, 0x4007a9]:

                perf script -S 0x4007a0 --addr-range 10

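The relation between the start address, the --addr-range value and the resulting inclusive range in the example above is a small piece of arithmetic, sketched here in Python. The helper names are hypothetical; the only thing taken from the text is that a start of 0x4007a0 with a range of 10 covers [0x4007a0, 0x4007a9].

```python
def address_bounds(start, size):
    """Return the inclusive (first, last) addresses of the range."""
    return start, start + size - 1


def in_address_range(address, start, size):
    """True if `address` lies within `size` bytes starting at `start`."""
    first, last = address_bounds(start, size)
    return first <= address <= last


first, last = address_bounds(0x4007a0, 10)
print(hex(first), hex(last))
print(in_address_range(0x4007a9, 0x4007a0, 10))
print(in_address_range(0x4007aa, 0x4007a0, 10))
```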
--dsos=::
        Only consider symbols in these DSOs.

--call-trace::
        Show call stream for intel_pt traces. The CPUs are interleaved, but
        can be filtered with -C.

--call-ret-trace::
        Show call and return stream for intel_pt traces.

--graph-function::
        For itrace, only show the specified functions and their callees.
        Multiple functions can be separated by comma.

--switch-on EVENT_NAME::
        Only consider events after this event is found.

--switch-off EVENT_NAME::
        Stop considering events after this event is found.

--show-on-off-events::
        Show the --switch-on/off events too.

--stitch-lbr::
        Show callgraph with stitched LBRs, which may have more complete
        callgraph. The perf.data file must have been obtained using
        perf record --call-graph lbr.
        Disabled by default. In common cases with call stack overflows,
        it can recreate better call stacks than the default lbr call stack
        output. But this approach is not foolproof. There can be cases
        where it creates incorrect call stacks from incorrect matches.
        The known limitations include exception handling such as
        setjmp/longjmp, where calls and returns do not match.

:GMEXAMPLECMD: script
:GMEXAMPLESUBCMD:
include::guest-files.txt[]

SEE ALSO
--------
linkperf:perf-record[1], linkperf:perf-script-perl[1],
linkperf:perf-script-python[1], linkperf:perf-intel-pt[1],
linkperf:perf-dlfilter[1]