mirror of
https://github.com/edk2-porting/linux-next.git
synced 2024-12-16 09:13:55 +08:00
105 lines
3.7 KiB
Plaintext
Performance Counters for Linux
------------------------------

Performance counters are special hardware registers available on most modern
CPUs. These registers count the number of certain types of hardware events,
such as instructions executed, cache misses suffered, or branches
mispredicted, without slowing down the kernel or applications. These
registers can also trigger interrupts when a threshold number of events have
passed, and can thus be used to profile the code that runs on that CPU.

The Linux Performance Counter subsystem provides an abstraction of these
hardware capabilities. It provides per-task and per-CPU counters, and
it provides event capabilities on top of those.

Performance counters are accessed via special file descriptors.
There's one file descriptor per virtual counter used.

The special file descriptor is opened via the perf_counter_open()
system call:

    int
    perf_counter_open(u32 hw_event_type,
                      u32 hw_event_period,
                      u32 record_type,
                      pid_t pid,
                      int cpu);

The syscall returns the new fd. The fd can be used via the normal
VFS system calls: read() can be used to read the counter, fcntl()
can be used to set the blocking mode, etc.

Multiple counters can be kept open at a time, and the counters
can be poll()ed.
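
One way to wait on several counter fds at once is sketched below using
plain POSIX poll(). The function name wait_for_counter() and the
MAX_COUNTERS limit are assumptions chosen for the example; nothing here is
specific to counter fds, so it works on any pollable descriptor.

```c
#include <poll.h>

#define MAX_COUNTERS 16		/* arbitrary limit for this sketch */

/*
 * Hypothetical helper: block for up to 'timeout_ms' milliseconds until
 * one of fds[0..nfds-1] becomes readable.
 * Returns the index of the first readable fd, or -1 on timeout or error.
 */
int wait_for_counter(const int *fds, int nfds, int timeout_ms)
{
	struct pollfd pfds[MAX_COUNTERS];
	int i;

	if (nfds <= 0 || nfds > MAX_COUNTERS)
		return -1;

	for (i = 0; i < nfds; i++) {
		pfds[i].fd = fds[i];
		pfds[i].events = POLLIN;
		pfds[i].revents = 0;
	}

	if (poll(pfds, nfds, timeout_ms) <= 0)
		return -1;

	for (i = 0; i < nfds; i++)
		if (pfds[i].revents & POLLIN)
			return i;

	return -1;
}
```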

When creating a new counter fd, 'hw_event_type' is one of:

    enum hw_event_types {
        PERF_COUNT_CYCLES,
        PERF_COUNT_INSTRUCTIONS,
        PERF_COUNT_CACHE_REFERENCES,
        PERF_COUNT_CACHE_MISSES,
        PERF_COUNT_BRANCH_INSTRUCTIONS,
        PERF_COUNT_BRANCH_MISSES,
    };

These are standardized types of events that work uniformly on all CPUs
that implement Performance Counters support under Linux. If a CPU is
not able to count a requested event (branch-misses, for example), then
the system call will return -EINVAL.

[ Note: more hw_event_types are supported as well, but they are CPU
  specific and are enumerated via /sys on a per CPU basis. Raw hardware
  event types can be passed in as negative numbers. For example, to count
  "External bus cycles while bus lock signal asserted" events on Intel
  Core CPUs, pass in a -0x4064 event type value. ]
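
The negative-number convention for raw events can be illustrated with a
small sketch. The enum is copied from this document; raw_hw_event() and
is_raw_hw_event() are hypothetical helpers invented for the example, and
treating the type as a signed 32-bit value is an assumption.

```c
#include <stdint.h>

/* Generic event types, as listed in this document. */
enum hw_event_types {
	PERF_COUNT_CYCLES,
	PERF_COUNT_INSTRUCTIONS,
	PERF_COUNT_CACHE_REFERENCES,
	PERF_COUNT_CACHE_MISSES,
	PERF_COUNT_BRANCH_INSTRUCTIONS,
	PERF_COUNT_BRANCH_MISSES,
};

/* Hypothetical helper: encode a CPU-specific raw code as a negative type. */
int32_t raw_hw_event(uint32_t raw_code)
{
	return -(int32_t)raw_code;
}

/* Hypothetical helper: negative values are raw, the rest are generic. */
int is_raw_hw_event(int32_t type)
{
	return type < 0;
}
```

With these helpers, raw_hw_event(0x4064) yields the -0x4064 value from the
Intel Core example, while the generic enum values stay non-negative.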

The parameter 'hw_event_period' is the number of events before waking up
a read() that is blocked on a counter fd. A value of zero means a
non-blocking counter.

'record_type' is the type of data that a read() will provide for the
counter, and it can be one of:

    enum perf_record_type {
        PERF_RECORD_SIMPLE,
        PERF_RECORD_IRQ,
    };

A "simple" counter is one that counts hardware events and allows
them to be read out into a u64 count value. (read() returns 8 on
a successful read of a simple counter.)

An "irq" counter is one that also provides IRQ context information:
the IP of the interrupted context. In this case read() will return
the 8-byte counter value, plus the Instruction Pointer address of the
interrupted context.
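
The two record layouts described above could be decoded along the
following lines. This is only a sketch: the struct and function names are
invented, and the Instruction Pointer is assumed to be 8 bytes wide (as on
a 64-bit kernel), which the text above does not actually specify.

```c
#include <stdint.h>
#include <string.h>
#include <sys/types.h>

/* Hypothetical container for one decoded read() result. */
struct perf_sample {
	uint64_t count;		/* counter value, always present */
	uint64_t ip;		/* interrupted IP, "irq" records only */
	int has_ip;		/* non-zero if 'ip' is valid */
};

/*
 * Decode 'nbytes' bytes that read() placed in 'buf'.
 * 8 bytes:  "simple" record, count only.
 * 16 bytes: "irq" record, count followed by IP (assumed 8-byte IP).
 * Returns 0 on success, -1 if the size matches neither layout.
 */
int decode_counter_read(const void *buf, ssize_t nbytes,
			struct perf_sample *out)
{
	const unsigned char *p = buf;

	memset(out, 0, sizeof(*out));

	if (nbytes == (ssize_t)sizeof(uint64_t)) {
		memcpy(&out->count, p, sizeof(uint64_t));
		return 0;
	}

	if (nbytes == (ssize_t)(2 * sizeof(uint64_t))) {
		memcpy(&out->count, p, sizeof(uint64_t));
		memcpy(&out->ip, p + sizeof(uint64_t), sizeof(uint64_t));
		out->has_ip = 1;
		return 0;
	}

	return -1;
}
```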

The 'pid' parameter allows the counter to be specific to a task:

 pid == 0: if the pid parameter is zero, the counter is attached to the
           current task.

 pid > 0:  the counter is attached to a specific task (if the current task
           has sufficient privilege to do so)

 pid < 0:  all tasks are counted (per cpu counters)

The 'cpu' parameter allows a counter to be made specific to a full
CPU:

 cpu >= 0:  the counter is restricted to a specific CPU
 cpu == -1: the counter counts on all CPUs

Note: the combination of 'pid == -1' and 'cpu == -1' is not valid.

A 'pid > 0' and 'cpu == -1' counter is a per task counter that counts
events of that task and 'follows' that task to whatever CPU the task
gets scheduled to. Per task counters can be created by any user, for
their own tasks.

A 'pid == -1' and 'cpu == x' counter is a per CPU counter that counts
all events on CPU-x. Per CPU counters need CAP_SYS_ADMIN privilege.
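
The pid/cpu rules above can be summarized in a small classification
sketch. The enum and the function classify_counter() are hypothetical and
exist only for illustration; treating 'cpu < -1' as invalid is an
assumption, since the text only defines 'cpu >= 0' and 'cpu == -1'.

```c
#include <sys/types.h>

/* Hypothetical names for the combinations described in this document. */
enum counter_scope {
	SCOPE_INVALID,		/* pid < 0 together with cpu == -1 */
	SCOPE_PER_TASK,		/* one task, follows it across CPUs */
	SCOPE_TASK_ON_CPU,	/* one task, counted on one CPU only */
	SCOPE_PER_CPU,		/* all tasks on one CPU (CAP_SYS_ADMIN) */
};

enum counter_scope classify_counter(pid_t pid, int cpu)
{
	if (cpu < -1)
		return SCOPE_INVALID;	/* assumption: not defined above */
	if (pid < 0 && cpu == -1)
		return SCOPE_INVALID;	/* explicitly ruled out above */
	if (pid < 0)
		return SCOPE_PER_CPU;
	if (cpu == -1)
		return SCOPE_PER_TASK;	/* pid == 0 means the current task */
	return SCOPE_TASK_ON_CPU;
}
```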