turbostat release 2024.07.26

Enable turbostat extensions to add both perf and PMT
 (Intel Platform Monitoring Technology) counters via the cmdline.
 
 Demonstrate PMT access with built-in support for Meteor Lake's Die%c6 counter.

Merge tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux

Pull turbostat updates from Len Brown:

 - Enable turbostat extensions to add both perf and PMT (Intel
   Platform Monitoring Technology) counters via the cmdline

 - Demonstrate PMT access with built-in support for Meteor Lake's
   Die C6 counter

* tag 'v6.11-merge' of git://git.kernel.org/pub/scm/linux/kernel/git/lenb/linux:
  tools/power turbostat: version 2024.07.26
  tools/power turbostat: Include umask=%x in perf counter's config
  tools/power turbostat: Document PMT in turbostat.8
  tools/power turbostat: Add MTL's PMT DC6 builtin counter
  tools/power turbostat: Add early support for PMT counters
  tools/power turbostat: Add selftests for added perf counters
  tools/power turbostat: Add selftests for SMI, APERF and MPERF counters
  tools/power turbostat: Move verbose counter messages to level 2
  tools/power turbostat: Move debug prints from stdout to stderr
  tools/power turbostat: Fix typo in turbostat.8
  tools/power turbostat: Add perf added counter example to turbostat.8
  tools/power turbostat: Fix formatting in turbostat.8
  tools/power turbostat: Extend --add option with perf counters
  tools/power turbostat: Group SMI counter with APERF and MPERF
  tools/power turbostat: Add ZERO_ARRAY for zero initializing builtin array
  tools/power turbostat: Replace enum rapl_source and cstate_source with counter_source
  tools/power turbostat: Remove anonymous union from rapl_counter_info_t
  tools/power/turbostat: Switch to new Intel CPU model defines
Linus Torvalds 2024-07-28 10:52:15 -07:00
commit e172f1e906
5 changed files with 2275 additions and 496 deletions


@ -46,6 +46,7 @@ snapshot: turbostat
@echo "#define GENMASK_ULL(h, l) (((~0ULL) << (l)) & (~0ULL >> (sizeof(long long) * 8 - 1 - (h))))" >> $(SNAPSHOT)/bits.h
@echo '#define BUILD_BUG_ON(cond) do { enum { compile_time_check ## __COUNTER__ = 1/(!(cond)) }; } while (0)' > $(SNAPSHOT)/build_bug.h
@echo '#define __must_be_array(arr) 0' >> $(SNAPSHOT)/build_bug.h
@echo PWD=. > $(SNAPSHOT)/Makefile
@echo "CFLAGS += -DMSRHEADER='\"msr-index.h\"'" >> $(SNAPSHOT)/Makefile


@ -28,10 +28,13 @@ name as necessary to disambiguate it from others is necessary. Note that option
.PP
\fB--add attributes\fP add column with counter having specified 'attributes'. The 'location' attribute is required, all others are optional.
.nf
location: {\fBmsrDDD\fP | \fBmsr0xXXX\fP | \fB/sys/path...\fP}
location: {\fBmsrDDD\fP | \fBmsr0xXXX\fP | \fB/sys/path...\fP | \fBperf/<device>/<event>\fP}
msrDDD is a decimal offset, eg. msr16
msr0xXXX is a hex offset, eg. msr0x10
/sys/path... is an absolute path to a sysfs attribute
<device> is a perf device from /sys/bus/event_source/devices/<device> eg. cstate_core
<event> is a perf event for given device from /sys/bus/event_source/devices/<device>/events/<event> eg. c1-residency
perf/cstate_core/c1-residency would then use /sys/bus/event_source/devices/cstate_core/events/c1-residency
scope: {\fBcpu\fP | \fBcore\fP | \fBpackage\fP}
sample and print the counter for every cpu, core, or package.
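The `perf/<device>/<event>` location form documented in this hunk maps directly onto a sysfs path. A minimal Python sketch of that mapping follows, assuming only the path layout quoted in the man page text. The helper names `perf_event_sysfs_path` and `add_argument` are hypothetical and are not part of turbostat.

```python
from pathlib import PurePosixPath

# Root of the perf event source tree, as quoted in the man page text.
EVENT_SOURCE_ROOT = PurePosixPath("/sys/bus/event_source/devices")


def perf_event_sysfs_path(location: str) -> PurePosixPath:
    """Map 'perf/<device>/<event>' to its sysfs event file (hypothetical helper)."""
    parts = location.split("/")
    if len(parts) != 3 or parts[0] != "perf" or not parts[1] or not parts[2]:
        raise ValueError(f"not a perf location: {location!r}")
    _, device, event = parts
    return EVENT_SOURCE_ROOT / device / "events" / event


def add_argument(location: str, *attributes: str) -> str:
    """Join a location and its optional attributes into one --add value."""
    return ",".join([location, *attributes])


if __name__ == "__main__":
    loc = "perf/cstate_core/c1-residency"
    print(perf_event_sysfs_path(loc))
    print("--add", add_argument(loc, "cpu", "delta", "percent", "pCPU%c1"))
```

The second helper only concatenates strings; it does not check that the attributes are ones turbostat accepts.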
@ -52,6 +55,39 @@ name as necessary to disambiguate it from others is necessary. Note that option
as the column header.
.fi
.PP
\fB--add pmt,[attr_name=attr_value, ...]\fP add column with a PMT (Intel Platform Monitoring Technology) counter in a similar way to --add option above, but require PMT metadata to be supplied to correctly read and display the counter. The metadata can be found in the Intel PMT XML files, hosted at https://github.com/intel/Intel-PMT. For a complete example see "ADD PMT COUNTER EXAMPLE".
.nf
name="name_string"
For column header.
type={\fBraw\fP}
'raw' shows the counter contents in hex.
default: raw
format={\fBraw\fP | \fBdelta\fP}
'raw' shows the counter contents in hex.
'delta' shows the difference in values during the measurement interval.
default: raw
domain={\fBcpu%u\fP | \fBcore%u\fP | \fBpackage%u\fP}
'cpu' per cpu/thread counter.
'core' per core counter.
'package' per package counter.
'%u' denotes id of the domain that the counter is associated with. For example core4 would mean that the counter is associated with core number 4.
offset=\fB%u\fP
'%u' offset within the PMT MMIO region.
lsb=\fB%u\fP
'%u' least significant bit within the 64 bit value read from 'offset'. Together with 'msb', used to form a read mask.
msb=\fB%u\fP
'%u' most significant bit within the 64 bit value read from 'offset'. Together with 'lsb', used to form a read mask.
guid=\fB%x\fP
'%x' hex identifier of the PMT MMIO region.
.fi
.PP
\fB--cpu cpu-set\fP limit output to system summary plus the specified cpu-set. If cpu-set is the string "core", then the system summary plus the first CPU in each core are printed -- eg. subsequent HT siblings are not printed. Or if cpu-set is the string "package", then the system summary plus the first CPU in each package is printed. Otherwise, the system summary plus the specified set of CPUs are printed. The cpu-set is ordered from low to high, comma delimited with ".." and "-" permitted to denote a range. eg. 1,2,8,14..17,21-44
.PP
\fB--hide column\fP do not show the specified built-in columns. May be invoked multiple times, or with a comma-separated list of column names.
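The `lsb` and `msb` attributes of `--add pmt` are described above as forming a read mask over the 64-bit value read from `offset`. The Python sketch below shows one way such a mask can be built and applied. Shifting the masked field down to bit 0 is an assumption about how the field would be presented, and the function names are hypothetical rather than taken from turbostat.

```python
def read_mask(lsb: int, msb: int) -> int:
    """Return a 64-bit mask with bits lsb..msb (inclusive) set."""
    if not 0 <= lsb <= msb <= 63:
        raise ValueError("expected 0 <= lsb <= msb <= 63")
    width = msb - lsb + 1
    return ((1 << width) - 1) << lsb


def extract_field(raw: int, lsb: int, msb: int) -> int:
    """Mask the raw 64-bit value and shift the field down to bit 0 (assumed)."""
    return (raw & read_mask(lsb, msb)) >> lsb


if __name__ == "__main__":
    # lsb=0, msb=63 covers the whole value, as in the man page example.
    print(hex(read_mask(0, 63)))
    print(hex(extract_field(0xABCD, 4, 11)))
```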
@ -67,10 +103,10 @@ The column name "all" can be used to enable all disabled-by-default built-in cou
.PP
\fB--quiet\fP Do not decode and print the system configuration header information.
.PP
+\fB--no-msr\fP Disable all the uses of the MSR driver.
+.PP
+\fB--no-perf\fP Disable all the uses of the perf API.
+.PP
\fB--no-msr\fP Disable all the uses of the MSR driver.
.PP
\fB--no-perf\fP Disable all the uses of the perf API.
.PP
\fB--interval seconds\fP overrides the default 5.0 second measurement interval.
.PP
\fB--num_iterations num\fP number of the measurement iterations.
@ -320,7 +356,7 @@ available on all processors.
Here we limit turbostat to showing just the CPU number for cpu0 - cpu3.
We add a counter showing the 32-bit raw value of MSR 0x199 (MSR_IA32_PERF_CTL),
labeling it with the column header, "PRF_CTRL", and display it only once,
afte the conclusion of a 0.1 second sleep.
after the conclusion of a 0.1 second sleep.
.nf
sudo ./turbostat --quiet --cpu 0-3 --show CPU --add msr0x199,u32,raw,PRF_CTRL sleep .1
0.101604 sec
@ -333,6 +369,56 @@ CPU PRF_CTRL
.fi
.SH ADD PERF COUNTER EXAMPLE
Here we limit turbostat to showing just the CPU number for cpu0 - cpu3.
We add a counter showing time spent in C1 core cstate,
labeling it with the column header, "pCPU%c1", and display it only once,
after the conclusion of 0.1 second sleep.
We also show CPU%c1 built-in counter that should show similar values.
.nf
sudo ./turbostat --quiet --cpu 0-3 --show CPU,CPU%c1 --add perf/cstate_core/c1-residency,cpu,delta,percent,pCPU%c1 sleep .1
0.102448 sec
CPU pCPU%c1 CPU%c1
- 34.89 34.89
0 45.99 45.99
1 45.94 45.94
2 23.83 23.83
3 23.84 23.84
.fi
.SH ADD PMT COUNTER EXAMPLE
Here we limit turbostat to showing just the CPU number 0.
We add two counters, showing crystal clock count and the DC6 residency.
All the parameters passed are based on the metadata found in the PMT XML files.
For the crystal clock count, we
label it with the column header, "XTAL",
we set the type to 'raw', to read the number of clock ticks in hex,
we set the format to 'delta', to display the difference in ticks during the measurement interval,
we set the domain to 'package0', to collect it and associate it with the whole package number 0,
we set the offset to '0', which is an offset of the counter within the PMT MMIO region,
we set the lsb and msb to cover all 64 bits of the read 64 bit value,
and finally we set the guid to '0x1a067102', that identifies the PMT MMIO region to which the 'offset' is applied to read the counter value.
For the DC6 residency counter, we
label it with the column header, "Die%c6",
we set the type to 'txtal_time', to obtain the percent residency value,
we set the format to 'delta', to display the difference in ticks during the measurement interval,
we set the domain to 'package0', to collect it and associate it with the whole package number 0,
we set the offset to '120', which is an offset of the counter within the PMT MMIO region,
we set the lsb and msb to cover all 64 bits of the read 64 bit value,
and finally we set the guid to '0x1a067102', that identifies the PMT MMIO region to which the 'offset' is applied to read the counter value.
.nf
sudo ./turbostat --quiet --cpu 0 --show CPU --add pmt,name=XTAL,type=raw,format=delta,domain=package0,offset=0,lsb=0,msb=63,guid=0x1a067102 --add pmt,name=Die%c6,type=txtal_time,format=delta,domain=package0,offset=120,lsb=0,msb=63,guid=0x1a067102
0.104352 sec
CPU XTAL Die%c6
- 0x0000006d4d957ca7 0.00
0 0x0000006d4d957ca7 0.00
0.102448 sec
.fi
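The two `--add pmt,...` values in the PMT example are long and easy to mistype. As a rough sketch, they can be assembled from the documented attributes with a small Python helper. The function name `pmt_add_argument` and its parameter names are hypothetical; the attribute keys and the example values are the ones shown in the man page text.

```python
def pmt_add_argument(name, type_, format_, domain, offset, lsb, msb, guid):
    """Build the value passed to --add for a PMT counter (hypothetical helper)."""
    attributes = [
        ("name", name),
        ("type", type_),
        ("format", format_),
        ("domain", domain),
        ("offset", str(offset)),
        ("lsb", str(lsb)),
        ("msb", str(msb)),
        ("guid", f"0x{guid:x}"),  # guid is documented as a hex identifier
    ]
    return "pmt," + ",".join(f"{key}={value}" for key, value in attributes)


if __name__ == "__main__":
    xtal = pmt_add_argument("XTAL", "raw", "delta", "package0", 0, 0, 63, 0x1A067102)
    dc6 = pmt_add_argument("Die%c6", "txtal_time", "delta", "package0", 120, 0, 63, 0x1A067102)
    print("--add", xtal, "--add", dc6)
```

The helper performs no validation; the offset, bit range and guid still have to come from the PMT XML metadata for the target platform.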
.SH INPUT
For interval-mode, turbostat will immediately end the current interval

File diff suppressed because it is too large


@ -0,0 +1,178 @@
#!/bin/env python3
# SPDX-License-Identifier: GPL-2.0

import subprocess
from shutil import which
from os import pread

class PerfCounterInfo:
    def __init__(self, subsys, event):
        self.subsys = subsys
        self.event = event

    def get_perf_event_name(self):
        return f'{self.subsys}/{self.event}/'

    def get_turbostat_perf_id(self, counter_scope, counter_type, column_name):
        return f'perf/{self.subsys}/{self.event},{counter_scope},{counter_type},{column_name}'

PERF_COUNTERS_CANDIDATES = [
    PerfCounterInfo('msr', 'mperf'),
    PerfCounterInfo('msr', 'aperf'),
    PerfCounterInfo('msr', 'tsc'),
    PerfCounterInfo('cstate_core', 'c1-residency'),
    PerfCounterInfo('cstate_core', 'c6-residency'),
    PerfCounterInfo('cstate_core', 'c7-residency'),
    PerfCounterInfo('cstate_pkg', 'c2-residency'),
    PerfCounterInfo('cstate_pkg', 'c3-residency'),
    PerfCounterInfo('cstate_pkg', 'c6-residency'),
    PerfCounterInfo('cstate_pkg', 'c7-residency'),
    PerfCounterInfo('cstate_pkg', 'c8-residency'),
    PerfCounterInfo('cstate_pkg', 'c9-residency'),
    PerfCounterInfo('cstate_pkg', 'c10-residency'),
]
present_perf_counters = []

def check_perf_access():
    perf = which('perf')
    if perf is None:
        print('SKIP: Could not find perf binary, thus could not determine perf access.')
        return False

    def has_perf_counter_access(counter_name):
        proc_perf = subprocess.run([perf, 'stat', '-e', counter_name, '--timeout', '10'],
                                   capture_output = True)

        if proc_perf.returncode != 0:
            print(f'SKIP: Could not read {counter_name} perf counter.')
            return False

        if b'<not supported>' in proc_perf.stderr:
            print(f'SKIP: Could not read {counter_name} perf counter.')
            return False

        return True

    for counter in PERF_COUNTERS_CANDIDATES:
        if has_perf_counter_access(counter.get_perf_event_name()):
            present_perf_counters.append(counter)

    if len(present_perf_counters) == 0:
        print('SKIP: Could not read any perf counter.')
        return False

    if len(present_perf_counters) != len(PERF_COUNTERS_CANDIDATES):
        print(f'WARN: Could not access all of the counters - some will be left untested')

    return True

if not check_perf_access():
    exit(0)

turbostat_counter_source_opts = ['']

turbostat = which('turbostat')
if turbostat is None:
    print('Could not find turbostat binary')
    exit(1)

timeout = which('timeout')
if timeout is None:
    print('Could not find timeout binary')
    exit(1)

proc_turbostat = subprocess.run([turbostat, '--list'], capture_output = True)
if proc_turbostat.returncode != 0:
    print(f'turbostat failed with {proc_turbostat.returncode}')
    exit(1)

EXPECTED_COLUMNS_DEBUG_DEFAULT = [b'usec', b'Time_Of_Day_Seconds', b'APIC', b'X2APIC']

expected_columns = [b'CPU']
counters_argv = []
for counter in present_perf_counters:
    if counter.subsys == 'cstate_core':
        counter_scope = 'core'
    elif counter.subsys == 'cstate_pkg':
        counter_scope = 'package'
    else:
        counter_scope = 'cpu'

    counter_type = 'delta'
    column_name = counter.event

    cparams = counter.get_turbostat_perf_id(
        counter_scope = counter_scope,
        counter_type = counter_type,
        column_name = column_name
    )
    expected_columns.append(column_name.encode())
    counters_argv.extend(['--add', cparams])

expected_columns_debug = EXPECTED_COLUMNS_DEBUG_DEFAULT + expected_columns

def gen_user_friendly_cmdline(argv_):
    argv = argv_[:]
    ret = ''

    while len(argv) != 0:
        arg = argv.pop(0)
        arg_next = ''

        if arg in ('-i', '--show', '--add'):
            arg_next = argv.pop(0) if len(argv) > 0 else ''

        ret += f'{arg} {arg_next} \\\n\t'

    # Remove the last separator and return
    return ret[:-4]

#
# Run turbostat for some time and send SIGINT
#
timeout_argv = [timeout, '--preserve-status', '-s', 'SIGINT', '-k', '3', '0.2s']
turbostat_argv = [turbostat, '-i', '0.50', '--show', 'CPU'] + counters_argv

def check_columns_or_fail(expected_columns: list, actual_columns: list):
    if len(actual_columns) != len(expected_columns):
        print(f'turbostat column check failed\n{expected_columns=}\n{actual_columns=}')
        exit(1)

    failed = False
    for expected_column in expected_columns:
        if expected_column not in actual_columns:
            print(f'turbostat column check failed: missing column {expected_column.decode()}')
            failed = True

    if failed:
        exit(1)

cmdline = gen_user_friendly_cmdline(turbostat_argv)
print(f'Running turbostat with:\n\t{cmdline}\n... ', end = '', flush = True)
proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
if proc_turbostat.returncode != 0:
    print(f'turbostat failed with {proc_turbostat.returncode}')
    exit(1)

actual_columns = proc_turbostat.stdout.split(b'\n')[0].split(b'\t')
check_columns_or_fail(expected_columns, actual_columns)
print('OK')

#
# Same, but with --debug
#
# We explicitly specify '--show CPU' to make sure turbostat
# don't show a bunch of default counters instead.
#
turbostat_argv.append('--debug')

cmdline = gen_user_friendly_cmdline(turbostat_argv)
print(f'Running turbostat (in debug mode) with:\n\t{cmdline}\n... ', end = '', flush = True)
proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
if proc_turbostat.returncode != 0:
    print(f'turbostat failed with {proc_turbostat.returncode}')
    exit(1)

actual_columns = proc_turbostat.stdout.split(b'\n')[0].split(b'\t')
check_columns_or_fail(expected_columns_debug, actual_columns)
print('OK')
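The selftest above validates turbostat by splitting the first line of its output on tabs and comparing the column names. The standalone Python sketch below isolates that check so it can be tried without running turbostat. The sample output bytes are made up for illustration, and `header_columns` and `missing_columns` are hypothetical helper names, not functions from the selftest.

```python
def header_columns(stdout: bytes) -> list:
    """Return the tab-separated column names from the first output line."""
    return stdout.split(b"\n")[0].split(b"\t")


def missing_columns(expected: list, actual: list) -> list:
    """List the expected column names that do not appear in the header."""
    return [column for column in expected if column not in actual]


if __name__ == "__main__":
    # Made-up sample output: a header line followed by one data row.
    sample = b"CPU\tmperf\taperf\n-\t1000\t900\n"
    actual = header_columns(sample)
    print(actual)
    print(missing_columns([b"CPU", b"mperf", b"tsc"], actual))
```

Unlike `check_columns_or_fail`, this sketch returns the missing names instead of exiting, which makes it easier to reuse in other tests.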


@ -0,0 +1,157 @@
#!/bin/env python3
# SPDX-License-Identifier: GPL-2.0

import subprocess
from shutil import which
from os import pread

# CDLL calls dlopen underneath.
# Calling it with None (null), we get handle to the our own image (python interpreter).
# We hope to find sched_getcpu() inside ;]
# This is a bit ugly, but helps shipping working software, so..
try:
    import ctypes

    this_image = ctypes.CDLL(None)
    BASE_CPU = this_image.sched_getcpu()
except:
    BASE_CPU = 0 # If we fail, set to 0 and pray it's not offline.

MSR_IA32_MPERF = 0x000000e7
MSR_IA32_APERF = 0x000000e8

def check_perf_access():
    perf = which('perf')
    if perf is None:
        print('SKIP: Could not find perf binary, thus could not determine perf access.')
        return False

    def has_perf_counter_access(counter_name):
        proc_perf = subprocess.run([perf, 'stat', '-e', counter_name, '--timeout', '10'],
                                   capture_output = True)

        if proc_perf.returncode != 0:
            print(f'SKIP: Could not read {counter_name} perf counter, assuming no access.')
            return False

        if b'<not supported>' in proc_perf.stderr:
            print(f'SKIP: Could not read {counter_name} perf counter, assuming no access.')
            return False

        return True

    if not has_perf_counter_access('msr/mperf/'):
        return False
    if not has_perf_counter_access('msr/aperf/'):
        return False
    if not has_perf_counter_access('msr/smi/'):
        return False

    return True

def check_msr_access():
    try:
        file_msr = open(f'/dev/cpu/{BASE_CPU}/msr', 'rb')
    except:
        return False

    if len(pread(file_msr.fileno(), 8, MSR_IA32_MPERF)) != 8:
        return False

    if len(pread(file_msr.fileno(), 8, MSR_IA32_APERF)) != 8:
        return False

    return True

has_perf_access = check_perf_access()
has_msr_access = check_msr_access()

turbostat_counter_source_opts = ['']

if has_msr_access:
    turbostat_counter_source_opts.append('--no-perf')
else:
    print('SKIP: doesn\'t have MSR access, skipping run with --no-perf')

if has_perf_access:
    turbostat_counter_source_opts.append('--no-msr')
else:
    print('SKIP: doesn\'t have perf access, skipping run with --no-msr')

if not has_msr_access and not has_perf_access:
    print('SKIP: No MSR nor perf access detected. Skipping the tests entirely')
    exit(0)

turbostat = which('turbostat')
if turbostat is None:
    print('Could not find turbostat binary')
    exit(1)

timeout = which('timeout')
if timeout is None:
    print('Could not find timeout binary')
    exit(1)

proc_turbostat = subprocess.run([turbostat, '--list'], capture_output = True)
if proc_turbostat.returncode != 0:
    print(f'turbostat failed with {proc_turbostat.returncode}')
    exit(1)

EXPECTED_COLUMNS_DEBUG_DEFAULT = b'usec\tTime_Of_Day_Seconds\tAPIC\tX2APIC'

SMI_APERF_MPERF_DEPENDENT_BICS = [
    'SMI',
    'Avg_MHz',
    'Busy%',
    'Bzy_MHz',
]
if has_perf_access:
    SMI_APERF_MPERF_DEPENDENT_BICS.append('IPC')

for bic in SMI_APERF_MPERF_DEPENDENT_BICS:
    for counter_source_opt in turbostat_counter_source_opts:

        # Ugly special case, but it is what it is..
        if counter_source_opt == '--no-perf' and bic == 'IPC':
            continue

        expected_columns = bic.encode()
        expected_columns_debug = EXPECTED_COLUMNS_DEBUG_DEFAULT + f'\t{bic}'.encode()

        #
        # Run turbostat for some time and send SIGINT
        #
        timeout_argv = [timeout, '--preserve-status', '-s', 'SIGINT', '-k', '3', '0.2s']
        turbostat_argv = [turbostat, '-i', '0.50', '--show', bic]

        if counter_source_opt:
            turbostat_argv.append(counter_source_opt)

        print(f'Running turbostat with {turbostat_argv=}... ', end = '', flush = True)
        proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
        if proc_turbostat.returncode != 0:
            print(f'turbostat failed with {proc_turbostat.returncode}')
            exit(1)

        actual_columns = proc_turbostat.stdout.split(b'\n')[0]
        if expected_columns != actual_columns:
            print(f'turbostat column check failed\n{expected_columns=}\n{actual_columns=}')
            exit(1)
        print('OK')

        #
        # Same, but with --debug
        #
        turbostat_argv.append('--debug')

        print(f'Running turbostat with {turbostat_argv=}... ', end = '', flush = True)
        proc_turbostat = subprocess.run(timeout_argv + turbostat_argv, capture_output = True)
        if proc_turbostat.returncode != 0:
            print(f'turbostat failed with {proc_turbostat.returncode}')
            exit(1)

        actual_columns = proc_turbostat.stdout.split(b'\n')[0]
        if expected_columns_debug != actual_columns:
            print(f'turbostat column check failed\n{expected_columns_debug=}\n{actual_columns=}')
            exit(1)
        print('OK')
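`check_msr_access()` in the script above only verifies that an 8-byte `pread` at the MSR offset succeeds. As a hedged illustration, the Python sketch below goes one step further and decodes those 8 bytes into an integer. The helper name `read_u64_at` is hypothetical, the little-endian decoding is an assumption that holds on x86, and reading a real `/dev/cpu/<n>/msr` file needs the msr driver and suitable privileges.

```python
import os
import struct


def read_u64_at(path: str, offset: int) -> int:
    """Read 8 bytes at 'offset' from 'path' and decode them as a u64.

    Hypothetical helper; assumes little-endian byte order (x86).
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        data = os.pread(fd, 8, offset)
    finally:
        os.close(fd)
    if len(data) != 8:
        raise OSError(f"short read at offset {offset:#x}: got {len(data)} bytes")
    return struct.unpack("<Q", data)[0]


if __name__ == "__main__":
    MSR_IA32_MPERF = 0x000000e7  # same offset as in the selftest
    try:
        print(hex(read_u64_at("/dev/cpu/0/msr", MSR_IA32_MPERF)))
    except OSError as err:
        print(f"could not read MSR: {err}")
```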