no_llseek had been defined to NULL two years ago, in commit 868941b144
("fs: remove no_llseek")
To quote that commit,
At -rc1 we'll need do a mechanical removal of no_llseek -
git grep -l -w no_llseek | grep -v porting.rst | while read i; do
sed -i '/\<no_llseek\>/d' $i
done
would do it.
Unfortunately, that hadn't been done. Linus, could you do that now, so
that we could finally put that thing to rest? All instances are of the
form
.llseek = no_llseek,
so it's obviously safe.
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
The macros PERF_CPUM_CF_MAX_CTR and PERF_EVENT_CPUM_CF_DIAG
are used in only one source file arch/s390/kernel/perf_cpum_cf.c.
Move these defines from the header file
arch/s390/include/asm/perf_event.h to the only user.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Event CF_DIAG reads out complete counter sets using stcctm
instruction. This is done at event start time when the process
starts execution and at event stop time when the process is
removed from the CPU. During removal the difference of each
counter in the counter sets is calculated and saved as raw data
in the ring buffer. This works fine unless the number of counters
in a counter set is zero. This may happen for the extended counter
set. This set is machine specific and the size of the counter
set can be zero even when extended counter set is authorized for
read access.
This case is not handled. cfdiag_diffctr() checks authorization
of the extended counter set. If true the functions assumes
the extended counter set has been saved in a data buffer. However
this is not the case, cfdiag_getctrset() does not save a counter
set with counter set size of zero. This mismatch causes an endless
loop in the counter set readout during event stop handling.
The calculation of the difference of the counters in each counter
now verifies the size of the counter set is non-zero. A counter set
with size zero is skipped.
Fixes: a029a4eab3 ("s390/cpumf: Allow concurrent access for CPU Measurement Counter Facility")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Cc: Heiko Carstens <hca@linux.ibm.com>
Cc: Vasily Gorbik <gor@linux.ibm.com>
Cc: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The CPU Measurement facility crypto counter set functionality
is defined by the Second Counter Version Number. This number
varies between machine types, but is upward compatible.
Lessen the checks to reflect this behavior.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Use control register bit defines instead of plain numbers where
possible.
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Add local and system prefix to some functions to clarify they change
control register contents on either the local CPU or the on all CPUs.
This results in the following API:
Two defines which load and save multiple control registers.
The defines correlate with the following C prototypes:
void __local_ctl_load(unsigned long *, unsigned int cr_low, unsigned int cr_high);
void __local_ctl_store(unsigned long *, unsigned int cr_low, unsigned int cr_high);
Two functions which locally set or clear one bit for a specified
control register:
void local_ctl_set_bit(unsigned int cr, unsigned int bit);
void local_ctl_clear_bit(unsigned int cr, unsigned int bit);
Two functions which set or clear one bit for a specified control
register on all CPUs:
void system_ctl_set_bit(unsigned int cr, unsigned int bit);
void system_ctl_clear_bit(unsigend int cr, unsigned int bit);
Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Remove most debug statements which are not needed anymore from
the CPU Measurement counter facility device driver.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Struct cpu_cf_events is a large data structure and is statically defined
for each possible CPU. Rework this and replace it by dynamically
allocated data structures created when a perf_event_open() system call
is invoked or an access via character device /dev/hwctr takes place.
It is replaced by an array of pointers to all possible CPUs and
reference counting. The array of pointers is allocated when the first
event is created. For each online CPU an event is installed on, a struct
cpu_cf_events is allocated and a pointer to struct cpu_cf_events is
stored in the array:
CPU 0 1 2 3 ... N
+---+---+---+---+---+---+
cpu_cf_root::cpucf--> | * | | | |...| |
+-|-+---+---+---+---+---+
|
|
\|/
+-------------+
|cpu_cf_events|
| |
+-------------+
With this approach the large data structure is only allocated when
an event is actually installed and used.
Also implement proper reference counting for allocation and removal.
During interrupt processing make sure the pointer to cpu_cf_events
is valid. The interrupt handler is shared and might be called when
no event is active.
This requires checking for a valid pointer to struct cpu_cf_events.
When the pointer to the per-cpu cpu_cf_events is NULL, simply return.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
The device /dev/hwctr was introduced to access complete
CPU Measurement facility counter sets via an ioctl system call.
The access the to device is limited to privileged processes
running as root or superuser. The capability CAP_SYS_ADMIN
is required. The device permissions are read/write for the
device owner root. There is no need for this restriction.
Make the device access permission read/write for all and
reduce the capabilities to CAP_PERFMON.
Any user space program with the CAP_PERFMON capability assigned to it
can now read and display the CPU Measurement facility counter sets.
For more details on perf tool usage and security, see linux
documentation in Documentation/admin-guide/perf-security.rst.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Remove function validate_ctr_auth() and replace this very small
function by its body.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Function validate_ctr_version() first parameter is a pointer to
a large structure, but only member hw_perf_event::config is used.
Supply this structure member value in the function invocation.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The CPU measurement facility counter information instruction qctri()
retrieves information about the available counter sets.
The information varies between machine generations, but is constant
when running on a particular machine.
For example the CPU measurement facility counter first and second
version numbers determine the amount of counters in a counter
set. This information never changes.
The counter sets are identical for all CPUs in the system. It does
not matter which CPU performs the instruction.
Authorization control of the CPU Measurement facility can only
be changed in the activation profile while the LPAR is not running.
Retrieve the CPU measurement counter information at device driver
initialization time and use its constant values.
Function validate_ctr_version() verifies if a user provided
CPU Measurement counter facility counter is valid and defined.
It now uses the newly introduced static CPU counter facility
information.
To avoid repeated recalculation of the counter set sizes (numbers of
counters per set), which never changes on a running machine,
calculate the counter set size once at device driver initialization
and store the result in an array. Functions cpum_cf_make_setsize()
and cpum_cf_read_setsize() are introduced.
Finally remove cpu_cf_events::info member and use the static CPU
counter facility information instead.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Simplify pr_err() statement into one line and omit return statement.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Struct s390_ctrset_read userdata is filled by ioctl_read operation
using put_user/copy_to_user. However, the ctrset->data value access
is not performed anywhere during the ioctl_read operation.
Remove unnecessary copy_from_user() call.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
When function cfset_all_copy() fails, also log the bad return code
in the debug statement (when turned on).
No functional change
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
- Large cleanup of the con3270/tty3270 driver. Among others this fixes:
* Background Color Support
* ASCII Line Character Support
* VT100 Support
* Geometries other than 80x24
- Cleanup and improve cmpxchg() code. Also add cmpxchg_user_key() to
uaccess functions, which will be used by KVM to access KVM guest memory
with a specific storage key.
- Add support for user space events counting to CPUMF.
- Cleanup the vfio/ccw code, which also allows now to properly support 2K
Format-2 IDALs.
- Move kernel page table allocation and initialization to decompressor,
which finally allows to enter the kernel with dynamic address translation
enabled. This in turn allows to get rid of code with special handling in
the kernel, which has to distinguish if DAT is on or off.
- Replace kretprobe with rethook.
- Various improvements to vfio/ap queue resets:
* Use TAPQ to verify completion of a reset in progress rather than
multiple invocations of ZAPQ.
* Check TAPQ response codes when verifying successful completion of ZAPQ.
* Fix erroneous handling of some error response codes.
* Increase the maximum amount of time to wait for successful completion
of ZAPQ.
- Rework system call wrappers to get rid of alias functions, which were
only left on s390.
- Cleanup diag288_wdt watchdog driver. It has been agreed on with Guenter
Roeck that this goes upstream via the s390 tree.
- Add missing loadparm parameter handling for list-directed ECKD ipl/reipl.
- Various improvements to memory detection code.
- Remove arch_cpu_idle_time() since the current implementation is broken,
and allows user space observable accounted idle times which can
temporarily decrease.
- Add Reset DAT-Protection support: (only) allow to change PTEs from RO to
RW with a new RDP instruction. Unlike the currently used IPTE
instruction, this does not necessarily guarantee that TLBs of all CPUs
are synchronously flushed; and that remote CPUs can see spurious
protection faults. The overall improvement for not requiring an all CPU
synchronization, like it is required with IPTE, should be beneficial.
- Fix KFENCE page fault reporting.
- Smaller cleanups and improvement all over the place.
-----BEGIN PGP SIGNATURE-----
iQIzBAABCAAdFiEECMNfWEw3SLnmiLkZIg7DeRspbsIFAmPzWhQACgkQIg7DeRsp
bsKkdQ//QbCwDMt6T3bmi6gGgs9HRSkLOTHAlOIRuetEzRiBzm/O4Gm8NycvPspl
BIcuXmQKt+gBS44tWikKpwuhmWrAtiFUxs/M1uPfRXqjUf+ZFJinPJgtPCBa/3rv
tQkh541QxpX4K5Ks71WKv2Kh0RjaTqw5Kj+rlDBYHsxZvb28mDigINRYoVSxNUKi
dTVlR0UgdGLecXfezpvWeEAbJu6Q2pbIkOT3tNOumNqRAoUN4cbH3P0agHJdq8oj
L/++d4tfVbwL8N/VCwIVBeW/AQzA0B2UCDVz75Pd55+FFrIGVp1hn7QC9QQieomL
fzGOTrL4D9U8JkAIJqhioA1NlcN1+QW2svoMVo0N3vBJpIbzX4bZKTDxZZ26dG9H
ox7YvhsZtJA7p34X5hetoObzZcmiYJStT+BDao7q1x3oLf4G31HaP+YUDIKBPmNW
ieZa+ujYbKor1pD6ysaMVX+c1qhfX6S/V0uBAikoqMWUVUvH/ZeuSxCSfMuvWUrQ
KFuc0HnPiiIO1Ux3wN5oN33+pWCSdUcJOeg4aj0jkkFT9Ct3TOBupIGBGyhOAh6r
OTp1iqJuQjwOmkWPLyRuGMzRmDDp+hWz9qNF/DFwIV1IMi9AzJjEOg31cMVxx3gG
iM25560uqvhhUwQFbyjVgJmj00dmqMX07q8QSVOgg9oUrnRhQrc=
=MW9x
-----END PGP SIGNATURE-----
Merge tag 's390-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux
Pull s390 updates from Heiko Carstens:
- Large cleanup of the con3270/tty3270 driver. Among others this fixes:
- Background Color Support
- ASCII Line Character Support
- VT100 Support
- Geometries other than 80x24
- Cleanup and improve cmpxchg() code. Also add cmpxchg_user_key() to
uaccess functions, which will be used by KVM to access KVM guest
memory with a specific storage key
- Add support for user space events counting to CPUMF
- Cleanup the vfio/ccw code, which also allows now to properly support
2K Format-2 IDALs
- Move kernel page table allocation and initialization to decompressor,
which finally allows to enter the kernel with dynamic address
translation enabled. This in turn allows to get rid of code with
special handling in the kernel, which has to distinguish if DAT is on
or off
- Replace kretprobe with rethook
- Various improvements to vfio/ap queue resets:
- Use TAPQ to verify completion of a reset in progress rather than
multiple invocations of ZAPQ.
- Check TAPQ response codes when verifying successful completion of
ZAPQ.
- Fix erroneous handling of some error response codes.
- Increase the maximum amount of time to wait for successful
completion of ZAPQ
- Rework system call wrappers to get rid of alias functions, which were
only left on s390
- Cleanup diag288_wdt watchdog driver. It has been agreed on with
Guenter Roeck that this goes upstream via the s390 tree
- Add missing loadparm parameter handling for list-directed ECKD
ipl/reipl
- Various improvements to memory detection code
- Remove arch_cpu_idle_time() since the current implementation is
broken, and allows user space observable accounted idle times which
can temporarily decrease
- Add Reset DAT-Protection support: (only) allow to change PTEs from RO
to RW with a new RDP instruction. Unlike the currently used IPTE
instruction, this does not necessarily guarantee that TLBs of all
CPUs are synchronously flushed; and that remote CPUs can see spurious
protection faults. The overall improvement for not requiring an all
CPU synchronization, like it is required with IPTE, should be
beneficial
- Fix KFENCE page fault reporting
- Smaller cleanups and improvement all over the place
* tag 's390-6.3-1' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux: (182 commits)
s390/irq,idle: simplify idle check
s390/processor: add test_and_set_cpu_flag() and test_and_clear_cpu_flag()
s390/processor: let cpu helper functions return boolean values
s390/kfence: fix page fault reporting
s390/zcrypt: introduce ctfm field in struct CPRBX
s390: remove confusing comment from uapi types header file
vfio/ccw: remove WARN_ON during shutdown
s390/entry: remove toolchain dependent micro-optimization
s390/mem_detect: do not truncate online memory ranges info
s390/vx: remove __uint128_t type from __vector128 struct again
s390/mm: add support for RDP (Reset DAT-Protection)
s390/mm: define private VM_FAULT_* reasons from top bits
Documentation: s390: correct spelling
s390/ap: fix status returned by ap_qact()
s390/ap: fix status returned by ap_aqic()
s390: vfio-ap: tighten the NIB validity check
Revert "s390/mem_detect: do not update output parameters on failure"
s390/idle: remove arch_cpu_idle_time() and corresponding code
s390/vx: use simple assignments to access __vector128 members
s390/vx: add 64 and 128 bit members to __vector128 struct
...
Simplify the use of constants PMC_INIT and PMC_RELEASE.
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
With no in-kernel user, the source files can be merged.
Move all functions and the variable definitions to file perf_cpum_cf.c
This file now contains all the necessary functions and definitions
for the CPU Measurement counter facility device driver.
The files cpu_mcf.h and perf_cpum_cf_common.c are deleted.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Commit 17bebcc68e ("s390/cpum_cf: Add minimal in-kernel interface for
counter measurements") introduced a small in-kernel interface for CPU
Measurement counter facility.
There are no users of this interface, therefore remove it.
The following functions are removed:
kernel_cpumcf_alert(),
kernel_cpumcf_begin(),
kernel_cpumcf_end(),
kernel_cpumcf_avail()
there is no need for them anymore.
With the removal of function kernel_cpumcf_alert(), also remove
member alert in struct cpu_cf_events. Its purpose was to counter
measurement alert interrupts for the in-kernel interface.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Function stccm_avail() is defined in a header file and the
only user is one single source file. Move this function to the source
file where it is also used and remove it from the header file.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Function cpum_cf_ctrset_size() is defined in one source file and the
only user is in another source file. Move this function to the source
file where it is used and remove its prototype from the header file.
No functional change.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
To remove an event from the CPU Measurement counter facility
use the lock/unlock scheme as done in event creation. Remove
the atomic_add_unless function to make the code easier.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
When we save the raw_data to the perf sample data, we need to update
the sample flags and the dynamic size. To make sure this is done
consistently, add the perf_sample_save_raw_data() helper and convert
all call sites.
Suggested-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Ingo Molnar <mingo@kernel.org>
Tested-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Jiri Olsa <jolsa@kernel.org>
Acked-by: Peter Zijlstra <peterz@infradead.org>
Link: https://lore.kernel.org/r/20230118060559.615653-4-namhyung@kernel.org
CPU Measurement counting facility events PROBLEM_STATE_CPU_CYCLES(32)
and PROBLEM_STATE_INSTRUCTIONS(33) are valid events. However the device
driver returns error -EOPNOTSUPP when these event are to be installed.
Fix this and allow installation of events PROBLEM_STATE_CPU_CYCLES,
PROBLEM_STATE_CPU_CYCLES:u, PROBLEM_STATE_INSTRUCTIONS and
PROBLEM_STATE_INSTRUCTIONS:u.
Kernel space counting only is still not supported by s390.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Use the new sample_flags to indicate whether the raw data field is
filled by the PMU driver. Although it could check with the NULL,
follow the same rule with other fields.
Remove the raw field from the perf_sample_data_init() to minimize
the number of cache lines touched.
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/20220921220032.2858517-2-namhyung@kernel.org
Events CPU_CYCLES and INSTRUCTIONS can be submitted with two different
perf_event attribute::type values:
- PERF_TYPE_HARDWARE: when invoked via perf tool predefined events name
cycles or cpu-cycles or instructions.
- pmu->type: when invoked via perf tool event name cpu_cf/CPU_CYLCES/ or
cpu_cf/INSTRUCTIONS/. This invocation also selects the PMU to which
the event belongs.
Handle both type of invocations identical for events CPU_CYLCES and
INSTRUCTIONS. They address the same hardware.
The result is different when event modifier exclude_kernel is also set.
Invocation with event modifier for user space event counting fails.
Output before:
# perf stat -e cpum_cf/cpu_cycles/u -- true
Performance counter stats for 'true':
<not supported> cpum_cf/cpu_cycles/u
0.000761033 seconds time elapsed
0.000076000 seconds user
0.000725000 seconds sys
#
Output after:
# perf stat -e cpum_cf/cpu_cycles/u -- true
Performance counter stats for 'true':
349,613 cpum_cf/cpu_cycles/u
0.000844143 seconds time elapsed
0.000079000 seconds user
0.000800000 seconds sys
#
Fixes: 6a82e23f45 ("s390/cpumf: Adjust registration of s390 PMU device drivers")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
[agordeev@linux.ibm.com corrected commit ID of Fixes commit]
Signed-off-by: Alexander Gordeev <agordeev@linux.ibm.com>
Machine generations up to z9 (released in May 2006) have been officially
out of service for several years now (z9 end of service - January 31, 2019).
No distributions build kernels supporting those old machine generations
anymore, except Debian, which seems to pick the oldest supported
generation. The team supporting Debian on s390 has been notified about
the change.
Raising minimum supported machine generation to z10 helps to reduce
maintenance cost and effectively remove code, which is not getting
enough testing coverage due to lack of older hardware and distributions
support. Besides that this unblocks some optimization opportunities and
allows to use wider instruction set in asm files for future features
implementation. Due to this change spectre mitigation and usercopy
implementations could be drastically simplified and many newer instructions
could be converted from ".insn" encoding to instruction names.
Acked-by: Ilya Leoshkevich <iii@linux.ibm.com>
Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
When a CPU is hotplugged while the perf stat -e cycles command is
running, a wrong (very large) value is displayed immediately after the
CPU removal:
Check the values, shouldn't be too high as in
time counts unit events
1.001101919 29261846 cycles
2.002454499 17523405 cycles
3.003659292 24361161 cycles
4.004816983 18446744073638406144 cycles
5.005671647 <not counted> cycles
...
The CPU hotplug off took place after 3 seconds.
The issue is the read of the event count value after 4 seconds when
the CPU is not available and the read of the counter returns an
error. This is treated as a counter value of zero. This results
in a very large value (0 - previous_value).
Fix this by detecting the hotplugged off CPU and report 0 instead
of a very large number.
Cc: stable@vger.kernel.org
Fixes: a029a4eab3 ("s390/cpumf: Allow concurrent access for CPU Measurement Counter Facility")
Reported-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Commit a029a4eab3 ("s390/cpumf: Allow concurrent access for CPU Measurement Counter Facility")
added CPU Measurement counter facility access to multiple consumers.
It allows concurrent access to the CPU Measurement counter facility
via several perf_event_open() system call invocations and via ioctl()
system call of device /dev/hwc. However the access via device /dev/hwc
was exclusive, only one process was able to open this device.
The patch removes this restriction. Now multiple invocations of lshwc
can execute in parallel. They can access different CPUs and counter
sets or CPUs and counter set can overlap.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Suggested-by: Heiko Carstens <hca@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Acked-by: Heiko Carstens <hca@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Move array from header to C file to avoid that it gets defined in
every C file where the header is included:
In file included from arch/s390/kernel/perf_cpum_cf_common.c:19:
./arch/s390/include/asm/cpu_mcf.h:27:18: warning: ‘cpumf_ctr_ctl’ defined but not used [-Wunused-const-variable=]
27 | static const u64 cpumf_ctr_ctl[CPUMF_CTR_SET_MAX] = {
| ^~~~~~~~~~~~~
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().
Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Link: https://lore.kernel.org/r/20210803141621.780504-5-bigeasy@linutronix.de
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Commit cf6acb8bdb ("s390/cpumf: Add support for complete counter set extraction")
allows access to the CPU Measurement Counter Facility via character
device /dev/hwctr. The access was exclusive via this device or
via perf_event_open() system call. Only one path at a time was
permitted. The CPU Measurement Counter Facility device driver blocked
access to other processes.
This patch removes this restriction and allows concurrent access to
the CPU Measurement Counter Facility from multiple processes at the same
time via perf_event_open() SVC and via /dev/hwctr device. The access
via /dev/hwctr device is still exclusive, only one process is allowed to
access this device.
This patch
- moves the /dev/hwctr device access from file perf_cpum_cf_diag.c.
to file perf_cpum_cf.c.
- use only one trace buffer .../s390dbf/cpum_cf.
- remove cfset_csd structure and includes its members it into the
structure cpu_cf_events. This results in one data structure and
simplifies the access.
- rework function familiy ctr_set_enable, ctr_set_disable, ctr_set_start
and ctr_set_stop which operate on a counter set number.
Now they operate on a counter set bit mask.
- move CF_DIAG event functionality to file perf_cpum_cf.c. It now
contains the complete functionality of the CPU Measurement Counter
Facility:
- Performance measurement support for counters using perf stat.
- Support for complete counter set extraction with device /dev/hwctr.
- Support for counter set extraction event CF_DIAG attached to
samples using perf record.
- removes file perf_cpum_cf_diag.c
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Remove some WARN_ON_ONCE() warnings when a counter is started. Each
counter is installed function calls
event_sched_in() --> cpumf_pmu_add(..., PERF_EF_START).
This is done after the event has been created using
perf_pmu_event_init() which verifies the counter is valid.
Member hwc->config must be valid at this point.
Function cpumf_pmu_start(..., PERF_EF_RELOAD) is called from
function cpumf_pmu_add() for counter events. All other invocations of
cpumf_pmu_start(..., PERF_EF_RELOAD) are from the performance subsystem
for sampling events.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The command 'perf stat -e cycles ...' triggers the following function
sequence in the CPU Measurement Facility counter device driver:
perf_pmu_event_init()
__hw_perf_event_init()
validate_ctr_auth()
validate_ctr_version()
During event creation, the counter number is checked in functions
validate_ctr_auth() and validate_ctr_version() to verify it is a valid
counter and supported by the hardware. If this is not the case, both
functions return an error and the event is not created. System call
perf_event_open() returns an error in this case.
Later on the event is installed in the kernel event subsystem and the
driver functions cpumf_pmu_add() and cpumf_pmu_commit_txn() are called
to install the counter event by the hardware.
Since both events have been verified at event creation, there is no need
to re-evaluate the authorization state. This can not change since on
* LPARs the authorization change requires a restart of the LPAR (and
thus a reboot of the kernel)
* DPMs can not take resources away, just add them.
Also the sequence of CPU Measurement facility counter device driver
calls is
cpumf_pmu_start_txn
cpumf_pmu_add
cpumf_pmu_start
cpumf_pmu_commit_txn
for every single event. Which means the condition in cpumf_pmu_add()
is never met and validate_ctr_auth() is never called.
This leaves the counter device driver transaction functions with
just one task:
start_txn: Verify a transaction is not in flight and call
perf_pmu_disable()
cancel_txn, commit_txn: Verify a transaction is in flight and call
perf_pmu_enable()
The same functionality is provided by the default transaction handling
functions in kernel/events/core.c. Use those by removing the
counter device driver private call back functions.
Suggested-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
The function cpumf_pmu_add and cpumf_pmu_del call function
perf_event_update_userpage(). This calls is obsolete, the calls add and
delete a counter event. Counter events do not sample data and the
event->rb member to access the sampling ring buffer is always NULL.
The function perf_event_update_userpage() simply returns in this case.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by : Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Beautify if-then-else indentation to match coding guideline.
Also use shorter pointer notation hwc instead of event->hw.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Acked-by : Sumanth Korikkar <sumanthk@linux.ibm.com>
Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
Linux-next commit titled "perf/core: Optimize perf_init_event()"
changed the semantics of PMU device driver registration.
It was done to speed up the lookup/handling of PMU device driver
specific events. It also enforces that only one PMU device
driver will be registered of type PERF_EVENT_RAW.
This change added these line in function perf_pmu_register():
...
+ ret = idr_alloc(&pmu_idr, pmu, max, 0, GFP_KERNEL);
+ if (ret < 0)
goto free_pdc;
+
+ WARN_ON(type >= 0 && ret != type);
The warn_on generates a message. We have 3 PMU device drivers,
each registered as type PERF_TYPE_RAW.
The cf_diag device driver (arch/s390/kernel/perf_cpumf_cf_diag.c)
always hits the WARN_ON because it is the second PMU device driver
(after sampling device driver arch/s390/kernel/perf_cpumf_sf.c)
which is registered as type 4 (PERF_TYPE_RAW).
So when the sampling device driver is registered, ret has value 4.
When cf_diag device driver is registered with type 4,
ret has value of 5 and WARN_ON fires.
Adjust the PMU device drivers for s390 to support the new
semantics required by perf_pmu_register().
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
Add support for the CPU-Measurement Facility counter
second version number 6. This number is used to detect some
more counters in the crypto counter set and the extended
counter set.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Rservation of the CPU Measurement Counter facility may fail if
it is already in use by the cf_diag device driver.
This is indicated by a non zero return code (-EBUSY).
However this return code is ignored and the counter facility
may be used in parallel by different device drivers.
Handle the failing reservation and return an error to the
caller.
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move common functions of the couter facility support into a separate
file.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
A preparation to move out common CPU-MF counter facility support
functions, first introduce a function that indicates whether the
support is ready to use.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
During a __kernel_cpumcf_begin()/end() session, save measurement alerts
for the counter facility in the per-CPU cpu_cf_events variable.
Users can obtain and, optionally, clear the alerts by calling
kernel_cpumcf_alert() to specifically handle alerts.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Make the struct cpu_cf_events and the respective per-CPU variable available
to in-kernel users. Access to this per-CPU variable shall be done between
the calls to __kernel_cpumcf_begin() and __kernel_cpumcf_end().
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Rename the struct cpu_hw_events to cpu_cf_events and also the respective
per-CPU variable to make its name more clear. No functional changes.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Prepare the counter facility support to be used by other in-kernel
users. The first step introduces the __kernel_cpumcf_begin() and
__kernel_cpumcf_end() functions to reserve the counter facility
for doing measurements and to release after the measurements are
done.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Move counter set specific controls and functions to the asm/cpu_mcf.h
header file containg all counter facility support definitions. Also
adapt few variable names and header file includes. No functional changes.
Signed-off-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
On s390 command perf top fails
[root@s35lp76 perf] # ./perf top -F100000 --stdio
Error:
cycles: PMU Hardware doesn't support sampling/overflow-interrupts.
Try 'perf stat'
[root@s35lp76 perf] #
Using event -e rb0000 works as designed. Event rb0000 is the event
number of the sampling facility for basic sampling.
During system start up the following PMUs are installed in the kernel's
PMU list (from head to tail):
cpum_cf --> s390 PMU counter facility device driver
cpum_sf --> s390 PMU sampling facility device driver
uprobe
kprobe
tracepoint
task_clock
cpu_clock
Perf top executes following functions and calls perf_event_open(2) system
call with different parameters many times:
cmd_top
--> __cmd_top
--> perf_evlist__add_default
--> __perf_evlist__add_default
--> perf_evlist__new_cycles (creates event type:0 (HW)
config 0 (CPU_CYCLES)
--> perf_event_attr__set_max_precise_ip
Uses perf_event_open(2) to detect correct
precise_ip level. Fails 3 times on s390 which is ok.
Then functions cmd_top
--> __cmd_top
--> perf_top__start_counters
-->perf_evlist__config
--> perf_can_comm_exec
--> perf_probe_api
This functions test support for the following events:
"cycles:u", "instructions:u", "cpu-clock:u" using
--> perf_do_probe_api
--> perf_event_open_cloexec
Test the close on exec flag support with
perf_event_open(2).
perf_do_probe_api returns true if the event is
supported.
The function returns true because event cpu-clock is
supported by the PMU cpu_clock.
This is achieved by many calls to perf_event_open(2).
Function perf_top__start_counters now calls perf_evsel__open() for every
event, which is the default event cpu_cycles (config:0) and type HARDWARE
(type:0) which a predfined frequence of 4000.
Given the above order of the PMU list, the PMU cpum_cf gets called first
and returns 0, which indicates support for this sampling. The event is
fully allocated in the function perf_event_open (file kernel/event/core.c
near line 10521 and the following check fails:
event = perf_event_alloc(&attr, cpu, task, group_leader, NULL,
NULL, NULL, cgroup_fd);
if (IS_ERR(event)) {
err = PTR_ERR(event);
goto err_cred;
}
if (is_sampling_event(event)) {
if (event->pmu->capabilities & PERF_PMU_CAP_NO_INTERRUPT) {
err = -EOPNOTSUPP;
goto err_alloc;
}
}
The check for the interrupt capabilities fails and the system call
perf_event_open() returns -EOPNOTSUPP (-95).
Add a check to return -ENODEV when sampling is requested in PMU cpum_cf.
This allows common kernel code in the perf_event_open() system call to
test the next PMU in above list.
Fixes: 97b1198fec (" "s390, perf: Use common PMU interrupt disabled code")
Signed-off-by: Thomas Richter <tmricht@linux.ibm.com>
Reviewed-by: Hendrik Brueckner <brueckner@linux.ibm.com>
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>