Commit Graph

731 Commits

Author SHA1 Message Date
Pu Lehui
c625154993
drivers/perf: riscv: Align errno for unsupported perf event
RISC-V perf driver does not yet support PERF_TYPE_BREAKPOINT. It would
be more appropriate to return -EOPNOTSUPP or -ENOENT for this type in
pmu_sbi_event_map. Considering that other implementations return -ENOENT
for unsupported perf types, let's synchronize this behavior. Due to this
reason, a riscv bpf testcases perf_skip fail. Meanwhile, align that
behavior to the rest of proper place.

Signed-off-by: Pu Lehui <pulehui@huawei.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Fixes: 9b3e150e31 ("RISC-V: Add a simple platform driver for RISC-V legacy perf")
Fixes: 16d3b1af09 ("perf: RISC-V: Check standard event availability")
Fixes: e999143459 ("RISC-V: Add perf platform driver based on SBI PMU extension")
Cc: stable@vger.kernel.org
Link: https://lore.kernel.org/r/20240831071520.1630360-1-pulehui@huaweicloud.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-10-01 02:47:39 -07:00
Linus Torvalds
97d8894b6f RISC-V Patches for the 6.12 Merge Window, Part 1
* Support for using Zkr to seed KASLR.
 * Support for IPI-triggered CPU backtracing.
 * Support for generic CPU vulnerabilities reporting to userspace.
 * A few cleanups for missing licenses.
 * The size limit on the XIP kernel has been removed.
 * Support for tracing userspace stacks.
 * Support for the Svvptc extension.
 * Various cleanups and fixes throughout the tree.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmbykZITHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiTDLEACABTPlMX+Ke1lMYWj6MLBTXMSlQJ6w
 TKLVk3slp4POPsd+ViWy4J6EJDpCkNp6iK30Bv4AGainA3RJnbS7neKYy+MTw0ZV
 pJWTn73sxHeF+APnZ+geFYcmyteL/SZplgHgwLipFaojs7i/+tOvLFRRD3LueCz6
 Q6Muvke9R5uN6LL3tUrmIhJeyZjOwJtKEYoRz6Fo5Tq3RRK0VHYZkdpqRQ86rr9U
 yNbRNlBLlANstq8xjtiwm22ImPGIsvvhs0AvFft0H72zhf3Tfy0VcTLlDJYwkdKN
 Bz59v4N4jg1ajXuAsj4/BQdj5uRkzJNZ3Yq1yvMmGI+2UV1tInHEMeZi63+4gs8F
 0FYCziA+j6/08u8v8CrwdDOpJ9Iq/NMV6359bt5ySu3rM6QnPRGnHGkv4Ptk9Oyh
 sSPiuHS8YxpRBOB8xXNp5xFipTZ4+65VqCz253pAsbbp5aZ/o9Jw/bbBFFWcSsRZ
 tV2xhELVzPiC7dowfYkQhcNZOLlCaJ/Mvo2nMOtjbUwzaXkcjIRcYEy7+dTlfXyr
 3h5sStjWAKEmtLEvCYSI+lAT544tbV1x+lCMJGuvztapsTMtAP4GpQKblXl5qqnV
 L+VWIPJV1elI26H/p/Max9Z1s48NtfwoJSRL7Rnaya6uJ3iWG/BtajHlFbvgOkJf
 ObPZ//Yr9e91gg==
 =zDwL
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull RISC-V updates from Palmer Dabbelt:

 - Support using Zkr to seed KASLR

 - Support IPI-triggered CPU backtracing

 - Support for generic CPU vulnerabilities reporting to userspace

 - A few cleanups for missing licenses

 - The size limit on the XIP kernel has been removed

 - Support for tracing userspace stacks

 - Support for the Svvptc extension

 - Various cleanups and fixes throughout the tree

* tag 'riscv-for-linus-6.12-mw1' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (47 commits)
  crash: Fix riscv64 crash memory reserve dead loop
  perf/riscv-sbi: Add platform specific firmware event handling
  tools: Optimize ring buffer for riscv
  tools: Add riscv barrier implementation
  RISC-V: Don't have MAX_PHYSMEM_BITS exceed phys_addr_t
  ACPI: NUMA: initialize all values of acpi_early_node_map to NUMA_NO_NODE
  riscv: Enable bitops instrumentation
  riscv: Omit optimized string routines when using KASAN
  ACPI: RISCV: Make acpi_numa_get_nid() to be static
  riscv: Randomize lower bits of stack address
  selftests: riscv: Allow mmap test to compile on 32-bit
  riscv: Make riscv_isa_vendor_ext_andes array static
  riscv: Use LIST_HEAD() to simplify code
  riscv: defconfig: Disable RZ/Five peripheral support
  RISC-V: Implement kgdb_roundup_cpus() to enable future NMI Roundup
  riscv: avoid Imbalance in RAS
  riscv: cacheinfo: Add back init_cache_level() function
  riscv: Remove unused _TIF_WORK_MASK
  drivers/perf: riscv: Remove redundant macro check
  riscv: define ILLEGAL_POINTER_VALUE for 64bit
  ...
2024-09-24 10:59:17 -07:00
Mayuresh Chitale
f0c9363db2
perf/riscv-sbi: Add platform specific firmware event handling
The SBI v2.0 specification pointed to by the link below reserves the
event code 0xffff for platform specific firmware events. Update the driver
to be able to parse and program such events. The platform specific
firmware events must now be specified in the perf command as below:
perf stat -e rCxxx ...
where bits[63:62] = 0x3 of the event config indicate a platform specific
firmware event and xxx indicate the actual event code which is passed
as the event data.

Signed-off-by: Mayuresh Chitale <mchitale@ventanamicro.com>
Link: https://github.com/riscv-non-isa/riscv-sbi-doc/releases/download/v2.0/riscv-sbi.pdf
Link: https://lore.kernel.org/r/20240812051109.6496-1-mchitale@ventanamicro.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-20 05:58:11 -07:00
Linus Torvalds
114143a595 arm64 updates for 6.12
ACPI:
 * Enable PMCG erratum workaround for HiSilicon HIP10 and 11 platforms.
 * Ensure arm64-specific IORT header is covered by MAINTAINERS.
 
 CPU Errata:
 * Enable workaround for hardware access/dirty issue on Ampere-1A cores.
 
 Memory management:
 * Define PHYSMEM_END to fix a crash in the amdgpu driver.
 * Avoid tripping over invalid kernel mappings on the kexec() path.
 * Userspace support for the Permission Overlay Extension (POE) using
   protection keys.
 
 Perf and PMUs:
 * Add support for the "fixed instruction counter" extension in the CPU
   PMU architecture.
 * Extend and fix the event encodings for Apple's M1 CPU PMU.
 * Allow LSM hooks to decide on SPE permissions for physical profiling.
 * Add support for the CMN S3 and NI-700 PMUs.
 
 Confidential Computing:
 * Add support for booting an arm64 kernel as a protected guest under
   Android's "Protected KVM" (pKVM) hypervisor.
 
 Selftests:
 * Fix vector length issues in the SVE/SME sigreturn tests
 * Fix build warning in the ptrace tests.
 
 Timers:
 * Add support for PR_{G,S}ET_TSC so that 'rr' can deal with
   non-determinism arising from the architected counter.
 
 Miscellaneous:
 * Rework our IPI-based CPU stopping code to try NMIs if regular IPIs
   don't succeed.
 * Minor fixes and cleanups.
 -----BEGIN PGP SIGNATURE-----
 
 iQFEBAABCgAuFiEEPxTL6PPUbjXGY88ct6xw3ITBYzQFAmbkVNEQHHdpbGxAa2Vy
 bmVsLm9yZwAKCRC3rHDchMFjNKeIB/9YtbN7JMgsXktM94GP03r3tlFF36Y1S51S
 +zdDZclAVZCTCZN+PaFeAZ/+ah2EQYrY6rtDoHUSEMQdF9kH+ycuIPDTwaJ4Qkam
 QKXMpAgtY/4yf2rX4lhDF8rEvkhLDsu7oGDhqUZQsA33GrMBHfgA3oqpYwlVjvGq
 gkm7olTo9LdWAxkPpnjGrjB6Mv5Dq8dJRhW+0Q5AntI5zx3RdYGJZA9GUSzyYCCt
 FIYOtMmWPkQ0kKxIVxOxAOm/ubhfyCs2sjSfkaa3vtvtt+Yjye1Xd81rFciIbPgP
 QlK/Mes2kBZmjhkeus8guLI5Vi7tx3DQMkNqLXkHAAzOoC4oConE
 =6osL
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Will Deacon:
 "The highlights are support for Arm's "Permission Overlay Extension"
  using memory protection keys, support for running as a protected guest
  on Android as well as perf support for a bunch of new interconnect
  PMUs.

  Summary:

  ACPI:
   - Enable PMCG erratum workaround for HiSilicon HIP10 and 11
     platforms.
   - Ensure arm64-specific IORT header is covered by MAINTAINERS.

  CPU Errata:
   - Enable workaround for hardware access/dirty issue on Ampere-1A
     cores.

  Memory management:
   - Define PHYSMEM_END to fix a crash in the amdgpu driver.
   - Avoid tripping over invalid kernel mappings on the kexec() path.
   - Userspace support for the Permission Overlay Extension (POE) using
     protection keys.

  Perf and PMUs:
   - Add support for the "fixed instruction counter" extension in the
     CPU PMU architecture.
   - Extend and fix the event encodings for Apple's M1 CPU PMU.
   - Allow LSM hooks to decide on SPE permissions for physical
     profiling.
   - Add support for the CMN S3 and NI-700 PMUs.

  Confidential Computing:
   - Add support for booting an arm64 kernel as a protected guest under
     Android's "Protected KVM" (pKVM) hypervisor.

  Selftests:
   - Fix vector length issues in the SVE/SME sigreturn tests
   - Fix build warning in the ptrace tests.

  Timers:
   - Add support for PR_{G,S}ET_TSC so that 'rr' can deal with
     non-determinism arising from the architected counter.

  Miscellaneous:
   - Rework our IPI-based CPU stopping code to try NMIs if regular IPIs
     don't succeed.
   - Minor fixes and cleanups"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (94 commits)
  perf: arm-ni: Fix an NULL vs IS_ERR() bug
  arm64: hibernate: Fix warning for cast from restricted gfp_t
  arm64: esr: Define ESR_ELx_EC_* constants as UL
  arm64: pkeys: remove redundant WARN
  perf: arm_pmuv3: Use BR_RETIRED for HW branch event if enabled
  MAINTAINERS: List Arm interconnect PMUs as supported
  perf: Add driver for Arm NI-700 interconnect PMU
  dt-bindings/perf: Add Arm NI-700 PMU
  perf/arm-cmn: Improve format attr printing
  perf/arm-cmn: Clean up unnecessary NUMA_NO_NODE check
  arm64/mm: use lm_alias() with addresses passed to memblock_free()
  mm: arm64: document why pte is not advanced in contpte_ptep_set_access_flags()
  arm64: Expose the end of the linear map in PHYSMEM_END
  arm64: trans_pgd: mark PTEs entries as valid to avoid dead kexec()
  arm64/mm: Delete __init region from memblock.reserved
  perf/arm-cmn: Support CMN S3
  dt-bindings: perf: arm-cmn: Add CMN S3
  perf/arm-cmn: Refactor DTC PMU register access
  perf/arm-cmn: Make cycle counts less surprising
  perf/arm-cmn: Improve build-time assertion
  ...
2024-09-16 06:55:07 +02:00
Xiao Wang
1e206fad76
drivers/perf: riscv: Remove redundant macro check
The macro CONFIG_RISCV_PMU must have been defined when riscv_pmu.c gets
compiled, so this patch removes the redundant check.

Signed-off-by: Xiao Wang <xiao.w.wang@intel.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240708121224.1148154-1-xiao.w.wang@intel.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-15 20:15:48 -07:00
Dan Carpenter
2e091a805f perf: arm-ni: Fix an NULL vs IS_ERR() bug
The devm_ioremap() function never returns error pointers, it returns a
NULL pointer if there is an error.

Fixes: 4d5a7680f2 ("perf: Add driver for Arm NI-700 interconnect PMU")
Signed-off-by: Dan Carpenter <dan.carpenter@linaro.org>
Reviewed-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/04d6ccc3-6d31-4f0f-ab0f-7a88342cc09a@stanley.mountain
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-12 12:49:33 +01:00
Alexandre Ghiti
2840dadf0d
drivers: perf: Fix smp_processor_id() use in preemptible code
As reported in [1], the use of smp_processor_id() in
pmu_sbi_device_probe() must be protected by disabling the preemption, so
simple use get_cpu()/put_cpu() instead.

Reported-by: Nam Cao <namcao@linutronix.de>
Closes: https://lore.kernel.org/linux-riscv/20240820074925.ReMKUPP3@linutronix.de/ [1]
Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
Reviewed-by: Anup Patel <anup@brainfault.org>
Tested-by: Nam Cao <namcao@linutronix.de>
Fixes: a8625217a0 ("drivers/perf: riscv: Implement SBI PMU snapshot function")
Reported-by: Andrea Parri <parri.andrea@gmail.com>
Tested-by: Andrea Parri <parri.andrea@gmail.com>
Link: https://lore.kernel.org/r/20240826165210.124696-1-alexghiti@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-09-10 20:28:02 -07:00
Ilkka Koskinen
5967a19f1c perf: arm_pmuv3: Use BR_RETIRED for HW branch event if enabled
The PMU driver attempts to use PC_WRITE_RETIRED for the HW branch event,
if enabled. However, PC_WRITE_RETIRED counts only taken branches,
whereas BR_RETIRED counts also non-taken ones.

Furthermore, perf uses HW branch event to calculate branch misses ratio,
implying BR_RETIRED is the correct event to count.

We keep PC_WRITE_RETIRED still as an option in case BR_RETIRED isn't
implemented.

Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/20240906191539.4847-1-ilkka@os.amperecomputing.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-09 15:28:02 +01:00
Robin Murphy
4d5a7680f2 perf: Add driver for Arm NI-700 interconnect PMU
The Arm NI-700 Network-on-Chip Interconnect has a relatively
straightforward design with a hierarchy of voltage, power, and clock
domains, where each clock domain then contains a number of interface
units and a PMU which can monitor events thereon. As such, it begets a
relatively straightforward driver to interface those PMUs with perf.

Even more so than with arm-cmn, users will require detailed knowledge of
the wider system topology in order to meaningfully analyse anything,
since the interconnect itself cannot know what lies beyond the boundary
of each inscrutably-numbered interface. Given that, for now they are
also expected to refer to the NI-700 documentation for the relevant
event IDs to provide as well. An identifier is implemented so we can
come back and add jevents if anyone really wants to.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/9933058d0ab8138c78a61cd6852ea5d5ff48e393.1725470837.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-06 12:58:28 +01:00
Robin Murphy
f32efa3e4b perf/arm-cmn: Improve format attr printing
Take full advantage of our formats being stored in bitfield form, and
make the printing even more robust and simple by letting printk do all
the hard work of formatting bitlists.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Link: https://lore.kernel.org/r/50459f2d48fc62310a566863dbf8a7c14361d363.1725474584.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-06 12:58:06 +01:00
Robin Murphy
f04b611e66 perf/arm-cmn: Clean up unnecessary NUMA_NO_NODE check
Checking for NUMA_NO_NODE is a misleading and, on reflection, entirely
unnecessary micro-optimisation. If it ever did happen that an incoming
CPU has no NUMA affinity while the current CPU does, a questionably-
useful PMU migration isn't the biggest thing wrong with that picture...

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/00634da33c21269a00844140afc7cc3a2ac1eb4d.1725474584.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-06 12:58:06 +01:00
Robin Murphy
0dc2f4963f perf/arm-cmn: Support CMN S3
CMN S3 is the latest and greatest evolution for 2024, although most of
the new features don't impact the PMU, so from our point of view it ends
up looking a lot like CMN-700 r3 still. We have some new device types to
ignore, a mildly irritating rearrangement of the register layouts, and a
scary new configuration option that makes it potentially unsafe to even
walk the full discovery tree, let alone attempt to use the PMU.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/2ec9eec5b6bf215a9886f3b69e3b00e4cd85095c.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:08 +01:00
Robin Murphy
67acca3504 perf/arm-cmn: Refactor DTC PMU register access
Annoyingly, we're soon going to have to cope with PMU registers moving
about. This will mostly be straightforward, except for the hard-coding
of CMN_PMU_OFFSET for the DTC PMU registers. As a first step, refactor
those accessors to allow for encapsulating a variable offset without
making a big mess all over. As a bonus, we can repack the arm_cmn_dtc
structure to accommodate the new pointer without growing any larger,
since irq_friend only encodes a range of +/-3.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/fc677576fae7b5b55780e5b245a4ef6ea1b30daf.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:08 +01:00
Robin Murphy
c5b15ddf11 perf/arm-cmn: Make cycle counts less surprising
By default, CMN has automatic clock-gating with the implication that
a DTC's cycle counter may not increment while the DTC is sufficiently
idle. Given that we may have up to 4 DTCs to choose from when scheduling
a cycles event, this may potentially lead to surprising results if
trying to measure metrics based on activity in a different DTC domain
from where cycles end up being counted. Furthermore, since the details
of internal clock gating are not documented, we can't even reason about
what "active" cycles for a DTC actually mean relative to the activity of
other nodes within the same nominal DTC domain.

Make the reasonable assumption that if the user wants to count cycles,
they almost certainly want to count all of the cycles, and disable clock
gating while a DTC's cycle counter is in use.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/c47cfdc09e907b1d7753d142a7e659982cceb246.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:08 +01:00
Robin Murphy
ff436cee69 perf/arm-cmn: Improve build-time assertion
These days we can use static_assert() in the logical place rather than
jamming a BUILD_BUG_ON() into the nearest function scope.

Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/224ee8286f299100f1c768edb254edc898539f50.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:08 +01:00
Robin Murphy
359414b33e perf/arm-cmn: Ensure dtm_idx is big enough
While CMN_MAX_DIMENSION was bumped to 12 for CMN-650, that only supports
up to a 10x10 mesh, so bumping dtm_idx to 256 bits at the time worked
out OK in practice. However CMN-700 did finally support up to 144 XPs,
and thus needs a worst-case 288 bits of dtm_idx for an aggregated XP
event on a maxed-out config. Oops.

Fixes: 23760a0144 ("perf/arm-cmn: Add CMN-700 support")
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/e771b358526a0d7fc06efee2c3a2fdc0c9f51d44.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:08 +01:00
Robin Murphy
88b63a82c8 perf/arm-cmn: Fix CCLA register offset
Apparently pmu_event_sel is offset by 8 for all CCLA nodes, not just
the CCLA_RNI combination type.

Fixes: 23760a0144 ("perf/arm-cmn: Add CMN-700 support")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Ilkka Koskinen <ilkka@os.amperecomputing.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/6e7bb06fef6046f83e7647aad0e5be544139763f.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:07 +01:00
Robin Murphy
e79634b53e perf/arm-cmn: Refactor node ID handling. Again.
The scope of the "extra device ports" configuration is not made clear by
the CMN documentation - so far we've assumed it applies globally, based
on the sole example which suggests as much. However it transpires that
this is incorrect, and the format does in fact vary based on each
individual XP's port configuration. As a consequence, we're currenly
liable to decode the port/device indices from a node ID incorrectly,
thus program the wrong event source in the DTM leading to bogus event
counts, and also show device topology on the wrong ports in debugfs.

To put this right, rework node IDs yet again to carry around the
additional data necessary to decode them properly per-XP. At this point
the notion of fully decomposing an ID becomes more impractical than it's
worth, so unabstracting the XY mesh coordinates (where 2/3 users were
just debug anyway) ends up leaving things a bit simpler overall.

Fixes: 60d1504070 ("perf/arm-cmn: Support new IP features")
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Robin Murphy <robin.murphy@arm.com>
Link: https://lore.kernel.org/r/5195f990152fc37adba5fbf5929a6b11063d9f09.1725296395.git.robin.murphy@arm.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-09-04 16:04:07 +01:00
Yicong Yang
d1c93d5c67 drivers/perf: hisi_pcie: Export supported Root Ports [bdf_min, bdf_max]
Currently users can get the Root Ports supported by the PCIe PMU by
"bus" sysfs attributes which indicates the PCIe bus number where
Root Ports are located. This maybe insufficient since Root Ports
supported by different PCIe PMUs may be located on the same PCIe bus.
So export the BDF range the Root Ports additionally.

Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240829090332.28756-4-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30 11:43:10 +01:00
Yicong Yang
17bf68aeb3 drivers/perf: hisi_pcie: Fix TLP headers bandwidth counting
We make the initial value of event ctrl register as HISI_PCIE_INIT_SET
and modify according to the user options. This will make TLP headers
bandwidth only counting never take effect since HISI_PCIE_INIT_SET
configures to count the TLP payloads bandwidth. Fix this by making
the initial value of event ctrl register as 0.

Fixes: 17d573984d ("drivers/perf: hisi: Add TLP filter support")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240829090332.28756-3-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30 11:43:10 +01:00
Yicong Yang
daecd3373a drivers/perf: hisi_pcie: Record hardware counts correctly
Currently we set the period and record it as the initial value of the
counter without checking it's set to the hardware successfully or not.
However the counter maybe unwritable if the target event is unsupported
by the device. In such case we will pass user a wrong count:

[start counts when setting the period]
hwc->prev_count = 0x8000000000000000
device.counter_value = 0 // the counter is not set as the period
[when user reads the counter]
event->count = device.counter_value - hwc->prev_count
             = 0x8000000000000000 // wrong. should be 0.

Fix this by record the hardware counter counts correctly when setting
the period.

Fixes: 8404b0fbc7 ("drivers/perf: hisi: Add driver for HiSilicon PCIe PMU")
Signed-off-by: Yicong Yang <yangyicong@hisilicon.com>
Acked-by: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Link: https://lore.kernel.org/r/20240829090332.28756-2-yangyicong@huawei.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30 11:43:09 +01:00
James Clark
5e9629d0ae drivers/perf: arm_spe: Use perf_allow_kernel() for permissions
Use perf_allow_kernel() for 'pa_enable' (physical addresses),
'pct_enable' (physical timestamps) and context IDs. This means that
perf_event_paranoid is now taken into account and LSM hooks can be used,
which is more consistent with other perf_event_open calls. For example
PERF_SAMPLE_PHYS_ADDR uses perf_allow_kernel() rather than just
perfmon_capable().

This also indirectly fixes the following error message which is
misleading because perf_event_paranoid is not taken into account by
perfmon_capable():

  $ perf record -e arm_spe/pa_enable/

  Error:
  Access to performance monitoring and observability operations is
  limited. Consider adjusting /proc/sys/kernel/perf_event_paranoid
  setting ...

Suggested-by: Al Grant <al.grant@arm.com>
Signed-off-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20240827145113.1224604-1-james.clark@linaro.org
Link: https://lore.kernel.org/all/20240807120039.GD37996@noisy.programming.kicks-ass.net/
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-30 11:42:24 +01:00
Krishna chaitanya chundru
db9e7a83d3 perf/dwc_pcie: Add support for QCOM vendor devices
Update the vendor table with QCOM PCIe vendorid.

Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240816-dwc_pmu_fix-v2-4-198b8ab1077c@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-23 16:07:26 +01:00
Krishna chaitanya chundru
b94b05478f perf/dwc_pcie: Always register for PCIe bus notifier
When the PCIe devices are discovered late, the driver can't find
the PCIe devices and returns in the init without registering with
the bus notifier. Due to that the devices which are discovered late
the driver can't register for this.

Register for bus notifier & driver even if the device is not found
as part of init.

Fixes: af9597adc2 ("drivers/perf: add DesignWare PCIe PMU driver")
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240816-dwc_pmu_fix-v2-3-198b8ab1077c@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-23 16:07:26 +01:00
Krishna chaitanya chundru
e669388537 perf/dwc_pcie: Fix registration issue in multi PCIe controller instances
When there are multiple of instances of PCIe controllers, registration
to perf driver fails with this error.
sysfs: cannot create duplicate filename '/devices/platform/dwc_pcie_pmu.0'
CPU: 0 PID: 166 Comm: modprobe Not tainted 6.10.0-rc2-next-20240607-dirty
Hardware name: Qualcomm SA8775P Ride (DT)
Call trace:
 dump_backtrace.part.8+0x98/0xf0
 show_stack+0x14/0x1c
 dump_stack_lvl+0x74/0x88
 dump_stack+0x14/0x1c
 sysfs_warn_dup+0x60/0x78
 sysfs_create_dir_ns+0xe8/0x100
 kobject_add_internal+0x94/0x224
 kobject_add+0xa8/0x118
 device_add+0x298/0x7b4
 platform_device_add+0x1a0/0x228
 platform_device_register_full+0x11c/0x148
 dwc_pcie_register_dev+0x74/0xf0 [dwc_pcie_pmu]
 dwc_pcie_pmu_init+0x7c/0x1000 [dwc_pcie_pmu]
 do_one_initcall+0x58/0x1c0
 do_init_module+0x58/0x208
 load_module+0x1804/0x188c
 __do_sys_init_module+0x18c/0x1f0
 __arm64_sys_init_module+0x14/0x1c
 invoke_syscall+0x40/0xf8
 el0_svc_common.constprop.1+0x70/0xf4
 do_el0_svc+0x18/0x20
 el0_svc+0x28/0xb0
 el0t_64_sync_handler+0x9c/0xc0
 el0t_64_sync+0x160/0x164
kobject: kobject_add_internal failed for dwc_pcie_pmu.0 with -EEXIST,
don't try to register things with the same name in the same directory.

This is because of having same bdf value for devices under two different
controllers.

Update the logic to use sbdf which is a unique number in case of
multi instance also.

Fixes: af9597adc2 ("drivers/perf: add DesignWare PCIe PMU driver")
Signed-off-by: Krishna chaitanya chundru <quic_krichai@quicinc.com>
Reviewed-by: Yicong Yang <yangyicong@hisilicon.com>
Link: https://lore.kernel.org/r/20240816-dwc_pmu_fix-v2-1-198b8ab1077c@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-23 16:07:25 +01:00
Jing Zhang
a3dd920977 drivers/perf: Fix ali_drw_pmu driver interrupt status clearing
The alibaba_uncore_pmu driver forgot to clear all interrupt status
in the interrupt processing function. After the PMU counter overflow
interrupt occurred, an interrupt storm occurred, causing the system
to hang.

Therefore, clear the correct interrupt status in the interrupt handling
function to fix it.

Fixes: cf7b61073e ("drivers/perf: add DDR Sub-System Driveway PMU driver for Yitian 710 SoC")
Signed-off-by: Jing Zhang <renyu.zj@linux.alibaba.com>
Reviewed-by: Shuai Xue <xueshuai@linux.alibaba.com>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/1724297611-20686-1-git-send-email-renyu.zj@linux.alibaba.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-23 16:03:56 +01:00
Yangyu Chen
3cce331ee2 drivers/perf: apple_m1: add known PMU events
This patch adds known PMU events that can be found on /usr/share/kpep in
macOS. The m1_pmu_events and m1_pmu_event_affinity are generated from
the script [1], which consumes the plist file from Apple. And then added
these events to m1_pmu_perf_map and m1_pmu_event_attrs with Apple's
documentation [2].

Link: https://github.com/cyyself/m1-pmu-gen [1]
Link: https://developer.apple.com/download/apple-silicon-cpu-optimization-guide/ [2]
Signed-off-by: Yangyu Chen <cyy@cyyself.name>
Acked-by: Hector Martin <marcan@marcan.st>
Link: https://lore.kernel.org/r/tencent_C5DA658E64B8D13125210C8D707CD8823F08@qq.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-23 15:52:48 +01:00
Rob Herring (Arm)
d8226d8cfb perf: arm_pmuv3: Add support for Armv9.4 PMU instruction counter
Armv9.4/8.9 PMU adds optional support for a fixed instruction counter
similar to the fixed cycle counter. Support for the feature is indicated
in the ID_AA64DFR1_EL1 register PMICNTR field. The counter is not
accessible in AArch32.

Existing userspace using direct counter access won't know how to handle
the fixed instruction counter, so we have to avoid using the counter
when user access is requested.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Tested-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-7-280a8d7ff465@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-16 13:09:12 +01:00
Rob Herring (Arm)
126d7d7cce arm64: perf/kvm: Use a common PMU cycle counter define
The PMUv3 and KVM code each have a define for the PMU cycle counter
index. Move KVM's define to a shared location and use it for PMUv3
driver.

Reviewed-by: Marc Zyngier <maz@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Tested-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-5-280a8d7ff465@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-16 13:09:12 +01:00
Rob Herring (Arm)
a4a6e2078d perf: arm_pmuv3: Prepare for more than 32 counters
Various PMUv3 registers which are a mask of counters are 64-bit
registers, but the accessor functions take a u32. This has been fine as
the upper 32-bits have been RES0 as there has been a maximum of 32
counters prior to Armv9.4/8.9. With Armv9.4/8.9, a 33rd counter is
added. Update the accessor functions to use a u64 instead.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Tested-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-2-280a8d7ff465@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-16 13:09:12 +01:00
Rob Herring (Arm)
bf5ffc8c80 perf: arm_pmu: Remove event index to counter remapping
Xscale and Armv6 PMUs defined the cycle counter at 0 and event counters
starting at 1 and had 1:1 event index to counter numbering. On Armv7 and
later, this changed the cycle counter to 31 and event counters start at
0. The drivers for Armv7 and PMUv3 kept the old event index numbering
and introduced an event index to counter conversion. The conversion uses
masking to convert from event index to a counter number. This operation
relies on having at most 32 counters so that the cycle counter index 0
can be transformed to counter number 31.

Armv9.4 adds support for an additional fixed function counter
(instructions) which increases possible counters to more than 32, and
the conversion won't work anymore as a simple subtract and mask. The
primary reason for the translation (other than history) seems to be to
have a contiguous mask of counters 0-N. Keeping that would result in
more complicated index to counter conversions. Instead, store a mask of
available counters rather than just number of events. That provides more
information in addition to the number of events.

No (intended) functional changes.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Tested-by: James Clark <james.clark@linaro.org>
Link: https://lore.kernel.org/r/20240731-arm-pmu-3-9-icntr-v3-1-280a8d7ff465@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-16 13:09:11 +01:00
Rob Herring (Arm)
48b035121a perf: arm_pmu: Use of_property_present()
Use of_property_present() to test for property presence rather than
of_find_property(). This is part of a larger effort to remove callers
of of_find_property() and similar functions. of_find_property() leaks
the DT struct property and data pointers which is a problem for
dynamically allocated nodes which may be freed.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
Link: https://lore.kernel.org/r/20240731191312.1710417-15-robh@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-08-16 13:08:21 +01:00
Shifrin Dmitry
941a8e9b7a
perf: riscv: Fix selecting counters in legacy mode
It is required to check event type before checking event config.
Events with the different types can have the same config.
This check is missed for legacy mode code

For such perf usage:
    sysctl -w kernel.perf_user_access=2
    perf stat -e cycles,L1-dcache-loads --
driver will try to force both events to CYCLE counter.

This commit implements event type check before forcing
events on the special counters.

Signed-off-by: Shifrin Dmitry <dmitry.shifrin@syntacore.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Fixes: cc4c07c89a ("drivers: perf: Implement perf event mmap support in the SBI backend")
Link: https://lore.kernel.org/r/20240729125858.630653-1-dmitry.shifrin@syntacore.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-08-01 07:15:13 -07:00
Linus Torvalds
c9f33436d8 RISC-V Patches for the 6.11 Merge Window, Part 2
* Support for NUMA (via SRAT and SLIT), console output (via SPCR), and
   cache info (via PPTT) on ACPI-based systems.
 * The trap entry/exit code no longer breaks the return address stack
   predictor on many systems, which results in an improvement to trap
   latency.
 * Support for HAVE_ARCH_STACKLEAK.
 * The sv39 linear map has been extended to support 128GiB mappings.
 * The frequency of the mtime CSR is now visible via hwprobe.
 -----BEGIN PGP SIGNATURE-----
 
 iQJHBAABCAAxFiEEKzw3R0RoQ7JKlDp6LhMZ81+7GIkFAmaj2EYTHHBhbG1lckBk
 YWJiZWx0LmNvbQAKCRAuExnzX7sYiVG3D/9kNHTI09iPDJd6fTChE3cpMxy7xXXE
 URX3Avu+gYsJmIbYyg4RnQ8FGFN7icKBCrQqs7JmLliU0NU+YMcCcjsJA2QaivbD
 VAlaex1qNcvNGteHrpbqhr3Zs4zw8GlBkB3KFTLyPAp61bybGo0a/A5ONJ7ScQIW
 RWHewAPgb86cQ0Q34JpO87TqvMM0KMvhQP5dip+olaFjLRBzhXmGFZfHqA80kTWl
 0ytYclVCHZMtO/5mnQpuIOVs1IKw9L4wa0sivOQF0iLTqfKDFALa6yZsThHA/w3e
 JVuBAdQhcPZ3fgO2fUfJPlW16GmRC2/tdiFg5NFw8k4vo7DYBwX55ztPKXqDrJDM
 8ah85IeLiPar/A/uHdn6bPjK+aGMuzklKF50r62XXAc2fL8mza1sdvKCVOy2EOLn
 JyGI9c/10KpvN/DW8g7hPefhvbx4+tCKkFcPqf++VQha6W8cQdCKi+Li0Pm8TTnp
 XPQjIvSlDDG1Pl4ofgBSFoyB8pkBXNzvv8NZp+YYtnqSOLAKaZuP+KwA8TwHdvGM
 pdCXcL3KHiLy4/pJWEoNTutD0mbJ7PUIb2P/KkjqYDgp4F1n0Hg+/aeSIp+7a4Pv
 yTBctIGxrlriQMIdtWCR8tyhcPP4pDpGYkW0K15EE16G0NK0fjD89LEXYqT6ae2R
 C0QgiwnVe/eopg==
 =zeUn
 -----END PGP SIGNATURE-----

Merge tag 'riscv-for-linus-6.11-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux

Pull more RISC-V updates from Palmer Dabbelt:

 - Support for NUMA (via SRAT and SLIT), console output (via SPCR), and
   cache info (via PPTT) on ACPI-based systems.

 - The trap entry/exit code no longer breaks the return address stack
   predictor on many systems, which results in an improvement to trap
   latency.

 - Support for HAVE_ARCH_STACKLEAK.

 - The sv39 linear map has been extended to support 128GiB mappings.

 - The frequency of the mtime CSR is now visible via hwprobe.

* tag 'riscv-for-linus-6.11-mw2' of git://git.kernel.org/pub/scm/linux/kernel/git/riscv/linux: (21 commits)
  RISC-V: Provide the frequency of time CSR via hwprobe
  riscv: Extend sv39 linear mapping max size to 128G
  riscv: enable HAVE_ARCH_STACKLEAK
  riscv: signal: Remove unlikely() from WARN_ON() condition
  riscv: Improve exception and system call latency
  RISC-V: Select ACPI PPTT drivers
  riscv: cacheinfo: initialize cacheinfo's level and type from ACPI PPTT
  riscv: cacheinfo: remove the useless input parameter (node) of ci_leaf_init()
  RISC-V: ACPI: Enable SPCR table for console output on RISC-V
  riscv: boot: remove duplicated targets line
  trace: riscv: Remove deprecated kprobe on ftrace support
  riscv: cpufeature: Extract common elements from extension checking
  riscv: Introduce vendor variants of extension helpers
  riscv: Add vendor extensions to /proc/cpuinfo
  riscv: Extend cpufeature.c to detect vendor extensions
  RISC-V: run savedefconfig for defconfig
  RISC-V: hwprobe: sort EXT_KEY()s in hwprobe_isa_ext0() alphabetically
  ACPI: NUMA: replace pr_info with pr_debug in arch_acpi_numa_init
  ACPI: NUMA: change the ACPI_NUMA to a hidden option
  ACPI: NUMA: Add handler for SRAT RINTC affinity structure
  ...
2024-07-27 10:14:34 -07:00
Joel Granados
78eb4ea25c sysctl: treewide: constify the ctl_table argument of proc_handlers
const qualify the struct ctl_table argument in the proc_handler function
signatures. This is a prerequisite to moving the static ctl_table
structs into .rodata data which will ensure that proc_handler function
pointers cannot be modified.

This patch has been generated by the following coccinelle script:

```
  virtual patch

  @r1@
  identifier ctl, write, buffer, lenp, ppos;
  identifier func !~ "appldata_(timer|interval)_handler|sched_(rt|rr)_handler|rds_tcp_skbuf_handler|proc_sctp_do_(hmac_alg|rto_min|rto_max|udp_port|alpha_beta|auth|probe_interval)";
  @@

  int func(
  - struct ctl_table *ctl
  + const struct ctl_table *ctl
    ,int write, void *buffer, size_t *lenp, loff_t *ppos);

  @r2@
  identifier func, ctl, write, buffer, lenp, ppos;
  @@

  int func(
  - struct ctl_table *ctl
  + const struct ctl_table *ctl
    ,int write, void *buffer, size_t *lenp, loff_t *ppos)
  { ... }

  @r3@
  identifier func;
  @@

  int func(
  - struct ctl_table *
  + const struct ctl_table *
    ,int , void *, size_t *, loff_t *);

  @r4@
  identifier func, ctl;
  @@

  int func(
  - struct ctl_table *ctl
  + const struct ctl_table *ctl
    ,int , void *, size_t *, loff_t *);

  @r5@
  identifier func, write, buffer, lenp, ppos;
  @@

  int func(
  - struct ctl_table *
  + const struct ctl_table *
    ,int write, void *buffer, size_t *lenp, loff_t *ppos);

```

* Code formatting was adjusted in xfs_sysctl.c to comply with code
  conventions. The xfs_stats_clear_proc_handler,
  xfs_panic_mask_proc_handler and xfs_deprecated_dointvec_minmax where
  adjusted.

* The ctl_table argument in proc_watchdog_common was const qualified.
  This is called from a proc_handler itself and is calling back into
  another proc_handler, making it necessary to change it as part of the
  proc_handler migration.

Co-developed-by: Thomas Weißschuh <linux@weissschuh.net>
Signed-off-by: Thomas Weißschuh <linux@weissschuh.net>
Co-developed-by: Joel Granados <j.granados@samsung.com>
Signed-off-by: Joel Granados <j.granados@samsung.com>
2024-07-24 20:59:29 +02:00
Palmer Dabbelt
b9a603da42
Merge patch series "riscv: Separate vendor extensions from standard extensions"
Charlie Jenkins <charlie@rivosinc.com> says:

All extensions, both standard and vendor, live in one struct
"riscv_isa_ext". There is currently one vendor extension, xandespmu, but
it is likely that more vendor extensions will be added to the kernel in
the future. As more vendor extensions (and standard extensions) are
added, riscv_isa_ext will become more bloated with a mix of vendor and
standard extensions.

This also allows each vendor to be conditionally enabled through
Kconfig.

* b4-shazam-merge:
  riscv: cpufeature: Extract common elements from extension checking
  riscv: Introduce vendor variants of extension helpers
  riscv: Add vendor extensions to /proc/cpuinfo
  riscv: Extend cpufeature.c to detect vendor extensions

Link: https://lore.kernel.org/r/20240719-support_vendor_extensions-v3-0-0af7587bbec0@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22 15:37:01 -07:00
Charlie Jenkins
0f24254111
riscv: Introduce vendor variants of extension helpers
Vendor extensions are maintained in per-vendor structs (separate from
standard extensions which live in riscv_isa). Create vendor variants for
the existing extension helpers to interface with the riscv_isa_vendor
bitmaps.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andy Chiu <andy.chiu@sifive.com>
Link: https://lore.kernel.org/r/20240719-support_vendor_extensions-v3-3-0af7587bbec0@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22 15:36:56 -07:00
Charlie Jenkins
23c996fc2b
riscv: Extend cpufeature.c to detect vendor extensions
Instead of grouping all vendor extensions into the same riscv_isa_ext
that standard instructions use, create a struct
"riscv_isa_vendor_ext_data_list" that allows each vendor to maintain
their vendor extensions independently of the standard extensions.
xandespmu is currently the only vendor extension so that is the only
extension that is affected by this change.

An additional benefit of this is that the extensions of each vendor can
be conditionally enabled. A config RISCV_ISA_VENDOR_EXT_ANDES has been
added to allow for that.

Signed-off-by: Charlie Jenkins <charlie@rivosinc.com>
Reviewed-by: Conor Dooley <conor.dooley@microchip.com>
Reviewed-by: Andy Chiu <andy.chiu@sifive.com>
Tested-by: Yu Chien Peter Lin <peterlin@andestech.com>
Reviewed-by: Yu Chien Peter Lin <peterlin@andestech.com>
Link: https://lore.kernel.org/r/20240719-support_vendor_extensions-v3-1-0af7587bbec0@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-22 15:36:54 -07:00
Linus Torvalds
c89d780cc1 arm64 updates for 6.11:
* Virtual CPU hotplug support for arm64 ACPI systems
 
 * cpufeature infrastructure cleanups and making the FEAT_ECBHB ID bits
   visible to guests
 
 * CPU errata: expand the speculative SSBS workaround to more CPUs
 
 * arm64 ACPI:
 
   - acpi=nospcr option to disable SPCR as default console for arm64
 
   - Move some ACPI code (cpuidle, FFH) to drivers/acpi/arm64/
 
 * GICv3, use compile-time PMR values: optimise the way regular IRQs are
   masked/unmasked when GICv3 pseudo-NMIs are used, removing the need for
   a static key in fast paths by using a priority value chosen
   dynamically at boot time
 
 * arm64 perf updates:
 
   - Rework of the IMX PMU driver to enable support for I.MX95
 
   - Enable support for tertiary match groups in the CMN PMU driver
 
   - Initial refactoring of the CPU PMU code to prepare for the fixed
     instruction counter introduced by Arm v9.4
 
   - Add missing PMU driver MODULE_DESCRIPTION() strings
 
   - Hook up DT compatibles for recent CPU PMUs
 
 * arm64 kselftest updates:
 
   - Kernel mode NEON fp-stress
 
   - Cleanups, spelling mistakes
 
 * arm64 Documentation update with a minor clarification on TBI
 
 * Miscellaneous:
 
   - Fix missing IPI statistics
 
   - Implement raw_smp_processor_id() using thread_info rather than a
     per-CPU variable (better code generation)
 
   - Make MTE checking of in-kernel asynchronous tag faults conditional
     on KASAN being enabled
 
   - Minor cleanups, typos
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEE5RElWfyWxS+3PLO2a9axLQDIXvEFAmaQKN4ACgkQa9axLQDI
 XvE0Nw/+JZ6OEQ+DMUHXZfbWanvn1p0nVOoEV3MYVpOeQK1ILYCoDapatLNIlet0
 wcja7tohKbL1ifc7GOqlkitu824LMlotncrdOBycRqb/4C5KuJ+XhygFv5hGfX0T
 Uh2zbo4w52FPPEUMICfEAHrKT3QB9tv7f66xeUNbWWFqUn3rY02/ZVQVVdw6Zc0e
 fVYWGUUoQDR7+9hRkk6tnYw3+9YFVAUAbLWk+DGrW7WsANi6HuJ/rBMibwFI6RkG
 SZDZHum6vnwx0Dj9H7WrYaQCvUMm7AlckhQGfPbIFhUk6pWysfJtP5Qk49yiMl7p
 oRk/GrSXpiKumuetgTeOHbokiE1Nb8beXx0OcsjCu4RrIaNipAEpH1AkYy5oiKoT
 9vKZErMDtQgd96JHFVaXc+A3D2kxVfkc1u7K3TEfVRnZFV7CN+YL+61iyZ+uLxVi
 d9xrAmwRsWYFVQzlZG3NWvSeQBKisUA1L8JROlzWc/NFDwTqDGIt/zS4pZNL3+OM
 EXW0LyKt7Ijl6vPXKCXqrODRrPlcLc66VMZxofZOl0/dEqyJ+qLL4GUkWZu8lTqO
 BqydYnbTSjiDg/ntWjTrD0uJ8c40Qy7KTPEdaPqEIQvkDEsUGlOnhAQjHrnGNb9M
 psZtpDW2xm7GykEOcd6rgSz4Xeky2iLsaR4Wc7FTyDS0YRmeG44=
 =ob2k
 -----END PGP SIGNATURE-----

Merge tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux

Pull arm64 updates from Catalin Marinas:
 "The biggest part is the virtual CPU hotplug that touches ACPI,
  irqchip. We also have some GICv3 optimisation for pseudo-NMIs that has
  been queued via the arm64 tree. Otherwise the usual perf updates,
  kselftest, various small cleanups.

  Core:

   - Virtual CPU hotplug support for arm64 ACPI systems

   - cpufeature infrastructure cleanups and making the FEAT_ECBHB ID
     bits visible to guests

   - CPU errata: expand the speculative SSBS workaround to more CPUs

   - GICv3, use compile-time PMR values: optimise the way regular IRQs
     are masked/unmasked when GICv3 pseudo-NMIs are used, removing the
     need for a static key in fast paths by using a priority value
     chosen dynamically at boot time

  ACPI:

   - 'acpi=nospcr' option to disable SPCR as default console for arm64

   - Move some ACPI code (cpuidle, FFH) to drivers/acpi/arm64/

  Perf updates:

   - Rework of the IMX PMU driver to enable support for I.MX95

   - Enable support for tertiary match groups in the CMN PMU driver

   - Initial refactoring of the CPU PMU code to prepare for the fixed
     instruction counter introduced by Arm v9.4

   - Add missing PMU driver MODULE_DESCRIPTION() strings

   - Hook up DT compatibles for recent CPU PMUs

  Kselftest updates:

   - Kernel mode NEON fp-stress

   - Cleanups, spelling mistakes

  Miscellaneous:

   - arm64 Documentation update with a minor clarification on TBI

   - Fix missing IPI statistics

   - Implement raw_smp_processor_id() using thread_info rather than a
     per-CPU variable (better code generation)

   - Make MTE checking of in-kernel asynchronous tag faults conditional
     on KASAN being enabled

   - Minor cleanups, typos"

* tag 'arm64-upstream' of git://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux: (69 commits)
  selftests: arm64: tags: remove the result script
  selftests: arm64: tags_test: conform test to TAP output
  perf: add missing MODULE_DESCRIPTION() macros
  arm64: smp: Fix missing IPI statistics
  irqchip/gic-v3: Fix 'broken_rdists' unused warning when !SMP and !ACPI
  ACPI: Add acpi=nospcr to disable ACPI SPCR as default console on ARM64
  Documentation: arm64: Update memory.rst for TBI
  arm64/cpufeature: Replace custom macros with fields from ID_AA64PFR0_EL1
  KVM: arm64: Replace custom macros with fields from ID_AA64PFR0_EL1
  perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h
  perf: arm_v6/7_pmu: Drop non-DT probe support
  perf/arm: Move 32-bit PMU drivers to drivers/perf/
  perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check
  perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold
  arm64: Kconfig: Fix dependencies to enable ACPI_HOTPLUG_CPU
  perf: imx_perf: add support for i.MX95 platform
  perf: imx_perf: fix counter start and config sequence
  perf: imx_perf: refactor driver for imx93
  perf: imx_perf: let the driver manage the counter usage rather the user
  perf: imx_perf: add macro definitions for parsing config attr
  ...
2024-07-15 17:06:19 -07:00
Jeff Johnson
42bebc7cca perf: add missing MODULE_DESCRIPTION() macros
With ARCH=x86, make allmodconfig && make W=1 C=1 reports:
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/perf/arm-ccn.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/perf/fsl_imx8_ddr_perf.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/perf/marvell_cn10k_ddr_pmu.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/perf/arm_cspmu/arm_cspmu_module.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/perf/arm_cspmu/nvidia_cspmu.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/perf/arm_cspmu/ampere_cspmu.o
WARNING: modpost: missing MODULE_DESCRIPTION() in drivers/perf/cxl_pmu.o

Add the missing invocation of the MODULE_DESCRIPTION() macro to all
files which have a MODULE_LICENSE().

This includes drivers/perf/hisilicon/hisi_uncore_pmu.c which, although
it did not produce a warning with the x86 allmodconfig configuration,
may cause this warning with arm64 configurations.

Signed-off-by: Jeff Johnson <quic_jjohnson@quicinc.com>
Link: https://lore.kernel.org/r/20240709-md-drivers-perf-v3-1-513275b75ed0@quicinc.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-07-10 13:04:38 +01:00
Samuel Holland
16d3b1af09
perf: RISC-V: Check standard event availability
The RISC-V SBI PMU specification defines several standard hardware and
cache events. Currently, all of these events are exposed to userspace,
even when not actually implemented. They appear in the `perf list`
output, and commands like `perf stat` try to use them.

This is more than just a cosmetic issue, because the PMU driver's .add
function fails for these events, which causes pmu_groups_sched_in() to
prematurely stop scheduling in other (possibly valid) hardware events.

Add logic to check which events are supported by the hardware (i.e. can
be mapped to some counter), so only usable events are reported to
userspace. Since the kernel does not know the mapping between events and
possible counters, this check must happen during boot, when no counters
are in use. Make the check asynchronous to minimize impact on boot time.

Fixes: e999143459 ("RISC-V: Add perf platform driver based on SBI PMU extension")

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Reviewed-by: Atish Patra <atishp@rivosinc.com>
Tested-by: Atish Patra <atishp@rivosinc.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-3-e01cfddcf035@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03 12:56:22 -07:00
Samuel Holland
7dd646cf74
drivers/perf: riscv: Reset the counter to hpmevent mapping while starting cpus
Currently, we stop all the counters while a new cpu is brought online.
However, the hpmevent to counter mappings are not reset. The firmware may
have some stale encoding in their mapping structure which may lead to
undesirable results. We have not encountered such scenario though.

Signed-off-by: Samuel Holland <samuel.holland@sifive.com>
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-2-e01cfddcf035@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03 12:56:21 -07:00
Atish Patra
a3f24e83d1
drivers/perf: riscv: Do not update the event data if uptodate
In case of an counter overflow, the event data may get corrupted
if called from an external overflow handler. This happens because
we can't update the counter without starting it when SBI PMU
extension is in use. However, the prev_count has been already
updated at the first pass while the counter value is still the
old one.

The solution is simple where we don't need to update it again
if it is already updated which can be detected using hwc state.
The event state in the overflow handler is updated in the following
patch. Thus, this fix can't be backported to kernel version where
overflow support was added.

Fixes: a8625217a0 ("drivers/perf: riscv: Implement SBI PMU snapshot function")

Closes:https://lore.kernel.org/all/CC51D53B-846C-4D81-86FC-FBF969D0A0D6@pku.edu.cn/

Reported-by: garthlei@pku.edu.cn
Signed-off-by: Atish Patra <atishp@rivosinc.com>
Link: https://lore.kernel.org/r/20240628-misc_perf_fixes-v4-1-e01cfddcf035@rivosinc.com
Signed-off-by: Palmer Dabbelt <palmer@rivosinc.com>
2024-07-03 12:56:15 -07:00
Rob Herring (Arm)
d688ffa269 perf: arm_pmuv3: Include asm/arm_pmuv3.h from linux/perf/arm_pmuv3.h
The arm64 asm/arm_pmuv3.h depends on defines from
linux/perf/arm_pmuv3.h. Rather than depend on include order, follow the
usual pattern of "linux" headers including "asm" headers of the same
name.

With this change, the include of linux/kvm_host.h is problematic due to
circular includes:

In file included from ../arch/arm64/include/asm/arm_pmuv3.h:9,
                 from ../include/linux/perf/arm_pmuv3.h:312,
                 from ../include/kvm/arm_pmu.h:11,
                 from ../arch/arm64/include/asm/kvm_host.h:38,
                 from ../arch/arm64/mm/init.c:41:
../include/linux/kvm_host.h:383:30: error: field 'arch' has incomplete type

Switching to asm/kvm_host.h solves the issue.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20240626-arm-pmu-3-9-icntr-v2-5-c9784b4f4065@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03 14:07:14 +01:00
Rob Herring (Arm)
12f051c987 perf: arm_v6/7_pmu: Drop non-DT probe support
There are no non-DT based PMU users for v6 or v7, so drop the custom
non-DT probe table. Unfortunately XScale still needs non-DT probing.

Note that this drops support for arm1156 PMU, but there are no arm1156
based systems supported in the kernel.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20240626-arm-pmu-3-9-icntr-v2-4-c9784b4f4065@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03 14:07:14 +01:00
Rob Herring (Arm)
8d75537beb perf/arm: Move 32-bit PMU drivers to drivers/perf/
It is preferred to put drivers under drivers/ rather than under arch/.
The PMU drivers also depend on arm_pmu.c, so it's better to place them
all together.

Acked-by: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Link: https://lore.kernel.org/r/20240626-arm-pmu-3-9-icntr-v2-3-c9784b4f4065@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03 14:07:14 +01:00
Rob Herring (Arm)
598c1a2d9f perf: arm_pmuv3: Drop unnecessary IS_ENABLED(CONFIG_ARM64) check
The IS_ENABLED(CONFIG_ARM64) check for threshold support is unnecessary.
The purpose is to not enable thresholds on arm32, but if threshold is
non-zero, the check against threshold_max() just above here will have
errored out because threshold_max() is always 0 on arm32.

Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Mark rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20240626-arm-pmu-3-9-icntr-v2-2-c9784b4f4065@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03 14:07:14 +01:00
Rob Herring (Arm)
81e15ca3e5 perf: arm_pmuv3: Avoid assigning fixed cycle counter with threshold
If the user has requested a counting threshold for the CPU cycles event,
then the fixed cycle counter can't be assigned as it lacks threshold
support. Currently, the thresholds will work or not randomly depending
on which counter the event is assigned.

While using thresholds for CPU cycles doesn't make much sense, it can be
useful for testing purposes.

Fixes: 816c267544 ("arm64: perf: Add support for event counting threshold")
Signed-off-by: Rob Herring (Arm) <robh@kernel.org>
Acked-by: Mark Rutland <mark.rutland@arm.com>
Link: https://lore.kernel.org/r/20240626-arm-pmu-3-9-icntr-v2-1-c9784b4f4065@kernel.org
Signed-off-by: Will Deacon <will@kernel.org>
2024-07-03 14:07:14 +01:00
Xu Yang
d0d7c66c53 perf: imx_perf: add support for i.MX95 platform
i.MX95 has a DDR PMU which is almostly same as i.MX93, it now supports
read beat and write beat filter capabilities. This will add support for
i.MX95 and enhance the driver to support specific filter handling for it.

Usage:

For read beat:
~# perf stat -a -I 1000 -e imx9_ddr0/eddrtq_pm_rd_beat_filt2,axi_mask=ID_MASK,axi_id=ID/
~# perf stat -a -I 1000 -e imx9_ddr0/eddrtq_pm_rd_beat_filt1,axi_mask=ID_MASK,axi_id=ID/
~# perf stat -a -I 1000 -e imx9_ddr0/eddrtq_pm_rd_beat_filt0,axi_mask=ID_MASK,axi_id=ID/
eg: For edma2: perf stat -a -I 1000 -e imx9_ddr0/eddrtq_pm_rd_beat_filt0,axi_mask=0x00f,axi_id=0x00c/

For write beat:
~# perf stat -a -I 1000 -e imx9_ddr0/eddrtq_pm_wr_beat_filt,axi_mask=ID_MASK,axi_id=ID/
eg: For edma2: perf stat -a -I 1000 -e imx9_ddr0/eddrtq_pm_wr_beat_filt,axi_mask=0x00f,axi_id=0x00c/

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240529080358.703784-6-xu.yang_2@nxp.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-07-01 15:42:59 +01:00
Xu Yang
ac9aa295f7 perf: imx_perf: fix counter start and config sequence
In current driver, the counter will start firstly and then be configured.
This sequence is not correct for AXI filter events since the correct
AXI_MASK and AXI_ID are not set yet. Then the results may be inaccurate.

Reviewed-by: Frank Li <Frank.Li@nxp.com>
Fixes: 55691f99d4 ("drivers/perf: imx_ddr: Add support for NXP i.MX9 SoC DDRC PMU driver")
cc: stable@vger.kernel.org
Signed-off-by: Xu Yang <xu.yang_2@nxp.com>
Link: https://lore.kernel.org/r/20240529080358.703784-5-xu.yang_2@nxp.com
Signed-off-by: Will Deacon <will@kernel.org>
2024-07-01 15:42:59 +01:00