diff --git a/Documentation/admin-guide/perf/arm-ni.rst b/Documentation/admin-guide/perf/arm-ni.rst new file mode 100644 index 000000000000..d26a8f697c36 --- /dev/null +++ b/Documentation/admin-guide/perf/arm-ni.rst @@ -0,0 +1,17 @@ +==================================== +Arm Network-on Chip Interconnect PMU +==================================== + +NI-700 and friends implement a distinct PMU for each clock domain within the +interconnect. Correspondingly, the driver exposes multiple PMU devices named +arm_ni__cd_, where is an (arbitrary) instance identifier and is +the clock domain ID within that particular instance. If multiple NI instances +exist within a system, the PMU devices can be correlated with the underlying +hardware instance via sysfs parentage. + +Each PMU exposes base event aliases for the interface types present in its clock +domain. These require qualifying with the "eventid" and "nodeid" parameters +to specify the event code to count and the interface at which to count it +(per the configured hardware ID as reflected in the xxNI_NODE_INFO register). +The exception is the "cycles" alias for the PMU cycle counter, which is encoded +with the PMU node type and needs no further qualification. diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst index d47cd229d710..39b8e1fdd0cd 100644 --- a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst +++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst @@ -46,16 +46,16 @@ Some of the events only exist for specific configurations. DesignWare Cores (DWC) PCIe PMU Driver ======================================= -This driver adds PMU devices for each PCIe Root Port named based on the BDF of +This driver adds PMU devices for each PCIe Root Port named based on the SBDF of the Root Port. For example, - 30:03.0 PCI bridge: Device 1ded:8000 (rev 01) + 0001:30:03.0 PCI bridge: Device 1ded:8000 (rev 01) -the PMU device name for this Root Port is dwc_rootport_3018. +the PMU device name for this Root Port is dwc_rootport_13018. The DWC PCIe PMU driver registers a perf PMU driver, which provides description of available events and configuration options in sysfs, see -/sys/bus/event_source/devices/dwc_rootport_{bdf}. +/sys/bus/event_source/devices/dwc_rootport_{sbdf}. The "format" directory describes format of the config fields of the perf_event_attr structure. The "events" directory provides configuration @@ -66,16 +66,16 @@ The "perf list" command shall list the available events from sysfs, e.g.:: $# perf list | grep dwc_rootport <...> - dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/ [Kernel PMU event] + dwc_rootport_13018/Rx_PCIe_TLP_Data_Payload/ [Kernel PMU event] <...> - dwc_rootport_3018/rx_memory_read,lane=?/ [Kernel PMU event] + dwc_rootport_13018/rx_memory_read,lane=?/ [Kernel PMU event] Time Based Analysis Event Usage ------------------------------- Example usage of counting PCIe RX TLP data payload (Units of bytes):: - $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/ + $# perf stat -a -e dwc_rootport_13018/Rx_PCIe_TLP_Data_Payload/ The average RX/TX bandwidth can be calculated using the following formula: @@ -88,7 +88,7 @@ Lane Event Usage Each lane has the same event set and to avoid generating a list of hundreds of events, the user need to specify the lane ID explicitly, e.g.:: - $# perf stat -a -e dwc_rootport_3018/rx_memory_read,lane=4/ + $# perf stat -a -e dwc_rootport_13018/rx_memory_read,lane=4/ The driver does not support sampling, therefore "perf record" will not work. Per-task (without "-a") perf sessions are not supported. diff --git a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst index 5541ff40e06a..083ca50de896 100644 --- a/Documentation/admin-guide/perf/hisi-pcie-pmu.rst +++ b/Documentation/admin-guide/perf/hisi-pcie-pmu.rst @@ -28,7 +28,9 @@ The "identifier" sysfs file allows users to identify the version of the PMU hardware device. The "bus" sysfs file allows users to get the bus number of Root Ports -monitored by PMU. +monitored by PMU. Furthermore users can get the Root Ports range in +[bdf_min, bdf_max] from "bdf_min" and "bdf_max" sysfs attributes +respectively. Example usage of perf:: diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst index 7eb3dcd6f4da..8502bc174640 100644 --- a/Documentation/admin-guide/perf/index.rst +++ b/Documentation/admin-guide/perf/index.rst @@ -16,6 +16,7 @@ Performance monitor support starfive_starlink_pmu arm-ccn arm-cmn + arm-ni xgene-pmu arm_dsu_pmu thunderx2-pmu diff --git a/Documentation/devicetree/bindings/perf/arm,cmn.yaml b/Documentation/devicetree/bindings/perf/arm,cmn.yaml index 2e51072e794a..0e9d665584e6 100644 --- a/Documentation/devicetree/bindings/perf/arm,cmn.yaml +++ b/Documentation/devicetree/bindings/perf/arm,cmn.yaml @@ -16,6 +16,7 @@ properties: - arm,cmn-600 - arm,cmn-650 - arm,cmn-700 + - arm,cmn-s3 - arm,ci-700 reg: diff --git a/Documentation/devicetree/bindings/perf/arm,ni.yaml b/Documentation/devicetree/bindings/perf/arm,ni.yaml new file mode 100644 index 000000000000..d66fffa256d5 --- /dev/null +++ b/Documentation/devicetree/bindings/perf/arm,ni.yaml @@ -0,0 +1,30 @@ +# SPDX-License-Identifier: GPL-2.0-only OR BSD-2-Clause +%YAML 1.2 +--- +$id: http://devicetree.org/schemas/perf/arm,ni.yaml# +$schema: http://devicetree.org/meta-schemas/core.yaml# + +title: Arm NI (Network-on-Chip Interconnect) Performance Monitors + +maintainers: + - Robin Murphy + +properties: + compatible: + const: arm,ni-700 + + reg: + items: + - description: Complete configuration register space + + interrupts: + minItems: 1 + maxItems: 32 + description: Overflow interrupts, one per clock domain, in order of domain ID + +required: + - compatible + - reg + - interrupts + +additionalProperties: false diff --git a/MAINTAINERS b/MAINTAINERS index 7573f344b9fa..a0964ce5ffe0 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -1738,6 +1738,17 @@ F: drivers/mtd/maps/physmap-versatile.* F: drivers/power/reset/arm-versatile-reboot.c F: drivers/soc/versatile/ +ARM INTERCONNECT PMU DRIVERS +M: Robin Murphy +S: Supported +F: Documentation/admin-guide/perf/arm-cmn.rst +F: Documentation/admin-guide/perf/arm-ni.rst +F: Documentation/devicetree/bindings/perf/arm,cmn.yaml +F: Documentation/devicetree/bindings/perf/arm,ni.yaml +F: drivers/perf/arm-cmn.c +F: drivers/perf/arm-ni.c +F: tools/perf/pmu-events/arch/arm64/arm/cmn/ + ARM KOMEDA DRM-KMS DRIVER M: Liviu Dudau S: Supported diff --git a/arch/arm/include/asm/arm_pmuv3.h b/arch/arm/include/asm/arm_pmuv3.h index a41b503b7dcd..f63ba8986b24 100644 --- a/arch/arm/include/asm/arm_pmuv3.h +++ b/arch/arm/include/asm/arm_pmuv3.h @@ -127,6 +127,12 @@ static inline u32 read_pmuver(void) return (dfr0 >> 24) & 0xf; } +static inline bool pmuv3_has_icntr(void) +{ + /* FEAT_PMUv3_ICNTR not accessible for 32-bit */ + return false; +} + static inline void write_pmcr(u32 val) { write_sysreg(val, PMCR); @@ -152,6 +158,13 @@ static inline u64 read_pmccntr(void) return read_sysreg(PMCCNTR); } +static inline void write_pmicntr(u64 val) {} + +static inline u64 read_pmicntr(void) +{ + return 0; +} + static inline void write_pmcntenset(u32 val) { write_sysreg(val, PMCNTENSET); @@ -177,6 +190,13 @@ static inline void write_pmccfiltr(u32 val) write_sysreg(val, PMCCFILTR); } +static inline void write_pmicfiltr(u64 val) {} + +static inline u64 read_pmicfiltr(void) +{ + return 0; +} + static inline void write_pmovsclr(u32 val) { write_sysreg(val, PMOVSR); diff --git a/arch/arm64/include/asm/arm_pmuv3.h b/arch/arm64/include/asm/arm_pmuv3.h index a4697a0b6835..468a049bc63b 100644 --- a/arch/arm64/include/asm/arm_pmuv3.h +++ b/arch/arm64/include/asm/arm_pmuv3.h @@ -33,6 +33,14 @@ static inline void write_pmevtypern(int n, unsigned long val) PMEVN_SWITCH(n, WRITE_PMEVTYPERN); } +#define RETURN_READ_PMEVTYPERN(n) \ + return read_sysreg(pmevtyper##n##_el0) +static inline unsigned long read_pmevtypern(int n) +{ + PMEVN_SWITCH(n, RETURN_READ_PMEVTYPERN); + return 0; +} + static inline unsigned long read_pmmir(void) { return read_cpuid(PMMIR_EL1); @@ -46,6 +54,14 @@ static inline u32 read_pmuver(void) ID_AA64DFR0_EL1_PMUVer_SHIFT); } +static inline bool pmuv3_has_icntr(void) +{ + u64 dfr1 = read_sysreg(id_aa64dfr1_el1); + + return !!cpuid_feature_extract_unsigned_field(dfr1, + ID_AA64DFR1_EL1_PMICNTR_SHIFT); +} + static inline void write_pmcr(u64 val) { write_sysreg(val, pmcr_el0); @@ -71,22 +87,32 @@ static inline u64 read_pmccntr(void) return read_sysreg(pmccntr_el0); } -static inline void write_pmcntenset(u32 val) +static inline void write_pmicntr(u64 val) +{ + write_sysreg_s(val, SYS_PMICNTR_EL0); +} + +static inline u64 read_pmicntr(void) +{ + return read_sysreg_s(SYS_PMICNTR_EL0); +} + +static inline void write_pmcntenset(u64 val) { write_sysreg(val, pmcntenset_el0); } -static inline void write_pmcntenclr(u32 val) +static inline void write_pmcntenclr(u64 val) { write_sysreg(val, pmcntenclr_el0); } -static inline void write_pmintenset(u32 val) +static inline void write_pmintenset(u64 val) { write_sysreg(val, pmintenset_el1); } -static inline void write_pmintenclr(u32 val) +static inline void write_pmintenclr(u64 val) { write_sysreg(val, pmintenclr_el1); } @@ -96,12 +122,27 @@ static inline void write_pmccfiltr(u64 val) write_sysreg(val, pmccfiltr_el0); } -static inline void write_pmovsclr(u32 val) +static inline u64 read_pmccfiltr(void) +{ + return read_sysreg(pmccfiltr_el0); +} + +static inline void write_pmicfiltr(u64 val) +{ + write_sysreg_s(val, SYS_PMICFILTR_EL0); +} + +static inline u64 read_pmicfiltr(void) +{ + return read_sysreg_s(SYS_PMICFILTR_EL0); +} + +static inline void write_pmovsclr(u64 val) { write_sysreg(val, pmovsclr_el0); } -static inline u32 read_pmovsclr(void) +static inline u64 read_pmovsclr(void) { return read_sysreg(pmovsclr_el0); } diff --git a/arch/arm64/include/asm/kvm_host.h b/arch/arm64/include/asm/kvm_host.h index a33f5996ca9f..c0fc753aac87 100644 --- a/arch/arm64/include/asm/kvm_host.h +++ b/arch/arm64/include/asm/kvm_host.h @@ -1330,12 +1330,12 @@ void kvm_arch_vcpu_load_debug_state_flags(struct kvm_vcpu *vcpu); void kvm_arch_vcpu_put_debug_state_flags(struct kvm_vcpu *vcpu); #ifdef CONFIG_KVM -void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr); -void kvm_clr_pmu_events(u32 clr); +void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr); +void kvm_clr_pmu_events(u64 clr); bool kvm_set_pmuserenr(u64 val); #else -static inline void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) {} -static inline void kvm_clr_pmu_events(u32 clr) {} +static inline void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) {} +static inline void kvm_clr_pmu_events(u64 clr) {} static inline bool kvm_set_pmuserenr(u64 val) { return false; diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h index 4a9ea103817e..00af1c331c1e 100644 --- a/arch/arm64/include/asm/sysreg.h +++ b/arch/arm64/include/asm/sysreg.h @@ -403,7 +403,6 @@ #define SYS_PMCNTENCLR_EL0 sys_reg(3, 3, 9, 12, 2) #define SYS_PMOVSCLR_EL0 sys_reg(3, 3, 9, 12, 3) #define SYS_PMSWINC_EL0 sys_reg(3, 3, 9, 12, 4) -#define SYS_PMSELR_EL0 sys_reg(3, 3, 9, 12, 5) #define SYS_PMCEID0_EL0 sys_reg(3, 3, 9, 12, 6) #define SYS_PMCEID1_EL0 sys_reg(3, 3, 9, 12, 7) #define SYS_PMCCNTR_EL0 sys_reg(3, 3, 9, 13, 0) diff --git a/arch/arm64/kvm/pmu-emul.c b/arch/arm64/kvm/pmu-emul.c index 82a2a003259c..ac36c438b8c1 100644 --- a/arch/arm64/kvm/pmu-emul.c +++ b/arch/arm64/kvm/pmu-emul.c @@ -233,7 +233,7 @@ void kvm_pmu_vcpu_init(struct kvm_vcpu *vcpu) int i; struct kvm_pmu *pmu = &vcpu->arch.pmu; - for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) + for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++) pmu->pmc[i].idx = i; } @@ -260,7 +260,7 @@ void kvm_pmu_vcpu_destroy(struct kvm_vcpu *vcpu) { int i; - for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) + for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++) kvm_pmu_release_perf_event(kvm_vcpu_idx_to_pmc(vcpu, i)); irq_work_sync(&vcpu->arch.pmu.overflow_work); } @@ -291,7 +291,7 @@ void kvm_pmu_enable_counter_mask(struct kvm_vcpu *vcpu, u64 val) if (!(kvm_vcpu_read_pmcr(vcpu) & ARMV8_PMU_PMCR_E) || !val) return; - for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) { + for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++) { struct kvm_pmc *pmc; if (!(val & BIT(i))) @@ -323,7 +323,7 @@ void kvm_pmu_disable_counter_mask(struct kvm_vcpu *vcpu, u64 val) if (!kvm_vcpu_has_pmu(vcpu) || !val) return; - for (i = 0; i < ARMV8_PMU_MAX_COUNTERS; i++) { + for (i = 0; i < KVM_ARMV8_PMU_MAX_COUNTERS; i++) { struct kvm_pmc *pmc; if (!(val & BIT(i))) @@ -910,10 +910,10 @@ u8 kvm_arm_pmu_get_max_counters(struct kvm *kvm) struct arm_pmu *arm_pmu = kvm->arch.arm_pmu; /* - * The arm_pmu->num_events considers the cycle counter as well. - * Ignore that and return only the general-purpose counters. + * The arm_pmu->cntr_mask considers the fixed counter(s) as well. + * Ignore those and return only the general-purpose counters. */ - return arm_pmu->num_events - 1; + return bitmap_weight(arm_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS); } static void kvm_arm_set_pmu(struct kvm *kvm, struct arm_pmu *arm_pmu) diff --git a/arch/arm64/kvm/pmu.c b/arch/arm64/kvm/pmu.c index 329819806096..0b3adf3e17b4 100644 --- a/arch/arm64/kvm/pmu.c +++ b/arch/arm64/kvm/pmu.c @@ -5,6 +5,8 @@ */ #include #include +#include +#include static DEFINE_PER_CPU(struct kvm_pmu_events, kvm_pmu_events); @@ -35,7 +37,7 @@ struct kvm_pmu_events *kvm_get_pmu_events(void) * Add events to track that we may want to switch at guest entry/exit * time. */ -void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) +void kvm_set_pmu_events(u64 set, struct perf_event_attr *attr) { struct kvm_pmu_events *pmu = kvm_get_pmu_events(); @@ -51,7 +53,7 @@ void kvm_set_pmu_events(u32 set, struct perf_event_attr *attr) /* * Stop tracking events */ -void kvm_clr_pmu_events(u32 clr) +void kvm_clr_pmu_events(u64 clr) { struct kvm_pmu_events *pmu = kvm_get_pmu_events(); @@ -62,79 +64,32 @@ void kvm_clr_pmu_events(u32 clr) pmu->events_guest &= ~clr; } -#define PMEVTYPER_READ_CASE(idx) \ - case idx: \ - return read_sysreg(pmevtyper##idx##_el0) - -#define PMEVTYPER_WRITE_CASE(idx) \ - case idx: \ - write_sysreg(val, pmevtyper##idx##_el0); \ - break - -#define PMEVTYPER_CASES(readwrite) \ - PMEVTYPER_##readwrite##_CASE(0); \ - PMEVTYPER_##readwrite##_CASE(1); \ - PMEVTYPER_##readwrite##_CASE(2); \ - PMEVTYPER_##readwrite##_CASE(3); \ - PMEVTYPER_##readwrite##_CASE(4); \ - PMEVTYPER_##readwrite##_CASE(5); \ - PMEVTYPER_##readwrite##_CASE(6); \ - PMEVTYPER_##readwrite##_CASE(7); \ - PMEVTYPER_##readwrite##_CASE(8); \ - PMEVTYPER_##readwrite##_CASE(9); \ - PMEVTYPER_##readwrite##_CASE(10); \ - PMEVTYPER_##readwrite##_CASE(11); \ - PMEVTYPER_##readwrite##_CASE(12); \ - PMEVTYPER_##readwrite##_CASE(13); \ - PMEVTYPER_##readwrite##_CASE(14); \ - PMEVTYPER_##readwrite##_CASE(15); \ - PMEVTYPER_##readwrite##_CASE(16); \ - PMEVTYPER_##readwrite##_CASE(17); \ - PMEVTYPER_##readwrite##_CASE(18); \ - PMEVTYPER_##readwrite##_CASE(19); \ - PMEVTYPER_##readwrite##_CASE(20); \ - PMEVTYPER_##readwrite##_CASE(21); \ - PMEVTYPER_##readwrite##_CASE(22); \ - PMEVTYPER_##readwrite##_CASE(23); \ - PMEVTYPER_##readwrite##_CASE(24); \ - PMEVTYPER_##readwrite##_CASE(25); \ - PMEVTYPER_##readwrite##_CASE(26); \ - PMEVTYPER_##readwrite##_CASE(27); \ - PMEVTYPER_##readwrite##_CASE(28); \ - PMEVTYPER_##readwrite##_CASE(29); \ - PMEVTYPER_##readwrite##_CASE(30) - /* * Read a value direct from PMEVTYPER where idx is 0-30 - * or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31). + * or PMxCFILTR_EL0 where idx is 31-32. */ static u64 kvm_vcpu_pmu_read_evtype_direct(int idx) { - switch (idx) { - PMEVTYPER_CASES(READ); - case ARMV8_PMU_CYCLE_IDX: - return read_sysreg(pmccfiltr_el0); - default: - WARN_ON(1); - } + if (idx == ARMV8_PMU_CYCLE_IDX) + return read_pmccfiltr(); + else if (idx == ARMV8_PMU_INSTR_IDX) + return read_pmicfiltr(); - return 0; + return read_pmevtypern(idx); } /* * Write a value direct to PMEVTYPER where idx is 0-30 - * or PMCCFILTR_EL0 where idx is ARMV8_PMU_CYCLE_IDX (31). + * or PMxCFILTR_EL0 where idx is 31-32. */ static void kvm_vcpu_pmu_write_evtype_direct(int idx, u32 val) { - switch (idx) { - PMEVTYPER_CASES(WRITE); - case ARMV8_PMU_CYCLE_IDX: - write_sysreg(val, pmccfiltr_el0); - break; - default: - WARN_ON(1); - } + if (idx == ARMV8_PMU_CYCLE_IDX) + write_pmccfiltr(val); + else if (idx == ARMV8_PMU_INSTR_IDX) + write_pmicfiltr(val); + else + write_pmevtypern(idx, val); } /* @@ -145,7 +100,7 @@ static void kvm_vcpu_pmu_enable_el0(unsigned long events) u64 typer; u32 counter; - for_each_set_bit(counter, &events, 32) { + for_each_set_bit(counter, &events, ARMPMU_MAX_HWEVENTS) { typer = kvm_vcpu_pmu_read_evtype_direct(counter); typer &= ~ARMV8_PMU_EXCLUDE_EL0; kvm_vcpu_pmu_write_evtype_direct(counter, typer); @@ -160,7 +115,7 @@ static void kvm_vcpu_pmu_disable_el0(unsigned long events) u64 typer; u32 counter; - for_each_set_bit(counter, &events, 32) { + for_each_set_bit(counter, &events, ARMPMU_MAX_HWEVENTS) { typer = kvm_vcpu_pmu_read_evtype_direct(counter); typer |= ARMV8_PMU_EXCLUDE_EL0; kvm_vcpu_pmu_write_evtype_direct(counter, typer); @@ -176,7 +131,7 @@ static void kvm_vcpu_pmu_disable_el0(unsigned long events) void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) { struct kvm_pmu_events *pmu; - u32 events_guest, events_host; + u64 events_guest, events_host; if (!kvm_arm_support_pmu_v3() || !has_vhe()) return; @@ -197,7 +152,7 @@ void kvm_vcpu_pmu_restore_guest(struct kvm_vcpu *vcpu) void kvm_vcpu_pmu_restore_host(struct kvm_vcpu *vcpu) { struct kvm_pmu_events *pmu; - u32 events_guest, events_host; + u64 events_guest, events_host; if (!kvm_arm_support_pmu_v3() || !has_vhe()) return; diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c index c90324060436..7db24de37ed6 100644 --- a/arch/arm64/kvm/sys_regs.c +++ b/arch/arm64/kvm/sys_regs.c @@ -18,6 +18,7 @@ #include #include +#include #include #include #include @@ -887,7 +888,7 @@ static u64 reset_pmevtyper(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r) static u64 reset_pmselr(struct kvm_vcpu *vcpu, const struct sys_reg_desc *r) { reset_unknown(vcpu, r); - __vcpu_sys_reg(vcpu, r->reg) &= ARMV8_PMU_COUNTER_MASK; + __vcpu_sys_reg(vcpu, r->reg) &= PMSELR_EL0_SEL_MASK; return __vcpu_sys_reg(vcpu, r->reg); } @@ -979,7 +980,7 @@ static bool access_pmselr(struct kvm_vcpu *vcpu, struct sys_reg_params *p, else /* return PMSELR.SEL field */ p->regval = __vcpu_sys_reg(vcpu, PMSELR_EL0) - & ARMV8_PMU_COUNTER_MASK; + & PMSELR_EL0_SEL_MASK; return true; } @@ -1047,8 +1048,8 @@ static bool access_pmu_evcntr(struct kvm_vcpu *vcpu, if (pmu_access_event_counter_el0_disabled(vcpu)) return false; - idx = __vcpu_sys_reg(vcpu, PMSELR_EL0) - & ARMV8_PMU_COUNTER_MASK; + idx = SYS_FIELD_GET(PMSELR_EL0, SEL, + __vcpu_sys_reg(vcpu, PMSELR_EL0)); } else if (r->Op2 == 0) { /* PMCCNTR_EL0 */ if (pmu_access_cycle_counter_el0_disabled(vcpu)) @@ -1098,7 +1099,7 @@ static bool access_pmu_evtyper(struct kvm_vcpu *vcpu, struct sys_reg_params *p, if (r->CRn == 9 && r->CRm == 13 && r->Op2 == 1) { /* PMXEVTYPER_EL0 */ - idx = __vcpu_sys_reg(vcpu, PMSELR_EL0) & ARMV8_PMU_COUNTER_MASK; + idx = SYS_FIELD_GET(PMSELR_EL0, SEL, __vcpu_sys_reg(vcpu, PMSELR_EL0)); reg = PMEVTYPER0_EL0 + idx; } else if (r->CRn == 14 && (r->CRm & 12) == 12) { idx = ((r->CRm & 3) << 3) | (r->Op2 & 7); diff --git a/arch/arm64/tools/sysreg b/arch/arm64/tools/sysreg index 7ceaa1e0b4bc..8d637ac4b7c6 100644 --- a/arch/arm64/tools/sysreg +++ b/arch/arm64/tools/sysreg @@ -2029,6 +2029,31 @@ Sysreg FAR_EL1 3 0 6 0 0 Field 63:0 ADDR EndSysreg +Sysreg PMICNTR_EL0 3 3 9 4 0 +Field 63:0 ICNT +EndSysreg + +Sysreg PMICFILTR_EL0 3 3 9 6 0 +Res0 63:59 +Field 58 SYNC +Field 57:56 VS +Res0 55:32 +Field 31 P +Field 30 U +Field 29 NSK +Field 28 NSU +Field 27 NSH +Field 26 M +Res0 25 +Field 24 SH +Field 23 T +Field 22 RLK +Field 21 RLU +Field 20 RLH +Res0 19:16 +Field 15:0 evtCount +EndSysreg + Sysreg PMSCR_EL1 3 0 9 9 0 Res0 63:8 Field 7:6 PCT @@ -2153,6 +2178,11 @@ Field 4 P Field 3:0 ALIGN EndSysreg +Sysreg PMSELR_EL0 3 3 9 12 5 +Res0 63:5 +Field 4:0 SEL +EndSysreg + SysregFields CONTEXTIDR_ELx Res0 63:32 Field 31:0 PROCID diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig index aa9530b4064f..bab8ba64162f 100644 --- a/drivers/perf/Kconfig +++ b/drivers/perf/Kconfig @@ -48,6 +48,13 @@ config ARM_CMN Support for PMU events monitoring on the Arm CMN-600 Coherent Mesh Network interconnect. +config ARM_NI + tristate "Arm NI-700 PMU support" + depends on ARM64 || COMPILE_TEST + help + Support for PMU events monitoring on the Arm NI-700 Network-on-Chip + interconnect and family. + config ARM_PMU depends on ARM || ARM64 bool "ARM PMU framework" diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile index d43df81d52f7..8268f38e42c5 100644 --- a/drivers/perf/Makefile +++ b/drivers/perf/Makefile @@ -3,6 +3,7 @@ obj-$(CONFIG_ARM_CCI_PMU) += arm-cci.o obj-$(CONFIG_ARM_CCN) += arm-ccn.o obj-$(CONFIG_ARM_CMN) += arm-cmn.o obj-$(CONFIG_ARM_DSU_PMU) += arm_dsu_pmu.o +obj-$(CONFIG_ARM_NI) += arm-ni.o obj-$(CONFIG_ARM_PMU) += arm_pmu.o arm_pmu_platform.o obj-$(CONFIG_ARM_PMU_ACPI) += arm_pmu_acpi.o obj-$(CONFIG_ARM_PMUV3) += arm_pmuv3.o diff --git a/drivers/perf/alibaba_uncore_drw_pmu.c b/drivers/perf/alibaba_uncore_drw_pmu.c index 38a2947ae813..c6ff1bc7d336 100644 --- a/drivers/perf/alibaba_uncore_drw_pmu.c +++ b/drivers/perf/alibaba_uncore_drw_pmu.c @@ -400,7 +400,7 @@ static irqreturn_t ali_drw_pmu_isr(int irq_num, void *data) } /* clear common counter intr status */ - clr_status = FIELD_PREP(ALI_DRW_PMCOM_CNT_OV_INTR_MASK, 1); + clr_status = FIELD_PREP(ALI_DRW_PMCOM_CNT_OV_INTR_MASK, status); writel(clr_status, drw_pmu->cfg_base + ALI_DRW_PMU_OV_INTR_CLR); } diff --git a/drivers/perf/apple_m1_cpu_pmu.c b/drivers/perf/apple_m1_cpu_pmu.c index f322e5ca1114..1d4d01e1275e 100644 --- a/drivers/perf/apple_m1_cpu_pmu.c +++ b/drivers/perf/apple_m1_cpu_pmu.c @@ -47,46 +47,79 @@ * implementations, we'll have to introduce per cpu-type tables. */ enum m1_pmu_events { - M1_PMU_PERFCTR_UNKNOWN_01 = 0x01, - M1_PMU_PERFCTR_CPU_CYCLES = 0x02, - M1_PMU_PERFCTR_INSTRUCTIONS = 0x8c, - M1_PMU_PERFCTR_UNKNOWN_8d = 0x8d, - M1_PMU_PERFCTR_UNKNOWN_8e = 0x8e, - M1_PMU_PERFCTR_UNKNOWN_8f = 0x8f, - M1_PMU_PERFCTR_UNKNOWN_90 = 0x90, - M1_PMU_PERFCTR_UNKNOWN_93 = 0x93, - M1_PMU_PERFCTR_UNKNOWN_94 = 0x94, - M1_PMU_PERFCTR_UNKNOWN_95 = 0x95, - M1_PMU_PERFCTR_UNKNOWN_96 = 0x96, - M1_PMU_PERFCTR_UNKNOWN_97 = 0x97, - M1_PMU_PERFCTR_UNKNOWN_98 = 0x98, - M1_PMU_PERFCTR_UNKNOWN_99 = 0x99, - M1_PMU_PERFCTR_UNKNOWN_9a = 0x9a, - M1_PMU_PERFCTR_UNKNOWN_9b = 0x9b, - M1_PMU_PERFCTR_UNKNOWN_9c = 0x9c, - M1_PMU_PERFCTR_UNKNOWN_9f = 0x9f, - M1_PMU_PERFCTR_UNKNOWN_bf = 0xbf, - M1_PMU_PERFCTR_UNKNOWN_c0 = 0xc0, - M1_PMU_PERFCTR_UNKNOWN_c1 = 0xc1, - M1_PMU_PERFCTR_UNKNOWN_c4 = 0xc4, - M1_PMU_PERFCTR_UNKNOWN_c5 = 0xc5, - M1_PMU_PERFCTR_UNKNOWN_c6 = 0xc6, - M1_PMU_PERFCTR_UNKNOWN_c8 = 0xc8, - M1_PMU_PERFCTR_UNKNOWN_ca = 0xca, - M1_PMU_PERFCTR_UNKNOWN_cb = 0xcb, - M1_PMU_PERFCTR_UNKNOWN_f5 = 0xf5, - M1_PMU_PERFCTR_UNKNOWN_f6 = 0xf6, - M1_PMU_PERFCTR_UNKNOWN_f7 = 0xf7, - M1_PMU_PERFCTR_UNKNOWN_f8 = 0xf8, - M1_PMU_PERFCTR_UNKNOWN_fd = 0xfd, - M1_PMU_PERFCTR_LAST = M1_PMU_CFG_EVENT, + M1_PMU_PERFCTR_RETIRE_UOP = 0x1, + M1_PMU_PERFCTR_CORE_ACTIVE_CYCLE = 0x2, + M1_PMU_PERFCTR_L1I_TLB_FILL = 0x4, + M1_PMU_PERFCTR_L1D_TLB_FILL = 0x5, + M1_PMU_PERFCTR_MMU_TABLE_WALK_INSTRUCTION = 0x7, + M1_PMU_PERFCTR_MMU_TABLE_WALK_DATA = 0x8, + M1_PMU_PERFCTR_L2_TLB_MISS_INSTRUCTION = 0xa, + M1_PMU_PERFCTR_L2_TLB_MISS_DATA = 0xb, + M1_PMU_PERFCTR_MMU_VIRTUAL_MEMORY_FAULT_NONSPEC = 0xd, + M1_PMU_PERFCTR_SCHEDULE_UOP = 0x52, + M1_PMU_PERFCTR_INTERRUPT_PENDING = 0x6c, + M1_PMU_PERFCTR_MAP_STALL_DISPATCH = 0x70, + M1_PMU_PERFCTR_MAP_REWIND = 0x75, + M1_PMU_PERFCTR_MAP_STALL = 0x76, + M1_PMU_PERFCTR_MAP_INT_UOP = 0x7c, + M1_PMU_PERFCTR_MAP_LDST_UOP = 0x7d, + M1_PMU_PERFCTR_MAP_SIMD_UOP = 0x7e, + M1_PMU_PERFCTR_FLUSH_RESTART_OTHER_NONSPEC = 0x84, + M1_PMU_PERFCTR_INST_ALL = 0x8c, + M1_PMU_PERFCTR_INST_BRANCH = 0x8d, + M1_PMU_PERFCTR_INST_BRANCH_CALL = 0x8e, + M1_PMU_PERFCTR_INST_BRANCH_RET = 0x8f, + M1_PMU_PERFCTR_INST_BRANCH_TAKEN = 0x90, + M1_PMU_PERFCTR_INST_BRANCH_INDIR = 0x93, + M1_PMU_PERFCTR_INST_BRANCH_COND = 0x94, + M1_PMU_PERFCTR_INST_INT_LD = 0x95, + M1_PMU_PERFCTR_INST_INT_ST = 0x96, + M1_PMU_PERFCTR_INST_INT_ALU = 0x97, + M1_PMU_PERFCTR_INST_SIMD_LD = 0x98, + M1_PMU_PERFCTR_INST_SIMD_ST = 0x99, + M1_PMU_PERFCTR_INST_SIMD_ALU = 0x9a, + M1_PMU_PERFCTR_INST_LDST = 0x9b, + M1_PMU_PERFCTR_INST_BARRIER = 0x9c, + M1_PMU_PERFCTR_UNKNOWN_9f = 0x9f, + M1_PMU_PERFCTR_L1D_TLB_ACCESS = 0xa0, + M1_PMU_PERFCTR_L1D_TLB_MISS = 0xa1, + M1_PMU_PERFCTR_L1D_CACHE_MISS_ST = 0xa2, + M1_PMU_PERFCTR_L1D_CACHE_MISS_LD = 0xa3, + M1_PMU_PERFCTR_LD_UNIT_UOP = 0xa6, + M1_PMU_PERFCTR_ST_UNIT_UOP = 0xa7, + M1_PMU_PERFCTR_L1D_CACHE_WRITEBACK = 0xa8, + M1_PMU_PERFCTR_LDST_X64_UOP = 0xb1, + M1_PMU_PERFCTR_LDST_XPG_UOP = 0xb2, + M1_PMU_PERFCTR_ATOMIC_OR_EXCLUSIVE_SUCC = 0xb3, + M1_PMU_PERFCTR_ATOMIC_OR_EXCLUSIVE_FAIL = 0xb4, + M1_PMU_PERFCTR_L1D_CACHE_MISS_LD_NONSPEC = 0xbf, + M1_PMU_PERFCTR_L1D_CACHE_MISS_ST_NONSPEC = 0xc0, + M1_PMU_PERFCTR_L1D_TLB_MISS_NONSPEC = 0xc1, + M1_PMU_PERFCTR_ST_MEMORY_ORDER_VIOLATION_NONSPEC = 0xc4, + M1_PMU_PERFCTR_BRANCH_COND_MISPRED_NONSPEC = 0xc5, + M1_PMU_PERFCTR_BRANCH_INDIR_MISPRED_NONSPEC = 0xc6, + M1_PMU_PERFCTR_BRANCH_RET_INDIR_MISPRED_NONSPEC = 0xc8, + M1_PMU_PERFCTR_BRANCH_CALL_INDIR_MISPRED_NONSPEC = 0xca, + M1_PMU_PERFCTR_BRANCH_MISPRED_NONSPEC = 0xcb, + M1_PMU_PERFCTR_L1I_TLB_MISS_DEMAND = 0xd4, + M1_PMU_PERFCTR_MAP_DISPATCH_BUBBLE = 0xd6, + M1_PMU_PERFCTR_L1I_CACHE_MISS_DEMAND = 0xdb, + M1_PMU_PERFCTR_FETCH_RESTART = 0xde, + M1_PMU_PERFCTR_ST_NT_UOP = 0xe5, + M1_PMU_PERFCTR_LD_NT_UOP = 0xe6, + M1_PMU_PERFCTR_UNKNOWN_f5 = 0xf5, + M1_PMU_PERFCTR_UNKNOWN_f6 = 0xf6, + M1_PMU_PERFCTR_UNKNOWN_f7 = 0xf7, + M1_PMU_PERFCTR_UNKNOWN_f8 = 0xf8, + M1_PMU_PERFCTR_UNKNOWN_fd = 0xfd, + M1_PMU_PERFCTR_LAST = M1_PMU_CFG_EVENT, /* * From this point onwards, these are not actual HW events, * but attributes that get stored in hw->config_base. */ - M1_PMU_CFG_COUNT_USER = BIT(8), - M1_PMU_CFG_COUNT_KERNEL = BIT(9), + M1_PMU_CFG_COUNT_USER = BIT(8), + M1_PMU_CFG_COUNT_KERNEL = BIT(9), }; /* @@ -96,46 +129,45 @@ enum m1_pmu_events { * counters had strange affinities. */ static const u16 m1_pmu_event_affinity[M1_PMU_PERFCTR_LAST + 1] = { - [0 ... M1_PMU_PERFCTR_LAST] = ANY_BUT_0_1, - [M1_PMU_PERFCTR_UNKNOWN_01] = BIT(7), - [M1_PMU_PERFCTR_CPU_CYCLES] = ANY_BUT_0_1 | BIT(0), - [M1_PMU_PERFCTR_INSTRUCTIONS] = BIT(7) | BIT(1), - [M1_PMU_PERFCTR_UNKNOWN_8d] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_8e] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_8f] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_90] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_93] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_94] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_95] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_96] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_97] = BIT(7), - [M1_PMU_PERFCTR_UNKNOWN_98] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_99] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_9a] = BIT(7), - [M1_PMU_PERFCTR_UNKNOWN_9b] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_9c] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_9f] = BIT(7), - [M1_PMU_PERFCTR_UNKNOWN_bf] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_c0] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_c1] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_c4] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_c5] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_c6] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_c8] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_ca] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_cb] = ONLY_5_6_7, - [M1_PMU_PERFCTR_UNKNOWN_f5] = ONLY_2_4_6, - [M1_PMU_PERFCTR_UNKNOWN_f6] = ONLY_2_4_6, - [M1_PMU_PERFCTR_UNKNOWN_f7] = ONLY_2_4_6, - [M1_PMU_PERFCTR_UNKNOWN_f8] = ONLY_2_TO_7, - [M1_PMU_PERFCTR_UNKNOWN_fd] = ONLY_2_4_6, + [0 ... M1_PMU_PERFCTR_LAST] = ANY_BUT_0_1, + [M1_PMU_PERFCTR_RETIRE_UOP] = BIT(7), + [M1_PMU_PERFCTR_CORE_ACTIVE_CYCLE] = ANY_BUT_0_1 | BIT(0), + [M1_PMU_PERFCTR_INST_ALL] = BIT(7) | BIT(1), + [M1_PMU_PERFCTR_INST_BRANCH] = ONLY_5_6_7, + [M1_PMU_PERFCTR_INST_BRANCH_CALL] = ONLY_5_6_7, + [M1_PMU_PERFCTR_INST_BRANCH_RET] = ONLY_5_6_7, + [M1_PMU_PERFCTR_INST_BRANCH_TAKEN] = ONLY_5_6_7, + [M1_PMU_PERFCTR_INST_BRANCH_INDIR] = ONLY_5_6_7, + [M1_PMU_PERFCTR_INST_BRANCH_COND] = ONLY_5_6_7, + [M1_PMU_PERFCTR_INST_INT_LD] = ONLY_5_6_7, + [M1_PMU_PERFCTR_INST_INT_ST] = BIT(7), + [M1_PMU_PERFCTR_INST_INT_ALU] = BIT(7), + [M1_PMU_PERFCTR_INST_SIMD_LD] = ONLY_5_6_7, + [M1_PMU_PERFCTR_INST_SIMD_ST] = ONLY_5_6_7, + [M1_PMU_PERFCTR_INST_SIMD_ALU] = BIT(7), + [M1_PMU_PERFCTR_INST_LDST] = BIT(7), + [M1_PMU_PERFCTR_INST_BARRIER] = ONLY_5_6_7, + [M1_PMU_PERFCTR_UNKNOWN_9f] = BIT(7), + [M1_PMU_PERFCTR_L1D_CACHE_MISS_LD_NONSPEC] = ONLY_5_6_7, + [M1_PMU_PERFCTR_L1D_CACHE_MISS_ST_NONSPEC] = ONLY_5_6_7, + [M1_PMU_PERFCTR_L1D_TLB_MISS_NONSPEC] = ONLY_5_6_7, + [M1_PMU_PERFCTR_ST_MEMORY_ORDER_VIOLATION_NONSPEC] = ONLY_5_6_7, + [M1_PMU_PERFCTR_BRANCH_COND_MISPRED_NONSPEC] = ONLY_5_6_7, + [M1_PMU_PERFCTR_BRANCH_INDIR_MISPRED_NONSPEC] = ONLY_5_6_7, + [M1_PMU_PERFCTR_BRANCH_RET_INDIR_MISPRED_NONSPEC] = ONLY_5_6_7, + [M1_PMU_PERFCTR_BRANCH_CALL_INDIR_MISPRED_NONSPEC] = ONLY_5_6_7, + [M1_PMU_PERFCTR_BRANCH_MISPRED_NONSPEC] = ONLY_5_6_7, + [M1_PMU_PERFCTR_UNKNOWN_f5] = ONLY_2_4_6, + [M1_PMU_PERFCTR_UNKNOWN_f6] = ONLY_2_4_6, + [M1_PMU_PERFCTR_UNKNOWN_f7] = ONLY_2_4_6, + [M1_PMU_PERFCTR_UNKNOWN_f8] = ONLY_2_TO_7, + [M1_PMU_PERFCTR_UNKNOWN_fd] = ONLY_2_4_6, }; static const unsigned m1_pmu_perf_map[PERF_COUNT_HW_MAX] = { PERF_MAP_ALL_UNSUPPORTED, - [PERF_COUNT_HW_CPU_CYCLES] = M1_PMU_PERFCTR_CPU_CYCLES, - [PERF_COUNT_HW_INSTRUCTIONS] = M1_PMU_PERFCTR_INSTRUCTIONS, - /* No idea about the rest yet */ + [PERF_COUNT_HW_CPU_CYCLES] = M1_PMU_PERFCTR_CORE_ACTIVE_CYCLE, + [PERF_COUNT_HW_INSTRUCTIONS] = M1_PMU_PERFCTR_INST_ALL, }; /* sysfs definitions */ @@ -154,8 +186,8 @@ static ssize_t m1_pmu_events_sysfs_show(struct device *dev, PMU_EVENT_ATTR_ID(name, m1_pmu_events_sysfs_show, config) static struct attribute *m1_pmu_event_attrs[] = { - M1_PMU_EVENT_ATTR(cycles, M1_PMU_PERFCTR_CPU_CYCLES), - M1_PMU_EVENT_ATTR(instructions, M1_PMU_PERFCTR_INSTRUCTIONS), + M1_PMU_EVENT_ATTR(cycles, M1_PMU_PERFCTR_CORE_ACTIVE_CYCLE), + M1_PMU_EVENT_ATTR(instructions, M1_PMU_PERFCTR_INST_ALL), NULL, }; @@ -400,7 +432,7 @@ static irqreturn_t m1_pmu_handle_irq(struct arm_pmu *cpu_pmu) regs = get_irq_regs(); - for (idx = 0; idx < cpu_pmu->num_events; idx++) { + for_each_set_bit(idx, cpu_pmu->cntr_mask, M1_PMU_NR_COUNTERS) { struct perf_event *event = cpuc->events[idx]; struct perf_sample_data data; @@ -560,7 +592,7 @@ static int m1_pmu_init(struct arm_pmu *cpu_pmu, u32 flags) cpu_pmu->reset = m1_pmu_reset; cpu_pmu->set_event_filter = m1_pmu_set_event_filter; - cpu_pmu->num_events = M1_PMU_NR_COUNTERS; + bitmap_set(cpu_pmu->cntr_mask, 0, M1_PMU_NR_COUNTERS); cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] = &m1_pmu_events_attr_group; cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_FORMATS] = &m1_pmu_format_attr_group; return 0; diff --git a/drivers/perf/arm-cmn.c b/drivers/perf/arm-cmn.c index c932d9d355cf..397a46410f7c 100644 --- a/drivers/perf/arm-cmn.c +++ b/drivers/perf/arm-cmn.c @@ -24,14 +24,6 @@ #define CMN_NI_NODE_ID GENMASK_ULL(31, 16) #define CMN_NI_LOGICAL_ID GENMASK_ULL(47, 32) -#define CMN_NODEID_DEVID(reg) ((reg) & 3) -#define CMN_NODEID_EXT_DEVID(reg) ((reg) & 1) -#define CMN_NODEID_PID(reg) (((reg) >> 2) & 1) -#define CMN_NODEID_EXT_PID(reg) (((reg) >> 1) & 3) -#define CMN_NODEID_1x1_PID(reg) (((reg) >> 2) & 7) -#define CMN_NODEID_X(reg, bits) ((reg) >> (3 + (bits))) -#define CMN_NODEID_Y(reg, bits) (((reg) >> 3) & ((1U << (bits)) - 1)) - #define CMN_CHILD_INFO 0x0080 #define CMN_CI_CHILD_COUNT GENMASK_ULL(15, 0) #define CMN_CI_CHILD_PTR_OFFSET GENMASK_ULL(31, 16) @@ -43,6 +35,9 @@ #define CMN_MAX_XPS (CMN_MAX_DIMENSION * CMN_MAX_DIMENSION) #define CMN_MAX_DTMS (CMN_MAX_XPS + (CMN_MAX_DIMENSION - 1) * 4) +/* Currently XPs are the node type we can have most of; others top out at 128 */ +#define CMN_MAX_NODES_PER_EVENT CMN_MAX_XPS + /* The CFG node has various info besides the discovery tree */ #define CMN_CFGM_PERIPH_ID_01 0x0008 #define CMN_CFGM_PID0_PART_0 GENMASK_ULL(7, 0) @@ -50,24 +45,28 @@ #define CMN_CFGM_PERIPH_ID_23 0x0010 #define CMN_CFGM_PID2_REVISION GENMASK_ULL(7, 4) -#define CMN_CFGM_INFO_GLOBAL 0x900 +#define CMN_CFGM_INFO_GLOBAL 0x0900 #define CMN_INFO_MULTIPLE_DTM_EN BIT_ULL(63) #define CMN_INFO_RSP_VC_NUM GENMASK_ULL(53, 52) #define CMN_INFO_DAT_VC_NUM GENMASK_ULL(51, 50) +#define CMN_INFO_DEVICE_ISO_ENABLE BIT_ULL(44) -#define CMN_CFGM_INFO_GLOBAL_1 0x908 +#define CMN_CFGM_INFO_GLOBAL_1 0x0908 #define CMN_INFO_SNP_VC_NUM GENMASK_ULL(3, 2) #define CMN_INFO_REQ_VC_NUM GENMASK_ULL(1, 0) /* XPs also have some local topology info which has uses too */ #define CMN_MXP__CONNECT_INFO(p) (0x0008 + 8 * (p)) -#define CMN__CONNECT_INFO_DEVICE_TYPE GENMASK_ULL(4, 0) +#define CMN__CONNECT_INFO_DEVICE_TYPE GENMASK_ULL(5, 0) #define CMN_MAX_PORTS 6 #define CI700_CONNECT_INFO_P2_5_OFFSET 0x10 /* PMU registers occupy the 3rd 4KB page of each node's region */ #define CMN_PMU_OFFSET 0x2000 +/* ...except when they don't :( */ +#define CMN_S3_DTM_OFFSET 0xa000 +#define CMN_S3_PMU_OFFSET 0xd900 /* For most nodes, this is all there is */ #define CMN_PMU_EVENT_SEL 0x000 @@ -78,7 +77,8 @@ /* Technically this is 4 bits wide on DNs, but we only use 2 there anyway */ #define CMN__PMU_OCCUP1_ID GENMASK_ULL(34, 32) -/* HN-Ps are weird... */ +/* Some types are designed to coexist with another device in the same node */ +#define CMN_CCLA_PMU_EVENT_SEL 0x008 #define CMN_HNP_PMU_EVENT_SEL 0x008 /* DTMs live in the PMU space of XP registers */ @@ -123,27 +123,28 @@ /* The DTC node is where the magic happens */ #define CMN_DT_DTC_CTL 0x0a00 #define CMN_DT_DTC_CTL_DT_EN BIT(0) +#define CMN_DT_DTC_CTL_CG_DISABLE BIT(10) /* DTC counters are paired in 64-bit registers on a 16-byte stride. Yuck */ #define _CMN_DT_CNT_REG(n) ((((n) / 2) * 4 + (n) % 2) * 4) -#define CMN_DT_PMEVCNT(n) (CMN_PMU_OFFSET + _CMN_DT_CNT_REG(n)) -#define CMN_DT_PMCCNTR (CMN_PMU_OFFSET + 0x40) +#define CMN_DT_PMEVCNT(dtc, n) ((dtc)->pmu_base + _CMN_DT_CNT_REG(n)) +#define CMN_DT_PMCCNTR(dtc) ((dtc)->pmu_base + 0x40) -#define CMN_DT_PMEVCNTSR(n) (CMN_PMU_OFFSET + 0x50 + _CMN_DT_CNT_REG(n)) -#define CMN_DT_PMCCNTRSR (CMN_PMU_OFFSET + 0x90) +#define CMN_DT_PMEVCNTSR(dtc, n) ((dtc)->pmu_base + 0x50 + _CMN_DT_CNT_REG(n)) +#define CMN_DT_PMCCNTRSR(dtc) ((dtc)->pmu_base + 0x90) -#define CMN_DT_PMCR (CMN_PMU_OFFSET + 0x100) +#define CMN_DT_PMCR(dtc) ((dtc)->pmu_base + 0x100) #define CMN_DT_PMCR_PMU_EN BIT(0) #define CMN_DT_PMCR_CNTR_RST BIT(5) #define CMN_DT_PMCR_OVFL_INTR_EN BIT(6) -#define CMN_DT_PMOVSR (CMN_PMU_OFFSET + 0x118) -#define CMN_DT_PMOVSR_CLR (CMN_PMU_OFFSET + 0x120) +#define CMN_DT_PMOVSR(dtc) ((dtc)->pmu_base + 0x118) +#define CMN_DT_PMOVSR_CLR(dtc) ((dtc)->pmu_base + 0x120) -#define CMN_DT_PMSSR (CMN_PMU_OFFSET + 0x128) +#define CMN_DT_PMSSR(dtc) ((dtc)->pmu_base + 0x128) #define CMN_DT_PMSSR_SS_STATUS(n) BIT(n) -#define CMN_DT_PMSRR (CMN_PMU_OFFSET + 0x130) +#define CMN_DT_PMSRR(dtc) ((dtc)->pmu_base + 0x130) #define CMN_DT_PMSRR_SS_REQ BIT(0) #define CMN_DT_NUM_COUNTERS 8 @@ -198,10 +199,11 @@ enum cmn_model { CMN650 = 2, CMN700 = 4, CI700 = 8, + CMNS3 = 16, /* ...and then we can use bitmap tricks for commonality */ CMN_ANY = -1, NOT_CMN600 = -2, - CMN_650ON = CMN650 | CMN700, + CMN_650ON = CMN650 | CMN700 | CMNS3, }; /* Actual part numbers and revision IDs defined by the hardware */ @@ -210,6 +212,7 @@ enum cmn_part { PART_CMN650 = 0x436, PART_CMN700 = 0x43c, PART_CI700 = 0x43a, + PART_CMN_S3 = 0x43e, }; /* CMN-600 r0px shouldn't exist in silicon, thankfully */ @@ -261,6 +264,7 @@ enum cmn_node_type { CMN_TYPE_HNS = 0x200, CMN_TYPE_HNS_MPAM_S, CMN_TYPE_HNS_MPAM_NS, + CMN_TYPE_APB = 0x1000, /* Not a real node type */ CMN_TYPE_WP = 0x7770 }; @@ -280,8 +284,11 @@ struct arm_cmn_node { u16 id, logid; enum cmn_node_type type; + /* XP properties really, but replicated to children for convenience */ u8 dtm; s8 dtc; + u8 portid_bits:4; + u8 deviceid_bits:4; /* DN/HN-F/CXHA */ struct { u8 val : 4; @@ -307,8 +314,9 @@ struct arm_cmn_dtm { struct arm_cmn_dtc { void __iomem *base; + void __iomem *pmu_base; int irq; - int irq_friend; + s8 irq_friend; bool cc_active; struct perf_event *counters[CMN_DT_NUM_COUNTERS]; @@ -357,49 +365,33 @@ struct arm_cmn { static int arm_cmn_hp_state; struct arm_cmn_nodeid { - u8 x; - u8 y; u8 port; u8 dev; }; static int arm_cmn_xyidbits(const struct arm_cmn *cmn) { - return fls((cmn->mesh_x - 1) | (cmn->mesh_y - 1) | 2); + return fls((cmn->mesh_x - 1) | (cmn->mesh_y - 1)); } -static struct arm_cmn_nodeid arm_cmn_nid(const struct arm_cmn *cmn, u16 id) +static struct arm_cmn_nodeid arm_cmn_nid(const struct arm_cmn_node *dn) { struct arm_cmn_nodeid nid; - if (cmn->num_xps == 1) { - nid.x = 0; - nid.y = 0; - nid.port = CMN_NODEID_1x1_PID(id); - nid.dev = CMN_NODEID_DEVID(id); - } else { - int bits = arm_cmn_xyidbits(cmn); - - nid.x = CMN_NODEID_X(id, bits); - nid.y = CMN_NODEID_Y(id, bits); - if (cmn->ports_used & 0xc) { - nid.port = CMN_NODEID_EXT_PID(id); - nid.dev = CMN_NODEID_EXT_DEVID(id); - } else { - nid.port = CMN_NODEID_PID(id); - nid.dev = CMN_NODEID_DEVID(id); - } - } + nid.dev = dn->id & ((1U << dn->deviceid_bits) - 1); + nid.port = (dn->id >> dn->deviceid_bits) & ((1U << dn->portid_bits) - 1); return nid; } static struct arm_cmn_node *arm_cmn_node_to_xp(const struct arm_cmn *cmn, const struct arm_cmn_node *dn) { - struct arm_cmn_nodeid nid = arm_cmn_nid(cmn, dn->id); - int xp_idx = cmn->mesh_x * nid.y + nid.x; + int id = dn->id >> (dn->portid_bits + dn->deviceid_bits); + int bits = arm_cmn_xyidbits(cmn); + int x = id >> bits; + int y = id & ((1U << bits) - 1); - return cmn->xps + xp_idx; + return cmn->xps + cmn->mesh_x * y + x; } static struct arm_cmn_node *arm_cmn_node(const struct arm_cmn *cmn, enum cmn_node_type type) @@ -423,15 +415,27 @@ static enum cmn_model arm_cmn_model(const struct arm_cmn *cmn) return CMN700; case PART_CI700: return CI700; + case PART_CMN_S3: + return CMNS3; default: return 0; }; } +static int arm_cmn_pmu_offset(const struct arm_cmn *cmn, const struct arm_cmn_node *dn) +{ + if (cmn->part == PART_CMN_S3) { + if (dn->type == CMN_TYPE_XP) + return CMN_S3_DTM_OFFSET; + return CMN_S3_PMU_OFFSET; + } + return CMN_PMU_OFFSET; +} + static u32 arm_cmn_device_connect_info(const struct arm_cmn *cmn, const struct arm_cmn_node *xp, int port) { - int offset = CMN_MXP__CONNECT_INFO(port); + int offset = CMN_MXP__CONNECT_INFO(port) - arm_cmn_pmu_offset(cmn, xp); if (port >= 2) { if (cmn->part == PART_CMN600 || cmn->part == PART_CMN650) @@ -444,7 +448,7 @@ static u32 arm_cmn_device_connect_info(const struct arm_cmn *cmn, offset += CI700_CONNECT_INFO_P2_5_OFFSET; } - return readl_relaxed(xp->pmu_base - CMN_PMU_OFFSET + offset); + return readl_relaxed(xp->pmu_base + offset); } static struct dentry *arm_cmn_debugfs; @@ -478,20 +482,25 @@ static const char *arm_cmn_device_type(u8 type) case 0x17: return "RN-F_C_E|"; case 0x18: return " RN-F_E |"; case 0x19: return "RN-F_E_E|"; + case 0x1a: return " HN-S |"; + case 0x1b: return " LCN |"; case 0x1c: return " MTSX |"; case 0x1d: return " HN-V |"; case 0x1e: return " CCG |"; + case 0x20: return " RN-F_F |"; + case 0x21: return "RN-F_F_E|"; + case 0x22: return " SN-F_F |"; default: return " ???? |"; } } -static void arm_cmn_show_logid(struct seq_file *s, int x, int y, int p, int d) +static void arm_cmn_show_logid(struct seq_file *s, const struct arm_cmn_node *xp, int p, int d) { struct arm_cmn *cmn = s->private; struct arm_cmn_node *dn; + u16 id = xp->id | d | (p << xp->deviceid_bits); for (dn = cmn->dns; dn->type; dn++) { - struct arm_cmn_nodeid nid = arm_cmn_nid(cmn, dn->id); int pad = dn->logid < 10; if (dn->type == CMN_TYPE_XP) @@ -500,7 +509,7 @@ static void arm_cmn_show_logid(struct seq_file *s, int x, int y, int p, int d) if (dn->type < CMN_TYPE_HNI) continue; - if (nid.x != x || nid.y != y || nid.port != p || nid.dev != d) + if (dn->id != id) continue; seq_printf(s, " %*c#%-*d |", pad + 1, ' ', 3 - pad, dn->logid); @@ -521,6 +530,7 @@ static int arm_cmn_map_show(struct seq_file *s, void *data) y = cmn->mesh_y; while (y--) { int xp_base = cmn->mesh_x * y; + struct arm_cmn_node *xp = cmn->xps + xp_base; u8 port[CMN_MAX_PORTS][CMN_MAX_DIMENSION]; for (x = 0; x < cmn->mesh_x; x++) @@ -528,16 +538,14 @@ static int arm_cmn_map_show(struct seq_file *s, void *data) seq_printf(s, "\n%-2d |", y); for (x = 0; x < cmn->mesh_x; x++) { - struct arm_cmn_node *xp = cmn->xps + xp_base + x; - for (p = 0; p < CMN_MAX_PORTS; p++) - port[p][x] = arm_cmn_device_connect_info(cmn, xp, p); + port[p][x] = arm_cmn_device_connect_info(cmn, xp + x, p); seq_printf(s, " XP #%-3d|", xp_base + x); } seq_puts(s, "\n |"); for (x = 0; x < cmn->mesh_x; x++) { - s8 dtc = cmn->xps[xp_base + x].dtc; + s8 dtc = xp[x].dtc; if (dtc < 0) seq_puts(s, " DTC ?? |"); @@ -554,10 +562,10 @@ static int arm_cmn_map_show(struct seq_file *s, void *data) seq_puts(s, arm_cmn_device_type(port[p][x])); seq_puts(s, "\n 0|"); for (x = 0; x < cmn->mesh_x; x++) - arm_cmn_show_logid(s, x, y, p, 0); + arm_cmn_show_logid(s, xp + x, p, 0); seq_puts(s, "\n 1|"); for (x = 0; x < cmn->mesh_x; x++) - arm_cmn_show_logid(s, x, y, p, 1); + arm_cmn_show_logid(s, xp + x, p, 1); } seq_puts(s, "\n-----+"); } @@ -585,7 +593,7 @@ static void arm_cmn_debugfs_init(struct arm_cmn *cmn, int id) {} struct arm_cmn_hw_event { struct arm_cmn_node *dn; - u64 dtm_idx[4]; + u64 dtm_idx[DIV_ROUND_UP(CMN_MAX_NODES_PER_EVENT * 2, 64)]; s8 dtc_idx[CMN_MAX_DTCS]; u8 num_dns; u8 dtm_offset; @@ -599,6 +607,7 @@ struct arm_cmn_hw_event { bool wide_sel; enum cmn_filter_select filter_sel; }; +static_assert(sizeof(struct arm_cmn_hw_event) <= offsetof(struct hw_perf_event, target)); #define for_each_hw_dn(hw, dn, i) \ for (i = 0, dn = hw->dn; i < hw->num_dns; i++, dn++) @@ -609,7 +618,6 @@ struct arm_cmn_hw_event { static struct arm_cmn_hw_event *to_cmn_hw(struct perf_event *event) { - BUILD_BUG_ON(sizeof(struct arm_cmn_hw_event) > offsetof(struct hw_perf_event, target)); return (struct arm_cmn_hw_event *)&event->hw; } @@ -790,8 +798,8 @@ static umode_t arm_cmn_event_attr_is_visible(struct kobject *kobj, CMN_EVENT_ATTR(CMN_ANY, cxha_##_name, CMN_TYPE_CXHA, _event) #define CMN_EVENT_CCRA(_name, _event) \ CMN_EVENT_ATTR(CMN_ANY, ccra_##_name, CMN_TYPE_CCRA, _event) -#define CMN_EVENT_CCHA(_name, _event) \ - CMN_EVENT_ATTR(CMN_ANY, ccha_##_name, CMN_TYPE_CCHA, _event) +#define CMN_EVENT_CCHA(_model, _name, _event) \ + CMN_EVENT_ATTR(_model, ccha_##_name, CMN_TYPE_CCHA, _event) #define CMN_EVENT_CCLA(_name, _event) \ CMN_EVENT_ATTR(CMN_ANY, ccla_##_name, CMN_TYPE_CCLA, _event) #define CMN_EVENT_CCLA_RNI(_name, _event) \ @@ -1149,42 +1157,43 @@ static struct attribute *arm_cmn_event_attrs[] = { CMN_EVENT_CCRA(wdb_alloc, 0x59), CMN_EVENT_CCRA(ssb_alloc, 0x5a), - CMN_EVENT_CCHA(rddatbyp, 0x61), - CMN_EVENT_CCHA(chirsp_up_stall, 0x62), - CMN_EVENT_CCHA(chidat_up_stall, 0x63), - CMN_EVENT_CCHA(snppcrd_link0_stall, 0x64), - CMN_EVENT_CCHA(snppcrd_link1_stall, 0x65), - CMN_EVENT_CCHA(snppcrd_link2_stall, 0x66), - CMN_EVENT_CCHA(reqtrk_occ, 0x67), - CMN_EVENT_CCHA(rdb_occ, 0x68), - CMN_EVENT_CCHA(rdbyp_occ, 0x69), - CMN_EVENT_CCHA(wdb_occ, 0x6a), - CMN_EVENT_CCHA(snptrk_occ, 0x6b), - CMN_EVENT_CCHA(sdb_occ, 0x6c), - CMN_EVENT_CCHA(snphaz_occ, 0x6d), - CMN_EVENT_CCHA(reqtrk_alloc, 0x6e), - CMN_EVENT_CCHA(rdb_alloc, 0x6f), - CMN_EVENT_CCHA(rdbyp_alloc, 0x70), - CMN_EVENT_CCHA(wdb_alloc, 0x71), - CMN_EVENT_CCHA(snptrk_alloc, 0x72), - CMN_EVENT_CCHA(sdb_alloc, 0x73), - CMN_EVENT_CCHA(snphaz_alloc, 0x74), - CMN_EVENT_CCHA(pb_rhu_req_occ, 0x75), - CMN_EVENT_CCHA(pb_rhu_req_alloc, 0x76), - CMN_EVENT_CCHA(pb_rhu_pcie_req_occ, 0x77), - CMN_EVENT_CCHA(pb_rhu_pcie_req_alloc, 0x78), - CMN_EVENT_CCHA(pb_pcie_wr_req_occ, 0x79), - CMN_EVENT_CCHA(pb_pcie_wr_req_alloc, 0x7a), - CMN_EVENT_CCHA(pb_pcie_reg_req_occ, 0x7b), - CMN_EVENT_CCHA(pb_pcie_reg_req_alloc, 0x7c), - CMN_EVENT_CCHA(pb_pcie_rsvd_req_occ, 0x7d), - CMN_EVENT_CCHA(pb_pcie_rsvd_req_alloc, 0x7e), - CMN_EVENT_CCHA(pb_rhu_dat_occ, 0x7f), - CMN_EVENT_CCHA(pb_rhu_dat_alloc, 0x80), - CMN_EVENT_CCHA(pb_rhu_pcie_dat_occ, 0x81), - CMN_EVENT_CCHA(pb_rhu_pcie_dat_alloc, 0x82), - CMN_EVENT_CCHA(pb_pcie_wr_dat_occ, 0x83), - CMN_EVENT_CCHA(pb_pcie_wr_dat_alloc, 0x84), + CMN_EVENT_CCHA(CMN_ANY, rddatbyp, 0x61), + CMN_EVENT_CCHA(CMN_ANY, chirsp_up_stall, 0x62), + CMN_EVENT_CCHA(CMN_ANY, chidat_up_stall, 0x63), + CMN_EVENT_CCHA(CMN_ANY, snppcrd_link0_stall, 0x64), + CMN_EVENT_CCHA(CMN_ANY, snppcrd_link1_stall, 0x65), + CMN_EVENT_CCHA(CMN_ANY, snppcrd_link2_stall, 0x66), + CMN_EVENT_CCHA(CMN_ANY, reqtrk_occ, 0x67), + CMN_EVENT_CCHA(CMN_ANY, rdb_occ, 0x68), + CMN_EVENT_CCHA(CMN_ANY, rdbyp_occ, 0x69), + CMN_EVENT_CCHA(CMN_ANY, wdb_occ, 0x6a), + CMN_EVENT_CCHA(CMN_ANY, snptrk_occ, 0x6b), + CMN_EVENT_CCHA(CMN_ANY, sdb_occ, 0x6c), + CMN_EVENT_CCHA(CMN_ANY, snphaz_occ, 0x6d), + CMN_EVENT_CCHA(CMN_ANY, reqtrk_alloc, 0x6e), + CMN_EVENT_CCHA(CMN_ANY, rdb_alloc, 0x6f), + CMN_EVENT_CCHA(CMN_ANY, rdbyp_alloc, 0x70), + CMN_EVENT_CCHA(CMN_ANY, wdb_alloc, 0x71), + CMN_EVENT_CCHA(CMN_ANY, snptrk_alloc, 0x72), + CMN_EVENT_CCHA(CMN_ANY, db_alloc, 0x73), + CMN_EVENT_CCHA(CMN_ANY, snphaz_alloc, 0x74), + CMN_EVENT_CCHA(CMN_ANY, pb_rhu_req_occ, 0x75), + CMN_EVENT_CCHA(CMN_ANY, pb_rhu_req_alloc, 0x76), + CMN_EVENT_CCHA(CMN_ANY, pb_rhu_pcie_req_occ, 0x77), + CMN_EVENT_CCHA(CMN_ANY, pb_rhu_pcie_req_alloc, 0x78), + CMN_EVENT_CCHA(CMN_ANY, pb_pcie_wr_req_occ, 0x79), + CMN_EVENT_CCHA(CMN_ANY, pb_pcie_wr_req_alloc, 0x7a), + CMN_EVENT_CCHA(CMN_ANY, pb_pcie_reg_req_occ, 0x7b), + CMN_EVENT_CCHA(CMN_ANY, pb_pcie_reg_req_alloc, 0x7c), + CMN_EVENT_CCHA(CMN_ANY, pb_pcie_rsvd_req_occ, 0x7d), + CMN_EVENT_CCHA(CMN_ANY, pb_pcie_rsvd_req_alloc, 0x7e), + CMN_EVENT_CCHA(CMN_ANY, pb_rhu_dat_occ, 0x7f), + CMN_EVENT_CCHA(CMN_ANY, pb_rhu_dat_alloc, 0x80), + CMN_EVENT_CCHA(CMN_ANY, pb_rhu_pcie_dat_occ, 0x81), + CMN_EVENT_CCHA(CMN_ANY, pb_rhu_pcie_dat_alloc, 0x82), + CMN_EVENT_CCHA(CMN_ANY, pb_pcie_wr_dat_occ, 0x83), + CMN_EVENT_CCHA(CMN_ANY, pb_pcie_wr_dat_alloc, 0x84), + CMN_EVENT_CCHA(CMNS3, chirsp1_up_stall, 0x85), CMN_EVENT_CCLA(rx_cxs, 0x21), CMN_EVENT_CCLA(tx_cxs, 0x22), @@ -1271,15 +1280,11 @@ static ssize_t arm_cmn_format_show(struct device *dev, struct device_attribute *attr, char *buf) { struct arm_cmn_format_attr *fmt = container_of(attr, typeof(*fmt), attr); - int lo = __ffs(fmt->field), hi = __fls(fmt->field); - - if (lo == hi) - return sysfs_emit(buf, "config:%d\n", lo); if (!fmt->config) - return sysfs_emit(buf, "config:%d-%d\n", lo, hi); + return sysfs_emit(buf, "config:%*pbl\n", 64, &fmt->field); - return sysfs_emit(buf, "config%d:%d-%d\n", fmt->config, lo, hi); + return sysfs_emit(buf, "config%d:%*pbl\n", fmt->config, 64, &fmt->field); } #define _CMN_FORMAT_ATTR(_name, _cfg, _fld) \ @@ -1415,7 +1420,7 @@ static u32 arm_cmn_wp_config(struct perf_event *event, int wp_idx) static void arm_cmn_set_state(struct arm_cmn *cmn, u32 state) { if (!cmn->state) - writel_relaxed(0, cmn->dtc[0].base + CMN_DT_PMCR); + writel_relaxed(0, CMN_DT_PMCR(&cmn->dtc[0])); cmn->state |= state; } @@ -1424,7 +1429,7 @@ static void arm_cmn_clear_state(struct arm_cmn *cmn, u32 state) cmn->state &= ~state; if (!cmn->state) writel_relaxed(CMN_DT_PMCR_PMU_EN | CMN_DT_PMCR_OVFL_INTR_EN, - cmn->dtc[0].base + CMN_DT_PMCR); + CMN_DT_PMCR(&cmn->dtc[0])); } static void arm_cmn_pmu_enable(struct pmu *pmu) @@ -1459,18 +1464,19 @@ static u64 arm_cmn_read_dtm(struct arm_cmn *cmn, struct arm_cmn_hw_event *hw, static u64 arm_cmn_read_cc(struct arm_cmn_dtc *dtc) { - u64 val = readq_relaxed(dtc->base + CMN_DT_PMCCNTR); + void __iomem *pmccntr = CMN_DT_PMCCNTR(dtc); + u64 val = readq_relaxed(pmccntr); - writeq_relaxed(CMN_CC_INIT, dtc->base + CMN_DT_PMCCNTR); + writeq_relaxed(CMN_CC_INIT, pmccntr); return (val - CMN_CC_INIT) & ((CMN_CC_INIT << 1) - 1); } static u32 arm_cmn_read_counter(struct arm_cmn_dtc *dtc, int idx) { - u32 val, pmevcnt = CMN_DT_PMEVCNT(idx); + void __iomem *pmevcnt = CMN_DT_PMEVCNT(dtc, idx); + u32 val = readl_relaxed(pmevcnt); - val = readl_relaxed(dtc->base + pmevcnt); - writel_relaxed(CMN_COUNTER_INIT, dtc->base + pmevcnt); + writel_relaxed(CMN_COUNTER_INIT, pmevcnt); return val - CMN_COUNTER_INIT; } @@ -1481,7 +1487,7 @@ static void arm_cmn_init_counter(struct perf_event *event) u64 count; for_each_hw_dtc_idx(hw, i, idx) { - writel_relaxed(CMN_COUNTER_INIT, cmn->dtc[i].base + CMN_DT_PMEVCNT(idx)); + writel_relaxed(CMN_COUNTER_INIT, CMN_DT_PMEVCNT(&cmn->dtc[i], idx)); cmn->dtc[i].counters[idx] = event; } @@ -1564,9 +1570,12 @@ static void arm_cmn_event_start(struct perf_event *event, int flags) int i; if (type == CMN_TYPE_DTC) { - i = hw->dtc_idx[0]; - writeq_relaxed(CMN_CC_INIT, cmn->dtc[i].base + CMN_DT_PMCCNTR); - cmn->dtc[i].cc_active = true; + struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0]; + + writel_relaxed(CMN_DT_DTC_CTL_DT_EN | CMN_DT_DTC_CTL_CG_DISABLE, + dtc->base + CMN_DT_DTC_CTL); + writeq_relaxed(CMN_CC_INIT, CMN_DT_PMCCNTR(dtc)); + dtc->cc_active = true; } else if (type == CMN_TYPE_WP) { u64 val = CMN_EVENT_WP_VAL(event); u64 mask = CMN_EVENT_WP_MASK(event); @@ -1595,8 +1604,10 @@ static void arm_cmn_event_stop(struct perf_event *event, int flags) int i; if (type == CMN_TYPE_DTC) { - i = hw->dtc_idx[0]; - cmn->dtc[i].cc_active = false; + struct arm_cmn_dtc *dtc = cmn->dtc + hw->dtc_idx[0]; + + dtc->cc_active = false; + writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL); } else if (type == CMN_TYPE_WP) { for_each_hw_dn(hw, dn, i) { void __iomem *base = dn->pmu_base + CMN_DTM_OFFSET(hw->dtm_offset); @@ -1784,7 +1795,8 @@ static int arm_cmn_event_init(struct perf_event *event) /* ...but the DTM may depend on which port we're watching */ if (cmn->multi_dtm) hw->dtm_offset = CMN_EVENT_WP_DEV_SEL(event) / 2; - } else if (type == CMN_TYPE_XP && cmn->part == PART_CMN700) { + } else if (type == CMN_TYPE_XP && + (cmn->part == PART_CMN700 || cmn->part == PART_CMN_S3)) { hw->wide_sel = true; } @@ -1815,10 +1827,7 @@ static int arm_cmn_event_init(struct perf_event *event) } if (!hw->num_dns) { - struct arm_cmn_nodeid nid = arm_cmn_nid(cmn, nodeid); - - dev_dbg(cmn->dev, "invalid node 0x%x (%d,%d,%d,%d) type 0x%x\n", - nodeid, nid.x, nid.y, nid.port, nid.dev, type); + dev_dbg(cmn->dev, "invalid node 0x%x type 0x%x\n", nodeid, type); return -EINVAL; } @@ -1921,7 +1930,7 @@ static int arm_cmn_event_add(struct perf_event *event, int flags) arm_cmn_claim_wp_idx(dtm, event, d, wp_idx, i); writel_relaxed(cfg, dtm->base + CMN_DTM_WPn_CONFIG(wp_idx)); } else { - struct arm_cmn_nodeid nid = arm_cmn_nid(cmn, dn->id); + struct arm_cmn_nodeid nid = arm_cmn_nid(dn); if (cmn->multi_dtm) nid.port %= 2; @@ -2010,7 +2019,7 @@ static int arm_cmn_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_nod cmn = hlist_entry_safe(cpuhp_node, struct arm_cmn, cpuhp_node); node = dev_to_node(cmn->dev); - if (node != NUMA_NO_NODE && cpu_to_node(cmn->cpu) != node && cpu_to_node(cpu) == node) + if (cpu_to_node(cmn->cpu) != node && cpu_to_node(cpu) == node) arm_cmn_migrate(cmn, cpu); return 0; } @@ -2043,7 +2052,7 @@ static irqreturn_t arm_cmn_handle_irq(int irq, void *dev_id) irqreturn_t ret = IRQ_NONE; for (;;) { - u32 status = readl_relaxed(dtc->base + CMN_DT_PMOVSR); + u32 status = readl_relaxed(CMN_DT_PMOVSR(dtc)); u64 delta; int i; @@ -2065,7 +2074,7 @@ static irqreturn_t arm_cmn_handle_irq(int irq, void *dev_id) } } - writel_relaxed(status, dtc->base + CMN_DT_PMOVSR_CLR); + writel_relaxed(status, CMN_DT_PMOVSR_CLR(dtc)); if (!dtc->irq_friend) return ret; @@ -2119,15 +2128,16 @@ static int arm_cmn_init_dtc(struct arm_cmn *cmn, struct arm_cmn_node *dn, int id { struct arm_cmn_dtc *dtc = cmn->dtc + idx; - dtc->base = dn->pmu_base - CMN_PMU_OFFSET; + dtc->pmu_base = dn->pmu_base; + dtc->base = dtc->pmu_base - arm_cmn_pmu_offset(cmn, dn); dtc->irq = platform_get_irq(to_platform_device(cmn->dev), idx); if (dtc->irq < 0) return dtc->irq; writel_relaxed(CMN_DT_DTC_CTL_DT_EN, dtc->base + CMN_DT_DTC_CTL); - writel_relaxed(CMN_DT_PMCR_PMU_EN | CMN_DT_PMCR_OVFL_INTR_EN, dtc->base + CMN_DT_PMCR); - writeq_relaxed(0, dtc->base + CMN_DT_PMCCNTR); - writel_relaxed(0x1ff, dtc->base + CMN_DT_PMOVSR_CLR); + writel_relaxed(CMN_DT_PMCR_PMU_EN | CMN_DT_PMCR_OVFL_INTR_EN, CMN_DT_PMCR(dtc)); + writeq_relaxed(0, CMN_DT_PMCCNTR(dtc)); + writel_relaxed(0x1ff, CMN_DT_PMOVSR_CLR(dtc)); return 0; } @@ -2168,10 +2178,12 @@ static int arm_cmn_init_dtcs(struct arm_cmn *cmn) continue; xp = arm_cmn_node_to_xp(cmn, dn); + dn->portid_bits = xp->portid_bits; + dn->deviceid_bits = xp->deviceid_bits; dn->dtc = xp->dtc; dn->dtm = xp->dtm; if (cmn->multi_dtm) - dn->dtm += arm_cmn_nid(cmn, dn->id).port / 2; + dn->dtm += arm_cmn_nid(dn).port / 2; if (dn->type == CMN_TYPE_DTC) { int err = arm_cmn_init_dtc(cmn, dn, dtc_idx++); @@ -2213,7 +2225,7 @@ static void arm_cmn_init_node_info(struct arm_cmn *cmn, u32 offset, struct arm_c node->id = FIELD_GET(CMN_NI_NODE_ID, reg); node->logid = FIELD_GET(CMN_NI_LOGICAL_ID, reg); - node->pmu_base = cmn->base + offset + CMN_PMU_OFFSET; + node->pmu_base = cmn->base + offset + arm_cmn_pmu_offset(cmn, node); if (node->type == CMN_TYPE_CFG) level = 0; @@ -2271,7 +2283,17 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) reg = readl_relaxed(cfg_region + CMN_CFGM_PERIPH_ID_23); cmn->rev = FIELD_GET(CMN_CFGM_PID2_REVISION, reg); + /* + * With the device isolation feature, if firmware has neglected to enable + * an XP port then we risk locking up if we try to access anything behind + * it; however we also have no way to tell from Non-Secure whether any + * given port is disabled or not, so the only way to win is not to play... + */ reg = readq_relaxed(cfg_region + CMN_CFGM_INFO_GLOBAL); + if (reg & CMN_INFO_DEVICE_ISO_ENABLE) { + dev_err(cmn->dev, "Device isolation enabled, not continuing due to risk of lockup\n"); + return -ENODEV; + } cmn->multi_dtm = reg & CMN_INFO_MULTIPLE_DTM_EN; cmn->rsp_vc_num = FIELD_GET(CMN_INFO_RSP_VC_NUM, reg); cmn->dat_vc_num = FIELD_GET(CMN_INFO_DAT_VC_NUM, reg); @@ -2341,18 +2363,27 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) arm_cmn_init_dtm(dtm++, xp, 0); /* * Keeping track of connected ports will let us filter out - * unnecessary XP events easily. We can also reliably infer the - * "extra device ports" configuration for the node ID format - * from this, since in that case we will see at least one XP - * with port 2 connected, for the HN-D. + * unnecessary XP events easily, and also infer the per-XP + * part of the node ID format. */ for (int p = 0; p < CMN_MAX_PORTS; p++) if (arm_cmn_device_connect_info(cmn, xp, p)) xp_ports |= BIT(p); - if (cmn->multi_dtm && (xp_ports & 0xc)) + if (cmn->num_xps == 1) { + xp->portid_bits = 3; + xp->deviceid_bits = 2; + } else if (xp_ports > 0x3) { + xp->portid_bits = 2; + xp->deviceid_bits = 1; + } else { + xp->portid_bits = 1; + xp->deviceid_bits = 2; + } + + if (cmn->multi_dtm && (xp_ports > 0x3)) arm_cmn_init_dtm(dtm++, xp, 1); - if (cmn->multi_dtm && (xp_ports & 0x30)) + if (cmn->multi_dtm && (xp_ports > 0xf)) arm_cmn_init_dtm(dtm++, xp, 2); cmn->ports_used |= xp_ports; @@ -2407,10 +2438,13 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) case CMN_TYPE_CXHA: case CMN_TYPE_CCRA: case CMN_TYPE_CCHA: - case CMN_TYPE_CCLA: case CMN_TYPE_HNS: dn++; break; + case CMN_TYPE_CCLA: + dn->pmu_base += CMN_CCLA_PMU_EVENT_SEL; + dn++; + break; /* Nothing to see here */ case CMN_TYPE_MPAM_S: case CMN_TYPE_MPAM_NS: @@ -2418,6 +2452,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) case CMN_TYPE_CXLA: case CMN_TYPE_HNS_MPAM_S: case CMN_TYPE_HNS_MPAM_NS: + case CMN_TYPE_APB: break; /* * Split "optimised" combination nodes into separate @@ -2428,7 +2463,7 @@ static int arm_cmn_discover(struct arm_cmn *cmn, unsigned int rgn_offset) case CMN_TYPE_HNP: case CMN_TYPE_CCLA_RNI: dn[1] = dn[0]; - dn[0].pmu_base += CMN_HNP_PMU_EVENT_SEL; + dn[0].pmu_base += CMN_CCLA_PMU_EVENT_SEL; dn[1].type = arm_cmn_subtype(dn->type); dn += 2; break; @@ -2603,6 +2638,7 @@ static const struct of_device_id arm_cmn_of_match[] = { { .compatible = "arm,cmn-600", .data = (void *)PART_CMN600 }, { .compatible = "arm,cmn-650" }, { .compatible = "arm,cmn-700" }, + { .compatible = "arm,cmn-s3" }, { .compatible = "arm,ci-700" }, {} }; diff --git a/drivers/perf/arm-ni.c b/drivers/perf/arm-ni.c new file mode 100644 index 000000000000..90fcfe693439 --- /dev/null +++ b/drivers/perf/arm-ni.c @@ -0,0 +1,781 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (C) 2022-2024 Arm Limited +// NI-700 Network-on-Chip PMU driver + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +/* Common registers */ +#define NI_NODE_TYPE 0x000 +#define NI_NODE_TYPE_NODE_ID GENMASK(31, 16) +#define NI_NODE_TYPE_NODE_TYPE GENMASK(15, 0) + +#define NI_CHILD_NODE_INFO 0x004 +#define NI_CHILD_PTR(n) (0x008 + (n) * 4) + +#define NI700_PMUSELA 0x00c + +/* Config node */ +#define NI_PERIPHERAL_ID0 0xfe0 +#define NI_PIDR0_PART_7_0 GENMASK(7, 0) +#define NI_PERIPHERAL_ID1 0xfe4 +#define NI_PIDR1_PART_11_8 GENMASK(3, 0) +#define NI_PERIPHERAL_ID2 0xfe8 +#define NI_PIDR2_VERSION GENMASK(7, 4) + +/* PMU node */ +#define NI_PMEVCNTR(n) (0x008 + (n) * 8) +#define NI_PMCCNTR_L 0x0f8 +#define NI_PMCCNTR_U 0x0fc +#define NI_PMEVTYPER(n) (0x400 + (n) * 4) +#define NI_PMEVTYPER_NODE_TYPE GENMASK(12, 9) +#define NI_PMEVTYPER_NODE_ID GENMASK(8, 0) +#define NI_PMCNTENSET 0xc00 +#define NI_PMCNTENCLR 0xc20 +#define NI_PMINTENSET 0xc40 +#define NI_PMINTENCLR 0xc60 +#define NI_PMOVSCLR 0xc80 +#define NI_PMOVSSET 0xcc0 +#define NI_PMCFGR 0xe00 +#define NI_PMCR 0xe04 +#define NI_PMCR_RESET_CCNT BIT(2) +#define NI_PMCR_RESET_EVCNT BIT(1) +#define NI_PMCR_ENABLE BIT(0) + +#define NI_NUM_COUNTERS 8 +#define NI_CCNT_IDX 31 + +/* Event attributes */ +#define NI_CONFIG_TYPE GENMASK_ULL(15, 0) +#define NI_CONFIG_NODEID GENMASK_ULL(31, 16) +#define NI_CONFIG_EVENTID GENMASK_ULL(47, 32) + +#define NI_EVENT_TYPE(event) FIELD_GET(NI_CONFIG_TYPE, (event)->attr.config) +#define NI_EVENT_NODEID(event) FIELD_GET(NI_CONFIG_NODEID, (event)->attr.config) +#define NI_EVENT_EVENTID(event) FIELD_GET(NI_CONFIG_EVENTID, (event)->attr.config) + +enum ni_part { + PART_NI_700 = 0x43b, + PART_NI_710AE = 0x43d, +}; + +enum ni_node_type { + NI_GLOBAL, + NI_VOLTAGE, + NI_POWER, + NI_CLOCK, + NI_ASNI, + NI_AMNI, + NI_PMU, + NI_HSNI, + NI_HMNI, + NI_PMNI, +}; + +struct arm_ni_node { + void __iomem *base; + enum ni_node_type type; + u16 id; + u32 num_components; +}; + +struct arm_ni_unit { + void __iomem *pmusela; + enum ni_node_type type; + u16 id; + bool ns; + union { + __le64 pmusel; + u8 event[8]; + }; +}; + +struct arm_ni_cd { + void __iomem *pmu_base; + u16 id; + int num_units; + int irq; + int cpu; + struct hlist_node cpuhp_node; + struct pmu pmu; + struct arm_ni_unit *units; + struct perf_event *evcnt[NI_NUM_COUNTERS]; + struct perf_event *ccnt; +}; + +struct arm_ni { + struct device *dev; + void __iomem *base; + enum ni_part part; + int id; + int num_cds; + struct arm_ni_cd cds[] __counted_by(num_cds); +}; + +#define cd_to_ni(cd) container_of((cd), struct arm_ni, cds[(cd)->id]) +#define pmu_to_cd(p) container_of((p), struct arm_ni_cd, pmu) + +#define cd_for_each_unit(cd, u) \ + for (struct arm_ni_unit *u = cd->units; u < cd->units + cd->num_units; u++) + +static int arm_ni_hp_state; + +struct arm_ni_event_attr { + struct device_attribute attr; + enum ni_node_type type; +}; + +#define NI_EVENT_ATTR(_name, _type) \ + (&((struct arm_ni_event_attr[]) {{ \ + .attr = __ATTR(_name, 0444, arm_ni_event_show, NULL), \ + .type = _type, \ + }})[0].attr.attr) + +static ssize_t arm_ni_event_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct arm_ni_event_attr *eattr = container_of(attr, typeof(*eattr), attr); + + if (eattr->type == NI_PMU) + return sysfs_emit(buf, "type=0x%x\n", eattr->type); + + return sysfs_emit(buf, "type=0x%x,eventid=?,nodeid=?\n", eattr->type); +} + +static umode_t arm_ni_event_attr_is_visible(struct kobject *kobj, + struct attribute *attr, int unused) +{ + struct device *dev = kobj_to_dev(kobj); + struct arm_ni_cd *cd = pmu_to_cd(dev_get_drvdata(dev)); + struct arm_ni_event_attr *eattr; + + eattr = container_of(attr, typeof(*eattr), attr.attr); + + cd_for_each_unit(cd, unit) { + if (unit->type == eattr->type && unit->ns) + return attr->mode; + } + + return 0; +} + +static struct attribute *arm_ni_event_attrs[] = { + NI_EVENT_ATTR(asni, NI_ASNI), + NI_EVENT_ATTR(amni, NI_AMNI), + NI_EVENT_ATTR(cycles, NI_PMU), + NI_EVENT_ATTR(hsni, NI_HSNI), + NI_EVENT_ATTR(hmni, NI_HMNI), + NI_EVENT_ATTR(pmni, NI_PMNI), + NULL +}; + +static const struct attribute_group arm_ni_event_attrs_group = { + .name = "events", + .attrs = arm_ni_event_attrs, + .is_visible = arm_ni_event_attr_is_visible, +}; + +struct arm_ni_format_attr { + struct device_attribute attr; + u64 field; +}; + +#define NI_FORMAT_ATTR(_name, _fld) \ + (&((struct arm_ni_format_attr[]) {{ \ + .attr = __ATTR(_name, 0444, arm_ni_format_show, NULL), \ + .field = _fld, \ + }})[0].attr.attr) + +static ssize_t arm_ni_format_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct arm_ni_format_attr *fmt = container_of(attr, typeof(*fmt), attr); + + return sysfs_emit(buf, "config:%*pbl\n", 64, &fmt->field); +} + +static struct attribute *arm_ni_format_attrs[] = { + NI_FORMAT_ATTR(type, NI_CONFIG_TYPE), + NI_FORMAT_ATTR(nodeid, NI_CONFIG_NODEID), + NI_FORMAT_ATTR(eventid, NI_CONFIG_EVENTID), + NULL +}; + +static const struct attribute_group arm_ni_format_attrs_group = { + .name = "format", + .attrs = arm_ni_format_attrs, +}; + +static ssize_t arm_ni_cpumask_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct arm_ni_cd *cd = pmu_to_cd(dev_get_drvdata(dev)); + + return cpumap_print_to_pagebuf(true, buf, cpumask_of(cd->cpu)); +} + +static struct device_attribute arm_ni_cpumask_attr = + __ATTR(cpumask, 0444, arm_ni_cpumask_show, NULL); + +static ssize_t arm_ni_identifier_show(struct device *dev, + struct device_attribute *attr, char *buf) +{ + struct arm_ni *ni = cd_to_ni(pmu_to_cd(dev_get_drvdata(dev))); + u32 reg = readl_relaxed(ni->base + NI_PERIPHERAL_ID2); + int version = FIELD_GET(NI_PIDR2_VERSION, reg); + + return sysfs_emit(buf, "%03x%02x\n", ni->part, version); +} + +static struct device_attribute arm_ni_identifier_attr = + __ATTR(identifier, 0444, arm_ni_identifier_show, NULL); + +static struct attribute *arm_ni_other_attrs[] = { + &arm_ni_cpumask_attr.attr, + &arm_ni_identifier_attr.attr, + NULL +}; + +static const struct attribute_group arm_ni_other_attr_group = { + .attrs = arm_ni_other_attrs, + NULL +}; + +static const struct attribute_group *arm_ni_attr_groups[] = { + &arm_ni_event_attrs_group, + &arm_ni_format_attrs_group, + &arm_ni_other_attr_group, + NULL +}; + +static void arm_ni_pmu_enable(struct pmu *pmu) +{ + writel_relaxed(NI_PMCR_ENABLE, pmu_to_cd(pmu)->pmu_base + NI_PMCR); +} + +static void arm_ni_pmu_disable(struct pmu *pmu) +{ + writel_relaxed(0, pmu_to_cd(pmu)->pmu_base + NI_PMCR); +} + +struct arm_ni_val { + unsigned int evcnt; + unsigned int ccnt; +}; + +static bool arm_ni_val_count_event(struct perf_event *evt, struct arm_ni_val *val) +{ + if (is_software_event(evt)) + return true; + + if (NI_EVENT_TYPE(evt) == NI_PMU) { + val->ccnt++; + return val->ccnt <= 1; + } + + val->evcnt++; + return val->evcnt <= NI_NUM_COUNTERS; +} + +static int arm_ni_validate_group(struct perf_event *event) +{ + struct perf_event *sibling, *leader = event->group_leader; + struct arm_ni_val val = { 0 }; + + if (leader == event) + return 0; + + arm_ni_val_count_event(event, &val); + if (!arm_ni_val_count_event(leader, &val)) + return -EINVAL; + + for_each_sibling_event(sibling, leader) { + if (!arm_ni_val_count_event(sibling, &val)) + return -EINVAL; + } + return 0; +} + +static int arm_ni_event_init(struct perf_event *event) +{ + struct arm_ni_cd *cd = pmu_to_cd(event->pmu); + + if (event->attr.type != event->pmu->type) + return -ENOENT; + + if (is_sampling_event(event)) + return -EINVAL; + + event->cpu = cd->cpu; + if (NI_EVENT_TYPE(event) == NI_PMU) + return arm_ni_validate_group(event); + + cd_for_each_unit(cd, unit) { + if (unit->type == NI_EVENT_TYPE(event) && + unit->id == NI_EVENT_NODEID(event) && unit->ns) { + event->hw.config_base = (unsigned long)unit; + return arm_ni_validate_group(event); + } + } + return -EINVAL; +} + +static u64 arm_ni_read_ccnt(struct arm_ni_cd *cd) +{ + u64 l, u_old, u_new; + int retries = 3; /* 1st time unlucky, 2nd improbable, 3rd just broken */ + + u_new = readl_relaxed(cd->pmu_base + NI_PMCCNTR_U); + do { + u_old = u_new; + l = readl_relaxed(cd->pmu_base + NI_PMCCNTR_L); + u_new = readl_relaxed(cd->pmu_base + NI_PMCCNTR_U); + } while (u_new != u_old && --retries); + WARN_ON(!retries); + + return (u_new << 32) | l; +} + +static void arm_ni_event_read(struct perf_event *event) +{ + struct arm_ni_cd *cd = pmu_to_cd(event->pmu); + struct hw_perf_event *hw = &event->hw; + u64 count, prev; + bool ccnt = hw->idx == NI_CCNT_IDX; + + do { + prev = local64_read(&hw->prev_count); + if (ccnt) + count = arm_ni_read_ccnt(cd); + else + count = readl_relaxed(cd->pmu_base + NI_PMEVCNTR(hw->idx)); + } while (local64_cmpxchg(&hw->prev_count, prev, count) != prev); + + count -= prev; + if (!ccnt) + count = (u32)count; + local64_add(count, &event->count); +} + +static void arm_ni_event_start(struct perf_event *event, int flags) +{ + struct arm_ni_cd *cd = pmu_to_cd(event->pmu); + + writel_relaxed(1U << event->hw.idx, cd->pmu_base + NI_PMCNTENSET); +} + +static void arm_ni_event_stop(struct perf_event *event, int flags) +{ + struct arm_ni_cd *cd = pmu_to_cd(event->pmu); + + writel_relaxed(1U << event->hw.idx, cd->pmu_base + NI_PMCNTENCLR); + if (flags & PERF_EF_UPDATE) + arm_ni_event_read(event); +} + +static void arm_ni_init_ccnt(struct arm_ni_cd *cd) +{ + local64_set(&cd->ccnt->hw.prev_count, S64_MIN); + lo_hi_writeq_relaxed(S64_MIN, cd->pmu_base + NI_PMCCNTR_L); +} + +static void arm_ni_init_evcnt(struct arm_ni_cd *cd, int idx) +{ + local64_set(&cd->evcnt[idx]->hw.prev_count, S32_MIN); + writel_relaxed(S32_MIN, cd->pmu_base + NI_PMEVCNTR(idx)); +} + +static int arm_ni_event_add(struct perf_event *event, int flags) +{ + struct arm_ni_cd *cd = pmu_to_cd(event->pmu); + struct hw_perf_event *hw = &event->hw; + struct arm_ni_unit *unit; + enum ni_node_type type = NI_EVENT_TYPE(event); + u32 reg; + + if (type == NI_PMU) { + if (cd->ccnt) + return -ENOSPC; + hw->idx = NI_CCNT_IDX; + cd->ccnt = event; + arm_ni_init_ccnt(cd); + } else { + hw->idx = 0; + while (cd->evcnt[hw->idx]) { + if (++hw->idx == NI_NUM_COUNTERS) + return -ENOSPC; + } + cd->evcnt[hw->idx] = event; + unit = (void *)hw->config_base; + unit->event[hw->idx] = NI_EVENT_EVENTID(event); + arm_ni_init_evcnt(cd, hw->idx); + lo_hi_writeq_relaxed(le64_to_cpu(unit->pmusel), unit->pmusela); + + reg = FIELD_PREP(NI_PMEVTYPER_NODE_TYPE, type) | + FIELD_PREP(NI_PMEVTYPER_NODE_ID, NI_EVENT_NODEID(event)); + writel_relaxed(reg, cd->pmu_base + NI_PMEVTYPER(hw->idx)); + } + if (flags & PERF_EF_START) + arm_ni_event_start(event, 0); + return 0; +} + +static void arm_ni_event_del(struct perf_event *event, int flags) +{ + struct arm_ni_cd *cd = pmu_to_cd(event->pmu); + struct hw_perf_event *hw = &event->hw; + + arm_ni_event_stop(event, PERF_EF_UPDATE); + + if (hw->idx == NI_CCNT_IDX) + cd->ccnt = NULL; + else + cd->evcnt[hw->idx] = NULL; +} + +static irqreturn_t arm_ni_handle_irq(int irq, void *dev_id) +{ + struct arm_ni_cd *cd = dev_id; + irqreturn_t ret = IRQ_NONE; + u32 reg = readl_relaxed(cd->pmu_base + NI_PMOVSCLR); + + if (reg & (1U << NI_CCNT_IDX)) { + ret = IRQ_HANDLED; + if (!(WARN_ON(!cd->ccnt))) { + arm_ni_event_read(cd->ccnt); + arm_ni_init_ccnt(cd); + } + } + for (int i = 0; i < NI_NUM_COUNTERS; i++) { + if (!(reg & (1U << i))) + continue; + ret = IRQ_HANDLED; + if (!(WARN_ON(!cd->evcnt[i]))) { + arm_ni_event_read(cd->evcnt[i]); + arm_ni_init_evcnt(cd, i); + } + } + writel_relaxed(reg, cd->pmu_base + NI_PMOVSCLR); + return ret; +} + +static int arm_ni_init_cd(struct arm_ni *ni, struct arm_ni_node *node, u64 res_start) +{ + struct arm_ni_cd *cd = ni->cds + node->id; + const char *name; + int err; + + cd->id = node->id; + cd->num_units = node->num_components; + cd->units = devm_kcalloc(ni->dev, cd->num_units, sizeof(*(cd->units)), GFP_KERNEL); + if (!cd->units) + return -ENOMEM; + + for (int i = 0; i < cd->num_units; i++) { + u32 reg = readl_relaxed(node->base + NI_CHILD_PTR(i)); + void __iomem *unit_base = ni->base + reg; + struct arm_ni_unit *unit = cd->units + i; + + reg = readl_relaxed(unit_base + NI_NODE_TYPE); + unit->type = FIELD_GET(NI_NODE_TYPE_NODE_TYPE, reg); + unit->id = FIELD_GET(NI_NODE_TYPE_NODE_ID, reg); + + switch (unit->type) { + case NI_PMU: + reg = readl_relaxed(unit_base + NI_PMCFGR); + if (!reg) { + dev_info(ni->dev, "No access to PMU %d\n", cd->id); + devm_kfree(ni->dev, cd->units); + return 0; + } + unit->ns = true; + cd->pmu_base = unit_base; + break; + case NI_ASNI: + case NI_AMNI: + case NI_HSNI: + case NI_HMNI: + case NI_PMNI: + unit->pmusela = unit_base + NI700_PMUSELA; + writel_relaxed(1, unit->pmusela); + if (readl_relaxed(unit->pmusela) != 1) + dev_info(ni->dev, "No access to node 0x%04x%04x\n", unit->id, unit->type); + else + unit->ns = true; + break; + default: + /* + * e.g. FMU - thankfully bits 3:2 of FMU_ERR_FR0 are RES0 so + * can't alias any of the leaf node types we're looking for. + */ + dev_dbg(ni->dev, "Mystery node 0x%04x%04x\n", unit->id, unit->type); + break; + } + } + + res_start += cd->pmu_base - ni->base; + if (!devm_request_mem_region(ni->dev, res_start, SZ_4K, dev_name(ni->dev))) { + dev_err(ni->dev, "Failed to request PMU region 0x%llx\n", res_start); + return -EBUSY; + } + + writel_relaxed(NI_PMCR_RESET_CCNT | NI_PMCR_RESET_EVCNT, + cd->pmu_base + NI_PMCR); + writel_relaxed(U32_MAX, cd->pmu_base + NI_PMCNTENCLR); + writel_relaxed(U32_MAX, cd->pmu_base + NI_PMOVSCLR); + writel_relaxed(U32_MAX, cd->pmu_base + NI_PMINTENSET); + + cd->irq = platform_get_irq(to_platform_device(ni->dev), cd->id); + if (cd->irq < 0) + return cd->irq; + + err = devm_request_irq(ni->dev, cd->irq, arm_ni_handle_irq, + IRQF_NOBALANCING | IRQF_NO_THREAD, + dev_name(ni->dev), cd); + if (err) + return err; + + cd->cpu = cpumask_local_spread(0, dev_to_node(ni->dev)); + cd->pmu = (struct pmu) { + .module = THIS_MODULE, + .parent = ni->dev, + .attr_groups = arm_ni_attr_groups, + .capabilities = PERF_PMU_CAP_NO_EXCLUDE, + .task_ctx_nr = perf_invalid_context, + .pmu_enable = arm_ni_pmu_enable, + .pmu_disable = arm_ni_pmu_disable, + .event_init = arm_ni_event_init, + .add = arm_ni_event_add, + .del = arm_ni_event_del, + .start = arm_ni_event_start, + .stop = arm_ni_event_stop, + .read = arm_ni_event_read, + }; + + name = devm_kasprintf(ni->dev, GFP_KERNEL, "arm_ni_%d_cd_%d", ni->id, cd->id); + if (!name) + return -ENOMEM; + + err = cpuhp_state_add_instance_nocalls(arm_ni_hp_state, &cd->cpuhp_node); + if (err) + return err; + + err = perf_pmu_register(&cd->pmu, name, -1); + if (err) + cpuhp_state_remove_instance_nocalls(arm_ni_hp_state, &cd->cpuhp_node); + + return err; +} + +static void arm_ni_probe_domain(void __iomem *base, struct arm_ni_node *node) +{ + u32 reg = readl_relaxed(base + NI_NODE_TYPE); + + node->base = base; + node->type = FIELD_GET(NI_NODE_TYPE_NODE_TYPE, reg); + node->id = FIELD_GET(NI_NODE_TYPE_NODE_ID, reg); + node->num_components = readl_relaxed(base + NI_CHILD_NODE_INFO); +} + +static int arm_ni_probe(struct platform_device *pdev) +{ + struct arm_ni_node cfg, vd, pd, cd; + struct arm_ni *ni; + struct resource *res; + void __iomem *base; + static atomic_t id; + int num_cds; + u32 reg, part; + + /* + * We want to map the whole configuration space for ease of discovery, + * but the PMU pages are the only ones for which we can honestly claim + * exclusive ownership, so we'll request them explicitly once found. + */ + res = platform_get_resource(pdev, IORESOURCE_MEM, 0); + base = devm_ioremap(&pdev->dev, res->start, resource_size(res)); + if (!base) + return -ENOMEM; + + arm_ni_probe_domain(base, &cfg); + if (cfg.type != NI_GLOBAL) + return -ENODEV; + + reg = readl_relaxed(cfg.base + NI_PERIPHERAL_ID0); + part = FIELD_GET(NI_PIDR0_PART_7_0, reg); + reg = readl_relaxed(cfg.base + NI_PERIPHERAL_ID1); + part |= FIELD_GET(NI_PIDR1_PART_11_8, reg) << 8; + + switch (part) { + case PART_NI_700: + case PART_NI_710AE: + break; + default: + dev_WARN(&pdev->dev, "Unknown part number: 0x%03x, this may go badly\n", part); + break; + } + + num_cds = 0; + for (int v = 0; v < cfg.num_components; v++) { + reg = readl_relaxed(cfg.base + NI_CHILD_PTR(v)); + arm_ni_probe_domain(base + reg, &vd); + for (int p = 0; p < vd.num_components; p++) { + reg = readl_relaxed(vd.base + NI_CHILD_PTR(p)); + arm_ni_probe_domain(base + reg, &pd); + num_cds += pd.num_components; + } + } + + ni = devm_kzalloc(&pdev->dev, struct_size(ni, cds, num_cds), GFP_KERNEL); + if (!ni) + return -ENOMEM; + + ni->dev = &pdev->dev; + ni->base = base; + ni->num_cds = num_cds; + ni->part = part; + ni->id = atomic_fetch_inc(&id); + + for (int v = 0; v < cfg.num_components; v++) { + reg = readl_relaxed(cfg.base + NI_CHILD_PTR(v)); + arm_ni_probe_domain(base + reg, &vd); + for (int p = 0; p < vd.num_components; p++) { + reg = readl_relaxed(vd.base + NI_CHILD_PTR(p)); + arm_ni_probe_domain(base + reg, &pd); + for (int c = 0; c < pd.num_components; c++) { + int ret; + + reg = readl_relaxed(pd.base + NI_CHILD_PTR(c)); + arm_ni_probe_domain(base + reg, &cd); + ret = arm_ni_init_cd(ni, &cd, res->start); + if (ret) + return ret; + } + } + } + + return 0; +} + +static void arm_ni_remove(struct platform_device *pdev) +{ + struct arm_ni *ni = platform_get_drvdata(pdev); + + for (int i = 0; i < ni->num_cds; i++) { + struct arm_ni_cd *cd = ni->cds + i; + + if (!cd->pmu_base) + continue; + + writel_relaxed(0, cd->pmu_base + NI_PMCR); + writel_relaxed(U32_MAX, cd->pmu_base + NI_PMINTENCLR); + perf_pmu_unregister(&cd->pmu); + cpuhp_state_remove_instance_nocalls(arm_ni_hp_state, &cd->cpuhp_node); + } +} + +#ifdef CONFIG_OF +static const struct of_device_id arm_ni_of_match[] = { + { .compatible = "arm,ni-700" }, + {} +}; +MODULE_DEVICE_TABLE(of, arm_ni_of_match); +#endif + +#ifdef CONFIG_ACPI +static const struct acpi_device_id arm_ni_acpi_match[] = { + { "ARMHCB70" }, + {} +}; +MODULE_DEVICE_TABLE(acpi, arm_ni_acpi_match); +#endif + +static struct platform_driver arm_ni_driver = { + .driver = { + .name = "arm-ni", + .of_match_table = of_match_ptr(arm_ni_of_match), + .acpi_match_table = ACPI_PTR(arm_ni_acpi_match), + }, + .probe = arm_ni_probe, + .remove = arm_ni_remove, +}; + +static void arm_ni_pmu_migrate(struct arm_ni_cd *cd, unsigned int cpu) +{ + perf_pmu_migrate_context(&cd->pmu, cd->cpu, cpu); + irq_set_affinity(cd->irq, cpumask_of(cpu)); + cd->cpu = cpu; +} + +static int arm_ni_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node) +{ + struct arm_ni_cd *cd; + int node; + + cd = hlist_entry_safe(cpuhp_node, struct arm_ni_cd, cpuhp_node); + node = dev_to_node(cd_to_ni(cd)->dev); + if (cpu_to_node(cd->cpu) != node && cpu_to_node(cpu) == node) + arm_ni_pmu_migrate(cd, cpu); + return 0; +} + +static int arm_ni_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_node) +{ + struct arm_ni_cd *cd; + unsigned int target; + int node; + + cd = hlist_entry_safe(cpuhp_node, struct arm_ni_cd, cpuhp_node); + if (cpu != cd->cpu) + return 0; + + node = dev_to_node(cd_to_ni(cd)->dev); + target = cpumask_any_and_but(cpumask_of_node(node), cpu_online_mask, cpu); + if (target >= nr_cpu_ids) + target = cpumask_any_but(cpu_online_mask, cpu); + + if (target < nr_cpu_ids) + arm_ni_pmu_migrate(cd, target); + return 0; +} + +static int __init arm_ni_init(void) +{ + int ret; + + ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, + "perf/arm/ni:online", + arm_ni_pmu_online_cpu, + arm_ni_pmu_offline_cpu); + if (ret < 0) + return ret; + + arm_ni_hp_state = ret; + + ret = platform_driver_register(&arm_ni_driver); + if (ret) + cpuhp_remove_multi_state(arm_ni_hp_state); + return ret; +} + +static void __exit arm_ni_exit(void) +{ + platform_driver_unregister(&arm_ni_driver); + cpuhp_remove_multi_state(arm_ni_hp_state); +} + +module_init(arm_ni_init); +module_exit(arm_ni_exit); + +MODULE_AUTHOR("Robin Murphy "); +MODULE_DESCRIPTION("Arm NI-700 PMU driver"); +MODULE_LICENSE("GPL v2"); diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c index 8458fe2cebb4..398cce3d76fc 100644 --- a/drivers/perf/arm_pmu.c +++ b/drivers/perf/arm_pmu.c @@ -522,7 +522,7 @@ static void armpmu_enable(struct pmu *pmu) { struct arm_pmu *armpmu = to_arm_pmu(pmu); struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events); - bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events); + bool enabled = !bitmap_empty(hw_events->used_mask, ARMPMU_MAX_HWEVENTS); /* For task-bound events we may be called on other CPUs */ if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus)) @@ -742,7 +742,7 @@ static void cpu_pm_pmu_setup(struct arm_pmu *armpmu, unsigned long cmd) struct perf_event *event; int idx; - for (idx = 0; idx < armpmu->num_events; idx++) { + for_each_set_bit(idx, armpmu->cntr_mask, ARMPMU_MAX_HWEVENTS) { event = hw_events->events[idx]; if (!event) continue; @@ -772,7 +772,7 @@ static int cpu_pm_pmu_notify(struct notifier_block *b, unsigned long cmd, { struct arm_pmu *armpmu = container_of(b, struct arm_pmu, cpu_pm_nb); struct pmu_hw_events *hw_events = this_cpu_ptr(armpmu->hw_events); - bool enabled = !bitmap_empty(hw_events->used_mask, armpmu->num_events); + bool enabled = !bitmap_empty(hw_events->used_mask, ARMPMU_MAX_HWEVENTS); if (!cpumask_test_cpu(smp_processor_id(), &armpmu->supported_cpus)) return NOTIFY_DONE; @@ -924,8 +924,9 @@ int armpmu_register(struct arm_pmu *pmu) if (ret) goto out_destroy; - pr_info("enabled with %s PMU driver, %d counters available%s\n", - pmu->name, pmu->num_events, + pr_info("enabled with %s PMU driver, %d (%*pb) counters available%s\n", + pmu->name, bitmap_weight(pmu->cntr_mask, ARMPMU_MAX_HWEVENTS), + ARMPMU_MAX_HWEVENTS, &pmu->cntr_mask, has_nmi ? ", using NMIs" : ""); kvm_host_pmu_init(pmu); diff --git a/drivers/perf/arm_pmu_platform.c b/drivers/perf/arm_pmu_platform.c index 4b1a9a92ea11..118170a5cede 100644 --- a/drivers/perf/arm_pmu_platform.c +++ b/drivers/perf/arm_pmu_platform.c @@ -59,7 +59,7 @@ static int pmu_parse_percpu_irq(struct arm_pmu *pmu, int irq) static bool pmu_has_irq_affinity(struct device_node *node) { - return !!of_find_property(node, "interrupt-affinity", NULL); + return of_property_present(node, "interrupt-affinity"); } static int pmu_parse_irq_affinity(struct device *dev, int i) diff --git a/drivers/perf/arm_pmuv3.c b/drivers/perf/arm_pmuv3.c index d246840797b6..0afe02f879b4 100644 --- a/drivers/perf/arm_pmuv3.c +++ b/drivers/perf/arm_pmuv3.c @@ -451,13 +451,6 @@ static const struct attribute_group armv8_pmuv3_caps_attr_group = { .attrs = armv8_pmuv3_caps_attrs, }; -/* - * Perf Events' indices - */ -#define ARMV8_IDX_CYCLE_COUNTER 0 -#define ARMV8_IDX_COUNTER0 1 -#define ARMV8_IDX_CYCLE_COUNTER_USER 32 - /* * We unconditionally enable ARMv8.5-PMU long event counter support * (64-bit events) where supported. Indicate if this arm_pmu has long @@ -489,19 +482,12 @@ static bool armv8pmu_event_is_chained(struct perf_event *event) return !armv8pmu_event_has_user_read(event) && armv8pmu_event_is_64bit(event) && !armv8pmu_has_long_event(cpu_pmu) && - (idx != ARMV8_IDX_CYCLE_COUNTER); + (idx < ARMV8_PMU_MAX_GENERAL_COUNTERS); } /* * ARMv8 low level PMU access */ - -/* - * Perf Event to low level counters mapping - */ -#define ARMV8_IDX_TO_COUNTER(x) \ - (((x) - ARMV8_IDX_COUNTER0) & ARMV8_PMU_COUNTER_MASK) - static u64 armv8pmu_pmcr_read(void) { return read_pmcr(); @@ -514,21 +500,19 @@ static void armv8pmu_pmcr_write(u64 val) write_pmcr(val); } -static int armv8pmu_has_overflowed(u32 pmovsr) +static int armv8pmu_has_overflowed(u64 pmovsr) { - return pmovsr & ARMV8_PMU_OVERFLOWED_MASK; + return !!(pmovsr & ARMV8_PMU_OVERFLOWED_MASK); } -static int armv8pmu_counter_has_overflowed(u32 pmnc, int idx) +static int armv8pmu_counter_has_overflowed(u64 pmnc, int idx) { - return pmnc & BIT(ARMV8_IDX_TO_COUNTER(idx)); + return !!(pmnc & BIT(idx)); } static u64 armv8pmu_read_evcntr(int idx) { - u32 counter = ARMV8_IDX_TO_COUNTER(idx); - - return read_pmevcntrn(counter); + return read_pmevcntrn(idx); } static u64 armv8pmu_read_hw_counter(struct perf_event *event) @@ -557,7 +541,7 @@ static bool armv8pmu_event_needs_bias(struct perf_event *event) return false; if (armv8pmu_has_long_event(cpu_pmu) || - idx == ARMV8_IDX_CYCLE_COUNTER) + idx >= ARMV8_PMU_MAX_GENERAL_COUNTERS) return true; return false; @@ -585,8 +569,10 @@ static u64 armv8pmu_read_counter(struct perf_event *event) int idx = hwc->idx; u64 value; - if (idx == ARMV8_IDX_CYCLE_COUNTER) + if (idx == ARMV8_PMU_CYCLE_IDX) value = read_pmccntr(); + else if (idx == ARMV8_PMU_INSTR_IDX) + value = read_pmicntr(); else value = armv8pmu_read_hw_counter(event); @@ -595,9 +581,7 @@ static u64 armv8pmu_read_counter(struct perf_event *event) static void armv8pmu_write_evcntr(int idx, u64 value) { - u32 counter = ARMV8_IDX_TO_COUNTER(idx); - - write_pmevcntrn(counter, value); + write_pmevcntrn(idx, value); } static void armv8pmu_write_hw_counter(struct perf_event *event, @@ -620,15 +604,16 @@ static void armv8pmu_write_counter(struct perf_event *event, u64 value) value = armv8pmu_bias_long_counter(event, value); - if (idx == ARMV8_IDX_CYCLE_COUNTER) + if (idx == ARMV8_PMU_CYCLE_IDX) write_pmccntr(value); + else if (idx == ARMV8_PMU_INSTR_IDX) + write_pmicntr(value); else armv8pmu_write_hw_counter(event, value); } static void armv8pmu_write_evtype(int idx, unsigned long val) { - u32 counter = ARMV8_IDX_TO_COUNTER(idx); unsigned long mask = ARMV8_PMU_EVTYPE_EVENT | ARMV8_PMU_INCLUDE_EL2 | ARMV8_PMU_EXCLUDE_EL0 | @@ -638,7 +623,7 @@ static void armv8pmu_write_evtype(int idx, unsigned long val) mask |= ARMV8_PMU_EVTYPE_TC | ARMV8_PMU_EVTYPE_TH; val &= mask; - write_pmevtypern(counter, val); + write_pmevtypern(idx, val); } static void armv8pmu_write_event_type(struct perf_event *event) @@ -658,24 +643,26 @@ static void armv8pmu_write_event_type(struct perf_event *event) armv8pmu_write_evtype(idx - 1, hwc->config_base); armv8pmu_write_evtype(idx, chain_evt); } else { - if (idx == ARMV8_IDX_CYCLE_COUNTER) + if (idx == ARMV8_PMU_CYCLE_IDX) write_pmccfiltr(hwc->config_base); + else if (idx == ARMV8_PMU_INSTR_IDX) + write_pmicfiltr(hwc->config_base); else armv8pmu_write_evtype(idx, hwc->config_base); } } -static u32 armv8pmu_event_cnten_mask(struct perf_event *event) +static u64 armv8pmu_event_cnten_mask(struct perf_event *event) { - int counter = ARMV8_IDX_TO_COUNTER(event->hw.idx); - u32 mask = BIT(counter); + int counter = event->hw.idx; + u64 mask = BIT(counter); if (armv8pmu_event_is_chained(event)) mask |= BIT(counter - 1); return mask; } -static void armv8pmu_enable_counter(u32 mask) +static void armv8pmu_enable_counter(u64 mask) { /* * Make sure event configuration register writes are visible before we @@ -688,7 +675,7 @@ static void armv8pmu_enable_counter(u32 mask) static void armv8pmu_enable_event_counter(struct perf_event *event) { struct perf_event_attr *attr = &event->attr; - u32 mask = armv8pmu_event_cnten_mask(event); + u64 mask = armv8pmu_event_cnten_mask(event); kvm_set_pmu_events(mask, attr); @@ -697,7 +684,7 @@ static void armv8pmu_enable_event_counter(struct perf_event *event) armv8pmu_enable_counter(mask); } -static void armv8pmu_disable_counter(u32 mask) +static void armv8pmu_disable_counter(u64 mask) { write_pmcntenclr(mask); /* @@ -710,7 +697,7 @@ static void armv8pmu_disable_counter(u32 mask) static void armv8pmu_disable_event_counter(struct perf_event *event) { struct perf_event_attr *attr = &event->attr; - u32 mask = armv8pmu_event_cnten_mask(event); + u64 mask = armv8pmu_event_cnten_mask(event); kvm_clr_pmu_events(mask); @@ -719,18 +706,17 @@ static void armv8pmu_disable_event_counter(struct perf_event *event) armv8pmu_disable_counter(mask); } -static void armv8pmu_enable_intens(u32 mask) +static void armv8pmu_enable_intens(u64 mask) { write_pmintenset(mask); } static void armv8pmu_enable_event_irq(struct perf_event *event) { - u32 counter = ARMV8_IDX_TO_COUNTER(event->hw.idx); - armv8pmu_enable_intens(BIT(counter)); + armv8pmu_enable_intens(BIT(event->hw.idx)); } -static void armv8pmu_disable_intens(u32 mask) +static void armv8pmu_disable_intens(u64 mask) { write_pmintenclr(mask); isb(); @@ -741,13 +727,12 @@ static void armv8pmu_disable_intens(u32 mask) static void armv8pmu_disable_event_irq(struct perf_event *event) { - u32 counter = ARMV8_IDX_TO_COUNTER(event->hw.idx); - armv8pmu_disable_intens(BIT(counter)); + armv8pmu_disable_intens(BIT(event->hw.idx)); } -static u32 armv8pmu_getreset_flags(void) +static u64 armv8pmu_getreset_flags(void) { - u32 value; + u64 value; /* Read */ value = read_pmovsclr(); @@ -786,9 +771,12 @@ static void armv8pmu_enable_user_access(struct arm_pmu *cpu_pmu) struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events); /* Clear any unused counters to avoid leaking their contents */ - for_each_clear_bit(i, cpuc->used_mask, cpu_pmu->num_events) { - if (i == ARMV8_IDX_CYCLE_COUNTER) + for_each_andnot_bit(i, cpu_pmu->cntr_mask, cpuc->used_mask, + ARMPMU_MAX_HWEVENTS) { + if (i == ARMV8_PMU_CYCLE_IDX) write_pmccntr(0); + else if (i == ARMV8_PMU_INSTR_IDX) + write_pmicntr(0); else armv8pmu_write_evcntr(i, 0); } @@ -842,7 +830,7 @@ static void armv8pmu_stop(struct arm_pmu *cpu_pmu) static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu) { - u32 pmovsr; + u64 pmovsr; struct perf_sample_data data; struct pmu_hw_events *cpuc = this_cpu_ptr(cpu_pmu->hw_events); struct pt_regs *regs; @@ -869,7 +857,7 @@ static irqreturn_t armv8pmu_handle_irq(struct arm_pmu *cpu_pmu) * to prevent skews in group events. */ armv8pmu_stop(cpu_pmu); - for (idx = 0; idx < cpu_pmu->num_events; ++idx) { + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS) { struct perf_event *event = cpuc->events[idx]; struct hw_perf_event *hwc; @@ -908,7 +896,7 @@ static int armv8pmu_get_single_idx(struct pmu_hw_events *cpuc, { int idx; - for (idx = ARMV8_IDX_COUNTER0; idx < cpu_pmu->num_events; idx++) { + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS) { if (!test_and_set_bit(idx, cpuc->used_mask)) return idx; } @@ -924,7 +912,9 @@ static int armv8pmu_get_chain_idx(struct pmu_hw_events *cpuc, * Chaining requires two consecutive event counters, where * the lower idx must be even. */ - for (idx = ARMV8_IDX_COUNTER0 + 1; idx < cpu_pmu->num_events; idx += 2) { + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV8_PMU_MAX_GENERAL_COUNTERS) { + if (!(idx & 0x1)) + continue; if (!test_and_set_bit(idx, cpuc->used_mask)) { /* Check if the preceding even counter is available */ if (!test_and_set_bit(idx - 1, cpuc->used_mask)) @@ -946,14 +936,27 @@ static int armv8pmu_get_event_idx(struct pmu_hw_events *cpuc, /* Always prefer to place a cycle counter into the cycle counter. */ if ((evtype == ARMV8_PMUV3_PERFCTR_CPU_CYCLES) && !armv8pmu_event_get_threshold(&event->attr)) { - if (!test_and_set_bit(ARMV8_IDX_CYCLE_COUNTER, cpuc->used_mask)) - return ARMV8_IDX_CYCLE_COUNTER; + if (!test_and_set_bit(ARMV8_PMU_CYCLE_IDX, cpuc->used_mask)) + return ARMV8_PMU_CYCLE_IDX; else if (armv8pmu_event_is_64bit(event) && armv8pmu_event_want_user_access(event) && !armv8pmu_has_long_event(cpu_pmu)) return -EAGAIN; } + /* + * Always prefer to place a instruction counter into the instruction counter, + * but don't expose the instruction counter to userspace access as userspace + * may not know how to handle it. + */ + if ((evtype == ARMV8_PMUV3_PERFCTR_INST_RETIRED) && + !armv8pmu_event_get_threshold(&event->attr) && + test_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask) && + !armv8pmu_event_want_user_access(event)) { + if (!test_and_set_bit(ARMV8_PMU_INSTR_IDX, cpuc->used_mask)) + return ARMV8_PMU_INSTR_IDX; + } + /* * Otherwise use events counters */ @@ -978,15 +981,7 @@ static int armv8pmu_user_event_idx(struct perf_event *event) if (!sysctl_perf_user_access || !armv8pmu_event_has_user_read(event)) return 0; - /* - * We remap the cycle counter index to 32 to - * match the offset applied to the rest of - * the counter indices. - */ - if (event->hw.idx == ARMV8_IDX_CYCLE_COUNTER) - return ARMV8_IDX_CYCLE_COUNTER_USER; - - return event->hw.idx; + return event->hw.idx + 1; } /* @@ -1061,14 +1056,16 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event, static void armv8pmu_reset(void *info) { struct arm_pmu *cpu_pmu = (struct arm_pmu *)info; - u64 pmcr; + u64 pmcr, mask; + + bitmap_to_arr64(&mask, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS); /* The counter and interrupt enable registers are unknown at reset. */ - armv8pmu_disable_counter(U32_MAX); - armv8pmu_disable_intens(U32_MAX); + armv8pmu_disable_counter(mask); + armv8pmu_disable_intens(mask); /* Clear the counters we flip at guest entry/exit */ - kvm_clr_pmu_events(U32_MAX); + kvm_clr_pmu_events(mask); /* * Initialize & Reset PMNC. Request overflow interrupt for @@ -1089,14 +1086,14 @@ static int __armv8_pmuv3_map_event_id(struct arm_pmu *armpmu, if (event->attr.type == PERF_TYPE_HARDWARE && event->attr.config == PERF_COUNT_HW_BRANCH_INSTRUCTIONS) { - if (test_bit(ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED, - armpmu->pmceid_bitmap)) - return ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED; - if (test_bit(ARMV8_PMUV3_PERFCTR_BR_RETIRED, armpmu->pmceid_bitmap)) return ARMV8_PMUV3_PERFCTR_BR_RETIRED; + if (test_bit(ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED, + armpmu->pmceid_bitmap)) + return ARMV8_PMUV3_PERFCTR_PC_WRITE_RETIRED; + return HW_OP_UNSUPPORTED; } @@ -1211,10 +1208,15 @@ static void __armv8pmu_probe_pmu(void *info) probe->present = true; /* Read the nb of CNTx counters supported from PMNC */ - cpu_pmu->num_events = FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read()); + bitmap_set(cpu_pmu->cntr_mask, + 0, FIELD_GET(ARMV8_PMU_PMCR_N, armv8pmu_pmcr_read())); /* Add the CPU cycles counter */ - cpu_pmu->num_events += 1; + set_bit(ARMV8_PMU_CYCLE_IDX, cpu_pmu->cntr_mask); + + /* Add the CPU instructions counter */ + if (pmuv3_has_icntr()) + set_bit(ARMV8_PMU_INSTR_IDX, cpu_pmu->cntr_mask); pmceid[0] = pmceid_raw[0] = read_pmceid0(); pmceid[1] = pmceid_raw[1] = read_pmceid1(); diff --git a/drivers/perf/arm_spe_pmu.c b/drivers/perf/arm_spe_pmu.c index 9100d82bfabc..3569050f9cf3 100644 --- a/drivers/perf/arm_spe_pmu.c +++ b/drivers/perf/arm_spe_pmu.c @@ -41,7 +41,7 @@ /* * Cache if the event is allowed to trace Context information. - * This allows us to perform the check, i.e, perfmon_capable(), + * This allows us to perform the check, i.e, perf_allow_kernel(), * in the context of the event owner, once, during the event_init(). */ #define SPE_PMU_HW_FLAGS_CX 0x00001 @@ -50,7 +50,7 @@ static_assert((PERF_EVENT_FLAG_ARCH & SPE_PMU_HW_FLAGS_CX) == SPE_PMU_HW_FLAGS_C static void set_spe_event_has_cx(struct perf_event *event) { - if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && perfmon_capable()) + if (IS_ENABLED(CONFIG_PID_IN_CONTEXTIDR) && !perf_allow_kernel(&event->attr)) event->hw.flags |= SPE_PMU_HW_FLAGS_CX; } @@ -745,9 +745,8 @@ static int arm_spe_pmu_event_init(struct perf_event *event) set_spe_event_has_cx(event); reg = arm_spe_event_to_pmscr(event); - if (!perfmon_capable() && - (reg & (PMSCR_EL1_PA | PMSCR_EL1_PCT))) - return -EACCES; + if (reg & (PMSCR_EL1_PA | PMSCR_EL1_PCT)) + return perf_allow_kernel(&event->attr); return 0; } diff --git a/drivers/perf/arm_v6_pmu.c b/drivers/perf/arm_v6_pmu.c index 0bb685b4bac5..b09615bb2bb2 100644 --- a/drivers/perf/arm_v6_pmu.c +++ b/drivers/perf/arm_v6_pmu.c @@ -64,6 +64,7 @@ enum armv6_counters { ARMV6_CYCLE_COUNTER = 0, ARMV6_COUNTER0, ARMV6_COUNTER1, + ARMV6_NUM_COUNTERS }; /* @@ -254,7 +255,7 @@ armv6pmu_handle_irq(struct arm_pmu *cpu_pmu) */ armv6_pmcr_write(pmcr); - for (idx = 0; idx < cpu_pmu->num_events; ++idx) { + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV6_NUM_COUNTERS) { struct perf_event *event = cpuc->events[idx]; struct hw_perf_event *hwc; @@ -391,7 +392,8 @@ static void armv6pmu_init(struct arm_pmu *cpu_pmu) cpu_pmu->start = armv6pmu_start; cpu_pmu->stop = armv6pmu_stop; cpu_pmu->map_event = armv6_map_event; - cpu_pmu->num_events = 3; + + bitmap_set(cpu_pmu->cntr_mask, 0, ARMV6_NUM_COUNTERS); } static int armv6_1136_pmu_init(struct arm_pmu *cpu_pmu) diff --git a/drivers/perf/arm_v7_pmu.c b/drivers/perf/arm_v7_pmu.c index 928ac3d626ed..420cadd108e7 100644 --- a/drivers/perf/arm_v7_pmu.c +++ b/drivers/perf/arm_v7_pmu.c @@ -649,24 +649,12 @@ static struct attribute_group armv7_pmuv2_events_attr_group = { /* * Perf Events' indices */ -#define ARMV7_IDX_CYCLE_COUNTER 0 -#define ARMV7_IDX_COUNTER0 1 -#define ARMV7_IDX_COUNTER_LAST(cpu_pmu) \ - (ARMV7_IDX_CYCLE_COUNTER + cpu_pmu->num_events - 1) - -#define ARMV7_MAX_COUNTERS 32 -#define ARMV7_COUNTER_MASK (ARMV7_MAX_COUNTERS - 1) - +#define ARMV7_IDX_CYCLE_COUNTER 31 +#define ARMV7_IDX_COUNTER_MAX 31 /* * ARMv7 low level PMNC access */ -/* - * Perf Event to low level counters mapping - */ -#define ARMV7_IDX_TO_COUNTER(x) \ - (((x) - ARMV7_IDX_COUNTER0) & ARMV7_COUNTER_MASK) - /* * Per-CPU PMNC: config reg */ @@ -725,19 +713,17 @@ static inline int armv7_pmnc_has_overflowed(u32 pmnc) static inline int armv7_pmnc_counter_valid(struct arm_pmu *cpu_pmu, int idx) { - return idx >= ARMV7_IDX_CYCLE_COUNTER && - idx <= ARMV7_IDX_COUNTER_LAST(cpu_pmu); + return test_bit(idx, cpu_pmu->cntr_mask); } static inline int armv7_pmnc_counter_has_overflowed(u32 pmnc, int idx) { - return pmnc & BIT(ARMV7_IDX_TO_COUNTER(idx)); + return pmnc & BIT(idx); } static inline void armv7_pmnc_select_counter(int idx) { - u32 counter = ARMV7_IDX_TO_COUNTER(idx); - asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (counter)); + asm volatile("mcr p15, 0, %0, c9, c12, 5" : : "r" (idx)); isb(); } @@ -787,29 +773,25 @@ static inline void armv7_pmnc_write_evtsel(int idx, u32 val) static inline void armv7_pmnc_enable_counter(int idx) { - u32 counter = ARMV7_IDX_TO_COUNTER(idx); - asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (BIT(counter))); + asm volatile("mcr p15, 0, %0, c9, c12, 1" : : "r" (BIT(idx))); } static inline void armv7_pmnc_disable_counter(int idx) { - u32 counter = ARMV7_IDX_TO_COUNTER(idx); - asm volatile("mcr p15, 0, %0, c9, c12, 2" : : "r" (BIT(counter))); + asm volatile("mcr p15, 0, %0, c9, c12, 2" : : "r" (BIT(idx))); } static inline void armv7_pmnc_enable_intens(int idx) { - u32 counter = ARMV7_IDX_TO_COUNTER(idx); - asm volatile("mcr p15, 0, %0, c9, c14, 1" : : "r" (BIT(counter))); + asm volatile("mcr p15, 0, %0, c9, c14, 1" : : "r" (BIT(idx))); } static inline void armv7_pmnc_disable_intens(int idx) { - u32 counter = ARMV7_IDX_TO_COUNTER(idx); - asm volatile("mcr p15, 0, %0, c9, c14, 2" : : "r" (BIT(counter))); + asm volatile("mcr p15, 0, %0, c9, c14, 2" : : "r" (BIT(idx))); isb(); /* Clear the overflow flag in case an interrupt is pending. */ - asm volatile("mcr p15, 0, %0, c9, c12, 3" : : "r" (BIT(counter))); + asm volatile("mcr p15, 0, %0, c9, c12, 3" : : "r" (BIT(idx))); isb(); } @@ -853,15 +835,12 @@ static void armv7_pmnc_dump_regs(struct arm_pmu *cpu_pmu) asm volatile("mrc p15, 0, %0, c9, c13, 0" : "=r" (val)); pr_info("CCNT =0x%08x\n", val); - for (cnt = ARMV7_IDX_COUNTER0; - cnt <= ARMV7_IDX_COUNTER_LAST(cpu_pmu); cnt++) { + for_each_set_bit(cnt, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) { armv7_pmnc_select_counter(cnt); asm volatile("mrc p15, 0, %0, c9, c13, 2" : "=r" (val)); - pr_info("CNT[%d] count =0x%08x\n", - ARMV7_IDX_TO_COUNTER(cnt), val); + pr_info("CNT[%d] count =0x%08x\n", cnt, val); asm volatile("mrc p15, 0, %0, c9, c13, 1" : "=r" (val)); - pr_info("CNT[%d] evtsel=0x%08x\n", - ARMV7_IDX_TO_COUNTER(cnt), val); + pr_info("CNT[%d] evtsel=0x%08x\n", cnt, val); } } #endif @@ -958,7 +937,7 @@ static irqreturn_t armv7pmu_handle_irq(struct arm_pmu *cpu_pmu) */ regs = get_irq_regs(); - for (idx = 0; idx < cpu_pmu->num_events; ++idx) { + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS) { struct perf_event *event = cpuc->events[idx]; struct hw_perf_event *hwc; @@ -1027,7 +1006,7 @@ static int armv7pmu_get_event_idx(struct pmu_hw_events *cpuc, * For anything other than a cycle counter, try and use * the events counters */ - for (idx = ARMV7_IDX_COUNTER0; idx < cpu_pmu->num_events; ++idx) { + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) { if (!test_and_set_bit(idx, cpuc->used_mask)) return idx; } @@ -1073,7 +1052,7 @@ static int armv7pmu_set_event_filter(struct hw_perf_event *event, static void armv7pmu_reset(void *info) { struct arm_pmu *cpu_pmu = (struct arm_pmu *)info; - u32 idx, nb_cnt = cpu_pmu->num_events, val; + u32 idx, val; if (cpu_pmu->secure_access) { asm volatile("mrc p15, 0, %0, c1, c1, 1" : "=r" (val)); @@ -1082,7 +1061,7 @@ static void armv7pmu_reset(void *info) } /* The counter and interrupt enable registers are unknown at reset. */ - for (idx = ARMV7_IDX_CYCLE_COUNTER; idx < nb_cnt; ++idx) { + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMPMU_MAX_HWEVENTS) { armv7_pmnc_disable_counter(idx); armv7_pmnc_disable_intens(idx); } @@ -1161,20 +1140,22 @@ static void armv7pmu_init(struct arm_pmu *cpu_pmu) static void armv7_read_num_pmnc_events(void *info) { - int *nb_cnt = info; + int nb_cnt; + struct arm_pmu *cpu_pmu = info; /* Read the nb of CNTx counters supported from PMNC */ - *nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK; + nb_cnt = (armv7_pmnc_read() >> ARMV7_PMNC_N_SHIFT) & ARMV7_PMNC_N_MASK; + bitmap_set(cpu_pmu->cntr_mask, 0, nb_cnt); /* Add the CPU cycles counter */ - *nb_cnt += 1; + set_bit(ARMV7_IDX_CYCLE_COUNTER, cpu_pmu->cntr_mask); } static int armv7_probe_num_events(struct arm_pmu *arm_pmu) { return smp_call_function_any(&arm_pmu->supported_cpus, armv7_read_num_pmnc_events, - &arm_pmu->num_events, 1); + arm_pmu, 1); } static int armv7_a8_pmu_init(struct arm_pmu *cpu_pmu) @@ -1524,7 +1505,7 @@ static void krait_pmu_reset(void *info) { u32 vval, fval; struct arm_pmu *cpu_pmu = info; - u32 idx, nb_cnt = cpu_pmu->num_events; + u32 idx; armv7pmu_reset(info); @@ -1538,7 +1519,7 @@ static void krait_pmu_reset(void *info) venum_post_pmresr(vval, fval); /* Reset PMxEVNCTCR to sane default */ - for (idx = ARMV7_IDX_CYCLE_COUNTER; idx < nb_cnt; ++idx) { + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) { armv7_pmnc_select_counter(idx); asm volatile("mcr p15, 0, %0, c9, c15, 0" : : "r" (0)); } @@ -1562,7 +1543,7 @@ static int krait_event_to_bit(struct perf_event *event, unsigned int region, * Lower bits are reserved for use by the counters (see * armv7pmu_get_event_idx() for more info) */ - bit += ARMV7_IDX_COUNTER_LAST(cpu_pmu) + 1; + bit += bitmap_weight(cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX); return bit; } @@ -1845,7 +1826,7 @@ static void scorpion_pmu_reset(void *info) { u32 vval, fval; struct arm_pmu *cpu_pmu = info; - u32 idx, nb_cnt = cpu_pmu->num_events; + u32 idx; armv7pmu_reset(info); @@ -1860,7 +1841,7 @@ static void scorpion_pmu_reset(void *info) venum_post_pmresr(vval, fval); /* Reset PMxEVNCTCR to sane default */ - for (idx = ARMV7_IDX_CYCLE_COUNTER; idx < nb_cnt; ++idx) { + for_each_set_bit(idx, cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX) { armv7_pmnc_select_counter(idx); asm volatile("mcr p15, 0, %0, c9, c15, 0" : : "r" (0)); } @@ -1883,7 +1864,7 @@ static int scorpion_event_to_bit(struct perf_event *event, unsigned int region, * Lower bits are reserved for use by the counters (see * armv7pmu_get_event_idx() for more info) */ - bit += ARMV7_IDX_COUNTER_LAST(cpu_pmu) + 1; + bit += bitmap_weight(cpu_pmu->cntr_mask, ARMV7_IDX_COUNTER_MAX); return bit; } diff --git a/drivers/perf/arm_xscale_pmu.c b/drivers/perf/arm_xscale_pmu.c index 3d8b72d6b37f..638fea9b1263 100644 --- a/drivers/perf/arm_xscale_pmu.c +++ b/drivers/perf/arm_xscale_pmu.c @@ -53,6 +53,8 @@ enum xscale_counters { XSCALE_COUNTER2, XSCALE_COUNTER3, }; +#define XSCALE1_NUM_COUNTERS 3 +#define XSCALE2_NUM_COUNTERS 5 static const unsigned xscale_perf_map[PERF_COUNT_HW_MAX] = { PERF_MAP_ALL_UNSUPPORTED, @@ -168,7 +170,7 @@ xscale1pmu_handle_irq(struct arm_pmu *cpu_pmu) regs = get_irq_regs(); - for (idx = 0; idx < cpu_pmu->num_events; ++idx) { + for_each_set_bit(idx, cpu_pmu->cntr_mask, XSCALE1_NUM_COUNTERS) { struct perf_event *event = cpuc->events[idx]; struct hw_perf_event *hwc; @@ -364,7 +366,8 @@ static int xscale1pmu_init(struct arm_pmu *cpu_pmu) cpu_pmu->start = xscale1pmu_start; cpu_pmu->stop = xscale1pmu_stop; cpu_pmu->map_event = xscale_map_event; - cpu_pmu->num_events = 3; + + bitmap_set(cpu_pmu->cntr_mask, 0, XSCALE1_NUM_COUNTERS); return 0; } @@ -500,7 +503,7 @@ xscale2pmu_handle_irq(struct arm_pmu *cpu_pmu) regs = get_irq_regs(); - for (idx = 0; idx < cpu_pmu->num_events; ++idx) { + for_each_set_bit(idx, cpu_pmu->cntr_mask, XSCALE2_NUM_COUNTERS) { struct perf_event *event = cpuc->events[idx]; struct hw_perf_event *hwc; @@ -719,7 +722,8 @@ static int xscale2pmu_init(struct arm_pmu *cpu_pmu) cpu_pmu->start = xscale2pmu_start; cpu_pmu->stop = xscale2pmu_stop; cpu_pmu->map_event = xscale_map_event; - cpu_pmu->num_events = 5; + + bitmap_set(cpu_pmu->cntr_mask, 0, XSCALE2_NUM_COUNTERS); return 0; } diff --git a/drivers/perf/dwc_pcie_pmu.c b/drivers/perf/dwc_pcie_pmu.c index c5e328f23841..4ca50f9b6dfe 100644 --- a/drivers/perf/dwc_pcie_pmu.c +++ b/drivers/perf/dwc_pcie_pmu.c @@ -107,6 +107,7 @@ struct dwc_pcie_vendor_id { static const struct dwc_pcie_vendor_id dwc_pcie_vendor_ids[] = { {.vendor_id = PCI_VENDOR_ID_ALIBABA }, + {.vendor_id = PCI_VENDOR_ID_QCOM }, {} /* terminator */ }; @@ -556,10 +557,10 @@ static int dwc_pcie_register_dev(struct pci_dev *pdev) { struct platform_device *plat_dev; struct dwc_pcie_dev_info *dev_info; - u32 bdf; + u32 sbdf; - bdf = PCI_DEVID(pdev->bus->number, pdev->devfn); - plat_dev = platform_device_register_data(NULL, "dwc_pcie_pmu", bdf, + sbdf = (pci_domain_nr(pdev->bus) << 16) | PCI_DEVID(pdev->bus->number, pdev->devfn); + plat_dev = platform_device_register_data(NULL, "dwc_pcie_pmu", sbdf, pdev, sizeof(*pdev)); if (IS_ERR(plat_dev)) @@ -611,15 +612,15 @@ static int dwc_pcie_pmu_probe(struct platform_device *plat_dev) struct pci_dev *pdev = plat_dev->dev.platform_data; struct dwc_pcie_pmu *pcie_pmu; char *name; - u32 bdf, val; + u32 sbdf, val; u16 vsec; int ret; vsec = pci_find_vsec_capability(pdev, pdev->vendor, DWC_PCIE_VSEC_RAS_DES_ID); pci_read_config_dword(pdev, vsec + PCI_VNDR_HEADER, &val); - bdf = PCI_DEVID(pdev->bus->number, pdev->devfn); - name = devm_kasprintf(&plat_dev->dev, GFP_KERNEL, "dwc_rootport_%x", bdf); + sbdf = plat_dev->id; + name = devm_kasprintf(&plat_dev->dev, GFP_KERNEL, "dwc_rootport_%x", sbdf); if (!name) return -ENOMEM; @@ -650,7 +651,7 @@ static int dwc_pcie_pmu_probe(struct platform_device *plat_dev) ret = cpuhp_state_add_instance(dwc_pcie_pmu_hp_state, &pcie_pmu->cpuhp_node); if (ret) { - pci_err(pdev, "Error %d registering hotplug @%x\n", ret, bdf); + pci_err(pdev, "Error %d registering hotplug @%x\n", ret, sbdf); return ret; } @@ -663,7 +664,7 @@ static int dwc_pcie_pmu_probe(struct platform_device *plat_dev) ret = perf_pmu_register(&pcie_pmu->pmu, name, -1); if (ret) { - pci_err(pdev, "Error %d registering PMU @%x\n", ret, bdf); + pci_err(pdev, "Error %d registering PMU @%x\n", ret, sbdf); return ret; } ret = devm_add_action_or_reset(&plat_dev->dev, dwc_pcie_unregister_pmu, @@ -726,7 +727,6 @@ static struct platform_driver dwc_pcie_pmu_driver = { static int __init dwc_pcie_pmu_init(void) { struct pci_dev *pdev = NULL; - bool found = false; int ret; for_each_pci_dev(pdev) { @@ -738,11 +738,7 @@ static int __init dwc_pcie_pmu_init(void) pci_dev_put(pdev); return ret; } - - found = true; } - if (!found) - return -ENODEV; ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, "perf/dwc_pcie_pmu:online", diff --git a/drivers/perf/hisilicon/hisi_pcie_pmu.c b/drivers/perf/hisilicon/hisi_pcie_pmu.c index f06027574a24..c5394d007b61 100644 --- a/drivers/perf/hisilicon/hisi_pcie_pmu.c +++ b/drivers/perf/hisilicon/hisi_pcie_pmu.c @@ -141,6 +141,22 @@ static ssize_t bus_show(struct device *dev, struct device_attribute *attr, char } static DEVICE_ATTR_RO(bus); +static ssize_t bdf_min_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(dev_get_drvdata(dev)); + + return sysfs_emit(buf, "%#04x\n", pcie_pmu->bdf_min); +} +static DEVICE_ATTR_RO(bdf_min); + +static ssize_t bdf_max_show(struct device *dev, struct device_attribute *attr, char *buf) +{ + struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(dev_get_drvdata(dev)); + + return sysfs_emit(buf, "%#04x\n", pcie_pmu->bdf_max); +} +static DEVICE_ATTR_RO(bdf_max); + static struct hisi_pcie_reg_pair hisi_pcie_parse_reg_value(struct hisi_pcie_pmu *pcie_pmu, u32 reg_off) { @@ -208,7 +224,7 @@ static void hisi_pcie_pmu_writeq(struct hisi_pcie_pmu *pcie_pmu, u32 reg_offset, static u64 hisi_pcie_pmu_get_event_ctrl_val(struct perf_event *event) { u64 port, trig_len, thr_len, len_mode; - u64 reg = HISI_PCIE_INIT_SET; + u64 reg = 0; /* Config HISI_PCIE_EVENT_CTRL according to event. */ reg |= FIELD_PREP(HISI_PCIE_EVENT_M, hisi_pcie_get_real_event(event)); @@ -452,10 +468,24 @@ static void hisi_pcie_pmu_set_period(struct perf_event *event) struct hisi_pcie_pmu *pcie_pmu = to_pcie_pmu(event->pmu); struct hw_perf_event *hwc = &event->hw; int idx = hwc->idx; + u64 orig_cnt, cnt; + + orig_cnt = hisi_pcie_pmu_read_counter(event); local64_set(&hwc->prev_count, HISI_PCIE_INIT_VAL); hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_CNT, idx, HISI_PCIE_INIT_VAL); hisi_pcie_pmu_writeq(pcie_pmu, HISI_PCIE_EXT_CNT, idx, HISI_PCIE_INIT_VAL); + + /* + * The counter maybe unwritable if the target event is unsupported. + * Check this by comparing the counts after setting the period. If + * the counts stay unchanged after setting the period then update + * the hwc->prev_count correctly. Otherwise the final counts user + * get maybe totally wrong. + */ + cnt = hisi_pcie_pmu_read_counter(event); + if (orig_cnt == cnt) + local64_set(&hwc->prev_count, cnt); } static void hisi_pcie_pmu_enable_counter(struct hisi_pcie_pmu *pcie_pmu, struct hw_perf_event *hwc) @@ -749,6 +779,8 @@ static const struct attribute_group hisi_pcie_pmu_format_group = { static struct attribute *hisi_pcie_pmu_bus_attrs[] = { &dev_attr_bus.attr, + &dev_attr_bdf_max.attr, + &dev_attr_bdf_min.attr, NULL }; diff --git a/include/kvm/arm_pmu.h b/include/kvm/arm_pmu.h index 35d4ca4f6122..e08aeec5d936 100644 --- a/include/kvm/arm_pmu.h +++ b/include/kvm/arm_pmu.h @@ -10,7 +10,7 @@ #include #include -#define ARMV8_PMU_CYCLE_IDX (ARMV8_PMU_MAX_COUNTERS - 1) +#define KVM_ARMV8_PMU_MAX_COUNTERS 32 #if IS_ENABLED(CONFIG_HW_PERF_EVENTS) && IS_ENABLED(CONFIG_KVM) struct kvm_pmc { @@ -19,14 +19,14 @@ struct kvm_pmc { }; struct kvm_pmu_events { - u32 events_host; - u32 events_guest; + u64 events_host; + u64 events_guest; }; struct kvm_pmu { struct irq_work overflow_work; struct kvm_pmu_events events; - struct kvm_pmc pmc[ARMV8_PMU_MAX_COUNTERS]; + struct kvm_pmc pmc[KVM_ARMV8_PMU_MAX_COUNTERS]; int irq_num; bool created; bool irq_level; diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h index b3b34f6670cf..4b5b83677e3f 100644 --- a/include/linux/perf/arm_pmu.h +++ b/include/linux/perf/arm_pmu.h @@ -17,10 +17,14 @@ #ifdef CONFIG_ARM_PMU /* - * The ARMv7 CPU PMU supports up to 32 event counters. + * The Armv7 and Armv8.8 or less CPU PMU supports up to 32 event counters. + * The Armv8.9/9.4 CPU PMU supports up to 33 event counters. */ +#ifdef CONFIG_ARM #define ARMPMU_MAX_HWEVENTS 32 - +#else +#define ARMPMU_MAX_HWEVENTS 33 +#endif /* * ARM PMU hw_event flags */ @@ -96,7 +100,7 @@ struct arm_pmu { void (*stop)(struct arm_pmu *); void (*reset)(void *); int (*map_event)(struct perf_event *event); - int num_events; + DECLARE_BITMAP(cntr_mask, ARMPMU_MAX_HWEVENTS); bool secure_access; /* 32-bit ARM only */ #define ARMV8_PMUV3_MAX_COMMON_EVENTS 0x40 DECLARE_BITMAP(pmceid_bitmap, ARMV8_PMUV3_MAX_COMMON_EVENTS); diff --git a/include/linux/perf/arm_pmuv3.h b/include/linux/perf/arm_pmuv3.h index 7867db04ec98..3372c1b56486 100644 --- a/include/linux/perf/arm_pmuv3.h +++ b/include/linux/perf/arm_pmuv3.h @@ -6,8 +6,9 @@ #ifndef __PERF_ARM_PMUV3_H #define __PERF_ARM_PMUV3_H -#define ARMV8_PMU_MAX_COUNTERS 32 -#define ARMV8_PMU_COUNTER_MASK (ARMV8_PMU_MAX_COUNTERS - 1) +#define ARMV8_PMU_MAX_GENERAL_COUNTERS 31 +#define ARMV8_PMU_CYCLE_IDX 31 +#define ARMV8_PMU_INSTR_IDX 32 /* Not accessible from AArch32 */ /* * Common architectural and microarchitectural event numbers. @@ -227,8 +228,10 @@ */ #define ARMV8_PMU_OVSR_P GENMASK(30, 0) #define ARMV8_PMU_OVSR_C BIT(31) +#define ARMV8_PMU_OVSR_F BIT_ULL(32) /* arm64 only */ /* Mask for writable bits is both P and C fields */ -#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C) +#define ARMV8_PMU_OVERFLOWED_MASK (ARMV8_PMU_OVSR_P | ARMV8_PMU_OVSR_C | \ + ARMV8_PMU_OVSR_F) /* * PMXEVTYPER: Event selection reg diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index 1a8942277dda..e336306b8c08 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -1602,13 +1602,7 @@ static inline int perf_is_paranoid(void) return sysctl_perf_event_paranoid > -1; } -static inline int perf_allow_kernel(struct perf_event_attr *attr) -{ - if (sysctl_perf_event_paranoid > 1 && !perfmon_capable()) - return -EACCES; - - return security_perf_event_open(attr, PERF_SECURITY_KERNEL); -} +int perf_allow_kernel(struct perf_event_attr *attr); static inline int perf_allow_cpu(struct perf_event_attr *attr) { diff --git a/kernel/events/core.c b/kernel/events/core.c index aa3450bdc227..ae7d63c0c593 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -13351,6 +13351,15 @@ const struct perf_event_attr *perf_event_attrs(struct perf_event *event) return &event->attr; } +int perf_allow_kernel(struct perf_event_attr *attr) +{ + if (sysctl_perf_event_paranoid > 1 && !perfmon_capable()) + return -EACCES; + + return security_perf_event_open(attr, PERF_SECURITY_KERNEL); +} +EXPORT_SYMBOL_GPL(perf_allow_kernel); + /* * Inherit an event from parent task to child task. *