mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-23 20:53:53 +08:00
Commit Graph

2027 Commits

Author SHA1 Message Date
Srinivas Pandruvada
1443ebbacf cpufreq: intel_pstate: Fix sysfs limits enforcement for performance policy
A side effect of keeping intel_pstate sysfs limits in sync with cpufreq
is that the sysfs limits can no longer be enforced under the performance
policy.

For example, if the max_perf_pct is changed from 100 to 80, this will call
intel_pstate_set_policy(), which will change the max_perf to 100 again for
performance policy. The same issue occurs when no_turbo is set.

This change calculates max and min frequency using sysfs performance
limits in intel_pstate_verify_policy() and adjusts policy limits by
calling cpufreq_verify_within_limits().

Also, it causes the setting of performance limits to be skipped if
no_turbo is set.
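
The clamping described above can be sketched as follows. This is a simplified model, not the driver's code: the function names are hypothetical, frequencies are in kHz, and verify_within_limits() only imitates the role that cpufreq_verify_within_limits() plays in the commit.

```python
def verify_within_limits(policy_min, policy_max, lo, hi):
    """Clamp a (min, max) policy pair into the range [lo, hi] (kHz)."""
    policy_min = min(max(policy_min, lo), hi)
    policy_max = min(max(policy_max, lo), hi)
    # Keep the pair ordered after clamping.
    if policy_min > policy_max:
        policy_min = policy_max
    return policy_min, policy_max


def verify_policy(policy_min, policy_max, cpuinfo_max,
                  min_perf_pct, max_perf_pct):
    """Derive frequency bounds from the sysfs percentages, then clamp.

    Assumption: the bounds are plain percentages of cpuinfo_max.
    """
    hi = cpuinfo_max * max_perf_pct // 100
    lo = cpuinfo_max * min_perf_pct // 100
    return verify_within_limits(policy_min, policy_max, lo, hi)
```

With max_perf_pct lowered to 80, a performance policy that asks for the full range is held to 80% of the maximum frequency instead of being reset to 100%.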

Fixes: 111b8b3fe4 (cpufreq: intel_pstate: Always keep all limits settings in sync)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-20 03:35:27 +01:00
Hoan Tran
e11b6293a8 cpufreq: dt: Add support for APM X-Gene 2
Add the compatible string for supporting the generic device tree cpufreq-dt
driver on APM's X-Gene 2 SoC.

Signed-off-by: Hoan Tran <hotran@apm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2017-01-05 00:27:51 +01:00
Rafael J. Wysocki
111b8b3fe4 cpufreq: intel_pstate: Always keep all limits settings in sync
Make intel_pstate update per-logical-CPU limits when the global
settings are changed to ensure that they are always in sync and
users will not see confusing values in per-logical-CPU sysfs
attributes.

This also fixes the problem that setting the "no_turbo" global
attribute to 1 in the "passive" mode (ie. when intel_pstate acts
as a regular cpufreq driver) when scaling_governor is set to
"performance" has no effect.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-12-31 21:48:44 +01:00
Rafael J. Wysocki
cad3046796 cpufreq: intel_pstate: Use locking in intel_cpufreq_verify_policy()
Race conditions are possible if intel_cpufreq_verify_policy()
is executed in parallel with global limits updates from sysfs,
so the invocation of intel_pstate_update_perf_limits() in it
should be carried out under intel_pstate_limits_lock.

Make that happen.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-12-31 21:48:43 +01:00
Rafael J. Wysocki
aa439248ab cpufreq: intel_pstate: Use locking in intel_pstate_resume()
Theoretically, intel_pstate_resume() may be executed in parallel
with intel_pstate_set_policy(), if the latter is invoked via
cpufreq_update_policy() as a result of a notification, so use
intel_pstate_limits_lock in there too to avoid race conditions.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-12-31 21:48:42 +01:00
Rafael J. Wysocki
366430b5c2 cpufreq: intel_pstate: Do not expose PID parameters in passive mode
If intel_pstate works in the passive mode in which it acts as
a regular cpufreq driver and collaborates with generic cpufreq
governors, the PID parameters are not used, so do not expose
them via debugfs in that case.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-27 03:30:11 +01:00
Arnd Bergmann
adec57c61c cpufreq: s3c64xx: remove incorrect __init annotation
s3c64xx_cpufreq_config_regulator is incorrectly annotated
as __init, since the caller is also not init:

WARNING: vmlinux.o(.text+0x92fe1c): Section mismatch in reference from the function s3c64xx_cpufreq_driver_init() to the function .init.text:s3c64xx_cpufreq_config_regulator()

With modern gcc versions, the function gets inlined, so we don't
see the warning; this only happens with gcc-4.6 and older.

Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-21 02:54:18 +01:00
Boris Ostrovsky
2a8fa123d9 cpufreq: Remove CPU hotplug callbacks only if they were initialized
Since CPU hotplug callbacks are requested for CPUHP_AP_ONLINE_DYN state,
successful callback initialization will result in cpuhp_setup_state()
returning a positive value. Therefore acpi_cpufreq_online being zero
indicates that callbacks have not been installed.

This means that acpi_cpufreq_boost_exit() should only remove them if
acpi_cpufreq_online is positive. Trying to call
cpuhp_remove_state_nocalls(0) will cause a BUG().
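
The guard can be illustrated with a toy model. Everything here is hypothetical (the HotplugRegistry class and its methods are stand-ins invented for the sketch, not kernel APIs); it only shows why a zero state must never be passed to the removal path.

```python
class HotplugRegistry:
    """Toy stand-in for dynamically allocated hotplug states."""

    def __init__(self):
        self._next_state = 1          # dynamic states are always positive
        self._installed = set()

    def setup_dynamic_state(self):
        """Install callbacks and return the positive state number."""
        state = self._next_state
        self._next_state += 1
        self._installed.add(state)
        return state

    def remove_state(self, state):
        """Remove callbacks; removing an unknown state is a fatal bug."""
        if state not in self._installed:
            raise RuntimeError("removing a state that was never set up")
        self._installed.remove(state)


def boost_exit(registry, online_state):
    """Remove the callbacks only if they were actually installed."""
    if online_state > 0:
        registry.remove_state(online_state)
        return True
    return False
```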

Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-21 02:52:52 +01:00
Srinivas Pandruvada
984edbdccc cpufreq: intel_pstate: Support for energy performance hints with HWP
It is possible to provide hints to the HWP algorithms in the processor
to be more performance-centric or more energy-centric. These hints are
provided by using HWP energy performance preference (EPP) or energy
performance bias (EPB) settings.

The scope of these settings is per logical processor, which means that
each of the logical processors in the package can be programmed with a
different value.

This change provides a cpufreq sysfs interface for providing hints. For
each policy, two additional attributes will be available to check and
provide the hint. These attributes will only be present when the
intel_pstate driver is using HWP mode.

These attributes are:
 - energy_performance_available_preferences
 - energy_performance_preference

To get the list of supported hints:
$ cat energy_performance_available_preferences
default performance balance_performance balance_power power

The current preference can be read or changed via the cpufreq sysfs
attribute "energy_performance_preference". Reading from this attribute
displays the current effective setting, regardless of the method used to
change it. The user can write any of the valid preference strings to
this attribute, and can always restore the power-on default by writing
"default".

Implementation
Since these hints can be provided by a direct MSR write or by tools
like x86_energy_perf_policy, the driver doesn't maintain any state
internally. A user operation results in a direct read/write of MSR 0x774
(HWP_REQUEST_MSR). The driver uses read-modify-write so that the other
fields in this MSR are preserved.
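
The read-modify-write step can be sketched on a plain integer standing in for the MSR value. The field position is an assumption not stated in this commit message (EPP taken to occupy bits 31:24 of the register), and the helper names are hypothetical.

```python
HWP_REQUEST_MSR = 0x774      # MSR address named in the commit message
EPP_SHIFT = 24               # assumption: EPP occupies bits 31:24
EPP_MASK = 0xFF << EPP_SHIFT


def get_epp(hwp_request):
    """Extract the EPP field from a raw HWP request value."""
    return (hwp_request & EPP_MASK) >> EPP_SHIFT


def set_epp(hwp_request, epp):
    """Return the raw value with only the EPP field replaced."""
    if not 0 <= epp <= 0xFF:
        raise ValueError("EPP must fit in 8 bits")
    # Clear the old EPP bits, keep every other field, insert the new EPP.
    return (hwp_request & ~EPP_MASK) | (epp << EPP_SHIFT)
```

Because only the masked bits change, the remaining fields of the request survive the update, which is the point of doing a read-modify-write rather than a blind write.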

Summary of changes:
 - struct cpudata field epp_saved is renamed to epp_powersave, as it
   stores the original powersave EPP value to restore once the policy
   is switched from performance back to powersave.
 - A new struct cpudata field epp_saved is used to store the raw MSR
   EPP/EPB value when a CPU goes offline or on suspend, and to restore
   it on online/resume. This ensures that the EPP value is restored to
   the correct value irrespective of the means used to set it.
 - EPP/EPB value ranges are fixed for each preference that can be
   set via cpufreq sysfs, so a user request is mapped to/from this
   range.
 - New attributes are only added when HWP is present.
 - Since an EPP value of 0 is valid, the fields are initialized to
   -EINVAL when not valid. The field epp_default is read only once
   after powerup to avoid reading it on subsequent CPU online operations.
 - New suspend callback to store the EPP on suspend.
 - Don't invalidate the old epp_saved field on resume and online, as now
   we can restore the last EPP value on suspend and this field can still
   hold the old EPP value sampled during the switch to performance from
   powersave.
 - While here, optimized the setting of cpu_data->epp_powersave = epp in
   intel_pstate_hwp_set(), as this was done in both the true and false
   paths.
 - The epp/epb set function returns an error to the caller on failure,
   to be passed on to user space for display.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-08 01:43:05 +01:00
Srinivas Pandruvada
b59fe54053 cpufreq: intel_pstate: Add locking around HWP requests
To avoid race conditions from multiple threads, increase the scope
of intel_pstate_limits_lock to include HWP requests also.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-08 01:43:04 +01:00
Chen Yu
4dd63b49a7 cpufreq: ondemand: Set MIN_FREQUENCY_UP_THRESHOLD to 1
Currently the minimal up_threshold is 11, and users may want to
use a smaller minimal up_threshold for performance tuning,
so MIN_FREQUENCY_UP_THRESHOLD could be set to 1 because:

1. Current systems wouldn't be affected as they already have
   a value >= 11.
2. New systems with a default kernel would still keep the default
   value, which is >= 11.

Users now have the advantage that they can make their own decisions
and customize the 'trip point' to switch to the max frequency.
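
To see what the 'trip point' controls, here is a simplified model of the decision. It assumes the governor jumps to the maximum frequency once load exceeds up_threshold and otherwise scales linearly between the limits; the function name is hypothetical.

```python
MIN_FREQUENCY_UP_THRESHOLD = 1   # new lower bound from this commit


def ondemand_next_freq(load, up_threshold, min_freq, max_freq):
    """Pick the next frequency (kHz) for a load percentage of 0..100."""
    if up_threshold < MIN_FREQUENCY_UP_THRESHOLD:
        raise ValueError("up_threshold below the allowed minimum")
    if load > up_threshold:
        return max_freq              # past the trip point: go to max
    # Below the trip point: scale proportionally to load (assumption).
    return min_freq + load * (max_freq - min_freq) // 100
```

With a threshold of 95, a 50% load selects a mid-range frequency; with a threshold of 1, the same load already selects the maximum.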

Link: https://bugzilla.kernel.org/show_bug.cgi?id=65501
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-01 22:43:33 +01:00
Piotr Luc
58bf454272 cpufreq: intel_pstate: Add Knights Mill CPUID
Add Knights Mill (KNM) to the list of CPUIDs supported by intel_pstate.

Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-12-01 15:08:12 +01:00
Baoyou Xie
ab83805667 cpufreq: dt: Add support for zx296718
Add the compatible string for supporting the generic cpufreq driver on
ZTE's zx296718 SoC.

Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-30 22:42:47 +01:00
Sebastian Andrzej Siewior
a3605c46e0 cpufreq: acpi-cpufreq: drop rdmsr_on_cpus() usage
The online / pre_down callback is invoked on the target CPU since commit
1cf4f629d9 ("cpu/hotplug: Move online calls to hotplugged cpu") which means
for the hotplug callback we can use rdmsrl() instead of rdmsr_on_cpus().

This leaves us with set_boost() as the only user which still needs to
read/write the MSR on different CPUs. There is no point in doing that
update on all CPUs with the read-modify-write magic via per-CPU data. We
can simply issue a function call on all online CPUs, which also means that
we need half as many IPIs.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-28 14:31:06 +01:00
Sebastian Andrzej Siewior
4d66ddf28d cpufreq: acpi-cpufreq: Convert to hotplug state machine
Install the callbacks via the state machine.

Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-28 14:31:06 +01:00
Arnd Bergmann
7a3ba767f6 cpufreq: intel_pstate: fix intel_pstate_exit_perf_limits() prototype
The addition of the generic governor support marked
intel_pstate_exit_perf_limits() as inline, which fixed one warning
but introduced another:

drivers/cpufreq/intel_pstate.c: In function ‘intel_pstate_exit_perf_limits’:
drivers/cpufreq/intel_pstate.c:483:1: error: no return statement in function returning non-void [-Werror=return-type]

This changes it back to a 'void' return type, and changes the
corresponding intel_pstate_init_acpi_perf_limits() function to
be inline as well for consistency.

Fixes: 001c76f05b (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-28 14:24:21 +01:00
Srinivas Pandruvada
8442885fca cpufreq: intel_pstate: Set EPP/EPB to 0 in performance mode
When the user has selected the performance policy, set the EPP (Energy
Performance Preference) or EPB (Energy Performance Bias) to maximum
performance mode.

Also, when the user switches back to powersave, restore EPP/EPB to the
last value in effect before entering performance mode. If the user has
not changed EPP/EPB manually, this will be the power-on default value.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-28 14:23:56 +01:00
Srinivas Pandruvada
d5dd33d9de cpufreq: intel_pstate: increase precision of performance limits
Even with limits->min_perf and limits->max_perf rounded up, in some
cases the resultant performance is 100 MHz less than desired.

For example when the maximum frequency is 3.50 GHz, setting
scaling_min_frequency to 2.3 GHz always results in 2.2 GHz minimum.

Currently the fixed-point arithmetic uses 8-bit precision for
calculating limits->min_perf and limits->max_perf, while some other
operations in this driver use 14-bit precision. Using 14-bit precision
for calculating limits->min_perf and limits->max_perf as well addresses
this issue.

Introduce fp_ext_toint(), equivalent to fp_toint(), and int_ext_tofp(),
equivalent to int_tofp(), with 14-bit precision.
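
The effect of the extra precision can be reproduced with a small fixed-point sketch. It is a simplified model under stated assumptions: P-states are in units of 100 MHz (so 3.50 GHz is 35), the minimum is stored as a truncated fraction of 100%, and the driver's additional round-up step is left out. The helper names are hypothetical.

```python
def div_fp(x, y, frac_bits):
    """Fixed-point division x / y, truncated to frac_bits of fraction."""
    return (x << frac_bits) // y


def mul_fp(x, y, frac_bits):
    """Multiply two fixed-point values with frac_bits of fraction."""
    return (x * y) >> frac_bits


def min_pstate(max_pstate, min_pct, frac_bits):
    """Scale max_pstate by min_pct percent using fixed-point math."""
    min_perf = div_fp(min_pct, 100, frac_bits)        # fraction of 1.0
    scaled = mul_fp(max_pstate << frac_bits, min_perf, frac_bits)
    return scaled >> frac_bits                        # back to an integer


# 2.3 GHz out of 3.5 GHz, with the percentage rounded up: 66%.
MIN_PCT = -(-2300000 * 100 // 3500000)
```

In this model, min_pstate(35, MIN_PCT, 8) yields 22 (2.2 GHz) because 66/100 truncates to 168/256, while min_pstate(35, MIN_PCT, 14) yields 23 (2.3 GHz), matching the example in the commit message.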

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-22 02:31:49 +01:00
Srinivas Pandruvada
46992d6b55 cpufreq: intel_pstate: round up min_perf limits
In some use cases, the user wants to enforce a minimum performance limit
on CPUs, but because of simple (truncating) division the resultant
performance is 100 MHz less than desired in some cases.

For example, when the maximum frequency is 3.50 GHz, setting
scaling_min_frequency to 1.6 GHz always results in a 1.5 GHz minimum.
With simple rounding up, the minimum frequency can be set to 1.6 GHz in
this case. This rounding up is already done for max_policy_pct and
max_perf, so do the same for min_policy_pct and min_perf.
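
The 1.6 GHz example can be worked through numerically. This sketch assumes 8-bit fixed-point limits, P-states in units of 100 MHz, and a round-up of min_perf to the next multiple of FRAC_BITS; those details and all names are modelling assumptions, not taken from the commit message.

```python
FRAC_BITS = 8


def div_round_up(n, d):
    """Integer division rounding toward positive infinity."""
    return -(-n // d)


def round_up(x, multiple):
    """Round x up to the next multiple of 'multiple'."""
    return div_round_up(x, multiple) * multiple


def min_pstate_for_policy(max_pstate, policy_min_khz, cpuinfo_max_khz,
                          round_limits):
    """Translate a minimum frequency request into a minimum P-state."""
    if round_limits:
        pct = div_round_up(policy_min_khz * 100, cpuinfo_max_khz)
    else:
        pct = policy_min_khz * 100 // cpuinfo_max_khz   # truncates
    min_perf = (pct << FRAC_BITS) // 100                # fixed-point fraction
    if round_limits:
        min_perf = round_up(min_perf, FRAC_BITS)
    return (max_pstate * min_perf) >> FRAC_BITS
```

Without rounding, 1.6 GHz of 3.5 GHz becomes 45% and then P-state 15 (1.5 GHz); with rounding it becomes 46% and P-state 16 (1.6 GHz).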

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-22 02:31:48 +01:00
Rafael J. Wysocki
30248feff5 cpufreq: Make cpufreq_update_policy() void
The return value of cpufreq_update_policy() is never used, so make
it void.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-11-21 14:35:43 +01:00
Rafael J. Wysocki
182e36af06 cpufreq: Avoid using inactive policies
There are two places in the cpufreq core in which low-level driver
callbacks may be invoked for an inactive cpufreq policy, which isn't
guaranteed to work in general.  Both are due to possible races with
CPU offline.

First, in cpufreq_get(), the policy may become inactive after
the check against policy->cpus in cpufreq_cpu_get() and before
policy->rwsem is acquired, in which case using it going forward may
not be correct.

Second, an analogous situation is possible in cpufreq_update_policy().

Avoid using inactive policies by adding policy_is_inactive() checks
to the code in the above places.
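
The pattern is "re-check after taking the lock". A minimal sketch, with a hypothetical Policy class standing in for the real structure and a plain lock standing in for policy->rwsem:

```python
import threading


class Policy:
    """Toy cpufreq policy: inactive once its CPU set is empty."""

    def __init__(self, cpus):
        self.cpus = set(cpus)
        self.rwsem = threading.Lock()

    def is_inactive(self):
        return not self.cpus


def get_frequency(policy, driver_get):
    """Call the driver only if the policy is still active under the lock."""
    with policy.rwsem:
        # The policy may have gone inactive between lookup and locking.
        if policy.is_inactive():
            return 0
        return driver_get(policy)
```

The driver callback is never reached for an inactive policy, even if the policy was active when it was first looked up.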

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
2016-11-21 14:35:42 +01:00
Rafael J. Wysocki
001c76f05b cpufreq: intel_pstate: Generic governors support
There may be reasons to use generic cpufreq governors (eg. schedutil)
on Intel platforms instead of the intel_pstate driver's internal
governor.  However, that currently can only be done by disabling
intel_pstate altogether and using the acpi-cpufreq driver instead
of it, which is subject to limitations.

First of all, acpi-cpufreq only works on systems where the _PSS
object is present in the ACPI tables for all logical CPUs.  Second,
on those systems acpi-cpufreq will only use frequencies listed by
_PSS which may be suboptimal.  In particular, by convention, the
whole turbo range is represented in _PSS as a single P-state and
the frequency assigned to it is greater by 1 MHz than the greatest
non-turbo frequency listed by _PSS.  That may mislead governors into
using turbo frequencies less often, which may lead to suboptimal
performance.

For this reason, make it possible to use the intel_pstate driver
with generic cpufreq governors as a "normal" cpufreq driver.  That
mode is enforced by adding intel_pstate=passive to the kernel
command line and cannot be disabled at run time.  In that mode,
intel_pstate provides a cpufreq driver interface including
the ->target() and ->fast_switch() callbacks and is listed in
scaling_driver as "intel_cpufreq".

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Doug Smythies <dsmythies@telus.net>
2016-11-21 14:32:32 +01:00
Rafael J. Wysocki
d0ea59e188 cpufreq: intel_pstate: Request P-states control from SMM if needed
Currently, intel_pstate is unable to control P-states on my
IvyBridge-based Acer Aspire S5, because they are controlled by SMM
on that machine by default and it is necessary to request OS control
of P-states from it via the SMI Command register exposed in the ACPI
FADT.  intel_pstate doesn't do that now, but acpi-cpufreq and other
cpufreq drivers for x86 platforms do.

Address this problem by making intel_pstate use the ACPI-defined
mechanism as well.  However, intel_pstate is not modular and it
doesn't need the module refcount tricks played by
acpi_processor_notify_smm(), so export the core of this function
to it as acpi_processor_pstate_control() and make it call that.
[The changes in processor_perflib.c related to this should not
make any functional difference for the acpi_processor_notify_smm()
users].

To be safe, only call acpi_processor_pstate_control() from intel_pstate
if ACPI _PPC support is enabled in it.

Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-11-17 22:47:47 +01:00
Geert Uytterhoeven
f0da898b46 cpufreq: dt: Add support for r8a7743 and r8a7745
Add the compatible strings for supporting the generic cpufreq driver on
the Renesas RZ/G1M (r8a7743) and RZ/G1E (r8a7745) SoCs.

Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-16 23:31:52 +01:00
Denis Kirjanov
8a10c06a20 cpufreq: powernv: Disable preemption while checking CPU throttling state
With preemption turned on, we can read an incorrect throttling state
if the task is switched to a CPU on a different chip.

 BUG: using smp_processor_id() in preemptible [00000000] code: cat/7343
 caller is .powernv_cpufreq_throttle_check+0x2c/0x710
 CPU: 13 PID: 7343 Comm: cat Not tainted 4.8.0-rc5-dirty #1
 Call Trace:
 [c0000007d25b75b0] [c000000000971378] .dump_stack+0xe4/0x150 (unreliable)
 [c0000007d25b7640] [c0000000005162e4] .check_preemption_disabled+0x134/0x150
 [c0000007d25b76e0] [c0000000007b63ac] .powernv_cpufreq_throttle_check+0x2c/0x710
 [c0000007d25b7790] [c0000000007b6d18] .powernv_cpufreq_target_index+0x288/0x360
 [c0000007d25b7870] [c0000000007acee4] .__cpufreq_driver_target+0x394/0x8c0
 [c0000007d25b7920] [c0000000007b22ac] .cpufreq_set+0x7c/0xd0
 [c0000007d25b79b0] [c0000000007adf50] .store_scaling_setspeed+0x80/0xc0
 [c0000007d25b7a40] [c0000000007ae270] .store+0xa0/0x100
 [c0000007d25b7ae0] [c0000000003566e8] .sysfs_kf_write+0x88/0xb0
 [c0000007d25b7b70] [c0000000003553b8] .kernfs_fop_write+0x178/0x260
 [c0000007d25b7c10] [c0000000002ac3cc] .__vfs_write+0x3c/0x1c0
 [c0000007d25b7cf0] [c0000000002ad584] .vfs_write+0xc4/0x230
 [c0000007d25b7d90] [c0000000002aeef8] .SyS_write+0x58/0x100
 [c0000007d25b7e30] [c00000000000bfec] system_call+0x38/0xfc

Fixes: 09a972d162 (cpufreq: powernv: Report cpu frequency throttling)
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Denis Kirjanov <kda@linux-powerpc.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-16 23:29:59 +01:00
Stratos Karafotis
42d951c851 cpufreq: conservative: Fix comment explaining frequency updates
The original comment about the frequency increase to maximum is wrong.

Both increase and decrease happen in steps.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-16 23:15:56 +01:00
Stratos Karafotis
00bfe05889 cpufreq: conservative: Decrease frequency faster for deferred updates
Conservative governor changes the CPU frequency in steps.
That means that if a CPU runs at max frequency, it will need several
sampling periods to return to min frequency when the workload
is finished.

If the update function that calculates the load and target frequency
is deferred, the governor might need even more time to decrease the
frequency.

This may have an impact on power consumption; after all, conservative
should decrease the frequency at every sampling period in which there
is no workload.

To resolve the above issue, calculate the number of sampling periods
for which the update is deferred. Since conservative should drop the
frequency by one freq_step for each sampling period in which the CPU
was idle, apply the proper subtraction to the requested frequency.
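
The subtraction can be sketched as follows. The function and parameter names are hypothetical, frequencies are in kHz, and the floor at the policy minimum is an assumption about how an oversized decrease is handled.

```python
def apply_deferred_decrease(requested_freq, policy_min, freq_step,
                            idle_periods):
    """Drop one freq_step per sampling period that was skipped while idle."""
    freq_steps = idle_periods * freq_step
    if requested_freq > policy_min + freq_steps:
        return requested_freq - freq_steps
    # The accumulated decrease would undershoot: settle at the minimum.
    return policy_min
```

For instance, three missed periods with a 170 MHz step take a 3401000 kHz request down by 510000 kHz in a single update rather than over three separate ones.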

Below is the kernel trace with and without this patch. First, an
intensive workload is applied on a specific CPU. Then the workload
is removed and the CPU goes idle.

WITHOUT

     <idle>-0     [007] dN..   620.329153: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556   [007] ....   620.350857: cpu_frequency: state=1700000 cpu_id=7
kworker/7:2-556   [007] ....   620.370856: cpu_frequency: state=1900000 cpu_id=7
kworker/7:2-556   [007] ....   620.390854: cpu_frequency: state=2100000 cpu_id=7
kworker/7:2-556   [007] ....   620.411853: cpu_frequency: state=2200000 cpu_id=7
kworker/7:2-556   [007] ....   620.432854: cpu_frequency: state=2400000 cpu_id=7
kworker/7:2-556   [007] ....   620.453854: cpu_frequency: state=2600000 cpu_id=7
kworker/7:2-556   [007] ....   620.494856: cpu_frequency: state=2900000 cpu_id=7
kworker/7:2-556   [007] ....   620.515856: cpu_frequency: state=3100000 cpu_id=7
kworker/7:2-556   [007] ....   620.536858: cpu_frequency: state=3300000 cpu_id=7
kworker/7:2-556   [007] ....   620.557857: cpu_frequency: state=3401000 cpu_id=7
     <idle>-0     [007] d...   669.591363: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   669.591939: cpu_idle: state=4294967295 cpu_id=7
     <idle>-0     [007] d...   669.591980: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] dN..   669.591989: cpu_idle: state=4294967295 cpu_id=7
...
     <idle>-0     [007] d...   670.201224: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   670.221975: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556   [007] ....   670.222016: cpu_frequency: state=3300000 cpu_id=7
     <idle>-0     [007] d...   670.222026: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   670.234964: cpu_idle: state=4294967295 cpu_id=7
...
     <idle>-0     [007] d...   670.801251: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   671.236046: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556   [007] ....   671.236073: cpu_frequency: state=3100000 cpu_id=7
     <idle>-0     [007] d...   671.236112: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   671.393437: cpu_idle: state=4294967295 cpu_id=7
...
     <idle>-0     [007] d...   671.401277: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   671.404083: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556   [007] ....   671.404111: cpu_frequency: state=2900000 cpu_id=7
     <idle>-0     [007] d...   671.404125: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   671.404974: cpu_idle: state=4294967295 cpu_id=7
...
     <idle>-0     [007] d...   671.501180: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   671.995414: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556   [007] ....   671.995459: cpu_frequency: state=2800000 cpu_id=7
     <idle>-0     [007] d...   671.995469: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   671.996287: cpu_idle: state=4294967295 cpu_id=7
...
     <idle>-0     [007] d...   672.001305: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   672.078374: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556   [007] ....   672.078410: cpu_frequency: state=2600000 cpu_id=7
     <idle>-0     [007] d...   672.078419: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   672.158020: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556   [007] ....   672.158040: cpu_frequency: state=2400000 cpu_id=7
     <idle>-0     [007] d...   672.158044: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   672.160038: cpu_idle: state=4294967295 cpu_id=7
...
     <idle>-0     [007] d...   672.234557: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   672.237121: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556   [007] ....   672.237174: cpu_frequency: state=2100000 cpu_id=7
     <idle>-0     [007] d...   672.237186: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   672.237778: cpu_idle: state=4294967295 cpu_id=7
...
     <idle>-0     [007] d...   672.267902: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   672.269860: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556   [007] ....   672.269906: cpu_frequency: state=1900000 cpu_id=7
     <idle>-0     [007] d...   672.269914: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   672.271902: cpu_idle: state=4294967295 cpu_id=7
...
     <idle>-0     [007] d...   672.751342: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...   672.823056: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-556   [007] ....   672.823095: cpu_frequency: state=1600000 cpu_id=7

WITH

     <idle>-0     [007] dN..  4380.928009: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007] ....  4380.949767: cpu_frequency: state=2000000 cpu_id=7
kworker/7:2-399   [007] ....  4380.969765: cpu_frequency: state=2200000 cpu_id=7
kworker/7:2-399   [007] ....  4381.009766: cpu_frequency: state=2500000 cpu_id=7
kworker/7:2-399   [007] ....  4381.029767: cpu_frequency: state=2600000 cpu_id=7
kworker/7:2-399   [007] ....  4381.049769: cpu_frequency: state=2800000 cpu_id=7
kworker/7:2-399   [007] ....  4381.069769: cpu_frequency: state=3000000 cpu_id=7
kworker/7:2-399   [007] ....  4381.089771: cpu_frequency: state=3100000 cpu_id=7
kworker/7:2-399   [007] ....  4381.109772: cpu_frequency: state=3400000 cpu_id=7
kworker/7:2-399   [007] ....  4381.129773: cpu_frequency: state=3401000 cpu_id=7
     <idle>-0     [007] d...  4428.226159: cpu_idle: state=1 cpu_id=7
     <idle>-0     [007] d...  4428.226176: cpu_idle: state=4294967295 cpu_id=7
     <idle>-0     [007] d...  4428.226181: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...  4428.227177: cpu_idle: state=4294967295 cpu_id=7
...
     <idle>-0     [007] d...  4428.551640: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...  4428.649239: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007] ....  4428.649268: cpu_frequency: state=2800000 cpu_id=7
     <idle>-0     [007] d...  4428.649278: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...  4428.689856: cpu_idle: state=4294967295 cpu_id=7
...
     <idle>-0     [007] d...  4428.799542: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...  4428.801683: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007] ....  4428.801748: cpu_frequency: state=1700000 cpu_id=7
     <idle>-0     [007] d...  4428.801761: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...  4428.806545: cpu_idle: state=4294967295 cpu_id=7
...
     <idle>-0     [007] d...  4429.051880: cpu_idle: state=4 cpu_id=7
     <idle>-0     [007] d...  4429.086240: cpu_idle: state=4294967295 cpu_id=7
kworker/7:2-399   [007] ....  4429.086293: cpu_frequency: state=1600000 cpu_id=7

Without the patch, the CPU dropped to min frequency after 3.2s.
With the patch applied, the CPU dropped to min frequency after 0.86s.

Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-16 23:15:56 +01:00
Viresh Kumar
d5f905a93c cpufreq: conservative: Rename get_freq_target() to get_freq_step()
What's returned from this function is the delta by which the frequency
must be increased or decreased and not the final frequency that should
be selected.

Name it properly to match its purpose. Also update the variables used to
store that value.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-14 21:34:52 +01:00
Akshay Adiga
c9a81e6864 cpufreq: powernv: Fix uninitialized lpstate_idx in gpstates_timer_handler()
lpstate_idx remains uninitialized in the case when elapsed_time
is greater than MAX_RAMP_DOWN_TIME.  At the end of rampdown the
global pstate should be equal to the local pstate.

Fixes: 20b15b7663 (cpufreq: powernv: Use PMCR to verify global and local pstate)
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-14 21:32:31 +01:00
Srinivas Pandruvada
7f7a516ee3 cpufreq: intel_pstate: Use CPU load based algorithm for PM_MOBILE
Use get_target_pstate_use_cpu_load() to calculate the target P-State for
devices with the preferred power management profile in ACPI FADT
set to PM_MOBILE.

This may help in resolving some thermal issues caused by low sustained
CPU-bound workloads. The current algorithm tends to over-provision in
this case, as it doesn't look at CPU busyness.

Also included is the fix from Arnd Bergmann <arnd@arndb.de> to solve a
compile issue when CONFIG_ACPI is not defined.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-14 21:25:23 +01:00
Robert Jarzmik
dcd2ea410d cpufreq: pxa: use generic platdev driver for device-tree
For device-tree based pxa25x and pxa27x platforms, cpufreq-dt driver is
doing the job as well as pxa2xx-cpufreq, so add these platforms to the
compatibility list.

This won't work for legacy non device-tree platforms where
pxa2xx-cpufreq is still required.

Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-11 02:08:42 +01:00
Markus Mayer
ee7930ee27 cpufreq: stats: New sysfs attribute for clearing statistics
Allow CPUfreq statistics to be cleared by writing anything to
/sys/.../cpufreq/stats/reset.
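
A minimal usage sketch, assuming the caller supplies the full path to the reset attribute (the path is elided in the commit message and is not filled in here). The function name is hypothetical, and writing to sysfs normally requires sufficient privileges.

```python
def reset_cpufreq_stats(reset_path):
    """Clear cpufreq statistics by writing to the given reset attribute.

    Per the commit message, the written content does not matter.
    """
    with open(reset_path, "w") as attr:
        attr.write("1\n")
```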

Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-11 01:51:11 +01:00
Viresh Kumar
26f0dbc9ab cpufreq: governor: Don't use 'timer' keyword
The earlier implementation of governors used background timers, so
functions, mutexes, etc. had the 'timer' keyword in their names.

But that's not true anymore. Replace 'timer' with 'update', as those
functions and variables are based around updates to frequency.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-11 01:48:33 +01:00
Akshay Adiga
20b15b7663 cpufreq: powernv: Use PMCR to verify global and local pstate
As fast_switch() may get called with interrupts disabled, we cannot
hold a mutex to update the global_pstate_info. So currently, fast_switch()
does not update the global_pstate_info and it will end up with stale data
whenever pstate is updated through fast_switch().

As the gpstate_timer can fire after fast_switch() has updated the pstates,
the timer handler cannot rely on the cached values of local and global
pstate and needs to read it from the PMCR.

Only gpstate_timer_handler() is affected by the stale cached pstate data
because either fast_switch() or target_index() routines will be called
for a given governor, but gpstate_timer can fire after the governor has
changed to schedutil.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-11 01:41:02 +01:00
Akshay Adiga
60c9efb8f7 cpufreq: powernv: Adding fast_switch for schedutil
Add fast_switch, which performs a lightweight operation to set the desired
pstate. Both global and local pstates are set to the same desired pstate.

Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-11 01:41:02 +01:00
Wei Yongjun
e7d040b8a2 cpufreq: brcmstb-avs-cpufreq: make symbol brcm_avs_cpufreq_attr static
Fixes the following sparse warning:

drivers/cpufreq/brcmstb-avs-cpufreq.c:982:18: warning:
 symbol 'brcm_avs_cpufreq_attr' was not declared. Should it be static?

Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Markus Mayer <mmayer@broadcom.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-11 01:32:53 +01:00
Srinivas Pandruvada
a410c03d66 cpufreq: intel_pstate: protect limits variable
The limits variable gets modified from intel_pstate sysfs and also from
cpufreq sysfs. So protect it with a mutex to preserve data integrity
when it is modified from multiple threads.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-01 06:10:54 +01:00
Markus Mayer
33de45c133 cpufreq: brcmstb-avs-cpufreq: add debugfs support
In order to aid debugging, we add a debugfs interface to the driver
that allows direct interaction with the AVS co-processor.

The debugfs interface provides a means for reading all and writing some
of the mailbox registers directly from the shell prompt and enables a
user to execute the communications protocol between ARM CPU and AVS CPU
step-by-step.

This interface should be used for debugging purposes only.

Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-01 06:07:38 +01:00
Markus Mayer
de322e0859 cpufreq: brcmstb-avs-cpufreq: AVS CPUfreq driver for Broadcom STB SoCs
This driver supports voltage and frequency scaling on Broadcom STB SoCs
using AVS firmware with DFS and DVFS support.

Actual frequency or voltage scaling is done exclusively by the AVS
firmware. The driver merely provides a standard CPUfreq interface to
other kernel components and userland, and instructs the AVS firmware to
perform frequency or voltage changes on its behalf.

Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-01 06:07:37 +01:00
Masahiro Yamada
1758b3374b cpufreq: dt: add Socionext UniPhier SoCs support
Add compatible strings for Pro5, PXs2, LD6b, LD11, LD20 SoCs to use
the generic cpufreq driver.

Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-01 06:05:42 +01:00
Srinivas Pandruvada
5879f87739 cpufreq: intel_pstate: Reduce impact due to rounding error
When policy->max and policy->min are the same, in some cases they don't
result in the same frequency cap. The max_policy_pct is rounded up but
min_perf_pct is not. So even when the two limits are the same, they
result in different percentages for maximum and minimum.
Since the minimum is a conservative value for power, a lower value
without rounding is better in most cases, unless the user wants
policy->max = policy->min.
This change uses the same policy percentage when policy->max and
policy->min are the same.
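
The mismatch can be reproduced with plain integer arithmetic. The
Python sketch below is a model of the rounding described above, not the
driver code: the function names and the example frequencies are
assumptions chosen for illustration.

```python
def pct_round_up(freq_khz, cpu_max_khz):
    # Percentage of the CPU maximum, rounded up (ceiling division).
    return (freq_khz * 100 + cpu_max_khz - 1) // cpu_max_khz


def pct_truncate(freq_khz, cpu_max_khz):
    # Percentage of the CPU maximum, truncated (no rounding).
    return freq_khz * 100 // cpu_max_khz


def policy_pcts(policy_min_khz, policy_max_khz, cpu_max_khz):
    """Return (min_pct, max_pct) for a policy.

    The maximum is rounded up and the minimum is truncated, except when
    both limits are equal, in which case the same percentage is used
    for both so that they map to the same frequency cap.
    """
    max_pct = pct_round_up(policy_max_khz, cpu_max_khz)
    if policy_min_khz == policy_max_khz:
        min_pct = max_pct
    else:
        min_pct = pct_truncate(policy_min_khz, cpu_max_khz)
    return min_pct, max_pct
```

With a hypothetical 3300000 kHz maximum and both limits at 2000000 kHz,
rounding up gives 61 while truncation gives 60; policy_pcts() returns
61 for both.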

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-01 06:04:06 +01:00
Srinivas Pandruvada
eae48f046f cpufreq: intel_pstate: Per CPU P-State limits
Intel P-State offers two interfaces to set performance limits:
- Intel P-State sysfs
	/sys/devices/system/cpu/intel_pstate/max_perf_pct
	/sys/devices/system/cpu/intel_pstate/min_perf_pct
- cpufreq
	/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
	/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq

In the current implementation, both of the above methods change limits
for every CPU in the system. Moreover, the limits placed using the
cpufreq policy interface are also presented in the Intel P-State sysfs
via modified max_perf_pct and min_perf_pct during sysfs reads. This
allows checking the percentage of reduced/increased performance,
irrespective of the method used to set the limit.

There are some new generations of processors where it is possible to
have limits placed on individual CPU cores. Using the cpufreq interface
it is possible to set limits on each CPU. But the current processing
will apply the last limits placed to all CPUs. So the per-core limit
feature of these CPUs can't be used.

This change brings in the capability to set P-State limits for each CPU,
with some limitations. In this case, what should a read of
max_perf_pct and min_perf_pct return? It could be the most restrictive
limit placed on any CPU, or the maximum possible performance on any
given CPU on which no limits are placed. In either case someone will
have an issue.
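
The ambiguity is easy to see with a toy example. In the Python sketch
below, the per-CPU percentages and both function names are made up for
illustration; neither function reflects what the driver reports.

```python
def most_restrictive_pct(per_cpu_max_pct):
    # Candidate 1: report the tightest limit placed on any CPU.
    return min(per_cpu_max_pct.values())


def least_restrictive_pct(per_cpu_max_pct):
    # Candidate 2: report the highest performance available on any CPU.
    return max(per_cpu_max_pct.values())


# Hypothetical limits: CPU 0 is unrestricted, CPUs 1 and 2 are capped.
example_limits = {0: 100, 1: 60, 2: 80}
```

For example_limits the two candidates give 60 and 100, and neither
value describes CPU 2.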

So the consensus is that we can't have both sysfs controls present when
the user wants to use per-core limits.
- By default the per-core-control feature is not enabled, so no one will
notice any difference.
- It is enabled with the kernel command line option
intel_pstate=per_cpu_perf_limits
- When per-core controls are enabled, the following are not available
for either read or write:
	/sys/devices/system/cpu/intel_pstate/max_perf_pct
	/sys/devices/system/cpu/intel_pstate/min_perf_pct
- User can change limits using
	/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
	/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
	/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
- User can still observe turbo percent and number of P-States from
	/sys/devices/system/cpu/intel_pstate/turbo_pct
	/sys/devices/system/cpu/intel_pstate/num_pstates
- User can read and write the system-wide turbo status
	/sys/devices/system/cpu/intel_pstate/no_turbo
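
A minimal user-space sketch of setting limits on one CPU through the
cpufreq attributes listed above, written in Python. The helper names
and the root parameter (which lets the code run against a scratch
directory) are assumptions for illustration. The scaling_*_freq values
are in kHz, and writing to the real files normally needs root
privileges.

```python
from pathlib import Path

SYSFS_CPU_ROOT = Path("/sys/devices/system/cpu")


def scaling_attr_path(cpu, attr, root=SYSFS_CPU_ROOT):
    # Build .../cpu<N>/cpufreq/<attr> for one logical CPU.
    return Path(root) / f"cpu{cpu}" / "cpufreq" / attr


def set_cpu_freq_limits(cpu, min_khz, max_khz, root=SYSFS_CPU_ROOT):
    """Write scaling_max_freq and scaling_min_freq for a single CPU."""
    if min_khz > max_khz:
        raise ValueError("min_khz must not exceed max_khz")
    scaling_attr_path(cpu, "scaling_max_freq", root).write_text(f"{max_khz}\n")
    scaling_attr_path(cpu, "scaling_min_freq", root).write_text(f"{min_khz}\n")
```

For example, set_cpu_freq_limits(1, 800000, 2000000) would cap cpu1
only, leaving the other CPUs at their own limits when the per-core
feature is enabled.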

While making this change, BUG_ON is changed to WARN_ON, as these are not
fatal errors for the system.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-01 06:04:06 +01:00
Linus Walleij
ae8b8d8f86 cpufreq: retire the Integrator cpufreq driver
After switching the core module clocks controlling the Integrator
clock frequencies to the common clock framework, defining the
operating points in the device tree, and activating the generic
DT-based CPUfreq driver, we can retire the old Integrator
cpufreq driver.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-01 06:01:18 +01:00
Linus Walleij
650ec6cfe3 cpufreq: enable the DT cpufreq driver on the Integrators
This enables the generic DT and OPP-based cpufreq driver on the
ARM Integrator/AP and Integrator/CP.

Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-11-01 06:01:18 +01:00
Rafael J. Wysocki
fe0f59c412 Merge back earlier cpufreq material for v4.10.
2016-10-30 06:12:50 +01:00
Rafael J. Wysocki
2f1d407ada cpufreq: intel_pstate: Always set max P-state in performance mode
The only times at which intel_pstate checks the policy set for
a given CPU are the initialization of that CPU and updates of its
policy settings from cpufreq when intel_pstate_set_policy() is
invoked.

That is insufficient, however, because intel_pstate uses the same
P-state selection function for all CPUs regardless of the policy
setting for each of them and the P-state limits are shared between
them.  Thus if the policy is set to "performance" for a particular
CPU, it may not behave as expected if the cpufreq settings are
changed subsequently for another CPU.

That can be easily demonstrated by writing "performance" to
scaling_governor for all CPUs and then switching it to "powersave"
for one of them in which case all of the CPUs will behave as though
their scaling_governor were all "powersave" (even though the policy
still appears to be "performance" for the remaining CPUs).

Fix this problem by modifying intel_pstate_adjust_busy_pstate() to
always set the P-state to the maximum allowed by the current limits
for all CPUs whose policy is set to "performance".
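
The intended behavior can be modeled in a few lines of Python. This is
a toy model under stated assumptions, not the body of
intel_pstate_adjust_busy_pstate(): the function name next_pstate, its
parameters and the string policy values are all hypothetical.

```python
def next_pstate(policy, computed_pstate, min_pstate, max_pstate):
    """Pick the P-state to request for one CPU.

    A CPU whose policy is "performance" always gets the maximum allowed
    by the current limits. Any other policy gets the computed target,
    clamped to the [min_pstate, max_pstate] range.
    """
    if policy == "performance":
        return max_pstate
    return max(min_pstate, min(computed_pstate, max_pstate))
```

With shared limits of 8 and 30, a "performance" CPU gets 30 regardless
of the computed target, while a "powersave" CPU with a computed target
of 10 gets 10.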

Note that it still is recommended to always change the policy setting
in the same way for all CPUs even with this fix applied to avoid
confusion.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-24 23:20:25 +02:00
Rafael J. Wysocki
a6c6ead141 cpufreq: intel_pstate: Set P-state upfront in performance mode
After commit a4675fbc4a (cpufreq: intel_pstate: Replace timers with
utilization update callbacks) the cpufreq governor callbacks may not
be invoked on NOHZ_FULL CPUs and, in particular, switching to the
"performance" policy via sysfs may not have any effect on them.  That
is a problem, because it usually is desirable to squeeze the last
bit of performance out of those CPUs, so work around it by setting
the maximum P-state (within the limits) in intel_pstate_set_policy()
upfront when the policy is CPUFREQ_POLICY_PERFORMANCE.

Fixes: a4675fbc4a (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-10-21 22:18:22 +02:00
Srinivas Pandruvada
185d82456e cpufreq: intel_pstate: Remove PID debugfs when not used
When the target state is calculated using get_target_pstate_use_cpu_load(),
the PID controller is not used, hence it has no effect on performance.
So don't present debugfs entries to tune the PID controller.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-21 22:16:26 +02:00
Rafael J. Wysocki
1d29815ef2 cpufreq: intel_pstate: Drop boost_iowait flag
The "IOwait boosting" mechanism is only used by the
get_target_pstate_use_cpu_load() governor function and the
boost_iowait flag in pid_params is always set when that function
is in use (and it is never set otherwise).  This means that the
boost_iowait flag is in fact redundant and may be dropped.

For this reason, replace the boost_iowait flag check in
intel_pstate_update_util() with an equivalent check against
pstate_funcs.get_target_pstate and drop that flag.
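
The idea of replacing a redundant flag with a check on the selected
callback can be sketched in Python. Everything here is a stand-in: the
two callbacks return a dummy value, other_target_pstate is a
hypothetical alternative, and uses_iowait_boost is an illustrative
name, not a kernel function.

```python
def get_target_pstate_use_cpu_load(cpu):
    # Stand-in for the load-based P-state selection callback.
    return 0


def other_target_pstate(cpu):
    # Hypothetical alternative callback, for illustration only.
    return 0


def uses_iowait_boost(get_target_pstate):
    """Return True when IOwait boosting applies.

    The flag would be set exactly when the load-based callback is in
    use, so comparing the callback itself carries the same information.
    """
    return get_target_pstate is get_target_pstate_use_cpu_load
```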

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2016-10-21 22:13:51 +02:00
Prakash, Prashanth
974f86498e cpufreq / CPPC: Add MODULE_DEVICE_TABLE for cppc_cpufreq driver
MODULE_DEVICE_TABLE is added so that the CPPC cpufreq module can be
automatically loaded when we have an ACPI processor device with the
"ACPI0007" HID.

Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2016-10-21 15:11:23 +02:00