A side effect of keeping intel_pstate sysfs limits in sync with cpufreq
is that the now sysfs limits can't enforced under performance policy.
For example, if the max_perf_pct is changed from 100 to 80, this will call
intel_pstate_set_policy(), which will change the max_perf to 100 again for
performance policy. Same issue happens, when no_turbo is set.
This change calculates max and min frequency using sysfs performance
limits in intel_pstate_verify_policy() and adjusts policy limits by
calling cpufreq_verify_within_limits().
Also, it causes the setting of performance limits to be skipped if
no_turbo is set.
Fixes: 111b8b3fe4 (cpufreq: intel_pstate: Always keep all limits settings in sync)
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add the compatible string for supporting the generic device tree cpufreq-dt
driver on APM's X-Gene 2 SoC.
Signed-off-by: Hoan Tran <hotran@apm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Make intel_pstate update per-logical-CPU limits when the global
settings are changed to ensure that they are always in sync and
users will not see confusing values in per-logical-CPU sysfs
attributes.
This also fixes the problem that setting the "no_turbo" global
attribute to 1 in the "passive" mode (ie. when intel_pstate acts
as a regular cpufreq driver) when scaling_governor is set to
"performance" has no effect.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Race conditions are possible if intel_cpufreq_verify_policy()
is executed in parallel with global limits updates from sysfs,
so the invocation of intel_pstate_update_perf_limits() in it
should be carried out under intel_pstate_limits_lock.
Make that happen.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Theoretically, intel_pstate_resume() may be executed in parallel
with intel_pstate_set_policy(), if the latter is invoked via
cpufreq_update_policy() as a result of a notification, so use
intel_pstate_limits_lock in there too to avoid race conditions.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
If intel_pstate works in the passive mode in which it acts as
a regular cpufreq driver and collaborates with generic cpufreq
governors, the PID parameters are not used, so do not expose
them via debugfs in that case.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
s3c64xx_cpufreq_config_regulator is incorrectly annotated
as __init, since the caller is also not init:
WARNING: vmlinux.o(.text+0x92fe1c): Section mismatch in reference from the function s3c64xx_cpufreq_driver_init() to the function .init.text:s3c64xx_cpufreq_config_regulator()
With modern gcc versions, the function gets inline, so we don't
see the warning, this only happens with gcc-4.6 and older.
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Krzysztof Kozlowski <krzk@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Since CPU hotplug callbacks are requested for CPUHP_AP_ONLINE_DYN state,
successful callback initialization will result in cpuhp_setup_state()
returning a positive value. Therefore acpi_cpufreq_online being zero
indicates that callbacks have not been installed.
This means that acpi_cpufreq_boost_exit() should only remove them if
acpi_cpufreq_online is positive. Trying to call
cpuhp_remove_state_nocalls(0) will cause a BUG().
Signed-off-by: Boris Ostrovsky <boris.ostrovsky@oracle.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It is possible to provide hints to the HWP algorithms in the processor
to be more performance centric to more energy centric. These hints are
provided by using HWP energy performance preference (EPP) or energy
performance bias (EPB) settings.
The scope of these settings is per logical processor, which means that
each of the logical processors in the package can be programmed with a
different value.
This change provides cpufreq sysfs interface to provide hint. For each
policy, two additional attributes will be available to check and provide
hint. These attributes will only be present when the intel_pstate driver
is using HWP mode.
These attributes are:
- energy_performance_available_preferences
- energy_performance_preference
To get list of supported hints:
$ cat energy_performance_available_preferences
default performance balance_performance balance_power power
The current preference can be read or changed via cpufreq sysfs
attribute "energy_performance_preference". Reading from this attribute
will display current effective setting changed via any method. User can
write any of the valid preference string to this attribute. User can
always restore to power-on default by writing "default".
Implementation
Since these hints can be provided by direct MSR write or using some tools
like x86_energy_perf_policy, the driver internally doesn't maintain any
state. The user operation will result in direct read/write of MSR: 0x774
(HWP_REQUEST_MSR). Also driver use read modify write to update other
fields in this MSR.
Summary of changes:
- struct cpudata field epp_saved is renamed to epp_powersave, as this
stores the value to restore once policy is switched from performance
to powersave to restore original powersave EPP value.
- A new struct cpudata field epp_saved is used to store the raw MSR
EPP/EPB value when a CPU goes offline or on suspend and restore on
online/resume. This ensures that EPP value is restored to correct
value irrespective of the means used to set.
- EPP/EPB value ranges are fixed for each preference, which can be
set for the cpufreq sysfs, so user request is mapped to/from this
range.
- New attributes are only added when HWP is present.
- Since EPP value of 0 is valid the fields are initialized to
-EINVAL when not valid. The field epp_default is read only once
after powerup to avoid reading on subsequent CPU online operation
- New suspend callback to store epp on suspend operation
- Don't invalidate old epp_saved field on resume and online as now
we can restore last epp value on suspend and this field can still
have old EPP value sampled during switch to performance from
powersave.
- While here optimized setting of cpu_data->epp_powersave = epp in
intel_pstate_hwp_set() as this was done in both true and false
paths.
- epp/epb set function returns error to caller on failure to pass
on to user space for display.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
To avoid race conditions from multiple threads, increase the scope
of intel_pstate_limits_lock to include HWP requests also.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently the minimal up_threshold is 11, and user may want to
use a smaller minimal up_threshold for performance tuning,
so MIN_FREQUENCY_UP_THRESHOLD could be set to 1 because:
1. Current systems wouldn't be affected as they have already
a value >= 11.
2. New systems with a default kernel would keep still the default
value that is >= 11.
Users now have the advantage that they can make their own decisions
and customize the 'trip point' to switch to the max frequency.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=65501
Signed-off-by: Chen Yu <yu.c.chen@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add Knights Mill (KNM) to the list of CPUIDs supported by intel_pstate.
Signed-off-by: Piotr Luc <piotr.luc@intel.com>
Reviewed-by: Dave Hansen <dave.hansen@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add the compatible string for supporting the generic cpufreq driver on
the ZTE's zx296718 SoC.
Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The online / pre_down callback is invoked on the target CPU since commit
1cf4f629d9 ("cpu/hotplug: Move online calls to hotplugged cpu") which means
for the hotplug callback we can use rmdsrl() instead of rdmsr_on_cpus().
This leaves us with set_boost() as the only user which still needs to
read/write the MSR on different CPUs. There is no point in doing that
update on all cpus with the read modify write magic via per cpu data. We
simply can issue a function call on all online CPUs which also means that we
need half that many IPIs.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Install the callbacks via the state machine.
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The addition of the generic governor support marked the
intel_pstate_exit_perf_limits as inline(), which fixed a warning,
but it introduced another warning:
drivers/cpufreq/intel_pstate.c: In function ‘intel_pstate_exit_perf_limits’:
drivers/cpufreq/intel_pstate.c:483:1: error: no return statement in function returning non-void [-Werror=return-type]
This changes it back to a 'void' return type, and changes the
corresponding intel_pstate_init_acpi_perf_limits() function to
be inline as well for consistency.
Fixes: 001c76f05b (cpufreq: intel_pstate: Generic governors support)
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When user has selected performance policy, then set the EPP (Energy
Performance Preference) or EPB (Energy Performance Bias) to maximum
performance mode.
Also when user switch back to powersave, then restore EPP/EPB to last
EPP/EPB value before entering performance mode. If user has not changed
EPP/EPB manually then it will be power on default value.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Even with round up of limits->min_perf and limits->max_perf, in some
cases resultant performance is 100 MHz less than the desired.
For example when the maximum frequency is 3.50 GHz, setting
scaling_min_frequency to 2.3 GHz always results in 2.2 GHz minimum.
Currently the fixed floating point operation uses 8 bit precision for
calculating limits->min_perf and limits->max_perf. For some operations
in this driver the 14 bit precision is used. Using the 14 bit precision
also for calculating limits->min_perf and limits->max_perf, addresses
this issue.
Introduced fp_ext_toint() equivalent to fp_toint() and int_ext_tofp()
equivalent to int_tofp() with 14 bit precision.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In some use cases, user wants to enforce a minimum performance limit on
CPUs. But because of simple division the resultant performance is 100 MHz
less than the desired in some cases.
For example when the maximum frequency is 3.50 GHz, setting
scaling_min_frequency to 1.6 GHz always results in 1.5 GHz minimum. With
simple round up, the frequency can be set to 1.6 GHz to minimum in this
case. This round up is already done to max_policy_pct and max_perf, so do
the same for min_policy_pct and min_perf.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The return value of cpufreq_update_policy() is never used, so make
it void.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
There are two places in the cpufreq core in which low-level driver
callbacks may be invoked for an inactive cpufreq policy, which isn't
guaranteed to work in general. Both are due to possible races with
CPU offline.
First, in cpufreq_get(), the policy may become inactive after
the check against policy->cpus in cpufreq_cpu_get() and before
policy->rwsem is acquired, in which case using it going forward may
not be correct.
Second, an analogous situation is possible in cpufreq_update_policy().
Avoid using inactive policies by adding policy_is_inactive() checks
to the code in the above places.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
There may be reasons to use generic cpufreq governors (eg. schedutil)
on Intel platforms instead of the intel_pstate driver's internal
governor. However, that currently can only be done by disabling
intel_pstate altogether and using the acpi-cpufreq driver instead
of it, which is subject to limitations.
First of all, acpi-cpufreq only works on systems where the _PSS
object is present in the ACPI tables for all logical CPUs. Second,
on those systems acpi-cpufreq will only use frequencies listed by
_PSS which may be suboptimal. In particular, by convention, the
whole turbo range is represented in _PSS as a single P-state and
the frequency assigned to it is greater by 1 MHz than the greatest
non-turbo frequency listed by _PSS. That may confuse governors to
use turbo frequencies less frequently which may lead to suboptimal
performance.
For this reason, make it possible to use the intel_pstate driver
with generic cpufreq governors as a "normal" cpufreq driver. That
mode is enforced by adding intel_pstate=passive to the kernel
command line and cannot be disabled at run time. In that mode,
intel_pstate provides a cpufreq driver interface including
the ->target() and ->fast_switch() callbacks and is listed in
scaling_driver as "intel_cpufreq".
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Doug Smythies <dsmythies@telus.net>
Currently, intel_pstate is unable to control P-states on my
IvyBridge-based Acer Aspire S5, because they are controlled by SMM
on that machine by default and it is necessary to request OS control
of P-states from it via the SMI Command register exposed in the ACPI
FADT. intel_pstate doesn't do that now, but acpi-cpufreq and other
cpufreq drivers for x86 platforms do.
Address this problem by making intel_pstate use the ACPI-defined
mechanism as well. However, intel_pstate is not modular and it
doesn't need the module refcount tricks played by
acpi_processor_notify_smm(), so export the core of this function
to it as acpi_processor_pstate_control() and make it call that.
[The changes in processor_perflib.c related to this should not
make any functional difference for the acpi_processor_notify_smm()
users].
To be safe, only call acpi_processor_notify_smm() from intel_pstate
if ACPI _PPC support is enabled in it.
Suggested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Add the compatible strings for supporting the generic cpufreq driver on
the Renesas RZ/G1M (r8a7743) and RZ/G1E (r8a7745) SoCs.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Simon Horman <horms+renesas@verge.net.au>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The original comment about the frequency increase to maximum is wrong.
Both increase and decrease happen at steps.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
What's returned from this function is the delta by which the frequency
must be increased or decreased and not the final frequency that should
be selected.
Name it properly to match its purpose. Also update the variables used to
store that value.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
lpstate_idx remains uninitialized in the case when elapsed_time
is greater than MAX_RAMP_DOWN_TIME. At the end of rampdown the
global pstate should be equal to the local pstate.
Fixes: 20b15b7663 (cpufreq: powernv: Use PMCR to verify global and localpstate)
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use get_target_pstate_use_cpu_load() to calculate target P-State for
devices, with the preferred power management profile in ACPI FADT
set to PM_MOBILE.
This may help in resolving some thermal issues caused by low sustained
cpu bound workloads. The current algorithm tend to over provision in this
case as it doesn't look at the CPU busyness.
Also included the fix from Arnd Bergmann <arnd@arndb.de> to solve compile
issue, when CONFIG_ACPI is not defined.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
For device-tree based pxa25x and pxa27x platforms, cpufreq-dt driver is
doing the job as well as pxa2xx-cpufreq, so add these platforms to the
compatibility list.
This won't work for legacy non device-tree platforms where
pxa2xx-cpufreq is still required.
Signed-off-by: Robert Jarzmik <robert.jarzmik@free.fr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Allow CPUfreq statistics to be cleared by writing anything to
/sys/.../cpufreq/stats/reset.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The earlier implementation of governors used background timers and so
functions, mutex, etc had 'timer' keyword in their names.
But that's not true anymore. Replace 'timer' with 'update', as those
functions, variables are based around updates to frequency.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
As fast_switch() may get called with interrupt disable mode, we cannot
hold a mutex to update the global_pstate_info. So currently, fast_switch()
does not update the global_pstate_info and it will end up with stale data
whenever pstate is updated through fast_switch().
As the gpstate_timer can fire after fast_switch() has updated the pstates,
the timer handler cannot rely on the cached values of local and global
pstate and needs to read it from the PMCR.
Only gpstate_timer_handler() is affected by the stale cached pstate data
beacause either fast_switch() or target_index() routines will be called
for a given govenor, but gpstate_timer can fire after the governor has
changed to schedutil.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Adding fast_switch which does light weight operation to set the desired
pstate. Both global and local pstates are set to the same desired pstate.
Signed-off-by: Akshay Adiga <akshay.adiga@linux.vnet.ibm.com>
Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fixes the following sparse warning:
drivers/cpufreq/brcmstb-avs-cpufreq.c:982:18: warning:
symbol 'brcm_avs_cpufreq_attr' was not declared. Should it be static?
Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
Acked-by: Markus Mayer <mmayer@broadcom.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The limits variable gets modified from intel_pstate sysfs and also gets
modified from cpufreq sysfs. So protect with a mutex to keep data
integrity, when they are getting modified from multiple threads.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In order to aid debugging, we add a debugfs interface to the driver
that allows direct interaction with the AVS co-processor.
The debugfs interface provides a means for reading all and writing some
of the mailbox registers directly from the shell prompt and enables a
user to execute the communications protocol between ARM CPU and AVS CPU
step-by-step.
This interface should be used for debugging purposes only.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This driver supports voltage and frequency scaling on Broadcom STB SoCs
using AVS firmware with DFS and DVFS support.
Actual frequency or voltage scaling is done exclusively by the AVS
firmware. The driver merely provides a standard CPUfreq interface to
other kernel components and userland, and instructs the AVS firmware to
perform frequency or voltage changes on its behalf.
Signed-off-by: Markus Mayer <mmayer@broadcom.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add compatible strings for Pro5, PXs2, LD6b, LD11, LD20 SoCs to use
the generic cpufreq driver.
Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When policy->max and policy->min are same, in some cases they don't
result in the same frequency cap. The max_policy_pct is rounded up but
not min_perf_pct. So even when they are same, results in different
percentage or maximum and minimum.
Since minimum is a conservative value for power, a lower value without
rounding is better in most of the cases, unless user wants
policy->max = policy->min.
This change uses use the same policy percentage when policy->max and
policy->min are same.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Intel P-State offers two interface to set performance limits:
- Intel P-State sysfs
/sys/devices/system/cpu/intel_pstate/max_perf_pct
/sys/devices/system/cpu/intel_pstate/min_perf_pct
- cpufreq
/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
In the current implementation both of the above methods, change limits
to every CPU in the system. Moreover the limits placed using cpufreq
policy interface also presented in the Intel P-State sysfs via modified
max_perf_pct and min_per_pct during sysfs reads. This allows to check
percent of reduced/increased performance, irrespective of method used to
limit.
There are some new generations of processors, where it is possible to
have limits placed on individual CPU cores. Using cpufreq interface it
is possible to set limits on each CPU. But the current processing will
use last limits placed on all CPUs. So the per core limit feature of
CPUs can't be used.
This change brings in capability to set P-States limits for each CPU,
with some limitations. In this case what should be the read of
max_perf_pct and min_perf_pct? It can be most restrictive limits placed
on any CPU or max possible performance on any given CPU on which no
limits are placed. In either case someone will have issue.
So the consensus is, we can't have both sysfs controls present when user
wants to use limit per core limits.
- By default per-core-control feature is not enabled. So no one will
notice any difference.
- The way to enable is by kernel command line
intel_pstate=per_cpu_perf_limits
- When the per-core-controls are enabled there is no display of for both
read and write on
/sys/devices/system/cpu/intel_pstate/max_perf_pct
/sys/devices/system/cpu/intel_pstate/min_perf_pct
- User can change limits using
/sys/devices/system/cpu/cpu*/cpufreq/scaling_max_freq
/sys/devices/system/cpu/cpu*/cpufreq/scaling_min_freq
/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
- User can still observe turbo percent and number of P-States from
/sys/devices/system/cpu/intel_pstate/turbo_pct
/sys/devices/system/cpu/intel_pstate/num_pstates
- User can read write system wide turbo status
/sys/devices/system/cpu/no_turbo
While changing this BUG_ON is changed to WARN_ON, as they are not fatal
errors for the system.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After switching the core module clocks controlling the Integrator
clock frequencies to the common clock framework, defining the
operating points in the device tree, and activating the generic
DT-based CPUfreq driver, we can retire the old Integrator
cpufreq driver.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This enables the generic DT and OPP-based cpufreq driver on the
ARM Integrator/AP and Integrator/CP.
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The only times at which intel_pstate checks the policy set for
a given CPU is the initialization of that CPU and updates of its
policy settings from cpufreq when intel_pstate_set_policy() is
invoked.
That is insufficient, however, because intel_pstate uses the same
P-state selection function for all CPUs regardless of the policy
setting for each of them and the P-state limits are shared between
them. Thus if the policy is set to "performance" for a particular
CPU, it may not behave as expected if the cpufreq settings are
changed subsequently for another CPU.
That can be easily demonstrated by writing "performance" to
scaling_governor for all CPUs and then switching it to "powersave"
for one of them in which case all of the CPUs will behave as though
their scaling_governor were all "powersave" (even though the policy
still appears to be "performance" for the remaining CPUs).
Fix this problem by modifying intel_pstate_adjust_busy_pstate() to
always set the P-state to the maximum allowed by the current limits
for all CPUs whose policy is set to "performance".
Note that it still is recommended to always change the policy setting
in the same way for all CPUs even with this fix applied to avoid
confusion.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
After commit a4675fbc4a (cpufreq: intel_pstate: Replace timers with
utilization update callbacks) the cpufreq governor callbacks may not
be invoked on NOHZ_FULL CPUs and, in particular, switching to the
"performance" policy via sysfs may not have any effect on them. That
is a problem, because it usually is desirable to squeeze the last
bit of performance out of those CPUs, so work around it by setting
the maximum P-state (within the limits) in intel_pstate_set_policy()
upfront when the policy is CPUFREQ_POLICY_PERFORMANCE.
Fixes: a4675fbc4a (cpufreq: intel_pstate: Replace timers with utilization update callbacks)
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
When target state is calculated using get_target_pstate_use_cpu_load(),
PID controller is not used, hence it has no effect on performance.
So don't present debugfs entries to tune PID controller.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The "IOwait boosting" mechanism is only used by the
get_target_pstate_use_cpu_load() governor function and the
boost_iowait flag in pid_params is always set when that function
is in use (and it is never set otherwise). This means that the
boost_iowait flag is in fact redundant and may be dropped.
For this reason, replace the boost_iowait flag check in
intel_pstate_update_util() with an equivalent check against
pstate_funcs.get_target_pstate and drop that flag.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
MODULE_DEVICE_TABLE is added so that CPPC cpufreq module can be
automatically loaded when we have a acpi processor device with
"ACPI0007" hid.
Signed-off-by: Prashanth Prakash <pprakash@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>