- Fix error handling in the apple-soc cpufreq driver (Dan Carpenter).
- Change the log level of a message in the amd-pstate cpufreq driver
so it is more visible to users (Kai-Heng Feng).
- Adjust the balance_performance EPP value for Sapphire Rapids in the
intel_pstate cpufreq driver (Srinivas Pandruvada).
- Remove MODULE_LICENSE from 3 pieces of non-modular code (Nick Alcock).
- Make a read-only kobj_type structure in the schedutil cpufreq governor
constant (Thomas Weißschuh).
- Add Add Power Limit4 support for Meteor Lake SoC to the Intel RAPL
power capping driver (Sumeet Pawnikar).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmQCNm0SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx0P8QAJdcjEg4cY/OJq9VTqiUo92mt63aqWXr
vOMYgp5QDuDRPKiVMMuCJNCJSrA/xPwAz4pcSOj/59CZn1UZqzg6Ap/vdmV7pQV1
vDp6N9E2sgCL0QMQqueyv3yEN9meleGrQPnAulOZf29Q9PC0rGiAUdivvgDXZq/9
9wo+/11EzZpKyaDTfLzotIvNOkmmkp5FtxaCR64D5w3e6gehGrL/wL3FgSefDnsa
fNlhxgi66FYuamSgXqQTkuIuig0Rbvlp0fmhllPaIOkNMI4o7rvP5rB7FbAKZm1E
XI+M3aVlZsImPpEEJ1dTqbc4y9WU9HakLfRRSiUnnWHSXpwBI8ncEwP0oulqqVoF
elA9kd7Sv1DwLiUGMy3GaOwscTN6NDUwICH4UJeISWlMZfrP7YR3Bb7gVtzlCrKC
b99oA92OFazzWliYPSzzSsFQTI1PczNLZKnelzgF1+5g76q3sIUa3tu6Ts6Z84UK
rHCLDVw8TUFpsEQxKOvM3oLUdpvE0mmQpdG8fSeHVxon47jVzmkbDjgOBFp1i19F
HBI7LfOhDkkzO35qb6+DfkQoij0mpn8ldbVVLd1XQe4F0WjfSe8D4oGTgB3ErTK7
8cOKGoez9FsQRHbSVCaZcaTSemDHdB3UMRUHG2hvU1jbwXOaQcLfMAF5McyECJiN
U8uI28JVGVPI
=lRKS
-----END PGP SIGNATURE-----
Merge tag 'pm-6.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull more power management updates from Rafael Wysocki:
"These update power capping (new hardware support and cleanup) and
cpufreq (bug fixes, cleanups and intel_pstate adjustment for a new
platform).
Specifics:
- Fix error handling in the apple-soc cpufreq driver (Dan Carpenter)
- Change the log level of a message in the amd-pstate cpufreq driver
so it is more visible to users (Kai-Heng Feng)
- Adjust the balance_performance EPP value for Sapphire Rapids in the
intel_pstate cpufreq driver (Srinivas Pandruvada)
- Remove MODULE_LICENSE from 3 pieces of non-modular code (Nick
Alcock)
- Make a read-only kobj_type structure in the schedutil cpufreq
governor constant (Thomas Weißschuh)
- Add Add Power Limit4 support for Meteor Lake SoC to the Intel RAPL
power capping driver (Sumeet Pawnikar)"
* tag 'pm-6.3-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
cpufreq: apple-soc: Fix an IS_ERR() vs NULL check
powercap: remove MODULE_LICENSE in non-modules
cpufreq: intel_pstate: remove MODULE_LICENSE in non-modules
powercap: RAPL: Add Power Limit4 support for Meteor Lake SoC
cpufreq: amd-pstate: remove MODULE_LICENSE in non-modules
cpufreq: schedutil: make kobj_type structure constant
cpufreq: amd-pstate: Let user know amd-pstate is disabled
cpufreq: intel_pstate: Adjust balance_performance EPP for Sapphire Rapids
Since commit 8b41fc4454 ("kbuild: create modules.builtin without
Makefile.modbuiltin or tristate.conf"), MODULE_LICENSE declarations
are used to identify modules. As a consequence, uses of the macro
in non-modules will cause modprobe to misidentify their containing
object file as a module when it is not (false positives), and modprobe
might succeed rather than failing with a suitable error message.
So remove it in the files in this commit, none of which can be built as
modules.
Signed-off-by: Nick Alcock <nick.alcock@oracle.com>
Suggested-by: Luis Chamberlain <mcgrof@kernel.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add Meteor Lake SoC to the list of processor models for which
Power Limit4 is supported by the Intel RAPL driver.
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Rework a large bunch of drivers to use the generic thermal trip
structure and use the opportunity to do more cleanups by removing
unused functions from the OF code (Daniel Lezcano).
- Remove core header inclusion from drivers (Daniel Lezcano).
- Fix some locking issues related to the generic thermal trip rework
(Johan Hovold).
- Fix a crash when requesting the critical temperature on tegra, which
is related to the generic trip point work (Jon Hunter).
- Clean up thermal device unregistration code (Viresh Kumar).
- Fix and clean up thermal control core initialization error code
paths (Daniel Lezcano).
- Relocate the trip points handling code into a separate file (Daniel
Lezcano).
- Make the thermal core fail registration of thermal zones and cooling
devices if the thermal class has not been registered (Rafael Wysocki).
- Add trip point initialization helper functions for ACPI-defined trip
points and modify two thermal drivers to use them (Rafael Wysocki,
Daniel Lezcano).
- Make the core thermal control code use sysfs_emit_at() instead of
scnprintf() where applicable (ye xingchen).
- Consolidate code accessing the Intel TCC (Thermal Control Circuitry)
MSRs by introducing library functions for that and making the
TCC-related code in thermal drivers use them (Zhang Rui).
- Enhance the x86_pkg_temp_thermal driver to support dynamic tjmax
changes (Zhang Rui).
- Address an "unsigned expression compared with zero" warning in the
intel_soc_dts_iosf thermal driver (Yang Li).
- Update comments regarding two functions in the Intel Menlow thermal
driver (Deming Wang).
- Use sysfs_emit_at() instead of scnprintf() in the int340x thermal
driver (ye xingchen).
- Make the intel_pch thermal driver support the Wellsburg PCH (Tim
Zimmermann).
- Modify the intel_pch and processor_thermal_device_pci thermal drivers
use generic trip point tables instead of thermal zone trip point
callbacks (Daniel Lezcano).
- Add production mode attribute sysfs attribute to the int340x thermal
driver (Srinivas Pandruvada).
- Rework dynamic trip point updates handling and locking in the int340x
thermal driver (Rafael Wysocki).
- Make the int340x thermal driver use a generic trip points table
instead of thermal zone trip point callbacks (Rafael Wysocki, Daniel
Lezcano).
- Clean up and improve the int340x thermal driver (Rafael Wysocki).
- Simplify and clean up the intel_pch thermal driver (Rafael Wysocki).
- Fix the Intel powerclamp thermal driver and make it use the common
idle injection framework (Srinivas Pandruvada).
- Add two module parameters, cpumask and max_idle, to the Intel powerclamp
thermal driver to allow it to affect only a specific subset of CPUs
instead of all of them (Srinivas Pandruvada).
- Make the Intel quark_dts thermal driver Use generic trip point
objects instead of its own trip point representation (Daniel
Lezcano).
- Add toctree entry for thermal documents and fix two issues in the
Intel powerclamp driver documentation (Bagas Sanjaya).
- Use strscpy() to instead of strncpy() in the thermal core (Xu Panda).
- Fix thermal_sampling_exit() (Vincent Guittot).
- Add Mediatek Low Voltage Thermal Sensor (LVTS) driver (Balsam Chihi).
- Add r8a779g0 RCar support to the rcar_gen3 thermal driver (Geert
Uytterhoeven).
- Fix useless call to set_trips() when resuming in the rcar_gen3
thermal control driver and add interrupt support detection at init
time to it (Niklas Söderlund).
- Fix memory corruption in the hi3660 thermal driver (Yongqin Liu).
- Fix include path for libnl3 in pkg-config file for libthermal (Vibhav
Pant).
- Remove syscfg-based driver for st as the platform is not supported
any more (Alain Volmat).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmPuJuESHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRxef0P/3h73rPjGEyuDlvXaazyXsJ2ItIoGeXF
v9sDwK3IPeFTNwAu80RySXQViOG6G1e5Cl8Ee+LuuMZfPRlBnr3n35BazejDDK0N
u3YAhPqtNOvWqr31T3A27dYtK+feFR2QL9SGFP0E4yxS1jpMOSO4Q24z7yaXdegT
hD8YT1HbTW4Cra7A17qdXsG8LkIe0+GQXy7Ig/Dul1eqXTM4RSReGTmXic66hGpv
lutqIQl8VdjmVBcQtTustpdycAD9zj07xd9BvOyM0lmF90zt6S0VOWFDsk+8u1jA
FCiuRLBAM1xbguxGubahTVOM051J/MdfM5WqGgPtesNIXlDq4Je2WUGC07jGvSfV
DMjNNb+nTkD3BK+BEe+rgv3KZBngj4p2sGHFW19v3EPdGftzohqDD5Oqn0GpsKR0
J4GaT04T66A6jlNdzY/nPfOIw5FYEAsMwx4hR0qtEWDMT4uYtXQYM5iml9TBDoDx
Kqyx+N8KhaKnQ4PLZ0MwtusyZydKQC1S1YK6G2eo+bXeJEre07FjZkiNfURi5gv9
lrKS5nbAGBqUrNV4XnS18RmGAC+bxuQrNA5Gr0ouYaaLMT+jGzcdu1yCMeWJxwZI
fFGAwE6sOU8EtmdGJrQdJt4eKCnpzOS7I1XuMDTBstl8Wv92x/YbH39vOl9wbJVs
rmSkM+4t+sXb
=tZwm
-----END PGP SIGNATURE-----
Merge tag 'thermal-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull thermal control updates from Rafael Wysocki:
"The majority of changes here are related to the general switch-over to
using arrays of generic trip point structures registered along with a
thermal zone instead of trip point callbacks (this has been done
mostly by Daniel Lezcano with some help from yours truly on the Intel
drivers front).
Apart from that and the related reorganization of code, there are some
enhancements of the existing driver and a new Mediatek Low Voltage
Thermal Sensor (LVTS) driver. The Intel powerclamp undergoes a major
rework so it will use the generic idle_inject facility for CPU idle
time injection going forward and it will take additional module
parameters for specifying the subset of CPUs to be affected by it
(work done by Srinivas Pandruvada).
Also included are assorted fixes and a whole bunch of cleanups.
Specifics:
- Rework a large bunch of drivers to use the generic thermal trip
structure and use the opportunity to do more cleanups by removing
unused functions from the OF code (Daniel Lezcano)
- Remove core header inclusion from drivers (Daniel Lezcano)
- Fix some locking issues related to the generic thermal trip rework
(Johan Hovold)
- Fix a crash when requesting the critical temperature on tegra,
which is related to the generic trip point work (Jon Hunter)
- Clean up thermal device unregistration code (Viresh Kumar)
- Fix and clean up thermal control core initialization error code
paths (Daniel Lezcano)
- Relocate the trip points handling code into a separate file (Daniel
Lezcano)
- Make the thermal core fail registration of thermal zones and
cooling devices if the thermal class has not been registered
(Rafael Wysocki)
- Add trip point initialization helper functions for ACPI-defined
trip points and modify two thermal drivers to use them (Rafael
Wysocki, Daniel Lezcano)
- Make the core thermal control code use sysfs_emit_at() instead of
scnprintf() where applicable (ye xingchen)
- Consolidate code accessing the Intel TCC (Thermal Control
Circuitry) MSRs by introducing library functions for that and
making the TCC-related code in thermal drivers use them (Zhang Rui)
- Enhance the x86_pkg_temp_thermal driver to support dynamic tjmax
changes (Zhang Rui)
- Address an "unsigned expression compared with zero" warning in the
intel_soc_dts_iosf thermal driver (Yang Li)
- Update comments regarding two functions in the Intel Menlow thermal
driver (Deming Wang)
- Use sysfs_emit_at() instead of scnprintf() in the int340x thermal
driver (ye xingchen)
- Make the intel_pch thermal driver support the Wellsburg PCH (Tim
Zimmermann)
- Modify the intel_pch and processor_thermal_device_pci thermal
drivers use generic trip point tables instead of thermal zone trip
point callbacks (Daniel Lezcano)
- Add production mode attribute sysfs attribute to the int340x
thermal driver (Srinivas Pandruvada)
- Rework dynamic trip point updates handling and locking in the
int340x thermal driver (Rafael Wysocki)
- Make the int340x thermal driver use a generic trip points table
instead of thermal zone trip point callbacks (Rafael Wysocki,
Daniel Lezcano)
- Clean up and improve the int340x thermal driver (Rafael Wysocki)
- Simplify and clean up the intel_pch thermal driver (Rafael Wysocki)
- Fix the Intel powerclamp thermal driver and make it use the common
idle injection framework (Srinivas Pandruvada)
- Add two module parameters, cpumask and max_idle, to the Intel
powerclamp thermal driver to allow it to affect only a specific
subset of CPUs instead of all of them (Srinivas Pandruvada)
- Make the Intel quark_dts thermal driver Use generic trip point
objects instead of its own trip point representation (Daniel
Lezcano)
- Add toctree entry for thermal documents and fix two issues in the
Intel powerclamp driver documentation (Bagas Sanjaya)
- Use strscpy() to instead of strncpy() in the thermal core (Xu
Panda)
- Fix thermal_sampling_exit() (Vincent Guittot)
- Add Mediatek Low Voltage Thermal Sensor (LVTS) driver (Balsam
Chihi)
- Add r8a779g0 RCar support to the rcar_gen3 thermal driver (Geert
Uytterhoeven)
- Fix useless call to set_trips() when resuming in the rcar_gen3
thermal control driver and add interrupt support detection at init
time to it (Niklas Söderlund)
- Fix memory corruption in the hi3660 thermal driver (Yongqin Liu)
- Fix include path for libnl3 in pkg-config file for libthermal
(Vibhav Pant)
- Remove syscfg-based driver for st as the platform is not supported
any more (Alain Volmat)"
* tag 'thermal-6.3-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (135 commits)
thermal/drivers/st: Remove syscfg based driver
thermal: Remove core header inclusion from drivers
tools/lib/thermal: Fix include path for libnl3 in pkg-config file.
thermal/drivers/hisi: Drop second sensor hi3660
thermal/drivers/rcar_gen3_thermal: Fix device initialization
thermal/drivers/rcar_gen3_thermal: Create device local ops struct
thermal/drivers/rcar_gen3_thermal: Do not call set_trips() when resuming
thermal/drivers/rcar_gen3: Add support for R-Car V4H
dt-bindings: thermal: rcar-gen3-thermal: Add r8a779g0 support
thermal/drivers/mediatek: Add the Low Voltage Thermal Sensor driver
dt-bindings: thermal: mediatek: Add LVTS thermal controllers
thermal/drivers/mediatek: Relocate driver to mediatek folder
tools/lib/thermal: Fix thermal_sampling_exit()
Documentation: powerclamp: Fix numbered lists formatting
Documentation: powerclamp: Escape wildcard in cpumask description
Documentation: admin-guide: Add toctree entry for thermal docs
thermal: intel: powerclamp: Add two module parameters
Documentation: admin-guide: Move intel_powerclamp documentation
thermal: core: Use sysfs_emit_at() instead of scnprintf()
thermal: intel: powerclamp: Fix duration module parameter
...
When setting the power limit time window, software updates the 'y' bits
and 'f' bits in the power limit register, and the value hardware takes
follows the formula below
Time window = 2 ^ y * (1 + f / 4) * Time_Unit
When handling large time window input from userspace, using left
shifting breaks in two cases:
1. when ilog2(value) is bigger than 31, in expression "1 << y", left
shifting by more than 31 bits has undefined behavior. This breaks
'y'. For example, on an Alderlake platform, "1 << 32" returns 1.
2. when ilog2(value) equals 31, "1 << 31" returns negative value
because '1' is recognized as signed int. And this breaks 'f'.
Given that 'y' has 5 bits and hardware can never take a value larger
than 31, fix the first problem by clamp the time window to the maximum
possible value that the hardware can take.
Fix the second problem by using unsigned bit left shift.
Note that hardware has its own maximum time window limitation, which
may be lower than the time window value retrieved from the power limit
register. When this happens, hardware clamps the input to its maximum
time window limitation. That is why a software clamp is preferred to
handle the problem on hand.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Adjusted the comment added by this change ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The powercap/idle_inject core uses play_idle_precise() to inject idle
time. But play_idle_precise() can't ensure that the CPU is fully idle
for the specified duration because of wakeups due to interrupts. To
compensate for the reduced idle time due to these wakes, the caller
can adjust requested idle time for the next cycle.
The goal of idle injection is to keep system at some idle percent on
average, so this is fine to overshoot or undershoot instantaneous idle
times.
The idle inject core provides an interface idle_inject_set_duration()
to set idle and runtime duration.
Some architectures provide interface to get actual idle time observed
by the hardware. So, the effective idle percent can be adjusted using
the hardware feedback. For example, Intel CPUs provides package idle
counters, which is currently used by Intel powerclamp driver to
readjust runtime duration.
When the caller's desired idle time over a period is less or greater
than the actual CPU idle time observed by the hardware, caller can
readjust idle and runtime duration for the next cycle.
The only way this can be done currently is by monitoring hardware idle
time from a different software thread and readjust idle and runtime
duration using idle_inject_set_duration().
This can be avoided by adding a callback which callers can register and
readjust from this callback function.
Add a capability to register an optional update() callback, which can be
called from the idle inject core before waking up CPUs for idle injection.
This callback can be registered via a new interface:
idle_inject_register_full().
During this process of constantly adjusting idle and runtime duration
there can be some cases where actual idle time is more than the desired.
In this case idle inject can be skipped for a cycle. If update() callback
returns false, then the idle inject core skips waking up CPUs for the
idle injection.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Export symbols for external interfaces, so that they can be used in
other loadable modules.
Export is done under name space IDLE_INJECT.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The users of the idle injection framework allow 100% idle injection. For
example: thermal/cpuidle_cooling.c driver. When the ratio is set to
100%, the runtime_duration becomes zero.
However, idle_inject_set_duration() in the idle injection framework
silently ignores run_duration_us == 0 without any error (it is a void
function). The caller will then assume that everything is fine and
100% idle is effective, but in reality the idle duration will not
change.
There are two options:
- The caller may change their max state to 99% instead of 100% and
document that 100% is not supported by the idle inject framework.
- Add 100% idle support to the idle inject framework.
Since there are other protections via RT throttling, this framework can
allow 100% idle. The RT throttling will be activated at 95% idle by
default. The caller disabling RT throttling and injecting 100% idle,
should be aware that CPU can't be used at all.
The idle inject timer is started for (run_duration_us + idle_duration_us)
duration. Hence replace (run_duration_us && idle_duration_us) with
(run_duration_us + idle_duration_us) in the function
idle_inject_set_duration(). Also check for !(run_duration_us +
idle_duration_us) to return -EINVAL in idle_inject_start().
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add Emerald Rapids to the list of supported processor models in the
Intel RAPL power capping driver.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add Meteor Lake to the list of supported processor models in the
Intel RAPL power capping driver.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the error path after calling dev_set_name(), the device
name is leaked. To fix this, calling dev_set_name() before
device_register(), and call put_device() if it returns error.
All the resources is released in powercap_release(), so it
can return from powercap_register_zone() directly.
Fixes: 75d2364ea0 ("PowerCap: Add class driver")
Signed-off-by: Yang Yingliang <yangyingliang@huawei.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix following warning at three places:
Function parameter or member 'ii_dev' not described.
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
strtobool() is the same as kstrtobool().
However, the latter is more used within the kernel.
In order to remove strtobool() and slightly simplify kstrtox.h, switch to
the other function name.
While at it, include the corresponding header file (<linux/kstrtox.h>)
Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add a powercap driver that, using the ARM SCMI Protocol to query the SCMI
platform firmware for the list of existing Powercap domains, registers all
of such discovered domains under the new 'arm-scmi' powercap control type.
A new simple powercap zone and constraint is registered for all the SCMI
powercap zones that are found.
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Cristian Marussi <cristian.marussi@arm.com>
Acked-by: Sudeep Holla <sudeep.holla@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Debuggability:
- Change most occurances of BUG_ON() to WARN_ON_ONCE()
- Reorganize & fix TASK_ state comparisons, turn it into a bitmap
- Update/fix misc scheduler debugging facilities
- Load-balancing & regular scheduling:
- Improve the behavior of the scheduler in presence of lot of
SCHED_IDLE tasks - in particular they should not impact other
scheduling classes.
- Optimize task load tracking, cleanups & fixes
- Clean up & simplify misc load-balancing code
- Freezer:
- Rewrite the core freezer to behave better wrt thawing and be simpler
in general, by replacing PF_FROZEN with TASK_FROZEN & fixing/adjusting
all the fallout.
- Deadline scheduler:
- Fix the DL capacity-aware code
- Factor out dl_task_is_earliest_deadline() & replenish_dl_new_period()
- Relax/optimize locking in task_non_contending()
- Cleanups:
- Factor out the update_current_exec_runtime() helper
- Various cleanups, simplifications
Signed-off-by: Ingo Molnar <mingo@kernel.org>
-----BEGIN PGP SIGNATURE-----
iQJFBAABCgAvFiEEBpT5eoXrXCwVQwEKEnMQ0APhK1gFAmM/01cRHG1pbmdvQGtl
cm5lbC5vcmcACgkQEnMQ0APhK1geZA/+PB4KC1T9aVxzaTHI36R03YgJYZmIdtxw
wTf02MixePmz+gQCbepJbempGOh5ST28aOcI0xhdYOql5B63MaUBBMlB0HvGUyDG
IU3zETqLMRtAbnSTdQFv8m++ECUtZYp8/x1FCel4WO7ya4ETkRu1NRfCoUepEhpZ
aVAlae9LH3NBaF9t7s0PT2lTjf3pIzMFRkddJ0ywJhbFR3VnWat05fAK+J6fGY8+
LS54coefNlJD4oDh5TY8uniL1j5SmWmmwbk9Cdj7bLU5P3dFSS0/+5FJNHJPVGDE
srGT7wstRUcDrN0CnZo48VIUBiApJCCDqTfJYi9wNYd0NAHvwY6MIJJgEIY8mKsI
L/qH26H81Wt+ezSZ/5JIlGlZ/LIeNaa6OO/fbWEYABBQogvvx3nxsRNUYKSQzumH
CnSBasBjLnjWyLlK4qARM9cI7NFSEK6NUigrEx/7h8JFu/8T4DlSy6LsF1HUyKgq
4+FJLAqG6cL0tcwB/fHYd0oRESN8dStnQhGxSojgufwLc7dlFULvCYF5JM/dX+/V
IKwbOfIOeOn6ViMtSOXAEGdII+IQ2/ZFPwr+8Z5JC7NzvTVL6xlu/3JXkLZR3L7o
yaXTSaz06h1vil7Z+GRf7RHc+wUeGkEpXh5vnarGZKXivhFdWsBdROIJANK+xR0i
TeSLCxQxXlU=
=KjMD
-----END PGP SIGNATURE-----
Merge tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip
Pull scheduler updates from Ingo Molnar:
"Debuggability:
- Change most occurances of BUG_ON() to WARN_ON_ONCE()
- Reorganize & fix TASK_ state comparisons, turn it into a bitmap
- Update/fix misc scheduler debugging facilities
Load-balancing & regular scheduling:
- Improve the behavior of the scheduler in presence of lot of
SCHED_IDLE tasks - in particular they should not impact other
scheduling classes.
- Optimize task load tracking, cleanups & fixes
- Clean up & simplify misc load-balancing code
Freezer:
- Rewrite the core freezer to behave better wrt thawing and be
simpler in general, by replacing PF_FROZEN with TASK_FROZEN &
fixing/adjusting all the fallout.
Deadline scheduler:
- Fix the DL capacity-aware code
- Factor out dl_task_is_earliest_deadline() &
replenish_dl_new_period()
- Relax/optimize locking in task_non_contending()
Cleanups:
- Factor out the update_current_exec_runtime() helper
- Various cleanups, simplifications"
* tag 'sched-core-2022-10-07' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (41 commits)
sched: Fix more TASK_state comparisons
sched: Fix TASK_state comparisons
sched/fair: Move call to list_last_entry() in detach_tasks
sched/fair: Cleanup loop_max and loop_break
sched/fair: Make sure to try to detach at least one movable task
sched: Show PF_flag holes
freezer,sched: Rewrite core freezer logic
sched: Widen TAKS_state literals
sched/wait: Add wait_event_state()
sched/completion: Add wait_for_completion_state()
sched: Add TASK_ANY for wait_task_inactive()
sched: Change wait_task_inactive()s match_state
freezer,umh: Clean up freezer/initrd interaction
freezer: Have {,un}lock_system_sleep() save/restore flags
sched: Rename task_running() to task_on_cpu()
sched/fair: Cleanup for SIS_PROP
sched/fair: Default to false in test_idle_cores()
sched/fair: Remove useless check in select_idle_core()
sched/fair: Avoid double search on same cpu
sched/fair: Remove redundant check in select_idle_smt()
...
Intel Xeon servers used to use a fixed energy resolution (15.3uj) for
Dram RAPL domain. But on SPR, Dram RAPL domain follows the standard
energy resolution as described in MSR_RAPL_POWER_UNIT.
Remove the SPR dram_domain_energy_unit quirk.
Fixes: 2d798d9f59 ("powercap: intel_rapl: add support for Sapphire Rapids")
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Cc: 5.9+ <stable@vger.kernel.org> # 5.9+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When value < time_unit, the parameter of ilog2() will be zero and
the return value is -1. u64(-1) is too large for shift exponent
and then will trigger shift-out-of-bounds:
shift exponent 18446744073709551615 is too large for 32-bit type 'int'
Call Trace:
rapl_compute_time_window_core
rapl_write_data_raw
set_time_window
store_constraint_time_window_us
Signed-off-by: Chao Qin <chao.qin@intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that wait_task_inactive()'s @match_state argument is a mask (like
ttwu()) it is possible to replace the special !match_state case with
an 'all-states' value such that any blocked state will match.
Suggested-by: Ingo Molnar (mingo@kernel.org>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/YxhkzfuFTvRnpUaH@hirez.programming.kicks-ass.net
Add intel_rapl support for RAPTORLAKE_S platform, which behaves the same
as RAPTORLAKE and RAPTORLAKE_P platforms.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar).
- Drop unnecessary CPU hotplug locking from store() used by cpufreq
sysfs attributes (Viresh Kumar).
- Make the ACPI cpufreq driver support the boost control interface on
Zhaoxin/Centaur processors (Tony W Wang-oc).
- Print a warning message on attempts to free an active cpufreq policy
which should never happen (Viresh Kumar).
- Fix grammar in the Kconfig help text for the loongson2 cpufreq
driver (Randy Dunlap).
- Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq
governor (Zhao Liu).
- Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll
cpuidle driver (Eiichi Tsukata).
- Modify intel_idle to treat C1 and C1E as independent idle states on
Sapphire Rapids (Artem Bityutskiy).
- Extend support for wakeirq to callback wrappers used during system
suspend and resume (Ulf Hansson).
- Defer waiting for device probe before loading a hibernation image
till the first actual device access to avoid possible deadlocks
reported by syzbot (Tetsuo Handa).
- Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
Helgaas).
- Add Raptor Lake-P to the list of processors supported by the Intel
RAPL driver (George D Sworo).
- Add Alder Lake-N and Raptor Lake-P to the list of processors for
which Power Limit4 is supported in the Intel RAPL driver (Sumeet
Pawnikar).
- Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
attempting to remove it (Hsin-Yi Wang).
- Change the Energy Model code to represent power in micro-Watts and
adjust its users accordingly (Lukasz Luba).
- Add new devfreq driver for Mediatek CCI (Cache Coherent
Interconnect) (Johnson Wang).
- Convert the Samsung Exynos SoC Bus bindings to DT schema of
exynos-bus.c (Krzysztof Kozlowski).
- Address kernel-doc warnings by adding the description for unused
fucntion parameters in devfreq core (Mauro Carvalho Chehab).
- Use NULL to pass a null pointer rather than zero according to the
function propotype in imx-bus.c (Colin Ian King).
- Print error message instead of error interger value in
tegra30-devfreq.c (Dmitry Osipenko).
- Add checks to prevent setting negative frequency QoS limits for
CPUs (Shivnandan Kumar).
- Update the pm-graph suite of utilities to the latest revision 5.9
including multiple improvements (Todd Brandt).
- Drop pme_interrupt reference from the PCI power management
documentation (Mario Limonciello).
-----BEGIN PGP SIGNATURE-----
iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmLoKy8SHHJqd0Byand5
c29ja2kubmV0AAoJEILEb/54YlRx3+oQAJNVU+W14EaRPWXQRMuwBC5zk3hb6T9q
JqmMd8coEd+9/4ABAeRAWso1B26rUzB6JyBvw3lGH9OXInpYmvnJEhEPrTpK2h0D
U9HxEARuGJolrDm0X9NAkn7tKKMC9GnvPS9z2s7s+N97VFFWC/QiU+PHB0SypGNb
JxRfbVJZQCuxmNG9UeK+xeHFQ9lM2Z9ZdTxR71G0n7nQPPR+sUvnFufFby3Aogf3
XnBYfia+YNqkUlefxxwB5a0cFwPXOUGsQkIf4d64gZnq1TgZ+71kht1GEF08PDFl
wV8v1rOWuXEae8dozuf5xszp/eVyAqzgB+IShT9APREOO3Wg6I16XdBm8R1TGwCK
JTdZqnm6HVKBNqchEwYViJILX69rrNUT+AwHBWhtKKDNh3qeTuwi/JGTeDVN++en
xf3TNKx3LV31Nq6nWJFzDGLehfZMnAPkhfYohUBI7FNyblpk4mJRVcZ0bYI7UNnS
als77uoipvb5KdFCtdhxYBHd/y867NvXKa1qsAuDxusAsfJHf4SnlMdbgOepBH2y
jJg06CGrMDU3TZ8BL+WpqUYk4irQnAMs/159Txh7A6/dOnTjE7S9NHrENCwmt2og
FrHSLH1eLX6Sa4RSibiGHPC7mNULP2/TOtryf3zFdlIVcjm3NEU3bnfzx7nlJn05
8t6ObMxgMhWT
=XeLV
-----END PGP SIGNATURE-----
Merge tag 'pm-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm
Pull power management updates from Rafael Wysocki:
"These are mostly minor improvements all over including new CPU IDs for
the Intel RAPL driver, an Energy Model rework to use micro-Watt as the
power unit, cpufreq fixes and cleanus, cpuidle updates, devfreq
updates, documentation cleanups and a new version of the pm-graph
suite of utilities.
Specifics:
- Make cpufreq_show_cpus() more straightforward (Viresh Kumar).
- Drop unnecessary CPU hotplug locking from store() used by cpufreq
sysfs attributes (Viresh Kumar).
- Make the ACPI cpufreq driver support the boost control interface on
Zhaoxin/Centaur processors (Tony W Wang-oc).
- Print a warning message on attempts to free an active cpufreq
policy which should never happen (Viresh Kumar).
- Fix grammar in the Kconfig help text for the loongson2 cpufreq
driver (Randy Dunlap).
- Use cpumask_var_t for an on-stack CPU mask in the ondemand cpufreq
governor (Zhao Liu).
- Add trace points for guest_halt_poll_ns grow/shrink to the haltpoll
cpuidle driver (Eiichi Tsukata).
- Modify intel_idle to treat C1 and C1E as independent idle states on
Sapphire Rapids (Artem Bityutskiy).
- Extend support for wakeirq to callback wrappers used during system
suspend and resume (Ulf Hansson).
- Defer waiting for device probe before loading a hibernation image
till the first actual device access to avoid possible deadlocks
reported by syzbot (Tetsuo Handa).
- Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
Helgaas).
- Add Raptor Lake-P to the list of processors supported by the Intel
RAPL driver (George D Sworo).
- Add Alder Lake-N and Raptor Lake-P to the list of processors for
which Power Limit4 is supported in the Intel RAPL driver (Sumeet
Pawnikar).
- Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
attempting to remove it (Hsin-Yi Wang).
- Change the Energy Model code to represent power in micro-Watts and
adjust its users accordingly (Lukasz Luba).
- Add new devfreq driver for Mediatek CCI (Cache Coherent
Interconnect) (Johnson Wang).
- Convert the Samsung Exynos SoC Bus bindings to DT schema of
exynos-bus.c (Krzysztof Kozlowski).
- Address kernel-doc warnings by adding the description for unused
function parameters in devfreq core (Mauro Carvalho Chehab).
- Use NULL to pass a null pointer rather than zero according to the
function propotype in imx-bus.c (Colin Ian King).
- Print error message instead of error interger value in
tegra30-devfreq.c (Dmitry Osipenko).
- Add checks to prevent setting negative frequency QoS limits for
CPUs (Shivnandan Kumar).
- Update the pm-graph suite of utilities to the latest revision 5.9
including multiple improvements (Todd Brandt).
- Drop pme_interrupt reference from the PCI power management
documentation (Mario Limonciello)"
* tag 'pm-5.20-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (27 commits)
powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P
PM: QoS: Add check to make sure CPU freq is non-negative
PM: hibernate: defer device probing when resuming from hibernation
intel_idle: make SPR C1 and C1E be independent
cpufreq: ondemand: Use cpumask_var_t for on-stack cpu mask
cpufreq: loongson2: fix Kconfig "its" grammar
pm-graph v5.9
cpufreq: Warn users while freeing active policy
cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1
firmware: arm_scmi: Get detailed power scale from perf
Documentation: EM: Switch to micro-Watts scale
PM: EM: convert power field to micro-Watts precision and align drivers
PM / devfreq: tegra30: Add error message for devm_devfreq_add_device()
PM / devfreq: imx-bus: use NULL to pass a null pointer rather than zero
PM / devfreq: shut up kernel-doc warnings
dt-bindings: interconnect: samsung,exynos-bus: convert to dtschema
PM / devfreq: mediatek: Introduce MediaTek CCI devfreq driver
dt-bindings: interconnect: Add MediaTek CCI dt-bindings
PM: domains: Ensure genpd_debugfs_dir exists before remove
PM: runtime: Extend support for wakeirq for force_suspend|resume
...
Merge core device power management changes for v5.20-rc1:
- Extend support for wakeirq to callback wrappers used during system
suspend and resume (Ulf Hansson).
- Defer waiting for device probe before loading a hibernation image
till the first actual device access to avoid possible deadlocks
reported by syzbot (Tetsuo Handa).
- Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP (Bjorn
Helgaas).
- Add Raptor Lake-P to the list of processors supported by the Intel
RAPL driver (George D Sworo).
- Add Alder Lake-N and Raptor Lake-P to the list of processors for
which Power Limit4 is supported in the Intel RAPL driver (Sumeet
Pawnikar).
- Make pm_genpd_remove() check genpd_debugfs_dir against NULL before
attempting to remove it (Hsin-Yi Wang).
- Change the Energy Model code to represent power in micro-Watts and
adjust its users accordingly (Lukasz Luba).
* pm-core:
PM: runtime: Extend support for wakeirq for force_suspend|resume
* pm-sleep:
PM: hibernate: defer device probing when resuming from hibernation
PM: wakeup: Unify device_init_wakeup() for PM_SLEEP and !PM_SLEEP
* powercap:
powercap: RAPL: Add Power Limit4 support for Alder Lake-N and Raptor Lake-P
powercap: intel_rapl: Add support for RAPTORLAKE_P
* pm-domains:
PM: domains: Ensure genpd_debugfs_dir exists before remove
* pm-em:
cpufreq: scmi: Support the power scale in micro-Watts in SCMI v3.1
firmware: arm_scmi: Get detailed power scale from perf
Documentation: EM: Switch to micro-Watts scale
PM: EM: convert power field to micro-Watts precision and align drivers
Add Alder Lake-N and Raptor Lake-P to the list of processor models
for which Power Limit4 is supported by the Intel RAPL driver.
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The milli-Watts precision causes rounding errors while calculating
efficiency cost for each OPP. This is especially visible in the 'simple'
Energy Model (EM), where the power for each OPP is provided from OPP
framework. This can cause some OPPs to be marked inefficient, while
using micro-Watts precision that might not happen.
Update all EM users which access 'power' field and assume the value is
in milli-Watts.
Solve also an issue with potential overflow in calculation of energy
estimation on 32bit machine. It's needed now since the power value
(thus the 'cost' as well) are higher.
Example calculation which shows the rounding error and impact:
power = 'dyn-power-coeff' * volt_mV * volt_mV * freq_MHz
power_a_uW = (100 * 600mW * 600mW * 500MHz) / 10^6 = 18000
power_a_mW = (100 * 600mW * 600mW * 500MHz) / 10^9 = 18
power_b_uW = (100 * 605mW * 605mW * 600MHz) / 10^6 = 21961
power_b_mW = (100 * 605mW * 605mW * 600MHz) / 10^9 = 21
max_freq = 2000MHz
cost_a_mW = 18 * 2000MHz/500MHz = 72
cost_a_uW = 18000 * 2000MHz/500MHz = 72000
cost_b_mW = 21 * 2000MHz/600MHz = 70 // <- artificially better
cost_b_uW = 21961 * 2000MHz/600MHz = 73203
The 'cost_b_mW' (which is based on old milli-Watts) is misleadingly
better that the 'cost_b_uW' (this patch uses micro-Watts) and such
would have impact on the 'inefficient OPPs' information in the Cpufreq
framework. This patch set removes the rounding issue.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add RAPTORLAKE_P to the list of supported processor models in the Intel
RAPL power capping driver.
Signed-off-by: George D Sworo <george.d.sworo@intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
[ rjw: Minor changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
effective_cpu_util() already has a `int cpu' parameter which allows to
retrieve the CPU capacity scale factor (or maximum CPU capacity) inside
this function via an arch_scale_cpu_capacity(cpu).
A lot of code calling effective_cpu_util() (or the shim
sched_cpu_util()) needs the maximum CPU capacity, i.e. it will call
arch_scale_cpu_capacity() already.
But not having to pass it into effective_cpu_util() will make the EAS
wake-up code easier, especially when the maximum CPU capacity reduced
by the thermal pressure is passed through the EAS wake-up functions.
Due to the asymmetric CPU capacity support of arm/arm64 architectures,
arch_scale_cpu_capacity(int cpu) is a per-CPU variable read access via
per_cpu(cpu_scale, cpu) on such a system.
On all other architectures it is a a compile-time constant
(SCHED_CAPACITY_SCALE).
Signed-off-by: Dietmar Eggemann <dietmar.eggemann@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Acked-by: Vincent Guittot <vincent.guittot@linaro.org>
Tested-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lkml.kernel.org/r/20220621090414.433602-4-vdonnefort@google.com
Marge Energy Model support updates and cpuidle updates for 5.19-rc1:
- Update the Energy Model support code to allow the Energy Model to be
artificial, which means that the power values may not be on a uniform
scale with other devices providing power information, and update the
cpufreq_cooling and devfreq_cooling thermal drivers to support
artificial Energy Models (Lukasz Luba).
- Make DTPM check the Energy Model type (Lukasz Luba).
- Fix policy counter decrementation in cpufreq if Energy Model is in
use (Pierre Gondois).
- Add AlderLake processor support to the intel_idle driver (Zhang Rui).
- Fix regression leading to no genpd governor in the PSCI cpuidle
driver and fix the riscv-sbi cpuidle driver to allow a genpd
governor to be used (Ulf Hansson).
* pm-em:
PM: EM: Decrement policy counter
powercap: DTPM: Check for Energy Model type
thermal: cooling: Check Energy Model type in cpufreq_cooling and devfreq_cooling
Documentation: EM: Add artificial EM registration description
PM: EM: Remove old debugfs files and print all 'flags'
PM: EM: Change the order of arguments in the .active_power() callback
PM: EM: Use the new .get_cost() callback while registering EM
PM: EM: Add artificial EM flag
PM: EM: Add .get_cost() callback
* pm-cpuidle:
cpuidle: riscv-sbi: Fix code to allow a genpd governor to be used
cpuidle: psci: Fix regression leading to no genpd governor
intel_idle: Add AlderLake support
There is no need to store the result of the multiply back to variable value
after the multiplication. The store is redundant, replace *= with just *.
Cleans up clang scan build warning:
warning: Although the value stored to 'value' is used in the enclosing
expression, the value is never actually read from 'value'
[deadcode.DeadStores]
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add ALDERLAKE_N to the list of supported processor models in the Intel
RAPL power capping driver.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add RaptorLake to the list of processor models for which Power Limit4
is supported by the Intel RAPL driver.
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
[ rjw: Changelog rewrite ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Add intel_rapl support for the RaptorLake platform.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The Energy Model power values might be artificial. In such case
it's safe to bail out during the registration, since the PowerCap
framework supports only micro-Watts.
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Ionela Voinescu <ionela.voinescu@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is a spelling mistake in a pr_info() message. Fix it.
Signed-off-by: Colin Ian King <colin.i.king@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
drivers/powercap/dtpm.c:525:22: warning: symbol 'dtpm_node_callback' was not declared. Should it be static?
Fixes: 3759ec678e ("powercap/drivers/dtpm: Add hierarchy creation")
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: kernel test robot <lkp@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Now that we can destroy the hierarchy, the code must remove what it
had put in place at the creation. In our case, the cpu hotplug
callbacks.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20220130210210.549877-6-daniel.lezcano@linaro.org
The 'root' node is checked everytime a dtpm node is destroyed.
When we reach the end of the hierarchy destruction function, we can
unconditionnaly set the 'root' node to NULL again.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20220130210210.549877-5-daniel.lezcano@linaro.org
The hierarchy creation function exits but without a destroy hierarchy
function. Due to that, the modules creating the hierarchy can not be
unloaded properly because they don't have an exit callback.
Provide the dtpm_destroy_hierarchy() function to remove the previously
created hierarchy.
The function relies on all the release mechanisms implemented by the
underlying powercap framework.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20220130210210.549877-4-daniel.lezcano@linaro.org
The release function does not reset the per cpu variable when it is
called. That will prevent creation again as the variable will be
already from the previous creation.
Fix it by resetting them.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20220130210210.549877-2-daniel.lezcano@linaro.org
The different functions are all called through the
dtpm_create_hierarchy() which handle the mutex. The different
functions are used in this context, consequently with the lock always
held.
Remove all locks taken in the function and add the lock in the
hierarchy creation function.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20220130210210.549877-1-daniel.lezcano@linaro.org
Currently the dtpm supports the CPUs via cpufreq and the energy
model. This change provides the same for the device which supports
devfreq.
Each device supporting devfreq and having an energy model can be added
to the hierarchy.
The concept is the same as the cpufreq DTPM support: the QoS is used
to aggregate the requests and the energy model gives the value of the
instantaneous power consumption ponderated by the load of the device.
Cc: Chanwoo Choi <cwchoi00@gmail.com>
Cc: Lukasz Luba <lukasz.luba@arm.com>
Cc: Kyungmin Park <kyungmin.park@samsung.com>
Cc: MyungJoo Ham <myungjoo.ham@samsung.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220128163537.212248-5-daniel.lezcano@linaro.org
Based on the previous DT changes in the core code, use the 'setup'
callback to initialize the CPU DTPM backend.
Code is reorganized to stick to the DTPM table description. No
functional changes.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20220128163537.212248-4-daniel.lezcano@linaro.org
The DTPM framework is available but without a way to configure it.
This change provides a way to create a hierarchy of DTPM node where
the power consumption reflects the sum of the children's power
consumption.
It is up to the platform to specify an array of dtpm nodes where each
element has a pointer to its parent, except the top most one. The type
of the node gives the indication of which initialization callback to
call. At this time, we can create a virtual node, where its purpose is
to be a parent in the hierarchy, and a DT node where the name
describes its path.
In order to ensure a nice self-encapsulation, the DTPM subsys array
contains a couple of initialization functions, one to setup the DTPM
backend and one to initialize it up. With this approach, the DTPM
framework has a very few material to export.
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Link: https://lore.kernel.org/r/20220128163537.212248-3-daniel.lezcano@linaro.org
The init table section is freed after the system booted. However the
next changes will make per module the DTPM description, so the table
won't be accessible when the module is loaded.
In order to fix that, we should move the table to the data section
where there are very few entries and that makes strange to add it
there.
The main goal of the table was to keep self-encapsulated code and we
can keep it almost as it by using an array instead.
Suggested-by: Ulf Hansson <ulf.hansson@linaro.org>
Reviewed-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220128163537.212248-2-daniel.lezcano@linaro.org
Drop superfluous "the" from the comment in line 15.
Signed-off-by: Jason Wang <wangborong@cdjrlc.com>
[ rjw: Subject edit, new changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On Sapphire Rapids, the layout of the Psys domain Power Limit Register
is different from from what it was before.
Enhance the code to support the new Psys PL register layout.
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Reported-and-tested-by: Alkattan Dana <dana.alkattan@intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The dtpm_descr variable in init_dtpm() is not used after commit
f751db8ada ("powercap/drivers/dtpm: Disable DTPM at boot time"),
so drop it.
Fixes: f751db8ada ("powercap/drivers/dtpm: Disable DTPM at boot time")
Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The DTPM framework misses a mechanism to set it up. That is currently
under review but will come after the next cycle.
As the distro are enabling all the kernel options, the DTPM framework
is enabled on platforms where the energy model is not implemented,
thus making the framework inconsistent and disrupting the CPU
frequency scaling service.
Remove the initialization at boot time as a hot fix.
Fixes: 7a89d7eacf ("powercap/drivers/dtpm: Simplify the dtpm table")
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reported-By: Doug Smythies <dsmythies@telus.net>
Tested-By: Doug Smythies <dsmythies@telus.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When the ENERGY_MODEL and DTPM_CPU are enabled but actually without
any energy model, at cpu hotplug time, the dead cpuhp callback fails
leading to the warning.
Actually, the check could be simplified and we only do an action if
the dtpm cpu is enabled, otherwise we bail out without error.
Fixes: 7a89d7eacf ("powercap/drivers/dtpm: Simplify the dtpm table")
Reported-by: Kenneth R. Crudup <kenny@panix.com>
Tested-by: Kenneth R. Crudup <kenny@panix.com>
Reported-by: kernel test robot <oliver.sang@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>