Commit Graph

349 Commits

Author SHA1 Message Date
Srinivas Pandruvada
ea197ea2ba thermal: intel: int340x_thermal: New IOCTLs for Passive v2 table
Export Passive version 2 table similar to the way _TRT and _ART tables
via IOCTLs.

This removes need for binary utility to read ACPI Passive 2 table by
providing open source support. This table already has open source
implementation in the user space thermald, when the table is part of
data vault exported by the int3400 sysfs.

This table is supported in some older platforms before Ice Lake
generation.

Passive 2 tables contain multiple entries. Each entry has following
fields:

 * Source: Named Reference (String). This is the source device for
   temperature.
 * Target: Named Reference (String). This is the target device to
   control.
 * Priority: Priority of this device compared to others.
 * SamplingPeriod: Time Period in 1/10 of seconds unit.
 * PassiveTemp: Passive Temperature in 1/10 of Kelvin.
 * SourceDomain: Domain for the source (00:Processor, others reserved).
 * ControlKnob: Type of control knob (00:Power Limit 1, others: reserved)
 * Limit: The target state to set on reaching passive temperature.
   This can be a string "max", "min" or a power limit value.
 * LimitStepSize: Step size during activation.
 * UnLimitStepSize: Step size during deactivation.
 * Reserved1: Reserved

Three IOCTLs are added similar to IOCTLs for reading TRT:

ACPI_THERMAL_GET_PSVT_COUNT: Number of passive 2 entries.
ACPI_THERMAL_GET_PSVT_LEN: Total return data size (count x each
  passive 2 entry size).
ACPI_THERMAL_GET_PSVT: Get the data as an array of objects with
  passive 2 entries.

This change is based on original development done by:
Todd Brandt <todd.e.brandt@linux.intel.com>

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog and subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-06-13 17:57:39 +02:00
Srinivas Pandruvada
5f7fdb0f25 thermal: intel: int340x: Add new line for UUID display
Prior to the commit "763bd29fd3d1 ("thermal: int340x_thermal: Use
sysfs_emit_at() instead of scnprintf()", there was a new line after each
UUID string.

With the newline removed, existing user space like "thermald" fails to
compare each supported UUID as it is using getline() to read UUID and
apply correct thermal table.

To avoid breaking existing user space, add newline after each UUID string.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Fixes: 763bd29fd3 ("thermal: int340x_thermal: Use sysfs_emit_at() instead of scnprintf()")
Cc: 6.3+ <stable@vger.kernel.org> # 6.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 19:50:04 +02:00
Zhang Rui
b4288ce788 powercap: intel_rapl: Introduce RAPL I/F type
Different RAPL Interfaces may have different primitive information and
rapl_defaults calls.

To better distinguish this difference in the RAPL framework code,
introduce a new enum to represent different types of RAPL Interfaces.

No functional change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
bf44b9011d powercap: intel_rapl: Make cpu optional for rapl_package
MSR RAPL Interface always removes a rapl_package when all the CPUs in
that rapl_package are offlined. This is because it relies on an online
CPU to access the MSR.

But for RAPL Interface using MMIO registers, when all the cpus within
the rapl_package are offlined,
1. the register can still be accessed
2. monitoring and setting the Power Pimits for the rapl_package is still
   meaningful because of uncore power.

This means that, a valid rapl_package doesn't rely on one or more cpus
being onlined.

For this sense, make cpu optional for rapl_package. A rapl_package can
be registered either using a CPU id to represent the physical
package/die, or using the physical package id directly.

Note that, the thermal throttling interrupt is not disabled via
MSR_IA32_PACKAGE_THERM_INTERRUPT for such rapl_package at the moment.
If it is still needed in the future, this can be achieved by selecting
an onlined CPU using the physical package id.

Note that, processor_thermal_rapl, the current MMIO RAPL Interface
driver, can also be converted to register using a package id instead.
But this is not done right now because processor_thermal_rapl driver
works on single-package systems only, and offlining the only package
will not happen. So keep the previous logic.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:20 +02:00
Zhang Rui
a38f300bb2 powercap: intel_rapl: Use bitmap for Power Limits
Currently, a RAPL package is registered with the number of Power Limits
supported in each RAPL domain. But this doesn't tell which Power Limits
are available. Using the number of Power Limits supported to guess the
availability of each Power Limit is fragile.

Use bitmap to represent the availability of each Power Limit.

Note that PL1 is mandatory thus it does not need to be set explicitly by
the RAPL Interface drivers.

No functional change intended.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Wang Wendy <wendy.wang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-24 18:46:19 +02:00
Srinivas Pandruvada
b5d68f84f4 thermal: intel: powerclamp: Fix NULL pointer access issue
If cur_state for the powerclamp cooling device is set to the default
minimum state of 0, without setting first to cur_state > 0, this results
in NULL pointer access.

This NULL pointer access happens in the powercap core idle-inject
function idle_inject_set_duration() as there is no NULL check for
idle_inject_device pointer. This pointer must be allocated by calling
idle_inject_register() or idle_inject_register_full().

In the function powerclamp_set_cur_state(), idle_inject_device pointer
is allocated only when the cur_state > 0. But setting 0 without changing
to any other state, idle_inject_set_duration() will be called with a
NULL idle_inject_device pointer.

To address this, just return from powerclamp_set_cur_state() if the
current cooling device state is the same as the last one. Since the
power-up default cooling device state is 0, changing the state to 0
again here will return without calling idle_inject_set_duration().

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Fixes: 8526eb7fc7 ("thermal: intel: powerclamp: Use powercap idle-inject feature")
Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=217386
Tested-by: Risto A. Paju <teknohog@iki.fi>
Cc: 6.3+ <stable@kernel.org> # 6.3+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-05-04 20:30:18 +02:00
Rafael J. Wysocki
2b6a7409ac thermal: intel: menlow: Get rid of this driver
According to my information, there are no active users of this driver in
the field.

Moreover, it does some really questionable things and gets in the way of
thermal core improvements, so drop it.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-04-27 19:24:13 +02:00
Daniel Lezcano
ba7894be5e thermal: intel: pch_thermal: Use thermal driver device to write a trace
The pch_critical() callback accesses the thermal zone device structure
internals, it dereferences the thermal zone struct device and the 'type'.

Use the available accessors instead of accessing the structure directly.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-27 19:20:12 +02:00
Srinivas Pandruvada
5bc6b1df65 thermal: intel: int340x: Add DLVR support for RFIM control
Add support for DLVR (Digital Linear Voltage Regulator) attributes,
which can be used to control RFIM.

Here instead of "fivr" another directory "dlvr" is created with DLVR
attributes:

/sys/bus/pci/devices/0000:00:04.0/dlvr
├── dlvr_freq_mhz
├── dlvr_freq_select
├── dlvr_hardware_rev
├── dlvr_pll_busy
├── dlvr_rfim_enable
└── dlvr_spread_spectrum_pct
└── dlvr_control_mode
└── dlvr_control_lock

Attributes
dlvr_freq_mhz (RO):
Current DLVR PLL frequency in MHz.

dlvr_freq_select (RW):
Sets DLVR PLL clock frequency.

dlvr_hardware_rev (RO):
DLVR hardware revision.

dlvr_pll_busy (RO):
PLL can't accept frequency change when set.

dlvr_rfim_enable (RW):
0: Disable RF frequency hopping, 1: Enable RF frequency hopping.

dlvr_control_mode (RW):
Specifies how frequencies are spread. 0: Down spread, 1: Spread in Center.

dlvr_control_lock (RW):
1: future writes are ignored.

dlvr_spread_spectrum_pct (RW)
A write to this register updates the DLVR spread spectrum percent value.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-18 15:24:40 +02:00
Rafael J. Wysocki
065ca2a8c6 Merge back Intel thermal control material for 6.4-rc1. 2023-04-14 17:14:02 +02:00
Srinivas Pandruvada
117e4e5bd9 thermal: intel: Avoid updating unsupported THERM_STATUS_CLEAR mask bits
Some older processors don't allow BIT(13) and BIT(15) in the current
mask set by "THERM_STATUS_CLEAR_CORE_MASK". This results in:

unchecked MSR access error: WRMSR to 0x19c (tried to
write 0x000000000000aaa8) at rIP: 0xffffffff816f66a6
(throttle_active_work+0xa6/0x1d0)

To avoid unchecked MSR issues, check CPUID for each relevant feature and
use that information to set the supported feature bits only in the
"clear" mask for cores. Do the same for the analogous package mask set
by "THERM_STATUS_CLEAR_PKG_MASK".

Introduce functions thermal_intr_init_core_clear_mask() and
thermal_intr_init_pkg_clear_mask() to set core and package mask bits,
respectively. These functions are called during initialization.

Fixes: 6fe1e64b60 ("thermal: intel: Prevent accidental clearing of HFI status")
Reported-by: Rui Salvaterra <rsalvaterra@gmail.com>
Link: https://lore.kernel.org/lkml/cdf43fb423368ee3994124a9e8c9b4f8d00712c6.camel@linux.intel.com/T/
Tested-by: Rui Salvaterra <rsalvaterra@gmail.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 6.2+ <stable@kernel.org> # 6.2+
[ rjw: Renamed 2 funtions and 2 static variables, edited subject and
  changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-04-11 18:12:19 +02:00
Rafael J. Wysocki
d4d8516624 Merge back Intel thermal driver changes for 6.4-rc1. 2023-03-31 19:32:43 +02:00
David Arcari
ae817e618d thermal: intel: powerclamp: Fix cpumask and max_idle module parameters
When cpumask is specified as a module parameter the value is
overwritten by the module init routine.  This can easily be fixed
by checking to see if the mask has already been allocated in the
init routine.

When max_idle is specified as a module parameter a panic will occur.
The problem is that the idle_injection_cpu_mask is not allocated until
the module init routine executes. This can easily be fixed by allocating
the cpumask if it's not already allocated.

Fixes: ebf5197102 ("thermal: intel: powerclamp: Add two module parameters")
Signed-off-by: David Arcari <darcari@redhat.com>
Reviewed-by: Srinivas Pandruvada<srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-30 20:04:29 +02:00
Srinivas Pandruvada
a57cc2dbb3 thermal: intel: int340x: processor_thermal: Fix additional deadlock
Commit 52f04f10b9 ("thermal: intel: int340x: processor_thermal: Fix
deadlock") addressed deadlock issue during user space trip update. But it
missed a case when thermal zone device is disabled when user writes 0.

Call to thermal_zone_device_disable() also causes deadlock as it also
tries to lock tz->lock, which is already claimed by trip_point_temp_store()
in the thermal core code.

Remove call to thermal_zone_device_disable() in the function
sys_set_trip_temp(), which is called from trip_point_temp_store().

Fixes: 52f04f10b9 ("thermal: intel: int340x: processor_thermal: Fix deadlock")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 6.2+ <stable@vger.kernel.org> # 6.2+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-29 20:36:35 +02:00
Rafael J. Wysocki
85b52122e5 Merge branch 'thermal-intel'
Merge an x86_pkg_temp_thermal Intel thermal driver fix (Zhang Rui).

* thermal-intel:
  thermal: intel: x86_pkg_temp_thermal: Add lower bound check for sysfs input
2023-03-27 13:47:11 +02:00
Zhang Rui
2a96243e22 thermal: intel: x86_pkg_temp_thermal: Add lower bound check for sysfs input
When setting a trip point temperature from sysfs, there is an upper
bound check on the user input, but no lower bound check.

As hardware register has 7 bits for a trip point temperature, the offset
to tj_max of the input temperature must be equal to/less than 0x7f.
Or else,
 1. bogus temperature is updated into the trip temperature bits.
 2. the upper bits of the register can be polluted.

For example,
$ rdmsr 0x1b2
2000003
$ echo -180000 > /sys/class/thermal/thermal_zone1/trip_point_1_temp
$ rdmsr 0x1b2
3980003

Not only the trip point temp is set to 76C on this platform (tj_max is
100), the Power Notification (Bit 24) is also enabled erronously.

Fix the problem by adding lower bound check for sysfs input.

Reported-by: Dan Carpenter <error27@gmail.com>
Link: https://lore.kernel.org/all/add7a378-4d50-4ba1-81d3-a0c17db25a0b@kili.mountain/
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-17 19:22:44 +01:00
Rafael J. Wysocki
2b6db9efa5 Merge branch 'thermal-core' into thermal
Merge thermal control updates for 6.4-rc1:

 - Add a thermal zone 'devdata' accessor and modify several drivers to
   use it (Daniel Lezcano).

 - Prevent drivers from using the 'device' internal thermal zone
   structure field directly (Daniel Lezcano).

 - Clean up the hwmon thermal driver (Daniel Lezcano).

 - Add thermal zone id accessor and thermal zone type accessor
   and prevent drivers from using thermal zone fields directly (Daniel
   Lezcano).

 - Clean up the acerhdf and tegra thermal drivers (Daniel Lezcano).

* thermal-core:
  thermal/drivers/acerhdf: Remove pointless governor test
  thermal/drivers/acerhdf: Make interval setting only at module load time
  thermal/drivers/tegra: Remove unneeded lock when setting a trip point
  thermal/hwmon: Use the thermal_core.h header
  thermal/drivers/da9062: Don't access the thermal zone device fields
  thermal: Use thermal_zone_device_type() accessor
  thermal: Add a thermal zone id accessor
  thermal/drivers/spear: Don't use tz->device but pdev->dev
  thermal/core: Add thermal_zone_device structure 'type' accessor
  thermal: Don't use 'device' internal thermal zone structure field
  thermal/hwmon: Use the right device for devm_thermal_add_hwmon_sysfs()
  thermal/hwmon: Do not set no_hwmon before calling thermal_add_hwmon_sysfs()
  thermal: Remove debug or error messages in get_temp() ops
  thermal/core: Show a debug message when get_temp() fails
  thermal/core: Use the thermal zone 'devdata' accessor in remaining drivers
  thermal/core: Use the thermal zone 'devdata' accessor in hwmon located drivers
  thermal/core: Use the thermal zone 'devdata' accessor in thermal located drivers
  thermal/core: Add a thermal zone 'devdata' accessor
2023-03-08 14:03:56 +01:00
Daniel Lezcano
5f68d0785e thermal/core: Use the thermal zone 'devdata' accessor in thermal located drivers
The thermal zone device structure is exposed to the different drivers
and obviously they access the internals while that should be
restricted to the core thermal code.

In order to self-encapsulate the thermal core code, we need to prevent
the drivers accessing directly the thermal zone structure and provide
accessor functions to deal with.

Use the devdata accessor introduced in the previous patch.

No functional changes intended.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> #R-Car
Acked-by: Mark Brown <broonie@kernel.org>
Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com> #MediaTek auxadc and lvts
Reviewed-by: Balsam CHIHI <bchihi@baylibre.com> #Mediatek lvts
Reviewed-by: Adam Ward <DLG-Adam.Ward.opensource@dm.renesas.com> #da9062
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>  #spread
Acked-by: Jernej Skrabec <jernej.skrabec@gmail.com> #sun8i_thermal
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Florian Fainelli <f.fainelli@gmail.com> #Broadcom
Reviewed-by: Dhruva Gole <d-gole@ti.com> # K3 bandgap
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Heiko Stuebner <heiko@sntech.de> #rockchip
Reviewed-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> #uniphier
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-03 20:45:02 +01:00
Srinivas Pandruvada
52f04f10b9 thermal: intel: int340x: processor_thermal: Fix deadlock
When user space updates the trip point there is a deadlock, which results
in caller gets blocked forever.

Commit 05eeee2b51 ("thermal/core: Protect sysfs accesses to thermal
operations with thermal zone mutex"), added a mutex for tz->lock in the
function trip_point_temp_store(). Hence, trip set callback() can't
call any thermal zone API as they are protected with the same mutex lock.

The callback here calling thermal_zone_device_enable(), which will result
in deadlock.

Move the thermal_zone_device_enable() to proc_thermal_pci_probe() to
avoid this deadlock.

Fixes: 05eeee2b51 ("thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@intel.com>
Cc: 6.2+ <stable@vger.kernel.org> # 6.2+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-03 20:34:49 +01:00
Randy Dunlap
1467fb9603 thermal: intel: BXT_PMIC: select REGMAP instead of depending on it
REGMAP is a hidden (not user visible) symbol. Users cannot set it
directly thru "make *config", so drivers should select it instead of
depending on it if they need it.

Consistently using "select" or "depends on" can also help reduce
Kconfig circular dependency issues.

Therefore, change the use of "depends on REGMAP" to "select REGMAP".

Fixes: b474303ffd ("thermal: add Intel BXT WhiskeyCove PMIC thermal driver")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-01 19:32:00 +01:00
Dan Carpenter
f1b930e740 thermal: intel: quark_dts: fix error pointer dereference
If alloc_soc_dts() fails, then we can just return.  Trying to free
"soc_dts" will lead to an Oops.

Fixes: 8c18769396 ("thermal: intel Quark SoC X1000 DTS thermal driver")
Signed-off-by: Dan Carpenter <error27@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-03-01 19:28:57 +01:00
Daniel Lezcano
9272d2d43b thermal: Remove core header inclusion from drivers
As the name states "thermal_core.h" is the header file for the core
components of the thermal framework.

Too many drivers are including it. Hopefully the recent cleanups
helped to self encapsulate the code a bit more and prevented the
drivers to need this header.

Remove this inclusion in every place where it is possible.

Some other drivers did a confusion with the core header and the one
exported in linux/thermal.h. They include the former instead of the
latter. The changes also fix this.

The tegra/soctherm driver still remains as it uses an internal
function which need to be replaced.

The Intel HFI driver uses the netlink internal framework core and
should be changed to prevent to deal with the internals.

No functional changes intended.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Miquel Raynal <miquel.raynal@bootlin.com> # armada_thermal.c
Reviewed-by: Kunihiko Hayashi <hayashi.kunihiko@socionext.com> # uniphier_thermal.c
Reviewed-by: Niklas Söderlund <niklas.soderlund+renesas@ragnatech.se> # rcar_gen3_thermal.c
Reviewed-by: Neil Armstrong <neil.armstrong@linaro.org> # amlogic_thermal.c
Acked-by: Florian Fainelli <f.fainelli@gmail.com> # bcm2835_thermal.c
Acked-by: Thierry Reding <treding@nvidia.com> # tegra30-tsensor.c
Link: https://lore.kernel.org/r/20230206153432.1017282-1-daniel.lezcano@linaro.org
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-15 17:29:48 +01:00
Srinivas Pandruvada
ebf5197102 thermal: intel: powerclamp: Add two module parameters
In some use cases, it is desirable to only inject idle on certain set
of CPUs. For example on Alder Lake systems, it is possible that we force
idle only on P-Cores for thermal reasons. Also the idle percent can be
more than 50% if we only choose partial set of CPUs in the system.

Introduce 2 new module parameters for this purpose. They can be only
changed when the cooling device is inactive.

cpumask (Read/Write): A bit mask of CPUs to inject idle. The format of
this bitmask is same as used in other subsystems like in
/proc/irq/*/smp_affinity. The mask is comma separated 32 bit groups.
Each CPU is one bit. For example for 256 CPU system the full mask is:
ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff,ffffffff
The rightmost mask is for CPU 0-32.

max_idle (Read/Write): Maximum injected idle time to the total CPU time
ratio in percent range from 1 to 100. Even if the cooling device max_state
is always 100 (100%), this parameter allows to add a max idle percent
limit. The default is 50, to match the current implementation of powerclamp
driver. Also doesn't allow value more than 75, if the cpumask includes
every CPU present in the system.

Also when the cpumask doesn't include every CPU, there is no use of
compensation using package C-state idle counters. Hence don't start
package C-state polling thread even for a single package or a single die
system in this case.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09 21:01:06 +01:00
Srinivas Pandruvada
966d0ab673 thermal: intel: powerclamp: Fix duration module parameter
After the switch to use the powercap/idle-inject framework in the Intel
powerclamp driver, the idle duration unit is microsecond.

However, the module parameter for idle duration is in milliseconds, so
convert it to microseconds in the "set" callback and back to milliseconds
in a new "get" callback.

While here, also use mutex protection for setting and getting "duration".

The other uses of "duration" are already protected by the mutex.

Fixes: 8526eb7fc7 ("thermal: intel: powerclamp: Use powercap idle-inject feature")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-09 19:57:35 +01:00
Srinivas Pandruvada
6210849654 thermal: intel: powerclamp: Return last requested state as cur_state
When the user is reading cur_state from the thermal cooling device for
Intel powerclamp device:
 - It returns the idle ratio from Package C-state counters when
   there is active idle injection session.
 - -1, when there is no active idle injection session.

This information is not very useful as the package C-state counters vary
a lot from read to read. Instead just return the last requested cur_state.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-07 20:51:15 +01:00
Daniel Lezcano
72ffc28f2f thermal: intel: quark_dts: Use generic trip points
Make the intel_quark_dts_thermal driver register an array of generic
trip points along with the thermal zone and drop the trip points
thermal zone callbacks that are not used any more from it.

Signed-off-by: Daniel Lezcano <daniel.lezcano@kernel.org>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-03 14:49:26 +01:00
Srinivas Pandruvada
8526eb7fc7 thermal: intel: powerclamp: Use powercap idle-inject feature
There are two idle injection implementation in the Linux kernel. One
via intel_powerclamp and the other using powercap/idle_inject. Both
implementation end up in calling play_idle* function from a FIFO
priority thread. Both can't be used at the same time.

It is better to use one idle injection framework for better
maintainability. In this way, there is only one caller for play_idle.

Here powercap/idle_inject can be used for both per-core and for system
wide idle injection. This framework has a well defined interface which
allow registry for per-core or for all CPUs (system wide).

This reduces code complexity in the intel powerclamp driver as all the
per CPU kthreads, delayed work and calls to play_idle can be removed.

The changes include:
 - Remove unneeded include files
 - Remove per CPU kthread workers: balancing_work and idle_injection_work.
 - Reuse the compensation related code by moving from previous worker
   thread to idle_injection callback.
 - Adjust the idle_duration and runtime by using powercap/idle_inject
   interface.
 - Remove all variables, which are not required once powercap/idle_inject
   is used.
 - Add mutex to avoid race during removal of idle injection during module
   unload and user action to change idle inject percent. Also for
   protection during dynamic adjustment of run and idle time from
   update() callback.
 - Remove online/offline callbacks to designate control CPU
 - Use cpu_present_mask global variable for CPU mask
 - Remove hot plug locks

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-03 14:48:34 +01:00
Srinivas Pandruvada
8e47363588 thermal: intel: powerclamp: Fix cur_state for multi package system
The powerclamp cooling device cur_state shows actual idle observed by
package C-state idle counters. But the implementation is not sufficient
for multi package or multi die system. The cur_state value is incorrect.
On these systems, these counters must be read from each package/die and
somehow aggregate them. But there is no good method for aggregation.

It was not a problem when explicit CPU model addition was required to
enable intel powerclamp. In this way certain CPU models could have
been avoided. But with the removal of CPU model check with the
availability of Package C-state counters, the driver is loaded on most
of the recent systems.

For multi package/die systems, just show the actual target idle state,
the system is trying to achieve. In powerclamp this is the user set
state minus one.

Also there is no use of starting a worker thread for polling package
C-state counters and applying any compensation for multiple package
or multiple die systems.

Fixes: b721ca0d19 ("thermal/powerclamp: remove cpu whitelist")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-02-02 21:08:32 +01:00
Rafael J. Wysocki
2153a87ff9 thermal: intel: intel_pch: Drop struct board_info
Because the only member of struct board_info is the name, the
board_info[] array of struct board_info elements can be replaced with
an array of strings.

Modify the code accordingly and drop struct board_info.

No intentional functional impact.

Suggested-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02 21:06:24 +01:00
Rafael J. Wysocki
ae98e57a6e thermal: intel: intel_pch: Rename board ID symbols
Use capitals in the names of the board ID symbols and add the PCH_
prefix to each of them for consistency.

Also rename the board_ids enum accordingly.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02 21:06:15 +01:00
Rafael J. Wysocki
c5f43242f4 thermal: intel: intel_pch: Fold suspend and resume routines into their callers
Fold pch_suspend() and pch_resume(), that each have only one caller,
into their respective callers to make the code somewhat easier to
follow.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02 21:06:09 +01:00
Rafael J. Wysocki
35c87f948d thermal: intel: intel_pch: Fold two functions into their callers
Fold two functions, pch_hw_init() and pch_get_temp(), that each have
only one caller, into their respective callers to make the code somewhat
easier to follow.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02 21:04:01 +01:00
Rafael J. Wysocki
86cb1004b6 thermal: intel: intel_pch: Eliminate device operations object
The same device operations object is pointed to by all of the board
configurations in the driver, so effectively the same operations
callbacks are used by all of them which only adds overhead (that can
be significant due to retpolines) for no real purpose.

For this reason, drop the device operations object and replace the
respective callback invocations by direct calls to the specific
functions that were previously pointed to by callback pointers.

No intentional change in behavior.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02 15:40:39 +01:00
Rafael J. Wysocki
1aa4f925d8 thermal: intel: intel_pch: Rename device operations callbacks
Because the same device operations callbacks are used for all supported
boards, they are in fact generic, so rename them to reflect that.

Also rename the operations object itself for consistency.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02 15:40:39 +01:00
Rafael J. Wysocki
558718f4d3 thermal: intel: intel_pch: Eliminate redundant return pointers
Both pch_wpt_init() and pch_wpt_get_temp() can return the proper
result via their return values, so they do not need to use return
pointers.

Modify them accordingly.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02 15:40:39 +01:00
Rafael J. Wysocki
2cee73568e thermal: intel: intel_pch: Make pch_wpt_add_acpi_psv_trip() return int
Modify pch_wpt_add_acpi_psv_trip() to return an int value instead of
using a return pointer for that.

While at it, drop an excessive empty code line.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02 15:40:39 +01:00
Rafael J. Wysocki
1bcebcab88 thermal: intel: int340x: Improve int340x_thermal_set_trip_temp()
Instead of using snprintf() to populate the ACPI object name in
int340x_thermal_set_trip_temp(), use an appropriate initializer
and make the function fail if its trip argument is greater than 9,
because ACPI object names can only be 4 characters long and it does
not make sense to even try to evaluate objects with longer names (that
argument is guaranteed to be non-negative, because it comes from the
thermal code that will not pass negative trip numbers to zone
callbacks).

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02 15:31:04 +01:00
Rafael J. Wysocki
d0009d14e9 thermal: intel: int340x: Drop pointless cast to unsigned long
The explicit casting from int to unsigned long in
int340x_thermal_get_zone_temp() is pointless, becuase the multiplication
result is cast back to int by the assignment in the same statement, so
drop it.

No expected functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02 15:31:04 +01:00
Rafael J. Wysocki
67c6945867 thermal: intel: int340x: Rename variable in int340x_thermal_zone_add()
Rename local variables int34x_thermal_zone in int340x_thermal_zone_add()
and int340x_thermal_zone_remove() to int34x_zone which allows a number
of code lines to be shorter and easier to read and adjust some white
space for consistency.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02 15:31:04 +01:00
Rafael J. Wysocki
be014c789c thermal: intel: int340x: Assorted minor cleanups
Improve some inconsistent usage of white space in int340x_thermal_zone.c,
fix up one coding style issue in it (missing braces around an else
branch of a conditional) and while at it replace a !ACPI_FAILURE()
check with an equivalent ACPI_SUCCESS() one.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Tested-by: Zhang Rui <rui.zhang@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
2023-02-02 15:31:04 +01:00
Rafael J. Wysocki
dd3b3d160e thermal: ACPI: Make helpers retrieve temperature only
It is slightly better to make the ACPI thermal helper functions retrieve
the trip point temperature only instead of doing the full trip point
initialization, because they are also used for updating some already
registered trip points, in which case initializing a new trip just
in order to update the temperature of an existing one is somewhat
wasteful.

Modify the ACPI thermal helpers accordingly and update their users.

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2023-02-02 15:26:45 +01:00
Rafael J. Wysocki
f4118dbe61 thermal: intel: int340x: Use generic trip points table
Modify int340x_thermal_zone_add() to register the thermal zone along
with a trip points table, which allows the trip-related zone callbacks
to be dropped, because they are not needed any more.

In order to consolidate the code, use ACPI trip library functions to
populate generic trip points in int340x_thermal_read_trips() and to
update them in int340x_thermal_update_trips().

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Co-developed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-27 15:11:12 +01:00
Rafael J. Wysocki
9e9b7e182c thermal: intel: int340x: Use zone lock for synchronization
Because the ->get_trip_temp() and ->get_trip_type() thermal zone
callbacks are only invoked from __thermal_zone_get_trip() which is
always called by the thermal core under the zone lock, it is sufficient
for int340x_thermal_update_trips() to acquire the zone lock for mutual
exclusion with those callbacks.

Accordingly, modify int340x_thermal_update_trips() to use the zone lock
instead of the internal trip_mutex and drop the latter which is not
necessary any more.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-27 15:11:12 +01:00
Rafael J. Wysocki
b1bf9dbffc thermal: intel: int340x: Rework updating trip points
It is generally invalid to change the trip point indices after they have
been exposed via sysfs.

Moreover, the thermal objects in the ACPI namespace cannot go away and
appear on the fly.  In practice, the only thing that can happen when the
INT3403_PERF_TRIP_POINT_CHANGED notification is sent by the platform
firmware is a change of the return values of those thermal objects.

For this reason, add a special function for updating the trip point
temperatures after re-evaluating the respective ACPI thermal objects
and change int3403_notify() to invoke it instead of
int340x_thermal_read_trips() that would change the trip point indices
on errors.  Also remove the locking from the latter, because it is only
called before registering the thermal zone and it cannot race with the
zone's callbacks.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-27 15:11:12 +01:00
Rafael J. Wysocki
97efecfdbf thermal: ACPI: Initialize trips if temperature is out of range
In some cases it is still useful to register a trip point if the
temperature returned by the corresponding ACPI thermal object (for
example, _HOT) is invalid to start with, because the same ACPI
thermal object may start to return a valid temperature after a
system configuration change (for example, from an AC power source
to battery an vice versa).

For this reason, if the ACPI thermal object evaluated by
thermal_acpi_trip_init() successfully returns a temperature value that
is out of the range of values taken into account, initialize the trip
point using THERMAL_TEMP_INVALID as the temperature value instead of
returning an error to allow the user of the trip point to decide what
to do with it.

Also update pch_wpt_add_acpi_psv_trip() to reject trip points with
invalid temperature values.

Fixes: 7a0e397488 ("thermal: ACPI: Add ACPI trip point routines")
Reported-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-27 15:09:47 +01:00
Rafael J. Wysocki
a5c926acd0 Merge back Intel thermal control changes for 6.3. 2023-01-27 15:08:08 +01:00
Daniel Lezcano
e90eb1df70 thermal: intel: processor_thermal_device_pci: Use generic trip point
Make proc_thermal_pci_probe() register the TCPU_PCI thermal zone along
with the trip point used by it and drop the zone callbacks related to
this trip point that are not needed any more.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-26 16:17:28 +01:00
Srinivas Pandruvada
5c36cf2784 thermal: intel: int340x: Add production mode attribute
It is possible that the system manufacturer locks down thermal tuning
beyond what is usually done on the given platform. In that case user
space calibration tools should not try to adjust the thermal
configuration of the system.

To allow user space to check if that is the case, add a new sysfs
attribute "production_mode" that will be present when the ACPI DCFG
method is present under the INT3400 device object in the ACPI Namespace.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-26 14:47:16 +01:00
Rafael J. Wysocki
acd7e9ee57 thermal: intel: int340x: Add locking to int340x_thermal_get_trip_type()
In order to prevent int340x_thermal_get_trip_type() from possibly
racing with int340x_thermal_read_trips() invoked by int3403_notify()
add locking to it in analogy with int340x_thermal_get_trip_temp().

Fixes: 6757a7abe4 ("thermal: intel: int340x: Protect trip temperature from concurrent updates")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-25 15:37:21 +01:00
Srinivas Pandruvada
6757a7abe4 thermal: intel: int340x: Protect trip temperature from concurrent updates
Trip temperatures are read using ACPI methods and stored in the memory
during zone initializtion and when the firmware sends a notification for
change. This trip temperature is returned when the thermal core calls via
callback get_trip_temp().

But it is possible that while updating the memory copy of the trips when
the firmware sends a notification for change, thermal core is reading the
trip temperature via the callback get_trip_temp(). This may return invalid
trip temperature.

To address this add a mutex to protect the invalid temperature reads in
the callback get_trip_temp() and int340x_thermal_read_trips().

Fixes: 5fbf7f27fa ("Thermal/int340x: Add common thermal zone handler")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: 5.0+ <stable@vger.kernel.org> # 5.0+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-24 21:28:19 +01:00
Daniel Lezcano
fee19c6921 thermal: intel: intel_pch: Use generic trip points
The thermal framework gives the possibility to register the trip
points along with the thermal zone.  When that is done, no get_trip_*
callbacks are needed and they can be removed.

Convert the existing callbacks content logic into generic trip points
initialization code and register them along with the thermal zone.

In order to consolidate the code, use an ACPI trip library function
to populate a generic trip point.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Subject and changelog edits, rebase ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
2023-01-24 21:13:42 +01:00
Tim Zimmermann
40dc192908 thermal: intel: intel_pch: Add support for Wellsburg PCH
Add the PCI ID for the Wellsburg C610 series chipset PCH.

The driver can read the temperature from the Wellsburg PCH with only
the PCI ID added and no other modifications.

Signed-off-by: Tim Zimmermann <tim@linux4.de>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-24 21:13:25 +01:00
Rafael J. Wysocki
8ef0ca4a17 Merge back other thermal control material for 6.3.
* thermal: (734 commits)
  thermal: core: call put_device() only after device_register() fails
  Linux 6.2-rc4
  kbuild: Fix CFI hash randomization with KASAN
  firmware: coreboot: Check size of table entry and use flex-array
  kallsyms: Fix scheduling with interrupts disabled in self-test
  ata: pata_cs5535: Don't build on UML
  lockref: stop doing cpu_relax in the cmpxchg loop
  x86/pci: Treat EfiMemoryMappedIO as reservation of ECAM space
  efi: tpm: Avoid READ_ONCE() for accessing the event log
  io_uring: lock overflowing for IOPOLL
  ALSA: pcm: Move rwsem lock inside snd_ctl_elem_read to prevent UAF
  iommu/mediatek-v1: Fix an error handling path in mtk_iommu_v1_probe()
  iommu/iova: Fix alloc iova overflows issue
  iommu: Fix refcount leak in iommu_device_claim_dma_owner
  iommu/arm-smmu-v3: Don't unregister on shutdown
  iommu/arm-smmu: Don't unregister on shutdown
  iommu/arm-smmu: Report IOMMU_CAP_CACHE_COHERENCY even betterer
  platform/x86: thinkpad_acpi: Fix profile mode display in AMT mode
  ALSA: usb-audio: Fix possible NULL pointer dereference in snd_usb_pcm_has_fixed_rate()
  platform/x86: int3472/discrete: Ensure the clk/power enable pins are in output mode
  ...
2023-01-24 21:12:49 +01:00
ye xingchen
763bd29fd3 thermal: int340x_thermal: Use sysfs_emit_at() instead of scnprintf()
Follow the advice of the Documentation/filesystems/sysfs.rst that show()
should only use sysfs_emit() or sysfs_emit_at() when formatting the
value to be returned to user space.

Signed-off-by: ye xingchen <ye.xingchen@zte.com.cn>
[ rjw: Subject rewrite ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-20 17:59:35 +01:00
Deming Wang
a30e65792c thermal: intel: menlow: Update function descriptions
Update function parameter descriptions for sensor_get_auxtrip() and
sensor_set_auxtrip().

[ rjw: New changelog, subject edits ]
Signed-off-by: Deming Wang <wangdeming@inspur.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-18 21:02:36 +01:00
Yang Li
e7fcfe67f9 thermal: intel: Fix unsigned comparison with less than zero
The return value from the call to intel_tcc_get_tjmax() is int, which can
be a negative error code. However, the return value is being assigned to
an u32 variable 'tj_max', so making 'tj_max' an int.

Eliminate the following warning:
./drivers/thermal/intel/intel_soc_dts_iosf.c:394:5-11: WARNING: Unsigned expression compared with zero: tj_max < 0

Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=3637
Reported-by: Abaci Robot <abaci@linux.alibaba.com>
Signed-off-by: Yang Li <yang.lee@linux.alibaba.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-10 20:33:26 +01:00
Daniel Lezcano
d3ecaf17b5 thermal/drivers/intel: Use generic thermal_zone_get_trip() function
The thermal framework gives the possibility to register the trip
points with the thermal zone. When that is done, no get_trip_* ops are
needed and they can be removed.

Convert ops content logic into generic trip points and register them with the
thermal zone.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20221003092602.1323944-30-daniel.lezcano@linaro.org
2023-01-06 14:14:48 +01:00
Daniel Lezcano
0d2d586a86 thermal/intel/int340x: Replace parameter to simplify
In the process of replacing the get_trip_* ops by the generic trip
points, the current code has an 'override' property to add another
indirection to a different ops.

Rework this approach to prevent this indirection and make the code
ready for the generic trip points conversion.

Actually the get_temp() is different regarding the platform, so it is
pointless to add a new set of ops but just create dynamically the ops
at init time.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20221003092602.1323944-29-daniel.lezcano@linaro.org
2023-01-06 14:14:48 +01:00
Zhang Rui
58374a3970 thermal/x86_pkg_temp_thermal: Add support for handling dynamic tjmax
Tjmax value retrieved from MSR_IA32_TEMPERATURE_TARGET can be changed at
runtime when the Intel SST-PP (Intel Speed Select Technology -
Performance Profile) level is changed.

Enhance the code to use updated tjmax when programming the thermal
interrupt thresholds.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30 19:57:38 +01:00
Zhang Rui
983eb370cb thermal/x86_pkg_temp_thermal: Use Intel TCC library
Cleanup the code by using Intel TCC library for TCC (Thermal Control
Circuitry) MSR access.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30 19:57:38 +01:00
Zhang Rui
4e3ecc2898 thermal/intel/intel_tcc_cooling: Use Intel TCC library
Cleanup the code by using Intel TCC library for TCC (Thermal Control
Circuitry) MSR access.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30 19:57:38 +01:00
Zhang Rui
955fb8719e thermal/intel/intel_soc_dts_iosf: Use Intel TCC library
Cleanup the code by using Intel TCC library for TCC (Thermal Control
Circuitry) MSR access.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30 19:57:38 +01:00
Zhang Rui
d91a4714e5 thermal/int340x/processor_thermal: Use Intel TCC library
Cleanup the code by using Intel TCC library for TCC (Thermal Control
Circuitry) MSR access.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30 19:57:38 +01:00
Zhang Rui
a3c1f066e1 thermal/intel: Introduce Intel TCC library
There are several different drivers that accesses the Intel TCC
(thermal control circuitry) MSRs, and each of them has its own
implementation for the same functionalities, e.g. getting the current
temperature, getting the tj_max, and getting/setting the tj_max offset.

Introduce a library to unify the code for Intel CPU TCC MSR access.

At the same time, ensure the temperature is got based on the updated
tjmax value because tjmax can be changed at runtime for cases like
the Intel SST-PP (Intel Speed Select Technology - Performance Profile)
level change.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30 19:57:38 +01:00
Srinivas Pandruvada
b878d3ba9b thermal: int340x: Add missing attribute for data rate base
Commit 473be51142 ("thermal: int340x: processor_thermal: Add RFIM
driver")' added rfi_restriction_data_rate_base string, mmio details and
documentation, but missed adding attribute to sysfs.

Add missing sysfs attribute.

Fixes: 473be51142 ("thermal: int340x: processor_thermal: Add RFIM driver")
Cc: 5.11+ <stable@vger.kernel.org> # v5.11+
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-30 19:48:37 +01:00
Linus Torvalds
601c1aa855 More thermal control updates for 6.2-rc1
- Avoid clearing the HFI status bit on systems without HFI support
    which triggers unchecked MSR access errors (Srinivas Pandruvada).
 
  - Add sm8450 and sm8550 QCom compatible string to DT bindings (Luca
    Weiss, Neil Armstrong).
 
  - Use devm_platform_get_and_ioremap_resource on the ST platform to
    group two calls into a single one (Minghao Chi).
 
  - Use GENMASK instead of bitmaps and validate the temperature after
    reading it in the imx8mm_thermal driver (Marcus Folkesson).
 
  - Convert generic-adc-thermal to DT schema (Rob Herring).
 
  - Fix debug print message with inverted logic in the k3_j72xx_bandgap
    driver (Keerthy).
 
  - Fix memory leak on thermal_of_zone_register() failure (Ido Schimmel).
 
  - Add support for IPQ8074 in the tsens thermal driver along with the DT
    bindings (Robert Marko).
 
  - Fix and rework the debugfs code in the tsens driver (Christian
    Marangi).
 
  - Add calibration and DT documentation for the imx8mm driver (Marek
    Vasut).
 
  - Add DT bindings and compatible for the Mediatek SoCs mt7981 and
    mt7983 (Daniel Golle).
 
  - Don't show an error message if it happens at probe time while it
    will be deferred on the QCom SPMI ADC driver (Johan Hovold).
 
  - Add HWMon support for the imx8mm board (Alexander Stein).
 
  - Remove pointless include from the power allocator governor (Christophe
    JAILLET).
 
  - Add interrupt DT bindings for QCom SoCs SC8280XP, SM6350 and SM8450
    (Krzysztof Kozlowski).
 
  - Fix inaccurate warning message for the QCom tsens gen2 (Luca Weiss).
 
  - Demote error log of thermal zone register to debug in the tsens QCom
    driver (Manivannan Sadhasivam).
 
  - Consolidate the the efuse values and the errata handling in the TI
    Bandgap driver (Bryan Brattlof).
 
  - Document Renesas RZ/Five as compatible with RZ/G2UL in the DT
    bindings (Lad Prabhakar).
 
  - Fix the irq handler return value in the LMh driver (Bjorn Andersson).
 
  - Delete empty platform remove callback from imx_sc_thermal (Uwe
    Kleine-König).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmObX2sSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx7u0P/0NtwAxUICNqL1+QCdPupjbeF5c/tNFp
 c3FJGxO3QEfs+/UQ7GDaVGQqlPpao+E9jv3f2ii2+datznhOHday00o8MQ/B1pwG
 j1bAfyqmaJtruX0Q5uFGed1G6HBfcTn50iMweOOsXAm4opPtT0vgOG/Vm/TcZPbt
 4Sxo4aybZ2ZQuknbX8o9oWiUoOi43+OS3HHalC+VbBXXO+9Gp1RIcvmGPQEZtwy+
 tPq7+B0ofuCkLpVjDfWwwDQ4VsD5pSwPM5z+FcO0zu0dHpl8TROs432Y7RvvvIdk
 yhBnP7iTczvikPvlWK+6S2qHc+WD2kv5xsngF6RDDB0N8lTDTwhLJpTdYHnC/Zin
 Ho8+Lzifmj6V5ZeUe3xphQZAPqn9DAr8T9pFAOSEyXthsAqHPYBvnWuvAJg4G8uE
 8LnqwNV0xe1LEs0whVVqlXgG2QD1z42T9Wxv5y6QWCypiEudiSFRjSkvXeZMYU3K
 sn4KCYUYya2b+/ENN5xQca/eKqAo+EKXtDGMKTMQIShI8K7CdLpjpCAIFrdYptCi
 zescOMjO/trfA4RhPBFMcL03g1YARFHBCJ+HkRSlHu2vO+M0zu3HlkCNyvsVHGNZ
 2nv5+m7jLFE+N9sidxUrKVoMgsJU4qtPp/IRYS2shlzcMs6Jdxe2NEIaWK2hDC+w
 YUuIh8LhWWCq
 =MsHk
 -----END PGP SIGNATURE-----

Merge tag 'thermal-6.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull more thermal control updates from Rafael Wysocki:
 "These are updates of assorted thermal drivers, mostly for ARM
  platforms, generally isolated and fairly straightforward, and the
  recent Intel HFI driver fix for systems without HFI support.

  Specifics:

   - Avoid clearing the HFI status bit on systems without HFI support
     which triggers unchecked MSR access errors (Srinivas Pandruvada)

   - Add sm8450 and sm8550 QCom compatible string to DT bindings (Luca
     Weiss, Neil Armstrong)

   - Use devm_platform_get_and_ioremap_resource on the ST platform to
     group two calls into a single one (Minghao Chi)

   - Use GENMASK instead of bitmaps and validate the temperature after
     reading it in the imx8mm_thermal driver (Marcus Folkesson)

   - Convert generic-adc-thermal to DT schema (Rob Herring)

   - Fix debug print message with inverted logic in the k3_j72xx_bandgap
     driver (Keerthy)

   - Fix memory leak on thermal_of_zone_register() failure (Ido
     Schimmel)

   - Add support for IPQ8074 in the tsens thermal driver along with the
     DT bindings (Robert Marko)

   - Fix and rework the debugfs code in the tsens driver (Christian
     Marangi)

   - Add calibration and DT documentation for the imx8mm driver (Marek
     Vasut)

   - Add DT bindings and compatible for the Mediatek SoCs mt7981 and
     mt7983 (Daniel Golle)

   - Don't show an error message if it happens at probe time while it
     will be deferred on the QCom SPMI ADC driver (Johan Hovold)

   - Add HWMon support for the imx8mm board (Alexander Stein)

   - Remove pointless include from the power allocator governor
     (Christophe JAILLET)

   - Add interrupt DT bindings for QCom SoCs SC8280XP, SM6350 and SM8450
     (Krzysztof Kozlowski)

   - Fix inaccurate warning message for the QCom tsens gen2 (Luca Weiss)

   - Demote error log of thermal zone register to debug in the tsens
     QCom driver (Manivannan Sadhasivam)

   - Consolidate the the efuse values and the errata handling in the TI
     Bandgap driver (Bryan Brattlof)

   - Document Renesas RZ/Five as compatible with RZ/G2UL in the DT
     bindings (Lad Prabhakar)

   - Fix the irq handler return value in the LMh driver (Bjorn
     Andersson)

   - Delete empty platform remove callback from imx_sc_thermal (Uwe
     Kleine-König)"

* tag 'thermal-6.2-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: (35 commits)
  thermal/drivers/imx_sc_thermal: Drop empty platform remove function
  thermal/drivers/qcom/lmh: Fix irq handler return value
  dt-bindings: thermal: qcom-tsens: Add compatible for sm8550
  thermal/drivers/st: Use devm_platform_get_and_ioremap_resource()
  dt-bindings: thermal: rzg2l-thermal: Document RZ/Five SoC
  dt-bindings: thermal: k3-j72xx: conditionally require efuse reg range
  dt-bindings: thermal: k3-j72xx: elaborate on binding description
  thermal/drivers/k3_j72xx_bandgap: Map fuse_base only for erratum workaround
  thermal/drivers/k3_j72xx_bandgap: Remove fuse_base from structure
  thermal/drivers/k3_j72xx_bandgap: Use bool for i2128 erratum flag
  thermal/drivers/k3_j72xx_bandgap: Simplify k3_thermal_get_temp() function
  thermal/drivers/qcom: Demote error log of thermal zone register to debug
  thermal/drivers/qcom/temp-alarm: Fix inaccurate warning for gen2
  dt-bindings: thermal: qcom-tsens: narrow interrupts for SC8280XP, SM6350 and SM8450
  thermal/core/power allocator: Remove a useless include
  thermal/drivers/imx8mm: Add hwmon support
  thermal: qcom-spmi-adc-tm5: suppress probe-deferral error message
  dt-bindings: thermal: mediatek: add compatible string for MT7986 and MT7981 SoC
  thermal: ti-soc-thermal: Drop comma after SoC match table sentinel
  thermal/drivers/imx: Add support for loading calibration data from OCOTP
  ...
2022-12-15 10:16:04 -08:00
Srinivas Pandruvada
904f309ae7 thermal: intel: Don't set HFI status bit to 1
When CPU doesn't support HFI (Hardware Feedback Interface), don't include
BIT 26 in the mask to prevent clearing. otherwise this results in:
    unchecked MSR access error: WRMSR to 0x1b1
      (tried to write 0x0000000004000aa8)
      at rIP: 0xffffffff8b8559fe (throttle_active_work+0xbe/0x1b0)

Fixes: 6fe1e64b60 ("thermal: intel: Prevent accidental clearing of HFI status")
Reported-by: Linus Torvalds <torvalds@linux-foundation.org>
Tested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-14 14:50:15 +01:00
Linus Torvalds
691806e977 Thermal control updates for 6.2-rc1
- Fix race conditions related to thermal device operations that are not
    protected against thermal device removal (Guenter Roeck).
 
  - Fix error code in __thermal_cooling_device_register() (Dan Carpenter).
 
  - Validate new cooling device state (coming from user space) in
    cur_state_store() and reuse the max_state value from cooling device
    structure in the sysfs interface (Viresh Kumar).
 
  - Fix some possible name leaks in error paths in the thermal control
    core code (Yang Yingliang).
 
  - Detect TCC lock bit set in the intel_tcc_cooling driver and make it
    refuse to update the TCC offset in that case (Zhang Rui).
 
  - Add TCC cooling support for RaptorLake-S (Zhang Rui).
 
  - Prevent accidental clearing of HFI status by one of the other
    drivers using the same status register (Srinivas Pandruvada).
 
  - Protect clearing of thermal status bits in Intel thermal control
    drivers (Srinivas Pandruvada).
 
  - Allow the HFI thermal control driver to ACK an HFI event for the
    previously observed timestamp (Srinivas Pandruvada).
 
  - Remove a pointless die_id check from the HFI thermal driver and
    adjust the definition a data structure used by it (Ricardo Neri).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmOXWUoSHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxAhoQAI3OFLiQ/boXpTUgPFWbio+E5RZbDGTa
 Tb38JEAhREpXoce3QiT1ZDqoLUGS9tMpyfIBu8QkUXtgdyHkxFJc/AqwvFKUUmVN
 oS5GMn6s3ufQKTeiFoBd/hJ6zaGwcW4e/2QXzTZjq8oUaI0yca0HCc0QUYx6kWF6
 HqSD2KtFRVw4xe6uqZmwTVQddyIqA5FWRaZvhGolDPmVpg8NRTYVbsOzMf/StnRu
 lXyGrgk1tNIkJnGlOKnqsV9xIsH5iS8SRAgZaUc9wK/Wzor486VzH4TdVn6oYdUn
 fGCA1W982VX+mUlvVZQpyjGj8bL1JylZZSAcn8cg5cyKnTP1OinM6j/TIKc1JHK3
 ZFTvTxLuQbJD90frCKub/iIigcTX/GTk8iVRSuzmmzUiBkZ/3/15X6PVA15OEcQU
 zzHaiz0lir7LP0l2BhPDNbfTa1wJkiAtxVW7hcYUOtxL8S8vBldSEB9Kr0+e9A6i
 Lis8OOFS2nZYbGZxplX68JXp3Lo42pcF2QcdekU+iOp0eLziLc2WCB7Lj97TWJoR
 QAeyYf4eAsdKrW6vAiB4ygVxNDm4RPmesqlaNtlfhsFLUY7b5tc5vTinv6it5ktE
 CCNEi0N0NCIGifkIBnuthjToPsvhrTaYF9C1BPTi7p6mZp6VVacScA/qZ3rw2DG6
 W7FV8CJt5m5/
 =ZdzX
 -----END PGP SIGNATURE-----

Merge tag 'thermal-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control updates from Rafael Wysocki:
 "These include thermal core fixes to protect thermal device operations
  against thermal device removal, other thermal core fixes and updates
  of Intel thermal control drivers.

  Specifics:

   - Fix race conditions related to thermal device operations that are
     not protected against thermal device removal (Guenter Roeck)

   - Fix error code in __thermal_cooling_device_register() (Dan
     Carpenter)

   - Validate new cooling device state (coming from user space) in
     cur_state_store() and reuse the max_state value from cooling device
     structure in the sysfs interface (Viresh Kumar)

   - Fix some possible name leaks in error paths in the thermal control
     core code (Yang Yingliang)

   - Detect TCC lock bit set in the intel_tcc_cooling driver and make it
     refuse to update the TCC offset in that case (Zhang Rui)

   - Add TCC cooling support for RaptorLake-S (Zhang Rui)

   - Prevent accidental clearing of HFI status by one of the other
     drivers using the same status register (Srinivas Pandruvada)

   - Protect clearing of thermal status bits in Intel thermal control
     drivers (Srinivas Pandruvada)

   - Allow the HFI thermal control driver to ACK an HFI event for the
     previously observed timestamp (Srinivas Pandruvada)

   - Remove a pointless die_id check from the HFI thermal driver and
     adjust the definition a data structure used by it (Ricardo Neri)"

* tag 'thermal-6.2-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel: hfi: Remove a pointless die_id check
  thermal: core: fix some possible name leaks in error paths
  thermal: intel: hfi: ACK HFI for the same timestamp
  thermal: intel: Protect clearing of thermal status bits
  thermal: intel: Prevent accidental clearing of HFI status
  thermal/core: Protect thermal device operations against thermal device removal
  thermal/core: Remove thermal_zone_set_trips()
  thermal/core: Protect sysfs accesses to thermal operations with thermal zone mutex
  thermal/core: Protect hwmon accesses to thermal operations with thermal zone mutex
  thermal/core: Introduce locked version of thermal_zone_device_update
  thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp
  thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp
  thermal/core: Delete device under thermal device zone lock
  thermal/core: Destroy thermal zone device mutex in release function
  thermal: intel: intel_tcc_cooling: Add TCC cooling support for RaptorLake-S
  thermal: intel: intel_tcc_cooling: Detect TCC lock bit
  thermal: intel: hfi: Improve the type of hfi_features::nr_table_pages
  thermal/core: fix error code in __thermal_cooling_device_register()
  thermal: sysfs: Reuse cdev->max_state
  thermal: Validate new state in cur_state_store()
2022-12-12 13:45:21 -08:00
Ricardo Neri
3a3073b69c thermal: intel: hfi: Remove a pointless die_id check
die_id is an u16 quantity. On single-die systems the default value of
die_id is 0. No need to check for negative values.

Plus, removing this check makes Coverity happy.

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-12-02 20:47:52 +01:00
Srinivas Pandruvada
c0e3acdcde thermal: intel: hfi: ACK HFI for the same timestamp
Some processors issue more than one HFI interrupt with the same
timestamp. Each interrupt must be acknowledged to let the hardware issue
new HFI interrupts. But this can't be done without some additional flow
modification in the existing interrupt handling.

For background, the HFI interrupt is a package level thermal interrupt
delivered via a LVT. This LVT is common for both the CPU and package
level interrupts. Hence, all CPUs receive the HFI interrupts. But only
one CPU should process interrupt and others simply exit by issuing EOI
to LAPIC.

The current HFI interrupt processing flow:

  1. Receive Thermal interrupt
  2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS
  3. Try and get spinlock, one CPU will enter spinlock and others
     will simply return from here to issue EOI.
    (Let's assume CPU 4 is processing interrupt)
  4. Check the stored time-stamp from the HFI memory time-stamp
  5. if same
  6.      ignore interrupt, unlock and return
  7. Copy the HFI message to local buffer
  8. unlock spinlock
  9. ACK HFI interrupt
 10. Queue the message for processing in a work-queue

It is tempting to simply acknowledge all the interrupts even if they
have the same timestamp. This may cause some interrupts to not be
processed.

Let's say CPU5 is slightly late and reaches step 4 while CPU4 is
between steps 8 and 9.

Currently we simply ignore interrupts with the same timestamp. No
issue here for CPU5. When CPU4 acknowledges the interrupt, the next
HFI interrupt can be delivered.

If we acknowledge interrupts with the same timestamp (at step 6), there
is a race condition. Under the same scenario, CPU 5 will acknowledge
the HFI interrupt. This lets hardware generate another HFI interrupt,
before CPU 4 start executing step 9. Once CPU 4 complete step 9, it
will acknowledge the newly arrived HFI interrupt, without actually
processing it.

Acknowledge the interrupt when holding the spinlock. This avoids
contention of the interrupt acknowledgment.

Updated flow:

  1. Receive HFI Thermal interrupt
  2. Check if there is an active HFI status in MSR_IA32_THERM_STATUS
  3. Try and get spin-lock
     Let's assume CPU 4 is processing interrupt
  4.1 Read MSR_IA32_PACKAGE_THERM_STATUS and check HFI status bit
  4.2	If hfi status is 0
  4.3		unlock spinlock
  4.4		return
  4.5 Check the stored time-stamp from the HFI memory time-stamp
  5. if same
  6.1      ACK HFI Interrupt,
  6.2	unlock spinlock
  6.3	return
  7. Copy the HFI message to local buffer
  8. ACK HFI interrupt
  9. unlock spinlock
 10. Queue the message for processing in a work-queue

To avoid taking the lock unnecessarily, intel_hfi_process_event() checks
the status of the HFI interrupt before taking the lock. If CPU5 is late,
when it starts processing the interrupt there are two scenarios:

 a) CPU4 acknowledged the HFI interrupt before CPU5 read
    MSR_IA32_THERM_STATUS. CPU5 exits.

 b) CPU5 reads MSR_IA32_THERM_STATUS before CPU4 has acknowledged the
    interrupt. CPU5 will take the lock if CPU4 has released it. It then
    re-reads MSR_IA32_THERM_STATUS. If there is not a new interrupt,
    the HFI status bit is clear and CPU5 exits. If a new HFI interrupt
    was generated it will find that the status bit is set and it will
    continue to process the interrupt. In this case even if timestamp
    is not changed, the ACK can be issued as this is a new interrupt.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Tested-by: Arshad, Adeel<adeel.arshad@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23 20:13:22 +01:00
Srinivas Pandruvada
930d06bf07 thermal: intel: Protect clearing of thermal status bits
The clearing of the package thermal status is done by Read-Modify-Write
operation. This may result in clearing of some new status bits which are
being or about to be processed.

For example, while clearing of HFI status, after read of thermal status
register, a new thermal status bit is set by the hardware. But during
write back, the newly generated status bit will be set to 0 or cleared.
So, it is not safe to do read-modify-write.

Since thermal status Read-Write bits can be set to only 0 not 1, it is
safe to set all other bits to 1 which are not getting cleared.

Create a common interface for clearing package thermal status bits. Use
this interface to replace existing code to clear thermal package status
bits.

It is safe to call from different CPUs without protection as there is no
read-modify-write. Also wrmsrl results in just single instruction. For
example while CPU 0 and CPU 3 are clearing bit 1 and 3 respectively. If
CPU 3 wins the race, it will write 0x4000aa2, then CPU 1 will write
0x4000aa8. The bits which are not part of clear are set to 1. The default
mask for bits, which can be written here is 0x4000aaa.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23 20:09:06 +01:00
Srinivas Pandruvada
6fe1e64b60 thermal: intel: Prevent accidental clearing of HFI status
When there is a package thermal interrupt with PROCHOT log, it will be
processed and cleared. It is possible that there is an active HFI event
status, which is about to get processed or getting processed. While
clearing PROCHOT log bit, it will also clear HFI status bit. This means
that hardware is free to update HFI memory.

When clearing a package thermal interrupt, some processors will generate
a "general protection fault" when any of the read only bit is set to 1.

The driver maintains a mask of all read-write bits which can be set.

This mask doesn't include HFI status bit. This bit will also be cleared,
as it will be assumed read-only bit. So, add HFI status bit 26 to the
mask.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23 20:09:06 +01:00
Dawei Li
6c0eb5ba35 ACPI: make remove callback of ACPI driver void
For bus-based driver, device removal is implemented as:
1 device_remove()->
2   bus->remove()->
3     driver->remove()

Driver core needs no inform from callee(bus driver) about the
result of remove callback. In that case, commit fc7a6209d5
("bus: Make remove callback return void") forces bus_type::remove
be void-returned.

Now we have the situation that both 1 & 2 of calling chain are
void-returned, so it does not make much sense for 3(driver->remove)
to return non-void to its caller.

So the basic idea behind this change is making remove() callback of
any bus-based driver to be void-returned.

This change, for itself, is for device drivers based on acpi-bus.

Acked-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de>
Acked-by: Lee Jones <lee@kernel.org>
Acked-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Reviewed-by: Hans de Goede <hdegoede@redhat.com>
Signed-off-by: Dawei Li <set_pte_at@outlook.com>
Reviewed-by: Maximilian Luz <luzmaximilian@gmail.com>  # for drivers/platform/surface/*
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-23 19:11:22 +01:00
Zhang Rui
e77f069fd6 thermal: intel: intel_tcc_cooling: Add TCC cooling support for RaptorLake-S
Add RaptorLake to the list of processor models supported by the Intel
TCC cooling driver.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-09 14:58:02 +01:00
Zhang Rui
be6abd3ed6 thermal: intel: intel_tcc_cooling: Detect TCC lock bit
When MSR_IA32_TEMPERATURE_TARGET is locked, TCC Offset can not be
updated even if the PROGRAMMABE Bit is set.

Yield the driver on platforms with MSR_IA32_TEMPERATURE_TARGET locked.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-09 14:58:02 +01:00
Ricardo Neri
54d9135cf2 thermal: intel: hfi: Improve the type of hfi_features::nr_table_pages
A Coverity static code scan raised a potential overflow_before_widen
warning when hfi_features::nr_table_pages is used as an argument to
memcpy in intel_hfi_process_event().

Even though the overflow can never happen (the maximum number of pages of
the HFI table is 0x10 and 0x10 << PAGE_SHIFT = 0x10000), using size_t as
the data type of hfi_features::nr_table_pages makes Coverity happy and
matches the data type of the argument 'size' of memcpy().

Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-10-28 20:11:48 +02:00
Rafael J. Wysocki
4bb7f6c278 thermal: intel_powerclamp: Use first online CPU as control_cpu
Commit 68b99e94a4 ("thermal: intel_powerclamp: Use get_cpu() instead
of smp_processor_id() to avoid crash") fixed an issue related to using
smp_processor_id() in preemptible context by replacing it with a pair
of get_cpu()/put_cpu(), but what is needed there really is any online
CPU and not necessarily the one currently running the code.  Arguably,
getting the one that's running the code in there is confusing.

For this reason, simply give the control CPU role to the first online
one which automatically will be CPU0 if it is online, so one check
can be dropped from the code for an added benefit.

Link: https://lore.kernel.org/linux-pm/20221011113646.GA12080@duo.ucw.cz/
Fixes: 68b99e94a4 ("thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Chen Yu <yu.c.chen@intel.com>
2022-10-15 19:33:57 +02:00
Shang XiaoJing
53e41b8593 thermal: int340x: processor_thermal: Use module_pci_driver() macro
Since PCI provides helper macro module_pci_driver(), the
module_init/exit code can be replaced with it.

Signed-off-by: Shang XiaoJing <shangxiaojing@huawei.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-24 19:22:09 +02:00
Srinivas Pandruvada
c4e927da89 thermal: intel_powerclamp: Remove accounting for IRQ wakes
There is a static variable "idle_wakeup_counter", which accounts for
number of wake ups because of IRQs and take actions to compensate idle
injection. This is now read and reset to 0, but never incremented.
So all the usage of this counter for idle injection has no use.

Also another static variable "reduce_irq", which depends on
"idle_wakeup_counter", so remove usage of "reduce_irq" also.

Commit feb6cd6a0f ("thermal/intel_powerclamp: stop sched tick in
forced idle") replaced the local use of "mwait_idle_with_hints" with
play_idle(). This removed possibility of updating "idle_wakeup_counter"
without change in play_idle(). This change was made in Linux 4.10.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-21 20:31:05 +02:00
Srinivas Pandruvada
68b99e94a4 thermal: intel_powerclamp: Use get_cpu() instead of smp_processor_id() to avoid crash
When CPU 0 is offline and intel_powerclamp is used to inject
idle, it generates kernel BUG:

BUG: using smp_processor_id() in preemptible [00000000] code: bash/15687
caller is debug_smp_processor_id+0x17/0x20
CPU: 4 PID: 15687 Comm: bash Not tainted 5.19.0-rc7+ #57
Call Trace:
<TASK>
dump_stack_lvl+0x49/0x63
dump_stack+0x10/0x16
check_preemption_disabled+0xdd/0xe0
debug_smp_processor_id+0x17/0x20
powerclamp_set_cur_state+0x7f/0xf9 [intel_powerclamp]
...
...

Here CPU 0 is the control CPU by default and changed to the current CPU,
if CPU 0 offlined. This check has to be performed under cpus_read_lock(),
hence the above warning.

Use get_cpu() instead of smp_processor_id() to avoid this BUG.

Suggested-by: Chen Yu <yu.c.chen@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-21 20:27:06 +02:00
Rafael J. Wysocki
e9a7c526c2 thermal: int340x_thermal: Consolidate priv->data_vault checks
It is sufficient to check priv->data_vault once in the error code path
of int3400_thermal_probe(), so do that.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-09-03 20:23:07 +02:00
Lee, Chun-Yi
7931e28098 thermal/int340x_thermal: handle data_vault when the value is ZERO_SIZE_PTR
In some case, the GDDV returns a package with a buffer which has
zero length. It causes that kmemdup() returns ZERO_SIZE_PTR (0x10).

Then the data_vault_read() got NULL point dereference problem when
accessing the 0x10 value in data_vault.

[   71.024560] BUG: kernel NULL pointer dereference, address:
0000000000000010

This patch uses ZERO_OR_NULL_PTR() for checking ZERO_SIZE_PTR or
NULL value in data_vault.

Signed-off-by: "Lee, Chun-Yi" <jlee@suse.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-08-23 19:50:43 +02:00
Sumeet Pawnikar
312c1a44da thermal: intel: Add TCC cooling support for Alder Lake-N and Raptor Lake-P
Add Alder Lake-N and Raptor Lake-P to the list of processor models
supported by the Intel TCC cooling driver.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-08-03 19:11:38 +02:00
Rafael J. Wysocki
b3ca7aff3c intel: thermal: PCH: Drop ACPI_FADT_LOW_POWER_S0 check
If ACPI_FADT_LOW_POWER_S0 is not set, this doesn't mean that low-power
S0 idle is not usable.  It merely means that using S3 on the given
system is more beneficial from the energy saving perspective than using
low-power S0 idle, as long as S3 is supported.

Suspend-to-idle is still a valid suspend mode if ACPI_FADT_LOW_POWER_S0
is not set and the pm_suspend_via_firmware() check in pch_wpt_suspend()
is sufficient to distinguish suspend-to-idle from S3, so drop the
confusing ACPI_FADT_LOW_POWER_S0 check.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
2022-07-22 21:32:47 +02:00
Jiang Jian
06d9fb48a8 thermal: intel: x86_pkg_temp_thermal: Drop duplicate 'is' from comment
There is an unexpected word 'is' in a comments that need to be dropped

file: ./drivers/thermal/intel/x86_pkg_temp_thermal.c
line: 108

* tj-max is is interesting because threshold is set relative to this

changed to:

* tj-max is interesting because threshold is set relative to this

Signed-off-by: Jiang Jian <jiangjian@cdjrlc.com>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-07-12 20:23:16 +02:00
Linus Torvalds
1ce8c443e9 Thermal control update for 5.19-rc5
Add a new CPU ID to the list of supported processors in the
 intel_tcc_cooling driver (Sumeet Pawnikar).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmK/TzESHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRxwPcQALCM2DfwhD+6lvO2bc0e8gAVYRPwekfa
 dZYK6e+b2HBnekzGOpOCSF58qB8jd4hxk2ODJ7RXUSkvtNfLXVDTotZ74Ggr0WN3
 JSoSW4OTpds844HiMzaal1WOxc6DEWydCPliSryqyPYLjyxMJjFd/sfsSJRSANs1
 q7ySDj+qdSNzSrsFkJ5Ia+W687lBFG3AJBOQ65Fuh8sfJDVsPOcyaWKYcdXUYZVn
 bPNm7ll9EWYUYE8lAgWHTgH/keIYTMnyJbfcmqLXfp0jiYAfyKm4559Rf0YiCSX9
 Hix5cvhlLBK7exHE7IUBY8k5pC1UxhIWUBlWf96vxGz3XY4hMgXaWUp6w/MOFjqa
 ht3aobhvtPv6a3rzCpH3pZTLHuoILwR9pH7kq5aE562K687m2T9vsP+Wvn/5Oj7d
 OmThKDuqeWq6FuVsTvBvBFHRtVZ+olUPx0R9aKhFtFDH1bxjwRXX0uEHQu12M8Ww
 QwgRmkSSJeCd+iBtSjMXorCS5vjztCQDPZgvD8xH3sg0G5f9Ha/QHJbbpDUv2TaC
 P6nghyqeFla8EA21Xg973UxJaajfnal2/FIDmBvoyEhLoQ5M5hs06QjIaY3HA9Js
 zNI/4qjLfTR7HZoCy4CDttSA0pgdCHRfNY673QfN6F7sO5kZn9FGJPp7F/qsruO6
 3f5kh/HcmQpy
 =F4yU
 -----END PGP SIGNATURE-----

Merge tag 'thermal-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull thermal control fix from Rafael Wysocki:
 "Add a new CPU ID to the list of supported processors in the
  intel_tcc_cooling driver (Sumeet Pawnikar)"

* tag 'thermal-5.19-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: intel_tcc_cooling: Add TCC cooling support for RaptorLake
2022-07-01 13:00:47 -07:00
Sumeet Pawnikar
62f46fc7b8 thermal: intel_tcc_cooling: Add TCC cooling support for RaptorLake
Add RaptorLake to the list of processor models supported by the Intel
TCC cooling driver.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
[ rjw: Subject edits, new changelog ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-06-30 19:48:44 +02:00
Linus Torvalds
32665a9e54 Additional thermal control update for 5.19-rc1
Add Meteor Lake PCI device ID to the int340x thermal control
 driver (Sumeet Pawnikar).
 -----BEGIN PGP SIGNATURE-----
 
 iQJGBAABCAAwFiEE4fcc61cGeeHD/fCwgsRv/nhiVHEFAmKU8b0SHHJqd0Byand5
 c29ja2kubmV0AAoJEILEb/54YlRx5iYP/2XlliK93GruPtohHKfvBXA7GWBunDqe
 2OWWGXNKOojZxkpVr+ek96MBjjmnqxMt9cXhqCQMyEYJgp1EePdDc3ixt/p8WDC5
 oxSYXhNXe5dXz/GHrwUTp5xQdkOvJNs1PqPQmPCdsUWhNMiGJ1wN0v6HRb4qoTce
 /4/zCj4LHk71fYh+A4zma7dY1SnE5RG7JlfLe1TIPHMEjx6QOeoywroqjifn6jZ2
 9AjSl3crAz6tP72ng/QL3bG6j8p6CKbT6xmBD47SrHOVbwkDe9ZdTD7m1H4kyU2c
 sFwGUix5HJK+LR4zWmeYG8kXzbNScaDIsyyxdRCyB+kOl8IrvCEY++gBW+3ALHWp
 HLqMN3lzEOi9VO0hYmmbWMbtwndjXqtLKauU9e5WMjW3dKHkslH1seAlY8H3vmal
 czu+jZZMlu3XTMAPo9JD4ycBw/pNdk1eLi0KSerSsuHWOz0vK/cdPL4ew4undolz
 Y2AwkGmTO7dyFmhG5jqIlYpeUD7QgujARmdxGhDTWlx6eZ7YiU7bFM2jom+dN+fU
 HvRPSleGVs1vpCwyqQsXqyf5I8t7AEh/inHok3YNetGe7Su0RgJvPNgbns3qurHy
 MGv8WKHAKHW0VTRZfBrPr8ko/b6DzhlBQEyvAMqQN6vjvznKW8ckZpmVDv+cnN/V
 quX/Qo+UHuD7
 =lIzD
 -----END PGP SIGNATURE-----

Merge tag 'thermal-5.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm

Pull additional thermal control update from Rafael Wysocki:
 "Add Meteor Lake PCI device ID to the int340x thermal control driver
  (Sumeet Pawnikar)"

* tag 'thermal-5.19-rc1-2' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm:
  thermal: int340x: Add Meteor Lake PCI device ID
2022-05-30 11:34:13 -07:00
Sumeet Pawnikar
3c1d004bdb thermal: int340x: Add Meteor Lake PCI device ID
Add Meteor Lake PCI ID for processor thermal device.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-25 15:44:58 +02:00
Sumeet Pawnikar
657b95d34b ACPI: DPTF: Support Meteor Lake
Add Meteor Lake ACPI IDs for DPTF devices.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-25 15:37:07 +02:00
Rafael J. Wysocki
bbb544f334 Merge branches 'thermal-int340x', 'thermal-pch' and 'thermal-misc'
Merge int340x thermal driver updates, PCH thermal driver updates and
miscellaneous thermal control updates for 5.19-rc1:

 - Clean up _OSC handling in int340x (Davidlohr Bueso).

 - Improve overheat condition handling during suspend-to-idle in the
   Intel PCH thermal driver (Zhang Rui).

 - Use local ops instead of global ops in devfreq_cooling (Kant Fan).

 - Switch hisi_termal from CONFIG_PM_SLEEP guards to pm_sleep_ptr()
   (Hesham Almatary)

* thermal-int340x:
  thermal: int340x: Clean up _OSC context init
  thermal: int340x: Consolidate freeing of acpi_buffer pointer
  thermal: int340x: Clean up unnecessary acpi_buffer pointer freeing

* thermal-pch:
  thermal: intel: pch: improve the cooling delay log
  thermal: intel: pch: enhance overheat handling
  thermal: intel: pch: move cooling delay to suspend_noirq phase
  PM: wakeup: expose pm_wakeup_pending to modules

* thermal-misc:
  thermal: devfreq_cooling: use local ops instead of global ops
  thermal: hisi_termal: Switch from CONFIG_PM_SLEEP guards to pm_sleep_ptr()
2022-05-23 20:32:58 +02:00
Rafael J. Wysocki
388292df27 Merge back earlier thermal control updates for 5.19-rc1. 2022-05-23 20:31:57 +02:00
Zhang Rui
bd30d075ee thermal: intel: pch: improve the cooling delay log
Previously, during suspend, intel_pch_thermal driver logs for every
cooling iteration, about the current PCH temperature and number of cooling
iterations that have been tried, like below

[  100.955526] intel_pch_thermal 0000:00:14.2: CPU-PCH current temp [53C] higher than the threshold temp [50C], sleep 1 times for 100 ms duration
[  101.064156] intel_pch_thermal 0000:00:14.2: CPU-PCH current temp [53C] higher than the threshold temp [50C], sleep 2 times for 100 ms duration

After changing the default delay_cnt to 600, in practice, it is common to
see tens of the above messages if the system is suspended when PCH
overheats. Thus, change this log message from dev_warn to dev_dbg because
it is only useful when we want to check the temperature trend.

At the same time, there is always a one-line message given by the driver
with the patch applied, with below four possibilities.

1. PCH is cool, no cooling delay needed
[ 1791.902853] intel_pch_thermal 0000:00:12.0: CPU-PCH is cool [48C]

2. PCH overheats and becomes cool after the cooling delays
[ 1475.511617] intel_pch_thermal 0000:00:12.0: CPU-PCH is cool [49C] after 30700 ms delay

3. PCH still overheats after the overall cooling timeout
[ 2250.157487] intel_pch_thermal 0000:00:12.0: CPU-PCH is hot [60C] after 60000 ms delay. S0ix might fail

4. PCH aborts cooling because of wakeup event detected during the delay
[ 1933.639509] intel_pch_thermal 0000:00:12.0: Wakeup event detected, abort cooling

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-19 19:40:25 +02:00
Zhang Rui
92923028e9 thermal: intel: pch: enhance overheat handling
Commit ef63b043ac ("thermal: intel: pch: fix S0ix failure due to PCH
temperature above threshold") introduces delay loop mechanism that allows
PCH temperature to go down below threshold during suspend so it won't
block S0ix. And the default overall delay timeout is 1 second.

However, in practice, we found that the time it takes to cool the PCH down
below threshold highly depends on the initial PCH temperature when the
delay starts, as well as the ambient temperature.
And in some cases, the 1 second delay is not sufficient. As a result, the
system stays in a shallower power state like PCx instead of S0ix, and
drains the battery power, without user' notice.

To make sure S0ix is not blocked by the PCH overheating, we
1. expand the default overall timeout to 60 seconds.
2. make sure the temperature is below threshold rather than equal to it.

At the same time, as the cooling delay can be much longer and many wakeup
events (ACPI Power Button press, USB mouse move, etc) becomes valid in the
suspend_noirq phase, add detection of wakeup event so that the driver
does not delay blindly when the system suspend is likely to abort soon.

This patch may introduce longer suspend time, but only in the cases when
the system overheats and Linux used to enter a shallower S2idle state,
say, PCx instead of S0ix.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-19 19:40:25 +02:00
Zhang Rui
28708e1937 thermal: intel: pch: move cooling delay to suspend_noirq phase
Move the PCH Thermal driver suspend callback to suspend_noirq to do
cooling while the system is more quiescent.

Signed-off-by: Zhang Rui <rui.zhang@intel.com>
Tested-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-19 19:40:25 +02:00
Haowen Bai
5a66bfb277 thermal: intel: hfi: remove NULL check after container_of() call
container_of() will never return NULL, so remove useless code.

Signed-off-by: Haowen Bai <baihaowen@meizu.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-18 20:53:05 +02:00
Rafael J. Wysocki
7acc8a2ac0 Merge back earlier int340x driver changes for 5.19. 2022-05-18 13:11:19 +02:00
Srinivas Pandruvada
7b145802ba thermal: int340x: Mode setting with new OS handshake
With the new OS handshake introduced by commit: "c7ff29763989 ("thermal:
int340x: Update OS policy capability handshake")", the "enabled" thermal
zone mode doesn't work in the same way as previously.

The "enabled" mode fails with -EINVAL when the new handshake is used.

To address this issue, when the new OS UUID mask is set:

 - When the mode is "enabled", return 0 as the firmware already has the
   latest policy mask.

 - When the mode is "disabled", update the firmware with the UUID mask
   of zero.

This way, the firmware can take over the thermal control.

Also reset the OS UUID mask, which allows user space to update with new
set of policies.

Fixes: c7ff297639 ("thermal: int340x: Update OS policy capability handshake")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog edits, removed unneeded parens ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-05-11 20:08:15 +02:00
Rafael J. Wysocki
be60348a82 Merge back earlier int340x thermal driver changes for 5.19. 2022-05-05 14:25:13 +02:00
Kees Cook
d0f6cfb2bd thermal: int340x: Fix attr.show callback prototype
Control Flow Integrity (CFI) instrumentation of the kernel noticed that
the caller, dev_attr_show(), and the callback, odvp_show(), did not have
matching function prototypes, which would cause a CFI exception to be
raised. Correct the prototype by using struct device_attribute instead
of struct kobj_attribute.

Reported-and-tested-by: Joao Moreira <joao@overdrivepizza.com>
Link: https://lore.kernel.org/lkml/067ce8bd4c3968054509831fa2347f4f@overdrivepizza.com/
Fixes: 006f006f1e ("thermal/int340x_thermal: Export OEM vendor variables")
Cc: 5.8+ <stable@vger.kernel.org> # 5.8+
Signed-off-by: Kees Cook <keescook@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-21 20:13:47 +02:00
Davidlohr Bueso
ad47f8343a thermal: int340x: Clean up _OSC context init
Now that the UUID is already sanitized by the caller,
lets trivially clean up some of the context arming.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-05 20:25:21 +02:00
Davidlohr Bueso
9e5d3d6be6 thermal: int340x: Consolidate freeing of acpi_buffer pointer
Introduce a single point of freeing/exit after ensuring no error in
int3400_setup_gddv().

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-05 20:25:21 +02:00
Davidlohr Bueso
bdff938d04 thermal: int340x: Clean up unnecessary acpi_buffer pointer freeing
It is the caller's responsibility to free only upon ACPI_SUCCESS.

Signed-off-by: Davidlohr Bueso <dave@stgolabs.net>
Acked-by: Zhang Rui <rui.zhang@intel.com>
[ rjw: Subject edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-04-05 20:25:21 +02:00
Rafael J. Wysocki
31035f3e20 Merge branch 'thermal-hfi'
Merge Intel Hardware Feedback Interface (HFI) thermal driver for
5.18-rc1 and update the intel-speed-select utility to support that
driver.

* thermal-hfi:
  tools/power/x86/intel-speed-select: v1.12 release
  tools/power/x86/intel-speed-select: HFI support
  tools/power/x86/intel-speed-select: OOB daemon mode
  thermal: intel: hfi: INTEL_HFI_THERMAL depends on NET
  thermal: netlink: Fix parameter type of thermal_genl_cpu_capability_event() stub
  thermal: intel: hfi: Notify user space for HFI events
  thermal: netlink: Add a new event to notify CPU capabilities change
  thermal: intel: hfi: Enable notification interrupt
  thermal: intel: hfi: Handle CPU hotplug events
  thermal: intel: hfi: Minimally initialize the Hardware Feedback Interface
  x86/cpu: Add definitions for the Intel Hardware Feedback Interface
  x86/Documentation: Describe the Intel Hardware Feedback Interface
2022-03-18 19:00:26 +01:00
Rafael J. Wysocki
2d6fc1455f Merge branches 'thermal-powerclamp', 'thermal-int340x' and 'thermal-docs'
Merge powerclamp thermal driver changes, int340x thermal driver changes
and thermal documentation changes for 5.18-rc1:

 - Don't use bitmap_weight() in end_power_clamp() in the powerclamp
   driver (Yury Norov).

 - Update the OS policy capabilities handshake in the int340x thermal
   driver (Srinivas Pandruvada).

 - Increase the policies bitmap size in int340x (Srinivas Pandruvada).

 - Replace acpi_bus_get_device() with acpi_fetch_acpi_dev() in the
   int340x thermal driver (Rafael Wysocki).

 - Check for NULL after calling kmemdup() in int340x (Jiasheng Jiang).

 - Add Intel Dynamic Power and Thermal Framework (DPTF) kernel interface
   documentation (Srinivas Pandruvada).

 - Fix bullet list warning in the thermal documentation (Randy Dunlap).

* thermal-powerclamp:
  thermal: intel_powerclamp: don't use bitmap_weight() in end_power_clamp()

* thermal-int340x:
  thermal: int340x: Update OS policy capability handshake
  thermal: int340x: Increase bitmap size
  thermal: Replace acpi_bus_get_device()
  thermal: int340x: Check for NULL after calling kmemdup()

* thermal-docs:
  Documentation: thermal: DPTF Documentation
  thermal: fix Documentation bullet list warning
2022-03-18 18:53:02 +01:00
Srinivas Pandruvada
c7ff297639 thermal: int340x: Update OS policy capability handshake
Update the firmware with OS supported policies mask, so	that firmware can
relinquish its internal controls. Without this update several Tiger Lake
laptops gets performance limited with in few seconds of executing in
turbo region.

The existing way of enumerating firmware policies via IDSP method and
selecting policy by directly writing those policy UUIDS via _OSC method
is not supported in newer generation of hardware.

There is a new UUID "B23BA85D-C8B7-3542-88DE-8DE2FFCFD698" is defined for
updating policy capabilities. As part of ACPI _OSC method:

Arg0 - UUID: B23BA85D-C8B7-3542-88DE-8DE2FFCFD698
Arg1 - Rev ID: 1
Arg2 - Count: 2
Arg3 - Capability buffers: Array of Arg2 DWORDS

DWORD1: As defined in the ACPI 5.0 Specification
- Bit 0: Query Flag
- Bits 1-3: Always 0
- Bits 4-31: Reserved

DWORD2 and beyond:
- Bit0: set to 1 to indicate Intel(R) Dynamic Tuning is active, 0 to
indicate it is disabled and legacy thermal mechanism should
be enabled.
- Bit1: set to 1 to indicate Intel(R) Dynamic Tuning is controlling
active cooling, 0 to indicate bios shall enable legacy thermal
zone with active trip point.
- Bit2: set to 1 to indicate Intel(R) Dynamic Tuning is controlling
passive cooling, 0 to indicate bios shall enable legacy thermal
zone with passive trip point.
- Bit3: set to 1 to indicate Intel(R) Dynamic Tuning is handling
critical trip point, 0 to indicate bios shall enable legacy
thermal zone with critical trip point.
- Bits 4:31: Reserved

From sysfs interface, there is an existing interface to update policy
UUID using attribute "current_uuid". User space can write the same UUID
for ACTIVE, PASSIVE and CRITICAL policy. Driver converts these UUIDs to
DWORD2 Bit 1 to Bit 3. When any of the policy is activated by user
space it is assumed that dynamic tuning is active.

For example
$cd /sys/bus/platform/devices/INTC1040:00/uuids
To support active policy
$echo "3A95C389-E4B8-4629-A526-C52C88626BAE" > current_uuid
To support passive policy
$echo "42A441D6-AE6A-462b-A84B-4A8CE79027D3" > current_uuid
To support critical policy
$echo "97C68AE7-15FA-499c-B8C9-5DA81D606E0A" > current_uuid

To check all the supported policies
$cat current_uuid
3A95C389-E4B8-4629-A526-C52C88626BAE
42A441D6-AE6A-462b-A84B-4A8CE79027D3
97C68AE7-15FA-499c-B8C9-5DA81D606E0A

To match the bit format for DWORD2, rearranged enum int3400_thermal_uuid
and int3400_thermal_uuids[] by swapping current INT3400_THERMAL_ACTIVE
and INT3400_THERMAL_PASSIVE_1.

If the policies are enumerated via IDSP method then legacy method is
used, if not the new method is used to update policy support.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-16 19:40:41 +01:00
Srinivas Pandruvada
668f69a5f8 thermal: int340x: Increase bitmap size
The number of policies are 10, so can't be supported by the bitmap size
of u8.

Even though there are no platfoms with these many policies, but
for correctness increase to u32.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Fixes: 16fc8eca19 ("thermal/int340x_thermal: Add additional UUIDs")
Cc: 5.1+ <stable@vger.kernel.org> # 5.1+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-03-16 19:36:10 +01:00
Rafael J. Wysocki
ec52cd3fa1 Merge back int340x thermal driver changes for v5.18. 2022-02-28 20:46:53 +01:00
Chuansheng Liu
3abea10e6a thermal: int340x: fix memory leak in int3400_notify()
It is easy to hit the below memory leaks in my TigerLake platform:

unreferenced object 0xffff927c8b91dbc0 (size 32):
  comm "kworker/0:2", pid 112, jiffies 4294893323 (age 83.604s)
  hex dump (first 32 bytes):
    4e 41 4d 45 3d 49 4e 54 33 34 30 30 20 54 68 65  NAME=INT3400 The
    72 6d 61 6c 00 6b 6b 6b 6b 6b 6b 6b 6b 6b 6b a5  rmal.kkkkkkkkkk.
  backtrace:
    [<ffffffff9c502c3e>] __kmalloc_track_caller+0x2fe/0x4a0
    [<ffffffff9c7b7c15>] kvasprintf+0x65/0xd0
    [<ffffffff9c7b7d6e>] kasprintf+0x4e/0x70
    [<ffffffffc04cb662>] int3400_notify+0x82/0x120 [int3400_thermal]
    [<ffffffff9c8b7358>] acpi_ev_notify_dispatch+0x54/0x71
    [<ffffffff9c88f1a7>] acpi_os_execute_deferred+0x17/0x30
    [<ffffffff9c2c2c0a>] process_one_work+0x21a/0x3f0
    [<ffffffff9c2c2e2a>] worker_thread+0x4a/0x3b0
    [<ffffffff9c2cb4dd>] kthread+0xfd/0x130
    [<ffffffff9c201c1f>] ret_from_fork+0x1f/0x30

Fix it by calling kfree() accordingly.

Fixes: 38e44da591 ("thermal: int3400_thermal: process "thermal table changed" event")
Signed-off-by: Chuansheng Liu <chuansheng.liu@intel.com>
Cc: 4.14+ <stable@vger.kernel.org> # 4.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-24 20:14:19 +01:00
Randy Dunlap
c95aa2bab9 thermal: intel: hfi: INTEL_HFI_THERMAL depends on NET
THERMAL_NETLINK depends on NET and since 'select' does not follow
any dependency chain, INTEL_HFI_THERMAL also should depend on NET.

Fix one Kconfig warning and 48 subsequent build errors:

WARNING: unmet direct dependencies detected for THERMAL_NETLINK
  Depends on [n]: THERMAL [=y] && NET [=n]
  Selected by [y]:
  - INTEL_HFI_THERMAL [=y] && THERMAL [=y] && (X86 [=y] || X86_INTEL_QUARK [=n] || COMPILE_TEST [=y]) && CPU_SUP_INTEL [=y] && X86_THERMAL_VECTOR [=y]

Fixes: bd30cdfd9b ("thermal: intel: hfi: Notify user space for HFI events")
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-10 20:58:24 +01:00
Rafael J. Wysocki
098c874e20 thermal: Replace acpi_bus_get_device()
Replace acpi_bus_get_device() that is going to be dropped with
acpi_fetch_acpi_dev().

No intentional functional impact.

Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-04 19:33:18 +01:00
Yury Norov
a11cda8e2f thermal: intel_powerclamp: don't use bitmap_weight() in end_power_clamp()
Don't call bitmap_weight() if the following code can get by
without it.

Signed-off-by: Yury Norov <yury.norov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-04 19:28:03 +01:00
Jiasheng Jiang
38b16d6cfe thermal: int340x: Check for NULL after calling kmemdup()
As the potential failure of the allocation, kmemdup() may return NULL.

Then, 'bin_attr_data_vault.private' will be NULL, but
'bin_attr_data_vault.size' is not 0, which is not consistent.

Therefore, it is better to check the return value of kmemdup() to
avoid the confusion.

Fixes: 0ba13c763a ("thermal/int340x_thermal: Export GDDV")
Signed-off-by: Jiasheng Jiang <jiasheng@iscas.ac.cn>
[ rjw: Subject and changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-04 18:55:43 +01:00
Srinivas Pandruvada
bd30cdfd9b thermal: intel: hfi: Notify user space for HFI events
When the hardware issues an HFI event, relay a notification to user space.
This allows user space to respond by reading performance and efficiency of
each CPU and take appropriate action.

For example, when the performance and efficiency of a CPU is 0, user space
can either offline the CPU or inject idle. Also, if user space notices a
downward trend in performance, it may proactively adjust power limits to
avoid future situations in which performance drops to 0.

To avoid excessive notifications, the rate is limited by one HZ per event.
To limit the netlink message size, send parameters for up to 16 CPUs in a
single message. If there are more than 16 CPUs, issue as many messages as
needed to notify the status of all CPUs.

In the HFI specification, both performance and efficiency capabilities are
defined in the [0, 255] range. The existing implementations of HFI hardware
do not scale the maximum values to 255. Since userspace cares about
capability values that are either 0 or show a downward/upward trend, this
fact does not matter much. Relative changes in capabilities are enough. To
comply with the thermal netlink ABI, scale both performance and efficiency
capabilities to the [0, 1023] interval.

Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03 19:50:49 +01:00
Ricardo Neri
ab09b0744a thermal: intel: hfi: Enable notification interrupt
When hardware wants to inform the operating system about updates in the HFI
table, it issues a package-level thermal event interrupt. For this,
hardware has new interrupt and status bits in the IA32_PACKAGE_THERM_
INTERRUPT and IA32_PACKAGE_THERM_STATUS registers. The existing thermal
throttle driver already handles thermal event interrupts: it initializes
the thermal vector of the local APIC as well as per-CPU and package-level
interrupt reporting. It also provides routines to service such interrupts.
Extend its functionality to also handle HFI interrupts.

The frequency of the thermal HFI interrupt is specific to each processor
model. On some processors, a single interrupt happens as soon as the HFI is
enabled and hardware will never update HFI capabilities afterwards. On
other processors, thermal and power constraints may cause thermal HFI
interrupts every tens of milliseconds.

To not overwhelm consumers of the HFI data, use delayed work to throttle
the rate at which HFI updates are processed. Use a dedicated workqueue to
not overload system_wq if hardware issues many HFI updates.

Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03 19:50:49 +01:00
Ricardo Neri
2d74e6319a thermal: intel: hfi: Handle CPU hotplug events
All CPUs in a package are represented in an HFI table. There exists an
HFI table per package. Thus, CPUs in a package need to coordinate to
initialize and access the table. Do such coordination during CPU hotplug.
Use the first CPU to come online in a package to initialize the HFI
instance and the data structure representing it. Other CPUs in the same
package need only to register or unregister themselves in that data
structure.

The HFI depends on both the package-level thermal management and the local
APIC thermal local vector. Thus, to ensure that a CPU coming online has an
associated HFI instance when the hardware issues an HFI event, enable the
HFI only after having enabled the local APIC thermal vector. The thermal
throttle driver takes care of the needed package-level initialization.

Reviewed-by: Len Brown <len.brown@intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03 19:50:49 +01:00
Ricardo Neri
1cb19cabeb thermal: intel: hfi: Minimally initialize the Hardware Feedback Interface
The Intel Hardware Feedback Interface provides guidance to the operating
system about the performance and energy efficiency capabilities of each
CPU in the system. Capabilities are numbers between 0 and 255 where a
higher number represents a higher capability. For each CPU, energy
efficiency and performance are reported as separate capabilities.

Hardware computes these capabilities based on the operating conditions of
the system such as power and thermal limits. These capabilities are shared
with the operating system in a table resident in memory. Each package in
the system has its own HFI instance. Every logical CPU in the package is
represented in the table. More than one logical CPUs may be represented in
a single table entry. When the hardware updates the table, it generates a
package-level thermal interrupt.

The size and format of the HFI table depend on the supported features and
can only be determined at runtime. To minimally initialize the HFI, parse
its features and allocate one instance per package of a data structure with
the necessary parameters to read and navigate a local copy (i.e., owned by
the driver) of individual HFI tables.

A subsequent changeset will provide per-CPU initialization and interrupt
handling.

Reviewed-by: Len Brown <len.brown@intel.com>
Co-developed by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Aubrey Li <aubrey.li@linux.intel.com>
Signed-off-by: Ricardo Neri <ricardo.neri-calderon@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-02-03 19:50:49 +01:00
Srinivas Pandruvada
e5b54867f4 thermal: int340x: Add Raptor Lake PCI device id
Add Raptor Lake PCI ID for processor thermal device.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-17 19:48:07 +01:00
Srinivas Pandruvada
a95be874d2 thermal: int340x: Support Raptor Lake
Add Raptor Lake ACPI IDs for DPTF devices.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-01-17 19:48:07 +01:00
Rafael J. Wysocki
fff489ff07 Merge branch 'thermal-int340x'
Merge int340x thermal driver update fixing RFIM mailbox write
commands handling for 5.17-rc1.

* thermal-int340x:
  thermal/drivers/int340x: Fix RFIM mailbox write commands
2022-01-10 18:08:30 +01:00
Sumeet Pawnikar
2685c77b80 thermal/drivers/int340x: Fix RFIM mailbox write commands
The existing mail mechanism only supports writing of workload types.

However, mailbox command for RFIM (cmd = 0x08) also requires write
operation which is ignored. This results in failing to store RFI
restriction.

Fixint this requires enhancing mailbox writes for non workload
commands too, so remove the check for MBOX_CMD_WORKLOAD_TYPE_WRITE
in mailbox write to allow this other write commands to be supoorted.

At the same time, however, we have to make sure that there is no
impact on read commands, by avoiding to write anything into the
mailbox data register.

To properly implement that, add two separate functions for mbox read
and write commands for the processor thermal workload command type.
This helps to distinguish the read and write workload command types
from each other while sending mbox commands.

Fixes: 5d6fbc96bd ("thermal/drivers/int340x: processor_thermal: Export additional attributes")
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Cc: 5.14+ <stable@vger.kernel.org> # 5.14+
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
[ rjw: Changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-30 16:42:53 +01:00
Rafael J. Wysocki
125521addc - Fix PM issue on the iMX driver when suspend/resume is happening by
implementing PM runtime support (Oleksij Rempel)
 
 - Add 'const' annotation to the thermal_cooling_ops in the Intel
   powerclamp driver (Rikard Falkeborn)
 
 - Add TSU driver and bindings for the RZ/G2L platform (Biju Das)
 
 - Fix the missing ADC bit set on iMX8MP to enable the sensor (Paul
   Gerber)
 
 - Fix missing check when calling reset_control_deassert() (Biju Das)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmHEmTsACgkQqDIjiipP
 6E/KMgf+MdzK/OTuxP+pf0yVqpC3WpH6lrNhBRC+mO4S0yAe8ZlYCFVVtvMftSrn
 M2H8uxw74XlVkD8yi1kSYa1iVeD9jF+5ZViOybc83SKwryYEFI7tjxsjpaPFotxe
 ZnL8y55hDC63YpwNeafGgW9M/JFehBP09zafakdVsvzzNGP8XKoYQcgyeofwNIrX
 CRwU95K85k0eS2h9gPFPRWaNwZS4ubVEYr857oiGArX4/hEycvJ2UO+cH+zEmGjQ
 QKJIEh5gQEBs2lKgcqUwzNbitm5lUG+rYRWD9omQ4QLtHK/fxZw2CMMVnNso/G5z
 lslc8pLLwLh5QdR3v9tX4+OAxsV38A==
 =uIVF
 -----END PGP SIGNATURE-----

Merge tag 'thermal-v5.17-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux

Pull thermal control material for 5.17-rc1 from Daniel Lezcano:

 - Fix PM issue on the iMX driver when suspend/resume is happening by
   implementing PM runtime support (Oleksij Rempel)

 - Add 'const' annotation to the thermal_cooling_ops in the Intel
   powerclamp driver (Rikard Falkeborn)

 - Add TSU driver and bindings for the RZ/G2L platform (Biju Das)

 - Fix missing ADC bit set on iMX8MP to enable the sensor (Paul Gerber)

 - Fix missing check when calling reset_control_deassert() (Biju Das)

* tag 'thermal-v5.17-rc1' of https://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/drivers/rz2gl: Add error check for reset_control_deassert()
  thermal/drivers/imx8mm: Enable ADC when enabling monitor
  thermal/drivers: Add TSU driver for RZ/G2L
  dt-bindings: thermal: Document Renesas RZ/G2L TSU
  thermal/drivers/intel_powerclamp: Constify static thermal_cooling_device_ops
  thermal/drivers/imx: Implement runtime PM support
2021-12-27 16:42:30 +01:00
Rafael J. Wysocki
9c33eef84e Merge back int340x driver material for 5.17. 2021-12-14 19:31:13 +01:00
Sumeet Pawnikar
f872f73601 thermal: int340x: Fix VCoRefLow MMIO bit offset for TGL
The VCoRefLow CPU FIVR register definition for Tiger Lake is incorrect.

Current implementation reads it from MMIO offset 0x5A18 and bit
offset [12:14], but the actual correct register definition is from
bit offset [11:13].

Update to fix the bit offset.

Fixes: 473be51142 ("thermal: int340x: processor_thermal: Add RFIM driver")
Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Cc: 5.14+ <stable@vger.kernel.org> # 5.14+
[ rjw: New subject, changelog edits ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-12-08 15:29:22 +01:00
Rikard Falkeborn
8152d2a9e7 thermal/drivers/intel_powerclamp: Constify static thermal_cooling_device_ops
The only usage of powerclamp_cooling_ops is to pass its address to
thermal_cooling_device_register(), which takes a pointer to const struct
thermal_cooling_device_ops. Make it const to allow the compiler to put
it in read-only memory.

Signed-off-by: Rikard Falkeborn <rikard.falkeborn@gmail.com>
Link: https://lore.kernel.org/r/20211128214641.30953-1-rikard.falkeborn@gmail.com
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2021-11-30 15:42:39 +01:00
Kees Cook
764cedc563 thermal: int340x: Use struct_group() for memcpy() region
In preparation for FORTIFY_SOURCE performing compile-time and
run-time field bounds checking for memcpy(), avoid intentionally
writing across neighboring fields.

Use struct_group() in struct art around members weight, and
ac[0-9]_max, so they can be referenced together. This will allow
memcpy() and sizeof() to more easily reason about sizes, improve
readability, and avoid future warnings about writing beyond the
end of weight.

"pahole" shows no size nor member offset changes to struct art.
"objdump -d" shows no meaningful object code changes (i.e. only
source line number induced differences).

Signed-off-by: Kees Cook <keescook@chromium.org>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-24 14:31:56 +01:00
Arnd Bergmann
994a04a20b thermal: int340x: Limit Kconfig to 64-bit
32-bit processors cannot generally access 64-bit MMIO registers
atomically, and it is unknown in which order the two halves of
this registers would need to be read:

drivers/thermal/intel/int340x_thermal/processor_thermal_mbox.c: In function 'send_mbox_cmd':
drivers/thermal/intel/int340x_thermal/processor_thermal_mbox.c:79:37: error: implicit declaration of function 'readq'; did you mean 'readl'? [-Werror=implicit-function-declaration]
   79 |                         *cmd_resp = readq((void __iomem *) (proc_priv->mmio_base + MBOX_OFFSET_DATA));
      |                                     ^~~~~
      |                                     readl

The driver already does not build for anything other than x86,
so limit it further to x86-64.

Fixes: aeb58c860d ("thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Reviewed-by: Randy Dunlap <rdunlap@infradead.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-16 20:16:35 +01:00
Linus Torvalds
d9c8e52ff9 thermal: int340x: fix build on 32-bit targets
Commit aeb58c860d ("thermal/drivers/int340x: processor_thermal: Suppot
64 bit RFIM responses") started using 'readq()' to read 64-bit status
responses from the int340x hardware.

That's all fine and good, but on 32-bit targets a 64-bit 'readq()' is
ambiguous, since it's no longer an atomic access.  Some hardware might
require 64-bit accesses, and other hardware might want low word first or
high word first.

It's quite likely that the driver isn't relevant in a 32-bit environment
any more, and there's a patch floating around to just make it depend on
X86_64, but let's make it buildable on x86-32 anyway.

The driver previously just read the low 32 bits, so the hardware
certainly is ok with 32-bit reads, and in a little-endian environment
the low word first model is the natural one.

So just add the include for the 'io-64-nonatomic-lo-hi.h' version.

Fixes: aeb58c860d ("thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses")
Reported-by: Jakub Kicinski <kuba@kernel.org>
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2021-11-12 10:56:25 -08:00
Srinivas Pandruvada
aeb58c860d thermal/drivers/int340x: processor_thermal: Suppot 64 bit RFIM responses
Some of the RFIM mail box command returns 64 bit values. So enhance
mailbox interface to return 64 bit values and use them for RFIM
commands.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Fixes: 5d6fbc96bd ("thermal/drivers/int340x: processor_thermal: Export additional attributes")
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-11-04 19:56:52 +01:00
Rafael J. Wysocki
46e9f92f31 Merge branches 'thermal-int340x', 'thermal-powerclamp' and 'thermal-docs'
Merge Intel thermal driver updates and a thermal documentation update
for v5.16.

* thermal-int340x:
  thermal: int340x: delete bogus length check

* thermal-powerclamp:
  thermal: intel_powerclamp: Use bitmap_zalloc/bitmap_free when applicable

* thermal-docs:
  thermal: Move ABI documentation to Documentation/ABI
2021-10-26 15:00:55 +02:00
Antoine Tenart
c4fcf1ada4 thermal/drivers/int340x: Improve the tcc offset saving for suspend/resume
When the driver resumes, the tcc offset is set back to its previous
value. But this only works if the value was user defined as otherwise
the offset isn't saved. This asymmetric logic is harder to maintain and
introduced some issues.

Improve the logic by saving the tcc offset in a suspend op, so the right
value is always restored after a resume.

Signed-off-by: Antoine Tenart <atenart@kernel.org>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pI andruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20210909085613.5577-3-atenart@kernel.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2021-10-21 11:46:24 +02:00
Dan Carpenter
52628a85dd thermal: int340x: delete bogus length check
This check has a signedness bug and does not work.  If "length" is
larger than "PAGE_SIZE" then "PAGE_SIZE - length" is not negative
but instead it is a large unsigned value.  Fortunately, Takashi Iwai
changed this code to use scnprint() instead of snprintf() so now
"length" is never larger than "PAGE_SIZE - 1" and the check can be
removed.

Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 16:46:27 +02:00
Christophe JAILLET
7fc775ffeb thermal: intel_powerclamp: Use bitmap_zalloc/bitmap_free when applicable
'cpu_clamping_mask' is a bitmap. So use 'bitmap_zalloc()' and
'bitmap_free()' to simplify code, improve the semantic of the code and
avoid some open-coded arithmetic in allocator arguments.

Signed-off-by: Christophe JAILLET <christophe.jaillet@wanadoo.fr>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-10-05 16:40:22 +02:00
Antoine Tenart
8b4bd25667 thermal/drivers/int340x: Do not set a wrong tcc offset on resume
After upgrading to Linux 5.13.3 I noticed my laptop would shutdown due
to overheat (when it should not). It turned out this was due to commit
fe6a6de669 ("thermal/drivers/int340x/processor_thermal: Fix tcc setting").

What happens is this drivers uses a global variable to keep track of the
tcc offset (tcc_offset_save) and uses it on resume. The issue is this
variable is initialized to 0, but is only set in
tcc_offset_degree_celsius_store, i.e. when the tcc offset is explicitly
set by userspace. If that does not happen, the resume path will set the
offset to 0 (in my case the h/w default being 3, the offset would become
too low after a suspend/resume cycle).

The issue did not arise before commit fe6a6de669, as the function
setting the offset would return if the offset was 0. This is no longer
the case (rightfully).

Fix this by not applying the offset if it wasn't saved before, reverting
back to the old logic. A better approach will come later, but this will
be easier to apply to stable kernels.

The logic to restore the offset after a resume was there long before
commit fe6a6de669, but as a value of 0 was considered invalid I'm
referencing the commit that made the issue possible in the Fixes tag
instead.

Fixes: fe6a6de669 ("thermal/drivers/int340x/processor_thermal: Fix tcc setting")
Cc: stable@vger.kernel.org
Cc: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Antoine Tenart <atenart@kernel.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Tested-by: Srinivas Pandruvada <srinivas.pI andruvada@linux.intel.com>
Link: https://lore.kernel.org/r/20210909085613.5577-2-atenart@kernel.org
2021-09-14 19:53:24 +02:00
Linus Torvalds
dd4703876e - Add the tegra3 thermal sensor and fix the compilation testing on
tegra by adding a dependency on ARCH_TEGRA along with COMPILE_TEST
   (Dmitry Osipenko)
 
 - Fix the error code for the exynos when devm_get_clk() fails (Dan
   Carpenter)
 
 - Add the TCC cooling support for AlderLake platform (Sumeet Pawnikar)
 
 - Add support for hardware trip points for the rcar gen3 thermal
   driver and store TSC id as unsigned int (Niklas Söderlund)
 
 - Replace the deprecated CPU-hotplug functions get_online_cpus() and
   put_online_cpus (Sebastian Andrzej Siewior)
 
 - Add the thermal tools directory in the MAINTAINERS file (Daniel
   Lezcano)
 
 - Fix the Makefile and the cross compilation flags for the userspace
   'tmon' tool (Rolf Eike Beer)
 
 - Allow to use the IMOK independently from the GDDV on Int340x (Sumeet
   Pawnikar)
 
 - Fix the stub thermal_cooling_device_register() function prototype
   which does not match the real function (Arnd Bergmann)
 
 - Make the thermal trip point optional in the DT bindings (Maxime
   Ripard)
 
 - Fix a typo in a comment in the core code (Geert Uytterhoeven)
 
 - Reduce the verbosity of the trace in the SoC thermal tegra driver
   (Dmitry Osipenko)
 
 - Add the support for the LMh (Limit Management hardware) driver on
   the QCom platforms (Thara Gopinath)
 
 - Allow processing of HWP interrupt by adding a weak function in the
   Intel driver (Srinivas Pandruvada)
 
 - Prevent an abort of the sensor probe is a channel is not used
   (Matthias Kaehlcke)
 -----BEGIN PGP SIGNATURE-----
 
 iQEzBAABCAAdFiEEGn3N4YVz0WNVyHskqDIjiipP6E8FAmE7yAcACgkQqDIjiipP
 6E8WKwgAmI+kdOURCz1CtCIv6FnQFx20suxZidfATPULBPZ8zcHqdjxJECXJpvDz
 y8T8dGAPRqXmMBJm2vF/qVraHaDJvscHXaflhsurhyEp0tiGptNeugBosso0jDEQ
 JFrQ6Rny/NCmNMOkWW4YlSfkik0zQ43xlHocy3UfOKfcshhYWGsPTbIXdA6wH/zh
 3tFW8j327tkyenrKZiaioJOSX88Oyy7Ft8l5k0HP+hPYkrvihWDJuEfEEmpfkdA7
 Q70Lacf2QtFP8mUfGKpQxURSgndjphDPU7tzP5NqqWvLuT94aTspvJx4ywVwK8Iv
 o+jNvR33yBIs5FXXB7B1ymkNE1h8uQ==
 =blIQ
 -----END PGP SIGNATURE-----

Merge tag 'thermal-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux

Pull thermal updates from Daniel Lezcano:

 - Add the tegra3 thermal sensor and fix the compilation testing on
   tegra by adding a dependency on ARCH_TEGRA along with COMPILE_TEST
   (Dmitry Osipenko)

 - Fix the error code for the exynos when devm_get_clk() fails (Dan
   Carpenter)

 - Add the TCC cooling support for AlderLake platform (Sumeet Pawnikar)

 - Add support for hardware trip points for the rcar gen3 thermal driver
   and store TSC id as unsigned int (Niklas Söderlund)

 - Replace the deprecated CPU-hotplug functions get_online_cpus() and
   put_online_cpus (Sebastian Andrzej Siewior)

 - Add the thermal tools directory in the MAINTAINERS file (Daniel
   Lezcano)

 - Fix the Makefile and the cross compilation flags for the userspace
   'tmon' tool (Rolf Eike Beer)

 - Allow to use the IMOK independently from the GDDV on Int340x (Sumeet
   Pawnikar)

 - Fix the stub thermal_cooling_device_register() function prototype
   which does not match the real function (Arnd Bergmann)

 - Make the thermal trip point optional in the DT bindings (Maxime
   Ripard)

 - Fix a typo in a comment in the core code (Geert Uytterhoeven)

 - Reduce the verbosity of the trace in the SoC thermal tegra driver
   (Dmitry Osipenko)

 - Add the support for the LMh (Limit Management hardware) driver on the
   QCom platforms (Thara Gopinath)

 - Allow processing of HWP interrupt by adding a weak function in the
   Intel driver (Srinivas Pandruvada)

 - Prevent an abort of the sensor probe is a channel is not used
   (Matthias Kaehlcke)

* tag 'thermal-v5.15-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/thermal/linux:
  thermal/drivers/qcom/spmi-adc-tm5: Don't abort probing if a sensor is not used
  thermal/drivers/intel: Allow processing of HWP interrupt
  dt-bindings: thermal: Add dt binding for QCOM LMh
  thermal/drivers/qcom: Add support for LMh driver
  firmware: qcom_scm: Introduce SCM calls to access LMh
  thermal/drivers/tegra-soctherm: Silence message about clamped temperature
  thermal: Spelling s/scallbacks/callbacks/
  dt-bindings: thermal: Make trips node optional
  thermal/core: Fix thermal_cooling_device_register() prototype
  thermal/drivers/int340x: Use IMOK independently
  tools/thermal/tmon: Add cross compiling support
  thermal/tools/tmon: Improve the Makefile
  MAINTAINERS: Add missing userspace thermal tools to the thermal section
  thermal/drivers/intel_powerclamp: Replace deprecated CPU-hotplug functions.
  thermal/drivers/rcar_gen3_thermal: Store TSC id as unsigned int
  thermal/drivers/rcar_gen3_thermal: Add support for hardware trip points
  drivers/thermal/intel: Add TCC cooling support for AlderLake platform
  thermal/drivers/exynos: Fix an error code in exynos_tmu_probe()
  thermal/drivers/tegra: Correct compile-testing of drivers
  thermal/drivers/tegra: Add driver for Tegra30 thermal sensor
2021-09-11 09:20:57 -07:00
Srinivas Pandruvada
5950fc44a5 thermal/drivers/intel: Allow processing of HWP interrupt
Add a weak function to process HWP (Hardware P-states) notifications and
move updating HWP_STATUS MSR to this function.

This allows HWP interrupts to be processed by the intel_pstate driver in
HWP mode by overriding the implementation.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210820024006.2347720-1-srinivas.pandruvada@linux.intel.com
2021-09-09 16:33:20 +02:00
Linus Torvalds
7ba88a2a09 platform-drivers-x86 for v5.15-1
Highlights:
  - Move all the Intel drivers into their own subdir(s) (mostly Kate's work)
  - New meraki-mx100 platform driver
  - Asus WMI driver enhancements, including
    /sys/firmware/acpi/platform_profile support
  - New BIOS SAR driver for Intel M.2 WWAM modems
  - Alder Lake support for the Intel PMC driver
  - A whole bunch of cleanups + fixes all over the place
 
 The following is an automated git shortlog grouped by driver:
 
 BIOS SAR driver for Intel M.2 Modem:
  - BIOS SAR driver for Intel M.2 Modem
 
 ISST:
  -  use semi-colons instead of commas
  -  Fix optimization with use of numa
 
 Replace deprecated CPU-hotplug functions.:
  - Replace deprecated CPU-hotplug functions.
 
 Update Mario Limonciello's email address in the docs:
  - Update Mario Limonciello's email address in the docs
 
 acer-wmi:
  -  Add Turbo Mode support for Acer PH315-53
 
 add meraki-mx100 platform driver:
  - add meraki-mx100 platform driver
 
 asus-nb-wmi:
  -  Add tablet_mode_sw=lid-flip quirk for the TP200s
  -  Allow configuring SW_TABLET_MODE method with a module option
 
 asus-wmi:
  -  Fix "unsigned 'retval' is never less than zero" smatch warning
  -  Delete impossible condition
  -  Add support for platform_profile
  -  Add egpu enable method
  -  Add dgpu disable method
  -  Add panel overdrive functionality
 
 dell-smbios:
  -  Remove unused dmi_system_id table
 
 dell-smbios-wmi:
  -  Add missing kfree in error-exit from run_smbios_call
  -  Avoid false-positive memcpy() warning
 
 dell-smo8800:
  -  Convert to be a platform driver
 
 dual_accel_detect:
  -  Use the new i2c_acpi_client_count() helper
 
 gigabyte-wmi:
  -  add support for B450M S2H V2
  -  add support for X570 GAMING X
 
 hp_accel:
  -  Convert to be a platform driver
  -  Remove _INI method call
 
 i2c:
  -  acpi: Add an i2c_acpi_client_count() helper function
 
 i2c-multi-instantiate:
  -  Use the new i2c_acpi_client_count() helper
 
 ideapad-laptop:
  -  Fix Legion 5 Fn lock LED
 
 intel-hid:
  -  Move to intel sub-directory
 
 intel-rst:
  -  Move to intel sub-directory
 
 intel-smartconnect:
  -  Move to intel sub-directory
 
 intel-uncore-frequency:
  -  Move to intel sub-directory
 
 intel-vbtn:
  -  Move to intel sub-directory
 
 intel-wmi-sbl-fw-update:
  -  Move to intel sub-directory
 
 intel-wmi-thunderbolt:
  -  Move to intel sub-directory
 
 intel_atomisp2:
  -  Move to intel sub-directory
 
 intel_bxtwc_tmu:
  -  Move to intel sub-directory
 
 intel_cht_int33fe:
  -  Use the new i2c_acpi_client_count() helper
 
 intel_chtdc_ti_pwrbtn:
  -  Move to intel sub-directory
 
 intel_int0002_vgpio:
  -  Move to intel sub-directory
 
 intel_mrfld_pwrbtn:
  -  Move to intel sub-directory
 
 intel_oaktrail:
  -  Move to intel sub-directory
 
 intel_pmc_core:
  -  Move to intel sub-directory
  -  Prevent possibile overflow
 
 intel_pmt_telemetry:
  -  Ignore zero sized entries
 
 intel_punit_ipc:
  -  Move to intel sub-directory
 
 intel_scu_ipc:
  -  Fix doc of intel_scu_ipc_dev_command_with_size()
 
 intel_speed_select_if:
  -  Move to intel sub-directory
 
 intel_telemetry:
  -  Move to intel sub-directory
 
 intel_turbo_max_3:
  -  Move to intel sub-directory
 
 lg-laptop:
  -  Use correct event for keyboard backlight FN-key
  -  Use correct event for touchpad toggle FN-key
  -  Support for battery charge limit on newer models
 
 platform/mellanox:
  -  mlxbf-pmc: fix kernel-doc notation
 
 platform/surface:
  -  aggregator: Use y instead of objs in Makefile
  -  surface3_power: Use i2c_acpi_get_i2c_resource() helper
 
 platform/x86/intel:
  -  pmc/core: Add GBE Package C10 fix for Alder Lake PCH
  -  pmc/core: Add Alder Lake low power mode support for pmc core
  -  pmc/core: Add Latency Tolerance Reporting (LTR) support to Alder Lake
  -  pmc/core: Add Alderlake support to pmc core driver
  -  int3472: Use y instead of objs in Makefile
  -  pmt: Use y instead of objs in Makefile
  -  int33fe: Use y instead of objs in Makefile
  -  Move Intel PMT drivers to new subfolder
 
 thermal/drivers/intel:
  -  Move intel_menlow to thermal drivers
 
 think-lmi:
  -  add debug_cmd
 -----BEGIN PGP SIGNATURE-----
 
 iQFIBAABCAAyFiEEuvA7XScYQRpenhd+kuxHeUQDJ9wFAmEw1KAUHGhkZWdvZWRl
 QHJlZGhhdC5jb20ACgkQkuxHeUQDJ9wk7Qf/dsaaDgx7aC6DfKzdcMgfeLIdTaGm
 a6svNXM2t/JFdvhjYzxA+4QlQgco7zkN06iRlWbObEonSUsHlGlwEOHX60VgcopO
 qJaqnznmfdXUocFTnA+5acJXabNaw7xkKHS0K61UWgk+mm6aMuygpKxULnNTa4X+
 p3HoU6uXFckpoA/Jstzo5UfegNYhg11bflNd7XN4F3rMCbbNHAsWlf4oVr2YsEHa
 wECW+1e8wZl4BInUzoXQhilRoybJWXWJ8sLsvQfDXLs9aNoLdDqu9p0MuXEW5QqE
 wNt26SNNAP2L49BD6kaJszV5Ry/jNSEtVkWwkrzHbGTZoOEyMYas8pm6Uw==
 =iK1W
 -----END PGP SIGNATURE-----

Merge tag 'platform-drivers-x86-v5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86

Pull x86 platform driver updates from Hans de Goede:
 "Highlights:

   - Move all the Intel drivers into their own subdir(s) (mostly Kate's
     work)

   - New meraki-mx100 platform driver

   - Asus WMI driver enhancements, including support for
     /sys/firmware/acpi/platform_profile

   - New BIOS SAR driver for Intel M.2 WWAM modems

   - Alder Lake support for the Intel PMC driver

   - A whole bunch of cleanups + fixes all over the place"

* tag 'platform-drivers-x86-v5.15-1' of git://git.kernel.org/pub/scm/linux/kernel/git/pdx86/platform-drivers-x86: (65 commits)
  platform/x86: dell-smbios-wmi: Add missing kfree in error-exit from run_smbios_call
  platform/x86: dell-smbios-wmi: Avoid false-positive memcpy() warning
  platform/x86: ISST: use semi-colons instead of commas
  platform/x86: asus-wmi: Fix "unsigned 'retval' is never less than zero" smatch warning
  platform/x86: asus-wmi: Delete impossible condition
  platform/x86: hp_accel: Convert to be a platform driver
  platform/x86: hp_accel: Remove _INI method call
  platform/mellanox: mlxbf-pmc: fix kernel-doc notation
  platform/x86/intel: pmc/core: Add GBE Package C10 fix for Alder Lake PCH
  platform/x86/intel: pmc/core: Add Alder Lake low power mode support for pmc core
  platform/x86/intel: pmc/core: Add Latency Tolerance Reporting (LTR) support to Alder Lake
  platform/x86/intel: pmc/core: Add Alderlake support to pmc core driver
  platform/x86: intel-wmi-thunderbolt: Move to intel sub-directory
  platform/x86: intel-wmi-sbl-fw-update: Move to intel sub-directory
  platform/x86: intel-vbtn: Move to intel sub-directory
  platform/x86: intel_oaktrail: Move to intel sub-directory
  platform/x86: intel_int0002_vgpio: Move to intel sub-directory
  platform/x86: intel-hid: Move to intel sub-directory
  platform/x86: intel_atomisp2: Move to intel sub-directory
  platform/x86: intel_speed_select_if: Move to intel sub-directory
  ...
2021-09-02 13:49:39 -07:00
Srinivas Pandruvada
950809cd6c thermal: intel: Allow processing of HWP interrupt
Add a weak function to process HWP (Hardware P-states) notifications and
move updating HWP_STATUS MSR to this function.

This allows HWP interrupts to be processed by the intel_pstate driver in
HWP mode by overriding the implementation.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2021-08-25 20:09:37 +02:00
Srinivas Pandruvada
2010319b3c thermal/drivers/intel: Move intel_menlow to thermal drivers
Moved drivers/platform/x86/intel_menlow.c to drivers/thermal/intel.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20210816035356.1955982-1-srinivas.pandruvada@linux.intel.com
Signed-off-by: Hans de Goede <hdegoede@redhat.com>
2021-08-17 14:11:48 +02:00
Sumeet Pawnikar
f1b07a1469 thermal/drivers/int340x: Use IMOK independently
Some chrome platform requires IMOK method in coreboot. But these platforms
don't use GDDV data vault in coreboot. As per current code flow, to enable
and use IMOK only, we need to have GDDV support as well in coreboot. This
patch removes the dependency for IMOK from GDDV to enable and use IMOK
independently.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Acked-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210716163946.3142-1-sumeet.r.pawnikar@intel.com
2021-08-14 15:39:13 +02:00
Sebastian Andrzej Siewior
d31eb7c1a2 thermal/drivers/intel_powerclamp: Replace deprecated CPU-hotplug functions.
The functions get_online_cpus() and put_online_cpus() have been
deprecated during the CPU hotplug rework. They map directly to
cpus_read_lock() and cpus_read_unlock().

Replace deprecated CPU-hotplug functions with the official version.
The behavior remains unchanged.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: Daniel Lezcano <daniel.lezcano@linaro.org>
Cc: Amit Kucheria <amitk@kernel.org>
Cc: linux-pm@vger.kernel.org
Signed-off-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210803141621.780504-20-bigeasy@linutronix.de
2021-08-14 12:51:42 +02:00
Sumeet Pawnikar
a414a08aef drivers/thermal/intel: Add TCC cooling support for AlderLake platform
Add tcc cooling support for the AlderLake platform.

Signed-off-by: Sumeet Pawnikar <sumeet.r.pawnikar@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210809115635.10100-1-sumeet.r.pawnikar@intel.com
2021-08-14 12:41:36 +02:00
Srinivas Pandruvada
fe6a6de669 thermal/drivers/int340x/processor_thermal: Fix tcc setting
The following fixes are done for tcc sysfs interface:
- TCC is 6 bits only from bit 29-24
- TCC of 0 is valid
- When BIT(31) is set, this register is read only
- Check for invalid tcc value
- Error for negative values

Fixes: fdf4f2fb8e ("drivers: thermal: processor_thermal_device: Export sysfs interface for TCC offset")
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Cc: stable@vger.kernel.org
Acked-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210628215803.75038-1-srinivas.pandruvada@linux.intel.com
2021-07-04 18:28:04 +02:00
Srinivas Pandruvada
ad079d981d thermal/drivers/int340x/processor_thermal: Fix warning for return value
Fix smatch warnings:
drivers/thermal/intel/int340x_thermal/processor_thermal_device_pci.c:258 proc_thermal_pci_probe() warn: missing error code 'ret'

Use PTR_ERR to return failure of thermal_zone_device_register().

Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210628183232.62877-1-srinivas.pandruvada@linux.intel.com
2021-07-04 18:28:04 +02:00
Srinivas Pandruvada
acd65d5d1c thermal/drivers/int340x/processor_thermal: Add PCI MMIO based thermal driver
Add a new PCI driver which register a thermal zone and allows to get
notification for threshold violation by a RW trip point. These
notifications are delivered from the device using MSI based
interrupt.

The main difference between this new PCI driver and the existing
one is that the temperature and trip points directly use PCI
MMIO instead of using ACPI methods.

This driver registers a thermal zone "TCPU_PCI" in addition to the
legacy processor thermal device, which uses ACPI companion device
to set name, temperature and trips.

This driver is enabled for AlderLake.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210525204811.3793651-3-srinivas.pandruvada@linux.intel.com
2021-07-04 18:28:04 +02:00
Srinivas Pandruvada
8fe145f7ce thermal/drivers/int340x/processor_thermal: Split enumeration and processing part
Remove enumeration part from the processor_thermal_device to two
different modules. One for ACPI and one for PCI:
	ACPI enumeration: int3401_thermal
	PCI part: processor_thermal_device_pci_legacy

The current processor_thermal_device now just implements interface
functions to be used by the ACPI and PCI enumeration module. This is
done by:
1. Make functions proc_thermal_add() and proc_thermal_remove() non static
and export them for usage in other processor_thermal_device_pci_legacy.c
and in int3401_thermal.c.

2. Move the sysfs file creation for TCC offset and power limit attribute
group to the proc_thermal_add() from the individual enumeration callbacks
for PCI and ACPI.

3. Create new interface functions proc_thermal_mmio_add() and
proc_thermal_mmio_remove() which will be called from the
processor_thermal_device_pci_legacy module.

4. Export proc_thermal_resume(), so that it can be used by power
management callbacks.

5. Remove special check for double enumeration as it never happens.

While here, fix some cleanup on error conditions in proc_thermal_add().

No functional changes are expected with this change.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210525204811.3793651-2-srinivas.pandruvada@linux.intel.com
2021-07-04 18:28:04 +02:00
Andy Shevchenko
da5e562fbc thermal/drivers/intel/intel_soc_dts_iosf: Switch to use find_first_zero_bit()
Switch to use find_first_zero_bit() instead of open-coded variant.

Signed-off-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210618153451.89246-1-andriy.shevchenko@linux.intel.com
2021-07-04 18:28:04 +02:00
Srinivas Pandruvada
5d6fbc96bd thermal/drivers/int340x: processor_thermal: Export additional attributes
Export additional attributes:

ddr_data_rate (RO) : Show current DDR (Double Data Rate) data rate.
rfi_restriction (RW) : Show or set current state for RFI (Radio
			Frequency Interference) protection.

These attributes use mailbox commands to get/set information. Here
command codes are:
0x0007: Read RFI restriction
0x0107: Read DDR data rate
0x0008: Write RFI restriction

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210517061441.1921901-3-srinivas.pandruvada@linux.intel.com
2021-06-11 11:55:47 +02:00
Srinivas Pandruvada
fb5a6ec803 thermal/drivers/int340x: processor_thermal: Export mailbox interface
Export the mailbox interface to be used by other modules. Also change
command id and response from u8 to u32 data type.

Signed-off-by: Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20210517061441.1921901-2-srinivas.pandruvada@linux.intel.com
2021-06-11 11:54:42 +02:00
Linus Torvalds
773ac53bbf - Fix out-of-spec hardware (1st gen Hygon) which does not implement
MSR_AMD64_SEV even though the spec clearly states so, and check CPUID
 bits first.
 
 - Send only one signal to a task when it is a SEGV_PKUERR si_code type.
 
 - Do away with all the wankery of reserving X amount of memory in
 the first megabyte to prevent BIOS corrupting it and simply and
 unconditionally reserve the whole first megabyte.
 
 - Make alternatives NOP optimization work at an arbitrary position
 within the patched sequence because the compiler can put single-byte
 NOPs for alignment anywhere in the sequence (32-bit retpoline), vs our
 previous assumption that the NOPs are only appended.
 
 - Force-disable ENQCMD[S] instructions support and remove update_pasid()
 because of insufficient protection against FPU state modification in an
 interrupt context, among other xstate horrors which are being addressed
 at the moment. This one limits the fallout until proper enablement.
 
 - Use cpu_feature_enabled() in the idxd driver so that it can be
 build-time disabled through the defines in .../asm/disabled-features.h.
 
 - Fix LVT thermal setup for SMI delivery mode by making sure the APIC
 LVT value is read before APIC initialization so that softlockups during
 boot do not happen at least on one machine.
 
 - Mark all legacy interrupts as legacy vectors when the IO-APIC is
 disabled and when all legacy interrupts are routed through the PIC.
 -----BEGIN PGP SIGNATURE-----
 
 iQIzBAABCgAdFiEEzv7L6UO9uDPlPSfHEsHwGGHeVUoFAmC8fdEACgkQEsHwGGHe
 VUqO5A/+IbIo8myl8VPjw6HRnHgY8rsYRjxdtmVhbaMi5XOmTMfVA9zJ6QALxseo
 Mar8bmWcezEs0/FmNvk1vEOtIgZvRVy5RqXbu3W2EgWICuzRWbj822q+KrkbY0tH
 1GWjcZQO8VlgeuQsukyj5QHaBLffpn3Fh1XB8r0cktZvwciM+LRNMnK8d6QjqxNM
 ctTX4wdI6kc076pOi7MhKxSe+/xo5Wnf27lClLMOcsO/SS42KqgeRM5psWqxihhL
 j6Y3Oe+Nm+7GKF8y841PUSlwjgWmlZa6UkR6DBTP7DGnHDa5hMpzxYvHOquq/SbA
 leV9OLqI0iWs56kSzbEcXo7do1kld62KjsA2KtUhJfVAtm+igQLh5G0jESBwrWca
 TBWaE5kt6s8wP7LXeg26o4U8XD8vqEH88Tmsjlgqb/t/PKDV9PMGvNpF00dPZFo6
 Jhj2yntJYjLQYoAQLuQm5pfnKhZy3KKvk7ViGcnp3iN9i4eU9HzawIiXnliNOrTI
 ohQ9KoRhy1Cx0UfLkR+cdK4ks0u26DC2/Ewt0CE5AP/CQ1rX6Zbv2gFLjSpy7yQo
 6A99HEpbaLuy3kDt5vn91viPNUlOveuIXIdHp6u+zgFfx88eLUoEvfR135aV/Gyh
 p5PJm/BO99KByQzFCnilkp7nBeKtnKYSmUojA6JsZKjzJimSPYo=
 =zRI1
 -----END PGP SIGNATURE-----

Merge tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip

Pull x86 fixes from Borislav Petkov:
 "A bunch of x86/urgent stuff accumulated for the last two weeks so
  lemme unload it to you.

  It should be all totally risk-free, of course. :-)

   - Fix out-of-spec hardware (1st gen Hygon) which does not implement
     MSR_AMD64_SEV even though the spec clearly states so, and check
     CPUID bits first.

   - Send only one signal to a task when it is a SEGV_PKUERR si_code
     type.

   - Do away with all the wankery of reserving X amount of memory in the
     first megabyte to prevent BIOS corrupting it and simply and
     unconditionally reserve the whole first megabyte.

   - Make alternatives NOP optimization work at an arbitrary position
     within the patched sequence because the compiler can put
     single-byte NOPs for alignment anywhere in the sequence (32-bit
     retpoline), vs our previous assumption that the NOPs are only
     appended.

   - Force-disable ENQCMD[S] instructions support and remove
     update_pasid() because of insufficient protection against FPU state
     modification in an interrupt context, among other xstate horrors
     which are being addressed at the moment. This one limits the
     fallout until proper enablement.

   - Use cpu_feature_enabled() in the idxd driver so that it can be
     build-time disabled through the defines in disabled-features.h.

   - Fix LVT thermal setup for SMI delivery mode by making sure the APIC
     LVT value is read before APIC initialization so that softlockups
     during boot do not happen at least on one machine.

   - Mark all legacy interrupts as legacy vectors when the IO-APIC is
     disabled and when all legacy interrupts are routed through the PIC"

* tag 'x86_urgent_for_v5.13-rc5' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip:
  x86/sev: Check SME/SEV support in CPUID first
  x86/fault: Don't send SIGSEGV twice on SEGV_PKUERR
  x86/setup: Always reserve the first 1M of RAM
  x86/alternative: Optimize single-byte NOPs at an arbitrary position
  x86/cpufeatures: Force disable X86_FEATURE_ENQCMD and remove update_pasid()
  dmaengine: idxd: Use cpu_feature_enabled()
  x86/thermal: Fix LVT thermal setup for SMI delivery mode
  x86/apic: Mark _all_ legacy interrupts when IO/APIC is missing
2021-06-06 12:25:43 -07:00