Commit Graph

19 Commits

Author SHA1 Message Date
Daniel Lezcano
5b8de18ee9 thermal/core: Move the thermal trip code to a dedicated file
The thermal_core.c files contains a lot of functions handling
different thermal components like the governors, the trip points, the
cooling device, the OF cooling device, etc ...

This organization does not help to migrate to a more sane code where
there is a better self-encapsulation as all the components' internals
can be directly accessed from a single file.

For the sake of clarity, let's move the thermal trip points code in a
dedicated thermal_trip.c file and add a function to browse all the
trip points like we do with the thermal zones, the govenors and the
cooling devices.

The same can be done for the cooling devices and the governor code but
that will come later as the current work in the thermal framework is
to fix the trip point handling and use a generic trip point structure.

No functional changes intended.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2023-01-25 16:40:39 +01:00
Daniel Lezcano
7c3d5c20dc thermal/core: Add a generic thermal_zone_get_trip() function
The thermal_zone_device_ops structure defines a set of ops family,
get_trip_temp(), get_trip_hyst(), get_trip_type(). Each of them is
returning a property of a trip point.

The result is the code is calling the ops everywhere to get a trip
point which is supposed to be defined in the backend driver. It is a
non-sense as a thermal trip can be generic and used by the backend
driver to declare its trip points.

Part of the thermal framework has been changed and all the OF thermal
drivers are using the same definition for the trip point and use a
thermal zone registration variant to pass those trip points which are
part of the thermal zone device structure.

Consequently, we can use a generic function to get the trip points
when they are stored in the thermal zone device structure.

This approach can be generalized to all the drivers and we can get rid
of the ops->get_trip_*. That will result to a much more simpler code
and make possible to rework how the thermal trip are handled in the
thermal core framework as discussed previously.

This change adds a function thermal_zone_get_trip() where we get the
thermal trip point structure which contains all the properties (type,
temp, hyst) instead of doing multiple calls to ops->get_trip_*.

That opens the door for trip point extension with more attributes. For
instance, replacing the trip points disabled bitmask with a 'disabled'
field in the structure.

Here we replace all the calls to ops->get_trip_* in the thermal core
code with a call to the thermal_zone_get_trip() function.

The thermal zone ops defines a callback to retrieve the critical
temperature. As the trip handling is being reworked, all the trip
points will be the same whatever the driver and consequently finding
the critical trip temperature will be just a loop to search for a
critical trip point type.

Provide such a generic function, so we encapsulate the ops
get_crit_temp() which can be removed when all the backend drivers are
using the generic trip points handling.

While at it, add the thermal_zone_get_num_trips() to encapsulate the
code more and reduce the grip with the thermal framework internals.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reviewed-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20221003092602.1323944-2-daniel.lezcano@linaro.org
2023-01-06 14:14:47 +01:00
Guenter Roeck
91b3aafc22 thermal/core: Remove thermal_zone_set_trips()
Since no callers of thermal_zone_set_trips() are left, remove the function.
Document __thermal_zone_set_trips() instead. Explicitly state that the
thermal zone lock must be held when calling the function, and that the
pointer to the thermal zone must be valid.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-14 19:04:37 +01:00
Guenter Roeck
ed97d10a8b thermal/core: Move parameter validation from __thermal_zone_get_temp to thermal_zone_get_temp
All callers of __thermal_zone_get_temp() already validated the
thermal zone parameters. Move validation to thermal_zone_get_temp()
where it is actually needed. Also add kernel documentation for
__thermal_zone_get_temp(), listing the requirement that the
function must be called with validated parameters and with thermal
device mutex held.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-14 19:04:37 +01:00
Guenter Roeck
1c6b300607 thermal/core: Ensure that thermal device is registered in thermal_zone_get_temp
Calls to thermal_zone_get_temp() are not protected against thermal zone
device removal. As result, it is possible that the thermal zone operations
callbacks are no longer valid when thermal_zone_get_temp() is called.
This may result in crashes such as

BUG: unable to handle page fault for address: ffffffffc04ef420
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
PGD 5d60e067 P4D 5d60e067 PUD 5d610067 PMD 110197067 PTE 0
Oops: 0000 [#1] PREEMPT SMP NOPTI
CPU: 1 PID: 3209 Comm: cat Tainted: G        W         5.10.136-19389-g615abc6eb807 #1 02df41ac0b12f3a64f4b34245188d8875bb3bce1
Hardware name: Google Coral/Coral, BIOS Google_Coral.10068.92.0 11/27/2018
RIP: 0010:thermal_zone_get_temp+0x26/0x73
Code: 89 c3 eb d3 0f 1f 44 00 00 55 48 89 e5 41 57 41 56 53 48 85 ff 74 50 48 89 fb 48 81 ff 00 f0 ff ff 77 44 48 8b 83 98 03 00 00 <48> 83 78 10 00 74 36 49 89 f6 4c 8d bb d8 03 00 00 4c 89 ff e8 9f
RSP: 0018:ffffb3758138fd38 EFLAGS: 00010287
RAX: ffffffffc04ef410 RBX: ffff98f14d7fb000 RCX: 0000000000000000
RDX: ffff98f17cf90000 RSI: ffffb3758138fd64 RDI: ffff98f14d7fb000
RBP: ffffb3758138fd50 R08: 0000000000001000 R09: ffff98f17cf90000
R10: 0000000000000000 R11: ffffffff8dacad28 R12: 0000000000001000
R13: ffff98f1793a7d80 R14: ffff98f143231708 R15: ffff98f14d7fb018
FS:  00007ec166097800(0000) GS:ffff98f1bbd00000(0000) knlGS:0000000000000000
CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
CR2: ffffffffc04ef420 CR3: 000000010ee9a000 CR4: 00000000003506e0
Call Trace:
 temp_show+0x31/0x68
 dev_attr_show+0x1d/0x4f
 sysfs_kf_seq_show+0x92/0x107
 seq_read_iter+0xf5/0x3f2
 vfs_read+0x205/0x379
 __x64_sys_read+0x7c/0xe2
 do_syscall_64+0x43/0x55
 entry_SYSCALL_64_after_hwframe+0x61/0xc6

if a thermal device is removed while accesses to its device attributes
are ongoing.

The problem is exposed by code in iwl_op_mode_mvm_start(), which registers
a thermal zone device only to unregister it shortly afterwards if an
unrelated failure is encountered while accessing the hardware.

Check if the thermal zone device is registered after acquiring the
thermal zone device mutex to ensure this does not happen.

The code was tested by triggering the failure in iwl_op_mode_mvm_start()
on purpose. Without this patch, the kernel crashes reliably. The crash
is no longer observed after applying this and the preceding patches.

Signed-off-by: Guenter Roeck <linux@roeck-us.net>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2022-11-14 19:04:37 +01:00
Daniel Lezcano
a930da9bf5 thermal/core: Move the mutex inside the thermal_zone_device_update() function
All the different calls inside the thermal_zone_device_update()
function take the mutex.

The previous changes move the mutex out of the different functions,
like the throttling ops. Now that the mutexes are all at the same
level in the call stack for the thermal_zone_device_update() function,
they can be moved inside this one.

That has the benefit of:

1. Simplify the code by not having a plethora of places where the lock is taken

2. Probably closes more race windows because releasing the lock from
one line to another can give the opportunity to the thermal zone to change
its state in the meantime. For example, the thermal zone can be
enabled right after checking it is disabled.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/20220805153834.2510142-5-daniel.lezcano@linaro.org
2022-08-17 14:09:39 +02:00
Daniel Lezcano
e5bfcd30f8 thermal/core: Rename 'trips' to 'num_trips'
In order to use thermal trips defined in the thermal structure, rename
the 'trips' field to 'num_trips' to have the 'trips' field containing the
thermal trip points.

Cc: Alexandre Bailon <abailon@baylibre.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220722200007.1839356-8-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-07-28 17:29:56 +02:00
Daniel Lezcano
e5f2cda61d thermal/core: Move thermal_set_delay_jiffies to static
The function 'thermal_set_delay_jiffies' is only used in
thermal_core.c but it is defined and implemented in a separate
file. Move the function to thermal_core.c and make it static.

Cc: Alexandre Bailon <abailon@baylibre.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20220722200007.1839356-7-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-07-28 17:29:55 +02:00
Daniel Lezcano
6390383b67 thermal/core: Remove unneeded EXPORT_SYMBOLS
Different functions are exporting the symbols but are actually only
used by the thermal framework internals. Remove these EXPORT_SYMBOLS.

Cc: Alexandre Bailon <abailon@baylibre.com>
Cc: Kevin Hilman <khilman@baylibre.com>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linexp.org>
Link: https://lore.kernel.org/r/20220722200007.1839356-6-daniel.lezcano@linexp.org
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
2022-07-28 17:29:55 +02:00
Lukasz Luba
b70dbf40eb thermal/core: Create a helper __thermal_cdev_update() without a lock
There is a need to have a helper function which updates cooling device
state from the governors code. With this change governor can use
lock and unlock while calling helper function. This avoid unnecessary
second time lock/unlock which was in previous solution present in
governor implementation. This new helper function must be called
with mutex 'cdev->lock' hold.

The changed been discussed and part of code presented in thread:
https://lore.kernel.org/linux-pm/20210419084536.25000-1-lukasz.luba@arm.com/

Co-developed-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Link: https://lore.kernel.org/r/20210422114308.29684-2-lukasz.luba@arm.com
2021-04-22 14:10:28 +02:00
Daniel Lezcano
17d399cd9c thermal/core: Precompute the delays from msecs to jiffies
The delays are stored in ms units and when the polling function is
called this delay is converted into jiffies at each call.

Instead of doing the conversion again and again, compute the jiffies
at init time and use the value directly when setting the polling.

Cc: Thara Gopinath <thara.gopinath@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Thara Gopinath <thara.gopinath@linaro.org>
Link: https://lore.kernel.org/r/20201216220337.839878-1-daniel.lezcano@linaro.org
2021-01-19 22:23:38 +01:00
Daniel Lezcano
55cdf0a283 thermal: core: Add notifications call in the framework
The generic netlink protocol is implemented but the different
notification functions are not yet connected to the core code.

These changes add the notification calls in the different
corresponding places.

Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Acked-by: Zhang Rui <rui.zhang@intel.com>
Link: https://lore.kernel.org/r/20200706105538.2159-4-daniel.lezcano@linaro.org
2020-07-07 15:55:22 +02:00
Amit Kucheria
3a74c882dc thermal/drivers/thermal_helpers: Include export.h
It is preferable to include export.h when you are using EXPORT_SYMBOL
family of macros.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/fd3443f00dbba6ca90f35726c7451ae52145d2d4.1589199124.git.amit.kucheria@linaro.org
2020-05-22 18:48:53 +02:00
Amit Kucheria
231b98af4d thermal/drivers/thermal_helpers: Sort headers alphabetically
Sort headers to make it easier to read and find duplicate headers.

Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org>
Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Link: https://lore.kernel.org/r/133db154796f354e6c51e6310095f679e1f45441.1589199124.git.amit.kucheria@linaro.org
2020-05-22 18:48:53 +02:00
Daniel Lezcano
bceb5646a1 thermal: core: Make thermal_zone_set_trips private
The function thermal_zone_set_trips() is used by the thermal core code
in order to update the next trip points, there are no other users.

Move the function definition in the thermal_core.h, remove the
EXPORT_SYMBOL_GPL and document the function.

Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Lukasz Luba <lukasz.luba@arm.com>
Reviewed-by: Amit Kucheria <amit.kucheria@linaro.org>
Link: https://lore.kernel.org/r/20200331165449.30355-1-daniel.lezcano@linaro.org
2020-04-14 11:41:12 +02:00
Lina Iyer
7e3c03817f drivers: thermal: Update license to SPDX format
Update licences format for core thermal files.

Signed-off-by: Lina Iyer <ilina@codeaurora.org>
Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2018-05-30 14:46:17 +08:00
Viresh Kumar
8ea229511e thermal: Add cooling device's statistics in sysfs
This extends the sysfs interface for thermal cooling devices and exposes
some pretty useful statistics. These statistics have proven to be quite
useful specially while doing benchmarks related to the task scheduler,
where we want to make sure that nothing has disrupted the test,
specially the cooling device which may have put constraints on the CPUs.
The information exposed here tells us to what extent the CPUs were
constrained by the thermal framework.

The write-only "reset" file is used to reset the statistics.

The read-only "time_in_state_ms" file shows the time (in msec) spent by the
device in the respective cooling states, and it prints one line per
cooling state.

The read-only "total_trans" file shows single positive integer value
showing the total number of cooling state transitions the device has
gone through since the time the cooling device is registered or the time
when statistics were reset last.

The read-only "trans_table" file shows a two dimensional matrix, where
an entry <i,j> (row i, column j) represents the number of transitions
from State_i to State_j.

This is how the directory structure looks like for a single cooling
device:

$ ls -R /sys/class/thermal/cooling_device0/
/sys/class/thermal/cooling_device0/:
cur_state  max_state  power  stats  subsystem  type  uevent

/sys/class/thermal/cooling_device0/power:
autosuspend_delay_ms  runtime_active_time  runtime_suspended_time
control               runtime_status

/sys/class/thermal/cooling_device0/stats:
reset  time_in_state_ms  total_trans  trans_table

This is tested on ARM 64-bit Hisilicon hikey620 board running Ubuntu and
ARM 64-bit Hisilicon hikey960 board running Android.

Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2018-04-02 21:49:01 +08:00
Eduardo Valentin
373f91d125 thermal: core: move slop and offset helpers to thermal_helpers.c
Reorganize code to reflect better placement.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2016-11-23 10:06:12 +08:00
Eduardo Valentin
cd221c7b63 thermal: core: introduce thermal_helpers.c
Here we have a simple code organization. This patch moves
functions that do not need to handle thermal core internal
data structure to thermal_helpers.c file.

Cc: Zhang Rui <rui.zhang@intel.com>
Cc: linux-pm@vger.kernel.org
Cc: linux-kernel@vger.kernel.org
Signed-off-by: Eduardo Valentin <edubezval@gmail.com>
Signed-off-by: Zhang Rui <rui.zhang@intel.com>
2016-11-23 10:06:12 +08:00