korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-15 16:24:13 +08:00

Author	SHA1	Message	Date
Quentin Perret	10dd8573b0	cpufreq: Register governors at core_initcall Currently, most CPUFreq governors are registered at the core_initcall time when the given governor is the default one, and the module_init time otherwise. In preparation for letting users specify the default governor on the kernel command line, change all of them to be registered at the core_initcall unconditionally, as it is already the case for the schedutil and performance governors. This will allow us to assume that builtin governors have been registered before the built-in CPUFreq drivers probe. And since all governors have similar init/exit patterns now, introduce two new macros, cpufreq_governor_{init,exit}(), to factorize the code. Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Quentin Perret <qperret@google.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2020-07-02 13:03:30 +02:00
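The factoring described in this commit can be pictured with a user-space analogue. This is only a sketch: `toy_register_governor()`, `toy_unregister_governor()`, and the `TOY_GOVERNOR_INIT`/`TOY_GOVERNOR_EXIT` macros are hypothetical names invented here, and ordinary functions stand in for the kernel's initcall machinery.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a governor descriptor. */
struct toy_governor {
	const char *name;
	int registered;
};

#define TOY_MAX_GOVERNORS 8
static struct toy_governor *toy_registry[TOY_MAX_GOVERNORS];

/* Add a governor; returns 0 on success, -1 if the name exists or the table is full. */
static int toy_register_governor(struct toy_governor *gov)
{
	assert(gov != NULL && gov->name != NULL);
	for (size_t i = 0; i < TOY_MAX_GOVERNORS; i++)
		if (toy_registry[i] && strcmp(toy_registry[i]->name, gov->name) == 0)
			return -1;
	for (size_t i = 0; i < TOY_MAX_GOVERNORS; i++) {
		if (!toy_registry[i]) {
			toy_registry[i] = gov;
			gov->registered = 1;
			return 0;
		}
	}
	return -1;
}

/* Remove a governor from the table if it is present. */
static void toy_unregister_governor(struct toy_governor *gov)
{
	for (size_t i = 0; i < TOY_MAX_GOVERNORS; i++) {
		if (toy_registry[i] == gov) {
			toy_registry[i] = NULL;
			gov->registered = 0;
		}
	}
}

/* One macro pair generates the identical init/exit boilerplate per governor. */
#define TOY_GOVERNOR_INIT(gov)					\
	static int gov##_init(void)				\
	{							\
		return toy_register_governor(&gov);		\
	}

#define TOY_GOVERNOR_EXIT(gov)					\
	static void gov##_exit(void)				\
	{							\
		toy_unregister_governor(&gov);			\
	}

static struct toy_governor toy_conservative = { .name = "conservative" };
TOY_GOVERNOR_INIT(toy_conservative)
TOY_GOVERNOR_EXIT(toy_conservative)
```

Because every governor gets the same generated pair, the only per-governor input is the descriptor itself; in the kernel the generated init function would additionally be hooked to an initcall level, which this sketch does not model.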
Amit Kucheria	3f6ec871e1	cpufreq: Initialize the governors in core_initcall Initialize the cpufreq governors earlier to allow for earlier performance control during the boot process. Signed-off-by: Amit Kucheria <amit.kucheria@linaro.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Daniel Lezcano <daniel.lezcano@linaro.org> Link: https://lore.kernel.org/r/b98eae9b44eb2f034d7f5d12a161f5f831be1eb7.1571656015.git.amit.kucheria@linaro.org	2019-11-07 07:00:26 +01:00
Thomas Gleixner	d2912cb15b	treewide: Replace GPLv2 boilerplate/reference with SPDX - rule 500 Based on 2 normalized pattern(s): this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation this program is free software you can redistribute it and or modify it under the terms of the gnu general public license version 2 as published by the free software foundation # extracted by the scancode license scanner the SPDX license identifier GPL-2.0-only has been chosen to replace the boilerplate/reference in 4122 file(s). Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Enrico Weigelt <info@metux.net> Reviewed-by: Kate Stewart <kstewart@linuxfoundation.org> Reviewed-by: Allison Randal <allison@lohutok.net> Cc: linux-spdx@vger.kernel.org Link: https://lkml.kernel.org/r/20190604081206.933168790@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2019-06-19 17:09:55 +02:00
Rafael J. Wysocki	da5e79bc70	cpufreq: conservative: Take limits changes into account properly If the policy limits change between invocations of cs_dbs_update(), the requested frequency value stored in dbs_info may not be updated and the function may use a stale value of it next time. Moreover, if idle periods are taken into account by cs_dbs_update(), the requested frequency value stored in dbs_info may be below the min policy limit, which is incorrect. To fix these problems, always update the requested frequency value in dbs_info along with the local copy of it when the previous requested frequency is beyond the policy limits and avoid decreasing the requested frequency below the min policy limit when taking idle periods into account. Fixes: `abb6627910` (cpufreq: conservative: Fix next frequency selection) Fixes: `00bfe05889` (cpufreq: conservative: Decrease frequency faster for deferred updates) Reported-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com> Cc: All applicable <stable@vger.kernel.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Waldemar Rymarkiewicz <waldemarx.rymarkiewicz@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2018-10-16 12:34:45 +02:00
Viresh Kumar	2d04503632	cpufreq: governor: Drop min_sampling_rate The cpufreq core and governors aren't supposed to set a limit on how fast we want to try changing the frequency. This is currently done for the legacy governors with help of min_sampling_rate. At worst, we may end up setting the sampling rate to a value lower than the rate at which frequency can be changed and then one of the CPUs in the policy will be only changing frequency for ever. But that is something for the user to decide and there is no need to have special handling for such cases in the core. Leave it for the user to figure out. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-07-22 02:25:20 +02:00
Tomasz Wilczyński	b8e11f7d27	cpufreq: conservative: Allow down_threshold to take values from 1 to 10 Commit `27ed3cd2eb` (cpufreq: conservative: Fix the logic in frequency decrease checking) removed the 10 point subtraction when comparing the load against down_threshold but did not remove the related limit for the down_threshold value. As a result, down_threshold lower than 11 is not allowed even though values from 1 to 10 do work correctly too. The comment ("cannot be lower than 11 otherwise freq will not fall") also does not apply after removing the subtraction. For this reason, allow down_threshold to take any value from 1 to 99 and fix the related comment. Fixes: `27ed3cd2eb` (cpufreq: conservative: Fix the logic in frequency decrease checking) Signed-off-by: Tomasz Wilczyński <twilczynski@naver.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 3.10+ <stable@vger.kernel.org> # 3.10+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2017-06-12 14:28:07 +02:00
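A range check of the kind this commit relaxes could be sketched as below. The names `toy_tuners` and `toy_store_down_threshold()` are hypothetical, and the extra requirement that the value stay below `up_threshold` is an assumption added here for illustration; the changelog itself only states the 1 to 99 range.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical tunables holder; only the fields needed for the check. */
struct toy_tuners {
	unsigned int up_threshold;
	unsigned int down_threshold;
};

/*
 * Accept any value from 1 to 99.
 * Assumption for this sketch: the value must also stay strictly below
 * up_threshold. On rejection the old value is left untouched.
 */
static int toy_store_down_threshold(struct toy_tuners *t, unsigned int input)
{
	assert(t != NULL);
	if (input < 1 || input > 99 || input >= t->up_threshold)
		return -EINVAL;
	t->down_threshold = input;
	return 0;
}
```

With the old lower bound of 11, a call such as `toy_store_down_threshold(&t, 5)` would have been rejected; under the relaxed check it succeeds.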
Stratos Karafotis	42d951c851	cpufreq: conservative: Fix comment explaining frequency updates The original comment about the frequency increase to maximum is wrong. Both increase and decrease happen at steps. Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-11-16 23:15:56 +01:00
Stratos Karafotis	00bfe05889	cpufreq: conservative: Decrease frequency faster for deferred updates Conservative governor changes the CPU frequency in steps. That means that if a CPU runs at max frequency, it will need several sampling periods to return to min frequency when the workload is finished. If the update function that calculates the load and target frequency is deferred, the governor might need even more time to decrease the frequency. This may have impact to power consumption and after all conservative should decrease the frequency if there is no workload at every sampling rate. To resolve the above issue calculate the number of sampling periods that the update is deferred. Considering that for each sampling period conservative should drop the frequency by a freq_step because the CPU was idle apply the proper subtraction to requested frequency. Below, the kernel trace with and without this patch. First an intensive workload is applied on a specific CPU. Then the workload is removed and the CPU goes to idle. WITHOUT <idle>-0 [007] dN.. 620.329153: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 620.350857: cpu_frequency: state=1700000 cpu_id=7 kworker/7:2-556 [007] .... 620.370856: cpu_frequency: state=1900000 cpu_id=7 kworker/7:2-556 [007] .... 620.390854: cpu_frequency: state=2100000 cpu_id=7 kworker/7:2-556 [007] .... 620.411853: cpu_frequency: state=2200000 cpu_id=7 kworker/7:2-556 [007] .... 620.432854: cpu_frequency: state=2400000 cpu_id=7 kworker/7:2-556 [007] .... 620.453854: cpu_frequency: state=2600000 cpu_id=7 kworker/7:2-556 [007] .... 620.494856: cpu_frequency: state=2900000 cpu_id=7 kworker/7:2-556 [007] .... 620.515856: cpu_frequency: state=3100000 cpu_id=7 kworker/7:2-556 [007] .... 620.536858: cpu_frequency: state=3300000 cpu_id=7 kworker/7:2-556 [007] .... 620.557857: cpu_frequency: state=3401000 cpu_id=7 <idle>-0 [007] d... 669.591363: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 
669.591939: cpu_idle: state=4294967295 cpu_id=7 <idle>-0 [007] d... 669.591980: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] dN.. 669.591989: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 670.201224: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 670.221975: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 670.222016: cpu_frequency: state=3300000 cpu_id=7 <idle>-0 [007] d... 670.222026: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 670.234964: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 670.801251: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 671.236046: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 671.236073: cpu_frequency: state=3100000 cpu_id=7 <idle>-0 [007] d... 671.236112: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 671.393437: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 671.401277: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 671.404083: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 671.404111: cpu_frequency: state=2900000 cpu_id=7 <idle>-0 [007] d... 671.404125: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 671.404974: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 671.501180: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 671.995414: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 671.995459: cpu_frequency: state=2800000 cpu_id=7 <idle>-0 [007] d... 671.995469: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 671.996287: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 672.001305: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.078374: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 672.078410: cpu_frequency: state=2600000 cpu_id=7 <idle>-0 [007] d... 672.078419: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.158020: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 672.158040: cpu_frequency: state=2400000 cpu_id=7 <idle>-0 [007] d... 
672.158044: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.160038: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 672.234557: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.237121: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 672.237174: cpu_frequency: state=2100000 cpu_id=7 <idle>-0 [007] d... 672.237186: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.237778: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 672.267902: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.269860: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 672.269906: cpu_frequency: state=1900000 cpu_id=7 <idle>-0 [007] d... 672.269914: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.271902: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 672.751342: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 672.823056: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-556 [007] .... 672.823095: cpu_frequency: state=1600000 cpu_id=7 WITH <idle>-0 [007] dN.. 4380.928009: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-399 [007] .... 4380.949767: cpu_frequency: state=2000000 cpu_id=7 kworker/7:2-399 [007] .... 4380.969765: cpu_frequency: state=2200000 cpu_id=7 kworker/7:2-399 [007] .... 4381.009766: cpu_frequency: state=2500000 cpu_id=7 kworker/7:2-399 [007] .... 4381.029767: cpu_frequency: state=2600000 cpu_id=7 kworker/7:2-399 [007] .... 4381.049769: cpu_frequency: state=2800000 cpu_id=7 kworker/7:2-399 [007] .... 4381.069769: cpu_frequency: state=3000000 cpu_id=7 kworker/7:2-399 [007] .... 4381.089771: cpu_frequency: state=3100000 cpu_id=7 kworker/7:2-399 [007] .... 4381.109772: cpu_frequency: state=3400000 cpu_id=7 kworker/7:2-399 [007] .... 4381.129773: cpu_frequency: state=3401000 cpu_id=7 <idle>-0 [007] d... 4428.226159: cpu_idle: state=1 cpu_id=7 <idle>-0 [007] d... 4428.226176: cpu_idle: state=4294967295 cpu_id=7 <idle>-0 [007] d... 4428.226181: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 
4428.227177: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 4428.551640: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 4428.649239: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-399 [007] .... 4428.649268: cpu_frequency: state=2800000 cpu_id=7 <idle>-0 [007] d... 4428.649278: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 4428.689856: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 4428.799542: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 4428.801683: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-399 [007] .... 4428.801748: cpu_frequency: state=1700000 cpu_id=7 <idle>-0 [007] d... 4428.801761: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 4428.806545: cpu_idle: state=4294967295 cpu_id=7 ... <idle>-0 [007] d... 4429.051880: cpu_idle: state=4 cpu_id=7 <idle>-0 [007] d... 4429.086240: cpu_idle: state=4294967295 cpu_id=7 kworker/7:2-399 [007] .... 4429.086293: cpu_frequency: state=1600000 cpu_id=7 Without the patch the CPU dropped to min frequency after 3.2s With the patch applied the CPU dropped to min frequency after 0.86s Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-11-16 23:15:56 +01:00
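The arithmetic behind the faster decrease can be sketched as follows. This is a simplified model under stated assumptions, not the kernel's implementation: the helper names are hypothetical, the number of missed periods is taken to be the elapsed time divided by the sampling rate, and the result is floored at the minimum frequency.

```c
#include <assert.h>

/* Whole sampling periods covered by a deferred update (assumed model). */
static unsigned int toy_deferred_periods(unsigned long long elapsed_us,
					 unsigned int sampling_rate_us)
{
	assert(sampling_rate_us > 0);
	return (unsigned int)(elapsed_us / sampling_rate_us);
}

/*
 * Frequency after an idle stretch: subtract one freq_step per missed
 * period, but never return less than min_khz.
 */
static unsigned int toy_freq_after_idle(unsigned int requested_khz,
					unsigned int min_khz,
					unsigned int freq_step_khz,
					unsigned int periods)
{
	unsigned long long drop = (unsigned long long)freq_step_khz * periods;

	if (requested_khz > min_khz + drop)
		return (unsigned int)(requested_khz - drop);
	return min_khz;
}
```

For example, with a hypothetical 20000 us sampling rate and a 170000 kHz step, an update deferred by 100000 us covers five periods, so a request of 3400000 kHz would fall to 2550000 kHz in one update instead of needing five.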
Viresh Kumar	d5f905a93c	cpufreq: conservative: Rename get_freq_target() to get_freq_step() What's returned from this function is the delta by which the frequency must be increased or decreased and not the final frequency that should be selected. Name it properly to match its purpose. Also update the variables used to store that value. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-11-14 21:34:52 +01:00
Viresh Kumar	26f0dbc9ab	cpufreq: governor: Don't use 'timer' keyword The earlier implementation of governors used background timers and so functions, mutex, etc had 'timer' keyword in their names. But that's not true anymore. Replace 'timer' with 'update', as those functions, variables are based around updates to frequency. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-11-11 01:48:33 +01:00
Rafael J. Wysocki	abb6627910	cpufreq: conservative: Fix next frequency selection Commit `d352cf47d9` (cpufreq: conservative: Do not use transition notifications) overlooked the case when the "frequency step" used by the conservative governor is small relative to the distances between the available frequencies and broke the algorithm by using policy->cur instead of the previously requested frequency when computing the next one. As a result, the governor may not be able to go outside of a narrow range between two consecutive available frequencies. Fix the problem by making the governor save the previously requested frequency and select the next one relative that value (unless it is out of range, in which case policy->cur will be used instead). Fixes: `d352cf47d9` (cpufreq: conservative: Do not use transition notifications) Link: https://bugzilla.kernel.org/show_bug.cgi?id=177171 Reported-and-tested-by: Aleksey Rybalkin <aleksey@rybalkin.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Cc: 4.8+ <stable@vger.kernel.org> # 4.8+ Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-10-13 14:42:06 +02:00
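The failure mode described here (a step that is small relative to the gaps between available frequencies) can be reproduced with a toy simulation. Everything in this sketch is hypothetical: the two-entry frequency table, the helper names, and the assumption that a target resolves to the highest table entry at or below it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical frequency table (kHz), in ascending order. */
static const unsigned int toy_table[] = { 1000000, 2000000 };
#define TOY_TABLE_LEN (sizeof(toy_table) / sizeof(toy_table[0]))

/* Resolve a target to the highest entry at or below it (lowest entry as floor). */
static unsigned int toy_resolve(unsigned int target)
{
	unsigned int best = toy_table[0];

	for (size_t i = 0; i < TOY_TABLE_LEN; i++)
		if (toy_table[i] <= target)
			best = toy_table[i];
	return best;
}

/*
 * Run `samples` consecutive "load is high" updates with a fixed step.
 * track_request != 0: step from the previously requested value.
 * track_request == 0: step from the frequency actually set.
 * Returns the frequency the simulated hardware ends up at.
 */
static unsigned int toy_ramp_up(unsigned int step, unsigned int samples,
				int track_request)
{
	const unsigned int max = toy_table[TOY_TABLE_LEN - 1];
	unsigned int cur = toy_table[0];
	unsigned int requested = cur;

	assert(step > 0);
	for (unsigned int i = 0; i < samples; i++) {
		unsigned int base = track_request ? requested : cur;

		requested = base + step;
		if (requested > max)
			requested = max;
		cur = toy_resolve(requested);
	}
	return cur;
}
```

With a 50000 kHz step, stepping from the actual frequency asks for 1050000 kHz every time and stays at 1000000 kHz indefinitely, whereas stepping from the saved request accumulates and reaches 2000000 kHz after 20 samples.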
Rafael J. Wysocki	d352cf47d9	cpufreq: conservative: Do not use transition notifications The conservative governor registers a transition notifier so it can update its internal requested_freq value if it falls out of the policy->min...policy->max range, but requested_freq is not really necessary. That value is used to track the frequency requested by the governor previously, but policy->cur can be used instead of it and then the governor will not have to worry about updating the tracked value when the current frequency changes independently (for example, as a result of min or max changes). Accordingly, drop requested_freq from struct cs_policy_dbs_info and modify cs_dbs_timer() to use policy->cur instead of it. While at it, notice that __cpufreq_driver_target() clamps its target_freq argument between policy->min and policy->max, so the callers of it don't have to do that and make additional changes in cs_dbs_timer() in accordance with that. After these changes the transition notifier used by the conservative governor is not necessary any more, so drop it, which also makes it possible to drop the struct cs_governor definition and simplify the code accordingly. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-06-13 23:33:49 +02:00
Rafael J. Wysocki	9a15fb2c79	cpufreq: Drop the 'initialized' field from struct cpufreq_governor The 'initialized' field in struct cpufreq_governor is only used by the conservative governor (as a usage counter) and the way that happens is far from straightforward and arguably incorrect. Namely, the value of 'initialized' is checked by cpufreq_dbs_governor_init() and cpufreq_dbs_governor_exit() and the results of those checks are passed (as the second argument) to the ->init() and ->exit() callbacks in struct dbs_governor. Those callbacks are only implemented by the ondemand and conservative governors and ondemand doesn't use their second argument at all. In turn, the conservative governor uses it to decide whether or not to either register or unregister a transition notifier. That whole mechanism is not only unnecessarily convoluted, but also racy, because the 'initialized' field of struct cpufreq_governor is updated in cpufreq_init_governor() and cpufreq_exit_governor() under policy->rwsem which doesn't help if one of these functions is run twice in parallel for different policies (which isn't impossible in principle), for example. Instead of it, add a proper usage counter to the conservative governor and update it from cs_init() and cs_exit() which is guaranteed to be non-racy, as those functions are only called under gov_dbs_data_mutex which is global. With that in place, drop the 'initialized' field from struct cpufreq_governor as it is not used any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-06-02 23:24:39 +02:00
Viresh Kumar	a69d6b2914	cpufreq: governor: Remove prints from allocation failures These aren't required anymore as the allocation core already prints such messages. Remove the redundant ones. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-06-02 23:24:37 +02:00
Rafael J. Wysocki	e788892ba3	cpufreq: governor: Get rid of governor events The design of the cpufreq governor API is not very straightforward, as struct cpufreq_governor provides only one callback to be invoked from different code paths for different purposes. The purpose it is invoked for is determined by its second "event" argument, causing it to act as a "callback multiplexer" of sorts. Unfortunately, that leads to extra complexity in governors, some of which implement the ->governor() callback as a switch statement that simply checks the event argument and invokes a separate function to handle that specific event. That extra complexity can be eliminated by replacing the all-purpose ->governor() callback with a family of callbacks to carry out specific governor operations: initialization and exit, start and stop and policy limits updates. That also turns out to reduce the code size too, so do it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-06-02 23:24:15 +02:00
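The shape of the change (one callback per operation instead of a single entry point that switches on an event code) can be sketched in user space. The types and function names below are hypothetical and much simpler than the kernel's `struct cpufreq_governor`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, minimal policy state. */
struct toy_policy {
	int started;
	unsigned int min, max, cur;
};

/* One callback per operation; a governor leaves unused ones NULL. */
struct toy_governor_ops {
	int  (*init)(struct toy_policy *p);
	void (*exit)(struct toy_policy *p);
	int  (*start)(struct toy_policy *p);
	void (*stop)(struct toy_policy *p);
	void (*limits)(struct toy_policy *p);
};

static int toy_start(struct toy_policy *p)
{
	p->started = 1;
	return 0;
}

static void toy_stop(struct toy_policy *p)
{
	p->started = 0;
}

/* Pull the current frequency back inside [min, max] after a limits change. */
static void toy_limits(struct toy_policy *p)
{
	assert(p->min <= p->max);
	if (p->cur > p->max)
		p->cur = p->max;
	else if (p->cur < p->min)
		p->cur = p->min;
}

/* This toy governor needs no init/exit, so those pointers stay NULL. */
static const struct toy_governor_ops toy_ops = {
	.start  = toy_start,
	.stop   = toy_stop,
	.limits = toy_limits,
};

/* Core-side helper: invoke ->start() only if the governor provides it. */
static int toy_core_start(const struct toy_governor_ops *ops,
			  struct toy_policy *p)
{
	return ops->start ? ops->start(p) : 0;
}
```

Compared with a multiplexed callback, each function here has a signature that matches its job, and a governor that does not need an operation simply omits it rather than adding an empty `case` to a switch.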
Rafael J. Wysocki	0dd3c1d678	cpufreq: governor: New data type for management part of dbs_data In addition to fields representing governor tunables, struct dbs_data contains some fields needed for the management of objects of that type. As it turns out, that part of struct dbs_data may be shared with (future) governors that won't use the common code used by "ondemand" and "conservative", so move it to a separate struct type and modify the code using struct dbs_data to follow. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-04-02 01:09:00 +02:00
Rafael J. Wysocki	47ebaac1f3	cpufreq: governor: Relocate definitions of tuners structures Move the definitions of struct od_dbs_tuners and struct cs_dbs_tuners from the common governor header to the ondemand and conservative governor code, respectively, as they don't need to be in the common header any more. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:09 +01:00
Rafael J. Wysocki	8c8f77fd07	cpufreq: governor: Move per-CPU data to the common code After previous changes there is only one piece of code in the ondemand governor making references to per-CPU data structures, but it can be easily modified to avoid doing that, so modify it accordingly and move the definition of per-CPU data used by the ondemand and conservative governors to the common code. Next, change that code to access the per-CPU data structures directly rather than via a governor callback. This causes the ->get_cpu_cdbs governor callback to become unnecessary, so drop it along with the macro and function definitions related to it. Finally, drop the definitions of struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s that aren't necessary any more. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:09 +01:00
Rafael J. Wysocki	7d5a9956af	cpufreq: governor: Make governor private data per-policy Some fields in struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s are only used for a limited set of CPUs. Namely, if a policy is shared between multiple CPUs, those fields will only be used for one of them (policy->cpu). This means that they really are per-policy rather than per-CPU and holding room for them in per-CPU data structures is generally wasteful. Also moving those fields into per-policy data structures will allow some significant simplifications to be made going forward. For this reason, introduce struct cs_policy_dbs_info and struct od_policy_dbs_info to hold those fields. Define each of the new structures as an extension of struct policy_dbs_info (such that struct policy_dbs_info is embedded in each of them) and introduce new ->alloc and ->free governor callbacks to allocate and free those structures, respectively, such that ->alloc() will return a pointer to the struct policy_dbs_info embedded in the allocated data structure and ->free() will take that pointer as its argument. With that, modify the code accessing the data fields in question in per-CPU data objects to look for them in the new structures via the struct policy_dbs_info pointer available to it and drop them from struct od_cpu_dbs_info_s and struct cs_cpu_dbs_info_s. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:08 +01:00
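The embedding pattern described above (a governor-specific structure that contains the common one, with `->alloc()` returning the embedded pointer and `->free()` accepting it) can be sketched in user space. The `toy_` names and the field choices are illustrative assumptions; `toy_container_of()` is a plain-C stand-in for the kernel's `container_of()`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* User-space equivalent of the container_of() idiom. */
#define toy_container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Common per-policy part shared by every governor (field is illustrative). */
struct toy_policy_dbs_info {
	unsigned int sample_delay_us;
};

/* Governor-specific extension that embeds the common part. */
struct toy_cs_policy_dbs_info {
	struct toy_policy_dbs_info policy_dbs;
	unsigned int down_skip;
	unsigned int requested_freq;
};

/* ->alloc(): allocate the full object, hand back only the embedded part. */
static struct toy_policy_dbs_info *toy_cs_alloc(void)
{
	struct toy_cs_policy_dbs_info *dbs_info = calloc(1, sizeof(*dbs_info));

	return dbs_info ? &dbs_info->policy_dbs : NULL;
}

/* Recover the governor-specific object from the common pointer. */
static struct toy_cs_policy_dbs_info *
toy_to_cs_info(struct toy_policy_dbs_info *policy_dbs)
{
	assert(policy_dbs != NULL);
	return toy_container_of(policy_dbs, struct toy_cs_policy_dbs_info,
				policy_dbs);
}

/* ->free(): takes the common pointer and releases the enclosing object. */
static void toy_cs_free(struct toy_policy_dbs_info *policy_dbs)
{
	free(toy_to_cs_info(policy_dbs));
}
```

Common code only ever sees `struct toy_policy_dbs_info *`, so it can manage objects for any governor without knowing the size or layout of the governor-specific extension.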
Rafael J. Wysocki	a33cce1c6c	cpufreq: governor: Fix CPU load information updates via ->store The ->store() callbacks of some tunable sysfs attributes of the ondemand and conservative governors trigger immediate updates of the CPU load information for all CPUs "governed" by the given dbs_data by walking the cpu_dbs_info structures for all online CPUs in the system and updating them. This is questionable for two reasons. First, it may lead to a lot of extra overhead on a system with many CPUs if the given dbs_data is only associated with a few of them. Second, if governor tunables are per-policy, the CPUs associated with the other sets of governor tunables should not be updated. To address this issue, use the observation that in all of the places in question the update operation may be carried out in the same way (because all of the tunables involved are now located in struct dbs_data and readily available to the common code) and make the code in those places invoke the same (new) helper function that will carry out the update correctly. That new function always checks the ignore_nice_load tunable value and updates the CPUs' prev_cpu_nice data fields if that's set, which wasn't done by the original code in store_io_is_busy(), but it should have been done in there too. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:08 +01:00
Rafael J. Wysocki	8434dadbb4	cpufreq: governor: Drop unused governor callback and data fields After some previous changes, the ->get_cpu_dbs_info_s governor callback and the "governor" field in struct dbs_governor (whose value represents the governor type) are not used any more, so drop them. Also drop the unused gov_ops field from struct dbs_governor. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:07 +01:00
Rafael J. Wysocki	702c9e542a	cpufreq: governor: Add a ->start callback for governors To avoid having to check the governor type explicitly in the common code in order to initialize data structures specific to the governor type properly, add a ->start callback to struct dbs_governor and use it to initialize those data structures for the ondemand and conservative governors. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:07 +01:00
Rafael J. Wysocki	07aa4402a0	cpufreq: governor: Use microseconds in sample delay computations Do not convert microseconds to jiffies and the other way around in governor computations related to the sampling rate and sample delay and drop delay_for_sampling_rate() which isn't of any use then. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:05 +01:00
Rafael J. Wysocki	4cccf75557	cpufreq: governor: Get rid of the ->gov_check_cpu callback The way the ->gov_check_cpu governor callback is used by the ondemand and conservative governors is not really straightforward. Namely, the governor calls dbs_check_cpu() that updates the load information for the policy and then invokes ->gov_check_cpu() for the governor. To get rid of that entanglement, notice that cpufreq_governor_limits() doesn't need to call dbs_check_cpu() directly. Instead, it can simply reset the sample delay to 0 which will cause a sample to be taken immediately. The result of that is practically equivalent to calling dbs_check_cpu() except that it will trigger a full update of governor internal state and not just the ->gov_check_cpu() part. Following that observation, make cpufreq_governor_limits() reset the sample delay and turn dbs_check_cpu() into a function that will simply evaluate the load and return the result called dbs_update(). That function can now be called by governors from the routines that previously were pointed to by ->gov_check_cpu and those routines can be called directly by each governor instead of dbs_check_cpu(). This way ->gov_check_cpu becomes unnecessary, so drop it. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:41:04 +01:00
Viresh Kumar	aded387b94	cpufreq: conservative: Update sample_delay_ns immediately The ondemand governor already updates sample_delay_ns immediately on updates to the sampling rate, but conservative doesn't do that. It was left out earlier as the code was really too complex to get that done easily. Things are sorted out very well now, however, and the conservative governor can be modified to follow ondemand in that respect. Moreover, since the code needed to implement that in the conservative governor would be identical to the corresponding ondemand governor's code, make that code common and change both governors to use it. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:41:01 +01:00
Viresh Kumar	c443563036	cpufreq: governor: New sysfs show/store callbacks for governor tunables The ondemand and conservative governors use the global-attr or freq-attr structures to represent sysfs attributes corresponding to their tunables (which of them is actually used depends on whether or not different policy objects can use the same governor with different tunables at the same time and, consequently, on where those attributes are located in sysfs). Unfortunately, in the freq-attr case, the standard cpufreq show/store sysfs attribute callbacks are applied to the governor tunable attributes and they always acquire the policy->rwsem lock before carrying out the operation. That may lead to an ABBA deadlock if governor tunable attributes are removed under policy->rwsem while one of them is being accessed concurrently (if sysfs attributes removal wins the race, it will wait for the access to complete with policy->rwsem held while the attribute callback will block on policy->rwsem indefinitely). We attempted to address this issue by dropping policy->rwsem around governor tunable attributes removal (that is, around invocations of the ->governor callback with the event arg equal to CPUFREQ_GOV_POLICY_EXIT) in cpufreq_set_policy(), but that opened up race conditions that had not been possible with policy->rwsem held all the time. Therefore policy->rwsem cannot be dropped in cpufreq_set_policy() at any point, but the deadlock situation described above must be avoided too. To that end, use the observation that in principle governor tunables may be represented by the same data type regardless of whether the governor is system-wide or per-policy and introduce a new structure, struct governor_attr, for representing them and new corresponding macros for creating show/store sysfs callbacks for them. 
Also make their parent kobject use a new kobject type whose default show/store callbacks are not related to the standard core cpufreq ones in any way (and they don't acquire policy->rwsem in particular). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Subject & changelog + rebase ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:40:58 +01:00
Viresh Kumar	ff4b17895e	cpufreq: governor: Move common tunables to 'struct dbs_data' There are a few common tunables shared between the ondemand and conservative governors. Move them to struct dbs_data to simplify code. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:40:58 +01:00
Viresh Kumar	d0684d3b89	cpufreq: governor: Create generic macro for common tunables Some tunables are present in governor-specific structures, whereas one (min_sampling_rate) is located directly in struct dbs_data. There is a special macro for creating its sysfs attribute and the show/store callbacks, but since more tunables are going to be moved to struct dbs_data, a new generic macro for such cases will be useful, so add it and use it for min_sampling_rate. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Juri Lelli <juri.lelli@arm.com> Tested-by: Shilpasri G Bhat <shilpa.bhat@linux.vnet.ibm.com> [ rjw: Subject & changelog ] Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2016-03-09 14:40:57 +01:00
Rafael J. Wysocki	bc505475b8	cpufreq: governor: Rearrange governor data structures The struct policy_dbs_info objects representing per-policy governor data are not accessible directly from the corresponding policy objects. To access them, one has to get a pointer to the struct cpu_dbs_info of policy->cpu and use the policy_dbs field of that which isn't really straightforward. To address that rearrange the governor data structures so the governor_data pointer in struct cpufreq_policy will point to struct policy_dbs_info (instead of struct dbs_data) and that will contain a pointer to struct dbs_data. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:56 +01:00
Rafael J. Wysocki	d10b5eb5fc	cpufreq: governor: Drop cpu argument from dbs_check_cpu() Since policy->cpu is always passed as the second argument to dbs_check_cpu(), it is not really necessary to pass it, because the function can obtain that value via its first argument just fine. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:55 +01:00
Rafael J. Wysocki	e40e7b255e	cpufreq: governor: Rename cpu_common_dbs_info to policy_dbs_info The struct cpu_common_dbs_info structure represents the per-policy part of the governor data (for the ondemand and conservative governors), but its name doesn't reflect its purpose. Rename it to struct policy_dbs_info and rename variables related to it accordingly. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:55 +01:00
Rafael J. Wysocki	ea59ee0dc9	cpufreq: governor: Drop the gov pointer from struct dbs_data Since it is possible to obtain a pointer to struct dbs_governor from a pointer to the struct governor embedded in it with the help of container_of(), the additional gov pointer in struct dbs_data isn't really necessary. Drop that pointer and make the code using it reach the dbs_governor object via policy->governor. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:55 +01:00
Rafael J. Wysocki	906a6e5aae	cpufreq: governor: Rework cpufreq_governor_dbs() Since it is possible to obtain a pointer to struct dbs_governor from a pointer to the struct governor embedded in it via container_of(), the second argument of cpufreq_governor_init() is not necessary. Accordingly, cpufreq_governor_dbs() doesn't need its second argument either and the ->governor callbacks for both the ondemand and conservative governors may be set to cpufreq_governor_dbs() directly. Make that happen. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:54 +01:00
Rafael J. Wysocki	7bdad34d08	cpufreq: governor: Rename some data types and variables The ondemand and conservative governors are represented by struct common_dbs_data whose name doesn't reflect the purpose it is used for, so rename it to struct dbs_governor and rename variables of that type accordingly. No functional changes. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:54 +01:00
Rafael J. Wysocki	af92618523	cpufreq: governor: Put governor structure into common_dbs_data For the ondemand and conservative governors (generally, governors that use the common code in cpufreq_governor.c), there are two static data structures representing the governor, the struct governor structure (the interface to the cpufreq core) and the struct common_dbs_data one (the interface to the cpufreq_governor.c code). There's no fundamental reason why those two structures have to be separate. Moreover, if the struct governor one is included into struct common_dbs_data, it will be possible to reach the latter from the policy via its policy->governor pointer, so it won't be necessary to pass a separate pointer to it around. For this reason, embed struct governor in struct common_dbs_data. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:54 +01:00
Rafael J. Wysocki	2bb8d94fb0	cpufreq: governor: Use common mutex for dbs_data protection Every governor relying on the common code in cpufreq_governor.c has to provide its own mutex in struct common_dbs_data. However, there actually is no need to have a separate mutex per governor for this purpose, they may be using the same global mutex just fine. Accordingly, introduce a single common mutex for that and drop the mutex field from struct common_dbs_data. That at least will ensure that the mutex is always present and initialized regardless of what the particular governors do. Another benefit is that the common code does not need a pointer to a governor-related structure to get to the mutex which sometimes helps. Finally, it makes the code generally easier to follow. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-03-09 14:40:53 +01:00
Rafael J. Wysocki	9be4fd2c77	cpufreq: governor: Replace timers with utilization update callbacks Instead of using a per-CPU deferrable timer for queuing up governor work items, register a utilization update callback that will be invoked from the scheduler on utilization changes. The sampling rate is still the same as what was used for the deferrable timers and the added irq_work overhead should be offset by the eliminated timers overhead, so in theory the functional impact of this patch should not be significant. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Tested-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>	2016-03-09 14:40:53 +01:00
Rafael J. Wysocki	de1df26b7c	cpufreq: Clean up default and fallback governor setup The preprocessor magic used for setting the default cpufreq governor (and for using the performance governor as a fallback one for that matter) is really nasty, so replace it with __weak functions and overrides. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Saravana Kannan <skannan@codeaurora.org> Acked-by: Viresh Kumar <viresh.kumar@linaro.org>	2016-02-05 02:37:42 +01:00
Viresh Kumar	affde5d06a	cpufreq: governor: Pass policy as argument to ->gov_dbs_timer() Pass 'policy' as argument to ->gov_dbs_timer() instead of cdbs and dbs_data. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-12-07 02:20:22 +01:00
Viresh Kumar	03d5eec000	cpufreq: conservative: remove 'enable' field The conservative governor has its own 'enable' field to check whether the conservative governor is used for a CPU. This can be checked by comparing policy->governor with 'cpufreq_gov_conservative', so this field can be dropped. Because it's not guaranteed that dbs_info->cdbs.shared will be a valid pointer for all CPUs (it will be NULL for CPUs that don't use the ondemand/conservative governors), we can't use it anymore. Let's get the policy with cpufreq_cpu_get_raw() instead. Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-09-26 02:59:38 +02:00
Viresh Kumar	43e0ee361e	cpufreq: governor: split out common part of {cs\|od}_dbs_timer() Some part of cs_dbs_timer() and od_dbs_timer() is exactly same and is unnecessarily duplicated. Create the real work-handler in cpufreq_governor.c and put the common code in this routine (dbs_timer()). Shouldn't make any functional change. Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-07-21 01:12:01 +02:00
Viresh Kumar	44152cb82d	cpufreq: governor: Keep single copy of information common to policy->cpus Some information is common to all CPUs belonging to a policy, but is kept on a per-CPU basis. Let's keep that in another structure common to all policy->cpus. That will make updates/reads to that less complex and less error prone. The memory for cpu_common_dbs_info is allocated/freed at INIT/EXIT, so that we don't reallocate it for a STOP/START sequence. It will also be used (in the next patch) while the governor is stopped and so must not be freed that early. Reviewed-and-tested-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-07-21 01:12:01 +02:00
Viresh Kumar	42994af63c	cpufreq: governor: rename cur_policy as policy Just call it 'policy', cur_policy is unnecessarily long and doesn't have any special meaning. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-07-17 23:46:48 +02:00
Viresh Kumar	386d46e6d5	cpufreq: governor: Name delayed-work as dwork Delayed work was named as 'work' and to access work within it we do work.work. Not much readable. Rename delayed_work as 'dwork'. Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-07-17 23:46:47 +02:00
Viresh Kumar	732b6d617a	cpufreq: governor: Serialize governor callbacks There are several races reported in cpufreq core around governors (only ondemand and conservative) by different people. There are at least two race scenarios present in governor code: (a) Concurrent access/updates of governor internal structures. It is possible that fields such as 'dbs_data->usage_count', etc. are accessed simultaneously for different policies using same governor structure (i.e. CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag unset). And because of this we can dereference bad pointers. For example consider a system with two CPUs with separate 'struct cpufreq_policy' instances. CPU0 governor: ondemand and CPU1: powersave. CPU0 switching to powersave and CPU1 to ondemand: CPU0 CPU1 store* store* cpufreq_governor_exit() cpufreq_governor_init() dbs_data = cdata->gdbs_data; if (!--dbs_data->usage_count) kfree(dbs_data); dbs_data->usage_count++; Bad pointer dereference There are other races possible between EXIT and START/STOP/LIMIT as well. It's really complicated. (b) Switching governor state in bad sequence: For example trying to switch a governor to START state, when the governor is in EXIT state. There are some checks present in __cpufreq_governor() but they aren't sufficient as they compare events against 'policy->governor_enabled', whereas we need to take governor's state into account, which can be used by multiple policies. These two issues need to be solved separately and the responsibility should be properly divided between cpufreq and governor core. The first problem is more about the governor core, as it needs to protect its structures properly. And the second problem should be fixed in cpufreq core instead of governor, as it's all about sequence of events. This patch is trying to solve only the first problem. There are two types of data we need to protect, - 'struct common_dbs_data': No matter what, there is going to be a single copy of this per governor. - 'struct dbs_data': With CPUFREQ_HAVE_GOVERNOR_PER_POLICY flag set, we will have per-policy copy of this data, otherwise a single copy. Because of such complexities, the mutex present in 'struct dbs_data' is insufficient to solve our problem. For example we need to protect fetching of 'dbs_data' from different structures at the beginning of cpufreq_governor_dbs(), to make sure it isn't currently being updated. This can be fixed if we can guarantee serialization of event parsing code for an individual governor. This is best solved with a mutex per governor, and the placeholder for that is 'struct common_dbs_data'. And so this patch moves the mutex from 'struct dbs_data' to 'struct common_dbs_data' and takes it at the beginning and drops it at the end of cpufreq_governor_dbs(). Tested with and without following configuration options: CONFIG_LOCKDEP_SUPPORT=y CONFIG_DEBUG_RT_MUTEXES=y CONFIG_DEBUG_PI_LIST=y CONFIG_DEBUG_SPINLOCK=y CONFIG_DEBUG_MUTEXES=y CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y CONFIG_LOCKDEP=y CONFIG_DEBUG_ATOMIC_SLEEP=y Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-06-15 15:42:53 +02:00
Viresh Kumar	8e0484d2b3	cpufreq: governor: register notifier from cs_init() Notifiers are required only for conservative governor and the common governor code is unnecessarily polluted with that. Handle that from cs_init/exit() instead of cpufreq_governor_dbs(). Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org> Reviewed-by: Preeti U Murthy <preeti@linux.vnet.ibm.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2015-06-15 15:37:12 +02:00
Xiaoguang Chen	6d7bcb1464	cpufreq: conservative: set requested_freq to policy max when it is over policy max When requested_freq is over policy->max, set it to policy->max. This can help to speed up decreasing frequency. Signed-off-by: Xiaoguang Chen <chenxg@marvell.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-11-12 23:18:20 +01:00
Xiaoguang Chen	3baa976ae6	cpufreq: conservative: fix requested_freq reduction issue When decreasing frequency, requested_freq may be less than freq_target, so requested_freq minus freq_target may be negative. But requested_freq is an unsigned int, so the negative result wraps around to a large integer that may be even higher than requested_freq. This patch fixes that issue: when the result would become negative, set requested_freq to the min value of the policy. Signed-off-by: Xiaoguang Chen <chenxg@marvell.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-11-07 19:36:19 +01:00
Stratos Karafotis	934dac1ea0	cpufreq: governors: Remove duplicate check of target freq in supported range Function __cpufreq_driver_target() checks if target_freq is within policy->min and policy->max range. generic_powersave_bias_target() also checks if target_freq is valid via a cpufreq_frequency_table_target() call. So, drop the unnecessary duplicate check in *_check_cpu(). Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>	2013-08-28 22:03:02 +02:00
Rafael J. Wysocki	c49a089c3e	Merge back earlier 'pm-cpufreq' material	2013-08-14 22:21:16 +02:00

134 Commits