There are two cases when we may try to add CPUs we're already handling:
- On boot, the first cpu has marked all policy->cpus managed and so we
will find policy for all other policy->cpus later on.
- When a managed cpu is hotplugged out and later brought back in.
Currently, separate paths and checks take care of the two. While the
first one is detected by testing cpu against 'policy->cpus', the other
one is detected by testing cpu against 'policy->related_cpus'.
We can handle them both via a single path and there is no need to do
special checking for the first one.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
[ rjw: Changelog, comments ]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Simply returning here with an error is not enough. It shouldn't be allowed at
all to try calling cpufreq_cpu_get() for an invalid CPU.
Add a WARN here to make it clear that it wouldn't be acceptable at all.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq_add_dev() is an unnecessary wrapper over __cpufreq_add_dev(). Merge
them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This clearly states what the code inside these routines is doing and how these
must be used.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
All CPUs leaving the first-online CPU are hotplugged out on suspend and
and cpufreq core stops managing them.
On resume, we need to call cpufreq_update_policy() for this CPU's policy
to make sure its frequency is in sync with cpufreq's cached value, as it
might have got updated by hardware during suspend/resume.
The policies are always added to the top of the policy-list. So, in
normal circumstances, CPU 0's policy will be the last one in the list.
And so the code checks for the last policy.
But there are cases where it will fail. Consider quad-core system, with
policy-per core. If CPU0 is hotplugged out and added back again, the
last policy will be on CPU1 :(
To fix this in a proper way, always look for the policy of the first
online CPU. That way we will be sure that we are calling
cpufreq_update_policy() for the only CPU that wasn't hotplugged out.
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
Fixes: 2f0aea9363 ("cpufreq: suspend governors on system suspend/hibernate")
Reported-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
To make code more readable and less error prone, lets create a helper macro for
iterating over all available governors.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
To make code more readable and less error prone, lets create a helper macro for
iterating over all active policies.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When cpufreq is disabled, the per-cpu variable would have been set to
NULL. Remove this unnecessary check.
[ Changelog from Saravana Kannan. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Acked-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In __cpufreq_remove_dev_finish(), per-cpu 'cpufreq_cpu_data' needs
to be cleared before calling kobject_put(&policy->kobj) and under
cpufreq_driver_lock. Otherwise, if someone else calls cpufreq_cpu_get()
in parallel with it, they can obtain a non-NULL policy from that after
kobject_put(&policy->kobj) was executed.
Consider this case:
Thread A Thread B
cpufreq_cpu_get()
acquire cpufreq_driver_lock
read-per-cpu cpufreq_cpu_data
kobject_put(&policy->kobj);
kobject_get(&policy->kobj);
...
per_cpu(&cpufreq_cpu_data, cpu) = NULL
And this will result in a warning like this one:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 4 at include/linux/kref.h:47
kobject_get+0x41/0x50()
Modules linked in: acpi_cpufreq(+) nfsd auth_rpcgss nfs_acl
lockd grace sunrpc xfs libcrc32c sd_mod ixgbe igb mdio ahci hwmon
...
Call Trace:
[<ffffffff81661b14>] dump_stack+0x46/0x58
[<ffffffff81072b61>] warn_slowpath_common+0x81/0xa0
[<ffffffff81072c7a>] warn_slowpath_null+0x1a/0x20
[<ffffffff812e16d1>] kobject_get+0x41/0x50
[<ffffffff815262a5>] cpufreq_cpu_get+0x75/0xc0
[<ffffffff81527c3e>] cpufreq_update_policy+0x2e/0x1f0
[<ffffffff810b8cb2>] ? up+0x32/0x50
[<ffffffff81381aa9>] ? acpi_ns_get_node+0xcb/0xf2
[<ffffffff81381efd>] ? acpi_evaluate_object+0x22c/0x252
[<ffffffff813824f6>] ? acpi_get_handle+0x95/0xc0
[<ffffffff81360967>] ? acpi_has_method+0x25/0x40
[<ffffffff81391e08>] acpi_processor_ppc_has_changed+0x77/0x82
[<ffffffff81089566>] ? move_linked_works+0x66/0x90
[<ffffffff8138e8ed>] acpi_processor_notify+0x58/0xe7
[<ffffffff8137410c>] acpi_ev_notify_dispatch+0x44/0x5c
[<ffffffff8135f293>] acpi_os_execute_deferred+0x15/0x22
[<ffffffff8108c910>] process_one_work+0x160/0x410
[<ffffffff8108d05b>] worker_thread+0x11b/0x520
[<ffffffff8108cf40>] ? rescuer_thread+0x380/0x380
[<ffffffff81092421>] kthread+0xe1/0x100
[<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
[<ffffffff81669ebc>] ret_from_fork+0x7c/0xb0
[<ffffffff81092340>] ? kthread_create_on_node+0x1b0/0x1b0
---[ end trace 89e66eb9795efdf7 ]---
The actual code flow is as follows:
Thread A: Workqueue: kacpi_notify
acpi_processor_notify()
acpi_processor_ppc_has_changed()
cpufreq_update_policy()
cpufreq_cpu_get()
kobject_get()
Thread B: xenbus_thread()
xenbus_thread()
msg->u.watch.handle->callback()
handle_vcpu_hotplug_event()
vcpu_hotplug()
cpu_down()
__cpu_notify(CPU_POST_DEAD..)
cpufreq_cpu_callback()
__cpufreq_remove_dev_finish()
cpufreq_policy_put_kobj()
kobject_put()
cpufreq_cpu_get() gets the policy from per-cpu variable cpufreq_cpu_data
under cpufreq_driver_lock, and once it gets a valid policy it expects it
to not be freed until cpufreq_cpu_put() is called.
But the race happens when another thread puts the kobject first and updates
cpufreq_cpu_data before or later. And so the first thread gets a valid policy
structure and before it does kobject_get() on it, the second one has already
done kobject_put().
Fix this by setting cpufreq_cpu_data to NULL before putting the kobject and that
too under locks.
Reported-by: Ethan Zhao <ethan.zhao@oracle.com>
Reported-by: Santosh Shilimkar <santosh.shilimkar@oracle.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CPUFREQ_UPDATE_POLICY_CPU notifications were used only from cpufreq-stats which
doesn't use it anymore. Remove them.
This also decrements values of other notification macros defined after
CPUFREQ_UPDATE_POLICY_CPU by 1 to remove gaps. Hopefully all users are using
macro's instead of direct numbers and so they wouldn't break as macro values are
changed now.
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
'last_cpu' was used only from cpufreq-stats and isn't used anymore. Get rid of
it.
Reviewed-by: Prarit Bhargava <prarit@redhat.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We need to initialize completion and work only on policy allocation and not
really on the policy restore side and so we better move this piece of code to
cpufreq_policy_alloc().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CPUFREQ_STICKY flag is set by drivers which don't want to get unregistered
even if cpufreq-core isn't able to initialize policy for any CPU.
When this flag isn't set, we try to unregister the driver. To find out
which CPUs are registered and which are not, we try to check per_cpu
cpufreq_cpu_data for all CPUs. Because we have a list of valid policies
available now, we better check if the list is empty or not instead of
the 'for' loop. That will be much more efficient.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
These variables are just used within adjust_jiffies() and so must be
local to it. Also there is no need of a dummy routine for CONFIG_SMP
case as we can take care of all that with help of macros in the same
routine. It doesn't look that ugly.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We just need to check if a 'policy' is already present for the cpu we are
adding. We don't need to take all the locks and do kobject usage updates. Use
the light-weight cpufreq_cpu_get_raw() routine instead.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is no need of this separate variable, use 'policy' instead.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
These are messing up more than the benefit they provide. It isn't
a lot of code anyway, that we will compile without them.
Kill them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We should first check if a cpufreq driver is already registered or not
before updating driver_data->flags.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is no point finding out the 'policy' again within __cpufreq_get()
when all the callers already have it. Just make them pass policy instead.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There is no point finding out the 'policy' again within cpufreq_out_of_sync()
when all the callers already have it. Just make them pass policy instead.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Either we can be setpolicy or target type, nothing else. And so the
else part of setpolicy will automatically be of has_target() type.
And so we don't need to check it again.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove unnecessary from find_governor's name.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are two 'if' blocks here, checking for !cpufreq_driver->setpolicy and
has_target(). Both are actually doing the same thing, merge them.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
No need of an unnecessary line break.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We can live without it and so we should.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
It doesn't make any sense at all and is a leftover of some earlier commit.
Remove it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We should stop cpufreq governors when we shut down the system. If we
don't do this, we can end up with this deadlock:
1. cpufreq governor may be running on a CPU other than CPU0.
2. In machine_restart() we call smp_send_stop() which stops CPUs.
If one of these CPUs was actively running a cpufreq governor
then it may have the mutex / spinlock needed to access the main
PMIC in the system (perhaps over I2C)
3. If a machine needs access to the main PMIC in order to shutdown
then it will never get it since the mutex was lost when the other
CPU stopped.
4. We'll hang (possibly eventually hitting the hard lockup detector).
Let's avoid the problem by stopping the cpufreq governor at shutdown,
which is a sensible thing to do anyway.
Signed-off-by: Doug Anderson <dianders@chromium.org>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently there is no callback for cpufreq drivers which is called once the
policy is ready to be used. There are some requirements where such a callback is
required.
One of them is registering a cooling device with the help of
of_cpufreq_cooling_register(). This routine tries to get 'struct cpufreq_policy'
for CPUs which isn't yet initialed at the time ->init() is called and so we face
issues while registering the cooling device.
Because we can't register cooling device from ->init(), we need a callback that
is called after the policy is ready to be used and hence we introduce ->ready()
callback.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Eduardo Valentin <edubezval@gmail.com>
Tested-by: Eduardo Valentin <edubezval@gmail.com>
Reviewed-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Do it before it's assigned to cpufreq_cpu_data, otherwise when a driver
tries to get the cpu frequency during initialization the policy kobj is
referenced and we get this warning:
------------[ cut here ]------------
WARNING: CPU: 1 PID: 64 at include/linux/kref.h:47 kobject_get+0x64/0x70()
Modules linked in:
CPU: 1 PID: 64 Comm: irq/77-tegra-ac Not tainted 3.18.0-rc4-next-20141114ccu-00050-g3eff942 #326
[<c0016fac>] (unwind_backtrace) from [<c001272c>] (show_stack+0x10/0x14)
[<c001272c>] (show_stack) from [<c06085d8>] (dump_stack+0x98/0xd8)
[<c06085d8>] (dump_stack) from [<c002892c>] (warn_slowpath_common+0x84/0xb4)
[<c002892c>] (warn_slowpath_common) from [<c00289f8>] (warn_slowpath_null+0x1c/0x24)
[<c00289f8>] (warn_slowpath_null) from [<c0220290>] (kobject_get+0x64/0x70)
[<c0220290>] (kobject_get) from [<c03e944c>] (cpufreq_cpu_get+0x88/0xc8)
[<c03e944c>] (cpufreq_cpu_get) from [<c03e9500>] (cpufreq_get+0xc/0x64)
[<c03e9500>] (cpufreq_get) from [<c0285288>] (actmon_thread_isr+0x134/0x198)
[<c0285288>] (actmon_thread_isr) from [<c0069008>] (irq_thread_fn+0x1c/0x40)
[<c0069008>] (irq_thread_fn) from [<c0069324>] (irq_thread+0x134/0x174)
[<c0069324>] (irq_thread) from [<c0040290>] (kthread+0xdc/0xf4)
[<c0040290>] (kthread) from [<c000f4b8>] (ret_from_fork+0x14/0x3c)
---[ end trace b7bd64a81b340c59 ]---
Signed-off-by: Tomeu Vizoso <tomeu.vizoso@collabora.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When the user space tries to set scaling_(max|min)_freq through
sysfs, the cpufreq_set_policy() asks other driver's opinions
for the max/min frequencies. Some device drivers, like Tegra
CPU EDP which is not upstreamed yet though, may constrain the
CPU maximum frequency dynamically because of board design.
So if the user space access happens and some driver is capping
the cpu frequency at the same time, the user_policy->(max|min)
is overridden by the capped value, and that's not expected by
the user space. And if the user space is not invoked again,
the CPU will always be capped by the user_policy->(max|min)
even no drivers limit the CPU frequency any more.
This patch preserves the user specified min/max settings, so that
every time the cpufreq policy is updated, the new max/min can
be re-evaluated correctly based on the user's expection and
the present device drivers' status.
Signed-off-by: Vince Hsu <vinceh@nvidia.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When resuming from s2ram on an SMP system without cpufreq operating
points (e.g. there's no "operating-points" property for the CPU node in
DT, or the platform doesn't use DT yet), the kernel crashes when
bringing CPU 1 online:
Enabling non-boot CPUs ...
CPU1: Booted secondary processor
Unable to handle kernel NULL pointer dereference at virtual address 0000003c
pgd = ee5e6b00
[0000003c] *pgd=6e579003, *pmd=6e588003, *pte=00000000
Internal error: Oops: a07 [#1] SMP ARM
Modules linked in:
CPU: 0 PID: 1246 Comm: s2ram Tainted: G W 3.18.0-rc3-koelsch-01614-g0377af242bb175c8-dirty #589
task: eeec5240 ti: ee704000 task.ti: ee704000
PC is at __cpufreq_add_dev.isra.24+0x24c/0x77c
LR is at __cpufreq_add_dev.isra.24+0x244/0x77c
pc : [<c0298efc>] lr : [<c0298ef4>] psr: 60000153
sp : ee705d48 ip : ee705d48 fp : ee705d84
r10: c04e0450 r9 : 00000000 r8 : 00000001
r7 : c05426a8 r6 : 00000001 r5 : 00000001 r4 : 00000000
r3 : 00000000 r2 : 00000000 r1 : 20000153 r0 : c0542734
Verify that policy is not NULL before dereferencing it to fix this.
Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
Fixes: 8414809c6a (cpufreq: Preserve policy structure across suspend/resume)
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently the core does not expose scaling_cur_freq for set_policy()
drivers this breaks some userspace monitoring tools.
Change the core to expose this file for all drivers and if the
set_policy() driver supports the get() callback use it to retrieve the
current frequency.
Link: https://bugzilla.kernel.org/show_bug.cgi?id=73741
Cc: All applicable <stable@vger.kernel.org>
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This commit extends the cpufreq_driver structure with an additional
'void *driver_data' field that can be filled by the ->probe() function
of a cpufreq driver to pass additional custom information to the
driver itself.
A new function called cpufreq_get_driver_data() is added to allow a
cpufreq driver to retrieve those driver data, since they are typically
needed from a cpufreq_policy->init() callback, which does not have
access to the cpufreq_driver structure. This function call is similar
to the existing cpufreq_get_current_driver() function call.
Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 8e30444e15 ("cpufreq: fix cpufreq suspend/resume for intel_pstate")
introduced a bug where the governors wouldn't be stopped anymore for
->target{_index}() drivers during suspend. This happens because
'cpufreq_suspended' is updated before stopping the governors during suspend
and due to this __cpufreq_governor() would return early due to this check:
/* Don't start any governor operations if we are entering suspend */
if (cpufreq_suspended)
return 0;
Fixes: 8e30444e15 ("cpufreq: fix cpufreq suspend/resume for intel_pstate")
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+: 8e30444e15 "cpufreq: fix cpufreq suspend/resume for intel_pstate"
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The kernel used to contain two functions for length-delimited,
case-insensitive string comparison, strnicmp with correct semantics
and a slightly buggy strncasecmp. The latter is the POSIX name, so
strnicmp was renamed to strncasecmp, and strnicmp made into a wrapper
for the new strncasecmp to avoid breaking existing users.
To allow the compat wrapper strnicmp to be removed at some point in
the future, and to avoid the extra indirection cost, do
s/strnicmp/strncasecmp/g.
Signed-off-by: Rasmus Villemoes <linux@rasmusvillemoes.dk>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 367dc4aa93 ("cpufreq: Add stop CPU callback to
cpufreq_driver interface") introduced the stop CPU callback for
intel_pstate drivers. During the CPU_DOWN_PREPARE stage, this
callback is invoked so that drivers can take some action on the
pstate of the cpu before it is taken offline. This callback was
assumed to be useful only for those drivers which have implemented
the set_policy CPU callback because they have no other way to take
action about the cpufreq of a CPU which is being hotplugged out
except in the exit callback which is called very late in the offline
process.
The drivers which implement the target/target_index callbacks were
expected to take care of requirements like the ones that commit
367dc4aa addresses in the GOV_STOP notification event. But there
are disadvantages to restricting the usage of stop CPU callback
to cpufreq drivers that implement the set_policy callbacks and who
want to take explicit action on the setting the cpufreq during a
hotplug operation.
1.GOV_STOP gets called for every CPU offline and drivers would usually
want to take action when the last cpu in the policy->cpus mask
is taken offline. As long as there is more than one cpu in the
policy->cpus mask, cpufreq core itself makes sure that the freq
for the other cpus in this mask is set according to the maximum load.
This is sensible and drivers which implement the target_index callback
would mostly not want to modify that. However the cpufreq core leaves a
loose end when the cpu in the policy->cpus mask is the last one to go offline;
it does nothing explicit to the frequency of the core. Drivers may need
a way to take some action here and stop CPU callback mechanism is the
best way to do it today.
2. We cannot implement driver specific actions in the GOV_STOP mechanism.
So we will need another driver callback which is invoked from here which is
unnecessary.
Therefore this patch extends the usage of stop CPU callback to be used
by all cpufreq drivers as long as they have this callback implemented
and irrespective of whether they are set_policy/target_index drivers.
The assumption is if the drivers find the GOV_STOP path to be a suitable
way of implementing what they want to do with the freq of the cpu
going offine,they will not implement the stop CPU callback at all.
Signed-off-by: Preeti U Murthy <preeti@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
While debugging a cpufreq-related hardware failure on a system I saw the
following lockdep warning:
=========================
[ BUG: held lock freed! ] 3.17.0-rc4+ #1 Tainted: G E
-------------------------
insmod/2247 is freeing memory ffff88006e1b1400-ffff88006e1b17ff, with a lock still held there!
(&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80
3 locks held by insmod/2247:
#0: (subsys mutex#5){+.+.+.}, at: [<ffffffff81485579>] subsys_interface_register+0x69/0x120
#1: (cpufreq_rwsem){.+.+.+}, at: [<ffffffff8156cf73>] __cpufreq_add_dev.isra.21+0x73/0xb80
#2: (&policy->rwsem){+.+...}, at: [<ffffffff8156d37d>] __cpufreq_add_dev.isra.21+0x47d/0xb80
stack backtrace:
CPU: 0 PID: 2247 Comm: insmod Tainted: G E 3.17.0-rc4+ #1
Hardware name: HP ProLiant MicroServer Gen8, BIOS J06 08/24/2013
0000000000000000 000000008f3063c4 ffff88006f87bb30 ffffffff8171b358
ffff88006bcf3750 ffff88006f87bb68 ffffffff810e09e1 ffff88006e1b1400
ffffea0001b86c00 ffffffff8156d327 ffff880073003500 0000000000000246
Call Trace:
[<ffffffff8171b358>] dump_stack+0x4d/0x66
[<ffffffff810e09e1>] debug_check_no_locks_freed+0x171/0x180
[<ffffffff8156d327>] ? __cpufreq_add_dev.isra.21+0x427/0xb80
[<ffffffff8121412b>] kfree+0xab/0x2b0
[<ffffffff8156d327>] __cpufreq_add_dev.isra.21+0x427/0xb80
[<ffffffff81724cf7>] ? _raw_spin_unlock+0x27/0x40
[<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
[<ffffffff8156da8e>] cpufreq_add_dev+0xe/0x10
[<ffffffff814855d1>] subsys_interface_register+0xc1/0x120
[<ffffffff8156bcf2>] cpufreq_register_driver+0x112/0x340
[<ffffffff8121415a>] ? kfree+0xda/0x2b0
[<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
[<ffffffffa003562e>] pcc_cpufreq_init+0x4af/0xe81 [pcc_cpufreq]
[<ffffffffa003517f>] ? pcc_cpufreq_do_osc+0x17f/0x17f [pcc_cpufreq]
[<ffffffff81002144>] do_one_initcall+0xd4/0x210
[<ffffffff811f7472>] ? __vunmap+0xd2/0x120
[<ffffffff81127155>] load_module+0x1315/0x1b70
[<ffffffff811222a0>] ? store_uevent+0x70/0x70
[<ffffffff811229d9>] ? copy_module_from_fd.isra.44+0x129/0x180
[<ffffffff81127b86>] SyS_finit_module+0xa6/0xd0
[<ffffffff81725b69>] system_call_fastpath+0x16/0x1b
cpufreq: __cpufreq_add_dev: ->get() failed
insmod: ERROR: could not insert module pcc-cpufreq.ko: No such device
The warning occurs in the __cpufreq_add_dev() code which does
down_write(&policy->rwsem);
...
if (cpufreq_driver->get && !cpufreq_driver->setpolicy) {
policy->cur = cpufreq_driver->get(policy->cpu);
if (!policy->cur) {
pr_err("%s: ->get() failed\n", __func__);
goto err_get_freq;
}
If cpufreq_driver->get(policy->cpu) returns an error we execute the
code at err_get_freq, which does not up the policy->rwsem. This causes
the lockdep warning.
Trivial patch to up the policy->rwsem in the error path.
After the patch has been applied, and an error occurs in the
cpufreq_driver->get(policy->cpu) call we will now see
cpufreq: __cpufreq_add_dev: ->get() failed
cpufreq: __cpufreq_add_dev: ->get() failed
modprobe: ERROR: could not insert 'pcc_cpufreq': No such device
Fixes: 4e97b631f2 (cpufreq: Initialize governor for a new policy under policy->rwsem)
Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.14+ <stable@vger.kernel.org> # 3.14+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Cpufreq core introduces cpufreq_suspended flag to let cpufreq sysfs nodes
across S2RAM/S2DISK. But the flag is only set in the cpufreq_suspend()
for cpufreq drivers which have target or target_index callback. This
skips intel_pstate driver. This patch is to set the flag before checking
target or target_index callback.
Fixes: 2f0aea9363 (cpufreq: suspend governors on system suspend/hibernate)
Signed-off-by: Lan Tianyu <tianyu.lan@intel.com>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
[rjw: Subject]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We are calling kobject_move() from two separate places currently and both these
places share another routine update_policy_cpu() which is handling everything
around updating policy->cpu. Moving ownership of policy->kobj also lies under
the role of update_policy_cpu() routine and must be handled from there.
So, Lets move kobject_move() to update_policy_cpu() and get rid of
cpufreq_nominate_new_policy_cpu() as it doesn't have anything significant left.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We are returning -EINVAL instead of the error returned from kobject_move() when
it fails. Propagate the actual error number.
Also add a meaningful print when sysfs_create_link() fails.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
While hot-unplugging policy->cpu, we call cpufreq_nominate_new_policy_cpu() to
nominate next owner of policy, i.e. policy->cpu. If we fail to move policy
kobject under the new policy->cpu, we try to update policy->cpus with the old
policy->cpu.
This would have been required in case old-CPU is removed from policy->cpus in
the first place. But its not done before calling
cpufreq_nominate_new_policy_cpu(), but during the POST_DEAD notification which
happens quite late in the hot-unplugging path.
So, this is just some useless code hanging around, get rid of it.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This is only relevant to implementations with multiple clusters, where clusters
have separate clock lines but all CPUs within a cluster share it.
Consider a dual cluster platform with 2 cores per cluster. During suspend we
start hot unplugging CPUs in order 1 to 3. When CPU2 is removed, policy->kobj
would be moved to CPU3 and when CPU3 goes down we wouldn't free policy or its
kobj as we want to retain permissions/values/etc.
Now on resume, we will get CPU2 before CPU3 and will call __cpufreq_add_dev().
We will recover the old policy and update policy->cpu from 3 to 2 from
update_policy_cpu().
But the kobj is still tied to CPU3 and isn't moved to CPU2. We wouldn't create a
link for CPU2, but would try that for CPU3 while bringing it online. Which will
report errors as CPU3 already has kobj assigned to it.
This bug got introduced with commit 42f921a, which overlooked this scenario.
To fix this, lets move kobj to the new policy->cpu while bringing first CPU of a
cluster back. Also do a WARN_ON() if kobject_move failed, as we would reach here
only for the first CPU of a non-boot cluster. And we can't recover from this
situation, if kobject_move() fails.
Fixes: 42f921a6f1 (cpufreq: remove sysfs files for CPUs which failed to come back after resume)
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Reported-and-tested-by: Bu Yitian <ybu@qti.qualcomm.com>
Reported-by: Saravana Kannan <skannan@codeaurora.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa@mit.edu>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit bd0fa9bb45 introduced a failure path to cpufreq_update_policy() if
cpufreq_driver->get(cpu) returns NULL. However, it jumps to the 'no_policy'
label, which exits without unlocking any of the locks the function acquired
earlier. This causes later calls into cpufreq to hang.
Fix this by creating a new 'unlock' label and jumping to that instead.
Fixes: bd0fa9bb45 ("cpufreq: Return error if ->get() failed in cpufreq_update_policy()")
Link: https://devtalk.nvidia.com/default/topic/751903/kernel-3-15-and-nv-drivers-337-340-failed-to-initialize-the-nvidia-kernel-module-gtx-550-ti-/
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Cc: 3.15+ <stable@vger.kernel.org> # 3.15+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Douglas Anderson, recently pointed out an interesting problem due to which
udelay() was expiring earlier than it should.
While transitioning between frequencies few platforms may temporarily switch to
a stable frequency, waiting for the main PLL to stabilize.
For example: When we transition between very low frequencies on exynos, like
between 200MHz and 300MHz, we may temporarily switch to a PLL running at 800MHz.
No CPUFREQ notification is sent for that. That means there's a period of time
when we're running at 800MHz but loops_per_jiffy is calibrated at between 200MHz
and 300MHz. And so udelay behaves badly.
To get this fixed in a generic way, introduce another set of callbacks
get_intermediate() and target_intermediate(), only for drivers with
target_index() and CPUFREQ_ASYNC_NOTIFICATION unset.
get_intermediate() should return a stable intermediate frequency platform wants
to switch to, and target_intermediate() should set CPU to that frequency,
before jumping to the frequency corresponding to 'index'. Core will take care of
sending notifications and driver doesn't have to handle them in
target_intermediate() or target_index().
NOTE: ->target_index() should restore to policy->restore_freq in case of
failures as core would send notifications for that.
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Handling calls to ->target_index() has got complex over time and might become
more complex. So, its better to take target_index() bits out in another routine
__target_index() for better code readability. Shouldn't have any functional
impact.
Tested-by: Stephen Warren <swarren@nvidia.com>
Reviewed-by: Doug Anderson <dianders@chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
On platforms that use cpufreq_for_each_* macros, build fails if
CONFIG_CPU_FREQ=n, e.g. ARM/shmobile/koelsch/non-multiplatform:
drivers/built-in.o: In function `clk_round_parent':
clkdev.c:(.text+0xcf168): undefined reference to `cpufreq_next_valid'
drivers/built-in.o: In function `clk_rate_table_find':
clkdev.c:(.text+0xcf820): undefined reference to `cpufreq_next_valid'
make[3]: *** [vmlinux] Error 1
Fix this making cpufreq_next_valid function inline and move it to
cpufreq.h.
Fixes: 27e289dce2 (cpufreq: Introduce macros for cpufreq_frequency_table iteration)
Reported-and-tested-by: Geert Uytterhoeven <geert@linux-m68k.org>
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Some cpufreq drivers were redundantly invoking the _begin() and _end()
APIs around frequency transitions, and this double invocation (one from
the cpufreq core and the other from the cpufreq driver) used to result
in a self-deadlock, leading to system hangs during boot. (The _begin()
API makes contending callers wait until the previous invocation is
complete. Hence, the cpufreq driver would end up waiting on itself!).
Now all such drivers have been fixed, but debugging this issue was not
very straight-forward (even lockdep didn't catch this). So let us add a
debug infrastructure to the cpufreq core to catch such issues more easily
in the future.
We add a new field called 'transition_task' to the policy structure, to keep
track of the task which is performing the frequency transition. Using this
field, we make note of this task during _begin() and print a warning if we
find a case where the same task is calling _begin() again, before completing
the previous frequency transition using the corresponding _end().
We have left out ASYNC_NOTIFICATION drivers from this debug infrastructure
for 2 reasons:
1. At the moment, we have no way to avoid a particular scenario where this
debug infrastructure can emit false-positive warnings for such drivers.
The scenario is depicted below:
Task A Task B
/* 1st freq transition */
Invoke _begin() {
...
...
}
Change the frequency
/* 2nd freq transition */
Invoke _begin() {
... //waiting for B to
... //finish _end() for
... //the 1st transition
... | Got interrupt for successful
... | change of frequency (1st one).
... |
... | /* 1st freq transition */
... | Invoke _end() {
... | ...
... V }
...
...
}
This scenario is actually deadlock-free because, once Task A changes the
frequency, it is Task B's responsibility to invoke the corresponding
_end() for the 1st frequency transition. Hence it is perfectly legal for
Task A to go ahead and attempt another frequency transition in the meantime.
(Of course it won't be able to proceed until Task B finishes the 1st _end(),
but this doesn't cause a deadlock or a hang).
The debug infrastructure cannot handle this scenario and will treat it as
a deadlock and print a warning. To avoid this, we exclude such drivers
from the purview of this code.
2. Luckily, we don't _need_ this infrastructure for ASYNC_NOTIFICATION drivers
at all! The cpufreq core does not automatically invoke the _begin() and
_end() APIs during frequency transitions in such drivers. Thus, the driver
alone is responsible for invoking _begin()/_end() and hence there shouldn't
be any conflicts which lead to double invocations. So, we can skip these
drivers, since the probability that such drivers will hit this problem is
extremely low, as outlined above.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Many cpufreq drivers need to iterate over the cpufreq_frequency_table
for various tasks.
This patch introduces two macros which can be used for iteration over
cpufreq_frequency_table keeping a common coding style across drivers:
- cpufreq_for_each_entry: iterate over each entry of the table
- cpufreq_for_each_valid_entry: iterate over each entry that contains
a valid frequency.
It should have no functional changes.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Lad, Prabhakar <prabhakar.csengg@gmail.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq_notify_transition() and cpufreq_notify_post_transition() shouldn't be
called directly by cpufreq drivers anymore and so these should be marked static.
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CPUFreq core has new infrastructure that would guarantee serialized calls to
target() or target_index() callbacks. These are called
cpufreq_freq_transition_begin() and cpufreq_freq_transition_end().
This patch converts existing drivers to use these new set of routines.
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Whenever we change the frequency of a CPU, we call the PRECHANGE and POSTCHANGE
notifiers. They must be serialized, i.e. PRECHANGE and POSTCHANGE notifiers
should strictly alternate, thereby preventing two different sets of PRECHANGE or
POSTCHANGE notifiers from interleaving arbitrarily.
The following examples illustrate why this is important:
Scenario 1:
-----------
A thread reading the value of cpuinfo_cur_freq, will call
__cpufreq_cpu_get()->cpufreq_out_of_sync()->cpufreq_notify_transition()
The ondemand governor can decide to change the frequency of the CPU at the same
time and hence it can end up sending the notifications via ->target().
If the notifiers are not serialized, the following sequence can occur:
- PRECHANGE Notification for freq A (from cpuinfo_cur_freq)
- PRECHANGE Notification for freq B (from target())
- Freq changed by target() to B
- POSTCHANGE Notification for freq B
- POSTCHANGE Notification for freq A
We can see from the above that the last POSTCHANGE Notification happens for freq
A but the hardware is set to run at freq B.
Where would we break then?: adjust_jiffies() in cpufreq.c & cpufreq_callback()
in arch/arm/kernel/smp.c (which also adjusts the jiffies). All the
loops_per_jiffy calculations will get messed up.
Scenario 2:
-----------
The governor calls __cpufreq_driver_target() to change the frequency. At the
same time, if we change scaling_{min|max}_freq from sysfs, it will end up
calling the governor's CPUFREQ_GOV_LIMITS notification, which will also call
__cpufreq_driver_target(). And hence we end up issuing concurrent calls to
->target().
Typically, platforms have the following logic in their ->target() routines:
(Eg: cpufreq-cpu0, omap, exynos, etc)
A. If new freq is more than old: Increase voltage
B. Change freq
C. If new freq is less than old: decrease voltage
Now, if the two concurrent calls to ->target() are X and Y, where X is trying to
increase the freq and Y is trying to decrease it, we get the following race
condition:
X.A: voltage gets increased for larger freq
Y.A: nothing happens
Y.B: freq gets decreased
Y.C: voltage gets decreased
X.B: freq gets increased
X.C: nothing happens
Thus we can end up setting a freq which is not supported by the voltage we have
set. That will probably make the clock to the CPU unstable and the system might
not work properly anymore.
This patch introduces a set of synchronization primitives to serialize frequency
transitions, which are to be used as shown below:
cpufreq_freq_transition_begin();
//Perform the frequency change
cpufreq_freq_transition_end();
The _begin() call sends the PRECHANGE notification whereas the _end() call sends
the POSTCHANGE notification. Also, all the necessary synchronization is handled
within these calls. In particular, even drivers which set the ASYNC_NOTIFICATION
flag can also use these APIs for performing frequency transitions (ie., you can
call _begin() from one task, and call the corresponding _end() from a different
task).
The actual synchronization underneath is not that complicated:
The key challenge is to allow drivers to begin the transition from one thread
and end it in a completely different thread (this is to enable drivers that do
asynchronous POSTCHANGE notification from bottom-halves, to also use the same
interface).
To achieve this, a 'transition_ongoing' flag, a 'transition_lock' spinlock and a
wait-queue are added per-policy. The flag and the wait-queue are used in
conjunction to create an "uninterrupted flow" from _begin() to _end(). The
spinlock is used to ensure that only one such "flow" is in flight at any given
time. Put together, this provides us all the necessary synchronization.
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
During suspend, we first stop governors and then suspend cpufreq drivers and
resume must be exactly opposite of that. i.e. resume drivers first and then
start governors.
But the current code in resume enables governors first and then resume drivers.
Fix it be changing code sequence there.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This callback allows the driver to do clean up before the CPU is
completely down and its state cannot be modified. This is used
by the intel_pstate driver to reduce the requested P state prior to
the core going away. This is required because the requested P state
of the offline core is used to select the package P state. This
effectively sets the floor package P state to the requested P state on
the offline core.
Signed-off-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
[rjw: Minor modifications]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Remove unnecessary braces from a single statement.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Fix 2 checkpatch errors about using assignment in if condition,
1 checkpatch error about a required space after comma
and 3 warnings about line over 80 characters.
Signed-off-by: Stratos Karafotis <stratosk@semaphore.gr>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Two cpufreq notifiers CPUFREQ_RESUMECHANGE and CPUFREQ_SUSPENDCHANGE have
not been used for some time, so remove them to clean up code a bit.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq drivers that provide the ->setpolicy() callback are supposed
to have integrated governors, so they don't need to set ->target()
or ->target_index() and may confuse the core if any of these callbacks
is present.
For this reason, add a check preventing ->setpolicy cpufreq drivers
from registering if they have non-NULL ->target or ->target_index.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
After commit da60ce9f2f (cpufreq: call cpufreq_driver->get() after
calling ->init()) __cpufreq_add_dev() sometimes fails for CPUs handled
by intel_pstate, because that driver may return 0 from its ->get()
callback if it has not run long enough to collect enough samples on the
given CPU. That didn't happen before commit da60ce9f2f which added
policy->cur initialization to __cpufreq_add_dev() to help reduce code
duplication in other cpufreq drivers.
However, the code added by commit da60ce9f2f need not be executed
for cpufreq drivers having the ->setpolicy callback defined, because
the subsequent invocation of cpufreq_set_policy() will use that
callback to initialize the policy anyway and it doesn't need
policy->cur to be initialized upfront. The analogous code in
cpufreq_update_policy() is also unnecessary for cpufreq drivers
having ->setpolicy set and may be skipped for them as well.
Since intel_pstate provides ->setpolicy, skipping the upfront
policy->cur initialization for cpufreq drivers with that callback
set will cover intel_pstate and the problem it's been having after
commit da60ce9f2f will be addressed.
Fixes: da60ce9f2f (cpufreq: call cpufreq_driver->get() after calling ->init())
References: https://bugzilla.kernel.org/show_bug.cgi?id=71931
Reported-and-tested-by: Patrik Lundquist <patrik.lundquist@gmail.com>
Acked-by: Dirk Brandewie <dirk.j.brandewie@intel.com>
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We have used 'frozen' variable/function parameter at many places to
distinguish between CPU offline/online on suspend/resume vs sysfs
removals. We now have another variable cpufreq_suspended which can
be used in these cases, so we can get rid of all those variables or
function parameters.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
freq table is not per CPU but per policy, so it makes more sense to
keep it within struct cpufreq_policy instead of a per-cpu variable.
This patch does it. Over that, there is no need to set policy->freq_table
to NULL in ->exit(), as policy structure is going to be freed soon.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Multiple platforms need to set CPUs to a particular frequency before
suspending the system, so provide a common infrastructure for them.
Those platforms only need to point their ->suspend callback pointers
to the generic routine.
Tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch adds cpufreq suspend/resume calls to dpm_{suspend|resume}()
for handling suspend/resume of cpufreq governors.
Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found an issue where the
tunables configuration for clusters/sockets with non-boot CPUs was
lost after system suspend/resume, as we were notifying governors with
CPUFREQ_GOV_POLICY_EXIT on removal of the last CPU for that policy
which caused the tunables memory to be freed.
This is fixed by preventing any governor operations from being
carried out between the device suspend and device resume stages of
system suspend and resume, respectively.
We could have added these callbacks at dpm_{suspend|resume}_noirq()
level, but there is an additional problem that the majority of I/O
devices is already suspended at that point and if cpufreq drivers
want to change the frequency before suspending, then that not be
possible on some platforms (which depend on peripherals like i2c,
regulators, etc).
Reported-and-tested-by: Lan Tianyu <tianyu.lan@intel.com>
Reported-by: Jinhyuk Choi <jinchoi@broadcom.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We call __find_governor() during the addition of the first CPU of
each policy from __cpufreq_add_dev() to find the last governor used
for this CPU before it was hot-removed.
After that we call cpufreq_parse_governor() in cpufreq_init_policy(),
either with this governor, or with the default governor. Right after
that policy->governor is set to NULL.
While that code is not functionally problematic, the structure of it
is suboptimal, because some of the code required in cpufreq_init_policy()
is being executed by its caller, __cpufreq_add_dev(). So, it would make
more sense to get all of it together in a single place to make code more
readable.
Accordingly, move the code needed for policy initialization to
cpufreq_init_policy() and initialize policy->governor to NULL at the
beginning.
In order to clean up the code a bit more, some of the #ifdefs for
CONFIG_HOTPLUG_CPU are dropped too.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
policy->rwsem is used to lock access to all parts of code modifying
struct cpufreq_policy, but it's not used on a new policy created by
__cpufreq_add_dev().
Because of that, if cpufreq_update_policy() is called in a tight loop
on one CPU in parallel with offline/online of another CPU, then the
following crash can be triggered:
Unable to handle kernel NULL pointer dereference at virtual address 00000020
pgd = c0003000
[00000020] *pgd=80000000004003, *pmd=00000000
Internal error: Oops: 206 [#1] PREEMPT SMP ARM
PC is at __cpufreq_governor+0x10/0x1ac
LR is at cpufreq_update_policy+0x114/0x150
---[ end trace f23a8defea6cd706 ]---
Kernel panic - not syncing: Fatal exception
CPU0: stopping
CPU: 0 PID: 7136 Comm: mpdecision Tainted: G D W 3.10.0-gd727407-00074-g979ede8 #396
[<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58)
[<c02a23ac>] (__blocking_notifier_call_chain+0x40/0x58) from [<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c)
[<c02a23d8>] (blocking_notifier_call_chain+0x14/0x1c) from [<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8)
[<c0803c68>] (cpufreq_set_policy+0xd4/0x2b8) from [<c0803e7c>] (cpufreq_init_policy+0x30/0x98)
[<c0803e7c>] (cpufreq_init_policy+0x30/0x98) from [<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4)
[<c0805a18>] (__cpufreq_add_dev.isra.17+0x4dc/0x7a4) from [<c0805d38>] (cpufreq_cpu_callback+0x58/0x84)
[<c0805d38>] (cpufreq_cpu_callback+0x58/0x84) from [<c0afe180>] (notifier_call_chain+0x40/0x68)
[<c0afe180>] (notifier_call_chain+0x40/0x68) from [<c02812dc>] (__cpu_notify+0x28/0x44)
[<c02812dc>] (__cpu_notify+0x28/0x44) from [<c0aeed90>] (_cpu_up+0xf4/0x1dc)
[<c0aeed90>] (_cpu_up+0xf4/0x1dc) from [<c0aeeed4>] (cpu_up+0x5c/0x78)
[<c0aeeed4>] (cpu_up+0x5c/0x78) from [<c0aec808>] (store_online+0x44/0x74)
[<c0aec808>] (store_online+0x44/0x74) from [<c03a40f4>] (sysfs_write_file+0x108/0x14c)
[<c03a40f4>] (sysfs_write_file+0x108/0x14c) from [<c03517d4>] (vfs_write+0xd0/0x180)
[<c03517d4>] (vfs_write+0xd0/0x180) from [<c0351ca8>] (SyS_write+0x38/0x68)
[<c0351ca8>] (SyS_write+0x38/0x68) from [<c0205de0>] (ret_fast_syscall+0x0/0x30)
Fix that by taking locks at appropriate places in __cpufreq_add_dev()
as well.
Reported-by: Saravana Kannan <skannan@codeaurora.org>
Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Policy must be fully initialized before it is being made available
for use by others. Otherwise cpufreq_cpu_get() would be able to grab
a half initialized policy structure that might not have affected_cpus
(for example) populated. Then, anybody accessing those fields will get
a wrong value and that will lead to unpredictable results.
In order to fix this, do all the necessary initialization before we
make the policy structure available via cpufreq_cpu_get(). That will
guarantee that any code accessing fields of the policy will get
correct data from them.
Reported-by: Saravana Kannan <skannan@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If a module calls cpufreq_get while cpufreq is initializing, it's
possible for it to be called after cpufreq_driver is set but before
cpufreq_cpu_data is written during subsys_interface_register. This
happens because cpufreq_get doesn't take the cpufreq_driver_lock
around its use of cpufreq_cpu_data.
Fix this by using cpufreq_cpu_get(cpu) to look up the policy rather
than reading it out of cpufreq_cpu_data directly. cpufreq_cpu_get()
takes the appropriate locks to prevent this race from happening.
Since it's possible for policy to be NULL if the caller passes in an
invalid CPU number or calls the function before cpufreq is initialized,
delete the BUG_ON(!policy) and simply return 0. Don't try to return
-ENOENT because that's negative and the function returns an unsigned
integer.
References: https://bbs.archlinux.org/viewtopic.php?id=177934
Signed-off-by: Aaron Plattner <aplattner@nvidia.com>
Cc: 3.13+ <stable@vger.kernel.org> # 3.13+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq_update_policy() calls cpufreq_driver->get() to get current
frequency of a CPU and it is not supposed to fail or return zero.
Return error in case that happens.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Mark function as static in cpufreq.c because it is not
used outside this file.
This eliminates the following warning in cpufreq.c:
drivers/cpufreq/cpufreq.c:355:9: warning: no previous prototype for ‘show_boost’ [-Wmissing-prototypes]
Signed-off-by: Rashika Kheria <rashika.kheria@gmail.com>
Reviewed-by: Josh Triplett <josh@joshtriplett.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
cpufreq_update_policy() is called from two places currently. From a
workqueue handled queued from cpufreq_bp_resume() for boot CPU and
from cpufreq_cpu_callback() whenever a CPU is added.
The first one makes sure that boot CPU is running on the frequency
present in policy->cpu. But we don't really need a call from
cpufreq_cpu_callback(), because we always call cpufreq_driver->init()
(which will set policy->cur correctly) whenever first CPU of any
policy is added back. And so every policy structure is guaranteed to
have the right frequency in policy->cur.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Reduce the rampant usage of goto and the indentation level in
cpufreq_set_policy() to improve the readability of that code.
No functional changes should result from that.
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Commit 42f921a (cpufreq: remove sysfs files for CPUs which failed to
come back after resume) tried to do this but missed this piece of code
to fix.
Currently we are getting this on suspend/resume:
------------[ cut here ]------------
WARNING: CPU: 0 PID: 877 at fs/sysfs/dir.c:52 sysfs_warn_dup+0x68/0x84()
sysfs: cannot create duplicate filename '/devices/system/cpu/cpu1/cpufreq'
Modules linked in: brcmfmac brcmutil
CPU: 0 PID: 877 Comm: test-rtc-resume Not tainted 3.14.0-rc2-00259-g9398a10cd964 #12
[<c0015bac>] (unwind_backtrace) from [<c0011850>] (show_stack+0x10/0x14)
[<c0011850>] (show_stack) from [<c056e018>] (dump_stack+0x80/0xcc)
[<c056e018>] (dump_stack) from [<c0025e44>] (warn_slowpath_common+0x64/0x88)
[<c0025e44>] (warn_slowpath_common) from [<c0025efc>] (warn_slowpath_fmt+0x30/0x40)
[<c0025efc>] (warn_slowpath_fmt) from [<c012776c>] (sysfs_warn_dup+0x68/0x84)
[<c012776c>] (sysfs_warn_dup) from [<c0127a54>] (sysfs_do_create_link_sd+0xb0/0xb8)
[<c0127a54>] (sysfs_do_create_link_sd) from [<c038ef64>] (__cpufreq_add_dev.isra.27+0x2a8/0x814)
[<c038ef64>] (__cpufreq_add_dev.isra.27) from [<c038f548>] (cpufreq_cpu_callback+0x70/0x8c)
[<c038f548>] (cpufreq_cpu_callback) from [<c0043864>] (notifier_call_chain+0x44/0x84)
[<c0043864>] (notifier_call_chain) from [<c0025f60>] (__cpu_notify+0x28/0x44)
[<c0025f60>] (__cpu_notify) from [<c00261e8>] (_cpu_up+0xf0/0x140)
[<c00261e8>] (_cpu_up) from [<c0569eb8>] (enable_nonboot_cpus+0x68/0xb0)
[<c0569eb8>] (enable_nonboot_cpus) from [<c006339c>] (suspend_devices_and_enter+0x198/0x2dc)
[<c006339c>] (suspend_devices_and_enter) from [<c0063654>] (pm_suspend+0x174/0x1e8)
[<c0063654>] (pm_suspend) from [<c00624e0>] (state_store+0x6c/0xbc)
[<c00624e0>] (state_store) from [<c01fc200>] (kobj_attr_store+0x14/0x20)
[<c01fc200>] (kobj_attr_store) from [<c0126e50>] (sysfs_kf_write+0x44/0x48)
[<c0126e50>] (sysfs_kf_write) from [<c012a274>] (kernfs_fop_write+0xb4/0x14c)
[<c012a274>] (kernfs_fop_write) from [<c00d4818>] (vfs_write+0xa8/0x180)
[<c00d4818>] (vfs_write) from [<c00d4bb8>] (SyS_write+0x3c/0x70)
[<c00d4bb8>] (SyS_write) from [<c000e620>] (ret_fast_syscall+0x0/0x30)
---[ end trace 76969904b614c18f ]---
Fix this by removing sysfs link for cpufreq directory when cpu removed
isn't policy->cpu.
Revamps: 42f921a (cpufreq: remove sysfs files for CPUs which failed to come back after resume)
Reported-and-tested-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This commit adds boost frequency support in cpufreq core (Hardware &
Software). Some SoCs (like Exynos4 - e.g. 4x12) allow setting frequency
above its normal operation limits. Such mode shall be only used for a
short time.
Overclocking (boost) support is essentially provided by platform
dependent cpufreq driver.
This commit unifies support for SW and HW (Intel) overclocking solutions
in the core cpufreq driver. Previously the "boost" sysfs attribute was
defined in the ACPI processor driver code. By default boost is disabled.
One global attribute is available at: /sys/devices/system/cpu/cpufreq/boost.
It only shows up when cpufreq driver supports overclocking.
Under the hood frequencies dedicated for boosting are marked with a
special flag (CPUFREQ_BOOST_FREQ) at driver's frequency table.
It is the user's concern to enable/disable overclocking with a proper call
to sysfs.
The cpufreq_boost_trigger_state() function is defined non static on purpose.
It is used later with thermal subsystem to provide automatic enable/disable
of the BOOST feature.
Signed-off-by: Lukasz Majewski <l.majewski@samsung.com>
Signed-off-by: Myungjoo Ham <myungjoo.ham@samsung.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CPUFreq drivers that use clock frameworks interface,i.e. clk_get_rate(),
to get CPUs clk rate, have similar sort of code used in most of them.
This patch adds a generic ->get() which will do the same thing for them.
All those drivers are required to now is to set .get to cpufreq_generic_get()
and set their clk pointer in policy->clk during ->init().
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Shawn Guo <shawn.guo@linaro.org>
Acked-by: Stephen Warren <swarren@nvidia.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are several problems with cpufreq stats in the way it handles
cpufreq_unregister_driver() and suspend/resume..
- We must not lose data collected so far when suspend/resume happens
and so stats directories must not be removed/allocated during these
operations, which is done currently.
- cpufreq_stat has registered notifiers with both cpufreq and hotplug.
It adds sysfs stats directory with a cpufreq notifier: CPUFREQ_NOTIFY
and removes this directory with a notifier from hotplug core.
In case cpufreq_unregister_driver() is called (on rmmod cpufreq driver),
stats directories per cpu aren't removed as CPUs are still online. The
only call cpufreq_stats gets is cpufreq_stats_update_policy_cpu() for
all CPUs except the last of each policy. And pointer to stat information
is stored in the entry for last CPU in the per-cpu cpufreq_stats_table.
But policy structure would be freed inside cpufreq core and so that will
result in memory leak inside cpufreq stats (as we are never freeing
memory for stats).
Now if we again insert the module cpufreq_register_driver() will be
called and we will again allocate stats data and put it on for first
CPU of every policy. In case we only have a single CPU per policy, we
will return with a error from cpufreq_stats_create_table() due to this
code:
if (per_cpu(cpufreq_stats_table, cpu))
return -EBUSY;
And so probably cpufreq stats directory would not show up anymore (as
it was added inside last policies->kobj which doesn't exist anymore).
I haven't tested it, though. Also the values in stats files wouldn't
be refreshed as we are using the earlier stats structure.
- CPUFREQ_NOTIFY is called from cpufreq_set_policy() which is called for
scenarios where we don't really want cpufreq_stat_notifier_policy() to get
called. For example whenever we are changing anything related to a policy:
min/max/current freq, etc. cpufreq_set_policy() is called and so cpufreq
stats is notified. Where we don't do any useful stuff other than simply
returning with -EBUSY from cpufreq_stats_create_table(). And so this
isn't the right notifier that cpufreq stats..
Due to all above reasons this patch does following changes:
- Add new notifiers CPUFREQ_CREATE_POLICY and CPUFREQ_REMOVE_POLICY,
which are only called when policy is created/destroyed. They aren't
called for suspend/resume paths..
- Use these notifiers in cpufreq_stat_notifier_policy() to create/destory
stats sysfs entries. And so cpufreq_unregister_driver() or suspend/resume
shouldn't be a problem for cpufreq_stats.
- Return early from cpufreq_stat_cpu_callback() for suspend/resume sequence,
so that we don't free stats structure.
Acked-by: Nicolas Pitre <nico@linaro.org>
Tested-by: Nicolas Pitre <nico@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Sometimes boot loaders set CPU frequency to a value outside of frequency table
present with cpufreq core. In such cases CPU might be unstable if it has to run
on that frequency for long duration of time and so its better to set it to a
frequency which is specified in freq-table. This also makes cpufreq stats
inconsistent as cpufreq-stats would fail to register because current frequency
of CPU isn't found in freq-table.
Because we don't want this change to affect boot process badly, we go for the
next freq which is >= policy->cur ('cur' must be set by now, otherwise we will
end up setting freq to lowest of the table as 'cur' is initialized to zero).
In case current frequency doesn't match any frequency from freq-table, we throw
warnings to user, so that user can get this fixed in their bootloaders or
freq-tables.
Reported-by: Carlos Hernandez <ceh@ti.com>
Reported-and-tested-by: Nishanth Menon <nm@ti.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
In the current code, if we fail during a frequency transition, we
simply send the POSTCHANGE notification with the old frequency. This
isn't enough.
One of the core users of these notifications is the code responsible
for keeping loops_per_jiffy aligned with frequency changes. And mostly
it is written as:
if ((val == CPUFREQ_PRECHANGE && freq->old < freq->new) ||
(val == CPUFREQ_POSTCHANGE && freq->old > freq->new)) {
update-loops-per-jiffy...
}
So, suppose we are changing to a higher frequency and failed during
transition, then following will happen:
- CPUFREQ_PRECHANGE notification with freq-new > freq-old
- CPUFREQ_POSTCHANGE notification with freq-new == freq-old
The first one will update loops_per_jiffy and second one will do
nothing. Even if we send the 2nd notification by exchanging values of
freq-new and old, some users of these notifications might get
unstable.
This can be fixed by simply calling cpufreq_notify_post_transition()
with error code and this routine will take care of sending
notifications in the correct order.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Folded 3 patches into one, rebased unicore2 changes]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This introduces a new routine cpufreq_notify_post_transition() which
can be used to send POSTCHANGE notification for new freq with or
without both {PRE|POST}CHANGE notifications for last freq. This is
useful at multiple places, especially for sending transition failure
notifications.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
When a CPU is hot removed we'll cancel all the delayed work items via
gov_cancel_work(). Sometimes the delayed work function determines that
it should adjust the delay for all other CPUs that the policy is
managing. If this scenario occurs, the canceling CPU will cancel its own
work but queue up the other CPUs works to run.
Commit 3617f2 (cpufreq: Fix timer/workqueue corruption due to double
queueing) has tried to fix this, but reading governor_enabled is not
protected by cpufreq_governor_lock. Even though od_dbs_timer() checks
governor_enabled before gov_queue_work(), this scenario may occur. For
example:
CPU0 CPU1
---- ----
cpu_down()
... <work runs>
__cpufreq_remove_dev() od_dbs_timer()
__cpufreq_governor() policy->governor_enabled
policy->governor_enabled = false;
cpufreq_governor_dbs()
case CPUFREQ_GOV_STOP:
gov_cancel_work(dbs_data, policy);
cpu0 work is canceled
timer is canceled
cpu1 work is canceled
<waits for cpu1>
gov_queue_work(*, *, true);
cpu0 work queued
cpu1 work queued
cpu2 work queued
...
cpu1 work is canceled
cpu2 work is canceled
...
At the end of the GOV_STOP case cpu0 still has a work queued to
run although the code is expecting all of the works to be
canceled. __cpufreq_remove_dev() will then proceed to
re-initialize all the other CPUs works except for the CPU that is
going down. The CPUFREQ_GOV_START case in cpufreq_governor_dbs()
will trample over the queued work and debugobjects will spit out
a warning:
WARNING: at lib/debugobjects.c:260 debug_print_object+0x94/0xbc()
ODEBUG: init active (active state 0) object type: timer_list hint: delayed_work_timer_fn+0x0/0x14
Modules linked in:
CPU: 1 PID: 1205 Comm: sh Tainted: G W 3.10.0 #200
[<c01144f0>] (unwind_backtrace+0x0/0xf8) from [<c0111d98>] (show_stack+0x10/0x14)
[<c0111d98>] (show_stack+0x10/0x14) from [<c01272cc>] (warn_slowpath_common+0x4c/0x68)
[<c01272cc>] (warn_slowpath_common+0x4c/0x68) from [<c012737c>] (warn_slowpath_fmt+0x30/0x40)
[<c012737c>] (warn_slowpath_fmt+0x30/0x40) from [<c034c640>] (debug_print_object+0x94/0xbc)
[<c034c640>] (debug_print_object+0x94/0xbc) from [<c034c7f8>] (__debug_object_init+0xc8/0x3c0)
[<c034c7f8>] (__debug_object_init+0xc8/0x3c0) from [<c01360e0>] (init_timer_key+0x20/0x104)
[<c01360e0>] (init_timer_key+0x20/0x104) from [<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c)
[<c04872ac>] (cpufreq_governor_dbs+0x1dc/0x68c) from [<c04833a8>] (__cpufreq_governor+0x80/0x1b0)
[<c04833a8>] (__cpufreq_governor+0x80/0x1b0) from [<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380)
[<c0483704>] (__cpufreq_remove_dev.isra.12+0x22c/0x380) from [<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c)
[<c0692f38>] (cpufreq_cpu_callback+0x48/0x5c) from [<c014fb40>] (notifier_call_chain+0x44/0x84)
[<c014fb40>] (notifier_call_chain+0x44/0x84) from [<c012ae44>] (__cpu_notify+0x2c/0x48)
[<c012ae44>] (__cpu_notify+0x2c/0x48) from [<c068dd40>] (_cpu_down+0x80/0x258)
[<c068dd40>] (_cpu_down+0x80/0x258) from [<c068df40>] (cpu_down+0x28/0x3c)
[<c068df40>] (cpu_down+0x28/0x3c) from [<c068e4c0>] (store_online+0x30/0x74)
[<c068e4c0>] (store_online+0x30/0x74) from [<c03a7308>] (dev_attr_store+0x18/0x24)
[<c03a7308>] (dev_attr_store+0x18/0x24) from [<c0256fe0>] (sysfs_write_file+0x100/0x180)
[<c0256fe0>] (sysfs_write_file+0x100/0x180) from [<c01fec9c>] (vfs_write+0xbc/0x184)
[<c01fec9c>] (vfs_write+0xbc/0x184) from [<c01ff034>] (SyS_write+0x40/0x68)
[<c01ff034>] (SyS_write+0x40/0x68) from [<c010e200>] (ret_fast_syscall+0x0/0x48)
In gov_queue_work(), lock cpufreq_governor_lock before gov_queue_work,
and unlock it after __gov_queue_work(). In this way, governor_enabled
is guaranteed not changed in gov_queue_work().
Signed-off-by: Jane Li <jiel@marvell.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Dmitry Torokhov <dmitry.torokhov@gmail.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Prevent __cpufreq_add_dev() from overwriting the existing values of
user_policy.{min|max|policy|governor} with defaults during resume
from system suspend.
Fixes: 5302c3fb2e ("cpufreq: Perform light-weight init/teardown during suspend/resume")
Reported-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
If cpufreq_policy_restore() returns NULL during system resume,
__cpufreq_add_dev() should just fall back to the full initialization
instead of returning an error, because that may actually make things
work. Moreover, it should not leave stale fallback data behind after
it has failed to restore a previously existing policy.
This change is based on Viresh Kumar's work.
Fixes: 5302c3fb2e ("cpufreq: Perform light-weight init/teardown during suspend/resume")
Reported-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
When configuring a default governor (via CONFIG_CPU_FREQ_DEFAULT_*) with the
intel_pstate driver, the desired default policy is not properly set. For
example, setting 'CONFIG_CPU_FREQ_DEFAULT_GOV_PERFORMANCE' ends up with the
'powersave' policy being set.
Fix by configuring the correct default policy, if either 'powersave' or
'performance' are requested. Otherwise, fallback to what the driver originally
set via its 'init' routine.
Signed-off-by: Jason Baron <jbaron@akamai.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
There are cases where cpufreq_add_dev() may fail for some CPUs
during system resume. With the current code we will still have
sysfs cpufreq files for those CPUs and struct cpufreq_policy
would be already freed for them. Hence any operation on those
sysfs files would result in kernel warnings.
Example of problems resulting from resume errors (from Bjørn Mork):
WARNING: CPU: 0 PID: 6055 at fs/sysfs/file.c:343 sysfs_open_file+0x77/0x212()
missing sysfs attribute operations for kobject: (null)
Modules linked in: [stripped as irrelevant]
CPU: 0 PID: 6055 Comm: grep Tainted: G D 3.13.0-rc2 #153
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
0000000000000009 ffff8802327ebb78 ffffffff81380b0e 0000000000000006
ffff8802327ebbc8 ffff8802327ebbb8 ffffffff81038635 0000000000000000
ffffffff811823c7 ffff88021a19e688 ffff88021a19e688 ffff8802302f9310
Call Trace:
[<ffffffff81380b0e>] dump_stack+0x55/0x76
[<ffffffff81038635>] warn_slowpath_common+0x7c/0x96
[<ffffffff811823c7>] ? sysfs_open_file+0x77/0x212
[<ffffffff810386e3>] warn_slowpath_fmt+0x41/0x43
[<ffffffff81182dec>] ? sysfs_get_active+0x6b/0x82
[<ffffffff81182382>] ? sysfs_open_file+0x32/0x212
[<ffffffff811823c7>] sysfs_open_file+0x77/0x212
[<ffffffff81182350>] ? sysfs_schedule_callback+0x1ac/0x1ac
[<ffffffff81122562>] do_dentry_open+0x17c/0x257
[<ffffffff8112267e>] finish_open+0x41/0x4f
[<ffffffff81130225>] do_last+0x80c/0x9ba
[<ffffffff8112dbbd>] ? inode_permission+0x40/0x42
[<ffffffff81130606>] path_openat+0x233/0x4a1
[<ffffffff81130b7e>] do_filp_open+0x35/0x85
[<ffffffff8113b787>] ? __alloc_fd+0x172/0x184
[<ffffffff811232ea>] do_sys_open+0x6b/0xfa
[<ffffffff811233a7>] SyS_openat+0xf/0x11
[<ffffffff8138c812>] system_call_fastpath+0x16/0x1b
To fix this, remove those sysfs files or put the associated kobject
in case of such errors. Also, to make it simple, remove the cpufreq
sysfs links from all the CPUs (except for the policy->cpu) during
suspend, as that operation won't result in a loss of sysfs file
permissions and we can create those links during resume just fine.
Fixes: 5302c3fb2e ("cpufreq: Perform light-weight init/teardown during suspend/resume")
Reported-and-tested-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
[rjw: Changelog]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 2167e2399d (cpufreq: fix garbage kobjects on errors during
suspend/resume) breaks suspend/resume on Martin Ziegler's system
(hard lockup during resume), so revert it.
Fixes: 2167e2399d (cpufreq: fix garbage kobjects on errors during suspend/resume)
References: https://bugzilla.kernel.org/show_bug.cgi?id=66751
Reported-by: Martin Ziegler <ziegler@uni-freiburg.de>
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Commit 5a87182aa2 (cpufreq: suspend governors on system
suspend/hibernate) causes hibernation problems to happen on
Bjørn Mork's and Paul Bolle's systems, so revert it.
Fixes: 5a87182aa2 (cpufreq: suspend governors on system suspend/hibernate)
Reported-by: Bjørn Mork <bjorn@mork.no>
Reported-by: Paul Bolle <pebolle@tiscali.nl>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This is effectively a revert of commit 5302c3fb2e ("cpufreq: Perform
light-weight init/teardown during suspend/resume"), which enabled
suspend/resume optimizations leaving the sysfs files in place.
Errors during suspend/resume are not handled properly, leaving
dead sysfs attributes in case of failures. There are are number of
functions with special code for the "frozen" case, and all these
need to also have special error handling.
The problem is easy to demonstrate by making cpufreq_driver->init()
or cpufreq_driver->get() fail during resume.
The code is too complex for a simple fix, with split code paths
in multiple blocks within a number of functions. It is therefore
best to revert the patch enabling this code until the error handling
is in place.
Examples of problems resulting from resume errors:
WARNING: CPU: 0 PID: 6055 at fs/sysfs/file.c:343 sysfs_open_file+0x77/0x212()
missing sysfs attribute operations for kobject: (null)
Modules linked in: [stripped as irrelevant]
CPU: 0 PID: 6055 Comm: grep Tainted: G D 3.13.0-rc2 #153
Hardware name: LENOVO 2776LEG/2776LEG, BIOS 6EET55WW (3.15 ) 12/19/2011
0000000000000009 ffff8802327ebb78 ffffffff81380b0e 0000000000000006
ffff8802327ebbc8 ffff8802327ebbb8 ffffffff81038635 0000000000000000
ffffffff811823c7 ffff88021a19e688 ffff88021a19e688 ffff8802302f9310
Call Trace:
[<ffffffff81380b0e>] dump_stack+0x55/0x76
[<ffffffff81038635>] warn_slowpath_common+0x7c/0x96
[<ffffffff811823c7>] ? sysfs_open_file+0x77/0x212
[<ffffffff810386e3>] warn_slowpath_fmt+0x41/0x43
[<ffffffff81182dec>] ? sysfs_get_active+0x6b/0x82
[<ffffffff81182382>] ? sysfs_open_file+0x32/0x212
[<ffffffff811823c7>] sysfs_open_file+0x77/0x212
[<ffffffff81182350>] ? sysfs_schedule_callback+0x1ac/0x1ac
[<ffffffff81122562>] do_dentry_open+0x17c/0x257
[<ffffffff8112267e>] finish_open+0x41/0x4f
[<ffffffff81130225>] do_last+0x80c/0x9ba
[<ffffffff8112dbbd>] ? inode_permission+0x40/0x42
[<ffffffff81130606>] path_openat+0x233/0x4a1
[<ffffffff81130b7e>] do_filp_open+0x35/0x85
[<ffffffff8113b787>] ? __alloc_fd+0x172/0x184
[<ffffffff811232ea>] do_sys_open+0x6b/0xfa
[<ffffffff811233a7>] SyS_openat+0xf/0x11
[<ffffffff8138c812>] system_call_fastpath+0x16/0x1b
The failure to restore cpufreq devices on cancelled hibernation is
not a new bug. It is caused by the ACPI _PPC call failing unless the
hibernate is completed. This makes the acpi_cpufreq driver fail its
init.
Previously, the cpufreq device could be restored by offlining the
cpu temporarily. And as a complete hibernation cycle would do this,
it would be automatically restored most of the time. But after
commit 5302c3fb2e the leftover sysfs attributes will block any
device add action. Therefore offlining and onlining CPU 1 will no
longer restore the cpufreq object, and a complete suspend/resume
cycle will replace it with garbage.
Fixes: 5302c3fb2e ("cpufreq: Perform light-weight init/teardown during suspend/resume")
Cc: 3.12+ <stable@vger.kernel.org> # 3.12+
Signed-off-by: Bjørn Mork <bjorn@mork.no>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
This patch adds cpufreq suspend/resume calls to dpm_{suspend|resume}_noirq()
for handling suspend/resume of cpufreq governors.
Lan Tianyu (Intel) & Jinhyuk Choi (Broadcom) found anr issue where
tunables configuration for clusters/sockets with non-boot CPUs was
getting lost after suspend/resume, as we were notifying governors
with CPUFREQ_GOV_POLICY_EXIT on removal of the last cpu for that
policy and so deallocating memory for tunables. This is fixed by
this patch as we don't allow any operation on governors after
device suspend and before device resume now.
Reported-and-tested-by: Lan Tianyu <tianyu.lan@intel.com>
Reported-by: Jinhyuk Choi <jinchoi@broadcom.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
[rjw: Changelog, minor cleanups]
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Most of the drivers do following in their ->target_index() routines:
struct cpufreq_freqs freqs;
freqs.old = old freq...
freqs.new = new freq...
cpufreq_notify_transition(policy, &freqs, CPUFREQ_PRECHANGE);
/* Change rate here */
cpufreq_notify_transition(policy, &freqs, CPUFREQ_POSTCHANGE);
This is replicated over all cpufreq drivers today and there doesn't exists a
good enough reason why this shouldn't be moved to cpufreq core instead.
There are few special cases though, like exynos5440, which doesn't do everything
on the call to ->target_index() routine and call some kind of bottom halves for
doing this work, work/tasklet/etc..
They may continue doing notification from their own code as flag:
CPUFREQ_ASYNC_NOTIFICATION is already set for them.
All drivers are also modified in this patch to avoid breaking 'git bisect', as
double notification would happen otherwise.
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Russell King <linux@arm.linux.org.uk>
Acked-by: Stephen Warren <swarren@nvidia.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Tested-by: Nicolas Pitre <nicolas.pitre@linaro.org>
Reviewed-by: Lan Tianyu <tianyu.lan@intel.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
We have per-CPU cpu_policy_rwsem for cpufreq core, but we never use
all of them. We always use rwsem of policy->cpu and so we can
actually make this rwsem per policy instead.
This patch does this change. With this change other tricky situations
are also avoided now, like which lock to take while we are changing
policy->cpu, etc.
Suggested-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Currently, the prototype of cpufreq_drivers target routines is:
int target(struct cpufreq_policy *policy, unsigned int target_freq,
unsigned int relation);
And most of the drivers call cpufreq_frequency_table_target() to get a valid
index of their frequency table which is closest to the target_freq. And they
don't use target_freq and relation after that.
So, it makes sense to just do this work in cpufreq core before calling
cpufreq_frequency_table_target() and simply pass index instead. But this can be
done only with drivers which expose their frequency table with cpufreq core. For
others we need to stick with the old prototype of target() until those drivers
are converted to expose frequency tables.
This patch implements the new light weight prototype for target_index() routine.
It looks like this:
int target_index(struct cpufreq_policy *policy, unsigned int index);
CPUFreq core will call cpufreq_frequency_table_target() before calling this
routine and pass index to it. Because CPUFreq core now requires to call routines
present in freq_table.c CONFIG_CPU_FREQ_TABLE must be enabled all the time.
This also marks target() interface as deprecated. So, that new drivers avoid
using it. And Documentation is updated accordingly.
It also converts existing .target() to newly defined light weight
.target_index() routine for many driver.
Acked-by: Hans-Christian Egtvedt <egtvedt@samfundet.no>
Acked-by: Jesper Nilsson <jesper.nilsson@axis.com>
Acked-by: Linus Walleij <linus.walleij@linaro.org>
Acked-by: Russell King <linux@arm.linux.org.uk>
Acked-by: David S. Miller <davem@davemloft.net>
Tested-by: Andrew Lunn <andrew@lunn.ch>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rjw@rjwysocki.net>
The function update_policy_cpu() is expected to be called when the policy->cpu
of a cpufreq policy is to be changed: ie., the new CPU nominated to become the
policy->cpu is different from the old one.
Print a warning if it is invoked with new_cpu == old_cpu, since such an
invocation might hint at a faulty logic in the caller.
Suggested-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Acked-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
CONFIG_CPU_FREQ_TABLE will be always enabled when cpufreq framework is used, as
cpufreq core depends on it. So, we don't need this CONFIG option anymore as it
is not configurable. Remove CONFIG_CPU_FREQ_TABLE and update its users.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Many CPUFreq drivers for SMP system (where all cores share same clock lines), do
similar stuff in their ->init() part.
This patch creates a generic routine in cpufreq core which can be used by these
so that we can remove some redundant code.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Almost all drivers set policy->cur with current CPU frequency in their ->init()
part. This can be done for all of them at core level and so they wouldn't need
to do it.
This patch adds supporting code in cpufreq core for calling get() after we have
called init() for a policy.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Use cpufreq_driver->flags to mark CPUFREQ_HAVE_GOVERNOR_PER_POLICY instead
of a separate field within cpufreq_driver. This will save some bytes of
memory.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Reviewed-by: Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>