We are required to call dev_pm_opp_put() from outside of the
opp_table->lock as debugfs removal needs to happen lock-less to avoid
circular dependency issues.
Commit cf1fac943c ("opp: Reduce the size of critical section in
_opp_kref_release()") tried to fix that by introducing a new routine,
_opp_get_next(), which keeps returning OPPs that can be freed by the
callers. This routine shall be called without holding the
opp_table->lock.
However, that commit overlooked the fact that the OPPs can be referenced
by other users as well, so this routine ends up dropping references
which were taken by other users and hence frees the OPPs prematurely.
In effect, other users of the OPPs end up holding invalid pointers. We
didn't see any crash reports earlier as the exact situation never
happened, though it is certainly possible.
We need a way to mark which OPPs are no longer referenced by the OPP
core, so we don't drop extra references to them accidentally.
This commit adds another OPP flag, "removed", which is used to track
this. And now we should never end up dropping extra references to the
OPPs.
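The idea can be illustrated with a minimal user-space sketch. The types
and helper names below (struct opp, opp_put(), opp_get_next(),
opp_remove_all()) are simplified stand-ins invented for illustration,
not the kernel definitions, and locking is omitted entirely:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified stand-in for struct dev_pm_opp; not the kernel definition. */
struct opp {
	int refcount;	/* models the kref */
	bool removed;	/* set once the core has dropped its own reference */
	bool freed;	/* models the memory having been released */
};

/* Drop one reference; "free" the OPP when the count reaches zero. */
static void opp_put(struct opp *opp)
{
	assert(opp->refcount > 0);
	if (--opp->refcount == 0)
		opp->freed = true;
}

/*
 * Return the next OPP whose core reference has not been dropped yet and
 * mark it as removed, or NULL when none is left.
 */
static struct opp *opp_get_next(struct opp *opps, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!opps[i].removed) {
			opps[i].removed = true;
			return &opps[i];
		}
	}
	return NULL;
}

/* Drop the core's reference on every OPP exactly once. */
static void opp_remove_all(struct opp *opps, size_t n)
{
	struct opp *opp;

	while ((opp = opp_get_next(opps, n)) != NULL)
		opp_put(opp);
}
```

Because each OPP is handed out at most once, calling opp_remove_all()
again cannot drop a reference that belongs to another user.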
Cc: v5.11+ <stable@vger.kernel.org> # v5.11+
Fixes: cf1fac943c ("opp: Reduce the size of critical section in _opp_kref_release()")
Signed-off-by: Beata Michalska <beata.michalska@arm.com>
[ Viresh: Almost rewrote entire patch, added new "removed" field,
rewrote commit log and added the correct Fixes tag. ]
Co-developed-by: Viresh Kumar <viresh.kumar@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
We skip the OPP update if the current and target OPPs are the same. This
is fine for devices that don't support frequency but may cause issues
for the ones that need to program frequency.
An OPP entry doesn't really signify a single operating frequency but
rather the highest frequency at which the other properties of the OPP
entry apply. And we may reach here with different frequency values,
while all of them would point to the same OPP entry in the OPP table.
We just need to update the clock frequency in that case, though in order
to not add special exit points we reuse the code flow from a normal
path.
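A rough user-space sketch of the resulting check follows. The structures
and the helpers find_ceil() and set_rate() are hypothetical
simplifications, and clk_updates exists only to make the behaviour
observable:

```c
#include <stdbool.h>
#include <stddef.h>

/* Simplified OPP: "rate" is the highest frequency the entry covers. */
struct opp {
	unsigned long rate;
};

struct opp_table {
	const struct opp *opps;		/* sorted by ascending rate */
	size_t count;
	const struct opp *current_opp;
	unsigned long current_rate;	/* last frequency actually programmed */
	bool enabled;
	int clk_updates;		/* counts clock updates, for illustration */
};

/* Find the lowest OPP whose rate is >= freq (a "ceil" lookup). */
static const struct opp *find_ceil(const struct opp_table *t,
				   unsigned long freq)
{
	for (size_t i = 0; i < t->count; i++)
		if (t->opps[i].rate >= freq)
			return &t->opps[i];
	return NULL;
}

/* Returns 0 on success, -1 if no OPP can satisfy freq. */
static int set_rate(struct opp_table *t, unsigned long freq)
{
	const struct opp *opp = find_ceil(t, freq);

	if (!opp)
		return -1;

	/* Skip only if the OPP *and* the programmed frequency are unchanged. */
	if (opp == t->current_opp && freq == t->current_rate && t->enabled)
		return 0;

	/* Same OPP but a new frequency still goes through the normal path. */
	t->current_opp = opp;
	t->current_rate = freq;
	t->enabled = true;
	t->clk_updates++;
	return 0;
}
```

With OPPs at 500 MHz and 1 GHz, requests for 600 MHz and 800 MHz resolve
to the same entry, yet the second request still updates the clock.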
While at it, rearrange the conditionals in the 'if' statement to check
'enabled' flag at the end.
Fixes: 81c4d8a3c4 ("opp: Keep track of currently programmed OPP")
Signed-off-by: Jonathan Marek <jonathan@marek.ca>
[ Viresh: Improved commit log and subject, rename current_freq as
current_rate, document it, remove local variable and rearrange
code. ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Add a function that allows looking up required OPPs given a source OPP
table, destination OPP table and the source OPP.
Signed-off-by: Saravana Kannan <saravanak@google.com>
Signed-off-by: Hsin-Yi Wang <hsinyi@chromium.org>
[ Viresh: Rearranged code, fixed return errors ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Not all devices that need to use OPP core need to have clocks, a missing
clock is fine in which case -ENOENT shall be returned by clk_get().
Anything else is an error and must be handled properly.
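The error handling described above can be sketched in user space as
follows. struct clk_result and opp_table_set_clk() are hypothetical
names that stand in for the clk_get() call and its IS_ERR() check; they
are not kernel APIs:

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical result of a clock lookup. */
struct clk_result {
	void *clk;	/* valid handle, or NULL on failure */
	int err;	/* 0 on success, negative errno otherwise */
};

struct opp_table {
	void *clk;	/* NULL means "this device has no clock" */
};

/* Returns 0 if the table is usable, or a negative errno to propagate. */
static int opp_table_set_clk(struct opp_table *table, struct clk_result res)
{
	if (res.err == 0) {
		table->clk = res.clk;
		return 0;
	}

	table->clk = NULL;

	/* A missing clock is fine: not every OPP user has one. */
	if (res.err == -ENOENT)
		return 0;

	/* Anything else is a real error and must reach the caller. */
	return res.err;
}
```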
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The bandwidth must be scaled at a different point in the code flow
depending on whether we are scaling the frequency up or down. Otherwise
this may cause undesired effects, as the device will try to use more of
the memory bandwidth, which may be shared across several devices. This is
much like how regulators and required-opps are programmed.
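The ordering can be sketched in user space as below. struct
device_state, set_bw(), set_clk() and set_opp() are hypothetical
helpers, and the log array exists only to record the order of the two
steps:

```c
#include <stddef.h>

enum step { STEP_BW, STEP_CLK };

struct device_state {
	unsigned long freq;
	unsigned long bw;
	enum step log[2];	/* order of the steps in the last transition */
	size_t nlog;
};

static void set_bw(struct device_state *d, unsigned long bw)
{
	d->bw = bw;
	d->log[d->nlog++] = STEP_BW;
}

static void set_clk(struct device_state *d, unsigned long freq)
{
	d->freq = freq;
	d->log[d->nlog++] = STEP_CLK;
}

static void set_opp(struct device_state *d, unsigned long freq,
		    unsigned long bw)
{
	int scaling_down = freq < d->freq;

	d->nlog = 0;

	/* Scaling up: raise the bandwidth before the clock goes faster. */
	if (!scaling_down)
		set_bw(d, bw);

	set_clk(d, freq);

	/* Scaling down: release bandwidth only after the clock is slower. */
	if (scaling_down)
		set_bw(d, bw);
}
```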
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Reported-by: Akhil P Oommen <akhilpo@codeaurora.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
The OPP core currently requires the required opp tables to be available
before the dependent OPP table is added, as it needs to create links
from the dependent OPP table to the required ones. This may not be
convenient for all the platforms though, as this requires strict
ordering for probing the drivers.
This patch allows lazy-linking of the required-opps. The OPP tables for
which the required-opp-tables aren't available at the time of their
initialization are added to a special list of OPP tables:
lazy_opp_tables. Later on, whenever a new OPP table is registered with
the OPP core, we check if it is required by an OPP table in the pending
list; if so, we complete the linking then and there.
An OPP table is marked unusable until the time all its required-opp
tables are available. And if lazy-linking fails for an OPP table, the
OPP core disables all of its OPPs to make sure no one can use them.
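A minimal user-space sketch of the lazy-linking flow is shown below. All
names (struct opp_core, register_table(), the integer ids) are
hypothetical simplifications; the kernel matches tables through
device-tree nodes, and the failure path that disables the OPPs is not
modelled here:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_TABLES 8

struct opp_table {
	int id;				/* hypothetical identifier */
	int required_id;		/* id of the required table, -1 for none */
	struct opp_table *required;	/* resolved link, NULL until linked */
	bool usable;
};

struct opp_core {
	struct opp_table *tables[MAX_TABLES];
	size_t ntables;
	struct opp_table *lazy[MAX_TABLES];	/* tables waiting for a link */
	size_t nlazy;
};

static struct opp_table *find_table(struct opp_core *core, int id)
{
	for (size_t i = 0; i < core->ntables; i++)
		if (core->tables[i]->id == id)
			return core->tables[i];
	return NULL;
}

/* Returns 0 on success, -1 if the core is full. */
static int register_table(struct opp_core *core, struct opp_table *t)
{
	size_t kept = 0;

	if (core->ntables == MAX_TABLES)
		return -1;

	t->required = NULL;
	t->usable = true;
	if (t->required_id >= 0) {
		t->required = find_table(core, t->required_id);
		if (!t->required) {
			/* Not available yet: park the table and mark it unusable. */
			t->usable = false;
			core->lazy[core->nlazy++] = t;
		}
	}
	core->tables[core->ntables++] = t;

	/* Complete the links of pending tables that were waiting for t. */
	for (size_t i = 0; i < core->nlazy; i++) {
		struct opp_table *p = core->lazy[i];

		if (p->required_id == t->id) {
			p->required = t;
			p->usable = true;
		} else {
			core->lazy[kept++] = p;
		}
	}
	core->nlazy = kept;
	return 0;
}
```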
Tested-by: Hsin-Yi Wang <hsinyi@chromium.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
All the users have migrated to dev_pm_opp_set_opp() now. Get rid of the
duplicate API, dev_pm_opp_set_bw(), which only performs a part of the new API.
While at it, remove the unnecessary parameter to _set_opp_bw().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
The new helper dev_pm_opp_set_opp() can be used for configuring the
devices for a particular OPP and can be used by different types of
devices, even the ones which don't change frequency (like power
domains).
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Drop the unnecessary parameters and follow the pattern from
_generic_set_opp_regulator().
While at it, also remove the local variable old_freq.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
In order to avoid conditional statements at the caller site, this patch
updates _generic_set_opp_clk_only() to work for devices that don't
change frequency (like power domains, etc.). Return 0 if the clk pointer
passed to this routine is not valid.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
The _generic_set_opp_regulator() helper will be used later on for
devices which don't change frequency (like power domains, etc.). Prepare
for that by not relying on frequency for making decisions here.
While at it, update its parameters to pass only what is necessary.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
The _set_opp() helper will be used later on for devices which don't
change frequency (like power domains, etc.). Prepare for that by not
relying on frequency for making decisions here.
While at it, also update the debug print to contain all relevant
information.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
The _set_opp() helper will be used later on for devices which don't
change their frequency (like power domains, etc.). Prepare for that by
breaking the generic part out of dev_pm_opp_set_rate().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
The dev_pm_opp_set_rate() helper needs to know the currently programmed
OPP to make a few decisions, and currently we try to find it on every
invocation of this routine.
Let's start keeping track of the current_opp programmed for the devices
of the opp table; that will be quite useful going forward.
If we fail to find the current OPP, we pick the first one available in
the list, as the list is in ascending order of frequencies, level, or
bandwidth and that's the best guess we can make anyway.
Note that we used to do the frequency comparison a bit early in
dev_pm_opp_set_rate() previously, and now instead we check the target
opp, which shall be more accurate anyway.
We need to make sure that current_opp's memory doesn't get freed while
it is being used and so we keep a reference of it until the time it is
used.
Now that current_opp will always be set, we can drop some unnecessary
checks as well.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Clock is not optional for users who call into dev_pm_opp_set_rate().
Remove the unnecessary checks.
While at it also drop the local variable for clk and use opp_table->clk
instead.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
This routine has nothing to do with frequency, it just disables all the
resources previously enabled. Rename it to match its purpose.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Check whether OPP table has regulators in _set_opp_custom() and set up
dev_pm_set_opp_data accordingly. Now _set_opp_custom() works properly,
i.e. it doesn't crash if OPP table doesn't have assigned regulators.
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
[ Viresh: Rearrange the routine a bit ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Until now the ->set_opp() helper (i.e. special implementation for
setting the OPPs for platforms) was implemented only to take care of
multiple regulators case, but going forward we would need that for other
use cases as well.
This patch prepares for that by allocating the regulator specific part
from dev_pm_opp_set_regulators() and the opp helper part from
dev_pm_opp_register_set_opp_helper().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Print OPP level in debug message of _opp_add_static_v2(). This helps to
chase GENPD bugs.
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Matt Merhar <mattmerhar@protonmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
NVIDIA Tegra SoCs have a power domain topology in which child domains
only clamp a power rail, while the parent domain controls the shared
performance state of multiple child domains. In this case a child
domain doesn't need to have an OPP table. Hence we want to allow child
power domains to pass the performance state to the parent domain if the
child domain doesn't have an OPP table.
dev_pm_opp_xlate_performance_state() gets src_table=NULL if a child
power domain doesn't have an OPP table, and in this case we should pass
the performance state to the parent domain.
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Matt Merhar <mattmerhar@protonmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Add a resource-managed version of dev_pm_opp_attach_genpd().
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
[ Viresh: Manually apply the patch and relocate the routines ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Add a resource-managed version of dev_pm_opp_register_set_opp_helper().
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Matt Merhar <mattmerhar@protonmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
[ Viresh: Manually apply the patch and relocate the routines ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
A few drivers have a clk for their device but don't want the OPP core to
handle it. Add a new helper for them, dev_pm_opp_of_add_table_noclk().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
We acquire the clk at the time the OPP table is allocated. Though it
works fine, it is not the best place to do so. One of the main reasons
is that we may need to acquire it again from dev_pm_opp_set_clkname() if
the platform wants another clock to be acquired instead.
There is also a requirement from some of the platforms where they do not
want the OPP core to manage the clock at all.
This patch hence defers acquiring the clk until the time we are certain
about which clk we need to acquire and if we really need to acquire one.
With this commit, the clk will get acquired either from
dev_pm_opp_set_clkname() or while we initialize the OPPs within the
table.
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
The implementations of dev_pm_opp_of_add_table() and
dev_pm_opp_of_add_table_indexed() are almost identical. Create
_of_add_table_indexed() to reduce code redundancy.
Also remove the duplication of the doc style comments by referring to
dev_pm_opp_of_add_table() from dev_pm_opp_of_add_table_indexed().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dmitry Osipenko <digetx@gmail.com>
Extend OPP API with dev_pm_opp_sync_regulators() function, which syncs
voltage state of regulators.
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Matt Merhar <mattmerhar@protonmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
[ Viresh: Added unlikely() ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Add dev_pm_opp_get_required_pstate() which allows OPP users to retrieve
required performance state of a given OPP.
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Matt Merhar <mattmerhar@protonmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Add a ceil version of dev_pm_opp_find_level(). It is handy when levels
in the OPP table don't start from 0, as a request for level 0 usually
means the minimal level.
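As a rough illustration of ceil semantics, here is a user-space sketch
over a plain array. find_level_ceil() and struct opp are hypothetical
simplifications; the kernel helper takes a struct device and reports
failure through an error pointer rather than NULL:

```c
#include <stddef.h>

/* Simplified OPP carrying only a performance level. */
struct opp {
	unsigned int level;
};

/*
 * opps must be sorted by ascending level. Returns the first OPP whose
 * level is >= *level and stores the matched level back into *level, or
 * returns NULL (leaving *level untouched) if no such OPP exists.
 */
static const struct opp *find_level_ceil(const struct opp *opps, size_t n,
					 unsigned int *level)
{
	for (size_t i = 0; i < n; i++) {
		if (opps[i].level >= *level) {
			*level = opps[i].level;
			return &opps[i];
		}
	}
	return NULL;
}
```

With levels 1, 3 and 5 in the table, a request for level 0 resolves to
the entry with level 1, which an exact-match lookup would not find.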
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Matt Merhar <mattmerhar@protonmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The debug message always prints rate=0 instead of a proper value, fix it.
Fixes: 6c591eec67 ("OPP: Add helpers for reading the binding properties")
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Matt Merhar <mattmerhar@protonmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
[ Viresh: Added Fixes tag ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
A required OPP may not be available, and thus, all OPPs which are using
this required OPP should be unavailable too.
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Matt Merhar <mattmerhar@protonmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Fix adding OPP entries in a wrong (opposite) order if OPP rate is
unavailable. The OPP comparison was erroneously skipped, thus OPPs
were left unsorted.
Tested-by: Peter Geis <pgwipeout@gmail.com>
Tested-by: Nicolas Chauvet <kwizart@gmail.com>
Tested-by: Matt Merhar <mattmerhar@protonmail.com>
Signed-off-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
In function _allocate_opp_table, opp_dev is allocated and referenced
by opp_table via _add_opp_dev. But if the subsequent calls return
-EPROBE_DEFER, it will jump to the err label and opp_table will be
freed. opp_dev then becomes an unreferenced object, causing a memory
leak. So let's call _remove_opp_dev to do the cleanup.
This fixes the following kmemleak report:
unreferenced object 0xffff000801524a00 (size 128):
comm "swapper/0", pid 1, jiffies 4294892465 (age 84.616s)
hex dump (first 32 bytes):
40 00 56 01 08 00 ff ff 40 00 56 01 08 00 ff ff @.V.....@.V.....
b8 52 77 7f 08 00 ff ff 00 3c 4c 00 08 00 ff ff .Rw......<L.....
backtrace:
[<00000000b1289fb1>] kmemleak_alloc+0x30/0x40
[<0000000056da48f0>] kmem_cache_alloc+0x3d4/0x588
[<00000000a84b3b0e>] _add_opp_dev+0x2c/0x88
[<0000000062a380cd>] _add_opp_table_indexed+0x124/0x268
[<000000008b4c8f1f>] dev_pm_opp_of_add_table+0x20/0x1d8
[<00000000e5316798>] dev_pm_opp_of_cpumask_add_table+0x48/0xf0
[<00000000db0a8ec2>] dt_cpufreq_probe+0x20c/0x448
[<0000000030a3a26c>] platform_probe+0x68/0xd8
[<00000000c618e78d>] really_probe+0xd0/0x3a0
[<00000000642e856f>] driver_probe_device+0x58/0xb8
[<00000000f10f5307>] device_driver_attach+0x74/0x80
[<0000000004f254b8>] __driver_attach+0x58/0xe0
[<0000000009d5d19e>] bus_for_each_dev+0x70/0xc8
[<0000000000d22e1c>] driver_attach+0x24/0x30
[<0000000001d4e952>] bus_add_driver+0x14c/0x1f0
[<0000000089928aaa>] driver_register+0x64/0x120
Cc: v5.10 <stable@vger.kernel.org> # v5.10
Fixes: dd461cd918 ("opp: Allow dev_pm_opp_get_opp_table() to return -EPROBE_DEFER")
Signed-off-by: Quanyang Wang <quanyang.wang@windriver.com>
[ Viresh: Added the stable tag ]
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
The opp binding now allows an empty opp table together with shared-opp,
to still describe that devices share v/f lines.
When initialising an empty opp table, allow such a case by:
- treating such conditions with warnings in place of errors
- not failing on an empty table
Signed-off-by: Nicola Mazzucato <nicola.mazzucato@arm.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
It has been found that some users (like cpufreq-dt and others on LKML)
have abused the helper dev_pm_opp_get_opp_table() to create the OPP
table instead of just finding it, which is the wrong thing to do. This
routine was meant for OPP core's internal working and exposed the whole
functionality by mistake.
Change the scope of dev_pm_opp_get_opp_table() to only finding the
table. The internal helpers _opp_get_opp_table*() are thus renamed to
_add_opp_table*(), dev_pm_opp_get_opp_table_indexed() is removed (as we
don't need the index field for finding the OPP table) and so the only
user, genpd, is updated.
Note that the prototype of _add_opp_table() was already left in opp.h by
mistake when it was removed earlier and so we weren't required to add it
now.
Acked-by: Ulf Hansson <ulf.hansson@linaro.org>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
There is a lot of stuff here which can be done outside of the
opp_table->lock, do that. This helps avoid circular dependency lockdep
reports around debugfs.
Reported-by: Rob Clark <robdclark@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
There are different platforms and devices which might use different scale
for the power values. Kernel sub-systems might need to check if all
Energy Model (EM) devices are using the same scale. Address that issue and
store the information inside EM for each device. Thanks to that they can
be easily compared and proper action triggered.
Suggested-by: Daniel Lezcano <daniel.lezcano@linaro.org>
Reviewed-by: Quentin Perret <qperret@google.com>
Signed-off-by: Lukasz Luba <lukasz.luba@arm.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
The readers of dev_list expect the updates to it to take place from
within the opp_table->lock and this is missing in the case where the
dev_list is updated for already managed OPPs.
Fix that by calling _add_opp_dev() from there and remove the now unused
_add_opp_dev_unlocked() callback. While at it, also reduce the length of
the critical section in _add_opp_dev().
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
There is a lot of stuff here which can be done outside of the big
opp_table_lock, do that. This helps avoid a few circular dependency
lockdep reports around debugfs and interconnects.
Reported-by: Rob Clark <robdclark@gmail.com>
Reported-by: Dmitry Osipenko <digetx@gmail.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
We returned earlier by mistake even when there were no failures. Fix it.
Fixes: dd461cd918 ("opp: Allow dev_pm_opp_get_opp_table() to return -EPROBE_DEFER")
Reported-by: Naresh Kamboju <naresh.kamboju@linaro.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Naresh Kamboju <naresh.kamboju@linaro.com>
The patch missed returning 0 early in case of success and hence the
static OPPs got removed by mistake. Fix it.
Fixes: 90d46d71cc ("opp: Handle multiple calls for same OPP table in _of_add_opp_table_v1()")
Reported-by: Aisheng Dong <aisheng.dong@nxp.com>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
Tested-by: Dong Aisheng <aisheng.dong@nxp.com>
Pull operating performance points (OPP) framework fixes for 5.10-rc1
from Viresh Kumar:
"- Return -EPROBE_DEFER properly from dev_pm_opp_get_opp_table()
(Stephan Gerhold).
- Minor cleanups around required-opps (Stephan Gerhold).
- Extends opp-supported-hw property to contain multiple versions
(Viresh Kumar).
- Multiple cleanups around dev_pm_opp_attach_genpd() (Viresh Kumar).
- Multiple fixes, cleanups in the OPP core for overall better design
(Viresh Kumar)."
* 'opp/linux-next' of git://git.kernel.org/pub/scm/linux/kernel/git/vireshk/pm:
opp: Allow opp-level to be set to 0
opp: Prevent memory leak in dev_pm_opp_attach_genpd()
ARM: tegra: Pass multiple versions in opp-supported-hw property
opp: Allow opp-supported-hw to contain multiple versions
dt-bindings: opp: Allow opp-supported-hw to contain multiple versions
opp: Set required OPPs in reverse order when scaling down
opp: Reduce code duplication in _set_required_opps()
opp: Drop unnecessary check from dev_pm_opp_attach_genpd()
opp: Handle multiple calls for same OPP table in _of_add_opp_table_v1()
opp: Allow dev_pm_opp_get_opp_table() to return -EPROBE_DEFER
opp: Remove _dev_pm_opp_find_and_remove_table() wrapper
opp: Split out _opp_set_rate_zero()
opp: Reuse the enabled flag in !target_freq path
opp: Rename regulator_enabled and use it as status of all resources
The DT bindings don't put such a constraint, nor should the kernel. It
is perfectly fine for opp-level to be set to 0, if we need to put the
performance state votes for a domain for a particular OPP.
Reported-by: Stephan Gerhold <stephan@gerhold.net>
Tested-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>
If dev_pm_opp_attach_genpd() is called multiple times (once for each CPU
sharing the table), then it would result in unwanted behavior like
memory leak, attaching the domain multiple times, etc.
Handle that by checking and returning earlier if the domains are already
attached. Now that dev_pm_opp_detach_genpd() can get called multiple
times as well, we need to protect that too.
Note that the virtual device pointers aren't returned in this case, as
they may become unavailable to some callers during the middle of the
operation.
Reported-by: Stephan Gerhold <stephan@gerhold.net>
Signed-off-by: Viresh Kumar <viresh.kumar@linaro.org>