I came across a memory leak during a cyclic cpu-online-offline test.
Signed-off-by: Yu Luming <luming.yu@intel.com>
Cc: Len Brown <lenb@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch augments the pstate transition code to error out
(instead of returning 0) when an incorrect pstate is provided.
Suggested-by: Borislav Petkov <bp@alien8.de>
CC: andre.przywara@amd.com
CC: Mark.Langsdorf@amd.com
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Before this patch if we failed the vid transition would still try to
submit the "new" frequencies to cpufreq.
That is incorrect - also we could submit a non-existing frequency value
which would cause cpufreq to crash. The ultimate fix is in cpufreq
to deal with incorrect values, but this patch improves the error
recovery in the AMD powernowk8 driver.
The failure that was reported was as follows:
powernow-k8: Found 1 AMD Athlon(tm) 64 Processor 3700+ (1 cpu cores) (version 2.20.00)
powernow-k8: fid 0x2 (1000 MHz), vid 0x12
powernow-k8: fid 0xa (1800 MHz), vid 0xa
powernow-k8: fid 0xc (2000 MHz), vid 0x8
powernow-k8: fid 0xe (2200 MHz), vid 0x8
Marking TSC unstable due to cpufreq changes
powernow-k8: fid trans failed, fid 0x2, curr 0x0
BUG: unable to handle kernel paging request at ffff880807e07b78
IP: [<ffffffff81479163>] cpufreq_stats_update+0x46/0x5b
...
And transition fails and data->currfid ends up with 0. Since
the machine does not support 800Mhz value when the calculation is
done ('find_khz_freq_from_fid(data->currfid);') it reports the
new frequency as 800000 which is bogus. This patch fixes
the issue during target setting.
The patch however does not fix the issue in 'powernowk8_cpu_init'
where the pol->cur can also be set with the 800000 value:
pol->cur = find_khz_freq_from_fid(data->currfid);
dprintk("policy current frequency %d kHz\n", pol->cur);
/* min/max the cpu is capable of */
if (cpufreq_frequency_table_cpuinfo(pol, data->powernow_table)) {
The fix for that looks to update cpufreq_frequency_table_cpuinfo to
check pol->cur.... but that would cause an regression in how the
acpi-cpufreq driver works (it sets cpu->cur after calling
cpufreq_frequency_table_cpuinfo). Instead the fix will be to let
cpufreq gracefully handle bogus data (another patch).
Acked-by: Borislav Petkov <bp@alien8.de>
CC: andre.przywara@amd.com
CC: Mark.Langsdorf@amd.com
Reported-by: Tobias Diedrich <ranma+xen@tdiedrich.de>
Tested-by: Tobias Diedrich <ranma+xen@tdiedrich.de>
[v1: Rebased on v3.0-rc2, reduced patch to deal with vid case]
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dave Jones <davej@redhat.com>
If the driver submitted an non-existing pol>cur value (say it
used the default initialized value of zero), when the cpufreq
stats tries to setup its initial values it incorrectly sets
stat->last_index to -1 (or 0xfffff...). And cpufreq_stats_update
tries to update at that index location and fails.
This can be caused by:
stat->last_index = freq_table_get_index(stat, policy->cur);
not finding the appropiate frequency in the table (b/c the policy->cur
is wrong) and we end up crashing. The fix however is
concentrated in the 'cpufreq_stats_update' as the last_index
(and old_index) are updated there. Which means it can reset
the last_index to -1 again and on the next iteration cause a crash.
Without this patch, the following crash is observed:
powernow-k8: Found 1 AMD Athlon(tm) 64 Processor 3700+ (1 cpu cores) (version 2.20.00)
powernow-k8: fid 0x2 (1000 MHz), vid 0x12
powernow-k8: fid 0xa (1800 MHz), vid 0xa
powernow-k8: fid 0xc (2000 MHz), vid 0x8
powernow-k8: fid 0xe (2200 MHz), vid 0x8
Marking TSC unstable due to cpufreq changes
powernow-k8: fid trans failed, fid 0x2, curr 0x0
BUG: unable to handle kernel paging request at ffff880807e07b78
IP: [<ffffffff81479163>] cpufreq_stats_update+0x46/0x5b
.. snip..
Pid: 1, comm: swapper Not tainted 3.0.0-rc2 #45 MICRO-STAR INTERNATIONAL CO., LTD MS-7094/MS-7094
..snip..
Call Trace:
[<ffffffff81479248>] cpufreq_stat_notifier_trans+0x48/0x7c
[<ffffffff81095d68>] notifier_call_chain+0x32/0x5e
[<ffffffff81095e6b>] __srcu_notifier_call_chain+0x47/0x63
[<ffffffff81095e96>] srcu_notifier_call_chain+0xf/0x11
[<ffffffff81477e7a>] cpufreq_notify_transition+0x111/0x134
[<ffffffff8147b0d4>] powernowk8_target+0x53b/0x617
[<ffffffff8147723a>] __cpufreq_driver_target+0x2e/0x30
[<ffffffff8147a127>] cpufreq_governor_dbs+0x339/0x356
[<ffffffff81477394>] __cpufreq_governor+0xa8/0xe9
[<ffffffff81477525>] __cpufreq_set_policy+0x132/0x13e
[<ffffffff8147848d>] cpufreq_add_dev_interface+0x272/0x28c
Reported-by: Tobias Diedrich <ranma+xen@tdiedrich.de>
Tested-by: Tobias Diedrich <ranma+xen@tdiedrich.de>
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Signed-off-by: Dave Jones <davej@redhat.com>
cpufreq_stats leaves behind its sysfs entries, which causes a panic
when something stumbled across them.
(Discovered by unloading cpufreq_stats while powertop was loaded).
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org
Concluding interface update and movement of the driver by making
the DB8500 cpufreq driver compile in the cpufreq subsystem.
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
This updates the ux500 cpufreq driver to the new interface from the
updated DB8500 PRCMU
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
As part of the ARM arch subsystem migration, move the DB8500
cpufreq driver to drivers/cpufreq as discussed with Dave Jones. The
Makefile is not updated in order to avoid cross-subsystem conflicts
for this file in merges.
Cc: Arnd Bergmann <arnd@arndb.de>
Acked-by: Dave Jones <davej@redhat.com>
Signed-off-by: Linus Walleij <linus.walleij@linaro.org>
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, cpu: Fix detection of Celeron Covington stepping A1 and B0
Documentation, ABI: Update L3 cache index disable text
x86, AMD, cacheinfo: Fix L3 cache index disable checks
x86, AMD, cacheinfo: Fix fallout caused by max3 conversion
x86, cpu: Change NOP selection for certain Intel CPUs
x86, cpu: Clean up and unify the NOP selection infrastructure
x86, percpu: Use ASM_NOP4 instead of hardcoding P6_NOP4
x86, cpu: Move AMD Elan Kconfig under "Processor family"
Fix up trivial conflicts in alternative handling (commit dc326fca2b
"x86, cpu: Clean up and unify the NOP selection infrastructure" removed
some hacky 5-byte instruction stuff, while commit d430d3d7e6 "jump
label: Introduce static_branch() interface" renamed HAVE_JUMP_LABEL to
CONFIG_JUMP_LABEL in the code that went away)
Since format string handling is part of request_module, there is no
need to construct the module name. As such, drop the redundant sprintf
and heap usage.
Signed-off-by: Kees Cook <kees.cook@canonical.com>
Signed-off-by: Dave Jones <davej@redhat.com>
When a CPU is taken offline in an SMP system, cpufreq_remove_dev()
nulls out the per-cpu policy before cpufreq_stats_free_table() can
make use of it. cpufreq_stats_free_table() then skips the
call to sysfs_remove_group(), leaving about 100 bytes of sysfs-related
memory unclaimed each time a CPU-removal occurs. Break up
cpu_stats_free_table into sysfs and table portions, and
call the sysfs portion early.
Signed-off-by: Steven Finney <steven.finney@palm.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org
With dynamic debug having gained the capability to report debug messages
also during the boot process, it offers a far superior interface for
debug messages than the custom cpufreq infrastructure. As a first step,
remove the old cpufreq_debug_printk() function and replace it with a call
to the generic pr_debug() function.
How can dynamic debug be used on cpufreq? You need a kernel which has
CONFIG_DYNAMIC_DEBUG enabled.
To enabled debugging during runtime, mount debugfs and
$ echo -n 'module cpufreq +p' > /sys/kernel/debug/dynamic_debug/control
for debugging the complete "cpufreq" module. To achieve the same goal during
boot, append
ddebug_query="module cpufreq +p"
as a boot parameter to the kernel of your choice.
For more detailled instructions, please see
Documentation/dynamic-debug-howto.txt
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
When we discover CPUs that are affected by each other's
frequency/voltage transitions, the first CPU gets a sysfs directory
created, and rest of the siblings get symlinks. Currently, when we
hotplug off only the first CPU, all of the symlinks and the sysfs
directory gets removed. Even though rest of the siblings are still
online and functional, they are orphaned, and no longer governed by
cpufreq.
This patch, given the above scenario, creates a sysfs directory for
the first sibling and symlinks for the rest of the siblings.
Please note the recursive call, it was rather too ugly to roll it
out. And the removal of redundant NULL setting (it is already taken
care of near the top of the function).
Signed-off-by: Jacob Shin <jacob.shin@amd.com>
Acked-by: Mark Langsdorf <mark.langsdorf@amd.com>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
Cc: stable@kernel.org
The cpufreq subsystem uses sysdev suspend and resume for
executing cpufreq_suspend() and cpufreq_resume(), respectively,
during system suspend, after interrupts have been switched off on the
boot CPU, and during system resume, while interrupts are still off on
the boot CPU. In both cases the other CPUs are off-line at the
relevant point (either they have been switched off via CPU hotplug
during suspend, or they haven't been switched on yet during resume).
For this reason, although it may seem that cpufreq_suspend() and
cpufreq_resume() are executed for all CPUs in the system, they are
only called for the boot CPU in fact, which is quite confusing.
To remove the confusion and to prepare for elimiating sysdev
suspend and resume operations from the kernel enirely, convernt
cpufreq to using a struct syscore_ops object for the boot CPU
suspend and resume and rename the callbacks so that their names
reflect their purpose. In addition, put some explanatory remarks
into their kerneldoc comments.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
None of the existing cpufreq drivers uses the second argument of
its .suspend() callback (which isn't useful anyway), so remove it.
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Signed-off-by: Dave Jones <davej@redhat.com>
There cannot be any concurrent access to these through
different cpu sysfs files anymore, because these tunables
are now all global (not per cpu).
I still have some doubts whether some of these locks
were needed at all. Anyway, let's get rid of them.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
CC: cpufreq@vger.kernel.org
Marked deprecated for quite a whilte now...
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
CC: cpufreq@vger.kernel.org
Marked deprecated for quite a while now...
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
CC: cpufreq@vger.kernel.org
calculate ondemand delay after dbs_check_cpu call because it can
modify rate_mult value
use freq_lo_jiffies value for the sub sample period of powersave_bias mode
Signed-off-by: Vincent Guittot <vincent.guittot@linaro.org>
Signed-off-by: Dave Jones <davej@redhat.com>
* 'for-2.6.39' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq:
workqueue: fix build failure introduced by s/freezeable/freezable/
workqueue: add system_freezeable_wq
rds/ib: use system_wq instead of rds_ib_fmr_wq
net/9p: replace p9_poll_task with a work
net/9p: use system_wq instead of p9_mux_wq
xfs: convert to alloc_workqueue()
reiserfs: make commit_wq use the default concurrency level
ocfs2: use system_wq instead of ocfs2_quota_wq
ext4: convert to alloc_workqueue()
scsi/scsi_tgt_lib: scsi_tgtd isn't used in memory reclaim path
scsi/be2iscsi,qla2xxx: convert to alloc_workqueue()
misc/iwmc3200top: use system_wq instead of dedicated workqueues
i2o: use alloc_workqueue() instead of create_workqueue()
acpi: kacpi*_wq don't need WQ_MEM_RECLAIM
fs/aio: aio_wq isn't used in memory reclaim path
input/tps6507x-ts: use system_wq instead of dedicated workqueue
cpufreq: use system_wq instead of dedicated workqueues
wireless/ipw2x00: use system_wq instead of dedicated workqueues
arm/omap: use system_wq in mailbox
workqueue: use WQ_MEM_RECLAIM instead of WQ_RESCUER
cpufreq_register_driver sets cpufreq_driver to a structure owned (and
placed) in the caller's memory. If cpufreq policy fails in its ->init
function, sysdev_driver_register returns nonzero in
cpufreq_register_driver. Now, cpufreq_register_driver returns an error
without setting cpufreq_driver back to NULL.
Usually cpufreq policy modules are unloaded because they propagate the
error to the module init function and return that.
So a later access to any member of cpufreq_driver causes bugs like:
BUG: unable to handle kernel paging request at ffffffffa00270a0
IP: [<ffffffff8145eca3>] cpufreq_cpu_get+0x53/0xe0
PGD 1805067 PUD 1809063 PMD 1c3f90067 PTE 0
Oops: 0000 [#1] SMP
last sysfs file: /sys/devices/virtual/net/tun0/statistics/collisions
CPU 0
Modules linked in: ...
Pid: 5677, comm: thunderbird-bin Tainted: G W 2.6.38-rc4-mm1_64+ #1389 To be filled by O.E.M./To Be Filled By O.E.M.
RIP: 0010:[<ffffffff8145eca3>] [<ffffffff8145eca3>] cpufreq_cpu_get+0x53/0xe0
RSP: 0018:ffff8801aec37d98 EFLAGS: 00010086
RAX: 0000000000000202 RBX: 0000000000000000 RCX: 0000000000000001
RDX: ffffffffa00270a0 RSI: 0000000000001000 RDI: ffffffff8199ece8
...
Call Trace:
[<ffffffff8145f490>] cpufreq_quick_get+0x10/0x30
[<ffffffff8103f12b>] show_cpuinfo+0x2ab/0x300
[<ffffffff81136292>] seq_read+0xf2/0x3f0
[<ffffffff8126c5d3>] ? __strncpy_from_user+0x33/0x60
[<ffffffff8116850d>] proc_reg_read+0x6d/0xa0
[<ffffffff81116e53>] vfs_read+0xc3/0x180
[<ffffffff81116f5c>] sys_read+0x4c/0x90
[<ffffffff81030dbb>] system_call_fastpath+0x16/0x1b
...
It's all cause by weird fail path handling in cpufreq_register_driver.
To fix that, shuffle the code to do proper handling with gotos.
Signed-off-by: Jiri Slaby <jslaby@suse.cz>
Signed-off-by: Dave Jones <davej@redhat.com>
With cmwq, there's no reason for cpufreq drivers to use separate
workqueues. Remove the dedicated workqueues from cpufreq_conservative
and cpufreq_ondemand and use system_wq instead. The work items are
already sync canceled on stop, so it's already guaranteed that no work
is running on module exit.
Signed-off-by: Tejun Heo <tj@kernel.org>
Acked-by: Dave Jones <davej@redhat.com>
Cc: cpufreq@vger.kernel.org
The meaning of CONFIG_EMBEDDED has long since been obsoleted; the option
is used to configure any non-standard kernel with a much larger scope than
only small devices.
This patch renames the option to CONFIG_EXPERT in init/Kconfig and fixes
references to the option throughout the kernel. A new CONFIG_EMBEDDED
option is added that automatically selects CONFIG_EXPERT when enabled and
can be used in the future to isolate options that should only be
considered for embedded systems (RISC architectures, SLOB, etc).
Calling the option "EXPERT" more accurately represents its intention: only
expert users who understand the impact of the configuration changes they
are making should enable it.
Reviewed-by: Ingo Molnar <mingo@elte.hu>
Acked-by: David Woodhouse <david.woodhouse@intel.com>
Signed-off-by: David Rientjes <rientjes@google.com>
Cc: Greg KH <gregkh@suse.de>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Robin Holt <holt@sgi.com>
Cc: <linux-arch@vger.kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add these new power trace events:
power:cpu_idle
power:cpu_frequency
power:machine_suspend
The old C-state/idle accounting events:
power:power_start
power:power_end
Have now a replacement (but we are still keeping the old
tracepoints for compatibility):
power:cpu_idle
and
power:power_frequency
is replaced with:
power:cpu_frequency
power:machine_suspend is newly introduced.
Jean Pihet has a patch integrated into the generic layer
(kernel/power/suspend.c) which will make use of it.
the type= field got removed from both, it was never
used and the type is differed by the event type itself.
perf timechart userspace tool gets adjusted in a separate patch.
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Acked-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Jean Pihet <jean.pihet@newoldbits.com>
Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: rjw@sisk.pl
LKML-Reference: <1294073445-14812-3-git-send-email-trenn@suse.de>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
LKML-Reference: <1290072314-31155-2-git-send-email-trenn@suse.de>
Adds a new global tunable, sampling_down_factor. Set to 1 it makes no
changes from existing behavior, but set to greater than 1 (e.g. 100)
it acts as a multiplier for the scheduling interval for reevaluating
load when the CPU is at its top speed due to high load. This improves
performance by reducing the overhead of load evaluation and helping
the CPU stay at its top speed when truly busy, rather than shifting
back and forth in speed. This tunable has no effect on behavior at
lower speeds/lower CPU loads.
This patch is against 2.6.36-rc6.
This patch should help solve kernel bug 19672 "ondemand is slow".
Signed-off-by: David Niemi <dniemi@verisign.com>
Acked-by: Venkatesh Pallipadi <venki@google.com>
CC: Daniel Hollocher <danielhollocher@gmail.com>
CC: <cpufreq-list@vger.kernel.org>
CC: <linux-kernel@vger.kernel.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Indent the body of for_each_cpu.
The semantic match that finds this problem is as follows:
(http://coccinelle.lip6.fr/)
// <smpl>
@r disable braces4@
position p1,p2;
statement S1,S2;
@@
(
if (...) { ... }
|
if (...) S1@p1 S2@p2
)
@script:python@
p1 << r.p1;
p2 << r.p2;
@@
if (p1[0].column == p2[0].column):
cocci.print_main("branch",p1)
cocci.print_secs("after",p2)
// </smpl>
Signed-off-by: Julia Lawall <julia@diku.dk>
Signed-off-by: Dave Jones <davej@redhat.com>
This patch fixes up a brace warning found by the checkpatch.pl tool
Signed-off-by: Neal Buckendahl <nealb001@tbcnet.com>
Signed-off-by: Dave Jones <davej@redhat.com>
and fix the broken case if a core's frequency depends on others.
trace_power_frequency was only implemented in a rather ungeneric way
in acpi-cpufreq driver's target() function only.
-> Move the call to trace_power_frequency to
cpufreq.c:cpufreq_notify_transition() where CPUFREQ_POSTCHANGE
notifier is triggered.
This will support power frequency tracing by all cpufreq drivers
trace_power_frequency did not trace frequency changes correctly when
the userspace governor was used or when CPU cores' frequency depend
on each other.
-> Moving this into the CPUFREQ_POSTCHANGE notifier and pass the cpu
which gets switched automatically fixes this.
Robert Schoene provided some important fixes on top of my initial
quick shot version which are integrated in this patch:
- Forgot some changes in power_end trace (TP_printk/variable names)
- Variable dummy in power_end must now be cpu_id
- Use static 64 bit variable instead of unsigned int for cpu_id
Signed-off-by: Thomas Renninger <trenn@suse.de>
CC: davej@redhat.com
CC: arjan@infradead.org
CC: linux-kernel@vger.kernel.org
CC: robert.schoene@tu-dresden.de
Tested-by: robert.schoene@tu-dresden.de
Signed-off-by: Dave Jones <davej@redhat.com>
For UP systems this is not required, and results in a more consistent
sample interval.
[akpm@linux-foundation.org: coding-style fixes]
Signed-off-by: Jocelyn Falempe <jocelyn.falempe@motorola.com>
Signed-off-by: Mike Chan <mike@android.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
lock_policy_rwsem_* and unlock_policy_rwsem_* functions are scheduled
to be unexported when 2.6.33. Now there are no other callers of them
out of cpufreq.c, unexport them and make them static.
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
Make simpler to read and call.
*** v3 - Always call when powersave_bias is enabled.
Acked-by: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Mike Chan <mike@android.com>
Signed-off-by: Dave Jones <davej@redhat.com>
395913d0b1 ("[CPUFREQ] remove rwsem lock
from CPUFREQ_GOV_STOP call (second call site)") is not needed, because
there is no rwsem lock in cpufreq_ondemand and cpufreq_conservative
anymore. Lock should not be released until the work done.
Addresses https://bugzilla.kernel.org/show_bug.cgi?id=1594
Signed-off-by: Andrej Gelenberg <andrej.gelenberg@udo.edu>
Cc: Mathieu Desnoyers <mathieu.desnoyers@polymtl.ca>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Signed-off-by: Dave Jones <davej@redhat.com>
* 'x86-cpu-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
x86, hypervisor: add missing <linux/module.h>
Modify the VMware balloon driver for the new x86_hyper API
x86, hypervisor: Export the x86_hyper* symbols
x86: Clean up the hypervisor layer
x86, HyperV: fix up the license to mshyperv.c
x86: Detect running on a Microsoft HyperV system
x86, cpu: Make APERF/MPERF a normal table-driven flag
x86, k8: Fix build error when K8_NB is disabled
x86, cacheinfo: Disable index in all four subcaches
x86, cacheinfo: Make L3 cache info per node
x86, cacheinfo: Reorganize AMD L3 cache structure
x86, cacheinfo: Turn off L3 cache index disable feature in virtualized environments
x86, cacheinfo: Unify AMD L3 cache index disable checking
cpufreq: Unify sysfs attribute definition macros
powernow-k8: Fix frequency reporting
x86, cpufreq: Add APERF/MPERF support for AMD processors
x86: Unify APERF/MPERF support
powernow-k8: Add core performance boost support
x86, cpu: Add AMD core boosting feature flag to /proc/cpuinfo
Fix up trivial conflicts in arch/x86/kernel/cpu/intel_cacheinfo.c and
drivers/cpufreq/cpufreq_ondemand.c
Pavel Machek pointed out that not all CPUs have an efficient
idle at high frequency. Specifically, older Intel and various
AMD cpus would get a higher powerusage when copying files from
USB.
Mike Chan pointed out that the same is true for various ARM
chips as well.
Thomas Renninger suggested to make this a sysfs tunable with a
reasonable default.
This patch adds a sysfs tunable for the new behavior, and uses
a very simple function to determine a reasonable default,
depending on the CPU vendor/type.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Acked-by: Rik van Riel <riel@redhat.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: davej@redhat.com
LKML-Reference: <20100509082651.46914d04@infradead.org>
[ minor tidyup ]
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The ondemand cpufreq governor uses CPU busy time (e.g. not-idle
time) as a measure for scaling the CPU frequency up or down.
If the CPU is busy, the CPU frequency scales up, if it's idle,
the CPU frequency scales down. Effectively, it uses the CPU busy
time as proxy variable for the more nebulous "how critical is
performance right now" question.
This algorithm falls flat on its face in the light of workloads
where you're alternatingly disk and CPU bound, such as the ever
popular "git grep", but also things like startup of programs and
maildir using email clients... much to the chagarin of Andrew
Morton.
This patch changes the ondemand algorithm to count iowait time
as busy, not idle, time. As shown in the breakdown cases above,
iowait is performance critical often, and by counting iowait,
the proxy variable becomes a more accurate representation of the
"how critical is performance" question.
The problem and fix are both verified with the "perf timechar"
tool.
Signed-off-by: Arjan van de Ven <arjan@linux.intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Dave Jones <davej@redhat.com>
Reviewed-by: Rik van Riel <riel@redhat.com>
Acked-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
LKML-Reference: <20100509082606.3d9f00d0@infradead.org>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
* 'fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/davej/cpufreq:
[CPUFREQ] use max load in conservative governor
[CPUFREQ] fix a lockdep warning
Multiple modules used to define those which are with identical
functionality and were needlessly replicated among the different cpufreq
drivers. Push them into the header and remove duplication.
Signed-off-by: Borislav Petkov <borislav.petkov@amd.com>
LKML-Reference: <1270065406-1814-7-git-send-email-bp@amd64.org>
Reviewed-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: H. Peter Anvin <hpa@zytor.com>
Instead of using the load of the last CPU in a package, use the
maximum load of all CPUs in a package.
Reported-by: Jean-Christian Goussard <jeanchristian.goussard@sfr.fr>
Signed-off-by: Dominik Brodowski <linux@dominikbrodowski.net>
Signed-off-by: Dave Jones <davej@redhat.com>
There is no need to do sysfs_remove_link() or kobject_put() etc.
when policy_rwsem_write is held, move them after releasing the lock.
This fixes the lockdep warning:
halt/4071 is trying to acquire lock:
(s_active){++++.+}, at: [<c0000000001ef868>] .sysfs_addrm_finish+0x58/0xc0
but task is already holding lock:
(&per_cpu(cpu_policy_rwsem, cpu)){+.+.+.}, at: [<c0000000004cd6ac>] .lock_policy_rwsem_write+0x84/0xf4
Reported-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: WANG Cong <amwang@redhat.com>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: Venkatesh Pallipadi <venkatesh.pallipadi@intel.com>
Signed-off-by: Dave Jones <davej@redhat.com>
percpu.h is included by sched.h and module.h and thus ends up being
included when building most .c files. percpu.h includes slab.h which
in turn includes gfp.h making everything defined by the two files
universally available and complicating inclusion dependencies.
percpu.h -> slab.h dependency is about to be removed. Prepare for
this change by updating users of gfp and slab facilities include those
headers directly instead of assuming availability. As this conversion
needs to touch large number of source files, the following script is
used as the basis of conversion.
http://userweb.kernel.org/~tj/misc/slabh-sweep.py
The script does the followings.
* Scan files for gfp and slab usages and update includes such that
only the necessary includes are there. ie. if only gfp is used,
gfp.h, if slab is used, slab.h.
* When the script inserts a new include, it looks at the include
blocks and try to put the new include such that its order conforms
to its surrounding. It's put in the include block which contains
core kernel includes, in the same order that the rest are ordered -
alphabetical, Christmas tree, rev-Xmas-tree or at the end if there
doesn't seem to be any matching order.
* If the script can't find a place to put a new include (mostly
because the file doesn't have fitting include block), it prints out
an error message indicating which .h file needs to be added to the
file.
The conversion was done in the following steps.
1. The initial automatic conversion of all .c files updated slightly
over 4000 files, deleting around 700 includes and adding ~480 gfp.h
and ~3000 slab.h inclusions. The script emitted errors for ~400
files.
2. Each error was manually checked. Some didn't need the inclusion,
some needed manual addition while adding it to implementation .h or
embedding .c file was more appropriate for others. This step added
inclusions to around 150 files.
3. The script was run again and the output was compared to the edits
from #2 to make sure no file was left behind.
4. Several build tests were done and a couple of problems were fixed.
e.g. lib/decompress_*.c used malloc/free() wrappers around slab
APIs requiring slab.h to be added manually.
5. The script was run on all .h files but without automatically
editing them as sprinkling gfp.h and slab.h inclusions around .h
files could easily lead to inclusion dependency hell. Most gfp.h
inclusion directives were ignored as stuff from gfp.h was usually
wildly available and often used in preprocessor macros. Each
slab.h inclusion directive was examined and added manually as
necessary.
6. percpu.h was updated not to include slab.h.
7. Build test were done on the following configurations and failures
were fixed. CONFIG_GCOV_KERNEL was turned off for all tests (as my
distributed build env didn't work with gcov compiles) and a few
more options had to be turned off depending on archs to make things
build (like ipr on powerpc/64 which failed due to missing writeq).
* x86 and x86_64 UP and SMP allmodconfig and a custom test config.
* powerpc and powerpc64 SMP allmodconfig
* sparc and sparc64 SMP allmodconfig
* ia64 SMP allmodconfig
* s390 SMP allmodconfig
* alpha SMP allmodconfig
* um on x86_64 SMP allmodconfig
8. percpu.h modifications were reverted so that it could be applied as
a separate patch and serve as bisection point.
Given the fact that I had only a couple of failures from tests on step
6, I'm fairly confident about the coverage of this conversion patch.
If there is a breakage, it's likely to be something in one of the arch
headers which should be easily discoverable easily on most builds of
the specific arch.
Signed-off-by: Tejun Heo <tj@kernel.org>
Guess-its-ok-by: Christoph Lameter <cl@linux-foundation.org>
Cc: Ingo Molnar <mingo@redhat.com>
Cc: Lee Schermerhorn <Lee.Schermerhorn@hp.com>
Constify struct sysfs_ops.
This is part of the ops structure constification
effort started by Arjan van de Ven et al.
Benefits of this constification:
* prevents modification of data that is shared
(referenced) by many other structure instances
at runtime
* detects/prevents accidental (but not intentional)
modification attempts on archs that enforce
read-only kernel data at runtime
* potentially better optimized code as the compiler
can assume that the const data cannot be changed
* the compiler/linker move const data into .rodata
and therefore exclude them from false sharing
Signed-off-by: Emese Revfy <re.emese@gmail.com>
Acked-by: David Teigland <teigland@redhat.com>
Acked-by: Matt Domsch <Matt_Domsch@dell.com>
Acked-by: Maciej Sosnowski <maciej.sosnowski@intel.com>
Acked-by: Hans J. Koch <hjk@linutronix.de>
Acked-by: Pekka Enberg <penberg@cs.helsinki.fi>
Acked-by: Jens Axboe <jens.axboe@oracle.com>
Acked-by: Stephen Hemminger <shemminger@vyatta.com>
Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
Dominik said:
target_freq cannot be below policy->min or above policy->max.
If it were, the whole cpufreq subsystem is broken.
But (answer):
I think the "ondemand" governor can ask for a target frequency that is
below policy->min.
...
A patch such as below may be needed to sanitize the target frequency
requested by "ondemand". The "conservative" governor already has this check:
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>
# diff -bur x/drivers/cpufreq/cpufreq_ondemand.c.orig y/drivers/cpufreq/cpufreq_ondemand.c
* 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/percpu: (34 commits)
m68k: rename global variable vmalloc_end to m68k_vmalloc_end
percpu: add missing per_cpu_ptr_to_phys() definition for UP
percpu: Fix kdump failure if booted with percpu_alloc=page
percpu: make misc percpu symbols unique
percpu: make percpu symbols in ia64 unique
percpu: make percpu symbols in powerpc unique
percpu: make percpu symbols in x86 unique
percpu: make percpu symbols in xen unique
percpu: make percpu symbols in cpufreq unique
percpu: make percpu symbols in oprofile unique
percpu: make percpu symbols in tracer unique
percpu: make percpu symbols under kernel/ and mm/ unique
percpu: remove some sparse warnings
percpu: make alloc_percpu() handle array types
vmalloc: fix use of non-existent percpu variable in put_cpu_var()
this_cpu: Use this_cpu_xx in trace_functions_graph.c
this_cpu: Use this_cpu_xx for ftrace
this_cpu: Use this_cpu_xx in nmi handling
this_cpu: Use this_cpu operations in RCU
this_cpu: Use this_cpu ops for VM statistics
...
Fix up trivial (famous last words) global per-cpu naming conflicts in
arch/x86/kvm/svm.c
mm/slab.c
This interface is mainly intended (and implemented) for ACPI _PPC BIOS
frequency limitations, but other cpufreq drivers can also use it for
similar use-cases.
Why is this needed:
Currently it's not obvious why cpufreq got limited.
People see cpufreq/scaling_max_freq reduced, but this could have
happened by:
- any userspace prog writing to scaling_max_freq
- thermal limitations
- hardware (_PPC in ACPI case) limitiations
Therefore export bios_limit (in kHz) to:
- Point the user that it's the BIOS (broken or intended) which limits
frequency
- Export it as a sysfs interface for userspace progs.
While this was a rarely used feature on laptops, there will appear
more and more server implemenations providing "Green IT" features like
allowing the service processor to limit the frequency. People want
to know about HW/BIOS frequency limitations.
All ACPI P-state driven cpufreq drivers are covered with this patch:
- powernow-k8
- powernow-k7
- acpi-cpufreq
Tested with a patched DSDT which limits the first two cores (_PPC returns 1)
via _PPC, exposed by bios_limit:
# echo 2200000 >cpu2/cpufreq/scaling_max_freq
# cat cpu*/cpufreq/scaling_max_freq
2600000
2600000
2200000
2200000
# #scaling_max_freq shows general user/thermal/BIOS limitations
# cat cpu*/cpufreq/bios_limit
2600000
2600000
2800000
2800000
# #bios_limit only shows the HW/BIOS limitation
CC: Pallipadi Venkatesh <venkatesh.pallipadi@intel.com>
CC: Len Brown <lenb@kernel.org>
CC: davej@codemonkey.org.uk
CC: linux@dominikbrodowski.net
Signed-off-by: Thomas Renninger <trenn@suse.de>
Signed-off-by: Dave Jones <davej@redhat.com>