Static and extern variables in kernel/power/hibernate.c need not be
initialized to 0 explicitly, so remove those initializations.
[rjw: Modified subject, added changelog.]
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
TASK_KILLABLE is often used to put tasks to sleep for quite some time.
One of the most common uses is to put tasks to sleep while waiting for
replies from a server on a networked filesystem (such as CIFS or NFS).
Unfortunately, fake_signal_wake_up does not currently wake up tasks
that are sleeping in TASK_KILLABLE state. This means that even if the
code were in place to allow them to freeze while in this sleep, it
wouldn't work anyway.
This patch changes this function to wake tasks in this state as well.
This should be harmless -- if the code doing the sleeping doesn't have
handling to deal with freezer events, it should just go back to sleep.
If it does, then this will allow that code to do the right thing.
Signed-off-by: Jeff Layton <jlayton@redhat.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Patch "PM / Hibernate: Add resumewait param to support MMC-like
devices as resume file" added the resumewait kernel command line
option. The present patch adds resumedelay so that
resumewait/delay were analogous to rootwait/delay.
[rjw: Modified the subject and changelog slightly.]
Signed-off-by: Barry Song <baohua.song@csr.com>
Acked-by: Pavel Machek <pavel@ucw.cz>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Some devices like MMC are async detected very slow. For example,
drivers/mmc/host/sdhci.c launches a 200ms delayed work to detect
MMC partitions then add disk.
We have wait_for_device_probe() and scsi_complete_async_scans()
before calling swsusp_check(), but it is not enough to wait for MMC.
This patch adds resumewait kernel param just like rootwait so
that we have enough time to wait until MMC is ready. The difference is
that we wait for resume partition whereas rootwait waits for rootfs
partition (which may be on a different device).
This patch will make hibernation support many embedded products
without SCSI devices, but with devices like MMC.
[rjw: Modified the changelog slightly.]
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Reviewed-by: Valdis Kletnieks <valdis.kletnieks@vt.edu>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Fix a typo in a function name in the kerneldoc comment next to
resume_target_kernel().
[rjw: Changed the subject slightly, added the changelog.]
Signed-off-by: Barry Song <Baohua.Song@csr.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
There is a problem with the current ordering of hibernate code which
leads to deadlocks in some filesystems' memory shrinkers. Namely,
some filesystems use freezable kernel threads that are inactive when
the hibernate memory preallocation is carried out. Those same
filesystems use memory shrinkers that may be triggered by the
hibernate memory preallocation. If those memory shrinkers wait for
the frozen kernel threads, the hibernate process deadlocks (this
happens with XFS, for one example).
Apparently, it is not technically viable to redesign the filesystems
in question to avoid the situation described above, so the only
possible solution of this issue is to defer the freezing of kernel
threads until the hibernate memory preallocation is done, which is
implemented by this change.
Unfortunately, this requires the memory preallocation to be done
before the "prepare" stage of device freeze, so after this change the
only way drivers can allocate additional memory for their freeze
routines in a clean way is to use PM notifiers.
Reported-by: Christoph <cr2005@u-club.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Introduce the config option CONFIG_VT_CONSOLE_SLEEP in order to cleanup
the #if defined ugliness for the vt suspend support functions. Note that
CONFIG_VT_CONSOLE is already dependant on CONFIG_VT.
The function pm_set_vt_switch is actually dependant on CONFIG_VT and not
CONFIG_PM_SLEEP. This fixes a compile error when CONFIG_PM_SLEEP is
not set:
drivers/tty/vt/vt_ioctl.c:1794: error: redefinition of 'pm_set_vt_switch'
include/linux/suspend.h:17: error: previous definition of 'pm_set_vt_switch' was here
Also, remove the incorrect path from the comment in console.c.
[rjw: Replaced #if defined() with #ifdef in suspend.h.]
Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com>
Acked-by: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
In enter_state() we use "state" as an offset for the pm_states[]
array. The pm_states[] array only has PM_SUSPEND_MAX elements so
this test is off by one.
Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Cc: stable@kernel.org
For s390 there is one additional byte associated with each page,
the storage key. This byte contains the referenced and changed
bits and needs to be included into the hibernation image.
If the storage keys are not restored to their previous state all
original pages would appear to be dirty. This can cause
inconsistencies e.g. with read-only filesystems.
Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Record S3 failure time about each reason and the latest two failed
devices' names in S3 progress.
We can check it through 'suspend_stats' entry in debugfs.
The motivation of the patch:
We are enabling power features on Medfield. Comparing with PC/notebook,
a mobile enters/exits suspend-2-ram (we call it s3 on Medfield) far
more frequently. If it can't enter suspend-2-ram in time, the power
might be used up soon.
We often find sometimes, a device suspend fails. Then, system retries
s3 over and over again. As display is off, testers and developers
don't know what happens.
Some testers and developers complain they don't know if system
tries suspend-2-ram, and what device fails to suspend. They need
such info for a quick check. The patch adds suspend_stats under
debugfs for users to check suspend to RAM statistics quickly.
If not using this patch, we have other methods to get info about
what device fails. One is to turn on CONFIG_PM_DEBUG, but users
would get too much info and testers need recompile the system.
In addition, dynamic debug is another good tool to dump debug info.
But it still doesn't match our utilization scenario closely.
1) user need write a user space parser to process the syslog output;
2) Our testing scenario is we leave the mobile for at least hours.
Then, check its status. No serial console available during the
testing. One is because console would be suspended, and the other
is serial console connecting with spi or HSU devices would consume
power. These devices are powered off at suspend-2-ram.
Signed-off-by: ShuoX Liu <shuox.liu@intel.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* pm-runtime:
PM / Tracing: build rpm-traces.c only if CONFIG_PM_RUNTIME is set
PM / Runtime: Replace dev_dbg() with trace_rpm_*()
PM / Runtime: Introduce trace points for tracing rpm_* functions
PM / Runtime: Don't run callbacks under lock for power.irq_safe set
USB: Add wakeup info to debugging messages
PM / Runtime: pm_runtime_idle() can be called in atomic context
PM / Runtime: Add macro to test for runtime PM events
PM / Runtime: Add might_sleep() to runtime PM functions
* 'irq-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
irq: Fix check for already initialized irq_domain in irq_domain_add
irq: Add declaration of irq_domain_simple_ops to irqdomain.h
* 'x86-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
x86/rtc: Don't recursively acquire rtc_lock
* 'sched-urgent-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
posix-cpu-timers: Cure SMP wobbles
sched: Fix up wchan borkage
sched/rt: Migrate equal priority tasks to available CPUs
David reported:
Attached below is a watered-down version of rt/tst-cpuclock2.c from
GLIBC. Just build it with "gcc -o test test.c -lpthread -lrt" or
similar.
Run it several times, and you will see cases where the main thread
will measure a process clock difference before and after the nanosleep
which is smaller than the cpu-burner thread's individual thread clock
difference. This doesn't make any sense since the cpu-burner thread
is part of the top-level process's thread group.
I've reproduced this on both x86-64 and sparc64 (using both 32-bit and
64-bit binaries).
For example:
[davem@boricha build-x86_64-linux]$ ./test
process: before(0.001221967) after(0.498624371) diff(497402404)
thread: before(0.000081692) after(0.498316431) diff(498234739)
self: before(0.001223521) after(0.001240219) diff(16698)
[davem@boricha build-x86_64-linux]$
The diff of 'process' should always be >= the diff of 'thread'.
I make sure to wrap the 'thread' clock measurements the most tightly
around the nanosleep() call, and that the 'process' clock measurements
are the outer-most ones.
---
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>
#include <pthread.h>
static pthread_barrier_t barrier;
static void *chew_cpu(void *arg)
{
pthread_barrier_wait(&barrier);
while (1)
__asm__ __volatile__("" : : : "memory");
return NULL;
}
int main(void)
{
clockid_t process_clock, my_thread_clock, th_clock;
struct timespec process_before, process_after;
struct timespec me_before, me_after;
struct timespec th_before, th_after;
struct timespec sleeptime;
unsigned long diff;
pthread_t th;
int err;
err = clock_getcpuclockid(0, &process_clock);
if (err)
return 1;
err = pthread_getcpuclockid(pthread_self(), &my_thread_clock);
if (err)
return 1;
pthread_barrier_init(&barrier, NULL, 2);
err = pthread_create(&th, NULL, chew_cpu, NULL);
if (err)
return 1;
err = pthread_getcpuclockid(th, &th_clock);
if (err)
return 1;
pthread_barrier_wait(&barrier);
err = clock_gettime(process_clock, &process_before);
if (err)
return 1;
err = clock_gettime(my_thread_clock, &me_before);
if (err)
return 1;
err = clock_gettime(th_clock, &th_before);
if (err)
return 1;
sleeptime.tv_sec = 0;
sleeptime.tv_nsec = 500000000;
nanosleep(&sleeptime, NULL);
err = clock_gettime(th_clock, &th_after);
if (err)
return 1;
err = clock_gettime(my_thread_clock, &me_after);
if (err)
return 1;
err = clock_gettime(process_clock, &process_after);
if (err)
return 1;
diff = process_after.tv_nsec - process_before.tv_nsec;
printf("process: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
process_before.tv_sec, process_before.tv_nsec,
process_after.tv_sec, process_after.tv_nsec, diff);
diff = th_after.tv_nsec - th_before.tv_nsec;
printf("thread: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
th_before.tv_sec, th_before.tv_nsec,
th_after.tv_sec, th_after.tv_nsec, diff);
diff = me_after.tv_nsec - me_before.tv_nsec;
printf("self: before(%lu.%.9lu) after(%lu.%.9lu) diff(%lu)\n",
me_before.tv_sec, me_before.tv_nsec,
me_after.tv_sec, me_after.tv_nsec, diff);
return 0;
}
This is due to us using p->se.sum_exec_runtime in
thread_group_cputime() where we iterate the thread group and sum all
data. This does not take time since the last schedule operation (tick
or otherwise) into account. We can cure this by using
task_sched_runtime() at the cost of having to take locks.
This also means we can (and must) do away with
thread_group_sched_runtime() since the modified thread_group_cputime()
is now more accurate and would deadlock when called from
thread_group_sched_runtime().
Aside of that it makes the function safe on 32 bit systems. The old
code added t->se.sum_exec_runtime unprotected. sum_exec_runtime is a
64bit value and could be changed on another cpu at the same time.
Reported-by: David Miller <davem@davemloft.net>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org
Link: http://lkml.kernel.org/r/1314874459.7945.22.camel@twins
Tested-by: David Miller <davem@davemloft.net>
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
__find_resource() incorrectly returns a resource window which overlaps
an existing allocated window. This happens when the parent's
resource-window spans 0x00000000 to 0xffffffff and is entirely allocated
to all its children resource-windows.
__find_resource() looks for gaps in resource allocation among the
children resource windows. When it encounters the last child window it
blindly tries the range next to one allocated to the last child. Since
the last child's window ends at 0xffffffff the calculation overflows,
leading the algorithm to believe that any window in the range 0x0000000
to 0xfffffff is available for allocation. This leads to a conflicting
window allocation.
Michal Ludvig reported this issue seen on his platform. The following
patch fixes the problem and has been verified by Michal. I believe this
bug has been there for ages. It got exposed by git commit 2bbc694227
("PCI : ability to relocate assigned pci-resources")
Signed-off-by: Ram Pai <linuxram@us.ibm.com>
Tested-by: Michal Ludvig <mludvig@logix.net.nz>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Do not build kernel/trace/rpm-traces.c if CONFIG_PM_RUNTIME is not
set, which avoids a build failure.
[rjw: Added the changelog and modified the subject slightly.]
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
This patch introduces 3 trace points to prepare for tracing
rpm_idle/rpm_suspend/rpm_resume functions, so we can use these
trace points to replace the current dev_dbg().
Signed-off-by: Ming Lei <ming.lei@canonical.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Commit c259e01a1e ("sched: Separate the scheduler entry for
preemption") contained a boo-boo wrecking wchan output. It forgot to
put the new schedule() function in the __sched section and thereby
doesn't get properly ignored for things like wchan.
Tested-by: Simon Kirby <sim@hostway.ca>
Cc: stable@kernel.org # 2.6.39+
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110923000346.GA25425@hostway.ca
Signed-off-by: Ingo Molnar <mingo@elte.hu>
If PTRACE_LISTEN fails after lock_task_sighand() it doesn't drop ->siglock.
Reported-by: Matt Fleming <matt.fleming@intel.com>
Signed-off-by: Oleg Nesterov <oleg@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'irq-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
x86, iommu: Mark DMAR IRQ as non-threaded
genirq: Make irq_shutdown() symmetric vs. irq_startup again
Even with just the interface limited to admin, there really is little to
reason to give byte-per-byte counts for taskstats. So round it down to
something less intrusive.
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Ok, this isn't optimal, since it means that 'iotop' needs admin
capabilities, and we may have to work on this some more. But at the
same time it is very much not acceptable to let anybody just read
anybody elses IO statistics quite at this level.
Use of the GENL_ADMIN_PERM suggested by Johannes Berg as an alternative
to checking the capabilities by hand.
Reported-by: Vasiliy Kulikov <segoon@openwall.com>
Cc: Johannes Berg <johannes.berg@intel.com>
Acked-by: Balbir Singh <bsingharora@gmail.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Commit 43fa5460fe ("sched: Try not to
migrate higher priority RT tasks") also introduced a change in behavior
which keeps RT tasks on the same CPU if there is an equal priority RT
task currently running even if there are empty CPUs available.
This can cause unnecessary wakeup latencies, and can prevent the
scheduler from balancing all RT tasks across available CPUs.
This change causes an RT task to search for a new CPU if an equal
priority RT task is already running on wakeup. Lower priority tasks
will still have to wait on higher priority tasks, but the system should
still balance out because there is always the possibility that if there
are both a high and low priority RT tasks on a given CPU that the high
priority task could wakeup while the low priority task is running and
force it to search for a better runqueue.
Signed-off-by: Shawn Bohrer <sbohrer@rgmadvisors.com>
Acked-by: Steven Rostedt <rostedt@goodmis.org>
Tested-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: stable@kernel.org # 37+
Link: http://lkml.kernel.org/r/1315837684-18733-1-git-send-email-sbohrer@rgmadvisors.com
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Take cwq->gcwq->lock to avoid racing between drain_workqueue checking to
make sure the workqueues are empty and cwq_dec_nr_in_flight decrementing
and then incrementing nr_active when it activates a delayed work.
We discovered this when a corner case in one of our drivers resulted in
us trying to destroy a workqueue in which the remaining work would
always requeue itself again in the same workqueue. We would hit this
race condition and trip the BUG_ON on workqueue.c:3080.
Signed-off-by: Thomas Tuttle <ttuttle@chromium.org>
Acked-by: Tejun Heo <tj@kernel.org>
Cc: <stable@kernel.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
If an irq_chip provides .irq_shutdown(), but neither of .irq_disable() or
.irq_mask(), free_irq() crashes when jumping to NULL.
Fix this by only trying .irq_disable() and .irq_mask() if there's no
.irq_shutdown() provided.
This revives the symmetry with irq_startup(), which tries .irq_startup(),
.irq_enable(), and irq_unmask(), and makes it consistent with the comment for
irq_chip.irq_shutdown() in <linux/irq.h>, which says:
* @irq_shutdown: shut down the interrupt (defaults to ->disable if NULL)
This is also how __free_irq() behaved before the big overhaul, cfr. e.g.
3b56f0585f ("genirq: Remove bogus conditional"),
where the core interrupt code always overrode .irq_shutdown() to
.irq_disable() if .irq_shutdown() was NULL.
Signed-off-by: Geert Uytterhoeven <geert@linux-m68k.org>
Cc: linux-m68k@lists.linux-m68k.org
Link: http://lkml.kernel.org/r/1315742394-16036-2-git-send-email-geert@linux-m68k.org
Cc: stable@kernel.org
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
* 'timers-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
rtc: twl: Fix registration vs. init order
rtc: Initialized rtc_time->tm_isdst
rtc: Fix RTC PIE frequency limit
rtc: rtc-twl: Remove lockdep related local_irq_enable()
rtc: rtc-twl: Switch to using threaded irq
rtc: ep93xx: Fix 'rtc' may be used uninitialized warning
alarmtimers: Avoid possible denial of service with high freq periodic timers
alarmtimers: Memset itimerspec passed into alarm_timer_get
alarmtimers: Avoid possible null pointer traversal
* 'sched-fixes-for-linus' of git://tesla.tglx.de/git/linux-2.6-tip:
sched: Fix a memory leak in __sdt_free()
sched: Move blk_schedule_flush_plug() out of __schedule()
sched: Separate the scheduler entry for preemption
We detected a serious issue with PERF_SAMPLE_READ and
timing information when events were being multiplexing.
Samples would have time_running > time_enabled. That
was easy to reproduce with a libpfm4 example (ran 3
times to cause multiplexing on Core 2):
$ syst_smpl -e uops_retired:freq=1 &
$ syst_smpl -e uops_retired:freq=1 &
$ syst_smpl -e uops_retired:freq=1 &
IIP:0x0000000040062d ... PERIOD:2355332948 ENA=40144625315 RUN=60014875184
syst_smpl: WARNING: time_running > time_enabled
63277537998 uops_retired:freq=1 , scaled
The bug was not present in kernel up to (and including) 3.0. It turns
out the bug was introduced by the following commit:
commit c479429591
events: Move lockless timer calculation into helper function
The parameters of the function got reversed yet the call sites
were not updated to reflect the change. That lead to time_running
and time_enabled being swapped. That had no effect when there was
no multiplexing because in that case time_running = time_enabled
but it would show up in any other scenario.
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110829124112.GA4828@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The current cgroup context switch code was incorrect leading
to bogus counts. Furthermore, as soon as there was an active
cgroup event on a CPU, the context switch cost on that CPU
would increase by a significant amount as demonstrated by a
simple ping/pong example:
$ ./pong
Both processes pinned to CPU1, running for 10s
10684.51 ctxsw/s
Now start a cgroup perf stat:
$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100
$ ./pong
Both processes pinned to CPU1, running for 10s
6674.61 ctxsw/s
That's a 37% penalty.
Note that pong is not even in the monitored cgroup.
The results shown by perf stat are bogus:
$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 100
Performance counter stats for 'sleep 100':
CPU1 <not counted> cycles test
CPU1 16,984,189,138 cycles # 0.000 GHz
The second 'cycles' event should report a count @ CPU clock
(here 2.4GHz) as it is counting across all cgroups.
The patch below fixes the bogus accounting and bypasses any
cgroup switches in case the outgoing and incoming tasks are
in the same cgroup.
With this patch the same test now yields:
$ ./pong
Both processes pinned to CPU1, running for 10s
10775.30 ctxsw/s
Start perf stat with cgroup:
$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10
Run pong outside the cgroup:
$ /pong
Both processes pinned to CPU1, running for 10s
10687.80 ctxsw/s
The penalty is now less than 2%.
And the results for perf stat are correct:
$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10
Performance counter stats for 'sleep 10':
CPU1 <not counted> cycles test # 0.000 GHz
CPU1 23,933,981,448 cycles # 0.000 GHz
Now perf stat reports the correct counts for
for the non cgroup event.
If we run pong inside the cgroup, then we also get the
correct counts:
$ perf stat -e cycles,cycles -A -a -G test -C 1 -- sleep 10
Performance counter stats for 'sleep 10':
CPU1 22,297,726,205 cycles test # 0.000 GHz
CPU1 23,933,981,448 cycles # 0.000 GHz
10.001457237 seconds time elapsed
Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
There is no real reason to run blk_schedule_flush_plug() with
interrupts and preemption disabled.
Move it into schedule() and call it when the task is going voluntarily
to sleep. There might be false positives when the task is woken
between that call and actually scheduling, but that's not really
different from being woken immediately after switching away.
This fixes a deadlock in the scheduler where the
blk_schedule_flush_plug() callchain enables interrupts and thereby
allows a wakeup to happen of the task that's going to sleep.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@kernel.org # 2.6.39+
Link: http://lkml.kernel.org/n/tip-dwfxtra7yg1b5r65m32ywtct@git.kernel.org
Signed-off-by: Ingo Molnar <mingo@elte.hu>
Block-IO and workqueues call into notifier functions from the
scheduler core code with interrupts and preemption disabled. These
calls should be made before entering the scheduler core.
To simplify this, separate the scheduler core code into
__schedule(). __schedule() is directly called from the places which
set PREEMPT_ACTIVE and from schedule(). This allows us to add the work
checks into schedule(), so they are only called when a task voluntary
goes to sleep.
Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Tejun Heo <tj@kernel.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: stable@kernel.org # 2.6.39+
Link: http://lkml.kernel.org/r/20110622174918.813258321@linutronix.de
Signed-off-by: Ingo Molnar <mingo@elte.hu>
The nfsservctl system call is now gone, so we should remove all
linkage for it.
Signed-off-by: NeilBrown <neilb@suse.de>
Signed-off-by: J. Bruce Fields <bfields@redhat.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
It seems that 7bf693951a ("console: allow to retain boot console via
boot option keep_bootcon") doesn't always achieve what it aims, as when
printk_late_init() runs it unconditionally turns off all boot consoles.
With this patch, I am able to see more messages on the boot console in
KVM guests than I can without, when keep_bootcon is specified.
I think it is appropriate for the relevant -stable trees. However, it's
more of an annoyance than a serious bug (ideally you don't need to keep
the boot console around as console handover should be working -- I was
encountering a situation where the console handover wasn't working and
not having the boot console available meant I couldn't see why).
Signed-off-by: Nishanth Aravamudan <nacc@us.ibm.com>
Cc: David S. Miller <davem@davemloft.net>
Cc: Alan Cox <alan@lxorguk.ukuu.org.uk>
Cc: Greg KH <gregkh@suse.de>
Acked-by: Fabio M. Di Nitto <fdinitto@redhat.com>
Cc: <stable@kernel.org> [2.6.39.x, 3.0.x]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
I ran into a couple of programs which broke with the new Linux 3.0
version. Some of those were binary only. I tried to use LD_PRELOAD to
work around it, but it was quite difficult and in one case impossible
because of a mix of 32bit and 64bit executables.
For example, all kind of management software from HP doesnt work, unless
we pretend to run a 2.6 kernel.
$ uname -a
Linux svivoipvnx001 3.0.0-08107-g97cd98f #1062 SMP Fri Aug 12 18:11:45 CEST 2011 i686 i686 i386 GNU/Linux
$ hpacucli ctrl all show
Error: No controllers detected.
$ rpm -qf /usr/sbin/hpacucli
hpacucli-8.75-12.0
Another notable case is that Python now reports "linux3" from
sys.platform(); which in turn can break things that were checking
sys.platform() == "linux2":
https://bugzilla.mozilla.org/show_bug.cgi?id=664564
It seems pretty clear to me though it's a bug in the apps that are using
'==' instead of .startswith(), but this allows us to unbreak broken
programs.
This patch adds a UNAME26 personality that makes the kernel report a
2.6.40+x version number instead. The x is the x in 3.x.
I know this is somewhat ugly, but I didn't find a better workaround, and
compatibility to existing programs is important.
Some programs also read /proc/sys/kernel/osrelease. This can be worked
around in user space with mount --bind (and a mount namespace)
To use:
wget ftp://ftp.kernel.org/pub/linux/kernel/people/ak/uname26/uname26.c
gcc -o uname26 uname26.c
./uname26 program
Signed-off-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
Add a global notification chain that gets called upon changes to the
aggregated constraint value for any device.
The notification callbacks are passing the full constraint request data
in order for the callees to have access to it. The current use is for the
platform low-level code to access the target device of the constraint.
Signed-off-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
In preparation for the per-device constratins support:
- rename update_target to pm_qos_update_target
- generalize and export pm_qos_update_target for usage by the upcoming
per-device latency constraints framework:
* operate on struct pm_qos_constraints for constraints management,
* introduce an 'action' parameter for constraints add/update/remove,
* the return value indicates if the aggregated constraint value has
changed,
- update the internal code to operate on struct pm_qos_constraints
- add a NULL pointer check in the API functions
Signed-off-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
In preparation for the per-device constratins support, re-organize
the data strctures:
- add a struct pm_qos_constraints which contains the constraints
related data
- update struct pm_qos_object contents to the PM QoS internal object
data. Add a pointer to struct pm_qos_constraints
- update the internal code to use the new data structs.
Signed-off-by: Jean Pihet <j-pihet@ti.com>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
Move around the PM QoS misc devices management code
for better readability.
Signed-off-by: Jean Pihet <j-pihet@ti.com>
Acked-by: markgross <markgross@thegnar.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
- Misc fixes to improve code readability:
* rename struct pm_qos_request_list to struct pm_qos_request,
* rename pm_qos_req parameter to req in internal code,
consistenly use req in the API parameters,
* update the in-kernel API callers to the new parameters names,
* rename of fields names (requests, list, node, constraints)
Signed-off-by: Jean Pihet <j-pihet@ti.com>
Acked-by: markgross <markgross@thegnar.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
The PM QoS implementation files are better named
kernel/power/qos.c and include/linux/pm_qos.h.
The PM QoS support is compiled under the CONFIG_PM option.
Signed-off-by: Jean Pihet <j-pihet@ti.com>
Acked-by: markgross <markgross@thegnar.org>
Reviewed-by: Kevin Hilman <khilman@ti.com>
Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl>
* 'for-linus' of git://oss.sgi.com/xfs/xfs:
xfs: fix tracing builds inside the source tree
xfs: remove subdirectories
xfs: don't expect xfs headers to be in subdirectories
This reverts commit f3637a5f2e.
It turns out that this breaks several drivers, one example being OMAP
boards which use the on-board OMAP UARTs and the omap-serial driver that
will not boot to userspace after the commit.
Paul Walmsley reports that enabling CONFIG_DEBUG_SHIRQ reveals 'IRQ
handler type mismatch' errors:
IRQ handler type mismatch for IRQ 74
current handler: serial idle
...
and the reason is that setting IRQF_ONESHOT will now result in those
interrupt handlers having different IRQF flags, and thus being
unsharable. So the commit log in the reverted commit:
"Since it is required for those users and
there is no difference for others it makes sense to add this flag
unconditionally."
is simply not true: there may not be any difference from a "actions at
irq time", but there is a *big* difference wrt this flag testing irq
management (see __setup_irq() in kernel/irq/manage.c).
One solution may be to stop verifying IRQF_ONESHOT in __setup_irq(), but
right now the safe course of action is to revert the change. Let's
revisit this in a later merge window.
Reported-by: Paul Walmsley <paul@pwsan.com>
Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
Requested-by: Alan Cox <alan@lxorguk.ukuu.org.uk>
Acked-by: Thomas Gleixner <tglx@linutronix.de>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'for-linus' of git://git.kernel.dk/linux-block: (23 commits)
Revert "cfq: Remove special treatment for metadata rqs."
block: fix flush machinery for stacking drivers with differring flush flags
block: improve rq_affinity placement
blktrace: add FLUSH/FUA support
Move some REQ flags to the common bio/request area
allow blk_flush_policy to return REQ_FSEQ_DATA independent of *FLUSH
xen/blkback: Make description more obvious.
cfq-iosched: Add documentation about idling
block: Make rq_affinity = 1 work as expected
block: swim3: fix unterminated of_device_id table
block/genhd.c: remove useless cast in diskstats_show()
drivers/cdrom/cdrom.c: relax check on dvd manufacturer value
drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parse
bsg-lib: add module.h include
cfq-iosched: Reduce linked group count upon group destruction
blk-throttle: correctly determine sync bio
loop: fix deadlock when sysfs and LOOP_CLR_FD race against each other
loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop devices
loop: add management interface for on-demand device allocation
loop: replace linked list of allocated devices with an idr index
...
Fix kernel-doc warning in irqdesc.c:
Warning(kernel/irq/irqdesc.c:353): No description found for parameter 'owner'
Signed-off-by: Randy Dunlap <rdunlap@xenotime.net>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
* 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip:
irq: Track the owner of irq descriptor
irq: Always set IRQF_ONESHOT if no primary handler is specified
genirq: Fix wrong bit operation