linux/kernel
Peter Zijlstra 3f5087a2ba sched: fix share (re)distribution
Fix the __aggregate_redistribute_shares()-related lockup reported by
David S. Miller.

The problem this code tries to solve is 'accurately' calculating the 'fair'
share of the group weight for each cpu. The current code falls back to a global
group rebalance in case the sched_domain's span it looks at has no shares, but
does have tasks.

It gets stuck here because it is inherently racy: if someone steals the
last task after we compute agg->rq_weight, but before we rebalance, we
never get out of the loop.

We could of course go fix that, but while looking at this issue I found that
this 'fallback' wasn't nearly as rare as I'd hoped it to be. In fact it's quite
common, and given that it walks the whole machine, that's very bad.

The new approach is simple (why didn't I think of it before?): we set the
aggregate shares to the full task group weight, and each larger sched domain
that encounters aggregate shares larger than the weight clips them (it
already redistributes anyway).

This nicely converges to the desired global picture where the sum of all
shares equals the task group weight.
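
As a rough illustration of the clipping idea, here is a minimal user-space
sketch in C. It is not the code from sched.c: the names span_shares() and
redistribute(), the flat arrays, and the plain proportional split are all
assumptions made for this example. It only shows how a span with tasks but
no shares can take the full group weight locally, and how a larger span
clips an oversized sum back to that weight.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: aggregate shares for one sched-domain span.
 * cpu_shares[] holds the per-cpu shares inside the span, span_rq_weight is
 * the summed runqueue weight of the group's tasks in the span, and
 * tg_shares is the full task group weight.
 */
unsigned long span_shares(const unsigned long *cpu_shares, size_t ncpus,
                          unsigned long span_rq_weight,
                          unsigned long tg_shares)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < ncpus; i++)
        sum += cpu_shares[i];

    /* Tasks but no shares: take the full group weight, no global walk. */
    if (sum == 0 && span_rq_weight != 0)
        return tg_shares;

    /* A span must never hold more than the group owns: clip. */
    if (sum > tg_shares)
        return tg_shares;

    return sum;
}

/*
 * Hypothetical helper: split a span's aggregate shares across its cpus in
 * proportion to each cpu's runqueue weight. Integer division may leave a
 * small remainder undistributed; overflow is ignored in this sketch.
 */
void redistribute(unsigned long *cpu_shares,
                  const unsigned long *cpu_rq_weight, size_t ncpus,
                  unsigned long agg_shares)
{
    unsigned long total = 0;

    assert(cpu_shares != NULL && cpu_rq_weight != NULL);

    for (size_t i = 0; i < ncpus; i++)
        total += cpu_rq_weight[i];

    if (total == 0)
        return; /* no tasks in this span, nothing to hand out */

    for (size_t i = 0; i < ncpus; i++)
        cpu_shares[i] = agg_shares * cpu_rq_weight[i] / total;
}
```

With a group weight of 1024, two small spans that each assumed the full
weight sum to 2048; the enclosing span clips that to 1024 and splits it by
load, so the per-cpu shares again add up to the group weight.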

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2008-04-25 00:25:08 +02:00
irq cpumask: Cleanup more uses of CPU_MASK and NODE_MASK 2008-04-19 19:44:58 +02:00
power Merge branches 'release' and 'doc' into release 2008-03-13 01:59:53 -04:00
time softlockup: fix NOHZ wakeup 2008-04-25 00:25:08 +02:00
.gitignore Update kernel/.gitignore with new auto-generated files 2008-02-09 23:27:01 -08:00
acct.c bsd_acct: using task_struct->tgid is not right in pid-namespaces 2008-03-24 19:22:20 -07:00
audit_tree.c Introduce path_put() 2008-02-14 21:13:33 -08:00
audit.c Audit: internally use the new LSM audit hooks 2008-04-19 09:52:37 +10:00
audit.h SELinux: use new audit hooks, remove redundant exports 2008-04-19 09:53:46 +10:00
auditfilter.c Audit: Final renamings and cleanup 2008-04-19 09:59:43 +10:00
auditsc.c Audit: Final renamings and cleanup 2008-04-19 09:59:43 +10:00
backtracetest.c x86: add a simple backtrace test module 2008-01-30 13:33:08 +01:00
capability.c Add 64-bit capability support to the kernel 2008-02-05 09:44:20 -08:00
cgroup_debug.c Task Control Groups: simple task cgroup debug info subsystem 2007-10-19 11:53:36 -07:00
cgroup.c cgroup: fix a race condition in manipulating tsk->cg_list 2008-04-18 08:17:57 -07:00
compat.c generic: reduce stack pressure in sched_affinity 2008-04-19 19:44:59 +02:00
configs.c use simple_read_from_buffer in kernel/ 2007-05-09 12:30:49 -07:00
cpu.c generic: use new set_cpus_allowed_ptr function 2008-04-19 19:44:58 +02:00
cpuset.c sched, cpuset: customize sched domains, core 2008-04-19 19:45:00 +02:00
delayacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
dma.c whitespace fixes: DMA channel allocator 2007-10-18 14:37:24 -07:00
exec_domain.c whitespace fixes: execution domains 2007-10-18 14:37:26 -07:00
exit.c [PATCH] get rid of __exit_files(), __exit_fs() and __put_fs_struct() 2008-04-22 19:55:09 -04:00
extable.c module: Don't report discarded init pages as kernel text. 2008-01-29 17:13:18 +11:00
fork.c x86: fpu xstate split fix 2008-04-19 19:19:55 +02:00
futex_compat.c futex_compat __user annotation 2008-03-30 14:18:41 -07:00
futex.c NULL noise: fs/*, mm/*, kernel/* 2008-03-30 14:18:41 -07:00
hrtimer.c hrtimer: optimize the softirq time optimization 2008-04-21 07:59:51 +02:00
itimer.c ITIMER_REAL: convert to use struct pid 2008-02-08 09:22:29 -08:00
kallsyms.c remove support for un-needed _extratext section 2008-02-06 10:41:01 -08:00
Kconfig.hz sched: high-res preemption tick 2008-01-25 21:08:29 +01:00
Kconfig.preempt rcu: move PREEMPT_RCU config option back under PREEMPT 2008-03-10 18:01:20 -07:00
kexec.c kernel: Remove unnecessary inclusions of asm/semaphore.h 2008-04-18 22:17:04 -04:00
kfifo.c is_power_of_2: kernel/kfifo.c 2007-07-16 09:05:50 -07:00
kgdb.c kgdb: always use icache flush for sw breakpoints 2008-04-17 20:05:43 +02:00
kmod.c generic: use new set_cpus_allowed_ptr function 2008-04-19 19:44:58 +02:00
kprobes.c kprobes: fix a null pointer bug in register_kretprobe() 2008-03-04 16:35:19 -08:00
ksysfs.c Kobject: convert remaining kobject_unregister() to kobject_put() 2008-01-24 20:40:40 -08:00
kthread.c Merge branch 'semaphore' of git://git.kernel.org/pub/scm/linux/kernel/git/willy/misc 2008-04-21 15:41:27 -07:00
latencytop.c latencytop: optimize LT_BACKTRACEDEPTH loops a bit 2008-04-19 19:44:57 +02:00
lockdep_internals.h
lockdep_proc.c lockdep: Avoid /proc/lockdep & lock_stat infinite output 2007-10-11 22:11:11 +02:00
lockdep.c Subject: lockdep: include all lock classes in all_lock_classes 2008-02-25 23:03:02 +01:00
Makefile Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-kgdb 2008-04-18 08:37:01 -07:00
marker.c markers: use synchronize_sched() 2008-04-02 15:28:19 -07:00
module.c kernel: Remove unnecessary inclusions of asm/semaphore.h 2008-04-18 22:17:04 -04:00
mutex-debug.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
mutex-debug.h
mutex.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
mutex.h
notifier.c kernel/notifier.c should #include <linux/reboot.h> 2008-02-06 10:41:02 -08:00
ns_cgroup.c cgroups: implement namespace tracking subsystem 2007-10-19 11:53:37 -07:00
nsproxy.c namespaces: move the IPC namespace under IPC_NS option 2008-02-08 09:22:23 -08:00
panic.c ACPI: Taint kernel on ACPI table override (format corrected) 2008-02-06 22:07:51 -05:00
params.c Add new string functions strict_strto* and convert kernel params to use them 2008-02-08 09:22:41 -08:00
pid_namespace.c namespaces: cleanup the code managed with PID_NS option 2008-02-08 09:22:23 -08:00
pid.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
pm_qos_params.c pm qos infrastructure and interface 2008-02-05 09:44:22 -08:00
posix-cpu-timers.c posix-timers: fix shadowed variables 2008-04-17 12:22:30 +02:00
posix-timers.c kernel: Remove unnecessary inclusions of asm/semaphore.h 2008-04-18 22:17:04 -04:00
printk.c Fix locking bug in "acquire_console_semaphore_for_printk()" 2008-04-15 13:09:54 -07:00
profile.c kernel: Remove unnecessary inclusions of asm/semaphore.h 2008-04-18 22:17:04 -04:00
ptrace.c ptrace: compat_ptrace_request siginfo 2008-04-21 15:53:41 -07:00
rcuclassic.c Preempt-RCU: implementation 2008-01-25 21:08:24 +01:00
rcupdate.c rcupdate: fix comment 2008-02-13 16:21:18 -08:00
rcupreempt_trace.c Preempt-RCU: implementation 2008-01-25 21:08:24 +01:00
rcupreempt.c generic: reduce stack pressure in sched_affinity 2008-04-19 19:44:59 +02:00
rcutorture.c generic: use new set_cpus_allowed_ptr function 2008-04-19 19:44:58 +02:00
relay.c relay: set an spd_release() hook for splice 2008-03-26 12:04:09 +01:00
res_counter.c Memory Resource Controller use strstrip while parsing arguments 2008-03-04 16:35:09 -08:00
resource.c PCI: clean up resource alignment management 2008-04-20 21:47:08 -07:00
rtmutex_common.h Don't operate with pid_t in rtmutex tester 2008-02-08 09:22:41 -08:00
rtmutex-debug.c Don't operate with pid_t in rtmutex tester 2008-02-08 09:22:41 -08:00
rtmutex-debug.h
rtmutex-tester.c Driver core: change sysdev classes to use dynamic kobject names 2008-01-24 20:40:40 -08:00
rtmutex.c hrtimer: more hrtimer_init_sleeper() fallout. 2008-02-13 15:45:36 +01:00
rtmutex.h
rwsem.c sched: mark rwsem functions as __sched for wchan/profiling 2007-12-18 15:21:13 +01:00
sched_debug.c sched: build fix 2008-04-19 19:45:01 +02:00
sched_fair.c sched: debug: show a weight tree 2008-04-19 19:45:00 +02:00
sched_features.h sched: /debug/sched_features 2008-04-19 19:45:00 +02:00
sched_idletask.c sched: high-res preemption tick 2008-01-25 21:08:29 +01:00
sched_rt.c sched: rt-group: optimize dequeue_rt_stack 2008-04-19 19:45:00 +02:00
sched_stats.h cpumask: use new cpus_scnprintf function 2008-04-19 19:44:59 +02:00
sched.c sched: fix share (re)distribution 2008-04-25 00:25:08 +02:00
seccomp.c make seccomp zerocost in schedule 2007-07-16 09:05:50 -07:00
semaphore.c Improve semaphore documentation 2008-04-17 10:43:01 -04:00
signal.c trivial: small cleanups 2008-04-21 22:15:06 +00:00
softirq.c tasklets: execute tasklets in the same order they were queued 2008-04-19 19:44:58 +02:00
softlockup.c softlockup: fix task state setting 2008-02-29 18:46:53 +01:00
spinlock.c spinlock: lockbreak cleanup 2008-01-30 13:31:20 +01:00
srcu.c make srcu_readers_active() static 2008-02-06 10:41:02 -08:00
stacktrace.c
stop_machine.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/juhl/trivial 2008-04-21 16:36:46 -07:00
sys_ni.c timerfd: new timerfd API 2008-02-05 09:44:07 -08:00
sys.c generic, x86: add prctl commands PR_GET_TSC and PR_SET_TSC 2008-04-19 19:19:55 +02:00
sysctl_check.c constify tables in kernel/sysctl_check.c 2008-02-08 09:22:31 -08:00
sysctl.c sched: rt-group: synchonised bandwidth period 2008-04-19 19:44:57 +02:00
taskstats.c kernel/taskstats.c: fix bogus nlmsg_free() 2007-11-14 18:45:44 -08:00
test_kprobes.c kprobes: kretprobe user entry-handler 2008-02-06 10:41:11 -08:00
time.c time: Export set_normalized_timespec. 2008-04-21 19:45:12 -07:00
timeconst.pl timeconst.pl: correct reversal of USEC_TO_HZ and HZ_TO_USEC 2008-02-12 14:29:26 -08:00
timer.c timers: simplify lockdep handling 2008-04-17 12:22:31 +02:00
tsacct.c Add scaled time to taskstats based process accounting 2007-10-18 14:37:28 -07:00
uid16.c asmlinkage_protect replaces prevent_tail_call 2008-04-10 17:28:26 -07:00
user_namespace.c namespaces: cleanup the code managed with the USER_NS option 2008-02-08 09:22:23 -08:00
user.c sched: fix the task_group hierarchy for UID grouping 2008-04-19 19:45:00 +02:00
utsname_sysctl.c Isolate the UTS namespace's domainname and hostname back 2007-11-29 09:24:53 -08:00
utsname.c Fix UTS corruption during clone(CLONE_NEWUTS) 2007-09-19 11:24:17 -07:00
wait.c kernel: remove fastcall in kernel/* 2008-02-08 09:22:31 -08:00
workqueue.c timer_list: add annotations to workqueue.c 2008-04-17 12:22:30 +02:00