linux/kernel/sched
Srivatsa Vaddagiri 88b8dac0a1 sched: Improve balance_cpu() to consider other cpus in its group as target of (pinned) task
The current load balance scheme requires only one cpu in a
sched_group (the balance_cpu) to look at other peer sched_groups
for imbalance and pull tasks towards itself from a busy cpu.
Tasks thus pulled by balance_cpu could later get picked up by
other cpus in the same sched_group as balance_cpu.

This scheme, however, fails to pull tasks that are not allowed
to run on balance_cpu (but are allowed to run on other cpus in
its sched_group). That can affect fairness and, in some
worst-case scenarios, cause starvation.

Consider a two core (2 threads/core) system running tasks as
below:

          Core0            Core1
         /     \          /     \
        C0     C1        C2     C3
        |      |         |      |
        v      v         v      v
        F0     T1        F1     [idle]
                         T2

 F0 = SCHED_FIFO task (pinned to C0)
 F1 = SCHED_FIFO task (pinned to C2)
 T1 = SCHED_OTHER task (pinned to C1)
 T2 = SCHED_OTHER task (pinned to C1 and C2)

F1 could become a cpu hog, which will starve T2 unless C1 pulls
it. Between C0 and C1, however, only C0 (the balance_cpu) looks
for imbalance between the cores, and it cannot pull T2 towards
Core0 because T2 is not allowed to run on C0. T2 will starve
indefinitely in this case. The same scenario can arise in the
presence of non-rt tasks as well (say we replace F1 with a high
irq load).
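
The failure boils down to the per-task affinity check: a task is
only pulled if the destination cpu is set in its allowed mask, so
C0 acting as balance_cpu skips T2 entirely. A minimal user-space
sketch of that check (the struct, bitmask helper and cpu numbering
are illustrative assumptions, not kernel code):

#include <stdbool.h>
#include <stdio.h>

struct task {
	const char *name;
	unsigned int cpus_allowed;	/* bit i set => may run on cpu i */
};

/* A task may only be pulled to dst_cpu if dst_cpu is in its mask. */
static bool can_migrate_to(const struct task *t, int dst_cpu)
{
	return t->cpus_allowed & (1u << dst_cpu);
}

int main(void)
{
	struct task t2 = { "T2", (1u << 1) | (1u << 2) };  /* C1 and C2 */
	int balance_cpu = 0;		/* C0 balances on behalf of Core0 */

	if (!can_migrate_to(&t2, balance_cpu))
		printf("%s cannot be pulled: not allowed on C%d\n",
		       t2.name, balance_cpu);
	return 0;
}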

We tackle this problem by having balance_cpu move pinned tasks
to one of its sibling cpus (on which they are allowed to run).
We first check whether the load balance goal can be met by
ignoring pinned tasks; failing that, we retry move_tasks() with
a new env->dst_cpu.
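
A minimal user-space sketch of this two-pass idea (illustrative
only; the some_pinned bookkeeping and names are assumptions, not
the exact fair.c implementation): move_tasks() first targets
balance_cpu itself, and if the goal is unmet only because
candidates were pinned away from it, the move is retried with a
sibling cpu in its group as the destination.

#include <stdbool.h>
#include <stdio.h>

struct task {
	const char *name;
	unsigned int cpus_allowed;	/* bit i set => may run on cpu i */
	bool moved;
};

/*
 * Try to move up to 'goal' tasks to dst_cpu.  Tasks pinned away from
 * dst_cpu are skipped but noted in *some_pinned, so the caller knows
 * a retry with a different destination might still succeed.
 */
static int move_tasks(struct task *tasks, int nr, int dst_cpu, int goal,
		      bool *some_pinned)
{
	int moved = 0;

	for (int i = 0; i < nr && moved < goal; i++) {
		if (tasks[i].moved)
			continue;
		if (!(tasks[i].cpus_allowed & (1u << dst_cpu))) {
			*some_pinned = true;
			continue;
		}
		tasks[i].moved = true;
		printf("moved %s -> C%d\n", tasks[i].name, dst_cpu);
		moved++;
	}
	return moved;
}

int main(void)
{
	/* T2 runs on the busy cpu C2 and is pinned to C1 and C2 only. */
	struct task busy[] = { { "T2", (1u << 1) | (1u << 2), false } };
	int balance_cpu = 0;	/* C0: balances on behalf of Core0 */
	int sibling_cpu = 1;	/* C1: another cpu in C0's sched_group */
	int goal = 1;
	bool some_pinned = false;

	int moved = move_tasks(busy, 1, balance_cpu, goal, &some_pinned);
	if (moved < goal && some_pinned) {
		/* Retry with a sibling of balance_cpu as the new dst_cpu. */
		moved += move_tasks(busy, 1, sibling_cpu, goal - moved,
				    &some_pinned);
	}
	return 0;
}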

This patch modifies the load balance semantics of who can move
load towards a given cpu in a given sched_domain.

Before this patch, only a given_cpu (or an ilb_cpu acting on
behalf of an idle given_cpu) was responsible for moving load to
that given_cpu.

With this patch applied, balance_cpu can in addition decide to
move some load to a given_cpu.

There is a remote possibility that excess load could get moved
as a result of this (balance_cpu and given_cpu/ilb_cpu deciding
*independently* and at the *same* time to move some load to a
given_cpu). However, such conflicting decisions should be rare
in practice, and subsequent load balance cycles should correct
any excess load moved to given_cpu.

Signed-off-by: Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com>
Signed-off-by: Prashanth Nageshappa <prashanth@linux.vnet.ibm.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/4FE06CDB.2060605@linux.vnet.ibm.com
[ minor edits ]
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2012-07-24 13:58:06 +02:00
auto_group.c sched: Clean up parameter passing of proc_sched_autogroup_set_nice() 2012-03-02 12:23:49 +01:00
auto_group.h
clock.c
core.c sched: Improve scalability via 'CPU buddies', which withstand random perturbations 2012-07-24 13:53:34 +02:00
cpupri.c kernel-doc: fix kernel-doc warnings in sched 2012-01-23 08:44:54 -08:00
cpupri.h
debug.c sched/debug: Fix printing large integers on 32-bit platforms 2012-05-14 15:05:28 +02:00
fair.c sched: Improve balance_cpu() to consider other cpus in its group as target of (pinned) task 2012-07-24 13:58:06 +02:00
features.h sched: Fix more load-balancing fallout 2012-04-26 12:54:52 +02:00
idle_task.c sched/nohz: Rewrite and fix load-avg computation -- again 2012-07-05 20:58:13 +02:00
Makefile init_task: Create generic init_task instance 2012-05-05 13:00:21 +02:00
rt.c sched/rt: Fix lockdep annotation within find_lock_lowest_rq() 2012-06-06 16:52:26 +02:00
sched.h sched/nohz: Rewrite and fix load-avg computation -- again 2012-07-05 20:58:13 +02:00
stats.c sched: Remove sched_switch 2012-01-27 13:28:53 +01:00
stats.h
stop_task.c