linux-next

mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-25 05:34:00 +08:00

Morten Rasmussen 2802bf3cd9 sched/fair: Add over-utilization/tipping point indicator	2018-12-11 15:17:01 +01:00

Energy-aware scheduling is only meant to be active while the system is _not_ over-utilized. That is, there are spare cycles available to shift tasks around based on their actual utilization to get a more energy-efficient task distribution without depriving any tasks. When above the tipping point, task placement is done the traditional way based on load_avg, spreading the tasks across as many cpus as possible based on priority-scaled load to preserve smp_nice. Below the tipping point we want to use util_avg instead. We need to define a criterion for when we make the switch.

The util_avg for each cpu converges towards 100% regardless of how many additional tasks we may put on it. If we define over-utilized as:

    sum_{cpus}(rq.cfs.avg.util_avg) + margin > sum_{cpus}(rq.capacity)

some individual cpus may be over-utilized running multiple tasks even when the above condition is false. That should be okay as long as we try to spread the tasks out to avoid per-cpu over-utilization as much as possible and if all tasks have the _same_ priority. If the latter isn't true, we have to consider priority to preserve smp_nice.

For example, we could have n_cpus nice=-10 util_avg=55% tasks and n_cpus/2 nice=0 util_avg=60% tasks. Balancing based on util_avg we are likely to end up with nice=-10 tasks sharing cpus and nice=0 tasks getting their own, as we have 1.5*n_cpus tasks in total and 55%+55% is less over-utilized than 55%+60% for those cpus that have to be shared. The system utilization is only 85% of the system capacity, but we are breaking smp_nice.

To be sure not to break smp_nice, we have defined over-utilization conservatively as when any cpu in the system is fully utilized at its highest frequency instead:

    cpu_rq(any).cfs.avg.util_avg + margin > cpu_rq(any).capacity

IOW, as soon as one cpu is (nearly) 100% utilized, we switch to load_avg to factor in priority to preserve smp_nice.

With this definition, we can skip periodic load-balance as no cpu has an always-running task when the system is not over-utilized. All tasks will be periodic and we can balance them at wake-up. This conservative condition does however mean that some scenarios that could benefit from energy-aware decisions even if one cpu is fully utilized would not get those benefits.

For systems where some cpus might have reduced capacity (RT-pressure and/or big.LITTLE), we want periodic load-balance checks as soon as just a single cpu is fully utilized, as it might be one of those with reduced capacity and in that case we want to migrate it.

[ peterz: Added a comment explaining why new tasks are not accounted during overutilization detection. ]

Signed-off-by: Morten Rasmussen <morten.rasmussen@arm.com>
Signed-off-by: Quentin Perret <quentin.perret@arm.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: adharmap@codeaurora.org
Cc: chris.redpath@arm.com
Cc: currojerez@riseup.net
Cc: dietmar.eggemann@arm.com
Cc: edubezval@gmail.com
Cc: gregkh@linuxfoundation.org
Cc: javi.merino@kernel.org
Cc: joel@joelfernandes.org
Cc: juri.lelli@redhat.com
Cc: patrick.bellasi@arm.com
Cc: pkondeti@codeaurora.org
Cc: rjw@rjwysocki.net
Cc: skannan@codeaurora.org
Cc: smuckle@google.com
Cc: srinivas.pandruvada@linux.intel.com
Cc: thara.gopinath@linaro.org
Cc: tkjos@google.com
Cc: valentin.schneider@arm.com
Cc: vincent.guittot@linaro.org
Cc: viresh.kumar@linaro.org
Link: https://lkml.kernel.org/r/20181203095628.11858-13-quentin.perret@arm.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
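
The two over-utilization definitions discussed in the commit message can be compared with a small user-space sketch. This is not the code from fair.c: struct cpu_sample, MARGIN_RATIO and both helper names are hypothetical, and the margin is assumed to be a proportional headroom (a cpu counts as over-utilized once util_avg exceeds roughly 80% of its capacity) rather than whatever the patch actually uses.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption for illustration: capacities use a 1024 scale. */
#define CAPACITY_SCALE 1024UL
/* Assumption: 1280/1024 headroom, i.e. the tipping point is about 80%. */
#define MARGIN_RATIO   1280UL

/* Hypothetical per-cpu snapshot; stands in for rq.cfs.avg.util_avg
 * and rq.capacity from the commit message. */
struct cpu_sample {
	unsigned long util_avg;
	unsigned long capacity;
};

/* Conservative definition: true as soon as any single cpu satisfies
 * util_avg + margin > capacity. */
static bool any_cpu_overutilized(const struct cpu_sample *cpus, size_t n)
{
	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (cpus[i].util_avg * MARGIN_RATIO >
		    cpus[i].capacity * CAPACITY_SCALE)
			return true;
	}
	return false;
}

/* Rejected system-wide definition: compares summed utilization
 * (plus margin) against summed capacity. */
static bool system_sum_overutilized(const struct cpu_sample *cpus, size_t n)
{
	unsigned long util = 0, cap = 0;

	assert(cpus != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		util += cpus[i].util_avg;
		cap += cpus[i].capacity;
	}
	return util * MARGIN_RATIO > cap * CAPACITY_SCALE;
}
```

With four cpus of capacity 1024 and utilizations of 900, 100, 100 and 100, the per-cpu check trips while the summed check does not, which is the gap the commit message describes. A reduced-capacity cpu (for instance capacity 446 at utilization 400) trips the per-cpu check even though a full-capacity cpu at the same utilization would not.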
autogroup.c	sched/autogroup: Fix possible Spectre-v1 indexing for sched_prio_to_weight[]	2018-05-05 08:34:42 +02:00
autogroup.h	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
clock.c	sched/clock: Disable interrupts when calling generic_sched_clock_init()	2018-07-30 19:33:35 +02:00
completion.c	sched/Documentation: Update wake_up() & co. memory-barrier guarantees	2018-07-17 09:30:34 +02:00
core.c	sched: Fix various typos in comments	2018-12-03 11:55:42 +01:00
cpuacct.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpudeadline.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpudeadline.h	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpufreq_schedutil.c	sched/topology: Make Energy Aware Scheduling depend on schedutil	2018-12-11 15:17:00 +01:00
cpufreq.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpupri.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cpupri.h	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
cputime.c	sched: Fix various typos in comments	2018-12-03 11:55:42 +01:00
deadline.c	sched/core: Remove unnecessary unlikely() in push_*_task()	2018-12-11 15:16:57 +01:00
debug.c	sched/core: Create task_has_idle_policy() helper	2018-11-12 06:17:52 +01:00
fair.c	sched/fair: Add over-utilization/tipping point indicator	2018-12-11 15:17:01 +01:00
features.h	sched/fair: Disable LB_BIAS by default	2018-10-02 09:45:01 +02:00
idle.c	x86/stackprotector: Remove the call to boot_init_stack_canary() from cpu_startup_entry()	2018-10-22 04:07:24 +02:00
isolation.c	sched: Fix various typos in comments	2018-12-03 11:55:42 +01:00
loadavg.c	sched: loadavg: make calc_load_n() public	2018-10-26 16:26:32 -07:00
Makefile	psi: pressure stall information for CPU, memory, and IO	2018-10-26 16:26:32 -07:00
membarrier.c	sched/headers: Simplify and clean up header usage in the scheduler	2018-03-04 12:39:29 +01:00
pelt.c	sched/fair: Remove setting task's se->runnable_weight during PELT update	2018-10-02 09:45:03 +02:00
pelt.h	sched/pelt: Fix warning and clean up IRQ PELT config	2018-10-02 09:45:00 +02:00
psi.c	psi: make disabling/enabling easier for vendor kernels	2018-11-30 14:56:14 -08:00
rt.c	sched/core: Remove unnecessary unlikely() in push_*_task()	2018-12-11 15:16:57 +01:00
sched-pelt.h	License cleanup: add SPDX GPL-2.0 license identifier to files with no license	2017-11-02 11:10:55 +01:00
sched.h	sched/fair: Add over-utilization/tipping point indicator	2018-12-11 15:17:01 +01:00
stats.c	proc: introduce proc_create_seq{,_data}	2018-05-16 07:23:35 +02:00
stats.h	psi: make disabling/enabling easier for vendor kernels	2018-11-30 14:56:14 -08:00
stop_task.c	sched: Clean up and harmonize the coding style of the scheduler code base	2018-03-03 15:50:21 +01:00
swait.c	sched/swait: Rename to exclusive	2018-06-20 11:35:56 +02:00
topology.c	sched/toplogy: Introduce the 'sched_energy_present' static key	2018-12-11 15:17:01 +01:00
wait_bit.c	sched/wait: Improve __var_waitqueue() code generation	2018-03-20 08:23:25 +01:00
wait.c	sched/wait: assert the wait_queue_head lock is held in __wake_up_common	2018-08-22 10:52:47 -07:00