linux/kernel/sched
Thomas Gleixner e9532e69b8 sched/cputime: Fix steal time accounting vs. CPU hotplug
On CPU hotplug the steal time accounting can keep a stale rq->prev_steal_time
value over CPU down and up. So after the CPU comes up again the delta
calculation in steal_account_process_tick() wreckages itself due to the
unsigned math:

	 u64 steal = paravirt_steal_clock(smp_processor_id());

	 steal -= this_rq()->prev_steal_time;

So if steal is smaller than rq->prev_steal_time we end up with an insane large
value which then gets added to rq->prev_steal_time, resulting in a permanent
wreckage of the accounting. As a consequence the per CPU stats in /proc/stat
become stale.

Nice trick to tell the world how idle the system is (100%) while the CPU is
100% busy running tasks. Though we prefer realistic numbers.

None of the accounting values which use a previous value to account for
fractions is reset at CPU hotplug time. update_rq_clock_task() has a sanity
check for prev_irq_time and prev_steal_time_rq, but that sanity check solely
deals with clock warps and limits the /proc/stat visible wreckage. The
prev_time values are still wrong.

Solution is simple: Reset rq->prev_*_time when the CPU is plugged in again.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>
Acked-by: Rik van Riel <riel@redhat.com>
Cc: <stable@vger.kernel.org>
Cc: Frederic Weisbecker <fweisbec@gmail.com>
Cc: Glauber Costa <glommer@parallels.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Peter Zijlstra <peterz@infradead.org>
Fixes: commit 095c0aa83e "sched: adjust scheduler cpu power for stolen time"
Fixes: commit aa48380851 "sched: Remove irq time from available CPU power"
Fixes: commit e6e6685acc "KVM guest: Steal time accounting"
Link: http://lkml.kernel.org/r/alpine.DEB.2.11.1603041539490.3686@nanos
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2016-03-05 09:17:20 +01:00
..
auto_group.c sched/core: Move the sched_to_prio[] arrays out of line 2015-12-04 10:34:46 +01:00
auto_group.h sched, timer: Convert usages of ACCESS_ONCE() in the scheduler to READ_ONCE()/WRITE_ONCE() 2015-05-08 12:11:32 +02:00
clock.c Merge branch 'for-4.5' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2016-01-11 18:53:13 -08:00
completion.c sched/completion: Serialize completion_done() with complete() 2015-02-18 14:27:40 +01:00
core.c sched/cputime: Fix steal time accounting vs. CPU hotplug 2016-03-05 09:17:20 +01:00
cpuacct.c cgroup: rename cgroup_subsys->base_cftypes to ->legacy_cftypes 2014-07-15 11:05:09 -04:00
cpuacct.h sched/cpuacct: Initialize root cpuacct earlier 2013-04-10 13:54:20 +02:00
cpudeadline.c sched/deadline: Unify dl_time_before() usage 2015-09-23 09:51:25 +02:00
cpudeadline.h sched/deadline: Unify dl_time_before() usage 2015-09-23 09:51:25 +02:00
cpupri.c Merge commit '3cf2f34' into sched/core, to fix build error 2014-06-12 13:46:37 +02:00
cpupri.h sched/cpupri: Remove unnecessary definitions in cpupri.h 2014-11-16 10:58:59 +01:00
cputime.c xen: features and fixes for 4.5-rc0 2016-01-12 13:05:36 -08:00
deadline.c sched/deadline: Always calculate end of period on sched_yield() 2016-02-29 09:41:51 +01:00
debug.c sched/fair: Provide runnable_load_avg back to cfs_rq 2015-08-03 12:24:31 +02:00
fair.c sched/cgroup: Fix cgroup entity load tracking tear-down 2016-02-29 09:41:50 +01:00
features.h sched/fair: Convert arch_scale_cpu_capacity() from weak function to #define 2015-09-13 09:52:55 +02:00
idle_task.c sched/fair: Remove empty idle enter and exit functions 2015-11-23 09:37:51 +01:00
idle.c Merge branches 'pm-cpuidle', 'pm-cpufreq', 'pm-domains' and 'pm-sleep' 2016-01-29 21:45:17 +01:00
loadavg.c sched: Move the loadavg code to a more obvious location 2015-05-08 12:04:12 +02:00
Makefile sched: Move the loadavg code to a more obvious location 2015-05-08 12:04:12 +02:00
rt.c sched/rt: Hide the push_irq_work_func() declaration 2015-11-23 09:25:08 +01:00
sched.h sched/cputime: Fix steal time accounting vs. CPU hotplug 2016-03-05 09:17:20 +01:00
stats.c sched: use %*pb[l] to print bitmaps including cpumasks and nodemasks 2015-02-13 21:21:37 -08:00
stats.h sched/stat: Simplify the sched_info accounting dependency 2015-07-04 10:04:30 +02:00
stop_task.c sched: Make sched_class::set_cpus_allowed() unconditional 2015-08-12 12:06:09 +02:00
wait.c sched/wait: Fix the signal handling fix 2015-12-13 14:30:59 -08:00