linux/kernel
Youquan Song 69a37beabf cpuidle: Quickly notice prediction failure for repeat mode
The prediction for future is difficult and when the cpuidle governor prediction
fails and govenor possibly choose the shallower C-state than it should. How to
quickly notice and find the failure becomes important for power saving.

cpuidle menu governor has a method to predict the repeat pattern if there are 8
C-states residency which are continuous and the same or very close, so it will
predict the next C-states residency will keep same residency time.

There is a real case that turbostat utility (tools/power/x86/turbostat)
at kernel 3.3 or early. turbostat utility will read 10 registers one by one at
Sandybridge, so it will generate 10 IPIs to wake up idle CPUs. So cpuidle menu
 governor will predict it is repeat mode and there is another IPI wake up idle
 CPU soon, so it keeps idle CPU stay at C1 state even though CPU is totally
idle. However, in the turbostat, following 10 registers reading is sleep 5
seconds by default, so the idle CPU will keep at C1 for a long time though it is
 idle until break event occurs.
In a idle Sandybridge system, run "./turbostat -v", we will notice that deep
C-state dangles between "70% ~ 99%". After patched the kernel, we will notice
deep C-state stays at >99.98%.

In the patch, a timer is added when menu governor detects a repeat mode and
choose a shallow C-state. The timer is set to a time out value that greater
than predicted time, and we conclude repeat mode prediction failure if timer is
triggered. When repeat mode happens as expected, the timer is not triggered
and CPU waken up from C-states and it will cancel the timer initiatively.
When repeat mode does not happen, the timer will be time out and menu governor
will quickly notice that the repeat mode prediction fails and then re-evaluates
deeper C-states possibility.

Below is another case which will clearly show the patch much benefit:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <signal.h>
#include <sys/time.h>
#include <time.h>
#include <pthread.h>

volatile int * shutdown;
volatile long * count;
int delay = 20;
int loop = 8;

void usage(void)
{
	fprintf(stderr,
		"Usage: idle_predict [options]\n"
		"  --help	-h  Print this help\n"
		"  --thread	-n  Thread number\n"
		"  --loop     	-l  Loop times in shallow Cstate\n"
		"  --delay	-t  Sleep time (uS)in shallow Cstate\n");
}

void *simple_loop() {
	int idle_num = 1;
	while (!(*shutdown)) {
		*count = *count + 1;

		if (idle_num % loop)
			usleep(delay);
		else {
			/* sleep 1 second */
			usleep(1000000);
			idle_num = 0;
		}
		idle_num++;
	}

}

static void sighand(int sig)
{
	*shutdown = 1;
}

int main(int argc, char *argv[])
{
	sigset_t sigset;
	int signum = SIGALRM;
	int i, c, er = 0, thread_num = 8;
	pthread_t pt[1024];

	static char optstr[] = "n:l:t:h:";

	while ((c = getopt(argc, argv, optstr)) != EOF)
		switch (c) {
			case 'n':
				thread_num = atoi(optarg);
				break;
			case 'l':
				loop = atoi(optarg);
				break;
			case 't':
				delay = atoi(optarg);
				break;
			case 'h':
			default:
				usage();
				exit(1);
		}

	printf("thread=%d,loop=%d,delay=%d\n",thread_num,loop,delay);
	count = malloc(sizeof(long));
	shutdown = malloc(sizeof(int));
	*count = 0;
	*shutdown = 0;

	sigemptyset(&sigset);
	sigaddset(&sigset, signum);
	sigprocmask (SIG_BLOCK, &sigset, NULL);
	signal(SIGINT, sighand);
	signal(SIGTERM, sighand);

	for(i = 0; i < thread_num ; i++)
		pthread_create(&pt[i], NULL, simple_loop, NULL);

	for (i = 0; i < thread_num; i++)
		pthread_join(pt[i], NULL);

	exit(0);
}

Get powertop V2 from git://github.com/fenrus75/powertop, build powertop.
After build the above test application, then run it.
Test plaform can be Intel Sandybridge or other recent platforms.
#./idle_predict -l 10 &
#./powertop

We will find that deep C-state will dangle between 40%~100% and much time spent
on C1 state. It is because menu governor wrongly predict that repeat mode
is kept, so it will choose the C1 shallow C-state even though it has chance to
sleep 1 second in deep C-state.

While after patched the kernel, we find that deep C-state will keep >99.6%.

Signed-off-by: Rik van Riel <riel@redhat.com>
Signed-off-by: Youquan Song <youquan.song@intel.com>
Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
2012-11-15 00:34:19 +01:00
..
debug KGDB/KDB fixes and cleanups 2012-10-13 11:16:58 +09:00
events Merge branch 'uprobes/core' of git://git.kernel.org/pub/scm/linux/kernel/git/oleg/misc into perf/urgent 2012-10-21 18:18:17 +02:00
gcov gcov: disable CONSTRUCTORS for UML 2011-07-26 16:49:45 -07:00
irq irqdomain: augment add_simple() to allocate descs 2012-10-10 08:57:26 +02:00
power Merge branch 'pm-qos' 2012-09-17 20:25:51 +02:00
sched Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-12 22:13:05 +09:00
time cpuidle: Quickly notice prediction failure for repeat mode 2012-11-15 00:34:19 +01:00
trace Merge branch 'tip/perf/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/rostedt/linux-trace into perf/urgent 2012-10-21 19:53:34 +02:00
.gitignore
acct.c vfs: make path_openat take a struct filename pointer 2012-10-12 20:15:09 -04:00
async.c [SCSI] async: make async_synchronize_full() flush all work regardless of domain 2012-07-20 09:07:37 +01:00
audit_tree.c audit: clean up refcounting in audit-tree 2012-08-15 12:55:22 +02:00
audit_watch.c audit: optimize audit_compare_dname_path 2012-10-12 00:32:02 -04:00
audit.c fs: handle failed audit_log_start properly 2012-10-09 23:33:37 -04:00
audit.h audit: optimize audit_compare_dname_path 2012-10-12 00:32:02 -04:00
auditfilter.c audit: optimize audit_compare_dname_path 2012-10-12 00:32:02 -04:00
auditsc.c audit: make audit_inode take struct filename 2012-10-12 20:15:09 -04:00
backtracetest.c
bounds.c memcg: remove direct page_cgroup-to-page pointer 2011-03-23 19:46:28 -07:00
capability.c userns: Teach inode_capable to understand inodes whose uids map to other namespaces. 2012-05-15 14:59:24 -07:00
cgroup_freezer.c cgroup: mark subsystems with broken hierarchy support and whine if cgroups are nested for them 2012-09-14 12:01:16 -07:00
cgroup.c Revert "cgroup: Remove task_lock() from cgroup_post_fork()" 2012-10-19 14:09:35 -07:00
compat.c new helper: sigsuspend() 2012-05-21 23:52:30 -04:00
configs.c kernel/configs.c: include MODULE_*() when CONFIG_IKCONFIG_PROC=n 2011-07-25 20:57:15 -07:00
cpu_pm.c kernel/cpu_pm.c: fix various typos 2012-05-31 17:49:27 -07:00
cpu.c CPU hotplug, debug: detect imbalance between get_online_cpus() and put_online_cpus() 2012-10-09 16:22:15 +09:00
cpuset.c cpusets: Remove/update outdated comments 2012-07-24 13:53:28 +02:00
crash_dump.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
cred.c userns: Make credential debugging user namespace safe. 2012-08-23 22:54:18 -07:00
delayacct.c KVM: Steal time implementation 2011-07-14 12:59:14 +03:00
dma.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
elfcore.c
exec_domain.c
exit.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2012-10-02 20:25:04 -07:00
extable.c extable: Skip sorting if sorted at build time. 2012-04-19 15:06:55 -07:00
fork.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-10-10 12:02:25 +09:00
freezer.c PM / Freezer: Remove references to TIF_FREEZE in comments 2012-03-04 23:08:54 +01:00
futex_compat.c futex: Mark get_robust_list as deprecated 2012-03-29 11:37:17 +02:00
futex.c futex: Forbid uaddr == uaddr2 in futex_wait_requeue_pi() 2012-07-24 16:02:57 +02:00
groups.c userns: Convert in_group_p and in_egroup_p to use kgid_t 2012-05-03 03:29:33 -07:00
hrtimer.c hrtimer: Update hrtimer base offsets each hrtimer_interrupt 2012-07-11 23:34:39 +02:00
hung_task.c hung task debugging: Inject NMI when hung and going to panic 2012-04-25 12:39:25 +02:00
irq_work.c irq_work: fix compile failure on tile from missing include 2012-04-13 13:15:16 -04:00
itimer.c itimer: Use printk_once instead of WARN_ONCE 2012-04-10 11:00:30 +02:00
jump_label.c jump_label: Export jump_label_rate_limit() 2012-08-06 19:00:35 +03:00
kallsyms.c vsprintf: fix %ps on non symbols when using kallsyms 2012-05-29 16:22:32 -07:00
kcmp.c syscalls, x86: add __NR_kcmp syscall 2012-05-31 17:49:32 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks locking: Adjust spin lock inlining Kconfig options 2012-09-13 17:56:13 +02:00
Kconfig.preempt locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
kexec.c kdump: remove unneeded include 2012-10-06 03:05:19 +09:00
kfifo.c [media] kernel:kfifo: export __kfifo_max_r symbol 2012-04-11 18:24:37 -03:00
kmod.c infrastructure for saner ret_from_kernel_thread semantics 2012-10-12 13:35:07 -04:00
kprobes.c kprobes/x86: Fix to support jprobes on ftrace-based kprobe 2012-09-13 22:52:11 -04:00
ksysfs.c kernel: ksysfs.c is implicitly using stat.h 2011-10-31 09:20:13 -04:00
kthread.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2012-10-13 10:05:52 +09:00
latencytop.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
lglock.c brlocks/lglocks: turn into functions 2012-05-29 23:28:41 -04:00
lockdep_internals.h
lockdep_proc.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
lockdep_states.h
lockdep.c lockdep: Check if nested lock is actually held 2012-09-13 17:00:44 +02:00
Makefile Makefile: Documentation for external tool should be correct 2012-10-25 16:00:53 -07:00
modsign_pubkey.c MODSIGN: Provide module signing public keys to the kernel 2012-10-10 20:01:22 +10:30
module_signing.c module_signing: fix printk format warning 2012-10-22 08:56:34 +03:00
module-internal.h MODSIGN: Move the magic string to the end of a module and eliminate the search 2012-10-19 17:30:40 -07:00
module.c module: fix out-by-one error in kallsyms 2012-10-31 13:56:37 +10:30
mutex-debug.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
mutex-debug.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex.c sched/rt: Use schedule_preempt_disabled() 2012-03-01 10:28:03 +01:00
mutex.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
notifier.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
nsproxy.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
padata.c padata: Fix cpu hotplug 2012-03-29 19:52:46 +08:00
panic.c panic: fix a possible deadlock in panic() 2012-07-30 17:25:13 -07:00
params.c params: replace printk(KERN_<LVL>...) with pr_<lvl>(...) 2012-05-04 17:28:18 -07:00
pid_namespace.c pidns: limit the nesting depth of pid namespaces 2012-10-25 14:37:53 -07:00
pid.c net ip6 flowlabel: Make owner a union of struct pid * and kuid_t 2012-08-14 21:49:25 -07:00
posix-cpu-timers.c [S390] cputime: add sparse checking and cleanup 2011-12-15 14:56:19 +01:00
posix-timers.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
printk.c printk: Fix scheduling-while-atomic problem in console_cpu_notify() 2012-10-16 18:17:44 -07:00
profile.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
ptrace.c ptrace: mark __ptrace_may_access() static 2012-08-03 14:47:17 +10:00
range.c range: fix bogus misuse of module.h to get printk() 2011-10-31 09:20:11 -04:00
rcu.h rcu: Allow nesting of rcu_idle_enter() and rcu_idle_exit() 2012-02-21 09:06:12 -08:00
rcupdate.c rcu: Add PROVE_RCU_DELAY to provoke difficult races 2012-09-23 07:42:49 -07:00
rcutiny_plugin.h rcu: Move TINY_PREEMPT_RCU away from raw_local_irq_save() 2012-09-23 07:42:51 -07:00
rcutiny.c rcu: Move TINY_RCU quiescent state out of extended quiescent state 2012-09-23 07:42:52 -07:00
rcutorture.c rcu: Prevent initialization race in rcutorture kthreads 2012-09-23 07:42:23 -07:00
rcutree_plugin.h rcu: Make RCU_FAST_NO_HZ handle adaptive ticks 2012-09-26 15:44:02 +02:00
rcutree_trace.c Merge remote-tracking branch 'tip/smp/hotplug' into next.2012.09.25b 2012-09-25 10:01:45 -07:00
rcutree.c rcu: Grace-period initialization excludes only RCU notifier 2012-10-08 09:06:38 -07:00
rcutree.h rcu: Grace-period initialization excludes only RCU notifier 2012-10-08 09:06:38 -07:00
relay.c splice: fix racy pipe->buffers uses 2012-06-13 21:16:42 +02:00
res_counter.c rescounters: add res_counter_uncharge_until() 2012-05-29 16:22:27 -07:00
resource.c kernel/resource.c: fix stack overflow in __reserve_region_with_split() 2012-10-06 03:05:31 +09:00
rtmutex_common.h
rtmutex-debug.c lockdep, rtmutex, bug: Show taint flags on error 2011-12-06 08:16:49 +01:00
rtmutex-debug.h
rtmutex-tester.c rtmutex-tester: convert sysdev_class to a regular subsystem 2011-12-14 14:54:22 -08:00
rtmutex.c Revert "rcu: Permit rt_mutex_unlock() with irqs disabled" 2011-12-11 10:33:18 -08:00
rtmutex.h
rwsem.c Remove all #inclusions of asm/system.h 2012-03-28 18:30:03 +01:00
seccomp.c seccomp: fix build warnings when there is no CONFIG_SECCOMP_FILTER 2012-04-18 12:24:52 +10:00
semaphore.c semaphore: fix improper comment reference to mutex 2012-04-05 17:15:55 -07:00
signal.c coredump: pass siginfo_t* to do_coredump() and below, not merely signr 2012-10-06 03:05:16 +09:00
smp.c smp: Remove ipi_call_lock[_irq]()/ipi_call_unlock[_irq]() 2012-06-05 17:27:14 +02:00
smpboot.c hotplug: Fix UP bug in smpboot hotplug code 2012-08-13 17:01:07 +02:00
smpboot.h smpboot: Provide infrastructure for percpu hotplug threads 2012-08-13 17:01:07 +02:00
softirq.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2012-10-01 10:43:39 -07:00
spinlock.c locking/kconfig: Simplify INLINE_SPIN_UNLOCK usage 2012-03-23 13:18:57 +01:00
srcu.c workqueue: deprecate system_nrt[_freezable]_wq 2012-08-20 14:51:24 -07:00
stacktrace.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
stop_machine.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
sys_ni.c syscalls, x86: add __NR_kcmp syscall 2012-05-31 17:49:32 -07:00
sys.c use clamp_t in UNAME26 fix 2012-10-19 18:51:17 -07:00
sysctl_binary.c mm: prepare for removal of obsolete /proc/sys/vm/nr_pdflush_threads 2012-07-31 18:42:40 -07:00
sysctl.c Kconfig: clean up the "#if defined(arch)" list for exception-trace sysctl entry 2012-10-09 16:22:14 +09:00
task_work.c task_work: task_work_add() should not succeed after exit_task_work() 2012-09-13 16:47:34 +02:00
taskstats.c taskstats: cgroupstats_user_cmd() may leak on error 2012-10-06 03:05:31 +09:00
test_kprobes.c
time.c time: Move update_vsyscall definitions to timekeeper_internal.h 2012-09-24 12:38:06 -04:00
timeconst.pl
timer.c timers: Fix endless looping between cascade() and internal_add_timer() 2012-10-09 21:27:14 +02:00
tracepoint.c static keys: Introduce 'struct static_key', static_key_true()/false() and static_key_slow_[inc|dec]() 2012-02-24 10:05:59 +01:00
tsacct.c userns: Convert taskstats to handle the user and pid namespaces. 2012-09-18 01:01:32 -07:00
uid16.c userns: Convert setting and getting uid and gid system calls to use kuid and kgid 2012-05-03 03:28:41 -07:00
up.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
user_namespace.c userns: Add kprojid_t and associated infrastructure in projid.h 2012-09-18 01:01:37 -07:00
user-return-notifier.c kernel: Map most files to use export.h instead of module.h 2011-10-31 09:20:12 -04:00
user.c userns: Add kprojid_t and associated infrastructure in projid.h 2012-09-18 01:01:37 -07:00
utsname_sysctl.c Merge branch 'modsplit-Oct31_2011' of git://git.kernel.org/pub/scm/linux/kernel/git/paulg/linux 2011-11-06 19:44:47 -08:00
utsname.c userns: Use cred->user_ns instead of cred->user->user_ns 2012-04-07 16:55:51 -07:00
wait.c lockdep/waitqueues: Add better annotation 2011-12-21 10:07:39 +01:00
watchdog.c watchdog: Use hotplug thread infrastructure 2012-08-13 17:01:07 +02:00
workqueue_sched.h
workqueue.c workqueue: cancel_delayed_work() should return %false if work item is idle 2012-10-24 12:38:16 -07:00