mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-21 11:44:01 +08:00
linux-next/kernel
Stephane Eranian a8d757ef07 perf events: Fix slow and broken cgroup context switch code
The current cgroup context switch code was incorrect, leading
to bogus counts. Furthermore, as soon as there was an active
cgroup event on a CPU, the context switch cost on that CPU
would increase by a significant amount as demonstrated by a
simple ping/pong example:

 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10684.51 ctxsw/s

Now start a cgroup perf stat:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

 $ ./pong
 Both processes pinned to CPU1, running for 10s
 6674.61 ctxsw/s

That's a 37% penalty.

Note that pong is not even in the monitored cgroup.

The results shown by perf stat are bogus:
 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 100

 Performance counter stats for 'sleep 100':

 CPU1 <not counted> cycles   test
 CPU1 16,984,189,138 cycles  #    0.000 GHz

The second 'cycles' event should report a count consistent with
the CPU clock (here 2.4 GHz), since it counts across all cgroups.

The patch below fixes the bogus accounting and bypasses any
cgroup switches in case the outgoing and incoming tasks are
in the same cgroup.

With this patch the same test now yields:
 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10775.30 ctxsw/s

Start perf stat with cgroup:

 $ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

Run pong outside the cgroup:
 $ ./pong
 Both processes pinned to CPU1, running for 10s
 10687.80 ctxsw/s

The penalty is now less than 2%.

And the results for perf stat are correct:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 <not counted> cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

Now perf stat reports the correct count for
the non-cgroup event.

If we run pong inside the cgroup, then we also get the
correct counts:

$ perf stat -e cycles,cycles -A -a -G test  -C 1 -- sleep 10

 Performance counter stats for 'sleep 10':

 CPU1 22,297,726,205 cycles test #    0.000 GHz
 CPU1 23,933,981,448 cycles      #    0.000 GHz

      10.001457237 seconds time elapsed

Signed-off-by: Stephane Eranian <eranian@google.com>
Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Link: http://lkml.kernel.org/r/20110825135803.GA4697@quad
Signed-off-by: Ingo Molnar <mingo@elte.hu>
2011-08-29 12:28:33 +02:00
debug Merge branch 'for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jwessel/linux-2.6-kgdb 2011-08-01 13:39:40 -10:00
events perf events: Fix slow and broken cgroup context switch code 2011-08-29 12:28:33 +02:00
gcov gcov: disable CONSTRUCTORS for UML 2011-07-26 16:49:45 -07:00
irq Revert "irq: Always set IRQF_ONESHOT if no primary handler is specified" 2011-08-23 10:36:51 -07:00
power PM / Domains: Fix build for CONFIG_PM_RUNTIME unset 2011-08-14 13:34:31 +02:00
time Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-07-22 16:52:18 -07:00
trace Merge branch 'for-linus' of git://git.kernel.dk/linux-block 2011-08-19 10:47:07 -07:00
.gitignore
acct.c
async.c async: Fixed an include coding style issue 2011-06-14 22:48:46 -04:00
audit_tree.c audit_tree,rcu: Convert call_rcu(__put_tree) to kfree_rcu() 2011-07-20 14:10:11 -07:00
audit_watch.c kill path_lookup() 2011-03-14 09:15:23 -04:00
audit.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
audit.h
auditfilter.c netlink: kill loginuid/sessionid/sid members from struct netlink_skb_parms 2011-03-03 10:55:40 -08:00
auditsc.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
backtracetest.c
bounds.c memcg: remove direct page_cgroup-to-page pointer 2011-03-23 19:46:28 -07:00
capability.c Merge branch 'master' into next 2011-05-19 18:51:57 +10:00
cgroup_freezer.c cgroups: add per-thread subsystem callbacks 2011-05-26 17:12:34 -07:00
cgroup.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/security-testing-2.6 2011-07-27 19:26:38 -07:00
compat.c Merge branch 'v4l_for_linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-2.6 2011-07-30 00:08:53 -07:00
configs.c kernel/configs.c: include MODULE_*() when CONFIG_IKCONFIG_PROC=n 2011-07-25 20:57:15 -07:00
cpu.c Fix common misspellings 2011-03-31 11:26:23 -03:00
cpuset.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
crash_dump.c crash_dump: export is_kdump_kernel to modules, consolidate elfcorehdr_addr, setup_elfcorehdr and saved_max_pfn 2011-03-23 19:47:19 -07:00
cred.c move RLIMIT_NPROC check from set_user() to do_execve_common() 2011-08-11 11:24:42 -07:00
delayacct.c KVM: Steal time implementation 2011-07-14 12:59:14 +03:00
dma.c
elfcore.c
exec_domain.c
exit.c ipc: introduce shm_rmid_forced sysctl 2011-07-26 16:49:44 -07:00
extable.c extable, core_kernel_data(): Make sure all archs define _sdata 2011-05-20 08:56:56 +02:00
fork.c move RLIMIT_NPROC check from set_user() to do_execve_common() 2011-08-11 11:24:42 -07:00
freezer.c Freezer: Use SMP barriers 2011-05-17 23:19:17 +02:00
futex_compat.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
futex.c Merge branch 'linus' into core/urgent 2011-08-04 09:09:27 +02:00
groups.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
hrtimer.c hrtimers: Fix typo causing erratic timers 2011-05-25 15:31:58 -07:00
hung_task.c watchdog, hung_task_timeout: Add Kconfig configurable default 2011-04-28 09:13:17 +02:00
irq_work.c irq_work: Use per cpu atomics instead of regular atomics 2010-12-18 15:54:48 +01:00
itimer.c
jump_label.c jump_label: Fix jump_label update for modules 2011-06-29 09:59:17 -04:00
kallsyms.c Merge branch 'core-fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-25 17:52:22 -07:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks arch:Kconfig.locks Remove unused config option. 2011-04-10 17:01:05 +02:00
Kconfig.preempt sched: Isolate preempt counting in its own config option 2011-06-10 15:15:40 +02:00
kexec.c treewide: Convert uses of struct resource to resource_size(ptr) 2011-06-10 14:55:36 +02:00
kfifo.c
kmod.c Boot up with usermodehelper disabled 2011-08-03 22:03:29 -10:00
kprobes.c kprobes: Return -ENOENT if probe point doesn't exist 2011-07-15 15:11:47 -04:00
ksysfs.c kernel/ksysfs.c: expose file_caps_enabled in sysfs 2011-04-19 16:45:51 -07:00
kthread.c cpuset: Fix cpuset_cpus_allowed_fallback(), don't update tsk->rt.nr_cpus_allowed 2011-05-28 17:02:57 +02:00
latencytop.c Fix common misspellings 2011-03-31 11:26:23 -03:00
lockdep_internals.h
lockdep_proc.c lockdep: Remove unused 'factor' variable from lockdep_stats_show() 2011-03-23 13:54:47 +01:00
lockdep_states.h
lockdep.c lockdep: Fix wrong assumption in match_held_lock 2011-08-09 11:57:35 +02:00
Makefile jump label: Reduce the cycle count by changing the link order 2011-08-05 23:57:33 +02:00
module.c module: add /sys/module/<name>/uevent files 2011-07-24 22:06:04 +09:30
mutex-debug.c mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex-debug.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
mutex.c lockdep, mutex: provide mutex_lock_nest_lock 2011-05-25 08:39:17 -07:00
mutex.h mutex: Use p->on_cpu for the adaptive spin 2011-04-14 08:52:33 +02:00
notifier.c notifiers: sys: move reboot notifiers into reboot.h 2011-07-25 20:57:14 -07:00
nsproxy.c make sure that nsproxy_cache is initialized early enough 2011-07-20 01:44:07 -04:00
padata.c Fix common misspellings 2011-03-31 11:26:23 -03:00
panic.c panic: panic=-1 for immediate reboot 2011-07-26 16:49:45 -07:00
params.c module: add /sys/module/<name>/uevent files 2011-07-24 22:06:04 +09:30
pid_namespace.c pidns: call pid_ns_prepare_proc() from create_pid_namespace() 2011-03-23 19:46:58 -07:00
pid.c rcu: treewide: Do not use rcu_read_lock_held when calling rcu_dereference_check 2011-07-08 22:21:58 +02:00
pm_qos_params.c plist: Remove the need to supply locks to plist heads 2011-07-08 14:02:53 +02:00
posix-cpu-timers.c hrtimers: Avoid touching inactive timer bases 2011-05-23 13:59:54 +02:00
posix-timers.c posix-timers: RCU conversion 2011-05-24 12:10:51 +02:00
printk.c kernel/printk: do not turn off bootconsole in printk_late_init() if keep_bootcon 2011-08-25 16:25:34 -07:00
profile.c kernel/profile.c: remove some duplicate code from profile_hits() 2011-05-26 17:12:37 -07:00
ptrace.c connector: add an event for monitoring process tracers 2011-07-18 21:38:33 +02:00
range.c
rcupdate.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
rcutiny_plugin.h rcu: Converge TINY_RCU expedited and normal boosting 2011-05-05 23:16:58 -07:00
rcutiny.c sanitize <linux/prefetch.h> usage 2011-05-20 12:50:29 -07:00
rcutorture.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
rcutree_plugin.h softirq,rcu: Inform RCU of irq_exit() activity 2011-07-20 10:50:12 -07:00
rcutree_trace.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
rcutree.c rcu: Prevent RCU callbacks from executing before scheduler initialized 2011-07-13 08:17:56 -07:00
rcutree.h rcu: Move RCU_BOOST #ifdefs to header file 2011-06-16 16:12:05 -07:00
relay.c
res_counter.c memcg: res_counter_read_u64(): fix potential races on 32-bit machines 2011-03-23 19:46:22 -07:00
resource.c resources: Add lookup_resource() 2011-07-30 21:21:39 +02:00
rtmutex_common.h rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex-debug.c rtmutex: Simplify PI algorithm and make highest prio task get lock 2011-01-27 21:13:51 -05:00
rtmutex-debug.h
rtmutex-tester.c rtmutex: tester: Remove the remaining BKL leftovers 2011-02-22 22:07:22 +01:00
rtmutex.c plist: Remove the need to supply locks to plist heads 2011-07-08 14:02:53 +02:00
rtmutex.h
rwsem.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
sched_autogroup.c Fix common misspellings 2011-03-31 11:26:23 -03:00
sched_autogroup.h sched: Skip autogroup when looking for all rt sched groups 2011-07-01 10:39:08 +02:00
sched_clock.c sched: Add some clock info to sched_debug 2010-11-23 10:29:08 +01:00
sched_cpupri.c
sched_cpupri.h
sched_debug.c sched: Get rid of lock_depth 2011-04-24 13:18:38 +02:00
sched_fair.c sched: Cleanup duplicate local variable in [enqueue|dequeue]_task_fair 2011-07-22 12:47:22 +02:00
sched_features.h Merge branch 'kvm-updates/3.1' of git://git.kernel.org/pub/scm/virt/kvm/kvm 2011-07-24 09:07:03 -07:00
sched_idletask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
sched_rt.c sched: Skip autogroup when looking for all rt sched groups 2011-07-01 10:39:08 +02:00
sched_stats.h sched: More sched_domain iterations fixes 2011-05-28 17:02:54 +02:00
sched_stoptask.c sched: Drop the rq argument to sched_class::select_task_rq() 2011-04-14 08:52:36 +02:00
sched.c perf events: Fix slow and broken cgroup context switch code 2011-08-29 12:28:33 +02:00
seccomp.c
semaphore.c
signal.c signals: sys_ssetmask/sys_rt_sigsuspend should use set_current_blocked() 2011-07-27 12:53:36 -07:00
smp.c generic-ipi: Fix kexec boot crash by initializing call_single_queue before enabling interrupts 2011-06-17 10:17:12 +02:00
softirq.c softirq,rcu: Inform RCU of irq_exit() activity 2011-07-20 10:50:12 -07:00
spinlock.c
srcu.c rcu: demote SRCU_SYNCHRONIZE_DELAY from kernel-parameter status 2011-01-14 04:56:49 -08:00
stacktrace.c stack_trace: Add weak save_stack_trace_regs() 2011-06-14 22:48:52 -04:00
stop_machine.c atomic: use <linux/atomic.h> 2011-07-26 16:49:47 -07:00
sys_ni.c All Arch: remove linkage for sys_nfsservctl system call 2011-08-26 15:09:58 -07:00
sys.c Add a personality to report 2.6.x version numbers 2011-08-25 10:17:28 -07:00
sysctl_binary.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
sysctl_check.c xfs: remove subdirectories 2011-08-12 16:21:35 -05:00
sysctl.c sysctl,rcu: Convert call_rcu(free_head) to kfree 2011-07-20 14:10:18 -07:00
taskstats.c taskstats: add_del_listener() should ignore !valid listeners 2011-08-03 14:25:20 -10:00
test_kprobes.c
time.c Merge branch 'timers-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2011-03-15 18:53:35 -07:00
timeconst.pl
timer.c timers: Consider slack value in mod_timer() 2011-06-03 15:02:32 +02:00
tracepoint.c jump label: Introduce static_branch() interface 2011-04-04 12:48:08 -04:00
tsacct.c
uid16.c userns: user namespaces: convert several capable() calls 2011-03-23 19:47:08 -07:00
up.c
user_namespace.c user_ns: improve the user_ns on-the-slab packaging 2011-01-13 08:03:18 -08:00
user-return-notifier.c Fix common misspellings 2011-03-31 11:26:23 -03:00
user.c userns: add a user_namespace as creator/owner of uts_namespace 2011-03-23 19:46:59 -07:00
utsname_sysctl.c
utsname.c ns proc: Add support for the uts namespace 2011-05-10 14:35:35 -07:00
wait.c Fix common misspellings 2011-03-31 11:26:23 -03:00
watchdog.c perf, x86: P4 PMU - Introduce event alias feature 2011-07-14 17:25:04 -04:00
workqueue_sched.h
workqueue.c Merge branch 'for-3.1' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2011-07-22 15:07:15 -07:00