linux/kernel
Daniel Lezcano 97978e6d1f cgroup: add clone_children control file
The ns_cgroup is a control group interacting with the namespaces.  When a
new namespace is created, a corresponding cgroup is automatically created
too.  The cgroup name is the pid of the process who did 'unshare' or the
child of 'clone'.

This cgroup is tied with the namespace because it prevents a process to
escape the control group and use the post_clone callback, so the child
cgroup inherits the values of the parent cgroup.

Unfortunately, the more we use this cgroup and the more we are facing
problems with it:

(1) when a process unshares, the cgroup name may conflict with a
    previous cgroup with the same pid, so unshare or clone return -EEXIST

(2) the cgroup creation is out of control because there may have an
    application creating several namespaces where the system will
    automatically create several cgroups in his back and let them on the
    cgroupfs (eg.  a vrf based on the network namespace).

(3) the mix of (1) and (2) force an administrator to regularly check
    and clean these cgroups.

This patchset removes the ns_cgroup by adding a new flag to the cgroup and
the cgroupfs mount option.  It enables the copy of the parent cgroup when
a child cgroup is created.  We can then safely remove the ns_cgroup as
this flag brings a compatibility.  We have now to manually create and add
the task to a cgroup, which is consistent with the cgroup framework.

This patch:

Sent as an answer to a previous thread around the ns_cgroup.

https://lists.linux-foundation.org/pipermail/containers/2009-June/018627.html

It adds a control file 'clone_children' for a cgroup.  This control file
is a boolean specifying if the child cgroup should be a clone of the
parent cgroup or not.  The default value is 'false'.

This flag makes the child cgroup to call the post_clone callback of all
the subsystem, if it is available.

At present, the cpuset is the only one which had implemented the
post_clone callback.

The option can be set at mount time by specifying the 'clone_children'
mount option.

Signed-off-by: Daniel Lezcano <daniel.lezcano@free.fr>
Signed-off-by: Serge E. Hallyn <serge.hallyn@canonical.com>
Cc: Eric W. Biederman <ebiederm@xmission.com>
Acked-by: Paul Menage <menage@google.com>
Reviewed-by: Li Zefan <lizf@cn.fujitsu.com>
Cc: Jamal Hadi Salim <hadi@cyberus.ca>
Cc: Matt Helsley <matthltc@us.ibm.com>
Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2010-10-27 18:03:09 -07:00
..
debug kdb,debug_core: adjust master cpu switch logic against new debug_core locking 2010-10-22 15:34:13 -05:00
gcov llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
irq genirq: Fix CONFIG_GENIRQ_NO_DEPRECATED=y build 2010-10-12 21:59:55 +02:00
power use clear_page()/copy_page() in favor of memset()/memcpy() on whole pages 2010-10-26 16:52:13 -07:00
time ntp: Clamp PLL update interval 2010-09-09 20:48:37 +02:00
trace Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-10-24 13:41:39 -07:00
.gitignore
acct.c pass a struct path to vfs_statfs 2010-08-09 16:48:42 -04:00
async.c async: use workqueue for worker pool 2010-07-14 11:29:46 +02:00
audit_tree.c fanotify: use both marks when possible 2010-07-28 10:18:55 -04:00
audit_watch.c Revert "fsnotify: store struct file not struct path" 2010-08-12 14:23:04 -07:00
audit.c Merge branch 'for-linus' of git://git.infradead.org/users/eparis/notify 2010-08-10 11:39:13 -07:00
audit.h Audit: split audit watch Kconfig 2010-07-28 09:58:19 -04:00
auditfilter.c audit: do not get and put just to free a watch 2010-07-28 09:58:17 -04:00
auditsc.c vfs: add helpers to get root and pwd 2010-08-11 00:28:20 -04:00
backtracetest.c
bounds.c
capability.c sched: Remove remaining USER_SCHED code 2010-04-02 20:12:00 +02:00
cgroup_freezer.c cgroup_freezer: update_freezer_state() does incorrect state transitions 2010-10-27 18:03:08 -07:00
cgroup.c cgroup: add clone_children control file 2010-10-27 18:03:09 -07:00
compat.c compat: Make compat_alloc_user_space() incorporate the access_ok() 2010-09-14 16:08:45 -07:00
configs.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
cpu.c sched: adjust when cpu_active and cpuset configurations are updated during cpu on/offlining 2010-06-08 21:40:36 +02:00
cpuset.c security: remove unused parameter from security_task_setscheduler() 2010-10-21 10:12:44 +11:00
cred.c Add a dummy printk function for the maintenance of unused printks 2010-08-12 09:51:35 -07:00
delayacct.c
dma.c
elfcore.c elf coredump: add extended numbering support 2010-03-06 11:26:46 -08:00
exec_domain.c sys_personality: remove the bogus checks in sys_personality()->__set_personality() path 2010-08-09 20:45:05 -07:00
exit.c oom: add per-mm oom disable count 2010-10-26 16:52:05 -07:00
extable.c
fork.c oom: add per-mm oom disable count 2010-10-26 16:52:05 -07:00
freezer.c
futex_compat.c futex: Change 3rd arg of fetch_robust_entry() to unsigned int* 2010-09-18 12:19:21 +02:00
futex.c new helper: ihold() 2010-10-25 21:26:11 -04:00
groups.c kernel/groups.c: fix integer overflow in groups_search 2010-09-09 18:57:24 -07:00
hrtimer.c hrtimer: Preserve timer state in remove_hrtimer() 2010-10-14 13:29:59 +02:00
hung_task.c lockup detector: Fix grammar by adding a missing "to" in the comments 2010-08-17 09:11:52 +02:00
hw_breakpoint.c perf, hw_breakpoint: Fix crash in hw_breakpoint creation 2010-10-18 19:58:55 +02:00
irq_work.c irq_work: Add generic hardirq context callbacks 2010-10-18 19:58:50 +02:00
itimer.c
jump_label.c jump label: Add jump_label_text_reserved() to reserve jump points 2010-09-22 16:30:46 -04:00
kallsyms.c kdb: core for kgdb back end (2 of 2) 2010-05-20 21:04:21 -05:00
Kconfig.freezer
Kconfig.hz
Kconfig.locks
Kconfig.preempt
kexec.c use clear_page()/copy_page() in favor of memset()/memcpy() on whole pages 2010-10-26 16:52:13 -07:00
kfifo.c kfifo: fix scatterlist usage 2010-10-01 10:50:58 -07:00
kmod.c Make do_execve() take a const filename pointer 2010-08-17 18:07:43 -07:00
kprobes.c Merge branch 'llseek' of git://git.kernel.org/pub/scm/linux/kernel/git/arnd/bkl 2010-10-22 10:52:56 -07:00
ksysfs.c sysfs: add struct file* to bin_attr callbacks 2010-05-21 09:37:31 -07:00
kthread.c kthread: implement kthread_data() 2010-06-29 10:07:09 +02:00
latencytop.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
lockdep_internals.h lockdep: No need to disable preemption in debug atomic ops 2010-05-04 05:38:16 +02:00
lockdep_proc.c lockstat: Make lockstat counting per cpu 2010-04-06 00:15:37 +02:00
lockdep_states.h
lockdep.c lockdep: Check the depth of subclass 2010-10-18 18:44:26 +02:00
Makefile Merge branch 'core-memblock-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-10-21 18:52:11 -07:00
module.c Merge commit 'v2.6.36-rc7' into perf/core 2010-10-08 10:46:27 +02:00
mutex-debug.c
mutex-debug.h locking: Implement new raw_spinlock 2009-12-14 23:55:32 +01:00
mutex.c mutex: Fix annotations to include it in kernel-locking docbook 2010-09-03 08:19:51 +02:00
mutex.h
notifier.c sched: Use lockdep-based checking on rcu_dereference() 2010-02-25 10:34:26 +01:00
ns_cgroup.c
nsproxy.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
padata.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 2010-08-04 15:23:14 -07:00
panic.c lib/bug.c: add oops end marker to WARN implementation 2010-08-11 08:59:22 -07:00
params.c param: locking for kernel parameters 2010-08-11 23:04:20 +09:30
perf_event.c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-10-21 12:54:49 -07:00
pid_namespace.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
pid.c Add RCU check for find_task_by_vpid(). 2010-08-19 17:18:02 -07:00
pm_qos_params.c Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/jikos/trivial 2010-10-24 13:41:39 -07:00
posix-cpu-timers.c Merge branch 'writable_limits' of git://decibel.fi.muni.cz/~xslaby/linux 2010-08-10 12:07:51 -07:00
posix-timers.c posix_timer: Move copy_to_user(created_timer_id) down in timer_create() 2010-07-23 15:08:12 +02:00
printk.c printk: change type of 'boot_delay' to int * 2010-10-26 16:52:16 -07:00
profile.c llseek: automatically add .llseek fop 2010-10-15 15:53:27 +02:00
ptrace.c ptrace: optimize exit_ptrace() for the likely case 2010-08-11 08:59:19 -07:00
range.c kernel/range: remove unused definition of ARRAY_SIZE() 2010-08-09 20:45:06 -07:00
rcupdate.c Merge branch 'rcu/urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-2.6-rcu into core/rcu 2010-10-07 09:43:11 +02:00
rcutiny_plugin.h rcu: performance fixes to TINY_PREEMPT_RCU callback checking 2010-08-27 10:51:17 -07:00
rcutiny.c rcu: Add a TINY_PREEMPT_RCU 2010-08-20 08:55:00 -07:00
rcutorture.c rcu: fix sparse errors in rcutorture.c 2010-09-23 09:16:42 -07:00
rcutree_plugin.h rcu: fix _oddness handling of verbose stall warnings 2010-09-02 16:15:30 -07:00
rcutree_trace.c rcu: Add tracing data to support queueing models 2010-09-23 09:16:53 -07:00
rcutree.c rcu: using ACCESS_ONCE() to observe the jiffies_stall/rnp->qsmask value 2010-10-07 10:41:06 -07:00
rcutree.h rcu: Add tracing data to support queueing models 2010-09-23 09:16:53 -07:00
relay.c kernel/: convert cpu notifier to return encapsulate errno value 2010-05-27 09:12:48 -07:00
res_counter.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
resource.c resource: shared I/O region support 2010-05-11 12:01:10 -07:00
rtmutex_common.h
rtmutex-debug.c sched: Convert pi_lock to raw_spinlock 2009-12-14 23:55:33 +01:00
rtmutex-debug.h
rtmutex-tester.c rtmutex-tester: make it build without BKL 2010-10-19 11:29:56 +02:00
rtmutex.c rtmutes: Convert rtmutex.lock to raw_spinlock 2009-12-14 23:55:33 +01:00
rtmutex.h
rwsem.c
sched_clock.c sched_clock: Add local_clock() API and improve documentation 2010-06-09 10:34:49 +02:00
sched_cpupri.c sched: No need for bootmem special cases 2010-07-17 12:06:22 +02:00
sched_cpupri.h sched: No need for bootmem special cases 2010-07-17 12:06:22 +02:00
sched_debug.c sched: Use correct macro to display sched_child_runs_first in /proc/sched_debug 2010-07-21 21:46:12 +02:00
sched_fair.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-10-21 12:55:43 -07:00
sched_features.h sched: Remove irq time from available CPU power 2010-10-18 20:52:27 +02:00
sched_idletask.c sched: Cure load average vs NO_HZ woes 2010-04-23 11:02:02 +02:00
sched_rt.c sched: Do not account irq time to current task 2010-10-18 20:52:26 +02:00
sched_stats.h sched: Remove the obsolete exit_state/signal hacks 2010-06-18 10:46:56 +02:00
sched_stoptask.c sched: Create special class for stop/migrate work 2010-10-18 18:41:58 +02:00
sched.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-10-21 12:55:43 -07:00
seccomp.c
semaphore.c
signal.c HWPOISON: Copy si_addr_lsb to user 2010-10-07 09:41:25 +02:00
smp.c generic-ipi: Fix deadlock in __smp_call_function_single 2010-09-10 16:48:40 +02:00
softirq.c Merge branches 'softirq-for-linus', 'x86-debug-for-linus', 'x86-numa-for-linus', 'x86-quirks-for-linus', 'x86-setup-for-linus', 'x86-uv-for-linus' and 'x86-vm86-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-10-23 08:25:36 -07:00
spinlock.c locking: Cleanup the name space completely 2009-12-14 23:55:33 +01:00
srcu.c kernel: Remove undead ifdef CONFIG_DEBUG_LOCK_ALLOC 2010-09-23 09:14:51 -07:00
stacktrace.c
stop_machine.c stop_machine: convert cpu notifier to return encapsulate errno value 2010-10-26 16:52:15 -07:00
sys_ni.c powerpc: define a compat_sys_recv cond_syscall 2010-09-23 17:03:55 +10:00
sys.c pid: make setpgid() system call use RCU read-side critical section 2010-08-31 17:00:18 -07:00
sysctl_binary.c sysctl: don't use own implementation of hex_to_bin() 2010-05-25 08:07:05 -07:00
sysctl_check.c sysctl: min/max bounds are optional 2010-10-15 14:42:24 -07:00
sysctl.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs-2.6 2010-10-26 17:58:44 -07:00
taskstats.c include cleanup: Update gfp.h and slab.h includes to prepare for breaking implicit slab.h inclusion from percpu.h 2010-03-30 22:02:32 +09:00
test_kprobes.c kprobes: Fix selftest to clear flags field for reusing probes 2010-10-14 08:55:27 +02:00
time.c time: Kill off CONFIG_GENERIC_TIME 2010-07-27 12:40:54 +02:00
timeconst.pl
timer.c irq_work: Add generic hardirq context callbacks 2010-10-18 19:58:50 +02:00
tracepoint.c jump_label: Use more consistent naming 2010-10-18 19:58:56 +02:00
tsacct.c mm: clean up mm_counter 2010-03-06 11:26:23 -08:00
uid16.c
up.c
user_namespace.c user_ns: Introduce user_nsmap_uid and user_ns_map_gid. 2010-06-16 14:55:34 -07:00
user-return-notifier.c
user.c kernel/user.c: add lock release annotation on free_user() 2010-10-26 16:52:15 -07:00
utsname_sysctl.c
utsname.c
wait.c docbook: add more wait/wake/completion to device-drivers docbook 2010-10-26 17:32:41 -07:00
watchdog.c Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/linux-2.6-tip 2010-10-21 12:54:49 -07:00
workqueue_sched.h workqueue: implement concurrency managed dynamic worker pool 2010-06-29 10:07:14 +02:00
workqueue.c workqueues: s/ON_STACK/ONSTACK/ 2010-10-26 16:52:14 -07:00