2
0
mirror of https://github.com/edk2-porting/linux-next.git synced 2024-12-27 22:53:55 +08:00
linux-next/kernel
Tejun Heo 76bb5ab8f6 cpuset: break kernfs active protection in cpuset_write_resmask()
Writing to either "cpuset.cpus" or "cpuset.mems" file flushes
cpuset_hotplug_work so that cpu or memory hotunplug doesn't end up
migrating tasks off a cpuset after new resources are added to it.

As cpuset_hotplug_work calls into cgroup core via
cgroup_transfer_tasks(), this flushing adds the dependency to cgroup
core locking from cpuset_write_resmak().  This used to be okay because
cgroup interface files were protected by a different mutex; however,
8353da1f91 ("cgroup: remove cgroup_tree_mutex") simplified the
cgroup core locking and this dependency became a deadlock hazard -
cgroup file removal performed under cgroup core lock tries to drain
on-going file operation which is trying to flush cpuset_hotplug_work
blocked on the same cgroup core lock.

The locking simplification was done because kernfs added an a lot
easier way to deal with circular dependencies involving kernfs active
protection.  Let's use the same strategy in cpuset and break active
protection in cpuset_write_resmask().  While it isn't the prettiest,
this is a very rare, likely unique, situation which also goes away on
the unified hierarchy.

The commands to trigger the deadlock warning without the patch and the
lockdep output follow.

 localhost:/ # mount -t cgroup -o cpuset xxx /cpuset
 localhost:/ # mkdir /cpuset/tmp
 localhost:/ # echo 1 > /cpuset/tmp/cpuset.cpus
 localhost:/ # echo 0 > cpuset/tmp/cpuset.mems
 localhost:/ # echo $$ > /cpuset/tmp/tasks
 localhost:/ # echo 0 > /sys/devices/system/cpu/cpu1/online

  ======================================================
  [ INFO: possible circular locking dependency detected ]
  3.16.0-rc1-0.1-default+ #7 Not tainted
  -------------------------------------------------------
  kworker/1:0/32649 is trying to acquire lock:
   (cgroup_mutex){+.+.+.}, at: [<ffffffff8110e3d7>] cgroup_transfer_tasks+0x37/0x150

  but task is already holding lock:
   (cpuset_hotplug_work){+.+...}, at: [<ffffffff81085412>] process_one_work+0x192/0x520

  which lock already depends on the new lock.

  the existing dependency chain (in reverse order) is:

  -> #2 (cpuset_hotplug_work){+.+...}:
  ...
  -> #1 (s_active#175){++++.+}:
  ...
  -> #0 (cgroup_mutex){+.+.+.}:
  ...

  other info that might help us debug this:

  Chain exists of:
    cgroup_mutex --> s_active#175 --> cpuset_hotplug_work

   Possible unsafe locking scenario:

	 CPU0                    CPU1
	 ----                    ----
    lock(cpuset_hotplug_work);
				 lock(s_active#175);
				 lock(cpuset_hotplug_work);
    lock(cgroup_mutex);

   *** DEADLOCK ***

  2 locks held by kworker/1:0/32649:
   #0:  ("events"){.+.+.+}, at: [<ffffffff81085412>] process_one_work+0x192/0x520
   #1:  (cpuset_hotplug_work){+.+...}, at: [<ffffffff81085412>] process_one_work+0x192/0x520

  stack backtrace:
  CPU: 1 PID: 32649 Comm: kworker/1:0 Not tainted 3.16.0-rc1-0.1-default+ #7
 ...
  Call Trace:
   [<ffffffff815a5f78>] dump_stack+0x72/0x8a
   [<ffffffff810c263f>] print_circular_bug+0x10f/0x120
   [<ffffffff810c481e>] check_prev_add+0x43e/0x4b0
   [<ffffffff810c4ee6>] validate_chain+0x656/0x7c0
   [<ffffffff810c53d2>] __lock_acquire+0x382/0x660
   [<ffffffff810c57a9>] lock_acquire+0xf9/0x170
   [<ffffffff815aa13f>] mutex_lock_nested+0x6f/0x380
   [<ffffffff8110e3d7>] cgroup_transfer_tasks+0x37/0x150
   [<ffffffff811129c0>] hotplug_update_tasks_insane+0x110/0x1d0
   [<ffffffff81112bbd>] cpuset_hotplug_update_tasks+0x13d/0x180
   [<ffffffff811148ec>] cpuset_hotplug_workfn+0x18c/0x630
   [<ffffffff810854d4>] process_one_work+0x254/0x520
   [<ffffffff810875dd>] worker_thread+0x13d/0x3d0
   [<ffffffff8108e0c8>] kthread+0xf8/0x100
   [<ffffffff815acaec>] ret_from_fork+0x7c/0xb0

Signed-off-by: Tejun Heo <tj@kernel.org>
Reported-by: Li Zefan <lizefan@huawei.com>
Tested-by: Li Zefan <lizefan@huawei.com>
2014-07-01 16:42:28 -04:00
..
debug kernel/printk: use symbolic defines for console loglevels 2014-06-04 16:54:17 -07:00
events Merge branch 'perf-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-06-12 19:18:49 -07:00
gcov gcov: add support for GCC 4.9 2014-06-10 15:34:46 -07:00
irq genirq: Improve documentation to match current implementation 2014-05-27 10:16:44 +02:00
locking Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-06-12 18:48:15 -07:00
power Merge branch 'pm-sleep' 2014-06-12 13:43:08 +02:00
printk kernel/printk: use symbolic defines for console loglevels 2014-06-04 16:54:17 -07:00
rcu Merge branch 'locking-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip into next 2014-06-03 12:57:53 -07:00
sched Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-06-12 19:42:15 -07:00
time Merge branch 'akpm' (patchbomb from Andrew) into next 2014-06-04 16:55:13 -07:00
trace One bug fix that goes back to 3.10. Accessing a non existent buffer 2014-06-12 21:07:25 -07:00
.gitignore Ignore generated file kernel/x509_certificate_list 2013-12-10 18:21:34 +00:00
acct.c ipc, kernel: clear whitespace 2014-06-06 16:08:14 -07:00
async.c
audit_tree.c inotify: Fix reporting of cookies for inotify events 2014-02-18 11:17:17 +01:00
audit_watch.c inotify: Fix reporting of cookies for inotify events 2014-02-18 11:17:17 +01:00
audit.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
audit.h audit: Use struct net not pid_t to remember the network namespce to reply in 2014-03-20 10:10:53 -04:00
auditfilter.c Merge git://git.infradead.org/users/eparis/audit 2014-04-12 12:38:53 -07:00
auditsc.c auditsc: audit_krule mask accesses need bounds checking 2014-06-10 08:44:40 -07:00
backtracetest.c kernel/backtracetest.c: replace no level printk by pr_info() 2014-06-04 16:54:14 -07:00
bounds.c mm: do not allocate page->ptl dynamically, if spinlock_t fits to long 2013-12-20 12:25:45 -08:00
capability.c fs,userns: Change inode_capable to capable_wrt_inode_uidgid 2014-06-10 13:57:22 -07:00
cgroup_freezer.c cgroup: remove css_parent() 2014-05-16 13:22:48 -04:00
cgroup.c cgroup: fix a race between cgroup_mount() and cgroup_kill_sb() 2014-06-30 10:16:26 -04:00
compat.c kernel/compat.c: use sizeof() instead of sizeof 2014-06-04 16:54:19 -07:00
configs.c
context_tracking.c asmlinkage: Add explicit __visible to drivers/*, lib/*, kernel/* 2014-05-05 16:07:46 -07:00
cpu_pm.c
cpu.c More ACPI and power management updates for 3.16-rc1 2014-06-12 13:14:19 -07:00
cpuset.c cpuset: break kernfs active protection in cpuset_write_resmask() 2014-07-01 16:42:28 -04:00
crash_dump.c
cred.c
delayacct.c kernel/delayacct.c: remove redundant checking in __delayacct_add_tsk() 2013-11-13 12:09:12 +09:00
dma.c
elfcore.c switch elf_core_write_extra_phdrs() to dump_emit() 2013-11-09 00:16:23 -05:00
exec_domain.c kernel/exec_domain.c: code clean-up 2014-06-04 16:54:15 -07:00
exit.c signals: mv {dis,}allow_signal() from sched.h/exit.c to signal.[ch] 2014-06-06 16:08:11 -07:00
extable.c asmlinkage: Make main_extable_sort_needed visible 2014-02-13 18:13:22 -08:00
fork.c ptrace: fix fork event messages across pid namespaces 2014-06-06 16:08:11 -07:00
freezer.c libata, freezer: avoid block device removal while system is frozen 2013-12-19 13:50:32 -05:00
futex_compat.c compat: Get rid of (get|put)_compat_time(val|spec) 2014-02-02 14:09:12 -08:00
futex.c Merge branch 'next' (accumulated 3.16 merge window patches) into master 2014-06-08 11:31:16 -07:00
groups.c kernel/groups.c: remove return value of set_groups 2014-04-03 16:21:05 -07:00
hrtimer.c Merge branch 'perf/urgent' into perf/core, to resolve conflict and to prepare for new patches 2014-06-06 07:55:06 +02:00
hung_task.c kernel/hung_task.c: convert simple_strtoul to kstrtouint 2014-06-04 16:54:15 -07:00
irq_work.c perf/x86: Warn to early_printk() in case irq_work is too slow 2014-02-21 21:49:07 +01:00
itimer.c
jump_label.c
kallsyms.c kernel: use macros from compiler.h instead of __attribute__((...)) 2014-04-07 16:36:11 -07:00
kcmp.c
Kconfig.freezer
Kconfig.hz kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS 2013-11-15 09:32:22 +09:00
Kconfig.locks locking/rwlocks: Introduce 'qrwlocks' - fair, queued rwlocks 2014-06-06 07:58:28 +02:00
Kconfig.preempt
kexec.c kernel/kexec.c: convert printk to pr_foo() 2014-06-06 16:08:12 -07:00
kmod.c signals: change wait_for_helper() to use kernel_sigaction() 2014-06-06 16:08:12 -07:00
kprobes.c kprobes: Show blacklist entries via debugfs 2014-04-24 10:26:41 +02:00
ksysfs.c kobject: Make support for uevent_helper optional. 2014-04-25 12:00:49 -07:00
kthread.c kthread: fix return value of kthread_create() upon SIGKILL. 2014-06-04 16:53:51 -07:00
latencytop.c kernel/latencytop.c: convert seq_printf to seq_puts 2014-06-04 16:54:15 -07:00
Makefile Merge branch 'x86-asmlinkage-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2014-03-31 14:13:25 -07:00
module_signing.c
module-internal.h
module.c Most of this is cleaning up various driver sysfs permissions so we can 2014-06-11 16:09:14 -07:00
notifier.c kprobes, notifier: Use NOKPROBE_SYMBOL macro in notifier 2014-04-24 10:26:39 +02:00
nsproxy.c
padata.c padata: Fix wrong usage of rcu_dereference() 2013-12-05 21:28:42 +08:00
panic.c kernel/panic.c: add "crash_kexec_post_notifiers" option for kdump after panic_notifers 2014-06-06 16:08:12 -07:00
params.c param: hand arguments after -- straight to init 2014-04-28 11:48:34 +09:30
pid_namespace.c pid_namespace: pidns_get() should check task_active_pid_ns() != NULL 2014-04-02 16:20:21 -07:00
pid.c
posix-cpu-timers.c posix-timers: Convert abuses of BUG_ON to WARN_ON 2013-12-09 16:56:29 +01:00
posix-timers.c
profile.c kernel/profile.c: use static const char instead of static char 2014-06-06 16:08:13 -07:00
ptrace.c kernel/compat: convert to COMPAT_SYSCALL_DEFINE 2014-03-06 15:35:10 +01:00
range.c
reboot.c kernel/reboot.c: convert simple_strtoul to kstrtoint 2014-06-04 16:54:15 -07:00
relay.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2014-04-12 14:49:50 -07:00
res_counter.c kernel/res_counter.c: replace simple_strtoull by kstrtoull 2014-06-04 16:54:15 -07:00
resource.c resources: Clarify sanity check message 2014-05-23 10:47:21 -06:00
seccomp.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
signal.c signals: introduce kernel_sigaction() 2014-06-06 16:08:12 -07:00
smp.c smp: print more useful debug info upon receiving IPI on an offline CPU 2014-06-06 16:08:12 -07:00
smpboot.c
smpboot.h
softirq.c Merge branch 'rcu/next' of git://git.kernel.org/pub/scm/linux/kernel/git/paulmck/linux-rcu into core/rcu 2014-05-22 11:36:10 +02:00
stacktrace.c
stop_machine.c kernel/stop_machine.c: kernel-doc warning fix 2014-06-04 16:54:15 -07:00
sys_ni.c sys_sgetmask/sys_ssetmask: add CONFIG_SGETMASK_SYSCALL 2014-06-04 16:54:14 -07:00
sys.c sched: Consolidate open coded implementations of nice level frobbing into nice_to_rlimit() and rlimit_to_nice() 2014-05-22 11:16:36 +02:00
sysctl_binary.c kernel/sysctl_binary.c: use scnprintf() instead of snprintf() 2013-11-13 12:09:33 +09:00
sysctl.c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-06-12 14:27:40 -07:00
system_certificates.S KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
system_keyring.c KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
task_work.c
taskstats.c genetlink: only pass array to genl_register_family_with_ops() 2013-11-19 16:39:05 -05:00
test_kprobes.c
time.c
timeconst.bc
timer.c timer: Prevent overflow in apply_slack 2014-04-30 13:46:17 +02:00
torture.c torture: Remove __init from torture_init_begin/end 2014-05-14 09:46:30 -07:00
tracepoint.c kernel/tracepoint.c: kernel-doc fixes 2014-06-04 16:54:15 -07:00
tsacct.c
uid16.c
up.c smp: Rename __smp_call_function_single() to smp_call_function_single_async() 2014-02-24 14:47:15 -08:00
user_namespace.c kernel/user_namespace.c: kernel-doc/checkpatch fixes 2014-06-06 16:08:13 -07:00
user-return-notifier.c
user.c kernel/user.c: drop unused field 'files' from user_struct 2014-06-04 16:54:16 -07:00
utsname_sysctl.c sysctl: convert use of typedef ctl_table to struct ctl_table 2014-06-06 16:08:16 -07:00
utsname.c
watchdog.c kernel/watchdog.c:touch_softlockup_watchdog(): use raw_cpu_write() 2014-04-18 16:40:08 -07:00
workqueue_internal.h workqueue: rename manager_mutex to attach_mutex 2014-05-20 10:59:32 -04:00
workqueue.c Merge branch 'for-3.16' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq 2014-06-09 14:56:49 -07:00