linux/kernel
Davidlohr Bueso b0c29f79ec futexes: Avoid taking the hb->lock if there's nothing to wake up
In futex_wake() there is clearly no point in taking the hb->lock
if we know beforehand that there are no tasks to be woken. While
the hash bucket's plist head is a cheap way of knowing this, we
cannot rely 100% on it as there is a racy window between the
futex_wait call and when the task is actually added to the
plist. To this end, we couple it with the spinlock check as
tasks trying to enter the critical region are most likely
potential waiters that will be added to the plist, thus
preventing tasks sleeping forever if wakers don't acknowledge
all possible waiters.

Furthermore, the futex ordering guarantees are preserved,
ensuring that waiters either observe the changed user space
value before blocking or is woken by a concurrent waker. For
wakers, this is done by relying on the barriers in
get_futex_key_refs() -- for archs that do not have implicit mb
in atomic_inc(), we explicitly add them through a new
futex_get_mm function. For waiters we rely on the fact that
spin_lock calls already update the head counter, so spinners
are visible even if the lock hasn't been acquired yet.

For more details please refer to the updated comments in the
code and related discussion:

  https://lkml.org/lkml/2013/11/26/556

Special thanks to tglx for careful review and feedback.

Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
Reviewed-by: Darren Hart <dvhart@linux.intel.com>
Reviewed-by: Thomas Gleixner <tglx@linutronix.de>
Reviewed-by: Peter Zijlstra <peterz@infradead.org>
Signed-off-by: Davidlohr Bueso <davidlohr@hp.com>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Mike Galbraith <efault@gmx.de>
Cc: Jeff Mahoney <jeffm@suse.com>
Cc: Scott Norton <scott.norton@hp.com>
Cc: Tom Vaden <tom.vaden@hp.com>
Cc: Aswin Chandramouleeswaran <aswin@hp.com>
Cc: Waiman Long <Waiman.Long@hp.com>
Cc: Jason Low <jason.low2@hp.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Link: http://lkml.kernel.org/r/1389569486-25487-5-git-send-email-davidlohr@hp.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2014-01-13 11:45:21 +01:00
..
cpu sched: Add NEED_RESCHED to the preempt_count 2013-09-25 14:07:49 +02:00
debug kdb: Add support for external NMI handler to call KGDB/KDB 2013-10-03 18:47:54 +02:00
events perf: Disable all pmus on unthrottling and rescheduling 2013-12-17 15:04:00 +01:00
gcov gcov: reuse kbasename helper 2013-11-13 12:09:34 +09:00
irq Merge branch 'irq-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-12-02 10:15:39 -08:00
locking mutexes: Give more informative mutex warning in the !lock->owner case 2013-12-17 15:35:10 +01:00
power PM / sleep: Fix memory leak in pm_vt_switch_unregister(). 2013-12-22 00:56:35 +01:00
printk printk.c: comments should refer to /proc/vmcore instead of /proc/vmcoreinfo 2013-11-13 12:09:14 +09:00
rcu Linux 3.13-rc4 2013-12-17 15:27:08 +01:00
sched Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-12-19 09:11:22 -08:00
time nohz: Fix another inconsistency between CONFIG_NO_HZ=n and nohz=off 2013-11-29 12:23:03 +01:00
trace This fixes a long standing bug in the ftrace profiler. 2013-12-20 09:32:30 -08:00
.gitignore Ignore generated file kernel/x509_certificate_list 2013-12-10 18:21:34 +00:00
acct.c fs: Fix hang with BSD accounting on frozen filesystem 2013-05-04 14:57:58 -04:00
async.c
audit_tree.c kernel/audit_tree.c:audit_add_tree_rule(): protect `rule' from kill_rules() 2013-06-12 16:29:46 -07:00
audit_watch.c
audit.c Merge git://git.infradead.org/users/eparis/audit 2013-11-21 19:18:14 -08:00
audit.h audit: call audit_bprm() only once to add AUDIT_EXECVE information 2013-11-05 11:15:03 -05:00
auditfilter.c audit: do not reject all AUDIT_INODE filter types 2013-11-05 11:09:16 -05:00
auditsc.c audit: fix type of sessionid in audit_set_loginuid() 2013-11-06 11:47:24 -05:00
backtracetest.c
bounds.c mm: do not allocate page->ptl dynamically, if spinlock_t fits to long 2013-12-20 12:25:45 -08:00
capability.c xfs: update for v3.12-rc1 2013-09-09 11:19:09 -07:00
cgroup_freezer.c cgroup: make css_for_each_descendant() and friends include the origin css in the iteration 2013-08-08 20:11:27 -04:00
cgroup.c cgroup: don't recycle cgroup id until all csses' have been destroyed 2013-12-17 08:11:52 -05:00
compat.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/signal 2013-05-01 07:21:43 -07:00
configs.c proc: Supply PDE attribute setting accessor functions 2013-05-01 17:29:18 -04:00
context_tracking.c Linux 3.12-rc4 2013-10-09 12:36:13 +02:00
cpu_pm.c
cpu.c Merge branch 'sched-urgent-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:55:11 +09:00
cpuset.c cpuset: Fix memory allocator deadlock 2013-11-27 13:52:47 -05:00
crash_dump.c
cred.c
delayacct.c kernel/delayacct.c: remove redundant checking in __delayacct_add_tsk() 2013-11-13 12:09:12 +09:00
dma.c
elfcore.c switch elf_core_write_extra_phdrs() to dump_emit() 2013-11-09 00:16:23 -05:00
exec_domain.c
exit.c ptrace: revert "Prepare to fix racy accesses on task breakpoints" 2013-07-09 10:33:26 -07:00
extable.c kernel/extable: fix address-checks for core_kernel and init areas 2013-11-28 09:49:41 -08:00
fork.c mm: fix TLB flush race between migration, and change_protection_range 2013-12-18 19:04:51 -08:00
freezer.c libata, freezer: avoid block device removal while system is frozen 2013-12-19 13:50:32 -05:00
futex_compat.c
futex.c futexes: Avoid taking the hb->lock if there's nothing to wake up 2014-01-13 11:45:21 +01:00
groups.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
hrtimer.c kernel: delete __cpuinit usage from all core kernel files 2013-07-14 19:36:59 -04:00
hung_task.c Here are the 3.13 KVM changes. There was a lot of work on the PPC 2013-11-15 13:51:36 +09:00
irq_work.c
itimer.c
jump_label.c static_key: WARN on usage before jump_label_init was called 2013-10-19 19:45:35 -04:00
kallsyms.c kernel: kallsyms: memory override issue, need check destination buffer length 2013-04-15 15:17:26 +09:30
kcmp.c
Kconfig.freezer
Kconfig.hz kernel: remove CONFIG_USE_GENERIC_SMP_HELPERS 2013-11-15 09:32:22 +09:00
Kconfig.locks locking: Fix copy/paste errors of "ARCH_INLINE_*_UNLOCK_BH" 2013-05-28 08:50:00 +02:00
Kconfig.preempt
kexec.c kexec: migrate to reboot cpu 2013-12-18 19:04:50 -08:00
kmod.c kernel/kmod.c: check for NULL in call_usermodehelper_exec() 2013-09-30 14:31:02 -07:00
kprobes.c kprobes: use KSYM_NAME_LEN to size identifier buffers 2013-11-13 12:09:26 +09:00
ksysfs.c kernel: replace strict_strto*() with kstrto*() 2013-09-12 15:38:03 -07:00
kthread.c kthread: make kthread_create() killable 2013-11-13 12:08:59 +09:00
latencytop.c
Makefile KEYS: Remove files generated when SYSTEM_TRUSTED_KEYRING=y 2013-12-13 15:59:11 +00:00
module_signing.c keys: change asymmetric keys to use common hash definitions 2013-10-25 17:15:18 -04:00
module-internal.h KEYS: Separate the kernel signature checking keyring from module signing 2013-09-25 17:17:01 +01:00
module.c Mainly boring here, too. rmmod --wait finally removed, though. 2013-11-15 13:27:50 +09:00
notifier.c
nsproxy.c Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace 2013-09-07 14:35:32 -07:00
padata.c padata: make the sequence counter an atomic_t 2013-10-30 12:02:58 +08:00
panic.c kernel/panic.c: reduce 1 byte usage for print tainted buffer 2013-11-13 12:09:35 +09:00
params.c kernel/params: fix handling of signed integer types 2013-09-28 12:35:52 -07:00
pid_namespace.c pid_namespace: make freeing struct pid_namespace rcu-delayed 2013-10-24 23:43:29 -04:00
pid.c pidns: fix free_pid() to handle the first fork failure 2013-09-30 14:31:03 -07:00
posix-cpu-timers.c posix_timers: fix racy timer delta caching on task exit 2013-07-03 16:54:42 +02:00
posix-timers.c posix-timers: Remove unused variable 2013-04-18 12:51:19 +02:00
profile.c kernel: delete __cpuinit usage from all core kernel files 2013-07-14 19:36:59 -04:00
ptrace.c exec/ptrace: fix get_dumpable() incorrect tests 2013-11-13 12:09:33 +09:00
range.c range: Do not add new blank slot with add_range_with_merge 2013-06-18 11:32:10 -05:00
reboot.c kexec: migrate to reboot cpu 2013-12-18 19:04:50 -08:00
relay.c kernel: delete __cpuinit usage from all core kernel files 2013-07-14 19:36:59 -04:00
res_counter.c memcg: reduce function dereference 2013-09-12 15:38:02 -07:00
resource.c kernel/resource.c: remove the unneeded assignment in function __find_resource 2013-07-03 16:08:06 -07:00
seccomp.c
signal.c constify copy_siginfo_to_user{,32}() 2013-11-09 00:16:29 -05:00
smp.c kernel: fix generic_exec_single indentation 2013-11-15 09:32:22 +09:00
smpboot.c kernel: delete __cpuinit usage from all core kernel files 2013-07-14 19:36:59 -04:00
smpboot.h
softirq.c lockdep: Simplify a bit hardirq <-> softirq transitions 2013-11-27 11:09:40 +01:00
stacktrace.c
stop_machine.c stop_machine: Fix race between stop_two_cpus() and stop_cpus() 2013-11-11 12:43:38 +01:00
sys_ni.c unify compat fanotify_mark(2), switch to COMPAT_SYSCALL_DEFINE 2013-05-09 13:46:38 -04:00
sys.c kernel/sys.c: remove obsolete #include <linux/kexec.h> 2013-11-13 12:09:13 +09:00
sysctl_binary.c kernel/sysctl_binary.c: use scnprintf() instead of snprintf() 2013-11-13 12:09:33 +09:00
sysctl.c Merge branch 'core-locking-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2013-11-14 16:30:30 +09:00
system_certificates.S KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
system_keyring.c KEYS: correct alignment of system_certificate_list content in assembly file 2013-12-10 18:25:28 +00:00
task_work.c task_work: documentation 2013-09-11 15:58:27 -07:00
taskstats.c genetlink: only pass array to genl_register_family_with_ops() 2013-11-19 16:39:05 -05:00
test_kprobes.c kernel/: rename random32() to prandom_u32() 2013-04-29 18:28:42 -07:00
time.c sched: Rename sched.c as sched/core.c in comments and Documentation 2013-06-19 12:58:42 +02:00
timeconst.bc
timer.c timer: Convert kmalloc_node(...GFP_ZERO...) to kzalloc_node(...) 2013-11-19 14:59:50 +01:00
tracepoint.c Tracing updates for Linux 3.10 2013-04-29 13:55:38 -07:00
tsacct.c
uid16.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
up.c kernel: provide a __smp_call_function_single stub for !CONFIG_SMP 2013-11-15 09:32:22 +09:00
user_namespace.c KEYS: Add per-user_namespace registers for persistent per-UID kerberos caches 2013-09-24 10:35:19 +01:00
user-return-notifier.c
user.c KEYS: fix uninitialized persistent_keyring_register_sem 2013-12-13 15:59:11 +00:00
utsname_sysctl.c
utsname.c userns: Kill nsown_capable it makes the wrong thing easy 2013-08-30 23:44:11 -07:00
watchdog.c watchdog: update watchdog_thresh properly 2013-09-24 17:00:25 -07:00
workqueue_internal.h sched: Rename sched.c as sched/core.c in comments and Documentation 2013-06-19 12:58:42 +02:00
workqueue.c PCI updates for v3.13: 2013-12-15 11:45:27 -08:00