linux/kernel
Christian Borntraeger e91467ecd1 [PATCH] bug in futex unqueue_me
This patch adds a barrier() in futex unqueue_me to avoid aliasing of two
pointers.

On my s390x system I saw the following oops:

Unable to handle kernel pointer dereference at virtual kernel address
0000000000000000
Oops: 0004 [#1]
CPU:    0    Not tainted
Process mytool (pid: 13613, task: 000000003ecb6ac0, ksp: 00000000366bdbd8)
Krnl PSW : 0704d00180000000 00000000003c9ac2 (_spin_lock+0xe/0x30)
Krnl GPRS: 00000000ffffffff 000000003ecb6ac0 0000000000000000 0700000000000000
           0000000000000000 0000000000000000 000001fe00002028 00000000000c091f
           000001fe00002054 000001fe00002054 0000000000000000 00000000366bddc0
           00000000005ef8c0 00000000003d00e8 0000000000144f91 00000000366bdcb8
Krnl Code: ba 4e 20 00 12 44 b9 16 00 3e a7 84 00 08 e3 e0 f0 88 00 04
Call Trace:
([<0000000000144f90>] unqueue_me+0x40/0xe4)
 [<0000000000145a0c>] do_futex+0x33c/0xc40
 [<000000000014643e>] sys_futex+0x12e/0x144
 [<000000000010bb00>] sysc_noemu+0x10/0x16
 [<000002000003741c>] 0x2000003741c

The code in question is:

static int unqueue_me(struct futex_q *q)
{
        int ret = 0;
        spinlock_t *lock_ptr;

        /* In the common case we don't take the spinlock, which is nice. */
 retry:
        lock_ptr = q->lock_ptr;
        if (lock_ptr != 0) {
                spin_lock(lock_ptr);
		/*
                 * q->lock_ptr can change between reading it and
                 * spin_lock(), causing us to take the wrong lock.  This
                 * corrects the race condition.
[...]

and my compiler (gcc 4.1.0) makes the following out of it:

00000000000003c8 <unqueue_me>:
     3c8:       eb bf f0 70 00 24       stmg    %r11,%r15,112(%r15)
     3ce:       c0 d0 00 00 00 00       larl    %r13,3ce <unqueue_me+0x6>
                        3d0: R_390_PC32DBL      .rodata+0x2a
     3d4:       a7 f1 1e 00             tml     %r15,7680
     3d8:       a7 84 00 01             je      3da <unqueue_me+0x12>
     3dc:       b9 04 00 ef             lgr     %r14,%r15
     3e0:       a7 fb ff d0             aghi    %r15,-48
     3e4:       b9 04 00 b2             lgr     %r11,%r2
     3e8:       e3 e0 f0 98 00 24       stg     %r14,152(%r15)
     3ee:       e3 c0 b0 28 00 04       lg      %r12,40(%r11)
		/* write q->lock_ptr in r12 */
     3f4:       b9 02 00 cc             ltgr    %r12,%r12
     3f8:       a7 84 00 4b             je      48e <unqueue_me+0xc6>
		/* if r12 is zero then jump over the code.... */
     3fc:       e3 20 b0 28 00 04       lg      %r2,40(%r11)
		/* write q->lock_ptr in r2 */
     402:       c0 e5 00 00 00 00       brasl   %r14,402 <unqueue_me+0x3a>
                        404: R_390_PC32DBL      _spin_lock+0x2
		/* use r2 as parameter for spin_lock */

So the code becomes more or less:
if (q->lock_ptr != 0) spin_lock(q->lock_ptr)
instead of
if (lock_ptr != 0) spin_lock(lock_ptr)

Which caused the oops from above.
After adding a barrier gcc creates code without this problem:
[...] (the same)
     3ee:       e3 c0 b0 28 00 04       lg      %r12,40(%r11)
     3f4:       b9 02 00 cc             ltgr    %r12,%r12
     3f8:       b9 04 00 2c             lgr     %r2,%r12
     3fc:       a7 84 00 48             je      48c <unqueue_me+0xc4>
     400:       c0 e5 00 00 00 00       brasl   %r14,400 <unqueue_me+0x38>
                        402: R_390_PC32DBL      _spin_lock+0x2

As a general note, this code of unqueue_me seems a bit fishy. The retry logic
of unqueue_me only works if we can guarantee, that the original value of
q->lock_ptr is always a spinlock (Otherwise we overwrite kernel memory). We
know that q->lock_ptr can change. I dont know what happens with the original
spinlock, as I am not an expert with the futex code.

Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Rusty Russell <rusty@rustcorp.com.au>
Acked-by: Ingo Molnar <mingo@redhat.com>
Cc: Thomas Gleixner <tglx@timesys.com>
Signed-off-by: Christian Borntraeger <borntrae@de.ibm.com>
Signed-off-by: Andrew Morton <akpm@osdl.org>
Signed-off-by: Linus Torvalds <torvalds@osdl.org>
2006-08-06 08:57:46 -07:00
..
irq [PATCH] genirq: {en,dis}able_irq_wake() need refcounting too 2006-07-31 13:28:36 -07:00
power [PATCH] Make suspend possible with a traced process at a breakpoint 2006-08-06 08:57:45 -07:00
time [PATCH] time: rename clocksource functions 2006-06-26 09:58:21 -07:00
.gitignore gitignore: ignore more generated files 2006-01-03 11:35:26 +01:00
acct.c [PATCH] Fix sighand->siglock usage in kernel/acct.c 2006-07-14 21:53:54 -07:00
audit.c [PATCH] fix oops with CONFIG_AUDIT and !CONFIG_AUDITSYSCALL 2006-08-03 10:50:39 -04:00
audit.h [PATCH] add rule filterkey 2006-07-01 05:43:06 -04:00
auditfilter.c [PATCH] introduce audit rules counter 2006-08-03 10:55:18 -04:00
auditsc.c [PATCH] take filling ->pid, etc. out of audit_get_context() 2006-08-03 10:59:51 -04:00
capability.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
compat.c [PATCH] N32 sigset and __COMPAT_ENDIAN_SWAP__ 2006-06-25 10:01:15 -07:00
configs.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
cpu.c cpu hotplug: simplify and hopefully fix locking 2006-07-23 12:12:16 -07:00
cpuset.c [PATCH] Cpuset: fix ABBA deadlock with cpu hotplug lock 2006-07-23 13:03:05 -07:00
delayacct.c [PATCH] delay accounting: temporarily enable by default 2006-07-31 13:28:37 -07:00
dma.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
exec_domain.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
exit.c [PATCH] per-task delay accounting taskstats interface: control exit data through cpumasks 2006-07-14 21:53:57 -07:00
extable.c [PATCH] symbol_put_addr() locks kernel 2006-05-15 11:20:55 -07:00
fork.c [PATCH] delay accounting taskstats interface send tgid once 2006-07-14 21:53:57 -07:00
futex_compat.c [PATCH] pi-futex: robust-futex exit 2006-07-28 21:02:00 -07:00
futex.c [PATCH] bug in futex unqueue_me 2006-08-06 08:57:46 -07:00
hrtimer.c [PATCH] cpu hotplug: replace __devinit* with __cpuinit* for cpu notifications 2006-07-31 13:28:39 -07:00
itimer.c [PATCH] hrtimers: remove data field 2006-03-26 08:57:03 -08:00
kallsyms.c [PATCH] null-terminate over-long /proc/kallsyms symbols 2006-07-14 21:53:52 -07:00
Kconfig.hz [PATCH] i386: Selectable Frequency of the Timer Interrupt 2005-06-23 09:45:10 -07:00
Kconfig.preempt [PATCH] sched: voluntary kernel preemption 2005-06-25 16:24:45 -07:00
kexec.c [POWERPC] Add the use of the firmware soft-reset-nmi to kdump. 2006-06-28 15:18:52 +10:00
kfifo.c [PATCH] gfp flags annotations - part 1 2005-10-08 15:00:57 -07:00
kmod.c [PATCH] lockdep: annotate on-stack completions 2006-07-03 15:27:09 -07:00
kprobes.c [PATCH] IA64: kprobe invalidate icache of jump buffer 2006-07-31 13:28:38 -07:00
ksysfs.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
kthread.c [PATCH] remove kernel/kthread.c:kthread_stop_sem() 2006-07-14 21:53:52 -07:00
lockdep_internals.h [PATCH] lockdep: core 2006-07-03 15:27:03 -07:00
lockdep_proc.c [PATCH] lockdep: procfs 2006-07-03 15:27:04 -07:00
lockdep.c [PATCH] lockdep: core, reduce per-lock class-cache size 2006-07-10 13:24:14 -07:00
Makefile [PATCH] per-task-delay-accounting: taskstats interface 2006-07-14 21:53:56 -07:00
module.c [PATCH] null-terminate over-long /proc/kallsyms symbols 2006-07-14 21:53:52 -07:00
mutex-debug.c [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
mutex-debug.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
mutex.c [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
mutex.h [PATCH] lockdep: prove mutex locking correctness 2006-07-03 15:27:04 -07:00
panic.c [PATCH] lockdep: disable lock debugging when kernel state becomes untrusted 2006-07-10 13:24:27 -07:00
params.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
pid.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
posix-cpu-timers.c [PATCH] arm_timer: remove a racy and obsolete PF_EXITING check 2006-06-17 10:52:13 -07:00
posix-timers.c [PATCH] hrtimers: remove data field 2006-03-26 08:57:03 -08:00
printk.c [PATCH] kernel/printk.c: EXPORT_SYMBOL_UNUSED 2006-07-10 13:24:17 -07:00
profile.c Remove obsolete #include <linux/config.h> 2006-06-30 19:25:36 +02:00
ptrace.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
rcupdate.c [PATCH] cpu hotplug: replace __devinit* with __cpuinit* for cpu notifications 2006-07-31 13:28:39 -07:00
rcutorture.c [PATCH] rcutorture: add call_rcu_bh() operations 2006-06-27 17:32:40 -07:00
relay.c [PATCH] relay: consolidate sendfile() and read() code 2006-03-23 19:58:45 +01:00
resource.c [PATCH] The scheduled unexport of insert_resource 2006-07-12 16:09:08 -07:00
rtmutex_common.h [PATCH] pi-futex: futex_lock_pi/futex_unlock_pi support 2006-06-27 17:32:47 -07:00
rtmutex-debug.c [PATCH] sched: cleanup, remove task_t, convert to struct task_struct 2006-07-03 15:27:11 -07:00
rtmutex-debug.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
rtmutex-tester.c [PATCH] Add try_to_freeze() to rt-test kthreads 2006-07-14 21:53:53 -07:00
rtmutex.c [PATCH] reference rt-mutex-design in rtmutex.c 2006-07-31 13:28:43 -07:00
rtmutex.h [PATCH] lockdep: better lock debugging 2006-07-03 15:27:01 -07:00
rwsem.c [PATCH] lockdep: prove rwsem locking correctness 2006-07-03 15:27:04 -07:00
sched.c [PATCH] pi-futex: missing pi_waiters plist initialization 2006-07-31 13:28:41 -07:00
seccomp.c Linux-2.6.12-rc2 2005-04-16 15:20:36 -07:00
signal.c Fix force_sig_info() semantics after cleanups 2006-08-02 20:17:49 -07:00
softirq.c [PATCH] Reducing local_bh_enable/disable overhead in irqtrace 2006-07-31 13:28:42 -07:00
softlockup.c [PATCH] cpu hotplug: replace __devinit* with __cpuinit* for cpu notifications 2006-07-31 13:28:39 -07:00
spinlock.c [PATCH] lockdep: prove spinlock rwlock locking correctness 2006-07-03 15:27:04 -07:00
stacktrace.c [PATCH] lockdep: stacktrace subsystem, core 2006-07-03 15:27:02 -07:00
stop_machine.c [PATCH] revert "kthread: convert stop_machine into a kthread" 2006-07-03 21:25:20 -07:00
sys_ni.c [PATCH] sys_move_pages: 32bit support (i386, x86_64) 2006-06-23 07:42:53 -07:00
sys.c [PATCH] Fix prctl privilege escalation and suid_dumpable (CVE-2006-2451) 2006-07-12 12:50:25 -07:00
sysctl.c [PATCH] ZVC/zone_reclaim: Leave 1% of unmapped pagecache pages for file I/O 2006-07-03 15:26:59 -07:00
taskstats.c [PATCH] taskstats: free skb, avoid returns in send_cpu_listeners 2006-07-31 13:28:37 -07:00
time.c [PATCH] Time: Introduce arch generic time accessors 2006-06-26 09:58:20 -07:00
timer.c [PATCH] timer: Fix tvec_bases initializer 2006-07-31 13:28:44 -07:00
uid16.c [PATCH] Add more prevent_tail_call() 2006-04-19 16:27:18 -07:00
unwind.c [PATCH] x86_64: allow unwinder to build without module support 2006-06-26 10:48:18 -07:00
user.c [PATCH] selinux: add hooks for key subsystem 2006-06-22 15:05:55 -07:00
wait.c [PATCH] uninline init_waitqueue_head() 2006-07-10 13:24:25 -07:00
workqueue.c [PATCH] Add DocBook documentation for workqueue functions 2006-07-31 13:28:40 -07:00