commit 51d7d5205d
Author: Michael Ellerman <mpe@ellerman.id.au>

powerpc: Add smp_mb() to arch_spin_is_locked()

The kernel defines the function spin_is_locked(), which can be used to
check if a spinlock is currently locked.

Using spin_is_locked() on a lock you don't hold is obviously racy. That
is, even though you may observe that the lock is unlocked, it may become
locked at any time.

There is (at least) one exception to that, which is if two locks are
used as a pair, and the holder of each checks the status of the other
before doing any update.

Assuming *A and *B are two locks, and *COUNTER is a shared non-atomic
value:

The first CPU does:

	spin_lock(*A)

	if spin_is_locked(*B)
		# nothing
	else
		smp_mb()
		LOAD r = *COUNTER
		r++
		STORE *COUNTER = r

	spin_unlock(*A)

And the second CPU does:

	spin_lock(*B)

	if spin_is_locked(*A)
		# nothing
	else
		smp_mb()
		LOAD r = *COUNTER
		r++
		STORE *COUNTER = r

	spin_unlock(*B)

Although this is a strange locking construct, it should work.

It seems to be understood, but not documented, that spin_is_locked() is
not a memory barrier, so in the examples above and below the caller
inserts its own memory barrier before acting on the result of
spin_is_locked().
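
To make the pattern concrete, here is a hedged C sketch of the caller
side of this idiom; the function and variable names are illustrative and
not taken from the kernel:

	/*
	 * Hedged sketch of the paired-lock idiom described above (kernel
	 * context, <linux/spinlock.h>).  The caller supplies the smp_mb()
	 * because spin_is_locked() is not assumed to be a barrier.
	 */
	static void update_if_peer_idle(spinlock_t *mine, spinlock_t *other,
					unsigned long *counter)
	{
		spin_lock(mine);
		if (!spin_is_locked(other)) {
			smp_mb();	/* caller-supplied barrier          */
			(*counter)++;	/* LOAD r = *COUNTER; r++; STORE    */
		}
		spin_unlock(mine);
	}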

For now we assume spin_is_locked() is implemented as below, and we break
it out in our examples:

	bool spin_is_locked(*LOCK) {
		LOAD l = *LOCK
		return l.locked
	}
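
Rendered as a hedged C sketch (the "slock" field name follows the
powerpc arch_spinlock_t layout, but treat the details as illustrative
rather than the exact kernel source):

	static inline bool spin_is_locked_sketch(arch_spinlock_t *lock)
	{
		return ACCESS_ONCE(lock->slock) != 0;	/* LOAD l = *LOCK */
	}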

Our intuition is that there should be no problem even if the two code
sequences run simultaneously such as:

	CPU 0			CPU 1
	==================================================
	spin_lock(*A)		spin_lock(*B)
	LOAD b = *B		LOAD a = *A
	if b.locked # true	if a.locked # true
	# nothing		# nothing
	spin_unlock(*A)		spin_unlock(*B)

If one CPU gets the lock before the other then it will do the update and
the other CPU will back off:

	CPU 0			CPU 1
	==================================================
	spin_lock(*A)
	LOAD b = *B
				spin_lock(*B)
	if b.locked # false	LOAD a = *A
	else			if a.locked # true
	smp_mb()		# nothing
	LOAD r1 = *COUNTER	spin_unlock(*B)
	r1++
	STORE *COUNTER = r1
	spin_unlock(*A)

However, in reality spin_lock() itself is not indivisible. On powerpc we
implement it as a load-and-reserve followed by a store-conditional.

Ignoring the retry logic for the lost reservation case, it boils down to:

	spin_lock(*LOCK) {
		LOAD l = *LOCK
		l.locked = true
		STORE *LOCK = l
		ACQUIRE_BARRIER
	}
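
For reference, a hedged sketch of that fast path as C with inline
assembly; the retry/back-off logic and the per-CPU lock token are
simplified away, and PPC_ACQUIRE_BARRIER is shown directly as lwsync:

	static inline void spin_lock_sketch(volatile unsigned int *lock)
	{
		unsigned int tmp;

		__asm__ __volatile__(
	"1:	lwarx	%0,0,%2\n"	/* LOAD l = *LOCK, take reservation  */
	"	cmpwi	0,%0,0\n"	/* already locked?                   */
	"	bne-	1b\n"		/* yes: spin (real code backs off)   */
	"	stwcx.	%1,0,%2\n"	/* STORE *LOCK = locked, conditional */
	"	bne-	1b\n"		/* reservation lost: retry           */
	"	lwsync\n"		/* ACQUIRE_BARRIER                   */
		: "=&r" (tmp)
		: "r" (1), "r" (lock)
		: "cr0", "memory");
	}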

The ACQUIRE_BARRIER is required to give spin_lock() ACQUIRE semantics as
defined in memory-barriers.txt:

     This acts as a one-way permeable barrier.  It guarantees that all
     memory operations after the ACQUIRE operation will appear to happen
     after the ACQUIRE operation with respect to the other components of
     the system.

On modern powerpc systems we use lwsync for ACQUIRE_BARRIER. lwsync is
also known as "lightweight sync", or "sync 1".

As described in Power ISA v2.07 section B.2.1.1, in this scenario the
lwsync is not the barrier itself. It instead causes the LOAD of *LOCK to
act as the barrier, preventing any loads or stores in the locked region
from occurring prior to the load of *LOCK.

Whether this behaviour is in accordance with the definition of ACQUIRE
semantics in memory-barriers.txt is open to discussion; we may switch to
a different barrier in future.

What this means in practice is that the following can occur:

	CPU 0			CPU 1
	==================================================
	LOAD a = *A 		LOAD b = *B
	a.locked = true		b.locked = true
	LOAD b = *B		LOAD a = *A
	STORE *A = a		STORE *B = b
	if b.locked # false	if a.locked # false
	else			else
	smp_mb()		smp_mb()
	LOAD r1 = *COUNTER	LOAD r2 = *COUNTER
	r1++			r2++
	STORE *COUNTER = r1
				STORE *COUNTER = r2	# Lost update
	spin_unlock(*A)		spin_unlock(*B)

That is, the load of *B can occur prior to the store that makes *A
visibly locked. And similarly for CPU 1. The result is both CPUs hold
their lock and believe the other lock is unlocked.

The easiest fix for this is to add a full memory barrier to the start of
spin_is_locked(), so adding to our previous definition would give us:

	bool spin_is_locked(*LOCK) {
		smp_mb()
		LOAD l = *LOCK
		return l.locked
	}

The new barrier orders the store to the lock we are locking vs the load
of the other lock:

	CPU 0			CPU 1
	==================================================
	LOAD a = *A 		LOAD b = *B
	a.locked = true		b.locked = true
	STORE *A = a		STORE *B = b
	smp_mb()		smp_mb()
	LOAD b = *B		LOAD a = *A
	if b.locked # true	if a.locked # true
	# nothing		# nothing
	spin_unlock(*A)		spin_unlock(*B)
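
In C, the resulting arch helper looks roughly like this hedged sketch
(the "slock" field name is assumed, as before; the real
arch_spin_is_locked() may read the lock word differently):

	static inline bool arch_spin_is_locked_sketch(arch_spinlock_t *lock)
	{
		smp_mb();	/* order our STORE *LOCK vs the LOAD below */
		return ACCESS_ONCE(lock->slock) != 0;
	}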

Although the above example is theoretical, there is code similar to it
in sem_lock() in ipc/sem.c. This commit, together with the next one,
appears to fix crashes we are seeing in that code, where we believe this
race happens in practice.

Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>