linux/kernel/locking
Waiman Long d78045306c locking/pvqspinlock, x86: Optimize the PV unlock code path
The unlock function in queued spinlocks was optimized for better
performance on bare metal systems at the expense of virtualized guests.

For x86-64 systems, the unlock call needs to go through a
PV_CALLEE_SAVE_REGS_THUNK() which saves and restores 8 64-bit
registers before calling the real __pv_queued_spin_unlock()
function. The thunk code may also be in a separate cacheline from
__pv_queued_spin_unlock().

This patch optimizes the PV unlock code path by:

 1) Moving the unlock slowpath code from the fastpath into a separate
    __pv_queued_spin_unlock_slowpath() function to make the fastpath
    as simple as possible..

 2) For x86-64, hand-coded an assembly function to combine the register
    saving thunk code with the fastpath code. Only registers that
    are used in the fastpath will be saved and restored. If the
    fastpath fails, the slowpath function will be called via another
    PV_CALLEE_SAVE_REGS_THUNK(). For 32-bit, it falls back to the C
    __pv_queued_spin_unlock() code as the thunk saves and restores
    only one 32-bit register.

With a microbenchmark of 5M lock-unlock loop, the table below shows
the execution times before and after the patch with different number
of threads in a VM running on a 32-core Westmere-EX box with x86-64
4.2-rc1 based kernels:

  Threads	Before patch	After patch	% Change
  -------	------------	-----------	--------
     1		   134.1 ms	  119.3 ms	  -11%
     2		   1286  ms	   953  ms	  -26%
     3		   3715  ms	  3480  ms	  -6.3%
     4		   4092  ms	  3764  ms	  -8.0%

Signed-off-by: Waiman Long <Waiman.Long@hpe.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Davidlohr Bueso <dave@stgolabs.net>
Cc: Douglas Hatch <doug.hatch@hpe.com>
Cc: H. Peter Anvin <hpa@zytor.com>
Cc: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Scott J Norton <scott.norton@hpe.com>
Cc: Thomas Gleixner <tglx@linutronix.de>
Link: http://lkml.kernel.org/r/1447114167-47185-5-git-send-email-Waiman.Long@hpe.com
Signed-off-by: Ingo Molnar <mingo@kernel.org>
2015-11-23 10:02:02 +01:00
..
lglock.c sched/stop_machine: Fix deadlock between multiple stop_two_cpus() 2015-06-19 10:03:12 +02:00
lockdep_internals.h lockdep: Increase static allocations 2014-04-18 14:20:50 +02:00
lockdep_proc.c lockdep: Fix a race between /proc/lock_stat and module unload 2015-06-07 15:46:30 +02:00
lockdep_states.h
lockdep.c mm, page_alloc: distinguish between being unable to sleep, unwilling to sleep and avoiding waking kswapd 2015-11-06 17:50:42 -08:00
locktorture.c Merge branches 'doc.2015.10.06a', 'percpu-rwsem.2015.10.06a' and 'torture.2015.10.06a' into HEAD 2015-10-07 16:06:25 -07:00
Makefile Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs 2015-09-05 20:34:28 -07:00
mcs_spinlock.h locking/mcs: Use acquire/release semantics 2015-10-06 17:28:23 +02:00
mutex-debug.c mutex: Always clear owner field upon mutex_unlock() 2015-01-09 11:20:39 +01:00
mutex-debug.h
mutex.c locking/mutex: Use acquire/release semantics 2015-10-06 17:28:20 +02:00
mutex.h locking/mutexes: Use MUTEX_SPIN_ON_OWNER when appropriate 2014-08-13 10:32:02 +02:00
osq_lock.c locking/osq: Relax atomic semantics 2015-09-18 09:27:29 +02:00
percpu-rwsem.c locking/percpu-rwsem: Clean up the lockdep annotations in percpu_down_read() 2015-10-06 11:25:40 -07:00
qrwlock.c locking/qrwlock: Rename ->lock to ->wait_lock 2015-09-18 09:27:29 +02:00
qspinlock_paravirt.h locking/pvqspinlock, x86: Optimize the PV unlock code path 2015-11-23 10:02:02 +01:00
qspinlock.c locking/qspinlock: Avoid redundant read of next pointer 2015-11-23 10:02:00 +01:00
rtmutex_common.h rtmutex: Delete scriptable tester 2015-07-20 11:45:45 +02:00
rtmutex-debug.c rtmutex: Cleanup deadlock detector debug logic 2014-06-21 22:05:30 +02:00
rtmutex-debug.h rtmutex: Cleanup deadlock detector debug logic 2014-06-21 22:05:30 +02:00
rtmutex.c Merge branch 'sched-core-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip 2015-11-03 18:03:50 -08:00
rtmutex.h rtmutex: Cleanup deadlock detector debug logic 2014-06-21 22:05:30 +02:00
rwsem-spinlock.c locking/rwsem: Document barrier need when waking tasks 2015-02-18 16:57:10 +01:00
rwsem-xadd.c locking/rwsem: Use acquire/release semantics 2015-10-06 17:28:24 +02:00
rwsem.c locking/rwsem: Set lock ownership ASAP 2015-02-18 16:57:13 +01:00
rwsem.h locking/rwsem: Set lock ownership ASAP 2015-02-18 16:57:13 +01:00
semaphore.c locking/semaphore: Resolve some shadow warnings 2014-09-04 07:17:24 +02:00
spinlock_debug.c locking: Move the spinlock code to kernel/locking/ 2013-11-06 07:55:21 +01:00
spinlock.c spinlock: Add spin_lock_bh_nested() 2015-01-03 14:32:57 -05:00