linux/arch
Andy Lutomirski 2a23c6b8a9 x86_64, entry: Use sysret to return to userspace when possible
The x86_64 entry code currently jumps through complex and
inconsistent hoops to try to minimize the impact of syscall exit
work.  For a true fast-path syscall, almost nothing needs to be
done, so returning is just a check for exit work and sysret.  For a
full slow-path return from a syscall, the C exit hook is invoked if
needed and we join the iret path.

Using iret to return to userspace is very slow, so the entry code
has accumulated various special cases to try to do certain forms of
exit work without invoking iret.  This is error-prone, since it
duplicates assembly code paths, and it's dangerous, since sysret
can malfunction in interesting ways if used carelessly.  It's
also inefficient, since a lot of useful cases aren't optimized
and therefore force an iret out of a combination of paranoia and
the fact that no one has bothered to write even more asm code
to avoid it.

I would argue that this approach is backwards.  Rather than trying
to avoid the iret path, we should instead try to make the iret path
fast.  Under a specific set of conditions, iret is unnecessary.  In
particular, if RIP==RCX, RFLAGS==R11, RIP is canonical, RF is not
set, and both SS and CS are as expected, then
movq 32(%rsp),%rsp;sysret does the same thing as iret.  This set of
conditions is nearly always satisfied on return from syscalls, and
it can even occasionally be satisfied on return from an irq.

Even with the careful checks for sysret applicability, this cuts
nearly 80ns off of the overhead from syscalls with unoptimized exit
work.  This includes tracing and context tracking, and any return
that invokes KVM's user return notifier.  For example, the cost of
getpid with CONFIG_CONTEXT_TRACKING_FORCE=y drops from ~360ns to
~280ns on my computer.

This may allow the removal and even eventual conversion to C
of a respectable amount of exit asm.

This may require further tweaking to give the full benefit on Xen.

It may be worthwhile to adjust signal delivery and exec to try hit
the sysret path.

This does not optimize returns to 32-bit userspace.  Making the same
optimization for CS == __USER32_CS is conceptually straightforward,
but it will require some tedious code to handle the differences
between sysretl and sysexitl.

Link: http://lkml.kernel.org/r/71428f63e681e1b4aa1a781e3ef7c27f027d1103.1421453410.git.luto@amacapital.net
Signed-off-by: Andy Lutomirski <luto@amacapital.net>
2015-02-01 04:03:01 -08:00
..
alpha arch: Cleanup read_barrier_depends() and comments 2014-12-11 21:15:05 -05:00
arc Minor updates for ARC for 3.19 2014-12-18 16:26:41 -08:00
arm ARM: SoC fixes 2015-01-18 18:00:40 +12:00
arm64 arm64 fixes: 2015-01-17 08:01:21 +13:00
avr32 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-12-11 14:27:06 -08:00
blackfin arch/blackfin/mach-bf533/boards/stamp.c: add linux/delay.h 2015-01-08 15:10:52 -08:00
c6x net, lib: kill arch_fast_hash library bits 2014-12-10 15:17:46 -05:00
cris CRISv32: Remove last remnants of ETRAX_SPI_MMC_BOARD 2014-12-20 00:06:13 +01:00
frv Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-12-11 14:27:06 -08:00
hexagon Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/rkuo/linux-hexagon-kernel 2014-12-19 17:57:51 -08:00
ia64 Merge branches 'acpi-pm', 'acpi-processor' and 'acpi-video' 2015-01-06 23:35:43 +01:00
m32r Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-12-11 14:27:06 -08:00
m68k m68k: Wire up execveat 2015-01-11 11:14:14 +01:00
metag arch: Add lightweight memory barriers dma_rmb() and dma_wmb() 2014-12-11 21:15:06 -05:00
microblaze Microblaze patches for 3.19-rc1 2014-12-17 09:54:05 -08:00
mips kernel: Provide READ_ONCE and ASSIGN_ONCE 2014-12-20 16:48:59 -08:00
mn10300 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-12-11 14:27:06 -08:00
nios2 nios2: Use preempt_schedule_irq 2014-12-31 11:04:58 +08:00
openrisc net, lib: kill arch_fast_hash library bits 2014-12-10 15:17:46 -05:00
parisc Merge branch 'parisc-3.19-1' of git://git.kernel.org/pub/scm/linux/kernel/git/deller/parisc-linux 2014-12-26 13:41:05 -08:00
powerpc powerpc: Work around gcc bug in current_thread_info() 2015-01-12 16:40:02 +11:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2015-01-15 10:50:29 +13:00
score net, lib: kill arch_fast_hash library bits 2014-12-10 15:17:46 -05:00
sh PM: Eliminate CONFIG_PM_RUNTIME 2014-12-19 22:55:06 +01:00
sparc sparc32: destroy_context() and switch_mm() needs to disable interrupts. 2014-12-18 12:47:54 -05:00
tile Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile 2014-12-16 13:54:16 -08:00
um um: Skip futex_atomic_cmpxchg_inatomic() test 2015-01-04 14:20:26 +01:00
unicore32 Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next 2014-12-11 14:27:06 -08:00
x86 x86_64, entry: Use sysret to return to userspace when possible 2015-02-01 04:03:01 -08:00
xtensa Xtensa fixes for 3.19: 2014-12-16 14:08:53 -08:00
.gitignore
Kconfig