linux/arch/x86/lib
Borislav Petkov 48c7a2509f x86/alternatives: Make JMPs more robust
Up until now we had to pay attention to how the relative offset of JMPs
in alternatives gets computed, so that the jump target is still correct
after patching. Or, as is the case for near CALLs (opcode e8), we still
have to go and readjust the offset at patching time.

What is more, the static_cpu_has_safe() facility had to forcefully
generate 5-byte JMPs: since we couldn't rely on the compiler to generate
properly sized ones, we had to force the longest encoding. Worse, it
would sometimes generate a replacement JMP longer than the original one,
thus overwriting the beginning of the next instruction at patching
time.

So, in order to alleviate all that and make using JMPs more
straightforward, we pad the original instruction in an alternative block
with NOPs at build time, should the replacement(s) be longer. This way,
users of alternatives no longer have to make sure that original and
replacement instruction sizes match: the assembler simply adds padding
where needed and does nothing otherwise.
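
The build-time padding hinges on a small GAS trick: a comparison such as
(x > 0) evaluates to -1 when true, so -((x) > 0) * (x) yields max(x, 0)
and can feed a .skip directive that emits that many 0x90 bytes. Below is
a minimal compilable sketch of the idea; the macro and section names
(ALTERNATIVE_PAD_SKETCH, .altinstr_sketch) are made up for illustration
and are not the kernel's actual ones:

  /* Build with: gcc -c pad_sketch.c (x86-64). The .skip expression
   * evaluates to max(replacement_len - original_len, 0) because GAS
   * comparisons yield -1 for true: -(x > 0) * x == x iff x > 0. */
  #define ALTERNATIVE_PAD_SKETCH(orig, repl)                            \
          "661:\n\t" orig "\n662:\n\t"                                  \
          ".skip -(((665f-664f)-(662b-661b)) > 0) * "                   \
                  "((665f-664f)-(662b-661b)), 0x90\n\t"                 \
          ".pushsection .altinstr_sketch, \"ax\"\n"                     \
          "664:\n\t" repl "\n665:\n\t"                                  \
          ".popsection"

  void alt_pad_demo(void)
  {
          /* 2-byte original (31 c0) vs. 5-byte replacement
           * (b8 01 00 00 00): three NOPs get emitted right after the
           * XOR at build time. The replacement is emitted into its own
           * section only so that .skip can measure its length. */
          asm volatile (ALTERNATIVE_PAD_SKETCH("xorl %%eax, %%eax",
                                               "movl $1, %%eax") ::: "eax");
  }

The real OLDINSTR()/ALTERNATIVE() macros in
arch/x86/include/asm/alternative.h follow the same pattern, with
additional bookkeeping for the struct alt_instr entries in
.altinstructions.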

As a second aspect, we recompute JMPs at patching time so that we can
turn 5-byte JMPs into 2-byte ones if possible. If not, we still have to
recompute the offsets, as the replacement JMP sits far away in the
.altinstr_replacement section and would carry a wrong offset if copied
verbatim.
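
The patch-time recomputation is plain RIP-relative arithmetic: decode
the target out of the replacement's rel32, re-express it relative to the
patch site, and use the 2-byte form (opcode eb, rel8) when the new
displacement fits in a signed byte. Here is a standalone C sketch of
that logic, assuming the replacement is a 5-byte jmp rel32; the name
recompute_jump_sketch is illustrative, not the kernel's exact routine:

  #include <stdint.h>
  #include <string.h>

  /*
   * Sketch: fix up a 5-byte "jmp rel32" (e9 xx xx xx xx) copied out of
   * .altinstr_replacement so it is correct, and ideally shorter, at the
   * original instruction's address. Returns the patched length: 2 or 5.
   */
  static int recompute_jump_sketch(uint8_t *orig_insn, /* patch site */
                                   uint8_t *repl_insn, /* replacement */
                                   uint8_t buf[5])     /* insn bytes  */
  {
          int32_t o_dspl, n_dspl;
          uint8_t *tgt_rip;

          if (buf[0] != 0xe9)             /* only handle jmp rel32 here */
                  return 5;

          memcpy(&o_dspl, buf + 1, 4);

          /* rel32 is relative to the end of the replacement JMP */
          tgt_rip = repl_insn + 5 + o_dspl;

          /* rel8 would be relative to the end of a 2-byte JMP at the
           * patch site */
          n_dspl = (int32_t)(tgt_rip - (orig_insn + 2));
          if (n_dspl >= -128 && n_dspl <= 127) {
                  buf[0] = 0xeb;                 /* short JMP */
                  buf[1] = (uint8_t)(int8_t)n_dspl;
                  memset(buf + 2, 0x90, 3);      /* NOP-pad the tail */
                  return 2;
          }

          /* does not fit: keep jmp rel32, re-aimed at the new site */
          n_dspl = (int32_t)(tgt_rip - (orig_insn + 5));
          memcpy(buf + 1, &n_dspl, 4);
          return 5;
  }

Fed the first example below (orig_insn 0xffffffff810014bd, repl_insn
0xffffffff81d0b23c, buf = e9 b1 62 2f ff), this returns 2 and leaves
eb 33 90 90 90 in buf, matching the apply_alternatives log.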

For example, on a locally generated kernel image

  old insn VA: 0xffffffff810014bd, CPU feat: X86_FEATURE_ALWAYS, size: 2
  __switch_to:
   ffffffff810014bd:      eb 21                   jmp ffffffff810014e0
  repl insn: size: 5
   ffffffff81d0b23c:      e9 b1 62 2f ff          jmpq ffffffff810014f2

gets corrected to a 2-byte JMP:

  apply_alternatives: feat: 3*32+21, old: (ffffffff810014bd, len: 2), repl: (ffffffff81d0b23c, len: 5)
  alt_insn: e9 b1 62 2f ff
  recompute_jumps: next_rip: ffffffff81d0b241, tgt_rip: ffffffff810014f2, new_displ: 0x00000033, ret len: 2
  converted to: eb 33 90 90 90

and a 5-byte JMP:

  old insn VA: 0xffffffff81001516, CPU feat: X86_FEATURE_ALWAYS, size: 2
  __switch_to:
   ffffffff81001516:      eb 30                   jmp ffffffff81001548
  repl insn: size: 5
   ffffffff81d0b241:      e9 10 63 2f ff          jmpq ffffffff81001556

gets shortened into a two-byte one:

  apply_alternatives: feat: 3*32+21, old: (ffffffff81001516, len: 2), repl: (ffffffff81d0b241, len: 5)
  alt_insn: e9 10 63 2f ff
  recompute_jumps: next_rip: ffffffff81d0b246, tgt_rip: ffffffff81001556, new_displ: 0x0000003e, ret len: 2
  converted to: eb 3e 90 90 90

... and so on.
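
The rel8 values in the logs check out by hand: the displacement is taken
from the end of the 2-byte instruction, so the first conversion gives
0xffffffff810014f2 - (0xffffffff810014bd + 2) = 0x33 and the second
0xffffffff81001556 - (0xffffffff81001516 + 2) = 0x3e, exactly the bytes
following the eb opcode above.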

This leads to a net win of around

40ish replacements * 3 bytes savings =~ 120 bytes of I$

on an AMD guest, i.e., some savings of precious instruction cache
bandwidth. The bytes padding the shorter 2-byte JMPs are single-byte
NOPs, which smart microarchitectures can discard at decode time, thus
freeing up execution bandwidth.

Signed-off-by: Borislav Petkov <bp@suse.de>
2015-02-23 13:44:11 +01:00
.gitignore x86: Gitignore: arch/x86/lib/inat-tables.c 2009-11-04 13:11:28 +01:00
atomic64_32.c x86: Adjust asm constraints in atomic64 wrappers 2012-01-20 17:29:31 -08:00
atomic64_386_32.S x86: atomic64 assembly improvements 2012-01-20 17:29:49 -08:00
atomic64_cx8_32.S x86: atomic64 assembly improvements 2012-01-20 17:29:49 -08:00
cache-smp.c x86, lib: Add wbinvd smp helpers 2010-01-22 16:05:42 -08:00
checksum_32.S x86/lib: Fix spelling, put space between a numeral and its units 2013-04-15 11:40:32 +02:00
clear_page_64.S x86/alternatives: Add instruction padding 2015-02-23 13:44:00 +01:00
cmdline.c x86, boot: Carve out early cmdline parsing function 2014-05-20 20:21:24 -07:00
cmpxchg8b_emu.S x86: Improve cmpxchg8b_emu.S 2014-10-08 10:05:49 +02:00
cmpxchg16b_emu.S x86: Improve cmpxchg16b_emu.S 2014-10-08 10:05:49 +02:00
copy_page_64.S x86/alternatives: Add instruction padding 2015-02-23 13:44:00 +01:00
copy_user_64.S x86/alternatives: Make JMPs more robust 2015-02-23 13:44:11 +01:00
copy_user_nocache_64.S x86, smap: Add STAC and CLAC instructions to control user space access 2012-09-21 12:45:27 -07:00
csum-copy_64.S x86, extable: Remove open-coded exception table entries in arch/x86/lib/csum-copy_64.S 2012-04-20 13:51:39 -07:00
csum-partial_64.c x86: Fix common misspellings 2011-03-18 10:39:30 +01:00
csum-wrappers_64.c x86-64: make csum_partial_copy_from_user() error handling consistent 2014-11-16 11:00:42 -08:00
delay.c x86: Delete non-required instances of include <linux/init.h> 2014-01-06 21:25:18 -08:00
getuser.S x86: Be consistent with data size in getuser.S 2013-02-11 23:14:48 -08:00
inat.c x86: Fix to decode grouped AVX with VEX pp bits 2012-02-11 15:11:35 +01:00
insn.c x86: Fix off-by-one in instruction decoder 2015-01-09 11:12:26 +01:00
iomap_copy_64.S
Makefile net, lib: kill arch_fast_hash library bits 2014-12-10 15:17:46 -05:00
memcpy_32.c asmlinkage, x86: Fix 32bit memcpy for LTO 2014-02-13 18:14:46 -08:00
memcpy_64.S x86/alternatives: Add instruction padding 2015-02-23 13:44:00 +01:00
memmove_64.S x86/alternatives: Add instruction padding 2015-02-23 13:44:00 +01:00
memset_64.S x86/alternatives: Add instruction padding 2015-02-23 13:44:00 +01:00
misc.c x86/boot: Further compress CPUs bootup message 2013-10-01 10:52:30 +02:00
mmx_32.c x86: clean up mmx_32.c 2008-04-17 17:40:47 +02:00
msr-reg-export.c x86, pvops: Remove hooks for {rd,wr}msr_safe_regs 2012-06-07 11:41:08 -07:00
msr-reg.S x86, pvops: Remove hooks for {rd,wr}msr_safe_regs 2012-06-07 11:41:08 -07:00
msr-smp.c x86 / msr: add 64bit _on_cpu access functions 2013-10-17 00:36:06 +02:00
msr.c x86: Fix typo preventing msr_set/clear_bit from having an effect 2014-05-09 08:42:32 -07:00
putuser.S x86, smap: Add STAC and CLAC instructions to control user space access 2012-09-21 12:45:27 -07:00
rwsem.S x86: Unify rwsem assembly implementation 2011-07-21 09:03:32 +02:00
string_32.c x86/i386: Use less assembly in strlen(), speed things up a bit 2011-12-12 18:33:42 +01:00
strstr_32.c x86: coding style fixes to arch/x86/lib/strstr_32.c 2008-08-15 16:53:24 +02:00
thunk_32.S x86: Unwind-annotate thunk_32.S 2014-10-08 12:31:45 +02:00
thunk_64.S x86: Speed up ___preempt_schedule*() by using THUNK helpers 2014-09-24 15:15:38 +02:00
usercopy_32.c x86: Unify copy_to_user() and add size checking to it 2013-10-26 12:27:37 +02:00
usercopy_64.c x86, asmlinkage: Make several variables used from assembler/linker script visible 2013-08-06 14:20:13 -07:00
usercopy.c perf: Fix arch_perf_out_copy_user default 2013-11-06 12:34:25 +01:00
x86-opcode-map.txt x86, mpx: Add MPX related opcodes to the x86 opcode map 2014-01-17 11:04:09 -08:00