linux/arch
Cyril Bur 70fe3d980f powerpc: Restore FPU/VEC/VSX if previously used
Currently the FPU, VEC and VSX facilities are lazily loaded. This is not
a problem unless a process is using these facilities.

Modern versions of GCC are very good at automatically vectorising code,
new and modernised workloads make use of floating point and vector
facilities, even the kernel makes use of vectorised memcpy.

All this combined greatly increases the cost of a syscall since the
kernel uses the facilities sometimes even in syscall fast-path making it
increasingly common for a thread to take an *_unavailable exception soon
after a syscall, not to mention potentially taking all three.

The obvious overcompensation to this problem is to simply always load
all the facilities on every exit to userspace. Loading up all FPU, VEC
and VSX registers every time can be expensive and if a workload does
avoid using them, it should not be forced to incur this penalty.

An 8bit counter is used to detect if the registers have been used in the
past and the registers are always loaded until the value wraps to back
to zero.

Several versions of the assembly in entry_64.S were tested:

  1. Always calling C.
  2. Performing a common case check and then calling C.
  3. A complex check in asm.

After some benchmarking it was determined that avoiding C in the common
case is a performance benefit (option 2). The full check in asm (option
3) greatly complicated that codepath for a negligible performance gain
and the trade-off was deemed not worth it.

Signed-off-by: Cyril Bur <cyrilbur@gmail.com>
[mpe: Move load_vec in the struct to fill an existing hole, reword change log]
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>

fixup
2016-03-02 23:34:48 +11:00
..
alpha dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
arc dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
arm ARM: multi_v7_defconfig: enable DW_WATCHDOG 2016-02-04 13:25:33 -08:00
arm64 ARM: SoC fixes for v4.5-rc 2016-02-07 15:23:20 -08:00
avr32 dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
blackfin dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
c6x dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
cris Merge branch 'akpm' (patches from Andrew) 2016-01-21 12:32:08 -08:00
frv dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
h8300 Merge branch 'akpm' (patches from Andrew) 2016-01-21 12:32:08 -08:00
hexagon dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
ia64 [IA64] Enable copy_file_range syscall for ia64 2016-01-22 14:20:01 -08:00
m32r m32r: fix build failure due to SMP and MMU 2016-02-05 18:10:40 -08:00
m68k dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
metag dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
microblaze dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
mips Revert "MIPS: bcm63xx: nvram: Remove unused bcm63xx_nvram_get_psi_size() function" 2016-01-27 20:51:50 +01:00
mn10300 dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
nios2 dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
openrisc dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
parisc dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
powerpc powerpc: Restore FPU/VEC/VSX if previously used 2016-03-02 23:34:48 +11:00
s390 Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/s390/linux 2016-01-29 16:05:18 -08:00
score
sh sh: fix smp_store_mb for !SMP 2016-01-26 10:18:29 +02:00
sparc dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
tile dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
um um: asm/page.h: remove the pte_high member from struct pte_t 2016-02-05 18:10:40 -08:00
unicore32 dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00
x86 mm, hugetlb: don't require CMA for runtime gigantic pages 2016-02-05 18:10:40 -08:00
xtensa dma-mapping: remove <asm-generic/dma-coherent.h> 2016-01-20 17:09:18 -08:00
.gitignore
Kconfig dma-mapping: always provide the dma_map_ops based implementation 2016-01-20 17:09:18 -08:00