linux/arch/powerpc/lib
Anton Blanchard 15c2d45d17 powerpc: Add 64bit optimised memcmp
I noticed ksm spending quite a lot of time in memcmp on a large
KVM box. The current memcmp loop is very unoptimised - byte at a
time compares with no loop unrolling. We can do much much better.

Optimise the loop in a few ways:

- Unroll the byte at a time loop

- For large (at least 32 byte) comparisons that are also 8 byte
  aligned, use an unrolled modulo scheduled loop using 8 byte
  loads. This is similar to our glibc memcmp.
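
The two paths above can be sketched in portable C. This is only an illustration of the idea, not the assembly added in memcmp_64.S: the name sketch_memcmp is hypothetical, the 32 byte block size is taken from the threshold above, and the sketch makes no attempt at modulo scheduling (in C that is left to the compiler). On a mismatch in the wide path it simply falls back to the byte loop to locate the differing byte, which keeps the result independent of endianness.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration only; not the kernel's memcmp. */
static int sketch_memcmp(const void *s1, const void *s2, size_t n)
{
	const unsigned char *p = s1, *q = s2;

	/* Long path: at least 32 bytes and both pointers 8 byte aligned. */
	if (n >= 32 && ((((uintptr_t)p | (uintptr_t)q) & 7) == 0)) {
		while (n >= 32) {
			uint64_t x[4], y[4];

			/* Four 8 byte loads per side, unrolled by hand. */
			memcpy(x, p, sizeof(x));
			memcpy(y, q, sizeof(y));
			if (x[0] != y[0] || x[1] != y[1] ||
			    x[2] != y[2] || x[3] != y[3])
				break;	/* byte loop below finds the byte */
			p += 32;
			q += 32;
			n -= 32;
		}
	}

	/* Short path and tail: byte at a time loop, unrolled by four. */
	while (n >= 4) {
		if (p[0] != q[0]) return p[0] - q[0];
		if (p[1] != q[1]) return p[1] - q[1];
		if (p[2] != q[2]) return p[2] - q[2];
		if (p[3] != q[3]) return p[3] - q[3];
		p += 4;
		q += 4;
		n -= 4;
	}

	/* Remaining 0 to 3 bytes. */
	for (; n; n--, p++, q++) {
		if (*p != *q)
			return *p - *q;
	}
	return 0;
}
```

Comparing bytes as unsigned char and returning their difference gives the usual memcmp sign convention.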

A simple microbenchmark testing 10000000 iterations of an 8192 byte
memcmp was used to measure the performance:

baseline:	29.93 s

modified:	 1.70 s

Just over 17x faster.
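
The benchmark itself is not included here. A userspace approximation might look like the sketch below, under several assumptions: bench_memcmp is a hypothetical helper, both buffers hold identical data so every call scans the full length, and the call goes through a volatile function pointer so the compiler cannot fold it away. It times whatever memcmp the C library provides, so it will not reproduce the kernel numbers above.

```c
#include <stdlib.h>
#include <string.h>
#include <time.h>

/*
 * Hypothetical helper: time `iters` calls of memcmp over two equal
 * buffers of `len` bytes. Returns CPU seconds, or -1.0 if allocation
 * fails.
 */
static double bench_memcmp(size_t len, unsigned long iters)
{
	/* Volatile pointer forces a real call on every iteration. */
	int (*volatile cmp)(const void *, const void *, size_t) = memcmp;
	volatile int sink = 0;
	unsigned char *a = malloc(len);
	unsigned char *b = malloc(len);
	clock_t start, end;
	unsigned long i;

	if (!a || !b) {
		free(a);
		free(b);
		return -1.0;
	}

	/* Identical contents: the compare must walk all `len` bytes. */
	memset(a, 0xa5, len);
	memset(b, 0xa5, len);

	start = clock();
	for (i = 0; i < iters; i++)
		sink += cmp(a, b, len);
	end = clock();

	free(a);
	free(b);
	(void)sink;
	return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling bench_memcmp(8192, 10000000UL) matches the buffer size and iteration count quoted above.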

v2: Incorporated some suggestions from Segher:

- Use andi. instead of rldicl.

- Convert bdnzt eq, to bdnz. It's just duplicating the earlier compare
  and was a relic from a previous version.

- Don't use cr5, we have plans to use that CR field for fast local
  atomics.

Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Michael Ellerman <mpe@ellerman.id.au>
2015-01-23 14:02:55 +11:00
alloc.c powerpc: Remove more traces of bootmem 2014-11-19 21:41:51 +11:00
checksum_32.S powerpc: Rename files to have consistent _32/_64 suffixes 2005-10-10 21:52:43 +10:00
checksum_64.S powerpc: Restore registers on error exit from csum_partial_copy_generic() 2013-10-03 17:22:42 +10:00
checksum_wrappers_64.c powerpc: various straight conversions from module.h --> export.h 2011-10-31 19:30:44 -04:00
code-patching.c powerpc: Move the patch_exception to a common place 2013-12-02 14:06:54 +11:00
copy_32.S powerpc: Fix incorrect .stabs entry for copy_32.S 2010-09-02 14:07:34 +10:00
copypage_64.S powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC() 2014-06-05 13:20:41 +10:00
copypage_power7.S powerpc: Fix unsafe accesses to parameter area in ELFv2 2014-04-23 10:05:24 +10:00
copyuser_64.S powerpc: Remove power3 from comments 2014-07-28 14:10:26 +10:00
copyuser_power7.S powerpc: Fix comment typos 'CONFiG_ALTIVEC' 2014-10-29 14:41:49 +01:00
crtsavres.S powerpc: Add vr save/restore functions 2014-01-15 13:46:43 +11:00
div64.S powerpc: Fix a corner case in __div64_32 2005-10-20 09:37:02 +10:00
feature-fixups-test.S powerpc: Ensure the else case of feature sections will fit 2011-01-21 14:08:33 +11:00
feature-fixups.c powerpc: Make a bunch of things static 2014-09-25 23:14:41 +10:00
hweight_64.S powerpc: No need to use dot symbols when branching to a function 2014-04-23 10:05:16 +10:00
ldstfp.S powerpc: Fixes for instructions not using correct register naming 2012-07-10 19:18:16 +10:00
locks.c powerpc: Add smp_mb()s to arch_spin_unlock_wait() 2014-08-13 15:13:27 +10:00
Makefile powerpc: Add 64bit optimised memcmp 2015-01-23 14:02:55 +11:00
mem_64.S powerpc: use _GLOBAL_TOC for memmove 2014-07-22 15:56:04 +10:00
memcmp_64.S powerpc: Add 64bit optimised memcmp 2015-01-23 14:02:55 +11:00
memcpy_64.S Merge remote-tracking branch 'anton/abiv2' into next 2014-05-05 20:57:12 +10:00
memcpy_power7.S powerpc: Fix comment typos 'CONFiG_ALTIVEC' 2014-10-29 14:41:49 +01:00
ppc_ksyms.c powerpc: Move lib symbol exports into arch/powerpc/lib/ppc_ksyms.c 2014-09-25 23:14:39 +10:00
rheap.c powerpc: various straight conversions from module.h --> export.h 2011-10-31 19:30:44 -04:00
sstep.c powerpc: Fix compilation of emulate_step() 2014-11-12 15:54:29 +11:00
string_64.S powerpc: Exported functions __clear_user and copy_page use r2 so need _GLOBAL_TOC() 2014-06-05 13:20:41 +10:00
string.S powerpc: Add 64bit optimised memcmp 2015-01-23 14:02:55 +11:00
usercopy_64.c powerpc: Rename files to have consistent _32/_64 suffixes 2005-10-10 21:52:43 +10:00
vmx-helper.c powerpc: POWER7 optimised copy_page using VMX and enhanced prefetch 2012-07-03 14:14:44 +10:00
xor_vmx.c powerpc: Add VMX optimised xor for RAID5 2013-10-30 16:02:28 +11:00