linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-16 00:34:20 +08:00

Ard Biesheuvel 86cd97ec4b crypto: arm/chacha-neon - optimize for non-block size multiples	2020-11-13 20:38:44 +11:00

    The current NEON based ChaCha implementation for ARM is optimized for
    multiples of 4x the ChaCha block size (64 bytes). This makes sense for
    block encryption, but given that ChaCha is also often used in the
    context of networking, it makes sense to consider arbitrary length
    inputs as well.

    For example, WireGuard typically uses 1420 byte packets, and performing
    ChaCha encryption involves 5 invocations of chacha_4block_xor_neon()
    and 3 invocations of chacha_block_xor_neon(), where the last one also
    involves a memcpy() using a buffer on the stack to process the final
    chunk of 1420 % 64 == 12 bytes.

    Let's optimize for this case as well, by letting chacha_4block_xor_neon()
    deal with any input size between 64 and 256 bytes, using NEON permutation
    instructions and overlapping loads and stores. This way, the 140 byte
    tail of a 1420 byte input buffer can simply be processed in one go.

    This results in the following performance improvements for 1420 byte
    blocks, without significant impact on power-of-2 input sizes. (Note
    that Raspberry Pi is widely used in combination with a 32-bit kernel,
    even though the core is 64-bit capable)

      Cortex-A8  (BeagleBone)       :  7%
      Cortex-A15 (Calxeda Midway)   : 21%
      Cortex-A53 (Raspberry Pi 3)   :  3%
      Cortex-A72 (Raspberry Pi 4)   : 19%

    Cc: Eric Biggers <ebiggers@google.com>
    Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
    Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
    Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
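
The call counts quoted in the commit message for a 1420 byte packet can be checked with a little arithmetic. The sketch below is only a model of that arithmetic, not the kernel's driver code: the helper names `calls_before` and `calls_after` are hypothetical, and the dispatch rule in `calls_after` is an assumption read off the sentence "any input size between 64 and 256 bytes".

```python
CHACHA_BLOCK_SIZE = 64               # bytes per ChaCha block (from the commit message)
FOUR_BLOCKS = 4 * CHACHA_BLOCK_SIZE  # 256 bytes, the unit the 4-block routine was tuned for


def calls_before(nbytes):
    """Model of the old behavior: the 4-block routine takes only full 256 byte chunks.

    Returns (4-block calls, 1-block calls). Hypothetical helper, not kernel code.
    """
    four_block, rest = divmod(nbytes, FOUR_BLOCKS)
    # The remainder goes through the 1-block routine; a partial final
    # block still costs one call (ceiling division).
    one_block = -(-rest // CHACHA_BLOCK_SIZE)
    return four_block, one_block


def calls_after(nbytes):
    """Model of the new behavior: the 4-block routine also accepts 64..256 byte tails.

    Assumption: tails shorter than one block still use the 1-block routine.
    """
    four_block, rest = divmod(nbytes, FOUR_BLOCKS)
    one_block = 0
    if rest >= CHACHA_BLOCK_SIZE:
        four_block += 1   # the whole tail is handled in one go
    elif rest > 0:
        one_block = 1
    return four_block, one_block


if __name__ == "__main__":
    print(calls_before(1420))  # (5, 3): 5 * 256 + 140, and 140 = 2 * 64 + 12
    print(calls_after(1420))   # (6, 0): the 140 byte tail is one 4-block call
```

Under this model a 1420 byte input drops from 8 routine calls to 6 and no longer needs the single-block path for its 12 byte remainder, while a power-of-2 input such as 1024 bytes needs 4 calls either way.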
..
alpha	arch-cleanup-2020-10-22	2020-10-23 10:06:38 -07:00
arc	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
arm	crypto: arm/chacha-neon - optimize for non-block size multiples	2020-11-13 20:38:44 +11:00
arm64	crypto: arm64/poly1305-neon - reorder PAC authentication with SP update	2020-11-06 14:29:11 +11:00
c6x	arch-cleanup-2020-10-22	2020-10-23 10:06:38 -07:00
csky	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
h8300	arch-cleanup-2020-10-22	2020-10-23 10:06:38 -07:00
hexagon	arch-cleanup-2020-10-22	2020-10-23 10:06:38 -07:00
ia64	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
m68k	arch-cleanup-2020-10-22	2020-10-23 10:06:38 -07:00
microblaze	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
mips	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
nds32	arch-cleanup-2020-10-22	2020-10-23 10:06:38 -07:00
nios2	arch-cleanup-2020-10-22	2020-10-23 10:06:38 -07:00
openrisc	arch-cleanup-2020-10-22	2020-10-23 10:06:38 -07:00
parisc	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
powerpc	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
riscv	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
s390	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
sh	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
sparc	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
um	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
x86	crypto: hash - Use memzero_explicit() for clearing state	2020-10-30 17:35:03 +11:00
xtensa	treewide: Convert macro and uses of __section(foo) to __section("foo")	2020-10-25 14:51:49 -07:00
.gitignore
Kconfig	Merge branch 'work.set_fs' of git://git.kernel.org/pub/scm/linux/kernel/git/viro/vfs	2020-10-22 09:59:21 -07:00