linux/arch/arm
Ard Biesheuvel 86cd97ec4b crypto: arm/chacha-neon - optimize for non-block size multiples
The current NEON based ChaCha implementation for ARM is optimized for
multiples of 4x the ChaCha block size (64 bytes). This makes sense for
block encryption, but given that ChaCha is also often used in the
context of networking, it makes sense to consider arbitrary length
inputs as well.

For example, WireGuard typically uses 1420 byte packets, and performing
ChaCha encryption involves 5 invocations of chacha_4block_xor_neon()
and 3 invocations of chacha_block_xor_neon(), where the last one also
involves a memcpy() using a buffer on the stack to process the final
chunk of 1420 % 64 == 12 bytes.
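
The call counts above can be checked with a short sketch of the dispatch arithmetic. This is an illustrative model only, not the kernel's glue code: the struct `chacha_call_count` and the function `count_calls_before()` are hypothetical names, and the loop structure is an assumption based on the description in this message.

```c
#include <assert.h>
#include <stddef.h>

#define CHACHA_BLOCK_SIZE 64

/* Hypothetical tally of helper invocations; not a kernel structure. */
struct chacha_call_count {
	unsigned int four_block;	/* calls to chacha_4block_xor_neon() */
	unsigned int one_block;		/* calls to chacha_block_xor_neon() */
	size_t bounce_bytes;		/* bytes copied via the stack buffer */
};

/* Model of the dispatch described above, before this change. */
static struct chacha_call_count count_calls_before(size_t len)
{
	struct chacha_call_count c = { 0, 0, 0 };

	/* Full 256-byte chunks go to the 4-way routine. */
	while (len >= 4 * CHACHA_BLOCK_SIZE) {
		c.four_block++;
		len -= 4 * CHACHA_BLOCK_SIZE;
	}

	/* Remaining whole blocks are handled one at a time. */
	while (len >= CHACHA_BLOCK_SIZE) {
		c.one_block++;
		len -= CHACHA_BLOCK_SIZE;
	}

	/* What is left is always shorter than one block. */
	assert(len < CHACHA_BLOCK_SIZE);

	/* A partial final block costs one more call plus the memcpy() detour. */
	if (len > 0) {
		c.one_block++;
		c.bounce_bytes = len;
	}
	return c;
}
```

For `len == 1420` the model yields 5 four-block calls, 3 single-block calls, and 12 bytes routed through the stack buffer, matching the figures quoted above.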

Let's optimize for this case as well, by letting chacha_4block_xor_neon()
deal with any input size between 64 and 256 bytes, using NEON permutation
instructions and overlapping loads and stores. This way, the 140 byte
tail of a 1420 byte input buffer can simply be processed in one go.
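
The overlapping load/store idea can be sketched in portable C. This is a simplified model under stated assumptions, not the NEON assembly from this patch: `xor_vec()` and `xor_overlapping_tail()` are hypothetical helpers, the keystream is assumed to be already available as a byte array `ks`, and `dst` is assumed not to alias `src`. In the real routine the keystream lives in registers, which is presumably where the permutation instructions come in, shifting it to line up with the final, realigned vector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 16	/* width of one NEON q register */

/* Stand-in for one vector load, XOR, and vector store. */
static void xor_vec(uint8_t *dst, const uint8_t *src, const uint8_t *ks)
{
	uint8_t v[VEC_BYTES];
	size_t i;

	memcpy(v, src, VEC_BYTES);
	for (i = 0; i < VEC_BYTES; i++)
		v[i] ^= ks[i];
	memcpy(dst, v, VEC_BYTES);
}

/*
 * XOR len bytes of src with ks into dst using only full-width vector
 * operations. Assumes len >= VEC_BYTES and that dst does not alias src.
 */
static void xor_overlapping_tail(uint8_t *dst, const uint8_t *src,
				 const uint8_t *ks, size_t len)
{
	size_t off;

	assert(len >= VEC_BYTES);

	/* Whole vectors from the front of the buffer. */
	for (off = 0; off + VEC_BYTES <= len; off += VEC_BYTES)
		xor_vec(dst + off, src + off, ks + off);

	/*
	 * Final vector, placed so that it ends exactly at len. It overlaps
	 * the previous vector and rewrites those bytes with the same values,
	 * so no byte-wise loop or bounce buffer is needed.
	 */
	if (off != len)
		xor_vec(dst + len - VEC_BYTES, src + len - VEC_BYTES,
			ks + len - VEC_BYTES);
}
```

With `len == 140`, eight vectors cover bytes 0 to 127 and one overlapping vector covers bytes 124 to 139, so nothing past the end of the buffer is read or written. The sketch requires distinct source and destination buffers because an in-place variant would XOR the overlapped bytes twice unless all loads are issued before any store.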

This results in the following performance improvements for 1420 byte
inputs, without significant impact on power-of-2 input sizes. (Note
that the Raspberry Pi is widely used in combination with a 32-bit
kernel, even though its cores are 64-bit capable.)

   Cortex-A8  (BeagleBone)       :   7%
   Cortex-A15 (Calxeda Midway)   :  21%
   Cortex-A53 (Raspberry Pi 3)   :   3%
   Cortex-A72 (Raspberry Pi 4)   :  19%

Cc: Eric Biggers <ebiggers@google.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
2020-11-13 20:38:44 +11:00
boot ARM: Devicetree updates 2020-10-24 10:44:18 -07:00
common ARM/sa1111: add a missing include of dma-map-ops.h 2020-10-20 09:40:33 +02:00
configs ARM: SoC defconfig updates 2020-10-24 10:53:04 -07:00
crypto crypto: arm/chacha-neon - optimize for non-block size multiples 2020-11-13 20:38:44 +11:00
include treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
kernel treewide: Convert macro and uses of __section(foo) to __section("foo") 2020-10-25 14:51:49 -07:00
lib arm: propagate the calling convention changes down to csum_partial_copy_from_user() 2020-08-20 15:45:16 -04:00
mach-actions ARM: actions: Drop unneeded select of COMMON_CLK 2020-05-15 23:27:37 +02:00
mach-alpine ARM: alpine: Drop unneeded select of HAVE_SMP 2020-05-15 23:27:37 +02:00
mach-artpec
mach-asm9260 ARM: asm9260: Drop unneeded select of GENERIC_CLOCKEVENTS 2020-05-15 23:27:37 +02:00
mach-aspeed ARM: aspeed: Drop unneeded select of HAVE_SMP 2020-05-15 23:27:37 +02:00
mach-at91 ARM: at91: pm: remove unnecessary at91sam9x60_idle 2020-08-17 11:18:59 +02:00
mach-axxia
mach-bcm ARM: bcm: Enable BCM7038_L1_IRQ for ARCH_BRCMSTB 2020-08-17 09:20:34 -07:00
mach-berlin ARM: berlin: Drop unneeded select of HAVE_SMP 2020-05-15 23:27:37 +02:00
mach-clps711x ARM: clps711x: Drop unneeded select of multi-platform selected options 2020-05-15 23:27:37 +02:00
mach-cns3xxx
mach-davinci ARM: SoC platform updates 2020-10-24 10:33:08 -07:00
mach-digicolor
mach-dove
mach-ebsa110 mm: don't include asm/pgtable.h if linux/mm.h is already included 2020-06-09 09:39:13 -07:00
mach-efm32
mach-ep93xx treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
mach-exynos Samsung mach/soc changes for v5.10 2020-09-26 12:55:43 -07:00
mach-footbridge treewide: replace '---help---' in Kconfig files with 'help' 2020-06-14 01:57:21 +09:00
mach-gemini
mach-highbank dma-mapping: split <linux/dma-mapping.h> 2020-10-06 07:07:03 +02:00
mach-hisi ARM: hisi: add support for SD5203 SoC 2020-09-30 09:56:03 +08:00
mach-imx ARM: SoC platform updates 2020-10-24 10:33:08 -07:00
mach-integrator mm: reorder includes after introduction of linux/pgtable.h 2020-06-09 09:39:13 -07:00
mach-iop32x mm: don't include asm/pgtable.h if linux/mm.h is already included 2020-06-09 09:39:13 -07:00
mach-ixp4xx ARM/ixp4xx: add a missing include of dma-map-ops.h 2020-10-13 13:28:22 +02:00
mach-keystone dma-mapping: introduce DMA range map, supplanting dma_pfn_offset 2020-09-17 18:43:56 +02:00
mach-lpc18xx
mach-lpc32xx
mach-mediatek ARM: mediatek: Replace <linux/clk-provider.h> by <linux/of_clk.h> 2020-05-15 22:55:06 +02:00
mach-meson
mach-milbeaut
mach-mmp treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
mach-moxart
mach-mstar ARM: mstar: Select MStar intc 2020-10-03 12:47:56 -07:00
mach-mv78xx0
mach-mvebu dma-mapping: split <linux/dma-mapping.h> 2020-10-06 07:07:03 +02:00
mach-mxs
mach-nomadik
mach-npcm
mach-nspire
mach-omap1 ARM: SoC platform updates 2020-10-24 10:33:08 -07:00
mach-omap2 ARM: SoC platform updates 2020-10-24 10:33:08 -07:00
mach-orion5x treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
mach-oxnas
mach-picoxcell
mach-prima2 ARM: prima2: Drop unneeded select of HAVE_SMP 2020-05-15 23:27:38 +02:00
mach-pxa power: supply: gpio-charger: Convert to GPIO descriptors 2020-08-27 16:47:14 +02:00
mach-qcom
mach-rda
mach-realtek
mach-realview VExpress modularization 2020-05-15 23:04:40 +02:00
mach-rockchip
mach-rpc treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
mach-s3c ARM: SoC platform updates 2020-10-24 10:33:08 -07:00
mach-s5pv210 ARM: s5pv210: use private pm save/restore 2020-08-19 21:33:11 +02:00
mach-sa1100 power: supply: gpio-charger: Convert to GPIO descriptors 2020-08-27 16:47:14 +02:00
mach-shmobile ARM: SoC platform updates 2020-10-24 10:33:08 -07:00
mach-socfpga ARM: socfpga: PM: add missing put_device() call in socfpga_setup_ocram_self_refresh() 2020-07-28 13:57:36 -05:00
mach-spear
mach-sti Revert "ARM: sti: Implement dummy L2 cache's write_sec" 2020-06-28 14:46:54 +02:00
mach-stm32 ARM: stm32: Replace HTTP links with HTTPS ones 2020-10-03 12:38:54 -07:00
mach-sunxi
mach-tango
mach-tegra treewide: Use fallthrough pseudo-keyword 2020-08-23 17:36:59 -05:00
mach-u300
mach-uniphier
mach-ux500
mach-versatile VExpress modularization 2020-05-15 23:04:40 +02:00
mach-vexpress Revert "ARM: vexpress: Don't select VEXPRESS_CONFIG" 2020-05-28 12:30:33 +02:00
mach-vt8500
mach-zx
mach-zynq mm: reorder includes after introduction of linux/pgtable.h 2020-06-09 09:39:13 -07:00
mm ARM development for 5.10-rc1: 2020-10-20 09:18:31 -07:00
net
nwfpe
oprofile
plat-omap PM: AVS: smartreflex Move driver to soc specific drivers 2020-10-16 18:28:43 +02:00
plat-orion ARM: orion/gpio: Make use of for_each_requested_gpio() 2020-07-18 22:49:23 +02:00
plat-pxa
plat-versatile
probes arm: kprobes: Use generic kretprobe trampoline handler 2020-09-08 11:52:32 +02:00
tools mm/madvise: introduce process_madvise() syscall: an external memory hinting API 2020-10-18 09:27:10 -07:00
vdso kbuild: explicitly specify the build id style 2020-10-09 23:57:30 +09:00
vfp ARM: 8991/1: use VFP assembler mnemonics if available 2020-07-21 16:33:39 +01:00
xen dma-mapping updates for 5.10 2020-10-15 14:43:29 -07:00
Kbuild ARM: 8981/1: add arch/arm/Kbuild 2020-07-21 16:33:35 +01:00
Kconfig ARM: SoC platform updates 2020-10-24 10:33:08 -07:00
Kconfig-nommu
Kconfig.assembler ARM: 8991/1: use VFP assembler mnemonics if available 2020-07-21 16:33:39 +01:00
Kconfig.debug ARM: SoC platform updates 2020-10-24 10:33:08 -07:00
Makefile ARM: SoC platform updates 2020-10-24 10:33:08 -07:00