clang-16 warns about casting between incompatible function types:
arch/arm/crypto/sha256_glue.c:37:5: error: cast from 'void (*)(u32 *, const void *, unsigned int)' (aka 'void (*)(unsigned int *, const void *, unsigned int)') to 'sha256_block_fn *' (aka 'void (*)(struct sha256_state *, const unsigned char *, int)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
37 | (sha256_block_fn *)sha256_block_data_order);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
arch/arm/crypto/sha512-glue.c:34:3: error: cast from 'void (*)(u64 *, const u8 *, int)' (aka 'void (*)(unsigned long long *, const unsigned char *, int)') to 'sha512_block_fn *' (aka 'void (*)(struct sha512_state *, const unsigned char *, int)') converts to incompatible function type [-Werror,-Wcast-function-type-strict]
34 | (sha512_block_fn *)sha512_block_data_order);
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Fix the prototypes for the assembler functions to match the typedef.
The code already relies on the digest being the first part of the
state structure, so there is no change in behavior.
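The shape of the change, sketched in C (declarations abridged; exact file
contents not reproduced here):

  /* before: matches the raw word array the asm code operates on */
  asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
                                          unsigned int num_blks);

  /* after: matches the sha256_block_fn typedef; since the digest is the
   * first member of struct sha256_state, the asm code needs no change */
  asmlinkage void sha256_block_data_order(struct sha256_state *sst,
                                          u8 const *src, int blocks);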
Fixes: c80ae7ca37 ("crypto: arm/sha512 - accelerated SHA-512 using ARM generic ASM and NEON")
Fixes: b59e2ae369 ("crypto: arm/sha256 - move SHA-224/256 ASM/NEON implementation to base layer")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Implement the ->digest method to improve performance on single-page
messages by reducing the number of indirect calls.
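A minimal sketch of what such a ->digest hook can look like (helper names
hypothetical; the actual patch may be structured differently):

  static int sha256_arm_digest(struct shash_desc *desc, const u8 *data,
                               unsigned int len, u8 *out)
  {
          /* one indirect call from the API instead of three
           * (->init, ->update, ->final) for a complete message */
          return sha256_arm_init(desc) ?:
                 sha256_arm_update(desc, data, len) ?:
                 sha256_arm_final(desc, out);
  }

  /* hooked up via .digest in the shash_alg definition */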
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instead of casting the function, which upsets clang for some reason,
change the assembly function signature instead.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304081828.zjGcFUyE-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instead of casting the function, which upsets clang for some reason,
change the assembly function signature instead.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304081828.zjGcFUyE-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instead of casting the function, which upsets clang for some reason,
change the assembly function signature instead.
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202304081828.zjGcFUyE-lkp@intel.com/
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Merge tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto update from Herbert Xu:
"API:
- Use kmap_local instead of kmap_atomic
- Change request callback to take void pointer
- Print FIPS status in /proc/crypto (when enabled)
Algorithms:
- Add rfc4106/gcm support on arm64
- Add ARIA AVX2/512 support on x86
Drivers:
- Add TRNG driver for StarFive SoC
- Delete ux500/hash driver (subsumed by stm32/hash)
- Add zlib support in qat
- Add RSA support in aspeed"
* tag 'v6.3-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (156 commits)
crypto: x86/aria-avx - Do not use avx2 instructions
crypto: aspeed - Fix modular aspeed-acry
crypto: hisilicon/qm - fix coding style issues
crypto: hisilicon/qm - update comments to match function
crypto: hisilicon/qm - change function names
crypto: hisilicon/qm - use min() instead of min_t()
crypto: hisilicon/qm - remove some unused defines
crypto: proc - Print fips status
crypto: crypto4xx - Call dma_unmap_page when done
crypto: octeontx2 - Fix objects shared between several modules
crypto: nx - Fix sparse warnings
crypto: ecc - Silence sparse warning
tls: Pass rec instead of aead_req into tls_encrypt_done
crypto: api - Remove completion function scaffolding
tls: Remove completion function scaffolding
tipc: Remove completion function scaffolding
net: ipv6: Remove completion function scaffolding
net: ipv4: Remove completion function scaffolding
net: macsec: Remove completion function scaffolding
dm: Remove completion function scaffolding
...
Merge tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm
Pull ARM updates from Russell King:
- Improve Kconfig help text for Cortex A8 and Cortex A9 errata
- Kconfig spelling and grammar fixes
- Allow kernel-mode VFP/Neon in softirq context
- Use Neon in softirq context
- Implement AES-CTR/GHASH version of GCM
* tag 'for-linus' of git://git.armlinux.org.uk/~rmk/linux-arm:
ARM: 9289/1: Allow pre-ARMv5 builds with ld.lld 16.0.0 and newer
ARM: 9288/1: Kconfigs: fix spelling & grammar
ARM: 9286/1: crypto: Implement fused AES-CTR/GHASH version of GCM
ARM: 9285/1: remove meaningless arch/arm/mach-rda/Makefile
ARM: 9283/1: permit non-nested kernel mode NEON in softirq context
ARM: 9282/1: vfp: Manipulate task VFP state with softirqs disabled
ARM: 9281/1: improve Cortex A8/A9 errata help text
Commit 1d2e9b67b0 ("ARM: 9265/1: pass -march= only to compiler") added
a __thumb2__ define to ASFLAGS to avoid build errors in the crypto code,
which relies on __thumb2__ for preprocessing. Commit 59e2cf8d21 ("ARM:
9275/1: Drop '-mthumb' from AFLAGS_ISA") followed up on this by removing
-mthumb from AFLAGS so that __thumb2__ would not be defined when the
default target was ARMv7 or newer.
Unfortunately, the second commit's fix assumes that the toolchain
defaults to -mno-thumb / -marm, which is not the case for Debian's
arm-linux-gnueabihf target, which defaults to -mthumb:
$ echo | arm-linux-gnueabihf-gcc -dM -E - | grep __thumb
#define __thumb2__ 1
#define __thumb__ 1
This target is used by several CI systems, which will still see
redefined macro warnings, despite '-mthumb' not being present in the
flags:
<command-line>: warning: "__thumb2__" redefined
<built-in>: note: this is the location of the previous definition
Remove the global AFLAGS __thumb2__ define and move it to the crypto
folder where it is required by the imported OpenSSL algorithms; the rest
of the kernel should use the internal CONFIG_THUMB2_KERNEL symbol to
know whether or not Thumb2 is being used. Be sure that __thumb2__
is undefined first so that there are no macro redefinition warnings.
Link: https://github.com/ClangBuiltLinux/linux/issues/1772
Reported-by: "kernelci.org bot" <bot@kernelci.org>
Suggested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Fixes: 59e2cf8d21 ("ARM: 9275/1: Drop '-mthumb' from AFLAGS_ISA")
Fixes: 1d2e9b67b0 ("ARM: 9265/1: pass -march= only to compiler")
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
On 32-bit ARM, AES in GCM mode takes full advantage of the ARMv8 Crypto
Extensions when available, resulting in a performance of 6-7 cycles per
byte for typical IPsec frames on cores such as Cortex-A53, using the
generic GCM template encapsulating the accelerated AES-CTR and GHASH
implementations.
At such high rates, any time spent copying data or doing other poorly
optimized work in the generic layer hurts disproportionately, and we can
get a significant performance improvement by combining the optimized
AES-CTR and GHASH implementations into a single GCM driver.
On Cortex-A53, this results in a performance improvement of around 75%,
and AES-256-GCM-128 with RFC4106 encapsulation runs in 4 cycles per
byte.
Note that this code takes advantage of the fact that kernel mode NEON is
now supported in softirq context as well, and therefore does not provide
a non-NEON fallback path at all. (AEADs are only callable in process or
softirq context)
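For reference (not part of the patch), the fused driver is reached through
the regular AEAD API; a hedged sketch with error handling omitted, where
key/keylen are supplied by the caller:

  struct crypto_aead *tfm;

  tfm = crypto_alloc_aead("rfc4106(gcm(aes))", 0, 0);
  crypto_aead_setkey(tfm, key, keylen);   /* RFC4106 keys carry a 4-byte nonce salt */
  crypto_aead_setauthsize(tfm, 16);
  /* ...set up an aead_request and call crypto_aead_encrypt()... */
  crypto_free_aead(tfm);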
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Instead of casting the function, which upsets clang for some reason,
change the assembly function signature instead.
Reported-by: kernel test robot <lkp@intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The helper crypto_tfm_ctx is only used by the Crypto API algorithm
code and should really be in algapi.h. However, for historical
reasons many files relied on it to be in crypto.h. This patch
changes those files to use algapi.h instead in preparation for a
move.
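A hedged sketch of the per-file change (context structure hypothetical):

  #include <crypto/algapi.h>      /* was relying on <linux/crypto.h> */

  struct my_cipher_ctx {          /* hypothetical driver context */
          u8 key[32];
  };

  static int my_setkey(struct crypto_tfm *tfm, const u8 *key,
                       unsigned int keylen)
  {
          struct my_cipher_ctx *ctx = crypto_tfm_ctx(tfm);

          if (keylen != sizeof(ctx->key))
                  return -EINVAL;
          memcpy(ctx->key, key, keylen);
          return 0;
  }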
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The arm architecture doesn't support CFI yet, and even if it did, the
new CFI implementation supports indirect calls to assembly functions.
Therefore, there's no need to use a wrapper function for nh_neon().
Acked-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Reviewed-by: Sami Tolvanen <samitolvanen@google.com>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The gf128mul library does not depend on the crypto API at all, so it can
be moved into lib/crypto. This will allow us to use it in other library
code in a subsequent patch without having to depend on CONFIG_CRYPTO.
While at it, change the Kconfig symbol name to align with other crypto
library implementations. However, the source file name is retained, as
it is reflected in the module .ko filename, and changing this might
break things for users.
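The library interface itself is unchanged by the move; a hedged usage
sketch:

  #include <crypto/gf128mul.h>

  static void gf128_mul_example(be128 *x, const be128 *y)
  {
          /* multiply *x by *y in GF(2^128) (lle convention);
           * the product is written back into *x */
          gf128mul_lle(x, y);
  }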
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Shorten menu titles and make them consistent:
- acronym
- name
- architecture features in parentheses
- no suffixes like "<something> algorithm", "support",
"hardware acceleration", or "optimized"
Simplify help text descriptions, update references, and ensure that
https references are still valid.
Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Shorten menu titles and make them consistent:
- acronym
- name
- architecture features in parentheses
- no suffixes like "<something> algorithm", "support",
"hardware acceleration", or "optimized"
Simplify help text descriptions, update references, and ensure that
https references are still valid.
Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Shorten menu titles and make them consistent:
- acronym
- name
- architecture features in parentheses
- no suffixes like "<something> algorithm", "support",
"hardware acceleration", or "optimized"
Simplify help text descriptions, update references, and ensure that
https references are still valid.
Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Shorten menu titles and make them consistent:
- acronym
- name
- architecture features in parentheses
- no suffixes like "<something> algorithm", "support",
"hardware acceleration", or "optimized"
Simplify help text descriptions, update references, and ensure that
https references are still valid.
Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Sort the arm entries so all like entries are together.
Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Move ARM- and ARM64-accelerated menus into a submenu under
the Crypto API menu (paralleling all the architectures).
Make each submenu always appear if the corresponding architecture
is supported. Get rid of the ARM_CRYPTO and ARM64_CRYPTO symbols.
The "ARM Accelerated" or "ARM64 Accelerated" entry disappears from:
General setup --->
Platform selection --->
Kernel Features --->
Boot options --->
Power management options --->
CPU Power Management --->
[*] ACPI (Advanced Configuration and Power Interface) Support --->
[*] Virtualization --->
[*] ARM Accelerated Cryptographic Algorithms --->
(or)
[*] ARM64 Accelerated Cryptographic Algorithms --->
...
-*- Cryptographic API --->
Library routines --->
Kernel hacking --->
and moves into the Cryptographic API menu, which now contains:
...
Accelerated Cryptographic Algorithms for CPU (arm) --->
(or)
Accelerated Cryptographic Algorithms for CPU (arm64) --->
[*] Hardware crypto devices --->
...
Suggested-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Robert Elliott <elliott@hpe.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
BLAKE2s has no currently known use as an shash. Just remove all of this
unnecessary plumbing. Removing this shash was something we talked about
back when we were making BLAKE2s a built-in, but I simply never got
around to doing it. So this completes that project.
Importantly, this fixes a bug in which the lib code depends on
crypto_simd_disabled_for_test, causing linker errors.
Also add more alignment tests to the selftests and compare SIMD and
non-SIMD compression functions, to make up for what we lose from
testmgr.c.
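Callers that previously went through the shash interface use the library
API directly instead; a hedged sketch:

  #include <crypto/blake2s.h>

  static void blake2s256_hash(const u8 *buf, size_t len,
                              u8 digest[BLAKE2S_HASH_SIZE])
  {
          /* one-shot, unkeyed BLAKE2s-256 via the library interface */
          blake2s(digest, buf, NULL, BLAKE2S_HASH_SIZE, len, 0);
  }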
Reported-by: gaochao <gaochao49@huawei.com>
Cc: Eric Biggers <ebiggers@kernel.org>
Cc: Ard Biesheuvel <ardb@kernel.org>
Cc: stable@vger.kernel.org
Fixes: 6048fdcc5f ("lib/crypto: blake2s: include as built-in")
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Merge tag 'v5.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto fixes from Herbert Xu:
- Missing Kconfig dependency on arm that leads to boot failure
- x86 SLS fixes
- Reference leak in the stm32 driver
* tag 'v5.18-p1' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6:
crypto: x86/sm3 - Fixup SLS
crypto: x86/poly1305 - Fixup SLS
crypto: x86/chacha20 - Avoid spurious jumps to other functions
crypto: stm32 - fix reference leak in stm32_crc_remove
crypto: arm/aes-neonbs-cbc - Select generic cbc and aes
The algorithm __cbc-aes-neonbs requires a fallback, so we need
to select the config options for the generic cbc and aes code;
otherwise it will fail to register on boot-up.
Fixes: 00b99ad2ba ("crypto: arm/aes-neonbs - Use generic cbc...")
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instead of falling back to C code to deal with the final bit of input
that is not a round multiple of the block size, handle this in the asm
code, permitting us to use overlapping loads and stores for performance,
and implement the 16-byte wide XOR using a single NEON instruction.
Since NEON loads and stores have a natural width of 16 bytes, we need to
handle inputs of less than 16 bytes in a special way, but this rarely
occurs in practice so it does not impact performance. All other input
sizes can be consumed directly by the NEON asm code, although it should
be noted that the core AES transform can still only process 128 bytes (8
AES blocks) at a time.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
blake2s_compress_generic is weakly aliased by blake2s_compress. The
current harness for function selection uses a function pointer, which is
ordinarily inlined and resolved at compile time. But when Clang's CFI is
enabled, CFI still triggers when making an indirect call via a weak
symbol. This seems like a bug in Clang's CFI, as though it's bucketing
weak symbols and strong symbols differently. It also only seems to
trigger when "full LTO" mode is used, rather than "thin LTO".
[ 0.000000][ T0] Kernel panic - not syncing: CFI failure (target: blake2s_compress_generic+0x0/0x1444)
[ 0.000000][ T0] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 5.16.0-mainline-06981-g076c855b846e #1
[ 0.000000][ T0] Hardware name: MT6873 (DT)
[ 0.000000][ T0] Call trace:
[ 0.000000][ T0] dump_backtrace+0xfc/0x1dc
[ 0.000000][ T0] dump_stack_lvl+0xa8/0x11c
[ 0.000000][ T0] panic+0x194/0x464
[ 0.000000][ T0] __cfi_check_fail+0x54/0x58
[ 0.000000][ T0] __cfi_slowpath_diag+0x354/0x4b0
[ 0.000000][ T0] blake2s_update+0x14c/0x178
[ 0.000000][ T0] _extract_entropy+0xf4/0x29c
[ 0.000000][ T0] crng_initialize_primary+0x24/0x94
[ 0.000000][ T0] rand_initialize+0x2c/0x6c
[ 0.000000][ T0] start_kernel+0x2f8/0x65c
[ 0.000000][ T0] __primary_switched+0xc4/0x7be4
[ 0.000000][ T0] Rebooting in 5 seconds..
Nonetheless, the function pointer method isn't so terrific anyway, so
this patch replaces it with a simple boolean, which also gets inlined
away. This successfully works around the Clang bug.
In general, I'm not too keen on all of the indirection involved here; it
clearly does more harm than good. Hopefully the whole thing can get
cleaned up down the road when lib/crypto is overhauled more
comprehensively. But for now, we go with a simple bandaid.
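A sketch of the idea (names approximate, not the literal patch): the
selection is made with a plain boolean and two direct calls, which the
compiler can inline, instead of an indirect call through a pointer that
may resolve to a weak alias:

  static void blake2s_do_compress(struct blake2s_state *state,
                                  const u8 *block, size_t nblocks, u32 inc,
                                  bool force_generic)
  {
          if (force_generic)
                  blake2s_compress_generic(state, block, nblocks, inc);
          else
                  blake2s_compress(state, block, nblocks, inc);
  }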
Fixes: 6048fdcc5f ("lib/crypto: blake2s: include as built-in")
Link: https://github.com/ClangBuiltLinux/linux/issues/1567
Reported-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Miles Chen <miles.chen@mediatek.com>
Tested-by: Nathan Chancellor <nathan@kernel.org>
Tested-by: John Stultz <john.stultz@linaro.org>
Acked-by: Nick Desaulniers <ndesaulniers@google.com>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
In preparation for using blake2s in the RNG, we change the way that it
is wired-in to the build system. Instead of using ifdefs to select the
right symbol, we use weak symbols. And because ARM doesn't need the
generic implementation, we make the generic one default only if an arch
library doesn't need it already, and then have arch libraries that do
need it opt-in. So that the arch libraries can remain tristate rather
than bool, we then split the shash part from the glue code.
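The wiring, roughly (a sketch based on the description above): the generic
C compression routine carries a weak alias, and an arch library that is
built in simply provides the strong definition:

  /* generic side: blake2s_compress defaults to the generic code */
  void blake2s_compress(struct blake2s_state *state, const u8 *block,
                        size_t nblocks, const u32 inc)
          __weak __alias(blake2s_compress_generic);

  /* arch side (e.g. arch/arm/crypto) defines a strong blake2s_compress()
   * that overrides the weak alias at link time */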
Acked-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Masahiro Yamada <masahiroy@kernel.org>
Cc: linux-kbuild@vger.kernel.org
Cc: linux-crypto@vger.kernel.org
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Rename module_init & module_exit functions that are named
"mod_init" and "mod_exit" so that they are unique in both the
System.map file and in initcall_debug output instead of showing
up as almost anonymous "mod_init".
This is helpful for debugging and in determining how long certain
module_init calls take to execute.
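For illustration (module and algorithm names hypothetical), the rename is
purely mechanical:

  /* before: indistinguishable from every other "mod_init" in System.map */
  static int __init mod_init(void)
  {
          return crypto_register_shash(&some_alg);
  }
  module_init(mod_init);

  /* after: a unique, self-describing name */
  static int __init example_neon_mod_init(void)
  {
          return crypto_register_shash(&some_alg);
  }
  module_init(example_neon_mod_init);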
Signed-off-by: Randy Dunlap <rdunlap@infradead.org>
Cc: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: linux-arm-kernel@lists.infradead.org
Cc: patches@armlinux.org.uk
Acked-by: Russell King (Oracle) <rmk+kernel@armlinux.org.uk>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Generate the *.S files with Perl, like arch/{mips,x86}/crypto/Makefile does.
Signed-off-by: Masahiro Yamada <masahiroy@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Debian's clang carries a patch that makes the default FPU mode
'vfp3-d16' instead of 'neon' for 'armv7-a' to avoid generating NEON
instructions on hardware that does not support them:
5a61ca6f21/debian/patches/clang-arm-default-vfp3-on-armv7a.patch
https://bugs.debian.org/841474
https://bugs.debian.org/842142
https://bugs.debian.org/914268
This results in the following build error when clang's integrated
assembler is used because the '.arch' directive overrides the '.fpu'
directive:
arch/arm/crypto/curve25519-core.S:25:2: error: instruction requires: NEON
vmov.i32 q0, #1
^
arch/arm/crypto/curve25519-core.S:26:2: error: instruction requires: NEON
vshr.u64 q1, q0, #7
^
arch/arm/crypto/curve25519-core.S:27:2: error: instruction requires: NEON
vshr.u64 q0, q0, #8
^
arch/arm/crypto/curve25519-core.S:28:2: error: instruction requires: NEON
vmov.i32 d4, #19
^
Shuffle the order of the '.arch' and '.fpu' directives so that the code
builds regardless of the default FPU mode. This has been tested with
clang both with and without Debian's patch, as well as with GCC.
Cc: stable@vger.kernel.org
Fixes: d8f1308a02 ("crypto: arm/curve25519 - wire up NEON implementation")
Link: https://github.com/ClangBuiltLinux/continuous-integration2/issues/118
Reported-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Arnd Bergmann <arnd@arndb.de>
Suggested-by: Jessica Clarke <jrtc27@jrtc27.com>
Signed-off-by: Nathan Chancellor <nathan@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Reviewed-by: Nick Desaulniers <ndesaulniers@google.com>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
gcc-11 points out a mismatch between the declaration and the definition
of poly1305_core_setkey():
lib/crypto/poly1305-donna32.c:13:67: error: argument 2 of type ‘const u8[16]’ {aka ‘const unsigned char[16]’} with mismatched bound [-Werror=array-parameter=]
13 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 raw_key[16])
| ~~~~~~~~~^~~~~~~~~~~
In file included from lib/crypto/poly1305-donna32.c:11:
include/crypto/internal/poly1305.h:21:68: note: previously declared as ‘const u8 *’ {aka ‘const unsigned char *’}
21 | void poly1305_core_setkey(struct poly1305_core_key *key, const u8 *raw_key);
This is harmless in principle, as the calling conventions are the same,
but the more specific prototype allows better type checking in the
caller.
Change the declaration to match the actual function definition.
poly1305_simd_init() is a bit suspicious here, as it previously
had a 32-byte argument type, but it looks like it needs to take the
16-byte POLY1305_BLOCK_SIZE array instead.
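The shape of the header change (in include/crypto/internal/poly1305.h):

  /* before */
  void poly1305_core_setkey(struct poly1305_core_key *key, const u8 *raw_key);

  /* after: the bound matches the definition in poly1305-donna32.c */
  void poly1305_core_setkey(struct poly1305_core_key *key,
                            const u8 raw_key[POLY1305_BLOCK_SIZE]);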
Fixes: 1c08a10436 ("crypto: poly1305 - add new 32 and 64-bit generic versions")
Signed-off-by: Arnd Bergmann <arnd@arndb.de>
Reviewed-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Drop the local definition of a byte swapping macro and use the common
one instead.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The scalar AES implementation has some locally defined macros which
reimplement things that are now available in macros defined in
assembler.h. So let's switch to those.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Reviewed-by: Nicolas Pitre <nico@fluxnic.net>
Reviewed-by: Geert Uytterhoeven <geert+renesas@glider.be>
Reviewed-by: Linus Walleij <linus.walleij@linaro.org>
Reviewed-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The new ARM BLAKE2s code doesn't work correctly (fails the self-tests)
in big endian kernel builds because it doesn't swap the endianness of
the message words when loading them. Fix this.
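The fix itself is in the ARM assembly (byte-swapping the words on
big-endian builds); purely as a C-level illustration of the requirement,
BLAKE2s message words are little-endian by definition and must be loaded
accordingly:

  #include <asm/unaligned.h>

  static void blake2s_load_msg_words(const u8 *block, u32 m[16])
  {
          int i;

          /* endian-aware load instead of a raw native-endian word load */
          for (i = 0; i < 16; i++)
                  m[i] = get_unaligned_le32(block + 4 * i);
  }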
Fixes: 5172d322d3 ("crypto: arm/blake2s - add ARM scalar optimized BLAKE2s")
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Neither crypto_unregister_shashes() nor the module_exit function returns
a value, so the explicit 'return' is unnecessary.
Signed-off-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add a NEON-accelerated implementation of BLAKE2b.
On Cortex-A7 (which these days is the most common ARM processor that
doesn't have the ARMv8 Crypto Extensions), this is over twice as fast as
SHA-256, and slightly faster than SHA-1. It is also almost three times
as fast as the generic implementation of BLAKE2b:
Algorithm Cycles per byte (on 4096-byte messages)
=================== =======================================
blake2b-256-neon 14.0
sha1-neon 16.3
blake2s-256-arm 18.8
sha1-asm 20.8
blake2s-256-generic 26.0
sha256-neon 28.9
sha256-asm 32.0
blake2b-256-generic 38.9
This implementation isn't directly based on any other implementation,
but it borrows some ideas from previous NEON code I've written as well
as from chacha-neon-core.S. At least on Cortex-A7, it is faster than
the other NEON implementations of BLAKE2b I'm aware of (the
implementation in the BLAKE2 official repository using intrinsics, and
Andrew Moon's implementation which can be found in SUPERCOP). It does
only one block at a time, so it performs well on short messages too.
NEON-accelerated BLAKE2b is useful because there is interest in using
BLAKE2b-256 for dm-verity on low-end Android devices (specifically,
devices that lack the ARMv8 Crypto Extensions) to replace SHA-1. On
these devices, the performance cost of upgrading to SHA-256 may be
unacceptable, whereas BLAKE2b-256 would actually improve performance.
Although BLAKE2b is intended for 64-bit platforms (unlike BLAKE2s which
is intended for 32-bit platforms), on 32-bit ARM processors with NEON,
BLAKE2b is actually faster than BLAKE2s. This is because NEON supports
64-bit operations, and because BLAKE2s's block size is too small for
NEON to be helpful for it. The best I've been able to do with BLAKE2s
on Cortex-A7 is 18.8 cpb with an optimized scalar implementation.
(I didn't try BLAKE2sp and BLAKE3, which in theory would be faster, but
they're more complex as they require running multiple hashes at once.
Note that BLAKE2b already uses all the NEON bandwidth on the Cortex-A7,
so I expect that any speedup from BLAKE2sp or BLAKE3 would come only
from the smaller number of rounds, not from the extra parallelism.)
For now this BLAKE2b implementation is only wired up to the shash API,
since there is no library API for BLAKE2b yet. However, I've tried to
keep things consistent with BLAKE2s, e.g. by defining
blake2b_compress_arch() which is analogous to blake2s_compress_arch()
and could be exported for use by the library API later if needed.
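A hedged sketch of reaching it through the shash API (error handling
abbreviated):

  #include <crypto/hash.h>

  static int blake2b256_digest(const u8 *data, unsigned int len, u8 *out)
  {
          struct crypto_shash *tfm = crypto_alloc_shash("blake2b-256", 0, 0);
          int err;

          if (IS_ERR(tfm))
                  return PTR_ERR(tfm);
          {
                  SHASH_DESC_ON_STACK(desc, tfm);

                  desc->tfm = tfm;
                  err = crypto_shash_digest(desc, data, len, out);
          }
          crypto_free_shash(tfm);
          return err;
  }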
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Add an ARM scalar optimized implementation of BLAKE2s.
NEON isn't very useful for BLAKE2s because the BLAKE2s block size is too
small for NEON to help. Each NEON instruction would depend on the
previous one, resulting in poor performance.
With scalar instructions, on the other hand, we can take advantage of
ARM's "free" rotations (like I did in chacha-scalar-core.S) to get an
implementation that runs much faster than the C implementation.
Performance results on Cortex-A7 in cycles per byte using the shash API:
4096-byte messages:
blake2s-256-arm: 18.8
blake2s-256-generic: 26.0
500-byte messages:
blake2s-256-arm: 20.3
blake2s-256-generic: 27.9
100-byte messages:
blake2s-256-arm: 29.7
blake2s-256-generic: 39.2
32-byte messages:
blake2s-256-arm: 50.6
blake2s-256-generic: 66.2
Except on very short messages, this is still slower than the NEON
implementation of BLAKE2b which I've written; that is 14.0, 16.4, 25.8,
and 76.1 cpb on 4096, 500, 100, and 32-byte messages, respectively.
However, optimized BLAKE2s is useful for cases where BLAKE2s is used
instead of BLAKE2b, such as WireGuard.
This new implementation is added in the form of a new module
blake2s-arm.ko, which is analogous to blake2s-x86_64.ko in that it
provides blake2s_compress_arch() for use by the library API as well as
optionally registering the algorithms with the shash API.
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Eric Biggers <ebiggers@google.com>
Tested-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The cipher routines in the crypto API are mostly intended for templates
implementing skcipher modes generically in software, and shouldn't be
used outside of the crypto subsystem. So move the prototypes and all
related definitions to a new header file under include/crypto/internal.
Also, let's use the new module namespace feature to move the symbol
exports into a new namespace CRYPTO_INTERNAL.
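Concretely (a sketch of both sides of the namespace change):

  /* provider side, e.g. crypto/cipher.c */
  EXPORT_SYMBOL_NS_GPL(crypto_cipher_encrypt_one, CRYPTO_INTERNAL);

  /* consumer side, a template or arch glue module */
  #include <crypto/internal/cipher.h>
  MODULE_IMPORT_NS(CRYPTO_INTERNAL);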
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Eric Biggers <ebiggers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Commit 86cd97ec4b ("crypto: arm/chacha-neon - optimize for non-block
size multiples") refactored the chacha block handling in the glue code in
a way that may result in the counter increment to be omitted when calling
chacha_block_xor_neon() to process a full block. This violates the skcipher
API, which requires that the output IV is suitable for handling more input
as long as the preceding input has been presented in round multiples of the
block size. Also, the same code is exposed via the chacha library interface
whose callers may actually rely on this increment to occur even for final
blocks that are smaller than the chacha block size.
So increment the counter after calling chacha_block_xor_neon().
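The corrected pattern, sketched (surrounding glue code elided):

  chacha_block_xor_neon(state, dst, src, nrounds);
  state[12]++;    /* advance the block counter even for the final block */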
Fixes: 86cd97ec4b ("crypto: arm/chacha-neon - optimize for non-block size multiples")
Reported-by: Eric Biggers <ebiggers@kernel.org>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
ARM Cortex-A57 and Cortex-A72 cores running in 32-bit mode are affected
by silicon errata #1742098 and #1655431, respectively, where the second
instruction of an AES instruction pair may execute twice if an interrupt
is taken right after the first instruction consumes an input register of
which a single 32-bit lane has been updated the last time it was modified.
This is not such a rare occurrence as it may seem: in counter mode, only
the least significant 32-bit word is incremented in the absence of a
carry, which makes our counter mode implementation susceptible to these
errata.
So let's shuffle the counter assignments around a bit so that the most
recent updates when the AES instruction pair executes are 128-bit wide.
[0] ARM-EPM-049219 v23 Cortex-A57 MPCore Software Developers Errata Notice
[1] ARM-EPM-012079 v11.0 Cortex-A72 MPCore Software Developers Errata Notice
Cc: <stable@vger.kernel.org> # v5.4+
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Currently <crypto/sha.h> contains declarations for both SHA-1 and SHA-2,
and <crypto/sha3.h> contains declarations for SHA-3.
This organization is inconsistent, but more importantly SHA-1 is no
longer considered to be cryptographically secure. So to the extent
possible, SHA-1 shouldn't be grouped together with any of the other SHA
versions, and usage of it should be phased out.
Therefore, split <crypto/sha.h> into two headers <crypto/sha1.h> and
<crypto/sha2.h>, and make everyone explicitly specify whether they want
the declarations for SHA-1, SHA-2, or both.
This avoids making the SHA-1 declarations visible to files that don't
want anything to do with SHA-1. It also prepares for potentially moving
sha1.h into a new insecure/ or dangerous/ directory.
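After the split, each file includes only what it actually uses, e.g.:

  #include <crypto/sha2.h>   /* SHA-224/256/384/512 declarations only */

  /* and a file that really does need SHA-1 now has to say so explicitly */
  #include <crypto/sha1.h>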
Signed-off-by: Eric Biggers <ebiggers@google.com>
Acked-by: Ard Biesheuvel <ardb@kernel.org>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The current NEON based ChaCha implementation for ARM is optimized for
multiples of 4x the ChaCha block size (64 bytes). This makes sense for
block encryption, but given that ChaCha is also often used in the
context of networking, it makes sense to consider arbitrary length
inputs as well.
For example, WireGuard typically uses 1420 byte packets, and performing
ChaCha encryption involves 5 invocations of chacha_4block_xor_neon()
and 3 invocations of chacha_block_xor_neon(), where the last one also
involves a memcpy() using a buffer on the stack to process the final
chunk of 1420 % 64 == 12 bytes.
Let's optimize for this case as well, by letting chacha_4block_xor_neon()
deal with any input size between 64 and 256 bytes, using NEON permutation
instructions and overlapping loads and stores. This way, the 140 byte
tail of a 1420 byte input buffer can simply be processed in one go.
This results in the following performance improvements for 1420 byte
blocks, without significant impact on power-of-2 input sizes. (Note
that Raspberry Pi is widely used in combination with a 32-bit kernel,
even though the core is 64-bit capable)
Cortex-A8 (BeagleBone) : 7%
Cortex-A15 (Calxeda Midway) : 21%
Cortex-A53 (Raspberry Pi 3) : 3%
Cortex-A72 (Raspberry Pi 4) : 19%
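A sketch of the resulting glue-level loop (details approximate; the NEON
routine is assumed to take the byte count as an extra argument):

  while (bytes > CHACHA_BLOCK_SIZE) {
          unsigned int l = min(bytes, CHACHA_BLOCK_SIZE * 4U);

          /* handles any length in (64, 256] bytes, using overlapping
           * loads/stores for the non-multiple-of-64 tail */
          chacha_4block_xor_neon(state, dst, src, nrounds, l);
          bytes -= l;
          src += l;
          dst += l;
          state[12] += DIV_ROUND_UP(l, CHACHA_BLOCK_SIZE);
  }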
Cc: Eric Biggers <ebiggers@google.com>
Cc: "Jason A . Donenfeld" <Jason@zx2c4.com>
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Loading the module deadlocks since:
- the local cbc(aes) implementation needs a fallback, and
- the crypto API tries to find one, but the request_module() resolves back
to the same module
Fix this by changing the module alias for cbc(aes) and
using the NEED_FALLBACK flag when requesting a fallback algorithm.
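The fallback-allocation half of the fix follows the usual pattern for
CRYPTO_ALG_NEED_FALLBACK (a sketch, not the literal patch): the flag is
passed in the mask when requesting the fallback so the lookup cannot
resolve back to a driver that itself needs a fallback:

  ctx->fallback = crypto_alloc_skcipher("cbc(aes)", 0,
                                        CRYPTO_ALG_NEED_FALLBACK);
  if (IS_ERR(ctx->fallback))
          return PTR_ERR(ctx->fallback);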
Fixes: 00b99ad2ba ("crypto: arm/aes-neonbs - Use generic cbc encryption path")
Signed-off-by: Horia Geantă <horia.geanta@nxp.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Use the typed skcipher init/exit routines instead of the generic
cra_init/_exit routines when instantiating/releasing the XTS
skciphers.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Reordering the tweak is never necessary for encryption, so avoid the
argument load on the encryption path.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Instead of using a homegrown macrofied version of the adr instruction
that sets the Thumb bit in the output value, only to ensure that any
bx instructions consuming that value will not switch out of Thumb mode
when branching, use non-interworking mov (to PC) instructions, which
achieve the same thing.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The ADRL pseudo instruction is not an architectural construct, but a
convenience macro that was supported by the ARM proprietary assembler
and adopted by binutils GAS as well, but only when assembling in 32-bit
ARM mode. Therefore, it can only be used in assembler code that is known
to assemble in ARM mode only, but as it turns out, the Clang assembler
does not implement ADRL at all, and so it is better to get rid of it
entirely.
So replace the ADRL instruction with an ADR instruction that refers to
a nearer symbol, and apply the delta explicitly using an additional
instruction.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
The ADRL pseudo instruction is not an architectural construct, but a
convenience macro that was supported by the ARM proprietary assembler
and adopted by binutils GAS as well, but only when assembling in 32-bit
ARM mode. Therefore, it can only be used in assembler code that is known
to assemble in ARM mode only, but as it turns out, the Clang assembler
does not implement ADRL at all, and so it is better to get rid of it
entirely.
So replace the ADRL instruction with an ADR instruction that refers to
a nearer symbol, and apply the delta explicitly using an additional
instruction.
Signed-off-by: Ard Biesheuvel <ardb@kernel.org>
Tested-by: Nick Desaulniers <ndesaulniers@google.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Since commit b56f5cbc7e ("crypto:
arm/aes-neonbs - resolve fallback cipher at runtime") the CBC
encryption path in aes-neonbs is now identical to that obtained
through the cbc template. This means that it can simply call
the generic cbc template instead of doing its own thing.
This patch removes the custom encryption path and simply invokes
the generic cbc template.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
This patch adds a prototype for poly1305_blocks_neon to silence
a compiler warning:
CC [M] arch/arm/crypto/poly1305-glue.o
../arch/arm/crypto/poly1305-glue.c:25:13: warning: no previous prototype for `poly1305_blocks_neon' [-Wmissing-prototypes]
void __weak poly1305_blocks_neon(void *state, const u8 *src, u32 len, u32 hibit)
^~~~~~~~~~~~~~~~~~~~
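The added declaration simply mirrors the definition the warning points at:

  void poly1305_blocks_neon(void *state, const u8 *src, u32 len, u32 hibit);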
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
Acked-by: Jason A. Donenfeld <Jason@zx2c4.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>