linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-13 14:24:11 +08:00

Author	SHA1	Message	Date
Eric Biggers	1ca1b91794	crypto: chacha20-generic - refactor to allow varying number of rounds In preparation for adding XChaCha12 support, rename/refactor chacha20-generic to support different numbers of rounds. The justification for needing XChaCha12 support is explained in more detail in the patch "crypto: chacha - add XChaCha12 support". The only difference between ChaCha{8,12,20} are the number of rounds itself; all other parts of the algorithm are the same. Therefore, remove the "20" from all definitions, structures, functions, files, etc. that will be shared by all ChaCha versions. Also make ->setkey() store the round count in the chacha_ctx (previously chacha20_ctx). The generic code then passes the round count through to chacha_block(). There will be a ->setkey() function for each explicitly allowed round count; the encrypt/decrypt functions will be the same. I decided not to do it the opposite way (same ->setkey() function for all round counts, with different encrypt/decrypt functions) because that would have required more boilerplate code in architecture-specific implementations of ChaCha and XChaCha. Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Martin Willi <martin@strongswan.org> Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-11-20 14:26:55 +08:00
Ard Biesheuvel	cc3cc48972	crypto: arm64/aes-blk - ensure XTS mask is always loaded Commit `2e5d2f33d1` ("crypto: arm64/aes-blk - improve XTS mask handling") optimized away some reloads of the XTS mask vector, but failed to take into account that calls into the XTS en/decrypt routines will take a slightly different code path if a single block of input is split across different buffers. So let's ensure that the first load occurs unconditionally, and move the reload to the end so it doesn't occur needlessly. Fixes: `2e5d2f33d1` ("crypto: arm64/aes-blk - improve XTS mask handling") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-10-12 14:20:45 +08:00
Eric Biggers	7ff9036a62	crypto: arm64/aes - fix handling sub-block CTS-CBC inputs In the new arm64 CTS-CBC implementation, return an error code rather than crashing on inputs shorter than AES_BLOCK_SIZE bytes. Also set cra_blocksize to AES_BLOCK_SIZE (like is done in the cts template) to indicate the minimum input size. Fixes: `dd597fb33f` ("crypto: arm64/aes-blk - add support for CTS-CBC mode") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-10-08 13:47:02 +08:00
Ard Biesheuvel	2e5d2f33d1	crypto: arm64/aes-blk - improve XTS mask handling The Crypto Extension instantiation of the aes-modes.S collection of skciphers uses only 15 NEON registers for the round key array, whereas the pure NEON flavor uses 16 NEON registers for the AES S-box. This means we have a spare register available that we can use to hold the XTS mask vector, removing the need to reload it at every iteration of the inner loop. Since the pure NEON version does not permit this optimization, tweak the macros so we can factor out this functionality. Also, replace the literal load with a short sequence to compose the mask vector. On Cortex-A53, this results in a ~4% speedup. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-21 13:24:50 +08:00
Ard Biesheuvel	dd597fb33f	crypto: arm64/aes-blk - add support for CTS-CBC mode Currently, we rely on the generic CTS chaining mode wrapper to instantiate the cts(cbc(aes)) skcipher. Due to the high performance of the ARMv8 Crypto Extensions AES instructions (~1 cycles per byte), any overhead in the chaining mode layers is amplified, and so it pays off considerably to fold the CTS handling into the SIMD routines. On Cortex-A53, this results in a ~50% speedup for smaller input sizes. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-21 13:24:50 +08:00
Ard Biesheuvel	6e7de6af91	crypto: arm64/aes-blk - revert NEON yield for skciphers The reasoning of commit `f10dc56c64` ("crypto: arm64 - revert NEON yield for fast AEAD implementations") applies equally to skciphers: the walk API already guarantees that the input size of each call into the NEON code is bounded to the size of a page, and so there is no need for an additional TIF_NEED_RESCHED flag check inside the inner loop. So revert the skcipher changes to aes-modes.S (but retain the mac ones) This partially reverts commit `0c8f838a52`. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-21 13:24:50 +08:00
Ard Biesheuvel	557ecb4543	crypto: arm64/aes-blk - remove pointless (u8 ) casts For some reason, the asmlinkage prototypes of the NEON routines take u8[] arguments for the round key arrays, while the actual round keys are arrays of u32, and so passing them into those routines requires u8 casts at each occurrence. Fix that. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-21 13:24:50 +08:00
Ard Biesheuvel	2fffee536c	crypto: arm64/crct10dif - implement non-Crypto Extensions alternative The arm64 implementation of the CRC-T10DIF algorithm uses the 64x64 bit polynomial multiplication instructions, which are optional in the architecture, and if these instructions are not available, we fall back to the C routine which is slow and inefficient. So let's reuse the 64x64 bit PMULL alternative from the GHASH driver that uses a sequence of ~40 instructions involving 8x8 bit PMULL and some shifting and masking. This is a lot slower than the original, but it is still twice as fast as the current [unoptimized] C code on Cortex-A53, and it is time invariant and much easier on the D-cache. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-04 11:37:04 +08:00
Ard Biesheuvel	6c1b0da13e	crypto: arm64/crct10dif - preparatory refactor for 8x8 PMULL version Reorganize the CRC-T10DIF asm routine so we can easily instantiate an alternative version based on 8x8 polynomial multiplication in a subsequent patch. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-04 11:37:04 +08:00
Ard Biesheuvel	598b7d41e5	crypto: arm64/crc32 - remove PMULL based CRC32 driver Now that the scalar fallbacks have been moved out of this driver into the core crc32()/crc32c() routines, we are left with a CRC32 crypto API driver for arm64 that is based only on 64x64 polynomial multiplication, which is an optional instruction in the ARMv8 architecture, and is less and less likely to be available on cores that do not also implement the CRC32 instructions, given that those are mandatory in the architecture as of ARMv8.1. Since the scalar instructions do not require the special handling that SIMD instructions do, and since they turn out to be considerably faster on some cores (Cortex-A53) as well, there is really no point in keeping this code around so let's just remove it. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-04 11:37:04 +08:00
Ard Biesheuvel	ed6ed11830	crypto: arm64/aes-modes - get rid of literal load of addend vector Replace the literal load of the addend vector with a sequence that performs each add individually. This sequence is only 2 instructions longer than the original, and 2% faster on Cortex-A53. This is an improvement by itself, but also works around a Clang issue, whose integrated assembler does not implement the GNU ARM asm syntax completely, and does not support the =literal notation for FP registers (more info at https://bugs.llvm.org/show_bug.cgi?id=38642) Cc: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Reviewed-by: Nick Desaulniers <ndesaulniers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-04 11:37:04 +08:00
Jason A. Donenfeld	578bdaabd0	crypto: speck - remove Speck These are unused, undesired, and have never actually been used by anybody. The original authors of this code have changed their mind about its inclusion. While originally proposed for disk encryption on low-end devices, the idea was discarded [1] in favor of something else before that could really get going. Therefore, this patch removes Speck. [1] https://marc.info/?l=linux-crypto-vger&m=153359499015659 Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Acked-by: Eric Biggers <ebiggers@google.com> Cc: stable@vger.kernel.org Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-09-04 11:35:03 +08:00
Ard Biesheuvel	c2b24c36e0	crypto: arm64/aes-gcm-ce - fix scatterwalk API violation Commit `71e52c278c` ("crypto: arm64/aes-ce-gcm - operate on two input blocks at a time") modified the granularity at which the AES/GCM code processes its input to allow subsequent changes to be applied that improve performance by using aggregation to process multiple input blocks at once. For this reason, it doubled the algorithm's 'chunksize' property to 2 x AES_BLOCK_SIZE, but retained the non-SIMD fallback path that processes a single block at a time. In some cases, this violates the skcipher scatterwalk API, by calling skcipher_walk_done() with a non-zero residue value for a chunk that is expected to be handled in its entirety. This results in a WARN_ON() to be hit by the TLS self test code, but is likely to break other user cases as well. Unfortunately, none of the current test cases exercises this exact code path at the moment. Fixes: `71e52c278c` ("crypto: arm64/aes-ce-gcm - operate on two ...") Reported-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Tested-by: Vakul Garg <vakul.garg@nxp.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-08-25 19:50:43 +08:00
Ard Biesheuvel	7fa885e2a2	crypto: arm64/sm4-ce - check for the right CPU feature bit ARMv8.2 specifies special instructions for the SM3 cryptographic hash and the SM4 symmetric cipher. While it is unlikely that a core would implement one and not the other, we should only use SM4 instructions if the SM4 CPU feature bit is set, and we currently check the SM3 feature bit instead. So fix that. Fixes: `e99ce921c4` ("crypto: arm64 - add support for SM4...") Cc: <stable@vger.kernel.org> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-08-25 19:50:41 +08:00
Ard Biesheuvel	22240df7ac	crypto: arm64/ghash-ce - implement 4-way aggregation Enhance the GHASH implementation that uses 64-bit polynomial multiplication by adding support for 4-way aggregation. This more than doubles the performance, from 2.4 cycles per byte to 1.1 cpb on Cortex-A53. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-08-07 17:51:40 +08:00
Ard Biesheuvel	8e492eff7d	crypto: arm64/ghash-ce - replace NEON yield check with block limit Checking the TIF_NEED_RESCHED flag is disproportionately costly on cores with fast crypto instructions and comparatively slow memory accesses. On algorithms such as GHASH, which executes at ~1 cycle per byte on cores that implement support for 64 bit polynomial multiplication, there is really no need to check the TIF_NEED_RESCHED particularly often, and so we can remove the NEON yield check from the assembler routines. However, unlike the AEAD or skcipher APIs, the shash/ahash APIs take arbitrary input lengths, and so there needs to be some sanity check to ensure that we don't hog the CPU for excessive amounts of time. So let's simply cap the maximum input size that is processed in one go to 64 KB. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-08-07 17:51:39 +08:00
Ard Biesheuvel	30f1a9f53e	crypto: arm64/aes-ce-gcm - don't reload key schedule if avoidable Squeeze out another 5% of performance by minimizing the number of invocations of kernel_neon_begin()/kernel_neon_end() on the common path, which also allows some reloads of the key schedule to be optimized away. The resulting code runs at 2.3 cycles per byte on a Cortex-A53. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-08-07 17:38:04 +08:00
Ard Biesheuvel	e0bd888dc4	crypto: arm64/aes-ce-gcm - implement 2-way aggregation Implement a faster version of the GHASH transform which amortizes the reduction modulo the characteristic polynomial across two input blocks at a time. On a Cortex-A53, the gcm(aes) performance increases 24%, from 3.0 cycles per byte to 2.4 cpb for large input sizes. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-08-07 17:38:04 +08:00
Ard Biesheuvel	71e52c278c	crypto: arm64/aes-ce-gcm - operate on two input blocks at a time Update the core AES/GCM transform and the associated plumbing to operate on 2 AES/GHASH blocks at a time. By itself, this is not expected to result in a noticeable speedup, but it paves the way for reimplementing the GHASH component using 2-way aggregation. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-08-07 17:38:04 +08:00
Herbert Xu	3465893d27	Merge git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6 Merge crypto-2.6 to pick up NEON yield revert.	2018-08-07 17:37:10 +08:00
Ard Biesheuvel	f10dc56c64	crypto: arm64 - revert NEON yield for fast AEAD implementations As it turns out, checking the TIF_NEED_RESCHED flag after each iteration results in a significant performance regression (~10%) when running fast algorithms (i.e., ones that use special instructions and operate in the < 4 cycles per byte range) on in-order cores with comparatively slow memory accesses such as the Cortex-A53. Given the speed of these ciphers, and the fact that the page based nature of the AEAD scatterwalk API guarantees that the core NEON transform is never invoked with more than a single page's worth of input, we can estimate the worst case duration of any resulting scheduling blackout: on a 1 GHz Cortex-A53 running with 64k pages, processing a page's worth of input at 4 cycles per byte results in a delay of ~250 us, which is a reasonable upper bound. So let's remove the yield checks from the fused AES-CCM and AES-GCM routines entirely. This reverts commit `7b67ae4d5c` and partially reverts commit `7c50136a8a`. Fixes: `7c50136a8a` ("crypto: arm64/aes-ghash - yield NEON after every ...") Fixes: `7b67ae4d5c` ("crypto: arm64/aes-ccm - yield NEON after every ...") Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-08-07 17:26:23 +08:00
Herbert Xu	c5f5aeef9b	Merge git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux Merge mainline to pick up `c7513c2a27` ("crypto/arm64: aes-ce-gcm - add missing kernel_neon_begin/end pair").	2018-08-03 17:55:12 +08:00
Ard Biesheuvel	c7513c2a27	crypto/arm64: aes-ce-gcm - add missing kernel_neon_begin/end pair Calling pmull_gcm_encrypt_block() requires kernel_neon_begin() and kernel_neon_end() to be used since the routine touches the NEON register file. Add the missing calls. Also, since NEON register contents are not preserved outside of a kernel mode NEON region, pass the key schedule array again. Fixes: `7c50136a8a` ("crypto: arm64/aes-ghash - yield NEON after every ...") Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Will Deacon <will.deacon@arm.com>	2018-07-31 13:20:30 +01:00
Eric Biggers	6b0daa7820	crypto: arm64/sha256 - increase cra_priority of scalar implementations Commit `b73b7ac0a7` ("crypto: sha256_generic - add cra_priority") gave sha256-generic and sha224-generic a cra_priority of 100, to match the convention for generic implementations. But sha256-arm64 and sha224-arm64 also have priority 100, so their order relative to the generic implementations became ambiguous. Therefore, increase their priority to 125 so that they have higher priority than the generic implementations but lower priority than the NEON implementations which have priority 150. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-07-27 19:16:38 +08:00
Eric Biggers	e50944e219	crypto: shash - remove useless setting of type flags Many shash algorithms set .cra_flags = CRYPTO_ALG_TYPE_SHASH. But this is redundant with the C structure type ('struct shash_alg'), and crypto_register_shash() already sets the type flag automatically, clearing any type flag that was already there. Apparently the useless assignment has just been copy+pasted around. So, remove the useless assignment from all the shash algorithms. This patch shouldn't change any actual behavior. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-07-09 00:30:24 +08:00
Jia He	6e88f01206	crypto: arm64/aes-blk - fix and move skcipher_walk_done out of kernel_neon_begin, _end In a arm64 server(QDF2400),I met a similar might-sleep warning as [1]: [ 7.019116] BUG: sleeping function called from invalid context at ./include/crypto/algapi.h:416 [ 7.027863] in_atomic(): 1, irqs_disabled(): 0, pid: 410, name: cryptomgr_test [ 7.035106] 1 lock held by cryptomgr_test/410: [ 7.039549] #0: (ptrval) (&drbg->drbg_mutex){+.+.}, at: drbg_instantiate+0x34/0x398 [ 7.048038] CPU: 9 PID: 410 Comm: cryptomgr_test Not tainted 4.17.0-rc6+ #27 [ 7.068228] dump_backtrace+0x0/0x1c0 [ 7.071890] show_stack+0x24/0x30 [ 7.075208] dump_stack+0xb0/0xec [ 7.078523] ___might_sleep+0x160/0x238 [ 7.082360] skcipher_walk_done+0x118/0x2c8 [ 7.086545] ctr_encrypt+0x98/0x130 [ 7.090035] simd_skcipher_encrypt+0x68/0xc0 [ 7.094304] drbg_kcapi_sym_ctr+0xd4/0x1f8 [ 7.098400] drbg_ctr_update+0x98/0x330 [ 7.102236] drbg_seed+0x1b8/0x2f0 [ 7.105637] drbg_instantiate+0x2ac/0x398 [ 7.109646] drbg_kcapi_seed+0xbc/0x188 [ 7.113482] crypto_rng_reset+0x4c/0xb0 [ 7.117319] alg_test_drbg+0xec/0x330 [ 7.120981] alg_test.part.6+0x1c8/0x3c8 [ 7.124903] alg_test+0x58/0xa0 [ 7.128044] cryptomgr_test+0x50/0x58 [ 7.131708] kthread+0x134/0x138 [ 7.134936] ret_from_fork+0x10/0x1c Seems there is a bug in Ard Biesheuvel's commit. Fixes: `6833817472` ("crypto: arm64/aes-blk - move kernel mode neon en/disable into loop") [1] https://www.spinics.net/lists/linux-crypto/msg33103.html Signed-off-by: jia.he@hxt-semitech.com Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Cc: <stable@vger.kernel.org> # 4.17 Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-06-15 23:06:46 +08:00
Adam Langley	c2e415fe75	crypto: clarify licensing of OpenSSL asm code Several source files have been taken from OpenSSL. In some of them a comment that "permission to use under GPL terms is granted" was included below a contradictory license statement. In several cases, there was no indication that the license of the code was compatible with the GPLv2. This change clarifies the licensing for all of these files. I've confirmed with the author (Andy Polyakov) that a) he has licensed the files with the GPLv2 comment under that license and b) that he's also happy to license the other files under GPLv2 too. In one case, the file is already contained in his CRYPTOGAMS bundle, which has a GPLv2 option, and so no special measures are needed. In all cases, the license status of code has been clarified by making the GPLv2 license prominent. The .S files have been regenerated from the updated .pl files. This is a comment-only change. No code is changed. Signed-off-by: Adam Langley <agl@chromium.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-05-31 00:13:44 +08:00
Ard Biesheuvel	6caf7adc5e	crypto: arm64/sha512-ce - yield NEON after every block of input Avoid excessive scheduling delays under a preemptible kernel by conditionally yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-05-12 00:13:12 +08:00
Ard Biesheuvel	7edc86cb1c	crypto: arm64/sha3-ce - yield NEON after every block of input Avoid excessive scheduling delays under a preemptible kernel by conditionally yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-05-12 00:13:11 +08:00
Ard Biesheuvel	5b3da65177	crypto: arm64/crct10dif-ce - yield NEON after every block of input Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-05-12 00:13:11 +08:00
Ard Biesheuvel	4e530fba69	crypto: arm64/crc32-ce - yield NEON after every block of input Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-05-12 00:13:10 +08:00
Ard Biesheuvel	7c50136a8a	crypto: arm64/aes-ghash - yield NEON after every block of input Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-05-12 00:13:10 +08:00
Ard Biesheuvel	20ab633258	crypto: arm64/aes-bs - yield NEON after every block of input Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-05-12 00:13:09 +08:00
Ard Biesheuvel	0c8f838a52	crypto: arm64/aes-blk - yield NEON after every block of input Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-05-12 00:13:08 +08:00
Ard Biesheuvel	7b67ae4d5c	crypto: arm64/aes-ccm - yield NEON after every block of input Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-05-12 00:13:07 +08:00
Ard Biesheuvel	d82f37ab5e	crypto: arm64/sha2-ce - yield NEON after every block of input Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-05-12 00:13:06 +08:00
Ard Biesheuvel	7df8d16475	crypto: arm64/sha1-ce - yield NEON after every block of input Avoid excessive scheduling delays under a preemptible kernel by yielding the NEON after every block of input. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-05-12 00:13:05 +08:00
Ard Biesheuvel	e99ce921c4	crypto: arm64 - add support for SM4 encryption using special instructions Add support for the SM4 symmetric cipher implemented using the special SM4 instructions introduced in ARM architecture revision 8.2. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-05-05 14:52:53 +08:00
Masahiro Yamada	54a702f705	kbuild: mark $(targets) as .SECONDARY and remove .PRECIOUS markers GNU Make automatically deletes intermediate files that are updated in a chain of pattern rules. Example 1) %.dtb.o <- %.dtb.S <- %.dtb <- %.dts Example 2) %.o <- %.c <- %.c_shipped A couple of makefiles mark such targets as .PRECIOUS to prevent Make from deleting them, but the correct way is to use .SECONDARY. .SECONDARY Prerequisites of this special target are treated as intermediate files but are never automatically deleted. .PRECIOUS When make is interrupted during execution, it may delete the target file it is updating if the file was modified since make started. If you mark the file as precious, make will never delete the file if interrupted. Both can avoid deletion of intermediate files, but the difference is the behavior when Make is interrupted; .SECONDARY deletes the target, but .PRECIOUS does not. The use of .PRECIOUS is relatively rare since we do not want to keep partially constructed (possibly corrupted) targets. Another difference is that .PRECIOUS works with pattern rules whereas .SECONDARY does not. .PRECIOUS: $(obj)/%.lex.c works, but .SECONDARY: $(obj)/%.lex.c has no effect. However, for the reason above, I do not want to use .PRECIOUS which could cause obscure build breakage. The targets specified as .SECONDARY must be explicit. $(targets) contains all targets that need to include ..cmd files. So, the intermediates you want to keep are mostly in there. Therefore, mark $(targets) as .SECONDARY. It means primary targets are also marked as .SECONDARY, but I do not see any drawback for this. I replaced some .SECONDARY / .PRECIOUS markers with 'targets'. This will make Kbuild search for non-existing ..cmd files, but this is not a noticeable performance issue. Signed-off-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Frank Rowand <frowand.list@gmail.com> Acked-by: Ingo Molnar <mingo@kernel.org>	2018-04-07 19:04:02 +09:00
Leonard Crestez	6aaf49b495	crypto: arm,arm64 - Fix random regeneration of S_shipped The decision to rebuild .S_shipped is made based on the relative timestamps of .S_shipped and .pl files but git makes this essentially random. This means that the perl script might run anyway (usually at most once per checkout), defeating the whole purpose of _shipped. Fix by skipping the rule unless explicit make variables are provided: REGENERATE_ARM_CRYPTO or REGENERATE_ARM64_CRYPTO. This can produce nasty occasional build failures downstream, for example for toolchains with broken perl. The solution is minimally intrusive to make it easier to push into stable. Another report on a similar issue here: https://lkml.org/lkml/2018/3/8/1379 Signed-off-by: Leonard Crestez <leonard.crestez@nxp.com> Cc: <stable@vger.kernel.org> Reviewed-by: Masahiro Yamada <yamada.masahiro@socionext.com> Acked-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-03-23 23:43:19 +08:00
Ard Biesheuvel	1a3713c7cd	crypto: arm64/sha256-neon - play nice with CONFIG_PREEMPT kernels Tweak the SHA256 update routines to invoke the SHA256 block transform block by block, to avoid excessive scheduling delays caused by the NEON algorithm running with preemption disabled. Also, remove a stale comment which no longer applies now that kernel mode NEON is actually disallowed in some contexts. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-03-16 23:35:58 +08:00
Ard Biesheuvel	870c163a0e	crypto: arm64/aes-blk - add 4 way interleave to CBC-MAC encrypt path CBC MAC is strictly sequential, and so the current AES code simply processes the input one block at a time. However, we are about to add yield support, which adds a bit of overhead, and which we prefer to align with other modes in terms of granularity (i.e., it is better to have all routines yield every 64 bytes and not have an exception for CBC MAC which yields every 16 bytes) So unroll the loop by 4. We still cannot perform the AES algorithm in parallel, but we can at least merge the loads and stores. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-03-16 23:35:58 +08:00
Ard Biesheuvel	a8f8a69e82	crypto: arm64/aes-blk - add 4 way interleave to CBC encrypt path CBC encryption is strictly sequential, and so the current AES code simply processes the input one block at a time. However, we are about to add yield support, which adds a bit of overhead, and which we prefer to align with other modes in terms of granularity (i.e., it is better to have all routines yield every 64 bytes and not have an exception for CBC encrypt which yields every 16 bytes) So unroll the loop by 4. We still cannot perform the AES algorithm in parallel, but we can at least merge the loads and stores. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-03-16 23:35:57 +08:00
Ard Biesheuvel	55868b45cf	crypto: arm64/aes-blk - remove configurable interleave The AES block mode implementation using Crypto Extensions or plain NEON was written before real hardware existed, and so its interleave factor was made build time configurable (as well as an option to instantiate all interleaved sequences inline rather than as subroutines) We ended up using INTERLEAVE=4 with inlining disabled for both flavors of the core AES routines, so let's stick with that, and remove the option to configure this at build time. This makes the code easier to modify, which is nice now that we're adding yield support. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-03-16 23:35:55 +08:00
Ard Biesheuvel	4bf7e7a19d	crypto: arm64/chacha20 - move kernel mode neon en/disable into loop When kernel mode NEON was first introduced on arm64, the preserve and restore of the userland NEON state was completely unoptimized, and involved saving all registers on each call to kernel_neon_begin(), and restoring them on each call to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-03-16 23:35:55 +08:00
Ard Biesheuvel	78ad7b08d8	crypto: arm64/aes-bs - move kernel mode neon en/disable into loop When kernel mode NEON was first introduced on arm64, the preserve and restore of the userland NEON state was completely unoptimized, and involved saving all registers on each call to kernel_neon_begin(), and restoring them on each call to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-03-16 23:35:55 +08:00
Ard Biesheuvel	6833817472	crypto: arm64/aes-blk - move kernel mode neon en/disable into loop When kernel mode NEON was first introduced on arm64, the preserve and restore of the userland NEON state was completely unoptimized, and involved saving all registers on each call to kernel_neon_begin(), and restoring them on each call to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Note that this requires some reshuffling of the registers in the asm code, because the XTS routines can no longer rely on the registers to retain their contents between invocations. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-03-16 23:35:54 +08:00
Ard Biesheuvel	bd2ad885e3	crypto: arm64/aes-ce-ccm - move kernel mode neon en/disable into loop When kernel mode NEON was first introduced on arm64, the preserve and restore of the userland NEON state was completely unoptimized, and involved saving all registers on each call to kernel_neon_begin(), and restoring them on each call to kernel_neon_end(). For this reason, the NEON crypto code that was introduced at the time keeps the NEON enabled throughout the execution of the crypto API methods, which may include calls back into the crypto API that could result in memory allocation or other actions that we should avoid when running with preemption disabled. Since then, we have optimized the kernel mode NEON handling, which now restores lazily (upon return to userland), and so the preserve action is only costly the first time it is called after entering the kernel. So let's put the kernel_neon_begin() and kernel_neon_end() calls around the actual invocations of the NEON crypto code, and run the remainder of the code with kernel mode NEON disabled (and preemption enabled) Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-03-16 23:35:54 +08:00
Eric Biggers	91a2abb78f	crypto: arm64/speck - add NEON-accelerated implementation of Speck-XTS Add a NEON-accelerated implementation of Speck128-XTS and Speck64-XTS for ARM64. This is ported from the 32-bit version. It may be useful on devices with 64-bit ARM CPUs that don't have the Cryptography Extensions, so cannot do AES efficiently -- e.g. the Cortex-A53 processor on the Raspberry Pi 3. It generally works the same way as the 32-bit version, but there are some slight differences due to the different instructions, registers, and syntax available in ARM64 vs. in ARM32. For example, in the 64-bit version there are enough registers to hold the XTS tweaks for each 128-byte chunk, so they don't need to be saved on the stack. Benchmarks on a Raspberry Pi 3 running a 64-bit kernel: Algorithm Encryption Decryption --------- ---------- ---------- Speck64/128-XTS (NEON) 92.2 MB/s 92.2 MB/s Speck128/256-XTS (NEON) 75.0 MB/s 75.0 MB/s Speck128/256-XTS (generic) 47.4 MB/s 35.6 MB/s AES-128-XTS (NEON bit-sliced) 33.4 MB/s 29.6 MB/s AES-256-XTS (NEON bit-sliced) 24.6 MB/s 21.7 MB/s The code performs well on higher-end ARM64 processors as well, though such processors tend to have the Crypto Extensions which make AES preferred. For example, here are the same benchmarks run on a HiKey960 (with CPU affinity set for the A73 cores), with the Crypto Extensions implementation of AES-256-XTS added: Algorithm Encryption Decryption --------- ----------- ----------- AES-256-XTS (Crypto Extensions) 1273.3 MB/s 1274.7 MB/s Speck64/128-XTS (NEON) 359.8 MB/s 348.0 MB/s Speck128/256-XTS (NEON) 292.5 MB/s 286.1 MB/s Speck128/256-XTS (generic) 186.3 MB/s 181.8 MB/s AES-128-XTS (NEON bit-sliced) 142.0 MB/s 124.3 MB/s AES-256-XTS (NEON bit-sliced) 104.7 MB/s 91.1 MB/s Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-03-16 23:35:41 +08:00
Ard Biesheuvel	fb87127bce	crypto: arm64/sha512 - fix/improve new v8.2 Crypto Extensions code Add a missing symbol export that prevents this code to be built as a module. Also, move the round constant table to the .rodata section, and use a more optimized version of the core transform. Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2018-01-26 01:10:36 +11:00

1 2 3 4

152 Commits