Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6
Pull crypto updates from Herbert Xu:
 "API:
   - Add 1472-byte test to tcrypt for IPsec
   - Reintroduced crypto stats interface with numerous changes
   - Support incremental algorithm dumps

  Algorithms:
   - Add xchacha12/20
   - Add nhpoly1305
   - Add adiantum
   - Add streebog hash
   - Mark cts(cbc(aes)) as FIPS allowed

  Drivers:
   - Improve performance of arm64/chacha20
   - Improve performance of x86/chacha20
   - Add NEON-accelerated nhpoly1305
   - Add SSE2 accelerated nhpoly1305
   - Add AVX2 accelerated nhpoly1305
   - Add support for 192/256-bit keys in gcmaes AVX
   - Add SG support in gcmaes AVX
   - ESN for inline IPsec tx in chcr
   - Add support for CryptoCell 703 in ccree
   - Add support for CryptoCell 713 in ccree
   - Add SM4 support in ccree
   - Add SM3 support in ccree
   - Add support for chacha20 in caam/qi2
   - Add support for chacha20 + poly1305 in caam/jr
   - Add support for chacha20 + poly1305 in caam/qi2
   - Add AEAD cipher support in cavium/nitrox"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (130 commits)
  crypto: skcipher - remove remnants of internal IV generators
  crypto: cavium/nitrox - Fix build with !CONFIG_DEBUG_FS
  crypto: salsa20-generic - don't unnecessarily use atomic walk
  crypto: skcipher - add might_sleep() to skcipher_walk_virt()
  crypto: x86/chacha - avoid sleeping under kernel_fpu_begin()
  crypto: cavium/nitrox - Added AEAD cipher support
  crypto: mxc-scc - fix build warnings on ARM64
  crypto: api - document missing stats member
  crypto: user - remove unused dump functions
  crypto: chelsio - Fix wrong error counter increments
  crypto: chelsio - Reset counters on cxgb4 Detach
  crypto: chelsio - Handle PCI shutdown event
  crypto: chelsio - cleanup:send addr as value in function argument
  crypto: chelsio - Use same value for both channel in single WR
  crypto: chelsio - Swap location of AAD and IV sent in WR
  crypto: chelsio - remove set but not used variable 'kctx_len'
  crypto: ux500 - Use proper enum in hash_set_dma_transfer
  crypto: ux500 - Use proper enum in cryp_set_dma_transfer
  crypto: aesni - Add scatter/gather avx stubs, and use them in C
  crypto: aesni - Introduce partial block macro
  ...
commit b71acb0e37
@@ -1,15 +1,6 @@
Programming Interface
=====================

Please note that the kernel crypto API contains the AEAD givcrypt API
(crypto_aead_giv\* and aead_givcrypt\* function calls in
include/crypto/aead.h). This API is obsolete and will be removed in the
future. To obtain the functionality of an AEAD cipher with internal IV
generation, use the IV generator as a regular cipher. For example,
rfc4106(gcm(aes)) is the AEAD cipher with external IV generation and
seqniv(rfc4106(gcm(aes))) implies that the kernel crypto API generates
the IV. Different IV generators are available.

.. class:: toc-title

Table of contents
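For reference, the replacement for the removed givcrypt interface is simply to instantiate the IV-generator template by name through the regular AEAD API. A minimal sketch (not part of the patch; the key bytes and sizes below are placeholder assumptions for AES-128 GCM with a 4-byte nonce salt, and error handling is trimmed):

#include <crypto/aead.h>
#include <linux/err.h>

static int example_alloc_rfc4106(void)
{
	struct crypto_aead *tfm;
	/* 20-byte rfc4106 key: 16-byte AES key + 4-byte nonce salt (placeholder) */
	static const u8 key[20] = { 0 };
	int err;

	/* the seqiv() wrapper asks the kernel crypto API to generate IVs */
	tfm = crypto_alloc_aead("seqiv(rfc4106(gcm(aes)))", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_aead_setkey(tfm, key, sizeof(key));
	if (!err)
		err = crypto_aead_setauthsize(tfm, 16);

	crypto_free_aead(tfm);
	return err;
}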
@@ -157,10 +157,6 @@ applicable to a cipher, it is not displayed:

- rng for random number generator

- givcipher for cipher with associated IV generator (see the geniv
  entry below for the specification of the IV generator type used by
  the cipher implementation)

- kpp for a Key-agreement Protocol Primitive (KPP) cipher such as
  an ECDH or DH implementation

@@ -174,16 +170,7 @@ applicable to a cipher, it is not displayed:

- digestsize: output size of the message digest

- geniv: IV generation type:

  - eseqiv for encrypted sequence number based IV generation

  - seqiv for sequence number based IV generation

  - chainiv for chain iv generation

  - <builtin> is a marker that the cipher implements IV generation and
    handling as it is specific to the given cipher
- geniv: IV generator (obsolete)

Key Sizes
---------

@@ -218,10 +205,6 @@ the aforementioned cipher types:

- CRYPTO_ALG_TYPE_ABLKCIPHER Asynchronous multi-block cipher

- CRYPTO_ALG_TYPE_GIVCIPHER Asynchronous multi-block cipher packed
  together with an IV generator (see geniv field in the /proc/crypto
  listing for the known IV generators)

- CRYPTO_ALG_TYPE_KPP Key-agreement Protocol Primitive (KPP) such as
  an ECDH or DH implementation

@@ -338,18 +321,14 @@ uses the API applicable to the cipher type specified for the block.

The following call sequence is applicable when the IPSEC layer triggers
an encryption operation with the esp_output function. During
configuration, the administrator set up the use of rfc4106(gcm(aes)) as
the cipher for ESP. The following call sequence is now depicted in the
ASCII art above:
configuration, the administrator set up the use of seqiv(rfc4106(gcm(aes)))
as the cipher for ESP. The following call sequence is now depicted in
the ASCII art above:

1. esp_output() invokes crypto_aead_encrypt() to trigger an
   encryption operation of the AEAD cipher with IV generator.

   In case of GCM, the SEQIV implementation is registered as GIVCIPHER
   in crypto_rfc4106_alloc().

   The SEQIV performs its operation to generate an IV where the core
   function is seqiv_geniv().
   The SEQIV generates the IV.

2. Now, SEQIV uses the AEAD API function calls to invoke the associated
   AEAD cipher. In our case, during the instantiation of SEQIV, the
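At the code level, step 1 of the sequence above amounts to building an AEAD request and calling crypto_aead_encrypt(). A simplified sketch (this stands in for esp_output(); it is not what net/ipv4/esp4.c literally does, it assumes a synchronous transform, and the scatterlist layout is a placeholder):

#include <crypto/aead.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

static int example_aead_encrypt(struct crypto_aead *tfm,
				struct scatterlist *sg,
				unsigned int assoclen,
				unsigned int cryptlen, u8 *iv)
{
	struct aead_request *req;
	int err;

	req = aead_request_alloc(tfm, GFP_ATOMIC);
	if (!req)
		return -ENOMEM;

	aead_request_set_callback(req, 0, NULL, NULL);
	aead_request_set_ad(req, assoclen);
	aead_request_set_crypt(req, sg, sg, cryptlen, iv);

	/* this is the crypto_aead_encrypt() call from step 1 of the text */
	err = crypto_aead_encrypt(req);

	aead_request_free(req);
	return err;
}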
@@ -1,8 +1,12 @@
Arm TrustZone CryptoCell cryptographic engine

Required properties:
- compatible: Should be one of: "arm,cryptocell-712-ree",
  "arm,cryptocell-710-ree" or "arm,cryptocell-630p-ree".
- compatible: Should be one of -
   "arm,cryptocell-713-ree"
   "arm,cryptocell-703-ree"
   "arm,cryptocell-712-ree"
   "arm,cryptocell-710-ree"
   "arm,cryptocell-630p-ree"
- reg: Base physical address of the engine and length of memory mapped region.
- interrupts: Interrupt number for the device.

@@ -6,6 +6,8 @@ Required properties:
- interrupts : Should contain MXS DCP interrupt numbers, VMI IRQ and DCP IRQ
               must be supplied, optionally Secure IRQ can be present, but
               is currently not implemented and not used.
- clocks : Clock reference (only required on some SOCs: 6ull and 6sll).
- clock-names : Must be "dcp".

Example:

@@ -3484,6 +3484,7 @@ F: include/linux/spi/cc2520.h
F: Documentation/devicetree/bindings/net/ieee802154/cc2520.txt

CCREE ARM TRUSTZONE CRYPTOCELL REE DRIVER
M: Yael Chemla <yael.chemla@foss.arm.com>
M: Gilad Ben-Yossef <gilad@benyossef.com>
L: linux-crypto@vger.kernel.org
S: Supported

@@ -7147,7 +7148,9 @@ F: crypto/842.c
F: lib/842/

IBM Power in-Nest Crypto Acceleration
M: Paulo Flabiano Smorigo <pfsmorigo@linux.ibm.com>
M: Breno Leitão <leitao@debian.org>
M: Nayna Jain <nayna@linux.ibm.com>
M: Paulo Flabiano Smorigo <pfsmorigo@gmail.com>
L: linux-crypto@vger.kernel.org
S: Supported
F: drivers/crypto/nx/Makefile

@@ -7211,7 +7214,9 @@ S: Supported
F: drivers/scsi/ibmvscsi_tgt/

IBM Power VMX Cryptographic instructions
M: Paulo Flabiano Smorigo <pfsmorigo@linux.ibm.com>
M: Breno Leitão <leitao@debian.org>
M: Nayna Jain <nayna@linux.ibm.com>
M: Paulo Flabiano Smorigo <pfsmorigo@gmail.com>
L: linux-crypto@vger.kernel.org
S: Supported
F: drivers/crypto/vmx/Makefile

@@ -69,6 +69,15 @@ config CRYPTO_AES_ARM
	help
	  Use optimized AES assembler routines for ARM platforms.

	  On ARM processors without the Crypto Extensions, this is the
	  fastest AES implementation for single blocks. For multiple
	  blocks, the NEON bit-sliced implementation is usually faster.

	  This implementation may be vulnerable to cache timing attacks,
	  since it uses lookup tables. However, as countermeasures it
	  disables IRQs and preloads the tables; it is hoped this makes
	  such attacks very difficult.

config CRYPTO_AES_ARM_BS
	tristate "Bit sliced AES using NEON instructions"
	depends on KERNEL_MODE_NEON

@@ -117,9 +126,14 @@ config CRYPTO_CRC32_ARM_CE
	select CRYPTO_HASH

config CRYPTO_CHACHA20_NEON
	tristate "NEON accelerated ChaCha20 symmetric cipher"
	tristate "NEON accelerated ChaCha stream cipher algorithms"
	depends on KERNEL_MODE_NEON
	select CRYPTO_BLKCIPHER
	select CRYPTO_CHACHA20

config CRYPTO_NHPOLY1305_NEON
	tristate "NEON accelerated NHPoly1305 hash function (for Adiantum)"
	depends on KERNEL_MODE_NEON
	select CRYPTO_NHPOLY1305

endif

@@ -9,7 +9,8 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha-neon.o
obj-$(CONFIG_CRYPTO_NHPOLY1305_NEON) += nhpoly1305-neon.o

ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o

@@ -52,7 +53,8 @@ aes-arm-ce-y := aes-ce-core.o aes-ce-glue.o
ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
chacha-neon-y := chacha-neon-core.o chacha-neon-glue.o
nhpoly1305-neon-y := nh-neon-core.o nhpoly1305-neon-glue.o

ifdef REGENERATE_ARM_CRYPTO
quiet_cmd_perl = PERL $@

@@ -10,7 +10,6 @@

#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/hwcap.h>
#include <crypto/aes.h>
#include <crypto/internal/simd.h>
#include <crypto/internal/skcipher.h>

@@ -10,6 +10,7 @@
 */

#include <linux/linkage.h>
#include <asm/assembler.h>
#include <asm/cache.h>

	.text

@@ -41,7 +42,7 @@
	.endif
	.endm

	.macro __hround, out0, out1, in0, in1, in2, in3, t3, t4, enc, sz, op
	.macro __hround, out0, out1, in0, in1, in2, in3, t3, t4, enc, sz, op, oldcpsr
	__select \out0, \in0, 0
	__select t0, \in1, 1
	__load \out0, \out0, 0, \sz, \op

@@ -73,6 +74,14 @@
	__load t0, t0, 3, \sz, \op
	__load \t4, \t4, 3, \sz, \op

	.ifnb \oldcpsr
	/*
	 * This is the final round and we're done with all data-dependent table
	 * lookups, so we can safely re-enable interrupts.
	 */
	restore_irqs \oldcpsr
	.endif

	eor \out1, \out1, t1, ror #24
	eor \out0, \out0, t2, ror #16
	ldm rk!, {t1, t2}

@@ -83,14 +92,14 @@
	eor \out1, \out1, t2
	.endm

	.macro fround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op
	.macro fround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op, oldcpsr
	__hround \out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1, \sz, \op
	__hround \out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, \sz, \op
	__hround \out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, \sz, \op, \oldcpsr
	.endm

	.macro iround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op
	.macro iround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op, oldcpsr
	__hround \out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0, \sz, \op
	__hround \out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0, \sz, \op
	__hround \out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0, \sz, \op, \oldcpsr
	.endm

	.macro __rev, out, in

@@ -118,13 +127,14 @@
	.macro do_crypt, round, ttab, ltab, bsz
	push {r3-r11, lr}

	// Load keys first, to reduce latency in case they're not cached yet.
	ldm rk!, {r8-r11}

	ldr r4, [in]
	ldr r5, [in, #4]
	ldr r6, [in, #8]
	ldr r7, [in, #12]

	ldm rk!, {r8-r11}

#ifdef CONFIG_CPU_BIG_ENDIAN
	__rev r4, r4
	__rev r5, r5

@@ -138,6 +148,25 @@
	eor r7, r7, r11

	__adrl ttab, \ttab
	/*
	 * Disable interrupts and prefetch the 1024-byte 'ft' or 'it' table into
	 * L1 cache, assuming cacheline size >= 32. This is a hardening measure
	 * intended to make cache-timing attacks more difficult. They may not
	 * be fully prevented, however; see the paper
	 * https://cr.yp.to/antiforgery/cachetiming-20050414.pdf
	 * ("Cache-timing attacks on AES") for a discussion of the many
	 * difficulties involved in writing truly constant-time AES software.
	 */
	save_and_disable_irqs t0
	.set i, 0
	.rept 1024 / 128
	ldr r8, [ttab, #i + 0]
	ldr r9, [ttab, #i + 32]
	ldr r10, [ttab, #i + 64]
	ldr r11, [ttab, #i + 96]
	.set i, i + 128
	.endr
	push {t0} // oldcpsr

	tst rounds, #2
	bne 1f

@@ -151,8 +180,21 @@
	\round r4, r5, r6, r7, r8, r9, r10, r11
	b 0b

2: __adrl ttab, \ltab
	\round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b
2: .ifb \ltab
	add ttab, ttab, #1
	.else
	__adrl ttab, \ltab
	// Prefetch inverse S-box for final round; see explanation above
	.set i, 0
	.rept 256 / 64
	ldr t0, [ttab, #i + 0]
	ldr t1, [ttab, #i + 32]
	.set i, i + 64
	.endr
	.endif

	pop {rounds} // oldcpsr
	\round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b, rounds

#ifdef CONFIG_CPU_BIG_ENDIAN
	__rev r4, r4

@@ -175,7 +217,7 @@
	.endm

ENTRY(__aes_arm_encrypt)
	do_crypt fround, crypto_ft_tab, crypto_ft_tab + 1, 2
	do_crypt fround, crypto_ft_tab,, 2
ENDPROC(__aes_arm_encrypt)

	.align 5
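The save_and_disable_irqs/restore_irqs bracket added above is the countermeasure the Kconfig help text describes. For readers unfamiliar with the pattern, a C-level illustration of the same idea follows; this is a sketch only, and my_table/hardened_lookup_walk are placeholders rather than kernel symbols:

#include <linux/irqflags.h>
#include <linux/compiler.h>
#include <linux/types.h>

extern const u32 my_table[256];	/* placeholder lookup table */

static u32 hardened_lookup_walk(const u8 *in, unsigned int len)
{
	unsigned long flags;
	unsigned int i;
	u32 acc = 0;

	local_irq_save(flags);

	/* Pull the table into L1: touch one word per (>= 32-byte) cacheline,
	 * so an interrupt cannot evict lines mid-computation and expose
	 * data-dependent access patterns. */
	for (i = 0; i < 256; i += 32 / sizeof(u32))
		acc ^= READ_ONCE(my_table[i]);

	while (len--)
		acc ^= my_table[*in++];	/* data-dependent lookups */

	local_irq_restore(flags);
	return acc;
}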
@@ -1,5 +1,5 @@
/*
 * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
 * ChaCha/XChaCha NEON helper functions
 *
 * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
 *

@@ -27,9 +27,9 @@
 * (d) vtbl.8 + vtbl.8 (multiple of 8 bits rotations only,
 *     needs index vector)
 *
 * ChaCha20 has 16, 12, 8, and 7-bit rotations. For the 12 and 7-bit
 * rotations, the only choices are (a) and (b). We use (a) since it takes
 * two-thirds the cycles of (b) on both Cortex-A7 and Cortex-A53.
 * ChaCha has 16, 12, 8, and 7-bit rotations. For the 12 and 7-bit rotations,
 * the only choices are (a) and (b). We use (a) since it takes two-thirds the
 * cycles of (b) on both Cortex-A7 and Cortex-A53.
 *
 * For the 16-bit rotation, we use vrev32.16 since it's consistently fastest
 * and doesn't need a temporary register.

@@ -52,30 +52,20 @@
	.fpu neon
	.align 5

ENTRY(chacha20_block_xor_neon)
	// r0: Input state matrix, s
	// r1: 1 data block output, o
	// r2: 1 data block input, i

	//
	// This function encrypts one ChaCha20 block by loading the state matrix
	// in four NEON registers. It performs matrix operation on four words in
	// parallel, but requireds shuffling to rearrange the words after each
	// round.
	//

	// x0..3 = s0..3
	add ip, r0, #0x20
	vld1.32 {q0-q1}, [r0]
	vld1.32 {q2-q3}, [ip]

	vmov q8, q0
	vmov q9, q1
	vmov q10, q2
	vmov q11, q3
/*
 * chacha_permute - permute one block
 *
 * Permute one 64-byte block where the state matrix is stored in the four NEON
 * registers q0-q3. It performs matrix operations on four words in parallel,
 * but requires shuffling to rearrange the words after each round.
 *
 * The round count is given in r3.
 *
 * Clobbers: r3, ip, q4-q5
 */
chacha_permute:

	adr ip, .Lrol8_table
	mov r3, #10
	vld1.8 {d10}, [ip, :64]

.Ldoubleround:

@@ -139,9 +129,31 @@ ENTRY(chacha20_block_xor_neon)
	// x3 = shuffle32(x3, MASK(0, 3, 2, 1))
	vext.8 q3, q3, q3, #4

	subs r3, r3, #1
	subs r3, r3, #2
	bne .Ldoubleround

	bx lr
ENDPROC(chacha_permute)

ENTRY(chacha_block_xor_neon)
	// r0: Input state matrix, s
	// r1: 1 data block output, o
	// r2: 1 data block input, i
	// r3: nrounds
	push {lr}

	// x0..3 = s0..3
	add ip, r0, #0x20
	vld1.32 {q0-q1}, [r0]
	vld1.32 {q2-q3}, [ip]

	vmov q8, q0
	vmov q9, q1
	vmov q10, q2
	vmov q11, q3

	bl chacha_permute

	add ip, r2, #0x20
	vld1.8 {q4-q5}, [r2]
	vld1.8 {q6-q7}, [ip]

@@ -166,15 +178,33 @@ ENTRY(chacha20_block_xor_neon)
	vst1.8 {q0-q1}, [r1]
	vst1.8 {q2-q3}, [ip]

	bx lr
ENDPROC(chacha20_block_xor_neon)
	pop {pc}
ENDPROC(chacha_block_xor_neon)

ENTRY(hchacha_block_neon)
	// r0: Input state matrix, s
	// r1: output (8 32-bit words)
	// r2: nrounds
	push {lr}

	vld1.32 {q0-q1}, [r0]!
	vld1.32 {q2-q3}, [r0]

	mov r3, r2
	bl chacha_permute

	vst1.32 {q0}, [r1]!
	vst1.32 {q3}, [r1]

	pop {pc}
ENDPROC(hchacha_block_neon)

	.align 4
.Lctrinc: .word 0, 1, 2, 3
.Lrol8_table: .byte 3, 0, 1, 2, 7, 4, 5, 6

	.align 5
ENTRY(chacha20_4block_xor_neon)
ENTRY(chacha_4block_xor_neon)
	push {r4-r5}
	mov r4, sp // preserve the stack pointer
	sub ip, sp, #0x20 // allocate a 32 byte buffer

@@ -184,9 +214,10 @@ ENTRY(chacha20_4block_xor_neon)
	// r0: Input state matrix, s
	// r1: 4 data blocks output, o
	// r2: 4 data blocks input, i
	// r3: nrounds

	//
	// This function encrypts four consecutive ChaCha20 blocks by loading
	// This function encrypts four consecutive ChaCha blocks by loading
	// the state matrix in NEON registers four times. The algorithm performs
	// each operation on the corresponding word of each state matrix, hence
	// requires no word shuffling. The words are re-interleaved before the

@@ -219,7 +250,6 @@ ENTRY(chacha20_4block_xor_neon)
	vdup.32 q0, d0[0]

	adr ip, .Lrol8_table
	mov r3, #10
	b 1f

.Ldoubleround4:

@@ -417,7 +447,7 @@ ENTRY(chacha20_4block_xor_neon)
	vsri.u32 q5, q8, #25
	vsri.u32 q6, q9, #25

	subs r3, r3, #1
	subs r3, r3, #2
	bne .Ldoubleround4

	// x0..7[0-3] are in q0-q7, x10..15[0-3] are in q10-q15.

@@ -527,4 +557,4 @@ ENTRY(chacha20_4block_xor_neon)

	pop {r4-r5}
	bx lr
ENDPROC(chacha20_4block_xor_neon)
ENDPROC(chacha_4block_xor_neon)

arch/arm/crypto/chacha-neon-glue.c (new file, 201 lines)
@@ -0,0 +1,201 @@
/*
 * ARM NEON accelerated ChaCha and XChaCha stream ciphers,
 * including ChaCha20 (RFC7539)
 *
 * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 * Based on:
 * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
 *
 * Copyright (C) 2015 Martin Willi
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */

#include <crypto/algapi.h>
#include <crypto/chacha.h>
#include <crypto/internal/skcipher.h>
#include <linux/kernel.h>
#include <linux/module.h>

#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>

asmlinkage void chacha_block_xor_neon(const u32 *state, u8 *dst, const u8 *src,
				      int nrounds);
asmlinkage void chacha_4block_xor_neon(const u32 *state, u8 *dst, const u8 *src,
				       int nrounds);
asmlinkage void hchacha_block_neon(const u32 *state, u32 *out, int nrounds);

static void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
			  unsigned int bytes, int nrounds)
{
	u8 buf[CHACHA_BLOCK_SIZE];

	while (bytes >= CHACHA_BLOCK_SIZE * 4) {
		chacha_4block_xor_neon(state, dst, src, nrounds);
		bytes -= CHACHA_BLOCK_SIZE * 4;
		src += CHACHA_BLOCK_SIZE * 4;
		dst += CHACHA_BLOCK_SIZE * 4;
		state[12] += 4;
	}
	while (bytes >= CHACHA_BLOCK_SIZE) {
		chacha_block_xor_neon(state, dst, src, nrounds);
		bytes -= CHACHA_BLOCK_SIZE;
		src += CHACHA_BLOCK_SIZE;
		dst += CHACHA_BLOCK_SIZE;
		state[12]++;
	}
	if (bytes) {
		memcpy(buf, src, bytes);
		chacha_block_xor_neon(state, buf, buf, nrounds);
		memcpy(dst, buf, bytes);
	}
}

static int chacha_neon_stream_xor(struct skcipher_request *req,
				  struct chacha_ctx *ctx, u8 *iv)
{
	struct skcipher_walk walk;
	u32 state[16];
	int err;

	err = skcipher_walk_virt(&walk, req, false);

	crypto_chacha_init(state, ctx, iv);

	while (walk.nbytes > 0) {
		unsigned int nbytes = walk.nbytes;

		if (nbytes < walk.total)
			nbytes = round_down(nbytes, walk.stride);

		kernel_neon_begin();
		chacha_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
			      nbytes, ctx->nrounds);
		kernel_neon_end();
		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
	}

	return err;
}

static int chacha_neon(struct skcipher_request *req)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);

	if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd())
		return crypto_chacha_crypt(req);

	return chacha_neon_stream_xor(req, ctx, req->iv);
}

static int xchacha_neon(struct skcipher_request *req)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
	struct chacha_ctx subctx;
	u32 state[16];
	u8 real_iv[16];

	if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd())
		return crypto_xchacha_crypt(req);

	crypto_chacha_init(state, ctx, req->iv);

	kernel_neon_begin();
	hchacha_block_neon(state, subctx.key, ctx->nrounds);
	kernel_neon_end();
	subctx.nrounds = ctx->nrounds;

	memcpy(&real_iv[0], req->iv + 24, 8);
	memcpy(&real_iv[8], req->iv + 16, 8);
	return chacha_neon_stream_xor(req, &subctx, real_iv);
}

static struct skcipher_alg algs[] = {
	{
		.base.cra_name = "chacha20",
		.base.cra_driver_name = "chacha20-neon",
		.base.cra_priority = 300,
		.base.cra_blocksize = 1,
		.base.cra_ctxsize = sizeof(struct chacha_ctx),
		.base.cra_module = THIS_MODULE,

		.min_keysize = CHACHA_KEY_SIZE,
		.max_keysize = CHACHA_KEY_SIZE,
		.ivsize = CHACHA_IV_SIZE,
		.chunksize = CHACHA_BLOCK_SIZE,
		.walksize = 4 * CHACHA_BLOCK_SIZE,
		.setkey = crypto_chacha20_setkey,
		.encrypt = chacha_neon,
		.decrypt = chacha_neon,
	}, {
		.base.cra_name = "xchacha20",
		.base.cra_driver_name = "xchacha20-neon",
		.base.cra_priority = 300,
		.base.cra_blocksize = 1,
		.base.cra_ctxsize = sizeof(struct chacha_ctx),
		.base.cra_module = THIS_MODULE,

		.min_keysize = CHACHA_KEY_SIZE,
		.max_keysize = CHACHA_KEY_SIZE,
		.ivsize = XCHACHA_IV_SIZE,
		.chunksize = CHACHA_BLOCK_SIZE,
		.walksize = 4 * CHACHA_BLOCK_SIZE,
		.setkey = crypto_chacha20_setkey,
		.encrypt = xchacha_neon,
		.decrypt = xchacha_neon,
	}, {
		.base.cra_name = "xchacha12",
		.base.cra_driver_name = "xchacha12-neon",
		.base.cra_priority = 300,
		.base.cra_blocksize = 1,
		.base.cra_ctxsize = sizeof(struct chacha_ctx),
		.base.cra_module = THIS_MODULE,

		.min_keysize = CHACHA_KEY_SIZE,
		.max_keysize = CHACHA_KEY_SIZE,
		.ivsize = XCHACHA_IV_SIZE,
		.chunksize = CHACHA_BLOCK_SIZE,
		.walksize = 4 * CHACHA_BLOCK_SIZE,
		.setkey = crypto_chacha12_setkey,
		.encrypt = xchacha_neon,
		.decrypt = xchacha_neon,
	}
};

static int __init chacha_simd_mod_init(void)
{
	if (!(elf_hwcap & HWCAP_NEON))
		return -ENODEV;

	return crypto_register_skciphers(algs, ARRAY_SIZE(algs));
}

static void __exit chacha_simd_mod_fini(void)
{
	crypto_unregister_skciphers(algs, ARRAY_SIZE(algs));
}

module_init(chacha_simd_mod_init);
module_exit(chacha_simd_mod_fini);

MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (NEON accelerated)");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-neon");
MODULE_ALIAS_CRYPTO("xchacha20");
MODULE_ALIAS_CRYPTO("xchacha20-neon");
MODULE_ALIAS_CRYPTO("xchacha12");
MODULE_ALIAS_CRYPTO("xchacha12-neon");
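For context, a minimal sketch (not from the patch) of how a kernel user could drive one of the algorithms this file registers, e.g. xchacha12, through the skcipher API. It assumes a synchronous implementation, a kmalloc'd buffer, and placeholder key/IV values:

#include <crypto/skcipher.h>
#include <linux/err.h>
#include <linux/scatterlist.h>
#include <linux/slab.h>

static int example_xchacha12_encrypt(u8 *data, unsigned int len,
				     const u8 key[32], const u8 iv[32])
{
	struct crypto_skcipher *tfm;
	struct skcipher_request *req;
	struct scatterlist sg;
	u8 ivbuf[32];
	int err;

	tfm = crypto_alloc_skcipher("xchacha12", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_skcipher_setkey(tfm, key, 32);
	if (err)
		goto out_free_tfm;

	req = skcipher_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		err = -ENOMEM;
		goto out_free_tfm;
	}

	sg_init_one(&sg, data, len);		/* data must not live on the stack */
	memcpy(ivbuf, iv, sizeof(ivbuf));	/* the request may modify the IV */
	skcipher_request_set_crypt(req, &sg, &sg, len, ivbuf);

	err = crypto_skcipher_encrypt(req);	/* in-place XChaCha12 */

	skcipher_request_free(req);
out_free_tfm:
	crypto_free_skcipher(tfm);
	return err;
}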
@@ -1,127 +0,0 @@
/*
 * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
 *
 * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 * Based on:
 * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
 *
 * Copyright (C) 2015 Martin Willi
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */

#include <crypto/algapi.h>
#include <crypto/chacha20.h>
#include <crypto/internal/skcipher.h>
#include <linux/kernel.h>
#include <linux/module.h>

#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>

asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);

static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
			    unsigned int bytes)
{
	u8 buf[CHACHA20_BLOCK_SIZE];

	while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
		chacha20_4block_xor_neon(state, dst, src);
		bytes -= CHACHA20_BLOCK_SIZE * 4;
		src += CHACHA20_BLOCK_SIZE * 4;
		dst += CHACHA20_BLOCK_SIZE * 4;
		state[12] += 4;
	}
	while (bytes >= CHACHA20_BLOCK_SIZE) {
		chacha20_block_xor_neon(state, dst, src);
		bytes -= CHACHA20_BLOCK_SIZE;
		src += CHACHA20_BLOCK_SIZE;
		dst += CHACHA20_BLOCK_SIZE;
		state[12]++;
	}
	if (bytes) {
		memcpy(buf, src, bytes);
		chacha20_block_xor_neon(state, buf, buf);
		memcpy(dst, buf, bytes);
	}
}

static int chacha20_neon(struct skcipher_request *req)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
	struct skcipher_walk walk;
	u32 state[16];
	int err;

	if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
		return crypto_chacha20_crypt(req);

	err = skcipher_walk_virt(&walk, req, true);

	crypto_chacha20_init(state, ctx, walk.iv);

	kernel_neon_begin();
	while (walk.nbytes > 0) {
		unsigned int nbytes = walk.nbytes;

		if (nbytes < walk.total)
			nbytes = round_down(nbytes, walk.stride);

		chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
				nbytes);
		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
	}
	kernel_neon_end();

	return err;
}

static struct skcipher_alg alg = {
	.base.cra_name = "chacha20",
	.base.cra_driver_name = "chacha20-neon",
	.base.cra_priority = 300,
	.base.cra_blocksize = 1,
	.base.cra_ctxsize = sizeof(struct chacha20_ctx),
	.base.cra_module = THIS_MODULE,

	.min_keysize = CHACHA20_KEY_SIZE,
	.max_keysize = CHACHA20_KEY_SIZE,
	.ivsize = CHACHA20_IV_SIZE,
	.chunksize = CHACHA20_BLOCK_SIZE,
	.walksize = 4 * CHACHA20_BLOCK_SIZE,
	.setkey = crypto_chacha20_setkey,
	.encrypt = chacha20_neon,
	.decrypt = chacha20_neon,
};

static int __init chacha20_simd_mod_init(void)
{
	if (!(elf_hwcap & HWCAP_NEON))
		return -ENODEV;

	return crypto_register_skcipher(&alg);
}

static void __exit chacha20_simd_mod_fini(void)
{
	crypto_unregister_skcipher(&alg);
}

module_init(chacha20_simd_mod_init);
module_exit(chacha20_simd_mod_fini);

MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("chacha20");

arch/arm/crypto/nh-neon-core.S (new file, 116 lines)
@@ -0,0 +1,116 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * NH - ε-almost-universal hash function, NEON accelerated version
 *
 * Copyright 2018 Google LLC
 *
 * Author: Eric Biggers <ebiggers@google.com>
 */

#include <linux/linkage.h>

	.text
	.fpu neon

	KEY .req r0
	MESSAGE .req r1
	MESSAGE_LEN .req r2
	HASH .req r3

	PASS0_SUMS .req q0
	PASS0_SUM_A .req d0
	PASS0_SUM_B .req d1
	PASS1_SUMS .req q1
	PASS1_SUM_A .req d2
	PASS1_SUM_B .req d3
	PASS2_SUMS .req q2
	PASS2_SUM_A .req d4
	PASS2_SUM_B .req d5
	PASS3_SUMS .req q3
	PASS3_SUM_A .req d6
	PASS3_SUM_B .req d7
	K0 .req q4
	K1 .req q5
	K2 .req q6
	K3 .req q7
	T0 .req q8
	T0_L .req d16
	T0_H .req d17
	T1 .req q9
	T1_L .req d18
	T1_H .req d19
	T2 .req q10
	T2_L .req d20
	T2_H .req d21
	T3 .req q11
	T3_L .req d22
	T3_H .req d23

.macro _nh_stride k0, k1, k2, k3

	// Load next message stride
	vld1.8 {T3}, [MESSAGE]!

	// Load next key stride
	vld1.32 {\k3}, [KEY]!

	// Add message words to key words
	vadd.u32 T0, T3, \k0
	vadd.u32 T1, T3, \k1
	vadd.u32 T2, T3, \k2
	vadd.u32 T3, T3, \k3

	// Multiply 32x32 => 64 and accumulate
	vmlal.u32 PASS0_SUMS, T0_L, T0_H
	vmlal.u32 PASS1_SUMS, T1_L, T1_H
	vmlal.u32 PASS2_SUMS, T2_L, T2_H
	vmlal.u32 PASS3_SUMS, T3_L, T3_H
.endm

/*
 * void nh_neon(const u32 *key, const u8 *message, size_t message_len,
 *		u8 hash[NH_HASH_BYTES])
 *
 * It's guaranteed that message_len % 16 == 0.
 */
ENTRY(nh_neon)

	vld1.32 {K0,K1}, [KEY]!
	vmov.u64 PASS0_SUMS, #0
	vmov.u64 PASS1_SUMS, #0
	vld1.32 {K2}, [KEY]!
	vmov.u64 PASS2_SUMS, #0
	vmov.u64 PASS3_SUMS, #0

	subs MESSAGE_LEN, MESSAGE_LEN, #64
	blt .Lloop4_done
.Lloop4:
	_nh_stride K0, K1, K2, K3
	_nh_stride K1, K2, K3, K0
	_nh_stride K2, K3, K0, K1
	_nh_stride K3, K0, K1, K2
	subs MESSAGE_LEN, MESSAGE_LEN, #64
	bge .Lloop4

.Lloop4_done:
	ands MESSAGE_LEN, MESSAGE_LEN, #63
	beq .Ldone
	_nh_stride K0, K1, K2, K3

	subs MESSAGE_LEN, MESSAGE_LEN, #16
	beq .Ldone
	_nh_stride K1, K2, K3, K0

	subs MESSAGE_LEN, MESSAGE_LEN, #16
	beq .Ldone
	_nh_stride K2, K3, K0, K1

.Ldone:
	// Sum the accumulators for each pass, then store the sums to 'hash'
	vadd.u64 T0_L, PASS0_SUM_A, PASS0_SUM_B
	vadd.u64 T0_H, PASS1_SUM_A, PASS1_SUM_B
	vadd.u64 T1_L, PASS2_SUM_A, PASS2_SUM_B
	vadd.u64 T1_H, PASS3_SUM_A, PASS3_SUM_B
	vst1.8 {T0-T1}, [HASH]
	bx lr
ENDPROC(nh_neon)
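To make the data flow of the assembly above easier to follow, here is a rough C transcription of the NH pass structure it implements. This is an illustrative sketch, not the kernel's reference code in crypto/nhpoly1305.c; nh_ref and the NH_* constants are local placeholders, and it assumes a little-endian host and a message_len that is a multiple of 16, as the asm requires:

#include <stdint.h>
#include <string.h>

#define NH_NUM_PASSES	4
#define NH_STRIDE_WORDS	4	/* 16 message bytes per stride */

static void nh_ref(const uint32_t *key, const uint8_t *message,
		   size_t message_len, uint64_t hash[NH_NUM_PASSES])
{
	uint64_t sums[NH_NUM_PASSES] = { 0 };
	uint32_t m[NH_STRIDE_WORDS];
	int p;

	while (message_len >= 16) {
		memcpy(m, message, 16);	/* little-endian message words */
		for (p = 0; p < NH_NUM_PASSES; p++) {
			/* pass p uses a key window offset by p strides */
			const uint32_t *k = key + p * NH_STRIDE_WORDS;

			/* (m0+k0)*(m2+k2) and (m1+k1)*(m3+k3), added mod 2^32,
			 * multiplied 32x32 => 64 and accumulated, matching the
			 * vadd.u32/vmlal.u32 pairs in the asm */
			sums[p] += (uint64_t)(uint32_t)(m[0] + k[0]) *
				   (uint32_t)(m[2] + k[2]);
			sums[p] += (uint64_t)(uint32_t)(m[1] + k[1]) *
				   (uint32_t)(m[3] + k[3]);
		}
		message += 16;
		message_len -= 16;
		key += NH_STRIDE_WORDS;	/* key window slides by one stride */
	}
	memcpy(hash, sums, sizeof(sums));
}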
arch/arm/crypto/nhpoly1305-neon-glue.c (new file, 77 lines)
@@ -0,0 +1,77 @@
// SPDX-License-Identifier: GPL-2.0
/*
 * NHPoly1305 - ε-almost-∆-universal hash function for Adiantum
 * (NEON accelerated version)
 *
 * Copyright 2018 Google LLC
 */

#include <asm/neon.h>
#include <asm/simd.h>
#include <crypto/internal/hash.h>
#include <crypto/nhpoly1305.h>
#include <linux/module.h>

asmlinkage void nh_neon(const u32 *key, const u8 *message, size_t message_len,
			u8 hash[NH_HASH_BYTES]);

/* wrapper to avoid indirect call to assembly, which doesn't work with CFI */
static void _nh_neon(const u32 *key, const u8 *message, size_t message_len,
		     __le64 hash[NH_NUM_PASSES])
{
	nh_neon(key, message, message_len, (u8 *)hash);
}

static int nhpoly1305_neon_update(struct shash_desc *desc,
				  const u8 *src, unsigned int srclen)
{
	if (srclen < 64 || !may_use_simd())
		return crypto_nhpoly1305_update(desc, src, srclen);

	do {
		unsigned int n = min_t(unsigned int, srclen, PAGE_SIZE);

		kernel_neon_begin();
		crypto_nhpoly1305_update_helper(desc, src, n, _nh_neon);
		kernel_neon_end();
		src += n;
		srclen -= n;
	} while (srclen);
	return 0;
}

static struct shash_alg nhpoly1305_alg = {
	.base.cra_name = "nhpoly1305",
	.base.cra_driver_name = "nhpoly1305-neon",
	.base.cra_priority = 200,
	.base.cra_ctxsize = sizeof(struct nhpoly1305_key),
	.base.cra_module = THIS_MODULE,
	.digestsize = POLY1305_DIGEST_SIZE,
	.init = crypto_nhpoly1305_init,
	.update = nhpoly1305_neon_update,
	.final = crypto_nhpoly1305_final,
	.setkey = crypto_nhpoly1305_setkey,
	.descsize = sizeof(struct nhpoly1305_state),
};

static int __init nhpoly1305_mod_init(void)
{
	if (!(elf_hwcap & HWCAP_NEON))
		return -ENODEV;

	return crypto_register_shash(&nhpoly1305_alg);
}

static void __exit nhpoly1305_mod_exit(void)
{
	crypto_unregister_shash(&nhpoly1305_alg);
}

module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);

MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function (NEON-accelerated)");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
MODULE_ALIAS_CRYPTO("nhpoly1305-neon");
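A sketch of driving the "nhpoly1305" shash registered above through the generic hash API (in-kernel users such as the Adiantum template set this up internally; the key and message here are placeholders, and the key must be a complete nhpoly1305_key blob, i.e. the Poly1305 key followed by the NH key words):

#include <crypto/hash.h>
#include <linux/err.h>

static int example_nhpoly1305_digest(const u8 *key, unsigned int keylen,
				     const u8 *msg, unsigned int len,
				     u8 out[16])
{
	struct crypto_shash *tfm;
	int err;

	tfm = crypto_alloc_shash("nhpoly1305", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_shash_setkey(tfm, key, keylen);
	if (!err) {
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;
		err = crypto_shash_digest(desc, msg, len, out);
	}

	crypto_free_shash(tfm);
	return err;
}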
@@ -101,11 +101,16 @@ config CRYPTO_AES_ARM64_NEON_BLK
	select CRYPTO_SIMD

config CRYPTO_CHACHA20_NEON
	tristate "NEON accelerated ChaCha20 symmetric cipher"
	tristate "ChaCha20, XChaCha20, and XChaCha12 stream ciphers using NEON instructions"
	depends on KERNEL_MODE_NEON
	select CRYPTO_BLKCIPHER
	select CRYPTO_CHACHA20

config CRYPTO_NHPOLY1305_NEON
	tristate "NHPoly1305 hash function using NEON instructions (for Adiantum)"
	depends on KERNEL_MODE_NEON
	select CRYPTO_NHPOLY1305

config CRYPTO_AES_ARM64_BS
	tristate "AES in ECB/CBC/CTR/XTS modes using bit-sliced NEON algorithm"
	depends on KERNEL_MODE_NEON

@@ -50,8 +50,11 @@ sha256-arm64-y := sha256-glue.o sha256-core.o
obj-$(CONFIG_CRYPTO_SHA512_ARM64) += sha512-arm64.o
sha512-arm64-y := sha512-glue.o sha512-core.o

obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha-neon.o
chacha-neon-y := chacha-neon-core.o chacha-neon-glue.o

obj-$(CONFIG_CRYPTO_NHPOLY1305_NEON) += nhpoly1305-neon.o
nhpoly1305-neon-y := nh-neon-core.o nhpoly1305-neon-glue.o

obj-$(CONFIG_CRYPTO_AES_ARM64) += aes-arm64.o
aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o

@@ -1,13 +1,13 @@
/*
 * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
 * ChaCha/XChaCha NEON helper functions
 *
 * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
 * Copyright (C) 2016-2018 Linaro, Ltd. <ard.biesheuvel@linaro.org>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 * Based on:
 * Originally based on:
 * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
 *
 * Copyright (C) 2015 Martin Willi

@@ -19,29 +19,27 @@
 */

#include <linux/linkage.h>
#include <asm/assembler.h>
#include <asm/cache.h>

	.text
	.align 6

ENTRY(chacha20_block_xor_neon)
	// x0: Input state matrix, s
	// x1: 1 data block output, o
	// x2: 1 data block input, i
/*
 * chacha_permute - permute one block
 *
 * Permute one 64-byte block where the state matrix is stored in the four NEON
 * registers v0-v3. It performs matrix operations on four words in parallel,
 * but requires shuffling to rearrange the words after each round.
 *
 * The round count is given in w3.
 *
 * Clobbers: w3, x10, v4, v12
 */
chacha_permute:

	//
	// This function encrypts one ChaCha20 block by loading the state matrix
	// in four NEON registers. It performs matrix operation on four words in
	// parallel, but requires shuffling to rearrange the words after each
	// round.
	//

	// x0..3 = s0..3
	adr x3, ROT8
	ld1 {v0.4s-v3.4s}, [x0]
	ld1 {v8.4s-v11.4s}, [x0]
	ld1 {v12.4s}, [x3]

	mov x3, #10
	adr_l x10, ROT8
	ld1 {v12.4s}, [x10]

.Ldoubleround:
	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)

@@ -102,9 +100,27 @@ ENTRY(chacha20_block_xor_neon)
	// x3 = shuffle32(x3, MASK(0, 3, 2, 1))
	ext v3.16b, v3.16b, v3.16b, #4

	subs x3, x3, #1
	subs w3, w3, #2
	b.ne .Ldoubleround

	ret
ENDPROC(chacha_permute)

ENTRY(chacha_block_xor_neon)
	// x0: Input state matrix, s
	// x1: 1 data block output, o
	// x2: 1 data block input, i
	// w3: nrounds

	stp x29, x30, [sp, #-16]!
	mov x29, sp

	// x0..3 = s0..3
	ld1 {v0.4s-v3.4s}, [x0]
	ld1 {v8.4s-v11.4s}, [x0]

	bl chacha_permute

	ld1 {v4.16b-v7.16b}, [x2]

	// o0 = i0 ^ (x0 + s0)

@@ -125,71 +141,156 @@ ENTRY(chacha20_block_xor_neon)

	st1 {v0.16b-v3.16b}, [x1]

	ldp x29, x30, [sp], #16
	ret
ENDPROC(chacha20_block_xor_neon)
ENDPROC(chacha_block_xor_neon)

ENTRY(hchacha_block_neon)
	// x0: Input state matrix, s
	// x1: output (8 32-bit words)
	// w2: nrounds

	stp x29, x30, [sp, #-16]!
	mov x29, sp

	ld1 {v0.4s-v3.4s}, [x0]

	mov w3, w2
	bl chacha_permute

	st1 {v0.16b}, [x1], #16
	st1 {v3.16b}, [x1]

	ldp x29, x30, [sp], #16
	ret
ENDPROC(hchacha_block_neon)

a0 .req w12
a1 .req w13
a2 .req w14
a3 .req w15
a4 .req w16
a5 .req w17
a6 .req w19
a7 .req w20
a8 .req w21
a9 .req w22
a10 .req w23
a11 .req w24
a12 .req w25
a13 .req w26
a14 .req w27
a15 .req w28

	.align 6
ENTRY(chacha20_4block_xor_neon)
ENTRY(chacha_4block_xor_neon)
	frame_push 10

	// x0: Input state matrix, s
	// x1: 4 data blocks output, o
	// x2: 4 data blocks input, i
	// w3: nrounds
	// x4: byte count

	adr_l x10, .Lpermute
	and x5, x4, #63
	add x10, x10, x5
	add x11, x10, #64

	//
	// This function encrypts four consecutive ChaCha20 blocks by loading
	// This function encrypts four consecutive ChaCha blocks by loading
	// the state matrix in NEON registers four times. The algorithm performs
	// each operation on the corresponding word of each state matrix, hence
	// requires no word shuffling. For final XORing step we transpose the
	// matrix by interleaving 32- and then 64-bit words, which allows us to
	// do XOR in NEON registers.
	//
	adr x3, CTRINC // ... and ROT8
	ld1 {v30.4s-v31.4s}, [x3]
	// At the same time, a fifth block is encrypted in parallel using
	// scalar registers
	//
	adr_l x9, CTRINC // ... and ROT8
	ld1 {v30.4s-v31.4s}, [x9]

	// x0..15[0-3] = s0..3[0..3]
	mov x4, x0
	ld4r { v0.4s- v3.4s}, [x4], #16
	ld4r { v4.4s- v7.4s}, [x4], #16
	ld4r { v8.4s-v11.4s}, [x4], #16
	ld4r {v12.4s-v15.4s}, [x4]
	add x8, x0, #16
	ld4r { v0.4s- v3.4s}, [x0]
	ld4r { v4.4s- v7.4s}, [x8], #16
	ld4r { v8.4s-v11.4s}, [x8], #16
	ld4r {v12.4s-v15.4s}, [x8]

	// x12 += counter values 0-3
	mov a0, v0.s[0]
	mov a1, v1.s[0]
	mov a2, v2.s[0]
	mov a3, v3.s[0]
	mov a4, v4.s[0]
	mov a5, v5.s[0]
	mov a6, v6.s[0]
	mov a7, v7.s[0]
	mov a8, v8.s[0]
	mov a9, v9.s[0]
	mov a10, v10.s[0]
	mov a11, v11.s[0]
	mov a12, v12.s[0]
	mov a13, v13.s[0]
	mov a14, v14.s[0]
	mov a15, v15.s[0]

	// x12 += counter values 1-4
	add v12.4s, v12.4s, v30.4s

	mov x3, #10

.Ldoubleround4:
	// x0 += x4, x12 = rotl32(x12 ^ x0, 16)
	// x1 += x5, x13 = rotl32(x13 ^ x1, 16)
	// x2 += x6, x14 = rotl32(x14 ^ x2, 16)
	// x3 += x7, x15 = rotl32(x15 ^ x3, 16)
	add v0.4s, v0.4s, v4.4s
	add a0, a0, a4
	add v1.4s, v1.4s, v5.4s
	add a1, a1, a5
	add v2.4s, v2.4s, v6.4s
	add a2, a2, a6
	add v3.4s, v3.4s, v7.4s
	add a3, a3, a7

	eor v12.16b, v12.16b, v0.16b
	eor a12, a12, a0
	eor v13.16b, v13.16b, v1.16b
	eor a13, a13, a1
	eor v14.16b, v14.16b, v2.16b
	eor a14, a14, a2
	eor v15.16b, v15.16b, v3.16b
	eor a15, a15, a3

	rev32 v12.8h, v12.8h
	ror a12, a12, #16
	rev32 v13.8h, v13.8h
	ror a13, a13, #16
	rev32 v14.8h, v14.8h
	ror a14, a14, #16
	rev32 v15.8h, v15.8h
	ror a15, a15, #16

	// x8 += x12, x4 = rotl32(x4 ^ x8, 12)
	// x9 += x13, x5 = rotl32(x5 ^ x9, 12)
	// x10 += x14, x6 = rotl32(x6 ^ x10, 12)
	// x11 += x15, x7 = rotl32(x7 ^ x11, 12)
	add v8.4s, v8.4s, v12.4s
	add a8, a8, a12
	add v9.4s, v9.4s, v13.4s
	add a9, a9, a13
	add v10.4s, v10.4s, v14.4s
	add a10, a10, a14
	add v11.4s, v11.4s, v15.4s
	add a11, a11, a15

	eor v16.16b, v4.16b, v8.16b
	eor a4, a4, a8
	eor v17.16b, v5.16b, v9.16b
	eor a5, a5, a9
	eor v18.16b, v6.16b, v10.16b
	eor a6, a6, a10
	eor v19.16b, v7.16b, v11.16b
	eor a7, a7, a11

	shl v4.4s, v16.4s, #12
	shl v5.4s, v17.4s, #12

@@ -197,42 +298,66 @@ ENTRY(chacha20_4block_xor_neon)
	shl v7.4s, v19.4s, #12

	sri v4.4s, v16.4s, #20
	ror a4, a4, #20
	sri v5.4s, v17.4s, #20
	ror a5, a5, #20
	sri v6.4s, v18.4s, #20
	ror a6, a6, #20
	sri v7.4s, v19.4s, #20
	ror a7, a7, #20

	// x0 += x4, x12 = rotl32(x12 ^ x0, 8)
	// x1 += x5, x13 = rotl32(x13 ^ x1, 8)
	// x2 += x6, x14 = rotl32(x14 ^ x2, 8)
	// x3 += x7, x15 = rotl32(x15 ^ x3, 8)
	add v0.4s, v0.4s, v4.4s
	add a0, a0, a4
	add v1.4s, v1.4s, v5.4s
	add a1, a1, a5
	add v2.4s, v2.4s, v6.4s
	add a2, a2, a6
	add v3.4s, v3.4s, v7.4s
	add a3, a3, a7

	eor v12.16b, v12.16b, v0.16b
	eor a12, a12, a0
	eor v13.16b, v13.16b, v1.16b
	eor a13, a13, a1
	eor v14.16b, v14.16b, v2.16b
	eor a14, a14, a2
	eor v15.16b, v15.16b, v3.16b
	eor a15, a15, a3

	tbl v12.16b, {v12.16b}, v31.16b
	ror a12, a12, #24
	tbl v13.16b, {v13.16b}, v31.16b
	ror a13, a13, #24
	tbl v14.16b, {v14.16b}, v31.16b
	ror a14, a14, #24
	tbl v15.16b, {v15.16b}, v31.16b
	ror a15, a15, #24

	// x8 += x12, x4 = rotl32(x4 ^ x8, 7)
	// x9 += x13, x5 = rotl32(x5 ^ x9, 7)
	// x10 += x14, x6 = rotl32(x6 ^ x10, 7)
	// x11 += x15, x7 = rotl32(x7 ^ x11, 7)
	add v8.4s, v8.4s, v12.4s
	add a8, a8, a12
	add v9.4s, v9.4s, v13.4s
	add a9, a9, a13
	add v10.4s, v10.4s, v14.4s
	add a10, a10, a14
	add v11.4s, v11.4s, v15.4s
	add a11, a11, a15

	eor v16.16b, v4.16b, v8.16b
	eor a4, a4, a8
	eor v17.16b, v5.16b, v9.16b
	eor a5, a5, a9
	eor v18.16b, v6.16b, v10.16b
	eor a6, a6, a10
	eor v19.16b, v7.16b, v11.16b
	eor a7, a7, a11

	shl v4.4s, v16.4s, #7
	shl v5.4s, v17.4s, #7

@@ -240,42 +365,66 @@ ENTRY(chacha20_4block_xor_neon)
	shl v7.4s, v19.4s, #7

	sri v4.4s, v16.4s, #25
	ror a4, a4, #25
	sri v5.4s, v17.4s, #25
	ror a5, a5, #25
	sri v6.4s, v18.4s, #25
	ror a6, a6, #25
	sri v7.4s, v19.4s, #25
	ror a7, a7, #25

	// x0 += x5, x15 = rotl32(x15 ^ x0, 16)
	// x1 += x6, x12 = rotl32(x12 ^ x1, 16)
	// x2 += x7, x13 = rotl32(x13 ^ x2, 16)
	// x3 += x4, x14 = rotl32(x14 ^ x3, 16)
	add v0.4s, v0.4s, v5.4s
	add a0, a0, a5
	add v1.4s, v1.4s, v6.4s
	add a1, a1, a6
	add v2.4s, v2.4s, v7.4s
	add a2, a2, a7
	add v3.4s, v3.4s, v4.4s
	add a3, a3, a4

	eor v15.16b, v15.16b, v0.16b
	eor a15, a15, a0
	eor v12.16b, v12.16b, v1.16b
	eor a12, a12, a1
	eor v13.16b, v13.16b, v2.16b
	eor a13, a13, a2
	eor v14.16b, v14.16b, v3.16b
	eor a14, a14, a3

	rev32 v15.8h, v15.8h
	ror a15, a15, #16
	rev32 v12.8h, v12.8h
	ror a12, a12, #16
	rev32 v13.8h, v13.8h
	ror a13, a13, #16
	rev32 v14.8h, v14.8h
	ror a14, a14, #16

	// x10 += x15, x5 = rotl32(x5 ^ x10, 12)
	// x11 += x12, x6 = rotl32(x6 ^ x11, 12)
	// x8 += x13, x7 = rotl32(x7 ^ x8, 12)
	// x9 += x14, x4 = rotl32(x4 ^ x9, 12)
	add v10.4s, v10.4s, v15.4s
	add a10, a10, a15
	add v11.4s, v11.4s, v12.4s
	add a11, a11, a12
	add v8.4s, v8.4s, v13.4s
	add a8, a8, a13
	add v9.4s, v9.4s, v14.4s
	add a9, a9, a14

	eor v16.16b, v5.16b, v10.16b
	eor a5, a5, a10
	eor v17.16b, v6.16b, v11.16b
	eor a6, a6, a11
	eor v18.16b, v7.16b, v8.16b
	eor a7, a7, a8
	eor v19.16b, v4.16b, v9.16b
	eor a4, a4, a9

	shl v5.4s, v16.4s, #12
	shl v6.4s, v17.4s, #12

@@ -283,42 +432,66 @@ ENTRY(chacha20_4block_xor_neon)
	shl v4.4s, v19.4s, #12

	sri v5.4s, v16.4s, #20
	ror a5, a5, #20
	sri v6.4s, v17.4s, #20
	ror a6, a6, #20
	sri v7.4s, v18.4s, #20
	ror a7, a7, #20
	sri v4.4s, v19.4s, #20
	ror a4, a4, #20

	// x0 += x5, x15 = rotl32(x15 ^ x0, 8)
	// x1 += x6, x12 = rotl32(x12 ^ x1, 8)
	// x2 += x7, x13 = rotl32(x13 ^ x2, 8)
	// x3 += x4, x14 = rotl32(x14 ^ x3, 8)
	add v0.4s, v0.4s, v5.4s
	add a0, a0, a5
	add v1.4s, v1.4s, v6.4s
	add a1, a1, a6
	add v2.4s, v2.4s, v7.4s
	add a2, a2, a7
	add v3.4s, v3.4s, v4.4s
	add a3, a3, a4

	eor v15.16b, v15.16b, v0.16b
	eor a15, a15, a0
	eor v12.16b, v12.16b, v1.16b
	eor a12, a12, a1
	eor v13.16b, v13.16b, v2.16b
	eor a13, a13, a2
	eor v14.16b, v14.16b, v3.16b
	eor a14, a14, a3

	tbl v15.16b, {v15.16b}, v31.16b
	ror a15, a15, #24
	tbl v12.16b, {v12.16b}, v31.16b
	ror a12, a12, #24
	tbl v13.16b, {v13.16b}, v31.16b
	ror a13, a13, #24
	tbl v14.16b, {v14.16b}, v31.16b
	ror a14, a14, #24

	// x10 += x15, x5 = rotl32(x5 ^ x10, 7)
	// x11 += x12, x6 = rotl32(x6 ^ x11, 7)
	// x8 += x13, x7 = rotl32(x7 ^ x8, 7)
	// x9 += x14, x4 = rotl32(x4 ^ x9, 7)
	add v10.4s, v10.4s, v15.4s
	add a10, a10, a15
	add v11.4s, v11.4s, v12.4s
	add a11, a11, a12
	add v8.4s, v8.4s, v13.4s
	add a8, a8, a13
	add v9.4s, v9.4s, v14.4s
	add a9, a9, a14

	eor v16.16b, v5.16b, v10.16b
	eor a5, a5, a10
	eor v17.16b, v6.16b, v11.16b
	eor a6, a6, a11
	eor v18.16b, v7.16b, v8.16b
	eor a7, a7, a8
	eor v19.16b, v4.16b, v9.16b
	eor a4, a4, a9

	shl v5.4s, v16.4s, #7
	shl v6.4s, v17.4s, #7

@@ -326,11 +499,15 @@ ENTRY(chacha20_4block_xor_neon)
	shl v4.4s, v19.4s, #7

	sri v5.4s, v16.4s, #25
	ror a5, a5, #25
	sri v6.4s, v17.4s, #25
	ror a6, a6, #25
	sri v7.4s, v18.4s, #25
	ror a7, a7, #25
	sri v4.4s, v19.4s, #25
	ror a4, a4, #25

	subs x3, x3, #1
	subs w3, w3, #2
	b.ne .Ldoubleround4

	ld4r {v16.4s-v19.4s}, [x0], #16

@@ -344,9 +521,17 @@ ENTRY(chacha20_4block_xor_neon)
	// x2[0-3] += s0[2]
	// x3[0-3] += s0[3]
	add v0.4s, v0.4s, v16.4s
	mov w6, v16.s[0]
	mov w7, v17.s[0]
	add v1.4s, v1.4s, v17.4s
	mov w8, v18.s[0]
	mov w9, v19.s[0]
	add v2.4s, v2.4s, v18.4s
	add a0, a0, w6
	add a1, a1, w7
	add v3.4s, v3.4s, v19.4s
	add a2, a2, w8
	add a3, a3, w9

	ld4r {v24.4s-v27.4s}, [x0], #16
	ld4r {v28.4s-v31.4s}, [x0]

@@ -356,95 +541,304 @@ ENTRY(chacha20_4block_xor_neon)
	// x6[0-3] += s1[2]
	// x7[0-3] += s1[3]
	add v4.4s, v4.4s, v20.4s
	mov w6, v20.s[0]
	mov w7, v21.s[0]
	add v5.4s, v5.4s, v21.4s
	mov w8, v22.s[0]
	mov w9, v23.s[0]
	add v6.4s, v6.4s, v22.4s
	add a4, a4, w6
	add a5, a5, w7
	add v7.4s, v7.4s, v23.4s
	add a6, a6, w8
	add a7, a7, w9

	// x8[0-3] += s2[0]
	// x9[0-3] += s2[1]
	// x10[0-3] += s2[2]
	// x11[0-3] += s2[3]
	add v8.4s, v8.4s, v24.4s
	mov w6, v24.s[0]
	mov w7, v25.s[0]
	add v9.4s, v9.4s, v25.4s
	mov w8, v26.s[0]
	mov w9, v27.s[0]
	add v10.4s, v10.4s, v26.4s
	add a8, a8, w6
	add a9, a9, w7
	add v11.4s, v11.4s, v27.4s
	add a10, a10, w8
	add a11, a11, w9

	// x12[0-3] += s3[0]
	// x13[0-3] += s3[1]
	// x14[0-3] += s3[2]
	// x15[0-3] += s3[3]
	add v12.4s, v12.4s, v28.4s
	mov w6, v28.s[0]
	mov w7, v29.s[0]
	add v13.4s, v13.4s, v29.4s
	mov w8, v30.s[0]
	mov w9, v31.s[0]
	add v14.4s, v14.4s, v30.4s
	add a12, a12, w6
	add a13, a13, w7
	add v15.4s, v15.4s, v31.4s
	add a14, a14, w8
	add a15, a15, w9

	// interleave 32-bit words in state n, n+1
	ldp w6, w7, [x2], #64
	zip1 v16.4s, v0.4s, v1.4s
	ldp w8, w9, [x2, #-56]
	eor a0, a0, w6
	zip2 v17.4s, v0.4s, v1.4s
	eor a1, a1, w7
	zip1 v18.4s, v2.4s, v3.4s
	eor a2, a2, w8
	zip2 v19.4s, v2.4s, v3.4s
	eor a3, a3, w9
	ldp w6, w7, [x2, #-48]
	zip1 v20.4s, v4.4s, v5.4s
	ldp w8, w9, [x2, #-40]
	eor a4, a4, w6
	zip2 v21.4s, v4.4s, v5.4s
	eor a5, a5, w7
	zip1 v22.4s, v6.4s, v7.4s
	eor a6, a6, w8
	zip2 v23.4s, v6.4s, v7.4s
	eor a7, a7, w9
	ldp w6, w7, [x2, #-32]
	zip1 v24.4s, v8.4s, v9.4s
	ldp w8, w9, [x2, #-24]
	eor a8, a8, w6
	zip2 v25.4s, v8.4s, v9.4s
	eor a9, a9, w7
	zip1 v26.4s, v10.4s, v11.4s
	eor a10, a10, w8
	zip2 v27.4s, v10.4s, v11.4s
	eor a11, a11, w9
	ldp w6, w7, [x2, #-16]
	zip1 v28.4s, v12.4s, v13.4s
	ldp w8, w9, [x2, #-8]
	eor a12, a12, w6
	zip2 v29.4s, v12.4s, v13.4s
	eor a13, a13, w7
	zip1 v30.4s, v14.4s, v15.4s
	eor a14, a14, w8
	zip2 v31.4s, v14.4s, v15.4s
	eor a15, a15, w9

	mov x3, #64
	subs x5, x4, #128
	add x6, x5, x2
	csel x3, x3, xzr, ge
	csel x2, x2, x6, ge

	// interleave 64-bit words in state n, n+2
	zip1 v0.2d, v16.2d, v18.2d
	zip2 v4.2d, v16.2d, v18.2d
	stp a0, a1, [x1], #64
	zip1 v8.2d, v17.2d, v19.2d
	zip2 v12.2d, v17.2d, v19.2d
	ld1 {v16.16b-v19.16b}, [x2], #64
	stp a2, a3, [x1, #-56]
	ld1 {v16.16b-v19.16b}, [x2], x3

	subs x6, x4, #192
	ccmp x3, xzr, #4, lt
	add x7, x6, x2
	csel x3, x3, xzr, eq
	csel x2, x2, x7, eq

	zip1 v1.2d, v20.2d, v22.2d
	zip2 v5.2d, v20.2d, v22.2d
	stp a4, a5, [x1, #-48]
	zip1 v9.2d, v21.2d, v23.2d
	zip2 v13.2d, v21.2d, v23.2d
	ld1 {v20.16b-v23.16b}, [x2], #64
	stp a6, a7, [x1, #-40]
	ld1 {v20.16b-v23.16b}, [x2], x3

	subs x7, x4, #256
	ccmp x3, xzr, #4, lt
	add x8, x7, x2
	csel x3, x3, xzr, eq
	csel x2, x2, x8, eq

	zip1 v2.2d, v24.2d, v26.2d
	zip2 v6.2d, v24.2d, v26.2d
	stp a8, a9, [x1, #-32]
	zip1 v10.2d, v25.2d, v27.2d
	zip2 v14.2d, v25.2d, v27.2d
	ld1 {v24.16b-v27.16b}, [x2], #64
	stp a10, a11, [x1, #-24]
	ld1 {v24.16b-v27.16b}, [x2], x3

	subs x8, x4, #320
	ccmp x3, xzr, #4, lt
	add x9, x8, x2
	csel x2, x2, x9, eq

	zip1 v3.2d, v28.2d, v30.2d
	zip2 v7.2d, v28.2d, v30.2d
	stp a12, a13, [x1, #-16]
	zip1 v11.2d, v29.2d, v31.2d
	zip2 v15.2d, v29.2d, v31.2d
	stp a14, a15, [x1, #-8]
	ld1 {v28.16b-v31.16b}, [x2]

	// xor with corresponding input, write to output
	tbnz x5, #63, 0f
	eor v16.16b, v16.16b, v0.16b
	eor v17.16b, v17.16b, v1.16b
	eor v18.16b, v18.16b, v2.16b
	eor v19.16b, v19.16b, v3.16b
	st1 {v16.16b-v19.16b}, [x1], #64
	cbz x5, .Lout

	tbnz x6, #63, 1f
	eor v20.16b, v20.16b, v4.16b
	eor v21.16b, v21.16b, v5.16b
	st1 {v16.16b-v19.16b}, [x1], #64
	eor v22.16b, v22.16b, v6.16b
	eor v23.16b, v23.16b, v7.16b
	st1 {v20.16b-v23.16b}, [x1], #64
	cbz x6, .Lout

	tbnz x7, #63, 2f
	eor v24.16b, v24.16b, v8.16b
	eor v25.16b, v25.16b, v9.16b
	st1 {v20.16b-v23.16b}, [x1], #64
	eor v26.16b, v26.16b, v10.16b
	eor v27.16b, v27.16b, v11.16b
	eor v28.16b, v28.16b, v12.16b
	st1 {v24.16b-v27.16b}, [x1], #64
	cbz x7, .Lout

	tbnz x8, #63, 3f
	eor v28.16b, v28.16b, v12.16b
	eor v29.16b, v29.16b, v13.16b
	eor v30.16b, v30.16b, v14.16b
	eor v31.16b, v31.16b, v15.16b
	st1 {v28.16b-v31.16b}, [x1]

.Lout:	frame_pop
	ret
ENDPROC(chacha20_4block_xor_neon)

CTRINC: .word 0, 1, 2, 3
	// fewer than 128 bytes of in/output
0:	ld1 {v8.16b}, [x10]
	ld1 {v9.16b}, [x11]
	movi v10.16b, #16
	sub x2, x1, #64
	add x1, x1, x5
	ld1 {v16.16b-v19.16b}, [x2]
	tbl v4.16b, {v0.16b-v3.16b}, v8.16b
	tbx v20.16b, {v16.16b-v19.16b}, v9.16b
	add v8.16b, v8.16b, v10.16b
	add v9.16b, v9.16b, v10.16b
	tbl v5.16b, {v0.16b-v3.16b}, v8.16b
	tbx v21.16b, {v16.16b-v19.16b}, v9.16b
	add v8.16b, v8.16b, v10.16b
	add v9.16b, v9.16b, v10.16b
	tbl v6.16b, {v0.16b-v3.16b}, v8.16b
	tbx v22.16b, {v16.16b-v19.16b}, v9.16b
	add v8.16b, v8.16b, v10.16b
	add v9.16b, v9.16b, v10.16b
	tbl v7.16b, {v0.16b-v3.16b}, v8.16b
	tbx v23.16b, {v16.16b-v19.16b}, v9.16b

	eor v20.16b, v20.16b, v4.16b
	eor v21.16b, v21.16b, v5.16b
	eor v22.16b, v22.16b, v6.16b
	eor v23.16b, v23.16b, v7.16b
	st1 {v20.16b-v23.16b}, [x1]
	b .Lout

	// fewer than 192 bytes of in/output
1:	ld1 {v8.16b}, [x10]
	ld1 {v9.16b}, [x11]
	movi v10.16b, #16
	add x1, x1, x6
	tbl v0.16b, {v4.16b-v7.16b}, v8.16b
	tbx v20.16b, {v16.16b-v19.16b}, v9.16b
	add v8.16b, v8.16b, v10.16b
	add v9.16b, v9.16b, v10.16b
	tbl v1.16b, {v4.16b-v7.16b}, v8.16b
	tbx v21.16b, {v16.16b-v19.16b}, v9.16b
	add v8.16b, v8.16b, v10.16b
	add v9.16b, v9.16b, v10.16b
	tbl v2.16b, {v4.16b-v7.16b}, v8.16b
	tbx v22.16b, {v16.16b-v19.16b}, v9.16b
	add v8.16b, v8.16b, v10.16b
	add v9.16b, v9.16b, v10.16b
	tbl v3.16b, {v4.16b-v7.16b}, v8.16b
	tbx v23.16b, {v16.16b-v19.16b}, v9.16b

	eor v20.16b, v20.16b, v0.16b
	eor v21.16b, v21.16b, v1.16b
	eor v22.16b, v22.16b, v2.16b
	eor v23.16b, v23.16b, v3.16b
	st1 {v20.16b-v23.16b}, [x1]
	b .Lout

	// fewer than 256 bytes of in/output
2:	ld1 {v4.16b}, [x10]
	ld1 {v5.16b}, [x11]
	movi v6.16b, #16
	add x1, x1, x7
	tbl v0.16b, {v8.16b-v11.16b}, v4.16b
	tbx v24.16b, {v20.16b-v23.16b}, v5.16b
	add v4.16b, v4.16b, v6.16b
	add v5.16b, v5.16b, v6.16b
	tbl v1.16b, {v8.16b-v11.16b}, v4.16b
	tbx v25.16b, {v20.16b-v23.16b}, v5.16b
	add v4.16b, v4.16b, v6.16b
	add v5.16b, v5.16b, v6.16b
	tbl v2.16b, {v8.16b-v11.16b}, v4.16b
	tbx v26.16b, {v20.16b-v23.16b}, v5.16b
	add v4.16b, v4.16b, v6.16b
	add v5.16b, v5.16b, v6.16b
	tbl v3.16b, {v8.16b-v11.16b}, v4.16b
	tbx v27.16b, {v20.16b-v23.16b}, v5.16b

	eor v24.16b, v24.16b, v0.16b
	eor v25.16b, v25.16b, v1.16b
	eor v26.16b, v26.16b, v2.16b
	eor v27.16b, v27.16b, v3.16b
	st1 {v24.16b-v27.16b}, [x1]
	b .Lout

	// fewer than 320 bytes of in/output
3:	ld1 {v4.16b}, [x10]
	ld1 {v5.16b}, [x11]
|
||||
movi v6.16b, #16
|
||||
add x1, x1, x8
|
||||
tbl v0.16b, {v12.16b-v15.16b}, v4.16b
|
||||
tbx v28.16b, {v24.16b-v27.16b}, v5.16b
|
||||
add v4.16b, v4.16b, v6.16b
|
||||
add v5.16b, v5.16b, v6.16b
|
||||
tbl v1.16b, {v12.16b-v15.16b}, v4.16b
|
||||
tbx v29.16b, {v24.16b-v27.16b}, v5.16b
|
||||
add v4.16b, v4.16b, v6.16b
|
||||
add v5.16b, v5.16b, v6.16b
|
||||
tbl v2.16b, {v12.16b-v15.16b}, v4.16b
|
||||
tbx v30.16b, {v24.16b-v27.16b}, v5.16b
|
||||
add v4.16b, v4.16b, v6.16b
|
||||
add v5.16b, v5.16b, v6.16b
|
||||
tbl v3.16b, {v12.16b-v15.16b}, v4.16b
|
||||
tbx v31.16b, {v24.16b-v27.16b}, v5.16b
|
||||
|
||||
eor v28.16b, v28.16b, v0.16b
|
||||
eor v29.16b, v29.16b, v1.16b
|
||||
eor v30.16b, v30.16b, v2.16b
|
||||
eor v31.16b, v31.16b, v3.16b
|
||||
st1 {v28.16b-v31.16b}, [x1]
|
||||
b .Lout
|
||||
ENDPROC(chacha_4block_xor_neon)
|
||||
|
||||
.section ".rodata", "a", %progbits
|
||||
.align L1_CACHE_SHIFT
|
||||
.Lpermute:
|
||||
.set .Li, 0
|
||||
.rept 192
|
||||
.byte (.Li - 64)
|
||||
.set .Li, .Li + 1
|
||||
.endr
|
||||
|
||||
CTRINC: .word 1, 2, 3, 4
|
||||
ROT8: .word 0x02010003, 0x06050407, 0x0a09080b, 0x0e0d0c0f
|
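For orientation, the 16-word state these routines take in x0 is the standard RFC 7539 layout: four constant words, eight key words, a block counter in word 12 and the nonce in words 13-15; the CTRINC vector above supplies per-lane counter offsets for the parallel blocks. A minimal sketch of that layout (a hypothetical helper for illustration, not the kernel's crypto_chacha_init()):

#include <stdint.h>

/* Sketch: ChaCha state layout per RFC 7539, matching what the NEON
 * routines above expect.  Hypothetical helper, shown for reference. */
static void chacha_state_init(uint32_t state[16], const uint32_t key[8],
			      const uint32_t iv[4])
{
	state[0]  = 0x61707865;		/* "expa" */
	state[1]  = 0x3320646e;		/* "nd 3" */
	state[2]  = 0x79622d32;		/* "2-by" */
	state[3]  = 0x6b206574;		/* "te k" */
	for (int i = 0; i < 8; i++)
		state[4 + i] = key[i];	/* 256-bit key */
	state[12] = iv[0];		/* block counter; the glue code below
					 * bumps this by 1 or 5 per call */
	state[13] = iv[1];
	state[14] = iv[2];
	state[15] = iv[3];
}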
arch/arm64/crypto/chacha-neon-glue.c (new file, 198 lines)
@@ -0,0 +1,198 @@
/*
 * ARM NEON accelerated ChaCha and XChaCha stream ciphers,
 * including ChaCha20 (RFC7539)
 *
 * Copyright (C) 2016 - 2017 Linaro, Ltd. <ard.biesheuvel@linaro.org>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 * Based on:
 * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
 *
 * Copyright (C) 2015 Martin Willi
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */

#include <crypto/algapi.h>
#include <crypto/chacha.h>
#include <crypto/internal/skcipher.h>
#include <linux/kernel.h>
#include <linux/module.h>

#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>

asmlinkage void chacha_block_xor_neon(u32 *state, u8 *dst, const u8 *src,
				      int nrounds);
asmlinkage void chacha_4block_xor_neon(u32 *state, u8 *dst, const u8 *src,
				       int nrounds, int bytes);
asmlinkage void hchacha_block_neon(const u32 *state, u32 *out, int nrounds);

static void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
			  int bytes, int nrounds)
{
	while (bytes > 0) {
		int l = min(bytes, CHACHA_BLOCK_SIZE * 5);

		if (l <= CHACHA_BLOCK_SIZE) {
			u8 buf[CHACHA_BLOCK_SIZE];

			memcpy(buf, src, l);
			chacha_block_xor_neon(state, buf, buf, nrounds);
			memcpy(dst, buf, l);
			state[12] += 1;
			break;
		}
		chacha_4block_xor_neon(state, dst, src, nrounds, l);
		bytes -= CHACHA_BLOCK_SIZE * 5;
		src += CHACHA_BLOCK_SIZE * 5;
		dst += CHACHA_BLOCK_SIZE * 5;
		state[12] += 5;
	}
}

static int chacha_neon_stream_xor(struct skcipher_request *req,
				  struct chacha_ctx *ctx, u8 *iv)
{
	struct skcipher_walk walk;
	u32 state[16];
	int err;

	err = skcipher_walk_virt(&walk, req, false);

	crypto_chacha_init(state, ctx, iv);

	while (walk.nbytes > 0) {
		unsigned int nbytes = walk.nbytes;

		if (nbytes < walk.total)
			nbytes = rounddown(nbytes, walk.stride);

		kernel_neon_begin();
		chacha_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
			      nbytes, ctx->nrounds);
		kernel_neon_end();
		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
	}

	return err;
}

static int chacha_neon(struct skcipher_request *req)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);

	if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd())
		return crypto_chacha_crypt(req);

	return chacha_neon_stream_xor(req, ctx, req->iv);
}

static int xchacha_neon(struct skcipher_request *req)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
	struct chacha_ctx subctx;
	u32 state[16];
	u8 real_iv[16];

	if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd())
		return crypto_xchacha_crypt(req);

	crypto_chacha_init(state, ctx, req->iv);

	kernel_neon_begin();
	hchacha_block_neon(state, subctx.key, ctx->nrounds);
	kernel_neon_end();
	subctx.nrounds = ctx->nrounds;

	memcpy(&real_iv[0], req->iv + 24, 8);
	memcpy(&real_iv[8], req->iv + 16, 8);
	return chacha_neon_stream_xor(req, &subctx, real_iv);
}

static struct skcipher_alg algs[] = {
	{
		.base.cra_name		= "chacha20",
		.base.cra_driver_name	= "chacha20-neon",
		.base.cra_priority	= 300,
		.base.cra_blocksize	= 1,
		.base.cra_ctxsize	= sizeof(struct chacha_ctx),
		.base.cra_module	= THIS_MODULE,

		.min_keysize		= CHACHA_KEY_SIZE,
		.max_keysize		= CHACHA_KEY_SIZE,
		.ivsize			= CHACHA_IV_SIZE,
		.chunksize		= CHACHA_BLOCK_SIZE,
		.walksize		= 5 * CHACHA_BLOCK_SIZE,
		.setkey			= crypto_chacha20_setkey,
		.encrypt		= chacha_neon,
		.decrypt		= chacha_neon,
	}, {
		.base.cra_name		= "xchacha20",
		.base.cra_driver_name	= "xchacha20-neon",
		.base.cra_priority	= 300,
		.base.cra_blocksize	= 1,
		.base.cra_ctxsize	= sizeof(struct chacha_ctx),
		.base.cra_module	= THIS_MODULE,

		.min_keysize		= CHACHA_KEY_SIZE,
		.max_keysize		= CHACHA_KEY_SIZE,
		.ivsize			= XCHACHA_IV_SIZE,
		.chunksize		= CHACHA_BLOCK_SIZE,
		.walksize		= 5 * CHACHA_BLOCK_SIZE,
		.setkey			= crypto_chacha20_setkey,
		.encrypt		= xchacha_neon,
		.decrypt		= xchacha_neon,
	}, {
		.base.cra_name		= "xchacha12",
		.base.cra_driver_name	= "xchacha12-neon",
		.base.cra_priority	= 300,
		.base.cra_blocksize	= 1,
		.base.cra_ctxsize	= sizeof(struct chacha_ctx),
		.base.cra_module	= THIS_MODULE,

		.min_keysize		= CHACHA_KEY_SIZE,
		.max_keysize		= CHACHA_KEY_SIZE,
		.ivsize			= XCHACHA_IV_SIZE,
		.chunksize		= CHACHA_BLOCK_SIZE,
		.walksize		= 5 * CHACHA_BLOCK_SIZE,
		.setkey			= crypto_chacha12_setkey,
		.encrypt		= xchacha_neon,
		.decrypt		= xchacha_neon,
	}
};

static int __init chacha_simd_mod_init(void)
{
	if (!(elf_hwcap & HWCAP_ASIMD))
		return -ENODEV;

	return crypto_register_skciphers(algs, ARRAY_SIZE(algs));
}

static void __exit chacha_simd_mod_fini(void)
{
	crypto_unregister_skciphers(algs, ARRAY_SIZE(algs));
}

module_init(chacha_simd_mod_init);
module_exit(chacha_simd_mod_fini);

MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (NEON accelerated)");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-neon");
MODULE_ALIAS_CRYPTO("xchacha20");
MODULE_ALIAS_CRYPTO("xchacha20-neon");
MODULE_ALIAS_CRYPTO("xchacha12");
MODULE_ALIAS_CRYPTO("xchacha12-neon");
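xchacha_neon() above is the XChaCha construction: HChaCha is run over the key and the first 16 bytes of the extended nonce to derive a one-time subkey, after which ordinary ChaCha runs with the remaining 8 nonce bytes and the block counter as its IV (hence the two memcpy() calls building real_iv). A portable sketch of HChaCha under the standard quarter-round, for reference only (not the kernel's hchacha_block()):

#include <stdint.h>
#include <string.h>

static uint32_t rotl32(uint32_t v, int n)
{
	return (v << n) | (v >> (32 - n));
}

/* One ChaCha quarter-round on four state words. */
#define QR(a, b, c, d) do {				\
	a += b; d = rotl32(d ^ a, 16);			\
	c += d; b = rotl32(b ^ c, 12);			\
	a += b; d = rotl32(d ^ a, 8);			\
	c += d; b = rotl32(b ^ c, 7);			\
} while (0)

/* Sketch of HChaCha: permute the state, then output words 0..3 and
 * 12..15 as the derived 256-bit subkey (no final feed-forward add). */
static void hchacha_sketch(const uint32_t state[16], uint32_t out[8],
			   int nrounds)
{
	uint32_t x[16];

	memcpy(x, state, sizeof(x));
	for (int i = 0; i < nrounds; i += 2) {
		QR(x[0], x[4], x[8],  x[12]);	/* column rounds */
		QR(x[1], x[5], x[9],  x[13]);
		QR(x[2], x[6], x[10], x[14]);
		QR(x[3], x[7], x[11], x[15]);
		QR(x[0], x[5], x[10], x[15]);	/* diagonal rounds */
		QR(x[1], x[6], x[11], x[12]);
		QR(x[2], x[7], x[8],  x[13]);
		QR(x[3], x[4], x[9],  x[14]);
	}
	memcpy(&out[0], &x[0],  4 * sizeof(uint32_t));
	memcpy(&out[4], &x[12], 4 * sizeof(uint32_t));
}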
@@ -1,133 +0,0 @@
/*
 * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
 *
 * Copyright (C) 2016 - 2017 Linaro, Ltd. <ard.biesheuvel@linaro.org>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License version 2 as
 * published by the Free Software Foundation.
 *
 * Based on:
 * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
 *
 * Copyright (C) 2015 Martin Willi
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */

#include <crypto/algapi.h>
#include <crypto/chacha20.h>
#include <crypto/internal/skcipher.h>
#include <linux/kernel.h>
#include <linux/module.h>

#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>

asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);

static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
			    unsigned int bytes)
{
	u8 buf[CHACHA20_BLOCK_SIZE];

	while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
		kernel_neon_begin();
		chacha20_4block_xor_neon(state, dst, src);
		kernel_neon_end();
		bytes -= CHACHA20_BLOCK_SIZE * 4;
		src += CHACHA20_BLOCK_SIZE * 4;
		dst += CHACHA20_BLOCK_SIZE * 4;
		state[12] += 4;
	}

	if (!bytes)
		return;

	kernel_neon_begin();
	while (bytes >= CHACHA20_BLOCK_SIZE) {
		chacha20_block_xor_neon(state, dst, src);
		bytes -= CHACHA20_BLOCK_SIZE;
		src += CHACHA20_BLOCK_SIZE;
		dst += CHACHA20_BLOCK_SIZE;
		state[12]++;
	}
	if (bytes) {
		memcpy(buf, src, bytes);
		chacha20_block_xor_neon(state, buf, buf);
		memcpy(dst, buf, bytes);
	}
	kernel_neon_end();
}

static int chacha20_neon(struct skcipher_request *req)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
	struct skcipher_walk walk;
	u32 state[16];
	int err;

	if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE)
		return crypto_chacha20_crypt(req);

	err = skcipher_walk_virt(&walk, req, false);

	crypto_chacha20_init(state, ctx, walk.iv);

	while (walk.nbytes > 0) {
		unsigned int nbytes = walk.nbytes;

		if (nbytes < walk.total)
			nbytes = round_down(nbytes, walk.stride);

		chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
				nbytes);
		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
	}

	return err;
}

static struct skcipher_alg alg = {
	.base.cra_name		= "chacha20",
	.base.cra_driver_name	= "chacha20-neon",
	.base.cra_priority	= 300,
	.base.cra_blocksize	= 1,
	.base.cra_ctxsize	= sizeof(struct chacha20_ctx),
	.base.cra_module	= THIS_MODULE,

	.min_keysize		= CHACHA20_KEY_SIZE,
	.max_keysize		= CHACHA20_KEY_SIZE,
	.ivsize			= CHACHA20_IV_SIZE,
	.chunksize		= CHACHA20_BLOCK_SIZE,
	.walksize		= 4 * CHACHA20_BLOCK_SIZE,
	.setkey			= crypto_chacha20_setkey,
	.encrypt		= chacha20_neon,
	.decrypt		= chacha20_neon,
};

static int __init chacha20_simd_mod_init(void)
{
	if (!(elf_hwcap & HWCAP_ASIMD))
		return -ENODEV;

	return crypto_register_skcipher(&alg);
}

static void __exit chacha20_simd_mod_fini(void)
{
	crypto_unregister_skcipher(&alg);
}

module_init(chacha20_simd_mod_init);
module_exit(chacha20_simd_mod_fini);

MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("chacha20");
arch/arm64/crypto/nh-neon-core.S (new file, 103 lines)
@@ -0,0 +1,103 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * NH - ε-almost-universal hash function, ARM64 NEON accelerated version
 *
 * Copyright 2018 Google LLC
 *
 * Author: Eric Biggers <ebiggers@google.com>
 */

#include <linux/linkage.h>

	KEY		.req	x0
	MESSAGE		.req	x1
	MESSAGE_LEN	.req	x2
	HASH		.req	x3

	PASS0_SUMS	.req	v0
	PASS1_SUMS	.req	v1
	PASS2_SUMS	.req	v2
	PASS3_SUMS	.req	v3
	K0		.req	v4
	K1		.req	v5
	K2		.req	v6
	K3		.req	v7
	T0		.req	v8
	T1		.req	v9
	T2		.req	v10
	T3		.req	v11
	T4		.req	v12
	T5		.req	v13
	T6		.req	v14
	T7		.req	v15

.macro _nh_stride	k0, k1, k2, k3

	// Load next message stride
	ld1		{T3.16b}, [MESSAGE], #16

	// Load next key stride
	ld1		{\k3\().4s}, [KEY], #16

	// Add message words to key words
	add		T0.4s, T3.4s, \k0\().4s
	add		T1.4s, T3.4s, \k1\().4s
	add		T2.4s, T3.4s, \k2\().4s
	add		T3.4s, T3.4s, \k3\().4s

	// Multiply 32x32 => 64 and accumulate
	mov		T4.d[0], T0.d[1]
	mov		T5.d[0], T1.d[1]
	mov		T6.d[0], T2.d[1]
	mov		T7.d[0], T3.d[1]
	umlal		PASS0_SUMS.2d, T0.2s, T4.2s
	umlal		PASS1_SUMS.2d, T1.2s, T5.2s
	umlal		PASS2_SUMS.2d, T2.2s, T6.2s
	umlal		PASS3_SUMS.2d, T3.2s, T7.2s
.endm

/*
 * void nh_neon(const u32 *key, const u8 *message, size_t message_len,
 *		u8 hash[NH_HASH_BYTES])
 *
 * It's guaranteed that message_len % 16 == 0.
 */
ENTRY(nh_neon)

	ld1		{K0.4s,K1.4s}, [KEY], #32
	movi		PASS0_SUMS.2d, #0
	movi		PASS1_SUMS.2d, #0
	ld1		{K2.4s}, [KEY], #16
	movi		PASS2_SUMS.2d, #0
	movi		PASS3_SUMS.2d, #0

	subs		MESSAGE_LEN, MESSAGE_LEN, #64
	blt		.Lloop4_done
.Lloop4:
	_nh_stride	K0, K1, K2, K3
	_nh_stride	K1, K2, K3, K0
	_nh_stride	K2, K3, K0, K1
	_nh_stride	K3, K0, K1, K2
	subs		MESSAGE_LEN, MESSAGE_LEN, #64
	bge		.Lloop4

.Lloop4_done:
	ands		MESSAGE_LEN, MESSAGE_LEN, #63
	beq		.Ldone
	_nh_stride	K0, K1, K2, K3

	subs		MESSAGE_LEN, MESSAGE_LEN, #16
	beq		.Ldone
	_nh_stride	K1, K2, K3, K0

	subs		MESSAGE_LEN, MESSAGE_LEN, #16
	beq		.Ldone
	_nh_stride	K2, K3, K0, K1

.Ldone:
	// Sum the accumulators for each pass, then store the sums to 'hash'
	addp		T0.2d, PASS0_SUMS.2d, PASS1_SUMS.2d
	addp		T1.2d, PASS2_SUMS.2d, PASS3_SUMS.2d
	st1		{T0.16b,T1.16b}, [HASH]
	ret
ENDPROC(nh_neon)
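Each _nh_stride above is one 16-byte step of NH: the same four message words are added to four staggered 4-word key windows, and each pass accumulates two 32x32->64 products into its own pair of 64-bit lanes. A scalar sketch of one stride (illustrative only, not the kernel's nh_generic()):

#include <stdint.h>

/* Sketch of one 16-byte NH stride with four passes; 'k' points at the
 * key window for this stride, each pass using the next 4 key words,
 * mirroring the add + umlal sequence in the macro above. */
static void nh_stride_sketch(const uint32_t *k, const uint32_t m[4],
			     uint64_t sums[4])
{
	for (int pass = 0; pass < 4; pass++) {
		uint32_t t0 = m[0] + k[4 * pass + 0];
		uint32_t t1 = m[1] + k[4 * pass + 1];
		uint32_t t2 = m[2] + k[4 * pass + 2];
		uint32_t t3 = m[3] + k[4 * pass + 3];

		/* 32x32 -> 64 multiply-accumulate, as the umlal does */
		sums[pass] += (uint64_t)t0 * t2 + (uint64_t)t1 * t3;
	}
}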
arch/arm64/crypto/nhpoly1305-neon-glue.c (new file, 77 lines)
@@ -0,0 +1,77 @@
// SPDX-License-Identifier: GPL-2.0
/*
 * NHPoly1305 - ε-almost-∆-universal hash function for Adiantum
 * (ARM64 NEON accelerated version)
 *
 * Copyright 2018 Google LLC
 */

#include <asm/neon.h>
#include <asm/simd.h>
#include <crypto/internal/hash.h>
#include <crypto/nhpoly1305.h>
#include <linux/module.h>

asmlinkage void nh_neon(const u32 *key, const u8 *message, size_t message_len,
			u8 hash[NH_HASH_BYTES]);

/* wrapper to avoid indirect call to assembly, which doesn't work with CFI */
static void _nh_neon(const u32 *key, const u8 *message, size_t message_len,
		     __le64 hash[NH_NUM_PASSES])
{
	nh_neon(key, message, message_len, (u8 *)hash);
}

static int nhpoly1305_neon_update(struct shash_desc *desc,
				  const u8 *src, unsigned int srclen)
{
	if (srclen < 64 || !may_use_simd())
		return crypto_nhpoly1305_update(desc, src, srclen);

	do {
		unsigned int n = min_t(unsigned int, srclen, PAGE_SIZE);

		kernel_neon_begin();
		crypto_nhpoly1305_update_helper(desc, src, n, _nh_neon);
		kernel_neon_end();
		src += n;
		srclen -= n;
	} while (srclen);
	return 0;
}

static struct shash_alg nhpoly1305_alg = {
	.base.cra_name		= "nhpoly1305",
	.base.cra_driver_name	= "nhpoly1305-neon",
	.base.cra_priority	= 200,
	.base.cra_ctxsize	= sizeof(struct nhpoly1305_key),
	.base.cra_module	= THIS_MODULE,
	.digestsize		= POLY1305_DIGEST_SIZE,
	.init			= crypto_nhpoly1305_init,
	.update			= nhpoly1305_neon_update,
	.final			= crypto_nhpoly1305_final,
	.setkey			= crypto_nhpoly1305_setkey,
	.descsize		= sizeof(struct nhpoly1305_state),
};

static int __init nhpoly1305_mod_init(void)
{
	if (!(elf_hwcap & HWCAP_ASIMD))
		return -ENODEV;

	return crypto_register_shash(&nhpoly1305_alg);
}

static void __exit nhpoly1305_mod_exit(void)
{
	crypto_unregister_shash(&nhpoly1305_alg);
}

module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);

MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function (NEON-accelerated)");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
MODULE_ALIAS_CRYPTO("nhpoly1305-neon");
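Once registered, the driver is reachable through the regular shash API. A hypothetical caller might look like the following sketch (error handling abbreviated; the key is assumed to be NHPOLY1305_KEY_SIZE bytes prepared by the enclosing Adiantum template or by test code):

#include <crypto/hash.h>
#include <crypto/nhpoly1305.h>

/* Hypothetical user of the "nhpoly1305" shash registered above;
 * a sketch only, not part of the patch. */
static int nhpoly1305_digest_example(const u8 *key, const u8 *data,
				     unsigned int len, u8 *digest)
{
	struct crypto_shash *tfm = crypto_alloc_shash("nhpoly1305", 0, 0);
	int err;

	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_shash_setkey(tfm, key, NHPOLY1305_KEY_SIZE);
	if (!err) {
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;
		/* Large updates hit the NEON path when SIMD is usable. */
		err = crypto_shash_digest(desc, data, len, digest);
	}
	crypto_free_shash(tfm);
	return err;
}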
@@ -137,7 +137,7 @@ static int fallback_init_cip(struct crypto_tfm *tfm)
	struct s390_aes_ctx *sctx = crypto_tfm_ctx(tfm);

	sctx->fallback.cip = crypto_alloc_cipher(name, 0,
			CRYPTO_ALG_ASYNC | CRYPTO_ALG_NEED_FALLBACK);
			CRYPTO_ALG_NEED_FALLBACK);

	if (IS_ERR(sctx->fallback.cip)) {
		pr_err("Allocating AES fallback algorithm %s failed\n",
@@ -476,11 +476,6 @@ static bool __init sparc64_has_aes_opcode(void)

static int __init aes_sparc64_mod_init(void)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(algs); i++)
		INIT_LIST_HEAD(&algs[i].cra_list);

	if (sparc64_has_aes_opcode()) {
		pr_info("Using sparc64 aes opcodes optimized AES implementation\n");
		return crypto_register_algs(algs, ARRAY_SIZE(algs));

@@ -299,11 +299,6 @@ static bool __init sparc64_has_camellia_opcode(void)

static int __init camellia_sparc64_mod_init(void)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(algs); i++)
		INIT_LIST_HEAD(&algs[i].cra_list);

	if (sparc64_has_camellia_opcode()) {
		pr_info("Using sparc64 camellia opcodes optimized CAMELLIA implementation\n");
		return crypto_register_algs(algs, ARRAY_SIZE(algs));

@@ -510,11 +510,6 @@ static bool __init sparc64_has_des_opcode(void)

static int __init des_sparc64_mod_init(void)
{
	int i;

	for (i = 0; i < ARRAY_SIZE(algs); i++)
		INIT_LIST_HEAD(&algs[i].cra_list);

	if (sparc64_has_des_opcode()) {
		pr_info("Using sparc64 des opcodes optimized DES implementation\n");
		return crypto_register_algs(algs, ARRAY_SIZE(algs));
@@ -8,6 +8,7 @@ OBJECT_FILES_NON_STANDARD := y
avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
			$(comma)4)$(comma)%ymm2,yes,no)
avx512_supported :=$(call as-instr,vpmovm2b %k1$(comma)%zmm5,yes,no)
sha1_ni_supported :=$(call as-instr,sha1msg1 %xmm0$(comma)%xmm1,yes,no)
sha256_ni_supported :=$(call as-instr,sha256msg1 %xmm0$(comma)%xmm1,yes,no)

@@ -23,7 +24,7 @@ obj-$(CONFIG_CRYPTO_CAMELLIA_X86_64) += camellia-x86_64.o
obj-$(CONFIG_CRYPTO_BLOWFISH_X86_64) += blowfish-x86_64.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64_3WAY) += twofish-x86_64-3way.o
obj-$(CONFIG_CRYPTO_CHACHA20_X86_64) += chacha20-x86_64.o
obj-$(CONFIG_CRYPTO_CHACHA20_X86_64) += chacha-x86_64.o
obj-$(CONFIG_CRYPTO_SERPENT_SSE2_X86_64) += serpent-sse2-x86_64.o
obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o

@@ -46,6 +47,9 @@ obj-$(CONFIG_CRYPTO_MORUS1280_GLUE) += morus1280_glue.o
obj-$(CONFIG_CRYPTO_MORUS640_SSE2) += morus640-sse2.o
obj-$(CONFIG_CRYPTO_MORUS1280_SSE2) += morus1280-sse2.o

obj-$(CONFIG_CRYPTO_NHPOLY1305_SSE2) += nhpoly1305-sse2.o
obj-$(CONFIG_CRYPTO_NHPOLY1305_AVX2) += nhpoly1305-avx2.o

# These modules require assembler to support AVX.
ifeq ($(avx_supported),yes)
	obj-$(CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64) += \

@@ -74,7 +78,7 @@ camellia-x86_64-y := camellia-x86_64-asm_64.o camellia_glue.o
blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o
twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o
twofish-x86_64-3way-y := twofish-x86_64-asm_64-3way.o twofish_glue_3way.o
chacha20-x86_64-y := chacha20-ssse3-x86_64.o chacha20_glue.o
chacha-x86_64-y := chacha-ssse3-x86_64.o chacha_glue.o
serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o

aegis128-aesni-y := aegis128-aesni-asm.o aegis128-aesni-glue.o

@@ -84,6 +88,8 @@ aegis256-aesni-y := aegis256-aesni-asm.o aegis256-aesni-glue.o
morus640-sse2-y := morus640-sse2-asm.o morus640-sse2-glue.o
morus1280-sse2-y := morus1280-sse2-asm.o morus1280-sse2-glue.o

nhpoly1305-sse2-y := nh-sse2-x86_64.o nhpoly1305-sse2-glue.o

ifeq ($(avx_supported),yes)
	camellia-aesni-avx-x86_64-y := camellia-aesni-avx-asm_64.o \
					camellia_aesni_avx_glue.o

@@ -97,10 +103,16 @@ endif

ifeq ($(avx2_supported),yes)
	camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o camellia_aesni_avx2_glue.o
	chacha20-x86_64-y += chacha20-avx2-x86_64.o
	chacha-x86_64-y += chacha-avx2-x86_64.o
	serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o

	morus1280-avx2-y := morus1280-avx2-asm.o morus1280-avx2-glue.o

	nhpoly1305-avx2-y := nh-avx2-x86_64.o nhpoly1305-avx2-glue.o
endif

ifeq ($(avx512_supported),yes)
	chacha-x86_64-y += chacha-avx512vl-x86_64.o
endif

aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o

File diff suppressed because it is too large
@@ -84,7 +84,7 @@ struct gcm_context_data {
	u8 current_counter[GCM_BLOCK_LEN];
	u64 partial_block_len;
	u64 unused;
	u8 hash_keys[GCM_BLOCK_LEN * 8];
	u8 hash_keys[GCM_BLOCK_LEN * 16];
};

asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,

@@ -175,6 +175,32 @@ asmlinkage void aesni_gcm_finalize(void *ctx,
				   struct gcm_context_data *gdata,
				   u8 *auth_tag, unsigned long auth_tag_len);

static struct aesni_gcm_tfm_s {
	void (*init)(void *ctx,
		     struct gcm_context_data *gdata,
		     u8 *iv,
		     u8 *hash_subkey, const u8 *aad,
		     unsigned long aad_len);
	void (*enc_update)(void *ctx,
			   struct gcm_context_data *gdata, u8 *out,
			   const u8 *in,
			   unsigned long plaintext_len);
	void (*dec_update)(void *ctx,
			   struct gcm_context_data *gdata, u8 *out,
			   const u8 *in,
			   unsigned long ciphertext_len);
	void (*finalize)(void *ctx,
			 struct gcm_context_data *gdata,
			 u8 *auth_tag, unsigned long auth_tag_len);
} *aesni_gcm_tfm;

struct aesni_gcm_tfm_s aesni_gcm_tfm_sse = {
	.init = &aesni_gcm_init,
	.enc_update = &aesni_gcm_enc_update,
	.dec_update = &aesni_gcm_dec_update,
	.finalize = &aesni_gcm_finalize,
};

#ifdef CONFIG_AS_AVX
asmlinkage void aes_ctr_enc_128_avx_by8(const u8 *in, u8 *iv,
		void *keys, u8 *out, unsigned int num_bytes);

@@ -183,136 +209,94 @@ asmlinkage void aes_ctr_enc_192_avx_by8(const u8 *in, u8 *iv,
asmlinkage void aes_ctr_enc_256_avx_by8(const u8 *in, u8 *iv,
		void *keys, u8 *out, unsigned int num_bytes);
/*
 * asmlinkage void aesni_gcm_precomp_avx_gen2()
 * asmlinkage void aesni_gcm_init_avx_gen2()
 * gcm_data *my_ctx_data, context data
 * u8 *hash_subkey, the Hash sub key input. Data starts on a 16-byte boundary.
 */
asmlinkage void aesni_gcm_precomp_avx_gen2(void *my_ctx_data, u8 *hash_subkey);
asmlinkage void aesni_gcm_init_avx_gen2(void *my_ctx_data,
					struct gcm_context_data *gdata,
					u8 *iv,
					u8 *hash_subkey,
					const u8 *aad,
					unsigned long aad_len);

asmlinkage void aesni_gcm_enc_avx_gen2(void *ctx, u8 *out,
asmlinkage void aesni_gcm_enc_update_avx_gen2(void *ctx,
				     struct gcm_context_data *gdata, u8 *out,
				     const u8 *in, unsigned long plaintext_len);
asmlinkage void aesni_gcm_dec_update_avx_gen2(void *ctx,
				     struct gcm_context_data *gdata, u8 *out,
				     const u8 *in,
				     unsigned long ciphertext_len);
asmlinkage void aesni_gcm_finalize_avx_gen2(void *ctx,
				   struct gcm_context_data *gdata,
				   u8 *auth_tag, unsigned long auth_tag_len);

asmlinkage void aesni_gcm_enc_avx_gen2(void *ctx,
				struct gcm_context_data *gdata, u8 *out,
			const u8 *in, unsigned long plaintext_len, u8 *iv,
			const u8 *aad, unsigned long aad_len,
			u8 *auth_tag, unsigned long auth_tag_len);

asmlinkage void aesni_gcm_dec_avx_gen2(void *ctx, u8 *out,
asmlinkage void aesni_gcm_dec_avx_gen2(void *ctx,
				struct gcm_context_data *gdata, u8 *out,
			const u8 *in, unsigned long ciphertext_len, u8 *iv,
			const u8 *aad, unsigned long aad_len,
			u8 *auth_tag, unsigned long auth_tag_len);

static void aesni_gcm_enc_avx(void *ctx,
			struct gcm_context_data *data, u8 *out,
			const u8 *in, unsigned long plaintext_len, u8 *iv,
			u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
			u8 *auth_tag, unsigned long auth_tag_len)
{
	struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx;
	if ((plaintext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)){
		aesni_gcm_enc(ctx, data, out, in,
			plaintext_len, iv, hash_subkey, aad,
			aad_len, auth_tag, auth_tag_len);
	} else {
		aesni_gcm_precomp_avx_gen2(ctx, hash_subkey);
		aesni_gcm_enc_avx_gen2(ctx, out, in, plaintext_len, iv, aad,
					aad_len, auth_tag, auth_tag_len);
	}
}
struct aesni_gcm_tfm_s aesni_gcm_tfm_avx_gen2 = {
	.init = &aesni_gcm_init_avx_gen2,
	.enc_update = &aesni_gcm_enc_update_avx_gen2,
	.dec_update = &aesni_gcm_dec_update_avx_gen2,
	.finalize = &aesni_gcm_finalize_avx_gen2,
};

static void aesni_gcm_dec_avx(void *ctx,
			struct gcm_context_data *data, u8 *out,
			const u8 *in, unsigned long ciphertext_len, u8 *iv,
			u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
			u8 *auth_tag, unsigned long auth_tag_len)
{
	struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx;
	if ((ciphertext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)) {
		aesni_gcm_dec(ctx, data, out, in,
			ciphertext_len, iv, hash_subkey, aad,
			aad_len, auth_tag, auth_tag_len);
	} else {
		aesni_gcm_precomp_avx_gen2(ctx, hash_subkey);
		aesni_gcm_dec_avx_gen2(ctx, out, in, ciphertext_len, iv, aad,
				aad_len, auth_tag, auth_tag_len);
	}
}
#endif

#ifdef CONFIG_AS_AVX2
/*
 * asmlinkage void aesni_gcm_precomp_avx_gen4()
 * asmlinkage void aesni_gcm_init_avx_gen4()
 * gcm_data *my_ctx_data, context data
 * u8 *hash_subkey, the Hash sub key input. Data starts on a 16-byte boundary.
 */
asmlinkage void aesni_gcm_precomp_avx_gen4(void *my_ctx_data, u8 *hash_subkey);
asmlinkage void aesni_gcm_init_avx_gen4(void *my_ctx_data,
					struct gcm_context_data *gdata,
					u8 *iv,
					u8 *hash_subkey,
					const u8 *aad,
					unsigned long aad_len);

asmlinkage void aesni_gcm_enc_avx_gen4(void *ctx, u8 *out,
asmlinkage void aesni_gcm_enc_update_avx_gen4(void *ctx,
				     struct gcm_context_data *gdata, u8 *out,
				     const u8 *in, unsigned long plaintext_len);
asmlinkage void aesni_gcm_dec_update_avx_gen4(void *ctx,
				     struct gcm_context_data *gdata, u8 *out,
				     const u8 *in,
				     unsigned long ciphertext_len);
asmlinkage void aesni_gcm_finalize_avx_gen4(void *ctx,
				   struct gcm_context_data *gdata,
				   u8 *auth_tag, unsigned long auth_tag_len);

asmlinkage void aesni_gcm_enc_avx_gen4(void *ctx,
				struct gcm_context_data *gdata, u8 *out,
			const u8 *in, unsigned long plaintext_len, u8 *iv,
			const u8 *aad, unsigned long aad_len,
			u8 *auth_tag, unsigned long auth_tag_len);

asmlinkage void aesni_gcm_dec_avx_gen4(void *ctx, u8 *out,
asmlinkage void aesni_gcm_dec_avx_gen4(void *ctx,
				struct gcm_context_data *gdata, u8 *out,
			const u8 *in, unsigned long ciphertext_len, u8 *iv,
			const u8 *aad, unsigned long aad_len,
			u8 *auth_tag, unsigned long auth_tag_len);

static void aesni_gcm_enc_avx2(void *ctx,
			struct gcm_context_data *data, u8 *out,
			const u8 *in, unsigned long plaintext_len, u8 *iv,
			u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
			u8 *auth_tag, unsigned long auth_tag_len)
{
	struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx;
	if ((plaintext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)) {
		aesni_gcm_enc(ctx, data, out, in,
			plaintext_len, iv, hash_subkey, aad,
			aad_len, auth_tag, auth_tag_len);
	} else if (plaintext_len < AVX_GEN4_OPTSIZE) {
		aesni_gcm_precomp_avx_gen2(ctx, hash_subkey);
		aesni_gcm_enc_avx_gen2(ctx, out, in, plaintext_len, iv, aad,
					aad_len, auth_tag, auth_tag_len);
	} else {
		aesni_gcm_precomp_avx_gen4(ctx, hash_subkey);
		aesni_gcm_enc_avx_gen4(ctx, out, in, plaintext_len, iv, aad,
					aad_len, auth_tag, auth_tag_len);
	}
}
struct aesni_gcm_tfm_s aesni_gcm_tfm_avx_gen4 = {
	.init = &aesni_gcm_init_avx_gen4,
	.enc_update = &aesni_gcm_enc_update_avx_gen4,
	.dec_update = &aesni_gcm_dec_update_avx_gen4,
	.finalize = &aesni_gcm_finalize_avx_gen4,
};

static void aesni_gcm_dec_avx2(void *ctx,
			struct gcm_context_data *data, u8 *out,
			const u8 *in, unsigned long ciphertext_len, u8 *iv,
			u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
			u8 *auth_tag, unsigned long auth_tag_len)
{
	struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx;
	if ((ciphertext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)) {
		aesni_gcm_dec(ctx, data, out, in,
			ciphertext_len, iv, hash_subkey,
			aad, aad_len, auth_tag, auth_tag_len);
	} else if (ciphertext_len < AVX_GEN4_OPTSIZE) {
		aesni_gcm_precomp_avx_gen2(ctx, hash_subkey);
		aesni_gcm_dec_avx_gen2(ctx, out, in, ciphertext_len, iv, aad,
				aad_len, auth_tag, auth_tag_len);
	} else {
		aesni_gcm_precomp_avx_gen4(ctx, hash_subkey);
		aesni_gcm_dec_avx_gen4(ctx, out, in, ciphertext_len, iv, aad,
				aad_len, auth_tag, auth_tag_len);
	}
}
#endif

static void (*aesni_gcm_enc_tfm)(void *ctx,
			struct gcm_context_data *data, u8 *out,
			const u8 *in, unsigned long plaintext_len,
			u8 *iv, u8 *hash_subkey, const u8 *aad,
			unsigned long aad_len, u8 *auth_tag,
			unsigned long auth_tag_len);

static void (*aesni_gcm_dec_tfm)(void *ctx,
			struct gcm_context_data *data, u8 *out,
			const u8 *in, unsigned long ciphertext_len,
			u8 *iv, u8 *hash_subkey, const u8 *aad,
			unsigned long aad_len, u8 *auth_tag,
			unsigned long auth_tag_len);

static inline struct
aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
{

@@ -794,6 +778,7 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
{
	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
	unsigned long auth_tag_len = crypto_aead_authsize(tfm);
	struct aesni_gcm_tfm_s *gcm_tfm = aesni_gcm_tfm;
	struct gcm_context_data data AESNI_ALIGN_ATTR;
	struct scatter_walk dst_sg_walk = {};
	unsigned long left = req->cryptlen;

@@ -811,6 +796,15 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
	if (!enc)
		left -= auth_tag_len;

#ifdef CONFIG_AS_AVX2
	if (left < AVX_GEN4_OPTSIZE && gcm_tfm == &aesni_gcm_tfm_avx_gen4)
		gcm_tfm = &aesni_gcm_tfm_avx_gen2;
#endif
#ifdef CONFIG_AS_AVX
	if (left < AVX_GEN2_OPTSIZE && gcm_tfm == &aesni_gcm_tfm_avx_gen2)
		gcm_tfm = &aesni_gcm_tfm_sse;
#endif

	/* Linearize assoc, if not already linear */
	if (req->src->length >= assoclen && req->src->length &&
	    (!PageHighMem(sg_page(req->src)) ||

@@ -835,7 +829,7 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
	}

	kernel_fpu_begin();
	aesni_gcm_init(aes_ctx, &data, iv,
	gcm_tfm->init(aes_ctx, &data, iv,
		hash_subkey, assoc, assoclen);
	if (req->src != req->dst) {
		while (left) {

@@ -846,10 +840,10 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
			len = min(srclen, dstlen);
			if (len) {
				if (enc)
					aesni_gcm_enc_update(aes_ctx, &data,
					gcm_tfm->enc_update(aes_ctx, &data,
							dst, src, len);
				else
					aesni_gcm_dec_update(aes_ctx, &data,
					gcm_tfm->dec_update(aes_ctx, &data,
							dst, src, len);
			}
			left -= len;

@@ -867,10 +861,10 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
			len = scatterwalk_clamp(&src_sg_walk, left);
			if (len) {
				if (enc)
					aesni_gcm_enc_update(aes_ctx, &data,
					gcm_tfm->enc_update(aes_ctx, &data,
							src, src, len);
				else
					aesni_gcm_dec_update(aes_ctx, &data,
					gcm_tfm->dec_update(aes_ctx, &data,
							src, src, len);
			}
			left -= len;

@@ -879,7 +873,7 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
			scatterwalk_done(&src_sg_walk, 1, left);
		}
	}
	aesni_gcm_finalize(aes_ctx, &data, authTag, auth_tag_len);
	gcm_tfm->finalize(aes_ctx, &data, authTag, auth_tag_len);
	kernel_fpu_end();

	if (!assocmem)

@@ -912,147 +906,15 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
static int gcmaes_encrypt(struct aead_request *req, unsigned int assoclen,
			  u8 *hash_subkey, u8 *iv, void *aes_ctx)
{
	u8 one_entry_in_sg = 0;
	u8 *src, *dst, *assoc;
	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
	unsigned long auth_tag_len = crypto_aead_authsize(tfm);
	struct scatter_walk src_sg_walk;
	struct scatter_walk dst_sg_walk = {};
	struct gcm_context_data data AESNI_ALIGN_ATTR;

	if (((struct crypto_aes_ctx *)aes_ctx)->key_length != AES_KEYSIZE_128 ||
	    aesni_gcm_enc_tfm == aesni_gcm_enc ||
	    req->cryptlen < AVX_GEN2_OPTSIZE) {
		return gcmaes_crypt_by_sg(true, req, assoclen, hash_subkey, iv,
					  aes_ctx);
	}
	if (sg_is_last(req->src) &&
	    (!PageHighMem(sg_page(req->src)) ||
	    req->src->offset + req->src->length <= PAGE_SIZE) &&
	    sg_is_last(req->dst) &&
	    (!PageHighMem(sg_page(req->dst)) ||
	    req->dst->offset + req->dst->length <= PAGE_SIZE)) {
		one_entry_in_sg = 1;
		scatterwalk_start(&src_sg_walk, req->src);
		assoc = scatterwalk_map(&src_sg_walk);
		src = assoc + req->assoclen;
		dst = src;
		if (unlikely(req->src != req->dst)) {
			scatterwalk_start(&dst_sg_walk, req->dst);
			dst = scatterwalk_map(&dst_sg_walk) + req->assoclen;
		}
	} else {
		/* Allocate memory for src, dst, assoc */
		assoc = kmalloc(req->cryptlen + auth_tag_len + req->assoclen,
			GFP_ATOMIC);
		if (unlikely(!assoc))
			return -ENOMEM;
		scatterwalk_map_and_copy(assoc, req->src, 0,
					 req->assoclen + req->cryptlen, 0);
		src = assoc + req->assoclen;
		dst = src;
	}

	kernel_fpu_begin();
	aesni_gcm_enc_tfm(aes_ctx, &data, dst, src, req->cryptlen, iv,
			  hash_subkey, assoc, assoclen,
			  dst + req->cryptlen, auth_tag_len);
	kernel_fpu_end();

	/* The authTag (aka the Integrity Check Value) needs to be written
	 * back to the packet. */
	if (one_entry_in_sg) {
		if (unlikely(req->src != req->dst)) {
			scatterwalk_unmap(dst - req->assoclen);
			scatterwalk_advance(&dst_sg_walk, req->dst->length);
			scatterwalk_done(&dst_sg_walk, 1, 0);
		}
		scatterwalk_unmap(assoc);
		scatterwalk_advance(&src_sg_walk, req->src->length);
		scatterwalk_done(&src_sg_walk, req->src == req->dst, 0);
	} else {
		scatterwalk_map_and_copy(dst, req->dst, req->assoclen,
					 req->cryptlen + auth_tag_len, 1);
		kfree(assoc);
	}
	return 0;
	return gcmaes_crypt_by_sg(true, req, assoclen, hash_subkey, iv,
				aes_ctx);
}

static int gcmaes_decrypt(struct aead_request *req, unsigned int assoclen,
			  u8 *hash_subkey, u8 *iv, void *aes_ctx)
{
	u8 one_entry_in_sg = 0;
	u8 *src, *dst, *assoc;
	unsigned long tempCipherLen = 0;
	struct crypto_aead *tfm = crypto_aead_reqtfm(req);
	unsigned long auth_tag_len = crypto_aead_authsize(tfm);
	u8 authTag[16];
	struct scatter_walk src_sg_walk;
	struct scatter_walk dst_sg_walk = {};
	struct gcm_context_data data AESNI_ALIGN_ATTR;
	int retval = 0;

	if (((struct crypto_aes_ctx *)aes_ctx)->key_length != AES_KEYSIZE_128 ||
	    aesni_gcm_enc_tfm == aesni_gcm_enc ||
	    req->cryptlen < AVX_GEN2_OPTSIZE) {
		return gcmaes_crypt_by_sg(false, req, assoclen, hash_subkey, iv,
					  aes_ctx);
	}
	tempCipherLen = (unsigned long)(req->cryptlen - auth_tag_len);

	if (sg_is_last(req->src) &&
	    (!PageHighMem(sg_page(req->src)) ||
	    req->src->offset + req->src->length <= PAGE_SIZE) &&
	    sg_is_last(req->dst) && req->dst->length &&
	    (!PageHighMem(sg_page(req->dst)) ||
	    req->dst->offset + req->dst->length <= PAGE_SIZE)) {
		one_entry_in_sg = 1;
		scatterwalk_start(&src_sg_walk, req->src);
		assoc = scatterwalk_map(&src_sg_walk);
		src = assoc + req->assoclen;
		dst = src;
		if (unlikely(req->src != req->dst)) {
			scatterwalk_start(&dst_sg_walk, req->dst);
			dst = scatterwalk_map(&dst_sg_walk) + req->assoclen;
		}
	} else {
		/* Allocate memory for src, dst, assoc */
		assoc = kmalloc(req->cryptlen + req->assoclen, GFP_ATOMIC);
		if (!assoc)
			return -ENOMEM;
		scatterwalk_map_and_copy(assoc, req->src, 0,
					 req->assoclen + req->cryptlen, 0);
		src = assoc + req->assoclen;
		dst = src;
	}

	kernel_fpu_begin();
	aesni_gcm_dec_tfm(aes_ctx, &data, dst, src, tempCipherLen, iv,
			  hash_subkey, assoc, assoclen,
			  authTag, auth_tag_len);
	kernel_fpu_end();

	/* Compare generated tag with passed in tag. */
	retval = crypto_memneq(src + tempCipherLen, authTag, auth_tag_len) ?
		-EBADMSG : 0;

	if (one_entry_in_sg) {
		if (unlikely(req->src != req->dst)) {
			scatterwalk_unmap(dst - req->assoclen);
			scatterwalk_advance(&dst_sg_walk, req->dst->length);
			scatterwalk_done(&dst_sg_walk, 1, 0);
		}
		scatterwalk_unmap(assoc);
		scatterwalk_advance(&src_sg_walk, req->src->length);
		scatterwalk_done(&src_sg_walk, req->src == req->dst, 0);
	} else {
		scatterwalk_map_and_copy(dst, req->dst, req->assoclen,
					 tempCipherLen, 1);
		kfree(assoc);
	}
	return retval;

	return gcmaes_crypt_by_sg(false, req, assoclen, hash_subkey, iv,
				  aes_ctx);
}

static int helper_rfc4106_encrypt(struct aead_request *req)

@@ -1420,21 +1282,18 @@ static int __init aesni_init(void)
#ifdef CONFIG_AS_AVX2
	if (boot_cpu_has(X86_FEATURE_AVX2)) {
		pr_info("AVX2 version of gcm_enc/dec engaged.\n");
		aesni_gcm_enc_tfm = aesni_gcm_enc_avx2;
		aesni_gcm_dec_tfm = aesni_gcm_dec_avx2;
		aesni_gcm_tfm = &aesni_gcm_tfm_avx_gen4;
	} else
#endif
#ifdef CONFIG_AS_AVX
	if (boot_cpu_has(X86_FEATURE_AVX)) {
		pr_info("AVX version of gcm_enc/dec engaged.\n");
		aesni_gcm_enc_tfm = aesni_gcm_enc_avx;
		aesni_gcm_dec_tfm = aesni_gcm_dec_avx;
		aesni_gcm_tfm = &aesni_gcm_tfm_avx_gen2;
	} else
#endif
	{
		pr_info("SSE version of gcm_enc/dec engaged.\n");
		aesni_gcm_enc_tfm = aesni_gcm_enc;
		aesni_gcm_dec_tfm = aesni_gcm_dec;
		aesni_gcm_tfm = &aesni_gcm_tfm_sse;
	}
	aesni_ctr_enc_tfm = aesni_ctr_enc;
#ifdef CONFIG_AS_AVX
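The net effect of the refactoring above is that SSE, AVX and AVX2 GCM all go through one incremental dispatch table, so gcmaes_crypt_by_sg() can stream scatterlist chunks instead of linearizing whole messages. Schematically, the call sequence looks like this sketch (the names follow the diff; the helper itself is hypothetical and assumes the caller has set up the buffers):

/* Sketch only, not the kernel's code: driving one request through
 * the aesni_gcm_tfm_s dispatch table introduced above. */
static void gcm_dispatch_sketch(struct aesni_gcm_tfm_s *gcm_tfm,
				void *aes_ctx, struct gcm_context_data *data,
				u8 *iv, u8 *hash_subkey,
				const u8 *assoc, unsigned long assoclen,
				u8 *dst, const u8 *src, unsigned long len,
				u8 *auth_tag, unsigned long auth_tag_len)
{
	kernel_fpu_begin();
	gcm_tfm->init(aes_ctx, data, iv, hash_subkey, assoc, assoclen);
	/* enc_update/dec_update may be called repeatedly, once per
	 * scatterlist chunk, which is the point of the new interface. */
	gcm_tfm->enc_update(aes_ctx, data, dst, src, len);
	gcm_tfm->finalize(aes_ctx, data, auth_tag, auth_tag_len);
	kernel_fpu_end();
}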
arch/x86/crypto/chacha-avx2-x86_64.S (new file, 1025 lines)
File diff suppressed because it is too large

arch/x86/crypto/chacha-avx512vl-x86_64.S (new file, 836 lines)
@@ -0,0 +1,836 @@
/* SPDX-License-Identifier: GPL-2.0+ */
/*
 * ChaCha 256-bit cipher algorithm, x64 AVX-512VL functions
 *
 * Copyright (C) 2018 Martin Willi
 */

#include <linux/linkage.h>

.section	.rodata.cst32.CTR2BL, "aM", @progbits, 32
.align 32
CTR2BL:	.octa 0x00000000000000000000000000000000
	.octa 0x00000000000000000000000000000001

.section	.rodata.cst32.CTR4BL, "aM", @progbits, 32
.align 32
CTR4BL:	.octa 0x00000000000000000000000000000002
	.octa 0x00000000000000000000000000000003

.section	.rodata.cst32.CTR8BL, "aM", @progbits, 32
.align 32
CTR8BL:	.octa 0x00000003000000020000000100000000
	.octa 0x00000007000000060000000500000004

.text

ENTRY(chacha_2block_xor_avx512vl)
	# %rdi: Input state matrix, s
	# %rsi: up to 2 data blocks output, o
	# %rdx: up to 2 data blocks input, i
	# %rcx: input/output length in bytes
	# %r8d: nrounds

	# This function encrypts two ChaCha blocks by loading the state
	# matrix twice across four AVX registers. It performs matrix operations
	# on four words in each matrix in parallel, but requires shuffling to
	# rearrange the words after each round.

	vzeroupper

	# x0..3[0-2] = s0..3
	vbroadcasti128	0x00(%rdi),%ymm0
	vbroadcasti128	0x10(%rdi),%ymm1
	vbroadcasti128	0x20(%rdi),%ymm2
	vbroadcasti128	0x30(%rdi),%ymm3

	vpaddd		CTR2BL(%rip),%ymm3,%ymm3

	vmovdqa		%ymm0,%ymm8
	vmovdqa		%ymm1,%ymm9
	vmovdqa		%ymm2,%ymm10
	vmovdqa		%ymm3,%ymm11

.Ldoubleround:

	# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
	vpaddd		%ymm1,%ymm0,%ymm0
	vpxord		%ymm0,%ymm3,%ymm3
	vprold		$16,%ymm3,%ymm3

	# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
	vpaddd		%ymm3,%ymm2,%ymm2
	vpxord		%ymm2,%ymm1,%ymm1
	vprold		$12,%ymm1,%ymm1

	# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
	vpaddd		%ymm1,%ymm0,%ymm0
	vpxord		%ymm0,%ymm3,%ymm3
	vprold		$8,%ymm3,%ymm3

	# x2 += x3, x1 = rotl32(x1 ^ x2, 7)
	vpaddd		%ymm3,%ymm2,%ymm2
	vpxord		%ymm2,%ymm1,%ymm1
	vprold		$7,%ymm1,%ymm1

	# x1 = shuffle32(x1, MASK(0, 3, 2, 1))
	vpshufd		$0x39,%ymm1,%ymm1
	# x2 = shuffle32(x2, MASK(1, 0, 3, 2))
	vpshufd		$0x4e,%ymm2,%ymm2
	# x3 = shuffle32(x3, MASK(2, 1, 0, 3))
	vpshufd		$0x93,%ymm3,%ymm3

	# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
	vpaddd		%ymm1,%ymm0,%ymm0
	vpxord		%ymm0,%ymm3,%ymm3
	vprold		$16,%ymm3,%ymm3

	# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
	vpaddd		%ymm3,%ymm2,%ymm2
	vpxord		%ymm2,%ymm1,%ymm1
	vprold		$12,%ymm1,%ymm1

	# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
	vpaddd		%ymm1,%ymm0,%ymm0
	vpxord		%ymm0,%ymm3,%ymm3
	vprold		$8,%ymm3,%ymm3

	# x2 += x3, x1 = rotl32(x1 ^ x2, 7)
	vpaddd		%ymm3,%ymm2,%ymm2
	vpxord		%ymm2,%ymm1,%ymm1
	vprold		$7,%ymm1,%ymm1

	# x1 = shuffle32(x1, MASK(2, 1, 0, 3))
	vpshufd		$0x93,%ymm1,%ymm1
	# x2 = shuffle32(x2, MASK(1, 0, 3, 2))
	vpshufd		$0x4e,%ymm2,%ymm2
	# x3 = shuffle32(x3, MASK(0, 3, 2, 1))
	vpshufd		$0x39,%ymm3,%ymm3

	sub		$2,%r8d
	jnz		.Ldoubleround

	# o0 = i0 ^ (x0 + s0)
	vpaddd		%ymm8,%ymm0,%ymm7
	cmp		$0x10,%rcx
	jl		.Lxorpart2
	vpxord		0x00(%rdx),%xmm7,%xmm6
	vmovdqu		%xmm6,0x00(%rsi)
	vextracti128	$1,%ymm7,%xmm0
	# o1 = i1 ^ (x1 + s1)
	vpaddd		%ymm9,%ymm1,%ymm7
	cmp		$0x20,%rcx
	jl		.Lxorpart2
	vpxord		0x10(%rdx),%xmm7,%xmm6
	vmovdqu		%xmm6,0x10(%rsi)
	vextracti128	$1,%ymm7,%xmm1
	# o2 = i2 ^ (x2 + s2)
	vpaddd		%ymm10,%ymm2,%ymm7
	cmp		$0x30,%rcx
	jl		.Lxorpart2
	vpxord		0x20(%rdx),%xmm7,%xmm6
	vmovdqu		%xmm6,0x20(%rsi)
	vextracti128	$1,%ymm7,%xmm2
	# o3 = i3 ^ (x3 + s3)
	vpaddd		%ymm11,%ymm3,%ymm7
	cmp		$0x40,%rcx
	jl		.Lxorpart2
	vpxord		0x30(%rdx),%xmm7,%xmm6
	vmovdqu		%xmm6,0x30(%rsi)
	vextracti128	$1,%ymm7,%xmm3

	# xor and write second block
	vmovdqa		%xmm0,%xmm7
	cmp		$0x50,%rcx
	jl		.Lxorpart2
	vpxord		0x40(%rdx),%xmm7,%xmm6
	vmovdqu		%xmm6,0x40(%rsi)

	vmovdqa		%xmm1,%xmm7
	cmp		$0x60,%rcx
	jl		.Lxorpart2
	vpxord		0x50(%rdx),%xmm7,%xmm6
	vmovdqu		%xmm6,0x50(%rsi)

	vmovdqa		%xmm2,%xmm7
	cmp		$0x70,%rcx
	jl		.Lxorpart2
	vpxord		0x60(%rdx),%xmm7,%xmm6
	vmovdqu		%xmm6,0x60(%rsi)

	vmovdqa		%xmm3,%xmm7
	cmp		$0x80,%rcx
	jl		.Lxorpart2
	vpxord		0x70(%rdx),%xmm7,%xmm6
	vmovdqu		%xmm6,0x70(%rsi)

.Ldone2:
	vzeroupper
	ret

.Lxorpart2:
	# xor remaining bytes from partial register into output
	mov		%rcx,%rax
	and		$0xf,%rcx
	jz		.Ldone8
	mov		%rax,%r9
	and		$~0xf,%r9

	mov		$1,%rax
	shld		%cl,%rax,%rax
	sub		$1,%rax
	kmovq		%rax,%k1

	vmovdqu8	(%rdx,%r9),%xmm1{%k1}{z}
	vpxord		%xmm7,%xmm1,%xmm1
	vmovdqu8	%xmm1,(%rsi,%r9){%k1}

	jmp		.Ldone2

ENDPROC(chacha_2block_xor_avx512vl)
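.Lxorpart2 above finishes a trailing partial 16-byte chunk the AVX-512VL way: it builds a byte mask with the low n bits set in a k-register and lets the masked vmovdqu8 load/xor/store touch only the valid bytes. The mask arithmetic in scalar form (a sketch of the same computation, shown for reference):

#include <stdint.h>

/* Sketch of the .Lxorpart mask math: n = len % 16 remaining bytes
 * become an n-bit byte mask for vmovdqu8's {%k1} masked load/store.
 * The asm builds it as (1 << n) - 1 via shld; n == 0 is handled by
 * the earlier jz and never reaches the mask. */
static uint64_t tail_byte_mask(uint64_t len)
{
	uint64_t n = len & 0xf;		/* bytes past the last full chunk */

	return (UINT64_C(1) << n) - 1;	/* low n bits set */
}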
ENTRY(chacha_4block_xor_avx512vl)
|
||||
# %rdi: Input state matrix, s
|
||||
# %rsi: up to 4 data blocks output, o
|
||||
# %rdx: up to 4 data blocks input, i
|
||||
# %rcx: input/output length in bytes
|
||||
# %r8d: nrounds
|
||||
|
||||
# This function encrypts four ChaCha blocks by loading the state
|
||||
# matrix four times across eight AVX registers. It performs matrix
|
||||
# operations on four words in two matrices in parallel, sequentially
|
||||
# to the operations on the four words of the other two matrices. The
|
||||
# required word shuffling has a rather high latency, we can do the
|
||||
# arithmetic on two matrix-pairs without much slowdown.
|
||||
|
||||
vzeroupper
|
||||
|
||||
# x0..3[0-4] = s0..3
|
||||
vbroadcasti128 0x00(%rdi),%ymm0
|
||||
vbroadcasti128 0x10(%rdi),%ymm1
|
||||
vbroadcasti128 0x20(%rdi),%ymm2
|
||||
vbroadcasti128 0x30(%rdi),%ymm3
|
||||
|
||||
vmovdqa %ymm0,%ymm4
|
||||
vmovdqa %ymm1,%ymm5
|
||||
vmovdqa %ymm2,%ymm6
|
||||
vmovdqa %ymm3,%ymm7
|
||||
|
||||
vpaddd CTR2BL(%rip),%ymm3,%ymm3
|
||||
vpaddd CTR4BL(%rip),%ymm7,%ymm7
|
||||
|
||||
vmovdqa %ymm0,%ymm11
|
||||
vmovdqa %ymm1,%ymm12
|
||||
vmovdqa %ymm2,%ymm13
|
||||
vmovdqa %ymm3,%ymm14
|
||||
vmovdqa %ymm7,%ymm15
|
||||
|
||||
.Ldoubleround4:
|
||||
|
||||
# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
|
||||
vpaddd %ymm1,%ymm0,%ymm0
|
||||
vpxord %ymm0,%ymm3,%ymm3
|
||||
vprold $16,%ymm3,%ymm3
|
||||
|
||||
vpaddd %ymm5,%ymm4,%ymm4
|
||||
vpxord %ymm4,%ymm7,%ymm7
|
||||
vprold $16,%ymm7,%ymm7
|
||||
|
||||
# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
|
||||
vpaddd %ymm3,%ymm2,%ymm2
|
||||
vpxord %ymm2,%ymm1,%ymm1
|
||||
vprold $12,%ymm1,%ymm1
|
||||
|
||||
vpaddd %ymm7,%ymm6,%ymm6
|
||||
vpxord %ymm6,%ymm5,%ymm5
|
||||
vprold $12,%ymm5,%ymm5
|
||||
|
||||
# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
|
||||
vpaddd %ymm1,%ymm0,%ymm0
|
||||
vpxord %ymm0,%ymm3,%ymm3
|
||||
vprold $8,%ymm3,%ymm3
|
||||
|
||||
vpaddd %ymm5,%ymm4,%ymm4
|
||||
vpxord %ymm4,%ymm7,%ymm7
|
||||
vprold $8,%ymm7,%ymm7
|
||||
|
||||
# x2 += x3, x1 = rotl32(x1 ^ x2, 7)
vpaddd %ymm3,%ymm2,%ymm2
vpxord %ymm2,%ymm1,%ymm1
vprold $7,%ymm1,%ymm1

vpaddd %ymm7,%ymm6,%ymm6
vpxord %ymm6,%ymm5,%ymm5
vprold $7,%ymm5,%ymm5

# x1 = shuffle32(x1, MASK(0, 3, 2, 1))
vpshufd $0x39,%ymm1,%ymm1
vpshufd $0x39,%ymm5,%ymm5
# x2 = shuffle32(x2, MASK(1, 0, 3, 2))
vpshufd $0x4e,%ymm2,%ymm2
vpshufd $0x4e,%ymm6,%ymm6
# x3 = shuffle32(x3, MASK(2, 1, 0, 3))
vpshufd $0x93,%ymm3,%ymm3
vpshufd $0x93,%ymm7,%ymm7

# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
vpaddd %ymm1,%ymm0,%ymm0
vpxord %ymm0,%ymm3,%ymm3
vprold $16,%ymm3,%ymm3

vpaddd %ymm5,%ymm4,%ymm4
vpxord %ymm4,%ymm7,%ymm7
vprold $16,%ymm7,%ymm7

# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
vpaddd %ymm3,%ymm2,%ymm2
vpxord %ymm2,%ymm1,%ymm1
vprold $12,%ymm1,%ymm1

vpaddd %ymm7,%ymm6,%ymm6
vpxord %ymm6,%ymm5,%ymm5
vprold $12,%ymm5,%ymm5

# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
vpaddd %ymm1,%ymm0,%ymm0
vpxord %ymm0,%ymm3,%ymm3
vprold $8,%ymm3,%ymm3

vpaddd %ymm5,%ymm4,%ymm4
vpxord %ymm4,%ymm7,%ymm7
vprold $8,%ymm7,%ymm7

# x2 += x3, x1 = rotl32(x1 ^ x2, 7)
vpaddd %ymm3,%ymm2,%ymm2
vpxord %ymm2,%ymm1,%ymm1
vprold $7,%ymm1,%ymm1

vpaddd %ymm7,%ymm6,%ymm6
vpxord %ymm6,%ymm5,%ymm5
vprold $7,%ymm5,%ymm5

# x1 = shuffle32(x1, MASK(2, 1, 0, 3))
vpshufd $0x93,%ymm1,%ymm1
vpshufd $0x93,%ymm5,%ymm5
# x2 = shuffle32(x2, MASK(1, 0, 3, 2))
vpshufd $0x4e,%ymm2,%ymm2
vpshufd $0x4e,%ymm6,%ymm6
# x3 = shuffle32(x3, MASK(0, 3, 2, 1))
vpshufd $0x39,%ymm3,%ymm3
vpshufd $0x39,%ymm7,%ymm7

sub $2,%r8d
jnz .Ldoubleround4

# o0 = i0 ^ (x0 + s0), first block
vpaddd %ymm11,%ymm0,%ymm10
cmp $0x10,%rcx
jl .Lxorpart4
vpxord 0x00(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x00(%rsi)
vextracti128 $1,%ymm10,%xmm0
# o1 = i1 ^ (x1 + s1), first block
vpaddd %ymm12,%ymm1,%ymm10
cmp $0x20,%rcx
jl .Lxorpart4
vpxord 0x10(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x10(%rsi)
vextracti128 $1,%ymm10,%xmm1
# o2 = i2 ^ (x2 + s2), first block
vpaddd %ymm13,%ymm2,%ymm10
cmp $0x30,%rcx
jl .Lxorpart4
vpxord 0x20(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x20(%rsi)
vextracti128 $1,%ymm10,%xmm2
# o3 = i3 ^ (x3 + s3), first block
vpaddd %ymm14,%ymm3,%ymm10
cmp $0x40,%rcx
jl .Lxorpart4
vpxord 0x30(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x30(%rsi)
vextracti128 $1,%ymm10,%xmm3

# xor and write second block
vmovdqa %xmm0,%xmm10
cmp $0x50,%rcx
jl .Lxorpart4
vpxord 0x40(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x40(%rsi)

vmovdqa %xmm1,%xmm10
cmp $0x60,%rcx
jl .Lxorpart4
vpxord 0x50(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x50(%rsi)

vmovdqa %xmm2,%xmm10
cmp $0x70,%rcx
jl .Lxorpart4
vpxord 0x60(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x60(%rsi)

vmovdqa %xmm3,%xmm10
cmp $0x80,%rcx
jl .Lxorpart4
vpxord 0x70(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x70(%rsi)

# o0 = i0 ^ (x0 + s0), third block
vpaddd %ymm11,%ymm4,%ymm10
cmp $0x90,%rcx
jl .Lxorpart4
vpxord 0x80(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x80(%rsi)
vextracti128 $1,%ymm10,%xmm4
# o1 = i1 ^ (x1 + s1), third block
vpaddd %ymm12,%ymm5,%ymm10
cmp $0xa0,%rcx
jl .Lxorpart4
vpxord 0x90(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x90(%rsi)
vextracti128 $1,%ymm10,%xmm5
# o2 = i2 ^ (x2 + s2), third block
vpaddd %ymm13,%ymm6,%ymm10
cmp $0xb0,%rcx
jl .Lxorpart4
vpxord 0xa0(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0xa0(%rsi)
vextracti128 $1,%ymm10,%xmm6
# o3 = i3 ^ (x3 + s3), third block
vpaddd %ymm15,%ymm7,%ymm10
cmp $0xc0,%rcx
jl .Lxorpart4
vpxord 0xb0(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0xb0(%rsi)
vextracti128 $1,%ymm10,%xmm7

# xor and write fourth block
vmovdqa %xmm4,%xmm10
cmp $0xd0,%rcx
jl .Lxorpart4
vpxord 0xc0(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0xc0(%rsi)

vmovdqa %xmm5,%xmm10
cmp $0xe0,%rcx
jl .Lxorpart4
vpxord 0xd0(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0xd0(%rsi)

vmovdqa %xmm6,%xmm10
cmp $0xf0,%rcx
jl .Lxorpart4
vpxord 0xe0(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0xe0(%rsi)

vmovdqa %xmm7,%xmm10
cmp $0x100,%rcx
jl .Lxorpart4
vpxord 0xf0(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0xf0(%rsi)

.Ldone4:
vzeroupper
ret

.Lxorpart4:
# xor remaining bytes from partial register into output
mov %rcx,%rax
and $0xf,%rcx
jz .Ldone8
mov %rax,%r9
and $~0xf,%r9

mov $1,%rax
shld %cl,%rax,%rax
sub $1,%rax
kmovq %rax,%k1

vmovdqu8 (%rdx,%r9),%xmm1{%k1}{z}
vpxord %xmm10,%xmm1,%xmm1
vmovdqu8 %xmm1,(%rsi,%r9){%k1}

jmp .Ldone4

ENDPROC(chacha_4block_xor_avx512vl)
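The .Lxorpart4 tail above builds a byte mask of (1 << remainder) - 1 with shld/kmovq and lets vmovdqu8 do a zeroing masked load and a masked store, so no bytes past the buffer end are touched. A scalar C model of that tail (an illustrative sketch; xor_partial_block is a hypothetical helper, not from the patch):

#include <stddef.h>
#include <stdint.h>

/* Model of the masked partial-block handling: only the low
 * (len & 0xf) bytes of the final 16-byte chunk are loaded,
 * XORed with the keystream, and stored back. */
static void xor_partial_block(uint8_t *out, const uint8_t *in,
                              const uint8_t keystream[16], size_t len)
{
    size_t aligned = len & ~(size_t)0xf;           /* bytes already done */
    uint16_t mask = (uint16_t)((1u << (len & 0xf)) - 1); /* shld/kmovq */

    for (size_t i = 0; i < 16; i++) {
        if (mask & (1u << i))                      /* masked load/store */
            out[aligned + i] = in[aligned + i] ^ keystream[i];
    }
}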
ENTRY(chacha_8block_xor_avx512vl)
# %rdi: Input state matrix, s
# %rsi: up to 8 data blocks output, o
# %rdx: up to 8 data blocks input, i
# %rcx: input/output length in bytes
# %r8d: nrounds

# This function encrypts eight consecutive ChaCha blocks by loading
# the state matrix in AVX registers eight times. Compared to AVX2, this
# mostly benefits from the new rotate instructions in VL and the
# additional registers.

vzeroupper

# x0..15[0-7] = s[0..15]
vpbroadcastd 0x00(%rdi),%ymm0
vpbroadcastd 0x04(%rdi),%ymm1
vpbroadcastd 0x08(%rdi),%ymm2
vpbroadcastd 0x0c(%rdi),%ymm3
vpbroadcastd 0x10(%rdi),%ymm4
vpbroadcastd 0x14(%rdi),%ymm5
vpbroadcastd 0x18(%rdi),%ymm6
vpbroadcastd 0x1c(%rdi),%ymm7
vpbroadcastd 0x20(%rdi),%ymm8
vpbroadcastd 0x24(%rdi),%ymm9
vpbroadcastd 0x28(%rdi),%ymm10
vpbroadcastd 0x2c(%rdi),%ymm11
vpbroadcastd 0x30(%rdi),%ymm12
vpbroadcastd 0x34(%rdi),%ymm13
vpbroadcastd 0x38(%rdi),%ymm14
vpbroadcastd 0x3c(%rdi),%ymm15

# x12 += counter values 0-3
vpaddd CTR8BL(%rip),%ymm12,%ymm12

vmovdqa64 %ymm0,%ymm16
vmovdqa64 %ymm1,%ymm17
vmovdqa64 %ymm2,%ymm18
vmovdqa64 %ymm3,%ymm19
vmovdqa64 %ymm4,%ymm20
vmovdqa64 %ymm5,%ymm21
vmovdqa64 %ymm6,%ymm22
vmovdqa64 %ymm7,%ymm23
vmovdqa64 %ymm8,%ymm24
vmovdqa64 %ymm9,%ymm25
vmovdqa64 %ymm10,%ymm26
vmovdqa64 %ymm11,%ymm27
vmovdqa64 %ymm12,%ymm28
vmovdqa64 %ymm13,%ymm29
vmovdqa64 %ymm14,%ymm30
vmovdqa64 %ymm15,%ymm31

.Ldoubleround8:
# x0 += x4, x12 = rotl32(x12 ^ x0, 16)
vpaddd %ymm0,%ymm4,%ymm0
vpxord %ymm0,%ymm12,%ymm12
vprold $16,%ymm12,%ymm12
# x1 += x5, x13 = rotl32(x13 ^ x1, 16)
vpaddd %ymm1,%ymm5,%ymm1
vpxord %ymm1,%ymm13,%ymm13
vprold $16,%ymm13,%ymm13
# x2 += x6, x14 = rotl32(x14 ^ x2, 16)
vpaddd %ymm2,%ymm6,%ymm2
vpxord %ymm2,%ymm14,%ymm14
vprold $16,%ymm14,%ymm14
# x3 += x7, x15 = rotl32(x15 ^ x3, 16)
vpaddd %ymm3,%ymm7,%ymm3
vpxord %ymm3,%ymm15,%ymm15
vprold $16,%ymm15,%ymm15

# x8 += x12, x4 = rotl32(x4 ^ x8, 12)
vpaddd %ymm12,%ymm8,%ymm8
vpxord %ymm8,%ymm4,%ymm4
vprold $12,%ymm4,%ymm4
# x9 += x13, x5 = rotl32(x5 ^ x9, 12)
vpaddd %ymm13,%ymm9,%ymm9
vpxord %ymm9,%ymm5,%ymm5
vprold $12,%ymm5,%ymm5
# x10 += x14, x6 = rotl32(x6 ^ x10, 12)
vpaddd %ymm14,%ymm10,%ymm10
vpxord %ymm10,%ymm6,%ymm6
vprold $12,%ymm6,%ymm6
# x11 += x15, x7 = rotl32(x7 ^ x11, 12)
vpaddd %ymm15,%ymm11,%ymm11
vpxord %ymm11,%ymm7,%ymm7
vprold $12,%ymm7,%ymm7

# x0 += x4, x12 = rotl32(x12 ^ x0, 8)
vpaddd %ymm0,%ymm4,%ymm0
vpxord %ymm0,%ymm12,%ymm12
vprold $8,%ymm12,%ymm12
# x1 += x5, x13 = rotl32(x13 ^ x1, 8)
vpaddd %ymm1,%ymm5,%ymm1
vpxord %ymm1,%ymm13,%ymm13
vprold $8,%ymm13,%ymm13
# x2 += x6, x14 = rotl32(x14 ^ x2, 8)
vpaddd %ymm2,%ymm6,%ymm2
vpxord %ymm2,%ymm14,%ymm14
vprold $8,%ymm14,%ymm14
# x3 += x7, x15 = rotl32(x15 ^ x3, 8)
vpaddd %ymm3,%ymm7,%ymm3
vpxord %ymm3,%ymm15,%ymm15
vprold $8,%ymm15,%ymm15

# x8 += x12, x4 = rotl32(x4 ^ x8, 7)
vpaddd %ymm12,%ymm8,%ymm8
vpxord %ymm8,%ymm4,%ymm4
vprold $7,%ymm4,%ymm4
# x9 += x13, x5 = rotl32(x5 ^ x9, 7)
vpaddd %ymm13,%ymm9,%ymm9
vpxord %ymm9,%ymm5,%ymm5
vprold $7,%ymm5,%ymm5
# x10 += x14, x6 = rotl32(x6 ^ x10, 7)
vpaddd %ymm14,%ymm10,%ymm10
vpxord %ymm10,%ymm6,%ymm6
vprold $7,%ymm6,%ymm6
# x11 += x15, x7 = rotl32(x7 ^ x11, 7)
vpaddd %ymm15,%ymm11,%ymm11
vpxord %ymm11,%ymm7,%ymm7
vprold $7,%ymm7,%ymm7

# x0 += x5, x15 = rotl32(x15 ^ x0, 16)
vpaddd %ymm0,%ymm5,%ymm0
vpxord %ymm0,%ymm15,%ymm15
vprold $16,%ymm15,%ymm15
# x1 += x6, x12 = rotl32(x12 ^ x1, 16)
vpaddd %ymm1,%ymm6,%ymm1
vpxord %ymm1,%ymm12,%ymm12
vprold $16,%ymm12,%ymm12
# x2 += x7, x13 = rotl32(x13 ^ x2, 16)
vpaddd %ymm2,%ymm7,%ymm2
vpxord %ymm2,%ymm13,%ymm13
vprold $16,%ymm13,%ymm13
# x3 += x4, x14 = rotl32(x14 ^ x3, 16)
vpaddd %ymm3,%ymm4,%ymm3
vpxord %ymm3,%ymm14,%ymm14
vprold $16,%ymm14,%ymm14

# x10 += x15, x5 = rotl32(x5 ^ x10, 12)
vpaddd %ymm15,%ymm10,%ymm10
vpxord %ymm10,%ymm5,%ymm5
vprold $12,%ymm5,%ymm5
# x11 += x12, x6 = rotl32(x6 ^ x11, 12)
vpaddd %ymm12,%ymm11,%ymm11
vpxord %ymm11,%ymm6,%ymm6
vprold $12,%ymm6,%ymm6
# x8 += x13, x7 = rotl32(x7 ^ x8, 12)
vpaddd %ymm13,%ymm8,%ymm8
vpxord %ymm8,%ymm7,%ymm7
vprold $12,%ymm7,%ymm7
# x9 += x14, x4 = rotl32(x4 ^ x9, 12)
vpaddd %ymm14,%ymm9,%ymm9
vpxord %ymm9,%ymm4,%ymm4
vprold $12,%ymm4,%ymm4

# x0 += x5, x15 = rotl32(x15 ^ x0, 8)
vpaddd %ymm0,%ymm5,%ymm0
vpxord %ymm0,%ymm15,%ymm15
vprold $8,%ymm15,%ymm15
# x1 += x6, x12 = rotl32(x12 ^ x1, 8)
vpaddd %ymm1,%ymm6,%ymm1
vpxord %ymm1,%ymm12,%ymm12
vprold $8,%ymm12,%ymm12
# x2 += x7, x13 = rotl32(x13 ^ x2, 8)
vpaddd %ymm2,%ymm7,%ymm2
vpxord %ymm2,%ymm13,%ymm13
vprold $8,%ymm13,%ymm13
# x3 += x4, x14 = rotl32(x14 ^ x3, 8)
vpaddd %ymm3,%ymm4,%ymm3
vpxord %ymm3,%ymm14,%ymm14
vprold $8,%ymm14,%ymm14

# x10 += x15, x5 = rotl32(x5 ^ x10, 7)
vpaddd %ymm15,%ymm10,%ymm10
vpxord %ymm10,%ymm5,%ymm5
vprold $7,%ymm5,%ymm5
# x11 += x12, x6 = rotl32(x6 ^ x11, 7)
vpaddd %ymm12,%ymm11,%ymm11
vpxord %ymm11,%ymm6,%ymm6
vprold $7,%ymm6,%ymm6
# x8 += x13, x7 = rotl32(x7 ^ x8, 7)
vpaddd %ymm13,%ymm8,%ymm8
vpxord %ymm8,%ymm7,%ymm7
vprold $7,%ymm7,%ymm7
# x9 += x14, x4 = rotl32(x4 ^ x9, 7)
vpaddd %ymm14,%ymm9,%ymm9
vpxord %ymm9,%ymm4,%ymm4
vprold $7,%ymm4,%ymm4

sub $2,%r8d
jnz .Ldoubleround8

# x0..15[0-3] += s[0..15]
vpaddd %ymm16,%ymm0,%ymm0
vpaddd %ymm17,%ymm1,%ymm1
vpaddd %ymm18,%ymm2,%ymm2
vpaddd %ymm19,%ymm3,%ymm3
vpaddd %ymm20,%ymm4,%ymm4
vpaddd %ymm21,%ymm5,%ymm5
vpaddd %ymm22,%ymm6,%ymm6
vpaddd %ymm23,%ymm7,%ymm7
vpaddd %ymm24,%ymm8,%ymm8
vpaddd %ymm25,%ymm9,%ymm9
vpaddd %ymm26,%ymm10,%ymm10
vpaddd %ymm27,%ymm11,%ymm11
vpaddd %ymm28,%ymm12,%ymm12
vpaddd %ymm29,%ymm13,%ymm13
vpaddd %ymm30,%ymm14,%ymm14
vpaddd %ymm31,%ymm15,%ymm15

# interleave 32-bit words in state n, n+1
vpunpckldq %ymm1,%ymm0,%ymm16
vpunpckhdq %ymm1,%ymm0,%ymm17
vpunpckldq %ymm3,%ymm2,%ymm18
vpunpckhdq %ymm3,%ymm2,%ymm19
vpunpckldq %ymm5,%ymm4,%ymm20
vpunpckhdq %ymm5,%ymm4,%ymm21
vpunpckldq %ymm7,%ymm6,%ymm22
vpunpckhdq %ymm7,%ymm6,%ymm23
vpunpckldq %ymm9,%ymm8,%ymm24
vpunpckhdq %ymm9,%ymm8,%ymm25
vpunpckldq %ymm11,%ymm10,%ymm26
vpunpckhdq %ymm11,%ymm10,%ymm27
vpunpckldq %ymm13,%ymm12,%ymm28
vpunpckhdq %ymm13,%ymm12,%ymm29
vpunpckldq %ymm15,%ymm14,%ymm30
vpunpckhdq %ymm15,%ymm14,%ymm31

# interleave 64-bit words in state n, n+2
vpunpcklqdq %ymm18,%ymm16,%ymm0
vpunpcklqdq %ymm19,%ymm17,%ymm1
vpunpckhqdq %ymm18,%ymm16,%ymm2
vpunpckhqdq %ymm19,%ymm17,%ymm3
vpunpcklqdq %ymm22,%ymm20,%ymm4
vpunpcklqdq %ymm23,%ymm21,%ymm5
vpunpckhqdq %ymm22,%ymm20,%ymm6
vpunpckhqdq %ymm23,%ymm21,%ymm7
vpunpcklqdq %ymm26,%ymm24,%ymm8
vpunpcklqdq %ymm27,%ymm25,%ymm9
vpunpckhqdq %ymm26,%ymm24,%ymm10
vpunpckhqdq %ymm27,%ymm25,%ymm11
vpunpcklqdq %ymm30,%ymm28,%ymm12
vpunpcklqdq %ymm31,%ymm29,%ymm13
vpunpckhqdq %ymm30,%ymm28,%ymm14
vpunpckhqdq %ymm31,%ymm29,%ymm15

# interleave 128-bit words in state n, n+4
# xor/write first four blocks
vmovdqa64 %ymm0,%ymm16
vperm2i128 $0x20,%ymm4,%ymm0,%ymm0
cmp $0x0020,%rcx
jl .Lxorpart8
vpxord 0x0000(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0000(%rsi)
vmovdqa64 %ymm16,%ymm0
vperm2i128 $0x31,%ymm4,%ymm0,%ymm4

vperm2i128 $0x20,%ymm12,%ymm8,%ymm0
cmp $0x0040,%rcx
jl .Lxorpart8
vpxord 0x0020(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0020(%rsi)
vperm2i128 $0x31,%ymm12,%ymm8,%ymm12

vperm2i128 $0x20,%ymm6,%ymm2,%ymm0
cmp $0x0060,%rcx
jl .Lxorpart8
vpxord 0x0040(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0040(%rsi)
vperm2i128 $0x31,%ymm6,%ymm2,%ymm6

vperm2i128 $0x20,%ymm14,%ymm10,%ymm0
cmp $0x0080,%rcx
jl .Lxorpart8
vpxord 0x0060(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0060(%rsi)
vperm2i128 $0x31,%ymm14,%ymm10,%ymm14

vperm2i128 $0x20,%ymm5,%ymm1,%ymm0
cmp $0x00a0,%rcx
jl .Lxorpart8
vpxord 0x0080(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0080(%rsi)
vperm2i128 $0x31,%ymm5,%ymm1,%ymm5

vperm2i128 $0x20,%ymm13,%ymm9,%ymm0
cmp $0x00c0,%rcx
jl .Lxorpart8
vpxord 0x00a0(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x00a0(%rsi)
vperm2i128 $0x31,%ymm13,%ymm9,%ymm13

vperm2i128 $0x20,%ymm7,%ymm3,%ymm0
cmp $0x00e0,%rcx
jl .Lxorpart8
vpxord 0x00c0(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x00c0(%rsi)
vperm2i128 $0x31,%ymm7,%ymm3,%ymm7

vperm2i128 $0x20,%ymm15,%ymm11,%ymm0
cmp $0x0100,%rcx
jl .Lxorpart8
vpxord 0x00e0(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x00e0(%rsi)
vperm2i128 $0x31,%ymm15,%ymm11,%ymm15

# xor remaining blocks, write to output
vmovdqa64 %ymm4,%ymm0
cmp $0x0120,%rcx
jl .Lxorpart8
vpxord 0x0100(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0100(%rsi)

vmovdqa64 %ymm12,%ymm0
cmp $0x0140,%rcx
jl .Lxorpart8
vpxord 0x0120(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0120(%rsi)

vmovdqa64 %ymm6,%ymm0
cmp $0x0160,%rcx
jl .Lxorpart8
vpxord 0x0140(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0140(%rsi)

vmovdqa64 %ymm14,%ymm0
cmp $0x0180,%rcx
jl .Lxorpart8
vpxord 0x0160(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0160(%rsi)

vmovdqa64 %ymm5,%ymm0
cmp $0x01a0,%rcx
jl .Lxorpart8
vpxord 0x0180(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0180(%rsi)

vmovdqa64 %ymm13,%ymm0
cmp $0x01c0,%rcx
jl .Lxorpart8
vpxord 0x01a0(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x01a0(%rsi)

vmovdqa64 %ymm7,%ymm0
cmp $0x01e0,%rcx
jl .Lxorpart8
vpxord 0x01c0(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x01c0(%rsi)

vmovdqa64 %ymm15,%ymm0
cmp $0x0200,%rcx
jl .Lxorpart8
vpxord 0x01e0(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x01e0(%rsi)

.Ldone8:
vzeroupper
ret

.Lxorpart8:
# xor remaining bytes from partial register into output
mov %rcx,%rax
and $0x1f,%rcx
jz .Ldone8
mov %rax,%r9
and $~0x1f,%r9

mov $1,%rax
shld %cl,%rax,%rax
sub $1,%rax
kmovq %rax,%k1

vmovdqu8 (%rdx,%r9),%ymm1{%k1}{z}
vpxord %ymm0,%ymm1,%ymm1
vmovdqu8 %ymm1,(%rsi,%r9){%k1}

jmp .Ldone8

ENDPROC(chacha_8block_xor_avx512vl)
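Each commented triple above (vpaddd, vpxord, vprold) is one line of the ChaCha quarter-round, executed on eight blocks at once, one block per ymm lane. For reference, the same step on scalar words (a sketch, not kernel code):

#include <stdint.h>

static inline uint32_t rotl32(uint32_t v, int n)
{
    return (v << n) | (v >> (32 - n));
}

/* One ChaCha quarter-round, matching the add/xor/rotate comments
 * in the assembly, with rotation counts 16, 12, 8, 7. */
static void chacha_quarterround(uint32_t x[16], int a, int b, int c, int d)
{
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 16);
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 12);
    x[a] += x[b]; x[d] = rotl32(x[d] ^ x[a], 8);
    x[c] += x[d]; x[b] = rotl32(x[b] ^ x[c], 7);
}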
@@ -1,5 +1,5 @@
/*
 * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
 * ChaCha 256-bit cipher algorithm, x64 SSSE3 functions
 *
 * Copyright (C) 2015 Martin Willi
 *
@@ -10,6 +10,7 @@
 */

#include <linux/linkage.h>
#include <asm/frame.h>

.section .rodata.cst16.ROT8, "aM", @progbits, 16
.align 16
@@ -23,35 +24,25 @@ CTRINC: .octa 0x00000003000000020000000100000000

.text

ENTRY(chacha20_block_xor_ssse3)
# %rdi: Input state matrix, s
# %rsi: 1 data block output, o
# %rdx: 1 data block input, i

# This function encrypts one ChaCha20 block by loading the state matrix
# in four SSE registers. It performs matrix operation on four words in
# parallel, but requireds shuffling to rearrange the words after each
# round. 8/16-bit word rotation is done with the slightly better
# performing SSSE3 byte shuffling, 7/12-bit word rotation uses
# traditional shift+OR.

# x0..3 = s0..3
movdqa 0x00(%rdi),%xmm0
movdqa 0x10(%rdi),%xmm1
movdqa 0x20(%rdi),%xmm2
movdqa 0x30(%rdi),%xmm3
movdqa %xmm0,%xmm8
movdqa %xmm1,%xmm9
movdqa %xmm2,%xmm10
movdqa %xmm3,%xmm11
/*
 * chacha_permute - permute one block
 *
 * Permute one 64-byte block where the state matrix is in %xmm0-%xmm3. This
 * function performs matrix operations on four words in parallel, but requires
 * shuffling to rearrange the words after each round. 8/16-bit word rotation is
 * done with the slightly better performing SSSE3 byte shuffling, 7/12-bit word
 * rotation uses traditional shift+OR.
 *
 * The round count is given in %r8d.
 *
 * Clobbers: %r8d, %xmm4-%xmm7
 */
chacha_permute:

movdqa ROT8(%rip),%xmm4
movdqa ROT16(%rip),%xmm5

mov $10,%ecx

.Ldoubleround:

# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
paddd %xmm1,%xmm0
pxor %xmm0,%xmm3
@@ -118,39 +109,129 @@ ENTRY(chacha20_block_xor_ssse3)
# x3 = shuffle32(x3, MASK(0, 3, 2, 1))
pshufd $0x39,%xmm3,%xmm3

dec %ecx
sub $2,%r8d
jnz .Ldoubleround

ret
ENDPROC(chacha_permute)
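chacha_permute now takes the round count in %r8d and retires two rounds per .Ldoubleround iteration (a column round followed by a diagonal round), which is what lets one body serve both ChaCha20 and the reduced-round XChaCha12. A C model of that loop structure (an illustrative sketch, not kernel code):

#include <stdint.h>

#define ROTL32(v, n) (((v) << (n)) | ((v) >> (32 - (n))))
#define QR(x, a, b, c, d) do {                          \
    x[a] += x[b]; x[d] = ROTL32(x[d] ^ x[a], 16);       \
    x[c] += x[d]; x[b] = ROTL32(x[b] ^ x[c], 12);       \
    x[a] += x[b]; x[d] = ROTL32(x[d] ^ x[a], 8);        \
    x[c] += x[d]; x[b] = ROTL32(x[b] ^ x[c], 7);        \
} while (0)

/* nrounds counts single rounds; each iteration covers two of them,
 * matching "sub $2,%r8d" in the assembly above. */
static void chacha_permute_model(uint32_t x[16], int nrounds)
{
    for (int i = nrounds; i > 0; i -= 2) {
        QR(x, 0, 4,  8, 12);  /* column round */
        QR(x, 1, 5,  9, 13);
        QR(x, 2, 6, 10, 14);
        QR(x, 3, 7, 11, 15);
        QR(x, 0, 5, 10, 15);  /* diagonal round */
        QR(x, 1, 6, 11, 12);
        QR(x, 2, 7,  8, 13);
        QR(x, 3, 4,  9, 14);
    }
}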
ENTRY(chacha_block_xor_ssse3)
# %rdi: Input state matrix, s
# %rsi: up to 1 data block output, o
# %rdx: up to 1 data block input, i
# %rcx: input/output length in bytes
# %r8d: nrounds
FRAME_BEGIN

# x0..3 = s0..3
movdqa 0x00(%rdi),%xmm0
movdqa 0x10(%rdi),%xmm1
movdqa 0x20(%rdi),%xmm2
movdqa 0x30(%rdi),%xmm3
movdqa %xmm0,%xmm8
movdqa %xmm1,%xmm9
movdqa %xmm2,%xmm10
movdqa %xmm3,%xmm11

mov %rcx,%rax
call chacha_permute

# o0 = i0 ^ (x0 + s0)
movdqu 0x00(%rdx),%xmm4
paddd %xmm8,%xmm0
cmp $0x10,%rax
jl .Lxorpart
movdqu 0x00(%rdx),%xmm4
pxor %xmm4,%xmm0
movdqu %xmm0,0x00(%rsi)
# o1 = i1 ^ (x1 + s1)
movdqu 0x10(%rdx),%xmm5
paddd %xmm9,%xmm1
pxor %xmm5,%xmm1
movdqu %xmm1,0x10(%rsi)
movdqa %xmm1,%xmm0
cmp $0x20,%rax
jl .Lxorpart
movdqu 0x10(%rdx),%xmm0
pxor %xmm1,%xmm0
movdqu %xmm0,0x10(%rsi)
# o2 = i2 ^ (x2 + s2)
movdqu 0x20(%rdx),%xmm6
paddd %xmm10,%xmm2
pxor %xmm6,%xmm2
movdqu %xmm2,0x20(%rsi)
movdqa %xmm2,%xmm0
cmp $0x30,%rax
jl .Lxorpart
movdqu 0x20(%rdx),%xmm0
pxor %xmm2,%xmm0
movdqu %xmm0,0x20(%rsi)
# o3 = i3 ^ (x3 + s3)
movdqu 0x30(%rdx),%xmm7
paddd %xmm11,%xmm3
pxor %xmm7,%xmm3
movdqu %xmm3,0x30(%rsi)
movdqa %xmm3,%xmm0
cmp $0x40,%rax
jl .Lxorpart
movdqu 0x30(%rdx),%xmm0
pxor %xmm3,%xmm0
movdqu %xmm0,0x30(%rsi)

.Ldone:
FRAME_END
ret
ENDPROC(chacha20_block_xor_ssse3)

ENTRY(chacha20_4block_xor_ssse3)
.Lxorpart:
# xor remaining bytes from partial register into output
mov %rax,%r9
and $0x0f,%r9
jz .Ldone
and $~0x0f,%rax

mov %rsi,%r11

lea 8(%rsp),%r10
sub $0x10,%rsp
and $~31,%rsp

lea (%rdx,%rax),%rsi
mov %rsp,%rdi
mov %r9,%rcx
rep movsb

pxor 0x00(%rsp),%xmm0
movdqa %xmm0,0x00(%rsp)

mov %rsp,%rsi
lea (%r11,%rax),%rdi
mov %r9,%rcx
rep movsb

lea -8(%r10),%rsp
jmp .Ldone

ENDPROC(chacha_block_xor_ssse3)

ENTRY(hchacha_block_ssse3)
# %rdi: Input state matrix, s
# %rsi: 4 data blocks output, o
# %rdx: 4 data blocks input, i
# %rsi: output (8 32-bit words)
# %edx: nrounds
FRAME_BEGIN

# This function encrypts four consecutive ChaCha20 blocks by loading the
movdqa 0x00(%rdi),%xmm0
movdqa 0x10(%rdi),%xmm1
movdqa 0x20(%rdi),%xmm2
movdqa 0x30(%rdi),%xmm3

mov %edx,%r8d
call chacha_permute

movdqu %xmm0,0x00(%rsi)
movdqu %xmm3,0x10(%rsi)

FRAME_END
ret
ENDPROC(hchacha_block_ssse3)
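hchacha_block_ssse3 runs the same permutation but skips the feed-forward addition and keeps only words 0..3 (%xmm0) and 12..15 (%xmm3) as its 256-bit output. A C model (a sketch; chacha_permute_model is the assumed helper from the earlier example):

#include <stdint.h>

static void hchacha_block_model(const uint32_t state[16], uint32_t out[8],
                                int nrounds)
{
    uint32_t x[16];

    for (int i = 0; i < 16; i++)
        x[i] = state[i];
    chacha_permute_model(x, nrounds);   /* helper sketched above */
    for (int i = 0; i < 4; i++) {
        out[i] = x[i];                  /* %xmm0: words 0..3 */
        out[i + 4] = x[i + 12];         /* %xmm3: words 12..15 */
    }
}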
ENTRY(chacha_4block_xor_ssse3)
# %rdi: Input state matrix, s
# %rsi: up to 4 data blocks output, o
# %rdx: up to 4 data blocks input, i
# %rcx: input/output length in bytes
# %r8d: nrounds

# This function encrypts four consecutive ChaCha blocks by loading the
# the state matrix in SSE registers four times. As we need some scratch
# registers, we save the first four registers on the stack. The
# algorithm performs each operation on the corresponding word of each
@@ -163,6 +244,7 @@ ENTRY(chacha20_4block_xor_ssse3)
lea 8(%rsp),%r10
sub $0x80,%rsp
and $~63,%rsp
mov %rcx,%rax

# x0..15[0-3] = s0..3[0..3]
movq 0x00(%rdi),%xmm1
@@ -202,8 +284,6 @@ ENTRY(chacha20_4block_xor_ssse3)
# x12 += counter values 0-3
paddd %xmm1,%xmm12

mov $10,%ecx

.Ldoubleround4:
# x0 += x4, x12 = rotl32(x12 ^ x0, 16)
movdqa 0x00(%rsp),%xmm0
@@ -421,7 +501,7 @@ ENTRY(chacha20_4block_xor_ssse3)
psrld $25,%xmm4
por %xmm0,%xmm4

dec %ecx
sub $2,%r8d
jnz .Ldoubleround4

# x0[0-3] += s0[0]
@@ -573,58 +653,143 @@ ENTRY(chacha20_4block_xor_ssse3)

# xor with corresponding input, write to output
movdqa 0x00(%rsp),%xmm0
cmp $0x10,%rax
jl .Lxorpart4
movdqu 0x00(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x00(%rsi)
movdqa 0x10(%rsp),%xmm0
movdqu 0x80(%rdx),%xmm1

movdqu %xmm4,%xmm0
cmp $0x20,%rax
jl .Lxorpart4
movdqu 0x10(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x80(%rsi)
movdqu %xmm0,0x10(%rsi)

movdqu %xmm8,%xmm0
cmp $0x30,%rax
jl .Lxorpart4
movdqu 0x20(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x20(%rsi)

movdqu %xmm12,%xmm0
cmp $0x40,%rax
jl .Lxorpart4
movdqu 0x30(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x30(%rsi)

movdqa 0x20(%rsp),%xmm0
cmp $0x50,%rax
jl .Lxorpart4
movdqu 0x40(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x40(%rsi)

movdqu %xmm6,%xmm0
cmp $0x60,%rax
jl .Lxorpart4
movdqu 0x50(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x50(%rsi)

movdqu %xmm10,%xmm0
cmp $0x70,%rax
jl .Lxorpart4
movdqu 0x60(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x60(%rsi)

movdqu %xmm14,%xmm0
cmp $0x80,%rax
jl .Lxorpart4
movdqu 0x70(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x70(%rsi)

movdqa 0x10(%rsp),%xmm0
cmp $0x90,%rax
jl .Lxorpart4
movdqu 0x80(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x80(%rsi)

movdqu %xmm5,%xmm0
cmp $0xa0,%rax
jl .Lxorpart4
movdqu 0x90(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x90(%rsi)

movdqu %xmm9,%xmm0
cmp $0xb0,%rax
jl .Lxorpart4
movdqu 0xa0(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0xa0(%rsi)

movdqu %xmm13,%xmm0
cmp $0xc0,%rax
jl .Lxorpart4
movdqu 0xb0(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0xb0(%rsi)

movdqa 0x30(%rsp),%xmm0
cmp $0xd0,%rax
jl .Lxorpart4
movdqu 0xc0(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0xc0(%rsi)
movdqu 0x10(%rdx),%xmm1
pxor %xmm1,%xmm4
movdqu %xmm4,0x10(%rsi)
movdqu 0x90(%rdx),%xmm1
pxor %xmm1,%xmm5
movdqu %xmm5,0x90(%rsi)
movdqu 0x50(%rdx),%xmm1
pxor %xmm1,%xmm6
movdqu %xmm6,0x50(%rsi)
movdqu 0xd0(%rdx),%xmm1
pxor %xmm1,%xmm7
movdqu %xmm7,0xd0(%rsi)
movdqu 0x20(%rdx),%xmm1
pxor %xmm1,%xmm8
movdqu %xmm8,0x20(%rsi)
movdqu 0xa0(%rdx),%xmm1
pxor %xmm1,%xmm9
movdqu %xmm9,0xa0(%rsi)
movdqu 0x60(%rdx),%xmm1
pxor %xmm1,%xmm10
movdqu %xmm10,0x60(%rsi)
movdqu 0xe0(%rdx),%xmm1
pxor %xmm1,%xmm11
movdqu %xmm11,0xe0(%rsi)
movdqu 0x30(%rdx),%xmm1
pxor %xmm1,%xmm12
movdqu %xmm12,0x30(%rsi)
movdqu 0xb0(%rdx),%xmm1
pxor %xmm1,%xmm13
movdqu %xmm13,0xb0(%rsi)
movdqu 0x70(%rdx),%xmm1
pxor %xmm1,%xmm14
movdqu %xmm14,0x70(%rsi)
movdqu 0xf0(%rdx),%xmm1
pxor %xmm1,%xmm15
movdqu %xmm15,0xf0(%rsi)

movdqu %xmm7,%xmm0
cmp $0xe0,%rax
jl .Lxorpart4
movdqu 0xd0(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0xd0(%rsi)

movdqu %xmm11,%xmm0
cmp $0xf0,%rax
jl .Lxorpart4
movdqu 0xe0(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0xe0(%rsi)

movdqu %xmm15,%xmm0
cmp $0x100,%rax
jl .Lxorpart4
movdqu 0xf0(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0xf0(%rsi)

.Ldone4:
lea -8(%r10),%rsp
ret
ENDPROC(chacha20_4block_xor_ssse3)

.Lxorpart4:
# xor remaining bytes from partial register into output
mov %rax,%r9
and $0x0f,%r9
jz .Ldone4
and $~0x0f,%rax

mov %rsi,%r11

lea (%rdx,%rax),%rsi
mov %rsp,%rdi
mov %r9,%rcx
rep movsb

pxor 0x00(%rsp),%xmm0
movdqa %xmm0,0x00(%rsp)

mov %rsp,%rsi
lea (%r11,%rax),%rdi
mov %r9,%rcx
rep movsb

jmp .Ldone4

ENDPROC(chacha_4block_xor_ssse3)
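Without masked memory operations, the SSSE3 .Lxorpart paths bounce the tail through an aligned stack buffer using rep movsb: copy the remaining bytes in, XOR the whole register against the buffer, copy the remaining bytes back out. The same idea in C (an illustrative sketch, not kernel code):

#include <stddef.h>
#include <stdint.h>
#include <string.h>

static void xor_tail_via_bounce(uint8_t *dst, const uint8_t *src,
                                const uint8_t keystream[16],
                                size_t offset, size_t remainder)
{
    uint8_t buf[16] = { 0 };

    memcpy(buf, src + offset, remainder);   /* first rep movsb */
    for (int i = 0; i < 16; i++)
        buf[i] ^= keystream[i];             /* pxor on the copy */
    memcpy(dst + offset, buf, remainder);   /* second rep movsb */
}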
@@ -1,448 +0,0 @@
/*
 * ChaCha20 256-bit cipher algorithm, RFC7539, x64 AVX2 functions
 *
 * Copyright (C) 2015 Martin Willi
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */

#include <linux/linkage.h>

.section .rodata.cst32.ROT8, "aM", @progbits, 32
.align 32
ROT8: .octa 0x0e0d0c0f0a09080b0605040702010003
.octa 0x0e0d0c0f0a09080b0605040702010003

.section .rodata.cst32.ROT16, "aM", @progbits, 32
.align 32
ROT16: .octa 0x0d0c0f0e09080b0a0504070601000302
.octa 0x0d0c0f0e09080b0a0504070601000302

.section .rodata.cst32.CTRINC, "aM", @progbits, 32
.align 32
CTRINC: .octa 0x00000003000000020000000100000000
.octa 0x00000007000000060000000500000004

.text

ENTRY(chacha20_8block_xor_avx2)
# %rdi: Input state matrix, s
# %rsi: 8 data blocks output, o
# %rdx: 8 data blocks input, i

# This function encrypts eight consecutive ChaCha20 blocks by loading
# the state matrix in AVX registers eight times. As we need some
# scratch registers, we save the first four registers on the stack. The
# algorithm performs each operation on the corresponding word of each
# state matrix, hence requires no word shuffling. For final XORing step
# we transpose the matrix by interleaving 32-, 64- and then 128-bit
# words, which allows us to do XOR in AVX registers. 8/16-bit word
# rotation is done with the slightly better performing byte shuffling,
# 7/12-bit word rotation uses traditional shift+OR.

vzeroupper
# 4 * 32 byte stack, 32-byte aligned
lea 8(%rsp),%r10
and $~31, %rsp
sub $0x80, %rsp

# x0..15[0-7] = s[0..15]
vpbroadcastd 0x00(%rdi),%ymm0
vpbroadcastd 0x04(%rdi),%ymm1
vpbroadcastd 0x08(%rdi),%ymm2
vpbroadcastd 0x0c(%rdi),%ymm3
vpbroadcastd 0x10(%rdi),%ymm4
vpbroadcastd 0x14(%rdi),%ymm5
vpbroadcastd 0x18(%rdi),%ymm6
vpbroadcastd 0x1c(%rdi),%ymm7
vpbroadcastd 0x20(%rdi),%ymm8
vpbroadcastd 0x24(%rdi),%ymm9
vpbroadcastd 0x28(%rdi),%ymm10
vpbroadcastd 0x2c(%rdi),%ymm11
vpbroadcastd 0x30(%rdi),%ymm12
vpbroadcastd 0x34(%rdi),%ymm13
vpbroadcastd 0x38(%rdi),%ymm14
vpbroadcastd 0x3c(%rdi),%ymm15
# x0..3 on stack
vmovdqa %ymm0,0x00(%rsp)
vmovdqa %ymm1,0x20(%rsp)
vmovdqa %ymm2,0x40(%rsp)
vmovdqa %ymm3,0x60(%rsp)

vmovdqa CTRINC(%rip),%ymm1
vmovdqa ROT8(%rip),%ymm2
vmovdqa ROT16(%rip),%ymm3

# x12 += counter values 0-3
vpaddd %ymm1,%ymm12,%ymm12

mov $10,%ecx

.Ldoubleround8:
# x0 += x4, x12 = rotl32(x12 ^ x0, 16)
vpaddd 0x00(%rsp),%ymm4,%ymm0
vmovdqa %ymm0,0x00(%rsp)
vpxor %ymm0,%ymm12,%ymm12
vpshufb %ymm3,%ymm12,%ymm12
# x1 += x5, x13 = rotl32(x13 ^ x1, 16)
vpaddd 0x20(%rsp),%ymm5,%ymm0
vmovdqa %ymm0,0x20(%rsp)
vpxor %ymm0,%ymm13,%ymm13
vpshufb %ymm3,%ymm13,%ymm13
# x2 += x6, x14 = rotl32(x14 ^ x2, 16)
vpaddd 0x40(%rsp),%ymm6,%ymm0
vmovdqa %ymm0,0x40(%rsp)
vpxor %ymm0,%ymm14,%ymm14
vpshufb %ymm3,%ymm14,%ymm14
# x3 += x7, x15 = rotl32(x15 ^ x3, 16)
vpaddd 0x60(%rsp),%ymm7,%ymm0
vmovdqa %ymm0,0x60(%rsp)
vpxor %ymm0,%ymm15,%ymm15
vpshufb %ymm3,%ymm15,%ymm15

# x8 += x12, x4 = rotl32(x4 ^ x8, 12)
vpaddd %ymm12,%ymm8,%ymm8
vpxor %ymm8,%ymm4,%ymm4
vpslld $12,%ymm4,%ymm0
vpsrld $20,%ymm4,%ymm4
vpor %ymm0,%ymm4,%ymm4
# x9 += x13, x5 = rotl32(x5 ^ x9, 12)
vpaddd %ymm13,%ymm9,%ymm9
vpxor %ymm9,%ymm5,%ymm5
vpslld $12,%ymm5,%ymm0
vpsrld $20,%ymm5,%ymm5
vpor %ymm0,%ymm5,%ymm5
# x10 += x14, x6 = rotl32(x6 ^ x10, 12)
vpaddd %ymm14,%ymm10,%ymm10
vpxor %ymm10,%ymm6,%ymm6
vpslld $12,%ymm6,%ymm0
vpsrld $20,%ymm6,%ymm6
vpor %ymm0,%ymm6,%ymm6
# x11 += x15, x7 = rotl32(x7 ^ x11, 12)
vpaddd %ymm15,%ymm11,%ymm11
vpxor %ymm11,%ymm7,%ymm7
vpslld $12,%ymm7,%ymm0
vpsrld $20,%ymm7,%ymm7
vpor %ymm0,%ymm7,%ymm7

# x0 += x4, x12 = rotl32(x12 ^ x0, 8)
vpaddd 0x00(%rsp),%ymm4,%ymm0
vmovdqa %ymm0,0x00(%rsp)
vpxor %ymm0,%ymm12,%ymm12
vpshufb %ymm2,%ymm12,%ymm12
# x1 += x5, x13 = rotl32(x13 ^ x1, 8)
vpaddd 0x20(%rsp),%ymm5,%ymm0
vmovdqa %ymm0,0x20(%rsp)
vpxor %ymm0,%ymm13,%ymm13
vpshufb %ymm2,%ymm13,%ymm13
# x2 += x6, x14 = rotl32(x14 ^ x2, 8)
vpaddd 0x40(%rsp),%ymm6,%ymm0
vmovdqa %ymm0,0x40(%rsp)
vpxor %ymm0,%ymm14,%ymm14
vpshufb %ymm2,%ymm14,%ymm14
# x3 += x7, x15 = rotl32(x15 ^ x3, 8)
vpaddd 0x60(%rsp),%ymm7,%ymm0
vmovdqa %ymm0,0x60(%rsp)
vpxor %ymm0,%ymm15,%ymm15
vpshufb %ymm2,%ymm15,%ymm15

# x8 += x12, x4 = rotl32(x4 ^ x8, 7)
vpaddd %ymm12,%ymm8,%ymm8
vpxor %ymm8,%ymm4,%ymm4
vpslld $7,%ymm4,%ymm0
vpsrld $25,%ymm4,%ymm4
vpor %ymm0,%ymm4,%ymm4
# x9 += x13, x5 = rotl32(x5 ^ x9, 7)
vpaddd %ymm13,%ymm9,%ymm9
vpxor %ymm9,%ymm5,%ymm5
vpslld $7,%ymm5,%ymm0
vpsrld $25,%ymm5,%ymm5
vpor %ymm0,%ymm5,%ymm5
# x10 += x14, x6 = rotl32(x6 ^ x10, 7)
vpaddd %ymm14,%ymm10,%ymm10
vpxor %ymm10,%ymm6,%ymm6
vpslld $7,%ymm6,%ymm0
vpsrld $25,%ymm6,%ymm6
vpor %ymm0,%ymm6,%ymm6
# x11 += x15, x7 = rotl32(x7 ^ x11, 7)
vpaddd %ymm15,%ymm11,%ymm11
vpxor %ymm11,%ymm7,%ymm7
vpslld $7,%ymm7,%ymm0
vpsrld $25,%ymm7,%ymm7
vpor %ymm0,%ymm7,%ymm7

# x0 += x5, x15 = rotl32(x15 ^ x0, 16)
vpaddd 0x00(%rsp),%ymm5,%ymm0
vmovdqa %ymm0,0x00(%rsp)
vpxor %ymm0,%ymm15,%ymm15
vpshufb %ymm3,%ymm15,%ymm15
# x1 += x6, x12 = rotl32(x12 ^ x1, 16)
vpaddd 0x20(%rsp),%ymm6,%ymm0
vmovdqa %ymm0,0x20(%rsp)
vpxor %ymm0,%ymm12,%ymm12
vpshufb %ymm3,%ymm12,%ymm12
# x2 += x7, x13 = rotl32(x13 ^ x2, 16)
vpaddd 0x40(%rsp),%ymm7,%ymm0
vmovdqa %ymm0,0x40(%rsp)
vpxor %ymm0,%ymm13,%ymm13
vpshufb %ymm3,%ymm13,%ymm13
# x3 += x4, x14 = rotl32(x14 ^ x3, 16)
vpaddd 0x60(%rsp),%ymm4,%ymm0
vmovdqa %ymm0,0x60(%rsp)
vpxor %ymm0,%ymm14,%ymm14
vpshufb %ymm3,%ymm14,%ymm14

# x10 += x15, x5 = rotl32(x5 ^ x10, 12)
vpaddd %ymm15,%ymm10,%ymm10
vpxor %ymm10,%ymm5,%ymm5
vpslld $12,%ymm5,%ymm0
vpsrld $20,%ymm5,%ymm5
vpor %ymm0,%ymm5,%ymm5
# x11 += x12, x6 = rotl32(x6 ^ x11, 12)
vpaddd %ymm12,%ymm11,%ymm11
vpxor %ymm11,%ymm6,%ymm6
vpslld $12,%ymm6,%ymm0
vpsrld $20,%ymm6,%ymm6
vpor %ymm0,%ymm6,%ymm6
# x8 += x13, x7 = rotl32(x7 ^ x8, 12)
vpaddd %ymm13,%ymm8,%ymm8
vpxor %ymm8,%ymm7,%ymm7
vpslld $12,%ymm7,%ymm0
vpsrld $20,%ymm7,%ymm7
vpor %ymm0,%ymm7,%ymm7
# x9 += x14, x4 = rotl32(x4 ^ x9, 12)
vpaddd %ymm14,%ymm9,%ymm9
vpxor %ymm9,%ymm4,%ymm4
vpslld $12,%ymm4,%ymm0
vpsrld $20,%ymm4,%ymm4
vpor %ymm0,%ymm4,%ymm4

# x0 += x5, x15 = rotl32(x15 ^ x0, 8)
vpaddd 0x00(%rsp),%ymm5,%ymm0
vmovdqa %ymm0,0x00(%rsp)
vpxor %ymm0,%ymm15,%ymm15
vpshufb %ymm2,%ymm15,%ymm15
# x1 += x6, x12 = rotl32(x12 ^ x1, 8)
vpaddd 0x20(%rsp),%ymm6,%ymm0
vmovdqa %ymm0,0x20(%rsp)
vpxor %ymm0,%ymm12,%ymm12
vpshufb %ymm2,%ymm12,%ymm12
# x2 += x7, x13 = rotl32(x13 ^ x2, 8)
vpaddd 0x40(%rsp),%ymm7,%ymm0
vmovdqa %ymm0,0x40(%rsp)
vpxor %ymm0,%ymm13,%ymm13
vpshufb %ymm2,%ymm13,%ymm13
# x3 += x4, x14 = rotl32(x14 ^ x3, 8)
vpaddd 0x60(%rsp),%ymm4,%ymm0
vmovdqa %ymm0,0x60(%rsp)
vpxor %ymm0,%ymm14,%ymm14
vpshufb %ymm2,%ymm14,%ymm14

# x10 += x15, x5 = rotl32(x5 ^ x10, 7)
vpaddd %ymm15,%ymm10,%ymm10
vpxor %ymm10,%ymm5,%ymm5
vpslld $7,%ymm5,%ymm0
vpsrld $25,%ymm5,%ymm5
vpor %ymm0,%ymm5,%ymm5
# x11 += x12, x6 = rotl32(x6 ^ x11, 7)
vpaddd %ymm12,%ymm11,%ymm11
vpxor %ymm11,%ymm6,%ymm6
vpslld $7,%ymm6,%ymm0
vpsrld $25,%ymm6,%ymm6
vpor %ymm0,%ymm6,%ymm6
# x8 += x13, x7 = rotl32(x7 ^ x8, 7)
vpaddd %ymm13,%ymm8,%ymm8
vpxor %ymm8,%ymm7,%ymm7
vpslld $7,%ymm7,%ymm0
vpsrld $25,%ymm7,%ymm7
vpor %ymm0,%ymm7,%ymm7
# x9 += x14, x4 = rotl32(x4 ^ x9, 7)
vpaddd %ymm14,%ymm9,%ymm9
vpxor %ymm9,%ymm4,%ymm4
vpslld $7,%ymm4,%ymm0
vpsrld $25,%ymm4,%ymm4
vpor %ymm0,%ymm4,%ymm4

dec %ecx
jnz .Ldoubleround8

# x0..15[0-3] += s[0..15]
vpbroadcastd 0x00(%rdi),%ymm0
vpaddd 0x00(%rsp),%ymm0,%ymm0
vmovdqa %ymm0,0x00(%rsp)
vpbroadcastd 0x04(%rdi),%ymm0
vpaddd 0x20(%rsp),%ymm0,%ymm0
vmovdqa %ymm0,0x20(%rsp)
vpbroadcastd 0x08(%rdi),%ymm0
vpaddd 0x40(%rsp),%ymm0,%ymm0
vmovdqa %ymm0,0x40(%rsp)
vpbroadcastd 0x0c(%rdi),%ymm0
vpaddd 0x60(%rsp),%ymm0,%ymm0
vmovdqa %ymm0,0x60(%rsp)
vpbroadcastd 0x10(%rdi),%ymm0
vpaddd %ymm0,%ymm4,%ymm4
vpbroadcastd 0x14(%rdi),%ymm0
vpaddd %ymm0,%ymm5,%ymm5
vpbroadcastd 0x18(%rdi),%ymm0
vpaddd %ymm0,%ymm6,%ymm6
vpbroadcastd 0x1c(%rdi),%ymm0
vpaddd %ymm0,%ymm7,%ymm7
vpbroadcastd 0x20(%rdi),%ymm0
vpaddd %ymm0,%ymm8,%ymm8
vpbroadcastd 0x24(%rdi),%ymm0
vpaddd %ymm0,%ymm9,%ymm9
vpbroadcastd 0x28(%rdi),%ymm0
vpaddd %ymm0,%ymm10,%ymm10
vpbroadcastd 0x2c(%rdi),%ymm0
vpaddd %ymm0,%ymm11,%ymm11
vpbroadcastd 0x30(%rdi),%ymm0
vpaddd %ymm0,%ymm12,%ymm12
vpbroadcastd 0x34(%rdi),%ymm0
vpaddd %ymm0,%ymm13,%ymm13
vpbroadcastd 0x38(%rdi),%ymm0
vpaddd %ymm0,%ymm14,%ymm14
vpbroadcastd 0x3c(%rdi),%ymm0
vpaddd %ymm0,%ymm15,%ymm15

# x12 += counter values 0-3
vpaddd %ymm1,%ymm12,%ymm12

# interleave 32-bit words in state n, n+1
vmovdqa 0x00(%rsp),%ymm0
vmovdqa 0x20(%rsp),%ymm1
vpunpckldq %ymm1,%ymm0,%ymm2
vpunpckhdq %ymm1,%ymm0,%ymm1
vmovdqa %ymm2,0x00(%rsp)
vmovdqa %ymm1,0x20(%rsp)
vmovdqa 0x40(%rsp),%ymm0
vmovdqa 0x60(%rsp),%ymm1
vpunpckldq %ymm1,%ymm0,%ymm2
vpunpckhdq %ymm1,%ymm0,%ymm1
vmovdqa %ymm2,0x40(%rsp)
vmovdqa %ymm1,0x60(%rsp)
vmovdqa %ymm4,%ymm0
vpunpckldq %ymm5,%ymm0,%ymm4
vpunpckhdq %ymm5,%ymm0,%ymm5
vmovdqa %ymm6,%ymm0
vpunpckldq %ymm7,%ymm0,%ymm6
vpunpckhdq %ymm7,%ymm0,%ymm7
vmovdqa %ymm8,%ymm0
vpunpckldq %ymm9,%ymm0,%ymm8
vpunpckhdq %ymm9,%ymm0,%ymm9
vmovdqa %ymm10,%ymm0
vpunpckldq %ymm11,%ymm0,%ymm10
vpunpckhdq %ymm11,%ymm0,%ymm11
vmovdqa %ymm12,%ymm0
vpunpckldq %ymm13,%ymm0,%ymm12
vpunpckhdq %ymm13,%ymm0,%ymm13
vmovdqa %ymm14,%ymm0
vpunpckldq %ymm15,%ymm0,%ymm14
vpunpckhdq %ymm15,%ymm0,%ymm15

# interleave 64-bit words in state n, n+2
vmovdqa 0x00(%rsp),%ymm0
vmovdqa 0x40(%rsp),%ymm2
vpunpcklqdq %ymm2,%ymm0,%ymm1
vpunpckhqdq %ymm2,%ymm0,%ymm2
vmovdqa %ymm1,0x00(%rsp)
vmovdqa %ymm2,0x40(%rsp)
vmovdqa 0x20(%rsp),%ymm0
vmovdqa 0x60(%rsp),%ymm2
vpunpcklqdq %ymm2,%ymm0,%ymm1
vpunpckhqdq %ymm2,%ymm0,%ymm2
vmovdqa %ymm1,0x20(%rsp)
vmovdqa %ymm2,0x60(%rsp)
vmovdqa %ymm4,%ymm0
vpunpcklqdq %ymm6,%ymm0,%ymm4
vpunpckhqdq %ymm6,%ymm0,%ymm6
vmovdqa %ymm5,%ymm0
vpunpcklqdq %ymm7,%ymm0,%ymm5
vpunpckhqdq %ymm7,%ymm0,%ymm7
vmovdqa %ymm8,%ymm0
vpunpcklqdq %ymm10,%ymm0,%ymm8
vpunpckhqdq %ymm10,%ymm0,%ymm10
vmovdqa %ymm9,%ymm0
vpunpcklqdq %ymm11,%ymm0,%ymm9
vpunpckhqdq %ymm11,%ymm0,%ymm11
vmovdqa %ymm12,%ymm0
vpunpcklqdq %ymm14,%ymm0,%ymm12
vpunpckhqdq %ymm14,%ymm0,%ymm14
vmovdqa %ymm13,%ymm0
vpunpcklqdq %ymm15,%ymm0,%ymm13
vpunpckhqdq %ymm15,%ymm0,%ymm15

# interleave 128-bit words in state n, n+4
vmovdqa 0x00(%rsp),%ymm0
vperm2i128 $0x20,%ymm4,%ymm0,%ymm1
vperm2i128 $0x31,%ymm4,%ymm0,%ymm4
vmovdqa %ymm1,0x00(%rsp)
vmovdqa 0x20(%rsp),%ymm0
vperm2i128 $0x20,%ymm5,%ymm0,%ymm1
vperm2i128 $0x31,%ymm5,%ymm0,%ymm5
vmovdqa %ymm1,0x20(%rsp)
vmovdqa 0x40(%rsp),%ymm0
vperm2i128 $0x20,%ymm6,%ymm0,%ymm1
vperm2i128 $0x31,%ymm6,%ymm0,%ymm6
vmovdqa %ymm1,0x40(%rsp)
vmovdqa 0x60(%rsp),%ymm0
vperm2i128 $0x20,%ymm7,%ymm0,%ymm1
vperm2i128 $0x31,%ymm7,%ymm0,%ymm7
vmovdqa %ymm1,0x60(%rsp)
vperm2i128 $0x20,%ymm12,%ymm8,%ymm0
vperm2i128 $0x31,%ymm12,%ymm8,%ymm12
vmovdqa %ymm0,%ymm8
vperm2i128 $0x20,%ymm13,%ymm9,%ymm0
vperm2i128 $0x31,%ymm13,%ymm9,%ymm13
vmovdqa %ymm0,%ymm9
vperm2i128 $0x20,%ymm14,%ymm10,%ymm0
vperm2i128 $0x31,%ymm14,%ymm10,%ymm14
vmovdqa %ymm0,%ymm10
vperm2i128 $0x20,%ymm15,%ymm11,%ymm0
vperm2i128 $0x31,%ymm15,%ymm11,%ymm15
vmovdqa %ymm0,%ymm11

# xor with corresponding input, write to output
vmovdqa 0x00(%rsp),%ymm0
vpxor 0x0000(%rdx),%ymm0,%ymm0
vmovdqu %ymm0,0x0000(%rsi)
vmovdqa 0x20(%rsp),%ymm0
vpxor 0x0080(%rdx),%ymm0,%ymm0
vmovdqu %ymm0,0x0080(%rsi)
vmovdqa 0x40(%rsp),%ymm0
vpxor 0x0040(%rdx),%ymm0,%ymm0
vmovdqu %ymm0,0x0040(%rsi)
vmovdqa 0x60(%rsp),%ymm0
vpxor 0x00c0(%rdx),%ymm0,%ymm0
vmovdqu %ymm0,0x00c0(%rsi)
vpxor 0x0100(%rdx),%ymm4,%ymm4
vmovdqu %ymm4,0x0100(%rsi)
vpxor 0x0180(%rdx),%ymm5,%ymm5
vmovdqu %ymm5,0x00180(%rsi)
vpxor 0x0140(%rdx),%ymm6,%ymm6
vmovdqu %ymm6,0x0140(%rsi)
vpxor 0x01c0(%rdx),%ymm7,%ymm7
vmovdqu %ymm7,0x01c0(%rsi)
vpxor 0x0020(%rdx),%ymm8,%ymm8
vmovdqu %ymm8,0x0020(%rsi)
vpxor 0x00a0(%rdx),%ymm9,%ymm9
vmovdqu %ymm9,0x00a0(%rsi)
vpxor 0x0060(%rdx),%ymm10,%ymm10
vmovdqu %ymm10,0x0060(%rsi)
vpxor 0x00e0(%rdx),%ymm11,%ymm11
vmovdqu %ymm11,0x00e0(%rsi)
vpxor 0x0120(%rdx),%ymm12,%ymm12
vmovdqu %ymm12,0x0120(%rsi)
vpxor 0x01a0(%rdx),%ymm13,%ymm13
vmovdqu %ymm13,0x01a0(%rsi)
vpxor 0x0160(%rdx),%ymm14,%ymm14
vmovdqu %ymm14,0x0160(%rsi)
vpxor 0x01e0(%rdx),%ymm15,%ymm15
vmovdqu %ymm15,0x01e0(%rsi)

vzeroupper
lea -8(%r10),%rsp
ret
ENDPROC(chacha20_8block_xor_avx2)
@@ -1,146 +0,0 @@
/*
 * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
 *
 * Copyright (C) 2015 Martin Willi
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */

#include <crypto/algapi.h>
#include <crypto/chacha20.h>
#include <crypto/internal/skcipher.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/simd.h>

#define CHACHA20_STATE_ALIGN 16

asmlinkage void chacha20_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
asmlinkage void chacha20_4block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
#ifdef CONFIG_AS_AVX2
asmlinkage void chacha20_8block_xor_avx2(u32 *state, u8 *dst, const u8 *src);
static bool chacha20_use_avx2;
#endif

static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
        unsigned int bytes)
{
    u8 buf[CHACHA20_BLOCK_SIZE];

#ifdef CONFIG_AS_AVX2
    if (chacha20_use_avx2) {
        while (bytes >= CHACHA20_BLOCK_SIZE * 8) {
            chacha20_8block_xor_avx2(state, dst, src);
            bytes -= CHACHA20_BLOCK_SIZE * 8;
            src += CHACHA20_BLOCK_SIZE * 8;
            dst += CHACHA20_BLOCK_SIZE * 8;
            state[12] += 8;
        }
    }
#endif
    while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
        chacha20_4block_xor_ssse3(state, dst, src);
        bytes -= CHACHA20_BLOCK_SIZE * 4;
        src += CHACHA20_BLOCK_SIZE * 4;
        dst += CHACHA20_BLOCK_SIZE * 4;
        state[12] += 4;
    }
    while (bytes >= CHACHA20_BLOCK_SIZE) {
        chacha20_block_xor_ssse3(state, dst, src);
        bytes -= CHACHA20_BLOCK_SIZE;
        src += CHACHA20_BLOCK_SIZE;
        dst += CHACHA20_BLOCK_SIZE;
        state[12]++;
    }
    if (bytes) {
        memcpy(buf, src, bytes);
        chacha20_block_xor_ssse3(state, buf, buf);
        memcpy(dst, buf, bytes);
    }
}

static int chacha20_simd(struct skcipher_request *req)
{
    struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
    struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
    u32 *state, state_buf[16 + 2] __aligned(8);
    struct skcipher_walk walk;
    int err;

    BUILD_BUG_ON(CHACHA20_STATE_ALIGN != 16);
    state = PTR_ALIGN(state_buf + 0, CHACHA20_STATE_ALIGN);

    if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
        return crypto_chacha20_crypt(req);

    err = skcipher_walk_virt(&walk, req, true);

    crypto_chacha20_init(state, ctx, walk.iv);

    kernel_fpu_begin();

    while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
        chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
                rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE));
        err = skcipher_walk_done(&walk,
                walk.nbytes % CHACHA20_BLOCK_SIZE);
    }

    if (walk.nbytes) {
        chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
                walk.nbytes);
        err = skcipher_walk_done(&walk, 0);
    }

    kernel_fpu_end();

    return err;
}

static struct skcipher_alg alg = {
    .base.cra_name = "chacha20",
    .base.cra_driver_name = "chacha20-simd",
    .base.cra_priority = 300,
    .base.cra_blocksize = 1,
    .base.cra_ctxsize = sizeof(struct chacha20_ctx),
    .base.cra_module = THIS_MODULE,

    .min_keysize = CHACHA20_KEY_SIZE,
    .max_keysize = CHACHA20_KEY_SIZE,
    .ivsize = CHACHA20_IV_SIZE,
    .chunksize = CHACHA20_BLOCK_SIZE,
    .setkey = crypto_chacha20_setkey,
    .encrypt = chacha20_simd,
    .decrypt = chacha20_simd,
};

static int __init chacha20_simd_mod_init(void)
{
    if (!boot_cpu_has(X86_FEATURE_SSSE3))
        return -ENODEV;

#ifdef CONFIG_AS_AVX2
    chacha20_use_avx2 = boot_cpu_has(X86_FEATURE_AVX) &&
            boot_cpu_has(X86_FEATURE_AVX2) &&
            cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
#endif
    return crypto_register_skcipher(&alg);
}

static void __exit chacha20_simd_mod_fini(void)
{
    crypto_unregister_skcipher(&alg);
}

module_init(chacha20_simd_mod_init);
module_exit(chacha20_simd_mod_fini);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
MODULE_DESCRIPTION("chacha20 cipher algorithm, SIMD accelerated");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-simd");
arch/x86/crypto/chacha_glue.c (new file, 304 lines)
@@ -0,0 +1,304 @@
/*
 * x64 SIMD accelerated ChaCha and XChaCha stream ciphers,
 * including ChaCha20 (RFC7539)
 *
 * Copyright (C) 2015 Martin Willi
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */

#include <crypto/algapi.h>
#include <crypto/chacha.h>
#include <crypto/internal/skcipher.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/simd.h>

#define CHACHA_STATE_ALIGN 16

asmlinkage void chacha_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src,
        unsigned int len, int nrounds);
asmlinkage void chacha_4block_xor_ssse3(u32 *state, u8 *dst, const u8 *src,
        unsigned int len, int nrounds);
asmlinkage void hchacha_block_ssse3(const u32 *state, u32 *out, int nrounds);
#ifdef CONFIG_AS_AVX2
asmlinkage void chacha_2block_xor_avx2(u32 *state, u8 *dst, const u8 *src,
        unsigned int len, int nrounds);
asmlinkage void chacha_4block_xor_avx2(u32 *state, u8 *dst, const u8 *src,
        unsigned int len, int nrounds);
asmlinkage void chacha_8block_xor_avx2(u32 *state, u8 *dst, const u8 *src,
        unsigned int len, int nrounds);
static bool chacha_use_avx2;
#ifdef CONFIG_AS_AVX512
asmlinkage void chacha_2block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
        unsigned int len, int nrounds);
asmlinkage void chacha_4block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
        unsigned int len, int nrounds);
asmlinkage void chacha_8block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
        unsigned int len, int nrounds);
static bool chacha_use_avx512vl;
#endif
#endif

static unsigned int chacha_advance(unsigned int len, unsigned int maxblocks)
{
    len = min(len, maxblocks * CHACHA_BLOCK_SIZE);
    return round_up(len, CHACHA_BLOCK_SIZE) / CHACHA_BLOCK_SIZE;
}
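/*
 * Worked example (illustrative, not part of the source): len = 80 and
 * maxblocks = 4 gives min(80, 256) = 80, and round_up(80, 64) / 64 = 2,
 * so the counter advances by two blocks even though the second block
 * was only partially consumed.
 */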

static void chacha_dosimd(u32 *state, u8 *dst, const u8 *src,
        unsigned int bytes, int nrounds)
{
#ifdef CONFIG_AS_AVX2
#ifdef CONFIG_AS_AVX512
    if (chacha_use_avx512vl) {
        while (bytes >= CHACHA_BLOCK_SIZE * 8) {
            chacha_8block_xor_avx512vl(state, dst, src, bytes,
                    nrounds);
            bytes -= CHACHA_BLOCK_SIZE * 8;
            src += CHACHA_BLOCK_SIZE * 8;
            dst += CHACHA_BLOCK_SIZE * 8;
            state[12] += 8;
        }
        if (bytes > CHACHA_BLOCK_SIZE * 4) {
            chacha_8block_xor_avx512vl(state, dst, src, bytes,
                    nrounds);
            state[12] += chacha_advance(bytes, 8);
            return;
        }
        if (bytes > CHACHA_BLOCK_SIZE * 2) {
            chacha_4block_xor_avx512vl(state, dst, src, bytes,
                    nrounds);
            state[12] += chacha_advance(bytes, 4);
            return;
        }
        if (bytes) {
            chacha_2block_xor_avx512vl(state, dst, src, bytes,
                    nrounds);
            state[12] += chacha_advance(bytes, 2);
            return;
        }
    }
#endif
    if (chacha_use_avx2) {
        while (bytes >= CHACHA_BLOCK_SIZE * 8) {
            chacha_8block_xor_avx2(state, dst, src, bytes, nrounds);
            bytes -= CHACHA_BLOCK_SIZE * 8;
            src += CHACHA_BLOCK_SIZE * 8;
            dst += CHACHA_BLOCK_SIZE * 8;
            state[12] += 8;
        }
        if (bytes > CHACHA_BLOCK_SIZE * 4) {
            chacha_8block_xor_avx2(state, dst, src, bytes, nrounds);
            state[12] += chacha_advance(bytes, 8);
            return;
        }
        if (bytes > CHACHA_BLOCK_SIZE * 2) {
            chacha_4block_xor_avx2(state, dst, src, bytes, nrounds);
            state[12] += chacha_advance(bytes, 4);
            return;
        }
        if (bytes > CHACHA_BLOCK_SIZE) {
            chacha_2block_xor_avx2(state, dst, src, bytes, nrounds);
            state[12] += chacha_advance(bytes, 2);
            return;
        }
    }
#endif
    while (bytes >= CHACHA_BLOCK_SIZE * 4) {
        chacha_4block_xor_ssse3(state, dst, src, bytes, nrounds);
        bytes -= CHACHA_BLOCK_SIZE * 4;
        src += CHACHA_BLOCK_SIZE * 4;
        dst += CHACHA_BLOCK_SIZE * 4;
        state[12] += 4;
    }
    if (bytes > CHACHA_BLOCK_SIZE) {
        chacha_4block_xor_ssse3(state, dst, src, bytes, nrounds);
        state[12] += chacha_advance(bytes, 4);
        return;
    }
    if (bytes) {
        chacha_block_xor_ssse3(state, dst, src, bytes, nrounds);
        state[12]++;
    }
}
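/*
 * Worked example (illustrative, not part of the source): with AVX2
 * available and bytes = 300, the 8-block loop is skipped (300 < 512),
 * but 300 > CHACHA_BLOCK_SIZE * 4 holds, so a single length-limited
 * 8-block call consumes all 300 bytes and state[12] advances by
 * chacha_advance(300, 8) = round_up(300, 64) / 64 = 5 blocks.
 */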

static int chacha_simd_stream_xor(struct skcipher_walk *walk,
        struct chacha_ctx *ctx, u8 *iv)
{
    u32 *state, state_buf[16 + 2] __aligned(8);
    int next_yield = 4096; /* bytes until next FPU yield */
    int err = 0;

    BUILD_BUG_ON(CHACHA_STATE_ALIGN != 16);
    state = PTR_ALIGN(state_buf + 0, CHACHA_STATE_ALIGN);

    crypto_chacha_init(state, ctx, iv);

    while (walk->nbytes > 0) {
        unsigned int nbytes = walk->nbytes;

        if (nbytes < walk->total) {
            nbytes = round_down(nbytes, walk->stride);
            next_yield -= nbytes;
        }

        chacha_dosimd(state, walk->dst.virt.addr, walk->src.virt.addr,
                nbytes, ctx->nrounds);

        if (next_yield <= 0) {
            /* temporarily allow preemption */
            kernel_fpu_end();
            kernel_fpu_begin();
            next_yield = 4096;
        }

        err = skcipher_walk_done(walk, walk->nbytes - nbytes);
    }

    return err;
}

static int chacha_simd(struct skcipher_request *req)
{
    struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
    struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
    struct skcipher_walk walk;
    int err;

    if (req->cryptlen <= CHACHA_BLOCK_SIZE || !irq_fpu_usable())
        return crypto_chacha_crypt(req);

    err = skcipher_walk_virt(&walk, req, true);
    if (err)
        return err;

    kernel_fpu_begin();
    err = chacha_simd_stream_xor(&walk, ctx, req->iv);
    kernel_fpu_end();
    return err;
}

static int xchacha_simd(struct skcipher_request *req)
{
    struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
    struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
    struct skcipher_walk walk;
    struct chacha_ctx subctx;
    u32 *state, state_buf[16 + 2] __aligned(8);
    u8 real_iv[16];
    int err;

    if (req->cryptlen <= CHACHA_BLOCK_SIZE || !irq_fpu_usable())
        return crypto_xchacha_crypt(req);

    err = skcipher_walk_virt(&walk, req, true);
    if (err)
        return err;

    BUILD_BUG_ON(CHACHA_STATE_ALIGN != 16);
    state = PTR_ALIGN(state_buf + 0, CHACHA_STATE_ALIGN);
    crypto_chacha_init(state, ctx, req->iv);

    kernel_fpu_begin();

    hchacha_block_ssse3(state, subctx.key, ctx->nrounds);
    subctx.nrounds = ctx->nrounds;

    memcpy(&real_iv[0], req->iv + 24, 8);
    memcpy(&real_iv[8], req->iv + 16, 8);
    err = chacha_simd_stream_xor(&walk, &subctx, real_iv);

    kernel_fpu_end();

    return err;
}

static struct skcipher_alg algs[] = {
    {
        .base.cra_name = "chacha20",
        .base.cra_driver_name = "chacha20-simd",
        .base.cra_priority = 300,
        .base.cra_blocksize = 1,
        .base.cra_ctxsize = sizeof(struct chacha_ctx),
        .base.cra_module = THIS_MODULE,

        .min_keysize = CHACHA_KEY_SIZE,
        .max_keysize = CHACHA_KEY_SIZE,
        .ivsize = CHACHA_IV_SIZE,
        .chunksize = CHACHA_BLOCK_SIZE,
        .setkey = crypto_chacha20_setkey,
        .encrypt = chacha_simd,
        .decrypt = chacha_simd,
    }, {
        .base.cra_name = "xchacha20",
        .base.cra_driver_name = "xchacha20-simd",
        .base.cra_priority = 300,
        .base.cra_blocksize = 1,
        .base.cra_ctxsize = sizeof(struct chacha_ctx),
        .base.cra_module = THIS_MODULE,

        .min_keysize = CHACHA_KEY_SIZE,
        .max_keysize = CHACHA_KEY_SIZE,
        .ivsize = XCHACHA_IV_SIZE,
        .chunksize = CHACHA_BLOCK_SIZE,
        .setkey = crypto_chacha20_setkey,
        .encrypt = xchacha_simd,
        .decrypt = xchacha_simd,
    }, {
        .base.cra_name = "xchacha12",
        .base.cra_driver_name = "xchacha12-simd",
        .base.cra_priority = 300,
        .base.cra_blocksize = 1,
        .base.cra_ctxsize = sizeof(struct chacha_ctx),
        .base.cra_module = THIS_MODULE,

        .min_keysize = CHACHA_KEY_SIZE,
        .max_keysize = CHACHA_KEY_SIZE,
        .ivsize = XCHACHA_IV_SIZE,
        .chunksize = CHACHA_BLOCK_SIZE,
        .setkey = crypto_chacha12_setkey,
        .encrypt = xchacha_simd,
        .decrypt = xchacha_simd,
    },
};

static int __init chacha_simd_mod_init(void)
{
    if (!boot_cpu_has(X86_FEATURE_SSSE3))
        return -ENODEV;

#ifdef CONFIG_AS_AVX2
    chacha_use_avx2 = boot_cpu_has(X86_FEATURE_AVX) &&
            boot_cpu_has(X86_FEATURE_AVX2) &&
            cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
#ifdef CONFIG_AS_AVX512
    chacha_use_avx512vl = chacha_use_avx2 &&
            boot_cpu_has(X86_FEATURE_AVX512VL) &&
            boot_cpu_has(X86_FEATURE_AVX512BW); /* kmovq */
#endif
#endif
    return crypto_register_skciphers(algs, ARRAY_SIZE(algs));
}

static void __exit chacha_simd_mod_fini(void)
{
    crypto_unregister_skciphers(algs, ARRAY_SIZE(algs));
}

module_init(chacha_simd_mod_init);
module_exit(chacha_simd_mod_fini);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (x64 SIMD accelerated)");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-simd");
MODULE_ALIAS_CRYPTO("xchacha20");
MODULE_ALIAS_CRYPTO("xchacha20-simd");
MODULE_ALIAS_CRYPTO("xchacha12");
MODULE_ALIAS_CRYPTO("xchacha12-simd");
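For reference, the XChaCha IV plumbing in xchacha_simd() above reduces to two copies once the HChaCha subkey is derived: bytes 0..15 of the 32-byte IV feed HChaCha, bytes 16..23 carry the remaining nonce bits, and bytes 24..31 the starting block counter. A hedged C sketch (xchacha_prepare_iv is not a kernel function):

#include <stdint.h>
#include <string.h>

static void xchacha_prepare_iv(const uint8_t iv[32], uint8_t real_iv[16])
{
    memcpy(&real_iv[0], iv + 24, 8); /* counter words, state[12..13] */
    memcpy(&real_iv[8], iv + 16, 8); /* nonce words, state[14..15] */
}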
arch/x86/crypto/nh-avx2-x86_64.S (new file, 157 lines)
@@ -0,0 +1,157 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * NH - ε-almost-universal hash function, x86_64 AVX2 accelerated
 *
 * Copyright 2018 Google LLC
 *
 * Author: Eric Biggers <ebiggers@google.com>
 */

#include <linux/linkage.h>

#define PASS0_SUMS %ymm0
#define PASS1_SUMS %ymm1
#define PASS2_SUMS %ymm2
#define PASS3_SUMS %ymm3
#define K0 %ymm4
#define K0_XMM %xmm4
#define K1 %ymm5
#define K1_XMM %xmm5
#define K2 %ymm6
#define K2_XMM %xmm6
#define K3 %ymm7
#define K3_XMM %xmm7
#define T0 %ymm8
#define T1 %ymm9
#define T2 %ymm10
#define T2_XMM %xmm10
#define T3 %ymm11
#define T3_XMM %xmm11
#define T4 %ymm12
#define T5 %ymm13
#define T6 %ymm14
#define T7 %ymm15
#define KEY %rdi
#define MESSAGE %rsi
#define MESSAGE_LEN %rdx
#define HASH %rcx

.macro _nh_2xstride k0, k1, k2, k3

// Add message words to key words
vpaddd \k0, T3, T0
vpaddd \k1, T3, T1
vpaddd \k2, T3, T2
vpaddd \k3, T3, T3

// Multiply 32x32 => 64 and accumulate
vpshufd $0x10, T0, T4
vpshufd $0x32, T0, T0
vpshufd $0x10, T1, T5
vpshufd $0x32, T1, T1
vpshufd $0x10, T2, T6
vpshufd $0x32, T2, T2
vpshufd $0x10, T3, T7
vpshufd $0x32, T3, T3
vpmuludq T4, T0, T0
vpmuludq T5, T1, T1
vpmuludq T6, T2, T2
vpmuludq T7, T3, T3
vpaddq T0, PASS0_SUMS, PASS0_SUMS
vpaddq T1, PASS1_SUMS, PASS1_SUMS
vpaddq T2, PASS2_SUMS, PASS2_SUMS
vpaddq T3, PASS3_SUMS, PASS3_SUMS
.endm
/*
|
||||
* void nh_avx2(const u32 *key, const u8 *message, size_t message_len,
|
||||
* u8 hash[NH_HASH_BYTES])
|
||||
*
|
||||
* It's guaranteed that message_len % 16 == 0.
|
||||
*/
|
||||
ENTRY(nh_avx2)
|
||||
|
||||
vmovdqu 0x00(KEY), K0
|
||||
vmovdqu 0x10(KEY), K1
|
||||
add $0x20, KEY
|
||||
vpxor PASS0_SUMS, PASS0_SUMS, PASS0_SUMS
|
||||
vpxor PASS1_SUMS, PASS1_SUMS, PASS1_SUMS
|
||||
vpxor PASS2_SUMS, PASS2_SUMS, PASS2_SUMS
|
||||
vpxor PASS3_SUMS, PASS3_SUMS, PASS3_SUMS
|
||||
|
||||
sub $0x40, MESSAGE_LEN
|
||||
jl .Lloop4_done
|
||||
.Lloop4:
|
||||
vmovdqu (MESSAGE), T3
|
||||
vmovdqu 0x00(KEY), K2
|
||||
vmovdqu 0x10(KEY), K3
|
||||
_nh_2xstride K0, K1, K2, K3
|
||||
|
||||
vmovdqu 0x20(MESSAGE), T3
|
||||
vmovdqu 0x20(KEY), K0
|
||||
vmovdqu 0x30(KEY), K1
|
||||
_nh_2xstride K2, K3, K0, K1
|
||||
|
||||
add $0x40, MESSAGE
|
||||
add $0x40, KEY
|
||||
sub $0x40, MESSAGE_LEN
|
||||
jge .Lloop4
|
||||
|
||||
.Lloop4_done:
|
||||
and $0x3f, MESSAGE_LEN
|
||||
jz .Ldone
|
||||
|
||||
cmp $0x20, MESSAGE_LEN
|
||||
jl .Llast
|
||||
|
||||
// 2 or 3 strides remain; do 2 more.
|
||||
vmovdqu (MESSAGE), T3
|
||||
vmovdqu 0x00(KEY), K2
|
||||
vmovdqu 0x10(KEY), K3
|
||||
_nh_2xstride K0, K1, K2, K3
|
||||
add $0x20, MESSAGE
|
||||
add $0x20, KEY
|
||||
sub $0x20, MESSAGE_LEN
|
||||
jz .Ldone
|
||||
vmovdqa K2, K0
|
||||
vmovdqa K3, K1
|
||||
.Llast:
|
||||
// Last stride. Zero the high 128 bits of the message and keys so they
|
||||
// don't affect the result when processing them like 2 strides.
|
||||
vmovdqu (MESSAGE), T3_XMM
|
||||
vmovdqa K0_XMM, K0_XMM
|
||||
vmovdqa K1_XMM, K1_XMM
|
||||
vmovdqu 0x00(KEY), K2_XMM
|
||||
vmovdqu 0x10(KEY), K3_XMM
|
||||
_nh_2xstride K0, K1, K2, K3
|
||||
|
||||
.Ldone:
|
||||
// Sum the accumulators for each pass, then store the sums to 'hash'
|
||||
|
||||
// PASS0_SUMS is (0A 0B 0C 0D)
|
||||
// PASS1_SUMS is (1A 1B 1C 1D)
|
||||
// PASS2_SUMS is (2A 2B 2C 2D)
|
||||
// PASS3_SUMS is (3A 3B 3C 3D)
|
||||
// We need the horizontal sums:
|
||||
// (0A + 0B + 0C + 0D,
|
||||
// 1A + 1B + 1C + 1D,
|
||||
// 2A + 2B + 2C + 2D,
|
||||
// 3A + 3B + 3C + 3D)
|
||||
//
|
||||
|
||||
vpunpcklqdq PASS1_SUMS, PASS0_SUMS, T0 // T0 = (0A 1A 0C 1C)
|
||||
vpunpckhqdq PASS1_SUMS, PASS0_SUMS, T1 // T1 = (0B 1B 0D 1D)
|
||||
vpunpcklqdq PASS3_SUMS, PASS2_SUMS, T2 // T2 = (2A 3A 2C 3C)
|
||||
vpunpckhqdq PASS3_SUMS, PASS2_SUMS, T3 // T3 = (2B 3B 2D 3D)
|
||||
|
||||
vinserti128 $0x1, T2_XMM, T0, T4 // T4 = (0A 1A 2A 3A)
|
||||
vinserti128 $0x1, T3_XMM, T1, T5 // T5 = (0B 1B 2B 3B)
|
||||
vperm2i128 $0x31, T2, T0, T0 // T0 = (0C 1C 2C 3C)
|
||||
vperm2i128 $0x31, T3, T1, T1 // T1 = (0D 1D 2D 3D)
|
||||
|
||||
vpaddq T5, T4, T4
|
||||
vpaddq T1, T0, T0
|
||||
vpaddq T4, T0, T0
|
||||
vmovdqu T0, (HASH)
|
||||
ret
|
||||
ENDPROC(nh_avx2)
|
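For reference, the computation that this AVX2 code (and the SSE2 version below) implements can be sketched in portable C. The sketch is modeled on the generic NH code that ships with nhpoly1305 in this same series; it is illustrative rather than a verbatim copy of crypto/nhpoly1305.c:

#include <linux/kernel.h>
#include <linux/types.h>
#include <asm/unaligned.h>

/*
 * NH sketch: four passes over 16-byte message units.  Pass i uses the key
 * shifted by 4 u32 words, which is why the assembly's K0..K3 loads overlap
 * by 16 bytes.  Word additions are mod 2^32; products accumulate mod 2^64.
 */
static void nh_sketch(const u32 *key, const u8 *message, size_t message_len,
                      __le64 hash[4])
{
        u64 sums[4] = { 0, 0, 0, 0 };
        int i;

        while (message_len) {   /* message_len % 16 == 0 is guaranteed */
                u32 m0 = get_unaligned_le32(message + 0);
                u32 m1 = get_unaligned_le32(message + 4);
                u32 m2 = get_unaligned_le32(message + 8);
                u32 m3 = get_unaligned_le32(message + 12);

                for (i = 0; i < 4; i++) {
                        sums[i] += (u64)(u32)(m0 + key[4 * i + 0]) *
                                        (u32)(m2 + key[4 * i + 2]);
                        sums[i] += (u64)(u32)(m1 + key[4 * i + 1]) *
                                        (u32)(m3 + key[4 * i + 3]);
                }
                key += 4;       /* the key window slides one 16-byte unit */
                message += 16;
                message_len -= 16;
        }
        for (i = 0; i < 4; i++)
                hash[i] = cpu_to_le64(sums[i]);
}

The _nh_2xstride macro above evaluates exactly these two products for all four passes, handling two message units at once with one 128-bit lane per unit.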
arch/x86/crypto/nh-sse2-x86_64.S (new file, 123 lines)
@@ -0,0 +1,123 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
 * NH - ε-almost-universal hash function, x86_64 SSE2 accelerated
 *
 * Copyright 2018 Google LLC
 *
 * Author: Eric Biggers <ebiggers@google.com>
 */

#include <linux/linkage.h>

#define PASS0_SUMS      %xmm0
#define PASS1_SUMS      %xmm1
#define PASS2_SUMS      %xmm2
#define PASS3_SUMS      %xmm3
#define K0              %xmm4
#define K1              %xmm5
#define K2              %xmm6
#define K3              %xmm7
#define T0              %xmm8
#define T1              %xmm9
#define T2              %xmm10
#define T3              %xmm11
#define T4              %xmm12
#define T5              %xmm13
#define T6              %xmm14
#define T7              %xmm15
#define KEY             %rdi
#define MESSAGE         %rsi
#define MESSAGE_LEN     %rdx
#define HASH            %rcx

.macro _nh_stride k0, k1, k2, k3, offset

        // Load next message stride
        movdqu          \offset(MESSAGE), T1

        // Load next key stride
        movdqu          \offset(KEY), \k3

        // Add message words to key words
        movdqa          T1, T2
        movdqa          T1, T3
        paddd           T1, \k0         // reuse k0 to avoid a move
        paddd           \k1, T1
        paddd           \k2, T2
        paddd           \k3, T3

        // Multiply 32x32 => 64 and accumulate
        pshufd          $0x10, \k0, T4
        pshufd          $0x32, \k0, \k0
        pshufd          $0x10, T1, T5
        pshufd          $0x32, T1, T1
        pshufd          $0x10, T2, T6
        pshufd          $0x32, T2, T2
        pshufd          $0x10, T3, T7
        pshufd          $0x32, T3, T3
        pmuludq         T4, \k0
        pmuludq         T5, T1
        pmuludq         T6, T2
        pmuludq         T7, T3
        paddq           \k0, PASS0_SUMS
        paddq           T1, PASS1_SUMS
        paddq           T2, PASS2_SUMS
        paddq           T3, PASS3_SUMS
.endm

/*
 * void nh_sse2(const u32 *key, const u8 *message, size_t message_len,
 *              u8 hash[NH_HASH_BYTES])
 *
 * It's guaranteed that message_len % 16 == 0.
 */
ENTRY(nh_sse2)

        movdqu          0x00(KEY), K0
        movdqu          0x10(KEY), K1
        movdqu          0x20(KEY), K2
        add             $0x30, KEY
        pxor            PASS0_SUMS, PASS0_SUMS
        pxor            PASS1_SUMS, PASS1_SUMS
        pxor            PASS2_SUMS, PASS2_SUMS
        pxor            PASS3_SUMS, PASS3_SUMS

        sub             $0x40, MESSAGE_LEN
        jl              .Lloop4_done
.Lloop4:
        _nh_stride      K0, K1, K2, K3, 0x00
        _nh_stride      K1, K2, K3, K0, 0x10
        _nh_stride      K2, K3, K0, K1, 0x20
        _nh_stride      K3, K0, K1, K2, 0x30
        add             $0x40, KEY
        add             $0x40, MESSAGE
        sub             $0x40, MESSAGE_LEN
        jge             .Lloop4

.Lloop4_done:
        and             $0x3f, MESSAGE_LEN
        jz              .Ldone
        _nh_stride      K0, K1, K2, K3, 0x00

        sub             $0x10, MESSAGE_LEN
        jz              .Ldone
        _nh_stride      K1, K2, K3, K0, 0x10

        sub             $0x10, MESSAGE_LEN
        jz              .Ldone
        _nh_stride      K2, K3, K0, K1, 0x20

.Ldone:
        // Sum the accumulators for each pass, then store the sums to 'hash'
        movdqa          PASS0_SUMS, T0
        movdqa          PASS2_SUMS, T1
        punpcklqdq      PASS1_SUMS, T0          // => (PASS0_SUM_A PASS1_SUM_A)
        punpcklqdq      PASS3_SUMS, T1          // => (PASS2_SUM_A PASS3_SUM_A)
        punpckhqdq      PASS1_SUMS, PASS0_SUMS  // => (PASS0_SUM_B PASS1_SUM_B)
        punpckhqdq      PASS3_SUMS, PASS2_SUMS  // => (PASS2_SUM_B PASS3_SUM_B)
        paddq           PASS0_SUMS, T0
        paddq           PASS2_SUMS, T1
        movdqu          T0, 0x00(HASH)
        movdqu          T1, 0x10(HASH)
        ret
ENDPROC(nh_sse2)
arch/x86/crypto/nhpoly1305-avx2-glue.c (new file, 77 lines)
@@ -0,0 +1,77 @@
// SPDX-License-Identifier: GPL-2.0
/*
 * NHPoly1305 - ε-almost-∆-universal hash function for Adiantum
 * (AVX2 accelerated version)
 *
 * Copyright 2018 Google LLC
 */

#include <crypto/internal/hash.h>
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
#include <asm/fpu/api.h>

asmlinkage void nh_avx2(const u32 *key, const u8 *message, size_t message_len,
                        u8 hash[NH_HASH_BYTES]);

/* wrapper to avoid indirect call to assembly, which doesn't work with CFI */
static void _nh_avx2(const u32 *key, const u8 *message, size_t message_len,
                     __le64 hash[NH_NUM_PASSES])
{
        nh_avx2(key, message, message_len, (u8 *)hash);
}

static int nhpoly1305_avx2_update(struct shash_desc *desc,
                                  const u8 *src, unsigned int srclen)
{
        if (srclen < 64 || !irq_fpu_usable())
                return crypto_nhpoly1305_update(desc, src, srclen);

        do {
                unsigned int n = min_t(unsigned int, srclen, PAGE_SIZE);

                kernel_fpu_begin();
                crypto_nhpoly1305_update_helper(desc, src, n, _nh_avx2);
                kernel_fpu_end();
                src += n;
                srclen -= n;
        } while (srclen);
        return 0;
}

static struct shash_alg nhpoly1305_alg = {
        .base.cra_name          = "nhpoly1305",
        .base.cra_driver_name   = "nhpoly1305-avx2",
        .base.cra_priority      = 300,
        .base.cra_ctxsize       = sizeof(struct nhpoly1305_key),
        .base.cra_module        = THIS_MODULE,
        .digestsize             = POLY1305_DIGEST_SIZE,
        .init                   = crypto_nhpoly1305_init,
        .update                 = nhpoly1305_avx2_update,
        .final                  = crypto_nhpoly1305_final,
        .setkey                 = crypto_nhpoly1305_setkey,
        .descsize               = sizeof(struct nhpoly1305_state),
};

static int __init nhpoly1305_mod_init(void)
{
        if (!boot_cpu_has(X86_FEATURE_AVX2) ||
            !boot_cpu_has(X86_FEATURE_OSXSAVE))
                return -ENODEV;

        return crypto_register_shash(&nhpoly1305_alg);
}

static void __exit nhpoly1305_mod_exit(void)
{
        crypto_unregister_shash(&nhpoly1305_alg);
}

module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);

MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function (AVX2-accelerated)");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
MODULE_ALIAS_CRYPTO("nhpoly1305-avx2");
arch/x86/crypto/nhpoly1305-sse2-glue.c (new file, 76 lines)
@@ -0,0 +1,76 @@
// SPDX-License-Identifier: GPL-2.0
/*
 * NHPoly1305 - ε-almost-∆-universal hash function for Adiantum
 * (SSE2 accelerated version)
 *
 * Copyright 2018 Google LLC
 */

#include <crypto/internal/hash.h>
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
#include <asm/fpu/api.h>

asmlinkage void nh_sse2(const u32 *key, const u8 *message, size_t message_len,
                        u8 hash[NH_HASH_BYTES]);

/* wrapper to avoid indirect call to assembly, which doesn't work with CFI */
static void _nh_sse2(const u32 *key, const u8 *message, size_t message_len,
                     __le64 hash[NH_NUM_PASSES])
{
        nh_sse2(key, message, message_len, (u8 *)hash);
}

static int nhpoly1305_sse2_update(struct shash_desc *desc,
                                  const u8 *src, unsigned int srclen)
{
        if (srclen < 64 || !irq_fpu_usable())
                return crypto_nhpoly1305_update(desc, src, srclen);

        do {
                unsigned int n = min_t(unsigned int, srclen, PAGE_SIZE);

                kernel_fpu_begin();
                crypto_nhpoly1305_update_helper(desc, src, n, _nh_sse2);
                kernel_fpu_end();
                src += n;
                srclen -= n;
        } while (srclen);
        return 0;
}

static struct shash_alg nhpoly1305_alg = {
        .base.cra_name          = "nhpoly1305",
        .base.cra_driver_name   = "nhpoly1305-sse2",
        .base.cra_priority      = 200,
        .base.cra_ctxsize       = sizeof(struct nhpoly1305_key),
        .base.cra_module        = THIS_MODULE,
        .digestsize             = POLY1305_DIGEST_SIZE,
        .init                   = crypto_nhpoly1305_init,
        .update                 = nhpoly1305_sse2_update,
        .final                  = crypto_nhpoly1305_final,
        .setkey                 = crypto_nhpoly1305_setkey,
        .descsize               = sizeof(struct nhpoly1305_state),
};

static int __init nhpoly1305_mod_init(void)
{
        if (!boot_cpu_has(X86_FEATURE_XMM2))
                return -ENODEV;

        return crypto_register_shash(&nhpoly1305_alg);
}

static void __exit nhpoly1305_mod_exit(void)
{
        crypto_unregister_shash(&nhpoly1305_alg);
}

module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);

MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function (SSE2-accelerated)");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
MODULE_ALIAS_CRYPTO("nhpoly1305-sse2");
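Both glue modules register under the generic name "nhpoly1305", so a caller just asks for that name and the crypto API picks the highest-priority implementation available (AVX2 at 300, SSE2 at 200, otherwise the C fallback). A hedged usage sketch through the shash API; the key and digest sizes come from <crypto/nhpoly1305.h> and <crypto/poly1305.h>, and the helper name and abbreviated error handling are illustrative:

#include <crypto/hash.h>
#include <crypto/nhpoly1305.h>
#include <crypto/poly1305.h>
#include <linux/err.h>

/* One-shot NHPoly1305 digest of 'len' bytes at 'data'. */
static int nhpoly1305_digest_sketch(const u8 key[NHPOLY1305_KEY_SIZE],
                                    const u8 *data, unsigned int len,
                                    u8 out[POLY1305_DIGEST_SIZE])
{
        struct crypto_shash *tfm;
        int err;

        tfm = crypto_alloc_shash("nhpoly1305", 0, 0);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);

        err = crypto_shash_setkey(tfm, key, NHPOLY1305_KEY_SIZE);
        if (!err) {
                SHASH_DESC_ON_STACK(desc, tfm);

                desc->tfm = tfm;
                desc->flags = 0;
                err = crypto_shash_digest(desc, data, len, out);
        }
        crypto_free_shash(tfm);
        return err;
}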
@@ -83,35 +83,37 @@ static unsigned int poly1305_simd_blocks(struct poly1305_desc_ctx *dctx,
        if (poly1305_use_avx2 && srclen >= POLY1305_BLOCK_SIZE * 4) {
                if (unlikely(!sctx->wset)) {
                        if (!sctx->uset) {
                                memcpy(sctx->u, dctx->r, sizeof(sctx->u));
                                poly1305_simd_mult(sctx->u, dctx->r);
                                memcpy(sctx->u, dctx->r.r, sizeof(sctx->u));
                                poly1305_simd_mult(sctx->u, dctx->r.r);
                                sctx->uset = true;
                        }
                        memcpy(sctx->u + 5, sctx->u, sizeof(sctx->u));
                        poly1305_simd_mult(sctx->u + 5, dctx->r);
                        poly1305_simd_mult(sctx->u + 5, dctx->r.r);
                        memcpy(sctx->u + 10, sctx->u + 5, sizeof(sctx->u));
                        poly1305_simd_mult(sctx->u + 10, dctx->r);
                        poly1305_simd_mult(sctx->u + 10, dctx->r.r);
                        sctx->wset = true;
                }
                blocks = srclen / (POLY1305_BLOCK_SIZE * 4);
                poly1305_4block_avx2(dctx->h, src, dctx->r, blocks, sctx->u);
                poly1305_4block_avx2(dctx->h.h, src, dctx->r.r, blocks,
                                     sctx->u);
                src += POLY1305_BLOCK_SIZE * 4 * blocks;
                srclen -= POLY1305_BLOCK_SIZE * 4 * blocks;
        }
#endif
        if (likely(srclen >= POLY1305_BLOCK_SIZE * 2)) {
                if (unlikely(!sctx->uset)) {
                        memcpy(sctx->u, dctx->r, sizeof(sctx->u));
                        poly1305_simd_mult(sctx->u, dctx->r);
                        memcpy(sctx->u, dctx->r.r, sizeof(sctx->u));
                        poly1305_simd_mult(sctx->u, dctx->r.r);
                        sctx->uset = true;
                }
                blocks = srclen / (POLY1305_BLOCK_SIZE * 2);
                poly1305_2block_sse2(dctx->h, src, dctx->r, blocks, sctx->u);
                poly1305_2block_sse2(dctx->h.h, src, dctx->r.r, blocks,
                                     sctx->u);
                src += POLY1305_BLOCK_SIZE * 2 * blocks;
                srclen -= POLY1305_BLOCK_SIZE * 2 * blocks;
        }
        if (srclen >= POLY1305_BLOCK_SIZE) {
                poly1305_block_sse2(dctx->h, src, dctx->r, 1);
                poly1305_block_sse2(dctx->h.h, src, dctx->r.r, 1);
                srclen -= POLY1305_BLOCK_SIZE;
        }
        return srclen;
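Only the field accesses change in this hunk (dctx->r and dctx->h become dctx->r.r and dctx->h.h after the poly1305_key/poly1305_state split elsewhere in this series); the multiplier cache around them is untouched. For context, sctx->u holds precomputed powers of the key r, each stored as five u32 limbs (hence the memcpy strides of 5 above): r^2 at sctx->u, and for the AVX2 path r^3 at sctx->u + 5 and r^4 at sctx->u + 10. Those powers let several blocks be folded into the accumulator per step; in the 4-block AVX2 case the update factors as

    h <- ((h + c1) * r^4 + c2 * r^3 + c3 * r^2 + c4 * r) mod (2^130 - 5)

and in the 2-block SSE2 case as

    h <- ((h + c1) * r^2 + c2 * r) mod (2^130 - 5),

where c1..c4 are the consecutive padded message blocks. This is the standard interleaving trick for vectorized Poly1305; the exact limb radix is an implementation detail of the assembly.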
@@ -430,11 +430,14 @@ config CRYPTO_CTS
        help
          CTS: Cipher Text Stealing
          This is the Cipher Text Stealing mode as described by
          Section 8 of rfc2040 and referenced by rfc3962.
          (rfc3962 includes errata information in its Appendix A)
          Section 8 of rfc2040 and referenced by rfc3962
          (rfc3962 includes errata information in its Appendix A) or
          CBC-CS3 as defined by NIST in Sp800-38A addendum from Oct 2010.
          This mode is required for Kerberos gss mechanism support
          for AES encryption.

          See: https://csrc.nist.gov/publications/detail/sp/800-38a/addendum/final

config CRYPTO_ECB
        tristate "ECB support"
        select CRYPTO_BLKCIPHER
@@ -493,6 +496,50 @@ config CRYPTO_KEYWRAP
          Support for key wrapping (NIST SP800-38F / RFC3394) without
          padding.

config CRYPTO_NHPOLY1305
        tristate
        select CRYPTO_HASH
        select CRYPTO_POLY1305

config CRYPTO_NHPOLY1305_SSE2
        tristate "NHPoly1305 hash function (x86_64 SSE2 implementation)"
        depends on X86 && 64BIT
        select CRYPTO_NHPOLY1305
        help
          SSE2 optimized implementation of the hash function used by the
          Adiantum encryption mode.

config CRYPTO_NHPOLY1305_AVX2
        tristate "NHPoly1305 hash function (x86_64 AVX2 implementation)"
        depends on X86 && 64BIT
        select CRYPTO_NHPOLY1305
        help
          AVX2 optimized implementation of the hash function used by the
          Adiantum encryption mode.

config CRYPTO_ADIANTUM
        tristate "Adiantum support"
        select CRYPTO_CHACHA20
        select CRYPTO_POLY1305
        select CRYPTO_NHPOLY1305
        help
          Adiantum is a tweakable, length-preserving encryption mode
          designed for fast and secure disk encryption, especially on
          CPUs without dedicated crypto instructions.  It encrypts
          each sector using the XChaCha12 stream cipher, two passes of
          an ε-almost-∆-universal hash function, and an invocation of
          the AES-256 block cipher on a single 16-byte block.  On CPUs
          without AES instructions, Adiantum is much faster than
          AES-XTS.

          Adiantum's security is provably reducible to that of its
          underlying stream and block ciphers, subject to a security
          bound.  Unlike XTS, Adiantum is a true wide-block encryption
          mode, so it actually provides an even stronger notion of
          security than XTS, subject to the security bound.

          If unsure, say N.

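CRYPTO_ADIANTUM provides a template rather than a fixed algorithm, so users instantiate it by name through the skcipher API. A hedged sketch of encrypting one 4096-byte sector, roughly as dm-crypt or fscrypt would; the helper name and abbreviated error handling are illustrative:

#include <crypto/skcipher.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/* Instantiate adiantum(xchacha12,aes) and encrypt one sector in place.
 * 'key' is the 32-byte XChaCha12 key; 'tweak' is the 32-byte per-sector IV. */
static int adiantum_encrypt_sector(const u8 key[32], u8 tweak[32],
                                   struct scatterlist *sg /* 4096 bytes */)
{
        struct crypto_skcipher *tfm;
        struct skcipher_request *req;
        DECLARE_CRYPTO_WAIT(wait);
        int err;

        tfm = crypto_alloc_skcipher("adiantum(xchacha12,aes)", 0, 0);
        if (IS_ERR(tfm))
                return PTR_ERR(tfm);

        err = crypto_skcipher_setkey(tfm, key, 32);
        if (err)
                goto out_free_tfm;

        req = skcipher_request_alloc(tfm, GFP_KERNEL);
        if (!req) {
                err = -ENOMEM;
                goto out_free_tfm;
        }
        skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP |
                                           CRYPTO_TFM_REQ_MAY_BACKLOG,
                                      crypto_req_done, &wait);
        skcipher_request_set_crypt(req, sg, sg, 4096, tweak);
        err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);

        skcipher_request_free(req);
out_free_tfm:
        crypto_free_skcipher(tfm);
        return err;
}
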
comment "Hash modes"

config CRYPTO_CMAC
@@ -936,6 +983,18 @@ config CRYPTO_SM3
          http://www.oscca.gov.cn/UpFile/20101222141857786.pdf
          https://datatracker.ietf.org/doc/html/draft-shen-sm3-hash

config CRYPTO_STREEBOG
        tristate "Streebog Hash Function"
        select CRYPTO_HASH
        help
          Streebog Hash Function (GOST R 34.11-2012, RFC 6986) is one of the Russian
          cryptographic standard algorithms (called GOST algorithms).
          This setting enables two hash algorithms with 256 and 512 bits output.

          References:
          https://tc26.ru/upload/iblock/fed/feddbb4d26b685903faa2ba11aea43f6.pdf
          https://tools.ietf.org/html/rfc6986

config CRYPTO_TGR192
        tristate "Tiger digest algorithms"
        select CRYPTO_HASH
@@ -1006,7 +1065,8 @@ config CRYPTO_AES_TI
          8 for decryption), this implementation only uses just two S-boxes of
          256 bytes each, and attempts to eliminate data dependent latencies by
          prefetching the entire table into the cache at the start of each
          block.
          block. Interrupts are also disabled to avoid races where cachelines
          are evicted when the CPU is interrupted to do something else.

config CRYPTO_AES_586
        tristate "AES cipher algorithms (i586)"
@@ -1387,32 +1447,34 @@ config CRYPTO_SALSA20
          Bernstein <djb@cr.yp.to>. See <http://cr.yp.to/snuffle.html>

config CRYPTO_CHACHA20
        tristate "ChaCha20 cipher algorithm"
        tristate "ChaCha stream cipher algorithms"
        select CRYPTO_BLKCIPHER
        help
          ChaCha20 cipher algorithm, RFC7539.
          The ChaCha20, XChaCha20, and XChaCha12 stream cipher algorithms.

          ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J.
          Bernstein and further specified in RFC7539 for use in IETF protocols.
          This is the portable C implementation of ChaCha20.

          See also:
          This is the portable C implementation of ChaCha20.  See also:
          <http://cr.yp.to/chacha/chacha-20080128.pdf>

          XChaCha20 is the application of the XSalsa20 construction to ChaCha20
          rather than to Salsa20.  XChaCha20 extends ChaCha20's nonce length
          from 64 bits (or 96 bits using the RFC7539 convention) to 192 bits,
          while provably retaining ChaCha20's security.  See also:
          <https://cr.yp.to/snuffle/xsalsa-20081128.pdf>

          XChaCha12 is XChaCha20 reduced to 12 rounds, with correspondingly
          reduced security margin but increased performance.  It can be needed
          in some performance-sensitive scenarios.

config CRYPTO_CHACHA20_X86_64
        tristate "ChaCha20 cipher algorithm (x86_64/SSSE3/AVX2)"
        tristate "ChaCha stream cipher algorithms (x86_64/SSSE3/AVX2/AVX-512VL)"
        depends on X86 && 64BIT
        select CRYPTO_BLKCIPHER
        select CRYPTO_CHACHA20
        help
          ChaCha20 cipher algorithm, RFC7539.

          ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J.
          Bernstein and further specified in RFC7539 for use in IETF protocols.
          This is the x86_64 assembler implementation using SIMD instructions.

          See also:
          <http://cr.yp.to/chacha/chacha-20080128.pdf>
          SSSE3, AVX2, and AVX-512VL optimized implementations of the ChaCha20,
          XChaCha20, and XChaCha12 stream ciphers.

config CRYPTO_SEED
        tristate "SEED cipher algorithm"
@@ -1812,7 +1874,8 @@ config CRYPTO_USER_API_AEAD
          cipher algorithms.

config CRYPTO_STATS
        bool
        bool "Crypto usage statistics for User-space"
        depends on CRYPTO_USER
        help
          This option enables the gathering of crypto stats.
          This will collect:
@@ -54,7 +54,8 @@ cryptomgr-y := algboss.o testmgr.o

obj-$(CONFIG_CRYPTO_MANAGER2) += cryptomgr.o
obj-$(CONFIG_CRYPTO_USER) += crypto_user.o
crypto_user-y := crypto_user_base.o crypto_user_stat.o
crypto_user-y := crypto_user_base.o
crypto_user-$(CONFIG_CRYPTO_STATS) += crypto_user_stat.o
obj-$(CONFIG_CRYPTO_CMAC) += cmac.o
obj-$(CONFIG_CRYPTO_HMAC) += hmac.o
obj-$(CONFIG_CRYPTO_VMAC) += vmac.o
@@ -71,6 +72,7 @@ obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o
obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o
obj-$(CONFIG_CRYPTO_SHA3) += sha3_generic.o
obj-$(CONFIG_CRYPTO_SM3) += sm3_generic.o
obj-$(CONFIG_CRYPTO_STREEBOG) += streebog_generic.o
obj-$(CONFIG_CRYPTO_WP512) += wp512.o
CFLAGS_wp512.o := $(call cc-option,-fno-schedule-insns)  # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o
@@ -84,6 +86,8 @@ obj-$(CONFIG_CRYPTO_LRW) += lrw.o
obj-$(CONFIG_CRYPTO_XTS) += xts.o
obj-$(CONFIG_CRYPTO_CTR) += ctr.o
obj-$(CONFIG_CRYPTO_KEYWRAP) += keywrap.o
obj-$(CONFIG_CRYPTO_ADIANTUM) += adiantum.o
obj-$(CONFIG_CRYPTO_NHPOLY1305) += nhpoly1305.o
obj-$(CONFIG_CRYPTO_GCM) += gcm.o
obj-$(CONFIG_CRYPTO_CCM) += ccm.o
obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o
@@ -116,7 +120,7 @@ obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o
obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
obj-$(CONFIG_CRYPTO_SEED) += seed.o
obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o
obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_generic.o
obj-$(CONFIG_CRYPTO_CHACHA20) += chacha_generic.o
obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_generic.o
obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o
obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o
@@ -365,23 +365,18 @@ static int crypto_ablkcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
{
        struct crypto_report_blkcipher rblkcipher;

        strncpy(rblkcipher.type, "ablkcipher", sizeof(rblkcipher.type));
        strncpy(rblkcipher.geniv, alg->cra_ablkcipher.geniv ?: "<default>",
                sizeof(rblkcipher.geniv));
        rblkcipher.geniv[sizeof(rblkcipher.geniv) - 1] = '\0';
        memset(&rblkcipher, 0, sizeof(rblkcipher));

        strscpy(rblkcipher.type, "ablkcipher", sizeof(rblkcipher.type));
        strscpy(rblkcipher.geniv, "<default>", sizeof(rblkcipher.geniv));

        rblkcipher.blocksize = alg->cra_blocksize;
        rblkcipher.min_keysize = alg->cra_ablkcipher.min_keysize;
        rblkcipher.max_keysize = alg->cra_ablkcipher.max_keysize;
        rblkcipher.ivsize = alg->cra_ablkcipher.ivsize;

        if (nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
                    sizeof(struct crypto_report_blkcipher), &rblkcipher))
                goto nla_put_failure;
        return 0;

nla_put_failure:
        return -EMSGSIZE;
        return nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
                       sizeof(rblkcipher), &rblkcipher);
}
#else
static int crypto_ablkcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
@@ -403,7 +398,7 @@ static void crypto_ablkcipher_show(struct seq_file *m, struct crypto_alg *alg)
        seq_printf(m, "min keysize  : %u\n", ablkcipher->min_keysize);
        seq_printf(m, "max keysize  : %u\n", ablkcipher->max_keysize);
        seq_printf(m, "ivsize       : %u\n", ablkcipher->ivsize);
        seq_printf(m, "geniv        : %s\n", ablkcipher->geniv ?: "<default>");
        seq_printf(m, "geniv        : <default>\n");
}

const struct crypto_type crypto_ablkcipher_type = {
@@ -415,78 +410,3 @@ const struct crypto_type crypto_ablkcipher_type = {
        .report = crypto_ablkcipher_report,
};
EXPORT_SYMBOL_GPL(crypto_ablkcipher_type);

static int crypto_init_givcipher_ops(struct crypto_tfm *tfm, u32 type,
                                     u32 mask)
{
        struct ablkcipher_alg *alg = &tfm->__crt_alg->cra_ablkcipher;
        struct ablkcipher_tfm *crt = &tfm->crt_ablkcipher;

        if (alg->ivsize > PAGE_SIZE / 8)
                return -EINVAL;

        crt->setkey = tfm->__crt_alg->cra_flags & CRYPTO_ALG_GENIV ?
                      alg->setkey : setkey;
        crt->encrypt = alg->encrypt;
        crt->decrypt = alg->decrypt;
        crt->base = __crypto_ablkcipher_cast(tfm);
        crt->ivsize = alg->ivsize;

        return 0;
}

#ifdef CONFIG_NET
static int crypto_givcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
{
        struct crypto_report_blkcipher rblkcipher;

        strncpy(rblkcipher.type, "givcipher", sizeof(rblkcipher.type));
        strncpy(rblkcipher.geniv, alg->cra_ablkcipher.geniv ?: "<built-in>",
                sizeof(rblkcipher.geniv));
        rblkcipher.geniv[sizeof(rblkcipher.geniv) - 1] = '\0';

        rblkcipher.blocksize = alg->cra_blocksize;
        rblkcipher.min_keysize = alg->cra_ablkcipher.min_keysize;
        rblkcipher.max_keysize = alg->cra_ablkcipher.max_keysize;
        rblkcipher.ivsize = alg->cra_ablkcipher.ivsize;

        if (nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
                    sizeof(struct crypto_report_blkcipher), &rblkcipher))
                goto nla_put_failure;
        return 0;

nla_put_failure:
        return -EMSGSIZE;
}
#else
static int crypto_givcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
{
        return -ENOSYS;
}
#endif

static void crypto_givcipher_show(struct seq_file *m, struct crypto_alg *alg)
        __maybe_unused;
static void crypto_givcipher_show(struct seq_file *m, struct crypto_alg *alg)
{
        struct ablkcipher_alg *ablkcipher = &alg->cra_ablkcipher;

        seq_printf(m, "type         : givcipher\n");
        seq_printf(m, "async        : %s\n", alg->cra_flags & CRYPTO_ALG_ASYNC ?
                                             "yes" : "no");
        seq_printf(m, "blocksize    : %u\n", alg->cra_blocksize);
        seq_printf(m, "min keysize  : %u\n", ablkcipher->min_keysize);
        seq_printf(m, "max keysize  : %u\n", ablkcipher->max_keysize);
        seq_printf(m, "ivsize       : %u\n", ablkcipher->ivsize);
        seq_printf(m, "geniv        : %s\n", ablkcipher->geniv ?: "<built-in>");
}

const struct crypto_type crypto_givcipher_type = {
        .ctxsize = crypto_ablkcipher_ctxsize,
        .init = crypto_init_givcipher_ops,
#ifdef CONFIG_PROC_FS
        .show = crypto_givcipher_show,
#endif
        .report = crypto_givcipher_report,
};
EXPORT_SYMBOL_GPL(crypto_givcipher_type);
@@ -33,15 +33,11 @@ static int crypto_acomp_report(struct sk_buff *skb, struct crypto_alg *alg)
{
        struct crypto_report_acomp racomp;

        strncpy(racomp.type, "acomp", sizeof(racomp.type));
        memset(&racomp, 0, sizeof(racomp));

        if (nla_put(skb, CRYPTOCFGA_REPORT_ACOMP,
                    sizeof(struct crypto_report_acomp), &racomp))
                goto nla_put_failure;
        return 0;
        strscpy(racomp.type, "acomp", sizeof(racomp.type));

nla_put_failure:
        return -EMSGSIZE;
        return nla_put(skb, CRYPTOCFGA_REPORT_ACOMP, sizeof(racomp), &racomp);
}
#else
static int crypto_acomp_report(struct sk_buff *skb, struct crypto_alg *alg)
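A note on this recurring report-function change (here, in the ablkcipher hunks above, and again in aead.c and ahash.c below): these structs are copied verbatim into a netlink message, so any bytes not explicitly written, whether string padding after strncpy() or struct padding holes, would leak kernel stack contents to userspace. Zeroing the whole struct first and then using strscpy(), which unlike strncpy() always NUL-terminates, closes both holes. The pattern, as a minimal sketch with an illustrative struct:

#include <linux/string.h>

struct report_sketch {
        char type[64];
        unsigned int blocksize;
};

static void fill_report_sketch(struct report_sketch *r, unsigned int blocksize)
{
        memset(r, 0, sizeof(*r));       /* no uninitialized padding escapes */
        strscpy(r->type, "example", sizeof(r->type)); /* always NUL-terminated */
        r->blocksize = blocksize;
}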
crypto/adiantum.c (new file, 664 lines)
@@ -0,0 +1,664 @@
// SPDX-License-Identifier: GPL-2.0
/*
 * Adiantum length-preserving encryption mode
 *
 * Copyright 2018 Google LLC
 */

/*
 * Adiantum is a tweakable, length-preserving encryption mode designed for fast
 * and secure disk encryption, especially on CPUs without dedicated crypto
 * instructions.  Adiantum encrypts each sector using the XChaCha12 stream
 * cipher, two passes of an ε-almost-∆-universal (ε-∆U) hash function based on
 * NH and Poly1305, and an invocation of the AES-256 block cipher on a single
 * 16-byte block.  See the paper for details:
 *
 *      Adiantum: length-preserving encryption for entry-level processors
 *      (https://eprint.iacr.org/2018/720.pdf)
 *
 * For flexibility, this implementation also allows other ciphers:
 *
 *      - Stream cipher: XChaCha12 or XChaCha20
 *      - Block cipher: any with a 128-bit block size and 256-bit key
 *
 * This implementation doesn't currently allow other ε-∆U hash functions, i.e.
 * HPolyC is not supported.  This is because Adiantum is ~20% faster than HPolyC
 * but still provably as secure, and also the ε-∆U hash function of HBSH is
 * formally defined to take two inputs (tweak, message) which makes it difficult
 * to wrap with the crypto_shash API.  Rather, some details need to be handled
 * here.  Nevertheless, if needed in the future, support for other ε-∆U hash
 * functions could be added here.
 */

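Condensed into the paper's notation, the encryption path implemented below splits P into P_L || P_R (with |P_R| = 16 bytes) and computes:

    P_M = P_R + H_{K_H}(T, P_L)                (addition mod 2^128)
    C_M = E_{K_E}(P_M)                         (one block cipher call)
    C_L = P_L xor XChaCha(K_S, nonce = C_M || 1)
    C_R = C_M - H_{K_H}(T, C_L)                (subtraction mod 2^128)

Decryption runs the same steps with E inverted and the roles of the two hash steps swapped. This is a condensed restatement for orientation; the comments in adiantum_crypt() and adiantum_finish() below, and section 6.4 of the paper, give the exact definitions.
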
#include <crypto/b128ops.h>
#include <crypto/chacha.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/skcipher.h>
#include <crypto/nhpoly1305.h>
#include <crypto/scatterwalk.h>
#include <linux/module.h>

#include "internal.h"

/*
 * Size of right-hand part of input data, in bytes; also the size of the block
 * cipher's block size and the hash function's output.
 */
#define BLOCKCIPHER_BLOCK_SIZE          16

/* Size of the block cipher key (K_E) in bytes */
#define BLOCKCIPHER_KEY_SIZE            32

/* Size of the hash key (K_H) in bytes */
#define HASH_KEY_SIZE           (POLY1305_BLOCK_SIZE + NHPOLY1305_KEY_SIZE)

/*
 * The specification allows variable-length tweaks, but Linux's crypto API
 * currently only allows algorithms to support a single length.  The "natural"
 * tweak length for Adiantum is 16, since that fits into one Poly1305 block for
 * the best performance.  But longer tweaks are useful for fscrypt, to avoid
 * needing to derive per-file keys.  So instead we use two blocks, or 32 bytes.
 */
#define TWEAK_SIZE              32

struct adiantum_instance_ctx {
        struct crypto_skcipher_spawn streamcipher_spawn;
        struct crypto_spawn blockcipher_spawn;
        struct crypto_shash_spawn hash_spawn;
};

struct adiantum_tfm_ctx {
        struct crypto_skcipher *streamcipher;
        struct crypto_cipher *blockcipher;
        struct crypto_shash *hash;
        struct poly1305_key header_hash_key;
};

struct adiantum_request_ctx {

        /*
         * Buffer for right-hand part of data, i.e.
         *
         *    P_L => P_M => C_M => C_R when encrypting, or
         *    C_R => C_M => P_M => P_L when decrypting.
         *
         * Also used to build the IV for the stream cipher.
         */
        union {
                u8 bytes[XCHACHA_IV_SIZE];
                __le32 words[XCHACHA_IV_SIZE / sizeof(__le32)];
                le128 bignum;   /* interpret as element of Z/(2^{128}Z) */
        } rbuf;

        bool enc; /* true if encrypting, false if decrypting */

        /*
         * The result of the Poly1305 ε-∆U hash function applied to
         * (bulk length, tweak)
         */
        le128 header_hash;

        /* Sub-requests, must be last */
        union {
                struct shash_desc hash_desc;
                struct skcipher_request streamcipher_req;
        } u;
};

/*
 * Given the XChaCha stream key K_S, derive the block cipher key K_E and the
 * hash key K_H as follows:
 *
 *     K_E || K_H || ... = XChaCha(key=K_S, nonce=1||0^191)
 *
 * Note that this denotes using bits from the XChaCha keystream, which here we
 * get indirectly by encrypting a buffer containing all 0's.
 */
static int adiantum_setkey(struct crypto_skcipher *tfm, const u8 *key,
                           unsigned int keylen)
{
        struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
        struct {
                u8 iv[XCHACHA_IV_SIZE];
                u8 derived_keys[BLOCKCIPHER_KEY_SIZE + HASH_KEY_SIZE];
                struct scatterlist sg;
                struct crypto_wait wait;
                struct skcipher_request req; /* must be last */
        } *data;
        u8 *keyp;
        int err;

        /* Set the stream cipher key (K_S) */
        crypto_skcipher_clear_flags(tctx->streamcipher, CRYPTO_TFM_REQ_MASK);
        crypto_skcipher_set_flags(tctx->streamcipher,
                                  crypto_skcipher_get_flags(tfm) &
                                  CRYPTO_TFM_REQ_MASK);
        err = crypto_skcipher_setkey(tctx->streamcipher, key, keylen);
        crypto_skcipher_set_flags(tfm,
                                crypto_skcipher_get_flags(tctx->streamcipher) &
                                CRYPTO_TFM_RES_MASK);
        if (err)
                return err;

        /* Derive the subkeys */
        data = kzalloc(sizeof(*data) +
                       crypto_skcipher_reqsize(tctx->streamcipher), GFP_KERNEL);
        if (!data)
                return -ENOMEM;
        data->iv[0] = 1;
        sg_init_one(&data->sg, data->derived_keys, sizeof(data->derived_keys));
        crypto_init_wait(&data->wait);
        skcipher_request_set_tfm(&data->req, tctx->streamcipher);
        skcipher_request_set_callback(&data->req, CRYPTO_TFM_REQ_MAY_SLEEP |
                                                  CRYPTO_TFM_REQ_MAY_BACKLOG,
                                      crypto_req_done, &data->wait);
        skcipher_request_set_crypt(&data->req, &data->sg, &data->sg,
                                   sizeof(data->derived_keys), data->iv);
        err = crypto_wait_req(crypto_skcipher_encrypt(&data->req), &data->wait);
        if (err)
                goto out;
        keyp = data->derived_keys;

        /* Set the block cipher key (K_E) */
        crypto_cipher_clear_flags(tctx->blockcipher, CRYPTO_TFM_REQ_MASK);
        crypto_cipher_set_flags(tctx->blockcipher,
                                crypto_skcipher_get_flags(tfm) &
                                CRYPTO_TFM_REQ_MASK);
        err = crypto_cipher_setkey(tctx->blockcipher, keyp,
                                   BLOCKCIPHER_KEY_SIZE);
        crypto_skcipher_set_flags(tfm,
                                  crypto_cipher_get_flags(tctx->blockcipher) &
                                  CRYPTO_TFM_RES_MASK);
        if (err)
                goto out;
        keyp += BLOCKCIPHER_KEY_SIZE;

        /* Set the hash key (K_H) */
        poly1305_core_setkey(&tctx->header_hash_key, keyp);
        keyp += POLY1305_BLOCK_SIZE;

        crypto_shash_clear_flags(tctx->hash, CRYPTO_TFM_REQ_MASK);
        crypto_shash_set_flags(tctx->hash, crypto_skcipher_get_flags(tfm) &
                                           CRYPTO_TFM_REQ_MASK);
        err = crypto_shash_setkey(tctx->hash, keyp, NHPOLY1305_KEY_SIZE);
        crypto_skcipher_set_flags(tfm, crypto_shash_get_flags(tctx->hash) &
                                       CRYPTO_TFM_RES_MASK);
        keyp += NHPOLY1305_KEY_SIZE;
        WARN_ON(keyp != &data->derived_keys[ARRAY_SIZE(data->derived_keys)]);
out:
        kzfree(data);
        return err;
}

/* Addition in Z/(2^{128}Z) */
static inline void le128_add(le128 *r, const le128 *v1, const le128 *v2)
{
        u64 x = le64_to_cpu(v1->b);
        u64 y = le64_to_cpu(v2->b);

        r->b = cpu_to_le64(x + y);
        r->a = cpu_to_le64(le64_to_cpu(v1->a) + le64_to_cpu(v2->a) +
                           (x + y < x));
}

/* Subtraction in Z/(2^{128}Z) */
static inline void le128_sub(le128 *r, const le128 *v1, const le128 *v2)
{
        u64 x = le64_to_cpu(v1->b);
        u64 y = le64_to_cpu(v2->b);

        r->b = cpu_to_le64(x - y);
        r->a = cpu_to_le64(le64_to_cpu(v1->a) - le64_to_cpu(v2->a) -
                           (x - y > x));
}

/*
 * Apply the Poly1305 ε-∆U hash function to (bulk length, tweak) and save the
 * result to rctx->header_hash.  This is the calculation
 *
 *      H_T ← Poly1305_{K_T}(bin_{128}(|L|) || T)
 *
 * from the procedure in section 6.4 of the Adiantum paper.  The resulting value
 * is reused in both the first and second hash steps.  Specifically, it's added
 * to the result of an independently keyed ε-∆U hash function (for equal length
 * inputs only) taken over the left-hand part (the "bulk") of the message, to
 * give the overall Adiantum hash of the (tweak, left-hand part) pair.
 */
static void adiantum_hash_header(struct skcipher_request *req)
{
        struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
        const struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
        struct adiantum_request_ctx *rctx = skcipher_request_ctx(req);
        const unsigned int bulk_len = req->cryptlen - BLOCKCIPHER_BLOCK_SIZE;
        struct {
                __le64 message_bits;
                __le64 padding;
        } header = {
                .message_bits = cpu_to_le64((u64)bulk_len * 8)
        };
        struct poly1305_state state;

        poly1305_core_init(&state);

        BUILD_BUG_ON(sizeof(header) % POLY1305_BLOCK_SIZE != 0);
        poly1305_core_blocks(&state, &tctx->header_hash_key,
                             &header, sizeof(header) / POLY1305_BLOCK_SIZE);

        BUILD_BUG_ON(TWEAK_SIZE % POLY1305_BLOCK_SIZE != 0);
        poly1305_core_blocks(&state, &tctx->header_hash_key, req->iv,
                             TWEAK_SIZE / POLY1305_BLOCK_SIZE);

        poly1305_core_emit(&state, &rctx->header_hash);
}

/* Hash the left-hand part (the "bulk") of the message using NHPoly1305 */
static int adiantum_hash_message(struct skcipher_request *req,
                                 struct scatterlist *sgl, le128 *digest)
{
        struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
        const struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
        struct adiantum_request_ctx *rctx = skcipher_request_ctx(req);
        const unsigned int bulk_len = req->cryptlen - BLOCKCIPHER_BLOCK_SIZE;
        struct shash_desc *hash_desc = &rctx->u.hash_desc;
        struct sg_mapping_iter miter;
        unsigned int i, n;
        int err;

        hash_desc->tfm = tctx->hash;
        hash_desc->flags = 0;

        err = crypto_shash_init(hash_desc);
        if (err)
                return err;

        sg_miter_start(&miter, sgl, sg_nents(sgl),
                       SG_MITER_FROM_SG | SG_MITER_ATOMIC);
        for (i = 0; i < bulk_len; i += n) {
                sg_miter_next(&miter);
                n = min_t(unsigned int, miter.length, bulk_len - i);
                err = crypto_shash_update(hash_desc, miter.addr, n);
                if (err)
                        break;
        }
        sg_miter_stop(&miter);
        if (err)
                return err;

        return crypto_shash_final(hash_desc, (u8 *)digest);
}

/* Continue Adiantum encryption/decryption after the stream cipher step */
static int adiantum_finish(struct skcipher_request *req)
{
        struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
        const struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
        struct adiantum_request_ctx *rctx = skcipher_request_ctx(req);
        const unsigned int bulk_len = req->cryptlen - BLOCKCIPHER_BLOCK_SIZE;
        le128 digest;
        int err;

        /* If decrypting, decrypt C_M with the block cipher to get P_M */
        if (!rctx->enc)
                crypto_cipher_decrypt_one(tctx->blockcipher, rctx->rbuf.bytes,
                                          rctx->rbuf.bytes);

        /*
         * Second hash step
         *      enc: C_R = C_M - H_{K_H}(T, C_L)
         *      dec: P_R = P_M - H_{K_H}(T, P_L)
         */
        err = adiantum_hash_message(req, req->dst, &digest);
        if (err)
                return err;
        le128_add(&digest, &digest, &rctx->header_hash);
        le128_sub(&rctx->rbuf.bignum, &rctx->rbuf.bignum, &digest);
        scatterwalk_map_and_copy(&rctx->rbuf.bignum, req->dst,
                                 bulk_len, BLOCKCIPHER_BLOCK_SIZE, 1);
        return 0;
}

static void adiantum_streamcipher_done(struct crypto_async_request *areq,
                                       int err)
{
        struct skcipher_request *req = areq->data;

        if (!err)
                err = adiantum_finish(req);

        skcipher_request_complete(req, err);
}

static int adiantum_crypt(struct skcipher_request *req, bool enc)
{
        struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
        const struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
        struct adiantum_request_ctx *rctx = skcipher_request_ctx(req);
        const unsigned int bulk_len = req->cryptlen - BLOCKCIPHER_BLOCK_SIZE;
        unsigned int stream_len;
        le128 digest;
        int err;

        if (req->cryptlen < BLOCKCIPHER_BLOCK_SIZE)
                return -EINVAL;

        rctx->enc = enc;

        /*
         * First hash step
         *      enc: P_M = P_R + H_{K_H}(T, P_L)
         *      dec: C_M = C_R + H_{K_H}(T, C_L)
         */
        adiantum_hash_header(req);
        err = adiantum_hash_message(req, req->src, &digest);
        if (err)
                return err;
        le128_add(&digest, &digest, &rctx->header_hash);
        scatterwalk_map_and_copy(&rctx->rbuf.bignum, req->src,
                                 bulk_len, BLOCKCIPHER_BLOCK_SIZE, 0);
        le128_add(&rctx->rbuf.bignum, &rctx->rbuf.bignum, &digest);

        /* If encrypting, encrypt P_M with the block cipher to get C_M */
        if (enc)
                crypto_cipher_encrypt_one(tctx->blockcipher, rctx->rbuf.bytes,
                                          rctx->rbuf.bytes);

        /* Initialize the rest of the XChaCha IV (first part is C_M) */
        BUILD_BUG_ON(BLOCKCIPHER_BLOCK_SIZE != 16);
        BUILD_BUG_ON(XCHACHA_IV_SIZE != 32);    /* nonce || stream position */
        rctx->rbuf.words[4] = cpu_to_le32(1);
        rctx->rbuf.words[5] = 0;
        rctx->rbuf.words[6] = 0;
        rctx->rbuf.words[7] = 0;

        /*
         * XChaCha needs to be done on all the data except the last 16 bytes;
         * for disk encryption that usually means 4080 or 496 bytes.  But ChaCha
         * implementations tend to be most efficient when passed a whole number
         * of 64-byte ChaCha blocks, or sometimes even a multiple of 256 bytes.
         * And here it doesn't matter whether the last 16 bytes are written to,
         * as the second hash step will overwrite them.  Thus, round the XChaCha
         * length up to the next 64-byte boundary if possible.
         */
        stream_len = bulk_len;
        if (round_up(stream_len, CHACHA_BLOCK_SIZE) <= req->cryptlen)
                stream_len = round_up(stream_len, CHACHA_BLOCK_SIZE);

        skcipher_request_set_tfm(&rctx->u.streamcipher_req, tctx->streamcipher);
        skcipher_request_set_crypt(&rctx->u.streamcipher_req, req->src,
                                   req->dst, stream_len, &rctx->rbuf);
        skcipher_request_set_callback(&rctx->u.streamcipher_req,
                                      req->base.flags,
                                      adiantum_streamcipher_done, req);
        return crypto_skcipher_encrypt(&rctx->u.streamcipher_req) ?:
                adiantum_finish(req);
}

static int adiantum_encrypt(struct skcipher_request *req)
{
        return adiantum_crypt(req, true);
}

static int adiantum_decrypt(struct skcipher_request *req)
{
        return adiantum_crypt(req, false);
}

static int adiantum_init_tfm(struct crypto_skcipher *tfm)
{
        struct skcipher_instance *inst = skcipher_alg_instance(tfm);
        struct adiantum_instance_ctx *ictx = skcipher_instance_ctx(inst);
        struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
        struct crypto_skcipher *streamcipher;
        struct crypto_cipher *blockcipher;
        struct crypto_shash *hash;
        unsigned int subreq_size;
        int err;

        streamcipher = crypto_spawn_skcipher(&ictx->streamcipher_spawn);
        if (IS_ERR(streamcipher))
                return PTR_ERR(streamcipher);

        blockcipher = crypto_spawn_cipher(&ictx->blockcipher_spawn);
        if (IS_ERR(blockcipher)) {
                err = PTR_ERR(blockcipher);
                goto err_free_streamcipher;
        }

        hash = crypto_spawn_shash(&ictx->hash_spawn);
        if (IS_ERR(hash)) {
                err = PTR_ERR(hash);
                goto err_free_blockcipher;
        }

        tctx->streamcipher = streamcipher;
        tctx->blockcipher = blockcipher;
        tctx->hash = hash;

        BUILD_BUG_ON(offsetofend(struct adiantum_request_ctx, u) !=
                     sizeof(struct adiantum_request_ctx));
        subreq_size = max(FIELD_SIZEOF(struct adiantum_request_ctx,
                                       u.hash_desc) +
                          crypto_shash_descsize(hash),
                          FIELD_SIZEOF(struct adiantum_request_ctx,
                                       u.streamcipher_req) +
                          crypto_skcipher_reqsize(streamcipher));

        crypto_skcipher_set_reqsize(tfm,
                                    offsetof(struct adiantum_request_ctx, u) +
                                    subreq_size);
        return 0;

err_free_blockcipher:
        crypto_free_cipher(blockcipher);
err_free_streamcipher:
        crypto_free_skcipher(streamcipher);
        return err;
}

static void adiantum_exit_tfm(struct crypto_skcipher *tfm)
{
        struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);

        crypto_free_skcipher(tctx->streamcipher);
        crypto_free_cipher(tctx->blockcipher);
        crypto_free_shash(tctx->hash);
}

static void adiantum_free_instance(struct skcipher_instance *inst)
{
        struct adiantum_instance_ctx *ictx = skcipher_instance_ctx(inst);

        crypto_drop_skcipher(&ictx->streamcipher_spawn);
        crypto_drop_spawn(&ictx->blockcipher_spawn);
        crypto_drop_shash(&ictx->hash_spawn);
        kfree(inst);
}

/*
 * Check for a supported set of inner algorithms.
 * See the comment at the beginning of this file.
 */
static bool adiantum_supported_algorithms(struct skcipher_alg *streamcipher_alg,
                                          struct crypto_alg *blockcipher_alg,
                                          struct shash_alg *hash_alg)
{
        if (strcmp(streamcipher_alg->base.cra_name, "xchacha12") != 0 &&
            strcmp(streamcipher_alg->base.cra_name, "xchacha20") != 0)
                return false;

        if (blockcipher_alg->cra_cipher.cia_min_keysize > BLOCKCIPHER_KEY_SIZE ||
            blockcipher_alg->cra_cipher.cia_max_keysize < BLOCKCIPHER_KEY_SIZE)
                return false;
        if (blockcipher_alg->cra_blocksize != BLOCKCIPHER_BLOCK_SIZE)
                return false;

        if (strcmp(hash_alg->base.cra_name, "nhpoly1305") != 0)
                return false;

        return true;
}

static int adiantum_create(struct crypto_template *tmpl, struct rtattr **tb)
{
        struct crypto_attr_type *algt;
        const char *streamcipher_name;
        const char *blockcipher_name;
        const char *nhpoly1305_name;
        struct skcipher_instance *inst;
        struct adiantum_instance_ctx *ictx;
        struct skcipher_alg *streamcipher_alg;
        struct crypto_alg *blockcipher_alg;
        struct crypto_alg *_hash_alg;
        struct shash_alg *hash_alg;
        int err;

        algt = crypto_get_attr_type(tb);
        if (IS_ERR(algt))
                return PTR_ERR(algt);

        if ((algt->type ^ CRYPTO_ALG_TYPE_SKCIPHER) & algt->mask)
                return -EINVAL;

        streamcipher_name = crypto_attr_alg_name(tb[1]);
        if (IS_ERR(streamcipher_name))
                return PTR_ERR(streamcipher_name);

        blockcipher_name = crypto_attr_alg_name(tb[2]);
        if (IS_ERR(blockcipher_name))
                return PTR_ERR(blockcipher_name);

        nhpoly1305_name = crypto_attr_alg_name(tb[3]);
        if (nhpoly1305_name == ERR_PTR(-ENOENT))
                nhpoly1305_name = "nhpoly1305";
        if (IS_ERR(nhpoly1305_name))
                return PTR_ERR(nhpoly1305_name);

        inst = kzalloc(sizeof(*inst) + sizeof(*ictx), GFP_KERNEL);
        if (!inst)
                return -ENOMEM;
        ictx = skcipher_instance_ctx(inst);

        /* Stream cipher, e.g. "xchacha12" */
        err = crypto_grab_skcipher(&ictx->streamcipher_spawn, streamcipher_name,
                                   0, crypto_requires_sync(algt->type,
                                                           algt->mask));
        if (err)
                goto out_free_inst;
        streamcipher_alg = crypto_spawn_skcipher_alg(&ictx->streamcipher_spawn);

        /* Block cipher, e.g. "aes" */
        err = crypto_grab_spawn(&ictx->blockcipher_spawn, blockcipher_name,
                                CRYPTO_ALG_TYPE_CIPHER, CRYPTO_ALG_TYPE_MASK);
        if (err)
                goto out_drop_streamcipher;
        blockcipher_alg = ictx->blockcipher_spawn.alg;

        /* NHPoly1305 ε-∆U hash function */
        _hash_alg = crypto_alg_mod_lookup(nhpoly1305_name,
                                          CRYPTO_ALG_TYPE_SHASH,
                                          CRYPTO_ALG_TYPE_MASK);
        if (IS_ERR(_hash_alg)) {
                err = PTR_ERR(_hash_alg);
                goto out_drop_blockcipher;
        }
        hash_alg = __crypto_shash_alg(_hash_alg);
        err = crypto_init_shash_spawn(&ictx->hash_spawn, hash_alg,
                                      skcipher_crypto_instance(inst));
        if (err)
                goto out_put_hash;

        /* Check the set of algorithms */
        if (!adiantum_supported_algorithms(streamcipher_alg, blockcipher_alg,
                                           hash_alg)) {
                pr_warn("Unsupported Adiantum instantiation: (%s,%s,%s)\n",
                        streamcipher_alg->base.cra_name,
                        blockcipher_alg->cra_name, hash_alg->base.cra_name);
                err = -EINVAL;
                goto out_drop_hash;
        }

        /* Instance fields */

        err = -ENAMETOOLONG;
        if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME,
                     "adiantum(%s,%s)", streamcipher_alg->base.cra_name,
                     blockcipher_alg->cra_name) >= CRYPTO_MAX_ALG_NAME)
                goto out_drop_hash;
        if (snprintf(inst->alg.base.cra_driver_name, CRYPTO_MAX_ALG_NAME,
                     "adiantum(%s,%s,%s)",
                     streamcipher_alg->base.cra_driver_name,
                     blockcipher_alg->cra_driver_name,
                     hash_alg->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
                goto out_drop_hash;

        inst->alg.base.cra_flags = streamcipher_alg->base.cra_flags &
                                   CRYPTO_ALG_ASYNC;
        inst->alg.base.cra_blocksize = BLOCKCIPHER_BLOCK_SIZE;
        inst->alg.base.cra_ctxsize = sizeof(struct adiantum_tfm_ctx);
        inst->alg.base.cra_alignmask = streamcipher_alg->base.cra_alignmask |
                                       hash_alg->base.cra_alignmask;
        /*
         * The block cipher is only invoked once per message, so for long
         * messages (e.g. sectors for disk encryption) its performance doesn't
         * matter as much as that of the stream cipher and hash function.  Thus,
         * weigh the block cipher's ->cra_priority less.
         */
        inst->alg.base.cra_priority = (4 * streamcipher_alg->base.cra_priority +
                                       2 * hash_alg->base.cra_priority +
                                       blockcipher_alg->cra_priority) / 7;

        inst->alg.setkey = adiantum_setkey;
        inst->alg.encrypt = adiantum_encrypt;
        inst->alg.decrypt = adiantum_decrypt;
        inst->alg.init = adiantum_init_tfm;
        inst->alg.exit = adiantum_exit_tfm;
        inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(streamcipher_alg);
        inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(streamcipher_alg);
        inst->alg.ivsize = TWEAK_SIZE;

        inst->free = adiantum_free_instance;

        err = skcipher_register_instance(tmpl, inst);
        if (err)
                goto out_drop_hash;

        crypto_mod_put(_hash_alg);
        return 0;

out_drop_hash:
        crypto_drop_shash(&ictx->hash_spawn);
out_put_hash:
        crypto_mod_put(_hash_alg);
out_drop_blockcipher:
        crypto_drop_spawn(&ictx->blockcipher_spawn);
out_drop_streamcipher:
        crypto_drop_skcipher(&ictx->streamcipher_spawn);
out_free_inst:
        kfree(inst);
        return err;
}

/* adiantum(streamcipher_name, blockcipher_name [, nhpoly1305_name]) */
static struct crypto_template adiantum_tmpl = {
        .name = "adiantum",
        .create = adiantum_create,
        .module = THIS_MODULE,
};

static int __init adiantum_module_init(void)
{
        return crypto_register_template(&adiantum_tmpl);
}

static void __exit adiantum_module_exit(void)
{
        crypto_unregister_template(&adiantum_tmpl);
}

module_init(adiantum_module_init);
module_exit(adiantum_module_exit);

MODULE_DESCRIPTION("Adiantum length-preserving encryption mode");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("adiantum");
@@ -119,20 +119,16 @@ static int crypto_aead_report(struct sk_buff *skb, struct crypto_alg *alg)
        struct crypto_report_aead raead;
        struct aead_alg *aead = container_of(alg, struct aead_alg, base);

        strncpy(raead.type, "aead", sizeof(raead.type));
        strncpy(raead.geniv, "<none>", sizeof(raead.geniv));
        memset(&raead, 0, sizeof(raead));

        strscpy(raead.type, "aead", sizeof(raead.type));
        strscpy(raead.geniv, "<none>", sizeof(raead.geniv));

        raead.blocksize = alg->cra_blocksize;
        raead.maxauthsize = aead->maxauthsize;
        raead.ivsize = aead->ivsize;

        if (nla_put(skb, CRYPTOCFGA_REPORT_AEAD,
                    sizeof(struct crypto_report_aead), &raead))
                goto nla_put_failure;
        return 0;

nla_put_failure:
        return -EMSGSIZE;
        return nla_put(skb, CRYPTOCFGA_REPORT_AEAD, sizeof(raead), &raead);
}
#else
static int crypto_aead_report(struct sk_buff *skb, struct crypto_alg *alg)
@@ -63,7 +63,8 @@ static inline u8 byte(const u32 x, const unsigned n)

static const u32 rco_tab[10] = { 1, 2, 4, 8, 16, 32, 64, 128, 27, 54 };

__visible const u32 crypto_ft_tab[4][256] = {
/* cacheline-aligned to facilitate prefetching into cache */
__visible const u32 crypto_ft_tab[4][256] __cacheline_aligned = {
        {
                0xa56363c6, 0x847c7cf8, 0x997777ee, 0x8d7b7bf6,
                0x0df2f2ff, 0xbd6b6bd6, 0xb16f6fde, 0x54c5c591,
@@ -327,7 +328,7 @@ __visible const u32 crypto_ft_tab[4][256] = {
        }
};

__visible const u32 crypto_fl_tab[4][256] = {
__visible const u32 crypto_fl_tab[4][256] __cacheline_aligned = {
        {
                0x00000063, 0x0000007c, 0x00000077, 0x0000007b,
                0x000000f2, 0x0000006b, 0x0000006f, 0x000000c5,
@@ -591,7 +592,7 @@ __visible const u32 crypto_fl_tab[4][256] = {
        }
};

__visible const u32 crypto_it_tab[4][256] = {
__visible const u32 crypto_it_tab[4][256] __cacheline_aligned = {
        {
                0x50a7f451, 0x5365417e, 0xc3a4171a, 0x965e273a,
                0xcb6bab3b, 0xf1459d1f, 0xab58faac, 0x9303e34b,
@@ -855,7 +856,7 @@ __visible const u32 crypto_it_tab[4][256] = {
        }
};

__visible const u32 crypto_il_tab[4][256] = {
__visible const u32 crypto_il_tab[4][256] __cacheline_aligned = {
        {
                0x00000052, 0x00000009, 0x0000006a, 0x000000d5,
                0x00000030, 0x00000036, 0x000000a5, 0x00000038,
@ -269,6 +269,7 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
	const u32 *rkp = ctx->key_enc + 4;
	int rounds = 6 + ctx->key_length / 4;
	u32 st0[4], st1[4];
	unsigned long flags;
	int round;

	st0[0] = ctx->key_enc[0] ^ get_unaligned_le32(in);
@ -276,6 +277,12 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
	st0[2] = ctx->key_enc[2] ^ get_unaligned_le32(in + 8);
	st0[3] = ctx->key_enc[3] ^ get_unaligned_le32(in + 12);

	/*
	 * Temporarily disable interrupts to avoid races where cachelines are
	 * evicted when the CPU is interrupted to do something else.
	 */
	local_irq_save(flags);

	st0[0] ^= __aesti_sbox[ 0] ^ __aesti_sbox[128];
	st0[1] ^= __aesti_sbox[32] ^ __aesti_sbox[160];
	st0[2] ^= __aesti_sbox[64] ^ __aesti_sbox[192];
@ -300,6 +307,8 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
	put_unaligned_le32(subshift(st1, 1) ^ rkp[5], out + 4);
	put_unaligned_le32(subshift(st1, 2) ^ rkp[6], out + 8);
	put_unaligned_le32(subshift(st1, 3) ^ rkp[7], out + 12);

	local_irq_restore(flags);
}

static void aesti_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
@ -308,6 +317,7 @@ static void aesti_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
	const u32 *rkp = ctx->key_dec + 4;
	int rounds = 6 + ctx->key_length / 4;
	u32 st0[4], st1[4];
	unsigned long flags;
	int round;

	st0[0] = ctx->key_dec[0] ^ get_unaligned_le32(in);
@ -315,6 +325,12 @@ static void aesti_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
	st0[2] = ctx->key_dec[2] ^ get_unaligned_le32(in + 8);
	st0[3] = ctx->key_dec[3] ^ get_unaligned_le32(in + 12);

	/*
	 * Temporarily disable interrupts to avoid races where cachelines are
	 * evicted when the CPU is interrupted to do something else.
	 */
	local_irq_save(flags);

	st0[0] ^= __aesti_inv_sbox[ 0] ^ __aesti_inv_sbox[128];
	st0[1] ^= __aesti_inv_sbox[32] ^ __aesti_inv_sbox[160];
	st0[2] ^= __aesti_inv_sbox[64] ^ __aesti_inv_sbox[192];
@ -339,6 +355,8 @@ static void aesti_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
	put_unaligned_le32(inv_subshift(st1, 1) ^ rkp[5], out + 4);
	put_unaligned_le32(inv_subshift(st1, 2) ^ rkp[6], out + 8);
	put_unaligned_le32(inv_subshift(st1, 3) ^ rkp[7], out + 12);

	local_irq_restore(flags);
}

static struct crypto_alg aes_alg = {
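The comment above states the intent: the time-invariant AES implementation still indexes its S-box with secret-dependent offsets, so an interrupt arriving mid-block could evict the table and reintroduce a cache-timing signal. Masking local interrupts keeps all lookups for one block back to back while the lines are resident. The bracketing, as a sketch (table name follows the code above):

	unsigned long flags;

	local_irq_save(flags);	/* nothing on this CPU can evict the S-box now */
	/* ... every __aesti_sbox[] lookup for this block happens here ... */
	local_irq_restore(flags);

This is a mitigation rather than a complete fix: it only prevents evictions caused on the local CPU, not those from other cores or SMT siblings.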
@ -364,20 +364,28 @@ static int crypto_ahash_op(struct ahash_request *req,

int crypto_ahash_final(struct ahash_request *req)
{
	struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
	struct crypto_alg *alg = tfm->base.__crt_alg;
	unsigned int nbytes = req->nbytes;
	int ret;

	crypto_stats_get(alg);
	ret = crypto_ahash_op(req, crypto_ahash_reqtfm(req)->final);
	crypto_stat_ahash_final(req, ret);
	crypto_stats_ahash_final(nbytes, ret, alg);
	return ret;
}
EXPORT_SYMBOL_GPL(crypto_ahash_final);

int crypto_ahash_finup(struct ahash_request *req)
{
	struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
	struct crypto_alg *alg = tfm->base.__crt_alg;
	unsigned int nbytes = req->nbytes;
	int ret;

	crypto_stats_get(alg);
	ret = crypto_ahash_op(req, crypto_ahash_reqtfm(req)->finup);
	crypto_stat_ahash_final(req, ret);
	crypto_stats_ahash_final(nbytes, ret, alg);
	return ret;
}
EXPORT_SYMBOL_GPL(crypto_ahash_finup);
@ -385,13 +393,16 @@ EXPORT_SYMBOL_GPL(crypto_ahash_finup);
int crypto_ahash_digest(struct ahash_request *req)
{
	struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
	struct crypto_alg *alg = tfm->base.__crt_alg;
	unsigned int nbytes = req->nbytes;
	int ret;

	crypto_stats_get(alg);
	if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
		ret = -ENOKEY;
	else
		ret = crypto_ahash_op(req, tfm->digest);
	crypto_stat_ahash_final(req, ret);
	crypto_stats_ahash_final(nbytes, ret, alg);
	return ret;
}
EXPORT_SYMBOL_GPL(crypto_ahash_digest);
@ -498,18 +509,14 @@ static int crypto_ahash_report(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_report_hash rhash;

	strncpy(rhash.type, "ahash", sizeof(rhash.type));
	memset(&rhash, 0, sizeof(rhash));

	strscpy(rhash.type, "ahash", sizeof(rhash.type));

	rhash.blocksize = alg->cra_blocksize;
	rhash.digestsize = __crypto_hash_alg_common(alg)->digestsize;

	if (nla_put(skb, CRYPTOCFGA_REPORT_HASH,
		    sizeof(struct crypto_report_hash), &rhash))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_REPORT_HASH, sizeof(rhash), &rhash);
}
#else
static int crypto_ahash_report(struct sk_buff *skb, struct crypto_alg *alg)
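In the three hash entry points above, the algorithm pointer and req->nbytes are captured before the operation runs: an asynchronous request may complete, and its fields be reused, before crypto_ahash_op() returns, so the accounting call must not touch the request afterwards. crypto_stats_get() additionally pins the algorithm with a reference that the stats helper drops. A sketch of the bracketing (illustrative, mirroring the code above):

	struct crypto_alg *alg = tfm->base.__crt_alg;
	unsigned int nbytes = req->nbytes;	/* snapshot before the op runs */
	int ret;

	crypto_stats_get(alg);		/* reference dropped inside the helper */
	ret = crypto_ahash_op(req, crypto_ahash_reqtfm(req)->final);
	crypto_stats_ahash_final(nbytes, ret, alg);	/* req not touched here */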
@ -30,15 +30,12 @@ static int crypto_akcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_report_akcipher rakcipher;

	strncpy(rakcipher.type, "akcipher", sizeof(rakcipher.type));
	memset(&rakcipher, 0, sizeof(rakcipher));

	if (nla_put(skb, CRYPTOCFGA_REPORT_AKCIPHER,
		    sizeof(struct crypto_report_akcipher), &rakcipher))
		goto nla_put_failure;
	return 0;
	strscpy(rakcipher.type, "akcipher", sizeof(rakcipher.type));

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_REPORT_AKCIPHER,
		       sizeof(rakcipher), &rakcipher);
}
#else
static int crypto_akcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
247	crypto/algapi.c
@ -258,13 +258,7 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg)
	list_add(&alg->cra_list, &crypto_alg_list);
	list_add(&larval->alg.cra_list, &crypto_alg_list);

	atomic_set(&alg->encrypt_cnt, 0);
	atomic_set(&alg->decrypt_cnt, 0);
	atomic64_set(&alg->encrypt_tlen, 0);
	atomic64_set(&alg->decrypt_tlen, 0);
	atomic_set(&alg->verify_cnt, 0);
	atomic_set(&alg->cipher_err_cnt, 0);
	atomic_set(&alg->sign_cnt, 0);
	crypto_stats_init(alg);

out:
	return larval;
@ -1076,6 +1070,245 @@ int crypto_type_has_alg(const char *name, const struct crypto_type *frontend,
}
EXPORT_SYMBOL_GPL(crypto_type_has_alg);

#ifdef CONFIG_CRYPTO_STATS
void crypto_stats_init(struct crypto_alg *alg)
{
	memset(&alg->stats, 0, sizeof(alg->stats));
}
EXPORT_SYMBOL_GPL(crypto_stats_init);

void crypto_stats_get(struct crypto_alg *alg)
{
	crypto_alg_get(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_get);

void crypto_stats_ablkcipher_encrypt(unsigned int nbytes, int ret,
				     struct crypto_alg *alg)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
		atomic64_inc(&alg->stats.cipher.err_cnt);
	} else {
		atomic64_inc(&alg->stats.cipher.encrypt_cnt);
		atomic64_add(nbytes, &alg->stats.cipher.encrypt_tlen);
	}
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_ablkcipher_encrypt);

void crypto_stats_ablkcipher_decrypt(unsigned int nbytes, int ret,
				     struct crypto_alg *alg)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
		atomic64_inc(&alg->stats.cipher.err_cnt);
	} else {
		atomic64_inc(&alg->stats.cipher.decrypt_cnt);
		atomic64_add(nbytes, &alg->stats.cipher.decrypt_tlen);
	}
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_ablkcipher_decrypt);

void crypto_stats_aead_encrypt(unsigned int cryptlen, struct crypto_alg *alg,
			       int ret)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
		atomic64_inc(&alg->stats.aead.err_cnt);
	} else {
		atomic64_inc(&alg->stats.aead.encrypt_cnt);
		atomic64_add(cryptlen, &alg->stats.aead.encrypt_tlen);
	}
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_aead_encrypt);

void crypto_stats_aead_decrypt(unsigned int cryptlen, struct crypto_alg *alg,
			       int ret)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
		atomic64_inc(&alg->stats.aead.err_cnt);
	} else {
		atomic64_inc(&alg->stats.aead.decrypt_cnt);
		atomic64_add(cryptlen, &alg->stats.aead.decrypt_tlen);
	}
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_aead_decrypt);

void crypto_stats_akcipher_encrypt(unsigned int src_len, int ret,
				   struct crypto_alg *alg)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
		atomic64_inc(&alg->stats.akcipher.err_cnt);
	} else {
		atomic64_inc(&alg->stats.akcipher.encrypt_cnt);
		atomic64_add(src_len, &alg->stats.akcipher.encrypt_tlen);
	}
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_akcipher_encrypt);

void crypto_stats_akcipher_decrypt(unsigned int src_len, int ret,
				   struct crypto_alg *alg)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
		atomic64_inc(&alg->stats.akcipher.err_cnt);
	} else {
		atomic64_inc(&alg->stats.akcipher.decrypt_cnt);
		atomic64_add(src_len, &alg->stats.akcipher.decrypt_tlen);
	}
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_akcipher_decrypt);

void crypto_stats_akcipher_sign(int ret, struct crypto_alg *alg)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY)
		atomic64_inc(&alg->stats.akcipher.err_cnt);
	else
		atomic64_inc(&alg->stats.akcipher.sign_cnt);
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_akcipher_sign);

void crypto_stats_akcipher_verify(int ret, struct crypto_alg *alg)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY)
		atomic64_inc(&alg->stats.akcipher.err_cnt);
	else
		atomic64_inc(&alg->stats.akcipher.verify_cnt);
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_akcipher_verify);

void crypto_stats_compress(unsigned int slen, int ret, struct crypto_alg *alg)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
		atomic64_inc(&alg->stats.compress.err_cnt);
	} else {
		atomic64_inc(&alg->stats.compress.compress_cnt);
		atomic64_add(slen, &alg->stats.compress.compress_tlen);
	}
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_compress);

void crypto_stats_decompress(unsigned int slen, int ret, struct crypto_alg *alg)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
		atomic64_inc(&alg->stats.compress.err_cnt);
	} else {
		atomic64_inc(&alg->stats.compress.decompress_cnt);
		atomic64_add(slen, &alg->stats.compress.decompress_tlen);
	}
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_decompress);

void crypto_stats_ahash_update(unsigned int nbytes, int ret,
			       struct crypto_alg *alg)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY)
		atomic64_inc(&alg->stats.hash.err_cnt);
	else
		atomic64_add(nbytes, &alg->stats.hash.hash_tlen);
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_ahash_update);

void crypto_stats_ahash_final(unsigned int nbytes, int ret,
			      struct crypto_alg *alg)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
		atomic64_inc(&alg->stats.hash.err_cnt);
	} else {
		atomic64_inc(&alg->stats.hash.hash_cnt);
		atomic64_add(nbytes, &alg->stats.hash.hash_tlen);
	}
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_ahash_final);

void crypto_stats_kpp_set_secret(struct crypto_alg *alg, int ret)
{
	if (ret)
		atomic64_inc(&alg->stats.kpp.err_cnt);
	else
		atomic64_inc(&alg->stats.kpp.setsecret_cnt);
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_kpp_set_secret);

void crypto_stats_kpp_generate_public_key(struct crypto_alg *alg, int ret)
{
	if (ret)
		atomic64_inc(&alg->stats.kpp.err_cnt);
	else
		atomic64_inc(&alg->stats.kpp.generate_public_key_cnt);
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_kpp_generate_public_key);

void crypto_stats_kpp_compute_shared_secret(struct crypto_alg *alg, int ret)
{
	if (ret)
		atomic64_inc(&alg->stats.kpp.err_cnt);
	else
		atomic64_inc(&alg->stats.kpp.compute_shared_secret_cnt);
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_kpp_compute_shared_secret);

void crypto_stats_rng_seed(struct crypto_alg *alg, int ret)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY)
		atomic64_inc(&alg->stats.rng.err_cnt);
	else
		atomic64_inc(&alg->stats.rng.seed_cnt);
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_rng_seed);

void crypto_stats_rng_generate(struct crypto_alg *alg, unsigned int dlen,
			       int ret)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
		atomic64_inc(&alg->stats.rng.err_cnt);
	} else {
		atomic64_inc(&alg->stats.rng.generate_cnt);
		atomic64_add(dlen, &alg->stats.rng.generate_tlen);
	}
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_rng_generate);

void crypto_stats_skcipher_encrypt(unsigned int cryptlen, int ret,
				   struct crypto_alg *alg)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
		atomic64_inc(&alg->stats.cipher.err_cnt);
	} else {
		atomic64_inc(&alg->stats.cipher.encrypt_cnt);
		atomic64_add(cryptlen, &alg->stats.cipher.encrypt_tlen);
	}
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_skcipher_encrypt);

void crypto_stats_skcipher_decrypt(unsigned int cryptlen, int ret,
				   struct crypto_alg *alg)
{
	if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
		atomic64_inc(&alg->stats.cipher.err_cnt);
	} else {
		atomic64_inc(&alg->stats.cipher.decrypt_cnt);
		atomic64_add(cryptlen, &alg->stats.cipher.decrypt_tlen);
	}
	crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_skcipher_decrypt);
#endif

static int __init crypto_algapi_init(void)
{
	crypto_init_proc();
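All of the accounting helpers above share one convention: -EINPROGRESS and -EBUSY are not errors, they mean the request was queued for asynchronous completion, so only other non-zero codes bump err_cnt. Each helper also drops the reference taken earlier by crypto_stats_get(). The common shape, sketched under a hypothetical name:

	/* hypothetical summary of the pattern used by every helper above */
	void crypto_stats_example(unsigned int tlen, int ret, struct crypto_alg *alg)
	{
		if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
			atomic64_inc(&alg->stats.cipher.err_cnt);
		} else {
			atomic64_inc(&alg->stats.cipher.encrypt_cnt);
			atomic64_add(tlen, &alg->stats.cipher.encrypt_tlen);
		}
		crypto_alg_put(alg);	/* pairs with crypto_stats_get() */
	}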
@ -507,23 +507,18 @@ static int crypto_blkcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_report_blkcipher rblkcipher;

	strncpy(rblkcipher.type, "blkcipher", sizeof(rblkcipher.type));
	strncpy(rblkcipher.geniv, alg->cra_blkcipher.geniv ?: "<default>",
		sizeof(rblkcipher.geniv));
	rblkcipher.geniv[sizeof(rblkcipher.geniv) - 1] = '\0';
	memset(&rblkcipher, 0, sizeof(rblkcipher));

	strscpy(rblkcipher.type, "blkcipher", sizeof(rblkcipher.type));
	strscpy(rblkcipher.geniv, "<default>", sizeof(rblkcipher.geniv));

	rblkcipher.blocksize = alg->cra_blocksize;
	rblkcipher.min_keysize = alg->cra_blkcipher.min_keysize;
	rblkcipher.max_keysize = alg->cra_blkcipher.max_keysize;
	rblkcipher.ivsize = alg->cra_blkcipher.ivsize;

	if (nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
		    sizeof(struct crypto_report_blkcipher), &rblkcipher))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
		       sizeof(rblkcipher), &rblkcipher);
}
#else
static int crypto_blkcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
@ -541,8 +536,7 @@ static void crypto_blkcipher_show(struct seq_file *m, struct crypto_alg *alg)
	seq_printf(m, "min keysize : %u\n", alg->cra_blkcipher.min_keysize);
	seq_printf(m, "max keysize : %u\n", alg->cra_blkcipher.max_keysize);
	seq_printf(m, "ivsize : %u\n", alg->cra_blkcipher.ivsize);
	seq_printf(m, "geniv : %s\n", alg->cra_blkcipher.geniv ?:
				      "<default>");
	seq_printf(m, "geniv : <default>\n");
}

const struct crypto_type crypto_blkcipher_type = {
@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct skcipher_walk *walk,

	do {
		crypto_cfb_encrypt_one(tfm, iv, dst);
		crypto_xor(dst, iv, bsize);
		crypto_xor(dst, src, bsize);
		iv = src;

		src += bsize;
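CFB decryption recovers each plaintext block as P_i = C_i XOR E_K(C_{i-1}). In the loop above, crypto_cfb_encrypt_one() writes E_K(C_{i-1}) into dst, so the XOR must take the current ciphertext block from src; the replaced line XORed dst with iv, which still points at C_{i-1}, yielding E_K(C_{i-1}) XOR C_{i-1} instead of the plaintext. Annotated, the fixed step reads (sketch):

	crypto_cfb_encrypt_one(tfm, iv, dst);	/* dst = E_K(C[i-1]) */
	crypto_xor(dst, src, bsize);		/* dst = E_K(C[i-1]) ^ C[i] = P[i] */
	iv = src;				/* the next block chains off C[i] */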
@ -1,137 +0,0 @@
/*
 * ChaCha20 256-bit cipher algorithm, RFC7539
 *
 * Copyright (C) 2015 Martin Willi
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */

#include <asm/unaligned.h>
#include <crypto/algapi.h>
#include <crypto/chacha20.h>
#include <crypto/internal/skcipher.h>
#include <linux/module.h>

static void chacha20_docrypt(u32 *state, u8 *dst, const u8 *src,
			     unsigned int bytes)
{
	/* aligned to potentially speed up crypto_xor() */
	u8 stream[CHACHA20_BLOCK_SIZE] __aligned(sizeof(long));

	if (dst != src)
		memcpy(dst, src, bytes);

	while (bytes >= CHACHA20_BLOCK_SIZE) {
		chacha20_block(state, stream);
		crypto_xor(dst, stream, CHACHA20_BLOCK_SIZE);
		bytes -= CHACHA20_BLOCK_SIZE;
		dst += CHACHA20_BLOCK_SIZE;
	}
	if (bytes) {
		chacha20_block(state, stream);
		crypto_xor(dst, stream, bytes);
	}
}

void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv)
{
	state[0] = 0x61707865; /* "expa" */
	state[1] = 0x3320646e; /* "nd 3" */
	state[2] = 0x79622d32; /* "2-by" */
	state[3] = 0x6b206574; /* "te k" */
	state[4] = ctx->key[0];
	state[5] = ctx->key[1];
	state[6] = ctx->key[2];
	state[7] = ctx->key[3];
	state[8] = ctx->key[4];
	state[9] = ctx->key[5];
	state[10] = ctx->key[6];
	state[11] = ctx->key[7];
	state[12] = get_unaligned_le32(iv + 0);
	state[13] = get_unaligned_le32(iv + 4);
	state[14] = get_unaligned_le32(iv + 8);
	state[15] = get_unaligned_le32(iv + 12);
}
EXPORT_SYMBOL_GPL(crypto_chacha20_init);

int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
			   unsigned int keysize)
{
	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
	int i;

	if (keysize != CHACHA20_KEY_SIZE)
		return -EINVAL;

	for (i = 0; i < ARRAY_SIZE(ctx->key); i++)
		ctx->key[i] = get_unaligned_le32(key + i * sizeof(u32));

	return 0;
}
EXPORT_SYMBOL_GPL(crypto_chacha20_setkey);

int crypto_chacha20_crypt(struct skcipher_request *req)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
	struct skcipher_walk walk;
	u32 state[16];
	int err;

	err = skcipher_walk_virt(&walk, req, true);

	crypto_chacha20_init(state, ctx, walk.iv);

	while (walk.nbytes > 0) {
		unsigned int nbytes = walk.nbytes;

		if (nbytes < walk.total)
			nbytes = round_down(nbytes, walk.stride);

		chacha20_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr,
				 nbytes);
		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
	}

	return err;
}
EXPORT_SYMBOL_GPL(crypto_chacha20_crypt);

static struct skcipher_alg alg = {
	.base.cra_name = "chacha20",
	.base.cra_driver_name = "chacha20-generic",
	.base.cra_priority = 100,
	.base.cra_blocksize = 1,
	.base.cra_ctxsize = sizeof(struct chacha20_ctx),
	.base.cra_module = THIS_MODULE,

	.min_keysize = CHACHA20_KEY_SIZE,
	.max_keysize = CHACHA20_KEY_SIZE,
	.ivsize = CHACHA20_IV_SIZE,
	.chunksize = CHACHA20_BLOCK_SIZE,
	.setkey = crypto_chacha20_setkey,
	.encrypt = crypto_chacha20_crypt,
	.decrypt = crypto_chacha20_crypt,
};

static int __init chacha20_generic_mod_init(void)
{
	return crypto_register_skcipher(&alg);
}

static void __exit chacha20_generic_mod_fini(void)
{
	crypto_unregister_skcipher(&alg);
}

module_init(chacha20_generic_mod_init);
module_exit(chacha20_generic_mod_fini);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
MODULE_DESCRIPTION("chacha20 cipher algorithm");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-generic");
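For reference, crypto_chacha20_init() above lays the 16-word ChaCha state out as RFC 7539 §2.3 describes; a comment-only sketch of that layout:

	/*
	 * state[0..3]   constants "expa" "nd 3" "2-by" "te k"
	 * state[4..11]  256-bit key as eight little-endian words
	 * state[12..15] 16-byte IV: block counter plus nonce, little-endian
	 *
	 * chacha20_block() permutes a copy of this state to produce one
	 * 64-byte keystream block and advances the counter for the next.
	 */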
@ -13,7 +13,7 @@
#include <crypto/internal/hash.h>
#include <crypto/internal/skcipher.h>
#include <crypto/scatterwalk.h>
#include <crypto/chacha20.h>
#include <crypto/chacha.h>
#include <crypto/poly1305.h>
#include <linux/err.h>
#include <linux/init.h>
@ -22,8 +22,6 @@

#include "internal.h"

#define CHACHAPOLY_IV_SIZE 12

struct chachapoly_instance_ctx {
	struct crypto_skcipher_spawn chacha;
	struct crypto_ahash_spawn poly;
@ -51,7 +49,7 @@ struct poly_req {
};

struct chacha_req {
	u8 iv[CHACHA20_IV_SIZE];
	u8 iv[CHACHA_IV_SIZE];
	struct scatterlist src[1];
	struct skcipher_request req; /* must be last member */
};
@ -91,7 +89,7 @@ static void chacha_iv(u8 *iv, struct aead_request *req, u32 icb)
	memcpy(iv, &leicb, sizeof(leicb));
	memcpy(iv + sizeof(leicb), ctx->salt, ctx->saltlen);
	memcpy(iv + sizeof(leicb) + ctx->saltlen, req->iv,
	       CHACHA20_IV_SIZE - sizeof(leicb) - ctx->saltlen);
	       CHACHA_IV_SIZE - sizeof(leicb) - ctx->saltlen);
}

static int poly_verify_tag(struct aead_request *req)
@ -494,7 +492,7 @@ static int chachapoly_setkey(struct crypto_aead *aead, const u8 *key,
	struct chachapoly_ctx *ctx = crypto_aead_ctx(aead);
	int err;

	if (keylen != ctx->saltlen + CHACHA20_KEY_SIZE)
	if (keylen != ctx->saltlen + CHACHA_KEY_SIZE)
		return -EINVAL;

	keylen -= ctx->saltlen;
@ -639,7 +637,7 @@ static int chachapoly_create(struct crypto_template *tmpl, struct rtattr **tb,

	err = -EINVAL;
	/* Need 16-byte IV size, including Initial Block Counter value */
	if (crypto_skcipher_alg_ivsize(chacha) != CHACHA20_IV_SIZE)
	if (crypto_skcipher_alg_ivsize(chacha) != CHACHA_IV_SIZE)
		goto out_drop_chacha;
	/* Not a stream cipher? */
	if (chacha->base.cra_blocksize != 1)
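chacha_iv() above assembles the full 16-byte ChaCha IV that RFC 7539 expects: a 32-bit little-endian initial block counter followed by the 96-bit nonce, itself split between the key-derived salt and the per-request IV. A comment-only sketch of the layout it produces:

	/*
	 * 16-byte ChaCha IV built by chacha_iv(iv, req, icb):
	 *
	 *   bytes 0..3        le32 initial block counter (icb)
	 *   bytes 4..4+salt   ctx->salt (implicit part of the 96-bit nonce)
	 *   remaining bytes   req->iv   (explicit per-request nonce part)
	 */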
217	crypto/chacha_generic.c  (new file)
@ -0,0 +1,217 @@
/*
 * ChaCha and XChaCha stream ciphers, including ChaCha20 (RFC7539)
 *
 * Copyright (C) 2015 Martin Willi
 * Copyright (C) 2018 Google LLC
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 */

#include <asm/unaligned.h>
#include <crypto/algapi.h>
#include <crypto/chacha.h>
#include <crypto/internal/skcipher.h>
#include <linux/module.h>

static void chacha_docrypt(u32 *state, u8 *dst, const u8 *src,
			   unsigned int bytes, int nrounds)
{
	/* aligned to potentially speed up crypto_xor() */
	u8 stream[CHACHA_BLOCK_SIZE] __aligned(sizeof(long));

	if (dst != src)
		memcpy(dst, src, bytes);

	while (bytes >= CHACHA_BLOCK_SIZE) {
		chacha_block(state, stream, nrounds);
		crypto_xor(dst, stream, CHACHA_BLOCK_SIZE);
		bytes -= CHACHA_BLOCK_SIZE;
		dst += CHACHA_BLOCK_SIZE;
	}
	if (bytes) {
		chacha_block(state, stream, nrounds);
		crypto_xor(dst, stream, bytes);
	}
}

static int chacha_stream_xor(struct skcipher_request *req,
			     struct chacha_ctx *ctx, u8 *iv)
{
	struct skcipher_walk walk;
	u32 state[16];
	int err;

	err = skcipher_walk_virt(&walk, req, false);

	crypto_chacha_init(state, ctx, iv);

	while (walk.nbytes > 0) {
		unsigned int nbytes = walk.nbytes;

		if (nbytes < walk.total)
			nbytes = round_down(nbytes, walk.stride);

		chacha_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr,
			       nbytes, ctx->nrounds);
		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
	}

	return err;
}

void crypto_chacha_init(u32 *state, struct chacha_ctx *ctx, u8 *iv)
{
	state[0] = 0x61707865; /* "expa" */
	state[1] = 0x3320646e; /* "nd 3" */
	state[2] = 0x79622d32; /* "2-by" */
	state[3] = 0x6b206574; /* "te k" */
	state[4] = ctx->key[0];
	state[5] = ctx->key[1];
	state[6] = ctx->key[2];
	state[7] = ctx->key[3];
	state[8] = ctx->key[4];
	state[9] = ctx->key[5];
	state[10] = ctx->key[6];
	state[11] = ctx->key[7];
	state[12] = get_unaligned_le32(iv + 0);
	state[13] = get_unaligned_le32(iv + 4);
	state[14] = get_unaligned_le32(iv + 8);
	state[15] = get_unaligned_le32(iv + 12);
}
EXPORT_SYMBOL_GPL(crypto_chacha_init);

static int chacha_setkey(struct crypto_skcipher *tfm, const u8 *key,
			 unsigned int keysize, int nrounds)
{
	struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
	int i;

	if (keysize != CHACHA_KEY_SIZE)
		return -EINVAL;

	for (i = 0; i < ARRAY_SIZE(ctx->key); i++)
		ctx->key[i] = get_unaligned_le32(key + i * sizeof(u32));

	ctx->nrounds = nrounds;
	return 0;
}

int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
			   unsigned int keysize)
{
	return chacha_setkey(tfm, key, keysize, 20);
}
EXPORT_SYMBOL_GPL(crypto_chacha20_setkey);

int crypto_chacha12_setkey(struct crypto_skcipher *tfm, const u8 *key,
			   unsigned int keysize)
{
	return chacha_setkey(tfm, key, keysize, 12);
}
EXPORT_SYMBOL_GPL(crypto_chacha12_setkey);

int crypto_chacha_crypt(struct skcipher_request *req)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);

	return chacha_stream_xor(req, ctx, req->iv);
}
EXPORT_SYMBOL_GPL(crypto_chacha_crypt);

int crypto_xchacha_crypt(struct skcipher_request *req)
{
	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
	struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
	struct chacha_ctx subctx;
	u32 state[16];
	u8 real_iv[16];

	/* Compute the subkey given the original key and first 128 nonce bits */
	crypto_chacha_init(state, ctx, req->iv);
	hchacha_block(state, subctx.key, ctx->nrounds);
	subctx.nrounds = ctx->nrounds;

	/* Build the real IV */
	memcpy(&real_iv[0], req->iv + 24, 8); /* stream position */
	memcpy(&real_iv[8], req->iv + 16, 8); /* remaining 64 nonce bits */

	/* Generate the stream and XOR it with the data */
	return chacha_stream_xor(req, &subctx, real_iv);
}
EXPORT_SYMBOL_GPL(crypto_xchacha_crypt);

static struct skcipher_alg algs[] = {
	{
		.base.cra_name = "chacha20",
		.base.cra_driver_name = "chacha20-generic",
		.base.cra_priority = 100,
		.base.cra_blocksize = 1,
		.base.cra_ctxsize = sizeof(struct chacha_ctx),
		.base.cra_module = THIS_MODULE,

		.min_keysize = CHACHA_KEY_SIZE,
		.max_keysize = CHACHA_KEY_SIZE,
		.ivsize = CHACHA_IV_SIZE,
		.chunksize = CHACHA_BLOCK_SIZE,
		.setkey = crypto_chacha20_setkey,
		.encrypt = crypto_chacha_crypt,
		.decrypt = crypto_chacha_crypt,
	}, {
		.base.cra_name = "xchacha20",
		.base.cra_driver_name = "xchacha20-generic",
		.base.cra_priority = 100,
		.base.cra_blocksize = 1,
		.base.cra_ctxsize = sizeof(struct chacha_ctx),
		.base.cra_module = THIS_MODULE,

		.min_keysize = CHACHA_KEY_SIZE,
		.max_keysize = CHACHA_KEY_SIZE,
		.ivsize = XCHACHA_IV_SIZE,
		.chunksize = CHACHA_BLOCK_SIZE,
		.setkey = crypto_chacha20_setkey,
		.encrypt = crypto_xchacha_crypt,
		.decrypt = crypto_xchacha_crypt,
	}, {
		.base.cra_name = "xchacha12",
		.base.cra_driver_name = "xchacha12-generic",
		.base.cra_priority = 100,
		.base.cra_blocksize = 1,
		.base.cra_ctxsize = sizeof(struct chacha_ctx),
		.base.cra_module = THIS_MODULE,

		.min_keysize = CHACHA_KEY_SIZE,
		.max_keysize = CHACHA_KEY_SIZE,
		.ivsize = XCHACHA_IV_SIZE,
		.chunksize = CHACHA_BLOCK_SIZE,
		.setkey = crypto_chacha12_setkey,
		.encrypt = crypto_xchacha_crypt,
		.decrypt = crypto_xchacha_crypt,
	}
};

static int __init chacha_generic_mod_init(void)
{
	return crypto_register_skciphers(algs, ARRAY_SIZE(algs));
}

static void __exit chacha_generic_mod_fini(void)
{
	crypto_unregister_skciphers(algs, ARRAY_SIZE(algs));
}

module_init(chacha_generic_mod_init);
module_exit(chacha_generic_mod_fini);

MODULE_LICENSE("GPL");
MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (generic)");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-generic");
MODULE_ALIAS_CRYPTO("xchacha20");
MODULE_ALIAS_CRYPTO("xchacha20-generic");
MODULE_ALIAS_CRYPTO("xchacha12");
MODULE_ALIAS_CRYPTO("xchacha12-generic");
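crypto_xchacha_crypt() above implements the XChaCha construction: HChaCha compresses the key and the first 128 nonce bits into a one-time subkey, and ordinary ChaCha then runs under that subkey with the remaining 64 nonce bits, giving a 192-bit nonce overall. A hypothetical caller-side sketch (not part of the patch) for the newly registered transform:

	/* hypothetical usage sketch; error handling trimmed */
	u8 key[CHACHA_KEY_SIZE];	/* 32 key bytes, filled elsewhere */
	struct crypto_skcipher *tfm = crypto_alloc_skcipher("xchacha20", 0, 0);

	if (!IS_ERR(tfm)) {
		crypto_skcipher_setkey(tfm, key, sizeof(key));
		/* each request then carries a 32-byte IV: bytes 0..23 are
		 * the nonce, bytes 24..31 the starting stream position,
		 * matching the layout crypto_xchacha_crypt() unpacks */
		crypto_free_skcipher(tfm);
	}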
@ -422,8 +422,6 @@ static int cryptd_create_blkcipher(struct crypto_template *tmpl,
	inst->alg.cra_ablkcipher.min_keysize = alg->cra_blkcipher.min_keysize;
	inst->alg.cra_ablkcipher.max_keysize = alg->cra_blkcipher.max_keysize;

	inst->alg.cra_ablkcipher.geniv = alg->cra_blkcipher.geniv;

	inst->alg.cra_ctxsize = sizeof(struct cryptd_blkcipher_ctx);

	inst->alg.cra_init = cryptd_blkcipher_init_tfm;
@ -1174,7 +1172,7 @@ struct cryptd_ablkcipher *cryptd_alloc_ablkcipher(const char *alg_name,
		return ERR_PTR(-EINVAL);
	type = crypto_skcipher_type(type);
	mask &= ~CRYPTO_ALG_TYPE_MASK;
	mask |= (CRYPTO_ALG_GENIV | CRYPTO_ALG_TYPE_BLKCIPHER_MASK);
	mask |= CRYPTO_ALG_TYPE_BLKCIPHER_MASK;
	tfm = crypto_alloc_base(cryptd_alg_name, type, mask);
	if (IS_ERR(tfm))
		return ERR_CAST(tfm);
@ -84,87 +84,38 @@ static int crypto_report_cipher(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_report_cipher rcipher;

	strncpy(rcipher.type, "cipher", sizeof(rcipher.type));
	memset(&rcipher, 0, sizeof(rcipher));

	strscpy(rcipher.type, "cipher", sizeof(rcipher.type));

	rcipher.blocksize = alg->cra_blocksize;
	rcipher.min_keysize = alg->cra_cipher.cia_min_keysize;
	rcipher.max_keysize = alg->cra_cipher.cia_max_keysize;

	if (nla_put(skb, CRYPTOCFGA_REPORT_CIPHER,
		    sizeof(struct crypto_report_cipher), &rcipher))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_REPORT_CIPHER,
		       sizeof(rcipher), &rcipher);
}

static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_report_comp rcomp;

	strncpy(rcomp.type, "compression", sizeof(rcomp.type));
	if (nla_put(skb, CRYPTOCFGA_REPORT_COMPRESS,
		    sizeof(struct crypto_report_comp), &rcomp))
		goto nla_put_failure;
	return 0;
	memset(&rcomp, 0, sizeof(rcomp));

nla_put_failure:
	return -EMSGSIZE;
}
	strscpy(rcomp.type, "compression", sizeof(rcomp.type));

static int crypto_report_acomp(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_report_acomp racomp;

	strncpy(racomp.type, "acomp", sizeof(racomp.type));

	if (nla_put(skb, CRYPTOCFGA_REPORT_ACOMP,
		    sizeof(struct crypto_report_acomp), &racomp))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
}

static int crypto_report_akcipher(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_report_akcipher rakcipher;

	strncpy(rakcipher.type, "akcipher", sizeof(rakcipher.type));

	if (nla_put(skb, CRYPTOCFGA_REPORT_AKCIPHER,
		    sizeof(struct crypto_report_akcipher), &rakcipher))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
}

static int crypto_report_kpp(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_report_kpp rkpp;

	strncpy(rkpp.type, "kpp", sizeof(rkpp.type));

	if (nla_put(skb, CRYPTOCFGA_REPORT_KPP,
		    sizeof(struct crypto_report_kpp), &rkpp))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_REPORT_COMPRESS, sizeof(rcomp), &rcomp);
}

static int crypto_report_one(struct crypto_alg *alg,
			     struct crypto_user_alg *ualg, struct sk_buff *skb)
{
	strncpy(ualg->cru_name, alg->cra_name, sizeof(ualg->cru_name));
	strncpy(ualg->cru_driver_name, alg->cra_driver_name,
	memset(ualg, 0, sizeof(*ualg));

	strscpy(ualg->cru_name, alg->cra_name, sizeof(ualg->cru_name));
	strscpy(ualg->cru_driver_name, alg->cra_driver_name,
		sizeof(ualg->cru_driver_name));
	strncpy(ualg->cru_module_name, module_name(alg->cra_module),
	strscpy(ualg->cru_module_name, module_name(alg->cra_module),
		sizeof(ualg->cru_module_name));

	ualg->cru_type = 0;
@ -177,9 +128,9 @@ static int crypto_report_one(struct crypto_alg *alg,
	if (alg->cra_flags & CRYPTO_ALG_LARVAL) {
		struct crypto_report_larval rl;

		strncpy(rl.type, "larval", sizeof(rl.type));
		if (nla_put(skb, CRYPTOCFGA_REPORT_LARVAL,
			    sizeof(struct crypto_report_larval), &rl))
		memset(&rl, 0, sizeof(rl));
		strscpy(rl.type, "larval", sizeof(rl.type));
		if (nla_put(skb, CRYPTOCFGA_REPORT_LARVAL, sizeof(rl), &rl))
			goto nla_put_failure;
		goto out;
	}
@ -202,20 +153,6 @@ static int crypto_report_one(struct crypto_alg *alg,
			goto nla_put_failure;

		break;
	case CRYPTO_ALG_TYPE_ACOMPRESS:
		if (crypto_report_acomp(skb, alg))
			goto nla_put_failure;

		break;
	case CRYPTO_ALG_TYPE_AKCIPHER:
		if (crypto_report_akcipher(skb, alg))
			goto nla_put_failure;

		break;
	case CRYPTO_ALG_TYPE_KPP:
		if (crypto_report_kpp(skb, alg))
			goto nla_put_failure;
		break;
	}

out:
@ -294,30 +231,33 @@ drop_alg:

static int crypto_dump_report(struct sk_buff *skb, struct netlink_callback *cb)
{
	struct crypto_alg *alg;
	const size_t start_pos = cb->args[0];
	size_t pos = 0;
	struct crypto_dump_info info;
	int err;

	if (cb->args[0])
		goto out;

	cb->args[0] = 1;
	struct crypto_alg *alg;
	int res;

	info.in_skb = cb->skb;
	info.out_skb = skb;
	info.nlmsg_seq = cb->nlh->nlmsg_seq;
	info.nlmsg_flags = NLM_F_MULTI;

	down_read(&crypto_alg_sem);
	list_for_each_entry(alg, &crypto_alg_list, cra_list) {
		err = crypto_report_alg(alg, &info);
		if (err)
			goto out_err;
		if (pos >= start_pos) {
			res = crypto_report_alg(alg, &info);
			if (res == -EMSGSIZE)
				break;
			if (res)
				goto out;
		}
		pos++;
	}

	cb->args[0] = pos;
	res = skb->len;
out:
	return skb->len;
out_err:
	return err;
	up_read(&crypto_alg_sem);
	return res;
}

static int crypto_dump_report_done(struct netlink_callback *cb)
@ -483,9 +423,7 @@ static const struct crypto_link {
						      .dump = crypto_dump_report,
						      .done = crypto_dump_report_done},
	[CRYPTO_MSG_DELRNG - CRYPTO_MSG_BASE] = { .doit = crypto_del_rng },
	[CRYPTO_MSG_GETSTAT - CRYPTO_MSG_BASE] = { .doit = crypto_reportstat,
						   .dump = crypto_dump_reportstat,
						   .done = crypto_dump_reportstat_done},
	[CRYPTO_MSG_GETSTAT - CRYPTO_MSG_BASE] = { .doit = crypto_reportstat},
};

static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
@ -505,7 +443,7 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
	if ((type == (CRYPTO_MSG_GETALG - CRYPTO_MSG_BASE) &&
	    (nlh->nlmsg_flags & NLM_F_DUMP))) {
		struct crypto_alg *alg;
		u16 dump_alloc = 0;
		unsigned long dump_alloc = 0;

		if (link->dump == NULL)
			return -EINVAL;
@ -513,16 +451,16 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
		down_read(&crypto_alg_sem);
		list_for_each_entry(alg, &crypto_alg_list, cra_list)
			dump_alloc += CRYPTO_REPORT_MAXSIZE;
		up_read(&crypto_alg_sem);

		{
			struct netlink_dump_control c = {
				.dump = link->dump,
				.done = link->done,
				.min_dump_alloc = dump_alloc,
				.min_dump_alloc = min(dump_alloc, 65535UL),
			};
			err = netlink_dump_start(crypto_nlsk, skb, nlh, &c);
		}
		up_read(&crypto_alg_sem);

		return err;
	}
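crypto_dump_report() above now supports incremental dumps: netlink invokes the same callback repeatedly, cb->args[] persists between invocations, so the function skips the pos entries it has already emitted, stops as soon as the reply skb fills up (-EMSGSIZE), and records where to resume. The general idiom, sketched with hypothetical names (struct item, item_list and fill_one() are illustrations, not kernel APIs):

	static int example_dump(struct sk_buff *skb, struct netlink_callback *cb)
	{
		size_t pos = 0, start = cb->args[0];
		struct item *it;

		list_for_each_entry(it, &item_list, list) {
			if (pos >= start && fill_one(skb, it) == -EMSGSIZE)
				break;	/* skb full: resume here on the next call */
			pos++;
		}
		cb->args[0] = pos;	/* position for the next invocation */
		return skb->len;	/* non-zero tells netlink to call again */
	}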
@ -33,260 +33,149 @@ struct crypto_dump_info {

static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_stat raead;
	u64 v64;
	u32 v32;
	struct crypto_stat_aead raead;

	memset(&raead, 0, sizeof(raead));

	strncpy(raead.type, "aead", sizeof(raead.type));
	strscpy(raead.type, "aead", sizeof(raead.type));

	v32 = atomic_read(&alg->encrypt_cnt);
	raead.stat_encrypt_cnt = v32;
	v64 = atomic64_read(&alg->encrypt_tlen);
	raead.stat_encrypt_tlen = v64;
	v32 = atomic_read(&alg->decrypt_cnt);
	raead.stat_decrypt_cnt = v32;
	v64 = atomic64_read(&alg->decrypt_tlen);
	raead.stat_decrypt_tlen = v64;
	v32 = atomic_read(&alg->aead_err_cnt);
	raead.stat_aead_err_cnt = v32;
	raead.stat_encrypt_cnt = atomic64_read(&alg->stats.aead.encrypt_cnt);
	raead.stat_encrypt_tlen = atomic64_read(&alg->stats.aead.encrypt_tlen);
	raead.stat_decrypt_cnt = atomic64_read(&alg->stats.aead.decrypt_cnt);
	raead.stat_decrypt_tlen = atomic64_read(&alg->stats.aead.decrypt_tlen);
	raead.stat_err_cnt = atomic64_read(&alg->stats.aead.err_cnt);

	if (nla_put(skb, CRYPTOCFGA_STAT_AEAD,
		    sizeof(struct crypto_stat), &raead))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_STAT_AEAD, sizeof(raead), &raead);
}

static int crypto_report_cipher(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_stat rcipher;
	u64 v64;
	u32 v32;
	struct crypto_stat_cipher rcipher;

	memset(&rcipher, 0, sizeof(rcipher));

	strlcpy(rcipher.type, "cipher", sizeof(rcipher.type));
	strscpy(rcipher.type, "cipher", sizeof(rcipher.type));

	v32 = atomic_read(&alg->encrypt_cnt);
	rcipher.stat_encrypt_cnt = v32;
	v64 = atomic64_read(&alg->encrypt_tlen);
	rcipher.stat_encrypt_tlen = v64;
	v32 = atomic_read(&alg->decrypt_cnt);
	rcipher.stat_decrypt_cnt = v32;
	v64 = atomic64_read(&alg->decrypt_tlen);
	rcipher.stat_decrypt_tlen = v64;
	v32 = atomic_read(&alg->cipher_err_cnt);
	rcipher.stat_cipher_err_cnt = v32;
	rcipher.stat_encrypt_cnt = atomic64_read(&alg->stats.cipher.encrypt_cnt);
	rcipher.stat_encrypt_tlen = atomic64_read(&alg->stats.cipher.encrypt_tlen);
	rcipher.stat_decrypt_cnt = atomic64_read(&alg->stats.cipher.decrypt_cnt);
	rcipher.stat_decrypt_tlen = atomic64_read(&alg->stats.cipher.decrypt_tlen);
	rcipher.stat_err_cnt = atomic64_read(&alg->stats.cipher.err_cnt);

	if (nla_put(skb, CRYPTOCFGA_STAT_CIPHER,
		    sizeof(struct crypto_stat), &rcipher))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_STAT_CIPHER, sizeof(rcipher), &rcipher);
}

static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_stat rcomp;
	u64 v64;
	u32 v32;
	struct crypto_stat_compress rcomp;

	memset(&rcomp, 0, sizeof(rcomp));

	strlcpy(rcomp.type, "compression", sizeof(rcomp.type));
	v32 = atomic_read(&alg->compress_cnt);
	rcomp.stat_compress_cnt = v32;
	v64 = atomic64_read(&alg->compress_tlen);
	rcomp.stat_compress_tlen = v64;
	v32 = atomic_read(&alg->decompress_cnt);
	rcomp.stat_decompress_cnt = v32;
	v64 = atomic64_read(&alg->decompress_tlen);
	rcomp.stat_decompress_tlen = v64;
	v32 = atomic_read(&alg->cipher_err_cnt);
	rcomp.stat_compress_err_cnt = v32;
	strscpy(rcomp.type, "compression", sizeof(rcomp.type));
	rcomp.stat_compress_cnt = atomic64_read(&alg->stats.compress.compress_cnt);
	rcomp.stat_compress_tlen = atomic64_read(&alg->stats.compress.compress_tlen);
	rcomp.stat_decompress_cnt = atomic64_read(&alg->stats.compress.decompress_cnt);
	rcomp.stat_decompress_tlen = atomic64_read(&alg->stats.compress.decompress_tlen);
	rcomp.stat_err_cnt = atomic64_read(&alg->stats.compress.err_cnt);

	if (nla_put(skb, CRYPTOCFGA_STAT_COMPRESS,
		    sizeof(struct crypto_stat), &rcomp))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_STAT_COMPRESS, sizeof(rcomp), &rcomp);
}

static int crypto_report_acomp(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_stat racomp;
	u64 v64;
	u32 v32;
	struct crypto_stat_compress racomp;

	memset(&racomp, 0, sizeof(racomp));

	strlcpy(racomp.type, "acomp", sizeof(racomp.type));
	v32 = atomic_read(&alg->compress_cnt);
	racomp.stat_compress_cnt = v32;
	v64 = atomic64_read(&alg->compress_tlen);
	racomp.stat_compress_tlen = v64;
	v32 = atomic_read(&alg->decompress_cnt);
	racomp.stat_decompress_cnt = v32;
	v64 = atomic64_read(&alg->decompress_tlen);
	racomp.stat_decompress_tlen = v64;
	v32 = atomic_read(&alg->cipher_err_cnt);
	racomp.stat_compress_err_cnt = v32;
	strscpy(racomp.type, "acomp", sizeof(racomp.type));
	racomp.stat_compress_cnt = atomic64_read(&alg->stats.compress.compress_cnt);
	racomp.stat_compress_tlen = atomic64_read(&alg->stats.compress.compress_tlen);
	racomp.stat_decompress_cnt = atomic64_read(&alg->stats.compress.decompress_cnt);
	racomp.stat_decompress_tlen = atomic64_read(&alg->stats.compress.decompress_tlen);
	racomp.stat_err_cnt = atomic64_read(&alg->stats.compress.err_cnt);

	if (nla_put(skb, CRYPTOCFGA_STAT_ACOMP,
		    sizeof(struct crypto_stat), &racomp))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_STAT_ACOMP, sizeof(racomp), &racomp);
}

static int crypto_report_akcipher(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_stat rakcipher;
	u64 v64;
	u32 v32;
	struct crypto_stat_akcipher rakcipher;

	memset(&rakcipher, 0, sizeof(rakcipher));

	strncpy(rakcipher.type, "akcipher", sizeof(rakcipher.type));
	v32 = atomic_read(&alg->encrypt_cnt);
	rakcipher.stat_encrypt_cnt = v32;
	v64 = atomic64_read(&alg->encrypt_tlen);
	rakcipher.stat_encrypt_tlen = v64;
	v32 = atomic_read(&alg->decrypt_cnt);
	rakcipher.stat_decrypt_cnt = v32;
	v64 = atomic64_read(&alg->decrypt_tlen);
	rakcipher.stat_decrypt_tlen = v64;
	v32 = atomic_read(&alg->sign_cnt);
	rakcipher.stat_sign_cnt = v32;
	v32 = atomic_read(&alg->verify_cnt);
	rakcipher.stat_verify_cnt = v32;
	v32 = atomic_read(&alg->akcipher_err_cnt);
	rakcipher.stat_akcipher_err_cnt = v32;
	strscpy(rakcipher.type, "akcipher", sizeof(rakcipher.type));
	rakcipher.stat_encrypt_cnt = atomic64_read(&alg->stats.akcipher.encrypt_cnt);
	rakcipher.stat_encrypt_tlen = atomic64_read(&alg->stats.akcipher.encrypt_tlen);
	rakcipher.stat_decrypt_cnt = atomic64_read(&alg->stats.akcipher.decrypt_cnt);
	rakcipher.stat_decrypt_tlen = atomic64_read(&alg->stats.akcipher.decrypt_tlen);
	rakcipher.stat_sign_cnt = atomic64_read(&alg->stats.akcipher.sign_cnt);
	rakcipher.stat_verify_cnt = atomic64_read(&alg->stats.akcipher.verify_cnt);
	rakcipher.stat_err_cnt = atomic64_read(&alg->stats.akcipher.err_cnt);

	if (nla_put(skb, CRYPTOCFGA_STAT_AKCIPHER,
		    sizeof(struct crypto_stat), &rakcipher))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_STAT_AKCIPHER,
		       sizeof(rakcipher), &rakcipher);
}

static int crypto_report_kpp(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_stat rkpp;
	u32 v;
	struct crypto_stat_kpp rkpp;

	memset(&rkpp, 0, sizeof(rkpp));

	strlcpy(rkpp.type, "kpp", sizeof(rkpp.type));
	strscpy(rkpp.type, "kpp", sizeof(rkpp.type));

	v = atomic_read(&alg->setsecret_cnt);
	rkpp.stat_setsecret_cnt = v;
	v = atomic_read(&alg->generate_public_key_cnt);
	rkpp.stat_generate_public_key_cnt = v;
	v = atomic_read(&alg->compute_shared_secret_cnt);
	rkpp.stat_compute_shared_secret_cnt = v;
	v = atomic_read(&alg->kpp_err_cnt);
	rkpp.stat_kpp_err_cnt = v;
	rkpp.stat_setsecret_cnt = atomic64_read(&alg->stats.kpp.setsecret_cnt);
	rkpp.stat_generate_public_key_cnt = atomic64_read(&alg->stats.kpp.generate_public_key_cnt);
	rkpp.stat_compute_shared_secret_cnt = atomic64_read(&alg->stats.kpp.compute_shared_secret_cnt);
	rkpp.stat_err_cnt = atomic64_read(&alg->stats.kpp.err_cnt);

	if (nla_put(skb, CRYPTOCFGA_STAT_KPP,
		    sizeof(struct crypto_stat), &rkpp))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_STAT_KPP, sizeof(rkpp), &rkpp);
}

static int crypto_report_ahash(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_stat rhash;
	u64 v64;
	u32 v32;
	struct crypto_stat_hash rhash;

	memset(&rhash, 0, sizeof(rhash));

	strncpy(rhash.type, "ahash", sizeof(rhash.type));
	strscpy(rhash.type, "ahash", sizeof(rhash.type));

	v32 = atomic_read(&alg->hash_cnt);
	rhash.stat_hash_cnt = v32;
	v64 = atomic64_read(&alg->hash_tlen);
	rhash.stat_hash_tlen = v64;
	v32 = atomic_read(&alg->hash_err_cnt);
	rhash.stat_hash_err_cnt = v32;
	rhash.stat_hash_cnt = atomic64_read(&alg->stats.hash.hash_cnt);
	rhash.stat_hash_tlen = atomic64_read(&alg->stats.hash.hash_tlen);
	rhash.stat_err_cnt = atomic64_read(&alg->stats.hash.err_cnt);

	if (nla_put(skb, CRYPTOCFGA_STAT_HASH,
		    sizeof(struct crypto_stat), &rhash))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_STAT_HASH, sizeof(rhash), &rhash);
}

static int crypto_report_shash(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_stat rhash;
	u64 v64;
	u32 v32;
	struct crypto_stat_hash rhash;

	memset(&rhash, 0, sizeof(rhash));

	strncpy(rhash.type, "shash", sizeof(rhash.type));
	strscpy(rhash.type, "shash", sizeof(rhash.type));

	v32 = atomic_read(&alg->hash_cnt);
	rhash.stat_hash_cnt = v32;
	v64 = atomic64_read(&alg->hash_tlen);
	rhash.stat_hash_tlen = v64;
	v32 = atomic_read(&alg->hash_err_cnt);
	rhash.stat_hash_err_cnt = v32;
	rhash.stat_hash_cnt = atomic64_read(&alg->stats.hash.hash_cnt);
	rhash.stat_hash_tlen = atomic64_read(&alg->stats.hash.hash_tlen);
	rhash.stat_err_cnt = atomic64_read(&alg->stats.hash.err_cnt);

	if (nla_put(skb, CRYPTOCFGA_STAT_HASH,
		    sizeof(struct crypto_stat), &rhash))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_STAT_HASH, sizeof(rhash), &rhash);
}

static int crypto_report_rng(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_stat rrng;
	u64 v64;
	u32 v32;
	struct crypto_stat_rng rrng;

	memset(&rrng, 0, sizeof(rrng));

	strncpy(rrng.type, "rng", sizeof(rrng.type));
	strscpy(rrng.type, "rng", sizeof(rrng.type));

	v32 = atomic_read(&alg->generate_cnt);
	rrng.stat_generate_cnt = v32;
	v64 = atomic64_read(&alg->generate_tlen);
	rrng.stat_generate_tlen = v64;
	v32 = atomic_read(&alg->seed_cnt);
	rrng.stat_seed_cnt = v32;
	v32 = atomic_read(&alg->hash_err_cnt);
	rrng.stat_rng_err_cnt = v32;
	rrng.stat_generate_cnt = atomic64_read(&alg->stats.rng.generate_cnt);
	rrng.stat_generate_tlen = atomic64_read(&alg->stats.rng.generate_tlen);
	rrng.stat_seed_cnt = atomic64_read(&alg->stats.rng.seed_cnt);
	rrng.stat_err_cnt = atomic64_read(&alg->stats.rng.err_cnt);

	if (nla_put(skb, CRYPTOCFGA_STAT_RNG,
		    sizeof(struct crypto_stat), &rrng))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_STAT_RNG, sizeof(rrng), &rrng);
}

static int crypto_reportstat_one(struct crypto_alg *alg,
@ -295,10 +184,10 @@ static int crypto_reportstat_one(struct crypto_alg *alg,
{
	memset(ualg, 0, sizeof(*ualg));

	strlcpy(ualg->cru_name, alg->cra_name, sizeof(ualg->cru_name));
	strlcpy(ualg->cru_driver_name, alg->cra_driver_name,
	strscpy(ualg->cru_name, alg->cra_name, sizeof(ualg->cru_name));
	strscpy(ualg->cru_driver_name, alg->cra_driver_name,
		sizeof(ualg->cru_driver_name));
	strlcpy(ualg->cru_module_name, module_name(alg->cra_module),
	strscpy(ualg->cru_module_name, module_name(alg->cra_module),
		sizeof(ualg->cru_module_name));

	ualg->cru_type = 0;
@ -309,12 +198,11 @@ static int crypto_reportstat_one(struct crypto_alg *alg,
	if (nla_put_u32(skb, CRYPTOCFGA_PRIORITY_VAL, alg->cra_priority))
		goto nla_put_failure;
	if (alg->cra_flags & CRYPTO_ALG_LARVAL) {
		struct crypto_stat rl;
		struct crypto_stat_larval rl;

		memset(&rl, 0, sizeof(rl));
		strlcpy(rl.type, "larval", sizeof(rl.type));
		if (nla_put(skb, CRYPTOCFGA_STAT_LARVAL,
			    sizeof(struct crypto_stat), &rl))
		strscpy(rl.type, "larval", sizeof(rl.type));
		if (nla_put(skb, CRYPTOCFGA_STAT_LARVAL, sizeof(rl), &rl))
			goto nla_put_failure;
		goto out;
	}
@ -448,37 +336,4 @@ drop_alg:
	return nlmsg_unicast(crypto_nlsk, skb, NETLINK_CB(in_skb).portid);
}

int crypto_dump_reportstat(struct sk_buff *skb, struct netlink_callback *cb)
{
	struct crypto_alg *alg;
	struct crypto_dump_info info;
	int err;

	if (cb->args[0])
		goto out;

	cb->args[0] = 1;

	info.in_skb = cb->skb;
	info.out_skb = skb;
	info.nlmsg_seq = cb->nlh->nlmsg_seq;
	info.nlmsg_flags = NLM_F_MULTI;

	list_for_each_entry(alg, &crypto_alg_list, cra_list) {
		err = crypto_reportstat_alg(alg, &info);
		if (err)
			goto out_err;
	}

out:
	return skb->len;
out_err:
	return err;
}

int crypto_dump_reportstat_done(struct netlink_callback *cb)
{
	return 0;
}

MODULE_LICENSE("GPL");
@ -233,8 +233,6 @@ static struct crypto_instance *crypto_ctr_alloc(struct rtattr **tb)
	inst->alg.cra_blkcipher.encrypt = crypto_ctr_crypt;
	inst->alg.cra_blkcipher.decrypt = crypto_ctr_crypt;

	inst->alg.cra_blkcipher.geniv = "chainiv";

out:
	crypto_mod_put(alg);
	return inst;
58
crypto/ecc.c
58
crypto/ecc.c
@ -842,15 +842,23 @@ static void xycz_add_c(u64 *x1, u64 *y1, u64 *x2, u64 *y2, u64 *curve_prime,

static void ecc_point_mult(struct ecc_point *result,
			   const struct ecc_point *point, const u64 *scalar,
			   u64 *initial_z, u64 *curve_prime,
			   u64 *initial_z, const struct ecc_curve *curve,
			   unsigned int ndigits)
{
	/* R0 and R1 */
	u64 rx[2][ECC_MAX_DIGITS];
	u64 ry[2][ECC_MAX_DIGITS];
	u64 z[ECC_MAX_DIGITS];
	u64 sk[2][ECC_MAX_DIGITS];
	u64 *curve_prime = curve->p;
	int i, nb;
	int num_bits = vli_num_bits(scalar, ndigits);
	int num_bits;
	int carry;

	carry = vli_add(sk[0], scalar, curve->n, ndigits);
	vli_add(sk[1], sk[0], curve->n, ndigits);
	scalar = sk[!carry];
	num_bits = sizeof(u64) * ndigits * 8 + 1;

	vli_set(rx[1], point->x, ndigits);
	vli_set(ry[1], point->y, ndigits);
@ -904,28 +912,41 @@ static inline void ecc_swap_digits(const u64 *in, u64 *out,
		out[i] = __swab64(in[ndigits - 1 - i]);
}

static int __ecc_is_key_valid(const struct ecc_curve *curve,
			      const u64 *private_key, unsigned int ndigits)
{
	u64 one[ECC_MAX_DIGITS] = { 1, };
	u64 res[ECC_MAX_DIGITS];

	if (!private_key)
		return -EINVAL;

	if (curve->g.ndigits != ndigits)
		return -EINVAL;

	/* Make sure the private key is in the range [2, n-3]. */
	if (vli_cmp(one, private_key, ndigits) != -1)
		return -EINVAL;
	vli_sub(res, curve->n, one, ndigits);
	vli_sub(res, res, one, ndigits);
	if (vli_cmp(res, private_key, ndigits) != 1)
		return -EINVAL;

	return 0;
}

int ecc_is_key_valid(unsigned int curve_id, unsigned int ndigits,
		     const u64 *private_key, unsigned int private_key_len)
{
	int nbytes;
	const struct ecc_curve *curve = ecc_get_curve(curve_id);

	if (!private_key)
		return -EINVAL;

	nbytes = ndigits << ECC_DIGITS_TO_BYTES_SHIFT;

	if (private_key_len != nbytes)
		return -EINVAL;

	if (vli_is_zero(private_key, ndigits))
		return -EINVAL;

	/* Make sure the private key is in the range [1, n-1]. */
	if (vli_cmp(curve->n, private_key, ndigits) != 1)
		return -EINVAL;

	return 0;
	return __ecc_is_key_valid(curve, private_key, ndigits);
}

/*
@ -971,11 +992,8 @@ int ecc_gen_privkey(unsigned int curve_id, unsigned int ndigits, u64 *privkey)
	if (err)
		return err;

	if (vli_is_zero(priv, ndigits))
		return -EINVAL;

	/* Make sure the private key is in the range [1, n-1]. */
	if (vli_cmp(curve->n, priv, ndigits) != 1)
	/* Make sure the private key is in the valid range. */
	if (__ecc_is_key_valid(curve, priv, ndigits))
		return -EINVAL;

	ecc_swap_digits(priv, privkey, ndigits);
@ -1004,7 +1022,7 @@ int ecc_make_pub_key(unsigned int curve_id, unsigned int ndigits,
		goto out;
	}

	ecc_point_mult(pk, &curve->g, priv, NULL, curve->p, ndigits);
	ecc_point_mult(pk, &curve->g, priv, NULL, curve, ndigits);
	if (ecc_point_is_zero(pk)) {
		ret = -EAGAIN;
		goto err_free_point;
@ -1090,7 +1108,7 @@ int crypto_ecdh_shared_secret(unsigned int curve_id, unsigned int ndigits,
		goto err_alloc_product;
	}

	ecc_point_mult(product, pk, priv, rand_z, curve->p, ndigits);
	ecc_point_mult(product, pk, priv, rand_z, curve, ndigits);

	ecc_swap_digits(product->x, secret, ndigits);
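The regularization added above (carry = vli_add(sk[0], scalar, curve->n, ...); scalar = sk[!carry]) makes the Montgomery ladder run a fixed num_bits = 64 * ndigits + 1 iterations for every scalar. A toy single-word sketch of the selection logic (standalone C; a uint32_t stands in for the ndigits-word vli values, and the curve order n is a hypothetical example chosen, like real curve orders, close to the word-size power of two):

#include <stdint.h>
#include <stdio.h>

int main(void)
{
	/* 'n' plays the role of curve->n; the kernel's vli_add() truncates
	 * sk[0] to ndigits words and returns the carry, which a uint64_t
	 * lets us keep explicitly here. */
	uint32_t n = 0xfffffffb;	/* hypothetical, near 2^32 */
	uint32_t scalars[] = { 1, 0x1234, 0xfffffffa };

	for (int i = 0; i < 3; i++) {
		uint64_t sk0 = (uint64_t)scalars[i] + n;	/* sk[0] = scalar + n  */
		uint64_t sk1 = sk0 + n;				/* sk[1] = scalar + 2n */
		int carry = sk0 >> 32;				/* carry out of sk[0]  */
		uint64_t sk = carry ? sk0 : sk1;		/* sk[!carry]          */

		/* bit 32 of the selected sum is always set, so a ladder over
		 * it runs a fixed 33 (= word size + 1) iterations regardless
		 * of the scalar's actual bit length. */
		printf("scalar=%#x -> sk=%#llx (top bit=%d)\n", scalars[i],
		       (unsigned long long)sk, (int)(sk >> 32));
	}
	return 0;
}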
@ -32,6 +32,8 @@ const char *const hash_algo_name[HASH_ALGO__LAST] = {
	[HASH_ALGO_TGR_160]	= "tgr160",
	[HASH_ALGO_TGR_192]	= "tgr192",
	[HASH_ALGO_SM3_256]	= "sm3-256",
	[HASH_ALGO_STREEBOG_256] = "streebog256",
	[HASH_ALGO_STREEBOG_512] = "streebog512",
};
EXPORT_SYMBOL_GPL(hash_algo_name);

@ -54,5 +56,7 @@ const int hash_digest_size[HASH_ALGO__LAST] = {
	[HASH_ALGO_TGR_160]	= TGR160_DIGEST_SIZE,
	[HASH_ALGO_TGR_192]	= TGR192_DIGEST_SIZE,
	[HASH_ALGO_SM3_256]	= SM3256_DIGEST_SIZE,
	[HASH_ALGO_STREEBOG_256] = STREEBOG256_DIGEST_SIZE,
	[HASH_ALGO_STREEBOG_512] = STREEBOG512_DIGEST_SIZE,
};
EXPORT_SYMBOL_GPL(hash_digest_size);
10
crypto/kpp.c
@ -30,15 +30,11 @@ static int crypto_kpp_report(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_report_kpp rkpp;

	strncpy(rkpp.type, "kpp", sizeof(rkpp.type));
	memset(&rkpp, 0, sizeof(rkpp));

	if (nla_put(skb, CRYPTOCFGA_REPORT_KPP,
		    sizeof(struct crypto_report_kpp), &rkpp))
		goto nla_put_failure;
	return 0;
	strscpy(rkpp.type, "kpp", sizeof(rkpp.type));

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_REPORT_KPP, sizeof(rkpp), &rkpp);
}
#else
static int crypto_kpp_report(struct sk_buff *skb, struct crypto_alg *alg)
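The same three-step rewrite recurs below for the rng, scompress, shash, and skcipher report helpers: zero the whole report structure first (so stack padding can never leak to userspace through netlink), copy the type string with the always-NUL-terminating strscpy(), and return nla_put() directly instead of the goto dance. Composed, the resulting helper reads:

/* Post-patch shape of the report helpers; field names as in crypto/kpp.c
 * above, and likewise for the rng/scomp/shash/skcipher variants below. */
static int crypto_kpp_report(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_report_kpp rkpp;

	memset(&rkpp, 0, sizeof(rkpp));			/* no uninitialized padding */

	strscpy(rkpp.type, "kpp", sizeof(rkpp.type));	/* always NUL-terminated */

	return nla_put(skb, CRYPTOCFGA_REPORT_KPP, sizeof(rkpp), &rkpp);
}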
@ -122,7 +122,6 @@ static struct crypto_alg alg_lz4 = {
	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
	.cra_ctxsize		= sizeof(struct lz4_ctx),
	.cra_module		= THIS_MODULE,
	.cra_list		= LIST_HEAD_INIT(alg_lz4.cra_list),
	.cra_init		= lz4_init,
	.cra_exit		= lz4_exit,
	.cra_u			= { .compress = {
@ -123,7 +123,6 @@ static struct crypto_alg alg_lz4hc = {
	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
	.cra_ctxsize		= sizeof(struct lz4hc_ctx),
	.cra_module		= THIS_MODULE,
	.cra_list		= LIST_HEAD_INIT(alg_lz4hc.cra_list),
	.cra_init		= lz4hc_init,
	.cra_exit		= lz4hc_exit,
	.cra_u			= { .compress = {
254
crypto/nhpoly1305.c
Normal file
@ -0,0 +1,254 @@
// SPDX-License-Identifier: GPL-2.0
/*
 * NHPoly1305 - ε-almost-∆-universal hash function for Adiantum
 *
 * Copyright 2018 Google LLC
 */

/*
 * "NHPoly1305" is the main component of Adiantum hashing.
 * Specifically, it is the calculation
 *
 *	H_L ← Poly1305_{K_L}(NH_{K_N}(pad_{128}(L)))
 *
 * from the procedure in section 6.4 of the Adiantum paper [1].  It is an
 * ε-almost-∆-universal (ε-∆U) hash function for equal-length inputs over
 * Z/(2^{128}Z), where the "∆" operation is addition.  It hashes 1024-byte
 * chunks of the input with the NH hash function [2], reducing the input length
 * by 32x.  The resulting NH digests are evaluated as a polynomial in
 * GF(2^{130}-5), like in the Poly1305 MAC [3].  Note that the polynomial
 * evaluation by itself would suffice to achieve the ε-∆U property; NH is used
 * for performance since it's over twice as fast as Poly1305.
 *
 * This is *not* a cryptographic hash function; do not use it as such!
 *
 * [1] Adiantum: length-preserving encryption for entry-level processors
 *     (https://eprint.iacr.org/2018/720.pdf)
 * [2] UMAC: Fast and Secure Message Authentication
 *     (https://fastcrypto.org/umac/umac_proc.pdf)
 * [3] The Poly1305-AES message-authentication code
 *     (https://cr.yp.to/mac/poly1305-20050329.pdf)
 */

#include <asm/unaligned.h>
#include <crypto/algapi.h>
#include <crypto/internal/hash.h>
#include <crypto/nhpoly1305.h>
#include <linux/crypto.h>
#include <linux/kernel.h>
#include <linux/module.h>

static void nh_generic(const u32 *key, const u8 *message, size_t message_len,
		       __le64 hash[NH_NUM_PASSES])
{
	u64 sums[4] = { 0, 0, 0, 0 };

	BUILD_BUG_ON(NH_PAIR_STRIDE != 2);
	BUILD_BUG_ON(NH_NUM_PASSES != 4);

	while (message_len) {
		u32 m0 = get_unaligned_le32(message + 0);
		u32 m1 = get_unaligned_le32(message + 4);
		u32 m2 = get_unaligned_le32(message + 8);
		u32 m3 = get_unaligned_le32(message + 12);

		sums[0] += (u64)(u32)(m0 + key[ 0]) * (u32)(m2 + key[ 2]);
		sums[1] += (u64)(u32)(m0 + key[ 4]) * (u32)(m2 + key[ 6]);
		sums[2] += (u64)(u32)(m0 + key[ 8]) * (u32)(m2 + key[10]);
		sums[3] += (u64)(u32)(m0 + key[12]) * (u32)(m2 + key[14]);
		sums[0] += (u64)(u32)(m1 + key[ 1]) * (u32)(m3 + key[ 3]);
		sums[1] += (u64)(u32)(m1 + key[ 5]) * (u32)(m3 + key[ 7]);
		sums[2] += (u64)(u32)(m1 + key[ 9]) * (u32)(m3 + key[11]);
		sums[3] += (u64)(u32)(m1 + key[13]) * (u32)(m3 + key[15]);
		key += NH_MESSAGE_UNIT / sizeof(key[0]);
		message += NH_MESSAGE_UNIT;
		message_len -= NH_MESSAGE_UNIT;
	}

	hash[0] = cpu_to_le64(sums[0]);
	hash[1] = cpu_to_le64(sums[1]);
	hash[2] = cpu_to_le64(sums[2]);
	hash[3] = cpu_to_le64(sums[3]);
}

/* Pass the next NH hash value through Poly1305 */
static void process_nh_hash_value(struct nhpoly1305_state *state,
				  const struct nhpoly1305_key *key)
{
	BUILD_BUG_ON(NH_HASH_BYTES % POLY1305_BLOCK_SIZE != 0);

	poly1305_core_blocks(&state->poly_state, &key->poly_key, state->nh_hash,
			     NH_HASH_BYTES / POLY1305_BLOCK_SIZE);
}

/*
 * Feed the next portion of the source data, as a whole number of 16-byte
 * "NH message units", through NH and Poly1305.  Each NH hash is taken over
 * 1024 bytes, except possibly the final one which is taken over a multiple of
 * 16 bytes up to 1024.  Also, in the case where data is passed in misaligned
 * chunks, we combine partial hashes; the end result is the same either way.
 */
static void nhpoly1305_units(struct nhpoly1305_state *state,
			     const struct nhpoly1305_key *key,
			     const u8 *src, unsigned int srclen, nh_t nh_fn)
{
	do {
		unsigned int bytes;

		if (state->nh_remaining == 0) {
			/* Starting a new NH message */
			bytes = min_t(unsigned int, srclen, NH_MESSAGE_BYTES);
			nh_fn(key->nh_key, src, bytes, state->nh_hash);
			state->nh_remaining = NH_MESSAGE_BYTES - bytes;
		} else {
			/* Continuing a previous NH message */
			__le64 tmp_hash[NH_NUM_PASSES];
			unsigned int pos;
			int i;

			pos = NH_MESSAGE_BYTES - state->nh_remaining;
			bytes = min(srclen, state->nh_remaining);
			nh_fn(&key->nh_key[pos / 4], src, bytes, tmp_hash);
			for (i = 0; i < NH_NUM_PASSES; i++)
				le64_add_cpu(&state->nh_hash[i],
					     le64_to_cpu(tmp_hash[i]));
			state->nh_remaining -= bytes;
		}
		if (state->nh_remaining == 0)
			process_nh_hash_value(state, key);
		src += bytes;
		srclen -= bytes;
	} while (srclen);
}

int crypto_nhpoly1305_setkey(struct crypto_shash *tfm,
			     const u8 *key, unsigned int keylen)
{
	struct nhpoly1305_key *ctx = crypto_shash_ctx(tfm);
	int i;

	if (keylen != NHPOLY1305_KEY_SIZE)
		return -EINVAL;

	poly1305_core_setkey(&ctx->poly_key, key);
	key += POLY1305_BLOCK_SIZE;

	for (i = 0; i < NH_KEY_WORDS; i++)
		ctx->nh_key[i] = get_unaligned_le32(key + i * sizeof(u32));

	return 0;
}
EXPORT_SYMBOL(crypto_nhpoly1305_setkey);

int crypto_nhpoly1305_init(struct shash_desc *desc)
{
	struct nhpoly1305_state *state = shash_desc_ctx(desc);

	poly1305_core_init(&state->poly_state);
	state->buflen = 0;
	state->nh_remaining = 0;
	return 0;
}
EXPORT_SYMBOL(crypto_nhpoly1305_init);

int crypto_nhpoly1305_update_helper(struct shash_desc *desc,
				    const u8 *src, unsigned int srclen,
				    nh_t nh_fn)
{
	struct nhpoly1305_state *state = shash_desc_ctx(desc);
	const struct nhpoly1305_key *key = crypto_shash_ctx(desc->tfm);
	unsigned int bytes;

	if (state->buflen) {
		bytes = min(srclen, (int)NH_MESSAGE_UNIT - state->buflen);
		memcpy(&state->buffer[state->buflen], src, bytes);
		state->buflen += bytes;
		if (state->buflen < NH_MESSAGE_UNIT)
			return 0;
		nhpoly1305_units(state, key, state->buffer, NH_MESSAGE_UNIT,
				 nh_fn);
		state->buflen = 0;
		src += bytes;
		srclen -= bytes;
	}

	if (srclen >= NH_MESSAGE_UNIT) {
		bytes = round_down(srclen, NH_MESSAGE_UNIT);
		nhpoly1305_units(state, key, src, bytes, nh_fn);
		src += bytes;
		srclen -= bytes;
	}

	if (srclen) {
		memcpy(state->buffer, src, srclen);
		state->buflen = srclen;
	}
	return 0;
}
EXPORT_SYMBOL(crypto_nhpoly1305_update_helper);

int crypto_nhpoly1305_update(struct shash_desc *desc,
			     const u8 *src, unsigned int srclen)
{
	return crypto_nhpoly1305_update_helper(desc, src, srclen, nh_generic);
}
EXPORT_SYMBOL(crypto_nhpoly1305_update);

int crypto_nhpoly1305_final_helper(struct shash_desc *desc, u8 *dst, nh_t nh_fn)
{
	struct nhpoly1305_state *state = shash_desc_ctx(desc);
	const struct nhpoly1305_key *key = crypto_shash_ctx(desc->tfm);

	if (state->buflen) {
		memset(&state->buffer[state->buflen], 0,
		       NH_MESSAGE_UNIT - state->buflen);
		nhpoly1305_units(state, key, state->buffer, NH_MESSAGE_UNIT,
				 nh_fn);
	}

	if (state->nh_remaining)
		process_nh_hash_value(state, key);

	poly1305_core_emit(&state->poly_state, dst);
	return 0;
}
EXPORT_SYMBOL(crypto_nhpoly1305_final_helper);

int crypto_nhpoly1305_final(struct shash_desc *desc, u8 *dst)
{
	return crypto_nhpoly1305_final_helper(desc, dst, nh_generic);
}
EXPORT_SYMBOL(crypto_nhpoly1305_final);

static struct shash_alg nhpoly1305_alg = {
	.base.cra_name		= "nhpoly1305",
	.base.cra_driver_name	= "nhpoly1305-generic",
	.base.cra_priority	= 100,
	.base.cra_ctxsize	= sizeof(struct nhpoly1305_key),
	.base.cra_module	= THIS_MODULE,
	.digestsize		= POLY1305_DIGEST_SIZE,
	.init			= crypto_nhpoly1305_init,
	.update			= crypto_nhpoly1305_update,
	.final			= crypto_nhpoly1305_final,
	.setkey			= crypto_nhpoly1305_setkey,
	.descsize		= sizeof(struct nhpoly1305_state),
};

static int __init nhpoly1305_mod_init(void)
{
	return crypto_register_shash(&nhpoly1305_alg);
}

static void __exit nhpoly1305_mod_exit(void)
{
	crypto_unregister_shash(&nhpoly1305_alg);
}

module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);

MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
MODULE_ALIAS_CRYPTO("nhpoly1305-generic");
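Because nhpoly1305 registers as an ordinary shash, nothing Adiantum-specific is needed to drive it. A minimal in-kernel sketch of keying and hashing it through the generic shash API (error handling abbreviated; the key must be NHPOLY1305_KEY_SIZE bytes, and on this kernel generation shash_desc still carries a flags field, cleared here):

#include <crypto/hash.h>

/* Sketch: hash 'data' with nhpoly1305 via the generic shash API.
 * 'out' receives POLY1305_DIGEST_SIZE (16) bytes. */
static int nhpoly1305_digest_example(const u8 *key, const u8 *data,
				     unsigned int len, u8 *out)
{
	struct crypto_shash *tfm;
	int err;

	tfm = crypto_alloc_shash("nhpoly1305", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_shash_setkey(tfm, key, NHPOLY1305_KEY_SIZE);
	if (!err) {
		SHASH_DESC_ON_STACK(desc, tfm);

		desc->tfm = tfm;
		desc->flags = 0;	/* synchronous, may not sleep */
		err = crypto_shash_digest(desc, data, len, out);
	}
	crypto_free_shash(tfm);
	return err;
}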
@ -394,7 +394,7 @@ static int pcrypt_sysfs_add(struct padata_instance *pinst, const char *name)
	int ret;

	pinst->kobj.kset = pcrypt_kset;
	ret = kobject_add(&pinst->kobj, NULL, name);
	ret = kobject_add(&pinst->kobj, NULL, "%s", name);
	if (!ret)
		kobject_uevent(&pinst->kobj, KOBJ_ADD);
@ -38,7 +38,7 @@ int crypto_poly1305_init(struct shash_desc *desc)
{
	struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);

	memset(dctx->h, 0, sizeof(dctx->h));
	poly1305_core_init(&dctx->h);
	dctx->buflen = 0;
	dctx->rset = false;
	dctx->sset = false;
@ -47,23 +47,16 @@ int crypto_poly1305_init(struct shash_desc *desc)
}
EXPORT_SYMBOL_GPL(crypto_poly1305_init);

static void poly1305_setrkey(struct poly1305_desc_ctx *dctx, const u8 *key)
void poly1305_core_setkey(struct poly1305_key *key, const u8 *raw_key)
{
	/* r &= 0xffffffc0ffffffc0ffffffc0fffffff */
	dctx->r[0] = (get_unaligned_le32(key +  0) >> 0) & 0x3ffffff;
	dctx->r[1] = (get_unaligned_le32(key +  3) >> 2) & 0x3ffff03;
	dctx->r[2] = (get_unaligned_le32(key +  6) >> 4) & 0x3ffc0ff;
	dctx->r[3] = (get_unaligned_le32(key +  9) >> 6) & 0x3f03fff;
	dctx->r[4] = (get_unaligned_le32(key + 12) >> 8) & 0x00fffff;
}

static void poly1305_setskey(struct poly1305_desc_ctx *dctx, const u8 *key)
{
	dctx->s[0] = get_unaligned_le32(key +  0);
	dctx->s[1] = get_unaligned_le32(key +  4);
	dctx->s[2] = get_unaligned_le32(key +  8);
	dctx->s[3] = get_unaligned_le32(key + 12);
	key->r[0] = (get_unaligned_le32(raw_key +  0) >> 0) & 0x3ffffff;
	key->r[1] = (get_unaligned_le32(raw_key +  3) >> 2) & 0x3ffff03;
	key->r[2] = (get_unaligned_le32(raw_key +  6) >> 4) & 0x3ffc0ff;
	key->r[3] = (get_unaligned_le32(raw_key +  9) >> 6) & 0x3f03fff;
	key->r[4] = (get_unaligned_le32(raw_key + 12) >> 8) & 0x00fffff;
}
EXPORT_SYMBOL_GPL(poly1305_core_setkey);

/*
 * Poly1305 requires a unique key for each tag, which implies that we can't set
@ -75,13 +68,16 @@ unsigned int crypto_poly1305_setdesckey(struct poly1305_desc_ctx *dctx,
{
	if (!dctx->sset) {
		if (!dctx->rset && srclen >= POLY1305_BLOCK_SIZE) {
			poly1305_setrkey(dctx, src);
			poly1305_core_setkey(&dctx->r, src);
			src += POLY1305_BLOCK_SIZE;
			srclen -= POLY1305_BLOCK_SIZE;
			dctx->rset = true;
		}
		if (srclen >= POLY1305_BLOCK_SIZE) {
			poly1305_setskey(dctx, src);
			dctx->s[0] = get_unaligned_le32(src +  0);
			dctx->s[1] = get_unaligned_le32(src +  4);
			dctx->s[2] = get_unaligned_le32(src +  8);
			dctx->s[3] = get_unaligned_le32(src + 12);
			src += POLY1305_BLOCK_SIZE;
			srclen -= POLY1305_BLOCK_SIZE;
			dctx->sset = true;
@ -91,41 +87,37 @@ unsigned int crypto_poly1305_setdesckey(struct poly1305_desc_ctx *dctx,
}
EXPORT_SYMBOL_GPL(crypto_poly1305_setdesckey);

static unsigned int poly1305_blocks(struct poly1305_desc_ctx *dctx,
				    const u8 *src, unsigned int srclen,
				    u32 hibit)
static void poly1305_blocks_internal(struct poly1305_state *state,
				     const struct poly1305_key *key,
				     const void *src, unsigned int nblocks,
				     u32 hibit)
{
	u32 r0, r1, r2, r3, r4;
	u32 s1, s2, s3, s4;
	u32 h0, h1, h2, h3, h4;
	u64 d0, d1, d2, d3, d4;
	unsigned int datalen;

	if (unlikely(!dctx->sset)) {
		datalen = crypto_poly1305_setdesckey(dctx, src, srclen);
		src += srclen - datalen;
		srclen = datalen;
	}
	if (!nblocks)
		return;

	r0 = dctx->r[0];
	r1 = dctx->r[1];
	r2 = dctx->r[2];
	r3 = dctx->r[3];
	r4 = dctx->r[4];
	r0 = key->r[0];
	r1 = key->r[1];
	r2 = key->r[2];
	r3 = key->r[3];
	r4 = key->r[4];

	s1 = r1 * 5;
	s2 = r2 * 5;
	s3 = r3 * 5;
	s4 = r4 * 5;

	h0 = dctx->h[0];
	h1 = dctx->h[1];
	h2 = dctx->h[2];
	h3 = dctx->h[3];
	h4 = dctx->h[4];

	while (likely(srclen >= POLY1305_BLOCK_SIZE)) {
	h0 = state->h[0];
	h1 = state->h[1];
	h2 = state->h[2];
	h3 = state->h[3];
	h4 = state->h[4];

	do {
		/* h += m[i] */
		h0 += (get_unaligned_le32(src +  0) >> 0) & 0x3ffffff;
		h1 += (get_unaligned_le32(src +  3) >> 2) & 0x3ffffff;
@ -154,16 +146,36 @@ static unsigned int poly1305_blocks(struct poly1305_desc_ctx *dctx,
		h1 += h0 >> 26;       h0 = h0 & 0x3ffffff;

		src += POLY1305_BLOCK_SIZE;
		srclen -= POLY1305_BLOCK_SIZE;
	} while (--nblocks);

	state->h[0] = h0;
	state->h[1] = h1;
	state->h[2] = h2;
	state->h[3] = h3;
	state->h[4] = h4;
}

void poly1305_core_blocks(struct poly1305_state *state,
			  const struct poly1305_key *key,
			  const void *src, unsigned int nblocks)
{
	poly1305_blocks_internal(state, key, src, nblocks, 1 << 24);
}
EXPORT_SYMBOL_GPL(poly1305_core_blocks);

static void poly1305_blocks(struct poly1305_desc_ctx *dctx,
			    const u8 *src, unsigned int srclen, u32 hibit)
{
	unsigned int datalen;

	if (unlikely(!dctx->sset)) {
		datalen = crypto_poly1305_setdesckey(dctx, src, srclen);
		src += srclen - datalen;
		srclen = datalen;
	}

	dctx->h[0] = h0;
	dctx->h[1] = h1;
	dctx->h[2] = h2;
	dctx->h[3] = h3;
	dctx->h[4] = h4;

	return srclen;
	poly1305_blocks_internal(&dctx->h, &dctx->r,
				 src, srclen / POLY1305_BLOCK_SIZE, hibit);
}

int crypto_poly1305_update(struct shash_desc *desc,
@ -187,9 +199,9 @@ int crypto_poly1305_update(struct shash_desc *desc,
	}

	if (likely(srclen >= POLY1305_BLOCK_SIZE)) {
		bytes = poly1305_blocks(dctx, src, srclen, 1 << 24);
		src += srclen - bytes;
		srclen = bytes;
		poly1305_blocks(dctx, src, srclen, 1 << 24);
		src += srclen - (srclen % POLY1305_BLOCK_SIZE);
		srclen %= POLY1305_BLOCK_SIZE;
	}

	if (unlikely(srclen)) {
@ -201,30 +213,18 @@ int crypto_poly1305_update(struct shash_desc *desc,
}
EXPORT_SYMBOL_GPL(crypto_poly1305_update);

int crypto_poly1305_final(struct shash_desc *desc, u8 *dst)
void poly1305_core_emit(const struct poly1305_state *state, void *dst)
{
	struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
	u32 h0, h1, h2, h3, h4;
	u32 g0, g1, g2, g3, g4;
	u32 mask;
	u64 f = 0;

	if (unlikely(!dctx->sset))
		return -ENOKEY;

	if (unlikely(dctx->buflen)) {
		dctx->buf[dctx->buflen++] = 1;
		memset(dctx->buf + dctx->buflen, 0,
		       POLY1305_BLOCK_SIZE - dctx->buflen);
		poly1305_blocks(dctx, dctx->buf, POLY1305_BLOCK_SIZE, 0);
	}

	/* fully carry h */
	h0 = dctx->h[0];
	h1 = dctx->h[1];
	h2 = dctx->h[2];
	h3 = dctx->h[3];
	h4 = dctx->h[4];
	h0 = state->h[0];
	h1 = state->h[1];
	h2 = state->h[2];
	h3 = state->h[3];
	h4 = state->h[4];

	h2 += (h1 >> 26);     h1 = h1 & 0x3ffffff;
	h3 += (h2 >> 26);     h2 = h2 & 0x3ffffff;
@ -254,16 +254,40 @@ int crypto_poly1305_final(struct shash_desc *desc, u8 *dst)
	h4 = (h4 & mask) | g4;

	/* h = h % (2^128) */
	h0 = (h0 >>  0) | (h1 << 26);
	h1 = (h1 >>  6) | (h2 << 20);
	h2 = (h2 >> 12) | (h3 << 14);
	h3 = (h3 >> 18) | (h4 <<  8);
	put_unaligned_le32((h0 >>  0) | (h1 << 26), dst +  0);
	put_unaligned_le32((h1 >>  6) | (h2 << 20), dst +  4);
	put_unaligned_le32((h2 >> 12) | (h3 << 14), dst +  8);
	put_unaligned_le32((h3 >> 18) | (h4 <<  8), dst + 12);
}
EXPORT_SYMBOL_GPL(poly1305_core_emit);

int crypto_poly1305_final(struct shash_desc *desc, u8 *dst)
{
	struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
	__le32 digest[4];
	u64 f = 0;

	if (unlikely(!dctx->sset))
		return -ENOKEY;

	if (unlikely(dctx->buflen)) {
		dctx->buf[dctx->buflen++] = 1;
		memset(dctx->buf + dctx->buflen, 0,
		       POLY1305_BLOCK_SIZE - dctx->buflen);
		poly1305_blocks(dctx, dctx->buf, POLY1305_BLOCK_SIZE, 0);
	}

	poly1305_core_emit(&dctx->h, digest);

	/* mac = (h + s) % (2^128) */
	f = (f >> 32) + h0 + dctx->s[0]; put_unaligned_le32(f, dst +  0);
	f = (f >> 32) + h1 + dctx->s[1]; put_unaligned_le32(f, dst +  4);
	f = (f >> 32) + h2 + dctx->s[2]; put_unaligned_le32(f, dst +  8);
	f = (f >> 32) + h3 + dctx->s[3]; put_unaligned_le32(f, dst + 12);
	f = (f >> 32) + le32_to_cpu(digest[0]) + dctx->s[0];
	put_unaligned_le32(f, dst + 0);
	f = (f >> 32) + le32_to_cpu(digest[1]) + dctx->s[1];
	put_unaligned_le32(f, dst + 4);
	f = (f >> 32) + le32_to_cpu(digest[2]) + dctx->s[2];
	put_unaligned_le32(f, dst + 8);
	f = (f >> 32) + le32_to_cpu(digest[3]) + dctx->s[3];
	put_unaligned_le32(f, dst + 12);

	return 0;
}
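The refactor above splits a keyless "core" API (poly1305_core_setkey/init/blocks/emit) out of the shash glue, which is exactly what nhpoly1305.c consumes. A minimal sketch of how the four calls compose over whole 16-byte blocks; the r-clamping happens inside setkey, and the final "+ s mod 2^128" step is deliberately left to the caller, just as crypto_poly1305_final() does after poly1305_core_emit():

#include <crypto/poly1305.h>

/* Sketch: evaluate the Poly1305 polynomial over 'nblocks' blocks with the
 * core API.  'raw_r' is the 16-byte r half of the key; no 's' is added. */
static void poly1305_core_example(const u8 raw_r[POLY1305_BLOCK_SIZE],
				  const void *blocks, unsigned int nblocks,
				  u8 acc[POLY1305_DIGEST_SIZE])
{
	struct poly1305_key r;
	struct poly1305_state state;

	poly1305_core_setkey(&r, raw_r);		/* clamp and split r  */
	poly1305_core_init(&state);			/* h = 0              */
	poly1305_core_blocks(&state, &r, blocks, nblocks);
	poly1305_core_emit(&state, acc);		/* h mod 2^128, no +s */
}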
16
crypto/rng.c
@ -35,9 +35,11 @@ static int crypto_default_rng_refcnt;

int crypto_rng_reset(struct crypto_rng *tfm, const u8 *seed, unsigned int slen)
{
	struct crypto_alg *alg = tfm->base.__crt_alg;
	u8 *buf = NULL;
	int err;

	crypto_stats_get(alg);
	if (!seed && slen) {
		buf = kmalloc(slen, GFP_KERNEL);
		if (!buf)
@ -50,7 +52,7 @@ int crypto_rng_reset(struct crypto_rng *tfm, const u8 *seed, unsigned int slen)
	}

	err = crypto_rng_alg(tfm)->seed(tfm, seed, slen);
	crypto_stat_rng_seed(tfm, err);
	crypto_stats_rng_seed(alg, err);
out:
	kzfree(buf);
	return err;
@ -74,17 +76,13 @@ static int crypto_rng_report(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_report_rng rrng;

	strncpy(rrng.type, "rng", sizeof(rrng.type));
	memset(&rrng, 0, sizeof(rrng));

	strscpy(rrng.type, "rng", sizeof(rrng.type));

	rrng.seedsize = seedsize(alg);

	if (nla_put(skb, CRYPTOCFGA_REPORT_RNG,
		    sizeof(struct crypto_report_rng), &rrng))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_REPORT_RNG, sizeof(rrng), &rrng);
}
#else
static int crypto_rng_report(struct sk_buff *skb, struct crypto_alg *alg)
@ -159,7 +159,7 @@ static int salsa20_crypt(struct skcipher_request *req)
	u32 state[16];
	int err;

	err = skcipher_walk_virt(&walk, req, true);
	err = skcipher_walk_virt(&walk, req, false);

	salsa20_init(state, ctx, walk.iv);
@ -40,15 +40,12 @@ static int crypto_scomp_report(struct sk_buff *skb, struct crypto_alg *alg)
{
	struct crypto_report_comp rscomp;

	strncpy(rscomp.type, "scomp", sizeof(rscomp.type));
	memset(&rscomp, 0, sizeof(rscomp));

	if (nla_put(skb, CRYPTOCFGA_REPORT_COMPRESS,
		    sizeof(struct crypto_report_comp), &rscomp))
		goto nla_put_failure;
	return 0;
	strscpy(rscomp.type, "scomp", sizeof(rscomp.type));

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_REPORT_COMPRESS,
		       sizeof(rscomp), &rscomp);
}
#else
static int crypto_scomp_report(struct sk_buff *skb, struct crypto_alg *alg)
@ -408,18 +408,14 @@ static int crypto_shash_report(struct sk_buff *skb, struct crypto_alg *alg)
	struct crypto_report_hash rhash;
	struct shash_alg *salg = __crypto_shash_alg(alg);

	strncpy(rhash.type, "shash", sizeof(rhash.type));
	memset(&rhash, 0, sizeof(rhash));

	strscpy(rhash.type, "shash", sizeof(rhash.type));

	rhash.blocksize = alg->cra_blocksize;
	rhash.digestsize = salg->digestsize;

	if (nla_put(skb, CRYPTOCFGA_REPORT_HASH,
		    sizeof(struct crypto_report_hash), &rhash))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_REPORT_HASH, sizeof(rhash), &rhash);
}
#else
static int crypto_shash_report(struct sk_buff *skb, struct crypto_alg *alg)
@ -474,6 +474,8 @@ int skcipher_walk_virt(struct skcipher_walk *walk,
{
	int err;

	might_sleep_if(req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP);

	walk->flags &= ~SKCIPHER_WALK_PHYS;

	err = skcipher_walk_skcipher(walk, req);
@ -577,8 +579,7 @@ static unsigned int crypto_skcipher_extsize(struct crypto_alg *alg)
	if (alg->cra_type == &crypto_blkcipher_type)
		return sizeof(struct crypto_blkcipher *);

	if (alg->cra_type == &crypto_ablkcipher_type ||
	    alg->cra_type == &crypto_givcipher_type)
	if (alg->cra_type == &crypto_ablkcipher_type)
		return sizeof(struct crypto_ablkcipher *);

	return crypto_alg_extsize(alg);
@ -842,8 +843,7 @@ static int crypto_skcipher_init_tfm(struct crypto_tfm *tfm)
	if (tfm->__crt_alg->cra_type == &crypto_blkcipher_type)
		return crypto_init_skcipher_ops_blkcipher(tfm);

	if (tfm->__crt_alg->cra_type == &crypto_ablkcipher_type ||
	    tfm->__crt_alg->cra_type == &crypto_givcipher_type)
	if (tfm->__crt_alg->cra_type == &crypto_ablkcipher_type)
		return crypto_init_skcipher_ops_ablkcipher(tfm);

	skcipher->setkey = skcipher_setkey;
@ -897,21 +897,18 @@ static int crypto_skcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
	struct skcipher_alg *skcipher = container_of(alg, struct skcipher_alg,
						     base);

	strncpy(rblkcipher.type, "skcipher", sizeof(rblkcipher.type));
	strncpy(rblkcipher.geniv, "<none>", sizeof(rblkcipher.geniv));
	memset(&rblkcipher, 0, sizeof(rblkcipher));

	strscpy(rblkcipher.type, "skcipher", sizeof(rblkcipher.type));
	strscpy(rblkcipher.geniv, "<none>", sizeof(rblkcipher.geniv));

	rblkcipher.blocksize = alg->cra_blocksize;
	rblkcipher.min_keysize = skcipher->min_keysize;
	rblkcipher.max_keysize = skcipher->max_keysize;
	rblkcipher.ivsize = skcipher->ivsize;

	if (nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
		    sizeof(struct crypto_report_blkcipher), &rblkcipher))
		goto nla_put_failure;
	return 0;

nla_put_failure:
	return -EMSGSIZE;
	return nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
		       sizeof(rblkcipher), &rblkcipher);
}
#else
static int crypto_skcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
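The new might_sleep_if() in skcipher_walk_virt() documents the contract the salsa20 fix above relies on: a driver may pass atomic=false only when the request allows sleeping. A minimal sketch of the walk loop under that contract (the cipher and its toy_crypt_block helper are hypothetical; only the walk calls are the real API):

#include <crypto/internal/skcipher.h>

/* Sketch of a synchronous skcipher encrypt path using the walk API.
 * atomic=false is valid here precisely because skcipher_walk_virt()
 * now asserts sleepability against CRYPTO_TFM_REQ_MAY_SLEEP. */
static int toy_skcipher_encrypt(struct skcipher_request *req)
{
	struct skcipher_walk walk;
	int err;

	err = skcipher_walk_virt(&walk, req, false);
	while (walk.nbytes > 0) {
		/* process everything mapped in this step (hypothetical) */
		toy_crypt_block(walk.dst.virt.addr, walk.src.virt.addr,
				walk.nbytes);
		err = skcipher_walk_done(&walk, 0);	/* 0 bytes left over */
	}
	return err;
}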
1140
crypto/streebog_generic.c
Normal file
File diff suppressed because it is too large
@ -76,10 +76,12 @@ static char *check[] = {
	"cast6", "arc4", "michael_mic", "deflate", "crc32c", "tea", "xtea",
	"khazad", "wp512", "wp384", "wp256", "tnepres", "xeta",  "fcrypt",
	"camellia", "seed", "salsa20", "rmd128", "rmd160", "rmd256", "rmd320",
	"lzo", "cts", "sha3-224", "sha3-256", "sha3-384", "sha3-512", NULL
	"lzo", "cts", "sha3-224", "sha3-256", "sha3-384", "sha3-512",
	"streebog256", "streebog512",
	NULL
};

static u32 block_sizes[] = { 16, 64, 256, 1024, 8192, 0 };
static u32 block_sizes[] = { 16, 64, 256, 1024, 1472, 8192, 0 };
static u32 aead_sizes[] = { 16, 64, 256, 512, 1024, 2048, 4096, 8192, 0 };

#define XBUFSIZE	8
@ -1736,6 +1738,7 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
		ret += tcrypt_test("ctr(aes)");
		ret += tcrypt_test("rfc3686(ctr(aes))");
		ret += tcrypt_test("ofb(aes)");
		ret += tcrypt_test("cfb(aes)");
		break;

	case 11:
@ -1913,6 +1916,14 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
		ret += tcrypt_test("sm3");
		break;

	case 53:
		ret += tcrypt_test("streebog256");
		break;

	case 54:
		ret += tcrypt_test("streebog512");
		break;

	case 100:
		ret += tcrypt_test("hmac(md5)");
		break;
@ -1969,6 +1980,14 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
		ret += tcrypt_test("hmac(sha3-512)");
		break;

	case 115:
		ret += tcrypt_test("hmac(streebog256)");
		break;

	case 116:
		ret += tcrypt_test("hmac(streebog512)");
		break;

	case 150:
		ret += tcrypt_test("ansi_cprng");
		break;
@ -2060,6 +2079,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
				speed_template_16_24_32);
		test_cipher_speed("ctr(aes)", DECRYPT, sec, NULL, 0,
				speed_template_16_24_32);
		test_cipher_speed("cfb(aes)", ENCRYPT, sec, NULL, 0,
				speed_template_16_24_32);
		test_cipher_speed("cfb(aes)", DECRYPT, sec, NULL, 0,
				speed_template_16_24_32);
		break;

	case 201:
@ -2297,6 +2320,18 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
		test_cipher_speed("ctr(sm4)", DECRYPT, sec, NULL, 0,
				speed_template_16);
		break;

	case 219:
		test_cipher_speed("adiantum(xchacha12,aes)", ENCRYPT, sec, NULL,
				  0, speed_template_32);
		test_cipher_speed("adiantum(xchacha12,aes)", DECRYPT, sec, NULL,
				  0, speed_template_32);
		test_cipher_speed("adiantum(xchacha20,aes)", ENCRYPT, sec, NULL,
				  0, speed_template_32);
		test_cipher_speed("adiantum(xchacha20,aes)", DECRYPT, sec, NULL,
				  0, speed_template_32);
		break;

	case 300:
		if (alg) {
			test_hash_speed(alg, sec, generic_hash_speed_template);
@ -2407,6 +2442,16 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
		test_hash_speed("sm3", sec, generic_hash_speed_template);
		if (mode > 300 && mode < 400) break;
		/* fall through */
	case 327:
		test_hash_speed("streebog256", sec,
				generic_hash_speed_template);
		if (mode > 300 && mode < 400) break;
		/* fall through */
	case 328:
		test_hash_speed("streebog512", sec,
				generic_hash_speed_template);
		if (mode > 300 && mode < 400) break;
		/* fall through */
	case 399:
		break;
@ -2520,6 +2565,16 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
				    num_mb);
		if (mode > 400 && mode < 500) break;
		/* fall through */
	case 426:
		test_mb_ahash_speed("streebog256", sec,
				    generic_hash_speed_template, num_mb);
		if (mode > 400 && mode < 500) break;
		/* fall through */
	case 427:
		test_mb_ahash_speed("streebog512", sec,
				    generic_hash_speed_template, num_mb);
		if (mode > 400 && mode < 500) break;
		/* fall through */
	case 499:
		break;
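tcrypt is driven entirely by module parameters, so the new cases are reachable straight from the command line; the mode numbers below come from the switch above, and sec selects timed rather than cycle-count runs. Note that tcrypt deliberately fails its own insertion with -EAGAIN after running, so the modprobe "error" is expected:

	modprobe tcrypt mode=53           # streebog256 self-test (case 53)
	modprobe tcrypt mode=219 sec=1    # adiantum(xchacha12/20,aes) speed runs (case 219)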
@ -2404,6 +2404,18 @@ static int alg_test_null(const struct alg_test_desc *desc,
/* Please keep this list sorted by algorithm name. */
static const struct alg_test_desc alg_test_descs[] = {
	{
		.alg = "adiantum(xchacha12,aes)",
		.test = alg_test_skcipher,
		.suite = {
			.cipher = __VECS(adiantum_xchacha12_aes_tv_template)
		},
	}, {
		.alg = "adiantum(xchacha20,aes)",
		.test = alg_test_skcipher,
		.suite = {
			.cipher = __VECS(adiantum_xchacha20_aes_tv_template)
		},
	}, {
		.alg = "aegis128",
		.test = alg_test_aead,
		.suite = {
@ -2690,6 +2702,13 @@ static const struct alg_test_desc alg_test_descs[] = {
				.dec = __VECS(aes_ccm_dec_tv_template)
			}
		}
	}, {
		.alg = "cfb(aes)",
		.test = alg_test_skcipher,
		.fips_allowed = 1,
		.suite = {
			.cipher = __VECS(aes_cfb_tv_template)
		},
	}, {
		.alg = "chacha20",
		.test = alg_test_skcipher,
@ -2805,6 +2824,7 @@ static const struct alg_test_desc alg_test_descs[] = {
	}, {
		.alg = "cts(cbc(aes))",
		.test = alg_test_skcipher,
		.fips_allowed = 1,
		.suite = {
			.cipher = __VECS(cts_mode_tv_template)
		}
@ -3184,6 +3204,18 @@ static const struct alg_test_desc alg_test_descs[] = {
		.suite = {
			.hash = __VECS(hmac_sha512_tv_template)
		}
	}, {
		.alg = "hmac(streebog256)",
		.test = alg_test_hash,
		.suite = {
			.hash = __VECS(hmac_streebog256_tv_template)
		}
	}, {
		.alg = "hmac(streebog512)",
		.test = alg_test_hash,
		.suite = {
			.hash = __VECS(hmac_streebog512_tv_template)
		}
	}, {
		.alg = "jitterentropy_rng",
		.fips_allowed = 1,
@ -3291,6 +3323,12 @@ static const struct alg_test_desc alg_test_descs[] = {
				.dec = __VECS(morus640_dec_tv_template),
			}
		}
	}, {
		.alg = "nhpoly1305",
		.test = alg_test_hash,
		.suite = {
			.hash = __VECS(nhpoly1305_tv_template)
		}
	}, {
		.alg = "ofb(aes)",
		.test = alg_test_skcipher,
@ -3496,6 +3534,18 @@ static const struct alg_test_desc alg_test_descs[] = {
		.suite = {
			.hash = __VECS(sm3_tv_template)
		}
	}, {
		.alg = "streebog256",
		.test = alg_test_hash,
		.suite = {
			.hash = __VECS(streebog256_tv_template)
		}
	}, {
		.alg = "streebog512",
		.test = alg_test_hash,
		.suite = {
			.hash = __VECS(streebog512_tv_template)
		}
	}, {
		.alg = "tgr128",
		.test = alg_test_hash,
@ -3544,6 +3594,18 @@ static const struct alg_test_desc alg_test_descs[] = {
		.suite = {
			.hash = __VECS(aes_xcbc128_tv_template)
		}
	}, {
		.alg = "xchacha12",
		.test = alg_test_skcipher,
		.suite = {
			.cipher = __VECS(xchacha12_tv_template)
		},
	}, {
		.alg = "xchacha20",
		.test = alg_test_skcipher,
		.suite = {
			.cipher = __VECS(xchacha20_tv_template)
		},
	}, {
		.alg = "xts(aes)",
		.test = alg_test_skcipher,
3220
crypto/testmgr.h
File diff suppressed because it is too large
@ -3623,7 +3623,7 @@ static int receive_protocol(struct drbd_connection *connection, struct packet_in
	 * change.
	 */

	peer_integrity_tfm = crypto_alloc_shash(integrity_alg, 0, CRYPTO_ALG_ASYNC);
	peer_integrity_tfm = crypto_alloc_shash(integrity_alg, 0, 0);
	if (IS_ERR(peer_integrity_tfm)) {
		peer_integrity_tfm = NULL;
		drbd_err(connection, "peer data-integrity-alg %s not supported\n",
@ -1,10 +1,7 @@
/**
// SPDX-License-Identifier: GPL-2.0
/*
 * Copyright (c) 2010-2012 Broadcom. All rights reserved.
 * Copyright (c) 2013 Lubomir Rintel
 *
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public License ("GPL")
 * version 2, as published by the Free Software Foundation.
 */

#include <linux/hw_random.h>
@ -265,7 +265,7 @@
#include <linux/syscalls.h>
#include <linux/completion.h>
#include <linux/uuid.h>
#include <crypto/chacha20.h>
#include <crypto/chacha.h>

#include <asm/processor.h>
#include <linux/uaccess.h>
@ -431,11 +431,10 @@ static int crng_init = 0;
#define crng_ready() (likely(crng_init > 1))
static int crng_init_cnt = 0;
static unsigned long crng_global_init_time = 0;
#define CRNG_INIT_CNT_THRESH (2*CHACHA20_KEY_SIZE)
static void _extract_crng(struct crng_state *crng,
			  __u8 out[CHACHA20_BLOCK_SIZE]);
#define CRNG_INIT_CNT_THRESH (2*CHACHA_KEY_SIZE)
static void _extract_crng(struct crng_state *crng, __u8 out[CHACHA_BLOCK_SIZE]);
static void _crng_backtrack_protect(struct crng_state *crng,
				    __u8 tmp[CHACHA20_BLOCK_SIZE], int used);
				    __u8 tmp[CHACHA_BLOCK_SIZE], int used);
static void process_random_ready_list(void);
static void _get_random_bytes(void *buf, int nbytes);

@ -863,7 +862,7 @@ static int crng_fast_load(const char *cp, size_t len)
	}
	p = (unsigned char *) &primary_crng.state[4];
	while (len > 0 && crng_init_cnt < CRNG_INIT_CNT_THRESH) {
		p[crng_init_cnt % CHACHA20_KEY_SIZE] ^= *cp;
		p[crng_init_cnt % CHACHA_KEY_SIZE] ^= *cp;
		cp++; crng_init_cnt++; len--;
	}
	spin_unlock_irqrestore(&primary_crng.lock, flags);
@ -895,7 +894,7 @@ static int crng_slow_load(const char *cp, size_t len)
	unsigned long flags;
	static unsigned char lfsr = 1;
	unsigned char tmp;
	unsigned i, max = CHACHA20_KEY_SIZE;
	unsigned i, max = CHACHA_KEY_SIZE;
	const char * src_buf = cp;
	char * dest_buf = (char *) &primary_crng.state[4];

@ -913,8 +912,8 @@ static int crng_slow_load(const char *cp, size_t len)
		lfsr >>= 1;
		if (tmp & 1)
			lfsr ^= 0xE1;
		tmp = dest_buf[i % CHACHA20_KEY_SIZE];
		dest_buf[i % CHACHA20_KEY_SIZE] ^= src_buf[i % len] ^ lfsr;
		tmp = dest_buf[i % CHACHA_KEY_SIZE];
		dest_buf[i % CHACHA_KEY_SIZE] ^= src_buf[i % len] ^ lfsr;
		lfsr += (tmp << 3) | (tmp >> 5);
	}
	spin_unlock_irqrestore(&primary_crng.lock, flags);
@ -926,7 +925,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
	unsigned long flags;
	int i, num;
	union {
		__u8 block[CHACHA20_BLOCK_SIZE];
		__u8 block[CHACHA_BLOCK_SIZE];
		__u32 key[8];
	} buf;

@ -937,7 +936,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
	} else {
		_extract_crng(&primary_crng, buf.block);
		_crng_backtrack_protect(&primary_crng, buf.block,
					CHACHA20_KEY_SIZE);
					CHACHA_KEY_SIZE);
	}
	spin_lock_irqsave(&crng->lock, flags);
	for (i = 0; i < 8; i++) {
@ -973,7 +972,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
}

static void _extract_crng(struct crng_state *crng,
			  __u8 out[CHACHA20_BLOCK_SIZE])
			  __u8 out[CHACHA_BLOCK_SIZE])
{
	unsigned long v, flags;

@ -990,7 +989,7 @@ static void _extract_crng(struct crng_state *crng,
	spin_unlock_irqrestore(&crng->lock, flags);
}

static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
static void extract_crng(__u8 out[CHACHA_BLOCK_SIZE])
{
	struct crng_state *crng = NULL;

@ -1008,14 +1007,14 @@ static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
 * enough) to mutate the CRNG key to provide backtracking protection.
 */
static void _crng_backtrack_protect(struct crng_state *crng,
				    __u8 tmp[CHACHA20_BLOCK_SIZE], int used)
				    __u8 tmp[CHACHA_BLOCK_SIZE], int used)
{
	unsigned long flags;
	__u32 *s, *d;
	int i;

	used = round_up(used, sizeof(__u32));
	if (used + CHACHA20_KEY_SIZE > CHACHA20_BLOCK_SIZE) {
	if (used + CHACHA_KEY_SIZE > CHACHA_BLOCK_SIZE) {
		extract_crng(tmp);
		used = 0;
	}
@ -1027,7 +1026,7 @@ static void _crng_backtrack_protect(struct crng_state *crng,
	spin_unlock_irqrestore(&crng->lock, flags);
}

static void crng_backtrack_protect(__u8 tmp[CHACHA20_BLOCK_SIZE], int used)
static void crng_backtrack_protect(__u8 tmp[CHACHA_BLOCK_SIZE], int used)
{
	struct crng_state *crng = NULL;

@ -1042,8 +1041,8 @@ static void crng_backtrack_protect(__u8 tmp[CHACHA20_BLOCK_SIZE], int used)

static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
{
	ssize_t ret = 0, i = CHACHA20_BLOCK_SIZE;
	__u8 tmp[CHACHA20_BLOCK_SIZE] __aligned(4);
	ssize_t ret = 0, i = CHACHA_BLOCK_SIZE;
	__u8 tmp[CHACHA_BLOCK_SIZE] __aligned(4);
	int large_request = (nbytes > 256);

	while (nbytes) {
@ -1057,7 +1056,7 @@ static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
		}

		extract_crng(tmp);
		i = min_t(int, nbytes, CHACHA20_BLOCK_SIZE);
		i = min_t(int, nbytes, CHACHA_BLOCK_SIZE);
		if (copy_to_user(buf, tmp, i)) {
			ret = -EFAULT;
			break;
@ -1622,14 +1621,14 @@ static void _warn_unseeded_randomness(const char *func_name, void *caller,
 */
static void _get_random_bytes(void *buf, int nbytes)
{
	__u8 tmp[CHACHA20_BLOCK_SIZE] __aligned(4);
	__u8 tmp[CHACHA_BLOCK_SIZE] __aligned(4);

	trace_get_random_bytes(nbytes, _RET_IP_);

	while (nbytes >= CHACHA20_BLOCK_SIZE) {
	while (nbytes >= CHACHA_BLOCK_SIZE) {
		extract_crng(buf);
		buf += CHACHA20_BLOCK_SIZE;
		nbytes -= CHACHA20_BLOCK_SIZE;
		buf += CHACHA_BLOCK_SIZE;
		nbytes -= CHACHA_BLOCK_SIZE;
	}

	if (nbytes > 0) {
@ -1637,7 +1636,7 @@ static void _get_random_bytes(void *buf, int nbytes)
		memcpy(buf, tmp, nbytes);
		crng_backtrack_protect(tmp, nbytes);
	} else
		crng_backtrack_protect(tmp, CHACHA20_BLOCK_SIZE);
		crng_backtrack_protect(tmp, CHACHA_BLOCK_SIZE);
	memzero_explicit(tmp, sizeof(tmp));
}

@ -2208,8 +2207,8 @@ struct ctl_table random_table[] = {

struct batched_entropy {
	union {
		u64 entropy_u64[CHACHA20_BLOCK_SIZE / sizeof(u64)];
		u32 entropy_u32[CHACHA20_BLOCK_SIZE / sizeof(u32)];
		u64 entropy_u64[CHACHA_BLOCK_SIZE / sizeof(u64)];
		u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)];
	};
	unsigned int position;
};
@ -762,10 +762,12 @@ config CRYPTO_DEV_CCREE
	select CRYPTO_ECB
	select CRYPTO_CTR
	select CRYPTO_XTS
	select CRYPTO_SM4
	select CRYPTO_SM3
	help
	  Say 'Y' to enable a driver for the REE interface of the Arm
	  TrustZone CryptoCell family of processors. Currently the
	  CryptoCell 712, 710 and 630 are supported.
	  CryptoCell 713, 703, 712, 710 and 630 are supported.
	  Choose this if you wish to use hardware acceleration of
	  cryptographic operations on the system REE.
	  If unsure say Y.
@ -520,8 +520,7 @@ static int crypto4xx_compute_gcm_hash_key_sw(__le32 *hash_start, const u8 *key,
	uint8_t src[16] = { 0 };
	int rc = 0;

	aes_tfm = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC |
				      CRYPTO_ALG_NEED_FALLBACK);
	aes_tfm = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_NEED_FALLBACK);
	if (IS_ERR(aes_tfm)) {
		rc = PTR_ERR(aes_tfm);
		pr_warn("could not load aes cipher driver: %d\n", rc);
@ -3868,7 +3868,6 @@ static struct iproc_alg_s driver_algs[] = {
			.cra_driver_name = "ctr-aes-iproc",
			.cra_blocksize = AES_BLOCK_SIZE,
			.cra_ablkcipher = {
					   /* .geniv = "chainiv", */
					   .min_keysize = AES_MIN_KEY_SIZE,
					   .max_keysize = AES_MAX_KEY_SIZE,
					   .ivsize = AES_BLOCK_SIZE,
@ -4605,7 +4604,6 @@ static int spu_register_ablkcipher(struct iproc_alg_s *driver_alg)
	crypto->cra_priority = cipher_pri;
	crypto->cra_alignmask = 0;
	crypto->cra_ctxsize = sizeof(struct iproc_ctx_s);
	INIT_LIST_HEAD(&crypto->cra_list);

	crypto->cra_init = ablkcipher_cra_init;
	crypto->cra_exit = generic_cra_exit;
@ -4652,12 +4650,16 @@ static int spu_register_ahash(struct iproc_alg_s *driver_alg)
	hash->halg.statesize = sizeof(struct spu_hash_export_s);

	if (driver_alg->auth_info.mode != HASH_MODE_HMAC) {
		hash->setkey = ahash_setkey;
		hash->init = ahash_init;
		hash->update = ahash_update;
		hash->final = ahash_final;
		hash->finup = ahash_finup;
		hash->digest = ahash_digest;
		if ((driver_alg->auth_info.alg == HASH_ALG_AES) &&
		    ((driver_alg->auth_info.mode == HASH_MODE_XCBC) ||
		     (driver_alg->auth_info.mode == HASH_MODE_CMAC))) {
			hash->setkey = ahash_setkey;
		}
	} else {
		hash->setkey = ahash_hmac_setkey;
		hash->init = ahash_hmac_init;
@ -4687,7 +4689,6 @@ static int spu_register_aead(struct iproc_alg_s *driver_alg)
	aead->base.cra_priority = aead_pri;
	aead->base.cra_alignmask = 0;
	aead->base.cra_ctxsize = sizeof(struct iproc_ctx_s);
	INIT_LIST_HEAD(&aead->base.cra_list);

	aead->base.cra_flags |= CRYPTO_ALG_ASYNC;
	/* setkey set in alg initialization */
@ -72,6 +72,8 @@
#define AUTHENC_DESC_JOB_IO_LEN		(AEAD_DESC_JOB_IO_LEN + \
					 CAAM_CMD_SZ * 5)

#define CHACHAPOLY_DESC_JOB_IO_LEN	(AEAD_DESC_JOB_IO_LEN + CAAM_CMD_SZ * 6)

#define DESC_MAX_USED_BYTES		(CAAM_DESC_BYTES_MAX - DESC_JOB_IO_LEN)
#define DESC_MAX_USED_LEN		(DESC_MAX_USED_BYTES / CAAM_CMD_SZ)

@ -513,6 +515,61 @@ static int rfc4543_setauthsize(struct crypto_aead *authenc,
	return 0;
}

static int chachapoly_set_sh_desc(struct crypto_aead *aead)
{
	struct caam_ctx *ctx = crypto_aead_ctx(aead);
	struct device *jrdev = ctx->jrdev;
	unsigned int ivsize = crypto_aead_ivsize(aead);
	u32 *desc;

	if (!ctx->cdata.keylen || !ctx->authsize)
		return 0;

	desc = ctx->sh_desc_enc;
	cnstr_shdsc_chachapoly(desc, &ctx->cdata, &ctx->adata, ivsize,
			       ctx->authsize, true, false);
	dma_sync_single_for_device(jrdev, ctx->sh_desc_enc_dma,
				   desc_bytes(desc), ctx->dir);

	desc = ctx->sh_desc_dec;
	cnstr_shdsc_chachapoly(desc, &ctx->cdata, &ctx->adata, ivsize,
			       ctx->authsize, false, false);
	dma_sync_single_for_device(jrdev, ctx->sh_desc_dec_dma,
				   desc_bytes(desc), ctx->dir);

	return 0;
}

static int chachapoly_setauthsize(struct crypto_aead *aead,
				  unsigned int authsize)
{
	struct caam_ctx *ctx = crypto_aead_ctx(aead);

	if (authsize != POLY1305_DIGEST_SIZE)
		return -EINVAL;

	ctx->authsize = authsize;
	return chachapoly_set_sh_desc(aead);
}

static int chachapoly_setkey(struct crypto_aead *aead, const u8 *key,
			     unsigned int keylen)
{
	struct caam_ctx *ctx = crypto_aead_ctx(aead);
	unsigned int ivsize = crypto_aead_ivsize(aead);
	unsigned int saltlen = CHACHAPOLY_IV_SIZE - ivsize;

	if (keylen != CHACHA_KEY_SIZE + saltlen) {
		crypto_aead_set_flags(aead, CRYPTO_TFM_RES_BAD_KEY_LEN);
		return -EINVAL;
	}

	ctx->cdata.key_virt = key;
	ctx->cdata.keylen = keylen - saltlen;

	return chachapoly_set_sh_desc(aead);
}

static int aead_setkey(struct crypto_aead *aead,
		       const u8 *key, unsigned int keylen)
{
@ -1031,6 +1088,40 @@ static void init_gcm_job(struct aead_request *req,
	/* End of blank commands */
}

static void init_chachapoly_job(struct aead_request *req,
				struct aead_edesc *edesc, bool all_contig,
				bool encrypt)
{
	struct crypto_aead *aead = crypto_aead_reqtfm(req);
	unsigned int ivsize = crypto_aead_ivsize(aead);
	unsigned int assoclen = req->assoclen;
	u32 *desc = edesc->hw_desc;
	u32 ctx_iv_off = 4;

	init_aead_job(req, edesc, all_contig, encrypt);

	if (ivsize != CHACHAPOLY_IV_SIZE) {
		/* IPsec specific: CONTEXT1[223:128] = {NONCE, IV} */
		ctx_iv_off += 4;

		/*
		 * The associated data comes already with the IV but we need
		 * to skip it when we authenticate or encrypt...
		 */
		assoclen -= ivsize;
	}

	append_math_add_imm_u32(desc, REG3, ZERO, IMM, assoclen);

	/*
	 * For IPsec load the IV further in the same register.
	 * For RFC7539 simply load the 12 bytes nonce in a single operation
	 */
	append_load_as_imm(desc, req->iv, ivsize, LDST_CLASS_1_CCB |
			   LDST_SRCDST_BYTE_CONTEXT |
			   ctx_iv_off << LDST_OFFSET_SHIFT);
}

static void init_authenc_job(struct aead_request *req,
			     struct aead_edesc *edesc,
			     bool all_contig, bool encrypt)
@ -1289,6 +1380,72 @@ static int gcm_encrypt(struct aead_request *req)
	return ret;
}

static int chachapoly_encrypt(struct aead_request *req)
{
	struct aead_edesc *edesc;
	struct crypto_aead *aead = crypto_aead_reqtfm(req);
	struct caam_ctx *ctx = crypto_aead_ctx(aead);
	struct device *jrdev = ctx->jrdev;
	bool all_contig;
	u32 *desc;
	int ret;

	edesc = aead_edesc_alloc(req, CHACHAPOLY_DESC_JOB_IO_LEN, &all_contig,
				 true);
	if (IS_ERR(edesc))
		return PTR_ERR(edesc);

	desc = edesc->hw_desc;

	init_chachapoly_job(req, edesc, all_contig, true);
	print_hex_dump_debug("chachapoly jobdesc@" __stringify(__LINE__)": ",
			     DUMP_PREFIX_ADDRESS, 16, 4, desc, desc_bytes(desc),
			     1);

	ret = caam_jr_enqueue(jrdev, desc, aead_encrypt_done, req);
	if (!ret) {
		ret = -EINPROGRESS;
	} else {
		aead_unmap(jrdev, edesc, req);
		kfree(edesc);
	}

	return ret;
}

static int chachapoly_decrypt(struct aead_request *req)
{
	struct aead_edesc *edesc;
	struct crypto_aead *aead = crypto_aead_reqtfm(req);
	struct caam_ctx *ctx = crypto_aead_ctx(aead);
	struct device *jrdev = ctx->jrdev;
	bool all_contig;
	u32 *desc;
	int ret;

	edesc = aead_edesc_alloc(req, CHACHAPOLY_DESC_JOB_IO_LEN, &all_contig,
				 false);
	if (IS_ERR(edesc))
		return PTR_ERR(edesc);

	desc = edesc->hw_desc;

	init_chachapoly_job(req, edesc, all_contig, false);
	print_hex_dump_debug("chachapoly jobdesc@" __stringify(__LINE__)": ",
			     DUMP_PREFIX_ADDRESS, 16, 4, desc, desc_bytes(desc),
			     1);

	ret = caam_jr_enqueue(jrdev, desc, aead_decrypt_done, req);
	if (!ret) {
		ret = -EINPROGRESS;
	} else {
		aead_unmap(jrdev, edesc, req);
		kfree(edesc);
	}

	return ret;
}

static int ipsec_gcm_encrypt(struct aead_request *req)
{
	if (req->assoclen < 8)
@ -3002,6 +3159,50 @@ static struct caam_aead_alg driver_aeads[] = {
			.geniv = true,
		},
	},
	{
		.aead = {
			.base = {
				.cra_name = "rfc7539(chacha20,poly1305)",
				.cra_driver_name = "rfc7539-chacha20-poly1305-"
						   "caam",
				.cra_blocksize = 1,
			},
			.setkey = chachapoly_setkey,
			.setauthsize = chachapoly_setauthsize,
			.encrypt = chachapoly_encrypt,
			.decrypt = chachapoly_decrypt,
			.ivsize = CHACHAPOLY_IV_SIZE,
			.maxauthsize = POLY1305_DIGEST_SIZE,
		},
		.caam = {
			.class1_alg_type = OP_ALG_ALGSEL_CHACHA20 |
					   OP_ALG_AAI_AEAD,
			.class2_alg_type = OP_ALG_ALGSEL_POLY1305 |
					   OP_ALG_AAI_AEAD,
		},
	},
	{
		.aead = {
			.base = {
				.cra_name = "rfc7539esp(chacha20,poly1305)",
				.cra_driver_name = "rfc7539esp-chacha20-"
						   "poly1305-caam",
				.cra_blocksize = 1,
			},
			.setkey = chachapoly_setkey,
			.setauthsize = chachapoly_setauthsize,
			.encrypt = chachapoly_encrypt,
			.decrypt = chachapoly_decrypt,
			.ivsize = 8,
			.maxauthsize = POLY1305_DIGEST_SIZE,
		},
		.caam = {
			.class1_alg_type = OP_ALG_ALGSEL_CHACHA20 |
					   OP_ALG_AAI_AEAD,
			.class2_alg_type = OP_ALG_ALGSEL_POLY1305 |
					   OP_ALG_AAI_AEAD,
		},
	},
};

static int caam_init_common(struct caam_ctx *ctx, struct caam_alg_entry *caam,
@ -3135,7 +3336,7 @@ static int __init caam_algapi_init(void)
	struct device *ctrldev;
	struct caam_drv_private *priv;
	int i = 0, err = 0;
	u32 cha_vid, cha_inst, des_inst, aes_inst, md_inst;
	u32 aes_vid, aes_inst, des_inst, md_vid, md_inst, ccha_inst, ptha_inst;
	unsigned int md_limit = SHA512_DIGEST_SIZE;
	bool registered = false;

@ -3168,14 +3369,38 @@ static int __init caam_algapi_init(void)
	 * Register crypto algorithms the device supports.
	 * First, detect presence and attributes of DES, AES, and MD blocks.
	 */
	cha_vid = rd_reg32(&priv->ctrl->perfmon.cha_id_ls);
	cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
	des_inst = (cha_inst & CHA_ID_LS_DES_MASK) >> CHA_ID_LS_DES_SHIFT;
	aes_inst = (cha_inst & CHA_ID_LS_AES_MASK) >> CHA_ID_LS_AES_SHIFT;
	md_inst = (cha_inst & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
	if (priv->era < 10) {
		u32 cha_vid, cha_inst;

		cha_vid = rd_reg32(&priv->ctrl->perfmon.cha_id_ls);
		aes_vid = cha_vid & CHA_ID_LS_AES_MASK;
		md_vid = (cha_vid & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;

		cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
		des_inst = (cha_inst & CHA_ID_LS_DES_MASK) >>
			   CHA_ID_LS_DES_SHIFT;
		aes_inst = cha_inst & CHA_ID_LS_AES_MASK;
		md_inst = (cha_inst & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
		ccha_inst = 0;
		ptha_inst = 0;
	} else {
		u32 aesa, mdha;

		aesa = rd_reg32(&priv->ctrl->vreg.aesa);
		mdha = rd_reg32(&priv->ctrl->vreg.mdha);

		aes_vid = (aesa & CHA_VER_VID_MASK) >> CHA_VER_VID_SHIFT;
		md_vid = (mdha & CHA_VER_VID_MASK) >> CHA_VER_VID_SHIFT;

		des_inst = rd_reg32(&priv->ctrl->vreg.desa) & CHA_VER_NUM_MASK;
		aes_inst = aesa & CHA_VER_NUM_MASK;
		md_inst = mdha & CHA_VER_NUM_MASK;
		ccha_inst = rd_reg32(&priv->ctrl->vreg.ccha) & CHA_VER_NUM_MASK;
		ptha_inst = rd_reg32(&priv->ctrl->vreg.ptha) & CHA_VER_NUM_MASK;
	}

	/* If MD is present, limit digest size based on LP256 */
	if (md_inst && ((cha_vid & CHA_ID_LS_MD_MASK) == CHA_ID_LS_MD_LP256))
	if (md_inst && md_vid == CHA_VER_VID_MD_LP256)
		md_limit = SHA256_DIGEST_SIZE;

	for (i = 0; i < ARRAY_SIZE(driver_algs); i++) {
@ -3196,10 +3421,10 @@ static int __init caam_algapi_init(void)
		 * Check support for AES modes not available
		 * on LP devices.
		 */
		if ((cha_vid & CHA_ID_LS_AES_MASK) == CHA_ID_LS_AES_LP)
			if ((t_alg->caam.class1_alg_type & OP_ALG_AAI_MASK) ==
			     OP_ALG_AAI_XTS)
				continue;
		if (aes_vid == CHA_VER_VID_AES_LP &&
		    (t_alg->caam.class1_alg_type & OP_ALG_AAI_MASK) ==
		    OP_ALG_AAI_XTS)
			continue;

		caam_skcipher_alg_init(t_alg);

@ -3232,21 +3457,28 @@ static int __init caam_algapi_init(void)
		if (!aes_inst && (c1_alg_sel == OP_ALG_ALGSEL_AES))
			continue;

		/* Skip CHACHA20 algorithms if not supported by device */
		if (c1_alg_sel == OP_ALG_ALGSEL_CHACHA20 && !ccha_inst)
			continue;

		/* Skip POLY1305 algorithms if not supported by device */
		if (c2_alg_sel == OP_ALG_ALGSEL_POLY1305 && !ptha_inst)
			continue;

		/*
		 * Check support for AES algorithms not available
		 * on LP devices.
		 */
		if ((cha_vid & CHA_ID_LS_AES_MASK) == CHA_ID_LS_AES_LP)
			if (alg_aai == OP_ALG_AAI_GCM)
				continue;
		if (aes_vid == CHA_VER_VID_AES_LP && alg_aai == OP_ALG_AAI_GCM)
			continue;

		/*
		 * Skip algorithms requiring message digests
		 * if MD or MD size is not supported by device.
		 */
		if (c2_alg_sel &&
		    (!md_inst || (t_alg->aead.maxauthsize > md_limit)))
			continue;
		if ((c2_alg_sel & ~OP_ALG_ALGSEL_SUBMASK) == 0x40 &&
		    (!md_inst || t_alg->aead.maxauthsize > md_limit))
			continue;

		caam_aead_alg_init(t_alg);
@@ -1213,6 +1213,139 @@ void cnstr_shdsc_rfc4543_decap(u32 * const desc, struct alginfo *cdata,
}
EXPORT_SYMBOL(cnstr_shdsc_rfc4543_decap);

/**
 * cnstr_shdsc_chachapoly - Chacha20 + Poly1305 generic AEAD (rfc7539) and
 *                          IPsec ESP (rfc7634, a.k.a. rfc7539esp) shared
 *                          descriptor (non-protocol).
 * @desc: pointer to buffer used for descriptor construction
 * @cdata: pointer to block cipher transform definitions
 *         Valid algorithm values - OP_ALG_ALGSEL_CHACHA20 ANDed with
 *         OP_ALG_AAI_AEAD.
 * @adata: pointer to authentication transform definitions
 *         Valid algorithm values - OP_ALG_ALGSEL_POLY1305 ANDed with
 *         OP_ALG_AAI_AEAD.
 * @ivsize: initialization vector size
 * @icvsize: integrity check value (ICV) size (truncated or full)
 * @encap: true if encapsulation, false if decapsulation
 * @is_qi: true when called from caam/qi
 */
void cnstr_shdsc_chachapoly(u32 * const desc, struct alginfo *cdata,
			    struct alginfo *adata, unsigned int ivsize,
			    unsigned int icvsize, const bool encap,
			    const bool is_qi)
{
	u32 *key_jump_cmd, *wait_cmd;
	u32 nfifo;
	const bool is_ipsec = (ivsize != CHACHAPOLY_IV_SIZE);

	/* Note: Context registers are saved. */
	init_sh_desc(desc, HDR_SHARE_SERIAL | HDR_SAVECTX);

	/* skip key loading if they are loaded due to sharing */
	key_jump_cmd = append_jump(desc, JUMP_JSL | JUMP_TEST_ALL |
				   JUMP_COND_SHRD);

	append_key_as_imm(desc, cdata->key_virt, cdata->keylen, cdata->keylen,
			  CLASS_1 | KEY_DEST_CLASS_REG);

	/* For IPsec load the salt from keymat in the context register */
	if (is_ipsec)
		append_load_as_imm(desc, cdata->key_virt + cdata->keylen, 4,
				   LDST_CLASS_1_CCB | LDST_SRCDST_BYTE_CONTEXT |
				   4 << LDST_OFFSET_SHIFT);

	set_jump_tgt_here(desc, key_jump_cmd);

	/* Class 2 and 1 operations: Poly & ChaCha */
	if (encap) {
		append_operation(desc, adata->algtype | OP_ALG_AS_INITFINAL |
				 OP_ALG_ENCRYPT);
		append_operation(desc, cdata->algtype | OP_ALG_AS_INITFINAL |
				 OP_ALG_ENCRYPT);
	} else {
		append_operation(desc, adata->algtype | OP_ALG_AS_INITFINAL |
				 OP_ALG_DECRYPT | OP_ALG_ICV_ON);
		append_operation(desc, cdata->algtype | OP_ALG_AS_INITFINAL |
				 OP_ALG_DECRYPT);
	}

	if (is_qi) {
		u32 *wait_load_cmd;
		u32 ctx1_iv_off = is_ipsec ? 8 : 4;

		/* REG3 = assoclen */
		append_seq_load(desc, 4, LDST_CLASS_DECO |
				LDST_SRCDST_WORD_DECO_MATH3 |
				4 << LDST_OFFSET_SHIFT);

		wait_load_cmd = append_jump(desc, JUMP_JSL | JUMP_TEST_ALL |
					    JUMP_COND_CALM | JUMP_COND_NCP |
					    JUMP_COND_NOP | JUMP_COND_NIP |
					    JUMP_COND_NIFP);
		set_jump_tgt_here(desc, wait_load_cmd);

		append_seq_load(desc, ivsize, LDST_CLASS_1_CCB |
				LDST_SRCDST_BYTE_CONTEXT |
				ctx1_iv_off << LDST_OFFSET_SHIFT);
	}

	/*
	 * MAGIC with NFIFO
	 * Read associated data from the input and send them to class1 and
	 * class2 alignment blocks. From class1 send data to output fifo and
	 * then write it to memory since we don't need to encrypt AD.
	 */
	nfifo = NFIFOENTRY_DEST_BOTH | NFIFOENTRY_FC1 | NFIFOENTRY_FC2 |
		NFIFOENTRY_DTYPE_POLY | NFIFOENTRY_BND;
	append_load_imm_u32(desc, nfifo, LDST_CLASS_IND_CCB |
			    LDST_SRCDST_WORD_INFO_FIFO_SM | LDLEN_MATH3);

	append_math_add(desc, VARSEQINLEN, ZERO, REG3, CAAM_CMD_SZ);
	append_math_add(desc, VARSEQOUTLEN, ZERO, REG3, CAAM_CMD_SZ);
	append_seq_fifo_load(desc, 0, FIFOLD_TYPE_NOINFOFIFO |
			     FIFOLD_CLASS_CLASS1 | LDST_VLF);
	append_move_len(desc, MOVE_AUX_LS | MOVE_SRC_AUX_ABLK |
			MOVE_DEST_OUTFIFO | MOVELEN_MRSEL_MATH3);
	append_seq_fifo_store(desc, 0, FIFOST_TYPE_MESSAGE_DATA | LDST_VLF);

	/* IPsec - copy IV at the output */
	if (is_ipsec)
		append_seq_fifo_store(desc, ivsize, FIFOST_TYPE_METADATA |
				      0x2 << 25);

	wait_cmd = append_jump(desc, JUMP_JSL | JUMP_TYPE_LOCAL |
			       JUMP_COND_NOP | JUMP_TEST_ALL);
	set_jump_tgt_here(desc, wait_cmd);

	if (encap) {
		/* Read and write cryptlen bytes */
		append_math_add(desc, VARSEQINLEN, SEQINLEN, REG0, CAAM_CMD_SZ);
		append_math_add(desc, VARSEQOUTLEN, SEQINLEN, REG0,
				CAAM_CMD_SZ);
		aead_append_src_dst(desc, FIFOLD_TYPE_MSG1OUT2);

		/* Write ICV */
		append_seq_store(desc, icvsize, LDST_CLASS_2_CCB |
				 LDST_SRCDST_BYTE_CONTEXT);
	} else {
		/* Read and write cryptlen bytes */
		append_math_add(desc, VARSEQINLEN, SEQOUTLEN, REG0,
				CAAM_CMD_SZ);
		append_math_add(desc, VARSEQOUTLEN, SEQOUTLEN, REG0,
				CAAM_CMD_SZ);
		aead_append_src_dst(desc, FIFOLD_TYPE_MSG);

		/* Load ICV for verification */
		append_seq_fifo_load(desc, icvsize, FIFOLD_CLASS_CLASS2 |
				     FIFOLD_TYPE_LAST2 | FIFOLD_TYPE_ICV);
	}

	print_hex_dump_debug("chachapoly shdesc@" __stringify(__LINE__)": ",
			     DUMP_PREFIX_ADDRESS, 16, 4, desc, desc_bytes(desc),
			     1);
}
EXPORT_SYMBOL(cnstr_shdsc_chachapoly);
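For the rfc7539esp case, the descriptor above keeps a 4-byte salt (appended after the 32-byte key by setkey) in CONTEXT1 and loads the 8-byte per-packet IV behind it, so together they form the 12-byte ChaCha20-Poly1305 nonce of RFC 7634. A minimal user-space sketch of that nonce layout, assuming the usual CAAM ChaCha20 context arrangement (32-bit block counter first, hence the offsets 4 and 8 in the code above) and illustrative byte values:

#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CHACHA_KEY_SIZE    32
#define CHACHAPOLY_IV_SIZE 12	/* full rfc7539 nonce */
#define ESP_IV_SIZE        8	/* rfc7539esp per-packet IV */

int main(void)
{
	/* keymat as setkey receives it for rfc7539esp:
	 * 32-byte ChaCha20 key followed by a 4-byte salt (illustrative) */
	uint8_t keymat[CHACHA_KEY_SIZE + 4] = { 0 };
	uint8_t iv[ESP_IV_SIZE] = { 1, 2, 3, 4, 5, 6, 7, 8 };
	uint8_t nonce[CHACHAPOLY_IV_SIZE];

	/* salt fills nonce bytes 0..3, the per-packet IV bytes 4..11 --
	 * mirroring the two context loads in the shared descriptor */
	memcpy(nonce, keymat + CHACHA_KEY_SIZE, 4);
	memcpy(nonce + 4, iv, ESP_IV_SIZE);

	for (int i = 0; i < CHACHAPOLY_IV_SIZE; i++)
		printf("%02x", nonce[i]);
	printf("\n");
	return 0;
}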

/* For skcipher encrypt and decrypt, read from req->src and write to req->dst */
static inline void skcipher_append_src_dst(u32 *desc)
{

@@ -1228,7 +1361,8 @@ static inline void skcipher_append_src_dst(u32 *desc)
 * @desc: pointer to buffer used for descriptor construction
 * @cdata: pointer to block cipher transform definitions
 *         Valid algorithm values - one of OP_ALG_ALGSEL_{AES, DES, 3DES} ANDed
 *         with OP_ALG_AAI_CBC or OP_ALG_AAI_CTR_MOD128.
 *         with OP_ALG_AAI_CBC or OP_ALG_AAI_CTR_MOD128
 *         - OP_ALG_ALGSEL_CHACHA20
 * @ivsize: initialization vector size
 * @is_rfc3686: true when ctr(aes) is wrapped by rfc3686 template
 * @ctx1_iv_off: IV offset in CONTEXT1 register

@@ -1293,7 +1427,8 @@ EXPORT_SYMBOL(cnstr_shdsc_skcipher_encap);
 * @desc: pointer to buffer used for descriptor construction
 * @cdata: pointer to block cipher transform definitions
 *         Valid algorithm values - one of OP_ALG_ALGSEL_{AES, DES, 3DES} ANDed
 *         with OP_ALG_AAI_CBC or OP_ALG_AAI_CTR_MOD128.
 *         with OP_ALG_AAI_CBC or OP_ALG_AAI_CTR_MOD128
 *         - OP_ALG_ALGSEL_CHACHA20
 * @ivsize: initialization vector size
 * @is_rfc3686: true when ctr(aes) is wrapped by rfc3686 template
 * @ctx1_iv_off: IV offset in CONTEXT1 register

@@ -96,6 +96,11 @@ void cnstr_shdsc_rfc4543_decap(u32 * const desc, struct alginfo *cdata,
			       unsigned int ivsize, unsigned int icvsize,
			       const bool is_qi);

void cnstr_shdsc_chachapoly(u32 * const desc, struct alginfo *cdata,
			    struct alginfo *adata, unsigned int ivsize,
			    unsigned int icvsize, const bool encap,
			    const bool is_qi);

void cnstr_shdsc_skcipher_encap(u32 * const desc, struct alginfo *cdata,
				unsigned int ivsize, const bool is_rfc3686,
				const u32 ctx1_iv_off);
@@ -2462,7 +2462,7 @@ static int __init caam_qi_algapi_init(void)
	struct device *ctrldev;
	struct caam_drv_private *priv;
	int i = 0, err = 0;
	u32 cha_vid, cha_inst, des_inst, aes_inst, md_inst;
	u32 aes_vid, aes_inst, des_inst, md_vid, md_inst;
	unsigned int md_limit = SHA512_DIGEST_SIZE;
	bool registered = false;

@@ -2497,14 +2497,34 @@ static int __init caam_qi_algapi_init(void)
	 * Register crypto algorithms the device supports.
	 * First, detect presence and attributes of DES, AES, and MD blocks.
	 */
	cha_vid = rd_reg32(&priv->ctrl->perfmon.cha_id_ls);
	cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
	des_inst = (cha_inst & CHA_ID_LS_DES_MASK) >> CHA_ID_LS_DES_SHIFT;
	aes_inst = (cha_inst & CHA_ID_LS_AES_MASK) >> CHA_ID_LS_AES_SHIFT;
	md_inst = (cha_inst & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
	if (priv->era < 10) {
		u32 cha_vid, cha_inst;

		cha_vid = rd_reg32(&priv->ctrl->perfmon.cha_id_ls);
		aes_vid = cha_vid & CHA_ID_LS_AES_MASK;
		md_vid = (cha_vid & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;

		cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
		des_inst = (cha_inst & CHA_ID_LS_DES_MASK) >>
			   CHA_ID_LS_DES_SHIFT;
		aes_inst = cha_inst & CHA_ID_LS_AES_MASK;
		md_inst = (cha_inst & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
	} else {
		u32 aesa, mdha;

		aesa = rd_reg32(&priv->ctrl->vreg.aesa);
		mdha = rd_reg32(&priv->ctrl->vreg.mdha);

		aes_vid = (aesa & CHA_VER_VID_MASK) >> CHA_VER_VID_SHIFT;
		md_vid = (mdha & CHA_VER_VID_MASK) >> CHA_VER_VID_SHIFT;

		des_inst = rd_reg32(&priv->ctrl->vreg.desa) & CHA_VER_NUM_MASK;
		aes_inst = aesa & CHA_VER_NUM_MASK;
		md_inst = mdha & CHA_VER_NUM_MASK;
	}

	/* If MD is present, limit digest size based on LP256 */
	if (md_inst && ((cha_vid & CHA_ID_LS_MD_MASK) == CHA_ID_LS_MD_LP256))
	if (md_inst && md_vid == CHA_VER_VID_MD_LP256)
		md_limit = SHA256_DIGEST_SIZE;

	for (i = 0; i < ARRAY_SIZE(driver_algs); i++) {

@@ -2556,8 +2576,7 @@ static int __init caam_qi_algapi_init(void)
		 * Check support for AES algorithms not available
		 * on LP devices.
		 */
		if (((cha_vid & CHA_ID_LS_AES_MASK) == CHA_ID_LS_AES_LP) &&
		    (alg_aai == OP_ALG_AAI_GCM))
		if (aes_vid == CHA_VER_VID_AES_LP && alg_aai == OP_ALG_AAI_GCM)
			continue;

		/*
@@ -462,7 +462,15 @@ static struct aead_edesc *aead_edesc_alloc(struct aead_request *req,
	edesc->dst_nents = dst_nents;
	edesc->iv_dma = iv_dma;

	edesc->assoclen = cpu_to_caam32(req->assoclen);
	if ((alg->caam.class1_alg_type & OP_ALG_ALGSEL_MASK) ==
	    OP_ALG_ALGSEL_CHACHA20 && ivsize != CHACHAPOLY_IV_SIZE)
		/*
		 * The associated data comes already with the IV but we need
		 * to skip it when we authenticate or encrypt...
		 */
		edesc->assoclen = cpu_to_caam32(req->assoclen - ivsize);
	else
		edesc->assoclen = cpu_to_caam32(req->assoclen);
	edesc->assoclen_dma = dma_map_single(dev, &edesc->assoclen, 4,
					     DMA_TO_DEVICE);
	if (dma_mapping_error(dev, edesc->assoclen_dma)) {

@@ -532,6 +540,68 @@ static struct aead_edesc *aead_edesc_alloc(struct aead_request *req,
	return edesc;
}

static int chachapoly_set_sh_desc(struct crypto_aead *aead)
{
	struct caam_ctx *ctx = crypto_aead_ctx(aead);
	unsigned int ivsize = crypto_aead_ivsize(aead);
	struct device *dev = ctx->dev;
	struct caam_flc *flc;
	u32 *desc;

	if (!ctx->cdata.keylen || !ctx->authsize)
		return 0;

	flc = &ctx->flc[ENCRYPT];
	desc = flc->sh_desc;
	cnstr_shdsc_chachapoly(desc, &ctx->cdata, &ctx->adata, ivsize,
			       ctx->authsize, true, true);
	flc->flc[1] = cpu_to_caam32(desc_len(desc)); /* SDL */
	dma_sync_single_for_device(dev, ctx->flc_dma[ENCRYPT],
				   sizeof(flc->flc) + desc_bytes(desc),
				   ctx->dir);

	flc = &ctx->flc[DECRYPT];
	desc = flc->sh_desc;
	cnstr_shdsc_chachapoly(desc, &ctx->cdata, &ctx->adata, ivsize,
			       ctx->authsize, false, true);
	flc->flc[1] = cpu_to_caam32(desc_len(desc)); /* SDL */
	dma_sync_single_for_device(dev, ctx->flc_dma[DECRYPT],
				   sizeof(flc->flc) + desc_bytes(desc),
				   ctx->dir);

	return 0;
}

static int chachapoly_setauthsize(struct crypto_aead *aead,
				  unsigned int authsize)
{
	struct caam_ctx *ctx = crypto_aead_ctx(aead);

	if (authsize != POLY1305_DIGEST_SIZE)
		return -EINVAL;

	ctx->authsize = authsize;
	return chachapoly_set_sh_desc(aead);
}

static int chachapoly_setkey(struct crypto_aead *aead, const u8 *key,
			     unsigned int keylen)
{
	struct caam_ctx *ctx = crypto_aead_ctx(aead);
	unsigned int ivsize = crypto_aead_ivsize(aead);
	unsigned int saltlen = CHACHAPOLY_IV_SIZE - ivsize;

	if (keylen != CHACHA_KEY_SIZE + saltlen) {
		crypto_aead_set_flags(aead, CRYPTO_TFM_RES_BAD_KEY_LEN);
		return -EINVAL;
	}

	ctx->cdata.key_virt = key;
	ctx->cdata.keylen = keylen - saltlen;

	return chachapoly_set_sh_desc(aead);
}
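As a worked check of the keylen test above: saltlen = CHACHAPOLY_IV_SIZE - ivsize, so rfc7539 (ivsize 12) expects a bare 32-byte ChaCha20 key, while rfc7539esp (ivsize 8) expects 36 bytes of keymat (key plus 4-byte salt, per RFC 7634). A compilable restatement of that arithmetic:

#include <assert.h>

#define CHACHA_KEY_SIZE    32
#define CHACHAPOLY_IV_SIZE 12

/* expected setkey length = CHACHA_KEY_SIZE + (CHACHAPOLY_IV_SIZE - ivsize) */
static unsigned int expected_keylen(unsigned int ivsize)
{
	return CHACHA_KEY_SIZE + (CHACHAPOLY_IV_SIZE - ivsize);
}

int main(void)
{
	assert(expected_keylen(12) == 32);	/* rfc7539(chacha20,poly1305) */
	assert(expected_keylen(8) == 36);	/* rfc7539esp: 4-byte salt appended */
	return 0;
}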

static int gcm_set_sh_desc(struct crypto_aead *aead)
{
	struct caam_ctx *ctx = crypto_aead_ctx(aead);

@@ -816,7 +886,9 @@ static int skcipher_setkey(struct crypto_skcipher *skcipher, const u8 *key,
	u32 *desc;
	u32 ctx1_iv_off = 0;
	const bool ctr_mode = ((ctx->cdata.algtype & OP_ALG_AAI_MASK) ==
			       OP_ALG_AAI_CTR_MOD128);
			       OP_ALG_AAI_CTR_MOD128) &&
			      ((ctx->cdata.algtype & OP_ALG_ALGSEL_MASK) !=
			       OP_ALG_ALGSEL_CHACHA20);
	const bool is_rfc3686 = alg->caam.rfc3686;

	print_hex_dump_debug("key in @" __stringify(__LINE__)": ",

@@ -1494,7 +1566,23 @@ static struct caam_skcipher_alg driver_algs[] = {
			.ivsize = AES_BLOCK_SIZE,
		},
		.caam.class1_alg_type = OP_ALG_ALGSEL_AES | OP_ALG_AAI_XTS,
	}
	},
	{
		.skcipher = {
			.base = {
				.cra_name = "chacha20",
				.cra_driver_name = "chacha20-caam-qi2",
				.cra_blocksize = 1,
			},
			.setkey = skcipher_setkey,
			.encrypt = skcipher_encrypt,
			.decrypt = skcipher_decrypt,
			.min_keysize = CHACHA_KEY_SIZE,
			.max_keysize = CHACHA_KEY_SIZE,
			.ivsize = CHACHA_IV_SIZE,
		},
		.caam.class1_alg_type = OP_ALG_ALGSEL_CHACHA20,
	},
};

static struct caam_aead_alg driver_aeads[] = {

@@ -2608,6 +2696,50 @@ static struct caam_aead_alg driver_aeads[] = {
			.geniv = true,
		},
	},
	{
		.aead = {
			.base = {
				.cra_name = "rfc7539(chacha20,poly1305)",
				.cra_driver_name = "rfc7539-chacha20-poly1305-"
						   "caam-qi2",
				.cra_blocksize = 1,
			},
			.setkey = chachapoly_setkey,
			.setauthsize = chachapoly_setauthsize,
			.encrypt = aead_encrypt,
			.decrypt = aead_decrypt,
			.ivsize = CHACHAPOLY_IV_SIZE,
			.maxauthsize = POLY1305_DIGEST_SIZE,
		},
		.caam = {
			.class1_alg_type = OP_ALG_ALGSEL_CHACHA20 |
					   OP_ALG_AAI_AEAD,
			.class2_alg_type = OP_ALG_ALGSEL_POLY1305 |
					   OP_ALG_AAI_AEAD,
		},
	},
	{
		.aead = {
			.base = {
				.cra_name = "rfc7539esp(chacha20,poly1305)",
				.cra_driver_name = "rfc7539esp-chacha20-"
						   "poly1305-caam-qi2",
				.cra_blocksize = 1,
			},
			.setkey = chachapoly_setkey,
			.setauthsize = chachapoly_setauthsize,
			.encrypt = aead_encrypt,
			.decrypt = aead_decrypt,
			.ivsize = 8,
			.maxauthsize = POLY1305_DIGEST_SIZE,
		},
		.caam = {
			.class1_alg_type = OP_ALG_ALGSEL_CHACHA20 |
					   OP_ALG_AAI_AEAD,
			.class2_alg_type = OP_ALG_ALGSEL_POLY1305 |
					   OP_ALG_AAI_AEAD,
		},
	},
	{
		.aead = {
			.base = {

@@ -4908,6 +5040,11 @@ static int dpaa2_caam_probe(struct fsl_mc_device *dpseci_dev)
		    alg_sel == OP_ALG_ALGSEL_AES)
			continue;

		/* Skip CHACHA20 algorithms if not supported by device */
		if (alg_sel == OP_ALG_ALGSEL_CHACHA20 &&
		    !priv->sec_attr.ccha_acc_num)
			continue;

		t_alg->caam.dev = dev;
		caam_skcipher_alg_init(t_alg);

@@ -4940,11 +5077,22 @@ static int dpaa2_caam_probe(struct fsl_mc_device *dpseci_dev)
		    c1_alg_sel == OP_ALG_ALGSEL_AES)
			continue;

		/* Skip CHACHA20 algorithms if not supported by device */
		if (c1_alg_sel == OP_ALG_ALGSEL_CHACHA20 &&
		    !priv->sec_attr.ccha_acc_num)
			continue;

		/* Skip POLY1305 algorithms if not supported by device */
		if (c2_alg_sel == OP_ALG_ALGSEL_POLY1305 &&
		    !priv->sec_attr.ptha_acc_num)
			continue;

		/*
		 * Skip algorithms requiring message digests
		 * if MD not supported by device.
		 */
		if (!priv->sec_attr.md_acc_num && c2_alg_sel)
		if ((c2_alg_sel & ~OP_ALG_ALGSEL_SUBMASK) == 0x40 &&
		    !priv->sec_attr.md_acc_num)
			continue;

		t_alg->caam.dev = dev;
@@ -3,6 +3,7 @@
 * caam - Freescale FSL CAAM support for ahash functions of crypto API
 *
 * Copyright 2011 Freescale Semiconductor, Inc.
 * Copyright 2018 NXP
 *
 * Based on caamalg.c crypto API driver.
 *

@@ -1801,7 +1802,7 @@ static int __init caam_algapi_hash_init(void)
	int i = 0, err = 0;
	struct caam_drv_private *priv;
	unsigned int md_limit = SHA512_DIGEST_SIZE;
	u32 cha_inst, cha_vid;
	u32 md_inst, md_vid;

	dev_node = of_find_compatible_node(NULL, NULL, "fsl,sec-v4.0");
	if (!dev_node) {

@@ -1831,18 +1832,27 @@ static int __init caam_algapi_hash_init(void)
	 * Register crypto algorithms the device supports. First, identify
	 * presence and attributes of MD block.
	 */
	cha_vid = rd_reg32(&priv->ctrl->perfmon.cha_id_ls);
	cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
	if (priv->era < 10) {
		md_vid = (rd_reg32(&priv->ctrl->perfmon.cha_id_ls) &
			  CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
		md_inst = (rd_reg32(&priv->ctrl->perfmon.cha_num_ls) &
			   CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
	} else {
		u32 mdha = rd_reg32(&priv->ctrl->vreg.mdha);

		md_vid = (mdha & CHA_VER_VID_MASK) >> CHA_VER_VID_SHIFT;
		md_inst = mdha & CHA_VER_NUM_MASK;
	}

	/*
	 * Skip registration of any hashing algorithms if MD block
	 * is not present.
	 */
	if (!((cha_inst & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT))
	if (!md_inst)
		return -ENODEV;

	/* Limit digest size based on LP256 */
	if ((cha_vid & CHA_ID_LS_MD_MASK) == CHA_ID_LS_MD_LP256)
	if (md_vid == CHA_VER_VID_MD_LP256)
		md_limit = SHA256_DIGEST_SIZE;

	INIT_LIST_HEAD(&hash_list);
@@ -3,6 +3,7 @@
 * caam - Freescale FSL CAAM support for Public Key Cryptography
 *
 * Copyright 2016 Freescale Semiconductor, Inc.
 * Copyright 2018 NXP
 *
 * There is no Shared Descriptor for PKC so that the Job Descriptor must carry
 * all the desired key parameters, input and output pointers.

@@ -1017,7 +1018,7 @@ static int __init caam_pkc_init(void)
	struct platform_device *pdev;
	struct device *ctrldev;
	struct caam_drv_private *priv;
	u32 cha_inst, pk_inst;
	u32 pk_inst;
	int err;

	dev_node = of_find_compatible_node(NULL, NULL, "fsl,sec-v4.0");

@@ -1045,8 +1046,11 @@ static int __init caam_pkc_init(void)
		return -ENODEV;

	/* Determine public key hardware accelerator presence. */
	cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
	pk_inst = (cha_inst & CHA_ID_LS_PK_MASK) >> CHA_ID_LS_PK_SHIFT;
	if (priv->era < 10)
		pk_inst = (rd_reg32(&priv->ctrl->perfmon.cha_num_ls) &
			   CHA_ID_LS_PK_MASK) >> CHA_ID_LS_PK_SHIFT;
	else
		pk_inst = rd_reg32(&priv->ctrl->vreg.pkha) & CHA_VER_NUM_MASK;

	/* Do not register algorithms if PKHA is not present. */
	if (!pk_inst)
@@ -3,6 +3,7 @@
 * caam - Freescale FSL CAAM support for hw_random
 *
 * Copyright 2011 Freescale Semiconductor, Inc.
 * Copyright 2018 NXP
 *
 * Based on caamalg.c crypto API driver.
 *

@@ -309,6 +310,7 @@ static int __init caam_rng_init(void)
	struct platform_device *pdev;
	struct device *ctrldev;
	struct caam_drv_private *priv;
	u32 rng_inst;
	int err;

	dev_node = of_find_compatible_node(NULL, NULL, "fsl,sec-v4.0");

@@ -336,7 +338,13 @@ static int __init caam_rng_init(void)
		return -ENODEV;

	/* Check for an instantiated RNG before registration */
	if (!(rd_reg32(&priv->ctrl->perfmon.cha_num_ls) & CHA_ID_LS_RNG_MASK))
	if (priv->era < 10)
		rng_inst = (rd_reg32(&priv->ctrl->perfmon.cha_num_ls) &
			    CHA_ID_LS_RNG_MASK) >> CHA_ID_LS_RNG_SHIFT;
	else
		rng_inst = rd_reg32(&priv->ctrl->vreg.rng) & CHA_VER_NUM_MASK;

	if (!rng_inst)
		return -ENODEV;

	dev = caam_jr_alloc();
@@ -36,6 +36,8 @@
#include <crypto/gcm.h>
#include <crypto/sha.h>
#include <crypto/md5.h>
#include <crypto/chacha.h>
#include <crypto/poly1305.h>
#include <crypto/internal/aead.h>
#include <crypto/authenc.h>
#include <crypto/akcipher.h>
@@ -3,6 +3,7 @@
 * Controller-level driver, kernel property detection, initialization
 *
 * Copyright 2008-2012 Freescale Semiconductor, Inc.
 * Copyright 2018 NXP
 */

#include <linux/device.h>

@@ -106,7 +107,7 @@ static inline int run_descriptor_deco0(struct device *ctrldev, u32 *desc,
	struct caam_ctrl __iomem *ctrl = ctrlpriv->ctrl;
	struct caam_deco __iomem *deco = ctrlpriv->deco;
	unsigned int timeout = 100000;
	u32 deco_dbg_reg, flags;
	u32 deco_dbg_reg, deco_state, flags;
	int i;

@@ -149,13 +150,22 @@ static inline int run_descriptor_deco0(struct device *ctrldev, u32 *desc,
	timeout = 10000000;
	do {
		deco_dbg_reg = rd_reg32(&deco->desc_dbg);

		if (ctrlpriv->era < 10)
			deco_state = (deco_dbg_reg & DESC_DBG_DECO_STAT_MASK) >>
				     DESC_DBG_DECO_STAT_SHIFT;
		else
			deco_state = (rd_reg32(&deco->dbg_exec) &
				      DESC_DER_DECO_STAT_MASK) >>
				     DESC_DER_DECO_STAT_SHIFT;

		/*
		 * If an error occured in the descriptor, then
		 * the DECO status field will be set to 0x0D
		 */
		if ((deco_dbg_reg & DESC_DBG_DECO_STAT_MASK) ==
		    DESC_DBG_DECO_STAT_HOST_ERR)
		if (deco_state == DECO_STAT_HOST_ERR)
			break;

		cpu_relax();
	} while ((deco_dbg_reg & DESC_DBG_DECO_STAT_VALID) && --timeout);

@@ -491,7 +501,7 @@ static int caam_probe(struct platform_device *pdev)
	struct caam_perfmon *perfmon;
#endif
	u32 scfgr, comp_params;
	u32 cha_vid_ls;
	u8 rng_vid;
	int pg_size;
	int BLOCK_OFFSET = 0;

@@ -733,15 +743,19 @@ static int caam_probe(struct platform_device *pdev)
		goto caam_remove;
	}

	cha_vid_ls = rd_reg32(&ctrl->perfmon.cha_id_ls);
	if (ctrlpriv->era < 10)
		rng_vid = (rd_reg32(&ctrl->perfmon.cha_id_ls) &
			   CHA_ID_LS_RNG_MASK) >> CHA_ID_LS_RNG_SHIFT;
	else
		rng_vid = (rd_reg32(&ctrl->vreg.rng) & CHA_VER_VID_MASK) >>
			  CHA_VER_VID_SHIFT;

	/*
	 * If SEC has RNG version >= 4 and RNG state handle has not been
	 * already instantiated, do RNG instantiation
	 * In case of SoCs with Management Complex, RNG is managed by MC f/w.
	 */
	if (!ctrlpriv->mc_en &&
	    (cha_vid_ls & CHA_ID_LS_RNG_MASK) >> CHA_ID_LS_RNG_SHIFT >= 4) {
	if (!ctrlpriv->mc_en && rng_vid >= 4) {
		ctrlpriv->rng4_sh_init =
			rd_reg32(&ctrl->r4tst[0].rdsta);
		/*
@@ -4,6 +4,7 @@
 * Definitions to support CAAM descriptor instruction generation
 *
 * Copyright 2008-2011 Freescale Semiconductor, Inc.
 * Copyright 2018 NXP
 */

#ifndef DESC_H

@@ -242,6 +243,7 @@
#define LDST_SRCDST_WORD_DESCBUF_SHARED	(0x42 << LDST_SRCDST_SHIFT)
#define LDST_SRCDST_WORD_DESCBUF_JOB_WE	(0x45 << LDST_SRCDST_SHIFT)
#define LDST_SRCDST_WORD_DESCBUF_SHARED_WE (0x46 << LDST_SRCDST_SHIFT)
#define LDST_SRCDST_WORD_INFO_FIFO_SM	(0x71 << LDST_SRCDST_SHIFT)
#define LDST_SRCDST_WORD_INFO_FIFO	(0x7a << LDST_SRCDST_SHIFT)

/* Offset in source/destination */

@@ -284,6 +286,12 @@
#define LDLEN_SET_OFIFO_OFFSET_SHIFT	0
#define LDLEN_SET_OFIFO_OFFSET_MASK	(3 << LDLEN_SET_OFIFO_OFFSET_SHIFT)

/* Special Length definitions when dst=sm, nfifo-{sm,m} */
#define LDLEN_MATH0	0
#define LDLEN_MATH1	1
#define LDLEN_MATH2	2
#define LDLEN_MATH3	3

/*
 * FIFO_LOAD/FIFO_STORE/SEQ_FIFO_LOAD/SEQ_FIFO_STORE
 * Command Constructs

@@ -408,6 +416,7 @@
#define FIFOST_TYPE_MESSAGE_DATA (0x30 << FIFOST_TYPE_SHIFT)
#define FIFOST_TYPE_RNGSTORE	 (0x34 << FIFOST_TYPE_SHIFT)
#define FIFOST_TYPE_RNGFIFO	 (0x35 << FIFOST_TYPE_SHIFT)
#define FIFOST_TYPE_METADATA	 (0x3e << FIFOST_TYPE_SHIFT)
#define FIFOST_TYPE_SKIP	 (0x3f << FIFOST_TYPE_SHIFT)

/*

@@ -1133,6 +1142,12 @@
#define OP_ALG_TYPE_CLASS1	(2 << OP_ALG_TYPE_SHIFT)
#define OP_ALG_TYPE_CLASS2	(4 << OP_ALG_TYPE_SHIFT)

/* version register fields */
#define OP_VER_CCHA_NUM	0x000000ff /* Number CCHAs instantiated */
#define OP_VER_CCHA_MISC	0x0000ff00 /* CCHA Miscellaneous Information */
#define OP_VER_CCHA_REV	0x00ff0000 /* CCHA Revision Number */
#define OP_VER_CCHA_VID	0xff000000 /* CCHA Version ID */

#define OP_ALG_ALGSEL_SHIFT	16
#define OP_ALG_ALGSEL_MASK	(0xff << OP_ALG_ALGSEL_SHIFT)
#define OP_ALG_ALGSEL_SUBMASK	(0x0f << OP_ALG_ALGSEL_SHIFT)

@@ -1152,6 +1167,8 @@
#define OP_ALG_ALGSEL_KASUMI	(0x70 << OP_ALG_ALGSEL_SHIFT)
#define OP_ALG_ALGSEL_CRC	(0x90 << OP_ALG_ALGSEL_SHIFT)
#define OP_ALG_ALGSEL_SNOW_F9	(0xA0 << OP_ALG_ALGSEL_SHIFT)
#define OP_ALG_ALGSEL_CHACHA20	(0xD0 << OP_ALG_ALGSEL_SHIFT)
#define OP_ALG_ALGSEL_POLY1305	(0xE0 << OP_ALG_ALGSEL_SHIFT)

#define OP_ALG_AAI_SHIFT	4
#define OP_ALG_AAI_MASK		(0x1ff << OP_ALG_AAI_SHIFT)

@@ -1199,6 +1216,11 @@
#define OP_ALG_AAI_RNG4_AI	(0x80 << OP_ALG_AAI_SHIFT)
#define OP_ALG_AAI_RNG4_SK	(0x100 << OP_ALG_AAI_SHIFT)

/* Chacha20 AAI set */
#define OP_ALG_AAI_AEAD		(0x002 << OP_ALG_AAI_SHIFT)
#define OP_ALG_AAI_KEYSTREAM	(0x001 << OP_ALG_AAI_SHIFT)
#define OP_ALG_AAI_BC8		(0x008 << OP_ALG_AAI_SHIFT)

/* hmac/smac AAI set */
#define OP_ALG_AAI_HASH		(0x00 << OP_ALG_AAI_SHIFT)
#define OP_ALG_AAI_HMAC		(0x01 << OP_ALG_AAI_SHIFT)

@@ -1387,6 +1409,7 @@
#define MOVE_SRC_MATH3		(0x07 << MOVE_SRC_SHIFT)
#define MOVE_SRC_INFIFO		(0x08 << MOVE_SRC_SHIFT)
#define MOVE_SRC_INFIFO_CL	(0x09 << MOVE_SRC_SHIFT)
#define MOVE_SRC_AUX_ABLK	(0x0a << MOVE_SRC_SHIFT)

#define MOVE_DEST_SHIFT		16
#define MOVE_DEST_MASK		(0x0f << MOVE_DEST_SHIFT)

@@ -1413,6 +1436,10 @@

#define MOVELEN_MRSEL_SHIFT	0
#define MOVELEN_MRSEL_MASK	(0x3 << MOVE_LEN_SHIFT)
#define MOVELEN_MRSEL_MATH0	(0 << MOVELEN_MRSEL_SHIFT)
#define MOVELEN_MRSEL_MATH1	(1 << MOVELEN_MRSEL_SHIFT)
#define MOVELEN_MRSEL_MATH2	(2 << MOVELEN_MRSEL_SHIFT)
#define MOVELEN_MRSEL_MATH3	(3 << MOVELEN_MRSEL_SHIFT)

/*
 * MATH Command Constructs

@@ -1589,6 +1616,7 @@
#define NFIFOENTRY_DTYPE_IV	(0x2 << NFIFOENTRY_DTYPE_SHIFT)
#define NFIFOENTRY_DTYPE_SAD	(0x3 << NFIFOENTRY_DTYPE_SHIFT)
#define NFIFOENTRY_DTYPE_ICV	(0xA << NFIFOENTRY_DTYPE_SHIFT)
#define NFIFOENTRY_DTYPE_POLY	(0xB << NFIFOENTRY_DTYPE_SHIFT)
#define NFIFOENTRY_DTYPE_SKIP	(0xE << NFIFOENTRY_DTYPE_SHIFT)
#define NFIFOENTRY_DTYPE_MSG	(0xF << NFIFOENTRY_DTYPE_SHIFT)
@@ -189,6 +189,7 @@ static inline u32 *append_##cmd(u32 * const desc, u32 options) \
}
APPEND_CMD_RET(jump, JUMP)
APPEND_CMD_RET(move, MOVE)
APPEND_CMD_RET(move_len, MOVE_LEN)

static inline void set_jump_tgt_here(u32 * const desc, u32 *jump_cmd)
{

@@ -327,7 +328,11 @@ static inline void append_##cmd##_imm_##type(u32 * const desc, type immediate, \
					     u32 options) \
{ \
	PRINT_POS; \
	append_cmd(desc, CMD_##op | IMMEDIATE | options | sizeof(type)); \
	if (options & LDST_LEN_MASK) \
		append_cmd(desc, CMD_##op | IMMEDIATE | options); \
	else \
		append_cmd(desc, CMD_##op | IMMEDIATE | options | \
			   sizeof(type)); \
	append_cmd(desc, immediate); \
}
APPEND_CMD_RAW_IMM(load, LOAD, u32);
@@ -3,6 +3,7 @@
 * CAAM hardware register-level view
 *
 * Copyright 2008-2011 Freescale Semiconductor, Inc.
 * Copyright 2018 NXP
 */

#ifndef REGS_H

@@ -211,6 +212,47 @@ struct jr_outentry {
	u32 jrstatus;	/* Status for completed descriptor */
} __packed;

/* Version registers (Era 10+) e80-eff */
struct version_regs {
	u32 crca;	/* CRCA_VERSION */
	u32 afha;	/* AFHA_VERSION */
	u32 kfha;	/* KFHA_VERSION */
	u32 pkha;	/* PKHA_VERSION */
	u32 aesa;	/* AESA_VERSION */
	u32 mdha;	/* MDHA_VERSION */
	u32 desa;	/* DESA_VERSION */
	u32 snw8a;	/* SNW8A_VERSION */
	u32 snw9a;	/* SNW9A_VERSION */
	u32 zuce;	/* ZUCE_VERSION */
	u32 zuca;	/* ZUCA_VERSION */
	u32 ccha;	/* CCHA_VERSION */
	u32 ptha;	/* PTHA_VERSION */
	u32 rng;	/* RNG_VERSION */
	u32 trng;	/* TRNG_VERSION */
	u32 aaha;	/* AAHA_VERSION */
	u32 rsvd[10];
	u32 sr;		/* SR_VERSION */
	u32 dma;	/* DMA_VERSION */
	u32 ai;		/* AI_VERSION */
	u32 qi;		/* QI_VERSION */
	u32 jr;		/* JR_VERSION */
	u32 deco;	/* DECO_VERSION */
};

/* Version registers bitfields */

/* Number of CHAs instantiated */
#define CHA_VER_NUM_MASK	0xffull
/* CHA Miscellaneous Information */
#define CHA_VER_MISC_SHIFT	8
#define CHA_VER_MISC_MASK	(0xffull << CHA_VER_MISC_SHIFT)
/* CHA Revision Number */
#define CHA_VER_REV_SHIFT	16
#define CHA_VER_REV_MASK	(0xffull << CHA_VER_REV_SHIFT)
/* CHA Version ID */
#define CHA_VER_VID_SHIFT	24
#define CHA_VER_VID_MASK	(0xffull << CHA_VER_VID_SHIFT)

/*
 * caam_perfmon - Performance Monitor/Secure Memory Status/
 *                CAAM Global Status/Component Version IDs

@@ -223,15 +265,13 @@ struct jr_outentry {
#define CHA_NUM_MS_DECONUM_MASK	(0xfull << CHA_NUM_MS_DECONUM_SHIFT)

/*
 * CHA version IDs / instantiation bitfields
 * CHA version IDs / instantiation bitfields (< Era 10)
 * Defined for use with the cha_id fields in perfmon, but the same shift/mask
 * selectors can be used to pull out the number of instantiated blocks within
 * cha_num fields in perfmon because the locations are the same.
 */
#define CHA_ID_LS_AES_SHIFT	0
#define CHA_ID_LS_AES_MASK	(0xfull << CHA_ID_LS_AES_SHIFT)
#define CHA_ID_LS_AES_LP	(0x3ull << CHA_ID_LS_AES_SHIFT)
#define CHA_ID_LS_AES_HP	(0x4ull << CHA_ID_LS_AES_SHIFT)

#define CHA_ID_LS_DES_SHIFT	4
#define CHA_ID_LS_DES_MASK	(0xfull << CHA_ID_LS_DES_SHIFT)

@@ -241,9 +281,6 @@ struct jr_outentry {

#define CHA_ID_LS_MD_SHIFT	12
#define CHA_ID_LS_MD_MASK	(0xfull << CHA_ID_LS_MD_SHIFT)
#define CHA_ID_LS_MD_LP256	(0x0ull << CHA_ID_LS_MD_SHIFT)
#define CHA_ID_LS_MD_LP512	(0x1ull << CHA_ID_LS_MD_SHIFT)
#define CHA_ID_LS_MD_HP	(0x2ull << CHA_ID_LS_MD_SHIFT)

#define CHA_ID_LS_RNG_SHIFT	16
#define CHA_ID_LS_RNG_MASK	(0xfull << CHA_ID_LS_RNG_SHIFT)

@@ -269,6 +306,13 @@ struct jr_outentry {
#define CHA_ID_MS_JR_SHIFT	28
#define CHA_ID_MS_JR_MASK	(0xfull << CHA_ID_MS_JR_SHIFT)

/* Specific CHA version IDs */
#define CHA_VER_VID_AES_LP	0x3ull
#define CHA_VER_VID_AES_HP	0x4ull
#define CHA_VER_VID_MD_LP256	0x0ull
#define CHA_VER_VID_MD_LP512	0x1ull
#define CHA_VER_VID_MD_HP	0x2ull

struct sec_vid {
	u16 ip_id;
	u8 maj_rev;

@@ -479,8 +523,10 @@ struct caam_ctrl {
		struct rng4tst r4tst[2];
	};

	u32 rsvd9[448];
	u32 rsvd9[416];

	/* Version registers - introduced with era 10	e80-eff */
	struct version_regs vreg;
	/* Performance Monitor				f00-fff */
	struct caam_perfmon perfmon;
};

@@ -570,8 +616,10 @@ struct caam_job_ring {
	u32 rsvd11;
	u32 jrcommand;	/* JRCRx - JobR command */

	u32 rsvd12[932];
	u32 rsvd12[900];

	/* Version registers - introduced with era 10	e80-eff */
	struct version_regs vreg;
	/* Performance Monitor				f00-fff */
	struct caam_perfmon perfmon;
};

@@ -878,13 +926,19 @@ struct caam_deco {
	u32 rsvd29[48];
	u32 descbuf[64];	/* DxDESB - Descriptor buffer */
	u32 rscvd30[193];
#define DESC_DBG_DECO_STAT_HOST_ERR	0x00D00000
#define DESC_DBG_DECO_STAT_VALID	0x80000000
#define DESC_DBG_DECO_STAT_MASK		0x00F00000
#define DESC_DBG_DECO_STAT_SHIFT	20
	u32 desc_dbg;		/* DxDDR - DECO Debug Register */
	u32 rsvd31[126];
	u32 rsvd31[13];
#define DESC_DER_DECO_STAT_MASK		0x000F0000
#define DESC_DER_DECO_STAT_SHIFT	16
	u32 dbg_exec;		/* DxDER - DECO Debug Exec Register */
	u32 rsvd32[112];
};

#define DECO_STAT_HOST_ERR	0xD

#define DECO_JQCR_WHL		0x20000000
#define DECO_JQCR_FOUR		0x10000000
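The new version_regs block occupies offsets 0xe80-0xeff, i.e. 128 bytes or 32 u32 words, which is exactly why rsvd9 shrinks from 448 to 416 words and rsvd12 from 932 to 900 in the hunks above. A user-space mirror of the struct with a compile-time size check (uint32_t standing in for the kernel's u32):

#include <stdint.h>

/* Mirror of the era 10+ version_regs block declared above */
struct version_regs {
	uint32_t crca, afha, kfha, pkha, aesa, mdha, desa, snw8a, snw9a;
	uint32_t zuce, zuca, ccha, ptha, rng, trng, aaha;
	uint32_t rsvd[10];
	uint32_t sr, dma, ai, qi, jr, deco;
};

/* 0xe80-0xeff is 128 bytes: 448 - 416 = 932 - 900 = 32 words */
_Static_assert(sizeof(struct version_regs) == 32 * sizeof(uint32_t),
	       "version_regs must cover 0xe80-0xeff (128 bytes)");

int main(void) { return 0; }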
@@ -6,7 +6,10 @@ n5pf-objs := nitrox_main.o \
		nitrox_lib.o \
		nitrox_hal.o \
		nitrox_reqmgr.o \
		nitrox_algs.o
		nitrox_algs.o \
		nitrox_mbx.o \
		nitrox_skcipher.o \
		nitrox_aead.o

n5pf-$(CONFIG_PCI_IOV) += nitrox_sriov.o
n5pf-$(CONFIG_DEBUG_FS) += nitrox_debugfs.o
drivers/crypto/cavium/nitrox/nitrox_aead.c (new file, 364 lines)
@@ -0,0 +1,364 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/kernel.h>
#include <linux/printk.h>
#include <linux/crypto.h>
#include <linux/rtnetlink.h>

#include <crypto/aead.h>
#include <crypto/authenc.h>
#include <crypto/des.h>
#include <crypto/sha.h>
#include <crypto/internal/aead.h>
#include <crypto/scatterwalk.h>
#include <crypto/gcm.h>

#include "nitrox_dev.h"
#include "nitrox_common.h"
#include "nitrox_req.h"

#define GCM_AES_SALT_SIZE	4

/**
 * struct nitrox_crypt_params - Params to set nitrox crypto request.
 * @cryptlen: Encryption/Decryption data length
 * @authlen: Assoc data length + Cryptlen
 * @srclen: Input buffer length
 * @dstlen: Output buffer length
 * @iv: IV data
 * @ivsize: IV data length
 * @ctrl_arg: Identifies the request type (ENCRYPT/DECRYPT)
 */
struct nitrox_crypt_params {
	unsigned int cryptlen;
	unsigned int authlen;
	unsigned int srclen;
	unsigned int dstlen;
	u8 *iv;
	int ivsize;
	u8 ctrl_arg;
};

union gph_p3 {
	struct {
#ifdef __BIG_ENDIAN_BITFIELD
		u16 iv_offset : 8;
		u16 auth_offset : 8;
#else
		u16 auth_offset : 8;
		u16 iv_offset : 8;
#endif
	};
	u16 param;
};
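Endianness is the whole point of the union above: param3 is built from two byte-wide fields and then byte-swapped as a unit by cpu_to_be16() before the hardware sees it. A user-space mirror, assuming a little-endian host (uint16_t standing in for u16):

#include <stdint.h>
#include <stdio.h>

/* Little-endian layout of gph_p3: auth_offset is the low byte,
 * iv_offset the high byte of the 16-bit param. */
union gph_p3 {
	struct {
		uint16_t auth_offset : 8;
		uint16_t iv_offset : 8;
	};
	uint16_t param;
};

int main(void)
{
	union gph_p3 p3 = { .param = 0 };

	p3.iv_offset = 0;	/* IV leads the request buffer */
	p3.auth_offset = 8;	/* AAD starts after the 8-byte GCM IV */

	/* on little-endian hosts this prints 0x0800 */
	printf("param3 = 0x%04x\n", p3.param);
	return 0;
}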

static int nitrox_aes_gcm_setkey(struct crypto_aead *aead, const u8 *key,
				 unsigned int keylen)
{
	int aes_keylen;
	struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
	struct flexi_crypto_context *fctx;
	union fc_ctx_flags flags;

	aes_keylen = flexi_aes_keylen(keylen);
	if (aes_keylen < 0) {
		crypto_aead_set_flags(aead, CRYPTO_TFM_RES_BAD_KEY_LEN);
		return -EINVAL;
	}

	/* fill crypto context */
	fctx = nctx->u.fctx;
	flags.f = be64_to_cpu(fctx->flags.f);
	flags.w0.aes_keylen = aes_keylen;
	fctx->flags.f = cpu_to_be64(flags.f);

	/* copy enc key to context */
	memset(&fctx->crypto, 0, sizeof(fctx->crypto));
	memcpy(fctx->crypto.u.key, key, keylen);

	return 0;
}

static int nitrox_aead_setauthsize(struct crypto_aead *aead,
				   unsigned int authsize)
{
	struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
	struct flexi_crypto_context *fctx = nctx->u.fctx;
	union fc_ctx_flags flags;

	flags.f = be64_to_cpu(fctx->flags.f);
	flags.w0.mac_len = authsize;
	fctx->flags.f = cpu_to_be64(flags.f);

	aead->authsize = authsize;

	return 0;
}

static int alloc_src_sglist(struct aead_request *areq, char *iv, int ivsize,
			    int buflen)
{
	struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);
	int nents = sg_nents_for_len(areq->src, buflen) + 1;
	int ret;

	if (nents < 0)
		return nents;

	/* Allocate buffer to hold IV and input scatterlist array */
	ret = alloc_src_req_buf(nkreq, nents, ivsize);
	if (ret)
		return ret;

	nitrox_creq_copy_iv(nkreq->src, iv, ivsize);
	nitrox_creq_set_src_sg(nkreq, nents, ivsize, areq->src, buflen);

	return 0;
}

static int alloc_dst_sglist(struct aead_request *areq, int ivsize, int buflen)
{
	struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);
	int nents = sg_nents_for_len(areq->dst, buflen) + 3;
	int ret;

	if (nents < 0)
		return nents;

	/* Allocate buffer to hold ORH, COMPLETION and output scatterlist
	 * array
	 */
	ret = alloc_dst_req_buf(nkreq, nents);
	if (ret)
		return ret;

	nitrox_creq_set_orh(nkreq);
	nitrox_creq_set_comp(nkreq);
	nitrox_creq_set_dst_sg(nkreq, nents, ivsize, areq->dst, buflen);

	return 0;
}

static void free_src_sglist(struct aead_request *areq)
{
	struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);

	kfree(nkreq->src);
}

static void free_dst_sglist(struct aead_request *areq)
{
	struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);

	kfree(nkreq->dst);
}

static int nitrox_set_creq(struct aead_request *areq,
			   struct nitrox_crypt_params *params)
{
	struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);
	struct se_crypto_request *creq = &nkreq->creq;
	struct crypto_aead *aead = crypto_aead_reqtfm(areq);
	union gph_p3 param3;
	struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
	int ret;

	creq->flags = areq->base.flags;
	creq->gfp = (areq->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
		GFP_KERNEL : GFP_ATOMIC;

	creq->ctrl.value = 0;
	creq->opcode = FLEXI_CRYPTO_ENCRYPT_HMAC;
	creq->ctrl.s.arg = params->ctrl_arg;

	creq->gph.param0 = cpu_to_be16(params->cryptlen);
	creq->gph.param1 = cpu_to_be16(params->authlen);
	creq->gph.param2 = cpu_to_be16(params->ivsize + areq->assoclen);
	param3.iv_offset = 0;
	param3.auth_offset = params->ivsize;
	creq->gph.param3 = cpu_to_be16(param3.param);

	creq->ctx_handle = nctx->u.ctx_handle;
	creq->ctrl.s.ctxl = sizeof(struct flexi_crypto_context);

	ret = alloc_src_sglist(areq, params->iv, params->ivsize,
			       params->srclen);
	if (ret)
		return ret;

	ret = alloc_dst_sglist(areq, params->ivsize, params->dstlen);
	if (ret) {
		free_src_sglist(areq);
		return ret;
	}

	return 0;
}

static void nitrox_aead_callback(void *arg, int err)
{
	struct aead_request *areq = arg;

	free_src_sglist(areq);
	free_dst_sglist(areq);
	if (err) {
		pr_err_ratelimited("request failed status 0x%0x\n", err);
		err = -EINVAL;
	}

	areq->base.complete(&areq->base, err);
}

static int nitrox_aes_gcm_enc(struct aead_request *areq)
{
	struct crypto_aead *aead = crypto_aead_reqtfm(areq);
	struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
	struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);
	struct se_crypto_request *creq = &nkreq->creq;
	struct flexi_crypto_context *fctx = nctx->u.fctx;
	struct nitrox_crypt_params params;
	int ret;

	memcpy(fctx->crypto.iv, areq->iv, GCM_AES_SALT_SIZE);

	memset(&params, 0, sizeof(params));
	params.cryptlen = areq->cryptlen;
	params.authlen = areq->assoclen + params.cryptlen;
	params.srclen = params.authlen;
	params.dstlen = params.srclen + aead->authsize;
	params.iv = &areq->iv[GCM_AES_SALT_SIZE];
	params.ivsize = GCM_AES_IV_SIZE - GCM_AES_SALT_SIZE;
	params.ctrl_arg = ENCRYPT;
	ret = nitrox_set_creq(areq, &params);
	if (ret)
		return ret;

	/* send the crypto request */
	return nitrox_process_se_request(nctx->ndev, creq, nitrox_aead_callback,
					 areq);
}

static int nitrox_aes_gcm_dec(struct aead_request *areq)
{
	struct crypto_aead *aead = crypto_aead_reqtfm(areq);
	struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
	struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);
	struct se_crypto_request *creq = &nkreq->creq;
	struct flexi_crypto_context *fctx = nctx->u.fctx;
	struct nitrox_crypt_params params;
	int ret;

	memcpy(fctx->crypto.iv, areq->iv, GCM_AES_SALT_SIZE);

	memset(&params, 0, sizeof(params));
	params.cryptlen = areq->cryptlen - aead->authsize;
	params.authlen = areq->assoclen + params.cryptlen;
	params.srclen = areq->cryptlen + areq->assoclen;
	params.dstlen = params.srclen - aead->authsize;
	params.iv = &areq->iv[GCM_AES_SALT_SIZE];
	params.ivsize = GCM_AES_IV_SIZE - GCM_AES_SALT_SIZE;
	params.ctrl_arg = DECRYPT;
	ret = nitrox_set_creq(areq, &params);
	if (ret)
		return ret;

	/* send the crypto request */
	return nitrox_process_se_request(nctx->ndev, creq, nitrox_aead_callback,
					 areq);
}
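The decrypt path derives its four lengths from areq->cryptlen (which includes the tag) and areq->assoclen. A worked example of that arithmetic with hypothetical sizes (16-byte AAD, 48-byte plaintext, 16-byte ICV):

#include <assert.h>

#define TAG 16	/* authsize for gcm(aes) with a full tag */

int main(void)
{
	/* hypothetical decrypt request: areq->assoclen = 16, and
	 * areq->cryptlen = 64 covers ciphertext plus the 16-byte tag */
	unsigned int assoclen = 16, cryptlen_in = 64;

	unsigned int cryptlen = cryptlen_in - TAG;	/* 48 plaintext bytes */
	unsigned int authlen  = assoclen + cryptlen;	/* 64 authenticated */
	unsigned int srclen   = cryptlen_in + assoclen;	/* 80: AAD+CT+tag */
	unsigned int dstlen   = srclen - TAG;		/* 64: AAD+plaintext */

	assert(cryptlen == 48 && authlen == 64 && srclen == 80 && dstlen == 64);
	return 0;
}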

static int nitrox_aead_init(struct crypto_aead *aead)
{
	struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
	struct crypto_ctx_hdr *chdr;

	/* get the first device */
	nctx->ndev = nitrox_get_first_device();
	if (!nctx->ndev)
		return -ENODEV;

	/* allocate nitrox crypto context */
	chdr = crypto_alloc_context(nctx->ndev);
	if (!chdr) {
		nitrox_put_device(nctx->ndev);
		return -ENOMEM;
	}
	nctx->chdr = chdr;
	nctx->u.ctx_handle = (uintptr_t)((u8 *)chdr->vaddr +
					 sizeof(struct ctx_hdr));
	nctx->u.fctx->flags.f = 0;

	return 0;
}

static int nitrox_aes_gcm_init(struct crypto_aead *aead)
{
	int ret;
	struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
	union fc_ctx_flags *flags;

	ret = nitrox_aead_init(aead);
	if (ret)
		return ret;

	flags = &nctx->u.fctx->flags;
	flags->w0.cipher_type = CIPHER_AES_GCM;
	flags->w0.hash_type = AUTH_NULL;
	flags->w0.iv_source = IV_FROM_DPTR;
	/* ask microcode to calculate ipad/opad */
	flags->w0.auth_input_type = 1;
	flags->f = be64_to_cpu(flags->f);

	crypto_aead_set_reqsize(aead, sizeof(struct aead_request) +
				sizeof(struct nitrox_kcrypt_request));

	return 0;
}

static void nitrox_aead_exit(struct crypto_aead *aead)
{
	struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);

	/* free the nitrox crypto context */
	if (nctx->u.ctx_handle) {
		struct flexi_crypto_context *fctx = nctx->u.fctx;

		memzero_explicit(&fctx->crypto, sizeof(struct crypto_keys));
		memzero_explicit(&fctx->auth, sizeof(struct auth_keys));
		crypto_free_context((void *)nctx->chdr);
	}
	nitrox_put_device(nctx->ndev);

	nctx->u.ctx_handle = 0;
	nctx->ndev = NULL;
}

static struct aead_alg nitrox_aeads[] = { {
	.base = {
		.cra_name = "gcm(aes)",
		.cra_driver_name = "n5_aes_gcm",
		.cra_priority = PRIO,
		.cra_flags = CRYPTO_ALG_ASYNC,
		.cra_blocksize = AES_BLOCK_SIZE,
		.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
		.cra_alignmask = 0,
		.cra_module = THIS_MODULE,
	},
	.setkey = nitrox_aes_gcm_setkey,
	.setauthsize = nitrox_aead_setauthsize,
	.encrypt = nitrox_aes_gcm_enc,
	.decrypt = nitrox_aes_gcm_dec,
	.init = nitrox_aes_gcm_init,
	.exit = nitrox_aead_exit,
	.ivsize = GCM_AES_IV_SIZE,
	.maxauthsize = AES_BLOCK_SIZE,
} };

int nitrox_register_aeads(void)
{
	return crypto_register_aeads(nitrox_aeads, ARRAY_SIZE(nitrox_aeads));
}

void nitrox_unregister_aeads(void)
{
	crypto_unregister_aeads(nitrox_aeads, ARRAY_SIZE(nitrox_aeads));
}
@@ -1,458 +1,24 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/crypto.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/printk.h>

#include <crypto/aes.h>
#include <crypto/skcipher.h>
#include <crypto/ctr.h>
#include <crypto/des.h>
#include <crypto/xts.h>

#include "nitrox_dev.h"
#include "nitrox_common.h"
#include "nitrox_req.h"

#define PRIO 4001

struct nitrox_cipher {
	const char *name;
	enum flexi_cipher value;
};

/**
 * supported cipher list
 */
static const struct nitrox_cipher flexi_cipher_table[] = {
	{ "null",		CIPHER_NULL },
	{ "cbc(des3_ede)",	CIPHER_3DES_CBC },
	{ "ecb(des3_ede)",	CIPHER_3DES_ECB },
	{ "cbc(aes)",		CIPHER_AES_CBC },
	{ "ecb(aes)",		CIPHER_AES_ECB },
	{ "cfb(aes)",		CIPHER_AES_CFB },
	{ "rfc3686(ctr(aes))",	CIPHER_AES_CTR },
	{ "xts(aes)",		CIPHER_AES_XTS },
	{ "cts(cbc(aes))",	CIPHER_AES_CBC_CTS },
	{ NULL,			CIPHER_INVALID }
};

static enum flexi_cipher flexi_cipher_type(const char *name)
{
	const struct nitrox_cipher *cipher = flexi_cipher_table;

	while (cipher->name) {
		if (!strcmp(cipher->name, name))
			break;
		cipher++;
	}
	return cipher->value;
}
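The table walk above relies on the NULL-name sentinel row to yield CIPHER_INVALID for unknown algorithm names. A self-contained user-space rendition with a stub enum (the driver's real flexi_cipher values live in nitrox_req.h):

#include <stdio.h>
#include <string.h>

/* stub enum standing in for the driver's flexi_cipher values */
enum flexi_cipher { CIPHER_NULL, CIPHER_AES_CBC, CIPHER_AES_XTS,
		    CIPHER_INVALID };

static const struct { const char *name; enum flexi_cipher value; } tbl[] = {
	{ "cbc(aes)", CIPHER_AES_CBC },
	{ "xts(aes)", CIPHER_AES_XTS },
	{ NULL,       CIPHER_INVALID }	/* sentinel row */
};

static enum flexi_cipher lookup(const char *name)
{
	int i = 0;

	/* stop on a match or on the sentinel (name == NULL) */
	while (tbl[i].name && strcmp(tbl[i].name, name))
		i++;
	return tbl[i].value;
}

int main(void)
{
	printf("%d %d\n", lookup("cbc(aes)") == CIPHER_AES_CBC,
	       lookup("gcm(aes)") == CIPHER_INVALID);	/* prints "1 1" */
	return 0;
}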

static int flexi_aes_keylen(int keylen)
{
	int aes_keylen;

	switch (keylen) {
	case AES_KEYSIZE_128:
		aes_keylen = 1;
		break;
	case AES_KEYSIZE_192:
		aes_keylen = 2;
		break;
	case AES_KEYSIZE_256:
		aes_keylen = 3;
		break;
	default:
		aes_keylen = -EINVAL;
		break;
	}
	return aes_keylen;
}

static int nitrox_skcipher_init(struct crypto_skcipher *tfm)
{
	struct nitrox_crypto_ctx *nctx = crypto_skcipher_ctx(tfm);
	void *fctx;

	/* get the first device */
	nctx->ndev = nitrox_get_first_device();
	if (!nctx->ndev)
		return -ENODEV;

	/* allocate nitrox crypto context */
	fctx = crypto_alloc_context(nctx->ndev);
	if (!fctx) {
		nitrox_put_device(nctx->ndev);
		return -ENOMEM;
	}
	nctx->u.ctx_handle = (uintptr_t)fctx;
	crypto_skcipher_set_reqsize(tfm, crypto_skcipher_reqsize(tfm) +
				    sizeof(struct nitrox_kcrypt_request));
	return 0;
}

static void nitrox_skcipher_exit(struct crypto_skcipher *tfm)
{
	struct nitrox_crypto_ctx *nctx = crypto_skcipher_ctx(tfm);

	/* free the nitrox crypto context */
	if (nctx->u.ctx_handle) {
		struct flexi_crypto_context *fctx = nctx->u.fctx;

		memset(&fctx->crypto, 0, sizeof(struct crypto_keys));
		memset(&fctx->auth, 0, sizeof(struct auth_keys));
		crypto_free_context((void *)fctx);
	}
	nitrox_put_device(nctx->ndev);

	nctx->u.ctx_handle = 0;
	nctx->ndev = NULL;
}

static inline int nitrox_skcipher_setkey(struct crypto_skcipher *cipher,
					 int aes_keylen, const u8 *key,
					 unsigned int keylen)
{
	struct crypto_tfm *tfm = crypto_skcipher_tfm(cipher);
	struct nitrox_crypto_ctx *nctx = crypto_tfm_ctx(tfm);
	struct flexi_crypto_context *fctx;
	enum flexi_cipher cipher_type;
	const char *name;

	name = crypto_tfm_alg_name(tfm);
	cipher_type = flexi_cipher_type(name);
	if (unlikely(cipher_type == CIPHER_INVALID)) {
		pr_err("unsupported cipher: %s\n", name);
		return -EINVAL;
	}

	/* fill crypto context */
	fctx = nctx->u.fctx;
	fctx->flags = 0;
	fctx->w0.cipher_type = cipher_type;
	fctx->w0.aes_keylen = aes_keylen;
	fctx->w0.iv_source = IV_FROM_DPTR;
	fctx->flags = cpu_to_be64(*(u64 *)&fctx->w0);
	/* copy the key to context */
	memcpy(fctx->crypto.u.key, key, keylen);

	return 0;
}

static int nitrox_aes_setkey(struct crypto_skcipher *cipher, const u8 *key,
			     unsigned int keylen)
{
	int aes_keylen;

	aes_keylen = flexi_aes_keylen(keylen);
	if (aes_keylen < 0) {
		crypto_skcipher_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN);
		return -EINVAL;
	}
	return nitrox_skcipher_setkey(cipher, aes_keylen, key, keylen);
}

static void nitrox_skcipher_callback(struct skcipher_request *skreq,
				     int err)
{
	if (err) {
		pr_err_ratelimited("request failed status 0x%0x\n", err);
		err = -EINVAL;
	}
	skcipher_request_complete(skreq, err);
}

static int nitrox_skcipher_crypt(struct skcipher_request *skreq, bool enc)
{
	struct crypto_skcipher *cipher = crypto_skcipher_reqtfm(skreq);
	struct nitrox_crypto_ctx *nctx = crypto_skcipher_ctx(cipher);
	struct nitrox_kcrypt_request *nkreq = skcipher_request_ctx(skreq);
	int ivsize = crypto_skcipher_ivsize(cipher);
	struct se_crypto_request *creq;

	creq = &nkreq->creq;
	creq->flags = skreq->base.flags;
	creq->gfp = (skreq->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
		GFP_KERNEL : GFP_ATOMIC;

	/* fill the request */
	creq->ctrl.value = 0;
	creq->opcode = FLEXI_CRYPTO_ENCRYPT_HMAC;
	creq->ctrl.s.arg = (enc ? ENCRYPT : DECRYPT);
	/* param0: length of the data to be encrypted */
	creq->gph.param0 = cpu_to_be16(skreq->cryptlen);
	creq->gph.param1 = 0;
	/* param2: encryption data offset */
	creq->gph.param2 = cpu_to_be16(ivsize);
	creq->gph.param3 = 0;

	creq->ctx_handle = nctx->u.ctx_handle;
	creq->ctrl.s.ctxl = sizeof(struct flexi_crypto_context);

	/* copy the iv */
	memcpy(creq->iv, skreq->iv, ivsize);
	creq->ivsize = ivsize;
	creq->src = skreq->src;
	creq->dst = skreq->dst;

	nkreq->nctx = nctx;
	nkreq->skreq = skreq;

	/* send the crypto request */
	return nitrox_process_se_request(nctx->ndev, creq,
					 nitrox_skcipher_callback, skreq);
}

static int nitrox_aes_encrypt(struct skcipher_request *skreq)
{
	return nitrox_skcipher_crypt(skreq, true);
}

static int nitrox_aes_decrypt(struct skcipher_request *skreq)
{
	return nitrox_skcipher_crypt(skreq, false);
}

static int nitrox_3des_setkey(struct crypto_skcipher *cipher,
			      const u8 *key, unsigned int keylen)
{
	if (keylen != DES3_EDE_KEY_SIZE) {
		crypto_skcipher_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN);
		return -EINVAL;
	}

	return nitrox_skcipher_setkey(cipher, 0, key, keylen);
}

static int nitrox_3des_encrypt(struct skcipher_request *skreq)
{
	return nitrox_skcipher_crypt(skreq, true);
}

static int nitrox_3des_decrypt(struct skcipher_request *skreq)
{
	return nitrox_skcipher_crypt(skreq, false);
}

static int nitrox_aes_xts_setkey(struct crypto_skcipher *cipher,
				 const u8 *key, unsigned int keylen)
{
	struct crypto_tfm *tfm = crypto_skcipher_tfm(cipher);
	struct nitrox_crypto_ctx *nctx = crypto_tfm_ctx(tfm);
	struct flexi_crypto_context *fctx;
	int aes_keylen, ret;

	ret = xts_check_key(tfm, key, keylen);
	if (ret)
		return ret;

	keylen /= 2;

	aes_keylen = flexi_aes_keylen(keylen);
	if (aes_keylen < 0) {
		crypto_skcipher_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN);
		return -EINVAL;
	}

	fctx = nctx->u.fctx;
	/* copy KEY2 */
	memcpy(fctx->auth.u.key2, (key + keylen), keylen);

	return nitrox_skcipher_setkey(cipher, aes_keylen, key, keylen);
}
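xts(aes) hands the driver KEY1 || KEY2 as a single blob; after xts_check_key(), keylen is halved, the first half is programmed as the AES cipher key and the second half is copied into auth.u.key2 as the tweak key. A minimal user-space sketch of that split, with illustrative key material:

#include <assert.h>
#include <string.h>

int main(void)
{
	/* 64 bytes = two AES-256 keys; contents are illustrative only */
	unsigned char key[64];
	unsigned char key1[32], key2[32];
	unsigned int keylen = sizeof(key);

	memset(key, 0xab, sizeof(key));

	keylen /= 2;				/* per-key length: 32 */
	memcpy(key1, key, keylen);		/* cipher key (KEY1) */
	memcpy(key2, key + keylen, keylen);	/* the "copy KEY2" step */

	assert(keylen == 32);	/* maps to aes_keylen == 3 in the switch above */
	return 0;
}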
|
||||
static int nitrox_aes_ctr_rfc3686_setkey(struct crypto_skcipher *cipher,
|
||||
const u8 *key, unsigned int keylen)
|
||||
{
|
||||
struct crypto_tfm *tfm = crypto_skcipher_tfm(cipher);
|
||||
struct nitrox_crypto_ctx *nctx = crypto_tfm_ctx(tfm);
|
||||
struct flexi_crypto_context *fctx;
|
||||
int aes_keylen;
|
||||
|
||||
if (keylen < CTR_RFC3686_NONCE_SIZE)
|
||||
return -EINVAL;
|
||||
|
||||
fctx = nctx->u.fctx;
|
||||
|
||||
memcpy(fctx->crypto.iv, key + (keylen - CTR_RFC3686_NONCE_SIZE),
|
||||
CTR_RFC3686_NONCE_SIZE);
|
||||
|
||||
keylen -= CTR_RFC3686_NONCE_SIZE;
|
||||
|
||||
aes_keylen = flexi_aes_keylen(keylen);
|
||||
if (aes_keylen < 0) {
|
||||
crypto_skcipher_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN);
|
||||
return -EINVAL;
|
||||
}
|
||||
return nitrox_skcipher_setkey(cipher, aes_keylen, key, keylen);
|
||||
}

static struct skcipher_alg nitrox_skciphers[] = { {
	.base = {
		.cra_name = "cbc(aes)",
		.cra_driver_name = "n5_cbc(aes)",
		.cra_priority = PRIO,
		.cra_flags = CRYPTO_ALG_ASYNC,
		.cra_blocksize = AES_BLOCK_SIZE,
		.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
		.cra_alignmask = 0,
		.cra_module = THIS_MODULE,
	},
	.min_keysize = AES_MIN_KEY_SIZE,
	.max_keysize = AES_MAX_KEY_SIZE,
	.ivsize = AES_BLOCK_SIZE,
	.setkey = nitrox_aes_setkey,
	.encrypt = nitrox_aes_encrypt,
	.decrypt = nitrox_aes_decrypt,
	.init = nitrox_skcipher_init,
	.exit = nitrox_skcipher_exit,
}, {
	.base = {
		.cra_name = "ecb(aes)",
		.cra_driver_name = "n5_ecb(aes)",
		.cra_priority = PRIO,
		.cra_flags = CRYPTO_ALG_ASYNC,
		.cra_blocksize = AES_BLOCK_SIZE,
		.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
		.cra_alignmask = 0,
		.cra_module = THIS_MODULE,
	},
	.min_keysize = AES_MIN_KEY_SIZE,
	.max_keysize = AES_MAX_KEY_SIZE,
	.ivsize = AES_BLOCK_SIZE,
	.setkey = nitrox_aes_setkey,
	.encrypt = nitrox_aes_encrypt,
	.decrypt = nitrox_aes_decrypt,
	.init = nitrox_skcipher_init,
	.exit = nitrox_skcipher_exit,
}, {
	.base = {
		.cra_name = "cfb(aes)",
		.cra_driver_name = "n5_cfb(aes)",
		.cra_priority = PRIO,
		.cra_flags = CRYPTO_ALG_ASYNC,
		.cra_blocksize = AES_BLOCK_SIZE,
		.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
		.cra_alignmask = 0,
		.cra_module = THIS_MODULE,
	},
	.min_keysize = AES_MIN_KEY_SIZE,
	.max_keysize = AES_MAX_KEY_SIZE,
	.ivsize = AES_BLOCK_SIZE,
	.setkey = nitrox_aes_setkey,
	.encrypt = nitrox_aes_encrypt,
	.decrypt = nitrox_aes_decrypt,
	.init = nitrox_skcipher_init,
	.exit = nitrox_skcipher_exit,
}, {
	.base = {
		.cra_name = "xts(aes)",
		.cra_driver_name = "n5_xts(aes)",
		.cra_priority = PRIO,
		.cra_flags = CRYPTO_ALG_ASYNC,
		.cra_blocksize = AES_BLOCK_SIZE,
		.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
		.cra_alignmask = 0,
		.cra_module = THIS_MODULE,
	},
	.min_keysize = 2 * AES_MIN_KEY_SIZE,
	.max_keysize = 2 * AES_MAX_KEY_SIZE,
	.ivsize = AES_BLOCK_SIZE,
	.setkey = nitrox_aes_xts_setkey,
	.encrypt = nitrox_aes_encrypt,
	.decrypt = nitrox_aes_decrypt,
	.init = nitrox_skcipher_init,
	.exit = nitrox_skcipher_exit,
}, {
	.base = {
		.cra_name = "rfc3686(ctr(aes))",
		.cra_driver_name = "n5_rfc3686(ctr(aes))",
		.cra_priority = PRIO,
		.cra_flags = CRYPTO_ALG_ASYNC,
		.cra_blocksize = 1,
		.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
		.cra_alignmask = 0,
		.cra_module = THIS_MODULE,
	},
	.min_keysize = AES_MIN_KEY_SIZE + CTR_RFC3686_NONCE_SIZE,
	.max_keysize = AES_MAX_KEY_SIZE + CTR_RFC3686_NONCE_SIZE,
	.ivsize = CTR_RFC3686_IV_SIZE,
	.init = nitrox_skcipher_init,
	.exit = nitrox_skcipher_exit,
	.setkey = nitrox_aes_ctr_rfc3686_setkey,
	.encrypt = nitrox_aes_encrypt,
	.decrypt = nitrox_aes_decrypt,
}, {
	.base = {
		.cra_name = "cts(cbc(aes))",
		.cra_driver_name = "n5_cts(cbc(aes))",
		.cra_priority = PRIO,
		.cra_flags = CRYPTO_ALG_ASYNC,
		.cra_blocksize = AES_BLOCK_SIZE,
		.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
		.cra_alignmask = 0,
		.cra_module = THIS_MODULE,
	},
	.min_keysize = AES_MIN_KEY_SIZE,
	.max_keysize = AES_MAX_KEY_SIZE,
	.ivsize = AES_BLOCK_SIZE,
	.setkey = nitrox_aes_setkey,
	.encrypt = nitrox_aes_encrypt,
	.decrypt = nitrox_aes_decrypt,
	.init = nitrox_skcipher_init,
	.exit = nitrox_skcipher_exit,
}, {
	.base = {
		.cra_name = "cbc(des3_ede)",
		.cra_driver_name = "n5_cbc(des3_ede)",
		.cra_priority = PRIO,
		.cra_flags = CRYPTO_ALG_ASYNC,
		.cra_blocksize = DES3_EDE_BLOCK_SIZE,
		.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
		.cra_alignmask = 0,
		.cra_module = THIS_MODULE,
	},
	.min_keysize = DES3_EDE_KEY_SIZE,
	.max_keysize = DES3_EDE_KEY_SIZE,
	.ivsize = DES3_EDE_BLOCK_SIZE,
	.setkey = nitrox_3des_setkey,
	.encrypt = nitrox_3des_encrypt,
	.decrypt = nitrox_3des_decrypt,
	.init = nitrox_skcipher_init,
	.exit = nitrox_skcipher_exit,
}, {
	.base = {
		.cra_name = "ecb(des3_ede)",
		.cra_driver_name = "n5_ecb(des3_ede)",
		.cra_priority = PRIO,
		.cra_flags = CRYPTO_ALG_ASYNC,
		.cra_blocksize = DES3_EDE_BLOCK_SIZE,
		.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
		.cra_alignmask = 0,
		.cra_module = THIS_MODULE,
	},
	.min_keysize = DES3_EDE_KEY_SIZE,
	.max_keysize = DES3_EDE_KEY_SIZE,
	.ivsize = DES3_EDE_BLOCK_SIZE,
	.setkey = nitrox_3des_setkey,
	.encrypt = nitrox_3des_encrypt,
	.decrypt = nitrox_3des_decrypt,
	.init = nitrox_skcipher_init,
	.exit = nitrox_skcipher_exit,
} };
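Once registered, these algorithms are reachable through the generic skcipher API by cra_name; the n5_-prefixed driver names select this implementation explicitly. A hedged sketch of a synchronous caller (standard kernel crypto API only; the wrapper itself is illustrative):

#include <crypto/aes.h>
#include <crypto/skcipher.h>
#include <linux/err.h>
#include <linux/scatterlist.h>

/* Encrypt 'len' bytes in place with whichever "cbc(aes)" wins priority
 * selection (possibly n5_cbc(aes) when this driver is loaded). */
static int example_cbc_encrypt(u8 *buf, unsigned int len,
			       const u8 *key, unsigned int keylen,
			       u8 iv[AES_BLOCK_SIZE])
{
	struct crypto_skcipher *tfm;
	struct skcipher_request *req;
	struct scatterlist sg;
	DECLARE_CRYPTO_WAIT(wait);
	int err;

	tfm = crypto_alloc_skcipher("cbc(aes)", 0, 0);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);

	err = crypto_skcipher_setkey(tfm, key, keylen);
	if (err)
		goto out_free_tfm;

	req = skcipher_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		err = -ENOMEM;
		goto out_free_tfm;
	}

	sg_init_one(&sg, buf, len);
	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG |
				      CRYPTO_TFM_REQ_MAY_SLEEP,
				      crypto_req_done, &wait);
	skcipher_request_set_crypt(req, &sg, &sg, len, iv);

	/* CRYPTO_ALG_ASYNC drivers complete out of line; wait for them. */
	err = crypto_wait_req(crypto_skcipher_encrypt(req), &wait);

	skcipher_request_free(req);
out_free_tfm:
	crypto_free_skcipher(tfm);
	return err;
}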

 int nitrox_crypto_register(void)
 {
-	return crypto_register_skciphers(nitrox_skciphers,
-					 ARRAY_SIZE(nitrox_skciphers));
+	int err;
+
+	err = nitrox_register_skciphers();
+	if (err)
+		return err;
+
+	err = nitrox_register_aeads();
+	if (err) {
+		nitrox_unregister_skciphers();
+		return err;
+	}
+
+	return 0;
 }
 
 void nitrox_crypto_unregister(void)
 {
-	crypto_unregister_skciphers(nitrox_skciphers,
-				    ARRAY_SIZE(nitrox_skciphers));
+	nitrox_unregister_aeads();
+	nitrox_unregister_skciphers();
 }
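Registration is now two-stage with rollback: if AEAD registration fails, the already-registered skciphers are unregistered before the error propagates, and teardown runs in reverse order of setup. A sketch of a caller on the probe path (the wrapper is illustrative):

/* Illustrative probe-path hook: nitrox_crypto_register() has already
 * unwound any partial registration by the time it returns an error. */
static int example_enable_crypto(void)
{
	int err;

	err = nitrox_crypto_register();
	if (err)
		pr_err("nitrox: algorithm registration failed: %d\n", err);
	return err;
}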

@@ -7,6 +7,10 @@
 
 int nitrox_crypto_register(void);
 void nitrox_crypto_unregister(void);
+int nitrox_register_aeads(void);
+void nitrox_unregister_aeads(void);
+int nitrox_register_skciphers(void);
+void nitrox_unregister_skciphers(void);
 void *crypto_alloc_context(struct nitrox_device *ndev);
 void crypto_free_context(void *ctx);
 struct nitrox_device *nitrox_get_first_device(void);
@@ -19,7 +23,7 @@ void pkt_slc_resp_tasklet(unsigned long data);
 int nitrox_process_se_request(struct nitrox_device *ndev,
 			      struct se_crypto_request *req,
 			      completion_t cb,
-			      struct skcipher_request *skreq);
+			      void *cb_arg);
 void backlog_qflush_work(struct work_struct *work);
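The second hunk widens the completion argument from struct skcipher_request * to an opaque void *, so AEAD and skcipher requests can share the same submission path; presumably completion_t is retyped to take the same void * (an assumption here). An illustrative callback:

/* With an opaque cb_arg, each algorithm type casts back to its own
 * request and completes it; the signature is assumed to match the
 * updated completion_t. */
static void example_skcipher_done(void *arg, int err)
{
	struct skcipher_request *skreq = arg;

	skreq->base.complete(&skreq->base, err);
}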
@@ -54,7 +54,13 @@
 #define NPS_STATS_PKT_DMA_WR_CNT 0x1000190
 
 /* NPS packet registers */
 #define NPS_PKT_INT 0x1040018
+#define NPS_PKT_MBOX_INT_LO 0x1040020
+#define NPS_PKT_MBOX_INT_LO_ENA_W1C 0x1040030
+#define NPS_PKT_MBOX_INT_LO_ENA_W1S 0x1040038
+#define NPS_PKT_MBOX_INT_HI 0x1040040
+#define NPS_PKT_MBOX_INT_HI_ENA_W1C 0x1040050
+#define NPS_PKT_MBOX_INT_HI_ENA_W1S 0x1040058
 #define NPS_PKT_IN_RERR_HI 0x1040108
 #define NPS_PKT_IN_RERR_HI_ENA_W1S 0x1040120
 #define NPS_PKT_IN_RERR_LO 0x1040128
@@ -74,6 +80,10 @@
 #define NPS_PKT_SLC_RERR_LO_ENA_W1S 0x1040240
 #define NPS_PKT_SLC_ERR_TYPE 0x1040248
 #define NPS_PKT_SLC_ERR_TYPE_ENA_W1S 0x1040260
+/* Mailbox PF->VF PF Accessible Data registers */
+#define NPS_PKT_MBOX_PF_VF_PFDATAX(_i) (0x1040800 + ((_i) * 0x8))
+#define NPS_PKT_MBOX_VF_PF_PFDATAX(_i) (0x1040C00 + ((_i) * 0x8))
+
 #define NPS_PKT_SLC_CTLX(_i) (0x10000 + ((_i) * 0x40000))
 #define NPS_PKT_SLC_CNTSX(_i) (0x10008 + ((_i) * 0x40000))
 #define NPS_PKT_SLC_INT_LEVELSX(_i) (0x10010 + ((_i) * 0x40000))