Merge branch 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6

Pull crypto updates from Herbert Xu:
 "API:
   - Add 1472-byte test to tcrypt for IPsec
   - Reintroduced crypto stats interface with numerous changes
   - Support incremental algorithm dumps

  Algorithms:
   - Add xchacha12/20
   - Add nhpoly1305
   - Add adiantum
   - Add streebog hash
   - Mark cts(cbc(aes)) as FIPS allowed

  Drivers:
   - Improve performance of arm64/chacha20
   - Improve performance of x86/chacha20
   - Add NEON-accelerated nhpoly1305
   - Add SSE2 accelerated nhpoly1305
   - Add AVX2 accelerated nhpoly1305
   - Add support for 192/256-bit keys in gcmaes AVX
   - Add SG support in gcmaes AVX
   - ESN for inline IPsec tx in chcr
   - Add support for CryptoCell 703 in ccree
   - Add support for CryptoCell 713 in ccree
   - Add SM4 support in ccree
   - Add SM3 support in ccree
   - Add support for chacha20 in caam/qi2
   - Add support for chacha20 + poly1305 in caam/jr
   - Add support for chacha20 + poly1305 in caam/qi2
   - Add AEAD cipher support in cavium/nitrox"

* 'linus' of git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6: (130 commits)
  crypto: skcipher - remove remnants of internal IV generators
  crypto: cavium/nitrox - Fix build with !CONFIG_DEBUG_FS
  crypto: salsa20-generic - don't unnecessarily use atomic walk
  crypto: skcipher - add might_sleep() to skcipher_walk_virt()
  crypto: x86/chacha - avoid sleeping under kernel_fpu_begin()
  crypto: cavium/nitrox - Added AEAD cipher support
  crypto: mxc-scc - fix build warnings on ARM64
  crypto: api - document missing stats member
  crypto: user - remove unused dump functions
  crypto: chelsio - Fix wrong error counter increments
  crypto: chelsio - Reset counters on cxgb4 Detach
  crypto: chelsio - Handle PCI shutdown event
  crypto: chelsio - cleanup:send addr as value in function argument
  crypto: chelsio - Use same value for both channel in single WR
  crypto: chelsio - Swap location of AAD and IV sent in WR
  crypto: chelsio - remove set but not used variable 'kctx_len'
  crypto: ux500 - Use proper enum in hash_set_dma_transfer
  crypto: ux500 - Use proper enum in cryp_set_dma_transfer
  crypto: aesni - Add scatter/gather avx stubs, and use them in C
  crypto: aesni - Introduce partial block macro
  ..
This commit is contained in:
Linus Torvalds 2018-12-27 13:53:32 -08:00
commit b71acb0e37
183 changed files with 15860 additions and 5111 deletions

View File

@ -1,15 +1,6 @@
Programming Interface
=====================
Please note that the kernel crypto API contains the AEAD givcrypt API
(crypto_aead_giv\* and aead_givcrypt\* function calls in
include/crypto/aead.h). This API is obsolete and will be removed in the
future. To obtain the functionality of an AEAD cipher with internal IV
generation, use the IV generator as a regular cipher. For example,
rfc4106(gcm(aes)) is the AEAD cipher with external IV generation and
seqniv(rfc4106(gcm(aes))) implies that the kernel crypto API generates
the IV. Different IV generators are available.
.. class:: toc-title
Table of contents

View File

@ -157,10 +157,6 @@ applicable to a cipher, it is not displayed:
- rng for random number generator
- givcipher for cipher with associated IV generator (see the geniv
entry below for the specification of the IV generator type used by
the cipher implementation)
- kpp for a Key-agreement Protocol Primitive (KPP) cipher such as
an ECDH or DH implementation
@ -174,16 +170,7 @@ applicable to a cipher, it is not displayed:
- digestsize: output size of the message digest
- geniv: IV generation type:
- eseqiv for encrypted sequence number based IV generation
- seqiv for sequence number based IV generation
- chainiv for chain iv generation
- <builtin> is a marker that the cipher implements IV generation and
handling as it is specific to the given cipher
- geniv: IV generator (obsolete)
Key Sizes
---------
@ -218,10 +205,6 @@ the aforementioned cipher types:
- CRYPTO_ALG_TYPE_ABLKCIPHER Asynchronous multi-block cipher
- CRYPTO_ALG_TYPE_GIVCIPHER Asynchronous multi-block cipher packed
together with an IV generator (see geniv field in the /proc/crypto
listing for the known IV generators)
- CRYPTO_ALG_TYPE_KPP Key-agreement Protocol Primitive (KPP) such as
an ECDH or DH implementation
@ -338,18 +321,14 @@ uses the API applicable to the cipher type specified for the block.
The following call sequence is applicable when the IPSEC layer triggers
an encryption operation with the esp_output function. During
configuration, the administrator set up the use of rfc4106(gcm(aes)) as
the cipher for ESP. The following call sequence is now depicted in the
ASCII art above:
configuration, the administrator set up the use of seqiv(rfc4106(gcm(aes)))
as the cipher for ESP. The following call sequence is now depicted in
the ASCII art above:
1. esp_output() invokes crypto_aead_encrypt() to trigger an
encryption operation of the AEAD cipher with IV generator.
In case of GCM, the SEQIV implementation is registered as GIVCIPHER
in crypto_rfc4106_alloc().
The SEQIV performs its operation to generate an IV where the core
function is seqiv_geniv().
The SEQIV generates the IV.
2. Now, SEQIV uses the AEAD API function calls to invoke the associated
AEAD cipher. In our case, during the instantiation of SEQIV, the

View File

@ -1,8 +1,12 @@
Arm TrustZone CryptoCell cryptographic engine
Required properties:
- compatible: Should be one of: "arm,cryptocell-712-ree",
"arm,cryptocell-710-ree" or "arm,cryptocell-630p-ree".
- compatible: Should be one of -
"arm,cryptocell-713-ree"
"arm,cryptocell-703-ree"
"arm,cryptocell-712-ree"
"arm,cryptocell-710-ree"
"arm,cryptocell-630p-ree"
- reg: Base physical address of the engine and length of memory mapped region.
- interrupts: Interrupt number for the device.

View File

@ -6,6 +6,8 @@ Required properties:
- interrupts : Should contain MXS DCP interrupt numbers, VMI IRQ and DCP IRQ
must be supplied, optionally Secure IRQ can be present, but
is currently not implemented and not used.
- clocks : Clock reference (only required on some SOCs: 6ull and 6sll).
- clock-names : Must be "dcp".
Example:

View File

@ -3484,6 +3484,7 @@ F: include/linux/spi/cc2520.h
F: Documentation/devicetree/bindings/net/ieee802154/cc2520.txt
CCREE ARM TRUSTZONE CRYPTOCELL REE DRIVER
M: Yael Chemla <yael.chemla@foss.arm.com>
M: Gilad Ben-Yossef <gilad@benyossef.com>
L: linux-crypto@vger.kernel.org
S: Supported
@ -7147,7 +7148,9 @@ F: crypto/842.c
F: lib/842/
IBM Power in-Nest Crypto Acceleration
M: Paulo Flabiano Smorigo <pfsmorigo@linux.ibm.com>
M: Breno Leitão <leitao@debian.org>
M: Nayna Jain <nayna@linux.ibm.com>
M: Paulo Flabiano Smorigo <pfsmorigo@gmail.com>
L: linux-crypto@vger.kernel.org
S: Supported
F: drivers/crypto/nx/Makefile
@ -7211,7 +7214,9 @@ S: Supported
F: drivers/scsi/ibmvscsi_tgt/
IBM Power VMX Cryptographic instructions
M: Paulo Flabiano Smorigo <pfsmorigo@linux.ibm.com>
M: Breno Leitão <leitao@debian.org>
M: Nayna Jain <nayna@linux.ibm.com>
M: Paulo Flabiano Smorigo <pfsmorigo@gmail.com>
L: linux-crypto@vger.kernel.org
S: Supported
F: drivers/crypto/vmx/Makefile

View File

@ -69,6 +69,15 @@ config CRYPTO_AES_ARM
help
Use optimized AES assembler routines for ARM platforms.
On ARM processors without the Crypto Extensions, this is the
fastest AES implementation for single blocks. For multiple
blocks, the NEON bit-sliced implementation is usually faster.
This implementation may be vulnerable to cache timing attacks,
since it uses lookup tables. However, as countermeasures it
disables IRQs and preloads the tables; it is hoped this makes
such attacks very difficult.
config CRYPTO_AES_ARM_BS
tristate "Bit sliced AES using NEON instructions"
depends on KERNEL_MODE_NEON
@ -117,9 +126,14 @@ config CRYPTO_CRC32_ARM_CE
select CRYPTO_HASH
config CRYPTO_CHACHA20_NEON
tristate "NEON accelerated ChaCha20 symmetric cipher"
tristate "NEON accelerated ChaCha stream cipher algorithms"
depends on KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_CHACHA20
config CRYPTO_NHPOLY1305_NEON
tristate "NEON accelerated NHPoly1305 hash function (for Adiantum)"
depends on KERNEL_MODE_NEON
select CRYPTO_NHPOLY1305
endif

View File

@ -9,7 +9,8 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha-neon.o
obj-$(CONFIG_CRYPTO_NHPOLY1305_NEON) += nhpoly1305-neon.o
ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
@ -52,7 +53,8 @@ aes-arm-ce-y := aes-ce-core.o aes-ce-glue.o
ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
chacha-neon-y := chacha-neon-core.o chacha-neon-glue.o
nhpoly1305-neon-y := nh-neon-core.o nhpoly1305-neon-glue.o
ifdef REGENERATE_ARM_CRYPTO
quiet_cmd_perl = PERL $@

View File

@ -10,7 +10,6 @@
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/hwcap.h>
#include <crypto/aes.h>
#include <crypto/internal/simd.h>
#include <crypto/internal/skcipher.h>

View File

@ -10,6 +10,7 @@
*/
#include <linux/linkage.h>
#include <asm/assembler.h>
#include <asm/cache.h>
.text
@ -41,7 +42,7 @@
.endif
.endm
.macro __hround, out0, out1, in0, in1, in2, in3, t3, t4, enc, sz, op
.macro __hround, out0, out1, in0, in1, in2, in3, t3, t4, enc, sz, op, oldcpsr
__select \out0, \in0, 0
__select t0, \in1, 1
__load \out0, \out0, 0, \sz, \op
@ -73,6 +74,14 @@
__load t0, t0, 3, \sz, \op
__load \t4, \t4, 3, \sz, \op
.ifnb \oldcpsr
/*
* This is the final round and we're done with all data-dependent table
* lookups, so we can safely re-enable interrupts.
*/
restore_irqs \oldcpsr
.endif
eor \out1, \out1, t1, ror #24
eor \out0, \out0, t2, ror #16
ldm rk!, {t1, t2}
@ -83,14 +92,14 @@
eor \out1, \out1, t2
.endm
.macro fround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op
.macro fround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op, oldcpsr
__hround \out0, \out1, \in0, \in1, \in2, \in3, \out2, \out3, 1, \sz, \op
__hround \out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, \sz, \op
__hround \out2, \out3, \in2, \in3, \in0, \in1, \in1, \in2, 1, \sz, \op, \oldcpsr
.endm
.macro iround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op
.macro iround, out0, out1, out2, out3, in0, in1, in2, in3, sz=2, op, oldcpsr
__hround \out0, \out1, \in0, \in3, \in2, \in1, \out2, \out3, 0, \sz, \op
__hround \out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0, \sz, \op
__hround \out2, \out3, \in2, \in1, \in0, \in3, \in1, \in0, 0, \sz, \op, \oldcpsr
.endm
.macro __rev, out, in
@ -118,13 +127,14 @@
.macro do_crypt, round, ttab, ltab, bsz
push {r3-r11, lr}
// Load keys first, to reduce latency in case they're not cached yet.
ldm rk!, {r8-r11}
ldr r4, [in]
ldr r5, [in, #4]
ldr r6, [in, #8]
ldr r7, [in, #12]
ldm rk!, {r8-r11}
#ifdef CONFIG_CPU_BIG_ENDIAN
__rev r4, r4
__rev r5, r5
@ -138,6 +148,25 @@
eor r7, r7, r11
__adrl ttab, \ttab
/*
* Disable interrupts and prefetch the 1024-byte 'ft' or 'it' table into
* L1 cache, assuming cacheline size >= 32. This is a hardening measure
* intended to make cache-timing attacks more difficult. They may not
* be fully prevented, however; see the paper
* https://cr.yp.to/antiforgery/cachetiming-20050414.pdf
* ("Cache-timing attacks on AES") for a discussion of the many
* difficulties involved in writing truly constant-time AES software.
*/
save_and_disable_irqs t0
.set i, 0
.rept 1024 / 128
ldr r8, [ttab, #i + 0]
ldr r9, [ttab, #i + 32]
ldr r10, [ttab, #i + 64]
ldr r11, [ttab, #i + 96]
.set i, i + 128
.endr
push {t0} // oldcpsr
tst rounds, #2
bne 1f
@ -151,8 +180,21 @@
\round r4, r5, r6, r7, r8, r9, r10, r11
b 0b
2: __adrl ttab, \ltab
\round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b
2: .ifb \ltab
add ttab, ttab, #1
.else
__adrl ttab, \ltab
// Prefetch inverse S-box for final round; see explanation above
.set i, 0
.rept 256 / 64
ldr t0, [ttab, #i + 0]
ldr t1, [ttab, #i + 32]
.set i, i + 64
.endr
.endif
pop {rounds} // oldcpsr
\round r4, r5, r6, r7, r8, r9, r10, r11, \bsz, b, rounds
#ifdef CONFIG_CPU_BIG_ENDIAN
__rev r4, r4
@ -175,7 +217,7 @@
.endm
ENTRY(__aes_arm_encrypt)
do_crypt fround, crypto_ft_tab, crypto_ft_tab + 1, 2
do_crypt fround, crypto_ft_tab,, 2
ENDPROC(__aes_arm_encrypt)
.align 5

View File

@ -1,5 +1,5 @@
/*
* ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
* ChaCha/XChaCha NEON helper functions
*
* Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
*
@ -27,9 +27,9 @@
* (d) vtbl.8 + vtbl.8 (multiple of 8 bits rotations only,
* needs index vector)
*
* ChaCha20 has 16, 12, 8, and 7-bit rotations. For the 12 and 7-bit
* rotations, the only choices are (a) and (b). We use (a) since it takes
* two-thirds the cycles of (b) on both Cortex-A7 and Cortex-A53.
* ChaCha has 16, 12, 8, and 7-bit rotations. For the 12 and 7-bit rotations,
* the only choices are (a) and (b). We use (a) since it takes two-thirds the
* cycles of (b) on both Cortex-A7 and Cortex-A53.
*
* For the 16-bit rotation, we use vrev32.16 since it's consistently fastest
* and doesn't need a temporary register.
@ -52,30 +52,20 @@
.fpu neon
.align 5
ENTRY(chacha20_block_xor_neon)
// r0: Input state matrix, s
// r1: 1 data block output, o
// r2: 1 data block input, i
//
// This function encrypts one ChaCha20 block by loading the state matrix
// in four NEON registers. It performs matrix operation on four words in
// parallel, but requireds shuffling to rearrange the words after each
// round.
//
// x0..3 = s0..3
add ip, r0, #0x20
vld1.32 {q0-q1}, [r0]
vld1.32 {q2-q3}, [ip]
vmov q8, q0
vmov q9, q1
vmov q10, q2
vmov q11, q3
/*
* chacha_permute - permute one block
*
* Permute one 64-byte block where the state matrix is stored in the four NEON
* registers q0-q3. It performs matrix operations on four words in parallel,
* but requires shuffling to rearrange the words after each round.
*
* The round count is given in r3.
*
* Clobbers: r3, ip, q4-q5
*/
chacha_permute:
adr ip, .Lrol8_table
mov r3, #10
vld1.8 {d10}, [ip, :64]
.Ldoubleround:
@ -139,9 +129,31 @@ ENTRY(chacha20_block_xor_neon)
// x3 = shuffle32(x3, MASK(0, 3, 2, 1))
vext.8 q3, q3, q3, #4
subs r3, r3, #1
subs r3, r3, #2
bne .Ldoubleround
bx lr
ENDPROC(chacha_permute)
ENTRY(chacha_block_xor_neon)
// r0: Input state matrix, s
// r1: 1 data block output, o
// r2: 1 data block input, i
// r3: nrounds
push {lr}
// x0..3 = s0..3
add ip, r0, #0x20
vld1.32 {q0-q1}, [r0]
vld1.32 {q2-q3}, [ip]
vmov q8, q0
vmov q9, q1
vmov q10, q2
vmov q11, q3
bl chacha_permute
add ip, r2, #0x20
vld1.8 {q4-q5}, [r2]
vld1.8 {q6-q7}, [ip]
@ -166,15 +178,33 @@ ENTRY(chacha20_block_xor_neon)
vst1.8 {q0-q1}, [r1]
vst1.8 {q2-q3}, [ip]
bx lr
ENDPROC(chacha20_block_xor_neon)
pop {pc}
ENDPROC(chacha_block_xor_neon)
ENTRY(hchacha_block_neon)
// r0: Input state matrix, s
// r1: output (8 32-bit words)
// r2: nrounds
push {lr}
vld1.32 {q0-q1}, [r0]!
vld1.32 {q2-q3}, [r0]
mov r3, r2
bl chacha_permute
vst1.32 {q0}, [r1]!
vst1.32 {q3}, [r1]
pop {pc}
ENDPROC(hchacha_block_neon)
.align 4
.Lctrinc: .word 0, 1, 2, 3
.Lrol8_table: .byte 3, 0, 1, 2, 7, 4, 5, 6
.align 5
ENTRY(chacha20_4block_xor_neon)
ENTRY(chacha_4block_xor_neon)
push {r4-r5}
mov r4, sp // preserve the stack pointer
sub ip, sp, #0x20 // allocate a 32 byte buffer
@ -184,9 +214,10 @@ ENTRY(chacha20_4block_xor_neon)
// r0: Input state matrix, s
// r1: 4 data blocks output, o
// r2: 4 data blocks input, i
// r3: nrounds
//
// This function encrypts four consecutive ChaCha20 blocks by loading
// This function encrypts four consecutive ChaCha blocks by loading
// the state matrix in NEON registers four times. The algorithm performs
// each operation on the corresponding word of each state matrix, hence
// requires no word shuffling. The words are re-interleaved before the
@ -219,7 +250,6 @@ ENTRY(chacha20_4block_xor_neon)
vdup.32 q0, d0[0]
adr ip, .Lrol8_table
mov r3, #10
b 1f
.Ldoubleround4:
@ -417,7 +447,7 @@ ENTRY(chacha20_4block_xor_neon)
vsri.u32 q5, q8, #25
vsri.u32 q6, q9, #25
subs r3, r3, #1
subs r3, r3, #2
bne .Ldoubleround4
// x0..7[0-3] are in q0-q7, x10..15[0-3] are in q10-q15.
@ -527,4 +557,4 @@ ENTRY(chacha20_4block_xor_neon)
pop {r4-r5}
bx lr
ENDPROC(chacha20_4block_xor_neon)
ENDPROC(chacha_4block_xor_neon)

View File

@ -0,0 +1,201 @@
/*
* ARM NEON accelerated ChaCha and XChaCha stream ciphers,
* including ChaCha20 (RFC7539)
*
* Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* Based on:
* ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
*
* Copyright (C) 2015 Martin Willi
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*/
#include <crypto/algapi.h>
#include <crypto/chacha.h>
#include <crypto/internal/skcipher.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
asmlinkage void chacha_block_xor_neon(const u32 *state, u8 *dst, const u8 *src,
int nrounds);
asmlinkage void chacha_4block_xor_neon(const u32 *state, u8 *dst, const u8 *src,
int nrounds);
asmlinkage void hchacha_block_neon(const u32 *state, u32 *out, int nrounds);
static void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes, int nrounds)
{
u8 buf[CHACHA_BLOCK_SIZE];
while (bytes >= CHACHA_BLOCK_SIZE * 4) {
chacha_4block_xor_neon(state, dst, src, nrounds);
bytes -= CHACHA_BLOCK_SIZE * 4;
src += CHACHA_BLOCK_SIZE * 4;
dst += CHACHA_BLOCK_SIZE * 4;
state[12] += 4;
}
while (bytes >= CHACHA_BLOCK_SIZE) {
chacha_block_xor_neon(state, dst, src, nrounds);
bytes -= CHACHA_BLOCK_SIZE;
src += CHACHA_BLOCK_SIZE;
dst += CHACHA_BLOCK_SIZE;
state[12]++;
}
if (bytes) {
memcpy(buf, src, bytes);
chacha_block_xor_neon(state, buf, buf, nrounds);
memcpy(dst, buf, bytes);
}
}
static int chacha_neon_stream_xor(struct skcipher_request *req,
struct chacha_ctx *ctx, u8 *iv)
{
struct skcipher_walk walk;
u32 state[16];
int err;
err = skcipher_walk_virt(&walk, req, false);
crypto_chacha_init(state, ctx, iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
if (nbytes < walk.total)
nbytes = round_down(nbytes, walk.stride);
kernel_neon_begin();
chacha_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
nbytes, ctx->nrounds);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
return err;
}
static int chacha_neon(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd())
return crypto_chacha_crypt(req);
return chacha_neon_stream_xor(req, ctx, req->iv);
}
static int xchacha_neon(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
struct chacha_ctx subctx;
u32 state[16];
u8 real_iv[16];
if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd())
return crypto_xchacha_crypt(req);
crypto_chacha_init(state, ctx, req->iv);
kernel_neon_begin();
hchacha_block_neon(state, subctx.key, ctx->nrounds);
kernel_neon_end();
subctx.nrounds = ctx->nrounds;
memcpy(&real_iv[0], req->iv + 24, 8);
memcpy(&real_iv[8], req->iv + 16, 8);
return chacha_neon_stream_xor(req, &subctx, real_iv);
}
static struct skcipher_alg algs[] = {
{
.base.cra_name = "chacha20",
.base.cra_driver_name = "chacha20-neon",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = CHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.walksize = 4 * CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = chacha_neon,
.decrypt = chacha_neon,
}, {
.base.cra_name = "xchacha20",
.base.cra_driver_name = "xchacha20-neon",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = XCHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.walksize = 4 * CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = xchacha_neon,
.decrypt = xchacha_neon,
}, {
.base.cra_name = "xchacha12",
.base.cra_driver_name = "xchacha12-neon",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = XCHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.walksize = 4 * CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha12_setkey,
.encrypt = xchacha_neon,
.decrypt = xchacha_neon,
}
};
static int __init chacha_simd_mod_init(void)
{
if (!(elf_hwcap & HWCAP_NEON))
return -ENODEV;
return crypto_register_skciphers(algs, ARRAY_SIZE(algs));
}
static void __exit chacha_simd_mod_fini(void)
{
crypto_unregister_skciphers(algs, ARRAY_SIZE(algs));
}
module_init(chacha_simd_mod_init);
module_exit(chacha_simd_mod_fini);
MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (NEON accelerated)");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-neon");
MODULE_ALIAS_CRYPTO("xchacha20");
MODULE_ALIAS_CRYPTO("xchacha20-neon");
MODULE_ALIAS_CRYPTO("xchacha12");
MODULE_ALIAS_CRYPTO("xchacha12-neon");

View File

@ -1,127 +0,0 @@
/*
* ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
*
* Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* Based on:
* ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
*
* Copyright (C) 2015 Martin Willi
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*/
#include <crypto/algapi.h>
#include <crypto/chacha20.h>
#include <crypto/internal/skcipher.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes)
{
u8 buf[CHACHA20_BLOCK_SIZE];
while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
chacha20_4block_xor_neon(state, dst, src);
bytes -= CHACHA20_BLOCK_SIZE * 4;
src += CHACHA20_BLOCK_SIZE * 4;
dst += CHACHA20_BLOCK_SIZE * 4;
state[12] += 4;
}
while (bytes >= CHACHA20_BLOCK_SIZE) {
chacha20_block_xor_neon(state, dst, src);
bytes -= CHACHA20_BLOCK_SIZE;
src += CHACHA20_BLOCK_SIZE;
dst += CHACHA20_BLOCK_SIZE;
state[12]++;
}
if (bytes) {
memcpy(buf, src, bytes);
chacha20_block_xor_neon(state, buf, buf);
memcpy(dst, buf, bytes);
}
}
static int chacha20_neon(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
u32 state[16];
int err;
if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
return crypto_chacha20_crypt(req);
err = skcipher_walk_virt(&walk, req, true);
crypto_chacha20_init(state, ctx, walk.iv);
kernel_neon_begin();
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
if (nbytes < walk.total)
nbytes = round_down(nbytes, walk.stride);
chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
nbytes);
err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
kernel_neon_end();
return err;
}
static struct skcipher_alg alg = {
.base.cra_name = "chacha20",
.base.cra_driver_name = "chacha20-neon",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha20_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA20_KEY_SIZE,
.max_keysize = CHACHA20_KEY_SIZE,
.ivsize = CHACHA20_IV_SIZE,
.chunksize = CHACHA20_BLOCK_SIZE,
.walksize = 4 * CHACHA20_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = chacha20_neon,
.decrypt = chacha20_neon,
};
static int __init chacha20_simd_mod_init(void)
{
if (!(elf_hwcap & HWCAP_NEON))
return -ENODEV;
return crypto_register_skcipher(&alg);
}
static void __exit chacha20_simd_mod_fini(void)
{
crypto_unregister_skcipher(&alg);
}
module_init(chacha20_simd_mod_init);
module_exit(chacha20_simd_mod_fini);
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("chacha20");

View File

@ -0,0 +1,116 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* NH - ε-almost-universal hash function, NEON accelerated version
*
* Copyright 2018 Google LLC
*
* Author: Eric Biggers <ebiggers@google.com>
*/
#include <linux/linkage.h>
.text
.fpu neon
KEY .req r0
MESSAGE .req r1
MESSAGE_LEN .req r2
HASH .req r3
PASS0_SUMS .req q0
PASS0_SUM_A .req d0
PASS0_SUM_B .req d1
PASS1_SUMS .req q1
PASS1_SUM_A .req d2
PASS1_SUM_B .req d3
PASS2_SUMS .req q2
PASS2_SUM_A .req d4
PASS2_SUM_B .req d5
PASS3_SUMS .req q3
PASS3_SUM_A .req d6
PASS3_SUM_B .req d7
K0 .req q4
K1 .req q5
K2 .req q6
K3 .req q7
T0 .req q8
T0_L .req d16
T0_H .req d17
T1 .req q9
T1_L .req d18
T1_H .req d19
T2 .req q10
T2_L .req d20
T2_H .req d21
T3 .req q11
T3_L .req d22
T3_H .req d23
.macro _nh_stride k0, k1, k2, k3
// Load next message stride
vld1.8 {T3}, [MESSAGE]!
// Load next key stride
vld1.32 {\k3}, [KEY]!
// Add message words to key words
vadd.u32 T0, T3, \k0
vadd.u32 T1, T3, \k1
vadd.u32 T2, T3, \k2
vadd.u32 T3, T3, \k3
// Multiply 32x32 => 64 and accumulate
vmlal.u32 PASS0_SUMS, T0_L, T0_H
vmlal.u32 PASS1_SUMS, T1_L, T1_H
vmlal.u32 PASS2_SUMS, T2_L, T2_H
vmlal.u32 PASS3_SUMS, T3_L, T3_H
.endm
/*
* void nh_neon(const u32 *key, const u8 *message, size_t message_len,
* u8 hash[NH_HASH_BYTES])
*
* It's guaranteed that message_len % 16 == 0.
*/
ENTRY(nh_neon)
vld1.32 {K0,K1}, [KEY]!
vmov.u64 PASS0_SUMS, #0
vmov.u64 PASS1_SUMS, #0
vld1.32 {K2}, [KEY]!
vmov.u64 PASS2_SUMS, #0
vmov.u64 PASS3_SUMS, #0
subs MESSAGE_LEN, MESSAGE_LEN, #64
blt .Lloop4_done
.Lloop4:
_nh_stride K0, K1, K2, K3
_nh_stride K1, K2, K3, K0
_nh_stride K2, K3, K0, K1
_nh_stride K3, K0, K1, K2
subs MESSAGE_LEN, MESSAGE_LEN, #64
bge .Lloop4
.Lloop4_done:
ands MESSAGE_LEN, MESSAGE_LEN, #63
beq .Ldone
_nh_stride K0, K1, K2, K3
subs MESSAGE_LEN, MESSAGE_LEN, #16
beq .Ldone
_nh_stride K1, K2, K3, K0
subs MESSAGE_LEN, MESSAGE_LEN, #16
beq .Ldone
_nh_stride K2, K3, K0, K1
.Ldone:
// Sum the accumulators for each pass, then store the sums to 'hash'
vadd.u64 T0_L, PASS0_SUM_A, PASS0_SUM_B
vadd.u64 T0_H, PASS1_SUM_A, PASS1_SUM_B
vadd.u64 T1_L, PASS2_SUM_A, PASS2_SUM_B
vadd.u64 T1_H, PASS3_SUM_A, PASS3_SUM_B
vst1.8 {T0-T1}, [HASH]
bx lr
ENDPROC(nh_neon)

View File

@ -0,0 +1,77 @@
// SPDX-License-Identifier: GPL-2.0
/*
* NHPoly1305 - ε-almost--universal hash function for Adiantum
* (NEON accelerated version)
*
* Copyright 2018 Google LLC
*/
#include <asm/neon.h>
#include <asm/simd.h>
#include <crypto/internal/hash.h>
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
asmlinkage void nh_neon(const u32 *key, const u8 *message, size_t message_len,
u8 hash[NH_HASH_BYTES]);
/* wrapper to avoid indirect call to assembly, which doesn't work with CFI */
static void _nh_neon(const u32 *key, const u8 *message, size_t message_len,
__le64 hash[NH_NUM_PASSES])
{
nh_neon(key, message, message_len, (u8 *)hash);
}
static int nhpoly1305_neon_update(struct shash_desc *desc,
const u8 *src, unsigned int srclen)
{
if (srclen < 64 || !may_use_simd())
return crypto_nhpoly1305_update(desc, src, srclen);
do {
unsigned int n = min_t(unsigned int, srclen, PAGE_SIZE);
kernel_neon_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_neon);
kernel_neon_end();
src += n;
srclen -= n;
} while (srclen);
return 0;
}
static struct shash_alg nhpoly1305_alg = {
.base.cra_name = "nhpoly1305",
.base.cra_driver_name = "nhpoly1305-neon",
.base.cra_priority = 200,
.base.cra_ctxsize = sizeof(struct nhpoly1305_key),
.base.cra_module = THIS_MODULE,
.digestsize = POLY1305_DIGEST_SIZE,
.init = crypto_nhpoly1305_init,
.update = nhpoly1305_neon_update,
.final = crypto_nhpoly1305_final,
.setkey = crypto_nhpoly1305_setkey,
.descsize = sizeof(struct nhpoly1305_state),
};
static int __init nhpoly1305_mod_init(void)
{
if (!(elf_hwcap & HWCAP_NEON))
return -ENODEV;
return crypto_register_shash(&nhpoly1305_alg);
}
static void __exit nhpoly1305_mod_exit(void)
{
crypto_unregister_shash(&nhpoly1305_alg);
}
module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);
MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function (NEON-accelerated)");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
MODULE_ALIAS_CRYPTO("nhpoly1305-neon");

View File

@ -101,11 +101,16 @@ config CRYPTO_AES_ARM64_NEON_BLK
select CRYPTO_SIMD
config CRYPTO_CHACHA20_NEON
tristate "NEON accelerated ChaCha20 symmetric cipher"
tristate "ChaCha20, XChaCha20, and XChaCha12 stream ciphers using NEON instructions"
depends on KERNEL_MODE_NEON
select CRYPTO_BLKCIPHER
select CRYPTO_CHACHA20
config CRYPTO_NHPOLY1305_NEON
tristate "NHPoly1305 hash function using NEON instructions (for Adiantum)"
depends on KERNEL_MODE_NEON
select CRYPTO_NHPOLY1305
config CRYPTO_AES_ARM64_BS
tristate "AES in ECB/CBC/CTR/XTS modes using bit-sliced NEON algorithm"
depends on KERNEL_MODE_NEON

View File

@ -50,8 +50,11 @@ sha256-arm64-y := sha256-glue.o sha256-core.o
obj-$(CONFIG_CRYPTO_SHA512_ARM64) += sha512-arm64.o
sha512-arm64-y := sha512-glue.o sha512-core.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha-neon.o
chacha-neon-y := chacha-neon-core.o chacha-neon-glue.o
obj-$(CONFIG_CRYPTO_NHPOLY1305_NEON) += nhpoly1305-neon.o
nhpoly1305-neon-y := nh-neon-core.o nhpoly1305-neon-glue.o
obj-$(CONFIG_CRYPTO_AES_ARM64) += aes-arm64.o
aes-arm64-y := aes-cipher-core.o aes-cipher-glue.o

View File

@ -1,13 +1,13 @@
/*
* ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
* ChaCha/XChaCha NEON helper functions
*
* Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
* Copyright (C) 2016-2018 Linaro, Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* Based on:
* Originally based on:
* ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
*
* Copyright (C) 2015 Martin Willi
@ -19,29 +19,27 @@
*/
#include <linux/linkage.h>
#include <asm/assembler.h>
#include <asm/cache.h>
.text
.align 6
ENTRY(chacha20_block_xor_neon)
// x0: Input state matrix, s
// x1: 1 data block output, o
// x2: 1 data block input, i
/*
* chacha_permute - permute one block
*
* Permute one 64-byte block where the state matrix is stored in the four NEON
* registers v0-v3. It performs matrix operations on four words in parallel,
* but requires shuffling to rearrange the words after each round.
*
* The round count is given in w3.
*
* Clobbers: w3, x10, v4, v12
*/
chacha_permute:
//
// This function encrypts one ChaCha20 block by loading the state matrix
// in four NEON registers. It performs matrix operation on four words in
// parallel, but requires shuffling to rearrange the words after each
// round.
//
// x0..3 = s0..3
adr x3, ROT8
ld1 {v0.4s-v3.4s}, [x0]
ld1 {v8.4s-v11.4s}, [x0]
ld1 {v12.4s}, [x3]
mov x3, #10
adr_l x10, ROT8
ld1 {v12.4s}, [x10]
.Ldoubleround:
// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
@ -102,9 +100,27 @@ ENTRY(chacha20_block_xor_neon)
// x3 = shuffle32(x3, MASK(0, 3, 2, 1))
ext v3.16b, v3.16b, v3.16b, #4
subs x3, x3, #1
subs w3, w3, #2
b.ne .Ldoubleround
ret
ENDPROC(chacha_permute)
ENTRY(chacha_block_xor_neon)
// x0: Input state matrix, s
// x1: 1 data block output, o
// x2: 1 data block input, i
// w3: nrounds
stp x29, x30, [sp, #-16]!
mov x29, sp
// x0..3 = s0..3
ld1 {v0.4s-v3.4s}, [x0]
ld1 {v8.4s-v11.4s}, [x0]
bl chacha_permute
ld1 {v4.16b-v7.16b}, [x2]
// o0 = i0 ^ (x0 + s0)
@ -125,71 +141,156 @@ ENTRY(chacha20_block_xor_neon)
st1 {v0.16b-v3.16b}, [x1]
ldp x29, x30, [sp], #16
ret
ENDPROC(chacha20_block_xor_neon)
ENDPROC(chacha_block_xor_neon)
ENTRY(hchacha_block_neon)
// x0: Input state matrix, s
// x1: output (8 32-bit words)
// w2: nrounds
stp x29, x30, [sp, #-16]!
mov x29, sp
ld1 {v0.4s-v3.4s}, [x0]
mov w3, w2
bl chacha_permute
st1 {v0.16b}, [x1], #16
st1 {v3.16b}, [x1]
ldp x29, x30, [sp], #16
ret
ENDPROC(hchacha_block_neon)
a0 .req w12
a1 .req w13
a2 .req w14
a3 .req w15
a4 .req w16
a5 .req w17
a6 .req w19
a7 .req w20
a8 .req w21
a9 .req w22
a10 .req w23
a11 .req w24
a12 .req w25
a13 .req w26
a14 .req w27
a15 .req w28
.align 6
ENTRY(chacha20_4block_xor_neon)
ENTRY(chacha_4block_xor_neon)
frame_push 10
// x0: Input state matrix, s
// x1: 4 data blocks output, o
// x2: 4 data blocks input, i
// w3: nrounds
// x4: byte count
adr_l x10, .Lpermute
and x5, x4, #63
add x10, x10, x5
add x11, x10, #64
//
// This function encrypts four consecutive ChaCha20 blocks by loading
// This function encrypts four consecutive ChaCha blocks by loading
// the state matrix in NEON registers four times. The algorithm performs
// each operation on the corresponding word of each state matrix, hence
// requires no word shuffling. For final XORing step we transpose the
// matrix by interleaving 32- and then 64-bit words, which allows us to
// do XOR in NEON registers.
//
adr x3, CTRINC // ... and ROT8
ld1 {v30.4s-v31.4s}, [x3]
// At the same time, a fifth block is encrypted in parallel using
// scalar registers
//
adr_l x9, CTRINC // ... and ROT8
ld1 {v30.4s-v31.4s}, [x9]
// x0..15[0-3] = s0..3[0..3]
mov x4, x0
ld4r { v0.4s- v3.4s}, [x4], #16
ld4r { v4.4s- v7.4s}, [x4], #16
ld4r { v8.4s-v11.4s}, [x4], #16
ld4r {v12.4s-v15.4s}, [x4]
add x8, x0, #16
ld4r { v0.4s- v3.4s}, [x0]
ld4r { v4.4s- v7.4s}, [x8], #16
ld4r { v8.4s-v11.4s}, [x8], #16
ld4r {v12.4s-v15.4s}, [x8]
// x12 += counter values 0-3
mov a0, v0.s[0]
mov a1, v1.s[0]
mov a2, v2.s[0]
mov a3, v3.s[0]
mov a4, v4.s[0]
mov a5, v5.s[0]
mov a6, v6.s[0]
mov a7, v7.s[0]
mov a8, v8.s[0]
mov a9, v9.s[0]
mov a10, v10.s[0]
mov a11, v11.s[0]
mov a12, v12.s[0]
mov a13, v13.s[0]
mov a14, v14.s[0]
mov a15, v15.s[0]
// x12 += counter values 1-4
add v12.4s, v12.4s, v30.4s
mov x3, #10
.Ldoubleround4:
// x0 += x4, x12 = rotl32(x12 ^ x0, 16)
// x1 += x5, x13 = rotl32(x13 ^ x1, 16)
// x2 += x6, x14 = rotl32(x14 ^ x2, 16)
// x3 += x7, x15 = rotl32(x15 ^ x3, 16)
add v0.4s, v0.4s, v4.4s
add a0, a0, a4
add v1.4s, v1.4s, v5.4s
add a1, a1, a5
add v2.4s, v2.4s, v6.4s
add a2, a2, a6
add v3.4s, v3.4s, v7.4s
add a3, a3, a7
eor v12.16b, v12.16b, v0.16b
eor a12, a12, a0
eor v13.16b, v13.16b, v1.16b
eor a13, a13, a1
eor v14.16b, v14.16b, v2.16b
eor a14, a14, a2
eor v15.16b, v15.16b, v3.16b
eor a15, a15, a3
rev32 v12.8h, v12.8h
ror a12, a12, #16
rev32 v13.8h, v13.8h
ror a13, a13, #16
rev32 v14.8h, v14.8h
ror a14, a14, #16
rev32 v15.8h, v15.8h
ror a15, a15, #16
// x8 += x12, x4 = rotl32(x4 ^ x8, 12)
// x9 += x13, x5 = rotl32(x5 ^ x9, 12)
// x10 += x14, x6 = rotl32(x6 ^ x10, 12)
// x11 += x15, x7 = rotl32(x7 ^ x11, 12)
add v8.4s, v8.4s, v12.4s
add a8, a8, a12
add v9.4s, v9.4s, v13.4s
add a9, a9, a13
add v10.4s, v10.4s, v14.4s
add a10, a10, a14
add v11.4s, v11.4s, v15.4s
add a11, a11, a15
eor v16.16b, v4.16b, v8.16b
eor a4, a4, a8
eor v17.16b, v5.16b, v9.16b
eor a5, a5, a9
eor v18.16b, v6.16b, v10.16b
eor a6, a6, a10
eor v19.16b, v7.16b, v11.16b
eor a7, a7, a11
shl v4.4s, v16.4s, #12
shl v5.4s, v17.4s, #12
@ -197,42 +298,66 @@ ENTRY(chacha20_4block_xor_neon)
shl v7.4s, v19.4s, #12
sri v4.4s, v16.4s, #20
ror a4, a4, #20
sri v5.4s, v17.4s, #20
ror a5, a5, #20
sri v6.4s, v18.4s, #20
ror a6, a6, #20
sri v7.4s, v19.4s, #20
ror a7, a7, #20
// x0 += x4, x12 = rotl32(x12 ^ x0, 8)
// x1 += x5, x13 = rotl32(x13 ^ x1, 8)
// x2 += x6, x14 = rotl32(x14 ^ x2, 8)
// x3 += x7, x15 = rotl32(x15 ^ x3, 8)
add v0.4s, v0.4s, v4.4s
add a0, a0, a4
add v1.4s, v1.4s, v5.4s
add a1, a1, a5
add v2.4s, v2.4s, v6.4s
add a2, a2, a6
add v3.4s, v3.4s, v7.4s
add a3, a3, a7
eor v12.16b, v12.16b, v0.16b
eor a12, a12, a0
eor v13.16b, v13.16b, v1.16b
eor a13, a13, a1
eor v14.16b, v14.16b, v2.16b
eor a14, a14, a2
eor v15.16b, v15.16b, v3.16b
eor a15, a15, a3
tbl v12.16b, {v12.16b}, v31.16b
ror a12, a12, #24
tbl v13.16b, {v13.16b}, v31.16b
ror a13, a13, #24
tbl v14.16b, {v14.16b}, v31.16b
ror a14, a14, #24
tbl v15.16b, {v15.16b}, v31.16b
ror a15, a15, #24
// x8 += x12, x4 = rotl32(x4 ^ x8, 7)
// x9 += x13, x5 = rotl32(x5 ^ x9, 7)
// x10 += x14, x6 = rotl32(x6 ^ x10, 7)
// x11 += x15, x7 = rotl32(x7 ^ x11, 7)
add v8.4s, v8.4s, v12.4s
add a8, a8, a12
add v9.4s, v9.4s, v13.4s
add a9, a9, a13
add v10.4s, v10.4s, v14.4s
add a10, a10, a14
add v11.4s, v11.4s, v15.4s
add a11, a11, a15
eor v16.16b, v4.16b, v8.16b
eor a4, a4, a8
eor v17.16b, v5.16b, v9.16b
eor a5, a5, a9
eor v18.16b, v6.16b, v10.16b
eor a6, a6, a10
eor v19.16b, v7.16b, v11.16b
eor a7, a7, a11
shl v4.4s, v16.4s, #7
shl v5.4s, v17.4s, #7
@ -240,42 +365,66 @@ ENTRY(chacha20_4block_xor_neon)
shl v7.4s, v19.4s, #7
sri v4.4s, v16.4s, #25
ror a4, a4, #25
sri v5.4s, v17.4s, #25
ror a5, a5, #25
sri v6.4s, v18.4s, #25
ror a6, a6, #25
sri v7.4s, v19.4s, #25
ror a7, a7, #25
// x0 += x5, x15 = rotl32(x15 ^ x0, 16)
// x1 += x6, x12 = rotl32(x12 ^ x1, 16)
// x2 += x7, x13 = rotl32(x13 ^ x2, 16)
// x3 += x4, x14 = rotl32(x14 ^ x3, 16)
add v0.4s, v0.4s, v5.4s
add a0, a0, a5
add v1.4s, v1.4s, v6.4s
add a1, a1, a6
add v2.4s, v2.4s, v7.4s
add a2, a2, a7
add v3.4s, v3.4s, v4.4s
add a3, a3, a4
eor v15.16b, v15.16b, v0.16b
eor a15, a15, a0
eor v12.16b, v12.16b, v1.16b
eor a12, a12, a1
eor v13.16b, v13.16b, v2.16b
eor a13, a13, a2
eor v14.16b, v14.16b, v3.16b
eor a14, a14, a3
rev32 v15.8h, v15.8h
ror a15, a15, #16
rev32 v12.8h, v12.8h
ror a12, a12, #16
rev32 v13.8h, v13.8h
ror a13, a13, #16
rev32 v14.8h, v14.8h
ror a14, a14, #16
// x10 += x15, x5 = rotl32(x5 ^ x10, 12)
// x11 += x12, x6 = rotl32(x6 ^ x11, 12)
// x8 += x13, x7 = rotl32(x7 ^ x8, 12)
// x9 += x14, x4 = rotl32(x4 ^ x9, 12)
add v10.4s, v10.4s, v15.4s
add a10, a10, a15
add v11.4s, v11.4s, v12.4s
add a11, a11, a12
add v8.4s, v8.4s, v13.4s
add a8, a8, a13
add v9.4s, v9.4s, v14.4s
add a9, a9, a14
eor v16.16b, v5.16b, v10.16b
eor a5, a5, a10
eor v17.16b, v6.16b, v11.16b
eor a6, a6, a11
eor v18.16b, v7.16b, v8.16b
eor a7, a7, a8
eor v19.16b, v4.16b, v9.16b
eor a4, a4, a9
shl v5.4s, v16.4s, #12
shl v6.4s, v17.4s, #12
@ -283,42 +432,66 @@ ENTRY(chacha20_4block_xor_neon)
shl v4.4s, v19.4s, #12
sri v5.4s, v16.4s, #20
ror a5, a5, #20
sri v6.4s, v17.4s, #20
ror a6, a6, #20
sri v7.4s, v18.4s, #20
ror a7, a7, #20
sri v4.4s, v19.4s, #20
ror a4, a4, #20
// x0 += x5, x15 = rotl32(x15 ^ x0, 8)
// x1 += x6, x12 = rotl32(x12 ^ x1, 8)
// x2 += x7, x13 = rotl32(x13 ^ x2, 8)
// x3 += x4, x14 = rotl32(x14 ^ x3, 8)
add v0.4s, v0.4s, v5.4s
add a0, a0, a5
add v1.4s, v1.4s, v6.4s
add a1, a1, a6
add v2.4s, v2.4s, v7.4s
add a2, a2, a7
add v3.4s, v3.4s, v4.4s
add a3, a3, a4
eor v15.16b, v15.16b, v0.16b
eor a15, a15, a0
eor v12.16b, v12.16b, v1.16b
eor a12, a12, a1
eor v13.16b, v13.16b, v2.16b
eor a13, a13, a2
eor v14.16b, v14.16b, v3.16b
eor a14, a14, a3
tbl v15.16b, {v15.16b}, v31.16b
ror a15, a15, #24
tbl v12.16b, {v12.16b}, v31.16b
ror a12, a12, #24
tbl v13.16b, {v13.16b}, v31.16b
ror a13, a13, #24
tbl v14.16b, {v14.16b}, v31.16b
ror a14, a14, #24
// x10 += x15, x5 = rotl32(x5 ^ x10, 7)
// x11 += x12, x6 = rotl32(x6 ^ x11, 7)
// x8 += x13, x7 = rotl32(x7 ^ x8, 7)
// x9 += x14, x4 = rotl32(x4 ^ x9, 7)
add v10.4s, v10.4s, v15.4s
add a10, a10, a15
add v11.4s, v11.4s, v12.4s
add a11, a11, a12
add v8.4s, v8.4s, v13.4s
add a8, a8, a13
add v9.4s, v9.4s, v14.4s
add a9, a9, a14
eor v16.16b, v5.16b, v10.16b
eor a5, a5, a10
eor v17.16b, v6.16b, v11.16b
eor a6, a6, a11
eor v18.16b, v7.16b, v8.16b
eor a7, a7, a8
eor v19.16b, v4.16b, v9.16b
eor a4, a4, a9
shl v5.4s, v16.4s, #7
shl v6.4s, v17.4s, #7
@ -326,11 +499,15 @@ ENTRY(chacha20_4block_xor_neon)
shl v4.4s, v19.4s, #7
sri v5.4s, v16.4s, #25
ror a5, a5, #25
sri v6.4s, v17.4s, #25
ror a6, a6, #25
sri v7.4s, v18.4s, #25
ror a7, a7, #25
sri v4.4s, v19.4s, #25
ror a4, a4, #25
subs x3, x3, #1
subs w3, w3, #2
b.ne .Ldoubleround4
ld4r {v16.4s-v19.4s}, [x0], #16
@ -344,9 +521,17 @@ ENTRY(chacha20_4block_xor_neon)
// x2[0-3] += s0[2]
// x3[0-3] += s0[3]
add v0.4s, v0.4s, v16.4s
mov w6, v16.s[0]
mov w7, v17.s[0]
add v1.4s, v1.4s, v17.4s
mov w8, v18.s[0]
mov w9, v19.s[0]
add v2.4s, v2.4s, v18.4s
add a0, a0, w6
add a1, a1, w7
add v3.4s, v3.4s, v19.4s
add a2, a2, w8
add a3, a3, w9
ld4r {v24.4s-v27.4s}, [x0], #16
ld4r {v28.4s-v31.4s}, [x0]
@ -356,95 +541,304 @@ ENTRY(chacha20_4block_xor_neon)
// x6[0-3] += s1[2]
// x7[0-3] += s1[3]
add v4.4s, v4.4s, v20.4s
mov w6, v20.s[0]
mov w7, v21.s[0]
add v5.4s, v5.4s, v21.4s
mov w8, v22.s[0]
mov w9, v23.s[0]
add v6.4s, v6.4s, v22.4s
add a4, a4, w6
add a5, a5, w7
add v7.4s, v7.4s, v23.4s
add a6, a6, w8
add a7, a7, w9
// x8[0-3] += s2[0]
// x9[0-3] += s2[1]
// x10[0-3] += s2[2]
// x11[0-3] += s2[3]
add v8.4s, v8.4s, v24.4s
mov w6, v24.s[0]
mov w7, v25.s[0]
add v9.4s, v9.4s, v25.4s
mov w8, v26.s[0]
mov w9, v27.s[0]
add v10.4s, v10.4s, v26.4s
add a8, a8, w6
add a9, a9, w7
add v11.4s, v11.4s, v27.4s
add a10, a10, w8
add a11, a11, w9
// x12[0-3] += s3[0]
// x13[0-3] += s3[1]
// x14[0-3] += s3[2]
// x15[0-3] += s3[3]
add v12.4s, v12.4s, v28.4s
mov w6, v28.s[0]
mov w7, v29.s[0]
add v13.4s, v13.4s, v29.4s
mov w8, v30.s[0]
mov w9, v31.s[0]
add v14.4s, v14.4s, v30.4s
add a12, a12, w6
add a13, a13, w7
add v15.4s, v15.4s, v31.4s
add a14, a14, w8
add a15, a15, w9
// interleave 32-bit words in state n, n+1
ldp w6, w7, [x2], #64
zip1 v16.4s, v0.4s, v1.4s
ldp w8, w9, [x2, #-56]
eor a0, a0, w6
zip2 v17.4s, v0.4s, v1.4s
eor a1, a1, w7
zip1 v18.4s, v2.4s, v3.4s
eor a2, a2, w8
zip2 v19.4s, v2.4s, v3.4s
eor a3, a3, w9
ldp w6, w7, [x2, #-48]
zip1 v20.4s, v4.4s, v5.4s
ldp w8, w9, [x2, #-40]
eor a4, a4, w6
zip2 v21.4s, v4.4s, v5.4s
eor a5, a5, w7
zip1 v22.4s, v6.4s, v7.4s
eor a6, a6, w8
zip2 v23.4s, v6.4s, v7.4s
eor a7, a7, w9
ldp w6, w7, [x2, #-32]
zip1 v24.4s, v8.4s, v9.4s
ldp w8, w9, [x2, #-24]
eor a8, a8, w6
zip2 v25.4s, v8.4s, v9.4s
eor a9, a9, w7
zip1 v26.4s, v10.4s, v11.4s
eor a10, a10, w8
zip2 v27.4s, v10.4s, v11.4s
eor a11, a11, w9
ldp w6, w7, [x2, #-16]
zip1 v28.4s, v12.4s, v13.4s
ldp w8, w9, [x2, #-8]
eor a12, a12, w6
zip2 v29.4s, v12.4s, v13.4s
eor a13, a13, w7
zip1 v30.4s, v14.4s, v15.4s
eor a14, a14, w8
zip2 v31.4s, v14.4s, v15.4s
eor a15, a15, w9
mov x3, #64
subs x5, x4, #128
add x6, x5, x2
csel x3, x3, xzr, ge
csel x2, x2, x6, ge
// interleave 64-bit words in state n, n+2
zip1 v0.2d, v16.2d, v18.2d
zip2 v4.2d, v16.2d, v18.2d
stp a0, a1, [x1], #64
zip1 v8.2d, v17.2d, v19.2d
zip2 v12.2d, v17.2d, v19.2d
ld1 {v16.16b-v19.16b}, [x2], #64
stp a2, a3, [x1, #-56]
ld1 {v16.16b-v19.16b}, [x2], x3
subs x6, x4, #192
ccmp x3, xzr, #4, lt
add x7, x6, x2
csel x3, x3, xzr, eq
csel x2, x2, x7, eq
zip1 v1.2d, v20.2d, v22.2d
zip2 v5.2d, v20.2d, v22.2d
stp a4, a5, [x1, #-48]
zip1 v9.2d, v21.2d, v23.2d
zip2 v13.2d, v21.2d, v23.2d
ld1 {v20.16b-v23.16b}, [x2], #64
stp a6, a7, [x1, #-40]
ld1 {v20.16b-v23.16b}, [x2], x3
subs x7, x4, #256
ccmp x3, xzr, #4, lt
add x8, x7, x2
csel x3, x3, xzr, eq
csel x2, x2, x8, eq
zip1 v2.2d, v24.2d, v26.2d
zip2 v6.2d, v24.2d, v26.2d
stp a8, a9, [x1, #-32]
zip1 v10.2d, v25.2d, v27.2d
zip2 v14.2d, v25.2d, v27.2d
ld1 {v24.16b-v27.16b}, [x2], #64
stp a10, a11, [x1, #-24]
ld1 {v24.16b-v27.16b}, [x2], x3
subs x8, x4, #320
ccmp x3, xzr, #4, lt
add x9, x8, x2
csel x2, x2, x9, eq
zip1 v3.2d, v28.2d, v30.2d
zip2 v7.2d, v28.2d, v30.2d
stp a12, a13, [x1, #-16]
zip1 v11.2d, v29.2d, v31.2d
zip2 v15.2d, v29.2d, v31.2d
stp a14, a15, [x1, #-8]
ld1 {v28.16b-v31.16b}, [x2]
// xor with corresponding input, write to output
tbnz x5, #63, 0f
eor v16.16b, v16.16b, v0.16b
eor v17.16b, v17.16b, v1.16b
eor v18.16b, v18.16b, v2.16b
eor v19.16b, v19.16b, v3.16b
st1 {v16.16b-v19.16b}, [x1], #64
cbz x5, .Lout
tbnz x6, #63, 1f
eor v20.16b, v20.16b, v4.16b
eor v21.16b, v21.16b, v5.16b
st1 {v16.16b-v19.16b}, [x1], #64
eor v22.16b, v22.16b, v6.16b
eor v23.16b, v23.16b, v7.16b
st1 {v20.16b-v23.16b}, [x1], #64
cbz x6, .Lout
tbnz x7, #63, 2f
eor v24.16b, v24.16b, v8.16b
eor v25.16b, v25.16b, v9.16b
st1 {v20.16b-v23.16b}, [x1], #64
eor v26.16b, v26.16b, v10.16b
eor v27.16b, v27.16b, v11.16b
eor v28.16b, v28.16b, v12.16b
st1 {v24.16b-v27.16b}, [x1], #64
cbz x7, .Lout
tbnz x8, #63, 3f
eor v28.16b, v28.16b, v12.16b
eor v29.16b, v29.16b, v13.16b
eor v30.16b, v30.16b, v14.16b
eor v31.16b, v31.16b, v15.16b
st1 {v28.16b-v31.16b}, [x1]
.Lout: frame_pop
ret
ENDPROC(chacha20_4block_xor_neon)
CTRINC: .word 0, 1, 2, 3
// fewer than 128 bytes of in/output
0: ld1 {v8.16b}, [x10]
ld1 {v9.16b}, [x11]
movi v10.16b, #16
sub x2, x1, #64
add x1, x1, x5
ld1 {v16.16b-v19.16b}, [x2]
tbl v4.16b, {v0.16b-v3.16b}, v8.16b
tbx v20.16b, {v16.16b-v19.16b}, v9.16b
add v8.16b, v8.16b, v10.16b
add v9.16b, v9.16b, v10.16b
tbl v5.16b, {v0.16b-v3.16b}, v8.16b
tbx v21.16b, {v16.16b-v19.16b}, v9.16b
add v8.16b, v8.16b, v10.16b
add v9.16b, v9.16b, v10.16b
tbl v6.16b, {v0.16b-v3.16b}, v8.16b
tbx v22.16b, {v16.16b-v19.16b}, v9.16b
add v8.16b, v8.16b, v10.16b
add v9.16b, v9.16b, v10.16b
tbl v7.16b, {v0.16b-v3.16b}, v8.16b
tbx v23.16b, {v16.16b-v19.16b}, v9.16b
eor v20.16b, v20.16b, v4.16b
eor v21.16b, v21.16b, v5.16b
eor v22.16b, v22.16b, v6.16b
eor v23.16b, v23.16b, v7.16b
st1 {v20.16b-v23.16b}, [x1]
b .Lout
// fewer than 192 bytes of in/output
1: ld1 {v8.16b}, [x10]
ld1 {v9.16b}, [x11]
movi v10.16b, #16
add x1, x1, x6
tbl v0.16b, {v4.16b-v7.16b}, v8.16b
tbx v20.16b, {v16.16b-v19.16b}, v9.16b
add v8.16b, v8.16b, v10.16b
add v9.16b, v9.16b, v10.16b
tbl v1.16b, {v4.16b-v7.16b}, v8.16b
tbx v21.16b, {v16.16b-v19.16b}, v9.16b
add v8.16b, v8.16b, v10.16b
add v9.16b, v9.16b, v10.16b
tbl v2.16b, {v4.16b-v7.16b}, v8.16b
tbx v22.16b, {v16.16b-v19.16b}, v9.16b
add v8.16b, v8.16b, v10.16b
add v9.16b, v9.16b, v10.16b
tbl v3.16b, {v4.16b-v7.16b}, v8.16b
tbx v23.16b, {v16.16b-v19.16b}, v9.16b
eor v20.16b, v20.16b, v0.16b
eor v21.16b, v21.16b, v1.16b
eor v22.16b, v22.16b, v2.16b
eor v23.16b, v23.16b, v3.16b
st1 {v20.16b-v23.16b}, [x1]
b .Lout
// fewer than 256 bytes of in/output
2: ld1 {v4.16b}, [x10]
ld1 {v5.16b}, [x11]
movi v6.16b, #16
add x1, x1, x7
tbl v0.16b, {v8.16b-v11.16b}, v4.16b
tbx v24.16b, {v20.16b-v23.16b}, v5.16b
add v4.16b, v4.16b, v6.16b
add v5.16b, v5.16b, v6.16b
tbl v1.16b, {v8.16b-v11.16b}, v4.16b
tbx v25.16b, {v20.16b-v23.16b}, v5.16b
add v4.16b, v4.16b, v6.16b
add v5.16b, v5.16b, v6.16b
tbl v2.16b, {v8.16b-v11.16b}, v4.16b
tbx v26.16b, {v20.16b-v23.16b}, v5.16b
add v4.16b, v4.16b, v6.16b
add v5.16b, v5.16b, v6.16b
tbl v3.16b, {v8.16b-v11.16b}, v4.16b
tbx v27.16b, {v20.16b-v23.16b}, v5.16b
eor v24.16b, v24.16b, v0.16b
eor v25.16b, v25.16b, v1.16b
eor v26.16b, v26.16b, v2.16b
eor v27.16b, v27.16b, v3.16b
st1 {v24.16b-v27.16b}, [x1]
b .Lout
// fewer than 320 bytes of in/output
3: ld1 {v4.16b}, [x10]
ld1 {v5.16b}, [x11]
movi v6.16b, #16
add x1, x1, x8
tbl v0.16b, {v12.16b-v15.16b}, v4.16b
tbx v28.16b, {v24.16b-v27.16b}, v5.16b
add v4.16b, v4.16b, v6.16b
add v5.16b, v5.16b, v6.16b
tbl v1.16b, {v12.16b-v15.16b}, v4.16b
tbx v29.16b, {v24.16b-v27.16b}, v5.16b
add v4.16b, v4.16b, v6.16b
add v5.16b, v5.16b, v6.16b
tbl v2.16b, {v12.16b-v15.16b}, v4.16b
tbx v30.16b, {v24.16b-v27.16b}, v5.16b
add v4.16b, v4.16b, v6.16b
add v5.16b, v5.16b, v6.16b
tbl v3.16b, {v12.16b-v15.16b}, v4.16b
tbx v31.16b, {v24.16b-v27.16b}, v5.16b
eor v28.16b, v28.16b, v0.16b
eor v29.16b, v29.16b, v1.16b
eor v30.16b, v30.16b, v2.16b
eor v31.16b, v31.16b, v3.16b
st1 {v28.16b-v31.16b}, [x1]
b .Lout
ENDPROC(chacha_4block_xor_neon)
.section ".rodata", "a", %progbits
.align L1_CACHE_SHIFT
.Lpermute:
.set .Li, 0
.rept 192
.byte (.Li - 64)
.set .Li, .Li + 1
.endr
CTRINC: .word 1, 2, 3, 4
ROT8: .word 0x02010003, 0x06050407, 0x0a09080b, 0x0e0d0c0f

View File

@ -0,0 +1,198 @@
/*
* ARM NEON accelerated ChaCha and XChaCha stream ciphers,
* including ChaCha20 (RFC7539)
*
* Copyright (C) 2016 - 2017 Linaro, Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* Based on:
* ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
*
* Copyright (C) 2015 Martin Willi
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*/
#include <crypto/algapi.h>
#include <crypto/chacha.h>
#include <crypto/internal/skcipher.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
asmlinkage void chacha_block_xor_neon(u32 *state, u8 *dst, const u8 *src,
int nrounds);
asmlinkage void chacha_4block_xor_neon(u32 *state, u8 *dst, const u8 *src,
int nrounds, int bytes);
asmlinkage void hchacha_block_neon(const u32 *state, u32 *out, int nrounds);
static void chacha_doneon(u32 *state, u8 *dst, const u8 *src,
int bytes, int nrounds)
{
while (bytes > 0) {
int l = min(bytes, CHACHA_BLOCK_SIZE * 5);
if (l <= CHACHA_BLOCK_SIZE) {
u8 buf[CHACHA_BLOCK_SIZE];
memcpy(buf, src, l);
chacha_block_xor_neon(state, buf, buf, nrounds);
memcpy(dst, buf, l);
state[12] += 1;
break;
}
chacha_4block_xor_neon(state, dst, src, nrounds, l);
bytes -= CHACHA_BLOCK_SIZE * 5;
src += CHACHA_BLOCK_SIZE * 5;
dst += CHACHA_BLOCK_SIZE * 5;
state[12] += 5;
}
}
static int chacha_neon_stream_xor(struct skcipher_request *req,
struct chacha_ctx *ctx, u8 *iv)
{
struct skcipher_walk walk;
u32 state[16];
int err;
err = skcipher_walk_virt(&walk, req, false);
crypto_chacha_init(state, ctx, iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
if (nbytes < walk.total)
nbytes = rounddown(nbytes, walk.stride);
kernel_neon_begin();
chacha_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
nbytes, ctx->nrounds);
kernel_neon_end();
err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
return err;
}
static int chacha_neon(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd())
return crypto_chacha_crypt(req);
return chacha_neon_stream_xor(req, ctx, req->iv);
}
static int xchacha_neon(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
struct chacha_ctx subctx;
u32 state[16];
u8 real_iv[16];
if (req->cryptlen <= CHACHA_BLOCK_SIZE || !may_use_simd())
return crypto_xchacha_crypt(req);
crypto_chacha_init(state, ctx, req->iv);
kernel_neon_begin();
hchacha_block_neon(state, subctx.key, ctx->nrounds);
kernel_neon_end();
subctx.nrounds = ctx->nrounds;
memcpy(&real_iv[0], req->iv + 24, 8);
memcpy(&real_iv[8], req->iv + 16, 8);
return chacha_neon_stream_xor(req, &subctx, real_iv);
}
static struct skcipher_alg algs[] = {
{
.base.cra_name = "chacha20",
.base.cra_driver_name = "chacha20-neon",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = CHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.walksize = 5 * CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = chacha_neon,
.decrypt = chacha_neon,
}, {
.base.cra_name = "xchacha20",
.base.cra_driver_name = "xchacha20-neon",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = XCHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.walksize = 5 * CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = xchacha_neon,
.decrypt = xchacha_neon,
}, {
.base.cra_name = "xchacha12",
.base.cra_driver_name = "xchacha12-neon",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = XCHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.walksize = 5 * CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha12_setkey,
.encrypt = xchacha_neon,
.decrypt = xchacha_neon,
}
};
static int __init chacha_simd_mod_init(void)
{
if (!(elf_hwcap & HWCAP_ASIMD))
return -ENODEV;
return crypto_register_skciphers(algs, ARRAY_SIZE(algs));
}
static void __exit chacha_simd_mod_fini(void)
{
crypto_unregister_skciphers(algs, ARRAY_SIZE(algs));
}
module_init(chacha_simd_mod_init);
module_exit(chacha_simd_mod_fini);
MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (NEON accelerated)");
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-neon");
MODULE_ALIAS_CRYPTO("xchacha20");
MODULE_ALIAS_CRYPTO("xchacha20-neon");
MODULE_ALIAS_CRYPTO("xchacha12");
MODULE_ALIAS_CRYPTO("xchacha12-neon");

View File

@ -1,133 +0,0 @@
/*
* ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
*
* Copyright (C) 2016 - 2017 Linaro, Ltd. <ard.biesheuvel@linaro.org>
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License version 2 as
* published by the Free Software Foundation.
*
* Based on:
* ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
*
* Copyright (C) 2015 Martin Willi
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*/
#include <crypto/algapi.h>
#include <crypto/chacha20.h>
#include <crypto/internal/skcipher.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <asm/hwcap.h>
#include <asm/neon.h>
#include <asm/simd.h>
asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes)
{
u8 buf[CHACHA20_BLOCK_SIZE];
while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
kernel_neon_begin();
chacha20_4block_xor_neon(state, dst, src);
kernel_neon_end();
bytes -= CHACHA20_BLOCK_SIZE * 4;
src += CHACHA20_BLOCK_SIZE * 4;
dst += CHACHA20_BLOCK_SIZE * 4;
state[12] += 4;
}
if (!bytes)
return;
kernel_neon_begin();
while (bytes >= CHACHA20_BLOCK_SIZE) {
chacha20_block_xor_neon(state, dst, src);
bytes -= CHACHA20_BLOCK_SIZE;
src += CHACHA20_BLOCK_SIZE;
dst += CHACHA20_BLOCK_SIZE;
state[12]++;
}
if (bytes) {
memcpy(buf, src, bytes);
chacha20_block_xor_neon(state, buf, buf);
memcpy(dst, buf, bytes);
}
kernel_neon_end();
}
static int chacha20_neon(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
u32 state[16];
int err;
if (!may_use_simd() || req->cryptlen <= CHACHA20_BLOCK_SIZE)
return crypto_chacha20_crypt(req);
err = skcipher_walk_virt(&walk, req, false);
crypto_chacha20_init(state, ctx, walk.iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
if (nbytes < walk.total)
nbytes = round_down(nbytes, walk.stride);
chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
nbytes);
err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
return err;
}
static struct skcipher_alg alg = {
.base.cra_name = "chacha20",
.base.cra_driver_name = "chacha20-neon",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha20_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA20_KEY_SIZE,
.max_keysize = CHACHA20_KEY_SIZE,
.ivsize = CHACHA20_IV_SIZE,
.chunksize = CHACHA20_BLOCK_SIZE,
.walksize = 4 * CHACHA20_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = chacha20_neon,
.decrypt = chacha20_neon,
};
static int __init chacha20_simd_mod_init(void)
{
if (!(elf_hwcap & HWCAP_ASIMD))
return -ENODEV;
return crypto_register_skcipher(&alg);
}
static void __exit chacha20_simd_mod_fini(void)
{
crypto_unregister_skcipher(&alg);
}
module_init(chacha20_simd_mod_init);
module_exit(chacha20_simd_mod_fini);
MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
MODULE_LICENSE("GPL v2");
MODULE_ALIAS_CRYPTO("chacha20");

View File

@ -0,0 +1,103 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* NH - ε-almost-universal hash function, ARM64 NEON accelerated version
*
* Copyright 2018 Google LLC
*
* Author: Eric Biggers <ebiggers@google.com>
*/
#include <linux/linkage.h>
KEY .req x0
MESSAGE .req x1
MESSAGE_LEN .req x2
HASH .req x3
PASS0_SUMS .req v0
PASS1_SUMS .req v1
PASS2_SUMS .req v2
PASS3_SUMS .req v3
K0 .req v4
K1 .req v5
K2 .req v6
K3 .req v7
T0 .req v8
T1 .req v9
T2 .req v10
T3 .req v11
T4 .req v12
T5 .req v13
T6 .req v14
T7 .req v15
.macro _nh_stride k0, k1, k2, k3
// Load next message stride
ld1 {T3.16b}, [MESSAGE], #16
// Load next key stride
ld1 {\k3\().4s}, [KEY], #16
// Add message words to key words
add T0.4s, T3.4s, \k0\().4s
add T1.4s, T3.4s, \k1\().4s
add T2.4s, T3.4s, \k2\().4s
add T3.4s, T3.4s, \k3\().4s
// Multiply 32x32 => 64 and accumulate
mov T4.d[0], T0.d[1]
mov T5.d[0], T1.d[1]
mov T6.d[0], T2.d[1]
mov T7.d[0], T3.d[1]
umlal PASS0_SUMS.2d, T0.2s, T4.2s
umlal PASS1_SUMS.2d, T1.2s, T5.2s
umlal PASS2_SUMS.2d, T2.2s, T6.2s
umlal PASS3_SUMS.2d, T3.2s, T7.2s
.endm
/*
* void nh_neon(const u32 *key, const u8 *message, size_t message_len,
* u8 hash[NH_HASH_BYTES])
*
* It's guaranteed that message_len % 16 == 0.
*/
ENTRY(nh_neon)
ld1 {K0.4s,K1.4s}, [KEY], #32
movi PASS0_SUMS.2d, #0
movi PASS1_SUMS.2d, #0
ld1 {K2.4s}, [KEY], #16
movi PASS2_SUMS.2d, #0
movi PASS3_SUMS.2d, #0
subs MESSAGE_LEN, MESSAGE_LEN, #64
blt .Lloop4_done
.Lloop4:
_nh_stride K0, K1, K2, K3
_nh_stride K1, K2, K3, K0
_nh_stride K2, K3, K0, K1
_nh_stride K3, K0, K1, K2
subs MESSAGE_LEN, MESSAGE_LEN, #64
bge .Lloop4
.Lloop4_done:
ands MESSAGE_LEN, MESSAGE_LEN, #63
beq .Ldone
_nh_stride K0, K1, K2, K3
subs MESSAGE_LEN, MESSAGE_LEN, #16
beq .Ldone
_nh_stride K1, K2, K3, K0
subs MESSAGE_LEN, MESSAGE_LEN, #16
beq .Ldone
_nh_stride K2, K3, K0, K1
.Ldone:
// Sum the accumulators for each pass, then store the sums to 'hash'
addp T0.2d, PASS0_SUMS.2d, PASS1_SUMS.2d
addp T1.2d, PASS2_SUMS.2d, PASS3_SUMS.2d
st1 {T0.16b,T1.16b}, [HASH]
ret
ENDPROC(nh_neon)

View File

@ -0,0 +1,77 @@
// SPDX-License-Identifier: GPL-2.0
/*
* NHPoly1305 - ε-almost--universal hash function for Adiantum
* (ARM64 NEON accelerated version)
*
* Copyright 2018 Google LLC
*/
#include <asm/neon.h>
#include <asm/simd.h>
#include <crypto/internal/hash.h>
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
asmlinkage void nh_neon(const u32 *key, const u8 *message, size_t message_len,
u8 hash[NH_HASH_BYTES]);
/* wrapper to avoid indirect call to assembly, which doesn't work with CFI */
static void _nh_neon(const u32 *key, const u8 *message, size_t message_len,
__le64 hash[NH_NUM_PASSES])
{
nh_neon(key, message, message_len, (u8 *)hash);
}
static int nhpoly1305_neon_update(struct shash_desc *desc,
const u8 *src, unsigned int srclen)
{
if (srclen < 64 || !may_use_simd())
return crypto_nhpoly1305_update(desc, src, srclen);
do {
unsigned int n = min_t(unsigned int, srclen, PAGE_SIZE);
kernel_neon_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_neon);
kernel_neon_end();
src += n;
srclen -= n;
} while (srclen);
return 0;
}
static struct shash_alg nhpoly1305_alg = {
.base.cra_name = "nhpoly1305",
.base.cra_driver_name = "nhpoly1305-neon",
.base.cra_priority = 200,
.base.cra_ctxsize = sizeof(struct nhpoly1305_key),
.base.cra_module = THIS_MODULE,
.digestsize = POLY1305_DIGEST_SIZE,
.init = crypto_nhpoly1305_init,
.update = nhpoly1305_neon_update,
.final = crypto_nhpoly1305_final,
.setkey = crypto_nhpoly1305_setkey,
.descsize = sizeof(struct nhpoly1305_state),
};
static int __init nhpoly1305_mod_init(void)
{
if (!(elf_hwcap & HWCAP_ASIMD))
return -ENODEV;
return crypto_register_shash(&nhpoly1305_alg);
}
static void __exit nhpoly1305_mod_exit(void)
{
crypto_unregister_shash(&nhpoly1305_alg);
}
module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);
MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function (NEON-accelerated)");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
MODULE_ALIAS_CRYPTO("nhpoly1305-neon");

View File

@ -137,7 +137,7 @@ static int fallback_init_cip(struct crypto_tfm *tfm)
struct s390_aes_ctx *sctx = crypto_tfm_ctx(tfm);
sctx->fallback.cip = crypto_alloc_cipher(name, 0,
CRYPTO_ALG_ASYNC | CRYPTO_ALG_NEED_FALLBACK);
CRYPTO_ALG_NEED_FALLBACK);
if (IS_ERR(sctx->fallback.cip)) {
pr_err("Allocating AES fallback algorithm %s failed\n",

View File

@ -476,11 +476,6 @@ static bool __init sparc64_has_aes_opcode(void)
static int __init aes_sparc64_mod_init(void)
{
int i;
for (i = 0; i < ARRAY_SIZE(algs); i++)
INIT_LIST_HEAD(&algs[i].cra_list);
if (sparc64_has_aes_opcode()) {
pr_info("Using sparc64 aes opcodes optimized AES implementation\n");
return crypto_register_algs(algs, ARRAY_SIZE(algs));

View File

@ -299,11 +299,6 @@ static bool __init sparc64_has_camellia_opcode(void)
static int __init camellia_sparc64_mod_init(void)
{
int i;
for (i = 0; i < ARRAY_SIZE(algs); i++)
INIT_LIST_HEAD(&algs[i].cra_list);
if (sparc64_has_camellia_opcode()) {
pr_info("Using sparc64 camellia opcodes optimized CAMELLIA implementation\n");
return crypto_register_algs(algs, ARRAY_SIZE(algs));

View File

@ -510,11 +510,6 @@ static bool __init sparc64_has_des_opcode(void)
static int __init des_sparc64_mod_init(void)
{
int i;
for (i = 0; i < ARRAY_SIZE(algs); i++)
INIT_LIST_HEAD(&algs[i].cra_list);
if (sparc64_has_des_opcode()) {
pr_info("Using sparc64 des opcodes optimized DES implementation\n");
return crypto_register_algs(algs, ARRAY_SIZE(algs));

View File

@ -8,6 +8,7 @@ OBJECT_FILES_NON_STANDARD := y
avx_supported := $(call as-instr,vpxor %xmm0$(comma)%xmm0$(comma)%xmm0,yes,no)
avx2_supported := $(call as-instr,vpgatherdd %ymm0$(comma)(%eax$(comma)%ymm1\
$(comma)4)$(comma)%ymm2,yes,no)
avx512_supported :=$(call as-instr,vpmovm2b %k1$(comma)%zmm5,yes,no)
sha1_ni_supported :=$(call as-instr,sha1msg1 %xmm0$(comma)%xmm1,yes,no)
sha256_ni_supported :=$(call as-instr,sha256msg1 %xmm0$(comma)%xmm1,yes,no)
@ -23,7 +24,7 @@ obj-$(CONFIG_CRYPTO_CAMELLIA_X86_64) += camellia-x86_64.o
obj-$(CONFIG_CRYPTO_BLOWFISH_X86_64) += blowfish-x86_64.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64) += twofish-x86_64.o
obj-$(CONFIG_CRYPTO_TWOFISH_X86_64_3WAY) += twofish-x86_64-3way.o
obj-$(CONFIG_CRYPTO_CHACHA20_X86_64) += chacha20-x86_64.o
obj-$(CONFIG_CRYPTO_CHACHA20_X86_64) += chacha-x86_64.o
obj-$(CONFIG_CRYPTO_SERPENT_SSE2_X86_64) += serpent-sse2-x86_64.o
obj-$(CONFIG_CRYPTO_AES_NI_INTEL) += aesni-intel.o
obj-$(CONFIG_CRYPTO_GHASH_CLMUL_NI_INTEL) += ghash-clmulni-intel.o
@ -46,6 +47,9 @@ obj-$(CONFIG_CRYPTO_MORUS1280_GLUE) += morus1280_glue.o
obj-$(CONFIG_CRYPTO_MORUS640_SSE2) += morus640-sse2.o
obj-$(CONFIG_CRYPTO_MORUS1280_SSE2) += morus1280-sse2.o
obj-$(CONFIG_CRYPTO_NHPOLY1305_SSE2) += nhpoly1305-sse2.o
obj-$(CONFIG_CRYPTO_NHPOLY1305_AVX2) += nhpoly1305-avx2.o
# These modules require assembler to support AVX.
ifeq ($(avx_supported),yes)
obj-$(CONFIG_CRYPTO_CAMELLIA_AESNI_AVX_X86_64) += \
@ -74,7 +78,7 @@ camellia-x86_64-y := camellia-x86_64-asm_64.o camellia_glue.o
blowfish-x86_64-y := blowfish-x86_64-asm_64.o blowfish_glue.o
twofish-x86_64-y := twofish-x86_64-asm_64.o twofish_glue.o
twofish-x86_64-3way-y := twofish-x86_64-asm_64-3way.o twofish_glue_3way.o
chacha20-x86_64-y := chacha20-ssse3-x86_64.o chacha20_glue.o
chacha-x86_64-y := chacha-ssse3-x86_64.o chacha_glue.o
serpent-sse2-x86_64-y := serpent-sse2-x86_64-asm_64.o serpent_sse2_glue.o
aegis128-aesni-y := aegis128-aesni-asm.o aegis128-aesni-glue.o
@ -84,6 +88,8 @@ aegis256-aesni-y := aegis256-aesni-asm.o aegis256-aesni-glue.o
morus640-sse2-y := morus640-sse2-asm.o morus640-sse2-glue.o
morus1280-sse2-y := morus1280-sse2-asm.o morus1280-sse2-glue.o
nhpoly1305-sse2-y := nh-sse2-x86_64.o nhpoly1305-sse2-glue.o
ifeq ($(avx_supported),yes)
camellia-aesni-avx-x86_64-y := camellia-aesni-avx-asm_64.o \
camellia_aesni_avx_glue.o
@ -97,10 +103,16 @@ endif
ifeq ($(avx2_supported),yes)
camellia-aesni-avx2-y := camellia-aesni-avx2-asm_64.o camellia_aesni_avx2_glue.o
chacha20-x86_64-y += chacha20-avx2-x86_64.o
chacha-x86_64-y += chacha-avx2-x86_64.o
serpent-avx2-y := serpent-avx2-asm_64.o serpent_avx2_glue.o
morus1280-avx2-y := morus1280-avx2-asm.o morus1280-avx2-glue.o
nhpoly1305-avx2-y := nh-avx2-x86_64.o nhpoly1305-avx2-glue.o
endif
ifeq ($(avx512_supported),yes)
chacha-x86_64-y += chacha-avx512vl-x86_64.o
endif
aesni-intel-y := aesni-intel_asm.o aesni-intel_glue.o

File diff suppressed because it is too large Load Diff

View File

@ -84,7 +84,7 @@ struct gcm_context_data {
u8 current_counter[GCM_BLOCK_LEN];
u64 partial_block_len;
u64 unused;
u8 hash_keys[GCM_BLOCK_LEN * 8];
u8 hash_keys[GCM_BLOCK_LEN * 16];
};
asmlinkage int aesni_set_key(struct crypto_aes_ctx *ctx, const u8 *in_key,
@ -175,6 +175,32 @@ asmlinkage void aesni_gcm_finalize(void *ctx,
struct gcm_context_data *gdata,
u8 *auth_tag, unsigned long auth_tag_len);
static struct aesni_gcm_tfm_s {
void (*init)(void *ctx,
struct gcm_context_data *gdata,
u8 *iv,
u8 *hash_subkey, const u8 *aad,
unsigned long aad_len);
void (*enc_update)(void *ctx,
struct gcm_context_data *gdata, u8 *out,
const u8 *in,
unsigned long plaintext_len);
void (*dec_update)(void *ctx,
struct gcm_context_data *gdata, u8 *out,
const u8 *in,
unsigned long ciphertext_len);
void (*finalize)(void *ctx,
struct gcm_context_data *gdata,
u8 *auth_tag, unsigned long auth_tag_len);
} *aesni_gcm_tfm;
struct aesni_gcm_tfm_s aesni_gcm_tfm_sse = {
.init = &aesni_gcm_init,
.enc_update = &aesni_gcm_enc_update,
.dec_update = &aesni_gcm_dec_update,
.finalize = &aesni_gcm_finalize,
};
#ifdef CONFIG_AS_AVX
asmlinkage void aes_ctr_enc_128_avx_by8(const u8 *in, u8 *iv,
void *keys, u8 *out, unsigned int num_bytes);
@ -183,136 +209,94 @@ asmlinkage void aes_ctr_enc_192_avx_by8(const u8 *in, u8 *iv,
asmlinkage void aes_ctr_enc_256_avx_by8(const u8 *in, u8 *iv,
void *keys, u8 *out, unsigned int num_bytes);
/*
* asmlinkage void aesni_gcm_precomp_avx_gen2()
* asmlinkage void aesni_gcm_init_avx_gen2()
* gcm_data *my_ctx_data, context data
* u8 *hash_subkey, the Hash sub key input. Data starts on a 16-byte boundary.
*/
asmlinkage void aesni_gcm_precomp_avx_gen2(void *my_ctx_data, u8 *hash_subkey);
asmlinkage void aesni_gcm_init_avx_gen2(void *my_ctx_data,
struct gcm_context_data *gdata,
u8 *iv,
u8 *hash_subkey,
const u8 *aad,
unsigned long aad_len);
asmlinkage void aesni_gcm_enc_avx_gen2(void *ctx, u8 *out,
asmlinkage void aesni_gcm_enc_update_avx_gen2(void *ctx,
struct gcm_context_data *gdata, u8 *out,
const u8 *in, unsigned long plaintext_len);
asmlinkage void aesni_gcm_dec_update_avx_gen2(void *ctx,
struct gcm_context_data *gdata, u8 *out,
const u8 *in,
unsigned long ciphertext_len);
asmlinkage void aesni_gcm_finalize_avx_gen2(void *ctx,
struct gcm_context_data *gdata,
u8 *auth_tag, unsigned long auth_tag_len);
asmlinkage void aesni_gcm_enc_avx_gen2(void *ctx,
struct gcm_context_data *gdata, u8 *out,
const u8 *in, unsigned long plaintext_len, u8 *iv,
const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len);
asmlinkage void aesni_gcm_dec_avx_gen2(void *ctx, u8 *out,
asmlinkage void aesni_gcm_dec_avx_gen2(void *ctx,
struct gcm_context_data *gdata, u8 *out,
const u8 *in, unsigned long ciphertext_len, u8 *iv,
const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len);
static void aesni_gcm_enc_avx(void *ctx,
struct gcm_context_data *data, u8 *out,
const u8 *in, unsigned long plaintext_len, u8 *iv,
u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len)
{
struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx;
if ((plaintext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)){
aesni_gcm_enc(ctx, data, out, in,
plaintext_len, iv, hash_subkey, aad,
aad_len, auth_tag, auth_tag_len);
} else {
aesni_gcm_precomp_avx_gen2(ctx, hash_subkey);
aesni_gcm_enc_avx_gen2(ctx, out, in, plaintext_len, iv, aad,
aad_len, auth_tag, auth_tag_len);
}
}
struct aesni_gcm_tfm_s aesni_gcm_tfm_avx_gen2 = {
.init = &aesni_gcm_init_avx_gen2,
.enc_update = &aesni_gcm_enc_update_avx_gen2,
.dec_update = &aesni_gcm_dec_update_avx_gen2,
.finalize = &aesni_gcm_finalize_avx_gen2,
};
static void aesni_gcm_dec_avx(void *ctx,
struct gcm_context_data *data, u8 *out,
const u8 *in, unsigned long ciphertext_len, u8 *iv,
u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len)
{
struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx;
if ((ciphertext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)) {
aesni_gcm_dec(ctx, data, out, in,
ciphertext_len, iv, hash_subkey, aad,
aad_len, auth_tag, auth_tag_len);
} else {
aesni_gcm_precomp_avx_gen2(ctx, hash_subkey);
aesni_gcm_dec_avx_gen2(ctx, out, in, ciphertext_len, iv, aad,
aad_len, auth_tag, auth_tag_len);
}
}
#endif
#ifdef CONFIG_AS_AVX2
/*
* asmlinkage void aesni_gcm_precomp_avx_gen4()
* asmlinkage void aesni_gcm_init_avx_gen4()
* gcm_data *my_ctx_data, context data
* u8 *hash_subkey, the Hash sub key input. Data starts on a 16-byte boundary.
*/
asmlinkage void aesni_gcm_precomp_avx_gen4(void *my_ctx_data, u8 *hash_subkey);
asmlinkage void aesni_gcm_init_avx_gen4(void *my_ctx_data,
struct gcm_context_data *gdata,
u8 *iv,
u8 *hash_subkey,
const u8 *aad,
unsigned long aad_len);
asmlinkage void aesni_gcm_enc_avx_gen4(void *ctx, u8 *out,
asmlinkage void aesni_gcm_enc_update_avx_gen4(void *ctx,
struct gcm_context_data *gdata, u8 *out,
const u8 *in, unsigned long plaintext_len);
asmlinkage void aesni_gcm_dec_update_avx_gen4(void *ctx,
struct gcm_context_data *gdata, u8 *out,
const u8 *in,
unsigned long ciphertext_len);
asmlinkage void aesni_gcm_finalize_avx_gen4(void *ctx,
struct gcm_context_data *gdata,
u8 *auth_tag, unsigned long auth_tag_len);
asmlinkage void aesni_gcm_enc_avx_gen4(void *ctx,
struct gcm_context_data *gdata, u8 *out,
const u8 *in, unsigned long plaintext_len, u8 *iv,
const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len);
asmlinkage void aesni_gcm_dec_avx_gen4(void *ctx, u8 *out,
asmlinkage void aesni_gcm_dec_avx_gen4(void *ctx,
struct gcm_context_data *gdata, u8 *out,
const u8 *in, unsigned long ciphertext_len, u8 *iv,
const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len);
static void aesni_gcm_enc_avx2(void *ctx,
struct gcm_context_data *data, u8 *out,
const u8 *in, unsigned long plaintext_len, u8 *iv,
u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len)
{
struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx;
if ((plaintext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)) {
aesni_gcm_enc(ctx, data, out, in,
plaintext_len, iv, hash_subkey, aad,
aad_len, auth_tag, auth_tag_len);
} else if (plaintext_len < AVX_GEN4_OPTSIZE) {
aesni_gcm_precomp_avx_gen2(ctx, hash_subkey);
aesni_gcm_enc_avx_gen2(ctx, out, in, plaintext_len, iv, aad,
aad_len, auth_tag, auth_tag_len);
} else {
aesni_gcm_precomp_avx_gen4(ctx, hash_subkey);
aesni_gcm_enc_avx_gen4(ctx, out, in, plaintext_len, iv, aad,
aad_len, auth_tag, auth_tag_len);
}
}
struct aesni_gcm_tfm_s aesni_gcm_tfm_avx_gen4 = {
.init = &aesni_gcm_init_avx_gen4,
.enc_update = &aesni_gcm_enc_update_avx_gen4,
.dec_update = &aesni_gcm_dec_update_avx_gen4,
.finalize = &aesni_gcm_finalize_avx_gen4,
};
static void aesni_gcm_dec_avx2(void *ctx,
struct gcm_context_data *data, u8 *out,
const u8 *in, unsigned long ciphertext_len, u8 *iv,
u8 *hash_subkey, const u8 *aad, unsigned long aad_len,
u8 *auth_tag, unsigned long auth_tag_len)
{
struct crypto_aes_ctx *aes_ctx = (struct crypto_aes_ctx*)ctx;
if ((ciphertext_len < AVX_GEN2_OPTSIZE) || (aes_ctx-> key_length != AES_KEYSIZE_128)) {
aesni_gcm_dec(ctx, data, out, in,
ciphertext_len, iv, hash_subkey,
aad, aad_len, auth_tag, auth_tag_len);
} else if (ciphertext_len < AVX_GEN4_OPTSIZE) {
aesni_gcm_precomp_avx_gen2(ctx, hash_subkey);
aesni_gcm_dec_avx_gen2(ctx, out, in, ciphertext_len, iv, aad,
aad_len, auth_tag, auth_tag_len);
} else {
aesni_gcm_precomp_avx_gen4(ctx, hash_subkey);
aesni_gcm_dec_avx_gen4(ctx, out, in, ciphertext_len, iv, aad,
aad_len, auth_tag, auth_tag_len);
}
}
#endif
static void (*aesni_gcm_enc_tfm)(void *ctx,
struct gcm_context_data *data, u8 *out,
const u8 *in, unsigned long plaintext_len,
u8 *iv, u8 *hash_subkey, const u8 *aad,
unsigned long aad_len, u8 *auth_tag,
unsigned long auth_tag_len);
static void (*aesni_gcm_dec_tfm)(void *ctx,
struct gcm_context_data *data, u8 *out,
const u8 *in, unsigned long ciphertext_len,
u8 *iv, u8 *hash_subkey, const u8 *aad,
unsigned long aad_len, u8 *auth_tag,
unsigned long auth_tag_len);
static inline struct
aesni_rfc4106_gcm_ctx *aesni_rfc4106_gcm_ctx_get(struct crypto_aead *tfm)
{
@ -794,6 +778,7 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
{
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
unsigned long auth_tag_len = crypto_aead_authsize(tfm);
struct aesni_gcm_tfm_s *gcm_tfm = aesni_gcm_tfm;
struct gcm_context_data data AESNI_ALIGN_ATTR;
struct scatter_walk dst_sg_walk = {};
unsigned long left = req->cryptlen;
@ -811,6 +796,15 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
if (!enc)
left -= auth_tag_len;
#ifdef CONFIG_AS_AVX2
if (left < AVX_GEN4_OPTSIZE && gcm_tfm == &aesni_gcm_tfm_avx_gen4)
gcm_tfm = &aesni_gcm_tfm_avx_gen2;
#endif
#ifdef CONFIG_AS_AVX
if (left < AVX_GEN2_OPTSIZE && gcm_tfm == &aesni_gcm_tfm_avx_gen2)
gcm_tfm = &aesni_gcm_tfm_sse;
#endif
/* Linearize assoc, if not already linear */
if (req->src->length >= assoclen && req->src->length &&
(!PageHighMem(sg_page(req->src)) ||
@ -835,7 +829,7 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
}
kernel_fpu_begin();
aesni_gcm_init(aes_ctx, &data, iv,
gcm_tfm->init(aes_ctx, &data, iv,
hash_subkey, assoc, assoclen);
if (req->src != req->dst) {
while (left) {
@ -846,10 +840,10 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
len = min(srclen, dstlen);
if (len) {
if (enc)
aesni_gcm_enc_update(aes_ctx, &data,
gcm_tfm->enc_update(aes_ctx, &data,
dst, src, len);
else
aesni_gcm_dec_update(aes_ctx, &data,
gcm_tfm->dec_update(aes_ctx, &data,
dst, src, len);
}
left -= len;
@ -867,10 +861,10 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
len = scatterwalk_clamp(&src_sg_walk, left);
if (len) {
if (enc)
aesni_gcm_enc_update(aes_ctx, &data,
gcm_tfm->enc_update(aes_ctx, &data,
src, src, len);
else
aesni_gcm_dec_update(aes_ctx, &data,
gcm_tfm->dec_update(aes_ctx, &data,
src, src, len);
}
left -= len;
@ -879,7 +873,7 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
scatterwalk_done(&src_sg_walk, 1, left);
}
}
aesni_gcm_finalize(aes_ctx, &data, authTag, auth_tag_len);
gcm_tfm->finalize(aes_ctx, &data, authTag, auth_tag_len);
kernel_fpu_end();
if (!assocmem)
@ -912,147 +906,15 @@ static int gcmaes_crypt_by_sg(bool enc, struct aead_request *req,
static int gcmaes_encrypt(struct aead_request *req, unsigned int assoclen,
u8 *hash_subkey, u8 *iv, void *aes_ctx)
{
u8 one_entry_in_sg = 0;
u8 *src, *dst, *assoc;
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
unsigned long auth_tag_len = crypto_aead_authsize(tfm);
struct scatter_walk src_sg_walk;
struct scatter_walk dst_sg_walk = {};
struct gcm_context_data data AESNI_ALIGN_ATTR;
if (((struct crypto_aes_ctx *)aes_ctx)->key_length != AES_KEYSIZE_128 ||
aesni_gcm_enc_tfm == aesni_gcm_enc ||
req->cryptlen < AVX_GEN2_OPTSIZE) {
return gcmaes_crypt_by_sg(true, req, assoclen, hash_subkey, iv,
aes_ctx);
}
if (sg_is_last(req->src) &&
(!PageHighMem(sg_page(req->src)) ||
req->src->offset + req->src->length <= PAGE_SIZE) &&
sg_is_last(req->dst) &&
(!PageHighMem(sg_page(req->dst)) ||
req->dst->offset + req->dst->length <= PAGE_SIZE)) {
one_entry_in_sg = 1;
scatterwalk_start(&src_sg_walk, req->src);
assoc = scatterwalk_map(&src_sg_walk);
src = assoc + req->assoclen;
dst = src;
if (unlikely(req->src != req->dst)) {
scatterwalk_start(&dst_sg_walk, req->dst);
dst = scatterwalk_map(&dst_sg_walk) + req->assoclen;
}
} else {
/* Allocate memory for src, dst, assoc */
assoc = kmalloc(req->cryptlen + auth_tag_len + req->assoclen,
GFP_ATOMIC);
if (unlikely(!assoc))
return -ENOMEM;
scatterwalk_map_and_copy(assoc, req->src, 0,
req->assoclen + req->cryptlen, 0);
src = assoc + req->assoclen;
dst = src;
}
kernel_fpu_begin();
aesni_gcm_enc_tfm(aes_ctx, &data, dst, src, req->cryptlen, iv,
hash_subkey, assoc, assoclen,
dst + req->cryptlen, auth_tag_len);
kernel_fpu_end();
/* The authTag (aka the Integrity Check Value) needs to be written
* back to the packet. */
if (one_entry_in_sg) {
if (unlikely(req->src != req->dst)) {
scatterwalk_unmap(dst - req->assoclen);
scatterwalk_advance(&dst_sg_walk, req->dst->length);
scatterwalk_done(&dst_sg_walk, 1, 0);
}
scatterwalk_unmap(assoc);
scatterwalk_advance(&src_sg_walk, req->src->length);
scatterwalk_done(&src_sg_walk, req->src == req->dst, 0);
} else {
scatterwalk_map_and_copy(dst, req->dst, req->assoclen,
req->cryptlen + auth_tag_len, 1);
kfree(assoc);
}
return 0;
return gcmaes_crypt_by_sg(true, req, assoclen, hash_subkey, iv,
aes_ctx);
}
static int gcmaes_decrypt(struct aead_request *req, unsigned int assoclen,
u8 *hash_subkey, u8 *iv, void *aes_ctx)
{
u8 one_entry_in_sg = 0;
u8 *src, *dst, *assoc;
unsigned long tempCipherLen = 0;
struct crypto_aead *tfm = crypto_aead_reqtfm(req);
unsigned long auth_tag_len = crypto_aead_authsize(tfm);
u8 authTag[16];
struct scatter_walk src_sg_walk;
struct scatter_walk dst_sg_walk = {};
struct gcm_context_data data AESNI_ALIGN_ATTR;
int retval = 0;
if (((struct crypto_aes_ctx *)aes_ctx)->key_length != AES_KEYSIZE_128 ||
aesni_gcm_enc_tfm == aesni_gcm_enc ||
req->cryptlen < AVX_GEN2_OPTSIZE) {
return gcmaes_crypt_by_sg(false, req, assoclen, hash_subkey, iv,
aes_ctx);
}
tempCipherLen = (unsigned long)(req->cryptlen - auth_tag_len);
if (sg_is_last(req->src) &&
(!PageHighMem(sg_page(req->src)) ||
req->src->offset + req->src->length <= PAGE_SIZE) &&
sg_is_last(req->dst) && req->dst->length &&
(!PageHighMem(sg_page(req->dst)) ||
req->dst->offset + req->dst->length <= PAGE_SIZE)) {
one_entry_in_sg = 1;
scatterwalk_start(&src_sg_walk, req->src);
assoc = scatterwalk_map(&src_sg_walk);
src = assoc + req->assoclen;
dst = src;
if (unlikely(req->src != req->dst)) {
scatterwalk_start(&dst_sg_walk, req->dst);
dst = scatterwalk_map(&dst_sg_walk) + req->assoclen;
}
} else {
/* Allocate memory for src, dst, assoc */
assoc = kmalloc(req->cryptlen + req->assoclen, GFP_ATOMIC);
if (!assoc)
return -ENOMEM;
scatterwalk_map_and_copy(assoc, req->src, 0,
req->assoclen + req->cryptlen, 0);
src = assoc + req->assoclen;
dst = src;
}
kernel_fpu_begin();
aesni_gcm_dec_tfm(aes_ctx, &data, dst, src, tempCipherLen, iv,
hash_subkey, assoc, assoclen,
authTag, auth_tag_len);
kernel_fpu_end();
/* Compare generated tag with passed in tag. */
retval = crypto_memneq(src + tempCipherLen, authTag, auth_tag_len) ?
-EBADMSG : 0;
if (one_entry_in_sg) {
if (unlikely(req->src != req->dst)) {
scatterwalk_unmap(dst - req->assoclen);
scatterwalk_advance(&dst_sg_walk, req->dst->length);
scatterwalk_done(&dst_sg_walk, 1, 0);
}
scatterwalk_unmap(assoc);
scatterwalk_advance(&src_sg_walk, req->src->length);
scatterwalk_done(&src_sg_walk, req->src == req->dst, 0);
} else {
scatterwalk_map_and_copy(dst, req->dst, req->assoclen,
tempCipherLen, 1);
kfree(assoc);
}
return retval;
return gcmaes_crypt_by_sg(false, req, assoclen, hash_subkey, iv,
aes_ctx);
}
static int helper_rfc4106_encrypt(struct aead_request *req)
@ -1420,21 +1282,18 @@ static int __init aesni_init(void)
#ifdef CONFIG_AS_AVX2
if (boot_cpu_has(X86_FEATURE_AVX2)) {
pr_info("AVX2 version of gcm_enc/dec engaged.\n");
aesni_gcm_enc_tfm = aesni_gcm_enc_avx2;
aesni_gcm_dec_tfm = aesni_gcm_dec_avx2;
aesni_gcm_tfm = &aesni_gcm_tfm_avx_gen4;
} else
#endif
#ifdef CONFIG_AS_AVX
if (boot_cpu_has(X86_FEATURE_AVX)) {
pr_info("AVX version of gcm_enc/dec engaged.\n");
aesni_gcm_enc_tfm = aesni_gcm_enc_avx;
aesni_gcm_dec_tfm = aesni_gcm_dec_avx;
aesni_gcm_tfm = &aesni_gcm_tfm_avx_gen2;
} else
#endif
{
pr_info("SSE version of gcm_enc/dec engaged.\n");
aesni_gcm_enc_tfm = aesni_gcm_enc;
aesni_gcm_dec_tfm = aesni_gcm_dec;
aesni_gcm_tfm = &aesni_gcm_tfm_sse;
}
aesni_ctr_enc_tfm = aesni_ctr_enc;
#ifdef CONFIG_AS_AVX

File diff suppressed because it is too large Load Diff

View File

@ -0,0 +1,836 @@
/* SPDX-License-Identifier: GPL-2.0+ */
/*
* ChaCha 256-bit cipher algorithm, x64 AVX-512VL functions
*
* Copyright (C) 2018 Martin Willi
*/
#include <linux/linkage.h>
.section .rodata.cst32.CTR2BL, "aM", @progbits, 32
.align 32
CTR2BL: .octa 0x00000000000000000000000000000000
.octa 0x00000000000000000000000000000001
.section .rodata.cst32.CTR4BL, "aM", @progbits, 32
.align 32
CTR4BL: .octa 0x00000000000000000000000000000002
.octa 0x00000000000000000000000000000003
.section .rodata.cst32.CTR8BL, "aM", @progbits, 32
.align 32
CTR8BL: .octa 0x00000003000000020000000100000000
.octa 0x00000007000000060000000500000004
.text
ENTRY(chacha_2block_xor_avx512vl)
# %rdi: Input state matrix, s
# %rsi: up to 2 data blocks output, o
# %rdx: up to 2 data blocks input, i
# %rcx: input/output length in bytes
# %r8d: nrounds
# This function encrypts two ChaCha blocks by loading the state
# matrix twice across four AVX registers. It performs matrix operations
# on four words in each matrix in parallel, but requires shuffling to
# rearrange the words after each round.
vzeroupper
# x0..3[0-2] = s0..3
vbroadcasti128 0x00(%rdi),%ymm0
vbroadcasti128 0x10(%rdi),%ymm1
vbroadcasti128 0x20(%rdi),%ymm2
vbroadcasti128 0x30(%rdi),%ymm3
vpaddd CTR2BL(%rip),%ymm3,%ymm3
vmovdqa %ymm0,%ymm8
vmovdqa %ymm1,%ymm9
vmovdqa %ymm2,%ymm10
vmovdqa %ymm3,%ymm11
.Ldoubleround:
# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
vpaddd %ymm1,%ymm0,%ymm0
vpxord %ymm0,%ymm3,%ymm3
vprold $16,%ymm3,%ymm3
# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
vpaddd %ymm3,%ymm2,%ymm2
vpxord %ymm2,%ymm1,%ymm1
vprold $12,%ymm1,%ymm1
# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
vpaddd %ymm1,%ymm0,%ymm0
vpxord %ymm0,%ymm3,%ymm3
vprold $8,%ymm3,%ymm3
# x2 += x3, x1 = rotl32(x1 ^ x2, 7)
vpaddd %ymm3,%ymm2,%ymm2
vpxord %ymm2,%ymm1,%ymm1
vprold $7,%ymm1,%ymm1
# x1 = shuffle32(x1, MASK(0, 3, 2, 1))
vpshufd $0x39,%ymm1,%ymm1
# x2 = shuffle32(x2, MASK(1, 0, 3, 2))
vpshufd $0x4e,%ymm2,%ymm2
# x3 = shuffle32(x3, MASK(2, 1, 0, 3))
vpshufd $0x93,%ymm3,%ymm3
# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
vpaddd %ymm1,%ymm0,%ymm0
vpxord %ymm0,%ymm3,%ymm3
vprold $16,%ymm3,%ymm3
# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
vpaddd %ymm3,%ymm2,%ymm2
vpxord %ymm2,%ymm1,%ymm1
vprold $12,%ymm1,%ymm1
# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
vpaddd %ymm1,%ymm0,%ymm0
vpxord %ymm0,%ymm3,%ymm3
vprold $8,%ymm3,%ymm3
# x2 += x3, x1 = rotl32(x1 ^ x2, 7)
vpaddd %ymm3,%ymm2,%ymm2
vpxord %ymm2,%ymm1,%ymm1
vprold $7,%ymm1,%ymm1
# x1 = shuffle32(x1, MASK(2, 1, 0, 3))
vpshufd $0x93,%ymm1,%ymm1
# x2 = shuffle32(x2, MASK(1, 0, 3, 2))
vpshufd $0x4e,%ymm2,%ymm2
# x3 = shuffle32(x3, MASK(0, 3, 2, 1))
vpshufd $0x39,%ymm3,%ymm3
sub $2,%r8d
jnz .Ldoubleround
# o0 = i0 ^ (x0 + s0)
vpaddd %ymm8,%ymm0,%ymm7
cmp $0x10,%rcx
jl .Lxorpart2
vpxord 0x00(%rdx),%xmm7,%xmm6
vmovdqu %xmm6,0x00(%rsi)
vextracti128 $1,%ymm7,%xmm0
# o1 = i1 ^ (x1 + s1)
vpaddd %ymm9,%ymm1,%ymm7
cmp $0x20,%rcx
jl .Lxorpart2
vpxord 0x10(%rdx),%xmm7,%xmm6
vmovdqu %xmm6,0x10(%rsi)
vextracti128 $1,%ymm7,%xmm1
# o2 = i2 ^ (x2 + s2)
vpaddd %ymm10,%ymm2,%ymm7
cmp $0x30,%rcx
jl .Lxorpart2
vpxord 0x20(%rdx),%xmm7,%xmm6
vmovdqu %xmm6,0x20(%rsi)
vextracti128 $1,%ymm7,%xmm2
# o3 = i3 ^ (x3 + s3)
vpaddd %ymm11,%ymm3,%ymm7
cmp $0x40,%rcx
jl .Lxorpart2
vpxord 0x30(%rdx),%xmm7,%xmm6
vmovdqu %xmm6,0x30(%rsi)
vextracti128 $1,%ymm7,%xmm3
# xor and write second block
vmovdqa %xmm0,%xmm7
cmp $0x50,%rcx
jl .Lxorpart2
vpxord 0x40(%rdx),%xmm7,%xmm6
vmovdqu %xmm6,0x40(%rsi)
vmovdqa %xmm1,%xmm7
cmp $0x60,%rcx
jl .Lxorpart2
vpxord 0x50(%rdx),%xmm7,%xmm6
vmovdqu %xmm6,0x50(%rsi)
vmovdqa %xmm2,%xmm7
cmp $0x70,%rcx
jl .Lxorpart2
vpxord 0x60(%rdx),%xmm7,%xmm6
vmovdqu %xmm6,0x60(%rsi)
vmovdqa %xmm3,%xmm7
cmp $0x80,%rcx
jl .Lxorpart2
vpxord 0x70(%rdx),%xmm7,%xmm6
vmovdqu %xmm6,0x70(%rsi)
.Ldone2:
vzeroupper
ret
.Lxorpart2:
# xor remaining bytes from partial register into output
mov %rcx,%rax
and $0xf,%rcx
jz .Ldone8
mov %rax,%r9
and $~0xf,%r9
mov $1,%rax
shld %cl,%rax,%rax
sub $1,%rax
kmovq %rax,%k1
vmovdqu8 (%rdx,%r9),%xmm1{%k1}{z}
vpxord %xmm7,%xmm1,%xmm1
vmovdqu8 %xmm1,(%rsi,%r9){%k1}
jmp .Ldone2
ENDPROC(chacha_2block_xor_avx512vl)
ENTRY(chacha_4block_xor_avx512vl)
# %rdi: Input state matrix, s
# %rsi: up to 4 data blocks output, o
# %rdx: up to 4 data blocks input, i
# %rcx: input/output length in bytes
# %r8d: nrounds
# This function encrypts four ChaCha blocks by loading the state
# matrix four times across eight AVX registers. It performs matrix
# operations on four words in two matrices in parallel, sequentially
# to the operations on the four words of the other two matrices. The
# required word shuffling has a rather high latency, we can do the
# arithmetic on two matrix-pairs without much slowdown.
vzeroupper
# x0..3[0-4] = s0..3
vbroadcasti128 0x00(%rdi),%ymm0
vbroadcasti128 0x10(%rdi),%ymm1
vbroadcasti128 0x20(%rdi),%ymm2
vbroadcasti128 0x30(%rdi),%ymm3
vmovdqa %ymm0,%ymm4
vmovdqa %ymm1,%ymm5
vmovdqa %ymm2,%ymm6
vmovdqa %ymm3,%ymm7
vpaddd CTR2BL(%rip),%ymm3,%ymm3
vpaddd CTR4BL(%rip),%ymm7,%ymm7
vmovdqa %ymm0,%ymm11
vmovdqa %ymm1,%ymm12
vmovdqa %ymm2,%ymm13
vmovdqa %ymm3,%ymm14
vmovdqa %ymm7,%ymm15
.Ldoubleround4:
# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
vpaddd %ymm1,%ymm0,%ymm0
vpxord %ymm0,%ymm3,%ymm3
vprold $16,%ymm3,%ymm3
vpaddd %ymm5,%ymm4,%ymm4
vpxord %ymm4,%ymm7,%ymm7
vprold $16,%ymm7,%ymm7
# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
vpaddd %ymm3,%ymm2,%ymm2
vpxord %ymm2,%ymm1,%ymm1
vprold $12,%ymm1,%ymm1
vpaddd %ymm7,%ymm6,%ymm6
vpxord %ymm6,%ymm5,%ymm5
vprold $12,%ymm5,%ymm5
# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
vpaddd %ymm1,%ymm0,%ymm0
vpxord %ymm0,%ymm3,%ymm3
vprold $8,%ymm3,%ymm3
vpaddd %ymm5,%ymm4,%ymm4
vpxord %ymm4,%ymm7,%ymm7
vprold $8,%ymm7,%ymm7
# x2 += x3, x1 = rotl32(x1 ^ x2, 7)
vpaddd %ymm3,%ymm2,%ymm2
vpxord %ymm2,%ymm1,%ymm1
vprold $7,%ymm1,%ymm1
vpaddd %ymm7,%ymm6,%ymm6
vpxord %ymm6,%ymm5,%ymm5
vprold $7,%ymm5,%ymm5
# x1 = shuffle32(x1, MASK(0, 3, 2, 1))
vpshufd $0x39,%ymm1,%ymm1
vpshufd $0x39,%ymm5,%ymm5
# x2 = shuffle32(x2, MASK(1, 0, 3, 2))
vpshufd $0x4e,%ymm2,%ymm2
vpshufd $0x4e,%ymm6,%ymm6
# x3 = shuffle32(x3, MASK(2, 1, 0, 3))
vpshufd $0x93,%ymm3,%ymm3
vpshufd $0x93,%ymm7,%ymm7
# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
vpaddd %ymm1,%ymm0,%ymm0
vpxord %ymm0,%ymm3,%ymm3
vprold $16,%ymm3,%ymm3
vpaddd %ymm5,%ymm4,%ymm4
vpxord %ymm4,%ymm7,%ymm7
vprold $16,%ymm7,%ymm7
# x2 += x3, x1 = rotl32(x1 ^ x2, 12)
vpaddd %ymm3,%ymm2,%ymm2
vpxord %ymm2,%ymm1,%ymm1
vprold $12,%ymm1,%ymm1
vpaddd %ymm7,%ymm6,%ymm6
vpxord %ymm6,%ymm5,%ymm5
vprold $12,%ymm5,%ymm5
# x0 += x1, x3 = rotl32(x3 ^ x0, 8)
vpaddd %ymm1,%ymm0,%ymm0
vpxord %ymm0,%ymm3,%ymm3
vprold $8,%ymm3,%ymm3
vpaddd %ymm5,%ymm4,%ymm4
vpxord %ymm4,%ymm7,%ymm7
vprold $8,%ymm7,%ymm7
# x2 += x3, x1 = rotl32(x1 ^ x2, 7)
vpaddd %ymm3,%ymm2,%ymm2
vpxord %ymm2,%ymm1,%ymm1
vprold $7,%ymm1,%ymm1
vpaddd %ymm7,%ymm6,%ymm6
vpxord %ymm6,%ymm5,%ymm5
vprold $7,%ymm5,%ymm5
# x1 = shuffle32(x1, MASK(2, 1, 0, 3))
vpshufd $0x93,%ymm1,%ymm1
vpshufd $0x93,%ymm5,%ymm5
# x2 = shuffle32(x2, MASK(1, 0, 3, 2))
vpshufd $0x4e,%ymm2,%ymm2
vpshufd $0x4e,%ymm6,%ymm6
# x3 = shuffle32(x3, MASK(0, 3, 2, 1))
vpshufd $0x39,%ymm3,%ymm3
vpshufd $0x39,%ymm7,%ymm7
sub $2,%r8d
jnz .Ldoubleround4
# o0 = i0 ^ (x0 + s0), first block
vpaddd %ymm11,%ymm0,%ymm10
cmp $0x10,%rcx
jl .Lxorpart4
vpxord 0x00(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x00(%rsi)
vextracti128 $1,%ymm10,%xmm0
# o1 = i1 ^ (x1 + s1), first block
vpaddd %ymm12,%ymm1,%ymm10
cmp $0x20,%rcx
jl .Lxorpart4
vpxord 0x10(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x10(%rsi)
vextracti128 $1,%ymm10,%xmm1
# o2 = i2 ^ (x2 + s2), first block
vpaddd %ymm13,%ymm2,%ymm10
cmp $0x30,%rcx
jl .Lxorpart4
vpxord 0x20(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x20(%rsi)
vextracti128 $1,%ymm10,%xmm2
# o3 = i3 ^ (x3 + s3), first block
vpaddd %ymm14,%ymm3,%ymm10
cmp $0x40,%rcx
jl .Lxorpart4
vpxord 0x30(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x30(%rsi)
vextracti128 $1,%ymm10,%xmm3
# xor and write second block
vmovdqa %xmm0,%xmm10
cmp $0x50,%rcx
jl .Lxorpart4
vpxord 0x40(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x40(%rsi)
vmovdqa %xmm1,%xmm10
cmp $0x60,%rcx
jl .Lxorpart4
vpxord 0x50(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x50(%rsi)
vmovdqa %xmm2,%xmm10
cmp $0x70,%rcx
jl .Lxorpart4
vpxord 0x60(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x60(%rsi)
vmovdqa %xmm3,%xmm10
cmp $0x80,%rcx
jl .Lxorpart4
vpxord 0x70(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x70(%rsi)
# o0 = i0 ^ (x0 + s0), third block
vpaddd %ymm11,%ymm4,%ymm10
cmp $0x90,%rcx
jl .Lxorpart4
vpxord 0x80(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x80(%rsi)
vextracti128 $1,%ymm10,%xmm4
# o1 = i1 ^ (x1 + s1), third block
vpaddd %ymm12,%ymm5,%ymm10
cmp $0xa0,%rcx
jl .Lxorpart4
vpxord 0x90(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0x90(%rsi)
vextracti128 $1,%ymm10,%xmm5
# o2 = i2 ^ (x2 + s2), third block
vpaddd %ymm13,%ymm6,%ymm10
cmp $0xb0,%rcx
jl .Lxorpart4
vpxord 0xa0(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0xa0(%rsi)
vextracti128 $1,%ymm10,%xmm6
# o3 = i3 ^ (x3 + s3), third block
vpaddd %ymm15,%ymm7,%ymm10
cmp $0xc0,%rcx
jl .Lxorpart4
vpxord 0xb0(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0xb0(%rsi)
vextracti128 $1,%ymm10,%xmm7
# xor and write fourth block
vmovdqa %xmm4,%xmm10
cmp $0xd0,%rcx
jl .Lxorpart4
vpxord 0xc0(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0xc0(%rsi)
vmovdqa %xmm5,%xmm10
cmp $0xe0,%rcx
jl .Lxorpart4
vpxord 0xd0(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0xd0(%rsi)
vmovdqa %xmm6,%xmm10
cmp $0xf0,%rcx
jl .Lxorpart4
vpxord 0xe0(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0xe0(%rsi)
vmovdqa %xmm7,%xmm10
cmp $0x100,%rcx
jl .Lxorpart4
vpxord 0xf0(%rdx),%xmm10,%xmm9
vmovdqu %xmm9,0xf0(%rsi)
.Ldone4:
vzeroupper
ret
.Lxorpart4:
# xor remaining bytes from partial register into output
mov %rcx,%rax
and $0xf,%rcx
jz .Ldone8
mov %rax,%r9
and $~0xf,%r9
mov $1,%rax
shld %cl,%rax,%rax
sub $1,%rax
kmovq %rax,%k1
vmovdqu8 (%rdx,%r9),%xmm1{%k1}{z}
vpxord %xmm10,%xmm1,%xmm1
vmovdqu8 %xmm1,(%rsi,%r9){%k1}
jmp .Ldone4
ENDPROC(chacha_4block_xor_avx512vl)
ENTRY(chacha_8block_xor_avx512vl)
# %rdi: Input state matrix, s
# %rsi: up to 8 data blocks output, o
# %rdx: up to 8 data blocks input, i
# %rcx: input/output length in bytes
# %r8d: nrounds
# This function encrypts eight consecutive ChaCha blocks by loading
# the state matrix in AVX registers eight times. Compared to AVX2, this
# mostly benefits from the new rotate instructions in VL and the
# additional registers.
vzeroupper
# x0..15[0-7] = s[0..15]
vpbroadcastd 0x00(%rdi),%ymm0
vpbroadcastd 0x04(%rdi),%ymm1
vpbroadcastd 0x08(%rdi),%ymm2
vpbroadcastd 0x0c(%rdi),%ymm3
vpbroadcastd 0x10(%rdi),%ymm4
vpbroadcastd 0x14(%rdi),%ymm5
vpbroadcastd 0x18(%rdi),%ymm6
vpbroadcastd 0x1c(%rdi),%ymm7
vpbroadcastd 0x20(%rdi),%ymm8
vpbroadcastd 0x24(%rdi),%ymm9
vpbroadcastd 0x28(%rdi),%ymm10
vpbroadcastd 0x2c(%rdi),%ymm11
vpbroadcastd 0x30(%rdi),%ymm12
vpbroadcastd 0x34(%rdi),%ymm13
vpbroadcastd 0x38(%rdi),%ymm14
vpbroadcastd 0x3c(%rdi),%ymm15
# x12 += counter values 0-3
vpaddd CTR8BL(%rip),%ymm12,%ymm12
vmovdqa64 %ymm0,%ymm16
vmovdqa64 %ymm1,%ymm17
vmovdqa64 %ymm2,%ymm18
vmovdqa64 %ymm3,%ymm19
vmovdqa64 %ymm4,%ymm20
vmovdqa64 %ymm5,%ymm21
vmovdqa64 %ymm6,%ymm22
vmovdqa64 %ymm7,%ymm23
vmovdqa64 %ymm8,%ymm24
vmovdqa64 %ymm9,%ymm25
vmovdqa64 %ymm10,%ymm26
vmovdqa64 %ymm11,%ymm27
vmovdqa64 %ymm12,%ymm28
vmovdqa64 %ymm13,%ymm29
vmovdqa64 %ymm14,%ymm30
vmovdqa64 %ymm15,%ymm31
.Ldoubleround8:
# x0 += x4, x12 = rotl32(x12 ^ x0, 16)
vpaddd %ymm0,%ymm4,%ymm0
vpxord %ymm0,%ymm12,%ymm12
vprold $16,%ymm12,%ymm12
# x1 += x5, x13 = rotl32(x13 ^ x1, 16)
vpaddd %ymm1,%ymm5,%ymm1
vpxord %ymm1,%ymm13,%ymm13
vprold $16,%ymm13,%ymm13
# x2 += x6, x14 = rotl32(x14 ^ x2, 16)
vpaddd %ymm2,%ymm6,%ymm2
vpxord %ymm2,%ymm14,%ymm14
vprold $16,%ymm14,%ymm14
# x3 += x7, x15 = rotl32(x15 ^ x3, 16)
vpaddd %ymm3,%ymm7,%ymm3
vpxord %ymm3,%ymm15,%ymm15
vprold $16,%ymm15,%ymm15
# x8 += x12, x4 = rotl32(x4 ^ x8, 12)
vpaddd %ymm12,%ymm8,%ymm8
vpxord %ymm8,%ymm4,%ymm4
vprold $12,%ymm4,%ymm4
# x9 += x13, x5 = rotl32(x5 ^ x9, 12)
vpaddd %ymm13,%ymm9,%ymm9
vpxord %ymm9,%ymm5,%ymm5
vprold $12,%ymm5,%ymm5
# x10 += x14, x6 = rotl32(x6 ^ x10, 12)
vpaddd %ymm14,%ymm10,%ymm10
vpxord %ymm10,%ymm6,%ymm6
vprold $12,%ymm6,%ymm6
# x11 += x15, x7 = rotl32(x7 ^ x11, 12)
vpaddd %ymm15,%ymm11,%ymm11
vpxord %ymm11,%ymm7,%ymm7
vprold $12,%ymm7,%ymm7
# x0 += x4, x12 = rotl32(x12 ^ x0, 8)
vpaddd %ymm0,%ymm4,%ymm0
vpxord %ymm0,%ymm12,%ymm12
vprold $8,%ymm12,%ymm12
# x1 += x5, x13 = rotl32(x13 ^ x1, 8)
vpaddd %ymm1,%ymm5,%ymm1
vpxord %ymm1,%ymm13,%ymm13
vprold $8,%ymm13,%ymm13
# x2 += x6, x14 = rotl32(x14 ^ x2, 8)
vpaddd %ymm2,%ymm6,%ymm2
vpxord %ymm2,%ymm14,%ymm14
vprold $8,%ymm14,%ymm14
# x3 += x7, x15 = rotl32(x15 ^ x3, 8)
vpaddd %ymm3,%ymm7,%ymm3
vpxord %ymm3,%ymm15,%ymm15
vprold $8,%ymm15,%ymm15
# x8 += x12, x4 = rotl32(x4 ^ x8, 7)
vpaddd %ymm12,%ymm8,%ymm8
vpxord %ymm8,%ymm4,%ymm4
vprold $7,%ymm4,%ymm4
# x9 += x13, x5 = rotl32(x5 ^ x9, 7)
vpaddd %ymm13,%ymm9,%ymm9
vpxord %ymm9,%ymm5,%ymm5
vprold $7,%ymm5,%ymm5
# x10 += x14, x6 = rotl32(x6 ^ x10, 7)
vpaddd %ymm14,%ymm10,%ymm10
vpxord %ymm10,%ymm6,%ymm6
vprold $7,%ymm6,%ymm6
# x11 += x15, x7 = rotl32(x7 ^ x11, 7)
vpaddd %ymm15,%ymm11,%ymm11
vpxord %ymm11,%ymm7,%ymm7
vprold $7,%ymm7,%ymm7
# x0 += x5, x15 = rotl32(x15 ^ x0, 16)
vpaddd %ymm0,%ymm5,%ymm0
vpxord %ymm0,%ymm15,%ymm15
vprold $16,%ymm15,%ymm15
# x1 += x6, x12 = rotl32(x12 ^ x1, 16)
vpaddd %ymm1,%ymm6,%ymm1
vpxord %ymm1,%ymm12,%ymm12
vprold $16,%ymm12,%ymm12
# x2 += x7, x13 = rotl32(x13 ^ x2, 16)
vpaddd %ymm2,%ymm7,%ymm2
vpxord %ymm2,%ymm13,%ymm13
vprold $16,%ymm13,%ymm13
# x3 += x4, x14 = rotl32(x14 ^ x3, 16)
vpaddd %ymm3,%ymm4,%ymm3
vpxord %ymm3,%ymm14,%ymm14
vprold $16,%ymm14,%ymm14
# x10 += x15, x5 = rotl32(x5 ^ x10, 12)
vpaddd %ymm15,%ymm10,%ymm10
vpxord %ymm10,%ymm5,%ymm5
vprold $12,%ymm5,%ymm5
# x11 += x12, x6 = rotl32(x6 ^ x11, 12)
vpaddd %ymm12,%ymm11,%ymm11
vpxord %ymm11,%ymm6,%ymm6
vprold $12,%ymm6,%ymm6
# x8 += x13, x7 = rotl32(x7 ^ x8, 12)
vpaddd %ymm13,%ymm8,%ymm8
vpxord %ymm8,%ymm7,%ymm7
vprold $12,%ymm7,%ymm7
# x9 += x14, x4 = rotl32(x4 ^ x9, 12)
vpaddd %ymm14,%ymm9,%ymm9
vpxord %ymm9,%ymm4,%ymm4
vprold $12,%ymm4,%ymm4
# x0 += x5, x15 = rotl32(x15 ^ x0, 8)
vpaddd %ymm0,%ymm5,%ymm0
vpxord %ymm0,%ymm15,%ymm15
vprold $8,%ymm15,%ymm15
# x1 += x6, x12 = rotl32(x12 ^ x1, 8)
vpaddd %ymm1,%ymm6,%ymm1
vpxord %ymm1,%ymm12,%ymm12
vprold $8,%ymm12,%ymm12
# x2 += x7, x13 = rotl32(x13 ^ x2, 8)
vpaddd %ymm2,%ymm7,%ymm2
vpxord %ymm2,%ymm13,%ymm13
vprold $8,%ymm13,%ymm13
# x3 += x4, x14 = rotl32(x14 ^ x3, 8)
vpaddd %ymm3,%ymm4,%ymm3
vpxord %ymm3,%ymm14,%ymm14
vprold $8,%ymm14,%ymm14
# x10 += x15, x5 = rotl32(x5 ^ x10, 7)
vpaddd %ymm15,%ymm10,%ymm10
vpxord %ymm10,%ymm5,%ymm5
vprold $7,%ymm5,%ymm5
# x11 += x12, x6 = rotl32(x6 ^ x11, 7)
vpaddd %ymm12,%ymm11,%ymm11
vpxord %ymm11,%ymm6,%ymm6
vprold $7,%ymm6,%ymm6
# x8 += x13, x7 = rotl32(x7 ^ x8, 7)
vpaddd %ymm13,%ymm8,%ymm8
vpxord %ymm8,%ymm7,%ymm7
vprold $7,%ymm7,%ymm7
# x9 += x14, x4 = rotl32(x4 ^ x9, 7)
vpaddd %ymm14,%ymm9,%ymm9
vpxord %ymm9,%ymm4,%ymm4
vprold $7,%ymm4,%ymm4
sub $2,%r8d
jnz .Ldoubleround8
# x0..15[0-3] += s[0..15]
vpaddd %ymm16,%ymm0,%ymm0
vpaddd %ymm17,%ymm1,%ymm1
vpaddd %ymm18,%ymm2,%ymm2
vpaddd %ymm19,%ymm3,%ymm3
vpaddd %ymm20,%ymm4,%ymm4
vpaddd %ymm21,%ymm5,%ymm5
vpaddd %ymm22,%ymm6,%ymm6
vpaddd %ymm23,%ymm7,%ymm7
vpaddd %ymm24,%ymm8,%ymm8
vpaddd %ymm25,%ymm9,%ymm9
vpaddd %ymm26,%ymm10,%ymm10
vpaddd %ymm27,%ymm11,%ymm11
vpaddd %ymm28,%ymm12,%ymm12
vpaddd %ymm29,%ymm13,%ymm13
vpaddd %ymm30,%ymm14,%ymm14
vpaddd %ymm31,%ymm15,%ymm15
# interleave 32-bit words in state n, n+1
vpunpckldq %ymm1,%ymm0,%ymm16
vpunpckhdq %ymm1,%ymm0,%ymm17
vpunpckldq %ymm3,%ymm2,%ymm18
vpunpckhdq %ymm3,%ymm2,%ymm19
vpunpckldq %ymm5,%ymm4,%ymm20
vpunpckhdq %ymm5,%ymm4,%ymm21
vpunpckldq %ymm7,%ymm6,%ymm22
vpunpckhdq %ymm7,%ymm6,%ymm23
vpunpckldq %ymm9,%ymm8,%ymm24
vpunpckhdq %ymm9,%ymm8,%ymm25
vpunpckldq %ymm11,%ymm10,%ymm26
vpunpckhdq %ymm11,%ymm10,%ymm27
vpunpckldq %ymm13,%ymm12,%ymm28
vpunpckhdq %ymm13,%ymm12,%ymm29
vpunpckldq %ymm15,%ymm14,%ymm30
vpunpckhdq %ymm15,%ymm14,%ymm31
# interleave 64-bit words in state n, n+2
vpunpcklqdq %ymm18,%ymm16,%ymm0
vpunpcklqdq %ymm19,%ymm17,%ymm1
vpunpckhqdq %ymm18,%ymm16,%ymm2
vpunpckhqdq %ymm19,%ymm17,%ymm3
vpunpcklqdq %ymm22,%ymm20,%ymm4
vpunpcklqdq %ymm23,%ymm21,%ymm5
vpunpckhqdq %ymm22,%ymm20,%ymm6
vpunpckhqdq %ymm23,%ymm21,%ymm7
vpunpcklqdq %ymm26,%ymm24,%ymm8
vpunpcklqdq %ymm27,%ymm25,%ymm9
vpunpckhqdq %ymm26,%ymm24,%ymm10
vpunpckhqdq %ymm27,%ymm25,%ymm11
vpunpcklqdq %ymm30,%ymm28,%ymm12
vpunpcklqdq %ymm31,%ymm29,%ymm13
vpunpckhqdq %ymm30,%ymm28,%ymm14
vpunpckhqdq %ymm31,%ymm29,%ymm15
# interleave 128-bit words in state n, n+4
# xor/write first four blocks
vmovdqa64 %ymm0,%ymm16
vperm2i128 $0x20,%ymm4,%ymm0,%ymm0
cmp $0x0020,%rcx
jl .Lxorpart8
vpxord 0x0000(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0000(%rsi)
vmovdqa64 %ymm16,%ymm0
vperm2i128 $0x31,%ymm4,%ymm0,%ymm4
vperm2i128 $0x20,%ymm12,%ymm8,%ymm0
cmp $0x0040,%rcx
jl .Lxorpart8
vpxord 0x0020(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0020(%rsi)
vperm2i128 $0x31,%ymm12,%ymm8,%ymm12
vperm2i128 $0x20,%ymm6,%ymm2,%ymm0
cmp $0x0060,%rcx
jl .Lxorpart8
vpxord 0x0040(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0040(%rsi)
vperm2i128 $0x31,%ymm6,%ymm2,%ymm6
vperm2i128 $0x20,%ymm14,%ymm10,%ymm0
cmp $0x0080,%rcx
jl .Lxorpart8
vpxord 0x0060(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0060(%rsi)
vperm2i128 $0x31,%ymm14,%ymm10,%ymm14
vperm2i128 $0x20,%ymm5,%ymm1,%ymm0
cmp $0x00a0,%rcx
jl .Lxorpart8
vpxord 0x0080(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0080(%rsi)
vperm2i128 $0x31,%ymm5,%ymm1,%ymm5
vperm2i128 $0x20,%ymm13,%ymm9,%ymm0
cmp $0x00c0,%rcx
jl .Lxorpart8
vpxord 0x00a0(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x00a0(%rsi)
vperm2i128 $0x31,%ymm13,%ymm9,%ymm13
vperm2i128 $0x20,%ymm7,%ymm3,%ymm0
cmp $0x00e0,%rcx
jl .Lxorpart8
vpxord 0x00c0(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x00c0(%rsi)
vperm2i128 $0x31,%ymm7,%ymm3,%ymm7
vperm2i128 $0x20,%ymm15,%ymm11,%ymm0
cmp $0x0100,%rcx
jl .Lxorpart8
vpxord 0x00e0(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x00e0(%rsi)
vperm2i128 $0x31,%ymm15,%ymm11,%ymm15
# xor remaining blocks, write to output
vmovdqa64 %ymm4,%ymm0
cmp $0x0120,%rcx
jl .Lxorpart8
vpxord 0x0100(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0100(%rsi)
vmovdqa64 %ymm12,%ymm0
cmp $0x0140,%rcx
jl .Lxorpart8
vpxord 0x0120(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0120(%rsi)
vmovdqa64 %ymm6,%ymm0
cmp $0x0160,%rcx
jl .Lxorpart8
vpxord 0x0140(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0140(%rsi)
vmovdqa64 %ymm14,%ymm0
cmp $0x0180,%rcx
jl .Lxorpart8
vpxord 0x0160(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0160(%rsi)
vmovdqa64 %ymm5,%ymm0
cmp $0x01a0,%rcx
jl .Lxorpart8
vpxord 0x0180(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x0180(%rsi)
vmovdqa64 %ymm13,%ymm0
cmp $0x01c0,%rcx
jl .Lxorpart8
vpxord 0x01a0(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x01a0(%rsi)
vmovdqa64 %ymm7,%ymm0
cmp $0x01e0,%rcx
jl .Lxorpart8
vpxord 0x01c0(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x01c0(%rsi)
vmovdqa64 %ymm15,%ymm0
cmp $0x0200,%rcx
jl .Lxorpart8
vpxord 0x01e0(%rdx),%ymm0,%ymm0
vmovdqu64 %ymm0,0x01e0(%rsi)
.Ldone8:
vzeroupper
ret
.Lxorpart8:
# xor remaining bytes from partial register into output
mov %rcx,%rax
and $0x1f,%rcx
jz .Ldone8
mov %rax,%r9
and $~0x1f,%r9
mov $1,%rax
shld %cl,%rax,%rax
sub $1,%rax
kmovq %rax,%k1
vmovdqu8 (%rdx,%r9),%ymm1{%k1}{z}
vpxord %ymm0,%ymm1,%ymm1
vmovdqu8 %ymm1,(%rsi,%r9){%k1}
jmp .Ldone8
ENDPROC(chacha_8block_xor_avx512vl)

View File

@ -1,5 +1,5 @@
/*
* ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
* ChaCha 256-bit cipher algorithm, x64 SSSE3 functions
*
* Copyright (C) 2015 Martin Willi
*
@ -10,6 +10,7 @@
*/
#include <linux/linkage.h>
#include <asm/frame.h>
.section .rodata.cst16.ROT8, "aM", @progbits, 16
.align 16
@ -23,35 +24,25 @@ CTRINC: .octa 0x00000003000000020000000100000000
.text
ENTRY(chacha20_block_xor_ssse3)
# %rdi: Input state matrix, s
# %rsi: 1 data block output, o
# %rdx: 1 data block input, i
# This function encrypts one ChaCha20 block by loading the state matrix
# in four SSE registers. It performs matrix operation on four words in
# parallel, but requireds shuffling to rearrange the words after each
# round. 8/16-bit word rotation is done with the slightly better
# performing SSSE3 byte shuffling, 7/12-bit word rotation uses
# traditional shift+OR.
# x0..3 = s0..3
movdqa 0x00(%rdi),%xmm0
movdqa 0x10(%rdi),%xmm1
movdqa 0x20(%rdi),%xmm2
movdqa 0x30(%rdi),%xmm3
movdqa %xmm0,%xmm8
movdqa %xmm1,%xmm9
movdqa %xmm2,%xmm10
movdqa %xmm3,%xmm11
/*
* chacha_permute - permute one block
*
* Permute one 64-byte block where the state matrix is in %xmm0-%xmm3. This
* function performs matrix operations on four words in parallel, but requires
* shuffling to rearrange the words after each round. 8/16-bit word rotation is
* done with the slightly better performing SSSE3 byte shuffling, 7/12-bit word
* rotation uses traditional shift+OR.
*
* The round count is given in %r8d.
*
* Clobbers: %r8d, %xmm4-%xmm7
*/
chacha_permute:
movdqa ROT8(%rip),%xmm4
movdqa ROT16(%rip),%xmm5
mov $10,%ecx
.Ldoubleround:
# x0 += x1, x3 = rotl32(x3 ^ x0, 16)
paddd %xmm1,%xmm0
pxor %xmm0,%xmm3
@ -118,39 +109,129 @@ ENTRY(chacha20_block_xor_ssse3)
# x3 = shuffle32(x3, MASK(0, 3, 2, 1))
pshufd $0x39,%xmm3,%xmm3
dec %ecx
sub $2,%r8d
jnz .Ldoubleround
ret
ENDPROC(chacha_permute)
ENTRY(chacha_block_xor_ssse3)
# %rdi: Input state matrix, s
# %rsi: up to 1 data block output, o
# %rdx: up to 1 data block input, i
# %rcx: input/output length in bytes
# %r8d: nrounds
FRAME_BEGIN
# x0..3 = s0..3
movdqa 0x00(%rdi),%xmm0
movdqa 0x10(%rdi),%xmm1
movdqa 0x20(%rdi),%xmm2
movdqa 0x30(%rdi),%xmm3
movdqa %xmm0,%xmm8
movdqa %xmm1,%xmm9
movdqa %xmm2,%xmm10
movdqa %xmm3,%xmm11
mov %rcx,%rax
call chacha_permute
# o0 = i0 ^ (x0 + s0)
movdqu 0x00(%rdx),%xmm4
paddd %xmm8,%xmm0
cmp $0x10,%rax
jl .Lxorpart
movdqu 0x00(%rdx),%xmm4
pxor %xmm4,%xmm0
movdqu %xmm0,0x00(%rsi)
# o1 = i1 ^ (x1 + s1)
movdqu 0x10(%rdx),%xmm5
paddd %xmm9,%xmm1
pxor %xmm5,%xmm1
movdqu %xmm1,0x10(%rsi)
movdqa %xmm1,%xmm0
cmp $0x20,%rax
jl .Lxorpart
movdqu 0x10(%rdx),%xmm0
pxor %xmm1,%xmm0
movdqu %xmm0,0x10(%rsi)
# o2 = i2 ^ (x2 + s2)
movdqu 0x20(%rdx),%xmm6
paddd %xmm10,%xmm2
pxor %xmm6,%xmm2
movdqu %xmm2,0x20(%rsi)
movdqa %xmm2,%xmm0
cmp $0x30,%rax
jl .Lxorpart
movdqu 0x20(%rdx),%xmm0
pxor %xmm2,%xmm0
movdqu %xmm0,0x20(%rsi)
# o3 = i3 ^ (x3 + s3)
movdqu 0x30(%rdx),%xmm7
paddd %xmm11,%xmm3
pxor %xmm7,%xmm3
movdqu %xmm3,0x30(%rsi)
movdqa %xmm3,%xmm0
cmp $0x40,%rax
jl .Lxorpart
movdqu 0x30(%rdx),%xmm0
pxor %xmm3,%xmm0
movdqu %xmm0,0x30(%rsi)
.Ldone:
FRAME_END
ret
ENDPROC(chacha20_block_xor_ssse3)
ENTRY(chacha20_4block_xor_ssse3)
.Lxorpart:
# xor remaining bytes from partial register into output
mov %rax,%r9
and $0x0f,%r9
jz .Ldone
and $~0x0f,%rax
mov %rsi,%r11
lea 8(%rsp),%r10
sub $0x10,%rsp
and $~31,%rsp
lea (%rdx,%rax),%rsi
mov %rsp,%rdi
mov %r9,%rcx
rep movsb
pxor 0x00(%rsp),%xmm0
movdqa %xmm0,0x00(%rsp)
mov %rsp,%rsi
lea (%r11,%rax),%rdi
mov %r9,%rcx
rep movsb
lea -8(%r10),%rsp
jmp .Ldone
ENDPROC(chacha_block_xor_ssse3)
ENTRY(hchacha_block_ssse3)
# %rdi: Input state matrix, s
# %rsi: 4 data blocks output, o
# %rdx: 4 data blocks input, i
# %rsi: output (8 32-bit words)
# %edx: nrounds
FRAME_BEGIN
# This function encrypts four consecutive ChaCha20 blocks by loading the
movdqa 0x00(%rdi),%xmm0
movdqa 0x10(%rdi),%xmm1
movdqa 0x20(%rdi),%xmm2
movdqa 0x30(%rdi),%xmm3
mov %edx,%r8d
call chacha_permute
movdqu %xmm0,0x00(%rsi)
movdqu %xmm3,0x10(%rsi)
FRAME_END
ret
ENDPROC(hchacha_block_ssse3)
ENTRY(chacha_4block_xor_ssse3)
# %rdi: Input state matrix, s
# %rsi: up to 4 data blocks output, o
# %rdx: up to 4 data blocks input, i
# %rcx: input/output length in bytes
# %r8d: nrounds
# This function encrypts four consecutive ChaCha blocks by loading the
# the state matrix in SSE registers four times. As we need some scratch
# registers, we save the first four registers on the stack. The
# algorithm performs each operation on the corresponding word of each
@ -163,6 +244,7 @@ ENTRY(chacha20_4block_xor_ssse3)
lea 8(%rsp),%r10
sub $0x80,%rsp
and $~63,%rsp
mov %rcx,%rax
# x0..15[0-3] = s0..3[0..3]
movq 0x00(%rdi),%xmm1
@ -202,8 +284,6 @@ ENTRY(chacha20_4block_xor_ssse3)
# x12 += counter values 0-3
paddd %xmm1,%xmm12
mov $10,%ecx
.Ldoubleround4:
# x0 += x4, x12 = rotl32(x12 ^ x0, 16)
movdqa 0x00(%rsp),%xmm0
@ -421,7 +501,7 @@ ENTRY(chacha20_4block_xor_ssse3)
psrld $25,%xmm4
por %xmm0,%xmm4
dec %ecx
sub $2,%r8d
jnz .Ldoubleround4
# x0[0-3] += s0[0]
@ -573,58 +653,143 @@ ENTRY(chacha20_4block_xor_ssse3)
# xor with corresponding input, write to output
movdqa 0x00(%rsp),%xmm0
cmp $0x10,%rax
jl .Lxorpart4
movdqu 0x00(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x00(%rsi)
movdqa 0x10(%rsp),%xmm0
movdqu 0x80(%rdx),%xmm1
movdqu %xmm4,%xmm0
cmp $0x20,%rax
jl .Lxorpart4
movdqu 0x10(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x80(%rsi)
movdqu %xmm0,0x10(%rsi)
movdqu %xmm8,%xmm0
cmp $0x30,%rax
jl .Lxorpart4
movdqu 0x20(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x20(%rsi)
movdqu %xmm12,%xmm0
cmp $0x40,%rax
jl .Lxorpart4
movdqu 0x30(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x30(%rsi)
movdqa 0x20(%rsp),%xmm0
cmp $0x50,%rax
jl .Lxorpart4
movdqu 0x40(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x40(%rsi)
movdqu %xmm6,%xmm0
cmp $0x60,%rax
jl .Lxorpart4
movdqu 0x50(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x50(%rsi)
movdqu %xmm10,%xmm0
cmp $0x70,%rax
jl .Lxorpart4
movdqu 0x60(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x60(%rsi)
movdqu %xmm14,%xmm0
cmp $0x80,%rax
jl .Lxorpart4
movdqu 0x70(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x70(%rsi)
movdqa 0x10(%rsp),%xmm0
cmp $0x90,%rax
jl .Lxorpart4
movdqu 0x80(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x80(%rsi)
movdqu %xmm5,%xmm0
cmp $0xa0,%rax
jl .Lxorpart4
movdqu 0x90(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0x90(%rsi)
movdqu %xmm9,%xmm0
cmp $0xb0,%rax
jl .Lxorpart4
movdqu 0xa0(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0xa0(%rsi)
movdqu %xmm13,%xmm0
cmp $0xc0,%rax
jl .Lxorpart4
movdqu 0xb0(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0xb0(%rsi)
movdqa 0x30(%rsp),%xmm0
cmp $0xd0,%rax
jl .Lxorpart4
movdqu 0xc0(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0xc0(%rsi)
movdqu 0x10(%rdx),%xmm1
pxor %xmm1,%xmm4
movdqu %xmm4,0x10(%rsi)
movdqu 0x90(%rdx),%xmm1
pxor %xmm1,%xmm5
movdqu %xmm5,0x90(%rsi)
movdqu 0x50(%rdx),%xmm1
pxor %xmm1,%xmm6
movdqu %xmm6,0x50(%rsi)
movdqu 0xd0(%rdx),%xmm1
pxor %xmm1,%xmm7
movdqu %xmm7,0xd0(%rsi)
movdqu 0x20(%rdx),%xmm1
pxor %xmm1,%xmm8
movdqu %xmm8,0x20(%rsi)
movdqu 0xa0(%rdx),%xmm1
pxor %xmm1,%xmm9
movdqu %xmm9,0xa0(%rsi)
movdqu 0x60(%rdx),%xmm1
pxor %xmm1,%xmm10
movdqu %xmm10,0x60(%rsi)
movdqu 0xe0(%rdx),%xmm1
pxor %xmm1,%xmm11
movdqu %xmm11,0xe0(%rsi)
movdqu 0x30(%rdx),%xmm1
pxor %xmm1,%xmm12
movdqu %xmm12,0x30(%rsi)
movdqu 0xb0(%rdx),%xmm1
pxor %xmm1,%xmm13
movdqu %xmm13,0xb0(%rsi)
movdqu 0x70(%rdx),%xmm1
pxor %xmm1,%xmm14
movdqu %xmm14,0x70(%rsi)
movdqu 0xf0(%rdx),%xmm1
pxor %xmm1,%xmm15
movdqu %xmm15,0xf0(%rsi)
movdqu %xmm7,%xmm0
cmp $0xe0,%rax
jl .Lxorpart4
movdqu 0xd0(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0xd0(%rsi)
movdqu %xmm11,%xmm0
cmp $0xf0,%rax
jl .Lxorpart4
movdqu 0xe0(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0xe0(%rsi)
movdqu %xmm15,%xmm0
cmp $0x100,%rax
jl .Lxorpart4
movdqu 0xf0(%rdx),%xmm1
pxor %xmm1,%xmm0
movdqu %xmm0,0xf0(%rsi)
.Ldone4:
lea -8(%r10),%rsp
ret
ENDPROC(chacha20_4block_xor_ssse3)
.Lxorpart4:
# xor remaining bytes from partial register into output
mov %rax,%r9
and $0x0f,%r9
jz .Ldone4
and $~0x0f,%rax
mov %rsi,%r11
lea (%rdx,%rax),%rsi
mov %rsp,%rdi
mov %r9,%rcx
rep movsb
pxor 0x00(%rsp),%xmm0
movdqa %xmm0,0x00(%rsp)
mov %rsp,%rsi
lea (%r11,%rax),%rdi
mov %r9,%rcx
rep movsb
jmp .Ldone4
ENDPROC(chacha_4block_xor_ssse3)

View File

@ -1,448 +0,0 @@
/*
* ChaCha20 256-bit cipher algorithm, RFC7539, x64 AVX2 functions
*
* Copyright (C) 2015 Martin Willi
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*/
#include <linux/linkage.h>
.section .rodata.cst32.ROT8, "aM", @progbits, 32
.align 32
ROT8: .octa 0x0e0d0c0f0a09080b0605040702010003
.octa 0x0e0d0c0f0a09080b0605040702010003
.section .rodata.cst32.ROT16, "aM", @progbits, 32
.align 32
ROT16: .octa 0x0d0c0f0e09080b0a0504070601000302
.octa 0x0d0c0f0e09080b0a0504070601000302
.section .rodata.cst32.CTRINC, "aM", @progbits, 32
.align 32
CTRINC: .octa 0x00000003000000020000000100000000
.octa 0x00000007000000060000000500000004
.text
ENTRY(chacha20_8block_xor_avx2)
# %rdi: Input state matrix, s
# %rsi: 8 data blocks output, o
# %rdx: 8 data blocks input, i
# This function encrypts eight consecutive ChaCha20 blocks by loading
# the state matrix in AVX registers eight times. As we need some
# scratch registers, we save the first four registers on the stack. The
# algorithm performs each operation on the corresponding word of each
# state matrix, hence requires no word shuffling. For final XORing step
# we transpose the matrix by interleaving 32-, 64- and then 128-bit
# words, which allows us to do XOR in AVX registers. 8/16-bit word
# rotation is done with the slightly better performing byte shuffling,
# 7/12-bit word rotation uses traditional shift+OR.
vzeroupper
# 4 * 32 byte stack, 32-byte aligned
lea 8(%rsp),%r10
and $~31, %rsp
sub $0x80, %rsp
# x0..15[0-7] = s[0..15]
vpbroadcastd 0x00(%rdi),%ymm0
vpbroadcastd 0x04(%rdi),%ymm1
vpbroadcastd 0x08(%rdi),%ymm2
vpbroadcastd 0x0c(%rdi),%ymm3
vpbroadcastd 0x10(%rdi),%ymm4
vpbroadcastd 0x14(%rdi),%ymm5
vpbroadcastd 0x18(%rdi),%ymm6
vpbroadcastd 0x1c(%rdi),%ymm7
vpbroadcastd 0x20(%rdi),%ymm8
vpbroadcastd 0x24(%rdi),%ymm9
vpbroadcastd 0x28(%rdi),%ymm10
vpbroadcastd 0x2c(%rdi),%ymm11
vpbroadcastd 0x30(%rdi),%ymm12
vpbroadcastd 0x34(%rdi),%ymm13
vpbroadcastd 0x38(%rdi),%ymm14
vpbroadcastd 0x3c(%rdi),%ymm15
# x0..3 on stack
vmovdqa %ymm0,0x00(%rsp)
vmovdqa %ymm1,0x20(%rsp)
vmovdqa %ymm2,0x40(%rsp)
vmovdqa %ymm3,0x60(%rsp)
vmovdqa CTRINC(%rip),%ymm1
vmovdqa ROT8(%rip),%ymm2
vmovdqa ROT16(%rip),%ymm3
# x12 += counter values 0-3
vpaddd %ymm1,%ymm12,%ymm12
mov $10,%ecx
.Ldoubleround8:
# x0 += x4, x12 = rotl32(x12 ^ x0, 16)
vpaddd 0x00(%rsp),%ymm4,%ymm0
vmovdqa %ymm0,0x00(%rsp)
vpxor %ymm0,%ymm12,%ymm12
vpshufb %ymm3,%ymm12,%ymm12
# x1 += x5, x13 = rotl32(x13 ^ x1, 16)
vpaddd 0x20(%rsp),%ymm5,%ymm0
vmovdqa %ymm0,0x20(%rsp)
vpxor %ymm0,%ymm13,%ymm13
vpshufb %ymm3,%ymm13,%ymm13
# x2 += x6, x14 = rotl32(x14 ^ x2, 16)
vpaddd 0x40(%rsp),%ymm6,%ymm0
vmovdqa %ymm0,0x40(%rsp)
vpxor %ymm0,%ymm14,%ymm14
vpshufb %ymm3,%ymm14,%ymm14
# x3 += x7, x15 = rotl32(x15 ^ x3, 16)
vpaddd 0x60(%rsp),%ymm7,%ymm0
vmovdqa %ymm0,0x60(%rsp)
vpxor %ymm0,%ymm15,%ymm15
vpshufb %ymm3,%ymm15,%ymm15
# x8 += x12, x4 = rotl32(x4 ^ x8, 12)
vpaddd %ymm12,%ymm8,%ymm8
vpxor %ymm8,%ymm4,%ymm4
vpslld $12,%ymm4,%ymm0
vpsrld $20,%ymm4,%ymm4
vpor %ymm0,%ymm4,%ymm4
# x9 += x13, x5 = rotl32(x5 ^ x9, 12)
vpaddd %ymm13,%ymm9,%ymm9
vpxor %ymm9,%ymm5,%ymm5
vpslld $12,%ymm5,%ymm0
vpsrld $20,%ymm5,%ymm5
vpor %ymm0,%ymm5,%ymm5
# x10 += x14, x6 = rotl32(x6 ^ x10, 12)
vpaddd %ymm14,%ymm10,%ymm10
vpxor %ymm10,%ymm6,%ymm6
vpslld $12,%ymm6,%ymm0
vpsrld $20,%ymm6,%ymm6
vpor %ymm0,%ymm6,%ymm6
# x11 += x15, x7 = rotl32(x7 ^ x11, 12)
vpaddd %ymm15,%ymm11,%ymm11
vpxor %ymm11,%ymm7,%ymm7
vpslld $12,%ymm7,%ymm0
vpsrld $20,%ymm7,%ymm7
vpor %ymm0,%ymm7,%ymm7
# x0 += x4, x12 = rotl32(x12 ^ x0, 8)
vpaddd 0x00(%rsp),%ymm4,%ymm0
vmovdqa %ymm0,0x00(%rsp)
vpxor %ymm0,%ymm12,%ymm12
vpshufb %ymm2,%ymm12,%ymm12
# x1 += x5, x13 = rotl32(x13 ^ x1, 8)
vpaddd 0x20(%rsp),%ymm5,%ymm0
vmovdqa %ymm0,0x20(%rsp)
vpxor %ymm0,%ymm13,%ymm13
vpshufb %ymm2,%ymm13,%ymm13
# x2 += x6, x14 = rotl32(x14 ^ x2, 8)
vpaddd 0x40(%rsp),%ymm6,%ymm0
vmovdqa %ymm0,0x40(%rsp)
vpxor %ymm0,%ymm14,%ymm14
vpshufb %ymm2,%ymm14,%ymm14
# x3 += x7, x15 = rotl32(x15 ^ x3, 8)
vpaddd 0x60(%rsp),%ymm7,%ymm0
vmovdqa %ymm0,0x60(%rsp)
vpxor %ymm0,%ymm15,%ymm15
vpshufb %ymm2,%ymm15,%ymm15
# x8 += x12, x4 = rotl32(x4 ^ x8, 7)
vpaddd %ymm12,%ymm8,%ymm8
vpxor %ymm8,%ymm4,%ymm4
vpslld $7,%ymm4,%ymm0
vpsrld $25,%ymm4,%ymm4
vpor %ymm0,%ymm4,%ymm4
# x9 += x13, x5 = rotl32(x5 ^ x9, 7)
vpaddd %ymm13,%ymm9,%ymm9
vpxor %ymm9,%ymm5,%ymm5
vpslld $7,%ymm5,%ymm0
vpsrld $25,%ymm5,%ymm5
vpor %ymm0,%ymm5,%ymm5
# x10 += x14, x6 = rotl32(x6 ^ x10, 7)
vpaddd %ymm14,%ymm10,%ymm10
vpxor %ymm10,%ymm6,%ymm6
vpslld $7,%ymm6,%ymm0
vpsrld $25,%ymm6,%ymm6
vpor %ymm0,%ymm6,%ymm6
# x11 += x15, x7 = rotl32(x7 ^ x11, 7)
vpaddd %ymm15,%ymm11,%ymm11
vpxor %ymm11,%ymm7,%ymm7
vpslld $7,%ymm7,%ymm0
vpsrld $25,%ymm7,%ymm7
vpor %ymm0,%ymm7,%ymm7
# x0 += x5, x15 = rotl32(x15 ^ x0, 16)
vpaddd 0x00(%rsp),%ymm5,%ymm0
vmovdqa %ymm0,0x00(%rsp)
vpxor %ymm0,%ymm15,%ymm15
vpshufb %ymm3,%ymm15,%ymm15
# x1 += x6, x12 = rotl32(x12 ^ x1, 16)%ymm0
vpaddd 0x20(%rsp),%ymm6,%ymm0
vmovdqa %ymm0,0x20(%rsp)
vpxor %ymm0,%ymm12,%ymm12
vpshufb %ymm3,%ymm12,%ymm12
# x2 += x7, x13 = rotl32(x13 ^ x2, 16)
vpaddd 0x40(%rsp),%ymm7,%ymm0
vmovdqa %ymm0,0x40(%rsp)
vpxor %ymm0,%ymm13,%ymm13
vpshufb %ymm3,%ymm13,%ymm13
# x3 += x4, x14 = rotl32(x14 ^ x3, 16)
vpaddd 0x60(%rsp),%ymm4,%ymm0
vmovdqa %ymm0,0x60(%rsp)
vpxor %ymm0,%ymm14,%ymm14
vpshufb %ymm3,%ymm14,%ymm14
# x10 += x15, x5 = rotl32(x5 ^ x10, 12)
vpaddd %ymm15,%ymm10,%ymm10
vpxor %ymm10,%ymm5,%ymm5
vpslld $12,%ymm5,%ymm0
vpsrld $20,%ymm5,%ymm5
vpor %ymm0,%ymm5,%ymm5
# x11 += x12, x6 = rotl32(x6 ^ x11, 12)
vpaddd %ymm12,%ymm11,%ymm11
vpxor %ymm11,%ymm6,%ymm6
vpslld $12,%ymm6,%ymm0
vpsrld $20,%ymm6,%ymm6
vpor %ymm0,%ymm6,%ymm6
# x8 += x13, x7 = rotl32(x7 ^ x8, 12)
vpaddd %ymm13,%ymm8,%ymm8
vpxor %ymm8,%ymm7,%ymm7
vpslld $12,%ymm7,%ymm0
vpsrld $20,%ymm7,%ymm7
vpor %ymm0,%ymm7,%ymm7
# x9 += x14, x4 = rotl32(x4 ^ x9, 12)
vpaddd %ymm14,%ymm9,%ymm9
vpxor %ymm9,%ymm4,%ymm4
vpslld $12,%ymm4,%ymm0
vpsrld $20,%ymm4,%ymm4
vpor %ymm0,%ymm4,%ymm4
# x0 += x5, x15 = rotl32(x15 ^ x0, 8)
vpaddd 0x00(%rsp),%ymm5,%ymm0
vmovdqa %ymm0,0x00(%rsp)
vpxor %ymm0,%ymm15,%ymm15
vpshufb %ymm2,%ymm15,%ymm15
# x1 += x6, x12 = rotl32(x12 ^ x1, 8)
vpaddd 0x20(%rsp),%ymm6,%ymm0
vmovdqa %ymm0,0x20(%rsp)
vpxor %ymm0,%ymm12,%ymm12
vpshufb %ymm2,%ymm12,%ymm12
# x2 += x7, x13 = rotl32(x13 ^ x2, 8)
vpaddd 0x40(%rsp),%ymm7,%ymm0
vmovdqa %ymm0,0x40(%rsp)
vpxor %ymm0,%ymm13,%ymm13
vpshufb %ymm2,%ymm13,%ymm13
# x3 += x4, x14 = rotl32(x14 ^ x3, 8)
vpaddd 0x60(%rsp),%ymm4,%ymm0
vmovdqa %ymm0,0x60(%rsp)
vpxor %ymm0,%ymm14,%ymm14
vpshufb %ymm2,%ymm14,%ymm14
# x10 += x15, x5 = rotl32(x5 ^ x10, 7)
vpaddd %ymm15,%ymm10,%ymm10
vpxor %ymm10,%ymm5,%ymm5
vpslld $7,%ymm5,%ymm0
vpsrld $25,%ymm5,%ymm5
vpor %ymm0,%ymm5,%ymm5
# x11 += x12, x6 = rotl32(x6 ^ x11, 7)
vpaddd %ymm12,%ymm11,%ymm11
vpxor %ymm11,%ymm6,%ymm6
vpslld $7,%ymm6,%ymm0
vpsrld $25,%ymm6,%ymm6
vpor %ymm0,%ymm6,%ymm6
# x8 += x13, x7 = rotl32(x7 ^ x8, 7)
vpaddd %ymm13,%ymm8,%ymm8
vpxor %ymm8,%ymm7,%ymm7
vpslld $7,%ymm7,%ymm0
vpsrld $25,%ymm7,%ymm7
vpor %ymm0,%ymm7,%ymm7
# x9 += x14, x4 = rotl32(x4 ^ x9, 7)
vpaddd %ymm14,%ymm9,%ymm9
vpxor %ymm9,%ymm4,%ymm4
vpslld $7,%ymm4,%ymm0
vpsrld $25,%ymm4,%ymm4
vpor %ymm0,%ymm4,%ymm4
dec %ecx
jnz .Ldoubleround8
# x0..15[0-3] += s[0..15]
vpbroadcastd 0x00(%rdi),%ymm0
vpaddd 0x00(%rsp),%ymm0,%ymm0
vmovdqa %ymm0,0x00(%rsp)
vpbroadcastd 0x04(%rdi),%ymm0
vpaddd 0x20(%rsp),%ymm0,%ymm0
vmovdqa %ymm0,0x20(%rsp)
vpbroadcastd 0x08(%rdi),%ymm0
vpaddd 0x40(%rsp),%ymm0,%ymm0
vmovdqa %ymm0,0x40(%rsp)
vpbroadcastd 0x0c(%rdi),%ymm0
vpaddd 0x60(%rsp),%ymm0,%ymm0
vmovdqa %ymm0,0x60(%rsp)
vpbroadcastd 0x10(%rdi),%ymm0
vpaddd %ymm0,%ymm4,%ymm4
vpbroadcastd 0x14(%rdi),%ymm0
vpaddd %ymm0,%ymm5,%ymm5
vpbroadcastd 0x18(%rdi),%ymm0
vpaddd %ymm0,%ymm6,%ymm6
vpbroadcastd 0x1c(%rdi),%ymm0
vpaddd %ymm0,%ymm7,%ymm7
vpbroadcastd 0x20(%rdi),%ymm0
vpaddd %ymm0,%ymm8,%ymm8
vpbroadcastd 0x24(%rdi),%ymm0
vpaddd %ymm0,%ymm9,%ymm9
vpbroadcastd 0x28(%rdi),%ymm0
vpaddd %ymm0,%ymm10,%ymm10
vpbroadcastd 0x2c(%rdi),%ymm0
vpaddd %ymm0,%ymm11,%ymm11
vpbroadcastd 0x30(%rdi),%ymm0
vpaddd %ymm0,%ymm12,%ymm12
vpbroadcastd 0x34(%rdi),%ymm0
vpaddd %ymm0,%ymm13,%ymm13
vpbroadcastd 0x38(%rdi),%ymm0
vpaddd %ymm0,%ymm14,%ymm14
vpbroadcastd 0x3c(%rdi),%ymm0
vpaddd %ymm0,%ymm15,%ymm15
# x12 += counter values 0-3
vpaddd %ymm1,%ymm12,%ymm12
# interleave 32-bit words in state n, n+1
vmovdqa 0x00(%rsp),%ymm0
vmovdqa 0x20(%rsp),%ymm1
vpunpckldq %ymm1,%ymm0,%ymm2
vpunpckhdq %ymm1,%ymm0,%ymm1
vmovdqa %ymm2,0x00(%rsp)
vmovdqa %ymm1,0x20(%rsp)
vmovdqa 0x40(%rsp),%ymm0
vmovdqa 0x60(%rsp),%ymm1
vpunpckldq %ymm1,%ymm0,%ymm2
vpunpckhdq %ymm1,%ymm0,%ymm1
vmovdqa %ymm2,0x40(%rsp)
vmovdqa %ymm1,0x60(%rsp)
vmovdqa %ymm4,%ymm0
vpunpckldq %ymm5,%ymm0,%ymm4
vpunpckhdq %ymm5,%ymm0,%ymm5
vmovdqa %ymm6,%ymm0
vpunpckldq %ymm7,%ymm0,%ymm6
vpunpckhdq %ymm7,%ymm0,%ymm7
vmovdqa %ymm8,%ymm0
vpunpckldq %ymm9,%ymm0,%ymm8
vpunpckhdq %ymm9,%ymm0,%ymm9
vmovdqa %ymm10,%ymm0
vpunpckldq %ymm11,%ymm0,%ymm10
vpunpckhdq %ymm11,%ymm0,%ymm11
vmovdqa %ymm12,%ymm0
vpunpckldq %ymm13,%ymm0,%ymm12
vpunpckhdq %ymm13,%ymm0,%ymm13
vmovdqa %ymm14,%ymm0
vpunpckldq %ymm15,%ymm0,%ymm14
vpunpckhdq %ymm15,%ymm0,%ymm15
# interleave 64-bit words in state n, n+2
vmovdqa 0x00(%rsp),%ymm0
vmovdqa 0x40(%rsp),%ymm2
vpunpcklqdq %ymm2,%ymm0,%ymm1
vpunpckhqdq %ymm2,%ymm0,%ymm2
vmovdqa %ymm1,0x00(%rsp)
vmovdqa %ymm2,0x40(%rsp)
vmovdqa 0x20(%rsp),%ymm0
vmovdqa 0x60(%rsp),%ymm2
vpunpcklqdq %ymm2,%ymm0,%ymm1
vpunpckhqdq %ymm2,%ymm0,%ymm2
vmovdqa %ymm1,0x20(%rsp)
vmovdqa %ymm2,0x60(%rsp)
vmovdqa %ymm4,%ymm0
vpunpcklqdq %ymm6,%ymm0,%ymm4
vpunpckhqdq %ymm6,%ymm0,%ymm6
vmovdqa %ymm5,%ymm0
vpunpcklqdq %ymm7,%ymm0,%ymm5
vpunpckhqdq %ymm7,%ymm0,%ymm7
vmovdqa %ymm8,%ymm0
vpunpcklqdq %ymm10,%ymm0,%ymm8
vpunpckhqdq %ymm10,%ymm0,%ymm10
vmovdqa %ymm9,%ymm0
vpunpcklqdq %ymm11,%ymm0,%ymm9
vpunpckhqdq %ymm11,%ymm0,%ymm11
vmovdqa %ymm12,%ymm0
vpunpcklqdq %ymm14,%ymm0,%ymm12
vpunpckhqdq %ymm14,%ymm0,%ymm14
vmovdqa %ymm13,%ymm0
vpunpcklqdq %ymm15,%ymm0,%ymm13
vpunpckhqdq %ymm15,%ymm0,%ymm15
# interleave 128-bit words in state n, n+4
vmovdqa 0x00(%rsp),%ymm0
vperm2i128 $0x20,%ymm4,%ymm0,%ymm1
vperm2i128 $0x31,%ymm4,%ymm0,%ymm4
vmovdqa %ymm1,0x00(%rsp)
vmovdqa 0x20(%rsp),%ymm0
vperm2i128 $0x20,%ymm5,%ymm0,%ymm1
vperm2i128 $0x31,%ymm5,%ymm0,%ymm5
vmovdqa %ymm1,0x20(%rsp)
vmovdqa 0x40(%rsp),%ymm0
vperm2i128 $0x20,%ymm6,%ymm0,%ymm1
vperm2i128 $0x31,%ymm6,%ymm0,%ymm6
vmovdqa %ymm1,0x40(%rsp)
vmovdqa 0x60(%rsp),%ymm0
vperm2i128 $0x20,%ymm7,%ymm0,%ymm1
vperm2i128 $0x31,%ymm7,%ymm0,%ymm7
vmovdqa %ymm1,0x60(%rsp)
vperm2i128 $0x20,%ymm12,%ymm8,%ymm0
vperm2i128 $0x31,%ymm12,%ymm8,%ymm12
vmovdqa %ymm0,%ymm8
vperm2i128 $0x20,%ymm13,%ymm9,%ymm0
vperm2i128 $0x31,%ymm13,%ymm9,%ymm13
vmovdqa %ymm0,%ymm9
vperm2i128 $0x20,%ymm14,%ymm10,%ymm0
vperm2i128 $0x31,%ymm14,%ymm10,%ymm14
vmovdqa %ymm0,%ymm10
vperm2i128 $0x20,%ymm15,%ymm11,%ymm0
vperm2i128 $0x31,%ymm15,%ymm11,%ymm15
vmovdqa %ymm0,%ymm11
# xor with corresponding input, write to output
vmovdqa 0x00(%rsp),%ymm0
vpxor 0x0000(%rdx),%ymm0,%ymm0
vmovdqu %ymm0,0x0000(%rsi)
vmovdqa 0x20(%rsp),%ymm0
vpxor 0x0080(%rdx),%ymm0,%ymm0
vmovdqu %ymm0,0x0080(%rsi)
vmovdqa 0x40(%rsp),%ymm0
vpxor 0x0040(%rdx),%ymm0,%ymm0
vmovdqu %ymm0,0x0040(%rsi)
vmovdqa 0x60(%rsp),%ymm0
vpxor 0x00c0(%rdx),%ymm0,%ymm0
vmovdqu %ymm0,0x00c0(%rsi)
vpxor 0x0100(%rdx),%ymm4,%ymm4
vmovdqu %ymm4,0x0100(%rsi)
vpxor 0x0180(%rdx),%ymm5,%ymm5
vmovdqu %ymm5,0x00180(%rsi)
vpxor 0x0140(%rdx),%ymm6,%ymm6
vmovdqu %ymm6,0x0140(%rsi)
vpxor 0x01c0(%rdx),%ymm7,%ymm7
vmovdqu %ymm7,0x01c0(%rsi)
vpxor 0x0020(%rdx),%ymm8,%ymm8
vmovdqu %ymm8,0x0020(%rsi)
vpxor 0x00a0(%rdx),%ymm9,%ymm9
vmovdqu %ymm9,0x00a0(%rsi)
vpxor 0x0060(%rdx),%ymm10,%ymm10
vmovdqu %ymm10,0x0060(%rsi)
vpxor 0x00e0(%rdx),%ymm11,%ymm11
vmovdqu %ymm11,0x00e0(%rsi)
vpxor 0x0120(%rdx),%ymm12,%ymm12
vmovdqu %ymm12,0x0120(%rsi)
vpxor 0x01a0(%rdx),%ymm13,%ymm13
vmovdqu %ymm13,0x01a0(%rsi)
vpxor 0x0160(%rdx),%ymm14,%ymm14
vmovdqu %ymm14,0x0160(%rsi)
vpxor 0x01e0(%rdx),%ymm15,%ymm15
vmovdqu %ymm15,0x01e0(%rsi)
vzeroupper
lea -8(%r10),%rsp
ret
ENDPROC(chacha20_8block_xor_avx2)

View File

@ -1,146 +0,0 @@
/*
* ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
*
* Copyright (C) 2015 Martin Willi
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*/
#include <crypto/algapi.h>
#include <crypto/chacha20.h>
#include <crypto/internal/skcipher.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/simd.h>
#define CHACHA20_STATE_ALIGN 16
asmlinkage void chacha20_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
asmlinkage void chacha20_4block_xor_ssse3(u32 *state, u8 *dst, const u8 *src);
#ifdef CONFIG_AS_AVX2
asmlinkage void chacha20_8block_xor_avx2(u32 *state, u8 *dst, const u8 *src);
static bool chacha20_use_avx2;
#endif
static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes)
{
u8 buf[CHACHA20_BLOCK_SIZE];
#ifdef CONFIG_AS_AVX2
if (chacha20_use_avx2) {
while (bytes >= CHACHA20_BLOCK_SIZE * 8) {
chacha20_8block_xor_avx2(state, dst, src);
bytes -= CHACHA20_BLOCK_SIZE * 8;
src += CHACHA20_BLOCK_SIZE * 8;
dst += CHACHA20_BLOCK_SIZE * 8;
state[12] += 8;
}
}
#endif
while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
chacha20_4block_xor_ssse3(state, dst, src);
bytes -= CHACHA20_BLOCK_SIZE * 4;
src += CHACHA20_BLOCK_SIZE * 4;
dst += CHACHA20_BLOCK_SIZE * 4;
state[12] += 4;
}
while (bytes >= CHACHA20_BLOCK_SIZE) {
chacha20_block_xor_ssse3(state, dst, src);
bytes -= CHACHA20_BLOCK_SIZE;
src += CHACHA20_BLOCK_SIZE;
dst += CHACHA20_BLOCK_SIZE;
state[12]++;
}
if (bytes) {
memcpy(buf, src, bytes);
chacha20_block_xor_ssse3(state, buf, buf);
memcpy(dst, buf, bytes);
}
}
static int chacha20_simd(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
u32 *state, state_buf[16 + 2] __aligned(8);
struct skcipher_walk walk;
int err;
BUILD_BUG_ON(CHACHA20_STATE_ALIGN != 16);
state = PTR_ALIGN(state_buf + 0, CHACHA20_STATE_ALIGN);
if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
return crypto_chacha20_crypt(req);
err = skcipher_walk_virt(&walk, req, true);
crypto_chacha20_init(state, ctx, walk.iv);
kernel_fpu_begin();
while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE));
err = skcipher_walk_done(&walk,
walk.nbytes % CHACHA20_BLOCK_SIZE);
}
if (walk.nbytes) {
chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
walk.nbytes);
err = skcipher_walk_done(&walk, 0);
}
kernel_fpu_end();
return err;
}
static struct skcipher_alg alg = {
.base.cra_name = "chacha20",
.base.cra_driver_name = "chacha20-simd",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha20_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA20_KEY_SIZE,
.max_keysize = CHACHA20_KEY_SIZE,
.ivsize = CHACHA20_IV_SIZE,
.chunksize = CHACHA20_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = chacha20_simd,
.decrypt = chacha20_simd,
};
static int __init chacha20_simd_mod_init(void)
{
if (!boot_cpu_has(X86_FEATURE_SSSE3))
return -ENODEV;
#ifdef CONFIG_AS_AVX2
chacha20_use_avx2 = boot_cpu_has(X86_FEATURE_AVX) &&
boot_cpu_has(X86_FEATURE_AVX2) &&
cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
#endif
return crypto_register_skcipher(&alg);
}
static void __exit chacha20_simd_mod_fini(void)
{
crypto_unregister_skcipher(&alg);
}
module_init(chacha20_simd_mod_init);
module_exit(chacha20_simd_mod_fini);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
MODULE_DESCRIPTION("chacha20 cipher algorithm, SIMD accelerated");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-simd");

View File

@ -0,0 +1,304 @@
/*
* x64 SIMD accelerated ChaCha and XChaCha stream ciphers,
* including ChaCha20 (RFC7539)
*
* Copyright (C) 2015 Martin Willi
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*/
#include <crypto/algapi.h>
#include <crypto/chacha.h>
#include <crypto/internal/skcipher.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
#include <asm/simd.h>
#define CHACHA_STATE_ALIGN 16
asmlinkage void chacha_block_xor_ssse3(u32 *state, u8 *dst, const u8 *src,
unsigned int len, int nrounds);
asmlinkage void chacha_4block_xor_ssse3(u32 *state, u8 *dst, const u8 *src,
unsigned int len, int nrounds);
asmlinkage void hchacha_block_ssse3(const u32 *state, u32 *out, int nrounds);
#ifdef CONFIG_AS_AVX2
asmlinkage void chacha_2block_xor_avx2(u32 *state, u8 *dst, const u8 *src,
unsigned int len, int nrounds);
asmlinkage void chacha_4block_xor_avx2(u32 *state, u8 *dst, const u8 *src,
unsigned int len, int nrounds);
asmlinkage void chacha_8block_xor_avx2(u32 *state, u8 *dst, const u8 *src,
unsigned int len, int nrounds);
static bool chacha_use_avx2;
#ifdef CONFIG_AS_AVX512
asmlinkage void chacha_2block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
unsigned int len, int nrounds);
asmlinkage void chacha_4block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
unsigned int len, int nrounds);
asmlinkage void chacha_8block_xor_avx512vl(u32 *state, u8 *dst, const u8 *src,
unsigned int len, int nrounds);
static bool chacha_use_avx512vl;
#endif
#endif
static unsigned int chacha_advance(unsigned int len, unsigned int maxblocks)
{
len = min(len, maxblocks * CHACHA_BLOCK_SIZE);
return round_up(len, CHACHA_BLOCK_SIZE) / CHACHA_BLOCK_SIZE;
}
static void chacha_dosimd(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes, int nrounds)
{
#ifdef CONFIG_AS_AVX2
#ifdef CONFIG_AS_AVX512
if (chacha_use_avx512vl) {
while (bytes >= CHACHA_BLOCK_SIZE * 8) {
chacha_8block_xor_avx512vl(state, dst, src, bytes,
nrounds);
bytes -= CHACHA_BLOCK_SIZE * 8;
src += CHACHA_BLOCK_SIZE * 8;
dst += CHACHA_BLOCK_SIZE * 8;
state[12] += 8;
}
if (bytes > CHACHA_BLOCK_SIZE * 4) {
chacha_8block_xor_avx512vl(state, dst, src, bytes,
nrounds);
state[12] += chacha_advance(bytes, 8);
return;
}
if (bytes > CHACHA_BLOCK_SIZE * 2) {
chacha_4block_xor_avx512vl(state, dst, src, bytes,
nrounds);
state[12] += chacha_advance(bytes, 4);
return;
}
if (bytes) {
chacha_2block_xor_avx512vl(state, dst, src, bytes,
nrounds);
state[12] += chacha_advance(bytes, 2);
return;
}
}
#endif
if (chacha_use_avx2) {
while (bytes >= CHACHA_BLOCK_SIZE * 8) {
chacha_8block_xor_avx2(state, dst, src, bytes, nrounds);
bytes -= CHACHA_BLOCK_SIZE * 8;
src += CHACHA_BLOCK_SIZE * 8;
dst += CHACHA_BLOCK_SIZE * 8;
state[12] += 8;
}
if (bytes > CHACHA_BLOCK_SIZE * 4) {
chacha_8block_xor_avx2(state, dst, src, bytes, nrounds);
state[12] += chacha_advance(bytes, 8);
return;
}
if (bytes > CHACHA_BLOCK_SIZE * 2) {
chacha_4block_xor_avx2(state, dst, src, bytes, nrounds);
state[12] += chacha_advance(bytes, 4);
return;
}
if (bytes > CHACHA_BLOCK_SIZE) {
chacha_2block_xor_avx2(state, dst, src, bytes, nrounds);
state[12] += chacha_advance(bytes, 2);
return;
}
}
#endif
while (bytes >= CHACHA_BLOCK_SIZE * 4) {
chacha_4block_xor_ssse3(state, dst, src, bytes, nrounds);
bytes -= CHACHA_BLOCK_SIZE * 4;
src += CHACHA_BLOCK_SIZE * 4;
dst += CHACHA_BLOCK_SIZE * 4;
state[12] += 4;
}
if (bytes > CHACHA_BLOCK_SIZE) {
chacha_4block_xor_ssse3(state, dst, src, bytes, nrounds);
state[12] += chacha_advance(bytes, 4);
return;
}
if (bytes) {
chacha_block_xor_ssse3(state, dst, src, bytes, nrounds);
state[12]++;
}
}
static int chacha_simd_stream_xor(struct skcipher_walk *walk,
struct chacha_ctx *ctx, u8 *iv)
{
u32 *state, state_buf[16 + 2] __aligned(8);
int next_yield = 4096; /* bytes until next FPU yield */
int err = 0;
BUILD_BUG_ON(CHACHA_STATE_ALIGN != 16);
state = PTR_ALIGN(state_buf + 0, CHACHA_STATE_ALIGN);
crypto_chacha_init(state, ctx, iv);
while (walk->nbytes > 0) {
unsigned int nbytes = walk->nbytes;
if (nbytes < walk->total) {
nbytes = round_down(nbytes, walk->stride);
next_yield -= nbytes;
}
chacha_dosimd(state, walk->dst.virt.addr, walk->src.virt.addr,
nbytes, ctx->nrounds);
if (next_yield <= 0) {
/* temporarily allow preemption */
kernel_fpu_end();
kernel_fpu_begin();
next_yield = 4096;
}
err = skcipher_walk_done(walk, walk->nbytes - nbytes);
}
return err;
}
static int chacha_simd(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
int err;
if (req->cryptlen <= CHACHA_BLOCK_SIZE || !irq_fpu_usable())
return crypto_chacha_crypt(req);
err = skcipher_walk_virt(&walk, req, true);
if (err)
return err;
kernel_fpu_begin();
err = chacha_simd_stream_xor(&walk, ctx, req->iv);
kernel_fpu_end();
return err;
}
static int xchacha_simd(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
struct chacha_ctx subctx;
u32 *state, state_buf[16 + 2] __aligned(8);
u8 real_iv[16];
int err;
if (req->cryptlen <= CHACHA_BLOCK_SIZE || !irq_fpu_usable())
return crypto_xchacha_crypt(req);
err = skcipher_walk_virt(&walk, req, true);
if (err)
return err;
BUILD_BUG_ON(CHACHA_STATE_ALIGN != 16);
state = PTR_ALIGN(state_buf + 0, CHACHA_STATE_ALIGN);
crypto_chacha_init(state, ctx, req->iv);
kernel_fpu_begin();
hchacha_block_ssse3(state, subctx.key, ctx->nrounds);
subctx.nrounds = ctx->nrounds;
memcpy(&real_iv[0], req->iv + 24, 8);
memcpy(&real_iv[8], req->iv + 16, 8);
err = chacha_simd_stream_xor(&walk, &subctx, real_iv);
kernel_fpu_end();
return err;
}
static struct skcipher_alg algs[] = {
{
.base.cra_name = "chacha20",
.base.cra_driver_name = "chacha20-simd",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = CHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = chacha_simd,
.decrypt = chacha_simd,
}, {
.base.cra_name = "xchacha20",
.base.cra_driver_name = "xchacha20-simd",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = XCHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = xchacha_simd,
.decrypt = xchacha_simd,
}, {
.base.cra_name = "xchacha12",
.base.cra_driver_name = "xchacha12-simd",
.base.cra_priority = 300,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = XCHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha12_setkey,
.encrypt = xchacha_simd,
.decrypt = xchacha_simd,
},
};
static int __init chacha_simd_mod_init(void)
{
if (!boot_cpu_has(X86_FEATURE_SSSE3))
return -ENODEV;
#ifdef CONFIG_AS_AVX2
chacha_use_avx2 = boot_cpu_has(X86_FEATURE_AVX) &&
boot_cpu_has(X86_FEATURE_AVX2) &&
cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
#ifdef CONFIG_AS_AVX512
chacha_use_avx512vl = chacha_use_avx2 &&
boot_cpu_has(X86_FEATURE_AVX512VL) &&
boot_cpu_has(X86_FEATURE_AVX512BW); /* kmovq */
#endif
#endif
return crypto_register_skciphers(algs, ARRAY_SIZE(algs));
}
static void __exit chacha_simd_mod_fini(void)
{
crypto_unregister_skciphers(algs, ARRAY_SIZE(algs));
}
module_init(chacha_simd_mod_init);
module_exit(chacha_simd_mod_fini);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (x64 SIMD accelerated)");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-simd");
MODULE_ALIAS_CRYPTO("xchacha20");
MODULE_ALIAS_CRYPTO("xchacha20-simd");
MODULE_ALIAS_CRYPTO("xchacha12");
MODULE_ALIAS_CRYPTO("xchacha12-simd");

View File

@ -0,0 +1,157 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* NH - ε-almost-universal hash function, x86_64 AVX2 accelerated
*
* Copyright 2018 Google LLC
*
* Author: Eric Biggers <ebiggers@google.com>
*/
#include <linux/linkage.h>
#define PASS0_SUMS %ymm0
#define PASS1_SUMS %ymm1
#define PASS2_SUMS %ymm2
#define PASS3_SUMS %ymm3
#define K0 %ymm4
#define K0_XMM %xmm4
#define K1 %ymm5
#define K1_XMM %xmm5
#define K2 %ymm6
#define K2_XMM %xmm6
#define K3 %ymm7
#define K3_XMM %xmm7
#define T0 %ymm8
#define T1 %ymm9
#define T2 %ymm10
#define T2_XMM %xmm10
#define T3 %ymm11
#define T3_XMM %xmm11
#define T4 %ymm12
#define T5 %ymm13
#define T6 %ymm14
#define T7 %ymm15
#define KEY %rdi
#define MESSAGE %rsi
#define MESSAGE_LEN %rdx
#define HASH %rcx
.macro _nh_2xstride k0, k1, k2, k3
// Add message words to key words
vpaddd \k0, T3, T0
vpaddd \k1, T3, T1
vpaddd \k2, T3, T2
vpaddd \k3, T3, T3
// Multiply 32x32 => 64 and accumulate
vpshufd $0x10, T0, T4
vpshufd $0x32, T0, T0
vpshufd $0x10, T1, T5
vpshufd $0x32, T1, T1
vpshufd $0x10, T2, T6
vpshufd $0x32, T2, T2
vpshufd $0x10, T3, T7
vpshufd $0x32, T3, T3
vpmuludq T4, T0, T0
vpmuludq T5, T1, T1
vpmuludq T6, T2, T2
vpmuludq T7, T3, T3
vpaddq T0, PASS0_SUMS, PASS0_SUMS
vpaddq T1, PASS1_SUMS, PASS1_SUMS
vpaddq T2, PASS2_SUMS, PASS2_SUMS
vpaddq T3, PASS3_SUMS, PASS3_SUMS
.endm
/*
* void nh_avx2(const u32 *key, const u8 *message, size_t message_len,
* u8 hash[NH_HASH_BYTES])
*
* It's guaranteed that message_len % 16 == 0.
*/
ENTRY(nh_avx2)
vmovdqu 0x00(KEY), K0
vmovdqu 0x10(KEY), K1
add $0x20, KEY
vpxor PASS0_SUMS, PASS0_SUMS, PASS0_SUMS
vpxor PASS1_SUMS, PASS1_SUMS, PASS1_SUMS
vpxor PASS2_SUMS, PASS2_SUMS, PASS2_SUMS
vpxor PASS3_SUMS, PASS3_SUMS, PASS3_SUMS
sub $0x40, MESSAGE_LEN
jl .Lloop4_done
.Lloop4:
vmovdqu (MESSAGE), T3
vmovdqu 0x00(KEY), K2
vmovdqu 0x10(KEY), K3
_nh_2xstride K0, K1, K2, K3
vmovdqu 0x20(MESSAGE), T3
vmovdqu 0x20(KEY), K0
vmovdqu 0x30(KEY), K1
_nh_2xstride K2, K3, K0, K1
add $0x40, MESSAGE
add $0x40, KEY
sub $0x40, MESSAGE_LEN
jge .Lloop4
.Lloop4_done:
and $0x3f, MESSAGE_LEN
jz .Ldone
cmp $0x20, MESSAGE_LEN
jl .Llast
// 2 or 3 strides remain; do 2 more.
vmovdqu (MESSAGE), T3
vmovdqu 0x00(KEY), K2
vmovdqu 0x10(KEY), K3
_nh_2xstride K0, K1, K2, K3
add $0x20, MESSAGE
add $0x20, KEY
sub $0x20, MESSAGE_LEN
jz .Ldone
vmovdqa K2, K0
vmovdqa K3, K1
.Llast:
// Last stride. Zero the high 128 bits of the message and keys so they
// don't affect the result when processing them like 2 strides.
vmovdqu (MESSAGE), T3_XMM
vmovdqa K0_XMM, K0_XMM
vmovdqa K1_XMM, K1_XMM
vmovdqu 0x00(KEY), K2_XMM
vmovdqu 0x10(KEY), K3_XMM
_nh_2xstride K0, K1, K2, K3
.Ldone:
// Sum the accumulators for each pass, then store the sums to 'hash'
// PASS0_SUMS is (0A 0B 0C 0D)
// PASS1_SUMS is (1A 1B 1C 1D)
// PASS2_SUMS is (2A 2B 2C 2D)
// PASS3_SUMS is (3A 3B 3C 3D)
// We need the horizontal sums:
// (0A + 0B + 0C + 0D,
// 1A + 1B + 1C + 1D,
// 2A + 2B + 2C + 2D,
// 3A + 3B + 3C + 3D)
//
vpunpcklqdq PASS1_SUMS, PASS0_SUMS, T0 // T0 = (0A 1A 0C 1C)
vpunpckhqdq PASS1_SUMS, PASS0_SUMS, T1 // T1 = (0B 1B 0D 1D)
vpunpcklqdq PASS3_SUMS, PASS2_SUMS, T2 // T2 = (2A 3A 2C 3C)
vpunpckhqdq PASS3_SUMS, PASS2_SUMS, T3 // T3 = (2B 3B 2D 3D)
vinserti128 $0x1, T2_XMM, T0, T4 // T4 = (0A 1A 2A 3A)
vinserti128 $0x1, T3_XMM, T1, T5 // T5 = (0B 1B 2B 3B)
vperm2i128 $0x31, T2, T0, T0 // T0 = (0C 1C 2C 3C)
vperm2i128 $0x31, T3, T1, T1 // T1 = (0D 1D 2D 3D)
vpaddq T5, T4, T4
vpaddq T1, T0, T0
vpaddq T4, T0, T0
vmovdqu T0, (HASH)
ret
ENDPROC(nh_avx2)

View File

@ -0,0 +1,123 @@
/* SPDX-License-Identifier: GPL-2.0 */
/*
* NH - ε-almost-universal hash function, x86_64 SSE2 accelerated
*
* Copyright 2018 Google LLC
*
* Author: Eric Biggers <ebiggers@google.com>
*/
#include <linux/linkage.h>
#define PASS0_SUMS %xmm0
#define PASS1_SUMS %xmm1
#define PASS2_SUMS %xmm2
#define PASS3_SUMS %xmm3
#define K0 %xmm4
#define K1 %xmm5
#define K2 %xmm6
#define K3 %xmm7
#define T0 %xmm8
#define T1 %xmm9
#define T2 %xmm10
#define T3 %xmm11
#define T4 %xmm12
#define T5 %xmm13
#define T6 %xmm14
#define T7 %xmm15
#define KEY %rdi
#define MESSAGE %rsi
#define MESSAGE_LEN %rdx
#define HASH %rcx
.macro _nh_stride k0, k1, k2, k3, offset
// Load next message stride
movdqu \offset(MESSAGE), T1
// Load next key stride
movdqu \offset(KEY), \k3
// Add message words to key words
movdqa T1, T2
movdqa T1, T3
paddd T1, \k0 // reuse k0 to avoid a move
paddd \k1, T1
paddd \k2, T2
paddd \k3, T3
// Multiply 32x32 => 64 and accumulate
pshufd $0x10, \k0, T4
pshufd $0x32, \k0, \k0
pshufd $0x10, T1, T5
pshufd $0x32, T1, T1
pshufd $0x10, T2, T6
pshufd $0x32, T2, T2
pshufd $0x10, T3, T7
pshufd $0x32, T3, T3
pmuludq T4, \k0
pmuludq T5, T1
pmuludq T6, T2
pmuludq T7, T3
paddq \k0, PASS0_SUMS
paddq T1, PASS1_SUMS
paddq T2, PASS2_SUMS
paddq T3, PASS3_SUMS
.endm
/*
* void nh_sse2(const u32 *key, const u8 *message, size_t message_len,
* u8 hash[NH_HASH_BYTES])
*
* It's guaranteed that message_len % 16 == 0.
*/
ENTRY(nh_sse2)
movdqu 0x00(KEY), K0
movdqu 0x10(KEY), K1
movdqu 0x20(KEY), K2
add $0x30, KEY
pxor PASS0_SUMS, PASS0_SUMS
pxor PASS1_SUMS, PASS1_SUMS
pxor PASS2_SUMS, PASS2_SUMS
pxor PASS3_SUMS, PASS3_SUMS
sub $0x40, MESSAGE_LEN
jl .Lloop4_done
.Lloop4:
_nh_stride K0, K1, K2, K3, 0x00
_nh_stride K1, K2, K3, K0, 0x10
_nh_stride K2, K3, K0, K1, 0x20
_nh_stride K3, K0, K1, K2, 0x30
add $0x40, KEY
add $0x40, MESSAGE
sub $0x40, MESSAGE_LEN
jge .Lloop4
.Lloop4_done:
and $0x3f, MESSAGE_LEN
jz .Ldone
_nh_stride K0, K1, K2, K3, 0x00
sub $0x10, MESSAGE_LEN
jz .Ldone
_nh_stride K1, K2, K3, K0, 0x10
sub $0x10, MESSAGE_LEN
jz .Ldone
_nh_stride K2, K3, K0, K1, 0x20
.Ldone:
// Sum the accumulators for each pass, then store the sums to 'hash'
movdqa PASS0_SUMS, T0
movdqa PASS2_SUMS, T1
punpcklqdq PASS1_SUMS, T0 // => (PASS0_SUM_A PASS1_SUM_A)
punpcklqdq PASS3_SUMS, T1 // => (PASS2_SUM_A PASS3_SUM_A)
punpckhqdq PASS1_SUMS, PASS0_SUMS // => (PASS0_SUM_B PASS1_SUM_B)
punpckhqdq PASS3_SUMS, PASS2_SUMS // => (PASS2_SUM_B PASS3_SUM_B)
paddq PASS0_SUMS, T0
paddq PASS2_SUMS, T1
movdqu T0, 0x00(HASH)
movdqu T1, 0x10(HASH)
ret
ENDPROC(nh_sse2)

View File

@ -0,0 +1,77 @@
// SPDX-License-Identifier: GPL-2.0
/*
* NHPoly1305 - ε-almost--universal hash function for Adiantum
* (AVX2 accelerated version)
*
* Copyright 2018 Google LLC
*/
#include <crypto/internal/hash.h>
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
asmlinkage void nh_avx2(const u32 *key, const u8 *message, size_t message_len,
u8 hash[NH_HASH_BYTES]);
/* wrapper to avoid indirect call to assembly, which doesn't work with CFI */
static void _nh_avx2(const u32 *key, const u8 *message, size_t message_len,
__le64 hash[NH_NUM_PASSES])
{
nh_avx2(key, message, message_len, (u8 *)hash);
}
static int nhpoly1305_avx2_update(struct shash_desc *desc,
const u8 *src, unsigned int srclen)
{
if (srclen < 64 || !irq_fpu_usable())
return crypto_nhpoly1305_update(desc, src, srclen);
do {
unsigned int n = min_t(unsigned int, srclen, PAGE_SIZE);
kernel_fpu_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_avx2);
kernel_fpu_end();
src += n;
srclen -= n;
} while (srclen);
return 0;
}
static struct shash_alg nhpoly1305_alg = {
.base.cra_name = "nhpoly1305",
.base.cra_driver_name = "nhpoly1305-avx2",
.base.cra_priority = 300,
.base.cra_ctxsize = sizeof(struct nhpoly1305_key),
.base.cra_module = THIS_MODULE,
.digestsize = POLY1305_DIGEST_SIZE,
.init = crypto_nhpoly1305_init,
.update = nhpoly1305_avx2_update,
.final = crypto_nhpoly1305_final,
.setkey = crypto_nhpoly1305_setkey,
.descsize = sizeof(struct nhpoly1305_state),
};
static int __init nhpoly1305_mod_init(void)
{
if (!boot_cpu_has(X86_FEATURE_AVX2) ||
!boot_cpu_has(X86_FEATURE_OSXSAVE))
return -ENODEV;
return crypto_register_shash(&nhpoly1305_alg);
}
static void __exit nhpoly1305_mod_exit(void)
{
crypto_unregister_shash(&nhpoly1305_alg);
}
module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);
MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function (AVX2-accelerated)");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
MODULE_ALIAS_CRYPTO("nhpoly1305-avx2");

View File

@ -0,0 +1,76 @@
// SPDX-License-Identifier: GPL-2.0
/*
* NHPoly1305 - ε-almost--universal hash function for Adiantum
* (SSE2 accelerated version)
*
* Copyright 2018 Google LLC
*/
#include <crypto/internal/hash.h>
#include <crypto/nhpoly1305.h>
#include <linux/module.h>
#include <asm/fpu/api.h>
asmlinkage void nh_sse2(const u32 *key, const u8 *message, size_t message_len,
u8 hash[NH_HASH_BYTES]);
/* wrapper to avoid indirect call to assembly, which doesn't work with CFI */
static void _nh_sse2(const u32 *key, const u8 *message, size_t message_len,
__le64 hash[NH_NUM_PASSES])
{
nh_sse2(key, message, message_len, (u8 *)hash);
}
static int nhpoly1305_sse2_update(struct shash_desc *desc,
const u8 *src, unsigned int srclen)
{
if (srclen < 64 || !irq_fpu_usable())
return crypto_nhpoly1305_update(desc, src, srclen);
do {
unsigned int n = min_t(unsigned int, srclen, PAGE_SIZE);
kernel_fpu_begin();
crypto_nhpoly1305_update_helper(desc, src, n, _nh_sse2);
kernel_fpu_end();
src += n;
srclen -= n;
} while (srclen);
return 0;
}
static struct shash_alg nhpoly1305_alg = {
.base.cra_name = "nhpoly1305",
.base.cra_driver_name = "nhpoly1305-sse2",
.base.cra_priority = 200,
.base.cra_ctxsize = sizeof(struct nhpoly1305_key),
.base.cra_module = THIS_MODULE,
.digestsize = POLY1305_DIGEST_SIZE,
.init = crypto_nhpoly1305_init,
.update = nhpoly1305_sse2_update,
.final = crypto_nhpoly1305_final,
.setkey = crypto_nhpoly1305_setkey,
.descsize = sizeof(struct nhpoly1305_state),
};
static int __init nhpoly1305_mod_init(void)
{
if (!boot_cpu_has(X86_FEATURE_XMM2))
return -ENODEV;
return crypto_register_shash(&nhpoly1305_alg);
}
static void __exit nhpoly1305_mod_exit(void)
{
crypto_unregister_shash(&nhpoly1305_alg);
}
module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);
MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function (SSE2-accelerated)");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
MODULE_ALIAS_CRYPTO("nhpoly1305-sse2");

View File

@ -83,35 +83,37 @@ static unsigned int poly1305_simd_blocks(struct poly1305_desc_ctx *dctx,
if (poly1305_use_avx2 && srclen >= POLY1305_BLOCK_SIZE * 4) {
if (unlikely(!sctx->wset)) {
if (!sctx->uset) {
memcpy(sctx->u, dctx->r, sizeof(sctx->u));
poly1305_simd_mult(sctx->u, dctx->r);
memcpy(sctx->u, dctx->r.r, sizeof(sctx->u));
poly1305_simd_mult(sctx->u, dctx->r.r);
sctx->uset = true;
}
memcpy(sctx->u + 5, sctx->u, sizeof(sctx->u));
poly1305_simd_mult(sctx->u + 5, dctx->r);
poly1305_simd_mult(sctx->u + 5, dctx->r.r);
memcpy(sctx->u + 10, sctx->u + 5, sizeof(sctx->u));
poly1305_simd_mult(sctx->u + 10, dctx->r);
poly1305_simd_mult(sctx->u + 10, dctx->r.r);
sctx->wset = true;
}
blocks = srclen / (POLY1305_BLOCK_SIZE * 4);
poly1305_4block_avx2(dctx->h, src, dctx->r, blocks, sctx->u);
poly1305_4block_avx2(dctx->h.h, src, dctx->r.r, blocks,
sctx->u);
src += POLY1305_BLOCK_SIZE * 4 * blocks;
srclen -= POLY1305_BLOCK_SIZE * 4 * blocks;
}
#endif
if (likely(srclen >= POLY1305_BLOCK_SIZE * 2)) {
if (unlikely(!sctx->uset)) {
memcpy(sctx->u, dctx->r, sizeof(sctx->u));
poly1305_simd_mult(sctx->u, dctx->r);
memcpy(sctx->u, dctx->r.r, sizeof(sctx->u));
poly1305_simd_mult(sctx->u, dctx->r.r);
sctx->uset = true;
}
blocks = srclen / (POLY1305_BLOCK_SIZE * 2);
poly1305_2block_sse2(dctx->h, src, dctx->r, blocks, sctx->u);
poly1305_2block_sse2(dctx->h.h, src, dctx->r.r, blocks,
sctx->u);
src += POLY1305_BLOCK_SIZE * 2 * blocks;
srclen -= POLY1305_BLOCK_SIZE * 2 * blocks;
}
if (srclen >= POLY1305_BLOCK_SIZE) {
poly1305_block_sse2(dctx->h, src, dctx->r, 1);
poly1305_block_sse2(dctx->h.h, src, dctx->r.r, 1);
srclen -= POLY1305_BLOCK_SIZE;
}
return srclen;

View File

@ -430,11 +430,14 @@ config CRYPTO_CTS
help
CTS: Cipher Text Stealing
This is the Cipher Text Stealing mode as described by
Section 8 of rfc2040 and referenced by rfc3962.
(rfc3962 includes errata information in its Appendix A)
Section 8 of rfc2040 and referenced by rfc3962
(rfc3962 includes errata information in its Appendix A) or
CBC-CS3 as defined by NIST in Sp800-38A addendum from Oct 2010.
This mode is required for Kerberos gss mechanism support
for AES encryption.
See: https://csrc.nist.gov/publications/detail/sp/800-38a/addendum/final
config CRYPTO_ECB
tristate "ECB support"
select CRYPTO_BLKCIPHER
@ -493,6 +496,50 @@ config CRYPTO_KEYWRAP
Support for key wrapping (NIST SP800-38F / RFC3394) without
padding.
config CRYPTO_NHPOLY1305
tristate
select CRYPTO_HASH
select CRYPTO_POLY1305
config CRYPTO_NHPOLY1305_SSE2
tristate "NHPoly1305 hash function (x86_64 SSE2 implementation)"
depends on X86 && 64BIT
select CRYPTO_NHPOLY1305
help
SSE2 optimized implementation of the hash function used by the
Adiantum encryption mode.
config CRYPTO_NHPOLY1305_AVX2
tristate "NHPoly1305 hash function (x86_64 AVX2 implementation)"
depends on X86 && 64BIT
select CRYPTO_NHPOLY1305
help
AVX2 optimized implementation of the hash function used by the
Adiantum encryption mode.
config CRYPTO_ADIANTUM
tristate "Adiantum support"
select CRYPTO_CHACHA20
select CRYPTO_POLY1305
select CRYPTO_NHPOLY1305
help
Adiantum is a tweakable, length-preserving encryption mode
designed for fast and secure disk encryption, especially on
CPUs without dedicated crypto instructions. It encrypts
each sector using the XChaCha12 stream cipher, two passes of
an ε-almost-∆-universal hash function, and an invocation of
the AES-256 block cipher on a single 16-byte block. On CPUs
without AES instructions, Adiantum is much faster than
AES-XTS.
Adiantum's security is provably reducible to that of its
underlying stream and block ciphers, subject to a security
bound. Unlike XTS, Adiantum is a true wide-block encryption
mode, so it actually provides an even stronger notion of
security than XTS, subject to the security bound.
If unsure, say N.
comment "Hash modes"
config CRYPTO_CMAC
@ -936,6 +983,18 @@ config CRYPTO_SM3
http://www.oscca.gov.cn/UpFile/20101222141857786.pdf
https://datatracker.ietf.org/doc/html/draft-shen-sm3-hash
config CRYPTO_STREEBOG
tristate "Streebog Hash Function"
select CRYPTO_HASH
help
Streebog Hash Function (GOST R 34.11-2012, RFC 6986) is one of the Russian
cryptographic standard algorithms (called GOST algorithms).
This setting enables two hash algorithms with 256 and 512 bits output.
References:
https://tc26.ru/upload/iblock/fed/feddbb4d26b685903faa2ba11aea43f6.pdf
https://tools.ietf.org/html/rfc6986
config CRYPTO_TGR192
tristate "Tiger digest algorithms"
select CRYPTO_HASH
@ -1006,7 +1065,8 @@ config CRYPTO_AES_TI
8 for decryption), this implementation only uses just two S-boxes of
256 bytes each, and attempts to eliminate data dependent latencies by
prefetching the entire table into the cache at the start of each
block.
block. Interrupts are also disabled to avoid races where cachelines
are evicted when the CPU is interrupted to do something else.
config CRYPTO_AES_586
tristate "AES cipher algorithms (i586)"
@ -1387,32 +1447,34 @@ config CRYPTO_SALSA20
Bernstein <djb@cr.yp.to>. See <http://cr.yp.to/snuffle.html>
config CRYPTO_CHACHA20
tristate "ChaCha20 cipher algorithm"
tristate "ChaCha stream cipher algorithms"
select CRYPTO_BLKCIPHER
help
ChaCha20 cipher algorithm, RFC7539.
The ChaCha20, XChaCha20, and XChaCha12 stream cipher algorithms.
ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J.
Bernstein and further specified in RFC7539 for use in IETF protocols.
This is the portable C implementation of ChaCha20.
See also:
This is the portable C implementation of ChaCha20. See also:
<http://cr.yp.to/chacha/chacha-20080128.pdf>
XChaCha20 is the application of the XSalsa20 construction to ChaCha20
rather than to Salsa20. XChaCha20 extends ChaCha20's nonce length
from 64 bits (or 96 bits using the RFC7539 convention) to 192 bits,
while provably retaining ChaCha20's security. See also:
<https://cr.yp.to/snuffle/xsalsa-20081128.pdf>
XChaCha12 is XChaCha20 reduced to 12 rounds, with correspondingly
reduced security margin but increased performance. It can be needed
in some performance-sensitive scenarios.
config CRYPTO_CHACHA20_X86_64
tristate "ChaCha20 cipher algorithm (x86_64/SSSE3/AVX2)"
tristate "ChaCha stream cipher algorithms (x86_64/SSSE3/AVX2/AVX-512VL)"
depends on X86 && 64BIT
select CRYPTO_BLKCIPHER
select CRYPTO_CHACHA20
help
ChaCha20 cipher algorithm, RFC7539.
ChaCha20 is a 256-bit high-speed stream cipher designed by Daniel J.
Bernstein and further specified in RFC7539 for use in IETF protocols.
This is the x86_64 assembler implementation using SIMD instructions.
See also:
<http://cr.yp.to/chacha/chacha-20080128.pdf>
SSSE3, AVX2, and AVX-512VL optimized implementations of the ChaCha20,
XChaCha20, and XChaCha12 stream ciphers.
config CRYPTO_SEED
tristate "SEED cipher algorithm"
@ -1812,7 +1874,8 @@ config CRYPTO_USER_API_AEAD
cipher algorithms.
config CRYPTO_STATS
bool
bool "Crypto usage statistics for User-space"
depends on CRYPTO_USER
help
This option enables the gathering of crypto stats.
This will collect:

View File

@ -54,7 +54,8 @@ cryptomgr-y := algboss.o testmgr.o
obj-$(CONFIG_CRYPTO_MANAGER2) += cryptomgr.o
obj-$(CONFIG_CRYPTO_USER) += crypto_user.o
crypto_user-y := crypto_user_base.o crypto_user_stat.o
crypto_user-y := crypto_user_base.o
crypto_user-$(CONFIG_CRYPTO_STATS) += crypto_user_stat.o
obj-$(CONFIG_CRYPTO_CMAC) += cmac.o
obj-$(CONFIG_CRYPTO_HMAC) += hmac.o
obj-$(CONFIG_CRYPTO_VMAC) += vmac.o
@ -71,6 +72,7 @@ obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o
obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o
obj-$(CONFIG_CRYPTO_SHA3) += sha3_generic.o
obj-$(CONFIG_CRYPTO_SM3) += sm3_generic.o
obj-$(CONFIG_CRYPTO_STREEBOG) += streebog_generic.o
obj-$(CONFIG_CRYPTO_WP512) += wp512.o
CFLAGS_wp512.o := $(call cc-option,-fno-schedule-insns) # https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79149
obj-$(CONFIG_CRYPTO_TGR192) += tgr192.o
@ -84,6 +86,8 @@ obj-$(CONFIG_CRYPTO_LRW) += lrw.o
obj-$(CONFIG_CRYPTO_XTS) += xts.o
obj-$(CONFIG_CRYPTO_CTR) += ctr.o
obj-$(CONFIG_CRYPTO_KEYWRAP) += keywrap.o
obj-$(CONFIG_CRYPTO_ADIANTUM) += adiantum.o
obj-$(CONFIG_CRYPTO_NHPOLY1305) += nhpoly1305.o
obj-$(CONFIG_CRYPTO_GCM) += gcm.o
obj-$(CONFIG_CRYPTO_CCM) += ccm.o
obj-$(CONFIG_CRYPTO_CHACHA20POLY1305) += chacha20poly1305.o
@ -116,7 +120,7 @@ obj-$(CONFIG_CRYPTO_KHAZAD) += khazad.o
obj-$(CONFIG_CRYPTO_ANUBIS) += anubis.o
obj-$(CONFIG_CRYPTO_SEED) += seed.o
obj-$(CONFIG_CRYPTO_SALSA20) += salsa20_generic.o
obj-$(CONFIG_CRYPTO_CHACHA20) += chacha20_generic.o
obj-$(CONFIG_CRYPTO_CHACHA20) += chacha_generic.o
obj-$(CONFIG_CRYPTO_POLY1305) += poly1305_generic.o
obj-$(CONFIG_CRYPTO_DEFLATE) += deflate.o
obj-$(CONFIG_CRYPTO_MICHAEL_MIC) += michael_mic.o

View File

@ -365,23 +365,18 @@ static int crypto_ablkcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_blkcipher rblkcipher;
strncpy(rblkcipher.type, "ablkcipher", sizeof(rblkcipher.type));
strncpy(rblkcipher.geniv, alg->cra_ablkcipher.geniv ?: "<default>",
sizeof(rblkcipher.geniv));
rblkcipher.geniv[sizeof(rblkcipher.geniv) - 1] = '\0';
memset(&rblkcipher, 0, sizeof(rblkcipher));
strscpy(rblkcipher.type, "ablkcipher", sizeof(rblkcipher.type));
strscpy(rblkcipher.geniv, "<default>", sizeof(rblkcipher.geniv));
rblkcipher.blocksize = alg->cra_blocksize;
rblkcipher.min_keysize = alg->cra_ablkcipher.min_keysize;
rblkcipher.max_keysize = alg->cra_ablkcipher.max_keysize;
rblkcipher.ivsize = alg->cra_ablkcipher.ivsize;
if (nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
sizeof(struct crypto_report_blkcipher), &rblkcipher))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
sizeof(rblkcipher), &rblkcipher);
}
#else
static int crypto_ablkcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
@ -403,7 +398,7 @@ static void crypto_ablkcipher_show(struct seq_file *m, struct crypto_alg *alg)
seq_printf(m, "min keysize : %u\n", ablkcipher->min_keysize);
seq_printf(m, "max keysize : %u\n", ablkcipher->max_keysize);
seq_printf(m, "ivsize : %u\n", ablkcipher->ivsize);
seq_printf(m, "geniv : %s\n", ablkcipher->geniv ?: "<default>");
seq_printf(m, "geniv : <default>\n");
}
const struct crypto_type crypto_ablkcipher_type = {
@ -415,78 +410,3 @@ const struct crypto_type crypto_ablkcipher_type = {
.report = crypto_ablkcipher_report,
};
EXPORT_SYMBOL_GPL(crypto_ablkcipher_type);
static int crypto_init_givcipher_ops(struct crypto_tfm *tfm, u32 type,
u32 mask)
{
struct ablkcipher_alg *alg = &tfm->__crt_alg->cra_ablkcipher;
struct ablkcipher_tfm *crt = &tfm->crt_ablkcipher;
if (alg->ivsize > PAGE_SIZE / 8)
return -EINVAL;
crt->setkey = tfm->__crt_alg->cra_flags & CRYPTO_ALG_GENIV ?
alg->setkey : setkey;
crt->encrypt = alg->encrypt;
crt->decrypt = alg->decrypt;
crt->base = __crypto_ablkcipher_cast(tfm);
crt->ivsize = alg->ivsize;
return 0;
}
#ifdef CONFIG_NET
static int crypto_givcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_blkcipher rblkcipher;
strncpy(rblkcipher.type, "givcipher", sizeof(rblkcipher.type));
strncpy(rblkcipher.geniv, alg->cra_ablkcipher.geniv ?: "<built-in>",
sizeof(rblkcipher.geniv));
rblkcipher.geniv[sizeof(rblkcipher.geniv) - 1] = '\0';
rblkcipher.blocksize = alg->cra_blocksize;
rblkcipher.min_keysize = alg->cra_ablkcipher.min_keysize;
rblkcipher.max_keysize = alg->cra_ablkcipher.max_keysize;
rblkcipher.ivsize = alg->cra_ablkcipher.ivsize;
if (nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
sizeof(struct crypto_report_blkcipher), &rblkcipher))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
}
#else
static int crypto_givcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
{
return -ENOSYS;
}
#endif
static void crypto_givcipher_show(struct seq_file *m, struct crypto_alg *alg)
__maybe_unused;
static void crypto_givcipher_show(struct seq_file *m, struct crypto_alg *alg)
{
struct ablkcipher_alg *ablkcipher = &alg->cra_ablkcipher;
seq_printf(m, "type : givcipher\n");
seq_printf(m, "async : %s\n", alg->cra_flags & CRYPTO_ALG_ASYNC ?
"yes" : "no");
seq_printf(m, "blocksize : %u\n", alg->cra_blocksize);
seq_printf(m, "min keysize : %u\n", ablkcipher->min_keysize);
seq_printf(m, "max keysize : %u\n", ablkcipher->max_keysize);
seq_printf(m, "ivsize : %u\n", ablkcipher->ivsize);
seq_printf(m, "geniv : %s\n", ablkcipher->geniv ?: "<built-in>");
}
const struct crypto_type crypto_givcipher_type = {
.ctxsize = crypto_ablkcipher_ctxsize,
.init = crypto_init_givcipher_ops,
#ifdef CONFIG_PROC_FS
.show = crypto_givcipher_show,
#endif
.report = crypto_givcipher_report,
};
EXPORT_SYMBOL_GPL(crypto_givcipher_type);

View File

@ -33,15 +33,11 @@ static int crypto_acomp_report(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_acomp racomp;
strncpy(racomp.type, "acomp", sizeof(racomp.type));
memset(&racomp, 0, sizeof(racomp));
if (nla_put(skb, CRYPTOCFGA_REPORT_ACOMP,
sizeof(struct crypto_report_acomp), &racomp))
goto nla_put_failure;
return 0;
strscpy(racomp.type, "acomp", sizeof(racomp.type));
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_REPORT_ACOMP, sizeof(racomp), &racomp);
}
#else
static int crypto_acomp_report(struct sk_buff *skb, struct crypto_alg *alg)

664
crypto/adiantum.c Normal file
View File

@ -0,0 +1,664 @@
// SPDX-License-Identifier: GPL-2.0
/*
* Adiantum length-preserving encryption mode
*
* Copyright 2018 Google LLC
*/
/*
* Adiantum is a tweakable, length-preserving encryption mode designed for fast
* and secure disk encryption, especially on CPUs without dedicated crypto
* instructions. Adiantum encrypts each sector using the XChaCha12 stream
* cipher, two passes of an ε-almost--universal (ε-U) hash function based on
* NH and Poly1305, and an invocation of the AES-256 block cipher on a single
* 16-byte block. See the paper for details:
*
* Adiantum: length-preserving encryption for entry-level processors
* (https://eprint.iacr.org/2018/720.pdf)
*
* For flexibility, this implementation also allows other ciphers:
*
* - Stream cipher: XChaCha12 or XChaCha20
* - Block cipher: any with a 128-bit block size and 256-bit key
*
* This implementation doesn't currently allow other ε-U hash functions, i.e.
* HPolyC is not supported. This is because Adiantum is ~20% faster than HPolyC
* but still provably as secure, and also the ε-U hash function of HBSH is
* formally defined to take two inputs (tweak, message) which makes it difficult
* to wrap with the crypto_shash API. Rather, some details need to be handled
* here. Nevertheless, if needed in the future, support for other ε-U hash
* functions could be added here.
*/
#include <crypto/b128ops.h>
#include <crypto/chacha.h>
#include <crypto/internal/hash.h>
#include <crypto/internal/skcipher.h>
#include <crypto/nhpoly1305.h>
#include <crypto/scatterwalk.h>
#include <linux/module.h>
#include "internal.h"
/*
* Size of right-hand part of input data, in bytes; also the size of the block
* cipher's block size and the hash function's output.
*/
#define BLOCKCIPHER_BLOCK_SIZE 16
/* Size of the block cipher key (K_E) in bytes */
#define BLOCKCIPHER_KEY_SIZE 32
/* Size of the hash key (K_H) in bytes */
#define HASH_KEY_SIZE (POLY1305_BLOCK_SIZE + NHPOLY1305_KEY_SIZE)
/*
* The specification allows variable-length tweaks, but Linux's crypto API
* currently only allows algorithms to support a single length. The "natural"
* tweak length for Adiantum is 16, since that fits into one Poly1305 block for
* the best performance. But longer tweaks are useful for fscrypt, to avoid
* needing to derive per-file keys. So instead we use two blocks, or 32 bytes.
*/
#define TWEAK_SIZE 32
struct adiantum_instance_ctx {
struct crypto_skcipher_spawn streamcipher_spawn;
struct crypto_spawn blockcipher_spawn;
struct crypto_shash_spawn hash_spawn;
};
struct adiantum_tfm_ctx {
struct crypto_skcipher *streamcipher;
struct crypto_cipher *blockcipher;
struct crypto_shash *hash;
struct poly1305_key header_hash_key;
};
struct adiantum_request_ctx {
/*
* Buffer for right-hand part of data, i.e.
*
* P_L => P_M => C_M => C_R when encrypting, or
* C_R => C_M => P_M => P_L when decrypting.
*
* Also used to build the IV for the stream cipher.
*/
union {
u8 bytes[XCHACHA_IV_SIZE];
__le32 words[XCHACHA_IV_SIZE / sizeof(__le32)];
le128 bignum; /* interpret as element of Z/(2^{128}Z) */
} rbuf;
bool enc; /* true if encrypting, false if decrypting */
/*
* The result of the Poly1305 ε-U hash function applied to
* (bulk length, tweak)
*/
le128 header_hash;
/* Sub-requests, must be last */
union {
struct shash_desc hash_desc;
struct skcipher_request streamcipher_req;
} u;
};
/*
* Given the XChaCha stream key K_S, derive the block cipher key K_E and the
* hash key K_H as follows:
*
* K_E || K_H || ... = XChaCha(key=K_S, nonce=1||0^191)
*
* Note that this denotes using bits from the XChaCha keystream, which here we
* get indirectly by encrypting a buffer containing all 0's.
*/
static int adiantum_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keylen)
{
struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
struct {
u8 iv[XCHACHA_IV_SIZE];
u8 derived_keys[BLOCKCIPHER_KEY_SIZE + HASH_KEY_SIZE];
struct scatterlist sg;
struct crypto_wait wait;
struct skcipher_request req; /* must be last */
} *data;
u8 *keyp;
int err;
/* Set the stream cipher key (K_S) */
crypto_skcipher_clear_flags(tctx->streamcipher, CRYPTO_TFM_REQ_MASK);
crypto_skcipher_set_flags(tctx->streamcipher,
crypto_skcipher_get_flags(tfm) &
CRYPTO_TFM_REQ_MASK);
err = crypto_skcipher_setkey(tctx->streamcipher, key, keylen);
crypto_skcipher_set_flags(tfm,
crypto_skcipher_get_flags(tctx->streamcipher) &
CRYPTO_TFM_RES_MASK);
if (err)
return err;
/* Derive the subkeys */
data = kzalloc(sizeof(*data) +
crypto_skcipher_reqsize(tctx->streamcipher), GFP_KERNEL);
if (!data)
return -ENOMEM;
data->iv[0] = 1;
sg_init_one(&data->sg, data->derived_keys, sizeof(data->derived_keys));
crypto_init_wait(&data->wait);
skcipher_request_set_tfm(&data->req, tctx->streamcipher);
skcipher_request_set_callback(&data->req, CRYPTO_TFM_REQ_MAY_SLEEP |
CRYPTO_TFM_REQ_MAY_BACKLOG,
crypto_req_done, &data->wait);
skcipher_request_set_crypt(&data->req, &data->sg, &data->sg,
sizeof(data->derived_keys), data->iv);
err = crypto_wait_req(crypto_skcipher_encrypt(&data->req), &data->wait);
if (err)
goto out;
keyp = data->derived_keys;
/* Set the block cipher key (K_E) */
crypto_cipher_clear_flags(tctx->blockcipher, CRYPTO_TFM_REQ_MASK);
crypto_cipher_set_flags(tctx->blockcipher,
crypto_skcipher_get_flags(tfm) &
CRYPTO_TFM_REQ_MASK);
err = crypto_cipher_setkey(tctx->blockcipher, keyp,
BLOCKCIPHER_KEY_SIZE);
crypto_skcipher_set_flags(tfm,
crypto_cipher_get_flags(tctx->blockcipher) &
CRYPTO_TFM_RES_MASK);
if (err)
goto out;
keyp += BLOCKCIPHER_KEY_SIZE;
/* Set the hash key (K_H) */
poly1305_core_setkey(&tctx->header_hash_key, keyp);
keyp += POLY1305_BLOCK_SIZE;
crypto_shash_clear_flags(tctx->hash, CRYPTO_TFM_REQ_MASK);
crypto_shash_set_flags(tctx->hash, crypto_skcipher_get_flags(tfm) &
CRYPTO_TFM_REQ_MASK);
err = crypto_shash_setkey(tctx->hash, keyp, NHPOLY1305_KEY_SIZE);
crypto_skcipher_set_flags(tfm, crypto_shash_get_flags(tctx->hash) &
CRYPTO_TFM_RES_MASK);
keyp += NHPOLY1305_KEY_SIZE;
WARN_ON(keyp != &data->derived_keys[ARRAY_SIZE(data->derived_keys)]);
out:
kzfree(data);
return err;
}
/* Addition in Z/(2^{128}Z) */
static inline void le128_add(le128 *r, const le128 *v1, const le128 *v2)
{
u64 x = le64_to_cpu(v1->b);
u64 y = le64_to_cpu(v2->b);
r->b = cpu_to_le64(x + y);
r->a = cpu_to_le64(le64_to_cpu(v1->a) + le64_to_cpu(v2->a) +
(x + y < x));
}
/* Subtraction in Z/(2^{128}Z) */
static inline void le128_sub(le128 *r, const le128 *v1, const le128 *v2)
{
u64 x = le64_to_cpu(v1->b);
u64 y = le64_to_cpu(v2->b);
r->b = cpu_to_le64(x - y);
r->a = cpu_to_le64(le64_to_cpu(v1->a) - le64_to_cpu(v2->a) -
(x - y > x));
}
/*
* Apply the Poly1305 ε-U hash function to (bulk length, tweak) and save the
* result to rctx->header_hash. This is the calculation
*
* H_T Poly1305_{K_T}(bin_{128}(|L|) || T)
*
* from the procedure in section 6.4 of the Adiantum paper. The resulting value
* is reused in both the first and second hash steps. Specifically, it's added
* to the result of an independently keyed ε-U hash function (for equal length
* inputs only) taken over the left-hand part (the "bulk") of the message, to
* give the overall Adiantum hash of the (tweak, left-hand part) pair.
*/
static void adiantum_hash_header(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
struct adiantum_request_ctx *rctx = skcipher_request_ctx(req);
const unsigned int bulk_len = req->cryptlen - BLOCKCIPHER_BLOCK_SIZE;
struct {
__le64 message_bits;
__le64 padding;
} header = {
.message_bits = cpu_to_le64((u64)bulk_len * 8)
};
struct poly1305_state state;
poly1305_core_init(&state);
BUILD_BUG_ON(sizeof(header) % POLY1305_BLOCK_SIZE != 0);
poly1305_core_blocks(&state, &tctx->header_hash_key,
&header, sizeof(header) / POLY1305_BLOCK_SIZE);
BUILD_BUG_ON(TWEAK_SIZE % POLY1305_BLOCK_SIZE != 0);
poly1305_core_blocks(&state, &tctx->header_hash_key, req->iv,
TWEAK_SIZE / POLY1305_BLOCK_SIZE);
poly1305_core_emit(&state, &rctx->header_hash);
}
/* Hash the left-hand part (the "bulk") of the message using NHPoly1305 */
static int adiantum_hash_message(struct skcipher_request *req,
struct scatterlist *sgl, le128 *digest)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
struct adiantum_request_ctx *rctx = skcipher_request_ctx(req);
const unsigned int bulk_len = req->cryptlen - BLOCKCIPHER_BLOCK_SIZE;
struct shash_desc *hash_desc = &rctx->u.hash_desc;
struct sg_mapping_iter miter;
unsigned int i, n;
int err;
hash_desc->tfm = tctx->hash;
hash_desc->flags = 0;
err = crypto_shash_init(hash_desc);
if (err)
return err;
sg_miter_start(&miter, sgl, sg_nents(sgl),
SG_MITER_FROM_SG | SG_MITER_ATOMIC);
for (i = 0; i < bulk_len; i += n) {
sg_miter_next(&miter);
n = min_t(unsigned int, miter.length, bulk_len - i);
err = crypto_shash_update(hash_desc, miter.addr, n);
if (err)
break;
}
sg_miter_stop(&miter);
if (err)
return err;
return crypto_shash_final(hash_desc, (u8 *)digest);
}
/* Continue Adiantum encryption/decryption after the stream cipher step */
static int adiantum_finish(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
struct adiantum_request_ctx *rctx = skcipher_request_ctx(req);
const unsigned int bulk_len = req->cryptlen - BLOCKCIPHER_BLOCK_SIZE;
le128 digest;
int err;
/* If decrypting, decrypt C_M with the block cipher to get P_M */
if (!rctx->enc)
crypto_cipher_decrypt_one(tctx->blockcipher, rctx->rbuf.bytes,
rctx->rbuf.bytes);
/*
* Second hash step
* enc: C_R = C_M - H_{K_H}(T, C_L)
* dec: P_R = P_M - H_{K_H}(T, P_L)
*/
err = adiantum_hash_message(req, req->dst, &digest);
if (err)
return err;
le128_add(&digest, &digest, &rctx->header_hash);
le128_sub(&rctx->rbuf.bignum, &rctx->rbuf.bignum, &digest);
scatterwalk_map_and_copy(&rctx->rbuf.bignum, req->dst,
bulk_len, BLOCKCIPHER_BLOCK_SIZE, 1);
return 0;
}
static void adiantum_streamcipher_done(struct crypto_async_request *areq,
int err)
{
struct skcipher_request *req = areq->data;
if (!err)
err = adiantum_finish(req);
skcipher_request_complete(req, err);
}
static int adiantum_crypt(struct skcipher_request *req, bool enc)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
const struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
struct adiantum_request_ctx *rctx = skcipher_request_ctx(req);
const unsigned int bulk_len = req->cryptlen - BLOCKCIPHER_BLOCK_SIZE;
unsigned int stream_len;
le128 digest;
int err;
if (req->cryptlen < BLOCKCIPHER_BLOCK_SIZE)
return -EINVAL;
rctx->enc = enc;
/*
* First hash step
* enc: P_M = P_R + H_{K_H}(T, P_L)
* dec: C_M = C_R + H_{K_H}(T, C_L)
*/
adiantum_hash_header(req);
err = adiantum_hash_message(req, req->src, &digest);
if (err)
return err;
le128_add(&digest, &digest, &rctx->header_hash);
scatterwalk_map_and_copy(&rctx->rbuf.bignum, req->src,
bulk_len, BLOCKCIPHER_BLOCK_SIZE, 0);
le128_add(&rctx->rbuf.bignum, &rctx->rbuf.bignum, &digest);
/* If encrypting, encrypt P_M with the block cipher to get C_M */
if (enc)
crypto_cipher_encrypt_one(tctx->blockcipher, rctx->rbuf.bytes,
rctx->rbuf.bytes);
/* Initialize the rest of the XChaCha IV (first part is C_M) */
BUILD_BUG_ON(BLOCKCIPHER_BLOCK_SIZE != 16);
BUILD_BUG_ON(XCHACHA_IV_SIZE != 32); /* nonce || stream position */
rctx->rbuf.words[4] = cpu_to_le32(1);
rctx->rbuf.words[5] = 0;
rctx->rbuf.words[6] = 0;
rctx->rbuf.words[7] = 0;
/*
* XChaCha needs to be done on all the data except the last 16 bytes;
* for disk encryption that usually means 4080 or 496 bytes. But ChaCha
* implementations tend to be most efficient when passed a whole number
* of 64-byte ChaCha blocks, or sometimes even a multiple of 256 bytes.
* And here it doesn't matter whether the last 16 bytes are written to,
* as the second hash step will overwrite them. Thus, round the XChaCha
* length up to the next 64-byte boundary if possible.
*/
stream_len = bulk_len;
if (round_up(stream_len, CHACHA_BLOCK_SIZE) <= req->cryptlen)
stream_len = round_up(stream_len, CHACHA_BLOCK_SIZE);
skcipher_request_set_tfm(&rctx->u.streamcipher_req, tctx->streamcipher);
skcipher_request_set_crypt(&rctx->u.streamcipher_req, req->src,
req->dst, stream_len, &rctx->rbuf);
skcipher_request_set_callback(&rctx->u.streamcipher_req,
req->base.flags,
adiantum_streamcipher_done, req);
return crypto_skcipher_encrypt(&rctx->u.streamcipher_req) ?:
adiantum_finish(req);
}
static int adiantum_encrypt(struct skcipher_request *req)
{
return adiantum_crypt(req, true);
}
static int adiantum_decrypt(struct skcipher_request *req)
{
return adiantum_crypt(req, false);
}
static int adiantum_init_tfm(struct crypto_skcipher *tfm)
{
struct skcipher_instance *inst = skcipher_alg_instance(tfm);
struct adiantum_instance_ctx *ictx = skcipher_instance_ctx(inst);
struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
struct crypto_skcipher *streamcipher;
struct crypto_cipher *blockcipher;
struct crypto_shash *hash;
unsigned int subreq_size;
int err;
streamcipher = crypto_spawn_skcipher(&ictx->streamcipher_spawn);
if (IS_ERR(streamcipher))
return PTR_ERR(streamcipher);
blockcipher = crypto_spawn_cipher(&ictx->blockcipher_spawn);
if (IS_ERR(blockcipher)) {
err = PTR_ERR(blockcipher);
goto err_free_streamcipher;
}
hash = crypto_spawn_shash(&ictx->hash_spawn);
if (IS_ERR(hash)) {
err = PTR_ERR(hash);
goto err_free_blockcipher;
}
tctx->streamcipher = streamcipher;
tctx->blockcipher = blockcipher;
tctx->hash = hash;
BUILD_BUG_ON(offsetofend(struct adiantum_request_ctx, u) !=
sizeof(struct adiantum_request_ctx));
subreq_size = max(FIELD_SIZEOF(struct adiantum_request_ctx,
u.hash_desc) +
crypto_shash_descsize(hash),
FIELD_SIZEOF(struct adiantum_request_ctx,
u.streamcipher_req) +
crypto_skcipher_reqsize(streamcipher));
crypto_skcipher_set_reqsize(tfm,
offsetof(struct adiantum_request_ctx, u) +
subreq_size);
return 0;
err_free_blockcipher:
crypto_free_cipher(blockcipher);
err_free_streamcipher:
crypto_free_skcipher(streamcipher);
return err;
}
static void adiantum_exit_tfm(struct crypto_skcipher *tfm)
{
struct adiantum_tfm_ctx *tctx = crypto_skcipher_ctx(tfm);
crypto_free_skcipher(tctx->streamcipher);
crypto_free_cipher(tctx->blockcipher);
crypto_free_shash(tctx->hash);
}
static void adiantum_free_instance(struct skcipher_instance *inst)
{
struct adiantum_instance_ctx *ictx = skcipher_instance_ctx(inst);
crypto_drop_skcipher(&ictx->streamcipher_spawn);
crypto_drop_spawn(&ictx->blockcipher_spawn);
crypto_drop_shash(&ictx->hash_spawn);
kfree(inst);
}
/*
* Check for a supported set of inner algorithms.
* See the comment at the beginning of this file.
*/
static bool adiantum_supported_algorithms(struct skcipher_alg *streamcipher_alg,
struct crypto_alg *blockcipher_alg,
struct shash_alg *hash_alg)
{
if (strcmp(streamcipher_alg->base.cra_name, "xchacha12") != 0 &&
strcmp(streamcipher_alg->base.cra_name, "xchacha20") != 0)
return false;
if (blockcipher_alg->cra_cipher.cia_min_keysize > BLOCKCIPHER_KEY_SIZE ||
blockcipher_alg->cra_cipher.cia_max_keysize < BLOCKCIPHER_KEY_SIZE)
return false;
if (blockcipher_alg->cra_blocksize != BLOCKCIPHER_BLOCK_SIZE)
return false;
if (strcmp(hash_alg->base.cra_name, "nhpoly1305") != 0)
return false;
return true;
}
static int adiantum_create(struct crypto_template *tmpl, struct rtattr **tb)
{
struct crypto_attr_type *algt;
const char *streamcipher_name;
const char *blockcipher_name;
const char *nhpoly1305_name;
struct skcipher_instance *inst;
struct adiantum_instance_ctx *ictx;
struct skcipher_alg *streamcipher_alg;
struct crypto_alg *blockcipher_alg;
struct crypto_alg *_hash_alg;
struct shash_alg *hash_alg;
int err;
algt = crypto_get_attr_type(tb);
if (IS_ERR(algt))
return PTR_ERR(algt);
if ((algt->type ^ CRYPTO_ALG_TYPE_SKCIPHER) & algt->mask)
return -EINVAL;
streamcipher_name = crypto_attr_alg_name(tb[1]);
if (IS_ERR(streamcipher_name))
return PTR_ERR(streamcipher_name);
blockcipher_name = crypto_attr_alg_name(tb[2]);
if (IS_ERR(blockcipher_name))
return PTR_ERR(blockcipher_name);
nhpoly1305_name = crypto_attr_alg_name(tb[3]);
if (nhpoly1305_name == ERR_PTR(-ENOENT))
nhpoly1305_name = "nhpoly1305";
if (IS_ERR(nhpoly1305_name))
return PTR_ERR(nhpoly1305_name);
inst = kzalloc(sizeof(*inst) + sizeof(*ictx), GFP_KERNEL);
if (!inst)
return -ENOMEM;
ictx = skcipher_instance_ctx(inst);
/* Stream cipher, e.g. "xchacha12" */
err = crypto_grab_skcipher(&ictx->streamcipher_spawn, streamcipher_name,
0, crypto_requires_sync(algt->type,
algt->mask));
if (err)
goto out_free_inst;
streamcipher_alg = crypto_spawn_skcipher_alg(&ictx->streamcipher_spawn);
/* Block cipher, e.g. "aes" */
err = crypto_grab_spawn(&ictx->blockcipher_spawn, blockcipher_name,
CRYPTO_ALG_TYPE_CIPHER, CRYPTO_ALG_TYPE_MASK);
if (err)
goto out_drop_streamcipher;
blockcipher_alg = ictx->blockcipher_spawn.alg;
/* NHPoly1305 ε-∆U hash function */
_hash_alg = crypto_alg_mod_lookup(nhpoly1305_name,
CRYPTO_ALG_TYPE_SHASH,
CRYPTO_ALG_TYPE_MASK);
if (IS_ERR(_hash_alg)) {
err = PTR_ERR(_hash_alg);
goto out_drop_blockcipher;
}
hash_alg = __crypto_shash_alg(_hash_alg);
err = crypto_init_shash_spawn(&ictx->hash_spawn, hash_alg,
skcipher_crypto_instance(inst));
if (err)
goto out_put_hash;
/* Check the set of algorithms */
if (!adiantum_supported_algorithms(streamcipher_alg, blockcipher_alg,
hash_alg)) {
pr_warn("Unsupported Adiantum instantiation: (%s,%s,%s)\n",
streamcipher_alg->base.cra_name,
blockcipher_alg->cra_name, hash_alg->base.cra_name);
err = -EINVAL;
goto out_drop_hash;
}
/* Instance fields */
err = -ENAMETOOLONG;
if (snprintf(inst->alg.base.cra_name, CRYPTO_MAX_ALG_NAME,
"adiantum(%s,%s)", streamcipher_alg->base.cra_name,
blockcipher_alg->cra_name) >= CRYPTO_MAX_ALG_NAME)
goto out_drop_hash;
if (snprintf(inst->alg.base.cra_driver_name, CRYPTO_MAX_ALG_NAME,
"adiantum(%s,%s,%s)",
streamcipher_alg->base.cra_driver_name,
blockcipher_alg->cra_driver_name,
hash_alg->base.cra_driver_name) >= CRYPTO_MAX_ALG_NAME)
goto out_drop_hash;
inst->alg.base.cra_flags = streamcipher_alg->base.cra_flags &
CRYPTO_ALG_ASYNC;
inst->alg.base.cra_blocksize = BLOCKCIPHER_BLOCK_SIZE;
inst->alg.base.cra_ctxsize = sizeof(struct adiantum_tfm_ctx);
inst->alg.base.cra_alignmask = streamcipher_alg->base.cra_alignmask |
hash_alg->base.cra_alignmask;
/*
* The block cipher is only invoked once per message, so for long
* messages (e.g. sectors for disk encryption) its performance doesn't
* matter as much as that of the stream cipher and hash function. Thus,
* weigh the block cipher's ->cra_priority less.
*/
inst->alg.base.cra_priority = (4 * streamcipher_alg->base.cra_priority +
2 * hash_alg->base.cra_priority +
blockcipher_alg->cra_priority) / 7;
inst->alg.setkey = adiantum_setkey;
inst->alg.encrypt = adiantum_encrypt;
inst->alg.decrypt = adiantum_decrypt;
inst->alg.init = adiantum_init_tfm;
inst->alg.exit = adiantum_exit_tfm;
inst->alg.min_keysize = crypto_skcipher_alg_min_keysize(streamcipher_alg);
inst->alg.max_keysize = crypto_skcipher_alg_max_keysize(streamcipher_alg);
inst->alg.ivsize = TWEAK_SIZE;
inst->free = adiantum_free_instance;
err = skcipher_register_instance(tmpl, inst);
if (err)
goto out_drop_hash;
crypto_mod_put(_hash_alg);
return 0;
out_drop_hash:
crypto_drop_shash(&ictx->hash_spawn);
out_put_hash:
crypto_mod_put(_hash_alg);
out_drop_blockcipher:
crypto_drop_spawn(&ictx->blockcipher_spawn);
out_drop_streamcipher:
crypto_drop_skcipher(&ictx->streamcipher_spawn);
out_free_inst:
kfree(inst);
return err;
}
/* adiantum(streamcipher_name, blockcipher_name [, nhpoly1305_name]) */
static struct crypto_template adiantum_tmpl = {
.name = "adiantum",
.create = adiantum_create,
.module = THIS_MODULE,
};
static int __init adiantum_module_init(void)
{
return crypto_register_template(&adiantum_tmpl);
}
static void __exit adiantum_module_exit(void)
{
crypto_unregister_template(&adiantum_tmpl);
}
module_init(adiantum_module_init);
module_exit(adiantum_module_exit);
MODULE_DESCRIPTION("Adiantum length-preserving encryption mode");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("adiantum");

View File

@ -119,20 +119,16 @@ static int crypto_aead_report(struct sk_buff *skb, struct crypto_alg *alg)
struct crypto_report_aead raead;
struct aead_alg *aead = container_of(alg, struct aead_alg, base);
strncpy(raead.type, "aead", sizeof(raead.type));
strncpy(raead.geniv, "<none>", sizeof(raead.geniv));
memset(&raead, 0, sizeof(raead));
strscpy(raead.type, "aead", sizeof(raead.type));
strscpy(raead.geniv, "<none>", sizeof(raead.geniv));
raead.blocksize = alg->cra_blocksize;
raead.maxauthsize = aead->maxauthsize;
raead.ivsize = aead->ivsize;
if (nla_put(skb, CRYPTOCFGA_REPORT_AEAD,
sizeof(struct crypto_report_aead), &raead))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_REPORT_AEAD, sizeof(raead), &raead);
}
#else
static int crypto_aead_report(struct sk_buff *skb, struct crypto_alg *alg)

View File

@ -63,7 +63,8 @@ static inline u8 byte(const u32 x, const unsigned n)
static const u32 rco_tab[10] = { 1, 2, 4, 8, 16, 32, 64, 128, 27, 54 };
__visible const u32 crypto_ft_tab[4][256] = {
/* cacheline-aligned to facilitate prefetching into cache */
__visible const u32 crypto_ft_tab[4][256] __cacheline_aligned = {
{
0xa56363c6, 0x847c7cf8, 0x997777ee, 0x8d7b7bf6,
0x0df2f2ff, 0xbd6b6bd6, 0xb16f6fde, 0x54c5c591,
@ -327,7 +328,7 @@ __visible const u32 crypto_ft_tab[4][256] = {
}
};
__visible const u32 crypto_fl_tab[4][256] = {
__visible const u32 crypto_fl_tab[4][256] __cacheline_aligned = {
{
0x00000063, 0x0000007c, 0x00000077, 0x0000007b,
0x000000f2, 0x0000006b, 0x0000006f, 0x000000c5,
@ -591,7 +592,7 @@ __visible const u32 crypto_fl_tab[4][256] = {
}
};
__visible const u32 crypto_it_tab[4][256] = {
__visible const u32 crypto_it_tab[4][256] __cacheline_aligned = {
{
0x50a7f451, 0x5365417e, 0xc3a4171a, 0x965e273a,
0xcb6bab3b, 0xf1459d1f, 0xab58faac, 0x9303e34b,
@ -855,7 +856,7 @@ __visible const u32 crypto_it_tab[4][256] = {
}
};
__visible const u32 crypto_il_tab[4][256] = {
__visible const u32 crypto_il_tab[4][256] __cacheline_aligned = {
{
0x00000052, 0x00000009, 0x0000006a, 0x000000d5,
0x00000030, 0x00000036, 0x000000a5, 0x00000038,

View File

@ -269,6 +269,7 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
const u32 *rkp = ctx->key_enc + 4;
int rounds = 6 + ctx->key_length / 4;
u32 st0[4], st1[4];
unsigned long flags;
int round;
st0[0] = ctx->key_enc[0] ^ get_unaligned_le32(in);
@ -276,6 +277,12 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
st0[2] = ctx->key_enc[2] ^ get_unaligned_le32(in + 8);
st0[3] = ctx->key_enc[3] ^ get_unaligned_le32(in + 12);
/*
* Temporarily disable interrupts to avoid races where cachelines are
* evicted when the CPU is interrupted to do something else.
*/
local_irq_save(flags);
st0[0] ^= __aesti_sbox[ 0] ^ __aesti_sbox[128];
st0[1] ^= __aesti_sbox[32] ^ __aesti_sbox[160];
st0[2] ^= __aesti_sbox[64] ^ __aesti_sbox[192];
@ -300,6 +307,8 @@ static void aesti_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
put_unaligned_le32(subshift(st1, 1) ^ rkp[5], out + 4);
put_unaligned_le32(subshift(st1, 2) ^ rkp[6], out + 8);
put_unaligned_le32(subshift(st1, 3) ^ rkp[7], out + 12);
local_irq_restore(flags);
}
static void aesti_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
@ -308,6 +317,7 @@ static void aesti_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
const u32 *rkp = ctx->key_dec + 4;
int rounds = 6 + ctx->key_length / 4;
u32 st0[4], st1[4];
unsigned long flags;
int round;
st0[0] = ctx->key_dec[0] ^ get_unaligned_le32(in);
@ -315,6 +325,12 @@ static void aesti_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
st0[2] = ctx->key_dec[2] ^ get_unaligned_le32(in + 8);
st0[3] = ctx->key_dec[3] ^ get_unaligned_le32(in + 12);
/*
* Temporarily disable interrupts to avoid races where cachelines are
* evicted when the CPU is interrupted to do something else.
*/
local_irq_save(flags);
st0[0] ^= __aesti_inv_sbox[ 0] ^ __aesti_inv_sbox[128];
st0[1] ^= __aesti_inv_sbox[32] ^ __aesti_inv_sbox[160];
st0[2] ^= __aesti_inv_sbox[64] ^ __aesti_inv_sbox[192];
@ -339,6 +355,8 @@ static void aesti_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in)
put_unaligned_le32(inv_subshift(st1, 1) ^ rkp[5], out + 4);
put_unaligned_le32(inv_subshift(st1, 2) ^ rkp[6], out + 8);
put_unaligned_le32(inv_subshift(st1, 3) ^ rkp[7], out + 12);
local_irq_restore(flags);
}
static struct crypto_alg aes_alg = {

View File

@ -364,20 +364,28 @@ static int crypto_ahash_op(struct ahash_request *req,
int crypto_ahash_final(struct ahash_request *req)
{
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
struct crypto_alg *alg = tfm->base.__crt_alg;
unsigned int nbytes = req->nbytes;
int ret;
crypto_stats_get(alg);
ret = crypto_ahash_op(req, crypto_ahash_reqtfm(req)->final);
crypto_stat_ahash_final(req, ret);
crypto_stats_ahash_final(nbytes, ret, alg);
return ret;
}
EXPORT_SYMBOL_GPL(crypto_ahash_final);
int crypto_ahash_finup(struct ahash_request *req)
{
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
struct crypto_alg *alg = tfm->base.__crt_alg;
unsigned int nbytes = req->nbytes;
int ret;
crypto_stats_get(alg);
ret = crypto_ahash_op(req, crypto_ahash_reqtfm(req)->finup);
crypto_stat_ahash_final(req, ret);
crypto_stats_ahash_final(nbytes, ret, alg);
return ret;
}
EXPORT_SYMBOL_GPL(crypto_ahash_finup);
@ -385,13 +393,16 @@ EXPORT_SYMBOL_GPL(crypto_ahash_finup);
int crypto_ahash_digest(struct ahash_request *req)
{
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
struct crypto_alg *alg = tfm->base.__crt_alg;
unsigned int nbytes = req->nbytes;
int ret;
crypto_stats_get(alg);
if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
ret = -ENOKEY;
else
ret = crypto_ahash_op(req, tfm->digest);
crypto_stat_ahash_final(req, ret);
crypto_stats_ahash_final(nbytes, ret, alg);
return ret;
}
EXPORT_SYMBOL_GPL(crypto_ahash_digest);
@ -498,18 +509,14 @@ static int crypto_ahash_report(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_hash rhash;
strncpy(rhash.type, "ahash", sizeof(rhash.type));
memset(&rhash, 0, sizeof(rhash));
strscpy(rhash.type, "ahash", sizeof(rhash.type));
rhash.blocksize = alg->cra_blocksize;
rhash.digestsize = __crypto_hash_alg_common(alg)->digestsize;
if (nla_put(skb, CRYPTOCFGA_REPORT_HASH,
sizeof(struct crypto_report_hash), &rhash))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_REPORT_HASH, sizeof(rhash), &rhash);
}
#else
static int crypto_ahash_report(struct sk_buff *skb, struct crypto_alg *alg)

View File

@ -30,15 +30,12 @@ static int crypto_akcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_akcipher rakcipher;
strncpy(rakcipher.type, "akcipher", sizeof(rakcipher.type));
memset(&rakcipher, 0, sizeof(rakcipher));
if (nla_put(skb, CRYPTOCFGA_REPORT_AKCIPHER,
sizeof(struct crypto_report_akcipher), &rakcipher))
goto nla_put_failure;
return 0;
strscpy(rakcipher.type, "akcipher", sizeof(rakcipher.type));
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_REPORT_AKCIPHER,
sizeof(rakcipher), &rakcipher);
}
#else
static int crypto_akcipher_report(struct sk_buff *skb, struct crypto_alg *alg)

View File

@ -258,13 +258,7 @@ static struct crypto_larval *__crypto_register_alg(struct crypto_alg *alg)
list_add(&alg->cra_list, &crypto_alg_list);
list_add(&larval->alg.cra_list, &crypto_alg_list);
atomic_set(&alg->encrypt_cnt, 0);
atomic_set(&alg->decrypt_cnt, 0);
atomic64_set(&alg->encrypt_tlen, 0);
atomic64_set(&alg->decrypt_tlen, 0);
atomic_set(&alg->verify_cnt, 0);
atomic_set(&alg->cipher_err_cnt, 0);
atomic_set(&alg->sign_cnt, 0);
crypto_stats_init(alg);
out:
return larval;
@ -1076,6 +1070,245 @@ int crypto_type_has_alg(const char *name, const struct crypto_type *frontend,
}
EXPORT_SYMBOL_GPL(crypto_type_has_alg);
#ifdef CONFIG_CRYPTO_STATS
void crypto_stats_init(struct crypto_alg *alg)
{
memset(&alg->stats, 0, sizeof(alg->stats));
}
EXPORT_SYMBOL_GPL(crypto_stats_init);
void crypto_stats_get(struct crypto_alg *alg)
{
crypto_alg_get(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_get);
void crypto_stats_ablkcipher_encrypt(unsigned int nbytes, int ret,
struct crypto_alg *alg)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
atomic64_inc(&alg->stats.cipher.err_cnt);
} else {
atomic64_inc(&alg->stats.cipher.encrypt_cnt);
atomic64_add(nbytes, &alg->stats.cipher.encrypt_tlen);
}
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_ablkcipher_encrypt);
void crypto_stats_ablkcipher_decrypt(unsigned int nbytes, int ret,
struct crypto_alg *alg)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
atomic64_inc(&alg->stats.cipher.err_cnt);
} else {
atomic64_inc(&alg->stats.cipher.decrypt_cnt);
atomic64_add(nbytes, &alg->stats.cipher.decrypt_tlen);
}
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_ablkcipher_decrypt);
void crypto_stats_aead_encrypt(unsigned int cryptlen, struct crypto_alg *alg,
int ret)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
atomic64_inc(&alg->stats.aead.err_cnt);
} else {
atomic64_inc(&alg->stats.aead.encrypt_cnt);
atomic64_add(cryptlen, &alg->stats.aead.encrypt_tlen);
}
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_aead_encrypt);
void crypto_stats_aead_decrypt(unsigned int cryptlen, struct crypto_alg *alg,
int ret)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
atomic64_inc(&alg->stats.aead.err_cnt);
} else {
atomic64_inc(&alg->stats.aead.decrypt_cnt);
atomic64_add(cryptlen, &alg->stats.aead.decrypt_tlen);
}
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_aead_decrypt);
void crypto_stats_akcipher_encrypt(unsigned int src_len, int ret,
struct crypto_alg *alg)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
atomic64_inc(&alg->stats.akcipher.err_cnt);
} else {
atomic64_inc(&alg->stats.akcipher.encrypt_cnt);
atomic64_add(src_len, &alg->stats.akcipher.encrypt_tlen);
}
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_akcipher_encrypt);
void crypto_stats_akcipher_decrypt(unsigned int src_len, int ret,
struct crypto_alg *alg)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
atomic64_inc(&alg->stats.akcipher.err_cnt);
} else {
atomic64_inc(&alg->stats.akcipher.decrypt_cnt);
atomic64_add(src_len, &alg->stats.akcipher.decrypt_tlen);
}
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_akcipher_decrypt);
void crypto_stats_akcipher_sign(int ret, struct crypto_alg *alg)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY)
atomic64_inc(&alg->stats.akcipher.err_cnt);
else
atomic64_inc(&alg->stats.akcipher.sign_cnt);
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_akcipher_sign);
void crypto_stats_akcipher_verify(int ret, struct crypto_alg *alg)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY)
atomic64_inc(&alg->stats.akcipher.err_cnt);
else
atomic64_inc(&alg->stats.akcipher.verify_cnt);
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_akcipher_verify);
void crypto_stats_compress(unsigned int slen, int ret, struct crypto_alg *alg)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
atomic64_inc(&alg->stats.compress.err_cnt);
} else {
atomic64_inc(&alg->stats.compress.compress_cnt);
atomic64_add(slen, &alg->stats.compress.compress_tlen);
}
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_compress);
void crypto_stats_decompress(unsigned int slen, int ret, struct crypto_alg *alg)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
atomic64_inc(&alg->stats.compress.err_cnt);
} else {
atomic64_inc(&alg->stats.compress.decompress_cnt);
atomic64_add(slen, &alg->stats.compress.decompress_tlen);
}
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_decompress);
void crypto_stats_ahash_update(unsigned int nbytes, int ret,
struct crypto_alg *alg)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY)
atomic64_inc(&alg->stats.hash.err_cnt);
else
atomic64_add(nbytes, &alg->stats.hash.hash_tlen);
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_ahash_update);
void crypto_stats_ahash_final(unsigned int nbytes, int ret,
struct crypto_alg *alg)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
atomic64_inc(&alg->stats.hash.err_cnt);
} else {
atomic64_inc(&alg->stats.hash.hash_cnt);
atomic64_add(nbytes, &alg->stats.hash.hash_tlen);
}
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_ahash_final);
void crypto_stats_kpp_set_secret(struct crypto_alg *alg, int ret)
{
if (ret)
atomic64_inc(&alg->stats.kpp.err_cnt);
else
atomic64_inc(&alg->stats.kpp.setsecret_cnt);
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_kpp_set_secret);
void crypto_stats_kpp_generate_public_key(struct crypto_alg *alg, int ret)
{
if (ret)
atomic64_inc(&alg->stats.kpp.err_cnt);
else
atomic64_inc(&alg->stats.kpp.generate_public_key_cnt);
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_kpp_generate_public_key);
void crypto_stats_kpp_compute_shared_secret(struct crypto_alg *alg, int ret)
{
if (ret)
atomic64_inc(&alg->stats.kpp.err_cnt);
else
atomic64_inc(&alg->stats.kpp.compute_shared_secret_cnt);
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_kpp_compute_shared_secret);
void crypto_stats_rng_seed(struct crypto_alg *alg, int ret)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY)
atomic64_inc(&alg->stats.rng.err_cnt);
else
atomic64_inc(&alg->stats.rng.seed_cnt);
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_rng_seed);
void crypto_stats_rng_generate(struct crypto_alg *alg, unsigned int dlen,
int ret)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
atomic64_inc(&alg->stats.rng.err_cnt);
} else {
atomic64_inc(&alg->stats.rng.generate_cnt);
atomic64_add(dlen, &alg->stats.rng.generate_tlen);
}
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_rng_generate);
void crypto_stats_skcipher_encrypt(unsigned int cryptlen, int ret,
struct crypto_alg *alg)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
atomic64_inc(&alg->stats.cipher.err_cnt);
} else {
atomic64_inc(&alg->stats.cipher.encrypt_cnt);
atomic64_add(cryptlen, &alg->stats.cipher.encrypt_tlen);
}
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_skcipher_encrypt);
void crypto_stats_skcipher_decrypt(unsigned int cryptlen, int ret,
struct crypto_alg *alg)
{
if (ret && ret != -EINPROGRESS && ret != -EBUSY) {
atomic64_inc(&alg->stats.cipher.err_cnt);
} else {
atomic64_inc(&alg->stats.cipher.decrypt_cnt);
atomic64_add(cryptlen, &alg->stats.cipher.decrypt_tlen);
}
crypto_alg_put(alg);
}
EXPORT_SYMBOL_GPL(crypto_stats_skcipher_decrypt);
#endif
static int __init crypto_algapi_init(void)
{
crypto_init_proc();

View File

@ -507,23 +507,18 @@ static int crypto_blkcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_blkcipher rblkcipher;
strncpy(rblkcipher.type, "blkcipher", sizeof(rblkcipher.type));
strncpy(rblkcipher.geniv, alg->cra_blkcipher.geniv ?: "<default>",
sizeof(rblkcipher.geniv));
rblkcipher.geniv[sizeof(rblkcipher.geniv) - 1] = '\0';
memset(&rblkcipher, 0, sizeof(rblkcipher));
strscpy(rblkcipher.type, "blkcipher", sizeof(rblkcipher.type));
strscpy(rblkcipher.geniv, "<default>", sizeof(rblkcipher.geniv));
rblkcipher.blocksize = alg->cra_blocksize;
rblkcipher.min_keysize = alg->cra_blkcipher.min_keysize;
rblkcipher.max_keysize = alg->cra_blkcipher.max_keysize;
rblkcipher.ivsize = alg->cra_blkcipher.ivsize;
if (nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
sizeof(struct crypto_report_blkcipher), &rblkcipher))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
sizeof(rblkcipher), &rblkcipher);
}
#else
static int crypto_blkcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
@ -541,8 +536,7 @@ static void crypto_blkcipher_show(struct seq_file *m, struct crypto_alg *alg)
seq_printf(m, "min keysize : %u\n", alg->cra_blkcipher.min_keysize);
seq_printf(m, "max keysize : %u\n", alg->cra_blkcipher.max_keysize);
seq_printf(m, "ivsize : %u\n", alg->cra_blkcipher.ivsize);
seq_printf(m, "geniv : %s\n", alg->cra_blkcipher.geniv ?:
"<default>");
seq_printf(m, "geniv : <default>\n");
}
const struct crypto_type crypto_blkcipher_type = {

View File

@ -144,7 +144,7 @@ static int crypto_cfb_decrypt_segment(struct skcipher_walk *walk,
do {
crypto_cfb_encrypt_one(tfm, iv, dst);
crypto_xor(dst, iv, bsize);
crypto_xor(dst, src, bsize);
iv = src;
src += bsize;

View File

@ -1,137 +0,0 @@
/*
* ChaCha20 256-bit cipher algorithm, RFC7539
*
* Copyright (C) 2015 Martin Willi
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*/
#include <asm/unaligned.h>
#include <crypto/algapi.h>
#include <crypto/chacha20.h>
#include <crypto/internal/skcipher.h>
#include <linux/module.h>
static void chacha20_docrypt(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes)
{
/* aligned to potentially speed up crypto_xor() */
u8 stream[CHACHA20_BLOCK_SIZE] __aligned(sizeof(long));
if (dst != src)
memcpy(dst, src, bytes);
while (bytes >= CHACHA20_BLOCK_SIZE) {
chacha20_block(state, stream);
crypto_xor(dst, stream, CHACHA20_BLOCK_SIZE);
bytes -= CHACHA20_BLOCK_SIZE;
dst += CHACHA20_BLOCK_SIZE;
}
if (bytes) {
chacha20_block(state, stream);
crypto_xor(dst, stream, bytes);
}
}
void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv)
{
state[0] = 0x61707865; /* "expa" */
state[1] = 0x3320646e; /* "nd 3" */
state[2] = 0x79622d32; /* "2-by" */
state[3] = 0x6b206574; /* "te k" */
state[4] = ctx->key[0];
state[5] = ctx->key[1];
state[6] = ctx->key[2];
state[7] = ctx->key[3];
state[8] = ctx->key[4];
state[9] = ctx->key[5];
state[10] = ctx->key[6];
state[11] = ctx->key[7];
state[12] = get_unaligned_le32(iv + 0);
state[13] = get_unaligned_le32(iv + 4);
state[14] = get_unaligned_le32(iv + 8);
state[15] = get_unaligned_le32(iv + 12);
}
EXPORT_SYMBOL_GPL(crypto_chacha20_init);
int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keysize)
{
struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
int i;
if (keysize != CHACHA20_KEY_SIZE)
return -EINVAL;
for (i = 0; i < ARRAY_SIZE(ctx->key); i++)
ctx->key[i] = get_unaligned_le32(key + i * sizeof(u32));
return 0;
}
EXPORT_SYMBOL_GPL(crypto_chacha20_setkey);
int crypto_chacha20_crypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
struct skcipher_walk walk;
u32 state[16];
int err;
err = skcipher_walk_virt(&walk, req, true);
crypto_chacha20_init(state, ctx, walk.iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
if (nbytes < walk.total)
nbytes = round_down(nbytes, walk.stride);
chacha20_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr,
nbytes);
err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
return err;
}
EXPORT_SYMBOL_GPL(crypto_chacha20_crypt);
static struct skcipher_alg alg = {
.base.cra_name = "chacha20",
.base.cra_driver_name = "chacha20-generic",
.base.cra_priority = 100,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha20_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA20_KEY_SIZE,
.max_keysize = CHACHA20_KEY_SIZE,
.ivsize = CHACHA20_IV_SIZE,
.chunksize = CHACHA20_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = crypto_chacha20_crypt,
.decrypt = crypto_chacha20_crypt,
};
static int __init chacha20_generic_mod_init(void)
{
return crypto_register_skcipher(&alg);
}
static void __exit chacha20_generic_mod_fini(void)
{
crypto_unregister_skcipher(&alg);
}
module_init(chacha20_generic_mod_init);
module_exit(chacha20_generic_mod_fini);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
MODULE_DESCRIPTION("chacha20 cipher algorithm");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-generic");

View File

@ -13,7 +13,7 @@
#include <crypto/internal/hash.h>
#include <crypto/internal/skcipher.h>
#include <crypto/scatterwalk.h>
#include <crypto/chacha20.h>
#include <crypto/chacha.h>
#include <crypto/poly1305.h>
#include <linux/err.h>
#include <linux/init.h>
@ -22,8 +22,6 @@
#include "internal.h"
#define CHACHAPOLY_IV_SIZE 12
struct chachapoly_instance_ctx {
struct crypto_skcipher_spawn chacha;
struct crypto_ahash_spawn poly;
@ -51,7 +49,7 @@ struct poly_req {
};
struct chacha_req {
u8 iv[CHACHA20_IV_SIZE];
u8 iv[CHACHA_IV_SIZE];
struct scatterlist src[1];
struct skcipher_request req; /* must be last member */
};
@ -91,7 +89,7 @@ static void chacha_iv(u8 *iv, struct aead_request *req, u32 icb)
memcpy(iv, &leicb, sizeof(leicb));
memcpy(iv + sizeof(leicb), ctx->salt, ctx->saltlen);
memcpy(iv + sizeof(leicb) + ctx->saltlen, req->iv,
CHACHA20_IV_SIZE - sizeof(leicb) - ctx->saltlen);
CHACHA_IV_SIZE - sizeof(leicb) - ctx->saltlen);
}
static int poly_verify_tag(struct aead_request *req)
@ -494,7 +492,7 @@ static int chachapoly_setkey(struct crypto_aead *aead, const u8 *key,
struct chachapoly_ctx *ctx = crypto_aead_ctx(aead);
int err;
if (keylen != ctx->saltlen + CHACHA20_KEY_SIZE)
if (keylen != ctx->saltlen + CHACHA_KEY_SIZE)
return -EINVAL;
keylen -= ctx->saltlen;
@ -639,7 +637,7 @@ static int chachapoly_create(struct crypto_template *tmpl, struct rtattr **tb,
err = -EINVAL;
/* Need 16-byte IV size, including Initial Block Counter value */
if (crypto_skcipher_alg_ivsize(chacha) != CHACHA20_IV_SIZE)
if (crypto_skcipher_alg_ivsize(chacha) != CHACHA_IV_SIZE)
goto out_drop_chacha;
/* Not a stream cipher? */
if (chacha->base.cra_blocksize != 1)

217
crypto/chacha_generic.c Normal file
View File

@ -0,0 +1,217 @@
/*
* ChaCha and XChaCha stream ciphers, including ChaCha20 (RFC7539)
*
* Copyright (C) 2015 Martin Willi
* Copyright (C) 2018 Google LLC
*
* This program is free software; you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation; either version 2 of the License, or
* (at your option) any later version.
*/
#include <asm/unaligned.h>
#include <crypto/algapi.h>
#include <crypto/chacha.h>
#include <crypto/internal/skcipher.h>
#include <linux/module.h>
static void chacha_docrypt(u32 *state, u8 *dst, const u8 *src,
unsigned int bytes, int nrounds)
{
/* aligned to potentially speed up crypto_xor() */
u8 stream[CHACHA_BLOCK_SIZE] __aligned(sizeof(long));
if (dst != src)
memcpy(dst, src, bytes);
while (bytes >= CHACHA_BLOCK_SIZE) {
chacha_block(state, stream, nrounds);
crypto_xor(dst, stream, CHACHA_BLOCK_SIZE);
bytes -= CHACHA_BLOCK_SIZE;
dst += CHACHA_BLOCK_SIZE;
}
if (bytes) {
chacha_block(state, stream, nrounds);
crypto_xor(dst, stream, bytes);
}
}
static int chacha_stream_xor(struct skcipher_request *req,
struct chacha_ctx *ctx, u8 *iv)
{
struct skcipher_walk walk;
u32 state[16];
int err;
err = skcipher_walk_virt(&walk, req, false);
crypto_chacha_init(state, ctx, iv);
while (walk.nbytes > 0) {
unsigned int nbytes = walk.nbytes;
if (nbytes < walk.total)
nbytes = round_down(nbytes, walk.stride);
chacha_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr,
nbytes, ctx->nrounds);
err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
}
return err;
}
void crypto_chacha_init(u32 *state, struct chacha_ctx *ctx, u8 *iv)
{
state[0] = 0x61707865; /* "expa" */
state[1] = 0x3320646e; /* "nd 3" */
state[2] = 0x79622d32; /* "2-by" */
state[3] = 0x6b206574; /* "te k" */
state[4] = ctx->key[0];
state[5] = ctx->key[1];
state[6] = ctx->key[2];
state[7] = ctx->key[3];
state[8] = ctx->key[4];
state[9] = ctx->key[5];
state[10] = ctx->key[6];
state[11] = ctx->key[7];
state[12] = get_unaligned_le32(iv + 0);
state[13] = get_unaligned_le32(iv + 4);
state[14] = get_unaligned_le32(iv + 8);
state[15] = get_unaligned_le32(iv + 12);
}
EXPORT_SYMBOL_GPL(crypto_chacha_init);
static int chacha_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keysize, int nrounds)
{
struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
int i;
if (keysize != CHACHA_KEY_SIZE)
return -EINVAL;
for (i = 0; i < ARRAY_SIZE(ctx->key); i++)
ctx->key[i] = get_unaligned_le32(key + i * sizeof(u32));
ctx->nrounds = nrounds;
return 0;
}
int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keysize)
{
return chacha_setkey(tfm, key, keysize, 20);
}
EXPORT_SYMBOL_GPL(crypto_chacha20_setkey);
int crypto_chacha12_setkey(struct crypto_skcipher *tfm, const u8 *key,
unsigned int keysize)
{
return chacha_setkey(tfm, key, keysize, 12);
}
EXPORT_SYMBOL_GPL(crypto_chacha12_setkey);
int crypto_chacha_crypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
return chacha_stream_xor(req, ctx, req->iv);
}
EXPORT_SYMBOL_GPL(crypto_chacha_crypt);
int crypto_xchacha_crypt(struct skcipher_request *req)
{
struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
struct chacha_ctx *ctx = crypto_skcipher_ctx(tfm);
struct chacha_ctx subctx;
u32 state[16];
u8 real_iv[16];
/* Compute the subkey given the original key and first 128 nonce bits */
crypto_chacha_init(state, ctx, req->iv);
hchacha_block(state, subctx.key, ctx->nrounds);
subctx.nrounds = ctx->nrounds;
/* Build the real IV */
memcpy(&real_iv[0], req->iv + 24, 8); /* stream position */
memcpy(&real_iv[8], req->iv + 16, 8); /* remaining 64 nonce bits */
/* Generate the stream and XOR it with the data */
return chacha_stream_xor(req, &subctx, real_iv);
}
EXPORT_SYMBOL_GPL(crypto_xchacha_crypt);
static struct skcipher_alg algs[] = {
{
.base.cra_name = "chacha20",
.base.cra_driver_name = "chacha20-generic",
.base.cra_priority = 100,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = CHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = crypto_chacha_crypt,
.decrypt = crypto_chacha_crypt,
}, {
.base.cra_name = "xchacha20",
.base.cra_driver_name = "xchacha20-generic",
.base.cra_priority = 100,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = XCHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha20_setkey,
.encrypt = crypto_xchacha_crypt,
.decrypt = crypto_xchacha_crypt,
}, {
.base.cra_name = "xchacha12",
.base.cra_driver_name = "xchacha12-generic",
.base.cra_priority = 100,
.base.cra_blocksize = 1,
.base.cra_ctxsize = sizeof(struct chacha_ctx),
.base.cra_module = THIS_MODULE,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = XCHACHA_IV_SIZE,
.chunksize = CHACHA_BLOCK_SIZE,
.setkey = crypto_chacha12_setkey,
.encrypt = crypto_xchacha_crypt,
.decrypt = crypto_xchacha_crypt,
}
};
static int __init chacha_generic_mod_init(void)
{
return crypto_register_skciphers(algs, ARRAY_SIZE(algs));
}
static void __exit chacha_generic_mod_fini(void)
{
crypto_unregister_skciphers(algs, ARRAY_SIZE(algs));
}
module_init(chacha_generic_mod_init);
module_exit(chacha_generic_mod_fini);
MODULE_LICENSE("GPL");
MODULE_AUTHOR("Martin Willi <martin@strongswan.org>");
MODULE_DESCRIPTION("ChaCha and XChaCha stream ciphers (generic)");
MODULE_ALIAS_CRYPTO("chacha20");
MODULE_ALIAS_CRYPTO("chacha20-generic");
MODULE_ALIAS_CRYPTO("xchacha20");
MODULE_ALIAS_CRYPTO("xchacha20-generic");
MODULE_ALIAS_CRYPTO("xchacha12");
MODULE_ALIAS_CRYPTO("xchacha12-generic");

View File

@ -422,8 +422,6 @@ static int cryptd_create_blkcipher(struct crypto_template *tmpl,
inst->alg.cra_ablkcipher.min_keysize = alg->cra_blkcipher.min_keysize;
inst->alg.cra_ablkcipher.max_keysize = alg->cra_blkcipher.max_keysize;
inst->alg.cra_ablkcipher.geniv = alg->cra_blkcipher.geniv;
inst->alg.cra_ctxsize = sizeof(struct cryptd_blkcipher_ctx);
inst->alg.cra_init = cryptd_blkcipher_init_tfm;
@ -1174,7 +1172,7 @@ struct cryptd_ablkcipher *cryptd_alloc_ablkcipher(const char *alg_name,
return ERR_PTR(-EINVAL);
type = crypto_skcipher_type(type);
mask &= ~CRYPTO_ALG_TYPE_MASK;
mask |= (CRYPTO_ALG_GENIV | CRYPTO_ALG_TYPE_BLKCIPHER_MASK);
mask |= CRYPTO_ALG_TYPE_BLKCIPHER_MASK;
tfm = crypto_alloc_base(cryptd_alg_name, type, mask);
if (IS_ERR(tfm))
return ERR_CAST(tfm);

View File

@ -84,87 +84,38 @@ static int crypto_report_cipher(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_cipher rcipher;
strncpy(rcipher.type, "cipher", sizeof(rcipher.type));
memset(&rcipher, 0, sizeof(rcipher));
strscpy(rcipher.type, "cipher", sizeof(rcipher.type));
rcipher.blocksize = alg->cra_blocksize;
rcipher.min_keysize = alg->cra_cipher.cia_min_keysize;
rcipher.max_keysize = alg->cra_cipher.cia_max_keysize;
if (nla_put(skb, CRYPTOCFGA_REPORT_CIPHER,
sizeof(struct crypto_report_cipher), &rcipher))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_REPORT_CIPHER,
sizeof(rcipher), &rcipher);
}
static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_comp rcomp;
strncpy(rcomp.type, "compression", sizeof(rcomp.type));
if (nla_put(skb, CRYPTOCFGA_REPORT_COMPRESS,
sizeof(struct crypto_report_comp), &rcomp))
goto nla_put_failure;
return 0;
memset(&rcomp, 0, sizeof(rcomp));
nla_put_failure:
return -EMSGSIZE;
}
strscpy(rcomp.type, "compression", sizeof(rcomp.type));
static int crypto_report_acomp(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_acomp racomp;
strncpy(racomp.type, "acomp", sizeof(racomp.type));
if (nla_put(skb, CRYPTOCFGA_REPORT_ACOMP,
sizeof(struct crypto_report_acomp), &racomp))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
}
static int crypto_report_akcipher(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_akcipher rakcipher;
strncpy(rakcipher.type, "akcipher", sizeof(rakcipher.type));
if (nla_put(skb, CRYPTOCFGA_REPORT_AKCIPHER,
sizeof(struct crypto_report_akcipher), &rakcipher))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
}
static int crypto_report_kpp(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_kpp rkpp;
strncpy(rkpp.type, "kpp", sizeof(rkpp.type));
if (nla_put(skb, CRYPTOCFGA_REPORT_KPP,
sizeof(struct crypto_report_kpp), &rkpp))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_REPORT_COMPRESS, sizeof(rcomp), &rcomp);
}
static int crypto_report_one(struct crypto_alg *alg,
struct crypto_user_alg *ualg, struct sk_buff *skb)
{
strncpy(ualg->cru_name, alg->cra_name, sizeof(ualg->cru_name));
strncpy(ualg->cru_driver_name, alg->cra_driver_name,
memset(ualg, 0, sizeof(*ualg));
strscpy(ualg->cru_name, alg->cra_name, sizeof(ualg->cru_name));
strscpy(ualg->cru_driver_name, alg->cra_driver_name,
sizeof(ualg->cru_driver_name));
strncpy(ualg->cru_module_name, module_name(alg->cra_module),
strscpy(ualg->cru_module_name, module_name(alg->cra_module),
sizeof(ualg->cru_module_name));
ualg->cru_type = 0;
@ -177,9 +128,9 @@ static int crypto_report_one(struct crypto_alg *alg,
if (alg->cra_flags & CRYPTO_ALG_LARVAL) {
struct crypto_report_larval rl;
strncpy(rl.type, "larval", sizeof(rl.type));
if (nla_put(skb, CRYPTOCFGA_REPORT_LARVAL,
sizeof(struct crypto_report_larval), &rl))
memset(&rl, 0, sizeof(rl));
strscpy(rl.type, "larval", sizeof(rl.type));
if (nla_put(skb, CRYPTOCFGA_REPORT_LARVAL, sizeof(rl), &rl))
goto nla_put_failure;
goto out;
}
@ -202,20 +153,6 @@ static int crypto_report_one(struct crypto_alg *alg,
goto nla_put_failure;
break;
case CRYPTO_ALG_TYPE_ACOMPRESS:
if (crypto_report_acomp(skb, alg))
goto nla_put_failure;
break;
case CRYPTO_ALG_TYPE_AKCIPHER:
if (crypto_report_akcipher(skb, alg))
goto nla_put_failure;
break;
case CRYPTO_ALG_TYPE_KPP:
if (crypto_report_kpp(skb, alg))
goto nla_put_failure;
break;
}
out:
@ -294,30 +231,33 @@ drop_alg:
static int crypto_dump_report(struct sk_buff *skb, struct netlink_callback *cb)
{
struct crypto_alg *alg;
const size_t start_pos = cb->args[0];
size_t pos = 0;
struct crypto_dump_info info;
int err;
if (cb->args[0])
goto out;
cb->args[0] = 1;
struct crypto_alg *alg;
int res;
info.in_skb = cb->skb;
info.out_skb = skb;
info.nlmsg_seq = cb->nlh->nlmsg_seq;
info.nlmsg_flags = NLM_F_MULTI;
down_read(&crypto_alg_sem);
list_for_each_entry(alg, &crypto_alg_list, cra_list) {
err = crypto_report_alg(alg, &info);
if (err)
goto out_err;
if (pos >= start_pos) {
res = crypto_report_alg(alg, &info);
if (res == -EMSGSIZE)
break;
if (res)
goto out;
}
pos++;
}
cb->args[0] = pos;
res = skb->len;
out:
return skb->len;
out_err:
return err;
up_read(&crypto_alg_sem);
return res;
}
static int crypto_dump_report_done(struct netlink_callback *cb)
@ -483,9 +423,7 @@ static const struct crypto_link {
.dump = crypto_dump_report,
.done = crypto_dump_report_done},
[CRYPTO_MSG_DELRNG - CRYPTO_MSG_BASE] = { .doit = crypto_del_rng },
[CRYPTO_MSG_GETSTAT - CRYPTO_MSG_BASE] = { .doit = crypto_reportstat,
.dump = crypto_dump_reportstat,
.done = crypto_dump_reportstat_done},
[CRYPTO_MSG_GETSTAT - CRYPTO_MSG_BASE] = { .doit = crypto_reportstat},
};
static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
@ -505,7 +443,7 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
if ((type == (CRYPTO_MSG_GETALG - CRYPTO_MSG_BASE) &&
(nlh->nlmsg_flags & NLM_F_DUMP))) {
struct crypto_alg *alg;
u16 dump_alloc = 0;
unsigned long dump_alloc = 0;
if (link->dump == NULL)
return -EINVAL;
@ -513,16 +451,16 @@ static int crypto_user_rcv_msg(struct sk_buff *skb, struct nlmsghdr *nlh,
down_read(&crypto_alg_sem);
list_for_each_entry(alg, &crypto_alg_list, cra_list)
dump_alloc += CRYPTO_REPORT_MAXSIZE;
up_read(&crypto_alg_sem);
{
struct netlink_dump_control c = {
.dump = link->dump,
.done = link->done,
.min_dump_alloc = dump_alloc,
.min_dump_alloc = min(dump_alloc, 65535UL),
};
err = netlink_dump_start(crypto_nlsk, skb, nlh, &c);
}
up_read(&crypto_alg_sem);
return err;
}

View File

@ -33,260 +33,149 @@ struct crypto_dump_info {
static int crypto_report_aead(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_stat raead;
u64 v64;
u32 v32;
struct crypto_stat_aead raead;
memset(&raead, 0, sizeof(raead));
strncpy(raead.type, "aead", sizeof(raead.type));
strscpy(raead.type, "aead", sizeof(raead.type));
v32 = atomic_read(&alg->encrypt_cnt);
raead.stat_encrypt_cnt = v32;
v64 = atomic64_read(&alg->encrypt_tlen);
raead.stat_encrypt_tlen = v64;
v32 = atomic_read(&alg->decrypt_cnt);
raead.stat_decrypt_cnt = v32;
v64 = atomic64_read(&alg->decrypt_tlen);
raead.stat_decrypt_tlen = v64;
v32 = atomic_read(&alg->aead_err_cnt);
raead.stat_aead_err_cnt = v32;
raead.stat_encrypt_cnt = atomic64_read(&alg->stats.aead.encrypt_cnt);
raead.stat_encrypt_tlen = atomic64_read(&alg->stats.aead.encrypt_tlen);
raead.stat_decrypt_cnt = atomic64_read(&alg->stats.aead.decrypt_cnt);
raead.stat_decrypt_tlen = atomic64_read(&alg->stats.aead.decrypt_tlen);
raead.stat_err_cnt = atomic64_read(&alg->stats.aead.err_cnt);
if (nla_put(skb, CRYPTOCFGA_STAT_AEAD,
sizeof(struct crypto_stat), &raead))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_STAT_AEAD, sizeof(raead), &raead);
}
static int crypto_report_cipher(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_stat rcipher;
u64 v64;
u32 v32;
struct crypto_stat_cipher rcipher;
memset(&rcipher, 0, sizeof(rcipher));
strlcpy(rcipher.type, "cipher", sizeof(rcipher.type));
strscpy(rcipher.type, "cipher", sizeof(rcipher.type));
v32 = atomic_read(&alg->encrypt_cnt);
rcipher.stat_encrypt_cnt = v32;
v64 = atomic64_read(&alg->encrypt_tlen);
rcipher.stat_encrypt_tlen = v64;
v32 = atomic_read(&alg->decrypt_cnt);
rcipher.stat_decrypt_cnt = v32;
v64 = atomic64_read(&alg->decrypt_tlen);
rcipher.stat_decrypt_tlen = v64;
v32 = atomic_read(&alg->cipher_err_cnt);
rcipher.stat_cipher_err_cnt = v32;
rcipher.stat_encrypt_cnt = atomic64_read(&alg->stats.cipher.encrypt_cnt);
rcipher.stat_encrypt_tlen = atomic64_read(&alg->stats.cipher.encrypt_tlen);
rcipher.stat_decrypt_cnt = atomic64_read(&alg->stats.cipher.decrypt_cnt);
rcipher.stat_decrypt_tlen = atomic64_read(&alg->stats.cipher.decrypt_tlen);
rcipher.stat_err_cnt = atomic64_read(&alg->stats.cipher.err_cnt);
if (nla_put(skb, CRYPTOCFGA_STAT_CIPHER,
sizeof(struct crypto_stat), &rcipher))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_STAT_CIPHER, sizeof(rcipher), &rcipher);
}
static int crypto_report_comp(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_stat rcomp;
u64 v64;
u32 v32;
struct crypto_stat_compress rcomp;
memset(&rcomp, 0, sizeof(rcomp));
strlcpy(rcomp.type, "compression", sizeof(rcomp.type));
v32 = atomic_read(&alg->compress_cnt);
rcomp.stat_compress_cnt = v32;
v64 = atomic64_read(&alg->compress_tlen);
rcomp.stat_compress_tlen = v64;
v32 = atomic_read(&alg->decompress_cnt);
rcomp.stat_decompress_cnt = v32;
v64 = atomic64_read(&alg->decompress_tlen);
rcomp.stat_decompress_tlen = v64;
v32 = atomic_read(&alg->cipher_err_cnt);
rcomp.stat_compress_err_cnt = v32;
strscpy(rcomp.type, "compression", sizeof(rcomp.type));
rcomp.stat_compress_cnt = atomic64_read(&alg->stats.compress.compress_cnt);
rcomp.stat_compress_tlen = atomic64_read(&alg->stats.compress.compress_tlen);
rcomp.stat_decompress_cnt = atomic64_read(&alg->stats.compress.decompress_cnt);
rcomp.stat_decompress_tlen = atomic64_read(&alg->stats.compress.decompress_tlen);
rcomp.stat_err_cnt = atomic64_read(&alg->stats.compress.err_cnt);
if (nla_put(skb, CRYPTOCFGA_STAT_COMPRESS,
sizeof(struct crypto_stat), &rcomp))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_STAT_COMPRESS, sizeof(rcomp), &rcomp);
}
static int crypto_report_acomp(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_stat racomp;
u64 v64;
u32 v32;
struct crypto_stat_compress racomp;
memset(&racomp, 0, sizeof(racomp));
strlcpy(racomp.type, "acomp", sizeof(racomp.type));
v32 = atomic_read(&alg->compress_cnt);
racomp.stat_compress_cnt = v32;
v64 = atomic64_read(&alg->compress_tlen);
racomp.stat_compress_tlen = v64;
v32 = atomic_read(&alg->decompress_cnt);
racomp.stat_decompress_cnt = v32;
v64 = atomic64_read(&alg->decompress_tlen);
racomp.stat_decompress_tlen = v64;
v32 = atomic_read(&alg->cipher_err_cnt);
racomp.stat_compress_err_cnt = v32;
strscpy(racomp.type, "acomp", sizeof(racomp.type));
racomp.stat_compress_cnt = atomic64_read(&alg->stats.compress.compress_cnt);
racomp.stat_compress_tlen = atomic64_read(&alg->stats.compress.compress_tlen);
racomp.stat_decompress_cnt = atomic64_read(&alg->stats.compress.decompress_cnt);
racomp.stat_decompress_tlen = atomic64_read(&alg->stats.compress.decompress_tlen);
racomp.stat_err_cnt = atomic64_read(&alg->stats.compress.err_cnt);
if (nla_put(skb, CRYPTOCFGA_STAT_ACOMP,
sizeof(struct crypto_stat), &racomp))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_STAT_ACOMP, sizeof(racomp), &racomp);
}
static int crypto_report_akcipher(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_stat rakcipher;
u64 v64;
u32 v32;
struct crypto_stat_akcipher rakcipher;
memset(&rakcipher, 0, sizeof(rakcipher));
strncpy(rakcipher.type, "akcipher", sizeof(rakcipher.type));
v32 = atomic_read(&alg->encrypt_cnt);
rakcipher.stat_encrypt_cnt = v32;
v64 = atomic64_read(&alg->encrypt_tlen);
rakcipher.stat_encrypt_tlen = v64;
v32 = atomic_read(&alg->decrypt_cnt);
rakcipher.stat_decrypt_cnt = v32;
v64 = atomic64_read(&alg->decrypt_tlen);
rakcipher.stat_decrypt_tlen = v64;
v32 = atomic_read(&alg->sign_cnt);
rakcipher.stat_sign_cnt = v32;
v32 = atomic_read(&alg->verify_cnt);
rakcipher.stat_verify_cnt = v32;
v32 = atomic_read(&alg->akcipher_err_cnt);
rakcipher.stat_akcipher_err_cnt = v32;
strscpy(rakcipher.type, "akcipher", sizeof(rakcipher.type));
rakcipher.stat_encrypt_cnt = atomic64_read(&alg->stats.akcipher.encrypt_cnt);
rakcipher.stat_encrypt_tlen = atomic64_read(&alg->stats.akcipher.encrypt_tlen);
rakcipher.stat_decrypt_cnt = atomic64_read(&alg->stats.akcipher.decrypt_cnt);
rakcipher.stat_decrypt_tlen = atomic64_read(&alg->stats.akcipher.decrypt_tlen);
rakcipher.stat_sign_cnt = atomic64_read(&alg->stats.akcipher.sign_cnt);
rakcipher.stat_verify_cnt = atomic64_read(&alg->stats.akcipher.verify_cnt);
rakcipher.stat_err_cnt = atomic64_read(&alg->stats.akcipher.err_cnt);
if (nla_put(skb, CRYPTOCFGA_STAT_AKCIPHER,
sizeof(struct crypto_stat), &rakcipher))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_STAT_AKCIPHER,
sizeof(rakcipher), &rakcipher);
}
static int crypto_report_kpp(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_stat rkpp;
u32 v;
struct crypto_stat_kpp rkpp;
memset(&rkpp, 0, sizeof(rkpp));
strlcpy(rkpp.type, "kpp", sizeof(rkpp.type));
strscpy(rkpp.type, "kpp", sizeof(rkpp.type));
v = atomic_read(&alg->setsecret_cnt);
rkpp.stat_setsecret_cnt = v;
v = atomic_read(&alg->generate_public_key_cnt);
rkpp.stat_generate_public_key_cnt = v;
v = atomic_read(&alg->compute_shared_secret_cnt);
rkpp.stat_compute_shared_secret_cnt = v;
v = atomic_read(&alg->kpp_err_cnt);
rkpp.stat_kpp_err_cnt = v;
rkpp.stat_setsecret_cnt = atomic64_read(&alg->stats.kpp.setsecret_cnt);
rkpp.stat_generate_public_key_cnt = atomic64_read(&alg->stats.kpp.generate_public_key_cnt);
rkpp.stat_compute_shared_secret_cnt = atomic64_read(&alg->stats.kpp.compute_shared_secret_cnt);
rkpp.stat_err_cnt = atomic64_read(&alg->stats.kpp.err_cnt);
if (nla_put(skb, CRYPTOCFGA_STAT_KPP,
sizeof(struct crypto_stat), &rkpp))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_STAT_KPP, sizeof(rkpp), &rkpp);
}
static int crypto_report_ahash(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_stat rhash;
u64 v64;
u32 v32;
struct crypto_stat_hash rhash;
memset(&rhash, 0, sizeof(rhash));
strncpy(rhash.type, "ahash", sizeof(rhash.type));
strscpy(rhash.type, "ahash", sizeof(rhash.type));
v32 = atomic_read(&alg->hash_cnt);
rhash.stat_hash_cnt = v32;
v64 = atomic64_read(&alg->hash_tlen);
rhash.stat_hash_tlen = v64;
v32 = atomic_read(&alg->hash_err_cnt);
rhash.stat_hash_err_cnt = v32;
rhash.stat_hash_cnt = atomic64_read(&alg->stats.hash.hash_cnt);
rhash.stat_hash_tlen = atomic64_read(&alg->stats.hash.hash_tlen);
rhash.stat_err_cnt = atomic64_read(&alg->stats.hash.err_cnt);
if (nla_put(skb, CRYPTOCFGA_STAT_HASH,
sizeof(struct crypto_stat), &rhash))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_STAT_HASH, sizeof(rhash), &rhash);
}
static int crypto_report_shash(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_stat rhash;
u64 v64;
u32 v32;
struct crypto_stat_hash rhash;
memset(&rhash, 0, sizeof(rhash));
strncpy(rhash.type, "shash", sizeof(rhash.type));
strscpy(rhash.type, "shash", sizeof(rhash.type));
v32 = atomic_read(&alg->hash_cnt);
rhash.stat_hash_cnt = v32;
v64 = atomic64_read(&alg->hash_tlen);
rhash.stat_hash_tlen = v64;
v32 = atomic_read(&alg->hash_err_cnt);
rhash.stat_hash_err_cnt = v32;
rhash.stat_hash_cnt = atomic64_read(&alg->stats.hash.hash_cnt);
rhash.stat_hash_tlen = atomic64_read(&alg->stats.hash.hash_tlen);
rhash.stat_err_cnt = atomic64_read(&alg->stats.hash.err_cnt);
if (nla_put(skb, CRYPTOCFGA_STAT_HASH,
sizeof(struct crypto_stat), &rhash))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_STAT_HASH, sizeof(rhash), &rhash);
}
static int crypto_report_rng(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_stat rrng;
u64 v64;
u32 v32;
struct crypto_stat_rng rrng;
memset(&rrng, 0, sizeof(rrng));
strncpy(rrng.type, "rng", sizeof(rrng.type));
strscpy(rrng.type, "rng", sizeof(rrng.type));
v32 = atomic_read(&alg->generate_cnt);
rrng.stat_generate_cnt = v32;
v64 = atomic64_read(&alg->generate_tlen);
rrng.stat_generate_tlen = v64;
v32 = atomic_read(&alg->seed_cnt);
rrng.stat_seed_cnt = v32;
v32 = atomic_read(&alg->hash_err_cnt);
rrng.stat_rng_err_cnt = v32;
rrng.stat_generate_cnt = atomic64_read(&alg->stats.rng.generate_cnt);
rrng.stat_generate_tlen = atomic64_read(&alg->stats.rng.generate_tlen);
rrng.stat_seed_cnt = atomic64_read(&alg->stats.rng.seed_cnt);
rrng.stat_err_cnt = atomic64_read(&alg->stats.rng.err_cnt);
if (nla_put(skb, CRYPTOCFGA_STAT_RNG,
sizeof(struct crypto_stat), &rrng))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_STAT_RNG, sizeof(rrng), &rrng);
}
static int crypto_reportstat_one(struct crypto_alg *alg,
@ -295,10 +184,10 @@ static int crypto_reportstat_one(struct crypto_alg *alg,
{
memset(ualg, 0, sizeof(*ualg));
strlcpy(ualg->cru_name, alg->cra_name, sizeof(ualg->cru_name));
strlcpy(ualg->cru_driver_name, alg->cra_driver_name,
strscpy(ualg->cru_name, alg->cra_name, sizeof(ualg->cru_name));
strscpy(ualg->cru_driver_name, alg->cra_driver_name,
sizeof(ualg->cru_driver_name));
strlcpy(ualg->cru_module_name, module_name(alg->cra_module),
strscpy(ualg->cru_module_name, module_name(alg->cra_module),
sizeof(ualg->cru_module_name));
ualg->cru_type = 0;
@ -309,12 +198,11 @@ static int crypto_reportstat_one(struct crypto_alg *alg,
if (nla_put_u32(skb, CRYPTOCFGA_PRIORITY_VAL, alg->cra_priority))
goto nla_put_failure;
if (alg->cra_flags & CRYPTO_ALG_LARVAL) {
struct crypto_stat rl;
struct crypto_stat_larval rl;
memset(&rl, 0, sizeof(rl));
strlcpy(rl.type, "larval", sizeof(rl.type));
if (nla_put(skb, CRYPTOCFGA_STAT_LARVAL,
sizeof(struct crypto_stat), &rl))
strscpy(rl.type, "larval", sizeof(rl.type));
if (nla_put(skb, CRYPTOCFGA_STAT_LARVAL, sizeof(rl), &rl))
goto nla_put_failure;
goto out;
}
@ -448,37 +336,4 @@ drop_alg:
return nlmsg_unicast(crypto_nlsk, skb, NETLINK_CB(in_skb).portid);
}
int crypto_dump_reportstat(struct sk_buff *skb, struct netlink_callback *cb)
{
struct crypto_alg *alg;
struct crypto_dump_info info;
int err;
if (cb->args[0])
goto out;
cb->args[0] = 1;
info.in_skb = cb->skb;
info.out_skb = skb;
info.nlmsg_seq = cb->nlh->nlmsg_seq;
info.nlmsg_flags = NLM_F_MULTI;
list_for_each_entry(alg, &crypto_alg_list, cra_list) {
err = crypto_reportstat_alg(alg, &info);
if (err)
goto out_err;
}
out:
return skb->len;
out_err:
return err;
}
int crypto_dump_reportstat_done(struct netlink_callback *cb)
{
return 0;
}
MODULE_LICENSE("GPL");

View File

@ -233,8 +233,6 @@ static struct crypto_instance *crypto_ctr_alloc(struct rtattr **tb)
inst->alg.cra_blkcipher.encrypt = crypto_ctr_crypt;
inst->alg.cra_blkcipher.decrypt = crypto_ctr_crypt;
inst->alg.cra_blkcipher.geniv = "chainiv";
out:
crypto_mod_put(alg);
return inst;

View File

@ -842,15 +842,23 @@ static void xycz_add_c(u64 *x1, u64 *y1, u64 *x2, u64 *y2, u64 *curve_prime,
static void ecc_point_mult(struct ecc_point *result,
const struct ecc_point *point, const u64 *scalar,
u64 *initial_z, u64 *curve_prime,
u64 *initial_z, const struct ecc_curve *curve,
unsigned int ndigits)
{
/* R0 and R1 */
u64 rx[2][ECC_MAX_DIGITS];
u64 ry[2][ECC_MAX_DIGITS];
u64 z[ECC_MAX_DIGITS];
u64 sk[2][ECC_MAX_DIGITS];
u64 *curve_prime = curve->p;
int i, nb;
int num_bits = vli_num_bits(scalar, ndigits);
int num_bits;
int carry;
carry = vli_add(sk[0], scalar, curve->n, ndigits);
vli_add(sk[1], sk[0], curve->n, ndigits);
scalar = sk[!carry];
num_bits = sizeof(u64) * ndigits * 8 + 1;
vli_set(rx[1], point->x, ndigits);
vli_set(ry[1], point->y, ndigits);
@ -904,28 +912,41 @@ static inline void ecc_swap_digits(const u64 *in, u64 *out,
out[i] = __swab64(in[ndigits - 1 - i]);
}
static int __ecc_is_key_valid(const struct ecc_curve *curve,
const u64 *private_key, unsigned int ndigits)
{
u64 one[ECC_MAX_DIGITS] = { 1, };
u64 res[ECC_MAX_DIGITS];
if (!private_key)
return -EINVAL;
if (curve->g.ndigits != ndigits)
return -EINVAL;
/* Make sure the private key is in the range [2, n-3]. */
if (vli_cmp(one, private_key, ndigits) != -1)
return -EINVAL;
vli_sub(res, curve->n, one, ndigits);
vli_sub(res, res, one, ndigits);
if (vli_cmp(res, private_key, ndigits) != 1)
return -EINVAL;
return 0;
}
int ecc_is_key_valid(unsigned int curve_id, unsigned int ndigits,
const u64 *private_key, unsigned int private_key_len)
{
int nbytes;
const struct ecc_curve *curve = ecc_get_curve(curve_id);
if (!private_key)
return -EINVAL;
nbytes = ndigits << ECC_DIGITS_TO_BYTES_SHIFT;
if (private_key_len != nbytes)
return -EINVAL;
if (vli_is_zero(private_key, ndigits))
return -EINVAL;
/* Make sure the private key is in the range [1, n-1]. */
if (vli_cmp(curve->n, private_key, ndigits) != 1)
return -EINVAL;
return 0;
return __ecc_is_key_valid(curve, private_key, ndigits);
}
/*
@ -971,11 +992,8 @@ int ecc_gen_privkey(unsigned int curve_id, unsigned int ndigits, u64 *privkey)
if (err)
return err;
if (vli_is_zero(priv, ndigits))
return -EINVAL;
/* Make sure the private key is in the range [1, n-1]. */
if (vli_cmp(curve->n, priv, ndigits) != 1)
/* Make sure the private key is in the valid range. */
if (__ecc_is_key_valid(curve, priv, ndigits))
return -EINVAL;
ecc_swap_digits(priv, privkey, ndigits);
@ -1004,7 +1022,7 @@ int ecc_make_pub_key(unsigned int curve_id, unsigned int ndigits,
goto out;
}
ecc_point_mult(pk, &curve->g, priv, NULL, curve->p, ndigits);
ecc_point_mult(pk, &curve->g, priv, NULL, curve, ndigits);
if (ecc_point_is_zero(pk)) {
ret = -EAGAIN;
goto err_free_point;
@ -1090,7 +1108,7 @@ int crypto_ecdh_shared_secret(unsigned int curve_id, unsigned int ndigits,
goto err_alloc_product;
}
ecc_point_mult(product, pk, priv, rand_z, curve->p, ndigits);
ecc_point_mult(product, pk, priv, rand_z, curve, ndigits);
ecc_swap_digits(product->x, secret, ndigits);

View File

@ -32,6 +32,8 @@ const char *const hash_algo_name[HASH_ALGO__LAST] = {
[HASH_ALGO_TGR_160] = "tgr160",
[HASH_ALGO_TGR_192] = "tgr192",
[HASH_ALGO_SM3_256] = "sm3-256",
[HASH_ALGO_STREEBOG_256] = "streebog256",
[HASH_ALGO_STREEBOG_512] = "streebog512",
};
EXPORT_SYMBOL_GPL(hash_algo_name);
@ -54,5 +56,7 @@ const int hash_digest_size[HASH_ALGO__LAST] = {
[HASH_ALGO_TGR_160] = TGR160_DIGEST_SIZE,
[HASH_ALGO_TGR_192] = TGR192_DIGEST_SIZE,
[HASH_ALGO_SM3_256] = SM3256_DIGEST_SIZE,
[HASH_ALGO_STREEBOG_256] = STREEBOG256_DIGEST_SIZE,
[HASH_ALGO_STREEBOG_512] = STREEBOG512_DIGEST_SIZE,
};
EXPORT_SYMBOL_GPL(hash_digest_size);

View File

@ -30,15 +30,11 @@ static int crypto_kpp_report(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_kpp rkpp;
strncpy(rkpp.type, "kpp", sizeof(rkpp.type));
memset(&rkpp, 0, sizeof(rkpp));
if (nla_put(skb, CRYPTOCFGA_REPORT_KPP,
sizeof(struct crypto_report_kpp), &rkpp))
goto nla_put_failure;
return 0;
strscpy(rkpp.type, "kpp", sizeof(rkpp.type));
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_REPORT_KPP, sizeof(rkpp), &rkpp);
}
#else
static int crypto_kpp_report(struct sk_buff *skb, struct crypto_alg *alg)

View File

@ -122,7 +122,6 @@ static struct crypto_alg alg_lz4 = {
.cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
.cra_ctxsize = sizeof(struct lz4_ctx),
.cra_module = THIS_MODULE,
.cra_list = LIST_HEAD_INIT(alg_lz4.cra_list),
.cra_init = lz4_init,
.cra_exit = lz4_exit,
.cra_u = { .compress = {

View File

@ -123,7 +123,6 @@ static struct crypto_alg alg_lz4hc = {
.cra_flags = CRYPTO_ALG_TYPE_COMPRESS,
.cra_ctxsize = sizeof(struct lz4hc_ctx),
.cra_module = THIS_MODULE,
.cra_list = LIST_HEAD_INIT(alg_lz4hc.cra_list),
.cra_init = lz4hc_init,
.cra_exit = lz4hc_exit,
.cra_u = { .compress = {

254
crypto/nhpoly1305.c Normal file
View File

@ -0,0 +1,254 @@
// SPDX-License-Identifier: GPL-2.0
/*
* NHPoly1305 - ε-almost--universal hash function for Adiantum
*
* Copyright 2018 Google LLC
*/
/*
* "NHPoly1305" is the main component of Adiantum hashing.
* Specifically, it is the calculation
*
* H_L Poly1305_{K_L}(NH_{K_N}(pad_{128}(L)))
*
* from the procedure in section 6.4 of the Adiantum paper [1]. It is an
* ε-almost--universal (ε-U) hash function for equal-length inputs over
* Z/(2^{128}Z), where the "" operation is addition. It hashes 1024-byte
* chunks of the input with the NH hash function [2], reducing the input length
* by 32x. The resulting NH digests are evaluated as a polynomial in
* GF(2^{130}-5), like in the Poly1305 MAC [3]. Note that the polynomial
* evaluation by itself would suffice to achieve the ε-U property; NH is used
* for performance since it's over twice as fast as Poly1305.
*
* This is *not* a cryptographic hash function; do not use it as such!
*
* [1] Adiantum: length-preserving encryption for entry-level processors
* (https://eprint.iacr.org/2018/720.pdf)
* [2] UMAC: Fast and Secure Message Authentication
* (https://fastcrypto.org/umac/umac_proc.pdf)
* [3] The Poly1305-AES message-authentication code
* (https://cr.yp.to/mac/poly1305-20050329.pdf)
*/
#include <asm/unaligned.h>
#include <crypto/algapi.h>
#include <crypto/internal/hash.h>
#include <crypto/nhpoly1305.h>
#include <linux/crypto.h>
#include <linux/kernel.h>
#include <linux/module.h>
static void nh_generic(const u32 *key, const u8 *message, size_t message_len,
__le64 hash[NH_NUM_PASSES])
{
u64 sums[4] = { 0, 0, 0, 0 };
BUILD_BUG_ON(NH_PAIR_STRIDE != 2);
BUILD_BUG_ON(NH_NUM_PASSES != 4);
while (message_len) {
u32 m0 = get_unaligned_le32(message + 0);
u32 m1 = get_unaligned_le32(message + 4);
u32 m2 = get_unaligned_le32(message + 8);
u32 m3 = get_unaligned_le32(message + 12);
sums[0] += (u64)(u32)(m0 + key[ 0]) * (u32)(m2 + key[ 2]);
sums[1] += (u64)(u32)(m0 + key[ 4]) * (u32)(m2 + key[ 6]);
sums[2] += (u64)(u32)(m0 + key[ 8]) * (u32)(m2 + key[10]);
sums[3] += (u64)(u32)(m0 + key[12]) * (u32)(m2 + key[14]);
sums[0] += (u64)(u32)(m1 + key[ 1]) * (u32)(m3 + key[ 3]);
sums[1] += (u64)(u32)(m1 + key[ 5]) * (u32)(m3 + key[ 7]);
sums[2] += (u64)(u32)(m1 + key[ 9]) * (u32)(m3 + key[11]);
sums[3] += (u64)(u32)(m1 + key[13]) * (u32)(m3 + key[15]);
key += NH_MESSAGE_UNIT / sizeof(key[0]);
message += NH_MESSAGE_UNIT;
message_len -= NH_MESSAGE_UNIT;
}
hash[0] = cpu_to_le64(sums[0]);
hash[1] = cpu_to_le64(sums[1]);
hash[2] = cpu_to_le64(sums[2]);
hash[3] = cpu_to_le64(sums[3]);
}
/* Pass the next NH hash value through Poly1305 */
static void process_nh_hash_value(struct nhpoly1305_state *state,
const struct nhpoly1305_key *key)
{
BUILD_BUG_ON(NH_HASH_BYTES % POLY1305_BLOCK_SIZE != 0);
poly1305_core_blocks(&state->poly_state, &key->poly_key, state->nh_hash,
NH_HASH_BYTES / POLY1305_BLOCK_SIZE);
}
/*
* Feed the next portion of the source data, as a whole number of 16-byte
* "NH message units", through NH and Poly1305. Each NH hash is taken over
* 1024 bytes, except possibly the final one which is taken over a multiple of
* 16 bytes up to 1024. Also, in the case where data is passed in misaligned
* chunks, we combine partial hashes; the end result is the same either way.
*/
static void nhpoly1305_units(struct nhpoly1305_state *state,
const struct nhpoly1305_key *key,
const u8 *src, unsigned int srclen, nh_t nh_fn)
{
do {
unsigned int bytes;
if (state->nh_remaining == 0) {
/* Starting a new NH message */
bytes = min_t(unsigned int, srclen, NH_MESSAGE_BYTES);
nh_fn(key->nh_key, src, bytes, state->nh_hash);
state->nh_remaining = NH_MESSAGE_BYTES - bytes;
} else {
/* Continuing a previous NH message */
__le64 tmp_hash[NH_NUM_PASSES];
unsigned int pos;
int i;
pos = NH_MESSAGE_BYTES - state->nh_remaining;
bytes = min(srclen, state->nh_remaining);
nh_fn(&key->nh_key[pos / 4], src, bytes, tmp_hash);
for (i = 0; i < NH_NUM_PASSES; i++)
le64_add_cpu(&state->nh_hash[i],
le64_to_cpu(tmp_hash[i]));
state->nh_remaining -= bytes;
}
if (state->nh_remaining == 0)
process_nh_hash_value(state, key);
src += bytes;
srclen -= bytes;
} while (srclen);
}
int crypto_nhpoly1305_setkey(struct crypto_shash *tfm,
const u8 *key, unsigned int keylen)
{
struct nhpoly1305_key *ctx = crypto_shash_ctx(tfm);
int i;
if (keylen != NHPOLY1305_KEY_SIZE)
return -EINVAL;
poly1305_core_setkey(&ctx->poly_key, key);
key += POLY1305_BLOCK_SIZE;
for (i = 0; i < NH_KEY_WORDS; i++)
ctx->nh_key[i] = get_unaligned_le32(key + i * sizeof(u32));
return 0;
}
EXPORT_SYMBOL(crypto_nhpoly1305_setkey);
int crypto_nhpoly1305_init(struct shash_desc *desc)
{
struct nhpoly1305_state *state = shash_desc_ctx(desc);
poly1305_core_init(&state->poly_state);
state->buflen = 0;
state->nh_remaining = 0;
return 0;
}
EXPORT_SYMBOL(crypto_nhpoly1305_init);
int crypto_nhpoly1305_update_helper(struct shash_desc *desc,
const u8 *src, unsigned int srclen,
nh_t nh_fn)
{
struct nhpoly1305_state *state = shash_desc_ctx(desc);
const struct nhpoly1305_key *key = crypto_shash_ctx(desc->tfm);
unsigned int bytes;
if (state->buflen) {
bytes = min(srclen, (int)NH_MESSAGE_UNIT - state->buflen);
memcpy(&state->buffer[state->buflen], src, bytes);
state->buflen += bytes;
if (state->buflen < NH_MESSAGE_UNIT)
return 0;
nhpoly1305_units(state, key, state->buffer, NH_MESSAGE_UNIT,
nh_fn);
state->buflen = 0;
src += bytes;
srclen -= bytes;
}
if (srclen >= NH_MESSAGE_UNIT) {
bytes = round_down(srclen, NH_MESSAGE_UNIT);
nhpoly1305_units(state, key, src, bytes, nh_fn);
src += bytes;
srclen -= bytes;
}
if (srclen) {
memcpy(state->buffer, src, srclen);
state->buflen = srclen;
}
return 0;
}
EXPORT_SYMBOL(crypto_nhpoly1305_update_helper);
int crypto_nhpoly1305_update(struct shash_desc *desc,
const u8 *src, unsigned int srclen)
{
return crypto_nhpoly1305_update_helper(desc, src, srclen, nh_generic);
}
EXPORT_SYMBOL(crypto_nhpoly1305_update);
int crypto_nhpoly1305_final_helper(struct shash_desc *desc, u8 *dst, nh_t nh_fn)
{
struct nhpoly1305_state *state = shash_desc_ctx(desc);
const struct nhpoly1305_key *key = crypto_shash_ctx(desc->tfm);
if (state->buflen) {
memset(&state->buffer[state->buflen], 0,
NH_MESSAGE_UNIT - state->buflen);
nhpoly1305_units(state, key, state->buffer, NH_MESSAGE_UNIT,
nh_fn);
}
if (state->nh_remaining)
process_nh_hash_value(state, key);
poly1305_core_emit(&state->poly_state, dst);
return 0;
}
EXPORT_SYMBOL(crypto_nhpoly1305_final_helper);
int crypto_nhpoly1305_final(struct shash_desc *desc, u8 *dst)
{
return crypto_nhpoly1305_final_helper(desc, dst, nh_generic);
}
EXPORT_SYMBOL(crypto_nhpoly1305_final);
static struct shash_alg nhpoly1305_alg = {
.base.cra_name = "nhpoly1305",
.base.cra_driver_name = "nhpoly1305-generic",
.base.cra_priority = 100,
.base.cra_ctxsize = sizeof(struct nhpoly1305_key),
.base.cra_module = THIS_MODULE,
.digestsize = POLY1305_DIGEST_SIZE,
.init = crypto_nhpoly1305_init,
.update = crypto_nhpoly1305_update,
.final = crypto_nhpoly1305_final,
.setkey = crypto_nhpoly1305_setkey,
.descsize = sizeof(struct nhpoly1305_state),
};
static int __init nhpoly1305_mod_init(void)
{
return crypto_register_shash(&nhpoly1305_alg);
}
static void __exit nhpoly1305_mod_exit(void)
{
crypto_unregister_shash(&nhpoly1305_alg);
}
module_init(nhpoly1305_mod_init);
module_exit(nhpoly1305_mod_exit);
MODULE_DESCRIPTION("NHPoly1305 ε-almost-∆-universal hash function");
MODULE_LICENSE("GPL v2");
MODULE_AUTHOR("Eric Biggers <ebiggers@google.com>");
MODULE_ALIAS_CRYPTO("nhpoly1305");
MODULE_ALIAS_CRYPTO("nhpoly1305-generic");

View File

@ -394,7 +394,7 @@ static int pcrypt_sysfs_add(struct padata_instance *pinst, const char *name)
int ret;
pinst->kobj.kset = pcrypt_kset;
ret = kobject_add(&pinst->kobj, NULL, name);
ret = kobject_add(&pinst->kobj, NULL, "%s", name);
if (!ret)
kobject_uevent(&pinst->kobj, KOBJ_ADD);

View File

@ -38,7 +38,7 @@ int crypto_poly1305_init(struct shash_desc *desc)
{
struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
memset(dctx->h, 0, sizeof(dctx->h));
poly1305_core_init(&dctx->h);
dctx->buflen = 0;
dctx->rset = false;
dctx->sset = false;
@ -47,23 +47,16 @@ int crypto_poly1305_init(struct shash_desc *desc)
}
EXPORT_SYMBOL_GPL(crypto_poly1305_init);
static void poly1305_setrkey(struct poly1305_desc_ctx *dctx, const u8 *key)
void poly1305_core_setkey(struct poly1305_key *key, const u8 *raw_key)
{
/* r &= 0xffffffc0ffffffc0ffffffc0fffffff */
dctx->r[0] = (get_unaligned_le32(key + 0) >> 0) & 0x3ffffff;
dctx->r[1] = (get_unaligned_le32(key + 3) >> 2) & 0x3ffff03;
dctx->r[2] = (get_unaligned_le32(key + 6) >> 4) & 0x3ffc0ff;
dctx->r[3] = (get_unaligned_le32(key + 9) >> 6) & 0x3f03fff;
dctx->r[4] = (get_unaligned_le32(key + 12) >> 8) & 0x00fffff;
}
static void poly1305_setskey(struct poly1305_desc_ctx *dctx, const u8 *key)
{
dctx->s[0] = get_unaligned_le32(key + 0);
dctx->s[1] = get_unaligned_le32(key + 4);
dctx->s[2] = get_unaligned_le32(key + 8);
dctx->s[3] = get_unaligned_le32(key + 12);
key->r[0] = (get_unaligned_le32(raw_key + 0) >> 0) & 0x3ffffff;
key->r[1] = (get_unaligned_le32(raw_key + 3) >> 2) & 0x3ffff03;
key->r[2] = (get_unaligned_le32(raw_key + 6) >> 4) & 0x3ffc0ff;
key->r[3] = (get_unaligned_le32(raw_key + 9) >> 6) & 0x3f03fff;
key->r[4] = (get_unaligned_le32(raw_key + 12) >> 8) & 0x00fffff;
}
EXPORT_SYMBOL_GPL(poly1305_core_setkey);
/*
* Poly1305 requires a unique key for each tag, which implies that we can't set
@ -75,13 +68,16 @@ unsigned int crypto_poly1305_setdesckey(struct poly1305_desc_ctx *dctx,
{
if (!dctx->sset) {
if (!dctx->rset && srclen >= POLY1305_BLOCK_SIZE) {
poly1305_setrkey(dctx, src);
poly1305_core_setkey(&dctx->r, src);
src += POLY1305_BLOCK_SIZE;
srclen -= POLY1305_BLOCK_SIZE;
dctx->rset = true;
}
if (srclen >= POLY1305_BLOCK_SIZE) {
poly1305_setskey(dctx, src);
dctx->s[0] = get_unaligned_le32(src + 0);
dctx->s[1] = get_unaligned_le32(src + 4);
dctx->s[2] = get_unaligned_le32(src + 8);
dctx->s[3] = get_unaligned_le32(src + 12);
src += POLY1305_BLOCK_SIZE;
srclen -= POLY1305_BLOCK_SIZE;
dctx->sset = true;
@ -91,41 +87,37 @@ unsigned int crypto_poly1305_setdesckey(struct poly1305_desc_ctx *dctx,
}
EXPORT_SYMBOL_GPL(crypto_poly1305_setdesckey);
static unsigned int poly1305_blocks(struct poly1305_desc_ctx *dctx,
const u8 *src, unsigned int srclen,
u32 hibit)
static void poly1305_blocks_internal(struct poly1305_state *state,
const struct poly1305_key *key,
const void *src, unsigned int nblocks,
u32 hibit)
{
u32 r0, r1, r2, r3, r4;
u32 s1, s2, s3, s4;
u32 h0, h1, h2, h3, h4;
u64 d0, d1, d2, d3, d4;
unsigned int datalen;
if (unlikely(!dctx->sset)) {
datalen = crypto_poly1305_setdesckey(dctx, src, srclen);
src += srclen - datalen;
srclen = datalen;
}
if (!nblocks)
return;
r0 = dctx->r[0];
r1 = dctx->r[1];
r2 = dctx->r[2];
r3 = dctx->r[3];
r4 = dctx->r[4];
r0 = key->r[0];
r1 = key->r[1];
r2 = key->r[2];
r3 = key->r[3];
r4 = key->r[4];
s1 = r1 * 5;
s2 = r2 * 5;
s3 = r3 * 5;
s4 = r4 * 5;
h0 = dctx->h[0];
h1 = dctx->h[1];
h2 = dctx->h[2];
h3 = dctx->h[3];
h4 = dctx->h[4];
while (likely(srclen >= POLY1305_BLOCK_SIZE)) {
h0 = state->h[0];
h1 = state->h[1];
h2 = state->h[2];
h3 = state->h[3];
h4 = state->h[4];
do {
/* h += m[i] */
h0 += (get_unaligned_le32(src + 0) >> 0) & 0x3ffffff;
h1 += (get_unaligned_le32(src + 3) >> 2) & 0x3ffffff;
@ -154,16 +146,36 @@ static unsigned int poly1305_blocks(struct poly1305_desc_ctx *dctx,
h1 += h0 >> 26; h0 = h0 & 0x3ffffff;
src += POLY1305_BLOCK_SIZE;
srclen -= POLY1305_BLOCK_SIZE;
} while (--nblocks);
state->h[0] = h0;
state->h[1] = h1;
state->h[2] = h2;
state->h[3] = h3;
state->h[4] = h4;
}
void poly1305_core_blocks(struct poly1305_state *state,
const struct poly1305_key *key,
const void *src, unsigned int nblocks)
{
poly1305_blocks_internal(state, key, src, nblocks, 1 << 24);
}
EXPORT_SYMBOL_GPL(poly1305_core_blocks);
static void poly1305_blocks(struct poly1305_desc_ctx *dctx,
const u8 *src, unsigned int srclen, u32 hibit)
{
unsigned int datalen;
if (unlikely(!dctx->sset)) {
datalen = crypto_poly1305_setdesckey(dctx, src, srclen);
src += srclen - datalen;
srclen = datalen;
}
dctx->h[0] = h0;
dctx->h[1] = h1;
dctx->h[2] = h2;
dctx->h[3] = h3;
dctx->h[4] = h4;
return srclen;
poly1305_blocks_internal(&dctx->h, &dctx->r,
src, srclen / POLY1305_BLOCK_SIZE, hibit);
}
int crypto_poly1305_update(struct shash_desc *desc,
@ -187,9 +199,9 @@ int crypto_poly1305_update(struct shash_desc *desc,
}
if (likely(srclen >= POLY1305_BLOCK_SIZE)) {
bytes = poly1305_blocks(dctx, src, srclen, 1 << 24);
src += srclen - bytes;
srclen = bytes;
poly1305_blocks(dctx, src, srclen, 1 << 24);
src += srclen - (srclen % POLY1305_BLOCK_SIZE);
srclen %= POLY1305_BLOCK_SIZE;
}
if (unlikely(srclen)) {
@ -201,30 +213,18 @@ int crypto_poly1305_update(struct shash_desc *desc,
}
EXPORT_SYMBOL_GPL(crypto_poly1305_update);
int crypto_poly1305_final(struct shash_desc *desc, u8 *dst)
void poly1305_core_emit(const struct poly1305_state *state, void *dst)
{
struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
u32 h0, h1, h2, h3, h4;
u32 g0, g1, g2, g3, g4;
u32 mask;
u64 f = 0;
if (unlikely(!dctx->sset))
return -ENOKEY;
if (unlikely(dctx->buflen)) {
dctx->buf[dctx->buflen++] = 1;
memset(dctx->buf + dctx->buflen, 0,
POLY1305_BLOCK_SIZE - dctx->buflen);
poly1305_blocks(dctx, dctx->buf, POLY1305_BLOCK_SIZE, 0);
}
/* fully carry h */
h0 = dctx->h[0];
h1 = dctx->h[1];
h2 = dctx->h[2];
h3 = dctx->h[3];
h4 = dctx->h[4];
h0 = state->h[0];
h1 = state->h[1];
h2 = state->h[2];
h3 = state->h[3];
h4 = state->h[4];
h2 += (h1 >> 26); h1 = h1 & 0x3ffffff;
h3 += (h2 >> 26); h2 = h2 & 0x3ffffff;
@ -254,16 +254,40 @@ int crypto_poly1305_final(struct shash_desc *desc, u8 *dst)
h4 = (h4 & mask) | g4;
/* h = h % (2^128) */
h0 = (h0 >> 0) | (h1 << 26);
h1 = (h1 >> 6) | (h2 << 20);
h2 = (h2 >> 12) | (h3 << 14);
h3 = (h3 >> 18) | (h4 << 8);
put_unaligned_le32((h0 >> 0) | (h1 << 26), dst + 0);
put_unaligned_le32((h1 >> 6) | (h2 << 20), dst + 4);
put_unaligned_le32((h2 >> 12) | (h3 << 14), dst + 8);
put_unaligned_le32((h3 >> 18) | (h4 << 8), dst + 12);
}
EXPORT_SYMBOL_GPL(poly1305_core_emit);
int crypto_poly1305_final(struct shash_desc *desc, u8 *dst)
{
struct poly1305_desc_ctx *dctx = shash_desc_ctx(desc);
__le32 digest[4];
u64 f = 0;
if (unlikely(!dctx->sset))
return -ENOKEY;
if (unlikely(dctx->buflen)) {
dctx->buf[dctx->buflen++] = 1;
memset(dctx->buf + dctx->buflen, 0,
POLY1305_BLOCK_SIZE - dctx->buflen);
poly1305_blocks(dctx, dctx->buf, POLY1305_BLOCK_SIZE, 0);
}
poly1305_core_emit(&dctx->h, digest);
/* mac = (h + s) % (2^128) */
f = (f >> 32) + h0 + dctx->s[0]; put_unaligned_le32(f, dst + 0);
f = (f >> 32) + h1 + dctx->s[1]; put_unaligned_le32(f, dst + 4);
f = (f >> 32) + h2 + dctx->s[2]; put_unaligned_le32(f, dst + 8);
f = (f >> 32) + h3 + dctx->s[3]; put_unaligned_le32(f, dst + 12);
f = (f >> 32) + le32_to_cpu(digest[0]) + dctx->s[0];
put_unaligned_le32(f, dst + 0);
f = (f >> 32) + le32_to_cpu(digest[1]) + dctx->s[1];
put_unaligned_le32(f, dst + 4);
f = (f >> 32) + le32_to_cpu(digest[2]) + dctx->s[2];
put_unaligned_le32(f, dst + 8);
f = (f >> 32) + le32_to_cpu(digest[3]) + dctx->s[3];
put_unaligned_le32(f, dst + 12);
return 0;
}

View File

@ -35,9 +35,11 @@ static int crypto_default_rng_refcnt;
int crypto_rng_reset(struct crypto_rng *tfm, const u8 *seed, unsigned int slen)
{
struct crypto_alg *alg = tfm->base.__crt_alg;
u8 *buf = NULL;
int err;
crypto_stats_get(alg);
if (!seed && slen) {
buf = kmalloc(slen, GFP_KERNEL);
if (!buf)
@ -50,7 +52,7 @@ int crypto_rng_reset(struct crypto_rng *tfm, const u8 *seed, unsigned int slen)
}
err = crypto_rng_alg(tfm)->seed(tfm, seed, slen);
crypto_stat_rng_seed(tfm, err);
crypto_stats_rng_seed(alg, err);
out:
kzfree(buf);
return err;
@ -74,17 +76,13 @@ static int crypto_rng_report(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_rng rrng;
strncpy(rrng.type, "rng", sizeof(rrng.type));
memset(&rrng, 0, sizeof(rrng));
strscpy(rrng.type, "rng", sizeof(rrng.type));
rrng.seedsize = seedsize(alg);
if (nla_put(skb, CRYPTOCFGA_REPORT_RNG,
sizeof(struct crypto_report_rng), &rrng))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_REPORT_RNG, sizeof(rrng), &rrng);
}
#else
static int crypto_rng_report(struct sk_buff *skb, struct crypto_alg *alg)

View File

@ -159,7 +159,7 @@ static int salsa20_crypt(struct skcipher_request *req)
u32 state[16];
int err;
err = skcipher_walk_virt(&walk, req, true);
err = skcipher_walk_virt(&walk, req, false);
salsa20_init(state, ctx, walk.iv);

View File

@ -40,15 +40,12 @@ static int crypto_scomp_report(struct sk_buff *skb, struct crypto_alg *alg)
{
struct crypto_report_comp rscomp;
strncpy(rscomp.type, "scomp", sizeof(rscomp.type));
memset(&rscomp, 0, sizeof(rscomp));
if (nla_put(skb, CRYPTOCFGA_REPORT_COMPRESS,
sizeof(struct crypto_report_comp), &rscomp))
goto nla_put_failure;
return 0;
strscpy(rscomp.type, "scomp", sizeof(rscomp.type));
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_REPORT_COMPRESS,
sizeof(rscomp), &rscomp);
}
#else
static int crypto_scomp_report(struct sk_buff *skb, struct crypto_alg *alg)

View File

@ -408,18 +408,14 @@ static int crypto_shash_report(struct sk_buff *skb, struct crypto_alg *alg)
struct crypto_report_hash rhash;
struct shash_alg *salg = __crypto_shash_alg(alg);
strncpy(rhash.type, "shash", sizeof(rhash.type));
memset(&rhash, 0, sizeof(rhash));
strscpy(rhash.type, "shash", sizeof(rhash.type));
rhash.blocksize = alg->cra_blocksize;
rhash.digestsize = salg->digestsize;
if (nla_put(skb, CRYPTOCFGA_REPORT_HASH,
sizeof(struct crypto_report_hash), &rhash))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_REPORT_HASH, sizeof(rhash), &rhash);
}
#else
static int crypto_shash_report(struct sk_buff *skb, struct crypto_alg *alg)

View File

@ -474,6 +474,8 @@ int skcipher_walk_virt(struct skcipher_walk *walk,
{
int err;
might_sleep_if(req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP);
walk->flags &= ~SKCIPHER_WALK_PHYS;
err = skcipher_walk_skcipher(walk, req);
@ -577,8 +579,7 @@ static unsigned int crypto_skcipher_extsize(struct crypto_alg *alg)
if (alg->cra_type == &crypto_blkcipher_type)
return sizeof(struct crypto_blkcipher *);
if (alg->cra_type == &crypto_ablkcipher_type ||
alg->cra_type == &crypto_givcipher_type)
if (alg->cra_type == &crypto_ablkcipher_type)
return sizeof(struct crypto_ablkcipher *);
return crypto_alg_extsize(alg);
@ -842,8 +843,7 @@ static int crypto_skcipher_init_tfm(struct crypto_tfm *tfm)
if (tfm->__crt_alg->cra_type == &crypto_blkcipher_type)
return crypto_init_skcipher_ops_blkcipher(tfm);
if (tfm->__crt_alg->cra_type == &crypto_ablkcipher_type ||
tfm->__crt_alg->cra_type == &crypto_givcipher_type)
if (tfm->__crt_alg->cra_type == &crypto_ablkcipher_type)
return crypto_init_skcipher_ops_ablkcipher(tfm);
skcipher->setkey = skcipher_setkey;
@ -897,21 +897,18 @@ static int crypto_skcipher_report(struct sk_buff *skb, struct crypto_alg *alg)
struct skcipher_alg *skcipher = container_of(alg, struct skcipher_alg,
base);
strncpy(rblkcipher.type, "skcipher", sizeof(rblkcipher.type));
strncpy(rblkcipher.geniv, "<none>", sizeof(rblkcipher.geniv));
memset(&rblkcipher, 0, sizeof(rblkcipher));
strscpy(rblkcipher.type, "skcipher", sizeof(rblkcipher.type));
strscpy(rblkcipher.geniv, "<none>", sizeof(rblkcipher.geniv));
rblkcipher.blocksize = alg->cra_blocksize;
rblkcipher.min_keysize = skcipher->min_keysize;
rblkcipher.max_keysize = skcipher->max_keysize;
rblkcipher.ivsize = skcipher->ivsize;
if (nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
sizeof(struct crypto_report_blkcipher), &rblkcipher))
goto nla_put_failure;
return 0;
nla_put_failure:
return -EMSGSIZE;
return nla_put(skb, CRYPTOCFGA_REPORT_BLKCIPHER,
sizeof(rblkcipher), &rblkcipher);
}
#else
static int crypto_skcipher_report(struct sk_buff *skb, struct crypto_alg *alg)

1140
crypto/streebog_generic.c Normal file

File diff suppressed because it is too large Load Diff

View File

@ -76,10 +76,12 @@ static char *check[] = {
"cast6", "arc4", "michael_mic", "deflate", "crc32c", "tea", "xtea",
"khazad", "wp512", "wp384", "wp256", "tnepres", "xeta", "fcrypt",
"camellia", "seed", "salsa20", "rmd128", "rmd160", "rmd256", "rmd320",
"lzo", "cts", "sha3-224", "sha3-256", "sha3-384", "sha3-512", NULL
"lzo", "cts", "sha3-224", "sha3-256", "sha3-384", "sha3-512",
"streebog256", "streebog512",
NULL
};
static u32 block_sizes[] = { 16, 64, 256, 1024, 8192, 0 };
static u32 block_sizes[] = { 16, 64, 256, 1024, 1472, 8192, 0 };
static u32 aead_sizes[] = { 16, 64, 256, 512, 1024, 2048, 4096, 8192, 0 };
#define XBUFSIZE 8
@ -1736,6 +1738,7 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
ret += tcrypt_test("ctr(aes)");
ret += tcrypt_test("rfc3686(ctr(aes))");
ret += tcrypt_test("ofb(aes)");
ret += tcrypt_test("cfb(aes)");
break;
case 11:
@ -1913,6 +1916,14 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
ret += tcrypt_test("sm3");
break;
case 53:
ret += tcrypt_test("streebog256");
break;
case 54:
ret += tcrypt_test("streebog512");
break;
case 100:
ret += tcrypt_test("hmac(md5)");
break;
@ -1969,6 +1980,14 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
ret += tcrypt_test("hmac(sha3-512)");
break;
case 115:
ret += tcrypt_test("hmac(streebog256)");
break;
case 116:
ret += tcrypt_test("hmac(streebog512)");
break;
case 150:
ret += tcrypt_test("ansi_cprng");
break;
@ -2060,6 +2079,10 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
speed_template_16_24_32);
test_cipher_speed("ctr(aes)", DECRYPT, sec, NULL, 0,
speed_template_16_24_32);
test_cipher_speed("cfb(aes)", ENCRYPT, sec, NULL, 0,
speed_template_16_24_32);
test_cipher_speed("cfb(aes)", DECRYPT, sec, NULL, 0,
speed_template_16_24_32);
break;
case 201:
@ -2297,6 +2320,18 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
test_cipher_speed("ctr(sm4)", DECRYPT, sec, NULL, 0,
speed_template_16);
break;
case 219:
test_cipher_speed("adiantum(xchacha12,aes)", ENCRYPT, sec, NULL,
0, speed_template_32);
test_cipher_speed("adiantum(xchacha12,aes)", DECRYPT, sec, NULL,
0, speed_template_32);
test_cipher_speed("adiantum(xchacha20,aes)", ENCRYPT, sec, NULL,
0, speed_template_32);
test_cipher_speed("adiantum(xchacha20,aes)", DECRYPT, sec, NULL,
0, speed_template_32);
break;
case 300:
if (alg) {
test_hash_speed(alg, sec, generic_hash_speed_template);
@ -2407,6 +2442,16 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
test_hash_speed("sm3", sec, generic_hash_speed_template);
if (mode > 300 && mode < 400) break;
/* fall through */
case 327:
test_hash_speed("streebog256", sec,
generic_hash_speed_template);
if (mode > 300 && mode < 400) break;
/* fall through */
case 328:
test_hash_speed("streebog512", sec,
generic_hash_speed_template);
if (mode > 300 && mode < 400) break;
/* fall through */
case 399:
break;
@ -2520,6 +2565,16 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
num_mb);
if (mode > 400 && mode < 500) break;
/* fall through */
case 426:
test_mb_ahash_speed("streebog256", sec,
generic_hash_speed_template, num_mb);
if (mode > 400 && mode < 500) break;
/* fall through */
case 427:
test_mb_ahash_speed("streebog512", sec,
generic_hash_speed_template, num_mb);
if (mode > 400 && mode < 500) break;
/* fall through */
case 499:
break;

View File

@ -2404,6 +2404,18 @@ static int alg_test_null(const struct alg_test_desc *desc,
/* Please keep this list sorted by algorithm name. */
static const struct alg_test_desc alg_test_descs[] = {
{
.alg = "adiantum(xchacha12,aes)",
.test = alg_test_skcipher,
.suite = {
.cipher = __VECS(adiantum_xchacha12_aes_tv_template)
},
}, {
.alg = "adiantum(xchacha20,aes)",
.test = alg_test_skcipher,
.suite = {
.cipher = __VECS(adiantum_xchacha20_aes_tv_template)
},
}, {
.alg = "aegis128",
.test = alg_test_aead,
.suite = {
@ -2690,6 +2702,13 @@ static const struct alg_test_desc alg_test_descs[] = {
.dec = __VECS(aes_ccm_dec_tv_template)
}
}
}, {
.alg = "cfb(aes)",
.test = alg_test_skcipher,
.fips_allowed = 1,
.suite = {
.cipher = __VECS(aes_cfb_tv_template)
},
}, {
.alg = "chacha20",
.test = alg_test_skcipher,
@ -2805,6 +2824,7 @@ static const struct alg_test_desc alg_test_descs[] = {
}, {
.alg = "cts(cbc(aes))",
.test = alg_test_skcipher,
.fips_allowed = 1,
.suite = {
.cipher = __VECS(cts_mode_tv_template)
}
@ -3184,6 +3204,18 @@ static const struct alg_test_desc alg_test_descs[] = {
.suite = {
.hash = __VECS(hmac_sha512_tv_template)
}
}, {
.alg = "hmac(streebog256)",
.test = alg_test_hash,
.suite = {
.hash = __VECS(hmac_streebog256_tv_template)
}
}, {
.alg = "hmac(streebog512)",
.test = alg_test_hash,
.suite = {
.hash = __VECS(hmac_streebog512_tv_template)
}
}, {
.alg = "jitterentropy_rng",
.fips_allowed = 1,
@ -3291,6 +3323,12 @@ static const struct alg_test_desc alg_test_descs[] = {
.dec = __VECS(morus640_dec_tv_template),
}
}
}, {
.alg = "nhpoly1305",
.test = alg_test_hash,
.suite = {
.hash = __VECS(nhpoly1305_tv_template)
}
}, {
.alg = "ofb(aes)",
.test = alg_test_skcipher,
@ -3496,6 +3534,18 @@ static const struct alg_test_desc alg_test_descs[] = {
.suite = {
.hash = __VECS(sm3_tv_template)
}
}, {
.alg = "streebog256",
.test = alg_test_hash,
.suite = {
.hash = __VECS(streebog256_tv_template)
}
}, {
.alg = "streebog512",
.test = alg_test_hash,
.suite = {
.hash = __VECS(streebog512_tv_template)
}
}, {
.alg = "tgr128",
.test = alg_test_hash,
@ -3544,6 +3594,18 @@ static const struct alg_test_desc alg_test_descs[] = {
.suite = {
.hash = __VECS(aes_xcbc128_tv_template)
}
}, {
.alg = "xchacha12",
.test = alg_test_skcipher,
.suite = {
.cipher = __VECS(xchacha12_tv_template)
},
}, {
.alg = "xchacha20",
.test = alg_test_skcipher,
.suite = {
.cipher = __VECS(xchacha20_tv_template)
},
}, {
.alg = "xts(aes)",
.test = alg_test_skcipher,

File diff suppressed because it is too large Load Diff

View File

@ -3623,7 +3623,7 @@ static int receive_protocol(struct drbd_connection *connection, struct packet_in
* change.
*/
peer_integrity_tfm = crypto_alloc_shash(integrity_alg, 0, CRYPTO_ALG_ASYNC);
peer_integrity_tfm = crypto_alloc_shash(integrity_alg, 0, 0);
if (IS_ERR(peer_integrity_tfm)) {
peer_integrity_tfm = NULL;
drbd_err(connection, "peer data-integrity-alg %s not supported\n",

View File

@ -1,10 +1,7 @@
/**
// SPDX-License-Identifier: GPL-2.0
/*
* Copyright (c) 2010-2012 Broadcom. All rights reserved.
* Copyright (c) 2013 Lubomir Rintel
*
* This program is free software; you can redistribute it and/or
* modify it under the terms of the GNU General Public License ("GPL")
* version 2, as published by the Free Software Foundation.
*/
#include <linux/hw_random.h>

View File

@ -265,7 +265,7 @@
#include <linux/syscalls.h>
#include <linux/completion.h>
#include <linux/uuid.h>
#include <crypto/chacha20.h>
#include <crypto/chacha.h>
#include <asm/processor.h>
#include <linux/uaccess.h>
@ -431,11 +431,10 @@ static int crng_init = 0;
#define crng_ready() (likely(crng_init > 1))
static int crng_init_cnt = 0;
static unsigned long crng_global_init_time = 0;
#define CRNG_INIT_CNT_THRESH (2*CHACHA20_KEY_SIZE)
static void _extract_crng(struct crng_state *crng,
__u8 out[CHACHA20_BLOCK_SIZE]);
#define CRNG_INIT_CNT_THRESH (2*CHACHA_KEY_SIZE)
static void _extract_crng(struct crng_state *crng, __u8 out[CHACHA_BLOCK_SIZE]);
static void _crng_backtrack_protect(struct crng_state *crng,
__u8 tmp[CHACHA20_BLOCK_SIZE], int used);
__u8 tmp[CHACHA_BLOCK_SIZE], int used);
static void process_random_ready_list(void);
static void _get_random_bytes(void *buf, int nbytes);
@ -863,7 +862,7 @@ static int crng_fast_load(const char *cp, size_t len)
}
p = (unsigned char *) &primary_crng.state[4];
while (len > 0 && crng_init_cnt < CRNG_INIT_CNT_THRESH) {
p[crng_init_cnt % CHACHA20_KEY_SIZE] ^= *cp;
p[crng_init_cnt % CHACHA_KEY_SIZE] ^= *cp;
cp++; crng_init_cnt++; len--;
}
spin_unlock_irqrestore(&primary_crng.lock, flags);
@ -895,7 +894,7 @@ static int crng_slow_load(const char *cp, size_t len)
unsigned long flags;
static unsigned char lfsr = 1;
unsigned char tmp;
unsigned i, max = CHACHA20_KEY_SIZE;
unsigned i, max = CHACHA_KEY_SIZE;
const char * src_buf = cp;
char * dest_buf = (char *) &primary_crng.state[4];
@ -913,8 +912,8 @@ static int crng_slow_load(const char *cp, size_t len)
lfsr >>= 1;
if (tmp & 1)
lfsr ^= 0xE1;
tmp = dest_buf[i % CHACHA20_KEY_SIZE];
dest_buf[i % CHACHA20_KEY_SIZE] ^= src_buf[i % len] ^ lfsr;
tmp = dest_buf[i % CHACHA_KEY_SIZE];
dest_buf[i % CHACHA_KEY_SIZE] ^= src_buf[i % len] ^ lfsr;
lfsr += (tmp << 3) | (tmp >> 5);
}
spin_unlock_irqrestore(&primary_crng.lock, flags);
@ -926,7 +925,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
unsigned long flags;
int i, num;
union {
__u8 block[CHACHA20_BLOCK_SIZE];
__u8 block[CHACHA_BLOCK_SIZE];
__u32 key[8];
} buf;
@ -937,7 +936,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
} else {
_extract_crng(&primary_crng, buf.block);
_crng_backtrack_protect(&primary_crng, buf.block,
CHACHA20_KEY_SIZE);
CHACHA_KEY_SIZE);
}
spin_lock_irqsave(&crng->lock, flags);
for (i = 0; i < 8; i++) {
@ -973,7 +972,7 @@ static void crng_reseed(struct crng_state *crng, struct entropy_store *r)
}
static void _extract_crng(struct crng_state *crng,
__u8 out[CHACHA20_BLOCK_SIZE])
__u8 out[CHACHA_BLOCK_SIZE])
{
unsigned long v, flags;
@ -990,7 +989,7 @@ static void _extract_crng(struct crng_state *crng,
spin_unlock_irqrestore(&crng->lock, flags);
}
static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
static void extract_crng(__u8 out[CHACHA_BLOCK_SIZE])
{
struct crng_state *crng = NULL;
@ -1008,14 +1007,14 @@ static void extract_crng(__u8 out[CHACHA20_BLOCK_SIZE])
* enough) to mutate the CRNG key to provide backtracking protection.
*/
static void _crng_backtrack_protect(struct crng_state *crng,
__u8 tmp[CHACHA20_BLOCK_SIZE], int used)
__u8 tmp[CHACHA_BLOCK_SIZE], int used)
{
unsigned long flags;
__u32 *s, *d;
int i;
used = round_up(used, sizeof(__u32));
if (used + CHACHA20_KEY_SIZE > CHACHA20_BLOCK_SIZE) {
if (used + CHACHA_KEY_SIZE > CHACHA_BLOCK_SIZE) {
extract_crng(tmp);
used = 0;
}
@ -1027,7 +1026,7 @@ static void _crng_backtrack_protect(struct crng_state *crng,
spin_unlock_irqrestore(&crng->lock, flags);
}
static void crng_backtrack_protect(__u8 tmp[CHACHA20_BLOCK_SIZE], int used)
static void crng_backtrack_protect(__u8 tmp[CHACHA_BLOCK_SIZE], int used)
{
struct crng_state *crng = NULL;
@ -1042,8 +1041,8 @@ static void crng_backtrack_protect(__u8 tmp[CHACHA20_BLOCK_SIZE], int used)
static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
{
ssize_t ret = 0, i = CHACHA20_BLOCK_SIZE;
__u8 tmp[CHACHA20_BLOCK_SIZE] __aligned(4);
ssize_t ret = 0, i = CHACHA_BLOCK_SIZE;
__u8 tmp[CHACHA_BLOCK_SIZE] __aligned(4);
int large_request = (nbytes > 256);
while (nbytes) {
@ -1057,7 +1056,7 @@ static ssize_t extract_crng_user(void __user *buf, size_t nbytes)
}
extract_crng(tmp);
i = min_t(int, nbytes, CHACHA20_BLOCK_SIZE);
i = min_t(int, nbytes, CHACHA_BLOCK_SIZE);
if (copy_to_user(buf, tmp, i)) {
ret = -EFAULT;
break;
@ -1622,14 +1621,14 @@ static void _warn_unseeded_randomness(const char *func_name, void *caller,
*/
static void _get_random_bytes(void *buf, int nbytes)
{
__u8 tmp[CHACHA20_BLOCK_SIZE] __aligned(4);
__u8 tmp[CHACHA_BLOCK_SIZE] __aligned(4);
trace_get_random_bytes(nbytes, _RET_IP_);
while (nbytes >= CHACHA20_BLOCK_SIZE) {
while (nbytes >= CHACHA_BLOCK_SIZE) {
extract_crng(buf);
buf += CHACHA20_BLOCK_SIZE;
nbytes -= CHACHA20_BLOCK_SIZE;
buf += CHACHA_BLOCK_SIZE;
nbytes -= CHACHA_BLOCK_SIZE;
}
if (nbytes > 0) {
@ -1637,7 +1636,7 @@ static void _get_random_bytes(void *buf, int nbytes)
memcpy(buf, tmp, nbytes);
crng_backtrack_protect(tmp, nbytes);
} else
crng_backtrack_protect(tmp, CHACHA20_BLOCK_SIZE);
crng_backtrack_protect(tmp, CHACHA_BLOCK_SIZE);
memzero_explicit(tmp, sizeof(tmp));
}
@ -2208,8 +2207,8 @@ struct ctl_table random_table[] = {
struct batched_entropy {
union {
u64 entropy_u64[CHACHA20_BLOCK_SIZE / sizeof(u64)];
u32 entropy_u32[CHACHA20_BLOCK_SIZE / sizeof(u32)];
u64 entropy_u64[CHACHA_BLOCK_SIZE / sizeof(u64)];
u32 entropy_u32[CHACHA_BLOCK_SIZE / sizeof(u32)];
};
unsigned int position;
};

View File

@ -762,10 +762,12 @@ config CRYPTO_DEV_CCREE
select CRYPTO_ECB
select CRYPTO_CTR
select CRYPTO_XTS
select CRYPTO_SM4
select CRYPTO_SM3
help
Say 'Y' to enable a driver for the REE interface of the Arm
TrustZone CryptoCell family of processors. Currently the
CryptoCell 712, 710 and 630 are supported.
CryptoCell 713, 703, 712, 710 and 630 are supported.
Choose this if you wish to use hardware acceleration of
cryptographic operations on the system REE.
If unsure say Y.

View File

@ -520,8 +520,7 @@ static int crypto4xx_compute_gcm_hash_key_sw(__le32 *hash_start, const u8 *key,
uint8_t src[16] = { 0 };
int rc = 0;
aes_tfm = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC |
CRYPTO_ALG_NEED_FALLBACK);
aes_tfm = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_NEED_FALLBACK);
if (IS_ERR(aes_tfm)) {
rc = PTR_ERR(aes_tfm);
pr_warn("could not load aes cipher driver: %d\n", rc);

View File

@ -3868,7 +3868,6 @@ static struct iproc_alg_s driver_algs[] = {
.cra_driver_name = "ctr-aes-iproc",
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ablkcipher = {
/* .geniv = "chainiv", */
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
@ -4605,7 +4604,6 @@ static int spu_register_ablkcipher(struct iproc_alg_s *driver_alg)
crypto->cra_priority = cipher_pri;
crypto->cra_alignmask = 0;
crypto->cra_ctxsize = sizeof(struct iproc_ctx_s);
INIT_LIST_HEAD(&crypto->cra_list);
crypto->cra_init = ablkcipher_cra_init;
crypto->cra_exit = generic_cra_exit;
@ -4652,12 +4650,16 @@ static int spu_register_ahash(struct iproc_alg_s *driver_alg)
hash->halg.statesize = sizeof(struct spu_hash_export_s);
if (driver_alg->auth_info.mode != HASH_MODE_HMAC) {
hash->setkey = ahash_setkey;
hash->init = ahash_init;
hash->update = ahash_update;
hash->final = ahash_final;
hash->finup = ahash_finup;
hash->digest = ahash_digest;
if ((driver_alg->auth_info.alg == HASH_ALG_AES) &&
((driver_alg->auth_info.mode == HASH_MODE_XCBC) ||
(driver_alg->auth_info.mode == HASH_MODE_CMAC))) {
hash->setkey = ahash_setkey;
}
} else {
hash->setkey = ahash_hmac_setkey;
hash->init = ahash_hmac_init;
@ -4687,7 +4689,6 @@ static int spu_register_aead(struct iproc_alg_s *driver_alg)
aead->base.cra_priority = aead_pri;
aead->base.cra_alignmask = 0;
aead->base.cra_ctxsize = sizeof(struct iproc_ctx_s);
INIT_LIST_HEAD(&aead->base.cra_list);
aead->base.cra_flags |= CRYPTO_ALG_ASYNC;
/* setkey set in alg initialization */

View File

@ -72,6 +72,8 @@
#define AUTHENC_DESC_JOB_IO_LEN (AEAD_DESC_JOB_IO_LEN + \
CAAM_CMD_SZ * 5)
#define CHACHAPOLY_DESC_JOB_IO_LEN (AEAD_DESC_JOB_IO_LEN + CAAM_CMD_SZ * 6)
#define DESC_MAX_USED_BYTES (CAAM_DESC_BYTES_MAX - DESC_JOB_IO_LEN)
#define DESC_MAX_USED_LEN (DESC_MAX_USED_BYTES / CAAM_CMD_SZ)
@ -513,6 +515,61 @@ static int rfc4543_setauthsize(struct crypto_aead *authenc,
return 0;
}
static int chachapoly_set_sh_desc(struct crypto_aead *aead)
{
struct caam_ctx *ctx = crypto_aead_ctx(aead);
struct device *jrdev = ctx->jrdev;
unsigned int ivsize = crypto_aead_ivsize(aead);
u32 *desc;
if (!ctx->cdata.keylen || !ctx->authsize)
return 0;
desc = ctx->sh_desc_enc;
cnstr_shdsc_chachapoly(desc, &ctx->cdata, &ctx->adata, ivsize,
ctx->authsize, true, false);
dma_sync_single_for_device(jrdev, ctx->sh_desc_enc_dma,
desc_bytes(desc), ctx->dir);
desc = ctx->sh_desc_dec;
cnstr_shdsc_chachapoly(desc, &ctx->cdata, &ctx->adata, ivsize,
ctx->authsize, false, false);
dma_sync_single_for_device(jrdev, ctx->sh_desc_dec_dma,
desc_bytes(desc), ctx->dir);
return 0;
}
static int chachapoly_setauthsize(struct crypto_aead *aead,
unsigned int authsize)
{
struct caam_ctx *ctx = crypto_aead_ctx(aead);
if (authsize != POLY1305_DIGEST_SIZE)
return -EINVAL;
ctx->authsize = authsize;
return chachapoly_set_sh_desc(aead);
}
static int chachapoly_setkey(struct crypto_aead *aead, const u8 *key,
unsigned int keylen)
{
struct caam_ctx *ctx = crypto_aead_ctx(aead);
unsigned int ivsize = crypto_aead_ivsize(aead);
unsigned int saltlen = CHACHAPOLY_IV_SIZE - ivsize;
if (keylen != CHACHA_KEY_SIZE + saltlen) {
crypto_aead_set_flags(aead, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
ctx->cdata.key_virt = key;
ctx->cdata.keylen = keylen - saltlen;
return chachapoly_set_sh_desc(aead);
}
static int aead_setkey(struct crypto_aead *aead,
const u8 *key, unsigned int keylen)
{
@ -1031,6 +1088,40 @@ static void init_gcm_job(struct aead_request *req,
/* End of blank commands */
}
static void init_chachapoly_job(struct aead_request *req,
struct aead_edesc *edesc, bool all_contig,
bool encrypt)
{
struct crypto_aead *aead = crypto_aead_reqtfm(req);
unsigned int ivsize = crypto_aead_ivsize(aead);
unsigned int assoclen = req->assoclen;
u32 *desc = edesc->hw_desc;
u32 ctx_iv_off = 4;
init_aead_job(req, edesc, all_contig, encrypt);
if (ivsize != CHACHAPOLY_IV_SIZE) {
/* IPsec specific: CONTEXT1[223:128] = {NONCE, IV} */
ctx_iv_off += 4;
/*
* The associated data comes already with the IV but we need
* to skip it when we authenticate or encrypt...
*/
assoclen -= ivsize;
}
append_math_add_imm_u32(desc, REG3, ZERO, IMM, assoclen);
/*
* For IPsec load the IV further in the same register.
* For RFC7539 simply load the 12 bytes nonce in a single operation
*/
append_load_as_imm(desc, req->iv, ivsize, LDST_CLASS_1_CCB |
LDST_SRCDST_BYTE_CONTEXT |
ctx_iv_off << LDST_OFFSET_SHIFT);
}
static void init_authenc_job(struct aead_request *req,
struct aead_edesc *edesc,
bool all_contig, bool encrypt)
@ -1289,6 +1380,72 @@ static int gcm_encrypt(struct aead_request *req)
return ret;
}
static int chachapoly_encrypt(struct aead_request *req)
{
struct aead_edesc *edesc;
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct caam_ctx *ctx = crypto_aead_ctx(aead);
struct device *jrdev = ctx->jrdev;
bool all_contig;
u32 *desc;
int ret;
edesc = aead_edesc_alloc(req, CHACHAPOLY_DESC_JOB_IO_LEN, &all_contig,
true);
if (IS_ERR(edesc))
return PTR_ERR(edesc);
desc = edesc->hw_desc;
init_chachapoly_job(req, edesc, all_contig, true);
print_hex_dump_debug("chachapoly jobdesc@" __stringify(__LINE__)": ",
DUMP_PREFIX_ADDRESS, 16, 4, desc, desc_bytes(desc),
1);
ret = caam_jr_enqueue(jrdev, desc, aead_encrypt_done, req);
if (!ret) {
ret = -EINPROGRESS;
} else {
aead_unmap(jrdev, edesc, req);
kfree(edesc);
}
return ret;
}
static int chachapoly_decrypt(struct aead_request *req)
{
struct aead_edesc *edesc;
struct crypto_aead *aead = crypto_aead_reqtfm(req);
struct caam_ctx *ctx = crypto_aead_ctx(aead);
struct device *jrdev = ctx->jrdev;
bool all_contig;
u32 *desc;
int ret;
edesc = aead_edesc_alloc(req, CHACHAPOLY_DESC_JOB_IO_LEN, &all_contig,
false);
if (IS_ERR(edesc))
return PTR_ERR(edesc);
desc = edesc->hw_desc;
init_chachapoly_job(req, edesc, all_contig, false);
print_hex_dump_debug("chachapoly jobdesc@" __stringify(__LINE__)": ",
DUMP_PREFIX_ADDRESS, 16, 4, desc, desc_bytes(desc),
1);
ret = caam_jr_enqueue(jrdev, desc, aead_decrypt_done, req);
if (!ret) {
ret = -EINPROGRESS;
} else {
aead_unmap(jrdev, edesc, req);
kfree(edesc);
}
return ret;
}
static int ipsec_gcm_encrypt(struct aead_request *req)
{
if (req->assoclen < 8)
@ -3002,6 +3159,50 @@ static struct caam_aead_alg driver_aeads[] = {
.geniv = true,
},
},
{
.aead = {
.base = {
.cra_name = "rfc7539(chacha20,poly1305)",
.cra_driver_name = "rfc7539-chacha20-poly1305-"
"caam",
.cra_blocksize = 1,
},
.setkey = chachapoly_setkey,
.setauthsize = chachapoly_setauthsize,
.encrypt = chachapoly_encrypt,
.decrypt = chachapoly_decrypt,
.ivsize = CHACHAPOLY_IV_SIZE,
.maxauthsize = POLY1305_DIGEST_SIZE,
},
.caam = {
.class1_alg_type = OP_ALG_ALGSEL_CHACHA20 |
OP_ALG_AAI_AEAD,
.class2_alg_type = OP_ALG_ALGSEL_POLY1305 |
OP_ALG_AAI_AEAD,
},
},
{
.aead = {
.base = {
.cra_name = "rfc7539esp(chacha20,poly1305)",
.cra_driver_name = "rfc7539esp-chacha20-"
"poly1305-caam",
.cra_blocksize = 1,
},
.setkey = chachapoly_setkey,
.setauthsize = chachapoly_setauthsize,
.encrypt = chachapoly_encrypt,
.decrypt = chachapoly_decrypt,
.ivsize = 8,
.maxauthsize = POLY1305_DIGEST_SIZE,
},
.caam = {
.class1_alg_type = OP_ALG_ALGSEL_CHACHA20 |
OP_ALG_AAI_AEAD,
.class2_alg_type = OP_ALG_ALGSEL_POLY1305 |
OP_ALG_AAI_AEAD,
},
},
};
static int caam_init_common(struct caam_ctx *ctx, struct caam_alg_entry *caam,
@ -3135,7 +3336,7 @@ static int __init caam_algapi_init(void)
struct device *ctrldev;
struct caam_drv_private *priv;
int i = 0, err = 0;
u32 cha_vid, cha_inst, des_inst, aes_inst, md_inst;
u32 aes_vid, aes_inst, des_inst, md_vid, md_inst, ccha_inst, ptha_inst;
unsigned int md_limit = SHA512_DIGEST_SIZE;
bool registered = false;
@ -3168,14 +3369,38 @@ static int __init caam_algapi_init(void)
* Register crypto algorithms the device supports.
* First, detect presence and attributes of DES, AES, and MD blocks.
*/
cha_vid = rd_reg32(&priv->ctrl->perfmon.cha_id_ls);
cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
des_inst = (cha_inst & CHA_ID_LS_DES_MASK) >> CHA_ID_LS_DES_SHIFT;
aes_inst = (cha_inst & CHA_ID_LS_AES_MASK) >> CHA_ID_LS_AES_SHIFT;
md_inst = (cha_inst & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
if (priv->era < 10) {
u32 cha_vid, cha_inst;
cha_vid = rd_reg32(&priv->ctrl->perfmon.cha_id_ls);
aes_vid = cha_vid & CHA_ID_LS_AES_MASK;
md_vid = (cha_vid & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
des_inst = (cha_inst & CHA_ID_LS_DES_MASK) >>
CHA_ID_LS_DES_SHIFT;
aes_inst = cha_inst & CHA_ID_LS_AES_MASK;
md_inst = (cha_inst & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
ccha_inst = 0;
ptha_inst = 0;
} else {
u32 aesa, mdha;
aesa = rd_reg32(&priv->ctrl->vreg.aesa);
mdha = rd_reg32(&priv->ctrl->vreg.mdha);
aes_vid = (aesa & CHA_VER_VID_MASK) >> CHA_VER_VID_SHIFT;
md_vid = (mdha & CHA_VER_VID_MASK) >> CHA_VER_VID_SHIFT;
des_inst = rd_reg32(&priv->ctrl->vreg.desa) & CHA_VER_NUM_MASK;
aes_inst = aesa & CHA_VER_NUM_MASK;
md_inst = mdha & CHA_VER_NUM_MASK;
ccha_inst = rd_reg32(&priv->ctrl->vreg.ccha) & CHA_VER_NUM_MASK;
ptha_inst = rd_reg32(&priv->ctrl->vreg.ptha) & CHA_VER_NUM_MASK;
}
/* If MD is present, limit digest size based on LP256 */
if (md_inst && ((cha_vid & CHA_ID_LS_MD_MASK) == CHA_ID_LS_MD_LP256))
if (md_inst && md_vid == CHA_VER_VID_MD_LP256)
md_limit = SHA256_DIGEST_SIZE;
for (i = 0; i < ARRAY_SIZE(driver_algs); i++) {
@ -3196,10 +3421,10 @@ static int __init caam_algapi_init(void)
* Check support for AES modes not available
* on LP devices.
*/
if ((cha_vid & CHA_ID_LS_AES_MASK) == CHA_ID_LS_AES_LP)
if ((t_alg->caam.class1_alg_type & OP_ALG_AAI_MASK) ==
OP_ALG_AAI_XTS)
continue;
if (aes_vid == CHA_VER_VID_AES_LP &&
(t_alg->caam.class1_alg_type & OP_ALG_AAI_MASK) ==
OP_ALG_AAI_XTS)
continue;
caam_skcipher_alg_init(t_alg);
@ -3232,21 +3457,28 @@ static int __init caam_algapi_init(void)
if (!aes_inst && (c1_alg_sel == OP_ALG_ALGSEL_AES))
continue;
/* Skip CHACHA20 algorithms if not supported by device */
if (c1_alg_sel == OP_ALG_ALGSEL_CHACHA20 && !ccha_inst)
continue;
/* Skip POLY1305 algorithms if not supported by device */
if (c2_alg_sel == OP_ALG_ALGSEL_POLY1305 && !ptha_inst)
continue;
/*
* Check support for AES algorithms not available
* on LP devices.
*/
if ((cha_vid & CHA_ID_LS_AES_MASK) == CHA_ID_LS_AES_LP)
if (alg_aai == OP_ALG_AAI_GCM)
continue;
if (aes_vid == CHA_VER_VID_AES_LP && alg_aai == OP_ALG_AAI_GCM)
continue;
/*
* Skip algorithms requiring message digests
* if MD or MD size is not supported by device.
*/
if (c2_alg_sel &&
(!md_inst || (t_alg->aead.maxauthsize > md_limit)))
continue;
if ((c2_alg_sel & ~OP_ALG_ALGSEL_SUBMASK) == 0x40 &&
(!md_inst || t_alg->aead.maxauthsize > md_limit))
continue;
caam_aead_alg_init(t_alg);

View File

@ -1213,6 +1213,139 @@ void cnstr_shdsc_rfc4543_decap(u32 * const desc, struct alginfo *cdata,
}
EXPORT_SYMBOL(cnstr_shdsc_rfc4543_decap);
/**
* cnstr_shdsc_chachapoly - Chacha20 + Poly1305 generic AEAD (rfc7539) and
* IPsec ESP (rfc7634, a.k.a. rfc7539esp) shared
* descriptor (non-protocol).
* @desc: pointer to buffer used for descriptor construction
* @cdata: pointer to block cipher transform definitions
* Valid algorithm values - OP_ALG_ALGSEL_CHACHA20 ANDed with
* OP_ALG_AAI_AEAD.
* @adata: pointer to authentication transform definitions
* Valid algorithm values - OP_ALG_ALGSEL_POLY1305 ANDed with
* OP_ALG_AAI_AEAD.
* @ivsize: initialization vector size
* @icvsize: integrity check value (ICV) size (truncated or full)
* @encap: true if encapsulation, false if decapsulation
* @is_qi: true when called from caam/qi
*/
void cnstr_shdsc_chachapoly(u32 * const desc, struct alginfo *cdata,
struct alginfo *adata, unsigned int ivsize,
unsigned int icvsize, const bool encap,
const bool is_qi)
{
u32 *key_jump_cmd, *wait_cmd;
u32 nfifo;
const bool is_ipsec = (ivsize != CHACHAPOLY_IV_SIZE);
/* Note: Context registers are saved. */
init_sh_desc(desc, HDR_SHARE_SERIAL | HDR_SAVECTX);
/* skip key loading if they are loaded due to sharing */
key_jump_cmd = append_jump(desc, JUMP_JSL | JUMP_TEST_ALL |
JUMP_COND_SHRD);
append_key_as_imm(desc, cdata->key_virt, cdata->keylen, cdata->keylen,
CLASS_1 | KEY_DEST_CLASS_REG);
/* For IPsec load the salt from keymat in the context register */
if (is_ipsec)
append_load_as_imm(desc, cdata->key_virt + cdata->keylen, 4,
LDST_CLASS_1_CCB | LDST_SRCDST_BYTE_CONTEXT |
4 << LDST_OFFSET_SHIFT);
set_jump_tgt_here(desc, key_jump_cmd);
/* Class 2 and 1 operations: Poly & ChaCha */
if (encap) {
append_operation(desc, adata->algtype | OP_ALG_AS_INITFINAL |
OP_ALG_ENCRYPT);
append_operation(desc, cdata->algtype | OP_ALG_AS_INITFINAL |
OP_ALG_ENCRYPT);
} else {
append_operation(desc, adata->algtype | OP_ALG_AS_INITFINAL |
OP_ALG_DECRYPT | OP_ALG_ICV_ON);
append_operation(desc, cdata->algtype | OP_ALG_AS_INITFINAL |
OP_ALG_DECRYPT);
}
if (is_qi) {
u32 *wait_load_cmd;
u32 ctx1_iv_off = is_ipsec ? 8 : 4;
/* REG3 = assoclen */
append_seq_load(desc, 4, LDST_CLASS_DECO |
LDST_SRCDST_WORD_DECO_MATH3 |
4 << LDST_OFFSET_SHIFT);
wait_load_cmd = append_jump(desc, JUMP_JSL | JUMP_TEST_ALL |
JUMP_COND_CALM | JUMP_COND_NCP |
JUMP_COND_NOP | JUMP_COND_NIP |
JUMP_COND_NIFP);
set_jump_tgt_here(desc, wait_load_cmd);
append_seq_load(desc, ivsize, LDST_CLASS_1_CCB |
LDST_SRCDST_BYTE_CONTEXT |
ctx1_iv_off << LDST_OFFSET_SHIFT);
}
/*
* MAGIC with NFIFO
* Read associated data from the input and send them to class1 and
* class2 alignment blocks. From class1 send data to output fifo and
* then write it to memory since we don't need to encrypt AD.
*/
nfifo = NFIFOENTRY_DEST_BOTH | NFIFOENTRY_FC1 | NFIFOENTRY_FC2 |
NFIFOENTRY_DTYPE_POLY | NFIFOENTRY_BND;
append_load_imm_u32(desc, nfifo, LDST_CLASS_IND_CCB |
LDST_SRCDST_WORD_INFO_FIFO_SM | LDLEN_MATH3);
append_math_add(desc, VARSEQINLEN, ZERO, REG3, CAAM_CMD_SZ);
append_math_add(desc, VARSEQOUTLEN, ZERO, REG3, CAAM_CMD_SZ);
append_seq_fifo_load(desc, 0, FIFOLD_TYPE_NOINFOFIFO |
FIFOLD_CLASS_CLASS1 | LDST_VLF);
append_move_len(desc, MOVE_AUX_LS | MOVE_SRC_AUX_ABLK |
MOVE_DEST_OUTFIFO | MOVELEN_MRSEL_MATH3);
append_seq_fifo_store(desc, 0, FIFOST_TYPE_MESSAGE_DATA | LDST_VLF);
/* IPsec - copy IV at the output */
if (is_ipsec)
append_seq_fifo_store(desc, ivsize, FIFOST_TYPE_METADATA |
0x2 << 25);
wait_cmd = append_jump(desc, JUMP_JSL | JUMP_TYPE_LOCAL |
JUMP_COND_NOP | JUMP_TEST_ALL);
set_jump_tgt_here(desc, wait_cmd);
if (encap) {
/* Read and write cryptlen bytes */
append_math_add(desc, VARSEQINLEN, SEQINLEN, REG0, CAAM_CMD_SZ);
append_math_add(desc, VARSEQOUTLEN, SEQINLEN, REG0,
CAAM_CMD_SZ);
aead_append_src_dst(desc, FIFOLD_TYPE_MSG1OUT2);
/* Write ICV */
append_seq_store(desc, icvsize, LDST_CLASS_2_CCB |
LDST_SRCDST_BYTE_CONTEXT);
} else {
/* Read and write cryptlen bytes */
append_math_add(desc, VARSEQINLEN, SEQOUTLEN, REG0,
CAAM_CMD_SZ);
append_math_add(desc, VARSEQOUTLEN, SEQOUTLEN, REG0,
CAAM_CMD_SZ);
aead_append_src_dst(desc, FIFOLD_TYPE_MSG);
/* Load ICV for verification */
append_seq_fifo_load(desc, icvsize, FIFOLD_CLASS_CLASS2 |
FIFOLD_TYPE_LAST2 | FIFOLD_TYPE_ICV);
}
print_hex_dump_debug("chachapoly shdesc@" __stringify(__LINE__)": ",
DUMP_PREFIX_ADDRESS, 16, 4, desc, desc_bytes(desc),
1);
}
EXPORT_SYMBOL(cnstr_shdsc_chachapoly);
/* For skcipher encrypt and decrypt, read from req->src and write to req->dst */
static inline void skcipher_append_src_dst(u32 *desc)
{
@ -1228,7 +1361,8 @@ static inline void skcipher_append_src_dst(u32 *desc)
* @desc: pointer to buffer used for descriptor construction
* @cdata: pointer to block cipher transform definitions
* Valid algorithm values - one of OP_ALG_ALGSEL_{AES, DES, 3DES} ANDed
* with OP_ALG_AAI_CBC or OP_ALG_AAI_CTR_MOD128.
* with OP_ALG_AAI_CBC or OP_ALG_AAI_CTR_MOD128
* - OP_ALG_ALGSEL_CHACHA20
* @ivsize: initialization vector size
* @is_rfc3686: true when ctr(aes) is wrapped by rfc3686 template
* @ctx1_iv_off: IV offset in CONTEXT1 register
@ -1293,7 +1427,8 @@ EXPORT_SYMBOL(cnstr_shdsc_skcipher_encap);
* @desc: pointer to buffer used for descriptor construction
* @cdata: pointer to block cipher transform definitions
* Valid algorithm values - one of OP_ALG_ALGSEL_{AES, DES, 3DES} ANDed
* with OP_ALG_AAI_CBC or OP_ALG_AAI_CTR_MOD128.
* with OP_ALG_AAI_CBC or OP_ALG_AAI_CTR_MOD128
* - OP_ALG_ALGSEL_CHACHA20
* @ivsize: initialization vector size
* @is_rfc3686: true when ctr(aes) is wrapped by rfc3686 template
* @ctx1_iv_off: IV offset in CONTEXT1 register

View File

@ -96,6 +96,11 @@ void cnstr_shdsc_rfc4543_decap(u32 * const desc, struct alginfo *cdata,
unsigned int ivsize, unsigned int icvsize,
const bool is_qi);
void cnstr_shdsc_chachapoly(u32 * const desc, struct alginfo *cdata,
struct alginfo *adata, unsigned int ivsize,
unsigned int icvsize, const bool encap,
const bool is_qi);
void cnstr_shdsc_skcipher_encap(u32 * const desc, struct alginfo *cdata,
unsigned int ivsize, const bool is_rfc3686,
const u32 ctx1_iv_off);

View File

@ -2462,7 +2462,7 @@ static int __init caam_qi_algapi_init(void)
struct device *ctrldev;
struct caam_drv_private *priv;
int i = 0, err = 0;
u32 cha_vid, cha_inst, des_inst, aes_inst, md_inst;
u32 aes_vid, aes_inst, des_inst, md_vid, md_inst;
unsigned int md_limit = SHA512_DIGEST_SIZE;
bool registered = false;
@ -2497,14 +2497,34 @@ static int __init caam_qi_algapi_init(void)
* Register crypto algorithms the device supports.
* First, detect presence and attributes of DES, AES, and MD blocks.
*/
cha_vid = rd_reg32(&priv->ctrl->perfmon.cha_id_ls);
cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
des_inst = (cha_inst & CHA_ID_LS_DES_MASK) >> CHA_ID_LS_DES_SHIFT;
aes_inst = (cha_inst & CHA_ID_LS_AES_MASK) >> CHA_ID_LS_AES_SHIFT;
md_inst = (cha_inst & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
if (priv->era < 10) {
u32 cha_vid, cha_inst;
cha_vid = rd_reg32(&priv->ctrl->perfmon.cha_id_ls);
aes_vid = cha_vid & CHA_ID_LS_AES_MASK;
md_vid = (cha_vid & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
des_inst = (cha_inst & CHA_ID_LS_DES_MASK) >>
CHA_ID_LS_DES_SHIFT;
aes_inst = cha_inst & CHA_ID_LS_AES_MASK;
md_inst = (cha_inst & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
} else {
u32 aesa, mdha;
aesa = rd_reg32(&priv->ctrl->vreg.aesa);
mdha = rd_reg32(&priv->ctrl->vreg.mdha);
aes_vid = (aesa & CHA_VER_VID_MASK) >> CHA_VER_VID_SHIFT;
md_vid = (mdha & CHA_VER_VID_MASK) >> CHA_VER_VID_SHIFT;
des_inst = rd_reg32(&priv->ctrl->vreg.desa) & CHA_VER_NUM_MASK;
aes_inst = aesa & CHA_VER_NUM_MASK;
md_inst = mdha & CHA_VER_NUM_MASK;
}
/* If MD is present, limit digest size based on LP256 */
if (md_inst && ((cha_vid & CHA_ID_LS_MD_MASK) == CHA_ID_LS_MD_LP256))
if (md_inst && md_vid == CHA_VER_VID_MD_LP256)
md_limit = SHA256_DIGEST_SIZE;
for (i = 0; i < ARRAY_SIZE(driver_algs); i++) {
@ -2556,8 +2576,7 @@ static int __init caam_qi_algapi_init(void)
* Check support for AES algorithms not available
* on LP devices.
*/
if (((cha_vid & CHA_ID_LS_AES_MASK) == CHA_ID_LS_AES_LP) &&
(alg_aai == OP_ALG_AAI_GCM))
if (aes_vid == CHA_VER_VID_AES_LP && alg_aai == OP_ALG_AAI_GCM)
continue;
/*

View File

@ -462,7 +462,15 @@ static struct aead_edesc *aead_edesc_alloc(struct aead_request *req,
edesc->dst_nents = dst_nents;
edesc->iv_dma = iv_dma;
edesc->assoclen = cpu_to_caam32(req->assoclen);
if ((alg->caam.class1_alg_type & OP_ALG_ALGSEL_MASK) ==
OP_ALG_ALGSEL_CHACHA20 && ivsize != CHACHAPOLY_IV_SIZE)
/*
* The associated data comes already with the IV but we need
* to skip it when we authenticate or encrypt...
*/
edesc->assoclen = cpu_to_caam32(req->assoclen - ivsize);
else
edesc->assoclen = cpu_to_caam32(req->assoclen);
edesc->assoclen_dma = dma_map_single(dev, &edesc->assoclen, 4,
DMA_TO_DEVICE);
if (dma_mapping_error(dev, edesc->assoclen_dma)) {
@ -532,6 +540,68 @@ static struct aead_edesc *aead_edesc_alloc(struct aead_request *req,
return edesc;
}
static int chachapoly_set_sh_desc(struct crypto_aead *aead)
{
struct caam_ctx *ctx = crypto_aead_ctx(aead);
unsigned int ivsize = crypto_aead_ivsize(aead);
struct device *dev = ctx->dev;
struct caam_flc *flc;
u32 *desc;
if (!ctx->cdata.keylen || !ctx->authsize)
return 0;
flc = &ctx->flc[ENCRYPT];
desc = flc->sh_desc;
cnstr_shdsc_chachapoly(desc, &ctx->cdata, &ctx->adata, ivsize,
ctx->authsize, true, true);
flc->flc[1] = cpu_to_caam32(desc_len(desc)); /* SDL */
dma_sync_single_for_device(dev, ctx->flc_dma[ENCRYPT],
sizeof(flc->flc) + desc_bytes(desc),
ctx->dir);
flc = &ctx->flc[DECRYPT];
desc = flc->sh_desc;
cnstr_shdsc_chachapoly(desc, &ctx->cdata, &ctx->adata, ivsize,
ctx->authsize, false, true);
flc->flc[1] = cpu_to_caam32(desc_len(desc)); /* SDL */
dma_sync_single_for_device(dev, ctx->flc_dma[DECRYPT],
sizeof(flc->flc) + desc_bytes(desc),
ctx->dir);
return 0;
}
static int chachapoly_setauthsize(struct crypto_aead *aead,
unsigned int authsize)
{
struct caam_ctx *ctx = crypto_aead_ctx(aead);
if (authsize != POLY1305_DIGEST_SIZE)
return -EINVAL;
ctx->authsize = authsize;
return chachapoly_set_sh_desc(aead);
}
static int chachapoly_setkey(struct crypto_aead *aead, const u8 *key,
unsigned int keylen)
{
struct caam_ctx *ctx = crypto_aead_ctx(aead);
unsigned int ivsize = crypto_aead_ivsize(aead);
unsigned int saltlen = CHACHAPOLY_IV_SIZE - ivsize;
if (keylen != CHACHA_KEY_SIZE + saltlen) {
crypto_aead_set_flags(aead, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
ctx->cdata.key_virt = key;
ctx->cdata.keylen = keylen - saltlen;
return chachapoly_set_sh_desc(aead);
}
static int gcm_set_sh_desc(struct crypto_aead *aead)
{
struct caam_ctx *ctx = crypto_aead_ctx(aead);
@ -816,7 +886,9 @@ static int skcipher_setkey(struct crypto_skcipher *skcipher, const u8 *key,
u32 *desc;
u32 ctx1_iv_off = 0;
const bool ctr_mode = ((ctx->cdata.algtype & OP_ALG_AAI_MASK) ==
OP_ALG_AAI_CTR_MOD128);
OP_ALG_AAI_CTR_MOD128) &&
((ctx->cdata.algtype & OP_ALG_ALGSEL_MASK) !=
OP_ALG_ALGSEL_CHACHA20);
const bool is_rfc3686 = alg->caam.rfc3686;
print_hex_dump_debug("key in @" __stringify(__LINE__)": ",
@ -1494,7 +1566,23 @@ static struct caam_skcipher_alg driver_algs[] = {
.ivsize = AES_BLOCK_SIZE,
},
.caam.class1_alg_type = OP_ALG_ALGSEL_AES | OP_ALG_AAI_XTS,
}
},
{
.skcipher = {
.base = {
.cra_name = "chacha20",
.cra_driver_name = "chacha20-caam-qi2",
.cra_blocksize = 1,
},
.setkey = skcipher_setkey,
.encrypt = skcipher_encrypt,
.decrypt = skcipher_decrypt,
.min_keysize = CHACHA_KEY_SIZE,
.max_keysize = CHACHA_KEY_SIZE,
.ivsize = CHACHA_IV_SIZE,
},
.caam.class1_alg_type = OP_ALG_ALGSEL_CHACHA20,
},
};
static struct caam_aead_alg driver_aeads[] = {
@ -2608,6 +2696,50 @@ static struct caam_aead_alg driver_aeads[] = {
.geniv = true,
},
},
{
.aead = {
.base = {
.cra_name = "rfc7539(chacha20,poly1305)",
.cra_driver_name = "rfc7539-chacha20-poly1305-"
"caam-qi2",
.cra_blocksize = 1,
},
.setkey = chachapoly_setkey,
.setauthsize = chachapoly_setauthsize,
.encrypt = aead_encrypt,
.decrypt = aead_decrypt,
.ivsize = CHACHAPOLY_IV_SIZE,
.maxauthsize = POLY1305_DIGEST_SIZE,
},
.caam = {
.class1_alg_type = OP_ALG_ALGSEL_CHACHA20 |
OP_ALG_AAI_AEAD,
.class2_alg_type = OP_ALG_ALGSEL_POLY1305 |
OP_ALG_AAI_AEAD,
},
},
{
.aead = {
.base = {
.cra_name = "rfc7539esp(chacha20,poly1305)",
.cra_driver_name = "rfc7539esp-chacha20-"
"poly1305-caam-qi2",
.cra_blocksize = 1,
},
.setkey = chachapoly_setkey,
.setauthsize = chachapoly_setauthsize,
.encrypt = aead_encrypt,
.decrypt = aead_decrypt,
.ivsize = 8,
.maxauthsize = POLY1305_DIGEST_SIZE,
},
.caam = {
.class1_alg_type = OP_ALG_ALGSEL_CHACHA20 |
OP_ALG_AAI_AEAD,
.class2_alg_type = OP_ALG_ALGSEL_POLY1305 |
OP_ALG_AAI_AEAD,
},
},
{
.aead = {
.base = {
@ -4908,6 +5040,11 @@ static int dpaa2_caam_probe(struct fsl_mc_device *dpseci_dev)
alg_sel == OP_ALG_ALGSEL_AES)
continue;
/* Skip CHACHA20 algorithms if not supported by device */
if (alg_sel == OP_ALG_ALGSEL_CHACHA20 &&
!priv->sec_attr.ccha_acc_num)
continue;
t_alg->caam.dev = dev;
caam_skcipher_alg_init(t_alg);
@ -4940,11 +5077,22 @@ static int dpaa2_caam_probe(struct fsl_mc_device *dpseci_dev)
c1_alg_sel == OP_ALG_ALGSEL_AES)
continue;
/* Skip CHACHA20 algorithms if not supported by device */
if (c1_alg_sel == OP_ALG_ALGSEL_CHACHA20 &&
!priv->sec_attr.ccha_acc_num)
continue;
/* Skip POLY1305 algorithms if not supported by device */
if (c2_alg_sel == OP_ALG_ALGSEL_POLY1305 &&
!priv->sec_attr.ptha_acc_num)
continue;
/*
* Skip algorithms requiring message digests
* if MD not supported by device.
*/
if (!priv->sec_attr.md_acc_num && c2_alg_sel)
if ((c2_alg_sel & ~OP_ALG_ALGSEL_SUBMASK) == 0x40 &&
!priv->sec_attr.md_acc_num)
continue;
t_alg->caam.dev = dev;

View File

@ -3,6 +3,7 @@
* caam - Freescale FSL CAAM support for ahash functions of crypto API
*
* Copyright 2011 Freescale Semiconductor, Inc.
* Copyright 2018 NXP
*
* Based on caamalg.c crypto API driver.
*
@ -1801,7 +1802,7 @@ static int __init caam_algapi_hash_init(void)
int i = 0, err = 0;
struct caam_drv_private *priv;
unsigned int md_limit = SHA512_DIGEST_SIZE;
u32 cha_inst, cha_vid;
u32 md_inst, md_vid;
dev_node = of_find_compatible_node(NULL, NULL, "fsl,sec-v4.0");
if (!dev_node) {
@ -1831,18 +1832,27 @@ static int __init caam_algapi_hash_init(void)
* Register crypto algorithms the device supports. First, identify
* presence and attributes of MD block.
*/
cha_vid = rd_reg32(&priv->ctrl->perfmon.cha_id_ls);
cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
if (priv->era < 10) {
md_vid = (rd_reg32(&priv->ctrl->perfmon.cha_id_ls) &
CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
md_inst = (rd_reg32(&priv->ctrl->perfmon.cha_num_ls) &
CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT;
} else {
u32 mdha = rd_reg32(&priv->ctrl->vreg.mdha);
md_vid = (mdha & CHA_VER_VID_MASK) >> CHA_VER_VID_SHIFT;
md_inst = mdha & CHA_VER_NUM_MASK;
}
/*
* Skip registration of any hashing algorithms if MD block
* is not present.
*/
if (!((cha_inst & CHA_ID_LS_MD_MASK) >> CHA_ID_LS_MD_SHIFT))
if (!md_inst)
return -ENODEV;
/* Limit digest size based on LP256 */
if ((cha_vid & CHA_ID_LS_MD_MASK) == CHA_ID_LS_MD_LP256)
if (md_vid == CHA_VER_VID_MD_LP256)
md_limit = SHA256_DIGEST_SIZE;
INIT_LIST_HEAD(&hash_list);

View File

@ -3,6 +3,7 @@
* caam - Freescale FSL CAAM support for Public Key Cryptography
*
* Copyright 2016 Freescale Semiconductor, Inc.
* Copyright 2018 NXP
*
* There is no Shared Descriptor for PKC so that the Job Descriptor must carry
* all the desired key parameters, input and output pointers.
@ -1017,7 +1018,7 @@ static int __init caam_pkc_init(void)
struct platform_device *pdev;
struct device *ctrldev;
struct caam_drv_private *priv;
u32 cha_inst, pk_inst;
u32 pk_inst;
int err;
dev_node = of_find_compatible_node(NULL, NULL, "fsl,sec-v4.0");
@ -1045,8 +1046,11 @@ static int __init caam_pkc_init(void)
return -ENODEV;
/* Determine public key hardware accelerator presence. */
cha_inst = rd_reg32(&priv->ctrl->perfmon.cha_num_ls);
pk_inst = (cha_inst & CHA_ID_LS_PK_MASK) >> CHA_ID_LS_PK_SHIFT;
if (priv->era < 10)
pk_inst = (rd_reg32(&priv->ctrl->perfmon.cha_num_ls) &
CHA_ID_LS_PK_MASK) >> CHA_ID_LS_PK_SHIFT;
else
pk_inst = rd_reg32(&priv->ctrl->vreg.pkha) & CHA_VER_NUM_MASK;
/* Do not register algorithms if PKHA is not present. */
if (!pk_inst)

View File

@ -3,6 +3,7 @@
* caam - Freescale FSL CAAM support for hw_random
*
* Copyright 2011 Freescale Semiconductor, Inc.
* Copyright 2018 NXP
*
* Based on caamalg.c crypto API driver.
*
@ -309,6 +310,7 @@ static int __init caam_rng_init(void)
struct platform_device *pdev;
struct device *ctrldev;
struct caam_drv_private *priv;
u32 rng_inst;
int err;
dev_node = of_find_compatible_node(NULL, NULL, "fsl,sec-v4.0");
@ -336,7 +338,13 @@ static int __init caam_rng_init(void)
return -ENODEV;
/* Check for an instantiated RNG before registration */
if (!(rd_reg32(&priv->ctrl->perfmon.cha_num_ls) & CHA_ID_LS_RNG_MASK))
if (priv->era < 10)
rng_inst = (rd_reg32(&priv->ctrl->perfmon.cha_num_ls) &
CHA_ID_LS_RNG_MASK) >> CHA_ID_LS_RNG_SHIFT;
else
rng_inst = rd_reg32(&priv->ctrl->vreg.rng) & CHA_VER_NUM_MASK;
if (!rng_inst)
return -ENODEV;
dev = caam_jr_alloc();

View File

@ -36,6 +36,8 @@
#include <crypto/gcm.h>
#include <crypto/sha.h>
#include <crypto/md5.h>
#include <crypto/chacha.h>
#include <crypto/poly1305.h>
#include <crypto/internal/aead.h>
#include <crypto/authenc.h>
#include <crypto/akcipher.h>

View File

@ -3,6 +3,7 @@
* Controller-level driver, kernel property detection, initialization
*
* Copyright 2008-2012 Freescale Semiconductor, Inc.
* Copyright 2018 NXP
*/
#include <linux/device.h>
@ -106,7 +107,7 @@ static inline int run_descriptor_deco0(struct device *ctrldev, u32 *desc,
struct caam_ctrl __iomem *ctrl = ctrlpriv->ctrl;
struct caam_deco __iomem *deco = ctrlpriv->deco;
unsigned int timeout = 100000;
u32 deco_dbg_reg, flags;
u32 deco_dbg_reg, deco_state, flags;
int i;
@ -149,13 +150,22 @@ static inline int run_descriptor_deco0(struct device *ctrldev, u32 *desc,
timeout = 10000000;
do {
deco_dbg_reg = rd_reg32(&deco->desc_dbg);
if (ctrlpriv->era < 10)
deco_state = (deco_dbg_reg & DESC_DBG_DECO_STAT_MASK) >>
DESC_DBG_DECO_STAT_SHIFT;
else
deco_state = (rd_reg32(&deco->dbg_exec) &
DESC_DER_DECO_STAT_MASK) >>
DESC_DER_DECO_STAT_SHIFT;
/*
* If an error occured in the descriptor, then
* the DECO status field will be set to 0x0D
*/
if ((deco_dbg_reg & DESC_DBG_DECO_STAT_MASK) ==
DESC_DBG_DECO_STAT_HOST_ERR)
if (deco_state == DECO_STAT_HOST_ERR)
break;
cpu_relax();
} while ((deco_dbg_reg & DESC_DBG_DECO_STAT_VALID) && --timeout);
@ -491,7 +501,7 @@ static int caam_probe(struct platform_device *pdev)
struct caam_perfmon *perfmon;
#endif
u32 scfgr, comp_params;
u32 cha_vid_ls;
u8 rng_vid;
int pg_size;
int BLOCK_OFFSET = 0;
@ -733,15 +743,19 @@ static int caam_probe(struct platform_device *pdev)
goto caam_remove;
}
cha_vid_ls = rd_reg32(&ctrl->perfmon.cha_id_ls);
if (ctrlpriv->era < 10)
rng_vid = (rd_reg32(&ctrl->perfmon.cha_id_ls) &
CHA_ID_LS_RNG_MASK) >> CHA_ID_LS_RNG_SHIFT;
else
rng_vid = (rd_reg32(&ctrl->vreg.rng) & CHA_VER_VID_MASK) >>
CHA_VER_VID_SHIFT;
/*
* If SEC has RNG version >= 4 and RNG state handle has not been
* already instantiated, do RNG instantiation
* In case of SoCs with Management Complex, RNG is managed by MC f/w.
*/
if (!ctrlpriv->mc_en &&
(cha_vid_ls & CHA_ID_LS_RNG_MASK) >> CHA_ID_LS_RNG_SHIFT >= 4) {
if (!ctrlpriv->mc_en && rng_vid >= 4) {
ctrlpriv->rng4_sh_init =
rd_reg32(&ctrl->r4tst[0].rdsta);
/*

View File

@ -4,6 +4,7 @@
* Definitions to support CAAM descriptor instruction generation
*
* Copyright 2008-2011 Freescale Semiconductor, Inc.
* Copyright 2018 NXP
*/
#ifndef DESC_H
@ -242,6 +243,7 @@
#define LDST_SRCDST_WORD_DESCBUF_SHARED (0x42 << LDST_SRCDST_SHIFT)
#define LDST_SRCDST_WORD_DESCBUF_JOB_WE (0x45 << LDST_SRCDST_SHIFT)
#define LDST_SRCDST_WORD_DESCBUF_SHARED_WE (0x46 << LDST_SRCDST_SHIFT)
#define LDST_SRCDST_WORD_INFO_FIFO_SM (0x71 << LDST_SRCDST_SHIFT)
#define LDST_SRCDST_WORD_INFO_FIFO (0x7a << LDST_SRCDST_SHIFT)
/* Offset in source/destination */
@ -284,6 +286,12 @@
#define LDLEN_SET_OFIFO_OFFSET_SHIFT 0
#define LDLEN_SET_OFIFO_OFFSET_MASK (3 << LDLEN_SET_OFIFO_OFFSET_SHIFT)
/* Special Length definitions when dst=sm, nfifo-{sm,m} */
#define LDLEN_MATH0 0
#define LDLEN_MATH1 1
#define LDLEN_MATH2 2
#define LDLEN_MATH3 3
/*
* FIFO_LOAD/FIFO_STORE/SEQ_FIFO_LOAD/SEQ_FIFO_STORE
* Command Constructs
@ -408,6 +416,7 @@
#define FIFOST_TYPE_MESSAGE_DATA (0x30 << FIFOST_TYPE_SHIFT)
#define FIFOST_TYPE_RNGSTORE (0x34 << FIFOST_TYPE_SHIFT)
#define FIFOST_TYPE_RNGFIFO (0x35 << FIFOST_TYPE_SHIFT)
#define FIFOST_TYPE_METADATA (0x3e << FIFOST_TYPE_SHIFT)
#define FIFOST_TYPE_SKIP (0x3f << FIFOST_TYPE_SHIFT)
/*
@ -1133,6 +1142,12 @@
#define OP_ALG_TYPE_CLASS1 (2 << OP_ALG_TYPE_SHIFT)
#define OP_ALG_TYPE_CLASS2 (4 << OP_ALG_TYPE_SHIFT)
/* version register fields */
#define OP_VER_CCHA_NUM 0x000000ff /* Number CCHAs instantiated */
#define OP_VER_CCHA_MISC 0x0000ff00 /* CCHA Miscellaneous Information */
#define OP_VER_CCHA_REV 0x00ff0000 /* CCHA Revision Number */
#define OP_VER_CCHA_VID 0xff000000 /* CCHA Version ID */
#define OP_ALG_ALGSEL_SHIFT 16
#define OP_ALG_ALGSEL_MASK (0xff << OP_ALG_ALGSEL_SHIFT)
#define OP_ALG_ALGSEL_SUBMASK (0x0f << OP_ALG_ALGSEL_SHIFT)
@ -1152,6 +1167,8 @@
#define OP_ALG_ALGSEL_KASUMI (0x70 << OP_ALG_ALGSEL_SHIFT)
#define OP_ALG_ALGSEL_CRC (0x90 << OP_ALG_ALGSEL_SHIFT)
#define OP_ALG_ALGSEL_SNOW_F9 (0xA0 << OP_ALG_ALGSEL_SHIFT)
#define OP_ALG_ALGSEL_CHACHA20 (0xD0 << OP_ALG_ALGSEL_SHIFT)
#define OP_ALG_ALGSEL_POLY1305 (0xE0 << OP_ALG_ALGSEL_SHIFT)
#define OP_ALG_AAI_SHIFT 4
#define OP_ALG_AAI_MASK (0x1ff << OP_ALG_AAI_SHIFT)
@ -1199,6 +1216,11 @@
#define OP_ALG_AAI_RNG4_AI (0x80 << OP_ALG_AAI_SHIFT)
#define OP_ALG_AAI_RNG4_SK (0x100 << OP_ALG_AAI_SHIFT)
/* Chacha20 AAI set */
#define OP_ALG_AAI_AEAD (0x002 << OP_ALG_AAI_SHIFT)
#define OP_ALG_AAI_KEYSTREAM (0x001 << OP_ALG_AAI_SHIFT)
#define OP_ALG_AAI_BC8 (0x008 << OP_ALG_AAI_SHIFT)
/* hmac/smac AAI set */
#define OP_ALG_AAI_HASH (0x00 << OP_ALG_AAI_SHIFT)
#define OP_ALG_AAI_HMAC (0x01 << OP_ALG_AAI_SHIFT)
@ -1387,6 +1409,7 @@
#define MOVE_SRC_MATH3 (0x07 << MOVE_SRC_SHIFT)
#define MOVE_SRC_INFIFO (0x08 << MOVE_SRC_SHIFT)
#define MOVE_SRC_INFIFO_CL (0x09 << MOVE_SRC_SHIFT)
#define MOVE_SRC_AUX_ABLK (0x0a << MOVE_SRC_SHIFT)
#define MOVE_DEST_SHIFT 16
#define MOVE_DEST_MASK (0x0f << MOVE_DEST_SHIFT)
@ -1413,6 +1436,10 @@
#define MOVELEN_MRSEL_SHIFT 0
#define MOVELEN_MRSEL_MASK (0x3 << MOVE_LEN_SHIFT)
#define MOVELEN_MRSEL_MATH0 (0 << MOVELEN_MRSEL_SHIFT)
#define MOVELEN_MRSEL_MATH1 (1 << MOVELEN_MRSEL_SHIFT)
#define MOVELEN_MRSEL_MATH2 (2 << MOVELEN_MRSEL_SHIFT)
#define MOVELEN_MRSEL_MATH3 (3 << MOVELEN_MRSEL_SHIFT)
/*
* MATH Command Constructs
@ -1589,6 +1616,7 @@
#define NFIFOENTRY_DTYPE_IV (0x2 << NFIFOENTRY_DTYPE_SHIFT)
#define NFIFOENTRY_DTYPE_SAD (0x3 << NFIFOENTRY_DTYPE_SHIFT)
#define NFIFOENTRY_DTYPE_ICV (0xA << NFIFOENTRY_DTYPE_SHIFT)
#define NFIFOENTRY_DTYPE_POLY (0xB << NFIFOENTRY_DTYPE_SHIFT)
#define NFIFOENTRY_DTYPE_SKIP (0xE << NFIFOENTRY_DTYPE_SHIFT)
#define NFIFOENTRY_DTYPE_MSG (0xF << NFIFOENTRY_DTYPE_SHIFT)

View File

@ -189,6 +189,7 @@ static inline u32 *append_##cmd(u32 * const desc, u32 options) \
}
APPEND_CMD_RET(jump, JUMP)
APPEND_CMD_RET(move, MOVE)
APPEND_CMD_RET(move_len, MOVE_LEN)
static inline void set_jump_tgt_here(u32 * const desc, u32 *jump_cmd)
{
@ -327,7 +328,11 @@ static inline void append_##cmd##_imm_##type(u32 * const desc, type immediate, \
u32 options) \
{ \
PRINT_POS; \
append_cmd(desc, CMD_##op | IMMEDIATE | options | sizeof(type)); \
if (options & LDST_LEN_MASK) \
append_cmd(desc, CMD_##op | IMMEDIATE | options); \
else \
append_cmd(desc, CMD_##op | IMMEDIATE | options | \
sizeof(type)); \
append_cmd(desc, immediate); \
}
APPEND_CMD_RAW_IMM(load, LOAD, u32);

View File

@ -3,6 +3,7 @@
* CAAM hardware register-level view
*
* Copyright 2008-2011 Freescale Semiconductor, Inc.
* Copyright 2018 NXP
*/
#ifndef REGS_H
@ -211,6 +212,47 @@ struct jr_outentry {
u32 jrstatus; /* Status for completed descriptor */
} __packed;
/* Version registers (Era 10+) e80-eff */
struct version_regs {
u32 crca; /* CRCA_VERSION */
u32 afha; /* AFHA_VERSION */
u32 kfha; /* KFHA_VERSION */
u32 pkha; /* PKHA_VERSION */
u32 aesa; /* AESA_VERSION */
u32 mdha; /* MDHA_VERSION */
u32 desa; /* DESA_VERSION */
u32 snw8a; /* SNW8A_VERSION */
u32 snw9a; /* SNW9A_VERSION */
u32 zuce; /* ZUCE_VERSION */
u32 zuca; /* ZUCA_VERSION */
u32 ccha; /* CCHA_VERSION */
u32 ptha; /* PTHA_VERSION */
u32 rng; /* RNG_VERSION */
u32 trng; /* TRNG_VERSION */
u32 aaha; /* AAHA_VERSION */
u32 rsvd[10];
u32 sr; /* SR_VERSION */
u32 dma; /* DMA_VERSION */
u32 ai; /* AI_VERSION */
u32 qi; /* QI_VERSION */
u32 jr; /* JR_VERSION */
u32 deco; /* DECO_VERSION */
};
/* Version registers bitfields */
/* Number of CHAs instantiated */
#define CHA_VER_NUM_MASK 0xffull
/* CHA Miscellaneous Information */
#define CHA_VER_MISC_SHIFT 8
#define CHA_VER_MISC_MASK (0xffull << CHA_VER_MISC_SHIFT)
/* CHA Revision Number */
#define CHA_VER_REV_SHIFT 16
#define CHA_VER_REV_MASK (0xffull << CHA_VER_REV_SHIFT)
/* CHA Version ID */
#define CHA_VER_VID_SHIFT 24
#define CHA_VER_VID_MASK (0xffull << CHA_VER_VID_SHIFT)
/*
* caam_perfmon - Performance Monitor/Secure Memory Status/
* CAAM Global Status/Component Version IDs
@ -223,15 +265,13 @@ struct jr_outentry {
#define CHA_NUM_MS_DECONUM_MASK (0xfull << CHA_NUM_MS_DECONUM_SHIFT)
/*
* CHA version IDs / instantiation bitfields
* CHA version IDs / instantiation bitfields (< Era 10)
* Defined for use with the cha_id fields in perfmon, but the same shift/mask
* selectors can be used to pull out the number of instantiated blocks within
* cha_num fields in perfmon because the locations are the same.
*/
#define CHA_ID_LS_AES_SHIFT 0
#define CHA_ID_LS_AES_MASK (0xfull << CHA_ID_LS_AES_SHIFT)
#define CHA_ID_LS_AES_LP (0x3ull << CHA_ID_LS_AES_SHIFT)
#define CHA_ID_LS_AES_HP (0x4ull << CHA_ID_LS_AES_SHIFT)
#define CHA_ID_LS_DES_SHIFT 4
#define CHA_ID_LS_DES_MASK (0xfull << CHA_ID_LS_DES_SHIFT)
@ -241,9 +281,6 @@ struct jr_outentry {
#define CHA_ID_LS_MD_SHIFT 12
#define CHA_ID_LS_MD_MASK (0xfull << CHA_ID_LS_MD_SHIFT)
#define CHA_ID_LS_MD_LP256 (0x0ull << CHA_ID_LS_MD_SHIFT)
#define CHA_ID_LS_MD_LP512 (0x1ull << CHA_ID_LS_MD_SHIFT)
#define CHA_ID_LS_MD_HP (0x2ull << CHA_ID_LS_MD_SHIFT)
#define CHA_ID_LS_RNG_SHIFT 16
#define CHA_ID_LS_RNG_MASK (0xfull << CHA_ID_LS_RNG_SHIFT)
@ -269,6 +306,13 @@ struct jr_outentry {
#define CHA_ID_MS_JR_SHIFT 28
#define CHA_ID_MS_JR_MASK (0xfull << CHA_ID_MS_JR_SHIFT)
/* Specific CHA version IDs */
#define CHA_VER_VID_AES_LP 0x3ull
#define CHA_VER_VID_AES_HP 0x4ull
#define CHA_VER_VID_MD_LP256 0x0ull
#define CHA_VER_VID_MD_LP512 0x1ull
#define CHA_VER_VID_MD_HP 0x2ull
struct sec_vid {
u16 ip_id;
u8 maj_rev;
@ -479,8 +523,10 @@ struct caam_ctrl {
struct rng4tst r4tst[2];
};
u32 rsvd9[448];
u32 rsvd9[416];
/* Version registers - introduced with era 10 e80-eff */
struct version_regs vreg;
/* Performance Monitor f00-fff */
struct caam_perfmon perfmon;
};
@ -570,8 +616,10 @@ struct caam_job_ring {
u32 rsvd11;
u32 jrcommand; /* JRCRx - JobR command */
u32 rsvd12[932];
u32 rsvd12[900];
/* Version registers - introduced with era 10 e80-eff */
struct version_regs vreg;
/* Performance Monitor f00-fff */
struct caam_perfmon perfmon;
};
@ -878,13 +926,19 @@ struct caam_deco {
u32 rsvd29[48];
u32 descbuf[64]; /* DxDESB - Descriptor buffer */
u32 rscvd30[193];
#define DESC_DBG_DECO_STAT_HOST_ERR 0x00D00000
#define DESC_DBG_DECO_STAT_VALID 0x80000000
#define DESC_DBG_DECO_STAT_MASK 0x00F00000
#define DESC_DBG_DECO_STAT_SHIFT 20
u32 desc_dbg; /* DxDDR - DECO Debug Register */
u32 rsvd31[126];
u32 rsvd31[13];
#define DESC_DER_DECO_STAT_MASK 0x000F0000
#define DESC_DER_DECO_STAT_SHIFT 16
u32 dbg_exec; /* DxDER - DECO Debug Exec Register */
u32 rsvd32[112];
};
#define DECO_STAT_HOST_ERR 0xD
#define DECO_JQCR_WHL 0x20000000
#define DECO_JQCR_FOUR 0x10000000

View File

@ -6,7 +6,10 @@ n5pf-objs := nitrox_main.o \
nitrox_lib.o \
nitrox_hal.o \
nitrox_reqmgr.o \
nitrox_algs.o
nitrox_algs.o \
nitrox_mbx.o \
nitrox_skcipher.o \
nitrox_aead.o
n5pf-$(CONFIG_PCI_IOV) += nitrox_sriov.o
n5pf-$(CONFIG_DEBUG_FS) += nitrox_debugfs.o

View File

@ -0,0 +1,364 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/kernel.h>
#include <linux/printk.h>
#include <linux/crypto.h>
#include <linux/rtnetlink.h>
#include <crypto/aead.h>
#include <crypto/authenc.h>
#include <crypto/des.h>
#include <crypto/sha.h>
#include <crypto/internal/aead.h>
#include <crypto/scatterwalk.h>
#include <crypto/gcm.h>
#include "nitrox_dev.h"
#include "nitrox_common.h"
#include "nitrox_req.h"
#define GCM_AES_SALT_SIZE 4
/**
* struct nitrox_crypt_params - Params to set nitrox crypto request.
* @cryptlen: Encryption/Decryption data length
* @authlen: Assoc data length + Cryptlen
* @srclen: Input buffer length
* @dstlen: Output buffer length
* @iv: IV data
* @ivsize: IV data length
* @ctrl_arg: Identifies the request type (ENCRYPT/DECRYPT)
*/
struct nitrox_crypt_params {
unsigned int cryptlen;
unsigned int authlen;
unsigned int srclen;
unsigned int dstlen;
u8 *iv;
int ivsize;
u8 ctrl_arg;
};
union gph_p3 {
struct {
#ifdef __BIG_ENDIAN_BITFIELD
u16 iv_offset : 8;
u16 auth_offset : 8;
#else
u16 auth_offset : 8;
u16 iv_offset : 8;
#endif
};
u16 param;
};
static int nitrox_aes_gcm_setkey(struct crypto_aead *aead, const u8 *key,
unsigned int keylen)
{
int aes_keylen;
struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
struct flexi_crypto_context *fctx;
union fc_ctx_flags flags;
aes_keylen = flexi_aes_keylen(keylen);
if (aes_keylen < 0) {
crypto_aead_set_flags(aead, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
/* fill crypto context */
fctx = nctx->u.fctx;
flags.f = be64_to_cpu(fctx->flags.f);
flags.w0.aes_keylen = aes_keylen;
fctx->flags.f = cpu_to_be64(flags.f);
/* copy enc key to context */
memset(&fctx->crypto, 0, sizeof(fctx->crypto));
memcpy(fctx->crypto.u.key, key, keylen);
return 0;
}
static int nitrox_aead_setauthsize(struct crypto_aead *aead,
unsigned int authsize)
{
struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
struct flexi_crypto_context *fctx = nctx->u.fctx;
union fc_ctx_flags flags;
flags.f = be64_to_cpu(fctx->flags.f);
flags.w0.mac_len = authsize;
fctx->flags.f = cpu_to_be64(flags.f);
aead->authsize = authsize;
return 0;
}
static int alloc_src_sglist(struct aead_request *areq, char *iv, int ivsize,
int buflen)
{
struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);
int nents = sg_nents_for_len(areq->src, buflen) + 1;
int ret;
if (nents < 0)
return nents;
/* Allocate buffer to hold IV and input scatterlist array */
ret = alloc_src_req_buf(nkreq, nents, ivsize);
if (ret)
return ret;
nitrox_creq_copy_iv(nkreq->src, iv, ivsize);
nitrox_creq_set_src_sg(nkreq, nents, ivsize, areq->src, buflen);
return 0;
}
static int alloc_dst_sglist(struct aead_request *areq, int ivsize, int buflen)
{
struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);
int nents = sg_nents_for_len(areq->dst, buflen) + 3;
int ret;
if (nents < 0)
return nents;
/* Allocate buffer to hold ORH, COMPLETION and output scatterlist
* array
*/
ret = alloc_dst_req_buf(nkreq, nents);
if (ret)
return ret;
nitrox_creq_set_orh(nkreq);
nitrox_creq_set_comp(nkreq);
nitrox_creq_set_dst_sg(nkreq, nents, ivsize, areq->dst, buflen);
return 0;
}
static void free_src_sglist(struct aead_request *areq)
{
struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);
kfree(nkreq->src);
}
static void free_dst_sglist(struct aead_request *areq)
{
struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);
kfree(nkreq->dst);
}
static int nitrox_set_creq(struct aead_request *areq,
struct nitrox_crypt_params *params)
{
struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);
struct se_crypto_request *creq = &nkreq->creq;
struct crypto_aead *aead = crypto_aead_reqtfm(areq);
union gph_p3 param3;
struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
int ret;
creq->flags = areq->base.flags;
creq->gfp = (areq->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
GFP_KERNEL : GFP_ATOMIC;
creq->ctrl.value = 0;
creq->opcode = FLEXI_CRYPTO_ENCRYPT_HMAC;
creq->ctrl.s.arg = params->ctrl_arg;
creq->gph.param0 = cpu_to_be16(params->cryptlen);
creq->gph.param1 = cpu_to_be16(params->authlen);
creq->gph.param2 = cpu_to_be16(params->ivsize + areq->assoclen);
param3.iv_offset = 0;
param3.auth_offset = params->ivsize;
creq->gph.param3 = cpu_to_be16(param3.param);
creq->ctx_handle = nctx->u.ctx_handle;
creq->ctrl.s.ctxl = sizeof(struct flexi_crypto_context);
ret = alloc_src_sglist(areq, params->iv, params->ivsize,
params->srclen);
if (ret)
return ret;
ret = alloc_dst_sglist(areq, params->ivsize, params->dstlen);
if (ret) {
free_src_sglist(areq);
return ret;
}
return 0;
}
static void nitrox_aead_callback(void *arg, int err)
{
struct aead_request *areq = arg;
free_src_sglist(areq);
free_dst_sglist(areq);
if (err) {
pr_err_ratelimited("request failed status 0x%0x\n", err);
err = -EINVAL;
}
areq->base.complete(&areq->base, err);
}
static int nitrox_aes_gcm_enc(struct aead_request *areq)
{
struct crypto_aead *aead = crypto_aead_reqtfm(areq);
struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);
struct se_crypto_request *creq = &nkreq->creq;
struct flexi_crypto_context *fctx = nctx->u.fctx;
struct nitrox_crypt_params params;
int ret;
memcpy(fctx->crypto.iv, areq->iv, GCM_AES_SALT_SIZE);
memset(&params, 0, sizeof(params));
params.cryptlen = areq->cryptlen;
params.authlen = areq->assoclen + params.cryptlen;
params.srclen = params.authlen;
params.dstlen = params.srclen + aead->authsize;
params.iv = &areq->iv[GCM_AES_SALT_SIZE];
params.ivsize = GCM_AES_IV_SIZE - GCM_AES_SALT_SIZE;
params.ctrl_arg = ENCRYPT;
ret = nitrox_set_creq(areq, &params);
if (ret)
return ret;
/* send the crypto request */
return nitrox_process_se_request(nctx->ndev, creq, nitrox_aead_callback,
areq);
}
static int nitrox_aes_gcm_dec(struct aead_request *areq)
{
struct crypto_aead *aead = crypto_aead_reqtfm(areq);
struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
struct nitrox_kcrypt_request *nkreq = aead_request_ctx(areq);
struct se_crypto_request *creq = &nkreq->creq;
struct flexi_crypto_context *fctx = nctx->u.fctx;
struct nitrox_crypt_params params;
int ret;
memcpy(fctx->crypto.iv, areq->iv, GCM_AES_SALT_SIZE);
memset(&params, 0, sizeof(params));
params.cryptlen = areq->cryptlen - aead->authsize;
params.authlen = areq->assoclen + params.cryptlen;
params.srclen = areq->cryptlen + areq->assoclen;
params.dstlen = params.srclen - aead->authsize;
params.iv = &areq->iv[GCM_AES_SALT_SIZE];
params.ivsize = GCM_AES_IV_SIZE - GCM_AES_SALT_SIZE;
params.ctrl_arg = DECRYPT;
ret = nitrox_set_creq(areq, &params);
if (ret)
return ret;
/* send the crypto request */
return nitrox_process_se_request(nctx->ndev, creq, nitrox_aead_callback,
areq);
}
static int nitrox_aead_init(struct crypto_aead *aead)
{
struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
struct crypto_ctx_hdr *chdr;
/* get the first device */
nctx->ndev = nitrox_get_first_device();
if (!nctx->ndev)
return -ENODEV;
/* allocate nitrox crypto context */
chdr = crypto_alloc_context(nctx->ndev);
if (!chdr) {
nitrox_put_device(nctx->ndev);
return -ENOMEM;
}
nctx->chdr = chdr;
nctx->u.ctx_handle = (uintptr_t)((u8 *)chdr->vaddr +
sizeof(struct ctx_hdr));
nctx->u.fctx->flags.f = 0;
return 0;
}
static int nitrox_aes_gcm_init(struct crypto_aead *aead)
{
int ret;
struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
union fc_ctx_flags *flags;
ret = nitrox_aead_init(aead);
if (ret)
return ret;
flags = &nctx->u.fctx->flags;
flags->w0.cipher_type = CIPHER_AES_GCM;
flags->w0.hash_type = AUTH_NULL;
flags->w0.iv_source = IV_FROM_DPTR;
/* ask microcode to calculate ipad/opad */
flags->w0.auth_input_type = 1;
flags->f = be64_to_cpu(flags->f);
crypto_aead_set_reqsize(aead, sizeof(struct aead_request) +
sizeof(struct nitrox_kcrypt_request));
return 0;
}
static void nitrox_aead_exit(struct crypto_aead *aead)
{
struct nitrox_crypto_ctx *nctx = crypto_aead_ctx(aead);
/* free the nitrox crypto context */
if (nctx->u.ctx_handle) {
struct flexi_crypto_context *fctx = nctx->u.fctx;
memzero_explicit(&fctx->crypto, sizeof(struct crypto_keys));
memzero_explicit(&fctx->auth, sizeof(struct auth_keys));
crypto_free_context((void *)nctx->chdr);
}
nitrox_put_device(nctx->ndev);
nctx->u.ctx_handle = 0;
nctx->ndev = NULL;
}
static struct aead_alg nitrox_aeads[] = { {
.base = {
.cra_name = "gcm(aes)",
.cra_driver_name = "n5_aes_gcm",
.cra_priority = PRIO,
.cra_flags = CRYPTO_ALG_ASYNC,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
.cra_alignmask = 0,
.cra_module = THIS_MODULE,
},
.setkey = nitrox_aes_gcm_setkey,
.setauthsize = nitrox_aead_setauthsize,
.encrypt = nitrox_aes_gcm_enc,
.decrypt = nitrox_aes_gcm_dec,
.init = nitrox_aes_gcm_init,
.exit = nitrox_aead_exit,
.ivsize = GCM_AES_IV_SIZE,
.maxauthsize = AES_BLOCK_SIZE,
} };
int nitrox_register_aeads(void)
{
return crypto_register_aeads(nitrox_aeads, ARRAY_SIZE(nitrox_aeads));
}
void nitrox_unregister_aeads(void)
{
crypto_unregister_aeads(nitrox_aeads, ARRAY_SIZE(nitrox_aeads));
}

View File

@ -1,458 +1,24 @@
// SPDX-License-Identifier: GPL-2.0
#include <linux/crypto.h>
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/printk.h>
#include <crypto/aes.h>
#include <crypto/skcipher.h>
#include <crypto/ctr.h>
#include <crypto/des.h>
#include <crypto/xts.h>
#include "nitrox_dev.h"
#include "nitrox_common.h"
#include "nitrox_req.h"
#define PRIO 4001
struct nitrox_cipher {
const char *name;
enum flexi_cipher value;
};
/**
* supported cipher list
*/
static const struct nitrox_cipher flexi_cipher_table[] = {
{ "null", CIPHER_NULL },
{ "cbc(des3_ede)", CIPHER_3DES_CBC },
{ "ecb(des3_ede)", CIPHER_3DES_ECB },
{ "cbc(aes)", CIPHER_AES_CBC },
{ "ecb(aes)", CIPHER_AES_ECB },
{ "cfb(aes)", CIPHER_AES_CFB },
{ "rfc3686(ctr(aes))", CIPHER_AES_CTR },
{ "xts(aes)", CIPHER_AES_XTS },
{ "cts(cbc(aes))", CIPHER_AES_CBC_CTS },
{ NULL, CIPHER_INVALID }
};
static enum flexi_cipher flexi_cipher_type(const char *name)
{
const struct nitrox_cipher *cipher = flexi_cipher_table;
while (cipher->name) {
if (!strcmp(cipher->name, name))
break;
cipher++;
}
return cipher->value;
}
static int flexi_aes_keylen(int keylen)
{
int aes_keylen;
switch (keylen) {
case AES_KEYSIZE_128:
aes_keylen = 1;
break;
case AES_KEYSIZE_192:
aes_keylen = 2;
break;
case AES_KEYSIZE_256:
aes_keylen = 3;
break;
default:
aes_keylen = -EINVAL;
break;
}
return aes_keylen;
}
static int nitrox_skcipher_init(struct crypto_skcipher *tfm)
{
struct nitrox_crypto_ctx *nctx = crypto_skcipher_ctx(tfm);
void *fctx;
/* get the first device */
nctx->ndev = nitrox_get_first_device();
if (!nctx->ndev)
return -ENODEV;
/* allocate nitrox crypto context */
fctx = crypto_alloc_context(nctx->ndev);
if (!fctx) {
nitrox_put_device(nctx->ndev);
return -ENOMEM;
}
nctx->u.ctx_handle = (uintptr_t)fctx;
crypto_skcipher_set_reqsize(tfm, crypto_skcipher_reqsize(tfm) +
sizeof(struct nitrox_kcrypt_request));
return 0;
}
static void nitrox_skcipher_exit(struct crypto_skcipher *tfm)
{
struct nitrox_crypto_ctx *nctx = crypto_skcipher_ctx(tfm);
/* free the nitrox crypto context */
if (nctx->u.ctx_handle) {
struct flexi_crypto_context *fctx = nctx->u.fctx;
memset(&fctx->crypto, 0, sizeof(struct crypto_keys));
memset(&fctx->auth, 0, sizeof(struct auth_keys));
crypto_free_context((void *)fctx);
}
nitrox_put_device(nctx->ndev);
nctx->u.ctx_handle = 0;
nctx->ndev = NULL;
}
static inline int nitrox_skcipher_setkey(struct crypto_skcipher *cipher,
int aes_keylen, const u8 *key,
unsigned int keylen)
{
struct crypto_tfm *tfm = crypto_skcipher_tfm(cipher);
struct nitrox_crypto_ctx *nctx = crypto_tfm_ctx(tfm);
struct flexi_crypto_context *fctx;
enum flexi_cipher cipher_type;
const char *name;
name = crypto_tfm_alg_name(tfm);
cipher_type = flexi_cipher_type(name);
if (unlikely(cipher_type == CIPHER_INVALID)) {
pr_err("unsupported cipher: %s\n", name);
return -EINVAL;
}
/* fill crypto context */
fctx = nctx->u.fctx;
fctx->flags = 0;
fctx->w0.cipher_type = cipher_type;
fctx->w0.aes_keylen = aes_keylen;
fctx->w0.iv_source = IV_FROM_DPTR;
fctx->flags = cpu_to_be64(*(u64 *)&fctx->w0);
/* copy the key to context */
memcpy(fctx->crypto.u.key, key, keylen);
return 0;
}
static int nitrox_aes_setkey(struct crypto_skcipher *cipher, const u8 *key,
unsigned int keylen)
{
int aes_keylen;
aes_keylen = flexi_aes_keylen(keylen);
if (aes_keylen < 0) {
crypto_skcipher_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
return nitrox_skcipher_setkey(cipher, aes_keylen, key, keylen);
}
static void nitrox_skcipher_callback(struct skcipher_request *skreq,
int err)
{
if (err) {
pr_err_ratelimited("request failed status 0x%0x\n", err);
err = -EINVAL;
}
skcipher_request_complete(skreq, err);
}
static int nitrox_skcipher_crypt(struct skcipher_request *skreq, bool enc)
{
struct crypto_skcipher *cipher = crypto_skcipher_reqtfm(skreq);
struct nitrox_crypto_ctx *nctx = crypto_skcipher_ctx(cipher);
struct nitrox_kcrypt_request *nkreq = skcipher_request_ctx(skreq);
int ivsize = crypto_skcipher_ivsize(cipher);
struct se_crypto_request *creq;
creq = &nkreq->creq;
creq->flags = skreq->base.flags;
creq->gfp = (skreq->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
GFP_KERNEL : GFP_ATOMIC;
/* fill the request */
creq->ctrl.value = 0;
creq->opcode = FLEXI_CRYPTO_ENCRYPT_HMAC;
creq->ctrl.s.arg = (enc ? ENCRYPT : DECRYPT);
/* param0: length of the data to be encrypted */
creq->gph.param0 = cpu_to_be16(skreq->cryptlen);
creq->gph.param1 = 0;
/* param2: encryption data offset */
creq->gph.param2 = cpu_to_be16(ivsize);
creq->gph.param3 = 0;
creq->ctx_handle = nctx->u.ctx_handle;
creq->ctrl.s.ctxl = sizeof(struct flexi_crypto_context);
/* copy the iv */
memcpy(creq->iv, skreq->iv, ivsize);
creq->ivsize = ivsize;
creq->src = skreq->src;
creq->dst = skreq->dst;
nkreq->nctx = nctx;
nkreq->skreq = skreq;
/* send the crypto request */
return nitrox_process_se_request(nctx->ndev, creq,
nitrox_skcipher_callback, skreq);
}
static int nitrox_aes_encrypt(struct skcipher_request *skreq)
{
return nitrox_skcipher_crypt(skreq, true);
}
static int nitrox_aes_decrypt(struct skcipher_request *skreq)
{
return nitrox_skcipher_crypt(skreq, false);
}
static int nitrox_3des_setkey(struct crypto_skcipher *cipher,
const u8 *key, unsigned int keylen)
{
if (keylen != DES3_EDE_KEY_SIZE) {
crypto_skcipher_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
return nitrox_skcipher_setkey(cipher, 0, key, keylen);
}
static int nitrox_3des_encrypt(struct skcipher_request *skreq)
{
return nitrox_skcipher_crypt(skreq, true);
}
static int nitrox_3des_decrypt(struct skcipher_request *skreq)
{
return nitrox_skcipher_crypt(skreq, false);
}
static int nitrox_aes_xts_setkey(struct crypto_skcipher *cipher,
const u8 *key, unsigned int keylen)
{
struct crypto_tfm *tfm = crypto_skcipher_tfm(cipher);
struct nitrox_crypto_ctx *nctx = crypto_tfm_ctx(tfm);
struct flexi_crypto_context *fctx;
int aes_keylen, ret;
ret = xts_check_key(tfm, key, keylen);
if (ret)
return ret;
keylen /= 2;
aes_keylen = flexi_aes_keylen(keylen);
if (aes_keylen < 0) {
crypto_skcipher_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
fctx = nctx->u.fctx;
/* copy KEY2 */
memcpy(fctx->auth.u.key2, (key + keylen), keylen);
return nitrox_skcipher_setkey(cipher, aes_keylen, key, keylen);
}
static int nitrox_aes_ctr_rfc3686_setkey(struct crypto_skcipher *cipher,
const u8 *key, unsigned int keylen)
{
struct crypto_tfm *tfm = crypto_skcipher_tfm(cipher);
struct nitrox_crypto_ctx *nctx = crypto_tfm_ctx(tfm);
struct flexi_crypto_context *fctx;
int aes_keylen;
if (keylen < CTR_RFC3686_NONCE_SIZE)
return -EINVAL;
fctx = nctx->u.fctx;
memcpy(fctx->crypto.iv, key + (keylen - CTR_RFC3686_NONCE_SIZE),
CTR_RFC3686_NONCE_SIZE);
keylen -= CTR_RFC3686_NONCE_SIZE;
aes_keylen = flexi_aes_keylen(keylen);
if (aes_keylen < 0) {
crypto_skcipher_set_flags(cipher, CRYPTO_TFM_RES_BAD_KEY_LEN);
return -EINVAL;
}
return nitrox_skcipher_setkey(cipher, aes_keylen, key, keylen);
}
static struct skcipher_alg nitrox_skciphers[] = { {
.base = {
.cra_name = "cbc(aes)",
.cra_driver_name = "n5_cbc(aes)",
.cra_priority = PRIO,
.cra_flags = CRYPTO_ALG_ASYNC,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
.cra_alignmask = 0,
.cra_module = THIS_MODULE,
},
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = nitrox_aes_setkey,
.encrypt = nitrox_aes_encrypt,
.decrypt = nitrox_aes_decrypt,
.init = nitrox_skcipher_init,
.exit = nitrox_skcipher_exit,
}, {
.base = {
.cra_name = "ecb(aes)",
.cra_driver_name = "n5_ecb(aes)",
.cra_priority = PRIO,
.cra_flags = CRYPTO_ALG_ASYNC,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
.cra_alignmask = 0,
.cra_module = THIS_MODULE,
},
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = nitrox_aes_setkey,
.encrypt = nitrox_aes_encrypt,
.decrypt = nitrox_aes_decrypt,
.init = nitrox_skcipher_init,
.exit = nitrox_skcipher_exit,
}, {
.base = {
.cra_name = "cfb(aes)",
.cra_driver_name = "n5_cfb(aes)",
.cra_priority = PRIO,
.cra_flags = CRYPTO_ALG_ASYNC,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
.cra_alignmask = 0,
.cra_module = THIS_MODULE,
},
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = nitrox_aes_setkey,
.encrypt = nitrox_aes_encrypt,
.decrypt = nitrox_aes_decrypt,
.init = nitrox_skcipher_init,
.exit = nitrox_skcipher_exit,
}, {
.base = {
.cra_name = "xts(aes)",
.cra_driver_name = "n5_xts(aes)",
.cra_priority = PRIO,
.cra_flags = CRYPTO_ALG_ASYNC,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
.cra_alignmask = 0,
.cra_module = THIS_MODULE,
},
.min_keysize = 2 * AES_MIN_KEY_SIZE,
.max_keysize = 2 * AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = nitrox_aes_xts_setkey,
.encrypt = nitrox_aes_encrypt,
.decrypt = nitrox_aes_decrypt,
.init = nitrox_skcipher_init,
.exit = nitrox_skcipher_exit,
}, {
.base = {
.cra_name = "rfc3686(ctr(aes))",
.cra_driver_name = "n5_rfc3686(ctr(aes))",
.cra_priority = PRIO,
.cra_flags = CRYPTO_ALG_ASYNC,
.cra_blocksize = 1,
.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
.cra_alignmask = 0,
.cra_module = THIS_MODULE,
},
.min_keysize = AES_MIN_KEY_SIZE + CTR_RFC3686_NONCE_SIZE,
.max_keysize = AES_MAX_KEY_SIZE + CTR_RFC3686_NONCE_SIZE,
.ivsize = CTR_RFC3686_IV_SIZE,
.init = nitrox_skcipher_init,
.exit = nitrox_skcipher_exit,
.setkey = nitrox_aes_ctr_rfc3686_setkey,
.encrypt = nitrox_aes_encrypt,
.decrypt = nitrox_aes_decrypt,
}, {
.base = {
.cra_name = "cts(cbc(aes))",
.cra_driver_name = "n5_cts(cbc(aes))",
.cra_priority = PRIO,
.cra_flags = CRYPTO_ALG_ASYNC,
.cra_blocksize = AES_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
.cra_alignmask = 0,
.cra_type = &crypto_ablkcipher_type,
.cra_module = THIS_MODULE,
},
.min_keysize = AES_MIN_KEY_SIZE,
.max_keysize = AES_MAX_KEY_SIZE,
.ivsize = AES_BLOCK_SIZE,
.setkey = nitrox_aes_setkey,
.encrypt = nitrox_aes_encrypt,
.decrypt = nitrox_aes_decrypt,
.init = nitrox_skcipher_init,
.exit = nitrox_skcipher_exit,
}, {
.base = {
.cra_name = "cbc(des3_ede)",
.cra_driver_name = "n5_cbc(des3_ede)",
.cra_priority = PRIO,
.cra_flags = CRYPTO_ALG_ASYNC,
.cra_blocksize = DES3_EDE_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
.cra_alignmask = 0,
.cra_module = THIS_MODULE,
},
.min_keysize = DES3_EDE_KEY_SIZE,
.max_keysize = DES3_EDE_KEY_SIZE,
.ivsize = DES3_EDE_BLOCK_SIZE,
.setkey = nitrox_3des_setkey,
.encrypt = nitrox_3des_encrypt,
.decrypt = nitrox_3des_decrypt,
.init = nitrox_skcipher_init,
.exit = nitrox_skcipher_exit,
}, {
.base = {
.cra_name = "ecb(des3_ede)",
.cra_driver_name = "n5_ecb(des3_ede)",
.cra_priority = PRIO,
.cra_flags = CRYPTO_ALG_ASYNC,
.cra_blocksize = DES3_EDE_BLOCK_SIZE,
.cra_ctxsize = sizeof(struct nitrox_crypto_ctx),
.cra_alignmask = 0,
.cra_module = THIS_MODULE,
},
.min_keysize = DES3_EDE_KEY_SIZE,
.max_keysize = DES3_EDE_KEY_SIZE,
.ivsize = DES3_EDE_BLOCK_SIZE,
.setkey = nitrox_3des_setkey,
.encrypt = nitrox_3des_encrypt,
.decrypt = nitrox_3des_decrypt,
.init = nitrox_skcipher_init,
.exit = nitrox_skcipher_exit,
}
};
int nitrox_crypto_register(void)
{
return crypto_register_skciphers(nitrox_skciphers,
ARRAY_SIZE(nitrox_skciphers));
int err;
err = nitrox_register_skciphers();
if (err)
return err;
err = nitrox_register_aeads();
if (err) {
nitrox_unregister_skciphers();
return err;
}
return 0;
}
void nitrox_crypto_unregister(void)
{
crypto_unregister_skciphers(nitrox_skciphers,
ARRAY_SIZE(nitrox_skciphers));
nitrox_unregister_aeads();
nitrox_unregister_skciphers();
}

View File

@ -7,6 +7,10 @@
int nitrox_crypto_register(void);
void nitrox_crypto_unregister(void);
int nitrox_register_aeads(void);
void nitrox_unregister_aeads(void);
int nitrox_register_skciphers(void);
void nitrox_unregister_skciphers(void);
void *crypto_alloc_context(struct nitrox_device *ndev);
void crypto_free_context(void *ctx);
struct nitrox_device *nitrox_get_first_device(void);
@ -19,7 +23,7 @@ void pkt_slc_resp_tasklet(unsigned long data);
int nitrox_process_se_request(struct nitrox_device *ndev,
struct se_crypto_request *req,
completion_t cb,
struct skcipher_request *skreq);
void *cb_arg);
void backlog_qflush_work(struct work_struct *work);

View File

@ -54,7 +54,13 @@
#define NPS_STATS_PKT_DMA_WR_CNT 0x1000190
/* NPS packet registers */
#define NPS_PKT_INT 0x1040018
#define NPS_PKT_INT 0x1040018
#define NPS_PKT_MBOX_INT_LO 0x1040020
#define NPS_PKT_MBOX_INT_LO_ENA_W1C 0x1040030
#define NPS_PKT_MBOX_INT_LO_ENA_W1S 0x1040038
#define NPS_PKT_MBOX_INT_HI 0x1040040
#define NPS_PKT_MBOX_INT_HI_ENA_W1C 0x1040050
#define NPS_PKT_MBOX_INT_HI_ENA_W1S 0x1040058
#define NPS_PKT_IN_RERR_HI 0x1040108
#define NPS_PKT_IN_RERR_HI_ENA_W1S 0x1040120
#define NPS_PKT_IN_RERR_LO 0x1040128
@ -74,6 +80,10 @@
#define NPS_PKT_SLC_RERR_LO_ENA_W1S 0x1040240
#define NPS_PKT_SLC_ERR_TYPE 0x1040248
#define NPS_PKT_SLC_ERR_TYPE_ENA_W1S 0x1040260
/* Mailbox PF->VF PF Accessible Data registers */
#define NPS_PKT_MBOX_PF_VF_PFDATAX(_i) (0x1040800 + ((_i) * 0x8))
#define NPS_PKT_MBOX_VF_PF_PFDATAX(_i) (0x1040C00 + ((_i) * 0x8))
#define NPS_PKT_SLC_CTLX(_i) (0x10000 + ((_i) * 0x40000))
#define NPS_PKT_SLC_CNTSX(_i) (0x10008 + ((_i) * 0x40000))
#define NPS_PKT_SLC_INT_LEVELSX(_i) (0x10010 + ((_i) * 0x40000))

Some files were not shown because too many files have changed in this diff Show More