linux/drivers/crypto
zhenwei pi 977231e8d4 virtio-crypto: wait ctrl queue instead of busy polling
Originally, after submitting request into virtio crypto control
queue, the guest side polls the result from the virt queue. This
works like following:
    CPU0   CPU1               ...             CPUx  CPUy
     |      |                                  |     |
     \      \                                  /     /
      \--------spin_lock(&vcrypto->ctrl_lock)-------/
                           |
                 virtqueue add & kick
                           |
                  busy poll virtqueue
                           |
              spin_unlock(&vcrypto->ctrl_lock)
                          ...

There are two problems:
1, The queue depth is always 1, the performance of a virtio crypto
   device gets limited. Multi user processes share a single control
   queue, and hit spin lock race from control queue. Test on Intel
   Platinum 8260, a single worker gets ~35K/s create/close session
   operations, and 8 workers get ~40K/s operations with 800% CPU
   utilization.
2, The control request is supposed to get handled immediately, but
   in the current implementation of QEMU(v6.2), the vCPU thread kicks
   another thread to do this work, the latency also gets unstable.
   Tracking latency of virtio_crypto_alg_akcipher_close_session in 5s:
        usecs               : count     distribution
         0 -> 1          : 0        |                        |
         2 -> 3          : 7        |                        |
         4 -> 7          : 72       |                        |
         8 -> 15         : 186485   |************************|
        16 -> 31         : 687      |                        |
        32 -> 63         : 5        |                        |
        64 -> 127        : 3        |                        |
       128 -> 255        : 1        |                        |
       256 -> 511        : 0        |                        |
       512 -> 1023       : 0        |                        |
      1024 -> 2047       : 0        |                        |
      2048 -> 4095       : 0        |                        |
      4096 -> 8191       : 0        |                        |
      8192 -> 16383      : 2        |                        |
This means that a CPU may hold vcrypto->ctrl_lock as long as 8192~16383us.

To improve the performance of control queue, a request on control queue
waits completion instead of busy polling to reduce lock racing, and gets
completed by control queue callback.
    CPU0   CPU1               ...             CPUx  CPUy
     |      |                                  |     |
     \      \                                  /     /
      \--------spin_lock(&vcrypto->ctrl_lock)-------/
                           |
                 virtqueue add & kick
                           |
      ---------spin_unlock(&vcrypto->ctrl_lock)------
     /      /                                  \     \
     |      |                                  |     |
    wait   wait                               wait  wait

Test this patch, the guest side get ~200K/s operations with 300% CPU
utilization.

Cc: Michael S. Tsirkin <mst@redhat.com>
Cc: Jason Wang <jasowang@redhat.com>
Cc: Gonglei <arei.gonglei@huawei.com>
Reviewed-by: Gonglei <arei.gonglei@huawei.com>
Signed-off-by: zhenwei pi <pizhenwei@bytedance.com>
Message-Id: <20220506131627.180784-4-pizhenwei@bytedance.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
2022-05-31 12:45:09 -04:00
..
allwinner crypto: sun8i-ce - do not fallback if cryptlen is less than sg length 2022-05-13 17:24:48 +08:00
amcc crypto: amcc - fix incorrect kernel-doc comment syntax in files 2021-03-26 20:15:58 +11:00
amlogic crypto: amlogic - call finalize with bh disabled 2022-03-03 10:47:49 +12:00
axis crypto: sha - split sha.h into sha1.h and sha2.h 2020-11-20 14:45:33 +11:00
bcm crypto: bcm - Fix a whole host of kernel-doc misdemeanours 2021-03-26 20:02:35 +11:00
caam This update includes the following changes: 2022-05-27 18:06:49 -07:00
cavium crypto: cavium/nitrox - remove check of list iterator against head past the loop body 2022-04-08 16:26:43 +08:00
ccp crypto: ccp - Fix the INIT_EX data file open failure 2022-04-29 13:44:57 +08:00
ccree crypto: ccree - use fine grained DMA mapping dir 2022-04-15 16:34:25 +08:00
chelsio treewide: Replace open-coded flex arrays in unions 2021-10-18 12:28:53 -07:00
gemini crypto: gemini - call finalize with bh disabled 2022-03-03 10:47:49 +12:00
hisilicon crypto: hisilicon/sec - delete the flag CRYPTO_ALG_ALLOCATES_MEMORY 2022-05-20 13:54:45 +08:00
inside-secure crypto: inside-secure - Add MODULE_FIRMWARE macros 2022-05-06 18:16:55 +08:00
keembay crypto: keembay - Make use of devm helper function devm_platform_ioremap_resource() 2022-04-29 13:44:57 +08:00
marvell crypto: octeontx2 - simplify the return expression of otx2_cpt_aead_cbc_aes_sha_setkey() 2022-05-13 17:24:48 +08:00
nx powerpc/powernv/vas: Assign real address to rx_fifo in vas_rx_win_attr 2022-05-22 15:58:27 +10:00
qat crypto: qat - add support for 401xx devices 2022-05-20 13:49:18 +08:00
qce crypto: qce - fix uaf on qce_skcipher_register_one 2021-11-20 15:02:08 +11:00
rockchip crypto: rockchip - ECB does not need IV 2022-02-18 16:21:10 +11:00
stm32 crypto: stm32 - fix reference leak in stm32_crc_remove 2022-03-25 16:21:05 +12:00
ux500 crypto: ux500/hash - simplify if-if to if-else 2022-04-15 16:34:28 +08:00
virtio virtio-crypto: wait ctrl queue instead of busy polling 2022-05-31 12:45:09 -04:00
vmx crypto: vmx - Fix build error 2022-05-10 11:20:25 +08:00
xilinx crypto: xilinx: prevent probing on non-xilinx hardware 2022-03-09 15:12:31 +12:00
atmel-aes-regs.h
atmel-aes.c crypto: atmel - add support for AES and SHA IPs available on lan966x SoC 2022-02-05 15:10:51 +11:00
atmel-authenc.h crypto: sha - split sha.h into sha1.h and sha2.h 2020-11-20 14:45:33 +11:00
atmel-ecc.c crypto: atmel - Avoid flush_scheduled_work() usage 2022-05-06 18:16:55 +08:00
atmel-i2c.c crypto: atmel - Avoid flush_scheduled_work() usage 2022-05-06 18:16:55 +08:00
atmel-i2c.h crypto: atmel - Avoid flush_scheduled_work() usage 2022-05-06 18:16:55 +08:00
atmel-sha204a.c crypto: atmel - Avoid flush_scheduled_work() usage 2022-05-06 18:16:55 +08:00
atmel-sha-regs.h
atmel-sha.c crypto: atmel - add support for AES and SHA IPs available on lan966x SoC 2022-02-05 15:10:51 +11:00
atmel-tdes-regs.h
atmel-tdes.c crypto: atmel-tdes - Add support for the TDES IP available on sama7g5 SoC 2022-02-11 20:39:38 +11:00
exynos-rng.c
geode-aes.c crypto: geode - use DEFINE_SPINLOCK() for spinlock 2021-04-16 21:16:31 +10:00
geode-aes.h crypto: geode-aes - convert to skcipher API and make thread-safe 2019-10-23 19:46:56 +11:00
hifn_795x.c crypto: drivers - use semicolons rather than commas to separate statements 2020-10-02 18:02:15 +10:00
img-hash.c crypto: img-hash - remove need for error return variable ret 2021-09-17 11:06:14 +08:00
ixp4xx_crypto.c ARM: ixp4xx: Drop all common code 2022-02-12 18:20:04 +01:00
Kconfig crypto: s390 - add crypto library interface for ChaCha20 2022-05-13 17:24:49 +08:00
Makefile crypto: atmel - Avoid flush_scheduled_work() usage 2022-05-06 18:16:55 +08:00
mxs-dcp.c crypto: mxs-dcp - Fix scatterlist processing 2022-01-31 11:21:46 +11:00
n2_asm.S
n2_core.c crypto: sha - split sha.h into sha1.h and sha2.h 2020-11-20 14:45:33 +11:00
n2_core.h
omap-aes-gcm.c crypto: omap-aes - avoid spamming console with self tests 2020-06-04 22:03:39 +10:00
omap-aes.c crypto: omap-aes - Constify static attribute_group 2022-02-18 16:21:09 +11:00
omap-aes.h crypto: omap-aes - permit asynchronous skcipher as fallback 2020-07-16 21:49:02 +10:00
omap-crypto.c crypto: omap - Avoid redundant copy when using truncated sg list 2021-08-21 15:44:53 +08:00
omap-crypto.h
omap-des.c crypto: omap - increase priority of DES/3DES 2021-12-24 14:18:22 +11:00
omap-sham.c crypto: omap-sham - Constify static attribute_group 2022-02-18 16:21:09 +11:00
padlock-aes.c crypto: algapi - Remove skbuff.h inclusion 2020-08-20 14:04:28 +10:00
padlock-sha.c crypto: sha - split sha.h into sha1.h and sha2.h 2020-11-20 14:45:33 +11:00
qcom-rng.c crypto: qcom-rng - fix infinite loop on requests not multiple of WORD_SZ 2022-05-13 17:13:38 +08:00
s5p-sss.c crypto: s5p-sss - Add error handling in s5p_aes_probe() 2021-10-29 21:04:03 +08:00
sa2ul.c crypto: sa2ul - Add the new compatible for AM62 2022-04-21 17:53:54 +08:00
sa2ul.h crypto: sa2ul - Add support for AM64 2021-04-22 17:31:30 +10:00
sahara.c crypto: sahara - Remove unused .id_table support 2021-01-03 08:41:34 +11:00
talitos.c crypto: talitos - Uniform coding style with defined variable 2022-05-13 17:24:49 +08:00
talitos.h crypto: talitos - Work around SEC6 ERRATA (AES-CTR mode data size error) 2021-01-29 15:57:58 +11:00