linux

korg/linux

mirror of https://mirrors.bfsu.edu.cn/git/linux.git synced 2024-11-29 23:24:11 +08:00

Author	SHA1	Message	Date
Daniel Jordan	34c3a47d20	padata: Always leave BHs disabled when running ->parallel() A deadlock can happen when an overloaded system runs ->parallel() in the context of the current task: padata_do_parallel ->parallel() pcrypt_aead_enc/dec padata_do_serial spin_lock(&reorder->lock) // BHs still enabled <interrupt> ... __do_softirq ... padata_do_serial spin_lock(&reorder->lock) It's a bug for BHs to be on in _do_serial as Steffen points out, so ensure they're off in the "current task" case like they are in padata_parallel_worker to avoid this situation. Reported-by: syzbot+bc05445bc14148d51915@syzkaller.appspotmail.com Fixes: `4611ce2246` ("padata: allocate work structures for parallel jobs from a pool") Signed-off-by: Daniel Jordan <daniel.m.jordan@oracle.com> Acked-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-25 17:39:18 +08:00
Zhang Yiqun	1aa33fc8d4	crypto: tcrypt - Fix multibuffer skcipher speed test mem leak In the past, the data for mb-skcipher test has been allocated twice, that means the first allcated memory area is without free, which may cause a potential memory leakage. So this patch is to remove one allocation to fix this error. Fixes: `e161c5930c` ("crypto: tcrypt - add multibuf skcipher...") Signed-off-by: Zhang Yiqun <zhangyiqun@phytium.com.cn> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-25 17:39:18 +08:00
Eric Biggers	441cb1b730	crypto: algboss - compile out test-related code when tests disabled When CONFIG_CRYPTO_MANAGER_DISABLE_TESTS is set, the code in algboss.c that handles CRYPTO_MSG_ALG_REGISTER is unnecessary, so make it be compiled out. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-25 17:39:18 +08:00
Eric Biggers	790c4c9f53	crypto: kdf - silence noisy self-test Make the kdf_sp800108 self-test only print a message on success when fips_enabled, so that it's consistent with testmgr.c and doesn't spam the kernel log with a message that isn't really important. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-25 17:39:18 +08:00
Eric Biggers	0bf365c0ef	crypto: kdf - skip self-test when tests disabled Make kdf_sp800108 honor the CONFIG_CRYPTO_MANAGER_DISABLE_TESTS kconfig option, so that it doesn't always waste time running its self-test. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-25 17:39:18 +08:00
Eric Biggers	06bd9c967e	crypto: api - compile out crypto_boot_test_finished when tests disabled The crypto_boot_test_finished static key is unnecessary when self-tests are disabled in the kconfig, so optimize it out accordingly, along with the entirety of crypto_start_tests(). This mainly avoids the overhead of an unnecessary static_branch_enable() on every boot. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-25 17:39:18 +08:00
Eric Biggers	9cadd73ade	crypto: algboss - optimize registration of internal algorithms Since algboss always skips testing of algorithms with the CRYPTO_ALG_INTERNAL flag, there is no need to go through the dance of creating the test kthread, which creates a lot of overhead. Instead, we can just directly finish the algorithm registration, like is now done when self-tests are disabled entirely. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-25 17:39:18 +08:00
Eric Biggers	a7008584ab	crypto: api - optimize algorithm registration when self-tests disabled Currently, registering an algorithm with the crypto API always causes a notification to be posted to the "cryptomgr", which then creates a kthread to self-test the algorithm. However, if self-tests are disabled in the kconfig (as is the default option), then this kthread just notifies waiters that the algorithm has been tested, then exits. This causes a significant amount of overhead, especially in the kthread creation and destruction, which is not necessary at all. For example, in a quick test I found that booting a "minimum" x86_64 kernel with all the crypto options enabled (except for the self-tests) takes about 400ms until PID 1 can start. Of that, a full 13ms is spent just doing this pointless dance, involving a kthread being created, run, and destroyed over 200 times. That's over 3% of the entire kernel start time. Fix this by just skipping the creation of the test larval and the posting of the registration notification entirely, when self-tests are disabled. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-25 17:39:18 +08:00
Herbert Xu	719c547c65	Merge branch 'i2c/client_device_id_helper-immutable' of git://git.kernel.org/pub/scm/linux/kernel/git/wsa/linux Merge i2c tree to pick up i2c_client_get_device_id helper.	2022-11-25 17:05:15 +08:00
Uwe Kleine-König	8e96729fc2	crypto: ccree - Make cc_debugfs_global_fini() available for module init function ccree_init() calls cc_debugfs_global_fini(), the former is an init function and the latter an exit function though. A modular build emits: WARNING: modpost: drivers/crypto/ccree/ccree.o: section mismatch in reference: init_module (section: .init.text) -> cc_debugfs_global_fini (section: .exit.text) (with CONFIG_DEBUG_SECTION_MISMATCH=y). Fixes: `4f1c596df7` ("crypto: ccree - Remove debugfs when platform_driver_register failed") Signed-off-by: Uwe Kleine-König <u.kleine-koenig@pengutronix.de> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-22 18:12:55 +08:00
Wenkai Lin	75df46b598	crypto: hisilicon/sec - remove continuous blank lines Fix that put two or more continuous blank lines inside function. Signed-off-by: Wenkai Lin <linwenkai6@hisilicon.com> Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 17:00:23 +08:00
Kai Ye	2132d4efaa	crypto: hisilicon/sec - fix spelling mistake 'ckeck' -> 'check' There are a couple of spelling mistakes in sec2. Fix them. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 17:00:23 +08:00
Kai Ye	9c75609842	crypto: hisilicon/qm - the command dump process is modified Reduce the function complexity by use the function table in the process of dumping queue. The function input parameters are unified. And maintainability is enhanced. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 17:00:22 +08:00
Kai Ye	94476b2b6d	crypto: hisilicon/qm - split a debugfs.c from qm Considering that the qm feature and debugfs feature are independent. The code related to debugfs is getting larger and larger. It should be separate as a debugfs file. So move some debugfs code to new file from qm file. The qm code logic is not modified. And maintainability is enhanced. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 17:00:22 +08:00
Kai Ye	b40b62ed7b	crypto: hisilicon/qm - modify the process of regs dfx The last register logic and different register logic are combined. Use "u32" instead of 'int' in the regs function input parameter to simplify some checks. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 16:59:34 +08:00
Kai Ye	7bbbc9d81b	crypto: hisilicon/qm - delete redundant null assignment operations There is no security data in the pointer. It is only a value transferred as a structure. It makes no sense to zero a variable that is on the stack. So not need to set the pointer to null. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 16:59:34 +08:00
Herbert Xu	e6cb02bd0a	crypto: skcipher - Allow sync algorithms with large request contexts Some sync algorithms may require a large amount of temporary space during its operations. There is no reason why they should be limited just because some legacy users want to place all temporary data on the stack. Such algorithms can now set a flag to indicate that they need extra request context, which will cause them to be invisible to users that go through the sync_skcipher interface. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 16:59:34 +08:00
Xiongfeng Wang	cc7710d0d4	crypto: hisilicon/qm - add missing pci_dev_put() in q_num_set() pci_get_device() will increase the reference count for the returned pci_dev. We need to use pci_dev_put() to decrease the reference count before q_num_set() returns. Fixes: `c8b4b47707` ("crypto: hisilicon - add HiSilicon HPRE accelerator") Signed-off-by: Xiongfeng Wang <wangxiongfeng2@huawei.com> Reviewed-by: Weili Qian <qianweili@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 16:59:34 +08:00
Herbert Xu	3a58c23117	crypto: cryptd - Use request context instead of stack for sub-request cryptd is buggy as it tries to use sync_skcipher without going through the proper sync_skcipher interface. In fact it doesn't even need sync_skcipher since it's already a proper skcipher and can easily access the request context instead of using something off the stack. Fixes: `36b3875a97` ("crypto: cryptd - Remove VLA usage of skcipher") Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 16:59:34 +08:00
Tianjia Zhang	824db5cd1e	crypto: arm64 - Fix unused variable compilation warnings of cpu_feature The cpu feature defined by MODULE_DEVICE_TABLE is only referenced when compiling as a module, and the warning of unused variable will be encountered when compiling with intree. The warning can be removed by adding the __maybe_unused flag. Fixes: `03c9a333fe` ("crypto: arm64/ghash - add NEON accelerated fallback for 64-bit PMULL") Fixes: `ae1b83c7d5` ("crypto: arm64/sm4 - add CE implementation for GCM mode") Reported-by: kernel test robot <lkp@intel.com> Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 16:59:34 +08:00
Gaosheng Cui	4f1c596df7	crypto: ccree - Remove debugfs when platform_driver_register failed When platform_driver_register failed, we need to remove debugfs, which will caused a resource leak, fix it. Failed logs as follows: [ 32.606488] debugfs: Directory 'ccree' with parent '/' already present! Fixes: `4c3f97276e` ("crypto: ccree - introduce CryptoCell driver") Signed-off-by: Gaosheng Cui <cuigaosheng1@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 16:59:34 +08:00
Tomas Marek	7cdc5e6bcd	hwrng: stm32 - rename readl return value Use a more meaningful name for the readl return value variable. Link: https://lore.kernel.org/all/Y1J3QwynPFIlfrIv@loth.rohan.me.apana.org.au/ Signed-off-by: Tomas Marek <tomas.marek@elrest.cz> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 16:59:34 +08:00
Jason A. Donenfeld	16bdbae394	hwrng: core - treat default_quality as a maximum and default to 1024 Most hw_random devices return entropy which is assumed to be of full quality, but driver authors don't bother setting the quality knob. Some hw_random devices return less than full quality entropy, and then driver authors set the quality knob. Therefore, the entropy crediting should be opt-out rather than opt-in per-driver, to reflect the actual reality on the ground. For example, the two Raspberry Pi RNG drivers produce full entropy randomness, and both EDK2 and U-Boot's drivers for these treat them as such. The result is that EFI then uses these numbers and passes the to Linux, and Linux credits them as boot, thereby initializing the RNG. Yet, in Linux, the quality knob was never set to anything, and so on the chance that Linux is booted without EFI, nothing is ever credited. That's annoying. The same pattern appears to repeat itself throughout various drivers. In fact, very very few drivers have bothered setting quality=1024. Looking at the git history of existing drivers and corresponding mailing list discussion, this conclusion tracks. There's been a decent amount of discussion about drivers that set quality < 1024 -- somebody read and interepreted a datasheet, or made some back of the envelope calculation somehow. But there's been very little, if any, discussion about most drivers where the quality is just set to 1024 or unset (or set to 1000 when the authors misunderstood the API and assumed it was base-10 rather than base-2); in both cases the intent was fairly clear of, "this is a hardware random device; it's fine." So let's invert this logic. A hw_random struct's quality knob now controls the maximum quality a driver can produce, or 0 to specify 1024. Then, the module-wide switch called "default_quality" is changed to represent the maximum quality of any driver. By default it's 1024, and the quality of any particular driver is then given by: min(default_quality, rng->quality ?: 1024); This way, the user can still turn this off for weird reasons (and we can replace whatever driver-specific disabling hacks existed in the past), yet we get proper crediting for relevant RNGs. Cc: Dominik Brodowski <linux@dominikbrodowski.net> Cc: Ard Biesheuvel <ardb@kernel.org> Cc: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-18 16:59:34 +08:00
Angel Iglesias	662233731d	i2c: core: Introduce i2c_client_get_device_id helper function Introduces new helper function to aid in .probe_new() refactors. In order to use existing i2c_get_device_id() on the probe callback, the device match table needs to be accessible in that function, which would require bigger refactors in some drivers using the deprecated .probe callback. This issue was discussed in more detail in the IIO mailing list. Link: https://lore.kernel.org/all/20221023132302.911644-11-u.kleine-koenig@pengutronix.de/ Suggested-by: Nuno Sá <noname.nuno@gmail.com> Suggested-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Suggested-by: Jonathan Cameron <jic23@kernel.org> Signed-off-by: Angel Iglesias <ang.iglesiasg@gmail.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Reviewed-by: Jonathan Cameron <Jonathan.Cameron@huawei.com> Signed-off-by: Wolfram Sang <wsa@kernel.org>	2022-11-14 20:49:33 +01:00
Shashank Gupta	557ffd5a47	crypto: qat - remove ADF_STATUS_PF_RUNNING flag from probe The ADF_STATUS_PF_RUNNING bit is set after the successful initialization of the communication between VF to PF in adf_vf2pf_notify_init(). So, it is not required to be set after the execution of the function adf_dev_init(). Signed-off-by: Shashank Gupta <shashank.gupta@intel.com> Reviewed-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by: Wojciech Ziemba <wojciech.ziemba@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-11 18:14:59 +08:00
Yang Li	fb11cddfe2	crypto: rockchip - Remove surplus dev_err() when using platform_get_irq() There is no need to call the dev_err() function directly to print a custom message when handling an error from either the platform_get_irq() or platform_get_irq_byname() functions as both are going to display an appropriate error message in case of a failure. ./drivers/crypto/rockchip/rk3288_crypto.c:351:2-9: line 351 is redundant because platform_get_irq() already prints an error Link: https://bugzilla.openanolis.cn/show_bug.cgi?id=2677 Reported-by: Abaci Robot <abaci@linux.alibaba.com> Signed-off-by: Yang Li <yang.lee@linux.alibaba.com> Acked-by: Corentin Labbe <clabbe@baylibre.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-11 18:14:59 +08:00
Ard Biesheuvel	520af5da66	crypto: lib/aesgcm - Provide minimal library implementation Implement a minimal library version of AES-GCM based on the existing library implementations of AES and multiplication in GF(2^128). Using these primitives, GCM can be implemented in a straight-forward manner. GCM has a couple of sharp edges, i.e., the amount of input data processed with the same initialization vector (IV) should be capped to protect the counter from 32-bit rollover (or carry), and the size of the authentication tag should be fixed for a given key. [0] The former concern is addressed trivially, given that the function call API uses 32-bit signed types for the input lengths. It is still up to the caller to avoid IV reuse in general, but this is not something we can police at the implementation level. As for the latter concern, let's make the authentication tag size part of the key schedule, and only permit it to be configured as part of the key expansion routine. Note that table based AES implementations are susceptible to known plaintext timing attacks on the encryption key. The AES library already attempts to mitigate this to some extent, but given that the counter mode encryption used by GCM operates exclusively on known plaintext by construction (the IV and therefore the initial counter value are known to an attacker), let's take some extra care to mitigate this, by calling the AES library with interrupts disabled. [0] https://nvlpubs.nist.gov/nistpubs/legacy/sp/nistspecialpublication800-38d.pdf Link: https://lore.kernel.org/all/c6fb9b25-a4b6-2e4a-2dd1-63adda055a49@amd.com/ Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Tested-by: Nikunj A Dadhania <nikunj@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-11 18:14:59 +08:00
Ard Biesheuvel	b67ce439fe	crypto: lib/gf128mul - make gf128mul_lle time invariant The gf128mul library has different variants with different memory/performance tradeoffs, where the faster ones use 4k or 64k lookup tables precomputed at runtime, which are based on one of the multiplication factors, which is commonly the key for keyed hash algorithms such as GHASH. The slowest variant is gf128_mul_lle() [and its bbe/ble counterparts], which does not use precomputed lookup tables, but it still relies on a single u16[256] lookup table which is input independent. The use of such a table may cause the execution time of gf128_mul_lle() to correlate with the value of the inputs, which is generally something that must be avoided for cryptographic algorithms. On top of that, the function uses a sequence of if () statements that conditionally invoke be128_xor() based on which bits are set in the second argument of the function, which is usually a pointer to the multiplication factor that represents the key. In order to remove the correlation between the execution time of gf128_mul_lle() and the value of its inputs, let's address the identified shortcomings: - add a time invariant version of gf128mul_x8_lle() that replaces the table lookup with the expression that is used at compile time to populate the lookup table; - make the invocations of be128_xor() unconditional, but pass a zero vector as the third argument if the associated bit in the key is cleared. The resulting code is likely to be significantly slower. However, given that this is the slowest version already, making it even slower in order to make it more secure is assumed to be justified. The bbe and ble counterparts could receive the same treatment, but the former is never used anywhere in the kernel, and the latter is only used in the driver for a asynchronous crypto h/w accelerator (Chelsio), where timing variances are unlikely to matter. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-11 18:14:59 +08:00
Ard Biesheuvel	61c581a46a	crypto: move gf128mul library into lib/crypto The gf128mul library does not depend on the crypto API at all, so it can be moved into lib/crypto. This will allow us to use it in other library code in a subsequent patch without having to depend on CONFIG_CRYPTO. While at it, change the Kconfig symbol name to align with other crypto library implementations. However, the source file name is retained, as it is reflected in the module .ko filename, and changing this might break things for users. Signed-off-by: Ard Biesheuvel <ardb@kernel.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-11 18:14:59 +08:00
Ralph Siemsen	329cfa42e5	crypto: doc - use correct function name The hashing API does not have a function called .finish() Signed-off-by: Ralph Siemsen <ralph.siemsen@linaro.org> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:35:44 +08:00
Tianjia Zhang	ae1b83c7d5	crypto: arm64/sm4 - add CE implementation for GCM mode This patch is a CE-optimized assembly implementation for GCM mode. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 224 and 224 modes of tcrypt, and compared the performance before and after this patch (the driver used before this patch is gcm_base(ctr-sm4-ce,ghash-generic)). The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: Before (gcm_base(ctr-sm4-ce,ghash-generic)): gcm(sm4) \| 16 64 256 512 1024 1420 4096 8192 -------------+--------------------------------------------------------------------- GCM enc \| 25.24 64.65 104.66 116.69 123.81 125.12 129.67 130.62 GCM dec \| 25.40 64.80 104.74 116.70 123.81 125.21 129.68 130.59 GCM mb enc \| 24.95 64.06 104.20 116.38 123.55 124.97 129.63 130.61 GCM mb dec \| 24.92 64.00 104.13 116.34 123.55 124.98 129.56 130.48 After: gcm-sm4-ce \| 16 64 256 512 1024 1420 4096 8192 -------------+--------------------------------------------------------------------- GCM enc \| 108.62 397.18 971.60 1283.92 1522.77 1513.39 1777.00 1806.96 GCM dec \| 116.36 398.14 1004.27 1319.11 1624.21 1635.43 1932.54 1974.20 GCM mb enc \| 107.13 391.79 962.05 1274.94 1514.76 1508.57 1769.07 1801.58 GCM mb dec \| 113.40 389.36 988.51 1307.68 1619.10 1631.55 1931.70 1970.86 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:35:44 +08:00
Tianjia Zhang	67fa3a7fdf	crypto: arm64/sm4 - add CE implementation for CCM mode This patch is a CE-optimized assembly implementation for CCM mode. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 223 and 225 modes of tcrypt, and compared the performance before and after this patch (the driver used before this patch is ccm_base(ctr-sm4-ce,cbcmac-sm4-ce)). The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: Before (rfc4309(ccm_base(ctr-sm4-ce,cbcmac-sm4-ce))): ccm(sm4) \| 16 64 256 512 1024 1420 4096 8192 -------------+--------------------------------------------------------------- CCM enc \| 35.07 125.40 336.47 468.17 581.97 619.18 712.56 736.01 CCM dec \| 34.87 124.40 335.08 466.75 581.04 618.81 712.25 735.89 CCM mb enc \| 34.71 123.96 333.92 465.39 579.91 617.49 711.45 734.92 CCM mb dec \| 34.42 122.80 331.02 462.81 578.28 616.42 709.88 734.19 After (rfc4309(ccm-sm4-ce)): ccm-sm4-ce \| 16 64 256 512 1024 1420 4096 8192 -------------+--------------------------------------------------------------- CCM enc \| 77.12 249.82 569.94 725.17 839.27 867.71 952.87 969.89 CCM dec \| 75.90 247.26 566.29 722.12 836.90 865.95 951.74 968.57 CCM mb enc \| 75.98 245.25 562.91 718.99 834.76 864.70 950.17 967.90 CCM mb dec \| 75.06 243.78 560.58 717.13 833.68 862.70 949.35 967.11 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:35:32 +08:00
Tianjia Zhang	6b5360a5e0	crypto: arm64/sm4 - add CE implementation for cmac/xcbc/cbcmac This patch is a CE-optimized assembly implementation for cmac/xcbc/cbcmac. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 300 mode of tcrypt, and compared the performance before and after this patch (the driver used before this patch is XXXmac(sm4-ce)). The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: Before: update-size \| 16 64 256 1024 2048 4096 8192 ---------------+-------------------------------------------------------- cmac(sm4-ce) \| 293.33 403.69 503.76 527.78 531.10 535.46 535.81 xcbc(sm4-ce) \| 292.83 402.50 504.02 529.08 529.87 536.55 538.24 cbcmac(sm4-ce) \| 318.42 415.79 497.12 515.05 523.15 521.19 523.01 After: update-size \| 16 64 256 1024 2048 4096 8192 ---------------+-------------------------------------------------------- cmac-sm4-ce \| 371.99 675.28 903.56 971.65 980.57 990.40 991.04 xcbc-sm4-ce \| 372.11 674.55 903.47 971.61 980.96 990.42 991.10 cbcmac-sm4-ce \| 371.63 675.33 903.23 972.07 981.42 990.93 991.45 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:34:43 +08:00
Tianjia Zhang	01f633113b	crypto: arm64/sm4 - add CE implementation for XTS mode This patch is a CE-optimized assembly implementation for XTS mode. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 218 mode of tcrypt, and compared the performance before and after this patch (the driver used before this patch is xts(ecb-sm4-ce)). The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: Before: xts(ecb-sm4-ce) \| 16 64 128 256 1024 1420 4096 ----------------+-------------------------------------------------------------- XTS enc \| 117.17 430.56 732.92 1134.98 2007.03 2136.23 2347.20 XTS dec \| 116.89 429.02 733.40 1132.96 2006.13 2130.50 2347.92 After: xts-sm4-ce \| 16 64 128 256 1024 1420 4096 ----------------+-------------------------------------------------------------- XTS enc \| 224.68 798.91 1248.08 1714.60 2413.73 2467.84 2612.62 XTS dec \| 229.85 791.34 1237.79 1720.00 2413.30 2473.84 2611.95 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:34:42 +08:00
Tianjia Zhang	b1863fd074	crypto: arm64/sm4 - add CE implementation for CTS-CBC mode This patch is a CE-optimized assembly implementation for CTS-CBC mode. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 218 mode of tcrypt, and compared the performance before and after this patch (the driver used before this patch is cts(cbc-sm4-ce)). The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: Before: cts(cbc-sm4-ce) \| 16 64 128 256 1024 1420 4096 ----------------+-------------------------------------------------------------- CTS-CBC enc \| 286.09 297.17 457.97 627.75 868.58 900.80 957.69 CTS-CBC dec \| 286.67 285.63 538.35 947.08 2241.03 2577.32 3391.14 After: cts-cbc-sm4-ce \| 16 64 128 256 1024 1420 4096 ----------------+-------------------------------------------------------------- CTS-CBC enc \| 288.19 428.80 593.57 741.04 911.73 931.80 950.00 CTS-CBC dec \| 292.22 468.99 838.23 1380.76 2741.17 3036.42 3409.62 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:34:42 +08:00
Tianjia Zhang	45089dbe59	crypto: arm64/sm4 - export reusable CE acceleration functions In the accelerated implementation of the SM4 algorithm using the Crypto Extension instructions, there are some functions that can be reused in the upcoming accelerated implementation of the GCM/CCM mode, and the CBC/CFB encryption is reused in the optimized implementation of SVESM4. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:34:42 +08:00
Tianjia Zhang	cb9ba02b07	crypto: arm64/sm4 - simplify sm4_ce_expand_key() of CE implementation Use a 128-bit swap mask and tbl instruction to simplify the implementation for generating SM4 rkey_dec. Also fixed the issue of not being wrapped by kernel_neon_begin/end() when using the sm4_ce_expand_key() function. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:34:31 +08:00
Tianjia Zhang	ce41fefd24	crypto: arm64/sm4 - refactor and simplify CE implementation This patch does not add new features, but only refactors and simplifies the implementation of the Crypto Extension acceleration of the SM4 algorithm: Extract the macro optimized by SM4 Crypto Extension for reuse in the subsequent optimization of CCM/GCM modes. Encryption in CBC and CFB modes processes four blocks at a time instead of one, allowing the ld1 instruction to load 64 bytes of data at a time, which will reduces unnecessary memory accesses. CBC/CFB/CTR makes full use of free registers to reduce redundant memory accesses, and rearranges some instructions to improve out-of-order execution capabilities. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:34:31 +08:00
Tianjia Zhang	3c3836378d	crypto: tcrypt - add SM4 cts-cbc/xts/xcbc test Added CTS-CBC/XTS/XCBC tests for SM4 algorithms, as well as corresponding speed tests, this is to test performance-optimized implementations of these modes. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:34:21 +08:00
Tianjia Zhang	c24ee936c7	crypto: testmgr - add SM4 cts-cbc/xts/xcbc test vectors This patch newly adds the test vectors of CTS-CBC/XTS/XCBC modes of the SM4 algorithm, and also added some test vectors for SM4 GCM/CCM. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:34:21 +08:00
Tianjia Zhang	62508017a2	crypto: arm64/sm4 - refactor and simplify NEON implementation This patch does not add new features. The main work is to refactor and simplify the implementation of SM4 NEON, which is reflected in the following aspects: The accelerated implementation supports the arbitrary number of blocks, not just multiples of 8, which simplifies the implementation and brings some optimization acceleration for data that is not aligned by 8 blocks. When loading the input data, use the ld4 instruction to replace the original ld1 instruction as much as possible, which will save the cost of matrix transposition of the input data. Use 8-block parallelism whenever possible to speed up matrix transpose and rotation operations, instead of up to 4-block parallelism. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:34:21 +08:00
Tianjia Zhang	a41b212946	crypto: arm64/sm3 - add NEON assembly implementation This patch adds the NEON acceleration implementation of the SM3 hash algorithm. The main algorithm is based on SM3 NEON accelerated work of the libgcrypt project. Benchmark on T-Head Yitian-710 2.75 GHz, the data comes from the 326 mode of tcrypt, and compares the performance data of sm3-generic and sm3-ce. The abscissas are blocks of different lengths. The data is tabulated and the unit is Mb/s: update-size \| 16 64 256 1024 2048 4096 8192 ---------------+-------------------------------------------------------- sm3-generic \| 185.24 221.28 301.26 307.43 300.83 308.82 308.91 sm3-neon \| 171.81 220.20 322.94 339.28 334.09 343.61 343.87 sm3-ce \| 227.48 333.48 502.62 527.87 520.45 534.91 535.40 Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:34:21 +08:00
Tianjia Zhang	e1fa51aa2b	crypto: arm64/sm3 - raise the priority of the CE implementation Raise the priority of the sm3-ce algorithm from 200 to 400, this is to make room for the implementation of sm3-neon. Signed-off-by: Tianjia Zhang <tianjia.zhang@linux.alibaba.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:33:22 +08:00
Anirudh Venkataramanan	3513828cb8	crypto: tcrypt - Drop leading newlines from prints The top level print banners have a leading newline. It's not entirely clear why this exists, but it makes it harder to parse tcrypt test output using a script. Drop said newlines. tcrypt output before this patch: [...] testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption [...] test 0 (160 bit key, 16 byte blocks): 1 operation in 2320 cycles (16 bytes) tcrypt output with this patch: [...] testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption [...] test 0 (160 bit key, 16 byte blocks): 1 operation in 2320 cycles (16 bytes) Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:33:22 +08:00
Anirudh Venkataramanan	a2ef563000	crypto: tcrypt - Drop module name from print string The pr_fmt() define includes KBUILD_MODNAME, and so there's no need for pr_err() to also print it. Drop module name from the print string. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:33:22 +08:00
Anirudh Venkataramanan	837a99f590	crypto: tcrypt - Use pr_info/pr_err Currently, there's mixed use of printk() and pr_info()/pr_err(). The latter prints the module name (because pr_fmt() is defined so) but the former does not. As a result there's inconsistency in the printed output. For example: modprobe mode=211: [...] test 0 (160 bit key, 16 byte blocks): 1 operation in 2320 cycles (16 bytes) [...] test 1 (160 bit key, 64 byte blocks): 1 operation in 2336 cycles (64 bytes) modprobe mode=215: [...] tcrypt: test 0 (160 bit key, 16 byte blocks): 1 operation in 2173 cycles (16 bytes) [...] tcrypt: test 1 (160 bit key, 64 byte blocks): 1 operation in 2241 cycles (64 bytes) Replace all instances of printk() with pr_info()/pr_err() so that the module name is printed consistently. Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:33:22 +08:00
Anirudh Venkataramanan	fdaeb224e2	crypto: tcrypt - Use pr_cont to print test results For some test cases, a line break gets inserted between the test banner and the results. For example, with mode=211 this is the output: [...] testing speed of rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption [...] test 0 (160 bit key, 16 byte blocks): [...] 1 operation in 2373 cycles (16 bytes) --snip-- [...] testing speed of gcm(aes) (generic-gcm-aesni) encryption [...] test 0 (128 bit key, 16 byte blocks): [...] 1 operation in 2338 cycles (16 bytes) Similar behavior is seen in the following cases as well: modprobe tcrypt mode=212 modprobe tcrypt mode=213 modprobe tcrypt mode=221 modprobe tcrypt mode=300 sec=1 modprobe tcrypt mode=400 sec=1 This doesn't happen with mode=215: [...] tcrypt: testing speed of multibuffer rfc4106(gcm(aes)) (rfc4106-gcm-aesni) encryption [...] tcrypt: test 0 (160 bit key, 16 byte blocks): 1 operation in 2215 cycles (16 bytes) --snip-- [...] tcrypt: testing speed of multibuffer gcm(aes) (generic-gcm-aesni) encryption [...] tcrypt: test 0 (128 bit key, 16 byte blocks): 1 operation in 2191 cycles (16 bytes) This print inconsistency is because printk() is used instead of pr_cont() in a few places. Change these to be pr_cont(). checkpatch warns that pr_cont() shouldn't be used. This can be ignored in this context as tcrypt already uses pr_cont(). Signed-off-by: Anirudh Venkataramanan <anirudh.venkataramanan@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-11-04 17:33:22 +08:00
wangjianli	d6e9aa6e1e	crypto: octeontx - fix repeated words in comments Delete the redundant word 'the'. Signed-off-by: wangjianli <wangjianli@cdjrlc.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-10-28 12:36:34 +08:00
Kai Ye	8f82f4ae89	crypto: hisilicon/qm - delete redundancy check Because the permission on the VF debugfs file is "0444". So the VF function checking is redundant in qos writing api. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-10-28 12:36:34 +08:00
Kai Ye	22d7a6c39c	crypto: hisilicon/qm - add pci bdf number check The pci bdf number check is added for qos written by using the pci api. Directly get the devfn by pci_dev, so delete some redundant code. And use the kstrtoul instead of sscanf to simplify code. Signed-off-by: Kai Ye <yekai13@huawei.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>	2022-10-28 12:36:34 +08:00

1 2 3 4 5 ...

1136082 Commits