2021-09-06 22:23:14 +08:00
/*
 * This program is free software; you can redistribute it and/or
 * modify it under the terms of the GNU General Public
 * License v2 as published by the Free Software Foundation.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
 * General Public License for more details.
 *
 * You should have received a copy of the GNU General Public
 * License along with this program; if not, write to the
 * Free Software Foundation, Inc., 59 Temple Place - Suite 330,
 * Boston, MA 02111-1307, USA.
 */

2023-02-21 08:41:21 +08:00
#include "kerncompat.h"
btrfs-progs: crypto: add time-based measurement to hash-speedtest
People are interested in measuring the hash performance on non-x86_64
architectures. Add option to do time-based measurements (in nanoseconds)
in case there's no support for clock-based measurements.
$ ./hash-speedtest --cycles
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: cycles
NULL-NOP: cycles: 43035633, cycles/i 430
NULL-MEMCPY: cycles: 72478624, cycles/i 724
CRC32C: cycles: 181712982, cycles/i 1817
XXHASH: cycles: 136251305, cycles/i 1362
SHA256: cycles: 10758567410, cycles/i 107585
BLAKE2b: cycles: 2249704806, cycles/i 22497
$ ./hash-speedtest --time
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: nsecs
NULL-NOP: nsecs: 12459033, nsecs/i 124
NULL-MEMCPY: nsecs: 20687845, nsecs/i 206
CRC32C: nsecs: 52648264, nsecs/i 526
XXHASH: nsecs: 39591766, nsecs/i 395
SHA256: nsecs: 3079668837, nsecs/i 30796
BLAKE2b: nsecs: 644766582, nsecs/i 6447
Signed-off-by: David Sterba <dsterba@suse.com>
2021-05-27 02:22:55 +08:00
#include <time.h>
#include <getopt.h>
btrfs-progs: crypto: add perf support to speed test
Use perf events to read the cycle count; this should work on all
architectures. The mode is enabled by the option --perf and requires the
sysctl kernel.perf_event_paranoid to be 0 or 1.
The results are roughly the same as for raw cycles on x86_64 but worse
because of the additional overhead (read, context switch):
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: CPU cycles
NULL-NOP: cycles: 42719688, cycles/i 427
NULL-MEMCPY: cycles: 72941208, cycles/i 729, 18670.314 MiB/s
CRC32C: cycles: 183709926, cycles/i 1837, 7413.009 MiB/s
XXHASH: cycles: 136727614, cycles/i 1367, 9960.264 MiB/s
SHA256: cycles: 10711594532, cycles/i 107115, 127.137 MiB/s
BLAKE2: cycles: 2256957529, cycles/i 22569, 603.398 MiB/s
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: perf event: CPU cycles
NULL-NOP: perf_c: 29649530, perf_c/i 296
NULL-MEMCPY: perf_c: 59954062, perf_c/i 599, 15137.464 MiB/s
CRC32C: perf_c: 179009071, perf_c/i 1790, 6929.460 MiB/s
XXHASH: perf_c: 136413509, perf_c/i 1364, 9982.950 MiB/s
SHA256: perf_c: 10997356664, perf_c/i 109973, 127.046 MiB/s
BLAKE2: perf_c: 2379077576, perf_c/i 23790, 588.780 MiB/s
Signed-off-by: David Sterba <dsterba@suse.com>
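A rough sketch of the perf-based approach, assuming a Linux system with the kernel perf headers installed: a hardware cycle counter is opened for the calling thread and read back through the file descriptor. The helper names open_cycle_counter and read_cycle_counter are hypothetical; the open call returns -1 when the counter is unavailable, for example when kernel.perf_event_paranoid is too restrictive.

```c
#include <assert.h>
#include <linux/perf_event.h>
#include <string.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* Open a CPU cycle counter for the calling thread; returns fd or -1. */
static int open_cycle_counter(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;

	/* pid 0 and cpu -1: this thread on any CPU; no group fd, no flags */
	return (int)syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Read the current counter value; returns 0 if the read fails. */
static long long read_cycle_counter(int fd)
{
	long long cycles;

	if (read(fd, &cycles, sizeof(cycles)) != (ssize_t)sizeof(cycles))
		return 0;
	return cycles;
}
```

Each reading costs a read() system call, which is the additional overhead the commit message refers to when comparing against raw cycle counts.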
2021-06-02 03:41:53 +08:00
#include <unistd.h>
#if HAVE_LINUX_PERF_EVENT_H == 1 && HAVE_LINUX_HW_BREAKPOINT_H == 1
#include <linux/perf_event.h>
#include <linux/hw_breakpoint.h>
#include <sys/syscall.h>
#define HAVE_PERF
#endif
2019-06-10 20:49:50 +08:00
#include "crypto/hash.h"
#include "crypto/crc32c.h"
2019-10-08 00:23:52 +08:00
#include "crypto/sha.h"
2019-10-08 00:23:52 +08:00
#include "crypto/blake2.h"
2022-09-17 01:29:25 +08:00
#include "common/messages.h"
2023-02-09 09:54:35 +08:00
#include "common/cpu-utils.h"
2019-06-10 20:49:50 +08:00
2021-05-27 02:22:55 +08:00
#ifdef __x86_64__
static const int cycles_supported = 1;
#else
static const int cycles_supported = 0;
2019-06-10 20:49:50 +08:00
#endif
2021-06-02 03:41:53 +08:00
enum {
	UNITS_CYCLES,
	UNITS_TIME,
	UNITS_PERF,
};
2019-06-10 20:49:50 +08:00
const int blocksize = 4096;
int iterations = 100000;
2021-05-27 02:22:55 +08:00
#ifdef __x86_64__
2019-06-10 20:49:50 +08:00
/* Read the x86 time stamp counter: low half in EAX, high half in EDX */
static __always_inline unsigned long long rdtsc(void)
{
	unsigned low, high;

	asm volatile("rdtsc" : "=a" (low), "=d" (high));

	return (low | ((u64)(high) << 32));
}

/* Fence memory operations before sampling the counter */
static inline u64 read_tsc(void)
{
	asm volatile("mfence");
	return rdtsc();
}
2021-06-02 03:41:53 +08:00
#define cpu_cycles()		read_tsc()
2021-05-27 02:22:55 +08:00
#else
2021-06-02 03:41:53 +08:00
#define cpu_cycles()		(0)

#endif

#ifdef HAVE_PERF

static int perf_fd = -1;
static int perf_init(void)
{
	static struct perf_event_attr attr = {
		.type = PERF_TYPE_HARDWARE,
		.config = PERF_COUNT_HW_CPU_CYCLES
	};

	perf_fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	return perf_fd;
}
2021-05-27 02:22:55 +08:00
2021-06-02 03:41:53 +08:00
static void perf_finish(void)
{
	close(perf_fd);
}

static long long perf_cycles(void)
{
	long long cycles;
	int ret;

	ret = read(perf_fd, &cycles, sizeof(cycles));
	if (ret != sizeof(cycles))
		return 0;
	return cycles;
}

#else
static int perf_init(void)
{
	errno = EOPNOTSUPP;
	return -1;
}
static void perf_finish(void) {}
static long long perf_cycles(void)
{
	return 0;
}
2021-05-27 02:22:55 +08:00
#endif

static inline u64 get_time(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	/* Widen before multiplying so that a 32-bit time_t cannot overflow */
	return (u64)ts.tv_sec * 1000 * 1000 * 1000 + ts.tv_nsec;
}
2021-06-02 03:41:53 +08:00
static inline u64 get_cycles(int units)
{
	switch (units) {
	case UNITS_CYCLES: return cpu_cycles();
	case UNITS_TIME: return get_time();
	case UNITS_PERF: return perf_cycles();
	}
	return 0;
}
2019-06-10 20:49:50 +08:00
/* Read the input and copy last bytes as the hash */
static int hash_null_memcpy(const u8 *buf, size_t length, u8 *out)
{
	const u8 *end = buf + length;

	while (buf + CRYPTO_HASH_SIZE_MAX < end) {
		memcpy(out, buf, CRYPTO_HASH_SIZE_MAX);
		buf += CRYPTO_HASH_SIZE_MAX;
	}

	return 0;
}

/* Test overhead of the calls */
static int hash_null_nop(const u8 *buf, size_t length, u8 *out)
{
	memset(out, 0xFF, CRYPTO_HASH_SIZE_MAX);

	return 0;
}
2021-06-02 03:41:53 +08:00
static const char *units_to_desc(int units)
2021-05-27 02:22:55 +08:00
{
	switch (units) {
2021-06-02 03:41:53 +08:00
	case UNITS_CYCLES: return "CPU cycles";
	case UNITS_TIME: return "time: ns";
	case UNITS_PERF: return "perf event: CPU cycles";
	}
	return "unknown";
}

static const char *units_to_str(int units)
{
	switch (units) {
	case UNITS_CYCLES: return "cycles";
	case UNITS_TIME: return "nsecs";
	case UNITS_PERF: return "perf_c";
2021-05-27 02:22:55 +08:00
	}
	return "unknown";
}
2019-06-10 20:49:50 +08:00
int main(int argc, char **argv) {
	u8 buf[blocksize];
	u8 hash[32];
	int idx;
	int iter;
	struct contestant {
		char name[16];
		int (*digest)(const u8 *buf, size_t length, u8 *out);
		int digest_size;
		u64 cycles;
btrfs-progs: crypto: print throughput in hash-speedtest
Calculate the estimated throughput as a number that's comparable across
machines.
$ ./hash-speedtest --cycles
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: cycles
NULL-NOP: cycles: 42928902, cycles/i 429
NULL-MEMCPY: cycles: 73014868, cycles/i 730, 18651.186 MiB/s
CRC32C: cycles: 182293290, cycles/i 1822, 7470.579 MiB/s
XXHASH: cycles: 138085981, cycles/i 1380, 9862.272 MiB/s
SHA256: cycles: 10576270837, cycles/i 105762, 128.764 MiB/s
BLAKE2b: cycles: 2263761293, cycles/i 22637, 601.585 MiB/s
$ ./hash-speedtest --time
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: nsecs
NULL-NOP: nsecs: 12164607, nsecs/i 121
NULL-MEMCPY: nsecs: 20423641, nsecs/i 204, 19095.518 MiB/s
CRC32C: nsecs: 51972794, nsecs/i 519, 7503.926 MiB/s
XXHASH: nsecs: 38935164, nsecs/i 389, 10016.651 MiB/s
SHA256: nsecs: 3030944497, nsecs/i 30309, 128.673 MiB/s
BLAKE2b: nsecs: 648489262, nsecs/i 6484, 601.398 MiB/s
Signed-off-by: David Sterba <dsterba@suse.com>
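The commit message does not spell out how the throughput figure is derived. One plausible calculation, sketched here as an assumption rather than as the tool's actual code, divides the total number of bytes hashed by the elapsed wall-clock time. The function name throughput_mib_s and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimated throughput in MiB/s for `iterations` blocks of `blocksize`
 * bytes processed in `elapsed_ns` nanoseconds (assumed formula).
 */
static double throughput_mib_s(uint64_t blocksize, uint64_t iterations,
			       uint64_t elapsed_ns)
{
	double mib, seconds;

	assert(elapsed_ns > 0);
	mib = (double)(blocksize * iterations) / (1024.0 * 1024.0);
	seconds = (double)elapsed_ns / 1e9;

	return mib / seconds;
}
```

Because the result depends only on bytes and elapsed time, it can be compared across machines even when the raw cycle counts cannot.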
2021-05-27 05:09:52 +08:00
		u64 time;
2023-02-09 09:54:35 +08:00
		unsigned long cpu_flag;
2023-03-01 22:27:52 +08:00
		int backend;
2019-06-10 20:49:50 +08:00
	} contestants[] = {
		{ .name = "NULL-NOP", .digest = hash_null_nop, .digest_size = 32 },
		{ .name = "NULL-MEMCPY", .digest = hash_null_memcpy, .digest_size = 32 },
btrfs-progs: hash-speedtest: select implementation by features
Now put all the recent changes into action. Add a callback that will
reinitialize the implementation pointers according to the desired
feature. Reference implementations use the NONE CPU flag to distinguish
them from the rest.
Example results:
$ hash-speedtest
CPU flags: 0xff
CPU features: SSE2 SSSE3 SSE41 SSE42 SHA AVX AVX2
Block size: 4096
Iterations: 1000000
Implementation: builtin
Units: CPU cycles
NULL-NOP: cycles: 67129026, cycles/i 67
NULL-MEMCPY: cycles: 231303654, cycles/i 231, 60792.500 MiB/s
CRC32C-ref: cycles: 23982698042, cycles/i 23982, 586.322 MiB/s
CRC32C-NI: cycles: 1168017624, cycles/i 1168, 12038.828 MiB/s
XXHASH: cycles: 838434468, cycles/i 838, 16771.152 MiB/s
SHA256-ref: cycles: 68296865380, cycles/i 68296, 205.889 MiB/s
SHA256-NI: cycles: 29748853920, cycles/i 29748, 472.676 MiB/s
BLAKE2-ref: cycles: 14532177414, cycles/i 14532, 967.617 MiB/s
BLAKE2-SSE2: cycles: 17762215810, cycles/i 17762, 791.657 MiB/s
BLAKE2-SSE41: cycles: 12370044656, cycles/i 12370, 1136.744 MiB/s
BLAKE2-AVX2: cycles: 9472823338, cycles/i 9472, 1484.412 MiB/s
Previously:
Block size: 4096
Iterations: 1000000
Implementation: builtin
Units: CPU cycles
NULL-NOP: cycles: 67714016, cycles/i 67
NULL-MEMCPY: cycles: 234140818, cycles/i 234, 60055.762 MiB/s
CRC32C: cycles: 1187358432, cycles/i 1187, 11842.733 MiB/s
XXHASH: cycles: 1897530684, cycles/i 1897, 7410.448 MiB/s
SHA256: cycles: 69855340702, cycles/i 69855, 201.296 MiB/s
BLAKE2: cycles: 14713130972, cycles/i 14713, 955.716 MiB/s
The CPU is i7-11700 3.60GHz and not the same as previous results
mentioned in changelogs so the results are incomparable. Otherwise, the
updated xxhash implementation is twice as fast, no significant changes
for the rest.
Signed-off-by: David Sterba <dsterba@suse.com>
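The idea of a callback that re-points implementation pointers according to a requested feature can be sketched as follows. This is an illustrative toy, not the btrfs-progs code: the flag names, the checksum functions and demo_select_impl are all hypothetical, and a byte sum stands in for a real hash.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, standing in for real CPU flags */
enum {
	DEMO_FLAG_NONE = 0,
	DEMO_FLAG_FAST = 1 << 0,
};

typedef uint32_t (*demo_sum_fn)(const uint8_t *buf, size_t len);

/* Reference implementation: one byte per step */
static uint32_t demo_sum_ref(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i;

	for (i = 0; i < len; i++)
		sum += buf[i];
	return sum;
}

/* Stand-in for an accelerated variant: four bytes per step, then the tail */
static uint32_t demo_sum_fast(const uint8_t *buf, size_t len)
{
	uint32_t sum = 0;
	size_t i = 0;

	for (; i + 4 <= len; i += 4)
		sum += buf[i] + buf[i + 1] + buf[i + 2] + buf[i + 3];
	for (; i < len; i++)
		sum += buf[i];
	return sum;
}

/* Pointer used by callers; defaults to the reference implementation */
static demo_sum_fn demo_sum_impl = demo_sum_ref;

/* Re-point the implementation according to the requested feature */
static void demo_select_impl(unsigned long cpu_flag)
{
	assert((cpu_flag & ~(unsigned long)DEMO_FLAG_FAST) == 0);
	demo_sum_impl = (cpu_flag & DEMO_FLAG_FAST) ? demo_sum_fast : demo_sum_ref;
}
```

A benchmark loop would call demo_select_impl() once per table entry before timing it, so the same digest wrapper can exercise every variant, with the NONE flag selecting the reference code.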
2023-02-16 11:41:27 +08:00
		{ .name = "CRC32C-ref", .digest = hash_crc32c, .digest_size = 4,
2023-03-01 08:32:26 +08:00
		  .cpu_flag = CPU_FLAG_NONE },
2023-02-16 11:41:27 +08:00
		{ .name = "CRC32C-NI", .digest = hash_crc32c, .digest_size = 4,
2023-09-13 05:32:38 +08:00
		  .cpu_flag = CPU_FLAG_PCLMUL },
2019-06-10 20:49:50 +08:00
		{ .name = "XXHASH", .digest = hash_xxhash, .digest_size = 8 },
2023-02-16 11:41:27 +08:00
		{ .name = "SHA256-ref", .digest = hash_sha256, .digest_size = 32,
2023-03-01 22:27:52 +08:00
		  .cpu_flag = CPU_FLAG_NONE, .backend = CRYPTOPROVIDER_BUILTIN + 1 },
		{ .name = "SHA256-gcrypt", .digest = hash_sha256, .digest_size = 32,
		  .cpu_flag = CPU_FLAG_NONE, .backend = CRYPTOPROVIDER_LIBGCRYPT + 1 },
		{ .name = "SHA256-sodium", .digest = hash_sha256, .digest_size = 32,
		  .cpu_flag = CPU_FLAG_NONE, .backend = CRYPTOPROVIDER_LIBSODIUM + 1 },
		{ .name = "SHA256-kcapi", .digest = hash_sha256, .digest_size = 32,
		  .cpu_flag = CPU_FLAG_NONE, .backend = CRYPTOPROVIDER_LIBKCAPI + 1 },
2023-11-16 02:02:15 +08:00
		{ .name = "SHA256-botan", .digest = hash_sha256, .digest_size = 32,
		  .cpu_flag = CPU_FLAG_NONE, .backend = CRYPTOPROVIDER_BOTAN + 1 },
2023-11-16 21:54:03 +08:00
		{ .name = "SHA256-openssl", .digest = hash_sha256, .digest_size = 32,
		  .cpu_flag = CPU_FLAG_NONE, .backend = CRYPTOPROVIDER_OPENSSL + 1 },
2023-02-09 21:58:08 +08:00
|
|
|
{ .name = "SHA256-NI", .digest = hash_sha256, .digest_size = 32,
|
2023-03-01 22:27:52 +08:00
|
|
|
.cpu_flag = CPU_FLAG_SHA, .backend = CRYPTOPROVIDER_BUILTIN + 1 },
|
btrfs-progs: hash-speedtest: select implementation by features
Now put all the recent changes into action. Add a callback that will
reinitialize the implementation pointers according to the desired
feature. Reference implementations use the NONE CPU flag to distinguish
them from the rest.
Example results:
$ hash-speedtest
CPU flags: 0xff
CPU features: SSE2 SSSE3 SSE41 SSE42 SHA AVX AVX2
Block size: 4096
Iterations: 1000000
Implementation: builtin
Units: CPU cycles
NULL-NOP: cycles: 67129026, cycles/i 67
NULL-MEMCPY: cycles: 231303654, cycles/i 231, 60792.500 MiB/s
CRC32C-ref: cycles: 23982698042, cycles/i 23982, 586.322 MiB/s
CRC32C-NI: cycles: 1168017624, cycles/i 1168, 12038.828 MiB/s
XXHASH: cycles: 838434468, cycles/i 838, 16771.152 MiB/s
SHA256-ref: cycles: 68296865380, cycles/i 68296, 205.889 MiB/s
SHA256-NI: cycles: 29748853920, cycles/i 29748, 472.676 MiB/s
BLAKE2-ref: cycles: 14532177414, cycles/i 14532, 967.617 MiB/s
BLAKE2-SSE2: cycles: 17762215810, cycles/i 17762, 791.657 MiB/s
BLAKE2-SSE41: cycles: 12370044656, cycles/i 12370, 1136.744 MiB/s
BLAKE2-AVX2: cycles: 9472823338, cycles/i 9472, 1484.412 MiB/s
Previously:
Block size: 4096
Iterations: 1000000
Implementation: builtin
Units: CPU cycles
NULL-NOP: cycles: 67714016, cycles/i 67
NULL-MEMCPY: cycles: 234140818, cycles/i 234, 60055.762 MiB/s
CRC32C: cycles: 1187358432, cycles/i 1187, 11842.733 MiB/s
XXHASH: cycles: 1897530684, cycles/i 1897, 7410.448 MiB/s
SHA256: cycles: 69855340702, cycles/i 69855, 201.296 MiB/s
BLAKE2: cycles: 14713130972, cycles/i 14713, 955.716 MiB/s
The CPU is an i7-11700 3.60GHz, not the one used for previous results
mentioned in changelogs, so those results are not comparable. Otherwise, the
updated xxhash implementation is twice as fast, with no significant changes
for the rest.
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-16 11:41:27 +08:00
|
|
|
{ .name = "BLAKE2-ref", .digest = hash_blake2b, .digest_size = 32,
|
2023-03-01 22:27:52 +08:00
|
|
|
.cpu_flag = CPU_FLAG_NONE, .backend = CRYPTOPROVIDER_BUILTIN + 1 },
|
|
|
|
{ .name = "BLAKE2-gcrypt", .digest = hash_blake2b, .digest_size = 32,
|
|
|
|
.cpu_flag = CPU_FLAG_NONE, .backend = CRYPTOPROVIDER_LIBGCRYPT + 1 },
|
|
|
|
{ .name = "BLAKE2-sodium", .digest = hash_blake2b, .digest_size = 32,
|
|
|
|
.cpu_flag = CPU_FLAG_NONE, .backend = CRYPTOPROVIDER_LIBSODIUM + 1 },
|
|
|
|
{ .name = "BLAKE2-kcapi", .digest = hash_blake2b, .digest_size = 32,
|
|
|
|
.cpu_flag = CPU_FLAG_NONE, .backend = CRYPTOPROVIDER_LIBKCAPI + 1 },
|
2023-11-16 02:02:15 +08:00
|
|
|
{ .name = "BLAKE2-botan", .digest = hash_blake2b, .digest_size = 32,
|
|
|
|
.cpu_flag = CPU_FLAG_NONE, .backend = CRYPTOPROVIDER_BOTAN + 1 },
|
2023-11-16 21:54:03 +08:00
|
|
|
{ .name = "BLAKE2-openssl", .digest = hash_blake2b, .digest_size = 32,
|
|
|
|
.cpu_flag = CPU_FLAG_NONE, .backend = CRYPTOPROVIDER_OPENSSL + 1 },
|
2023-02-09 09:54:35 +08:00
|
|
|
{ .name = "BLAKE2-SSE2", .digest = hash_blake2b, .digest_size = 32,
|
2023-03-01 22:27:52 +08:00
|
|
|
.cpu_flag = CPU_FLAG_SSE2, .backend = CRYPTOPROVIDER_BUILTIN + 1 },
|
2023-02-09 09:54:35 +08:00
|
|
|
{ .name = "BLAKE2-SSE41", .digest = hash_blake2b, .digest_size = 32,
|
2023-03-01 22:27:52 +08:00
|
|
|
.cpu_flag = CPU_FLAG_SSE41, .backend = CRYPTOPROVIDER_BUILTIN + 1 },
|
2023-02-09 09:54:35 +08:00
|
|
|
{ .name = "BLAKE2-AVX2", .digest = hash_blake2b, .digest_size = 32,
|
2023-03-01 22:27:52 +08:00
|
|
|
.cpu_flag = CPU_FLAG_AVX2, .backend = CRYPTOPROVIDER_BUILTIN + 1 },
|
2019-06-10 20:49:50 +08:00
|
|
|
};
|
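The table above pairs each implementation with a CPU feature flag and a backend value of the form `CRYPTOPROVIDER_* + 1`. The following is a minimal sketch of how such a table could be filtered before benchmarking. All `sk_`/`SK_` names, the flag values and the two provider macros are hypothetical stand-ins for illustration, not the definitions used by btrfs-progs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical feature bits; the real CPU_FLAG_* values are not shown here. */
enum {
	SK_FLAG_NONE = 0,
	SK_FLAG_SSE2 = 1 << 0,
	SK_FLAG_AVX2 = 1 << 1,
};

/* Hypothetical build-time switches: 1 if the provider is compiled in. */
#define SK_PROVIDER_BUILTIN 1
#define SK_PROVIDER_OTHER   0

struct sk_contestant {
	const char *name;
	unsigned int cpu_flag;
	int backend;	/* provider macro + 1, so 1 means "not compiled in" */
};

static const struct sk_contestant sk_table[] = {
	{ "HASH-ref",   SK_FLAG_NONE, SK_PROVIDER_BUILTIN + 1 },
	{ "HASH-other", SK_FLAG_NONE, SK_PROVIDER_OTHER + 1 },
	{ "HASH-SSE2",  SK_FLAG_SSE2, SK_PROVIDER_BUILTIN + 1 },
	{ "HASH-AVX2",  SK_FLAG_AVX2, SK_PROVIDER_BUILTIN + 1 },
};

#define SK_TABLE_SIZE (sizeof(sk_table) / sizeof(sk_table[0]))

/* Return 1 if the entry should be benchmarked on a CPU with these flags. */
static int sk_should_run(const struct sk_contestant *c, unsigned int cpu_flags)
{
	assert(c != NULL);
	/* Accelerated variant, but the CPU lacks the feature. */
	if (c->cpu_flag != SK_FLAG_NONE && !(cpu_flags & c->cpu_flag))
		return 0;
	/* Provider macro was 0 at build time. */
	if (c->backend == 1)
		return 0;
	return 1;
}

/* Count the runnable entries and report the skipped ones. */
static size_t sk_count_runnable(unsigned int cpu_flags)
{
	size_t n = 0;

	for (size_t i = 0; i < SK_TABLE_SIZE; i++) {
		if (sk_should_run(&sk_table[i], cpu_flags))
			n++;
		else
			printf("%14s: skipped\n", sk_table[i].name);
	}
	return n;
}
```

Storing the provider macro plus one keeps zero free as an "unset" value in the initializer, which is presumably why the table uses that form.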
btrfs-progs: crypto: add perf support to speed test
Use perf events to read the cycle count; this should work on all
architectures. It is enabled by the option --perf, and the sysctl
kernel.perf_event_paranoid must be 0 or 1.
The results are roughly the same as for raw cycles on x86_64 but worse
because of the additional overhead (read, context switch):
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: CPU cycles
NULL-NOP: cycles: 42719688, cycles/i 427
NULL-MEMCPY: cycles: 72941208, cycles/i 729, 18670.314 MiB/s
CRC32C: cycles: 183709926, cycles/i 1837, 7413.009 MiB/s
XXHASH: cycles: 136727614, cycles/i 1367, 9960.264 MiB/s
SHA256: cycles: 10711594532, cycles/i 107115, 127.137 MiB/s
BLAKE2: cycles: 2256957529, cycles/i 22569, 603.398 MiB/s
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: perf event: CPU cycles
NULL-NOP: perf_c: 29649530, perf_c/i 296
NULL-MEMCPY: perf_c: 59954062, perf_c/i 599, 15137.464 MiB/s
CRC32C: perf_c: 179009071, perf_c/i 1790, 6929.460 MiB/s
XXHASH: perf_c: 136413509, perf_c/i 1364, 9982.950 MiB/s
SHA256: perf_c: 10997356664, perf_c/i 109973, 127.046 MiB/s
BLAKE2: perf_c: 2379077576, perf_c/i 23790, 588.780 MiB/s
Signed-off-by: David Sterba <dsterba@suse.com>
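The commit message above describes reading the cycle count through perf events. The following is a rough, Linux-only sketch of one way to do that with the `perf_event_open` system call. The `sk_perf_*` helpers and the chosen attribute settings (user-space only, counter initially disabled) are assumptions made for illustration; the `perf_init()` used by the tool itself is not shown in this section and may be set up differently.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a CPU-cycle counter for the calling thread; returns fd or -1. */
static int sk_perf_open_cycles(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.type = PERF_TYPE_HARDWARE;
	attr.size = sizeof(attr);
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.disabled = 1;		/* start counting only when enabled */
	attr.exclude_kernel = 1;	/* assumption: count user space only */
	attr.exclude_hv = 1;

	/* pid 0 = calling thread, cpu -1 = any CPU, no group, no flags */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Reset the counter to zero and start counting. */
static int sk_perf_start(int fd)
{
	if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) == -1)
		return -1;
	return ioctl(fd, PERF_EVENT_IOC_ENABLE, 0);
}

/* Stop counting and read the accumulated cycle count. */
static int sk_perf_stop(int fd, uint64_t *cycles)
{
	assert(cycles != NULL);
	if (ioctl(fd, PERF_EVENT_IOC_DISABLE, 0) == -1)
		return -1;
	if (read(fd, cycles, sizeof(*cycles)) != (ssize_t)sizeof(*cycles))
		return -1;
	return 0;
}
```

A caller would open the counter once, wrap each measured loop in `sk_perf_start()` and `sk_perf_stop()`, and fall back to time-based measurement when the open call returns -1, for example because of the sysctl setting or missing hardware counters in a virtual machine.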
2021-06-02 03:41:53 +08:00
|
|
|
int units = UNITS_CYCLES;
|
2021-05-27 02:22:55 +08:00
|
|
|
|
2023-02-09 09:54:35 +08:00
|
|
|
cpu_detect_flags();
|
|
|
|
cpu_print_flags();
|
2023-02-16 10:30:46 +08:00
|
|
|
hash_init_accel();
|
2023-02-09 09:54:35 +08:00
|
|
|
|
2021-05-27 02:22:55 +08:00
|
|
|
optind = 0;
|
|
|
|
while (1) {
|
|
|
|
static const struct option long_options[] = {
|
|
|
|
{ "cycles", no_argument, NULL, 'c' },
|
|
|
|
{ "time", no_argument, NULL, 't' },
|
2021-06-02 03:41:53 +08:00
|
|
|
{ "perf", no_argument, NULL, 'p' },
|
2021-05-27 02:22:55 +08:00
|
|
|
{ NULL, 0, NULL, 0}
|
|
|
|
};
|
|
|
|
int c;
|
|
|
|
|
2021-06-02 03:41:53 +08:00
|
|
|
c = getopt_long(argc, argv, "ctp", long_options, NULL);
|
2021-05-27 02:22:55 +08:00
|
|
|
if (c < 0)
|
|
|
|
break;
|
|
|
|
switch (c) {
|
|
|
|
case 'c':
|
|
|
|
if (!cycles_supported) {
|
2022-09-17 01:29:25 +08:00
|
|
|
error("cannot measure cycles on this arch, use --time");
|
2021-05-27 02:22:55 +08:00
|
|
|
return 1;
|
|
|
|
}
|
2021-06-02 03:41:53 +08:00
|
|
|
units = UNITS_CYCLES;
|
2021-05-27 02:22:55 +08:00
|
|
|
break;
|
|
|
|
case 't':
|
2021-06-02 03:41:53 +08:00
|
|
|
units = UNITS_TIME;
|
|
|
|
break;
|
|
|
|
case 'p':
|
|
|
|
if (perf_init() == -1) {
|
2022-09-17 01:29:25 +08:00
|
|
|
error(
|
|
|
|
"cannot initialize perf, please check sysctl kernel.perf_event_paranoid: %m");
|
2021-06-02 03:41:53 +08:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
units = UNITS_PERF;
|
2021-05-27 02:22:55 +08:00
|
|
|
break;
|
|
|
|
default:
|
2022-09-17 01:29:25 +08:00
|
|
|
error("unknown option");
|
2021-05-27 02:22:55 +08:00
|
|
|
return 1;
|
|
|
|
}
|
|
|
|
}
|
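The option loop above maps `--cycles`, `--time` and `--perf` to a unit selector. The following is a minimal, self-contained sketch of the same `getopt_long` pattern. The `sk_units` enum and `sk_parse_units()` are hypothetical names, and the sketch leaves out the architecture and perf availability checks that the real loop performs.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <getopt.h>
#include <stddef.h>

/* Hypothetical stand-ins for the UNITS_* values used by the tool. */
enum sk_units {
	SK_UNITS_CYCLES,
	SK_UNITS_TIME,
	SK_UNITS_PERF,
	SK_UNITS_INVALID,
};

/* Parse the unit-selection options; the last one given wins. */
static enum sk_units sk_parse_units(int argc, char **argv)
{
	static const struct option long_options[] = {
		{ "cycles", no_argument, NULL, 'c' },
		{ "time",   no_argument, NULL, 't' },
		{ "perf",   no_argument, NULL, 'p' },
		{ NULL, 0, NULL, 0 }
	};
	enum sk_units units = SK_UNITS_CYCLES;

	assert(argc >= 1 && argv != NULL);
	/* glibc documents optind = 0 as forcing a full rescan. */
	optind = 0;
	for (;;) {
		int c = getopt_long(argc, argv, "ctp", long_options, NULL);

		if (c < 0)
			break;
		switch (c) {
		case 'c':
			units = SK_UNITS_CYCLES;
			break;
		case 't':
			units = SK_UNITS_TIME;
			break;
		case 'p':
			units = SK_UNITS_PERF;
			break;
		default:
			/* getopt_long has already printed a diagnostic. */
			return SK_UNITS_INVALID;
		}
	}
	return units;
}
```

After the loop, `optind` indexes the first non-option argument, which is where the tool above looks for the optional iteration count.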
2019-06-10 20:49:50 +08:00
|
|
|
|
2021-05-27 02:22:55 +08:00
|
|
|
if (argc - optind >= 1) {
|
|
|
|
iterations = atoi(argv[optind]);
|
2019-06-10 20:49:50 +08:00
|
|
|
if (iterations <= 0)
|
|
|
|
iterations = 1;
|
|
|
|
}
|
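The code above reads the iteration count with `atoi()`, which cannot distinguish a malformed argument from the value 0. As a possible alternative (a swapped-in technique, not what the tool does), the sketch below uses `strtol()` and rejects anything that is not a positive decimal integer. `sk_parse_iterations()` and its `fallback` parameter are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

/*
 * Parse a positive iteration count from a command-line argument.
 * Returns fallback for NULL, empty, malformed, non-positive or
 * out-of-range input.
 */
static int sk_parse_iterations(const char *arg, int fallback)
{
	char *end = NULL;
	long val;

	assert(fallback > 0);
	if (arg == NULL || *arg == '\0')
		return fallback;

	errno = 0;
	val = strtol(arg, &end, 10);
	/* Reject overflow, trailing garbage and counts that are not positive. */
	if (errno != 0 || *end != '\0' || val <= 0 || val > INT_MAX)
		return fallback;
	return (int)val;
}
```

Rejecting zero matters whenever a per-iteration average is derived by dividing by the count.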
|
|
|
|
|
|
|
memset(buf, 0, 4096);
|
|
|
|
|
2020-12-26 03:15:11 +08:00
|
|
|
printf("Block size: %d\n", blocksize);
|
|
|
|
printf("Iterations: %d\n", iterations);
|
|
|
|
printf("Implementation: %s\n", CRYPTOPROVIDER);
|
2021-06-02 03:41:53 +08:00
|
|
|
printf("Units: %s\n", units_to_desc(units));
|
2019-06-10 20:49:50 +08:00
|
|
|
printf("\n");
|
|
|
|
|
|
|
|
for (idx = 0; idx < ARRAY_SIZE(contestants); idx++) {
|
|
|
|
struct contestant *c = &contestants[idx];
|
|
|
|
u64 start, end;
|
btrfs-progs: crypto: print throughput in hash-speedtest
Calculate the estimated throughput as a number that's comparable across
machines.
$ ./hash-speedtest --cycles
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: cycles
NULL-NOP: cycles: 42928902, cycles/i 429
NULL-MEMCPY: cycles: 73014868, cycles/i 730, 18651.186 MiB/s
CRC32C: cycles: 182293290, cycles/i 1822, 7470.579 MiB/s
XXHASH: cycles: 138085981, cycles/i 1380, 9862.272 MiB/s
SHA256: cycles: 10576270837, cycles/i 105762, 128.764 MiB/s
BLAKE2b: cycles: 2263761293, cycles/i 22637, 601.585 MiB/s
$ ./hash-speedtest --time
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: nsecs
NULL-NOP: nsecs: 12164607, nsecs/i 121
NULL-MEMCPY: nsecs: 20423641, nsecs/i 204, 19095.518 MiB/s
CRC32C: nsecs: 51972794, nsecs/i 519, 7503.926 MiB/s
XXHASH: nsecs: 38935164, nsecs/i 389, 10016.651 MiB/s
SHA256: nsecs: 3030944497, nsecs/i 30309, 128.673 MiB/s
BLAKE2b: nsecs: 648489262, nsecs/i 6484, 601.398 MiB/s
Signed-off-by: David Sterba <dsterba@suse.com>
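The throughput figure described above can be derived from the number of bytes hashed and the elapsed wall-clock time. The following is a minimal sketch under that assumption. `sk_now_ns()` and `sk_throughput_mib_s()` are hypothetical helpers, and the tool's own `get_time()` is not shown in this section, so it may use a different clock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t sk_now_ns(void)
{
	struct timespec ts;
	int ret = clock_gettime(CLOCK_MONOTONIC, &ts);

	assert(ret == 0);
	(void)ret;
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

/* Throughput in MiB/s for `iterations` blocks of `blocksize` bytes. */
static double sk_throughput_mib_s(uint64_t iterations, uint64_t blocksize,
				  uint64_t elapsed_ns)
{
	double mib;
	double secs;

	assert(elapsed_ns > 0);
	mib = (double)iterations * (double)blocksize / (1024.0 * 1024.0);
	secs = (double)elapsed_ns / 1e9;
	return mib / secs;
}
```

A caller would take `sk_now_ns()` before and after the hashing loop and pass the difference as `elapsed_ns`. Because the result depends only on bytes and seconds, it stays comparable between runs that report cycles and runs that report nanoseconds.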
2021-05-27 05:09:52 +08:00
|
|
|
u64 tstart, tend;
|
2021-06-02 03:41:53 +08:00
|
|
|
u64 total = 0;
|
2019-06-10 20:49:50 +08:00
|
|
|
|
2023-02-09 09:54:35 +08:00
|
|
|
if (c->cpu_flag != 0 && !cpu_has_feature(c->cpu_flag)) {
|
btrfs-progs: crypto: add PCL based implementation for crc32c
Copy a faster implementation of crc32c from the Linux kernel as of 6.5-rc7
(x86_64, arch/x86/crypto/crc32c-pcl-intel-asm_64.S). This needs
assembler build support, so detect the target architecture to keep
cross-compilation working.
Add a special CPU flag so the old and new implementations can be
benchmarked and verified separately.
Sample benchmark:
CPU flags: 0x1ff
CPU features: SSE2 SSSE3 SSE41 SSE42 SHA AVX AVX2 CRC32C_PCL
Block size: 4096
Iterations: 1000000
Implementation: builtin
Units: CPU cycles
NULL-NOP: cycles: 77177218, cycles/i 77
NULL-MEMCPY: cycles: 226313072, cycles/i 226, 62133.395 MiB/s
CRC32C-ref: cycles: 24418596066, cycles/i 24418, 575.859 MiB/s
CRC32C-NI: cycles: 1188335920, cycles/i 1188, 11833.073 MiB/s
CRC32C-PCL: cycles: 463193456, cycles/i 463, 30358.037 MiB/s
XXHASH: cycles: 851606646, cycles/i 851, 16511.916 MiB/s
SHA256-ref: cycles: 74476234956, cycles/i 74476, 188.808 MiB/s
SHA256-NI: cycles: 34198637428, cycles/i 34198, 411.177 MiB/s
BLAKE2-ref: cycles: 14761411664, cycles/i 14761, 952.597 MiB/s
BLAKE2-SSE2: cycles: 18101896796, cycles/i 18101, 776.807 MiB/s
BLAKE2-SSE41: cycles: 12599091062, cycles/i 12599, 1116.087 MiB/s
BLAKE2-AVX2: cycles: 9668247506, cycles/i 9668, 1454.418 MiB/s
The new implementation is about 2.5x faster.
Note: the new version does not work on musl because of linkage
problems (relocations in .rodata), so it's still using the old
implementation.
Signed-off-by: David Sterba <dsterba@suse.com>
2023-02-22 05:18:13 +08:00
|
|
|
printf("%14s: no CPU support\n", c->name);
|
2023-02-09 09:54:35 +08:00
|
|
|
continue;
|
|
|
|
}
|
2023-03-01 22:27:52 +08:00
|
|
|
/* Backend not compiled in: .backend holds CRYPTOPROVIDER_* + 1, so 1 means the provider macro is 0 */
|
|
|
|
if (c->backend == 1)
|
|
|
|
continue;
|
|
|
|
printf("%14s: ", c->name);
|
2019-06-10 20:49:50 +08:00
|
|
|
fflush(stdout);
|
|
|
|
|
2023-02-16 11:41:27 +08:00
|
|
|
if (c->cpu_flag) {
|
|
|
|
cpu_set_level(c->cpu_flag);
|
2023-03-01 08:32:26 +08:00
|
|
|
hash_init_accel();
|
2023-02-16 11:41:27 +08:00
|
|
|
}
|
2021-05-27 05:09:52 +08:00
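As a rough illustration of how the MiB/s column follows from the measured time, the sketch below divides the whole MiB processed by the elapsed seconds. The function name throughput_mib_per_sec is an assumption made for this example. The integer division for the MiB count is chosen because it reproduces the figures printed in the --time output above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Estimated throughput in MiB/s for `iterations` blocks of `blocksize`
 * bytes processed in `nsecs` nanoseconds. The MiB count is truncated to a
 * whole number by integer division before dividing by the elapsed seconds.
 */
static double throughput_mib_per_sec(uint64_t blocksize, uint64_t iterations,
				     uint64_t nsecs)
{
	uint64_t mib;
	double secs;

	assert(nsecs > 0);
	mib = blocksize * iterations / 1024 / 1024;
	secs = (double)nsecs / 1000 / 1000 / 1000;
	return (double)mib / secs;
}
```

With the NULL-MEMCPY row of the --time run (4096-byte blocks, 100000 iterations, 20423641 nsecs) this gives 390 MiB over about 0.0204 s, which is roughly 19095.5 MiB/s.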
|
|
|
tstart = get_time();
|
btrfs-progs: crypto: add perf support to speed test
Use perf events to read the cycle count; this should work on all
architectures. It is enabled by the option --perf, and the sysctl
kernel.perf_event_paranoid must be 0 or 1.
The results are roughly the same as for raw cycles on x86_64 but worse
because of the additional overhead (read, context switch):
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: CPU cycles
NULL-NOP: cycles: 42719688, cycles/i 427
NULL-MEMCPY: cycles: 72941208, cycles/i 729, 18670.314 MiB/s
CRC32C: cycles: 183709926, cycles/i 1837, 7413.009 MiB/s
XXHASH: cycles: 136727614, cycles/i 1367, 9960.264 MiB/s
SHA256: cycles: 10711594532, cycles/i 107115, 127.137 MiB/s
BLAKE2: cycles: 2256957529, cycles/i 22569, 603.398 MiB/s
Block size: 4096
Iterations: 100000
Implementation: builtin
Units: perf event: CPU cycles
NULL-NOP: perf_c: 29649530, perf_c/i 296
NULL-MEMCPY: perf_c: 59954062, perf_c/i 599, 15137.464 MiB/s
CRC32C: perf_c: 179009071, perf_c/i 1790, 6929.460 MiB/s
XXHASH: perf_c: 136413509, perf_c/i 1364, 9982.950 MiB/s
SHA256: perf_c: 10997356664, perf_c/i 109973, 127.046 MiB/s
BLAKE2: perf_c: 2379077576, perf_c/i 23790, 588.780 MiB/s
Signed-off-by: David Sterba <dsterba@suse.com>
2021-06-02 03:41:53 +08:00
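For readers unfamiliar with the perf interface, a minimal Linux-only sketch of reading a cycle count through perf_event_open(2) follows. The helper names cycles_open, cycles_measure and spin are assumptions made for this example and are not taken from the tool. The sketch returns -1 when the counter cannot be opened, for instance when kernel.perf_event_paranoid is too restrictive or the hardware event is unavailable.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <linux/perf_event.h>
#include <stdint.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a hardware cycle counter for the calling thread; fd or -1. */
static int cycles_open(void)
{
	struct perf_event_attr attr;

	memset(&attr, 0, sizeof(attr));
	attr.size = sizeof(attr);
	attr.type = PERF_TYPE_HARDWARE;
	attr.config = PERF_COUNT_HW_CPU_CYCLES;
	attr.disabled = 1;	/* start stopped, enable explicitly */

	/* pid 0: calling thread, cpu -1: any CPU, no group, no flags */
	return (int)syscall(SYS_perf_event_open, &attr, 0, -1, -1, 0UL);
}

/* Example workload for the sketch: a short busy loop. */
static void spin(void)
{
	volatile uint64_t x = 0;

	for (int i = 0; i < 100000; i++)
		x += (uint64_t)i;
}

/* Count cycles spent in fn(); returns 0 on success, -1 on any failure. */
static int cycles_measure(void (*fn)(void), uint64_t *cycles)
{
	int fd;
	int ret = -1;

	assert(cycles != NULL);
	fd = cycles_open();
	if (fd < 0)
		return -1;

	if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) == 0 &&
	    ioctl(fd, PERF_EVENT_IOC_ENABLE, 0) == 0) {
		fn();
		ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
		/* With the default read_format, read() yields one u64 count. */
		if (read(fd, cycles, sizeof(*cycles)) == (ssize_t)sizeof(*cycles))
			ret = 0;
	}
	close(fd);
	return ret;
}
```

Each read() is a system call, which is one source of the extra overhead mentioned in the commit message.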
|
|
|
start = get_cycles(units);
|
2019-06-10 20:49:50 +08:00
|
|
|
for (iter = 0; iter < iterations; iter++) {
|
|
|
|
memset(buf, iter & 0xFF, blocksize);
|
|
|
|
memset(hash, 0, 32);
|
|
|
|
c->digest(buf, blocksize, hash);
|
|
|
|
}
|
btrfs-progs: crypto: add perf support to speed test
2021-06-02 03:41:53 +08:00
|
|
|
end = get_cycles(units);
|
btrfs-progs: crypto: print throughput in hash-speedtest
2021-05-27 05:09:52 +08:00
|
|
|
tend = get_time();
|
2019-06-10 20:49:50 +08:00
|
|
|
c->cycles = end - start;
|
btrfs-progs: crypto: print throughput in hash-speedtest
2021-05-27 05:09:52 +08:00
|
|
|
c->time = tend - tstart;
|
2023-02-09 09:54:35 +08:00
|
|
|
cpu_reset_level();
|
btrfs-progs: crypto: print throughput in hash-speedtest
2021-05-27 05:09:52 +08:00
|
|
|
|
btrfs-progs: crypto: add perf support to speed test
2021-06-02 03:41:53 +08:00
|
|
|
if (units == UNITS_CYCLES || units == UNITS_PERF)
|
btrfs-progs: crypto: print throughput in hash-speedtest
2021-05-27 05:09:52 +08:00
|
|
|
total = c->cycles;
|
btrfs-progs: crypto: add perf support to speed test
2021-06-02 03:41:53 +08:00
|
|
|
if (units == UNITS_TIME)
|
btrfs-progs: crypto: print throughput in hash-speedtest
2021-05-27 05:09:52 +08:00
|
|
|
total = c->time;
|
|
|
|
|
2021-05-27 16:58:38 +08:00
|
|
|
printf("%s: %12llu, %s/i %8llu",
|
btrfs-progs: crypto: print throughput in hash-speedtest
2021-05-27 05:09:52 +08:00
|
|
|
units_to_str(units), total,
|
|
|
|
units_to_str(units), total / iterations);
|
|
|
|
if (idx > 0) {
|
|
|
|
float t;
|
|
|
|
float mb;
|
|
|
|
|
|
|
|
t = (float)c->time / 1000 / 1000 / 1000;
|
|
|
|
mb = blocksize * iterations / 1024 / 1024;
|
2021-05-27 16:58:38 +08:00
|
|
|
printf(", %12.3f MiB/s", mb / t);
|
btrfs-progs: crypto: print throughput in hash-speedtest
2021-05-27 05:09:52 +08:00
|
|
|
}
|
|
|
|
putchar('\n');
|
2019-06-10 20:49:50 +08:00
|
|
|
}
|
btrfs-progs: crypto: add perf support to speed test
2021-06-02 03:41:53 +08:00
|
|
|
perf_finish();
|
2019-06-10 20:49:50 +08:00
|
|
|
|
|
|
|
return 0;
|
|
|
|
}
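The measurement loop above (fill the buffer, clear the hash, call the digest, take timestamps around the whole run) can be tried in isolation with the sketch below. It is a simplified stand-in and not the tool's code: now_nsecs, digest_memcpy and time_digest are names assumed for this example, and the timestamp comes from clock_gettime(CLOCK_MONOTONIC), which may differ from what get_time() does in the real source.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

/* Monotonic timestamp in nanoseconds. */
static uint64_t now_nsecs(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return (uint64_t)ts.tv_sec * 1000000000ULL + (uint64_t)ts.tv_nsec;
}

typedef void (*digest_fn)(const uint8_t *buf, size_t len, uint8_t *out);

/* Trivial "digest" for the sketch: copy the first 32 bytes of the block. */
static void digest_memcpy(const uint8_t *buf, size_t len, uint8_t *out)
{
	memcpy(out, buf, len < 32 ? len : 32);
}

/*
 * Run `iterations` rounds over a 4096-byte block and store the elapsed
 * nanoseconds in *elapsed. Returns the first byte of the last digest so
 * the work in the loop has an observable result.
 */
static uint8_t time_digest(digest_fn digest, int iterations, uint64_t *elapsed)
{
	uint8_t buf[4096];
	uint8_t hash[32] = { 0 };
	uint64_t start;

	assert(iterations > 0);
	start = now_nsecs();
	for (int iter = 0; iter < iterations; iter++) {
		memset(buf, iter & 0xFF, sizeof(buf));
		memset(hash, 0, sizeof(hash));
		digest(buf, sizeof(buf), hash);
	}
	*elapsed = now_nsecs() - start;
	return hash[0];
}
```

Dividing the elapsed value by the iteration count gives the per-iteration figure printed in the nsecs/i column.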
|