mirrors/qemu

mirror of https://github.com/qemu/qemu.git synced 2024-11-25 03:43:37 +08:00

Author	SHA1	Message	Date
Stefan Hajnoczi	d73415a315	qemu/atomic.h: rename atomic_ to qatomic_ clang's C11 atomic_fetch_*() functions only take a C11 atomic type pointer argument. QEMU uses direct types (int, etc) and this causes a compiler error when QEMU code calls these functions in a source file that also included <stdatomic.h> via a system header file: $ CC=clang CXX=clang++ ./configure ... && make ../util/async.c:79:17: error: address argument to atomic operation must be a pointer to _Atomic type ('unsigned int *' invalid) Avoid using atomic_*() names in QEMU's atomic.h since that namespace is used by <stdatomic.h>. Prefix QEMU's APIs with 'q' so that atomic.h and <stdatomic.h> can co-exist. I checked /usr/include on my machine and searched GitHub for existing "qatomic_" users but there seem to be none. This patch was generated using: $ git grep -h -o '\<atomic\(64\)\?_[a-z0-9_]\+' include/qemu/atomic.h | \ sort -u >/tmp/changed_identifiers $ for identifier in $(</tmp/changed_identifiers); do sed -i "s%\<$identifier\>%q$identifier%g" \ $(git grep -I -l "\<$identifier\>") done I manually fixed line-wrap issues and misaligned rST tables. Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Acked-by: Paolo Bonzini <pbonzini@redhat.com> Message-Id: <20200923105646.47864-1-stefanha@redhat.com>	2020-09-23 16:07:44 +01:00
Richard Henderson	78441c04ca	tests/qht-bench: Adjust threshold computation In `06c4cc3660`, we split the multiplication in two parts to avoid a clang warning. But because double still rounds to 53 bits, this does not provide additional precision beyond multiplication by nextafter(0x1p64, 0), the largest representable value smaller than 2^64. However, since we have eliminated 1.0, multiplying by 2^64 produces a better distribution of input values to the output values. Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20200626200950.1015121-3-richard.henderson@linaro.org>	2020-07-11 15:53:00 +01:00
Richard Henderson	d11f824991	tests/qht-bench: Adjust testing rate by -1 Since the seed must be non-zero, subtracting 1 puts the rate in 0..UINT64_MAX-1, which allows the 0 and UINT64_MAX thresholds to correspond to 0% (never) and 100% (always). Suggested-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Message-Id: <20200626200950.1015121-2-richard.henderson@linaro.org>	2020-07-11 15:53:00 +01:00
Richard Henderson	06c4cc3660	qht: Fix threshold rate calculation tests/qht-bench.c:287:29: error: implicit conversion from 'unsigned long' to 'double' changes value from 18446744073709551615 to 18446744073709551616 [-Werror,-Wimplicit-int-float-conversion] threshold = rate * UINT64_MAX; ~ ^~~~~~~~~~ Fix this by splitting the 64-bit constant into two halves, each of which is individually perfectly representable, the sum of which produces the correct arithmetic result. This is very likely just a sticking plaster over some underlying incorrect code, but it will suppress the warning for the moment. Cc: Emilio G. Cota <cota@braap.org> Reported-by: Philippe Mathieu-Daudé <philmd@redhat.com> Signed-off-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2020-06-19 18:29:11 +01:00
Alex Bennée	eb4f8e100f	tests: use g_usleep instead of rem = sleep(time) Relying on sleep() to always return after the full interval isn't safe, as a signal may have interrupted it. If signals are constantly incoming, the program will never reach its termination condition. This is believed to be the mechanism causing timeouts for qht-test in Travis. GLib's g_usleep() deals with all of this for us, so let's use it instead. Signed-off-by: Alex Bennée <alex.bennee@linaro.org> Reviewed-by: Peter Maydell <peter.maydell@linaro.org>	2019-01-14 14:52:30 +00:00
Emilio G. Cota	fe656e3185	include: move exec/tb-hash-xx.h to qemu/xxhash.h Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Emilio G. Cota	c971d8fa73	exec: introduce qemu_xxhash{2,4,5,6,7} Before moving them all to include/qemu/xxhash.h. Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Emilio G. Cota	e132fde25f	qht-bench: document -p flag Which we forgot to do in `bd224fce60` ("qht-bench: add -p flag to precompute hash values", 2018-09-26). Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-12-17 06:04:44 +03:00
Emilio G. Cota	bd224fce60	qht-bench: add -p flag to precompute hash values Precomputing the hash values allows us to perform more frequent accesses to the hash table, thereby reaching higher throughputs. We keep the old behaviour by default, since (1) we might confuse users if they measured a speedup without changing anything in the QHT implementation, and (2) benchmarking the hash function "on line" is also valuable. Before: $ taskset -c 0 tests/qht-bench -n 1 Throughput: 38.18 MT/s After: $ taskset -c 0 tests/qht-bench -n 1 Throughput: 38.16 MT/s After (with precomputing): $ taskset -c 0 tests/qht-bench -n 1 -p Throughput: 50.87 MT/s Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-09-26 08:55:54 -07:00
Emilio G. Cota	32359d529f	qht: return existing entry when qht_insert fails The meaning of "existing" is now changed to "matches in hash and ht->cmp result". This is saner than just checking the pointer value. Suggested-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	61b8cef1d4	qht: require a default comparison function qht_lookup now uses the default cmp function. qht_lookup_custom is defined to retain the old behaviour, that is a cmp function is explicitly provided. qht_insert will gain use of the default cmp in the next patch. Note that we move qht_lookup_custom's @func to be the last argument, which makes the new qht_lookup as simple as possible. Instead of this (i.e. keeping @func 2nd): 0000000000010750 <qht_lookup>: 10750: 89 d1 mov %edx,%ecx 10752: 48 89 f2 mov %rsi,%rdx 10755: 48 8b 77 08 mov 0x8(%rdi),%rsi 10759: e9 22 ff ff ff jmpq 10680 <qht_lookup_custom> 1075e: 66 90 xchg %ax,%ax We get: 0000000000010740 <qht_lookup>: 10740: 48 8b 4f 08 mov 0x8(%rdi),%rcx 10744: e9 37 ff ff ff jmpq 10680 <qht_lookup_custom> 10749: 0f 1f 80 00 00 00 00 nopl 0x0(%rax) Reviewed-by: Richard Henderson <richard.henderson@linaro.org> Reviewed-by: Alex Bennée <alex.bennee@linaro.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2018-06-15 07:42:55 -10:00
Emilio G. Cota	4e2ca83e71	tcg: define CF_PARALLEL and use it for TB hashing along with CF_COUNT_MASK This will enable us to decouple code translation from the value of parallel_cpus at any given time. It will also help us minimize TB flushes when generating code via EXCP_ATOMIC. Note that the declaration of parallel_cpus is brought to exec-all.h to be able to define there the "curr_cflags" inline. Signed-off-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Richard Henderson <richard.henderson@linaro.org>	2017-10-24 13:53:41 -07:00
Lluís Vilanova	61a67f71dd	exec: [tcg] Use different TBs according to the vCPU's dynamic tracing state Every vCPU now uses a separate set of TBs for each set of dynamic tracing event state values. Each set of TBs can be used by any number of vCPUs to maximize TB reuse when vCPUs have the same tracing state. This feature is later used by tracetool to optimize tracing of guest code events. The maximum number of TB sets is defined as 2^E, where E is the number of events that have the 'vcpu' property (their state is stored in CPUState->trace_dstate). For this to work, a change on the dynamic tracing state of a vCPU will force it to flush its virtual TB cache (which is only indexed by address), and fall back to the physical TB cache (which now contains the vCPU's dynamic tracing state as part of the hashing function). Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu> Reviewed-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Emilio G. Cota <cota@braap.org> Signed-off-by: Emilio G. Cota <cota@braap.org> Message-id: 149915775266.6295.10060144081246467690.stgit@frigg.lan Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2017-07-17 13:11:05 +01:00
Emilio G. Cota	977ec47de0	qht-bench: relax test_start/stop atomic accesses test_start/stop are used only as flags to loop on. Barriers are unnecessary, since no dependent data is transferred among threads apart from the flags themselves. This commit relaxes the three accesses to test_start/stop that were not yet relaxed. Signed-off-by: Emilio G. Cota <cota@braap.org>	2016-10-24 15:27:19 +02:00
Paolo Bonzini	e9abfcb57f	clean-includes: run it once more Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-06-16 18:39:03 +02:00
Emilio G. Cota	515864a0d7	qht: add qht-bench, a performance benchmark This serves as a performance benchmark as well as a stress test for QHT. We can tweak quite a number of things, including the number of resize threads and how frequently resizes are triggered. A performance comparison of QHT vs CLHT[1] and ck_hs[2] using this same benchmark program can be found here: http://imgur.com/a/0Bms4 The tests are run on a 64-core AMD Opteron 6376, pinning threads to cores favoring same-socket cores. For each run, qht-bench is invoked with: $ tests/qht-bench -d $duration -n $n -u $u -g $range , where $duration is in seconds, $n is the number of threads, $u is the update rate (0.0 to 100.0), and $range is the number of keys. Note that ck_hs's performance drops significantly as writes go up, since it requires an external lock (I used a ck_spinlock) around every write. Also, note that CLHT, instead of using a seqlock, relies on an allocator that never returns the same address during the same read-critical section. This gives it a slight performance advantage over QHT on read-heavy workloads, since the seqlock writes aren't there. [1] CLHT: https://github.com/LPD-EPFL/CLHT https://infoscience.epfl.ch/record/207109/files/ascy_asplos15.pdf [2] ck_hs: http://concurrencykit.org/ http://backtrace.io/blog/blog/2015/03/13/workload-specialization/ A few of those plots are shown in text here, since that site might not be online forever. Throughput is in Mops/s on the Y axis. 
[ASCII plots not reproduced: "200K keys, 0 % updates", "200K keys, 1 % updates", "200K keys, 20 % updates" and "200K keys, 100 % updates", each showing throughput against the number of threads (1 to 64) for qht, clht and ck.] Signed-off-by: Emilio G. Cota <cota@braap.org> Message-Id: <1465412133-3029-13-git-send-email-cota@braap.org> Signed-off-by: Richard Henderson <rth@twiddle.net>	2016-06-11 17:11:16 -07:00

16 Commits