Commit Graph

196 Commits

Author SHA1 Message Date
Richard Henderson
38dc12947e tcg: Add support for vector bitwise select
This operation performs d = (b & a) | (c & ~a), and is present
on a majority of host vector units.  Include gvec expanders.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-22 15:09:43 -04:00
Peter Maydell
d8276573da Add CPUClass::tlb_fill.
Improve tlb_vaddr_to_host for use by ARM SVE no-fault loads.
 -----BEGIN PGP SIGNATURE-----
 
 iQFRBAABCgA7FiEEekgeeIaLTbaoWgXAZN846K9+IV8FAlzVx4UdHHJpY2hhcmQu
 aGVuZGVyc29uQGxpbmFyby5vcmcACgkQZN846K9+IV+U1Af/b3cV5d5a1LWRdLgR
 71JCPK/M3o43r2U9wCSikteXkmNBEdEoc5+WRk2SuZFLW/JB1DHDY7/gISPIhfoB
 ZIza2TxD/QK1CQ5/mMWruKBlyygbYYZgsYaaNsMJRJgicgOSjTN0nuHMbIfv3tAN
 mu+IlkD0LdhVjP0fz30Jpew3b3575RCjYxEPM6KQI3RxtQFjZ3FhqV5hKR4vtdP5
 yLWJQzwAbaCB3SZUvvp7TN1ZsmeyLpc+Yz/YtRTqQedo7SNWWBKldLhqq4bZnH1I
 AkzHbtWIOBrjWJ34ZMAgI5Q56Du9TBbBvCdM9azmrQjSu/2kdsPBPcUyOpnUCsCx
 NyXo9g==
 =x71l
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20190510' into staging

Add CPUClass::tlb_fill.
Improve tlb_vaddr_to_host for use by ARM SVE no-fault loads.

# gpg: Signature made Fri 10 May 2019 19:48:37 BST
# gpg:                using RSA key 7A481E78868B4DB6A85A05C064DF38E8AF7E215F
# gpg:                issuer "richard.henderson@linaro.org"
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20190510: (27 commits)
  tcg: Use tlb_fill probe from tlb_vaddr_to_host
  tcg: Remove CPUClass::handle_mmu_fault
  tcg: Use CPUClass::tlb_fill in cputlb.c
  target/xtensa: Convert to CPUClass::tlb_fill
  target/unicore32: Convert to CPUClass::tlb_fill
  target/tricore: Convert to CPUClass::tlb_fill
  target/tilegx: Convert to CPUClass::tlb_fill
  target/sparc: Convert to CPUClass::tlb_fill
  target/sh4: Convert to CPUClass::tlb_fill
  target/s390x: Convert to CPUClass::tlb_fill
  target/riscv: Convert to CPUClass::tlb_fill
  target/ppc: Convert to CPUClass::tlb_fill
  target/openrisc: Convert to CPUClass::tlb_fill
  target/nios2: Convert to CPUClass::tlb_fill
  target/moxie: Convert to CPUClass::tlb_fill
  target/mips: Convert to CPUClass::tlb_fill
  target/mips: Tidy control flow in mips_cpu_handle_mmu_fault
  target/mips: Pass a valid error to raise_mmu_exception for user-only
  target/microblaze: Convert to CPUClass::tlb_fill
  target/m68k: Convert to CPUClass::tlb_fill
  ...

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-05-16 13:15:08 +01:00
Richard Henderson
bcefc90208 tcg: Add support for vector absolute value
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
5ee5c14cac tcg: Add gvec expanders for variable shift
The gvec expanders perform a modulo on the shift count.  If the target
requires alternate behaviour, then it cannot use the generic gvec
expanders anyway, and will have to have its own custom code.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-13 22:52:08 +00:00
Richard Henderson
4601f8d10d cputlb: Do unaligned store recursion to outermost function
This is less tricky than for loads, because we always fall
back to single byte stores to implement unaligned stores.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2019-05-10 20:23:21 +01:00
Richard Henderson
2dd9260678 cputlb: Do unaligned load recursion to outermost function
If we attempt to recurse from load_helper back to load_helper,
even via intermediary, we do not get all of the constants
expanded away as desired.

But if we recurse back to the original helper (or a shim that
has a consistent function signature), the operands are folded
away as desired.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2019-05-10 20:23:21 +01:00
Richard Henderson
fc1bc77791 cputlb: Drop attribute flatten
Going to approach this problem via __attribute__((always_inline))
instead, but full conversion will take several steps.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2019-05-10 20:23:21 +01:00
Richard Henderson
f1be36969d cputlb: Move TLB_RECHECK handling into load/store_helper
Having this in io_readx/io_writex meant that we forgot to
re-compute index after tlb_fill.  It also means we can use
the normal aligned memory load path.  It also fixes a bug
in that we had cached a use of index across a tlb_fill.

Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2019-05-10 20:23:21 +01:00
Alex Bennée
eed5664238 accel/tcg: demacro cputlb
Instead of expanding a series of macros to generate the load/store
helpers we move stuff into common functions and rely on the compiler
to eliminate the dead code for each variant.

Signed-off-by: Alex Bennée <alex.bennee@linaro.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
2019-05-10 20:23:06 +01:00
Richard Henderson
4811e9095c tcg: Use tlb_fill probe from tlb_vaddr_to_host
Most of the existing users would continue around a loop which
would fault the tlb entry in via a normal load/store.

But for AArch64 SVE we have an existing emulation bug wherein we
would mark the first element of a no-fault vector load as faulted
(within the FFR, not via exception) just because we did not have
its address in the TLB.  Now we can properly only mark it as faulted
if there really is no valid, readable translation, while still not
raising an exception.  (Note that beyond the first element of the
vector, the hardware may report a fault for any reason whatsoever;
with at least one element loaded, forward progress is guaranteed.)

Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-10 11:12:50 -07:00
Richard Henderson
69963f5709 tcg: Remove CPUClass::handle_mmu_fault
This hook is now completely replaced by tlb_fill.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-10 11:12:50 -07:00
Richard Henderson
c319dc1357 tcg: Use CPUClass::tlb_fill in cputlb.c
We can now use the CPUClass hook instead of a named function.

Create a static tlb_fill function to avoid other changes within
cputlb.c.  This also isolates the asserts within.  Remove the
named tlb_fill function from all of the targets.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-10 11:12:50 -07:00
Richard Henderson
da6bbf8513 tcg: Add CPUClass::tlb_fill
This hook will replace the (user-only mode specific) handle_mmu_fault
hook, and the (system mode specific) tlb_fill function.

The handle_mmu_fault hook was written as if there was a valid
way to recover from an mmu fault, and had 3 possible return states.
In reality, the only valid action is to raise an exception,
return to the main loop, and deliver the SIGSEGV to the guest.

Note that all of the current implementations of handle_mmu_fault
for guests which support linux-user do in fact only ever return 1,
which is the signal to return to the main loop.

Using the hook for system mode requires that all targets be converted,
so for now the hook is (optionally) used only from user-only mode.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-05-10 07:55:06 -07:00
Shahab Vahedi
ef5dae6805 cputlb: Fix io_readx() to respect the access_type
This change adapts io_readx() to its input access_type. Currently
io_readx() treats any memory access as a read, although it has an
input argument "MMUAccessType access_type". This results in:

1) Calling the tlb_fill() only with MMU_DATA_LOAD
2) Considering only entry->addr_read as the tlb_addr

Buglink: https://bugs.launchpad.net/qemu/+bug/1825359
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Shahab Vahedi <shahab.vahedi@gmail.com>
Message-Id: <20190420072236.12347-1-shahab.vahedi@gmail.com>
[rth: Remove assert; fix expression formatting.]
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-25 10:40:06 -07:00
Richard Henderson
6e6c4efed9 tcg: Restart after TB code generation overflow
If a TB generates too much code, try again with fewer insns.

Fixes: https://bugs.launchpad.net/bugs/1824853
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Richard Henderson
8b86d6d258 tcg: Hoist max_insns computation to tb_gen_code
In order to handle TB's that translate to too much code, we
need to place the control of the length of the translation
in the hands of the code gen master loop.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-04-24 13:04:33 -07:00
Markus Armbruster
3de2faa9a8 tcg: Simplify how dump_exec_info() prints
dump_exec_info() takes an fprintf()-like callback and a FILE * to pass
to it.

Its only caller hmp_info_jit() passes monitor_fprintf() and the
current monitor cast to FILE *.  monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf().  The
type-punning is ugly.

Drop the callback, and call qemu_printf() instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-5-armbru@redhat.com>
2019-04-18 22:18:59 +02:00
Markus Armbruster
d4c51a0af3 tcg: Simplify how dump_opcount_info() prints
dump_opcount_info() takes an fprintf()-like callback and a FILE * to
pass to it.

Its only caller hmp_info_opcount() passes monitor_fprintf() and the
current monitor cast to FILE *.  monitor_fprintf() casts it right
back, and is otherwise identical to monitor_printf().  The
type-punning is ugly.

Drop the callback, and call qemu_printf() instead.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Message-Id: <20190417191805.28198-4-armbru@redhat.com>
2019-04-18 22:18:59 +02:00
Markus Armbruster
e68b3baa25 trace-events: Consistently point to docs/devel/tracing.txt
Almost all trace-events point to docs/devel/tracing.txt in a comment
right at the beginning.  Touch up the ones that don't.

[Updated with Markus' new commit description wording.
--Stefan]

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Message-id: 20190314180929.27722-2-armbru@redhat.com
Message-Id: <20190314180929.27722-2-armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2019-03-22 16:17:37 +00:00
Emilio G. Cota
6d967cb86d cputlb: update TLB entry/index after tlb_fill
We are failing to take into account that tlb_fill() can cause a
TLB resize, which renders prior TLB entry pointers/indices stale.
Fix it by re-doing the TLB entry lookups immediately after tlb_fill.

Fixes: 86e1eff8bc ("tcg: introduce dynamic TLB sizing", 2019-01-28)
Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190209162745.12668-3-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-11 08:52:44 -08:00
Peter Maydell
713acc316d Queued accel/tcg patches
-----BEGIN PGP SIGNATURE-----
 
 iQEcBAABAgAGBQJcWle8AAoJEGTfOOivfiFfYeAIAKAu8KlM6JVHVITkk58lzRmT
 /cwd2O0yRroROslxrdAC0IRgMAH/oSIRK2W3dAIXLXb2TNuf3ydvdkBs6oVGWYnd
 5P6fdFSq4ZqOGY6Y2QbBe1RhIOnmYOVreVePQsT+ofIFZOYrP0hKtcc68nB8nGZM
 vVdlfotU/6aQuPZnXVBPNsXZCZEWz67JY1pzJTTNJKtPV51jsgxNSp4ktXuqkmvY
 RnnG347hbQWFs5bCaVXoSp5rzuDxfjlI5/vJw/HuG3h47LSWkiq1GYg39d5Yn802
 N7nZURXEI6onmLkTVndV/EkA1yqvUpOh4NM7S1hZVJ2DXu/wXcEr8kAATSeYkos=
 =Gmjb
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/rth/tags/pull-tcg-20190206' into staging

Queued accel/tcg patches

# gpg: Signature made Wed 06 Feb 2019 03:42:52 GMT
# gpg:                using RSA key 64DF38E8AF7E215F
# gpg: Good signature from "Richard Henderson <richard.henderson@linaro.org>" [full]
# Primary key fingerprint: 7A48 1E78 868B 4DB6 A85A  05C0 64DF 38E8 AF7E 215F

* remotes/rth/tags/pull-tcg-20190206:
  accel/tcg: Consider cluster index in tb_lookup__cpu_state()
  tcg: add early clober modifier in atomic16_cmpxchg on aarch64

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2019-02-07 11:46:40 +00:00
Peter Maydell
9fd9b7de61 accel/tcg: Consider cluster index in tb_lookup__cpu_state()
In commit f7b78602fd we added the CPU cluster number to the
cflags field of the TB hash; this included adding it to the value
kept in tb->cflags, since we pass that field directly into the hash
calculation in some places. Unfortunately we forgot to check whether
other parts of the code were doing comparisons against tb->cflags
that would need to be updated.

It turns out that there is exactly one such place: the
tb_lookup__cpu_state() function checks whether the TB it has
found in the tb_jmp_cache has a tb->cflags matching the cf_mask
that is passed in. The tb->cflags has the cluster_index in it
but the cf_mask does not.

Hoist the "add cluster index to the cf_mask" code up from
tb_htable_lookup() to tb_lookup__cpu_state() so it can be considered
in the "did this TB match in the jmp cache" condition, as well as
when we do the full hash lookup by physical PC, flags, etc.
(tb_htable_lookup() is only called from tb_lookup__cpu_state(),
so this change doesn't require any further knock-on changes.)

Fixes: f7b78602fd ("accel/tcg: Add cluster number to TCG TB hash")
Tested-by: Cleber Rosa <crosa@redhat.com>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Reported-by: Howard Spoelstra <hsp.cat7@gmail.com>
Reported-by: Cleber Rosa <crosa@redhat.com>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Message-Id: <20190205151810.571-1-peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-02-06 03:39:24 +00:00
Emilio G. Cota
6aaa24f9d4 cpu-exec: reset BQL after longjmp in cpu_exec_step_atomic
Just like we do in cpu_exec().

Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-05 16:50:16 +01:00
Emilio G. Cota
8fd3a9b81d cpu-exec: add assert_no_pages_locked() after longjmp
We forgot to add this check in faa9372c07 ("translate-all:
introduce assert_no_pages_locked", 2018-06-15); we only added
it after returning from a longjmp in cpu_exec_step_atomic. Fix it.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-02-05 16:50:16 +01:00
Thomas Huth
fb0343d5b4 tcg: Fix LGPL version number
It's either "GNU *Library* General Public version 2" or "GNU Lesser
General Public version *2.1*", but there was no "version 2.0" of the
"Lesser" library. So assume that version 2.1 is meant here.

Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Thomas Huth <thuth@redhat.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <1548252536-6242-5-git-send-email-thuth@redhat.com>
Signed-off-by: Laurent Vivier <laurent@vivier.eu>
2019-01-30 11:01:52 +01:00
Peter Maydell
f7b78602fd accel/tcg: Add cluster number to TCG TB hash
Include the cluster number in the hash we use to look
up TBs. This is important because a TB that is valid
for one cluster at a given physical address and set
of CPU flags is not necessarily valid for another:
the two clusters may have different views of physical
memory, or may have different CPU features (eg FPU
present or absent).

We put the cluster number in the high 8 bits of the
TB cflags. This gives us up to 256 clusters, which should
be enough for anybody. If we ever need more, or need
more bits in cflags for other purposes, we could make
tb_hash_func() take more data (and expand qemu_xxhash7()
to qemu_xxhash8()).

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Reviewed-by: Edgar E. Iglesias <edgar.iglesias@xilinx.com>
Message-id: 20190121152218.9592-4-peter.maydell@linaro.org
2019-01-29 11:46:06 +00:00
Peter Maydell
f454a54f3b accel/tcg/user-exec: Don't parse aarch64 insns to test for read vs write
In cpu_signal_handler() for aarch64 hosts, currently we parse
the faulting instruction to see if it is a load or a store.
Since the 3.16 kernel (~2014), the kernel has provided us with
the syndrome register for a fault, which includes the WnR bit.
Use this instead if it is present, only falling back to
instruction parsing if not.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-id: 20190108180014.32386-1-peter.maydell@linaro.org
2019-01-29 11:46:04 +00:00
Richard Henderson
e77c89fb08 cputlb: Remove static tlb sizing
Now that all tcg backends support TCG_TARGET_IMPLEMENTS_DYN_TLB,
remove the define and the old code.

Reviewed-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:04:35 -08:00
Emilio G. Cota
86e1eff8bc tcg: introduce dynamic TLB sizing
Disabled in all TCG backends for now.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-3-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Emilio G. Cota
3cea94bbc9 cputlb: do not evict empty entries to the vtlb
Currently we evict an entry to the victim TLB when it doesn't match
the current address. But it could be that there's no match because
the current entry is empty (i.e. all -1's, for instance via tlb_flush).
Do not evict the entry to the vtlb in that case.

This change will help us keep track of the TLB's use rate, which
we'll use to implement a policy for dynamic TLB sizing.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20190116170114.26802-2-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
dd0a0fcdd8 tcg: Add opcodes for vector minmax arithmetic
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Richard Henderson
f550805d83 tcg: Add gvec expanders for nand, nor, eqv
Reviewed-by: David Gibson <david@gibson.dropbear.id.au>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2019-01-28 07:03:34 -08:00
Marc-André Lureau
444e20a36f build-sys: don't include windows.h, osdep.h does it
osdep.h will also define the available Windows API version for QEMU.

Signed-off-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20181122110039.15972-2-marcandre.lureau@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2019-01-11 13:57:24 +01:00
Alistair Francis
464e447a0c tcg: Add RISC-V cpu signal handler
Signed-off-by: Alistair Francis <alistair.francis@wdc.com>
Signed-off-by: Michael Clark <mjc@sifive.com>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Message-Id: <c445175310fa836b61fd862a55628907f0093194.1545246859.git.alistair.francis@wdc.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-12-26 06:40:02 +11:00
Richard Henderson
ab65110530 cputlb: Remove tlb_c.pending_flushes
This is essentially redundant with tlb_c.dirty.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:39 +00:00
Richard Henderson
3d1523ced6 cputlb: Filter flushes on already clean tlbs
Especially for guests with large numbers of tlbs, like ARM or PPC,
we may well not use all of them in between flush operations.
Remember which tlbs have been used since the last flush, and
avoid any useless flushing.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:35 +00:00
Richard Henderson
e09de0a20d cputlb: Count "partial" and "elided" tlb flushes
Our only statistic so far was "full" tlb flushes, where all mmu_idx
are flushed at the same time.

Now count "partial" tlb flushes where sets of mmu_idx are flushed,
but the set is not maximal.  Account one per mmu_idx flushed, as
that is the unit of work performed.

We don't actually count elided flushes yet, but go ahead and change
the interface presented to the monitor all at once.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:30 +00:00
Richard Henderson
f8144c6c1e cputlb: Merge tlb_flush_page into tlb_flush_page_by_mmuidx
The difference between the two sets of APIs is now miniscule.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:26 +00:00
Richard Henderson
64f2674bbc cputlb: Merge tlb_flush_nocheck into tlb_flush_by_mmuidx_async_work
The difference between the two sets of APIs is now miniscule.

This allows tlb_flush, tlb_flush_all_cpus, and tlb_flush_all_cpus_synced
to be merged with their corresponding by_mmuidx functions as well.  For
accounting, consider mmu_idx_bitmask = ALL_MMUIDX_BITS to be a full flush.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:16 +00:00
Richard Henderson
d5363e5849 cputlb: Move env->vtlb_index to env->tlb_d.vindex
The rest of the tlb victim cache is per-tlb,
the next use index should be as well.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:12 +00:00
Richard Henderson
1308e02671 cputlb: Split large page tracking per mmu_idx
The set of large pages in the kernel is probably not the same
as the set of large pages in the application.  Forcing one
range to cover both will flush more often than necessary.

This allows tlb_flush_page_async_work to flush just the one
mmu_idx implicated, which in turn allows us to remove
tlb_check_page_and_flush_by_mmuidx_async_work.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:08 +00:00
Richard Henderson
60a2ad7d86 cputlb: Move cpu->pending_tlb_flush to env->tlb_c.pending_flush
Protect it with the tlb_lock instead of using atomics.
The move puts it in or near the same cacheline as the lock;
using the lock means we don't need a second atomic operation
in order to perform the update.  Which makes it cheap to also
update pending_flush in tlb_flush_by_mmuidx_async_work.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:16:02 +00:00
Richard Henderson
8ab102667e cputlb: Remove tcg_enabled hack from tlb_flush_nocheck
The bugs this was working around were fixed with commits
022d6378c7  target/unicore32: remove tlb_flush from uc32_init_fn
6e11beecfd  target/alpha: remove tlb_flush from alpha_cpu_initfn

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:15:57 +00:00
Richard Henderson
53d284554c cputlb: Move tlb_lock to CPUTLBCommon
This is the first of several moves to reduce the size of the
CPU_COMMON_TLB macro and improve some locality of refernce.

Tested-by: Emilio G. Cota <cota@braap.org>
Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-31 12:15:28 +00:00
Emilio G. Cota
403f290c06 cputlb: read CPUTLBEntry.addr_write atomically
Updates can come from other threads, so readers that do not
take tlb_lock must use atomic_read to avoid undefined
behaviour (UB).

This completes the conversion to tlb_lock. This conversion results
on average in no performance loss, as the following experiments
(run on an Intel i7-6700K CPU @ 4.00GHz) show.

1. aarch64 bootup+shutdown test:

- Before:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7487.087786      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.12% )
    31,574,905,303      cycles                    #    4.217 GHz                      ( +-  0.12% )
    57,097,908,812      instructions              #    1.81  insns per cycle          ( +-  0.08% )
    10,255,415,367      branches                  # 1369.747 M/sec                    ( +-  0.08% )
       173,278,962      branch-misses             #    1.69% of all branches          ( +-  0.18% )

       7.504481349 seconds time elapsed                                          ( +-  0.14% )

- After:
 Performance counter stats for 'taskset -c 0 ../img/aarch64/die.sh' (10 runs):

       7462.441328      task-clock (msec)         #    0.998 CPUs utilized            ( +-  0.07% )
    31,478,476,520      cycles                    #    4.218 GHz                      ( +-  0.07% )
    57,017,330,084      instructions              #    1.81  insns per cycle          ( +-  0.05% )
    10,251,929,667      branches                  # 1373.804 M/sec                    ( +-  0.05% )
       173,023,787      branch-misses             #    1.69% of all branches          ( +-  0.11% )

       7.474970463 seconds time elapsed                                          ( +-  0.07% )

2. SPEC06int:
                                              SPEC06int (test set)
                                           [Y axis: Speedup over master]
  1.15 +-+----+------+------+------+------+------+-------+------+------+------+------+------+------+----+-+
       |                                                                                                  |
   1.1 +-+.................................+++.............................+  tlb-lock-v2 (m+++x)       +-+
       |                                +++ |                   +++        tlb-lock-v3 (spinl|ck)         |
       |                    +++          |  |     +++    +++     |                           |            |
  1.05 +-+....+++...........####.........|####.+++.|......|.....###....+++...........+++....###.........+-+
       |      ###         ++#| #         |# |# ***### +++### +++#+#     |     +++     |     #|#    ###    |
     1 +-+++***+#++++####+++#++#++++++++++#++#+*+*++#++++#+#+****+#++++###++++###++++###++++#+#++++#+#+++-+
       |    *+* #    #++# ***  #   #### ***  # * *++# ****+# *| * # ****|#   |# #    #|#    #+#    # #    |
  0.95 +-+..*.*.#....#..#.*|*..#...#..#.*|*..#.*.*..#.*|.*.#.*++*.#.*++*+#.****.#....#+#....#.#..++#.#..+-+
       |    * * #    #  # *|*  #   #  # *|*  # * *  # *++* # *  * # *  * # * |* #  ++# #    # #  *** #    |
       |    * * #  ++#  # *+*  #   #  # *|*  # * *  # *  * # *  * # *  * # *++* # **** #  ++# #  * * #    |
   0.9 +-+..*.*.#...|#..#.*.*..#.++#..#.*|*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*.|*.#...|#.#..*.*.#..+-+
       |    * * #  ***  # * *  #  |#  # *+*  # * *  # *  * # *  * # *  * # *  * # *++* #   |# #  * * #    |
  0.85 +-+..*.*.#..*|*..#.*.*..#.***..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.****.#..*.*.#..+-+
       |    * * #  *+*  # * *  # *|*  # * *  # * *  # *  * # *  * # *  * # *  * # *  * # * |* #  * * #    |
       |    * * #  * *  # * *  # *+*  # * *  # * *  # *  * # *  * # *  * # *  * # *  * # * |* #  * * #    |
   0.8 +-+..*.*.#..*.*..#.*.*..#.*.*..#.*.*..#.*.*..#.*..*.#.*..*.#.*..*.#.*..*.#.*..*.#.*++*.#..*.*.#..+-+
       |    * * #  * *  # * *  # * *  # * *  # * *  # *  * # *  * # *  * # *  * # *  * # *  * #  * * #    |
  0.75 +-+--***##--***###-***###-***###-***###-***###-****##-****##-****##-****##-****##-****##--***##--+-+
 400.perlben401.bzip2403.gcc429.m445.gob456.hmme45462.libqua464.h26471.omnet473483.xalancbmkgeomean

  png: https://imgur.com/a/BHzpPTW

Notes:
- tlb-lock-v2 corresponds to an implementation with a mutex.
- tlb-lock-v3 corresponds to the current implementation, i.e.
  a spinlock and a single lock acquisition in tlb_set_page_with_attrs.

Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181016153840.25877-1-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 19:46:53 -07:00
Richard Henderson
e6cd4bb59b tcg: Split CONFIG_ATOMIC128
GCC7+ will no longer advertise support for 16-byte __atomic operations
if only cmpxchg is supported, as for x86_64.  Fortunately, x86_64 still
has support for __sync_compare_and_swap_16 and we can make use of that.
AArch64 does not have, nor ever has had such support, so open-code it.

Reviewed-by: Emilio G. Cota <cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 19:46:36 -07:00
Richard Henderson
383beda9cf tcg: Add tlb_index and tlb_entry helpers
Isolate the computation of an index from an address into a
helper before we change that function.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
[ cota: convert tlb_vaddr_to_host; use atomic_read on addr_write ]
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181009175129.17888-2-cota@braap.org>
2018-10-18 18:58:10 -07:00
Emilio G. Cota
71aec3541d cputlb: serialize tlb updates with env->tlb_lock
Currently we rely on atomic operations for cross-CPU invalidations.
There are two cases that these atomics miss: cross-CPU invalidations
can race with either (1) vCPU threads flushing their TLB, which
happens via memset, or (2) vCPUs calling tlb_reset_dirty on their TLB,
which updates .addr_write with a regular store. This results in
undefined behaviour, since we're mixing regular and atomic ops
on concurrent accesses.

Fix it by using tlb_lock, a per-vCPU lock. All updaters of tlb_table
and the corresponding victim cache now hold the lock.
The readers that do not hold tlb_lock must use atomic reads when
reading .addr_write, since this field can be updated by other threads;
the conversion to atomic reads is done in the next patch.

Note that an alternative fix would be to expand the use of atomic ops.
However, in the case of TLB flushes this would have a huge performance
impact, since (1) TLB flushes can happen very frequently and (2) we
currently use a full memory barrier to flush each TLB entry, and a TLB
has many entries. Instead, acquiring the lock is barely slower than a
full memory barrier since it is uncontended, and with a single lock
acquisition we can flush the entire TLB.

Tested-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181009174557.16125-6-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 18:58:10 -07:00
Emilio G. Cota
ea9025cb49 cputlb: fix assert_cpu_is_self macro
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181009174557.16125-5-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 18:58:10 -07:00
Emilio G. Cota
5005e2537d exec: introduce tlb_init
Paves the way for the addition of a per-TLB lock.

Reviewed-by: Alex Bennée <alex.bennee@linaro.org>
Reviewed-by: Richard Henderson <richard.henderson@linaro.org>
Signed-off-by: Emilio G. Cota <cota@braap.org>
Message-Id: <20181009174557.16125-4-cota@braap.org>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
2018-10-18 18:58:10 -07:00