mirrors/qemu

mirror of https://github.com/qemu/qemu.git synced 2024-11-25 20:03:37 +08:00

Author	SHA1	Message	Date
Paolo Bonzini	633f650254	optimize: optimize using nonzero bits This adds two optimizations using the non-zero bit mask. In some cases involving shifts or ANDs the value can become zero, and can thus be optimized to a move of zero. Second, useless zero-extension or an AND with constant can be detected that would only zero bits that are already zero. The main advantage of this optimization is that it turns zero-extensions into moves, thus enabling much better copy propagation (around 1% code reduction). Here is for example a "test $0xff0000,%ecx + je" before optimization: mov_i64 tmp0,rcx movi_i64 tmp1,$0xff0000 discard cc_src and_i64 cc_dst,tmp0,tmp1 movi_i32 cc_op,$0x1c ext32u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,eq,$0x0 and after (without patch on the left, with on the right): movi_i64 tmp1,$0xff0000 movi_i64 tmp1,$0xff0000 discard cc_src discard cc_src and_i64 cc_dst,rcx,tmp1 and_i64 cc_dst,rcx,tmp1 movi_i32 cc_op,$0x1c movi_i32 cc_op,$0x1c ext32u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,eq,$0x0 brcond_i64 cc_dst,tmp12,eq,$0x0 Other similar cases: "test %eax, %eax + jne" where eax is already 32-bit (after optimization, without patch on the left, with on the right): discard cc_src discard cc_src mov_i64 cc_dst,rax mov_i64 cc_dst,rax movi_i32 cc_op,$0x1c movi_i32 cc_op,$0x1c ext32u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,ne,$0x0 brcond_i64 rax,tmp12,ne,$0x0 "test $0x1, %dl + je": movi_i64 tmp1,$0x1 movi_i64 tmp1,$0x1 discard cc_src discard cc_src and_i64 cc_dst,rdx,tmp1 and_i64 cc_dst,rdx,tmp1 movi_i32 cc_op,$0x1a movi_i32 cc_op,$0x1a ext8u_i64 tmp0,cc_dst movi_i64 tmp12,$0x0 movi_i64 tmp12,$0x0 brcond_i64 tmp0,tmp12,eq,$0x0 brcond_i64 cc_dst,tmp12,eq,$0x0 In some cases TCG even outsmarts GCC. :) Here the input code has "and $0x2,%eax + movslq %eax,%rbx + test %rbx, %rbx" and the optimizer, thanks to copy propagation, does the following: movi_i64 tmp12,$0x2 movi_i64 tmp12,$0x2 and_i64 rax,rax,tmp12 and_i64 rax,rax,tmp12 mov_i64 cc_dst,rax mov_i64 cc_dst,rax ext32s_i64 tmp0,rax -> nop mov_i64 rbx,tmp0 -> mov_i64 rbx,cc_dst and_i64 cc_dst,rbx,rbx -> nop Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2013-01-19 10:13:16 +00:00
Paolo Bonzini	3a9d8b179b	optimize: track nonzero bits of registers Add a "mask" field to the tcg_temp_info struct. A bit that is zero in "mask" will always be zero in the corresponding temporary. Zero bits in the mask can be produced from moves of immediates, zero-extensions, ANDs with constants, shifts; they can then be be propagated by logical operations, shifts, sign-extensions, negations, deposit operations, and conditional moves. Other operations will just reset the mask to all-ones, i.e. unknown. [rth: s/target_ulong/tcg_target_ulong/] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2013-01-19 10:13:14 +00:00
Paolo Bonzini	d193a14a2c	optimize: only write to state when clearing optimizer data The next patch will add to the TCG optimizer a field that should be non-zero in the default case. Thus, replace the memset of the temps array with a loop. Only the state field has to be up-to-date, because others are not used except if the state is TCG_TEMP_COPY or TCG_TEMP_CONST. [rth: Extracted the loop to a function.] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2013-01-19 10:13:13 +00:00
Paolo Bonzini	163fa4b09d	tcg-i386: use LEA for 3-operand 64-bit addition Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2013-01-12 12:45:56 +00:00
Stefan Weil	9a8a5ae69d	tcg: Remove unneeded assertion Commit `7f6f0ae5b9` added two assertions. One of these assertions is not needed: The pointer ts is never NULL because it is initialized with the address of an array element. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2013-01-02 11:23:21 -06:00
Richard Henderson	753d99d38b	tcg-hppa: Fix typo in brcond2 Reported-by: Stuart Brady <sdb@zubnet.me.uk> Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-12-29 12:21:53 +00:00
Richard Henderson	76a347e1cd	tcg-i386: Perform cmov detection at runtime for 32-bit. Existing compile-time detection is spotty at best. Convert it all to runtime detection instead. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-12-29 12:21:16 +00:00
Richard Henderson	afcb92beac	tcg: Add TCGV_IS_UNUSED_* Cc: Aurelien Jarno <aurelien@aurel32.net> Signed-off-by: Richard Henderson <rth@twiddle.net> Reviewed-by: Andreas Färber <afaerber@suse.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-12-29 12:14:07 +00:00
Paolo Bonzini	1de7afc984	misc: move include files to include/qemu/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:32:39 +01:00
Paolo Bonzini	022c62cbbc	exec: move include files to include/exec/ Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:31 +01:00
Paolo Bonzini	cb9c377f54	janitor: add guards to headers Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2012-12-19 08:31:31 +01:00
Anthony Liguori	7c12fd9b29	Merge remote-tracking branch 'stefanha/trivial-patches' into staging * stefanha/trivial-patches: pc_sysfw: Plug memory leak on pc_fw_add_pflash_drv() error path qemu-options: Fix space at EOL Fix spelling in comments and documentation Clean up pci_drive_hot_add()'s use of BlockInterfaceType arm: a9mpcore: remove un-used ptimer_iomem field target-sparc: Remove t0, t1 from CPUSPARCState target-m68k: Remove t1 from CPUM68KState target-alpha: Remove t0, t1 from CPUAlphaState s390x: Spelling fixes (endianess -> endianness, occured -> occurred) Fix comments (adress -> address, layed -> laid, wierd -> weird) Fix spelling (prefered -> preferred) configure: Remove stray debug output sd: Send debug printfery to stderr not stdout Conflicts: configure Resolve spelling conflict in configure. Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>	2012-12-10 08:34:29 -06:00
Evgeny Voevodin	c3a43607d9	tcg/tcg.h: Duplicate global TCG gen_opc_ arrays into TCGContext. Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-12-08 14:24:41 +00:00
Stefan Weil	a93cf9dfba	Fix comments (adress -> address, layed -> laid, wierd -> weird) Remove also a duplicated 'the'. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2012-12-07 12:34:11 +01:00
Aurelien Jarno	e5138db510	tcg: mark local temps as MEM in dead_temp() In dead_temp, local temps should always be marked as back to memory, even if they have not been allocated (i.e. they are discared before cross a basic block). It fixes the following assertion in target-xtensa: qemu-system-xtensa: tcg/tcg.c:1665: temp_save: Assertion `s->temps[temp].val_type == 2 \|\| s->temps[temp].fixed_reg' failed. Aborted Reported-by: Max Filippov <jcmvbkbc@gmail.com> Tested-by: Max Filippov <jcmvbkbc@gmail.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-11-24 13:24:13 +01:00
Aurelien Jarno	7aab08aa78	tcg/arm: fix cross-endian qemu_st16 The bswap16 TCG opcode assumes that the high bytes of the temp equal to 0 before calling it. The ARM backend implementation takes this assumption to slightly optimize the generated code. The same implementation is called for implementing the cross-endian qemu_st16 opcode, where this assumption is not true anymore. One way to fix that would be to zero the high bytes before calling it. Given the store instruction just ignore them, it is possible to provide a slightly more optimized version. With ARMv6+ the rev16 instruction does the work correctly. For lower ARM versions the patch provides a version which behaves correctly with non-zero high bytes, but fill them with junk. Cc: Andrzej Zaborowski <balrogg@gmail.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: qemu-stable@nongnu.org Reviewed-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-11-24 13:19:53 +01:00
Aurelien Jarno	d17bd1d8cc	tcg/arm: fix TLB access in qemu-ld/st ops The TCG arm backend considers likely that the offset to the TLB entries does not exceed 12 bits for mem_index = 0. In practice this is not true for at least the MIPS target. The current patch fixes that by loading the bits 23-12 with a separate instruction, and using loads with address writeback, independently of the value of mem_idx. In total this allow a 24-bit offset, which is a lot more than needed. Cc: Andrzej Zaborowski <balrogg@gmail.com> Cc: Peter Maydell <peter.maydell@linaro.org> Cc: qemu-stable@nongnu.org Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-11-24 13:19:53 +01:00
malc	ecf51c9abe	tcg/ppc: Fix !softmmu case Signed-off-by: malc <av1474@comtv.ru>	2012-11-21 10:56:22 +04:00
malc	ecdffbccd7	tcg/ppc: Remove unused s_bits variable Thanks to Alexander Graf for heads up. Signed-off-by: malc <av1474@comtv.ru>	2012-11-19 22:22:24 +04:00
Stefan Weil	e24dc9feb0	tci: Support deposit operations The operations for INDEX_op_deposit_i32 and INDEX_op_deposit_i64 are now supported and enabled by default. Signed-off-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-18 20:40:08 +00:00
Evgeny Voevodin	83eeb39669	TCG: Remove unused global variables Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-17 13:53:38 +00:00
Evgeny Voevodin	1ff0a2c594	TCG: Use gen_opparam_buf from context instead of global variable. Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-17 13:53:37 +00:00
Evgeny Voevodin	92414b31e7	TCG: Use gen_opc_buf from context instead of global variable. Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-17 13:53:36 +00:00
Evgeny Voevodin	c4afe5c4d3	TCG: Use gen_opparam_ptr from context instead of global variable. Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-17 13:53:34 +00:00
Evgeny Voevodin	efd7f48600	TCG: Use gen_opc_ptr from context instead of global variable. Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-17 13:53:27 +00:00
Evgeny Voevodin	8232a46a16	tcg/tcg.h: Duplicate global TCG variables in TCGContext Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com> Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-17 13:53:26 +00:00
Kirill Batuzov	3c5645fab3	tcg: properly check that op's output needs to be synced to memory Fix typo introduced in `b3a1be87ba`. Reported-by: Ruslan Savchenko <ruslan.savchenko@gmail.com> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-11-11 16:06:46 +01:00
malc	c878da3b27	tcg/ppc32: Use trampolines to trim the code size for mmu slow path accessors mmu access looks something like: <check tlb> if miss goto slow_path <fast path> done: ... ; end of the TB slow_path: <pre process> mr r3, r27 ; move areg0 to r3 ; (r3 holds the first argument for all the PPC32 ABIs) <call mmu_helper> b $+8 .long done <post process> b done On ppc32 <call mmu_helper> is: (SysV and Darwin) mmu_helper is most likely not within direct branching distance from the call site, necessitating a. moving 32 bit offset of mmu_helper into a GPR ; 8 bytes b. moving GPR to CTR/LR ; 4 bytes c. (finally) branching to CTR/LR ; 4 bytes r3 setting - 4 bytes call - 16 bytes dummy jump over retaddr - 4 bytes embedded retaddr - 4 bytes Total overhead - 28 bytes (PowerOpen (AIX)) a. moving 32 bit offset of mmu_helper's TOC into a GPR1 ; 8 bytes b. loading 32 bit function pointer into GPR2 ; 4 bytes c. moving GPR2 to CTR/LR ; 4 bytes d. loading 32 bit small area pointer into R2 ; 4 bytes e. (finally) branching to CTR/LR ; 4 bytes r3 setting - 4 bytes call - 24 bytes dummy jump over retaddr - 4 bytes embedded retaddr - 4 bytes Total overhead - 36 bytes Following is done to trim the code size of slow path sections: In tcg_target_qemu_prologue trampolines are emitted that look like this: trampoline: mfspr r3, LR addi r3, 4 mtspr LR, r3 ; fixup LR to point over embedded retaddr mr r3, r27 <jump mmu_helper> ; tail call of sorts And slow path becomes: slow_path: <pre process> <call trampoline> .long done <post process> b done call - 4 bytes (trampoline is within code gen buffer and most likely accessible via direct branch) embedded retaddr - 4 bytes Total overhead - 8 bytes In the end the icache pressure is decreased by 20/28 bytes at the cost of an extra jump to trampoline and adjusting LR (to skip over embedded retaddr) once inside. Signed-off-by: malc <av1474@comtv.ru>	2012-11-06 04:37:57 +04:00
malc	ed224a56b3	tcg/ppc: ld/st optimization Signed-off-by: malc <av1474@comtv.ru>	2012-11-03 20:17:54 +04:00
Yeongkyoon Lee	b76f0d8c2e	tcg: Optimize qemu_ld/st by generating slow paths at the end of a block Add optimized TCG qemu_ld/st generation which locates the code of TLB miss cases at the end of a block after generating the other IRs. Currently, this optimization supports only i386 and x86_64 hosts. Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-03 09:44:21 +00:00
Aurelien Jarno	b3a1be87ba	tcg: don't remove op if output needs to be synced to memory Commit `9c43b68de6` do not correctly check for dead outputs when they need to be synced to memory in case of half-dead operations. Fix that by applying the same pattern than for the default case. Tested-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-31 22:20:45 +01:00
Aurelien Jarno	3585317f6f	tcg/mips: use MUL instead of MULT on MIPS32 and above MIPS32 and later instruction sets have a multiplication instruction directly operating on GPRs. It only produces a 32-bit result but it is exactly what is needed by QEMU. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-30 00:34:48 +01:00
Richard Henderson	44b37ace06	tcg-i386: Use %gs prefixes for x86_64 GUEST_BASE When we allocate a reserved_va for the guest, the kernel will likely choose an address well above 4G. At which point we must use a pair of movabsq+addq to form the host address. If we have OS support, set up a segment register to point to guest_base instead. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:25 +01:00
Aurelien Jarno	b393ab4228	tcg: remove compatiblity call flags Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:24 +01:00
Aurelien Jarno	7850527966	tcg: rework TCG helper flags The current helper flags, TCG_CALL_CONST and TCG_CALL_PURE might be confusing and doesn't provide enough granularity for some helpers (FP helpers for example). This patch changes them into the following helpers flags: - TCG_CALL_NO_READ_GLOBALS means that the helper does not read globals, either directly or via an exception. They will not be saved to their canonical location before calling the helper. - TCG_CALL_NO_WRITE_GLOBALS means that the helper does not modify any globals. They will only be saved to their canonical locations before calling helpers, but they won't be reloaded afterwise. - TCG_CALL_NO_SIDE_EFFECTS means that the call to the function is removed if the return value is not used. It provides convenience flags, to avoid helper definitions longer than 80 characters. It also provides compatibility flags, and updates the documentation. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:23 +01:00
Aurelien Jarno	3d5c5f876d	tcg: synchronize globals for ops with side effects Operations with side effects (in practice qemu_ld/st ops), only need to synchronize globals to make sure the CPU state is consistent in case of exception. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	b202d41ee7	tcg: forbid ld/st function to modify globals Mapping a memory address using a global and accessing it through ld/st operations is currently broken. As it doesn't make any sense to do that performance wise, let's forbid that. Update the TCG documentation, and remove partial support for that. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	344028ba0f	tcg: fix some op flags Some branch related ops are marked with TCG_OPF_SIDE_EFFECTS, some other not. In practice they don't need to, as they are all marked with TCG_OPF_BB_END, which is handled specifically in all the code. The call op is marked as TCG_OPF_SIDE_EFFECTS, which might be not true as there is are specific flags (TCG_CALL_CONST and TCG_CALL_PURE) for specifying that. On the other hand it always clobber arguments, so mark it as such even if the call op is handled in a different code path. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	2c0366f036	tcg: don't explicitly save globals and temps The liveness analysis ensures that globals and temps are at the correct state at a basic block end or with an op with side effects. Avoid looping on all temps, this can be time consuming on targets with a lot of globals. Keep an assert in debug mode. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	7dfd8c6aa1	tcg: start with local temps in TEMP_VAL_MEM state Start with local temps in TEMP_VAL_MEM state, to make possible a later check that all the temps are correctly saved back to memory. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	a52ad07e7c	tcg: always mark dead input arguments as dead Always mark dead input arguments as dead, even if the op is at the basic block end. This will allow to check that all temps are correctly saved. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	c29c1d7edf	tcg: rewrite tcg_reg_alloc_mov() Now that the liveness analysis provides more information, rewrite tcg_reg_alloc_mov(). This changes the behaviour about propagating constants and memory accesses. We now take the assumption that once a value is loaded into a register (from memory or from a constant), it's better to keep it there than to reload it later. This assumption is now always almost correct given that we are now sure the corresponding temp is going to be used later (otherwise it would have been synchronized and marked as dead already). The assumption is wrong if one of the op after clobbers some registers including the one of the holding the temp (this can be avoided by allocating clobbered registers last, which is what most TCG target do), or in case of lack of available register. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	4c4e1ab26b	tcg: improve tcg_reg_alloc_movi() Now that the liveness analysis might mark some output temps as dead, call temp_dead() if needed. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	9c43b68de6	tcg: rework liveness analysis Rework the liveness analysis by tracking temps that need to go back to memory in addition to dead temps tracking. This allows to mark output arguments as "need sync", and to synchronize them back to memory as soon as they are not written anymore. This way even arguments mapping to globals can be marked as "dead", avoiding moves to a new register when input and outputs are aliased. In addition it means that registers are freed as soon as temps are not used anymore, instead of waiting for a basic block end or an op with side effects. This reduces register spilling especially on CPUs with few registers, and spread the mov over all the TB, increasing the performances on in-order CPUs. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	ec7a869d31	tcg: sync output arguments on liveness request Synchronize an output argument when requested by the liveness analysis. This is needed so that the temp can be declared dead later. For that, add a new op_sync_args table in which each bit tells if the corresponding output argument needs to be synchronized with the memory. Pass it to the tcg_reg_alloc_* functions, and honor this bit. We need to synchronize the argument before marking it as dead, and we have to make sure all the infos about the temp are correctly filled. At the same time change some types from unsigned int to uint16_t when passing op_dead_args. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	1ad80729be	tcg: add temp_sync() Add a new function temp_sync() to synchronize the canonical location of a temp with the value in the corresponding register, but without freeing the associated register. Rewrite temp_save() to call temp_sync() followed by temp_dead(). Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	7f6ceedf9c	tcg: add tcg_reg_sync() Add a new function tcg_reg_sync() to synchronize the canonical location of a temp with the value in the associated register, but without freeing it. Rewrite tcg_reg_free() to first call tcg_reg_sync() and then to free the register. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	639368dd68	tcg: add temp_dead() A lot of code is duplicated to mark a temporary as dead. Replace it by temp_dead(), which in addition marks the temp as saved in memory for globals and local temps, instead of doing this a posteriori in temp_save(). Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	17b914912d	tcg/i386: remove ld/st third argument register constraint On x86_64, remove the constraint on the third argument register which is not needed: - For loads the helper arguments are env, addr, mem_idx. The addr value should not be in the two first argument registers as they are used in tcg_out_tlb_load(). - For stores the helper arguments are env, addr, data, mem_idx. The addr and data values should not be in the two first argument registers as they are used in tcg_out_tlb_load(). The data value should also not be in the two first argument registers, but could be in the third argument register in which case it would be already loaded at the right location. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:15 +01:00
Aurelien Jarno	166792f7bb	tcg/i386: remove suboptimal register shifting Now that CONFIG_TCG_PASS_AREG0 has been removed, it's easier to get an optimal code for the load/store functions. First swap the two registers used in tcg_out_tlb_load() so that the address end-up in the second register instead of the first one. Adjust tcg_out_qemu_ld() and tcg_out_qemu_st() to respectively call tcg_out_qemu_ld_direct() and tcg_out_qemu_st_direct() with the correct registers. Then replace the register shifting by direct load of the arguments. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:05 +01:00

1 2 3 4 5 ...

750 Commits