mirror_qemu

Commit Graph

Author	SHA1	Message	Date
Kirill Batuzov	3c5645fab3	tcg: properly check that op's output needs to be synced to memory Fix typo introduced in `b3a1be87ba`. Reported-by: Ruslan Savchenko <ruslan.savchenko@gmail.com> Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-11-11 16:06:46 +01:00
malc	c878da3b27	tcg/ppc32: Use trampolines to trim the code size for mmu slow path accessors mmu access looks something like: <check tlb> if miss goto slow_path <fast path> done: ... ; end of the TB slow_path: <pre process> mr r3, r27 ; move areg0 to r3 ; (r3 holds the first argument for all the PPC32 ABIs) <call mmu_helper> b $+8 .long done <post process> b done On ppc32 <call mmu_helper> is: (SysV and Darwin) mmu_helper is most likely not within direct branching distance from the call site, necessitating a. moving 32 bit offset of mmu_helper into a GPR ; 8 bytes b. moving GPR to CTR/LR ; 4 bytes c. (finally) branching to CTR/LR ; 4 bytes r3 setting - 4 bytes call - 16 bytes dummy jump over retaddr - 4 bytes embedded retaddr - 4 bytes Total overhead - 28 bytes (PowerOpen (AIX)) a. moving 32 bit offset of mmu_helper's TOC into a GPR1 ; 8 bytes b. loading 32 bit function pointer into GPR2 ; 4 bytes c. moving GPR2 to CTR/LR ; 4 bytes d. loading 32 bit small area pointer into R2 ; 4 bytes e. (finally) branching to CTR/LR ; 4 bytes r3 setting - 4 bytes call - 24 bytes dummy jump over retaddr - 4 bytes embedded retaddr - 4 bytes Total overhead - 36 bytes Following is done to trim the code size of slow path sections: In tcg_target_qemu_prologue trampolines are emitted that look like this: trampoline: mfspr r3, LR addi r3, 4 mtspr LR, r3 ; fixup LR to point over embedded retaddr mr r3, r27 <jump mmu_helper> ; tail call of sorts And slow path becomes: slow_path: <pre process> <call trampoline> .long done <post process> b done call - 4 bytes (trampoline is within code gen buffer and most likely accessible via direct branch) embedded retaddr - 4 bytes Total overhead - 8 bytes In the end the icache pressure is decreased by 20/28 bytes at the cost of an extra jump to trampoline and adjusting LR (to skip over embedded retaddr) once inside. Signed-off-by: malc <av1474@comtv.ru>	2012-11-06 04:37:57 +04:00
malc	ed224a56b3	tcg/ppc: ld/st optimization Signed-off-by: malc <av1474@comtv.ru>	2012-11-03 20:17:54 +04:00
Yeongkyoon Lee	b76f0d8c2e	tcg: Optimize qemu_ld/st by generating slow paths at the end of a block Add optimized TCG qemu_ld/st generation which locates the code of TLB miss cases at the end of a block after generating the other IRs. Currently, this optimization supports only i386 and x86_64 hosts. Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-11-03 09:44:21 +00:00
Aurelien Jarno	b3a1be87ba	tcg: don't remove op if output needs to be synced to memory Commit `9c43b68de6` do not correctly check for dead outputs when they need to be synced to memory in case of half-dead operations. Fix that by applying the same pattern than for the default case. Tested-by: Stefan Weil <sw@weilnetz.de> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-31 22:20:45 +01:00
Aurelien Jarno	3585317f6f	tcg/mips: use MUL instead of MULT on MIPS32 and above MIPS32 and later instruction sets have a multiplication instruction directly operating on GPRs. It only produces a 32-bit result but it is exactly what is needed by QEMU. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-30 00:34:48 +01:00
Richard Henderson	44b37ace06	tcg-i386: Use %gs prefixes for x86_64 GUEST_BASE When we allocate a reserved_va for the guest, the kernel will likely choose an address well above 4G. At which point we must use a pair of movabsq+addq to form the host address. If we have OS support, set up a segment register to point to guest_base instead. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:25 +01:00
Aurelien Jarno	b393ab4228	tcg: remove compatiblity call flags Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:24 +01:00
Aurelien Jarno	7850527966	tcg: rework TCG helper flags The current helper flags, TCG_CALL_CONST and TCG_CALL_PURE might be confusing and doesn't provide enough granularity for some helpers (FP helpers for example). This patch changes them into the following helpers flags: - TCG_CALL_NO_READ_GLOBALS means that the helper does not read globals, either directly or via an exception. They will not be saved to their canonical location before calling the helper. - TCG_CALL_NO_WRITE_GLOBALS means that the helper does not modify any globals. They will only be saved to their canonical locations before calling helpers, but they won't be reloaded afterwise. - TCG_CALL_NO_SIDE_EFFECTS means that the call to the function is removed if the return value is not used. It provides convenience flags, to avoid helper definitions longer than 80 characters. It also provides compatibility flags, and updates the documentation. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:23 +01:00
Aurelien Jarno	3d5c5f876d	tcg: synchronize globals for ops with side effects Operations with side effects (in practice qemu_ld/st ops), only need to synchronize globals to make sure the CPU state is consistent in case of exception. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	b202d41ee7	tcg: forbid ld/st function to modify globals Mapping a memory address using a global and accessing it through ld/st operations is currently broken. As it doesn't make any sense to do that performance wise, let's forbid that. Update the TCG documentation, and remove partial support for that. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	344028ba0f	tcg: fix some op flags Some branch related ops are marked with TCG_OPF_SIDE_EFFECTS, some other not. In practice they don't need to, as they are all marked with TCG_OPF_BB_END, which is handled specifically in all the code. The call op is marked as TCG_OPF_SIDE_EFFECTS, which might be not true as there is are specific flags (TCG_CALL_CONST and TCG_CALL_PURE) for specifying that. On the other hand it always clobber arguments, so mark it as such even if the call op is handled in a different code path. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	2c0366f036	tcg: don't explicitly save globals and temps The liveness analysis ensures that globals and temps are at the correct state at a basic block end or with an op with side effects. Avoid looping on all temps, this can be time consuming on targets with a lot of globals. Keep an assert in debug mode. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	7dfd8c6aa1	tcg: start with local temps in TEMP_VAL_MEM state Start with local temps in TEMP_VAL_MEM state, to make possible a later check that all the temps are correctly saved back to memory. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	a52ad07e7c	tcg: always mark dead input arguments as dead Always mark dead input arguments as dead, even if the op is at the basic block end. This will allow to check that all temps are correctly saved. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	c29c1d7edf	tcg: rewrite tcg_reg_alloc_mov() Now that the liveness analysis provides more information, rewrite tcg_reg_alloc_mov(). This changes the behaviour about propagating constants and memory accesses. We now take the assumption that once a value is loaded into a register (from memory or from a constant), it's better to keep it there than to reload it later. This assumption is now always almost correct given that we are now sure the corresponding temp is going to be used later (otherwise it would have been synchronized and marked as dead already). The assumption is wrong if one of the op after clobbers some registers including the one of the holding the temp (this can be avoided by allocating clobbered registers last, which is what most TCG target do), or in case of lack of available register. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:22 +01:00
Aurelien Jarno	4c4e1ab26b	tcg: improve tcg_reg_alloc_movi() Now that the liveness analysis might mark some output temps as dead, call temp_dead() if needed. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	9c43b68de6	tcg: rework liveness analysis Rework the liveness analysis by tracking temps that need to go back to memory in addition to dead temps tracking. This allows to mark output arguments as "need sync", and to synchronize them back to memory as soon as they are not written anymore. This way even arguments mapping to globals can be marked as "dead", avoiding moves to a new register when input and outputs are aliased. In addition it means that registers are freed as soon as temps are not used anymore, instead of waiting for a basic block end or an op with side effects. This reduces register spilling especially on CPUs with few registers, and spread the mov over all the TB, increasing the performances on in-order CPUs. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	ec7a869d31	tcg: sync output arguments on liveness request Synchronize an output argument when requested by the liveness analysis. This is needed so that the temp can be declared dead later. For that, add a new op_sync_args table in which each bit tells if the corresponding output argument needs to be synchronized with the memory. Pass it to the tcg_reg_alloc_* functions, and honor this bit. We need to synchronize the argument before marking it as dead, and we have to make sure all the infos about the temp are correctly filled. At the same time change some types from unsigned int to uint16_t when passing op_dead_args. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	1ad80729be	tcg: add temp_sync() Add a new function temp_sync() to synchronize the canonical location of a temp with the value in the corresponding register, but without freeing the associated register. Rewrite temp_save() to call temp_sync() followed by temp_dead(). Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	7f6ceedf9c	tcg: add tcg_reg_sync() Add a new function tcg_reg_sync() to synchronize the canonical location of a temp with the value in the associated register, but without freeing it. Rewrite tcg_reg_free() to first call tcg_reg_sync() and then to free the register. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	639368dd68	tcg: add temp_dead() A lot of code is duplicated to mark a temporary as dead. Replace it by temp_dead(), which in addition marks the temp as saved in memory for globals and local temps, instead of doing this a posteriori in temp_save(). Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:21 +01:00
Aurelien Jarno	17b914912d	tcg/i386: remove ld/st third argument register constraint On x86_64, remove the constraint on the third argument register which is not needed: - For loads the helper arguments are env, addr, mem_idx. The addr value should not be in the two first argument registers as they are used in tcg_out_tlb_load(). - For stores the helper arguments are env, addr, data, mem_idx. The addr and data values should not be in the two first argument registers as they are used in tcg_out_tlb_load(). The data value should also not be in the two first argument registers, but could be in the third argument register in which case it would be already loaded at the right location. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:15 +01:00
Aurelien Jarno	166792f7bb	tcg/i386: remove suboptimal register shifting Now that CONFIG_TCG_PASS_AREG0 has been removed, it's easier to get an optimal code for the load/store functions. First swap the two registers used in tcg_out_tlb_load() so that the address end-up in the second register instead of the first one. Adjust tcg_out_qemu_ld() and tcg_out_qemu_st() to respectively call tcg_out_qemu_ld_direct() and tcg_out_qemu_st_direct() with the correct registers. Then replace the register shifting by direct load of the arguments. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-28 14:54:05 +01:00
Richard Henderson	4438c8a946	exec: Allocate code_gen_prologue from code_gen_buffer We had a hack for arm and sparc, allocating code_gen_prologue to a special section. Which, honestly does no good under certain cases. We've already got limits on code_gen_buffer_size to ensure that all TBs can use direct branches between themselves; reuse this limit to ensure the prologue is also reachable. As a bonus, we get to avoid marking a page of the main executable's data segment as executable. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-10-20 07:54:04 +00:00
Aurelien Jarno	41a05a4576	Merge branch 'linux-user-for-upstream' of git://git.linaro.org/people/rikuvoipio/qemu * 'linux-user-for-upstream' of git://git.linaro.org/people/rikuvoipio/qemu: linux-user: register align p{read, write}64 linux-user: ppc: mark as long long aligned tcg: Remove TCG_TARGET_HAS_GUEST_BASE define configure: Remove unnecessary host_guest_base code linux-user: If loading fails, print error as string, not number linux-user: Fix siginfo handling alpha-linux-user: Fix sigaltstack structure definition linux-user: Implement gethostname linux-user: Perform more checks on iovec lists linux-user: fix multi-threaded /proc/self/maps linux-user: fix statfs	2012-10-19 20:28:22 +02:00
Richard Henderson	1414968a6a	tcg: Optimize mulu2 Like add2, do operand ordering, constant folding, and dead operand elimination. The latter happens about 15% of all mulu2 during an x86_64 bios boot. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:39 +02:00
Richard Henderson	1305c451e6	tcg: Optimize half-dead add2/sub2 When x86_64 guest is not in 64-bit mode, the high-part of the 64-bit add is dead. When the host is 32-bit, we can simplify to 32-bit arithmetic. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:37 +02:00
Richard Henderson	212c328d61	tcg: Constant fold add2 and sub2 Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:37 +02:00
Richard Henderson	6c4382f8f4	tcg: Do constant folding on double-word comparisons Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:35 +02:00
Richard Henderson	9519da7e39	tcg: Split out subroutines from do_constant_folding_cond We can re-use these for implementing double-word folding. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:51:32 +02:00
Richard Henderson	bc1473eff4	tcg: Optimize double-word comparisons against zero Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:32:29 +02:00
Richard Henderson	6e14e91b66	tcg: Use common code when failing to optimize This saves a whole lot of repetitive code sequences. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:32:01 +02:00
Richard Henderson	0bfcb86538	tcg: Swap commutative double-word comparisons Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:31:57 +02:00
Richard Henderson	1e484e61e2	tcg: Canonicalize add2 operand ordering Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:31:53 +02:00
Richard Henderson	24c9ae4eba	tcg: Split out swap_commutative as a subroutine Reduces code duplication and prefers movcond d, c1, c2, const, s to movcond d, c1, c2, s, const It also prefers add r, r, c over add r, c, r when both inputs are known constants. This doesn't matter for true add, as we will fully constant fold that. But it matters for a follow-on patch using this routine for add2 which may not be fully foldable. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 17:30:40 +02:00
Richard Henderson	c7d4475a70	tcg-ia64: Implement deposit Note that in the general reg=reg,reg case we're restricted to 16-bit insertions. This makes it easy to allow "any" constant as input, as post-truncation it will fit into the constant load insn for which we have room in the bundle. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 01:26:43 +02:00
Aurelien Jarno	63975ea7df	tcg/ia64: slightly optimize TLB access code It is possible to slightly optimize the TLB access code, by replacing the movi + and instructions by a deposit instruction. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 01:26:43 +02:00
Aurelien Jarno	2174d1e1ff	tcg/ia64: remove suboptimal register shifting in qemu_ld/st ops Remove suboptimal register shifting in qemu_ld/st ops, introduced at the CONFIG_TCG_PASS_AREG0 time. As mem_idx is now loaded in register R58/R59 for the slow path, we have to make sure to do it last, to not add additional register constraints. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 01:26:43 +02:00
Aurelien Jarno	b90cf71692	tcg/ia64: implement movcond_i32/64 Implement movcond_i32/64 on ia64 hosts. It is not possible to have immediate compare arguments without adding a new bundle, but it is possible to have 22-bit immediate value arguments. Reviewed-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 01:26:42 +02:00
Blue Swirl	da897bf5ae	tcg/ia64: use stack for TCG temps Use stack instead of temp_buf array in CPUState for TCG temps. Signed-off-by: Blue Swirl <blauwirbel@gmail.com> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 01:26:42 +02:00
Peter Maydell	4a1d241e3c	tcg/arm: Implement movcond_i32 Implement movcond_i32 for ARM, as the sequence mov dst, v2 (implicitly done by the tcg common code) cmp c1, c2 movCC dst, v1 Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 01:22:49 +02:00
Peter Maydell	7fc645bf7a	tcg/arm: Factor out code to emit immediate or reg-reg op The code to emit either an immediate cmp or a register cmp insn is duplicated in several places; factor it out into its own function. Signed-off-by: Peter Maydell <peter.maydell@linaro.org> Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>	2012-10-17 01:22:48 +02:00
Richard Henderson	203342d8dc	tcg-sparc: Emit MOVR insns for setcond_i64 and movcond_64 Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-10-13 10:39:53 +00:00
Richard Henderson	ab1339b9b4	tcg-sparc: Emit BPr insns for brcond_i64 Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-10-13 10:39:53 +00:00
Richard Henderson	a115f3ea47	tcg-sparc: Drop use of Bicc in favor of BPcc Now that we're always sparcv9, we can not bother using Bicc for 32-bit branches and BPcc for 64-bit branches and instead always use BPcc. New interfaces allow less direct use of tcg_out32 and raw numbers inside the qemu_ld/st routines. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-10-13 10:39:53 +00:00
Richard Henderson	fd84ea2391	tcg-sparc: Optimize setcond2 equality compare with 0. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-10-13 10:39:53 +00:00
Richard Henderson	89269f6cea	tcg-sparc: Use Z constraint for %g0 Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-10-13 10:39:53 +00:00
Richard Henderson	4ec28e255f	tcg-sparc: Fix add2/sub2 We must care not to clobber the high parts before we consume them. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-10-13 10:39:53 +00:00
Richard Henderson	7d458a7567	tcg-sparc: Fix setcond The set of comparisons that can immediately use the carry are LTU/GEU, not LTU/LEU. Don't swap operands when we need a temp register; the register may already be in use from setcond2. Signed-off-by: Richard Henderson <rth@twiddle.net> Signed-off-by: Blue Swirl <blauwirbel@gmail.com>	2012-10-13 10:39:53 +00:00

1 2 3 4 5 ...

724 Commits (ce34cf72fe508b27a78f83c184142e8d1e6a048a)