Commit Graph

1231 Commits (f05d9999f42209a19c6bd82b09b47df8ca4f565c)

Author SHA1 Message Date
Richard Henderson 8642088a3d tcg-ia64: Use ADDS for small addition
Avoids a wasted cycle loading up small constants.

Simplify the code assuming the tcg optimizer is going to work
and don't expect the first operand of the add to be constant.

Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-11-18 15:57:23 +10:00
Richard Henderson 3c289cba9b tcg-ia64: Avoid unnecessary stop bit in tcg_out_alu
When performing an operation with two input registers, we'd leave
the stop bit (and thus an extra cycle) that's only needed when one
or the other input is a constant.

Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-11-18 15:57:16 +10:00
Richard Henderson d15de15ca0 tcg-ia64: Move AREG0 to R32
Since the move away from the global areg0, we're no longer globally
reserving areg0.  Which means our use of R7 clobbers a call-saved
register.  Shift areg0 into the windowed registers.  Indeed, choose
the incoming parameter register that it comes to us by.

This requires moving the register holding the return address elsewhere.
Choose R33 for tidiness.

Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-11-18 15:57:08 +10:00
Richard Henderson 6d264b38fc tcg-ia64: Simplify brcond
There was a misconception that a stop bit is required between a compare
and the branch that uses the predicate set by the compare.  This lead to
the usage of an extra bundle in which to perform the compare.  The extra
bundle left room for constants to be loaded for use with the compare insn.

If we pack the compare and the branch together in the same bundle, then
there's no longer any room for non-zero constants.  At which point we
can eliminate half the function by not handling them.

Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-11-18 15:56:42 +10:00
Richard Henderson 6f65c780b9 tcg-ia64: Handle constant calls
Using only indirect calls results in 3 bundles (one to load the
descriptor address), and 4 stop bits.  By looking through the
descriptor to the constants, we can perform the call with 2
bundles and only 1 stop bit.

Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-11-18 15:56:30 +10:00
Richard Henderson 5f7b16877a tcg-ia64: Use shortcuts for nop insns
There's no need to go through the full opcode-to-insn function call
to generate nops.  This makes the source a bit more readable.

Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-11-18 15:56:25 +10:00
Richard Henderson e3afa1c4ad tcg-ia64: Use TCGMemOp within qemu_ldst routines
Acked-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-11-18 15:56:12 +10:00
Richard Henderson 1768ec0623 tcg-ppc64: Support new ldst opcodes
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:20 -07:00
Richard Henderson 5dd391604f tcg-ppc: Support new ldst opcodes
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:20 -07:00
Richard Henderson e349a8d4ff tcg-ppc64: Convert to le/be ldst helpers
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:20 -07:00
Richard Henderson 92d0acda27 tcg-ppc: Convert to le/be ldst helpers
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:20 -07:00
Richard Henderson a058557381 tcg-ppc64: Use TCGMemOp within qemu_ldst routines
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:20 -07:00
Richard Henderson f1a16dcdd5 tcg-ppc: Use TCGMemOp within qemu_ldst routines
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:20 -07:00
Richard Henderson 091d567771 tcg-arm: Improve GUEST_BASE qemu_ld/st
If we pull the code to emit the actual load/store into a subroutine,
we can share the reg+reg addressing mode code between softmmu and
usermode.  This lets us load GUEST_BASE into a temporary register
rather than attempting to add it piece-wise to the address.

Which lets us use movw+movt for armv7, rather than (up to) 4 adds.
Code size for pre-armv7 stays the same.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:20 -07:00
Richard Henderson 15ecf6e394 tcg-arm: Convert to new ldst opcodes
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:20 -07:00
Richard Henderson a485cff09c tcg-arm: Tidy variable naming convention in qemu_ld/st
s/addr_reg2/addrhi/
s/addr_reg/addrlo/
s/data_reg2/datahi/
s/data_reg/datalo/

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:20 -07:00
Richard Henderson 0315c51ea9 tcg-arm: Convert to le/be ldst helpers
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:19 -07:00
Richard Henderson 099fcf2e36 tcg-arm: Use TCGMemOp within qemu_ldst routines
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:19 -07:00
Richard Henderson 8221a267fd tcg-i386: Support new ldst opcodes
No support for helpers with non-default endianness yet,
but good enough to test the opcodes.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:19 -07:00
Richard Henderson b3e2bc500f tcg-i386: Remove "cb" output restriction from qemu_st8 for i386
Once we form a combined qemu_st_i32 opcode, we won't be able to
have separate constraints based on size.  This one is fairly easy
to work around, since eax is available as a scratch register.

When storing variable data, this tends to merely exchange one mov
for another.  E.g.

-:  mov    %esi,%ecx
...
-:  mov    %cl,(%edx)
+:  mov    %esi,%eax
+:  mov    %al,(%edx)

Where we do have a regression is when storing constant data, in which
we may load the constant into edi, when only ecx/ebx ought to be used.

The proper way to recover this regression is to allow constants as
arguments to qemu_st_i32, so that we never load the constant data into
a register at all, must less the wrong register.  TBD.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:19 -07:00
Richard Henderson 7352ee546c tcg-i386: Tidy softmmu routines
Pass two TCGReg to tcg_out_tlb_load, rather than idx+args.

Move ldst_optimization routines just below tcg_out_tlb_load to avoid
the need for forward declarations.

Use TCGReg enum in preference to int where apprpriate.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:19 -07:00
Richard Henderson 37c5d0d5d1 tcg-i386: Use TCGMemOp within qemu_ldst routines
Step one in the transition, with constants passed down from tcg_out_op.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:19 -07:00
Richard Henderson d257e0d7ae tcg: Use TCGMemOp for TCGLabelQemuLdst.opc
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-12 16:19:19 -07:00
Richard Henderson 867b3201a3 exec: Add both big- and little-endian memory helpers
Step three in the transition: helpers not tied to the target
"default" endianness.  To be used when the guest uses a memory
operation with non-default endianness.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 13:19:21 -07:00
Richard Henderson f713d6ad7b tcg: Add qemu_ld_st_i32/64
Step two in the transition, adding the new ldst opcodes.  Keep the old
opcodes around until all backends support the new opcodes.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 13:19:21 -07:00
Richard Henderson 6c5f4ead64 tcg: Add TCGMemOp
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 12:20:59 -07:00
Richard Henderson 9ecefc84dd tcg: Add tcg-be-ldst.h
Move TCGLabelQemuLdst and related stuff out of tcg.h.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 11:44:26 -07:00
Richard Henderson 3cf246f0d4 tcg: Add tcg-be-null.h
This is a no-op backend data implementation, for those targets that
are not currently using the load/store optimization path.

This is prepatory to always requiring these functions in all backends.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 11:44:26 -07:00
Richard Henderson 023261ef85 tcg-aarch64: Update to helper_ret_*_mmu routines
A minimal update to use the new helpers with the return address argument.

Tested-by: Claudio Fontana <claudio.fontana@linaro.org>
Reviewed-by: Claudio Fontana <claudio.fontana@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 11:44:25 -07:00
Richard Henderson 84fd9dd3f7 tcg: Merge tcg_register_helper into tcg_context_init
Eliminates the repeated checks for having created
the s->helpers hash table.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 11:44:25 -07:00
Richard Henderson 4953ee6271 tcg: Add tcg-runtime.c helpers to all_helpers
For the few targets that actually use these, we'd not report
them symbolicly in the tcg opcode logs.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 11:44:25 -07:00
Richard Henderson 100b5e0170 tcg: Put target helper data into an array.
One call inside of a loop to tcg_register_helper instead of hundreds
of sequential calls.

Presumably more icache and branch prediction friendly; resulting binary
size mostly unchanged on x86_64, as we're trading 32-bit rip-relative
references in .text for full 64-bit pointers in .rodata.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 11:44:25 -07:00
Richard Henderson 5cd8f6210f tcg: Move helper registration into tcg_context_init
No longer needs to be done on a per-target basis.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 11:43:37 -07:00
Richard Henderson 6e085f72c6 tcg: Use a GHashTable for tcg_find_helper
Slightly changes the interface, in that we now return name
instead of a TCGHelperInfo structure, which goes away.

Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 11:41:36 -07:00
Richard Henderson 7c57df0d85 tcg: Delete tcg_helper_get_name declaration
The function was deleted in 4dc81f2822.

Reviewed-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 11:41:15 -07:00
Richard Henderson 802b508123 tcg-hppa: Remove tcg backend
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-10 11:31:06 -07:00
Anthony Liguori 576e81be39 Merge remote-tracking branch 'rth/tcg-arm-pull' into staging
# By Richard Henderson
# Via Richard Henderson
* rth/tcg-arm-pull:
  tcg-arm: Move the tlb addend load earlier
  tcg-arm: Remove restriction on qemu_ld output register
  tcg-arm: Return register containing tlb addend
  tcg-arm: Move load of tlb addend into tcg_out_tlb_read
  tcg-arm: Use QEMU_BUILD_BUG_ON to verify constraints on tlb
  tcg-arm: Use strd for tcg_out_arg_reg64
  tcg-arm: Rearrange slow-path qemu_ld/st
  tcg-arm: Use ldrd/strd for appropriate qemu_ld/st64

Message-id: 1380663109-14434-1-git-send-email-rth@twiddle.net
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
2013-10-09 07:52:57 -07:00
Anthony Liguori ce079abb41 Merge remote-tracking branch 'sweil/tci' into staging
# By Stefan Weil
# Via Stefan Weil
* sweil/tci:
  misc: Use new rotate functions
  bitops: Add rotate functions (rol8, ror8, ...)
  tci: Add implementation of rotl_i64, rotr_i64

Message-id: 1380137693-3729-1-git-send-email-sw@weilnetz.de
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
2013-10-09 07:51:23 -07:00
Richard Henderson ee06e23051 tcg-arm: Move the tlb addend load earlier
There are free scheduling slots between the sequence of
comparison instructions.  This requires changing the
register in use to avoid conflict with those compares.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-01 10:20:33 -07:00
Richard Henderson 66c2056fb8 tcg-arm: Remove restriction on qemu_ld output register
The main intent of the patch is to allow the tlb addend register
to be changed, without tying that change to the constraint.  But
the most common side-effect seems to be to enable usage of ldrd
with the r0,r1 pair.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-01 10:20:33 -07:00
Richard Henderson d3e440bef2 tcg-arm: Return register containing tlb addend
Preparatory to rescheduling the tlb load, and changing said register.
Continues to use R1 for now.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-01 10:20:33 -07:00
Richard Henderson d0ebde2284 tcg-arm: Move load of tlb addend into tcg_out_tlb_read
This allows us to make more intelligent decisions about the relative
offsets of the tlb comparator and the addend, avoiding any need of
writeback addressing.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-01 10:20:33 -07:00
Richard Henderson f248873637 tcg-arm: Use QEMU_BUILD_BUG_ON to verify constraints on tlb
One of the two constraints we already checked via #if, but
the tlb offset distance was only checked at runtime.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-01 10:20:33 -07:00
Richard Henderson e5e2e4a74b tcg-arm: Use strd for tcg_out_arg_reg64
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-01 10:20:33 -07:00
Richard Henderson d9f4dde4a6 tcg-arm: Rearrange slow-path qemu_ld/st
Use the new helper_ret_*_mmu routines.  Use a conditional call
to arrange for a tail-call from the store path, and to load the
return address for the helper for the load path.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-01 10:20:33 -07:00
Richard Henderson 23bbc25085 tcg-arm: Use ldrd/strd for appropriate qemu_ld/st64
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-10-01 10:20:33 -07:00
Stefan Weil 3df2b8fde9 misc: Use new rotate functions
Signed-off-by: Stefan Weil <sw@weilnetz.de>
2013-09-25 21:23:05 +02:00
Stefan Weil d285bf784b tci: Add implementation of rotl_i64, rotr_i64
It is used by qemu-ppc64 when running Debian's busybox-static.

Cc: qemu-stable <qemu-stable@nongnu.org>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2013-09-25 21:22:00 +02:00
Richard Henderson 7f12d6497f tcg-ppc64: Implement CONFIG_QEMU_LDST_OPTIMIZATION
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:33 -07:00
Richard Henderson c7ca6a2b75 tcg-ppc64: Add _noaddr functions for emitting forward branches
... rather than open-coding this stuff through the file.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:32 -07:00
Richard Henderson fedee3e7fd tcg-ppc64: Streamline tcg_out_tlb_read
Less conditional compilation.  Merge an add insn with the indexed
memory load insn.  Load the tlb addend earlier.  Avoid the address
update memory form.

Fix a bug in not allowing large enough tlb offsets for some guests.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:32 -07:00
Richard Henderson fa94c3be7a tcg-ppc64: Implement tcg_register_jit
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:32 -07:00
Richard Henderson b18d5d2b80 tcg-ppc64: Handle long offsets better
Previously we'd only handle 16-bit offsets from memory operand without falling
back to indexed, but it's easy to use ADDIS to handle full 32-bit offsets.

This also lets us unify code that existed inline in tcg_out_op for handling
addition of large constants.

The new R2 temporary was marked reserved for the AIX calling convention, but
the register really is call-clobbered and since tcg generated code has no use
for a TOC, it's available for use.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:32 -07:00
Richard Henderson 5e1702b074 tcg-ppc64: Tidy register allocation order
Remove conditionalization from tcg_target_reg_alloc_order, relying on
reserved_regs to prevent register allocation that shouldn't happen.
So R11 is now present in reg_alloc_order for __APPLE__, but also now
reserved.

Sort reg_alloc_order into call-saved, call-clobbered, and parameters.
This reduces the effect of values getting spilled and reloaded before
function calls.

Whether or not it is reserved, R2 (TOC) is always call-clobbered.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:32 -07:00
Richard Henderson b0940da012 tcg-ppc64: Look through a constant function descriptor
Especially in the user-only configurations, a direct branch into
the executable may be in range.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:32 -07:00
Richard Henderson d40f3cb112 tcg-ppc64: Fold constant call address into descriptor load
Eliminates one insn per call:

 :  lis     r2,4165
-:  ori     r2,r2,59616
-:  ld      r0,0(r2)
+:  ld      r0,-5920(r2)
 :  mtctr   r0
-:  ld      r2,8(r2)
+:  ld      r2,-5912(r2)
 :  bctrl

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:32 -07:00
Richard Henderson ad94e1a9db tcg-ppc64: Don't load the static chain from TCG
There are no helpers that require the static chain.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:32 -07:00
Richard Henderson f8b8412907 tcg-ppc64: Avoid code for nop move
While these are rare from code that's been through the optimizer,
it's not uncommon within the tcg backend.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:32 -07:00
Richard Henderson 5e0f40cfed tcg-ppc64: Use tcg_out64
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:32 -07:00
Richard Henderson 8327a470df tcg-ppc64: Use TCG_REG_Rn constants
Instead of bare N, for clarity.  The only (intentional) exception made
is for insns that encode R|0, i.e. when R0 encoded into the insn is
interpreted as zero not the contents of the register.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:32 -07:00
Richard Henderson 29b6919869 tcg-ppc64: More use of TAI and SAI helper macros
Finish conversion of all memory operations.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:32 -07:00
Richard Henderson 541dd4ceaa tcg-ppc64: Reformat tcg-target.c
Whitespace and brace changes only.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:31 -07:00
Richard Henderson 8f50c841b3 tcg-ppc: Fix and cleanup tcg_out_tlb_check
The fix is that sparc has so many mmu modes that the last one overflowed
the 16-bit signed offset we assumed would fit.  Handle this, and check
the new assumption at compile time.

Load the tlb addend earlier for the fast path.

Remove the explicit address + addend and make use of index addressing.

Adjust constraints for qemu_ld64 such that we don't clobber the address
register or tlb addend before loading both values.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:31 -07:00
Richard Henderson 5b1c985b7e tcg-ppc: Use conditional branch and link to slow path
Saves one insn per slow path.  Note that we can no longer use
a tail call into the store helper.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:31 -07:00
Richard Henderson 1d10cf9886 tcg-ppc: Cleanup tcg_out_qemu_ld/st_slow_path
Coding style fixes.  Use TCGReg enumeration values instead of raw
numbers.  Don't needlessly pull the whole TCGLabelQemuLdst struct
into local variables.  Less conditional compilation.

No functional changes.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:31 -07:00
Richard Henderson 4b2b114d8c tcg-ppc: Avoid code for nop move
While these are rare from code that's been through the optimizer,
it's not uncommon within the tcg backend.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:46:31 -07:00
Paolo Bonzini 619f90ba62 tcg-ppc: use new return-argument ld/st helpers
These use a 32-bit load-of-immediate to save a mflr+addi+mtlr sequence.
Tested with a Windows 98 guest (pretty much the most recent thing I
could run on my PPC machine) and kvm-unit-tests's sieve.flat.  The
speed up for sieve.flat is as high as 10% for qemu-system-i386, 25%
(no kidding) for qemu-system-x86_64 on my PowerBook G4.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:45:39 -07:00
Paolo Bonzini 6a11557988 tcg-ppc: fix qemu_ld/qemu_st for AIX ABI
For the AIX ABI, the function pointer and small area pointer need
to be loaded in the trampoline.  The trampoline instead is called
with a normal BL instruction.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-25 07:45:30 -07:00
Richard Henderson 387e417666 tcg-sparc: Fix parenthesis warning
error: suggest parentheses around comparison in operand of ‘&’ [-Werror=parentheses]

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2013-09-20 20:09:24 +04:00
Anthony Liguori 5a93d5c2ab Merge remote-tracking branch 'mjt/trivial-patches' into staging
# By Stefan Weil (6) and others
# Via Michael Tokarev
* mjt/trivial-patches:
  aio / timers: use g_usleep() not sleep()
  adlib: sort offsets in portio registration
  qmp: fix integer usage in examples
  tci: Remove function tcg_out64 (fix broken build)
  target-arm: Report unimplemented opcodes (LOG_UNIMP)
  pflash_cfi02.c: fix debug macro
  configure: Remove unneeded redirections of stderr (pkg-config --exists)
  configure: Remove unneeded redirections of stderr (pkg-config --cflags, --libs)
  configure: Don't write .pyc files by default (python -B)
  curl: qemu_bh_new() can never return NULL
  slirp/arp_table.c: Avoid shifting into sign bit of signed integers
  configure: disable clang -Wstring-plus-int warning
  rdma: silly ipv6 bugfix
  misc: Fix some typos in names and comments
  slirp: Port redirection option behave differently on Linux and Windows

Message-id: 1378119695-14568-1-git-send-email-mjt@msgid.tls.msk.ru
Signed-off-by: Anthony Liguori <anthony@codemonkey.ws>
2013-09-03 12:31:44 -05:00
Aurelien Jarno 545825d4cd Merge branch 'tcg-next' of git://github.com/rth7680/qemu
* 'tcg-next' of git://github.com/rth7680/qemu: (29 commits)
  tcg-i386: Make use of zero-extended memory helper routines
  tcg: Introduce zero and sign-extended versions of load helpers
  exec: Split softmmu_defs.h
  target: Include softmmu_exec.h where forgotten
  exec: Rename USUFFIX to LSUFFIX
  tcg-i386: Don't perform GETPC adjustment in TCG code
  exec: Reorganize the GETRA/GETPC macros
  configure: Allow x32 as a host
  tcg-i386: Adjust tcg_out_tlb_load for x32
  tcg-i386: Use intptr_t appropriately
  tcg: Fix jit debug for x32
  tcg: Use appropriate types in tcg_reg_alloc_call
  tcg: Change tcg_out_ld/st offset to intptr_t
  tcg: Change tcg_gen_exit_tb argument to uintptr_t
  tcg: Use uintptr_t in TCGHelperInfo
  tcg: Change relocation offsets to intptr_t
  tcg: Change memory offsets to intptr_t
  tcg: Change frame pointer offsets to intptr_t
  tcg: Define TCG_ptr properly
  tcg: Define TCG_TYPE_PTR properly
  ...
2013-09-03 01:35:43 +02:00
Aurelien Jarno 3207bf2549 tcg/mips: only enable ext8s/ext16s ops on MIPS32R2
On MIPS ext8s and ext16s ops are implemented with a dedicated
instruction only on MIPS32R2, otherwise the same kind of implementation
than at TCG level (shift left followed by shift right) is used.

Change that by only implementing the ext8s and ext16s ops on MIPS32R2 so
that optimizations can be done by the optimizer. Use an inline version to
avoid having to test again for MIPS32R2 instructions. Keep the shift
implementation for the ld/st routines.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-09-03 01:34:46 +02:00
Aurelien Jarno df81ff51d5 tcg/mips: inline bswap16/bswap32 ops
Use an inline version for the bswap16 and bswap32 ops to avoid
testing for MIPS32R2 instructions availability, as these ops are
only available in that case.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-09-03 01:34:46 +02:00
Aurelien Jarno 988902fc3b tcg/mips: detect available host instructions at runtime
Now that TCG supports enabling and disabling ops at runtime, it's
possible to detect the available host instructions at runtime, and
enable the corresponding ops accordingly.

Unfortunately it's not easy to probe for available instructions on
MIPS, the information is partially available in /proc/cpuinfo, and
not available in AUXV. This patch therefore probes for the instructions
by trying to execute them and by catching a possible SIGILL signal.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-09-03 01:34:46 +02:00
Richard Henderson 6fb5874590 tcg-i386: Make use of zero-extended memory helper routines
For 8 and 16-bit unsigned loads, rely on the zero-extension
from the helper and use a smaller 32-bit move insn.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:31 -07:00
Richard Henderson c8f94df593 tcg: Introduce zero and sign-extended versions of load helpers
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:31 -07:00
Richard Henderson e58eb53413 exec: Split softmmu_defs.h
The _cmmu helpers can be moved to exec-all.h.  The helpers that are
used from TCG will shortly need access to tcg_target_long so move
their declarations into tcg.h.

This requires minor include adjustments to all TCG backends.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:30 -07:00
Richard Henderson 5bcebc253c tcg-i386: Don't perform GETPC adjustment in TCG code
Since we now perform it inside the helper, no need to do it here.
This also lets us perform a tail-call from the store slow path to
the helper.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:30 -07:00
Richard Henderson d5dad3be31 tcg-i386: Adjust tcg_out_tlb_load for x32
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:30 -07:00
Richard Henderson 357e3d8a29 tcg-i386: Use intptr_t appropriately
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:30 -07:00
Richard Henderson edee2579ae tcg: Fix jit debug for x32
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:30 -07:00
Richard Henderson d3452f1f40 tcg: Use appropriate types in tcg_reg_alloc_call
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:30 -07:00
Richard Henderson a05b5b9be0 tcg: Change tcg_out_ld/st offset to intptr_t
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:30 -07:00
Richard Henderson 8cfd04959a tcg: Change tcg_gen_exit_tb argument to uintptr_t
And update all users.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:30 -07:00
Richard Henderson 48bc6bab47 tcg: Use uintptr_t in TCGHelperInfo
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson 2ba7fae29e tcg: Change relocation offsets to intptr_t
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson 2f2f244d02 tcg: Change memory offsets to intptr_t
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson e2c6d1b42d tcg: Change frame pointer offsets to intptr_t
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson 8b73d49f53 tcg: Define TCG_ptr properly
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson d289837eef tcg: Define TCG_TYPE_PTR properly
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson 78cd7b835e tcg: Allow TCG_TARGET_REG_BITS to be specified independantly
There are several hosts for which it would be useful to use the
available 64-bit registers in a 32-bit pointer environment.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson 04d5a1da70 tcg: Change tcg_qemu_tb_exec return to uintptr_t
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson b93949ef6a tcg: Change flush_icache_range arguments to uintptr_t
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson 01547f7f92 tcg: Constant fold div, rem
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson 32f5717f07 tcg-ppc64: Implement muluh, mulsh
Using these instead of mulu2 and muls2 lets us avoid having to argument
overlap analysis in the backend.  Normal register allocation will DTRT.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson 3c9a8f1756 tcg-mips: Implement mulsh, muluh
With the optimization in tcg_liveness_analysis,
we can avoid the MFLO when it is unused.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Richard Henderson 03271524b6 tcg: Add muluh and mulsh opcodes
Use them in places where mulu2 and muls2 are used.
Optimize mulx2 with dead low part to mulxh.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-09-02 09:08:29 -07:00
Stefan Weil a32b12741b tci: Remove function tcg_out64 (fix broken build)
Commit ac26eb69a3 added tcg_out64 to tcg/tcg.c.
tcg/tci/tcg-target.c already had a nearly identical implementation which is
now removed to fix a compiler error.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2013-09-01 19:36:16 +04:00
Richard Henderson 401c227b0a tcg-i386: Use new return-argument ld/st helpers
Discontinue the jump-around-jump-to-jump scheme, trading it for a single
immediate move instruction.  The two extra jumps always consume 7 bytes,
whereas the immediate move is either 5 or 7 bytes depending on where the
code_gen_buffer gets located.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-08-26 13:31:54 -07:00
Richard Henderson c6f29ff096 tcg-i386: Tidy qemu_ld/st slow path
Use existing stack space for arguments; don't push/pop.
Use less ifdefs and more C ifs.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-08-26 13:31:53 -07:00
Richard Henderson 8023ccda07 tcg-i386: Try pc-relative lea for constant formation
Use a 7 byte lea before the ultimate 10 byte movq.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-08-26 13:31:53 -07:00
Richard Henderson ac26eb69a3 tcg-i386: Add and use tcg_out64
No point in splitting the write into 32-bit pieces.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-08-26 13:31:53 -07:00
Richard Henderson 2bb8656dad tcg: Tidy generated code for tcg_outN
Aliasing was forcing s->code_ptr to be re-read after the store.
Keep the pointer in a local variable to help the compiler.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-08-26 13:31:53 -07:00
James Hogan 85711e6baf tcg/mips: fix invalid op definition errors
tcg/mips/tcg-target.h defines various operations conditionally depending
upon the isa revision, however these operations are included in
mips_op_defs[] unconditionally resulting in the following runtime errors
if CONFIG_DEBUG_TCG is defined:

Invalid op definition for movcond_i32
Invalid op definition for rotl_i32
Invalid op definition for rotr_i32
Invalid op definition for deposit_i32
Invalid op definition for bswap16_i32
Invalid op definition for bswap32_i32
tcg/tcg.c:1196: tcg fatal error

Fix with ifdefs like the i386 backend does for movcond_i32.

Signed-off-by: James Hogan <james.hogan@imgtec.com>
Cc: Aurelien Jarno <aurelien@aurel32.net>
Cc: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-08-08 23:06:02 +02:00
Stefan Weil 5fe0d351b3 tci: Fix broken build (compiler warning caused by redefined macro BIT)
The definition of macro BIT in tci/tcg-target.c now conflicts with the
definition of the same macro in includes qemu/bitops.h.

This conflict was triggered by a recent change in the include chain of
tcg.c (probably commit 949fc82314).

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Message-id: 1375216883-23969-1-git-send-email-sw@weilnetz.de
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-07-30 18:48:21 -05:00
Richard Henderson f290e4988d Merge git://github.com/hw-claudio/qemu-aarch64-queue into tcg-next 2013-07-15 13:21:10 -07:00
Jani Kokkonen c6d8ed24b4 tcg/aarch64: Implement tlb lookup fast path
Supports CONFIG_QEMU_LDST_OPTIMIZATION

Signed-off-by: Jani Kokkonen <jani.kokkonen@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Claudio Fontana <claudio.fontana@huawei.com>
2013-07-15 13:13:46 +02:00
Richard Henderson 0caa91fe1f tcg-arm: Implement tcg_register_jit
Allows unwinding past the code_gen_buffer.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:15:25 -07:00
Richard Henderson b5cc476da7 tcg-i386: Use QEMU_BUILD_BUG_ON instead of assert for frame size
We can check the condition at compile time, rather than run time.

Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:15:25 -07:00
Richard Henderson 497a22eb87 tcg: Move the CIE and FDE header definitions to common code
These will necessarily be the same layout for all hosts.  This limits
the amount of boilerplate required to implement jit debug for a host.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:15:24 -07:00
Richard Henderson 45aba097d2 tcg: Fix high_pc fields in .debug_info
I don't think the debugger actually looks at this for anything,
using the correct .debug_frame contents, but might as well get
it all correct.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:15:24 -07:00
Richard Henderson 1e709f3833 tcg-arm: Use AT_PLATFORM to detect the host ISA
With this we can generate armv7 insns even when the OS compiles for a
lower common denominator.  The macros are arranged so that when we do
compile for a given ISA, all of the runtime checks for that ISA are
optimized away.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:15:13 -07:00
Richard Henderson cb91021a47 tcg-arm: Simplify logic in detecting the ARM ISA in use
GCC 4.8 defines a handy __ARM_ARCH symbol that we can use, which
will make us nicely forward compatible with ARMv8 AArch32.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:15:02 -07:00
Richard Henderson fb82273851 tcg-arm: Rename use_armv5_instructions to use_armvt5_instructions
As it really controls the availability of a thumb interworking
instruction on armv5t.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:14:51 -07:00
Richard Henderson 72e1ccfc0c tcg-arm: Make use of conditional availability of opcodes for divide
We can now detect and use divide instructions at runtime, rather than
having to restrict their availability to compile-time.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:14:35 -07:00
Richard Henderson c1a61f6c85 tcg: Simplify logic using TCG_OPF_NOT_PRESENT
Expand the definition of "not present" to include "should not be present".
This means we can simplify the logic surrounding the generic tcg opcodes
for which the host backend ought not be providing definitions.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:14:35 -07:00
Richard Henderson 4ef76952bd tcg: Allow non-constant control macros
This allows TCG_TARGET_HAS_* to be a variable rather than a constant,
which allows easier support for differing ISA levels for the host.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:14:35 -07:00
Richard Henderson 5b9f72ab59 tcg-ppc64: Don't implement rem
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:14:34 -07:00
Richard Henderson 865a4671f9 tcg-ppc: Don't implement rem
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:14:34 -07:00
Richard Henderson 5e1108b370 tcg-arm: Don't implement rem
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:14:34 -07:00
Richard Henderson ca675f46e6 tcg: Split rem requirement from div requirement
There are several hosts with only a "div" insn.  Remainder is computed
manually from the quotient and inputs.  We can do this generically.

Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-07-09 07:14:09 -07:00
Claudio Fontana b1f6dc0d2a tcg/aarch64: implement ldst 12bit scaled uimm offset
implement the 12bit scaled unsigned immediate offset
variant of LDR/STR. This improves code size by avoiding
the movi + ldst_r for naturally aligned offsets in range.

Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
2013-07-03 14:43:11 +02:00
Anton Blanchard d1bdd3af49 tcg-ppc64: rotr_i32 rotates wrong amount
rotr_i32 calculates the amount to left shift and puts it into a
temporary, but then doesn't use it when doing the shift.

Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-06-17 10:42:16 -07:00
Anton Blanchard 8424735710 tcg-ppc64: Fix add2_i64
add2_i64 was adding the lower double word to the upper double word
of each input. Fix this so we add the lower double words, then the
upper double words with carry propagation.

Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-06-17 10:42:16 -07:00
Anton Blanchard 82e0f9170a tcg-ppc64: bswap64 rotates output 32 bits
If our input and output is in the same register, bswap64 tries to
undo a rotate of the input. This just ends up rotating the output.

Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-06-17 10:42:16 -07:00
Anton Blanchard 8a94cfb05e tcg-ppc64: Fix RLDCL opcode
The rldcl instruction doesn't have an sh field, so the minor opcode
is shifted 1 bit. We were using the XO30 macro which shifted the
minor opcode 2 bits.

Remove XO30 and add MD30 and MDS30 macros which match the
Power ISA categories.

Cc: qemu-stable@nongnu.org
Signed-off-by: Anton Blanchard <anton@samba.org>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-06-17 10:41:52 -07:00
Anthony Liguori 86a6a07745 Merge remote-tracking branch 'pmaydell/tcg-aarch64.next' into staging
# By Claudio Fontana (9) and others
# Via Peter Maydell
* pmaydell/tcg-aarch64.next:
  MAINTAINERS: add tcg/aarch64 maintainer
  configure: permit compilation on arm aarch64
  tcg/aarch64: implement user mode qemu ld/st
  user-exec.c: aarch64 initial implementation of cpu_signal_handler
  tcg/aarch64: implement sign/zero extend operations
  tcg/aarch64: implement byte swap operations
  tcg/aarch64: implement AND/TEST immediate pattern
  tcg/aarch64: improve arith shifted regs operations
  tcg/aarch64: implement new TCG target for aarch64
  include/elf.h: add aarch64 ELF machine and relocs
  configure: Drop CONFIG_ATFILE test
  linux-user: Drop direct use of openat etc syscalls
  linux-user: Allow getdents to be provided by getdents64

Message-id: 1371052645-9006-1-git-send-email-peter.maydell@linaro.org
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-06-14 07:51:05 -05:00
Jani Kokkonen 6a91c7c978 tcg/aarch64: implement user mode qemu ld/st
also put aarch64 in the list of archs that do not need an ldscript.

Signed-off-by: Jani Kokkoken <jani.kokkonen@huawei.com>
Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 51AF40EE.1000104@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2013-06-12 16:20:23 +01:00
Claudio Fontana 31f1275b90 tcg/aarch64: implement sign/zero extend operations
implement the optional sign/zero extend operations with the dedicated
aarch64 instructions.

Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 51AC9A58.40502@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2013-06-12 16:20:23 +01:00
Claudio Fontana 9c4a059df3 tcg/aarch64: implement byte swap operations
implement the optional byte swap operations with the dedicated
aarch64 instructions.

Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 51AC9A33.9050003@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2013-06-12 16:20:23 +01:00
Claudio Fontana 7deea126b2 tcg/aarch64: implement AND/TEST immediate pattern
add functions to AND/TEST registers with immediate patterns.

Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 51AC9A0C.3090303@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2013-06-12 16:20:22 +01:00
Claudio Fontana 36fac14a64 tcg/aarch64: improve arith shifted regs operations
for arith operations, add SUBS, ANDS, ADDS and add a shift parameter
so that all arith instructions can make use of shifted registers.

Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Message-id: 51AC998B.7070506@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2013-06-12 16:20:22 +01:00
Claudio Fontana 4a136e0a6b tcg/aarch64: implement new TCG target for aarch64
add preliminary support for TCG target aarch64.

Signed-off-by: Claudio Fontana <claudio.fontana@huawei.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Message-id: 51A5C596.3090108@huawei.com
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2013-06-12 16:20:22 +01:00
Richard Henderson 56bbc2f967 tcg: Remove redundant tcg_target_init checks
We've got a compile-time check for the condition in exec/cpu-defs.h.

Reviewed-by: Andreas Färber <afaerber@suse.de>
Reviewed-by: liguang <lig.fnst@cn.fujitsu.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-06-05 05:54:40 -07:00
Aurelien Jarno 66e61b55f1 tcg/optimize: fix setcond2 optimization
When setcond2 is rewritten into setcond, the state of the destination
temp should be reset, so that a copy of the previous value is not
used instead of the result.

Reported-by: Michael Tokarev <mjt@tls.msk.ru>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-05-09 16:14:58 +02:00
Richard Henderson c9e53a4cf1 tcg-arm: Use movi32 in exit_tb
Avoid the mini constant pool for armv7, and avoid replicating
the test for pre-v7.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-05-03 11:53:30 +02:00
Richard Henderson 8ddaeb1be6 tcg-arm: Fix 64-bit tlb load for pre-v6
Found by inspection, since the effect of the bug was simply to
send all memory ops through the slow path.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-05-03 11:53:29 +02:00
Richard Henderson 96fbd7de36 tcg-arm: Remove long jump from tcg_out_goto_label
Branches within a TB will always be within 16MB.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:45 +02:00
Richard Henderson df5e0ef711 tcg-arm: Convert to CONFIG_QEMU_LDST_OPTIMIZATION
Move the slow path out of line, as the TODO's mention.
This allows the fast path to be unconditional, which can
speed up the fast path as well, depending on the core.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:45 +02:00
Richard Henderson 302fdde73f tcg-arm: Use movi32 + blx for calls on v7
Work better with branch predition when we have movw+movt,
as the size of the code is the same.  Perhaps re-evaluate
when we have a proper constant pool.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:45 +02:00
Richard Henderson 595b5397cc tcg-arm: Delete the 'S' constraint
After the previous patch, 's' and 'S' are the same.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:45 +02:00
Richard Henderson 702b33b1d5 tcg-arm: Improve scheduling of tcg_out_tlb_read
The schedule was fully serial, with no possibility for dual issue.
The old schedule had a minimal issue of 7 cycles; the new schedule
has a minimal issue of 5 cycles.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:45 +02:00
Richard Henderson cee87be80a tcg-arm: Split out tcg_out_tlb_read
Share code between qemu_ld and qemu_st to process the tlb.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:45 +02:00
Richard Henderson 9feac1d770 tcg-arm: Cleanup most primitive load store subroutines
Use even more primitive helper functions to avoid lots of duplicated code.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:44 +02:00
Richard Henderson 34358a12c8 tcg-arm: Cleanup multiply subroutines
Make the code more readable by only having one copy of the magic
numbers, swapping registers as needed prior to that.  Speed the
compiler by not applying the rd == rn avoidance for v6 or later.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:44 +02:00
Richard Henderson 13dd6fb962 tcg-arm: Use R12 for the tcg temporary
R12 is call clobbered, while R8 is call saved.  This change
gives tcg one more call saved register for real data.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:44 +02:00
Richard Henderson 4346457a47 tcg-arm: Use TCG_REG_TMP name for the tcg temporary
Don't hard-code R8.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:44 +02:00
Richard Henderson 0637c56c99 tcg-arm: Implement division instructions
An armv7 extension implements division, present on Cortex A15.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:44 +02:00
Richard Henderson b6b24cb031 tcg-arm: Implement deposit for armv7
We have BFI and BFC available for implementing it.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:44 +02:00
Richard Henderson e86e0f2807 tcg-arm: Improve constant generation
Try fully rotated arguments to mov and mvn before trying movt
or full decomposition.  Begin decomposition with mvn when it
looks like it'll help.  Examples include

-:        mov   r9, #0x00000fa0
-:        orr   r9, r9, #0x000ee000
-:        orr   r9, r9, #0x0ff00000
-:        orr   r9, r9, #0xf0000000
+:        mvn   r9, #0x0000005f
+:        eor   r9, r9, #0x00011000

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:43 +02:00
Richard Henderson 2df3f1ee68 tcg-arm: Handle constant arguments to add2/sub2
We get to re-use the _rIN and _rIK subroutines to handle the various
combinations of add vs sub.  Fold the << 21 into the opcode enum values
so that we can explicitly add TO_CPSR as desired.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:43 +02:00
Richard Henderson 5d53b4c93c tcg-arm: Use tcg_out_dat_rIN for compares
This allows us to emit CMN instructions.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:43 +02:00
Richard Henderson d9fda57549 tcg-arm: Allow constant first argument to sub
This allows the generation of RSB instructions.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:43 +02:00
Richard Henderson a9a86ae95d tcg-arm: Handle negated constant arguments to and/sub
This greatly improves code generation for addition of small
negative constants.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:43 +02:00
Richard Henderson 19b62bf414 tcg-arm: Use bic to implement and with constant
This greatly improves the code we can produce for deposit
without armv7 support.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:16:42 +02:00
Richard Henderson d6b64b2b60 tcg: Log the contents of the prologue with -d out_asm
This makes it easier to verify changes to the code
generating the prologue.

[Aurelien: change the format from %i to %zu]

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 02:15:55 +02:00
Richard Henderson fc4d60ee16 tcg-arm: Fix local stack frame
We were not allocating TCG_STATIC_CALL_ARGS_SIZE, so this meant that
any helper with more than 4 arguments would clobber the saved regs.
Realizing that we're supposed to have this memory pre-allocated means
we can clean up the tcg_out_arg functions, which were trying to do
more stack allocation.

Allocate stack memory for the TCG temporaries while we're at it.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-27 01:19:20 +02:00
Aurelien Jarno ed605126a8 tcg: fix deposit_i64 op on 32-bit targets
On 32-bit TCG targets, when emulating deposit_i64 with a mov_i32 +
deposit_i32, care should be taken to not overwrite the low part of
the second argument before the deposit when it is the same the
destination.

This fixes the shld instruction in qemu-system-x86_64, which in turns
fixes booting "system rescue CD version 2.8.0" on this target.

Reported-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-04-27 01:10:18 +02:00
Richard Henderson 39dc85b985 tcg-ppc64: Handle deposit of zero
The TCG optimizer does great work when inserting constants, being able
to fold the open-coded deposit expansion to just an AND or an OR.  Avoid
a bit the regression caused by having the deposit opcode by expanding
deposit of zero as an AND.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:55 +02:00
Richard Henderson 6645c147db tcg-ppc64: Implement mulu2/muls2_i64
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:54 +02:00
Richard Henderson 6c858762de tcg-ppc64: Implement add2/sub2_i64
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:54 +02:00
Richard Henderson 1e6e9aca15 tcg-ppc64: Use getauxval for ISA detection
Glibc 2.16 includes an easy way to get feature bits previously
buried in /proc or the program startup auxiliary vector.  Use it.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:54 +02:00
Richard Henderson 027ffea972 tcg-ppc64: Implement movcond
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:54 +02:00
Richard Henderson 70fac59a2a tcg-ppc64: Use ISEL for setcond
There are a few simple special cases that should be handled first.
Break these out to subroutines to avoid code duplication.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:53 +02:00
Richard Henderson 6995a4a063 tcg-ppc64: Use MFOCRF instead of MFCR
It takes half the cycles to read one CR register instead of all 8.
This is a backward compatible addition to the ISA, so chips prior
to Power 2.00 spec will simply continue to read the entire CR register.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:53 +02:00
Richard Henderson 991041a4eb tcg-ppc64: Cleanup i32 constants to tcg_out_cmp
Nothing else in the call chain ensures that these
constants don't have garbage in the high bits.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:53 +02:00
Richard Henderson 4c314da6d1 tcg-ppc64: Use TCGType throughout compares
The optimization/bug being fixed is that tcg_out_cmp was not applying the
right type to loading a constant, in the case it can't be implemented
directly.  Rather than recomputing the TCGType enum from the arch64 bool,
pass around the original TCGType throughout.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:52 +02:00
Richard Henderson ef809300fc tcg-ppc64: Use I constraint for mul
The mul_i32 pattern was loading non-16-bit constants into a register,
when we can get the middle-end to do that for us.  The mul_i64 pattern
was not considering that MULLI takes 64-bit inputs.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:52 +02:00
Richard Henderson 33de9ed223 tcg-ppc64: Implement deposit
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:52 +02:00
Richard Henderson 37251b98db tcg-ppc64: Handle constant inputs for some compound logicals
Since we have special code to handle and/or/xor with a constant,
apply the same to andc/orc/eqv with a constant.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:51 +02:00
Richard Henderson ce1010d6e3 tcg-ppc64: Implement compound logicals
Mostly copied from the ppc32 port.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:51 +02:00
Richard Henderson 68aebd45b1 tcg-ppc64: Implement bswap64
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:51 +02:00
Richard Henderson 5d22158200 tcg-ppc64: Implement bswap16 and bswap32
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 20:09:44 +02:00
Richard Henderson 313d91c778 tcg-ppc64: Implement rotates
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:55:38 +02:00
Richard Henderson 49d9870a54 tcg-ppc64: Streamline qemu_ld/st insn selection
Using a table to look up insns of the right width and sign.
Include support for the Power 2.06 LDBRX and STDBRX insns.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:55:35 +02:00
Richard Henderson 28f2dba6dc tcg-ppc64: Use automatic implementation of ext32u_i64
The enhancements to and immediate obviate this.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:55:31 +02:00
Richard Henderson 637af30c76 tcg-ppc64: Improve and_i64 with constant
Use RLDICL and RLDICR.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:55:27 +02:00
Richard Henderson a9249dff4d tcg-ppc64: Improve and_i32 with constant
Use RLWINM

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:55:27 +02:00
Richard Henderson dce74c57bb tcg-ppc64: Tidy or and xor patterns.
Handle constants in common code; we'll want to reuse that later.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:55:26 +02:00
Richard Henderson 148bdd2373 tcg-ppc64: Allow constant first argument to sub
Using SUBFIC for 16-bit signed constants.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:55:22 +02:00
Richard Henderson ee924fa6b3 tcg-ppc64: Improve constant add and sub ops.
Improve constant addition -- previously we'd emit useless addi with 0.
Use new constraints to force the driver to pull full 64-bit constants
into a register.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:55:15 +02:00
Richard Henderson 3d582c6179 tcg-ppc64: Rearrange integer constant constraints
We'll need a zero, and Z makes more sense for that.  Make sure we
have a full compliment of signed and unsigned 16 and 32-bit tests.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:52:05 +02:00
Richard Henderson 421233a146 tcg-ppc64: Cleanup tcg_out_movi
The test for using movi32 was sub-optimal for TCG_TYPE_I32, comparing
a signed 32-bit quantity against an unsigned 32-bit quantity.

When possible, use addi+oris for 32-bit unsigned constants.  Otherwise,
standardize on addi+oris+ori instead of addis+ori+rldicl.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:52:04 +02:00
Richard Henderson 752c1fdb6d tcg-ppc64: Fix setcond_i32
We weren't ignoring the high 32 bits during a NE comparison.

Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:51:50 +02:00
Richard Henderson 2fd8eddcab tcg-ppc64: Introduce and use TAI and SAI
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:44:48 +02:00
Richard Henderson 5e916c287e tcg-ppc64: Introduce and use tcg_out_shri64
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:44:46 +02:00
Richard Henderson 0a9564b964 tcg-ppc64: Introduce and use tcg_out_shli64
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:44:44 +02:00
Richard Henderson 6e5e06024f tcg-ppc64: Introduce and use tcg_out_ext32u
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:44:41 +02:00
Richard Henderson 9e555b735c tcg-ppc64: Introduce and use tcg_out_rlw
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:44:39 +02:00
Richard Henderson aceac8d685 tcg-ppc64: Use TCGReg everywhere
Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-15 19:44:37 +02:00
Aurelien Jarno 0a9c2341de Merge branch 'tci' of git://qemu.weilnetz.de/qemu
* 'tci' of git://qemu.weilnetz.de/qemu:
  tci: Make tcg temporaries local to tcg_qemu_tb_exec
  tci: Delete unused tb_ret_addr
  tci: Avoid code before declarations
  tci: Use a local variable for env
  tci: Use 32-bit signed offsets to loads/stores
2013-04-13 13:50:06 +02:00
Richard Henderson ee79c356ff tci: Make tcg temporaries local to tcg_qemu_tb_exec
We're moving away from the temporaries stored in env.  Make sure we can
differentiate between temp stores and possibly bogus stores for extra
call arguments.  Move TCG_AREG0 and TCG_REG_CALL_STACK out of the way
of the parameter passing registers.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off by: Stefan Weil <sw@weilnetz.de>
2013-04-11 19:58:21 +02:00
Richard Henderson 4699ca6dbf tci: Delete unused tb_ret_addr
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off by: Stefan Weil <sw@weilnetz.de>
2013-04-11 19:58:21 +02:00
Richard Henderson 03fc0548b7 tci: Use 32-bit signed offsets to loads/stores
Since the change to tcg_exit_req, the first insn of every TB is
a load with a negative offset from env.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off by: Stefan Weil <sw@weilnetz.de>
2013-04-11 19:58:21 +02:00
Richard Henderson b879f30846 tcg-s390: Fix merge error in tgen_brcond
When the TCG condition codes were re-organized last year,
we failed to update all of the "old-style" tests for unsigned.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-05 13:35:42 -05:00
Richard Henderson 78c9f7c5b0 tcg-s390: Use all 20 bits of the offset in tcg_out_mem
This can save one insn, if the constant has any bits in 32-63 set,
but no bits in 21-31 set.  It never results in more insns.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-05 13:35:41 -05:00
Richard Henderson 0db921e6d8 tcg-s390: Use load-address for addition
Since we're always in 64-bit mode, load address performs a full
64-bit add.  Use that for 3-address addition, as well as for
larger constant addends when we lack extended-immediates facility.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-05 13:35:41 -05:00
Richard Henderson 65a62a753c tcg-s390: Cleanup argument shuffling fixme in softmmu code
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-05 13:35:41 -05:00
Richard Henderson f0bffc2730 tcg-s390: Use risbgz for andi
This is immediately usable by the tlb lookup code.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-05 13:35:41 -05:00
Richard Henderson 07ff798313 tcg-s390: Remove constraint letters for and
Since we have a free temporary and can always just load the constant, we
ought to do so, rather than spending the same effort constraining the const.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-05 13:35:41 -05:00
Richard Henderson d5690ea433 tcg-s390: Implement deposit opcodes
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-05 13:35:40 -05:00
Richard Henderson 96a9f093f8 tcg-s390: Implement movcond opcodes
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-05 13:35:40 -05:00
Richard Henderson 36017dc68a tcg-s390: Implement mulu2_i64 opcode
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-05 13:35:40 -05:00
Richard Henderson 3790b9180a tcg-s390: Implement add2/sub2 opcodes
Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-05 13:35:40 -05:00
Richard Henderson a01fc30da4 tcg-s390: Remove useless preprocessor conditions
We only support 64-bit code generation for s390x.
Don't clutter the code with ifdefs that suggest otherwise.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-05 13:35:40 -05:00
Richard Henderson a4924e8bb5 tcg-s390: Properly allocate a stack frame.
Set TCG_TARGET_CALL_STACK_OFFSET properly for the abi.  Allocate the
standard TCG_STATIC_CALL_ARGS_SIZE.  And while we're at it, allocate
space for CPU_TEMP_BUF_NLONGS.

Signed-off-by: Richard Henderson <rth@twiddle.net>
2013-04-05 13:35:40 -05:00
Richard Henderson a22971f99f tcg-s390: Fix movi
The code to load the high 64 bits assumed that the insn used to
load the low 64 bits zero-extended.  Enforce that.
2013-04-05 13:35:39 -05:00
Aurelien Jarno 174d4d215f tcg/mips: Implement muls2_i32
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2013-04-01 18:49:17 +02:00
Richard Henderson 2d497542e1 tcg-optimize: Fold sub r,0,x to neg r,x
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-03-23 14:31:03 +00:00
陳韋任 (Wei-Ren Chen) 294e4669a5 Use proper term in TCG README
In TCG, "target" means the host architecture for which TCG generates
the code. Using "guest" rather than "target" to make the document more
consistent.

Signed-off-by: Chen Wei-Ren <chenwj@iis.sinica.edu.tw>
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-03-22 15:55:03 +01:00
Peter Maydell 378df4b237 Handle CPU interrupts by inline checking of a flag
Fix some of the nasty TCG race conditions and crashes by implementing
cpu_exit() as setting a flag which is checked at the start of each TB.
This avoids crashes if a thread or signal handler calls cpu_exit()
while the execution thread is itself modifying the TB graph (which
may happen in system emulation mode as well as in linux-user mode
with a multithreaded guest binary).

This fixes the crashes seen in LP:668799; however there are another
class of crashes described in LP:1098729 which stem from the fact
that in linux-user with a multithreaded guest all threads will
use and modify the same global TCG date structures (including the
generated code buffer) without any kind of locking. This means that
multithreaded guest binaries are still in the "unsupported"
category.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-03-03 14:28:47 +00:00
Peter Maydell 0980011b4f tcg: Document tcg_qemu_tb_exec() and provide constants for low bit uses
Document tcg_qemu_tb_exec(). In particular, its return value is a
combination of a pointer to the next translation block and some
extra information in the low two bits. Provide some #defines for
the values passed in these bits to improve code clarity.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-03-03 14:28:19 +00:00
Blue Swirl 07ca08bac8 tcg-sparc: fix build
Fix build breakage by 803d805bcef4ea7b7d6ef0b4929263e1160d6b3c:
make tcg_out_addsub2() always available.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-03-03 08:25:50 +00:00
Peter Maydell 989b697ddd qemu-log: default to stderr for logging output
Switch the default for qemu_log logging output from "/tmp/qemu.log"
to stderr. This is an incompatible change in some sense, but logging
is mostly used for debugging purposes so it shouldn't affect production
use. The previous behaviour can be obtained by adding "-D /tmp/qemu.log"
to the command line.

This change requires us to:
 * update all the documentation/help text (we take the opportunity
   to smooth out minor inconsistencies between the phrasing in
   linux-user/bsd-user/system help messages)
 * make linux-user and bsd-user defer to qemu-log for the default
   logging destination rather than overriding it themselves
 * ensure that all logfile closing is done via qemu_log_close()
   and that that function doesn't close stderr
as well as the obvious change to the behaviour of do_qemu_set_log()
when no logfile name has been specified.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Message-id: 1361901160-28729-1-git-send-email-peter.maydell@linaro.org
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-02-26 13:31:47 -06:00
Richard Henderson f1fae40c61 tcg: Apply life analysis to 64-bit multiword arithmetic ops
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:29 +00:00
Richard Henderson f402f38f43 tcg: Implement muls2 with mulu2
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:29 +00:00
Richard Henderson d693e14733 tcg-arm: Implement muls2_i32
We even had the encoding of smull already handy...

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:29 +00:00
Richard Henderson 624988a53b tcg-i386: Implement multiword arithmetic ops
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:29 +00:00
Richard Henderson f6953a7399 tcg: Implement multiword addition helpers
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson 696a8be6a0 tcg: Implement multiword multiply helpers
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson 3c51a98507 tcg: Implement a 64-bit to 32-bit extraction helper
We're going to have use for this shortly in implementing other helpers.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson 4d3203fd0b tcg: Add signed multiword multiplication operations
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson d7156f7ce4 tcg: Add 64-bit multiword arithmetic operations
Matching the 32-bit multiword arithmetic that we already have.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson 803d805bce tcg-sparc: Always implement 32-bit multiword ops
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson bbc863bfec tcg-i386: Always implement 32-bit multiword ops
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Richard Henderson e6a7273454 tcg: Make 32-bit multiword operations optional for 64-bit hosts
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-23 17:25:28 +00:00
Andreas Färber be96bd3fbf tcg/ppc: Fix build of tcg_qemu_tb_exec()
Commit 0b0d3320db (TCG: Final globals
clean-up) moved code_gen_prologue but forgot to update ppc code.
This broke the build on 32-bit ppc. ppc64 is unaffected.

Cc: Evgeny Voevodin <evgenyvoevodin@gmail.com>
Cc: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Andreas Färber <andreas.faerber@web.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-17 14:27:36 +00:00
Peter Maydell 24537a0191 qemu-log: Rename the public-facing cpu_set_log function to qemu_set_log
Rename the public-facing function cpu_set_log to qemu_set_log. This
requires us to rename the internal-only qemu_set_log() to
do_qemu_set_log().

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-16 10:44:44 +00:00
Evgeny Voevodin 5e5f07e08f TCG: Move translation block variables to new context inside tcg_ctx: tb_ctx
It's worth to clean-up translation blocks variables and move them
into one context as was suggested by Swirl.
Also if we use this context directly inside tcg_ctx, then it
speeds up code generation a bit.

Signed-off-by: Evgeny Voevodin <evgenyvoevodin@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-16 10:41:16 +00:00
Evgeny Voevodin 0b0d3320db TCG: Final globals clean-up
Signed-off-by: Evgeny Voevodin <evgenyvoevodin@gmail.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-02-16 10:40:56 +00:00
Peter Maydell 5256a7208a tcg/target-arm: Add missing parens to assertions
Silence a (legitimate) complaint about missing parentheses:

tcg/arm/tcg-target.c: In function ‘tcg_out_qemu_ld’:
tcg/arm/tcg-target.c:1148:5: error: suggest parentheses around
comparison in operand of ‘&’ [-Werror=parentheses]
tcg/arm/tcg-target.c: In function ‘tcg_out_qemu_st’:
tcg/arm/tcg-target.c:1357:5: error: suggest parentheses around
comparison in operand of ‘&’ [-Werror=parentheses]

which meant that we would mistakenly always assert if running
a QEMU built with debug enabled on ARM.

Signed-off-by: Peter Maydell <peter.maydelL@linaro.org>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:27:45 +00:00
Paolo Bonzini 633f650254 optimize: optimize using nonzero bits
This adds two optimizations using the non-zero bit mask.  In some cases
involving shifts or ANDs the value can become zero, and can thus be
optimized to a move of zero.  Second, useless zero-extension or an
AND with constant can be detected that would only zero bits that are
already zero.

The main advantage of this optimization is that it turns zero-extensions
into moves, thus enabling much better copy propagation (around 1% code
reduction).  Here is for example a "test $0xff0000,%ecx + je" before
optimization:

 mov_i64 tmp0,rcx
 movi_i64 tmp1,$0xff0000
 discard cc_src
 and_i64 cc_dst,tmp0,tmp1
 movi_i32 cc_op,$0x1c
 ext32u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,eq,$0x0

and after (without patch on the left, with on the right):

 movi_i64 tmp1,$0xff0000                 movi_i64 tmp1,$0xff0000
 discard cc_src                          discard cc_src
 and_i64 cc_dst,rcx,tmp1                 and_i64 cc_dst,rcx,tmp1
 movi_i32 cc_op,$0x1c                    movi_i32 cc_op,$0x1c
 ext32u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0                     movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,eq,$0x0           brcond_i64 cc_dst,tmp12,eq,$0x0

Other similar cases: "test %eax, %eax + jne" where eax is already 32-bit
(after optimization, without patch on the left, with on the right):

 discard cc_src                          discard cc_src
 mov_i64 cc_dst,rax                      mov_i64 cc_dst,rax
 movi_i32 cc_op,$0x1c                    movi_i32 cc_op,$0x1c
 ext32u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0                     movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,ne,$0x0           brcond_i64 rax,tmp12,ne,$0x0

"test $0x1, %dl + je":

 movi_i64 tmp1,$0x1                      movi_i64 tmp1,$0x1
 discard cc_src                          discard cc_src
 and_i64 cc_dst,rdx,tmp1                 and_i64 cc_dst,rdx,tmp1
 movi_i32 cc_op,$0x1a                    movi_i32 cc_op,$0x1a
 ext8u_i64 tmp0,cc_dst
 movi_i64 tmp12,$0x0                     movi_i64 tmp12,$0x0
 brcond_i64 tmp0,tmp12,eq,$0x0           brcond_i64 cc_dst,tmp12,eq,$0x0

In some cases TCG even outsmarts GCC. :)  Here the input code has
"and $0x2,%eax + movslq %eax,%rbx + test %rbx, %rbx" and the optimizer,
thanks to copy propagation, does the following:

 movi_i64 tmp12,$0x2                     movi_i64 tmp12,$0x2
 and_i64 rax,rax,tmp12                   and_i64 rax,rax,tmp12
 mov_i64 cc_dst,rax                      mov_i64 cc_dst,rax
 ext32s_i64 tmp0,rax                  -> nop
 mov_i64 rbx,tmp0                     -> mov_i64 rbx,cc_dst
 and_i64 cc_dst,rbx,rbx               -> nop

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:13:16 +00:00
Paolo Bonzini 3a9d8b179b optimize: track nonzero bits of registers
Add a "mask" field to the tcg_temp_info struct.  A bit that is zero
in "mask" will always be zero in the corresponding temporary.
Zero bits in the mask can be produced from moves of immediates,
zero-extensions, ANDs with constants, shifts; they can then be
be propagated by logical operations, shifts, sign-extensions,
negations, deposit operations, and conditional moves.  Other
operations will just reset the mask to all-ones, i.e. unknown.

[rth: s/target_ulong/tcg_target_ulong/]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:13:14 +00:00
Paolo Bonzini d193a14a2c optimize: only write to state when clearing optimizer data
The next patch will add to the TCG optimizer a field that should be
non-zero in the default case.  Thus, replace the memset of the
temps array with a loop.  Only the state field has to be up-to-date,
because others are not used except if the state is TCG_TEMP_COPY
or TCG_TEMP_CONST.

[rth: Extracted the loop to a function.]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-19 10:13:13 +00:00
Paolo Bonzini 163fa4b09d tcg-i386: use LEA for 3-operand 64-bit addition
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2013-01-12 12:45:56 +00:00
Stefan Weil 9a8a5ae69d tcg: Remove unneeded assertion
Commit 7f6f0ae5b9 added two assertions.

One of these assertions is not needed:
The pointer ts is never NULL because it is initialized with the
address of an array element.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2013-01-02 11:23:21 -06:00
Richard Henderson 753d99d38b tcg-hppa: Fix typo in brcond2
Reported-by: Stuart Brady <sdb@zubnet.me.uk>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-12-29 12:21:53 +00:00
Richard Henderson 76a347e1cd tcg-i386: Perform cmov detection at runtime for 32-bit.
Existing compile-time detection is spotty at best.  Convert
it all to runtime detection instead.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-12-29 12:21:16 +00:00
Richard Henderson afcb92beac tcg: Add TCGV_IS_UNUSED_*
Cc: Aurelien Jarno <aurelien@aurel32.net>
Signed-off-by: Richard Henderson <rth@twiddle.net>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-12-29 12:14:07 +00:00
Paolo Bonzini 1de7afc984 misc: move include files to include/qemu/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:32:39 +01:00
Paolo Bonzini 022c62cbbc exec: move include files to include/exec/
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Paolo Bonzini cb9c377f54 janitor: add guards to headers
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2012-12-19 08:31:31 +01:00
Anthony Liguori 7c12fd9b29 Merge remote-tracking branch 'stefanha/trivial-patches' into staging
* stefanha/trivial-patches:
  pc_sysfw: Plug memory leak on pc_fw_add_pflash_drv() error path
  qemu-options: Fix space at EOL
  Fix spelling in comments and documentation
  Clean up pci_drive_hot_add()'s use of BlockInterfaceType
  arm: a9mpcore: remove un-used ptimer_iomem field
  target-sparc: Remove t0, t1 from CPUSPARCState
  target-m68k: Remove t1 from CPUM68KState
  target-alpha: Remove t0, t1 from CPUAlphaState
  s390x: Spelling fixes (endianess -> endianness, occured -> occurred)
  Fix comments (adress -> address, layed -> laid, wierd -> weird)
  Fix spelling (prefered -> preferred)
  configure: Remove stray debug output
  sd: Send debug printfery to stderr not stdout

Conflicts:
	configure

Resolve spelling conflict in configure.

Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
2012-12-10 08:34:29 -06:00
Evgeny Voevodin c3a43607d9 tcg/tcg.h: Duplicate global TCG gen_opc_ arrays into TCGContext.
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-12-08 14:24:41 +00:00
Stefan Weil a93cf9dfba Fix comments (adress -> address, layed -> laid, wierd -> weird)
Remove also a duplicated 'the'.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2012-12-07 12:34:11 +01:00
Aurelien Jarno e5138db510 tcg: mark local temps as MEM in dead_temp()
In dead_temp, local temps should always be marked as back to memory,
even if they have not been allocated (i.e. they are discared before
cross a basic block).

It fixes the following assertion in target-xtensa:

    qemu-system-xtensa: tcg/tcg.c:1665: temp_save: Assertion `s->temps[temp].val_type == 2 || s->temps[temp].fixed_reg' failed.
    Aborted

Reported-by: Max Filippov <jcmvbkbc@gmail.com>
Tested-by: Max Filippov <jcmvbkbc@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-11-24 13:24:13 +01:00
Aurelien Jarno 7aab08aa78 tcg/arm: fix cross-endian qemu_st16
The bswap16 TCG opcode assumes that the high bytes of the temp equal
to 0 before calling it. The ARM backend implementation takes this
assumption to slightly optimize the generated code.

The same implementation is called for implementing the cross-endian
qemu_st16 opcode, where this assumption is not true anymore. One way to
fix that would be to zero the high bytes before calling it. Given the
store instruction just ignore them, it is possible to provide a slightly
more optimized version. With ARMv6+ the rev16 instruction does the work
correctly. For lower ARM versions the patch provides a version which
behaves correctly with non-zero high bytes, but fill them with junk.

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
Reviewed-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-11-24 13:19:53 +01:00
Aurelien Jarno d17bd1d8cc tcg/arm: fix TLB access in qemu-ld/st ops
The TCG arm backend considers likely that the offset to the TLB
entries does not exceed 12 bits for mem_index = 0. In practice this is
not true for at least the MIPS target.

The current patch fixes that by loading the bits 23-12 with a separate
instruction, and using loads with address writeback, independently of
the value of mem_idx. In total this allow a 24-bit offset, which is a
lot more than needed.

Cc: Andrzej Zaborowski <balrogg@gmail.com>
Cc: Peter Maydell <peter.maydell@linaro.org>
Cc: qemu-stable@nongnu.org
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-11-24 13:19:53 +01:00
malc ecf51c9abe tcg/ppc: Fix !softmmu case
Signed-off-by: malc <av1474@comtv.ru>
2012-11-21 10:56:22 +04:00
malc ecdffbccd7 tcg/ppc: Remove unused s_bits variable
Thanks to Alexander Graf for heads up.

Signed-off-by: malc <av1474@comtv.ru>
2012-11-19 22:22:24 +04:00
Stefan Weil e24dc9feb0 tci: Support deposit operations
The operations for INDEX_op_deposit_i32 and INDEX_op_deposit_i64
are now supported and enabled by default.

Signed-off-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-18 20:40:08 +00:00
Evgeny Voevodin 83eeb39669 TCG: Remove unused global variables
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:38 +00:00
Evgeny Voevodin 1ff0a2c594 TCG: Use gen_opparam_buf from context instead of global variable.
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:37 +00:00
Evgeny Voevodin 92414b31e7 TCG: Use gen_opc_buf from context instead of global variable.
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:36 +00:00
Evgeny Voevodin c4afe5c4d3 TCG: Use gen_opparam_ptr from context instead of global variable.
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:34 +00:00
Evgeny Voevodin efd7f48600 TCG: Use gen_opc_ptr from context instead of global variable.
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:27 +00:00
Evgeny Voevodin 8232a46a16 tcg/tcg.h: Duplicate global TCG variables in TCGContext
Signed-off-by: Evgeny Voevodin <e.voevodin@samsung.com>
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-17 13:53:26 +00:00
Kirill Batuzov 3c5645fab3 tcg: properly check that op's output needs to be synced to memory
Fix typo introduced in b3a1be87ba.

Reported-by: Ruslan Savchenko <ruslan.savchenko@gmail.com>
Signed-off-by: Kirill Batuzov <batuzovk@ispras.ru>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-11-11 16:06:46 +01:00
malc c878da3b27 tcg/ppc32: Use trampolines to trim the code size for mmu slow path accessors
mmu access looks something like:

<check tlb>
if miss goto slow_path
<fast path>
done:
...

; end of the TB
slow_path:
 <pre process>
 mr r3, r27         ; move areg0 to r3
                    ; (r3 holds the first argument for all the PPC32 ABIs)
 <call mmu_helper>
 b $+8
 .long done
 <post process>
 b done

On ppc32 <call mmu_helper> is:

(SysV and Darwin)

mmu_helper is most likely not within direct branching distance from
the call site, necessitating

a. moving 32 bit offset of mmu_helper into a GPR ; 8 bytes
b. moving GPR to CTR/LR                          ; 4 bytes
c. (finally) branching to CTR/LR                 ; 4 bytes

r3 setting              - 4 bytes
call                    - 16 bytes
dummy jump over retaddr - 4 bytes
embedded retaddr        - 4 bytes
         Total overhead - 28 bytes

(PowerOpen (AIX))
a. moving 32 bit offset of mmu_helper's TOC into a GPR1 ; 8 bytes
b. loading 32 bit function pointer into GPR2            ; 4 bytes
c. moving GPR2 to CTR/LR                                ; 4 bytes
d. loading 32 bit small area pointer into R2            ; 4 bytes
e. (finally) branching to CTR/LR                        ; 4 bytes

r3 setting              - 4 bytes
call                    - 24 bytes
dummy jump over retaddr - 4 bytes
embedded retaddr        - 4 bytes
         Total overhead - 36 bytes

Following is done to trim the code size of slow path sections:

In tcg_target_qemu_prologue trampolines are emitted that look like this:

trampoline:
mfspr r3, LR
addi  r3, 4
mtspr LR, r3      ; fixup LR to point over embedded retaddr
mr    r3, r27
<jump mmu_helper> ; tail call of sorts

And slow path becomes:

slow_path:
 <pre process>
 <call trampoline>
 .long done
 <post process>
 b done

call                    - 4 bytes (trampoline is within code gen buffer
                                   and most likely accessible via
                                   direct branch)
embedded retaddr        - 4 bytes
         Total overhead - 8 bytes

In the end the icache pressure is decreased by 20/28 bytes at the cost
of an extra jump to trampoline and adjusting LR (to skip over embedded
retaddr) once inside.

Signed-off-by: malc <av1474@comtv.ru>
2012-11-06 04:37:57 +04:00
malc ed224a56b3 tcg/ppc: ld/st optimization
Signed-off-by: malc <av1474@comtv.ru>
2012-11-03 20:17:54 +04:00
Yeongkyoon Lee b76f0d8c2e tcg: Optimize qemu_ld/st by generating slow paths at the end of a block
Add optimized TCG qemu_ld/st generation which locates the code of TLB miss
cases at the end of a block after generating the other IRs.
Currently, this optimization supports only i386 and x86_64 hosts.

Signed-off-by: Yeongkyoon Lee <yeongkyoon.lee@samsung.com>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-11-03 09:44:21 +00:00
Aurelien Jarno b3a1be87ba tcg: don't remove op if output needs to be synced to memory
Commit 9c43b68de6 do not correctly check
for dead outputs when they need to be synced to memory in case of
half-dead operations.

Fix that by applying the same pattern than for the default case.

Tested-by: Stefan Weil <sw@weilnetz.de>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-31 22:20:45 +01:00
Aurelien Jarno 3585317f6f tcg/mips: use MUL instead of MULT on MIPS32 and above
MIPS32 and later instruction sets have a multiplication instruction
directly operating on GPRs. It only produces a 32-bit result but
it is exactly what is needed by QEMU.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-30 00:34:48 +01:00
Richard Henderson 44b37ace06 tcg-i386: Use %gs prefixes for x86_64 GUEST_BASE
When we allocate a reserved_va for the guest, the kernel will likely
choose an address well above 4G.  At which point we must use a pair
of movabsq+addq to form the host address.  If we have OS support,
set up a segment register to point to guest_base instead.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:25 +01:00
Aurelien Jarno b393ab4228 tcg: remove compatiblity call flags
Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:24 +01:00
Aurelien Jarno 7850527966 tcg: rework TCG helper flags
The current helper flags, TCG_CALL_CONST and TCG_CALL_PURE might be
confusing and doesn't provide enough granularity for some helpers (FP
helpers for example).

This patch changes them into the following helpers flags:
- TCG_CALL_NO_READ_GLOBALS means that the helper does not read globals,
  either directly or via an exception. They will not be saved to their
  canonical location before calling the helper.
- TCG_CALL_NO_WRITE_GLOBALS means that the helper does not modify any
  globals. They will only be saved to their canonical locations before
  calling helpers, but they won't be reloaded afterwise.
- TCG_CALL_NO_SIDE_EFFECTS means that the call to the function is
  removed if the return value is not used.

It provides convenience flags, to avoid helper definitions longer than
80 characters. It also provides compatibility flags, and updates the
documentation.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:23 +01:00
Aurelien Jarno 3d5c5f876d tcg: synchronize globals for ops with side effects
Operations with side effects (in practice qemu_ld/st ops), only need to
synchronize globals to make sure the CPU state is consistent in case of
exception.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno b202d41ee7 tcg: forbid ld/st function to modify globals
Mapping a memory address using a global and accessing it through
ld/st operations is currently broken. As it doesn't make any sense
to do that performance wise, let's forbid that.

Update the TCG documentation, and remove partial support for that.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno 344028ba0f tcg: fix some op flags
Some branch related ops are marked with TCG_OPF_SIDE_EFFECTS, some other
not. In practice they don't need to, as they are all marked with
TCG_OPF_BB_END, which is handled specifically in all the code.

The call op is marked as TCG_OPF_SIDE_EFFECTS, which might be not true
as there is are specific flags (TCG_CALL_CONST and TCG_CALL_PURE) for
specifying that. On the other hand it always clobber arguments, so mark
it as such even if the call op is handled in a different code path.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno 2c0366f036 tcg: don't explicitly save globals and temps
The liveness analysis ensures that globals and temps are at the correct
state at a basic block end or with an op with side effects. Avoid
looping on all temps, this can be time consuming on targets with a lot
of globals. Keep an assert in debug mode.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno 7dfd8c6aa1 tcg: start with local temps in TEMP_VAL_MEM state
Start with local temps in TEMP_VAL_MEM state, to make possible a later
check that all the temps are correctly saved back to memory.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno a52ad07e7c tcg: always mark dead input arguments as dead
Always mark dead input arguments as dead, even if the op is at the basic
block end. This will allow to check that all temps are correctly saved.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno c29c1d7edf tcg: rewrite tcg_reg_alloc_mov()
Now that the liveness analysis provides more information, rewrite
tcg_reg_alloc_mov(). This changes the behaviour about propagating
constants and memory accesses. We now take the assumption that once
a value is loaded into a register (from memory or from a constant),
it's better to keep it there than to reload it later. This assumption
is now always almost correct given that we are now sure the
corresponding temp is going to be used later (otherwise it would have
been synchronized and marked as dead already). The assumption is wrong
if one of the op after clobbers some registers including the one
of the holding the temp (this can be avoided by allocating clobbered
registers last, which is what most TCG target do), or in case of lack
of available register.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:22 +01:00
Aurelien Jarno 4c4e1ab26b tcg: improve tcg_reg_alloc_movi()
Now that the liveness analysis might mark some output temps as dead, call
temp_dead() if needed.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:21 +01:00
Aurelien Jarno 9c43b68de6 tcg: rework liveness analysis
Rework the liveness analysis by tracking temps that need to go back to
memory in addition to dead temps tracking. This allows to mark output
arguments as "need sync", and to synchronize them back to memory as soon
as they are not written anymore. This way even arguments mapping to
globals can be marked as "dead", avoiding moves to a new register when
input and outputs are aliased.

In addition it means that registers are freed as soon as temps are not
used anymore, instead of waiting for a basic block end or an op with side
effects. This reduces register spilling especially on CPUs with few
registers, and spread the mov over all the TB, increasing the
performances on in-order CPUs.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:21 +01:00
Aurelien Jarno ec7a869d31 tcg: sync output arguments on liveness request
Synchronize an output argument when requested by the liveness analysis.
This is needed so that the temp can be declared dead later.

For that, add a new op_sync_args table in which each bit tells if the
corresponding output argument needs to be synchronized with the memory.
Pass it to the tcg_reg_alloc_* functions, and honor this bit. We need to
synchronize the argument before marking it as dead, and we have to make
sure all the infos about the temp are correctly filled.

At the same time change some types from unsigned int to uint16_t when
passing op_dead_args.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:21 +01:00
Aurelien Jarno 1ad80729be tcg: add temp_sync()
Add a new function temp_sync() to synchronize the canonical location
of a temp with the value in the corresponding register, but without
freeing the associated register. Rewrite temp_save() to call
temp_sync() followed by temp_dead().

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:21 +01:00
Aurelien Jarno 7f6ceedf9c tcg: add tcg_reg_sync()
Add a new function tcg_reg_sync() to synchronize the canonical location
of a temp with the value in the associated register, but without freeing
it. Rewrite tcg_reg_free() to first call tcg_reg_sync() and then to free
the register.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:21 +01:00
Aurelien Jarno 639368dd68 tcg: add temp_dead()
A lot of code is duplicated to mark a temporary as dead. Replace it
by temp_dead(), which in addition marks the temp as saved in memory
for globals and local temps, instead of doing this a posteriori in
temp_save().

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:21 +01:00
Aurelien Jarno 17b914912d tcg/i386: remove ld/st third argument register constraint
On x86_64, remove the constraint on the third argument register which
is not needed:
 - For loads the helper arguments are env, addr, mem_idx. The addr
   value should not be in the two first argument registers as they are
   used in tcg_out_tlb_load().
 - For stores the helper arguments are env, addr, data, mem_idx.
   The addr and data values should not be in the two first argument
   registers as they are used in tcg_out_tlb_load(). The data value
   should also not be in the two first argument registers, but could
   be in the third argument register in which case it would be already
   loaded at the right location.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:15 +01:00
Aurelien Jarno 166792f7bb tcg/i386: remove suboptimal register shifting
Now that CONFIG_TCG_PASS_AREG0 has been removed, it's easier to get
an optimal code for the load/store functions.

First swap the two registers used in tcg_out_tlb_load() so that the
address end-up in the second register instead of the first one. Adjust
tcg_out_qemu_ld() and tcg_out_qemu_st() to respectively call
tcg_out_qemu_ld_direct() and tcg_out_qemu_st_direct() with the correct
registers. Then replace the register shifting by direct load of the
arguments.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-28 14:54:05 +01:00
Richard Henderson 4438c8a946 exec: Allocate code_gen_prologue from code_gen_buffer
We had a hack for arm and sparc, allocating code_gen_prologue to a
special section.  Which, honestly does no good under certain cases.
We've already got limits on code_gen_buffer_size to ensure that all
TBs can use direct branches between themselves; reuse this limit to
ensure the prologue is also reachable.

As a bonus, we get to avoid marking a page of the main executable's
data segment as executable.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
2012-10-20 07:54:04 +00:00
Aurelien Jarno 41a05a4576 Merge branch 'linux-user-for-upstream' of git://git.linaro.org/people/rikuvoipio/qemu
* 'linux-user-for-upstream' of git://git.linaro.org/people/rikuvoipio/qemu:
  linux-user: register align p{read, write}64
  linux-user: ppc: mark as long long aligned
  tcg: Remove TCG_TARGET_HAS_GUEST_BASE define
  configure: Remove unnecessary host_guest_base code
  linux-user: If loading fails, print error as string, not number
  linux-user: Fix siginfo handling
  alpha-linux-user: Fix sigaltstack structure definition
  linux-user: Implement gethostname
  linux-user: Perform more checks on iovec lists
  linux-user: fix multi-threaded /proc/self/maps
  linux-user: fix statfs
2012-10-19 20:28:22 +02:00
Richard Henderson 1414968a6a tcg: Optimize mulu2
Like add2, do operand ordering, constant folding, and dead operand
elimination.  The latter happens about 15% of all mulu2 during an
x86_64 bios boot.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:39 +02:00
Richard Henderson 1305c451e6 tcg: Optimize half-dead add2/sub2
When x86_64 guest is not in 64-bit mode, the high-part of the 64-bit
add is dead.  When the host is 32-bit, we can simplify to 32-bit
arithmetic.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:37 +02:00
Richard Henderson 212c328d61 tcg: Constant fold add2 and sub2
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:37 +02:00
Richard Henderson 6c4382f8f4 tcg: Do constant folding on double-word comparisons
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:35 +02:00
Richard Henderson 9519da7e39 tcg: Split out subroutines from do_constant_folding_cond
We can re-use these for implementing double-word folding.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:51:32 +02:00
Richard Henderson bc1473eff4 tcg: Optimize double-word comparisons against zero
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:32:29 +02:00
Richard Henderson 6e14e91b66 tcg: Use common code when failing to optimize
This saves a whole lot of repetitive code sequences.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:32:01 +02:00
Richard Henderson 0bfcb86538 tcg: Swap commutative double-word comparisons
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:31:57 +02:00
Richard Henderson 1e484e61e2 tcg: Canonicalize add2 operand ordering
Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:31:53 +02:00
Richard Henderson 24c9ae4eba tcg: Split out swap_commutative as a subroutine
Reduces code duplication and prefers

  movcond d, c1, c2, const, s
to
  movcond d, c1, c2, s, const

It also prefers

  add r, r, c
over
  add r, c, r

when both inputs are known constants.  This doesn't matter for true add, as
we will fully constant fold that.  But it matters for a follow-on patch using
this routine for add2 which may not be fully foldable.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 17:30:40 +02:00
Richard Henderson c7d4475a70 tcg-ia64: Implement deposit
Note that in the general reg=reg,reg case we're restricted
to 16-bit insertions.  This makes it easy to allow "any"
constant as input, as post-truncation it will fit into the
constant load insn for which we have room in the bundle.

Signed-off-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:26:43 +02:00
Aurelien Jarno 63975ea7df tcg/ia64: slightly optimize TLB access code
It is possible to slightly optimize the TLB access code, by replacing
the movi + and instructions by a deposit instruction.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:26:43 +02:00
Aurelien Jarno 2174d1e1ff tcg/ia64: remove suboptimal register shifting in qemu_ld/st ops
Remove suboptimal register shifting in qemu_ld/st ops, introduced at the
CONFIG_TCG_PASS_AREG0 time.

As mem_idx is now loaded in register R58/R59 for the slow path, we have
to make sure to do it last, to not add additional register constraints.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:26:43 +02:00
Aurelien Jarno b90cf71692 tcg/ia64: implement movcond_i32/64
Implement movcond_i32/64 on ia64 hosts. It is not possible to have
immediate compare arguments without adding a new bundle, but it is
possible to have 22-bit immediate value arguments.

Reviewed-by: Richard Henderson <rth@twiddle.net>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:26:42 +02:00
Blue Swirl da897bf5ae tcg/ia64: use stack for TCG temps
Use stack instead of temp_buf array in CPUState for TCG temps.

Signed-off-by: Blue Swirl <blauwirbel@gmail.com>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:26:42 +02:00
Peter Maydell 4a1d241e3c tcg/arm: Implement movcond_i32
Implement movcond_i32 for ARM, as the sequence
  mov dst, v2   (implicitly done by the tcg common code)
  cmp c1, c2
  movCC dst, v1

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:22:49 +02:00
Peter Maydell 7fc645bf7a tcg/arm: Factor out code to emit immediate or reg-reg op
The code to emit either an immediate cmp or a register cmp insn is
duplicated in several places; factor it out into its own function.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Aurelien Jarno <aurelien@aurel32.net>
2012-10-17 01:22:48 +02:00