Commit Graph

47317 Commits (31efaef1d9b80b7803e22ef28ffc51df04db60ab)

Author SHA1 Message Date
Peter Maydell 31efaef1d9 linux-user: Forget about synchronous signal once it is delivered
Commit 655ed67c2a which switched synchronous signals to
benig recorded in ts->sync_signal rather than in a queue
with every other signal had a bug: we failed to clear
the flag indicating that a synchronous signal was pending
when we delivered it. This meant that we would take the signal
again and again every time the guest made a syscall.
(This is a bug introduced in my refactoring of Timothy Baldwin's
original code.)

Fix this by passing in the struct emulated_sigtable* to
handle_pending_signal(), so that we clear the pending flag
in the ts->sync_signal struct when handling a synchronous signal.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-07-19 15:23:16 +03:00
Peter Maydell f2c2fb50be linux-user: Correct type for LOOP_GET_STATUS{,64} ioctls
The LOOP_GET_STATUS and LOOP_GET_STATUS64 ioctls were incorrectly
defined as IOC_W rather than IOC_R, which meant we weren't
correctly copying the information back from the kernel to the guest.
The loop_info64 structure definition was also missing a member
and using the wrong type for several 32-bit fields.

In particular, this meant that "kpartx -d image.img" didn't work
and "losetup -a" behaved strangely. Correct the ioctl type definitions.

Reported-by: Chanho Park <chanho61.park@samsung.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-07-19 15:23:16 +03:00
Peter Maydell a4a2c51f90 linux-user: Correct type for BLKSSZGET
The BLKSSZGET ioctl takes an argument which is a pointer to an int.
We were incorrectly declaring it to take a pointer to a long, which
meant that we would incorrectly write to memory which we should not
if the guest is a 64-bit architecture.

In particular, kpartx uses this ioctl to write to an int on the
stack, which tends to result in it crashing immediately.

Reported-by: Chanho Park <chanho61.park@samsung.com>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-07-19 15:23:16 +03:00
Peter Maydell 884cdc48a9 linux-user: Add loop control ioctls
Add support for the /dev/loop-control ioctls:
 LOOP_CTL_ADD
 LOOP_CTL_REMOVE
 LOOP_CTL_GET_FREE

[RV: fixed to apply to new header guards]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-07-19 15:22:33 +03:00
Peter Maydell c815701e81 linux-user: Check sigsetsize argument to syscalls
Many syscalls which take a sigset_t argument also take an argument
giving the size of the sigset_t.  The kernel insists that this
matches its idea of the type size and fails EINVAL if it is not.
Implement this logic in QEMU.  (This mostly just means some LTP test
cases which check error cases now pass.)

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
Reviewed-by: Laurent Vivier <laurent@vivier.eu>
2016-07-19 15:20:59 +03:00
Laurent Vivier c5dff280b8 linux-user: add nested netlink types
Nested types are used by the kernel to send link information and
protocol properties.

We can see following errors with "ip link show":

Unimplemented nested type 26
Unimplemented nested type 26
Unimplemented nested type 18
Unimplemented nested type 26
Unimplemented nested type 18
Unimplemented nested type 26

This patch implements nested types 18 (IFLA_LINKINFO) and
26 (IFLA_AF_SPEC).

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-07-19 15:20:59 +03:00
Laurent Vivier a82ea9393d linux-user: convert sockaddr_ll from host to target
As we convert sockaddr for AF_PACKET family for sendto() (target to
host) we need also to convert this for getsockname() (host to target).

arping uses getsockname() to get the the interface address and uses
this address with sendto().

Tested with:

    /sbin/arping -D -q -c2 -I eno1 192.168.122.88

...
getsockname(3, {sa_family=AF_PACKET, proto=0x806, if2,
pkttype=PACKET_HOST, addr(6)={1, 10c37b6b9a76}, [18]) = 0
...
sendto(3, "..." 28, 0,
       {sa_family=AF_PACKET, proto=0x806, if2, pkttype=PACKET_HOST,
       addr(6)={1, ffffffffffff}, 20) = 28
...

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-07-19 15:20:59 +03:00
Laurent Vivier c35e1f9c87 linux-user: add fd_trans helper in do_recvfrom()
Fix passwd using netlink audit.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-07-19 15:20:59 +03:00
Laurent Vivier 7d61d89232 linux-user: fix netlink memory corruption
Netlink is byte-swapping data in the guest memory (it's bad).

It's ok when the data come from the host as they are generated by the
host.

But it doesn't work when data come from the guest: the guest can
try to reuse these data whereas they have been byte-swapped.

This is what happens in glibc:

glibc generates a sequence number in nlh.nlmsg_seq and calls
sendto() with this nlh. In sendto(), we byte-swap nlmsg.seq.

Later, after the recvmsg(), glibc compares nlh.nlmsg_seq with
sequence number given in return, and of course it fails (hangs),
because nlh.nlmsg_seq is not valid anymore.

The involved code in glibc is:

sysdeps/unix/sysv/linux/check_pf.c:make_request()
...
  req.nlh.nlmsg_seq = time (NULL);
...
  if (TEMP_FAILURE_RETRY (__sendto (fd, (void *) &req, sizeof (req), 0,
                                    (struct sockaddr *) &nladdr,
                                    sizeof (nladdr))) < 0)
<here req.nlh.nlmsg_seq has been byte-swapped>
...
  do
    {
...
      ssize_t read_len = TEMP_FAILURE_RETRY (__recvmsg (fd, &msg, 0));
...
      struct nlmsghdr *nlmh;
      for (nlmh = (struct nlmsghdr *) buf;
           NLMSG_OK (nlmh, (size_t) read_len);
           nlmh = (struct nlmsghdr *) NLMSG_NEXT (nlmh, read_len))
        {
<we compare nlmh->nlmsg_seq with corrupted req.nlh.nlmsg_seq>
          if (nladdr.nl_pid != 0 || (pid_t) nlmh->nlmsg_pid != pid
              || nlmh->nlmsg_seq != req.nlh.nlmsg_seq)
            continue;
...
          else if (nlmh->nlmsg_type == NLMSG_DONE)
            /* We found the end, leave the loop.  */
            done = true;
        }
    }
  while (! done);

As we have a continue on "nlmh->nlmsg_seq != req.nlh.nlmsg_seq",
"done" cannot be set to "true" and we have an infinite loop.

It's why commands like "apt-get update" or "dnf update hangs".

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-07-19 15:20:59 +03:00
Laurent Vivier ef759f6fcc linux-user: fd_trans_*_data() returns the length
fd_trans_target_to_host_data() and fd_trans_host_to_target_data() must
return the length of processed data.

Signed-off-by: Laurent Vivier <laurent@vivier.eu>
Signed-off-by: Riku Voipio <riku.voipio@linaro.org>
2016-07-19 15:20:58 +03:00
Peter Maydell ad31cd4c69 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQIcBAABAgAGBQJXjV3bAAoJEH3vgQaq/DkOArAQAKc03QhwukqT2aZ5zs7kZ1Hv
 PCHrISL0dKGK3YfyiSitppoxr6eBoBR9UIEjaNlW0XwujqwWdWfJ7kIbVSGAyWqR
 YWmV+bA+TUWXz+tFbeDI0maMH9GNVCAuvQiqldmJxhZ303pf5cksZ9CqiALSylTY
 t5XUB6nhV7MPto63C2X2xjLkKxlsT9KOTsYxGVgVXwUzgW1lAuu8Lo6eNULXCgUa
 j+azgSFAiUOKwfKxcKD25kPOxgWlrxkGRc2LdFlopEzShENROhR2r9ut3okgAoM4
 KPVIE3jSsLMhNr9bRQRUJw53vRSL/bxFvlCdzKiBFSo7wNMKWNA9RWF6+If1Jvoi
 Am+BzINCfNfoFmqlXppqWGlapk9ZtmDGbPwaUyT6NJ9axAASTQxcj8QOjGEX07UE
 ubvzIXx7D1Amo59/4RWRXVDpMb9+p3npqkuCL+DWZzq7EVB42ig8+fKhijhS4jUK
 2DA7uL4orUjUIoTbJZsKciw7MfaWuP2/SnP1VRNRXSsiNg5N4qJDUB6Wo5AKksQB
 LWP4Ou4irPj/ZGvMhJBMMOQ5kl2maqj8beP9pVjltVjGVzAihAyuXHUqjPdqe1R9
 PLQidHKCb1OaJWi3c53vayuzlINH9iY+adjgSuYxi1QOZ+uqEsjSs+3+m1gyTOeh
 sd8VU/TtJbiGBCh9VTTf
 =9N8+
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/jnsnow/tags/ide-pull-request' into staging

# gpg: Signature made Mon 18 Jul 2016 23:53:15 BST
# gpg:                using RSA key 0x7DEF8106AAFC390E
# gpg: Good signature from "John Snow (John Huston) <jsnow@redhat.com>"
# Primary key fingerprint: FAEB 9711 A12C F475 812F  18F2 88A9 064D 1835 61EB
#      Subkey fingerprint: F9B7 ABDB BCAC DF95 BE76  CBD0 7DEF 8106 AAFC 390E

* remotes/jnsnow/tags/ide-pull-request:
  block: ignore flush requests when storage is clean
  tests: in IDE and AHCI tests perform DMA write before flushing
  ide: set retry_unit for PIO and FLUSH requests
  ide: refactor retry_unit set and clear into separate function

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-19 11:47:07 +01:00
Peter Maydell 0c1b58f250 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJXjVFbAAoJEJykq7OBq3PIGZQH/A3FEMEPHgTKXyn11H2nbeMl
 Cc8PQlaZrnKBVEmy3JOP6XItjW4iuKwEgNvwqv8jR48uq4h2zppXsT67vYuVHwMT
 6JrT8X8bAzIJCwfa/jt8WYLIeSeqSFbY9tH5N7trTINs0xQaL9ZuawkOzDkdBxhl
 lFhkFVCH45CW8wXD7Jkicp/GwDiyZVotf3q3LPATN6QhWhiXVAWga1+xGK5rrBJs
 z3m14xCik+LlVkeXtZiqDUnuG0OKlzuGBkb6IMEwz4o7lRyoK1CXl2KgmKFblMpg
 whgE3EKNvE4kPwa2/chq54aQvoBYXEAtwWfUHIOARWIGds9BSsHgNFuePOv19Ds=
 =d0bQ
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/tracing-pull-request' into staging

# gpg: Signature made Mon 18 Jul 2016 22:59:55 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/tracing-pull-request:
  trace: Add QAPI/QMP interfaces to query and control per-vCPU tracing state
  trace: Allow event name pattern in "info trace-events"
  trace: Conditionally trace events based on their per-vCPU state
  trace: Add per-vCPU tracing states for events with the 'vcpu' property
  trace: Cosmetic changes on fast-path tracing
  disas: Remove unused macro '_'
  trace: Identify events with the 'vcpu' property
  trace: [bsd-user] Commandline arguments to control tracing
  trace: [linux-user] Commandline arguments to control tracing

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-19 10:54:49 +01:00
Peter Maydell 08b558f07b VFIO update 2016-07-18
One fix for 2.7-rc0 which hides the ARI extended capability, fixing
 multifunction support in PCIe configurations where the assigned device
 function topology does not match the host (Alex Williamson)
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.14 (GNU/Linux)
 
 iQIcBAABAgAGBQJXjQujAAoJECObm247sIsieFIQAIAm7rl4CT1kpIbeoJi0C9Ri
 0DnWnNQC5RtP9M4OMOpCReN2xol48XvZEpfQWorn6txH4xwKrF242MzQIABsP9u2
 6iOtHcGpKAx8ffRsdJAV5Ejuo/khtzNMFoxuioje06xEW2yQ5nHBcievwuxxggH0
 RcXNRMr09DwAC2eB6jYOuQcm5qkOy53t/t4oVBM+agd/C9epr+VHDvNarN2ZpL8t
 BLLBKESEhuUGRr3Vo7da9wnszBpxyP1PYameNLaEY6rZ5vcyU6Dcec5Y/zsaJZV2
 yHAr50UyGop6fK/azPTgL+GV/OWrwawn6KatiCEVQD3Yz/ZzvXIoNXLX6pR4xjqF
 AN+KmjcBoi9q6hiKLcbSZXJPzGi52HJlko+uMH6m1xhDzOlwznHw9ZaVZ8rd+idv
 wnuSjVoJGN8PG2tBoMwYTe3iIvVJLlvckGsxeeoHfVR/64NMUtaDBcd+BYNdy9iK
 CAqBq/MXNBPkNWkVXjC8mnbLGvXSBKHBlLieFpEYiBUJXnHblqojN/H345ZVVvby
 uIMqOOeanaVBZ0jNP39u6F5Qzp8mcR1GKphfFg6YaSzNLw0IgF2CQ4qwHbIBMA4v
 HUlNAoIxSKQe6Rk/8SFjyfzgVrJ4U18NhRTrSySvEupZgZvizXacjLYQNkTo1MN5
 STV2TpoVG18nl9KKhdG+
 =40ci
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20160718.0' into staging

VFIO update 2016-07-18

One fix for 2.7-rc0 which hides the ARI extended capability, fixing
multifunction support in PCIe configurations where the assigned device
function topology does not match the host (Alex Williamson)

# gpg: Signature made Mon 18 Jul 2016 18:02:27 BST
# gpg:                using RSA key 0x239B9B6E3BB08B22
# gpg: Good signature from "Alex Williamson <alex.williamson@redhat.com>"
# gpg:                 aka "Alex Williamson <alex@shazbot.org>"
# gpg:                 aka "Alex Williamson <alwillia@redhat.com>"
# gpg:                 aka "Alex Williamson <alex.l.williamson@gmail.com>"
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* remotes/awilliam/tags/vfio-update-20160718.0:
  vfio/pci: Hide ARI capability

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-19 09:02:05 +01:00
Evgeny Yakovlev 3ff2f67a7c block: ignore flush requests when storage is clean
Some guests (win2008 server for example) do a lot of unnecessary
flushing when underlying media has not changed. This adds additional
overhead on host when calling fsync/fdatasync.

This change introduces a write generation scheme in BlockDriverState.
Current write generation is checked against last flushed generation to
avoid unnessesary flushes.

The problem with excessive flushing was found by a performance test
which does parallel directory tree creation (from 2 processes).
Results improved from 0.424 loops/sec to 0.432 loops/sec.
Each loop creates 10^3 directories with 10 files in each.

This affected some blkdebug testcases that were expecting error logs from
failure-injected flushes which are now skipped entirely
(tests 026 071 089).

This also affects the performance of block jobs and thus BLOCK_JOB_READY
events for driver-mirror and active block-commit commands now arrives
faster, before QMP send successfully returns to caller (tests 141 144).

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-5-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2016-07-18 18:19:01 -04:00
Evgeny Yakovlev 2dd7e10d7c tests: in IDE and AHCI tests perform DMA write before flushing
Due to changes in flush behaviour clean disks stopped generating
flush_to_disk events and IDE and AHCI tests that test flush commands
started to fail.

This change adds additional DMA writes to affected tests before sending
flush commands so that bdrv_flush actually generates flush_to_disk event.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-4-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2016-07-18 18:19:01 -04:00
Evgeny Yakovlev 35f78ab469 ide: set retry_unit for PIO and FLUSH requests
The following sequence of tests discovered a problem in IDE emulation:
1. Send DMA write to IDE device 0
2. Send CMD_FLUSH_CACHE to same IDE device which will be failed by block
layer using blkdebug script in tests/ide-test:test_retry_flush

When doing DMA request ide/core.c will set s->retry_unit to s->unit in
ide_start_dma. When dma completes ide_set_inactive sets retry_unit to -1.
After that ide_flush_cache runs and fails thanks to blkdebug.
ide_flush_cb calls ide_handle_rw_error which asserts that s->retry_unit
== s->unit. But s->retry_unit is still -1 after previous DMA completion
and flush does not use anything related to retry.

This patch restricts retry unit assertion only to ops that actually use
retry logic.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-3-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2016-07-18 18:19:01 -04:00
Evgeny Yakovlev 0eeee07e24 ide: refactor retry_unit set and clear into separate function
Code to set and clear state associated with retry in moved into
ide_set_retry and ide_clear_retry to make adding retry setups easier.

Signed-off-by: Evgeny Yakovlev <eyakovlev@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1468870792-7411-2-git-send-email-den@openvz.org
CC: Kevin Wolf <kwolf@redhat.com>
CC: Max Reitz <mreitz@redhat.com>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: John Snow <jsnow@redhat.com>
Signed-off-by: John Snow <jsnow@redhat.com>
2016-07-18 18:19:01 -04:00
Lluís Vilanova 77e2b17272 trace: Add QAPI/QMP interfaces to query and control per-vCPU tracing state
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 18:23:12 +01:00
Lluís Vilanova bd71211d55 trace: Allow event name pattern in "info trace-events"
Homogenizes the command capabilities with QMP.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 18:23:12 +01:00
Lluís Vilanova 40b9cd25f7 trace: Conditionally trace events based on their per-vCPU state
Events with the 'vcpu' property are conditionally emitted according to
their per-vCPU state. Other events are emitted normally based on their
global tracing state.

Note that the per-vCPU condition check applies to all tracing backends.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 18:23:12 +01:00
Lluís Vilanova 4815185902 trace: Add per-vCPU tracing states for events with the 'vcpu' property
Each vCPU gets a 'trace_dstate' bitmap to control the per-vCPU dynamic
tracing state of events with the 'vcpu' property.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 18:23:12 +01:00
Lluís Vilanova e1d6e0a4c0 trace: Cosmetic changes on fast-path tracing
Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 18:13:54 +01:00
Lluís Vilanova ca66f1a174 disas: Remove unused macro '_'
Eliminates a future compilation error when UI code includes the tracing
headers (indirectly pulling "disas/bfd.h" through "qom/cpu.h") and
GLib's i18n '_' macro.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 18:13:54 +01:00
Lluís Vilanova 17f7ac75df trace: Identify events with the 'vcpu' property
A new event attribute 'cpu_id' is added to have a separate ID
space ('TRACE_VCPU_*') for all events with the 'vcpu' property.

These are later used to identify which events are enabled on each vCPU.

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 18:13:54 +01:00
Lluís Vilanova 6913e79c36 trace: [bsd-user] Commandline arguments to control tracing
[Changed const char *trace_file to char *trace_file since it's a
heap-allocated string that needs to be freed.  This type is also
returned by trace_opt_parse() and used in vl.c.

Also fixed coding style on for(;;) and else statement as suggested by
Eric Blake <eblake@redhat.com> since the patch modifies these lines or
close enough.
--Stefan]

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 146860252322.30668.18276041739086338328.stgit@fimbulvetr.bsc.es
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 18:13:37 +01:00
Lluís Vilanova 6533dd6e11 trace: [linux-user] Commandline arguments to control tracing
[Changed const char *trace_file to char *trace_file since it's a
heap-allocated string that needs to be freed.  This type is also
returned by trace_opt_parse() and used in vl.c.
--Stefan]

Signed-off-by: Lluís Vilanova <vilanova@ac.upc.edu>
Message-id: 146860251784.30668.17339867835129075077.stgit@fimbulvetr.bsc.es
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 18:13:37 +01:00
Peter Maydell a098fbc025 -----BEGIN PGP SIGNATURE-----
Version: GnuPG v1
 
 iQEcBAABAgAGBQJXjQqzAAoJEJykq7OBq3PIqvUH/1HFZxTfmtQ2g3i6b7D63Bj8
 f5FHRZ8XCaBdIGKJO8/nY4TdwOswXKuomdrvSz9QtJRD/WgRDZQ30jrPPaq9P9+m
 GXfgWMtFWeYSOkRKtOtxJ+2pwHr0I0qmVBCDAwIgak9Yx+uP/KowEIibeybbPuQ+
 LXAWlw+LWGOV/XaZQyY6dgAUaXTJ+t86WyZL6cGR3JOMETmpYsFzzyO1379ZstY3
 +mcuAmmrjdV0B9l3rghKjfAooV3MmMAecsX5mizPsBgI29k7tvLEAnLHIoam7E0b
 brFL6vGoyhprDO0soR3ZqclSIQYdyQDTpV6vUjLnpg/z1SdQOCoccn7k7B7Hr3Q=
 =NoOD
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/stefanha/tags/block-pull-request' into staging

# gpg: Signature made Mon 18 Jul 2016 17:58:27 BST
# gpg:                using RSA key 0x9CA4ABB381AB73C8
# gpg: Good signature from "Stefan Hajnoczi <stefanha@redhat.com>"
# gpg:                 aka "Stefan Hajnoczi <stefanha@gmail.com>"
# Primary key fingerprint: 8695 A8BF D3F9 7CDA AC35  775A 9CA4 ABB3 81AB 73C8

* remotes/stefanha/tags/block-pull-request:
  MAINTAINERS: Add include/block/aio.h to block I/O path section
  virtio-blk: dataplane cleanup
  checkpatch: consider git extended headers valid patches
  aio-posix: remove useless parameter
  linux-aio: prevent submitting more than MAX_EVENTS
  aio_ctx_check: follow CODING_STYLE
  linux-aio: share one LinuxAioState within an AioContext
  spec/parallels: fix a mistake

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-18 18:13:01 +01:00
Alex Williamson 383a7af7ec vfio/pci: Hide ARI capability
QEMU supports ARI on downstream ports and assigned devices may support
ARI in their extended capabilities.  The endpoint ARI capability
specifies the next function, such that the OS doesn't need to walk
each possible function, however this next function is relative to the
host, not the guest.  This leads to device discovery issues when we
combine separate functions into virtual multi-function packages in a
guest.  For example, SR-IOV VFs are not enumerated by simply probing
the function address space, therefore the ARI next-function field is
zero.  When we combine multiple VFs together as a multi-function
device in the guest, the guest OS identifies ARI is enabled, relies on
this next-function field, and stops looking for additional function
after the first is found.

Long term we should expose the ARI capability to the guest to enable
configurations with more than 8 functions per slot, but this requires
additional QEMU PCI infrastructure to manage the next-function field
for multiple, otherwise independent devices.  In the short term,
hiding this capability allows equivalent functionality to what we
currently have on non-express chipsets.

Signed-off-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Marcel Apfelbaum <marcel@redhat.com>
2016-07-18 10:55:17 -06:00
Fam Zheng e1029ae26d MAINTAINERS: Add include/block/aio.h to block I/O path section
This file is actually the header for async.c and aio-*.c., so add it to
the same section.

Suggested-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1468826387-10473-1-git-send-email-famz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 15:10:52 +01:00
Cao jin ab3b9c1be8 virtio-blk: dataplane cleanup
No need duplicate the judgment, there is one in function entry.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1468814749-14510-1-git-send-email-caoj.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 15:10:52 +01:00
Stefan Hajnoczi f8dccbb634 checkpatch: consider git extended headers valid patches
Renames look like this with git-diff(1) when diff.renames = true is set:

  diff --git a/a b/b
  similarity index 100%
  rename from a
  rename to b

This raises the "Does not appear to be a unified-diff format patch"
error because checkpatch.pl only considers a diff valid if it contains
at least one "@@" hunk.

This patch accepts renames and copies too so that checkpatch.pl exits
successfully when a diff only renames/copies files.  The git diff
extended header format is described on the git-diff(1) man page.

Reported-by: Colin Lord <clord@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1468576014-28788-1-git-send-email-stefanha@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 15:10:52 +01:00
Cao jin 7e00346505 aio-posix: remove useless parameter
Parameter **errp of aio_context_setup() is useless, remove it
and clean up the related code.

Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Fam Zheng <famz@redhat.com>
Cc: Eric Blake <eblake@redhat.com>
Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1468578524-23433-1-git-send-email-caoj.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 15:10:52 +01:00
Roman Pen 5e1b34a3fa linux-aio: prevent submitting more than MAX_EVENTS
Invoking io_setup(MAX_EVENTS) we ask kernel to create ring buffer for us
with specified number of events.  But kernel ring buffer allocation logic
is a bit tricky (ring buffer is page size aligned + some percpu allocation
are required) so eventually more than requested events number is allocated.

From a userspace side we have to follow the convention and should not try
to io_submit() more or logic, which consumes completed events, should be
changed accordingly.  The pitfall is in the following sequence:

    MAX_EVENTS = 128
    io_setup(MAX_EVENTS)

    io_submit(MAX_EVENTS)
    io_submit(MAX_EVENTS)

    /* now 256 events are in-flight */

    io_getevents(MAX_EVENTS) = 128

    /* we can handle only 128 events at once, to be sure
     * that nothing is pended the io_getevents(MAX_EVENTS)
     * call must be invoked once more or hang will happen. */

To prevent the hang or reiteration of io_getevents() call this patch
restricts the number of in-flights, which is now limited to MAX_EVENTS.

Signed-off-by: Roman Pen <roman.penyaev@profitbricks.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1468415004-31755-1-git-send-email-roman.penyaev@profitbricks.com
Cc: Stefan Hajnoczi <stefanha@redhat.com>
Cc: qemu-devel@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 15:10:52 +01:00
Cao jin 6977d901c4 aio_ctx_check: follow CODING_STYLE
replace tab with spaces

Signed-off-by: Cao jin <caoj.fnst@cn.fujitsu.com>
Message-id: 1468501843-14927-1-git-send-email-caoj.fnst@cn.fujitsu.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 15:10:52 +01:00
Paolo Bonzini 0187f5c9cb linux-aio: share one LinuxAioState within an AioContext
This has better performance because it executes fewer system calls
and does not use a bottom half per disk.

Originally proposed by Ming Lei.

[Changed #include "raw-aio.h" to "block/raw-aio.h" in win32-aio.c to fix
build error as reported by Peter Maydell <peter.maydell@linaro.org>.
--Stefan]

Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1467650000-51385-1-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>

squash! linux-aio: share one LinuxAioState within an AioContext
2016-07-18 15:09:31 +01:00
Vladimir Sementsov-Ogievskiy 4e90ccc28e spec/parallels: fix a mistake
We have only one flag for now - Empty Image flag. The patch fixes unused
bits specification and marks bit 1 as usused.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2016-07-18 15:09:31 +01:00
Peter Maydell 3913d3707e ppc patch queue 2016-07-18
Here's what ought to be the final ppc pull request before the 2.7 hard
 freeze.  This set contains a rework of the DBDMA device for Mac
 platforms, and some assorted cleanups and bugfixes.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1
 
 iQIcBAABAgAGBQJXjFyPAAoJEGw4ysog2bOSfcMQAL6M8Kq51GHugv1Kf7f+pj2t
 QKtwHc2MTASNJuwE0uKOyUdsgPgUiX+umEBHmUP4FDE13LIfHgF3k/T9gzwPNnrR
 W2/qVZZVPJW9Bn3ZsUzQ2RYoSE3NMgPA93s80PTwGD/7IQl5uAs9chon3yHapt65
 lS/IRsLWHJ1Y+7GUTLHy8vPbo5yrMkwywO4jlCrlqi/3uYLYsDdWZA3lAYQ7ZUsB
 mf/Ldb/5q6CBxqhUpm7eX/Wzd3F0zXqiaoFEIbvHCzd1Cl/ZH/JeSUkOgY42kYTp
 Sdt17oY5vXivyLwANkeXntvVZNuDJrHWIH/e1Mn81OezA0QBTV0uRxc3hxdhetSH
 JdaqWI7H5uSi9hdLN6CSecRWR9DLW2678D+qwwHcGtpJhLQvNfwmQH5GQZuDkKYn
 ZLsuvhquDc29wq6T+G64MmzimvFWy6HMaqjoMyzg3h5VZO+DL+JdX2GiQWmsv7Sx
 4AX82S+vjmODgc+rv/KezvpUEus8JJO1wUb8xu9Q00uYO7HzBOQfXSKnU9rFV8Q5
 jS0rxU3CB2Ely9nkXU5i+4I20P0ARosOwdjbPfI0FxEJZAxPt4VFiOriE4nbOHpc
 vJqGc51hkR/2C2zypd02ApFqeLOPx/iSLKpUrO5+RVrjStvupj1sL1ZSsQS6p5zN
 tGpyEhnSGItPqsADvXch
 =K5vl
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/dgibson/tags/ppc-for-2.7-20160718' into staging

ppc patch queue 2016-07-18

Here's what ought to be the final ppc pull request before the 2.7 hard
freeze.  This set contains a rework of the DBDMA device for Mac
platforms, and some assorted cleanups and bugfixes.

# gpg: Signature made Mon 18 Jul 2016 05:35:27 BST
# gpg:                using RSA key 0x6C38CACA20D9B392
# gpg: Good signature from "David Gibson <david@gibson.dropbear.id.au>"
# gpg:                 aka "David Gibson (Red Hat) <dgibson@redhat.com>"
# gpg:                 aka "David Gibson (ozlabs.org) <dgibson@ozlabs.org>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 75F4 6586 AE61 A66C C44E  87DC 6C38 CACA 20D9 B392

* remotes/dgibson/tags/ppc-for-2.7-20160718:
  ppc: Yet another fix for the huge page support detection mechanism
  target-ppc: fix left shift overflow in hpte_page_shift
  ppc/mmu-hash64: Remove duplicated #include statement
  ppc: abort if compat property contains an unknown value
  spapr: Ensure CPU cores are added contiguously and removed in LIFO order
  vfio/spapr: Remove stale ioctl() call
  ppc: Fix support for odd MSR combinations
  dbdma: reset io->processing flag for unassigned DBDMA channel rw accesses
  dbdma: set FLUSH bit upon reception of flush command for unassigned DBDMA channels
  dbdma: fix load_word/store_word value endianness
  dbdma: fix endian of DBDMA_CMDPTR_LO during branch
  dbdma: add per-channel debugging enabled via DEBUG_DBDMA_CHANMASK
  dbdma: always define DBDMA_DPRINTF and enable debug with DEBUG_DBDMA
  spapr: fix core unplug crash

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2016-07-18 11:24:15 +01:00
Thomas Huth 159d2e39a8 ppc: Yet another fix for the huge page support detection mechanism
Commit 86b50f2e1b ("Disable huge page support if it is not available
for main RAM") already made sure that huge page support is not announced
to the guest if the normal RAM of non-NUMA configurations is not backed
by a huge page filesystem. However, there is one more case that can go
wrong: NUMA is enabled, but the RAM of the NUMA nodes are not configured
with huge page support (and only the memory of a DIMM is configured with
it). When QEMU is started with the following command line for example,
the Linux guest currently crashes because it is trying to use huge pages
on a memory region that does not support huge pages:

 qemu-system-ppc64 -enable-kvm ... -m 1G,slots=4,maxmem=32G -object \
   memory-backend-file,policy=default,mem-path=/hugepages,size=1G,id=mem-mem1 \
   -device pc-dimm,id=dimm-mem1,memdev=mem-mem1 -smp 2 \
   -numa node,nodeid=0 -numa node,nodeid=1

To fix this issue, we've got to make sure to disable huge page support,
too, when there is a NUMA node that is not using a memory backend with
huge page support.

Fixes: 86b50f2e1b
Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-18 10:52:19 +10:00
Paolo Bonzini b56d417b8d target-ppc: fix left shift overflow in hpte_page_shift
ps->pte_enc is a 32-bit value, which is shifted left and then compared
to a 64-bit value.  It needs a cast before the shift.

Reported by Coverity.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-18 10:45:44 +10:00
Thomas Huth 28f3331887 ppc/mmu-hash64: Remove duplicated #include statement
No need to include error-report.h twice here.

Signed-off-by: Thomas Huth <thuth@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-18 10:40:27 +10:00
Greg Kurz c4dfc14b55 ppc: abort if compat property contains an unknown value
It is not possible to set the compat property to an unknown value with
powerpc_set_compat(). Something must have gone terribly wrong in QEMU,
if we detect an "Internal error" in powerpc_get_compat(). Let's abort then.

This patch also drops the "max_compat ? *max_compat : -1" construct. It is
useless since max_compat is dereferenced a few lines above.

Signed-off-by: Greg Kurz <groug@kaod.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-18 10:40:27 +10:00
Bharata B Rao 5cbc64de25 spapr: Ensure CPU cores are added contiguously and removed in LIFO order
If CPU core addition or removal is allowed in random order leading to
holes in the core id range (and hence in the cpu_index range), migration
can fail as migration with holes in cpu_index range isn't yet handled
correctly.

Prevent this situation by enforcing the addition in contiguous order
and removal in LIFO order so that we never end up with holes in
cpu_index range.

Signed-off-by: Bharata B Rao <bharata@linux.vnet.ibm.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-18 10:40:27 +10:00
David Gibson 21bb3093e6 vfio/spapr: Remove stale ioctl() call
This ioctl() call to VFIO_IOMMU_SPAPR_TCE_REMOVE was left over from an
earlier version of the code and has since been folded into
vfio_spapr_remove_window().

It wasn't caught because although the argument structure has been removed,
the libc function remove() means this didn't trigger a compile failure.
The ioctl() was also almost certain to fail silently and harmlessly with
the bogus argument, so this wasn't caught in testing.

Suggested-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
Reviewed-by: Alexey Kardashevskiy <aik@ozlabs.ru>
2016-07-18 10:40:27 +10:00
Benjamin Herrenschmidt 36a24df84a ppc: Fix support for odd MSR combinations
MacOS uses an architecturally illegal MSR combination that
seems nonetheless supported by 32-bit processors, which is
to have MSR[PR]=1 and one or more of MSR[DR/IR/EE]=0.

This adds support for it. To work properly we need to also
properly include support for PR=1,{I,D}R=0 to the MMU index
used by the qemu TLB.

Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Tested-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-18 10:40:27 +10:00
Mark Cave-Ayland 2df778967b dbdma: reset io->processing flag for unassigned DBDMA channel rw accesses
Otherwise MacOS 9 hangs upon shutdown.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-18 10:40:27 +10:00
Mark Cave-Ayland 894993905d dbdma: set FLUSH bit upon reception of flush command for unassigned DBDMA channels
This fixes MacOS 9 whereby it continually flushes and polls the status bits
until they are set to indicate a successful flush.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-18 10:40:27 +10:00
Mark Cave-Ayland e12f50b900 dbdma: fix load_word/store_word value endianness
The values to read/write to/from physical memory are copied directly to the
physical address with no endian swapping required.

Also add some extra information to debugging output while we are here.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-18 10:40:27 +10:00
Mark Cave-Ayland 3f0d4128dc dbdma: fix endian of DBDMA_CMDPTR_LO during branch
The current DBDMA command is stored in little-endian format, so make sure
we convert it to match our CPU when updating the DBDMA_CMDPTR_LO register.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-18 10:40:27 +10:00
Mark Cave-Ayland 3e49c43940 dbdma: add per-channel debugging enabled via DEBUG_DBDMA_CHANMASK
By default large amounts of DBDMA debugging are produced when often it is just
1 or 2 channels that are of interest. Introduce DEBUG_DBDMA_CHANMASK to allow
the developer to select the channels of interest at compile time, and then
further add the extra channel information to each debug statement where
possible.

Also clearly mark the start/end of DBDMA_run_bh to allow tracking the bottom
half execution.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-18 10:40:27 +10:00
Mark Cave-Ayland ba0b17dd8f dbdma: always define DBDMA_DPRINTF and enable debug with DEBUG_DBDMA
Enabling DBDMA_DPRINTF unconditionally ensures that any errors in debug
statements are picked up immediately.

Signed-off-by: Mark Cave-Ayland <mark.cave-ayland@ilande.co.uk>
Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Signed-off-by: David Gibson <david@gibson.dropbear.id.au>
2016-07-18 10:40:27 +10:00