Commit Graph

248 Commits (e98ab097092e54999f046e9efa1ca1dd52f0c9e5)

Author SHA1 Message Date
Paolo Bonzini e98ab09709 aio-posix: move pollfds to thread-local storage
By using thread-local storage, aio_poll can stop using global data during
g_poll_ns.  This will make it possible to drop callbacks from rfifolock.

[Moved npfd = 0 assignment to end of walking_handlers region as
suggested by Paolo.  This resolves the assert(npfd == 0) assertion
failure in pollfds_cleanup().
--Stefan]

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1424449612-18215-2-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-04-28 15:36:08 +02:00
Max Reitz 3f4726596d nbd: Set block size to BDRV_SECTOR_SIZE
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <1424887718-10800-13-git-send-email-mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-18 12:07:01 +01:00
Max Reitz ac97393dc7 nbd: Fix potential signed overflow issues
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <1424887718-10800-11-git-send-email-mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-18 12:06:56 +01:00
Max Reitz 98f44bbe70 nbd: Handle blk_getlength() failure
Signed-off-by: Max Reitz <mreitz@redhat.com>
Message-Id: <1424887718-10800-9-git-send-email-mreitz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-18 12:06:50 +01:00
Fam Zheng d51a2427f6 block: Drop bdrv_find
All callers are converted, so drop it.

Signed-off-by: Fam Zheng <famz@redhat.com>
Message-id: 1425296209-1476-5-git-send-email-famz@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-03-16 12:10:30 -04:00
Ekaterina Tumanova 892b7de832 block: add bdrv functions for geometry and blocksize
Add driver functions for geometry and blocksize detection

Signed-off-by: Ekaterina Tumanova <tumanova@linux.vnet.ibm.com>
Reviewed-by: Thomas Huth <thuth@linux.vnet.ibm.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1424087278-49393-2-git-send-email-tumanova@linux.vnet.ibm.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10 14:02:21 +01:00
Max Reitz 06d05fa738 qcow2: Allow creation with refcount order != 4
Add a creation option to qcow2 for setting the refcount order of images
to be created, and respect that option's value.

This breaks some test outputs, fix them.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-10 14:02:21 +01:00
Teruaki Ishizaki 876eb1b0cc sheepdog: selectable object size support
Previously, qemu block driver of sheepdog used hard-coded VDI object size.
This patch enables users to handle VDI object size.

When you start qemu, you don't need to specify additional command option.

But when you create the VDI which doesn't have default object size
with qemu-img command, you specify object_size option.

If you want to create a VDI of 8MB object size,
you need to specify following command option.

 # qemu-img create -o object_size=8M sheepdog:test1 100M

In addition, when you don't specify qemu-img command option,
a default value of sheepdog cluster is used for creating VDI.

 # qemu-img create sheepdog:test2 100M

Signed-off-by: Teruaki Ishizaki <ishizaki.teruaki@lab.ntt.co.jp>
Acked-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-03-09 11:11:59 +01:00
Kevin Wolf cd12bb567c coroutine: Clean up qemu_coroutine_enter()
qemu_coroutine_enter() is now the only user of coroutine_swap(). Both
functions are short, so inline it.

Also, using COROUTINE_YIELD is now even more confusing because this code
is never called during qemu_coroutine_yield() any more. In fact, this
value is never read back, so we can just introduce a new COROUTINE_ENTER
which documents the purpose of the task switch better.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2015-03-09 11:11:59 +01:00
Fam Zheng 2e5b887cfc block: Forbid bdrv_set_aio_context outside BQL
Even if the caller has both the old and the new AioContext's, there can
be a deadlock, due to the leading bdrv_drain_all.

Suppose there are four io threads (A, B, A0, B0) with A and B owning a
BDS for each (bs_a, bs_b); Now A wants to move bs_a to iothread A0, and
B wants to move bs_b to B0, at the same time:

  iothread A                           iothread B
--------------------------------------------------------------------------
  aio_context_acquire(A0) /* OK */     aio_context_acquire(B0) /* OK */
  bdrv_set_aio_context(bs_a, A0)       bdrv_set_aio_context(bs_b, B0)
  -> bdrv_drain_all()                  -> bdrv_drain_all()
     -> acquire A /* OK */               -> acquire A /* blocked */
     -> acquire B /* blocked */          -> acquire B
     ...                                 ...

Deadlock happens because A is waiting for B, and B is waiting for A.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <1423969591-23646-2-git-send-email-famz@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-02-27 14:43:45 +01:00
Max Reitz c0191e763b block: Remove "growable" from BDS
Now that request clamping is done in the BlockBackend, the "growable"
field can be removed from the BlockDriverState. All BDSs are now treated
as being "growable" (that is, they are allowed to grow; they are not
necessarily actually able to).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-16-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16 15:07:19 +00:00
Max Reitz b65a5e12a4 block: Add Error parameter to bdrv_find_protocol()
The argument given to bdrv_find_protocol() is just a file name, which
makes it difficult for the caller to reconstruct what protocol
bdrv_find_protocol() was hoping to find. This patch adds an Error
parameter to that function to solve this issue.

Suggested-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1423162705-32065-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16 15:07:18 +00:00
Max Reitz f53a829bb9 nbd: Drop BDS backpointer
Before this patch, the "opaque" pointer in an NBD BDS points to a
BDRVNBDState, which contains an NbdClientSession object, which in turn
contains a pointer to the BDS. This pointer may become invalid due to
bdrv_swap(), so drop it, and instead pass the BDS directly to the
nbd-client.c functions which then retrieve the NbdClientSession object
from there.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1423256778-3340-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-02-16 14:36:03 +00:00
Markus Armbruster 4d2855a348 block: New bdrv_add_key(), convert monitor to use it
Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1422524221-8566-4-git-send-email-armbru@redhat.com
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-02-06 11:46:32 -05:00
Peter Lieven 75af1f34cd block: introduce BDRV_REQUEST_MAX_SECTORS
we check and adjust request sizes at several places with
sometimes inconsistent checks or default values:
 INT_MAX
 INT_MAX >> BDRV_SECTOR_BITS
 UINT_MAX >> BDRV_SECTOR_BITS
 SIZE_MAX >> BDRV_SECTOR_BITS

This patches introdocues a macro for the maximal allowed sectors
per request and uses it at several places.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Denis V. Lunev <den@openvz.org>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:22 +01:00
Max Reitz 1ce52846d3 nbd: Improve error messages
This patch makes use of the Error object for nbd_receive_negotiate() so
that errors during negotiation look nicer.

Furthermore, this patch adds an additional error message if the received
magic was wrong, but would be correct for the other protocol version,
respectively: So if an export name was specified, but the NBD server
magic corresponds to an old handshake, this condition is explicitly
signaled to the user, and vice versa.

As these messages are now part of the "Could not open image" error
message, additional filtering has to be employed in iotest 083, which
this patch does as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:22 +01:00
Francesco Romani e2462113b2 block: add event when disk usage exceeds threshold
Managing applications, like oVirt (http://www.ovirt.org), make extensive
use of thin-provisioned disk images.
To let the guest run smoothly and be not unnecessarily paused, oVirt sets
a disk usage threshold (so called 'high water mark') based on the occupation
of the device,  and automatically extends the image once the threshold
is reached or exceeded.

In order to detect the crossing of the threshold, oVirt has no choice but
aggressively polling the QEMU monitor using the query-blockstats command.
This lead to unnecessary system load, and is made even worse under scale:
deployments with hundreds of VMs are no longer rare.

To fix this, this patch adds:
* A new monitor command `block-set-write-threshold', to set a mark for
  a given block device.
* A new event `BLOCK_WRITE_THRESHOLD', to report if a block device
  usage exceeds the threshold.
* A new `write_threshold' field into the `BlockDeviceInfo' structure,
  to report the configured threshold.

This will allow the managing application to use smarter and more
efficient monitoring, greatly reducing the need of polling.

[Updated qemu-iotests 067 output to add the new 'write_threshold'
property. --Stefan]
[Changed g_assert_false() to !g_assert() to fix the build on older glib
versions. --Kevin]

Signed-off-by: Francesco Romani <fromani@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1421068273-692-1-git-send-email-fromani@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Peter Lieven f4564d53c6 block: add accounting for merged requests
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-02-06 17:24:21 +01:00
Jeff Cody 9a29e18f7d block: update string sizes for filename,backing_file,exact_filename
The string field entries 'filename', 'backing_file', and
'exact_filename' in the BlockDriverState struct are defined as 1024
bytes.

However, many places that use these values accept a maximum of PATH_MAX
bytes, so we have a mixture of 1024 byte and PATH_MAX byte allocations.
This patch makes the BlockDriverStruct field string sizes match usage.

This patch also does a few fixes related to the size that needs to
happen now:

    * the block qapi driver is updated to use PATH_MAX bytes
    * the qcow and qcow2 drivers have an additional safety check
    * the block vvfat driver is updated to use PATH_MAX bytes
      for the size of backing_file, for systems where PATH_MAX is < 1024
      bytes.
    * qemu-img uses PATH_MAX rather than 1024.  These instances were not
      changed to be dynamically allocated, however, as the extra
      temporary 3K in stack usage for qemu-img does not seem worrisome.

Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Signed-off-by: Jeff Cody <jcody@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-23 18:17:06 +01:00
Peter Maydell b629a38a13 Mostly bugfixes and cleanups from qemu-devel. Yet another small patch from
the record/replay series, and a few SCSI and i386 patches as well.
 -----BEGIN PGP SIGNATURE-----
 Version: GnuPG v2.0.22 (GNU/Linux)
 
 iQEcBAABAgAGBQJUtjlCAAoJEL/70l94x66Dy5gH/0QIHoXVH/2wuA9apNK2/gBj
 2U7g08QGKlc2wQGF4a48sQf523lSt5eirVxrwta0wmvFeznrdR84d4YGpolHM67A
 Q9Y5J2i+v1H6cfQH6ylq61QQ7rEC3+isa65wblLeMSCAb2W1CcV7avSKu4BSPZw2
 jGr3jd2Ve7pOsULpPhiNsmmltYSeZc7sQBYc9C7fQEoxOGsNnRoKOUKPnIk1mJTc
 iYH480L1MnOL3enIz13K34lQofNRhJxJBLYKhYsBydQbOh0/Ls1eifOY4xEegXZ0
 IUODy6c2pk+s/IUPARpBucKGKzDxdv0DLXDV60uGn5EsYT0CjCl9/sRs3bZvaQE=
 =eT8u
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/bonzini/tags/for-upstream' into staging

Mostly bugfixes and cleanups from qemu-devel.  Yet another small patch from
the record/replay series, and a few SCSI and i386 patches as well.

# gpg: Signature made Wed 14 Jan 2015 09:39:14 GMT using RSA key ID 78C7AE83
# gpg: Good signature from "Paolo Bonzini <bonzini@gnu.org>"
# gpg:                 aka "Paolo Bonzini <pbonzini@redhat.com>"
# gpg: WARNING: This key is not certified with sufficiently trusted signatures!
# gpg:          It is not certain that the signature belongs to the owner.
# Primary key fingerprint: 46F5 9FBD 57D6 12E7 BFD4  E2F7 7E15 100C CD36 69B1
#      Subkey fingerprint: F133 3857 4B66 2389 866C  7682 BFFB D25F 78C7 AE83

* remotes/bonzini/tags/for-upstream:
  cpus: consistently use QEMU_CLOCK_VIRTUAL_RT for icount_warp_rt timer
  qemu-timer: rename timer_init to timer_init_tl
  scsi: fix cancellation when I/O was completed but DMA was not.
  rules.mak: Fix module build
  hw/scsi/lsi53c895a: add support for additional diag / debug registers
  qemu-common.h: optimise muldiv64 if int128 is available
  target-i386: do not memcpy in and out of xmm_regs
  target-i386: fix movntsd on big-endian hosts
  vl.c: fix regression when reading memory size from config file
  vl: Don't silently change topology when all -smp options were set
  vl: fix max_cpus check
  vl: Avoid unnecessary 'if' nesting
  9pfs: changed to use event_notifier instead of qemu_pipe
  vl.c: fix regression when reading machine type from config file
  char: restore stdio echo on resume from suspend.

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2015-01-14 18:02:47 +00:00
Paolo Bonzini f186aa976b qemu-timer: rename timer_init to timer_init_tl
timer_init is not called that often.  Free the name for an equivalent
of timer_new.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
2015-01-14 10:38:57 +01:00
Fam Zheng bb00021de0 block: Split BLOCK_OP_TYPE_COMMIT to BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET}
Like BLOCK_OP_TYPE_BACKUP_SOURCE and BLOCK_OP_TYPE_BACKUP_TARGET,
block-commit involves two asymmetric devices.

This change is not user-visible (yet), because commit only works with
device names.

But once we enable backing reference in blockdev-add, or specifying
node-name in block-commit command, we don't want the user to start two
commit jobs on the same backing chain, which will corrupt things because
of the final bdrv_swap.

Before we have per category blockers, splitting this type is still
better.

[Resolved virtio-blk dataplane conflict by replacing
BLOCK_OP_TYPE_COMMIT with both BLOCK_OP_TYPE_COMMIT_{SOURCE, TARGET}.
They are safe since the block job runs in the same AioContext as the
dataplane IOThread.
--Stefan]

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-01-13 13:43:29 +00:00
Paolo Bonzini 66552b894b coroutine: drop qemu_coroutine_adjust_pool_size
This is not needed anymore.  The new TLS-based algorithm is adaptive.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1417518350-6167-7-git-send-email-pbonzini@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2015-01-13 13:43:29 +00:00
Vladimir Sementsov-Ogievskiy c4237dfa63 block: fix spoiling all dirty bitmaps by mirror and migration
Mirror and migration use dirty bitmaps for their purposes, and since
commit [block: per caller dirty bitmap] they use their own bitmaps, not
the global one. But they use old functions bdrv_set_dirty and
bdrv_reset_dirty, which change all dirty bitmaps.

Named dirty bitmaps series by Fam and Snow are affected: mirroring and
migration will spoil all (not related to this mirroring or migration)
named dirty bitmaps.

This patch fixes this by adding bdrv_set_dirty_bitmap and
bdrv_reset_dirty_bitmap, which change concrete bitmap. Also, to prevent
such mistakes in future, old functions bdrv_(set,reset)_dirty are made
static, for internal block usage.

Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@parallels.com>
CC: John Snow <jsnow@redhat.com>
CC: Fam Zheng <famz@redhat.com>
CC: Denis V. Lunev <den@openvz.org>
CC: Stefan Hajnoczi <stefanha@redhat.com>
CC: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: John Snow <jsnow@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1417081246-3593-1-git-send-email-vsementsov@parallels.com
Signed-off-by: Max Reitz <mreitz@redhat.com>
2015-01-13 11:47:56 +00:00
Max Reitz 9f07429e88 block: JSON filenames and relative backing files
When using a relative backing file name, qemu needs to know the
directory of the top image file. For JSON filenames, such a directory
cannot be easily determined (e.g. how do you determine the directory of
a qcow2 BDS directly on top of a quorum BDS?). Therefore, do not allow
relative filenames for the backing file of BDSs only having a JSON
filename.

Furthermore, BDS::exact_filename should be used whenever possible. If
BDS::filename is not equal to BDS::exact_filename, the former will
always be a JSON object.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:56 +00:00
Max Reitz 0a82855a1a block: Get full backing filename from string
Introduce bdrv_get_full_backing_filename_from_filename(), a function
which takes the name of the backed file and a potentially relative
backing filename to produce the full (absolute) backing filename.

Use this function from bdrv_get_full_backing_filename().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2015-01-13 11:47:56 +00:00
Stefan Hajnoczi b5cf2c1b08 block: drop unused bdrv_clear_incoming_migration_all() prototype
The bdrv_clear_incoming_migration_all() function has not existed since
commit 7ea2d269cb ("block/migration:
Disable cache invalidate for incoming migration").

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1418212937-22222-1-git-send-email-stefanha@redhat.com
2014-12-12 16:55:16 +00:00
Max Reitz 5c98415b2a vmdk: Fix error for JSON descriptor file names
If vmdk blindly tries to use path_combine() using bs->file->filename as
the base file name, this will result in a bad error message for JSON
file names when calling bdrv_open(). It is better to only try
bs->file->exact_filename; if that is empty, bs->file->filename will be
useless for path_combine() and an error should be emitted (containing
bs->file->filename because desc_file_path (which is
bs->file->exact_filename) is empty).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Message-id: 1417615043-26174-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-12-12 13:14:10 +00:00
Max Reitz 5f535a941e block: Make essential BlockDriver objects public
There are some block drivers which are essential to QEMU and may not be
removed: These are raw, file and qcow2 (as the default non-raw format).
Make their BlockDriver objects public so they can be directly referenced
throughout the block layer without needing to call bdrv_find_format()
and having to deal with an error at runtime, while the real problem
occurred during linking (where raw, file or qcow2 were not linked into
qemu).

Cc: qemu-stable@nongnu.org
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:19 +01:00
Kevin Wolf 38f3ef574b raw: Prohibit dangerous writes for probed images
If the user neglects to specify the image format, QEMU probes the
image to guess it automatically, for convenience.

Relying on format probing is insecure for raw images (CVE-2008-2004).
If the guest writes a suitable header to the device, the next probe
will recognize a format chosen by the guest.  A malicious guest can
abuse this to gain access to host files, e.g. by crafting a QCOW2
header with backing file /etc/shadow.

Commit 1e72d3b (April 2008) provided -drive parameter format to let
users disable probing.  Commit f965509 (March 2009) extended QCOW2 to
optionally store the backing file format, to let users disable backing
file probing.  QED has had a flag to suppress probing since the
beginning (2010), set whenever a raw backing file is assigned.

All of these additions that allow to avoid format probing have to be
specified explicitly. The default still allows the attack.

In order to fix this, commit 79368c8 (July 2010) put probed raw images
in a restricted mode, in which they wouldn't be able to overwrite the
first few bytes of the image so that they would identify as a different
image. If a write to the first sector would write one of the signatures
of another driver, qemu would instead zero out the first four bytes.
This patch was later reverted in commit 8b33d9e (September 2010) because
it didn't get the handling of unaligned qiov members right.

Today's block layer that is based on coroutines and has qiov utility
functions makes it much easier to get this functionality right, so this
patch implements it.

The other differences of this patch to the old one are that it doesn't
silently write something different than the guest requested by zeroing
out some bytes (it fails the request instead) and that it doesn't
maintain a list of signatures in the raw driver (it calls the usual
probe function instead).

Note that this change doesn't introduce new breakage for false positive
cases where the guest legitimately writes data into the first sector
that matches the signatures of an image format (e.g. for nested virt):
These cases were broken before, only the failure mode changes from
corruption after the next restart (when the wrong format is probed) to
failing the problematic write request.

Also note that like in the original patch, the restrictions only apply
if the image format has been guessed by probing. Explicitly specifying a
format allows guests to write anything they like.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-id: 1416497234-29880-8-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:13 +01:00
Kevin Wolf 7cddd3728e block: Read only one sector for format probing
The only image format driver that even potentially accesses anything
after 512 bytes in its bdrv_probe() implementation is VMDK, which reads
a plain-text descriptor file. In practice, the field it's looking for
seems to come first and will be well within the first 512 bytes, too.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1416497234-29880-7-git-send-email-kwolf@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:12 +01:00
Max Reitz e140177d9c nbd: Change external interface to BlockBackend
Substitute BlockDriverState by BlockBackend in every globally visible
function provided by nbd.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Message-id: 1416309679-333-5-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:31:12 +01:00
Fam Zheng 20a9e77dfa block: Add bdrv_get_node_name
This returns the node name of a BDS. Remove the TODO comment and expect
the callers to be explicit.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:25:29 +01:00
Fam Zheng 04df765ab4 block: Add bdrv_next_node
Similar to bdrv_next, this traverses through graph_bdrv_states. Will be
useful to enumerate all the named nodes.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-12-10 10:25:29 +01:00
Peter Maydell 776346cd63 trivial patches for 2014-11-11
-----BEGIN PGP SIGNATURE-----
 Version: GnuPG v1.4.12 (GNU/Linux)
 
 iQEcBAABAgAGBQJUYh9vAAoJEL7lnXSkw9fbgPQH/065L5+SpaJR1Nte9Lz3N2s1
 a6tGSI22yu85tKvYCdYjeoVHSkSTyR57FdTfUd2xc2QPj+J4sWXpA81KILBGTJUp
 NMpmLpWg4LOh8Ek4ViRgmFFdryzIFa4dT4gc1AcSAIAQ6jsgK1dM7m5kfncC3TN0
 TUs248vJ2i/DaE0k8TOeJqxJTqInoFttlJEqG7RD+V5JznokE4zpFNXHDGx9BptE
 W2J38GJ/TKRPe9UrHMKZI1r6+ZBdXyE/CaqsNNKLJdqrHgSQuAyK/PS6dQbM4BLg
 M1qdP7Tp0wOlvv9qoEZMOEiUsi54XPqLgaLMbW74Yp5X459fqmLW2imy49pHXt8=
 =klsW
 -----END PGP SIGNATURE-----

Merge remote-tracking branch 'remotes/mjt/tags/pull-trivial-patches-2014-11-11' into staging

trivial patches for 2014-11-11

# gpg: Signature made Tue 11 Nov 2014 14:38:39 GMT using RSA key ID A4C3D7DB
# gpg: Good signature from "Michael Tokarev <mjt@tls.msk.ru>"
# gpg:                 aka "Michael Tokarev <mjt@corpit.ru>"
# gpg:                 aka "Michael Tokarev <mjt@debian.org>"

* remotes/mjt/tags/pull-trivial-patches-2014-11-11:
  block: Fix comment for bdrv_co_get_block_status
  sysbus: Correct SYSTEM_BUS(obj) defines
  target-i386: cpu: keeping function parameters alignment on new line
  xen-hvm: Remove redundant variable 'xstate'
  coroutine-sigaltstack: Change jmp_buf to sigjmp_buf
  pc-bios: petalogix-s3adsp1800.dtb: Use 'xlnx, xps-ethernetlite-2.00.a' instead of 'xlnx, xps-ethernetlite-2.00.b'
  gdbstub: Add a missing case of signal number translation in gdbstub
  numa: make 'info numa' take into account hotplugged memory
  slirp/smbd: modify/set several parameters in generated smbd.conf
  qemu-doc.texi: fix typos in x509 examples
  icc_bus: fix typo ICC_BRIGDE -> ICC_BRIDGE

Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-11-11 14:50:10 +00:00
Fam Zheng 705be728c0 block: Fix comment for bdrv_co_get_block_status
It returns more information than binary, fix the comment.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Michael Tokarev <mjt@tls.msk.ru>
2014-11-11 17:36:19 +03:00
Stefan Hajnoczi 5b98db0ad3 block: add bdrv_drain()
Now that op blockers are in use, we can ensure that no other sources are
generating I/O on a BlockDriverState.  Therefore it is possible to drain
requests for a single BDS.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-7-git-send-email-stefanha@redhat.com
2014-11-03 11:41:49 +00:00
Stefan Hajnoczi dec7d421f8 blockjob: add block_job_defer_to_main_loop()
Block jobs will run in the BlockDriverState's AioContext, which may not
always be the QEMU main loop.

There are some block layer APIs that are either not thread-safe or risk
lock ordering problems.  This includes bdrv_unref(), bdrv_close(), and
anything that calls bdrv_drain_all().

The block_job_defer_to_main_loop() API allows a block job to schedule a
function to run in the main loop with the BlockDriverState AioContext
held.

This function will be used to perform cleanup and backing chain
manipulations in block jobs.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Message-id: 1413889440-32577-6-git-send-email-stefanha@redhat.com
2014-11-03 11:41:49 +00:00
Max Reitz 7748543420 block: Add status callback to bdrv_amend_options()
Depending on the changed options and the image format,
bdrv_amend_options() may take a significant amount of time. In these
cases, a way to be informed about the operation's status is desirable.

Since the operation is rather complex and may fundamentally change the
image, implementing it as AIO or a coroutine does not seem feasible. On
the other hand, implementing it as a block job would be significantly
more difficult than a simple callback and would not add benefits other
than progress report to the amending operation, because it should not
actually be run as a block job at all.

A callback may not be very pretty, but it's very easy to implement and
perfectly fits its purpose here.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414404776-4919-2-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:48 +00:00
Max Reitz ef6dbf1e46 blockjob: Add "ready" field
When a block job signals readiness, this is currently reported only
through QMP. If qemu wants to use block jobs for internal tasks, there
needs to be another way to correctly detect when a block job may be
completed.

For this reason, introduce a bool "ready" which is set when the block
job may be completed.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Message-id: 1414159063-25977-6-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:48 +00:00
Max Reitz 345f9e1b04 blockjob: Introduce block_job_complete_sync()
Implement block_job_complete_sync() by doing the exact same thing as
block_job_cancel_sync() does, only with calling block_job_complete()
instead of block_job_cancel().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1414159063-25977-5-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:48 +00:00
Max Reitz 94054183da qcow2: Optimize bdrv_make_empty()
bdrv_make_empty() is currently only called if the current image
represents an external snapshot that has been committed to its base
image; it is therefore unlikely to have internal snapshots. In this
case, bdrv_make_empty() can be greatly sped up by emptying the L1 and
refcount table (while having the dirty flag set, which only works for
compat=1.1) and creating a trivial refcount structure.

If there are snapshots or for compat=0.10, fall back to the simple
implementation (discard all clusters).

[Applied s/clusters/cluster/ typo fix suggested by Eric Blake
--Stefan]

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-id: 1414159063-25977-4-git-send-email-mreitz@redhat.com
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 11:41:48 +00:00
Peter Lieven 2647fab57d BlockLimits: introduce max_transfer_length
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-11-03 09:48:41 +00:00
Max Reitz 9ebd844805 block: Add qemu_{,try_}blockalign0()
These functions call their non-0-counterparts and then fill the
allocated buffer with 0 (if the allocation has been successful).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-23 15:34:01 +02:00
Markus Armbruster a7f53e26a6 block: Lift device model API into BlockBackend
Move device model attachment / detachment and the BlockDevOps device
model callbacks and their wrappers from BlockDriverState to
BlockBackend.

Wrapper calls in block.c change from

    bdrv_dev_FOO_cb(bs, ...)

to

    if (bs->blk) {
        bdrv_dev_FOO_cb(bs->blk, ...);
    }

No change, because both bdrv_dev_change_media_cb() and
bdrv_dev_resize_cb() do nothing when no device model is attached, and
a device model can be attached only when bs->blk.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 14:03:50 +02:00
Markus Armbruster d829a2115f block/qapi: Convert qmp_query_block() to BlockBackend
Much more command code needs conversion.  I start with this one
because it's using bdrv_dev_* functions, which I'm about to lift into
BlockBackend.

While there, give bdrv_query_info() internal linkage.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Benoît Canet <benoit.canet@nodalink.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 14:03:50 +02:00
Markus Armbruster 097310b53e block: Rename BlockDriverCompletionFunc to BlockCompletionFunc
I'll use it with block backends shortly, and the name is going to fit
badly there.  It's a block layer thing anyway, not just a block driver
thing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:27 +02:00
Markus Armbruster 7c84b1b831 block: Rename BlockDriverAIOCB* to BlockAIOCB*
I'll use BlockDriverAIOCB with block backends shortly, and the name is
going to fit badly there.  It's a block layer thing anyway, not just a
block driver thing.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:27 +02:00
Markus Armbruster 7f06d47eff block: Merge BlockBackend and BlockDriverState name spaces
BlockBackend's name space is separate only to keep the initial patches
simple.  Time to merge the two.

Retain bdrv_find() and bdrv_get_device_name() for now, to keep this
series manageable.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00
Markus Armbruster bfb197e0d9 block: Eliminate BlockDriverState member device_name[]
device_name[] can become non-empty only in bdrv_new_root() and
bdrv_move_feature_fields().  The latter is used only to undo damage
done by bdrv_swap().  The former is called only by blk_new_with_bs().
Therefore, when a BlockDriverState's device_name[] is non-empty, then
it's been created with a BlockBackend, and vice versa.  Furthermore,
blk_new_with_bs() keeps the two names equal.

Therefore, device_name[] is redundant.  Eliminate it.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-10-20 13:41:26 +02:00