Commit Graph

757 Commits (be21788495fdc8251b04dd4bfd0cdce95c49d75b)

Author SHA1 Message Date
Kevin Wolf f2d953ec31 block: Catch duplicate IDs in bdrv_new()
Since commit f298d071, block devices added with blockdev-add don't have
a QemuOpts around in dinfo->opts. Consequently, we can't rely any more
on QemuOpts catching duplicate IDs for block devices.

This patch adds a new check for duplicate IDs to bdrv_new(), and moves
the existing check that the ID isn't already taken for a node-name there
as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-04-22 12:00:28 +02:00
Kevin Wolf 98522f63f4 block: Add errp to bdrv_new()
This patch adds an errp parameter to bdrv_new() and updates all its
callers. The next patches will make use of this in order to check for
duplicate IDs. Most of the callers know that their ID is fine, so they
can simply assert that there is no error.

Behaviour doesn't change with this patch yet as bdrv_new() doesn't
actually assign errors to errp.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2014-04-22 12:00:20 +02:00
Kevin Wolf 636ea3708c block: Remove -errno return value from bdrv_assign_node_name
It takes an errp argument. That's enough for error handling.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-04-22 11:57:02 +02:00
Fam Zheng b8afb520e4 block: Handle error of bdrv_getlength in bdrv_create_dirty_bitmap
bdrv_getlength could fail, check the return value before using it.
Return NULL and set errno if it fails. Callers are updated to handle
the error case.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-04-22 11:57:02 +02:00
Kevin Wolf 9ce10c0bdc block: Check bdrv_getlength() return value in bdrv_make_zero()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-22 11:57:02 +02:00
Kevin Wolf da15ee5134 block: Catch integer overflow in bdrv_rw_co()
Insanely large requests could cause an integer overflow in
bdrv_rw_co() while converting sectors to bytes. This patch catches the
problem and returns an error (if we hadn't overflown the integer here,
bdrv_check_byte_request() would have rejected the request, so we're not
breaking anything that was supposed to work before).

We actually do have a test case that triggers behaviour where we
accidentally let such a request pass, so that it would return success,
but read 0 bytes instead of the requested 4 GB. It fails now like it
should.

If the vdi block driver wants to be able to deal with huge images, it
can't read the whole block bitmap at once into memory like it does
today, but needs to use a metadata cache like qcow2 does.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-04-22 11:57:02 +02:00
Kevin Wolf 1dd3a44753 block: Limit size to INT_MAX in bdrv_check_byte_request()
Commit 8f4754ed intended to protect against integer overflow bugs in
block drivers by making sure that a single request that is passed to
drivers is no longer than INT_MAX bytes.

However, meanwhile there are some callers that don't use that code path
any more but call bdrv_check_byte_request() directy, so let's add a
check there as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-22 11:57:02 +02:00
Kevin Wolf 54db38a479 block: Fix nb_sectors check in bdrv_check_byte_request()
nb_sectors is signed, check for negative values.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-22 11:57:02 +02:00
Kevin Wolf f187743acd block: Check bdrv_getlength() return value in bdrv_append_temp_snapshot()
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-04 19:35:52 +02:00
Kevin Wolf b998875dcf block: Fix snapshot=on for protocol parsed from filename
Since commit 9fd3171a, BDRV_O_SNAPSHOT uses an option QDict to specify
the originally requested image as the backing file of the newly created
temporary snapshot. This means that the filename is stored in
"file.filename", which is an option that is not parsed for protocol
names. Therefore things like -drive file=nbd:localhost:10809 were
broken because it looked for a local file with the literal name
'nbd:localhost:10809'.

This patch changes the way BDRV_O_SNAPSHOT works once again. We now open
the originally requested image as normal, and then do a similar
operation as for live snapshots to put the temporary snapshot on top.
This way, both driver specific options and parsed filenames work.

As a nice side effect, this results in code movement to factor
bdrv_append_temp_snapshot() out. This is a good preparation for moving
its call to drive_init() and friends eventually.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-04 19:35:51 +02:00
Kevin Wolf e3fa4bfa72 block: Don't parse 'filename' option
When using the QDict option 'filename', it is supposed to be interpreted
literally. The code did correctly avoid guessing the protocol from any
string before the first colon, but it still called bdrv_parse_filename()
which would, for example, incorrectly remove a 'file:' prefix in the
raw-posix driver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-04-04 17:10:25 +02:00
Kevin Wolf 8f4754ede5 block: Limit request size (CVE-2014-0143)
Limiting the size of a single request to INT_MAX not only fixes a
direct integer overflow in bdrv_check_request() (which would only
trigger bad behaviour with ridiculously huge images, as in close to
2^64 bytes), but can also prevent overflows in all block drivers.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-04-01 15:22:35 +02:00
Kevin Wolf 5a8a30db47 block: Add error handling to bdrv_invalidate_cache()
If it returns an error, the migrated VM will not be started, but qemu
exits with an error message.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-03-19 09:39:41 +01:00
Markus Armbruster c3adb58fe0 blockdev: Refuse to open encrypted image unless paused
Opening an encrypted image takes an additional step: setting the key.
Between open and the key set, the image must not be used.

We have some protection against accidental use in place: you can't
unpause a guest while we're missing keys.  You can, however, hot-plug
block devices lacking keys into a running guest just fine, or insert
media lacking keys.  In the latter case, notifying the guest of the
insert is delayed until the key is set, which may suffice to protect
at least some guests in common usage.

This patch makes the protection apply in more cases, in a rather
heavy-handed way: it doesn't let you open encrypted images unless
we're in a paused state.

It doesn't extend the protection to users other than the guest (block
jobs?).  Use of runstate_check() from block.c is disgusting.  Best I
can do right now.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-14 16:24:42 +01:00
Max Reitz 9562f69cfd block: Unlink temporary file
If the image file cannot be opened and was created as a temporary file,
it should be deleted; thus, in this case, we should jump to the
"unlink_and_fail" label and not just to "fail".

Reported-by: Benoît Canet <benoit@irqsave.net>
Signed-off-by: Max Reitz <mreitz@redhat.com>
Acked-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:42:24 +01:00
Benoît Canet b5042a3622 block: Rewrite the snapshot authorization mechanism for block filters.
This patch keep the recursive way of doing things but simplify it by giving
two responsabilities to all block filters implementors.

They will need to do two things:

-Set the is_filter field of their block driver to true.

-Implement the bdrv_recurse_is_first_non_filter method of their block driver like
it is done on the Quorum block driver. (block/quorum.c)

[Paolo Bonzini <pbonzini@redhat.com> pointed out that this patch changes
the semantics of blkverify, which now recurses down both bs->file and
s->test_file.
-- Stefan]

Reported-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:23:27 +01:00
Max Reitz 938789ea92 block: bs->drv may be NULL in bdrv_debug_resume()
Currently, bdrv_debug_resume() requires every bs->drv in the BDS stack
to be NULL until a bs->drv with an implementation of bdrv_debug_resume()
is found. For a normal function, this would be fine, but this is a
function for debugging purposes and should therefore allow intermediate
BDS not to have a driver (i.e., be "ejected"). Otherwise, it is hard to
debug such situations.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:23:27 +01:00
Kevin Wolf 3456a8d185 block: Update image size in bdrv_invalidate_cache()
After migration has completed, we call bdrv_invalidate_cache() so that
drivers which cache some data drop their stale copy of the data and
reread it from the image file to get a new version of data that the
source modified while the migration was running.

Reloading metadata from the image file is useless, though, if the size
of the image file stays stale (this is a value that is cached for all
image formats in block.c). Reads from (meta)data after the old EOF
return only zeroes, causing image corruption.

We need to update bs->total_sectors in all layers that could potentially
have changed their size (i.e. backing files are not a concern - if they
are changed, we're in bigger trouble)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-03-13 14:23:27 +01:00
Kevin Wolf eb909c7f72 block: Fix error path segfault in bdrv_open()
Using an invalid option for a block device that is opened with
BDRV_O_PROTOCOL led to drv = NULL, and when trying to include the driver
name in the error message, qemu dereferenced it:

    $ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,file.foo=bar
    Segmentation fault (core dumped)

With this patch applied, the expected error message is printed:

    $ x86_64-softmmu/qemu-system-x86_64 -drive file=/tmp/test.qcow2,file.foo=bar
    qemu-system-x86_64: -drive file=/tmp/test.qcow2,file.foo=bar: could
    not open disk image /tmp/test.qcow2: Block protocol 'file' doesn't
    support the option 'foo'

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-03-06 17:29:24 +01:00
Max Reitz cd5d031e75 block: Keep "filename" option after parsing
Currently, bdrv_file_open() always removes the "filename" option from
the options QDict after bdrv_parse_filename() has been (successfully)
called. However, for drivers with bdrv_needs_filename, it makes more
sense for bdrv_parse_filename() to overwrite the "filename" option and
for bdrv_file_open() to fetch the filename from there.

Since there currently are no drivers that implement
bdrv_parse_filename() and have bdrv_needs_filename set, this does not
change current behavior.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-03-06 16:18:01 +01:00
Benoît Canet 90ce8a061b block: make bdrv_swap rebuild the bs graph node list field.
Moving only the node_name one field could lead to some inconsitencies where a
node_name was defined on a bs which was not registered in the graph node list.

bdrv_swap between a named node bs and a non named node bs would lead to this.

bdrv_make_anon would then crash because it would try to remove the bs from the
graph node list while it is not in it.

This patch remove named node bses from the graph node list before doing the swap
then insert them back.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-03-06 11:33:10 +01:00
Kevin Wolf 47ea2de2d6 block: Fix bs->request_alignment assertion for bs->sg=1
For sg backends, bs->request_alignment is meaningless and may be 0.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Acked-by: Paolo Bonzini <pbonzini@redhat.com>
2014-03-05 16:58:37 +01:00
Amit Shah 69bef7931e block: use /var/tmp instead of /tmp for -snapshot
If TMPDIR is not specified, the default was to use /tmp for the working
copy of the block devices.  Update this to /var/tmp instead, so systems
using tmp-on-tmpfs don't end up inadvertently using RAM for the block
device.

Signed-off-by: Amit Shah <amit.shah@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-02-28 18:59:07 +01:00
Max Reitz f7d9fd8c72 block: Remove bdrv_open_image()'s force_raw option
This option is now unnecessary since specifying BDRV_O_PROTOCOL as flag
will do exactly the same.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz 5acd9d81e1 block: Reuse success path from bdrv_open()
The fail and success paths of bdrv_file_open() may be further shortened
by reusing code already existent in bdrv_open(). This includes
bdrv_file_open() not taking the reference to options which allows the
removal of QDECREF(options) in that function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz 5469a2a688 block: Handle bs->options in bdrv_open() only
The fail paths of bdrv_file_open() and bdrv_open() naturally exhibit
similarities, thus it is possible to reuse the one from bdrv_open() and
shorten the one in bdrv_file_open() accordingly.

Also, setting bs->options in bdrv_file_open() is not necessary if it is
already done in bdrv_open().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz d4446eae63 block: Remove bdrv_new() from bdrv_file_open()
Change bdrv_file_open() to take a simple pointer to an already existing
BDS instead of an indirect one. The BDS will be created in bdrv_open()
if necessary.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz 5d12aa63c7 block: Reuse reference handling from bdrv_open()
Remove the reference parameter and the related handling code from
bdrv_file_open(), since it exists in bdrv_open() now as well.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz 2e40134bfd block: Make bdrv_file_open() static
Add the bdrv_open() option BDRV_O_PROTOCOL which results in passing the
call to bdrv_file_open(). Additionally, make bdrv_file_open() static and
therefore bdrv_open() the only way to call it.

Consequently, all existing calls to bdrv_file_open() have to be adjusted
to use bdrv_open() with the BDRV_O_PROTOCOL flag instead.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz ddf5636dc9 block: Add reference parameter to bdrv_open()
Allow bdrv_open() to handle references to existing block devices just as
bdrv_file_open() is already capable of.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:22 +01:00
Max Reitz f67503e5bd block: Change BDS parameter of bdrv_open() to **
Make bdrv_open() take a pointer to a BDS pointer, similarly to
bdrv_file_open(). If a pointer to a NULL pointer is given, bdrv_open()
will create a new BDS with an empty name; if the BDS pointer is not
NULL, that existing BDS will be reused (in the same way as bdrv_open()
already did).

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-02-21 21:02:21 +01:00
Kevin Wolf e6dc8a1f83 block: Fix bdrv_is_first_non_filter()
Consider top level BlockDriverStates as well.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Tested-by: Benoit Canet <benoit@irqsave.net>
2014-02-21 21:02:21 +01:00
Peter Maydell 4c0c9bbe78 Merge remote-tracking branch 'remotes/qmp-unstable/queue/qmp' into staging
* remotes/qmp-unstable/queue/qmp:
  monitor: Add object_add class argument completion.
  monitor: Add object_del id argument completion.
  monitor: Add device_add device argument completion.
  monitor: Add device_del id argument completion.
  qmp: expose list of supported character device backends
  Use error_is_set() only when necessary
  QMP: allow JSON dict arguments in qmp-shell
  hmp: migrate command (without -d) now blocks correctly

Conflicts:
	blockdev.c

[PMM: resolved trivial conflict in blockdev.c]
Signed-off-by: Peter Maydell <peter.maydell@linaro.org>
2014-02-20 12:10:23 +00:00
Markus Armbruster 84d18f065f Use error_is_set() only when necessary
error_is_set(&var) is the same as var != NULL, but it takes
whole-program analysis to figure that out.  Unnecessarily hard for
optimizers, static checkers, and human readers.  Dumb it down to
obvious.

Gets rid of several dozen Coverity false positives.

Note that the obvious form is already used in many places.

Signed-off-by: Markus Armbruster <armbru@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Andreas Färber <afaerber@suse.de>
Signed-off-by: Luiz Capitulino <lcapitulino@redhat.com>
2014-02-17 11:57:23 -05:00
Benoît Canet 0c5e94ee83 block: Open by reference will try device then node_name.
Since we introduced node_name for named bs of the graph modify the opening by
reference to use it as a fallback.

This patch also enforce the separation of the device id and graph node
namespaces.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-02-14 18:05:39 +01:00
Benoît Canet dd67fa5052 block: Relax bdrv_lookup_bs constraints.
The following patch will reuse bdrv_lookup_bs in order to open images by
references so the rules of usage of bdrv_lookup_bs must be relaxed a bit.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-02-14 18:05:39 +01:00
Kevin Wolf e96126ffa5 block: Fix 32 bit truncation in mark_request_serialising()
On 32 bit hosts, size_t is too small for align as the bitmask
~(align - 1) will zero out the higher 32 bits of the offset.

While at it, change the local overlap_bytes variable to unsigned to
match the field in BdrvTrackedRequest.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
2014-02-09 09:12:39 +01:00
Kevin Wolf 5f5bcd80f8 block: Don't call ROUND_UP with negative values
The behaviour of the ROUND_UP macro with negative numbers isn't obvious.
It happens to do the right thing in this please, but better avoid it.

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
2014-02-09 09:12:39 +01:00
Kevin Wolf af91f9a73c block: bdrv_aligned_pwritev: Assert overlap range
This adds assertions that the request that we actually end up passing to
the block driver (which includes RMW data and has therefore potentially
been rounded to alignment boundaries) is fully covered by the
overlap_{offset,size} fields of the associated BdrvTrackedRequest.

Suggested-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
2014-02-09 09:12:39 +01:00
Kevin Wolf 99c4a85ce6 block: Fix memory leaks in bdrv_co_do_pwritev()
The error path for a failure in one of the two bdrv_aligned_preadv()
calls leaked head_buf or tail_buf, respectively. This fixes the memory
leak.

Reported-by: Laszlo Ersek <lersek@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Laszlo Ersek <lersek@redhat.com>
2014-02-09 09:12:39 +01:00
Kevin Wolf 765003db02 block: Fail gracefully with missing filename
This fixes a regression introduced in commit 2a05cbe42 ('block: Allow
block devices without files'):

$ qemu-system-x86_64 -drive driver=file
qemu-system-x86_64: block.c:892: bdrv_open_common: Assertion
`!drv->bdrv_needs_filename || filename != ((void *)0)' failed.

Now the respective check must be performed not only in bdrv_file_open(),
but also in bdrv_open().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2014-02-09 09:12:38 +01:00
Kevin Wolf d5103588aa block: Switch bdrv_io_limits_intercept() to byte granularity
Request sizes used to be rounded down to the next sector boundary,
allowing to bypass the I/O limit. Now all requests are accounted for
with their exact byte size.

Reported-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:28 +01:00
Kevin Wolf 9e1cb96d9a qemu-iotests: Test pwritev RMW logic
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:25 +01:00
Kevin Wolf 8407d5d7e2 block: Make bdrv_pwrite() a bdrv_prwv_co() wrapper
Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_pwritev().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf a3ef657185 block: Make bdrv_pread() a bdrv_prwv_co() wrapper
Instead of implementing the alignment adjustment here, use the now
existing functionality of bdrv_co_do_preadv().

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf 775aa8b6e0 block: Change coroutine wrapper to byte granularity
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf 28de2dcd88 block: Assert serialisation assumptions in pwritev
If a request calls wait_serialising_requests() and actually has to wait
in this function (i.e. a coroutine yield), other requests can run and
previously read data (like the head or tail buffer) could become
outdated. In this case, we would have to restart from the beginning to
read in the updated data.

However, we're lucky and don't actually need to do that: A request can
only wait in the first call of wait_serialising_requests() because we
mark it as serialising before that call, so any later requests would
wait. So as we don't wait in practice, we don't have to reload the data.

This is an important assumption that may not be broken or data
corruption will happen. Document it with some assertions.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:03 +01:00
Kevin Wolf 3b8242e0ea block: Align requests in bdrv_co_do_pwritev()
This patch changes bdrv_co_do_pwritev() to actually be what its name
promises. If requests aren't properly aligned, it performs a RMW.

Requests touching the same block are serialised against the RMW request.
Further optimisation of this is possible by differentiating types of
requests (concurrent reads should actually be okay here).

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf 6460440f34 block: Allow wait_serialising_requests() at any point
We can only have a single wait_serialising_requests() call per request
because otherwise we can run into deadlocks where requests are waiting
for each other. The same is true when wait_serialising_requests() is not
at the very beginning of a request, so that other requests can be issued
between the start of the tracking and wait_serialising_requests().

Fix this by changing wait_serialising_requests() to ignore requests that
are already (directly or indirectly) waiting for the calling request.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf 7327145f63 block: Make overlap range for serialisation dynamic
Copy on Read wants to serialise with all requests touching the same
cluster, so wait_serialising_requests() rounded to cluster boundaries.
Other users like alignment RMW will have different requirements, though
(requests touching the same sector), so make it dynamic.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf 2dbafdc012 block: Generalise and optimise COR serialisation
Change the API so that specific requests can be marked serialising. Only
these requests are checked for overlaps then.

This means that during a Copy on Read operation, not all requests
overlapping other requests are serialised any more, but only those that
actually overlap with the specific COR request.

Also remove COR from function and variable names because this
functionality can be useful in other contexts.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf ec746e10cb block: Make zero-after-EOF work with larger alignment
Odd file sizes could make bdrv_aligned_preadv() shorten the request in
non-aligned ways. Fix it by rounding to the required alignment instead
of 512 bytes.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf 65afd211c7 block: Allow waiting for overlapping requests between begin/end
Previously, it was not possible to use wait_for_overlapping_requests()
between tracked_request_begin()/end() because it would wait for itself.

Ignore the current request in the overlap check and run more of the
bdrv_co_do_preadv/pwritev code with a BdrvTrackedRequest present.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf 793ed47a7a block: Switch BdrvTrackedRequest to byte granularity
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf 6601553e27 block: Introduce bdrv_co_do_pwritev()
This is going to become the bdrv_co_do_preadv() equivalent for writes.
In this patch, however, just a function taking byte offsets is created,
it doesn't align anything yet.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf 244eadef5c block: write: Handle COR dependency after I/O throttling
First waiting for all COR requests to complete and calling the
throttling function afterwards means that the request could be delayed
and we still need to wait for the COR request even if it was issued only
after the throttled write request.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf b404f72036 block: Introduce bdrv_aligned_pwritev()
This separates the part of bdrv_co_do_writev() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf 1b0288ae7f block: Introduce bdrv_co_do_preadv()
Similar to bdrv_pread(), which aligns byte-aligned request to 512 byte
sectors, bdrv_co_do_preadv() takes a byte-aligned request and aligns it
to the alignment specified in bs->request_alignment.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:02 +01:00
Kevin Wolf d0c7f642f5 block: Introduce bdrv_aligned_preadv()
This separates the part of bdrv_co_do_readv() that needs to happen
before the request is modified to match the backend alignment, and a
part that needs to be executed afterwards and passes the request to the
BlockDriver.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:02 +01:00
Paolo Bonzini c25f53b06e raw: Probe required direct I/O alignment
Add a bs->request_alignment field that contains the required
offset/length alignment for I/O requests and fill it in the raw block
drivers. Use ioctls if possible, else see what alignment it takes for
O_DIRECT to succeed.

While at it, also expose the memory alignment requirements, which may be
(and in practice are) different from the disk alignment requirements.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:02 +01:00
Paolo Bonzini 1b7fd72955 block: rename buffer_alignment to guest_block_size
The alignment field is now set to the value that is promised to the
guest, rather than required by the host.  The next patches will make
QEMU aware of the host-provided values, so make this clear.

The alignment is also not about memory buffers, but about the sectors on
the disk, change the documentation of the field.

At this point, the field is set by the device emulation, but completely
ignored by the block layer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf 339064d506 block: Don't use guest sector size for qemu_blockalign()
bs->buffer_alignment is set by the device emulation and contains the
logical block size of the guest device. This isn't something that the
block layer should know, and even less something to use for determining
the right alignment of buffers to be used for the host.

The new BlockLimits field opt_mem_alignment tells the qemu block layer
the optimal alignment to be used so that no bounce buffer must be used
in the driver.

This patch may change the buffer alignment from 4k to 512 for all
callers that used qemu_blockalign() with the top-level image format
BlockDriverState. The value was never propagated to other levels in the
tree, so in particular raw-posix never required anything else than 512.

While on disks with 4k sectors direct I/O requires a 4k alignment,
memory may still be okay when aligned to 512 byte boundaries. This is
what must have happened in practice, because otherwise this would
already have failed earlier. Therefore I don't expect regressions even
with this intermediate state. Later, raw-posix can implement the hook
and expose a different memory alignment requirement.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:01 +01:00
Kevin Wolf 1ff735bdc4 block: Detect unaligned length in bdrv_qiov_is_aligned()
For an O_DIRECT request to succeed, it's not only necessary that all
base addresses in the qiov are aligned, but also that each length in it
is aligned.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
2014-01-24 17:40:01 +01:00
Kevin Wolf 355ef4ac95 block: Update BlockLimits when they might have changed
When reopening with different flags, or when backing files disappear
from the chain, the limits may change. Make sure they get updated in
these cases.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf 466ad822de block: Inherit opt_transfer_length
When there is a format driver between the backend, it's not guaranteed
that exposing the opt_transfer_length for the format driver results in
the optimal requests (because of fragmentation etc.), but it can't make
things worse, so let's just do it.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoît Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf d34682cd4a block: Move initialisation of BlockLimits to bdrv_refresh_limits()
This function separates filling the BlockLimits from bdrv_open(), which
allows it to call it from other operations which may change the limits
(e.g. modifications to the backing file chain or bdrv_reopen)

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 17:40:01 +01:00
Kevin Wolf dabfa6cc2e block: Fix bdrv_commit return value
bdrv_commit() could return 0 or 1 on success, depending on whether or
not the last sector was allocated in the overlay and whether the overlay
format had a .bdrv_make_empty callback.

Most callers ignored it, but qemu-img commit would print an error
message while the operation actually succeeded.

Also clean up the handling of I/O errors to return the real error code
instead of -EIO.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2014-01-24 16:53:51 +01:00
Jeff Cody 72706ea4cd block: resize backing file image during offline commit, if necessary
Currently, if an image file is logically larger than its backing file,
committing it via 'qemu-img commit' will fail.

For instance, if we have a base image with a virtual size 10G, and a
snapshot image of size 20G, then committing the snapshot offline with
'qemu-img commit' will likely fail.

This will automatically attempt to resize the base image, if the
snapshot image to be committed is larger.

Signed-off-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:12:49 +01:00
Benoît Canet 212a5a8f09 block: Create authorizations mechanism for external snapshot and resize.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:07:08 +01:00
Benoît Canet 12d3ba821d qmp: Allow to change password on named block driver states.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>

There was two candidate ways to implement named node manipulation:

1)
{ 'command': 'block_passwd', 'data': {'*device': 'str',
                                      '*node-name': 'str', 'password': 'str'}
}

2)

{ 'command': 'block_passwd', 'data': {'device': 'str',
                                      '*device-is-node': 'bool',
                                      'password': 'str'} }

Luiz proposed 1 and says 2 was an abuse of the QMP interface and proposed to
rewrite the QMP block interface for 2.0.

Luiz does not like in 1 the fact that 2 fields are optional but one of them must
be specified leading to an abuse of the QMP semantic.

Kevin argumented that 2 what a clear abuse of the device field and would not be
practical when reading fast some log file because the user would read "device"
and think that a device is manipulated when it's in fact a node name.
Documentation of 1 make it pretty clear what to do for the user.

Kevin argued that all bs are node including devices ones so 2 does not make
sense.

Kevin also argued that rewriting the QMP block interface would not make disapear
the current one.

Kevin pushed the argument that making the QAPI generator compatible with the
semantic of the operation would need a rewrite that no one has done yet.

A vote has been done on the list to elect the version to use and 1 won.

For reference the complete thread is:
"[Qemu-devel] [PATCH V4 4/7] qmp: Allow to change password on names block driver
states."

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:07:08 +01:00
Benoît Canet c13163fba1 qmp: Add QMP query-named-block-nodes to list the named BlockDriverState nodes.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:07:08 +01:00
Benoît Canet 6913c0c2ce block: Allow the user to define "node-name" option both on command line and QMP.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 16:06:47 +01:00
Benoît Canet dc364f4cdc block: Add bs->node_name to hold the name of a bs node of the bs graph.
Add the minimum of code to prepare for the following patches.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-24 14:33:01 +01:00
Peter Feiner d80ac658f2 block: fix backing file segfault
When a backing file is opened such that (1) a protocol is directly
used as the block driver and (2) the block driver has bdrv_file_open,
bdrv_open_backing_file segfaults. The problem arises because
bdrv_open_common returns without setting bd->backing_hd->file.

To effect (1), you seem to have to use the -F flag in qemu-img. There
are several block drivers that satisfy (2), such as "file" and "nbd".
Here are some concrete examples:

    #!/bin/bash

    echo Test file format
    ./qemu-img create -f file base.file 1m
    ./qemu-img create -f qcow2 -F file -o backing_file=base.file\
        file-overlay.qcow2
    ./qemu-img convert -O raw file-overlay.qcow2 file-convert.raw

    echo Test nbd format
    SOCK=$PWD/nbd.sock
    ./qemu-img create -f raw base.raw 1m
    ./qemu-nbd -t -k $SOCK base.raw &
    trap "kill $!" EXIT
    while ! test -e $SOCK; do sleep 1; done
    ./qemu-img create -f qcow2 -F nbd -o backing_file=nbd:unix:$SOCK\
        nbd-overlay.qcow2
    ./qemu-img convert -O raw nbd-overlay.qcow2 nbd-convert.raw

Without this patch, the two qemu-img convert commands segfault.

This is a regression that was introduced in v1.7 by
dbecebddfa.

Signed-off-by: Peter Feiner <peter@gridcentric.ca>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 13:47:52 +01:00
Max Reitz 505d758334 block: Allow recursive "file"s
It should be possible to use a format as a driver for a file which in
turn requires another file, i.e., nesting file formats.

Allowing nested file formats results in e.g. qcow2 BlockDriverStates
never being directly passed to bdrv_open_common() from bdrv_file_open(),
but instead being handed through bdrv_open(). This changes the error
message when trying to give a filename to qcow2, i.e. trying to use it
as a driver for the protocol level. Therefore, change the reference
output of I/O test 051 accordingly.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:18 +01:00
Max Reitz 054963f8f0 block: Use bdrv_open_image() in bdrv_open()
Using bdrv_open_image() instead of bdrv_file_open() directly in
bdrv_open() is easier.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:18 +01:00
Max Reitz da557aac18 block: Add bdrv_open_image()
Add a common function for opening images to be used for block drivers
specified through BlockdevRefs in an option QDict. The difference from
bdrv_file_open() is that this function may invoke bdrv_open() instead,
allowing auto-detection of the driver to be used; and second, it
automatically extracts the BlockdevRef from the option QDict.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:18 +01:00
Max Reitz 2a05cbe426 block: Allow block devices without files
blkdebug and blkverify will, in order to retain compatibility, not
support the field "file" implicitly through bdrv_open(). In order to be
able to use those drivers without giving a filename anyway, it is
necessary to be able to have block devices without files implicitly
opened by bdrv_open(). This is the case, if there was neither a file
name, a reference to an existing block device to use as a file nor
options specific to the file.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:17 +01:00
Max Reitz 2258e3fe20 block: Pass reference to bdrv_file_open()
With that now being possible, bdrv_open() should try to extract a block
device reference from the options and pass it to bdrv_file_open().

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:17 +01:00
Max Reitz 72daa72eee block: Allow reference for bdrv_file_open()
Allow specifying a reference to an existing block device (by name) for
bdrv_file_open() instead of a filename and/or options.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2014-01-22 12:07:17 +01:00
Peter Lieven 3d94ce60ae block: expect get_block_status errors in bdrv_make_zero
during testing around with 4k LUNs a bad target implementation
triggert an -EIO in iscsi_get_block_status, but it got never caught
resulting in an infinite loop.

CC: qemu-stable@nongnu.org
Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-12-13 14:49:50 +01:00
Stefan Hajnoczi 0b06ef3bdd block: clean up bdrv_drain_all() throttling comments
Since cc0681c454 ("block: Enable the new
throttling code in the block layer.") bdrv_drain_all() no longer spins.
The code used to look as follows:

  do {
      busy = qemu_aio_wait();

      /* FIXME: We do not have timer support here, so this is effectively
       * a busy wait.
       */
      QTAILQ_FOREACH(bs, &bdrv_states, list) {
          while (qemu_co_enter_next(&bs->throttled_reqs)) {
              busy = true;
          }
      }
  } while (busy);

Note that throttle requests are kicked but I/O throttling limits are
still in effect.  The loop spins until the vm_clock time allows the
request to make progress and complete.

The new throttling code introduced bdrv_start_throttled_reqs().  This
function not only kicks throttled requests but also temporarily disables
throttling so requests can run.

The outdated FIXME comment can be removed.  Also drop the busy = true
assignment since we overwrite it immediately afterwards.

Reviewed-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-06 16:53:51 +01:00
Max Reitz 66f6b8143b block: Close backing file early in bdrv_img_create
Leaving the backing file open although it is not needed anymore can
cause problems if it is opened through a block driver which allows
exclusive access only and if the create function of the block driver
used for the top image (the one being created) tries to close and reopen
the image file (which will include opening the backing file a second
time).

In particular, this will happen with a backing file opened through
qemu-nbd and using qcow2 as the top image file format (which reopens the
image to flush it to disk).

In addition, the BlockDriverState in bdrv_img_create() is used for the
backing file only; it should therefore be made local to the respective
block.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Wenchao Xia <xiawenc@linux.vnet.ibm.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-04 11:29:19 +01:00
Paolo Bonzini b8d71c09f3 block: make bdrv_co_do_write_zeroes stricter in producing aligned requests
Right now, bdrv_co_do_write_zeroes will only try to align the
beginning of the request.  However, it is simpler for many
formats to expect the block layer to separate both the head *and*
the tail.  This makes sure that the format's bdrv_co_write_zeroes
function will be called with aligned sector_num and nb_sectors for
the bulk of the request.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:49 +01:00
Paolo Bonzini 7ce21016b6 block: handle ENOTSUP from discard in generic code
Similar to write_zeroes, let the generic code receive a ENOTSUP for
discard operations.  Since bdrv_discard has advisory semantics,
we can just swallow the error.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:49 +01:00
Paolo Bonzini d5ef94d43d block: add bdrv_aio_write_zeroes
This will be used by the SCSI layer.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:49 +01:00
Paolo Bonzini 94d6ff21f4 block: add flags argument to bdrv_co_write_zeroes tracepoint
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:49 +01:00
Paolo Bonzini d20d9b7c67 block: add flags to BlockRequest
This lets bdrv_co_do_rw receive flags, so that it can be used for
zero writes.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:48 +01:00
Paolo Bonzini d51e9fe505 block: generalize BlockLimits handling to cover bdrv_aio_discard too
bdrv_co_discard is only covering drivers which have a .bdrv_co_discard()
implementation, but not those with .bdrv_aio_discard(). Not very nice,
and easy to avoid.

Suggested-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Reviewed-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-12-03 15:26:48 +01:00
Kevin Wolf c9fbb99d41 block: Use BDRV_O_NO_BACKING where appropriate
If you open an image temporarily just because you want to check its size
or get it flushed, there's no real reason to open the whole backing file
chain.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
2013-11-29 17:41:09 +01:00
Kevin Wolf 9fd3171af9 block: Enable BDRV_O_SNAPSHOT with driver-specific options
In the case of snapshot=on, don't rely on the backing file path in the
temporary image any more, but override the backing file with the given
set of options. This way, block drivers that don't use a file name can
be accessed with snapshot=on, for example:

    -drive file.driver=nbd,file.host=localhost,snapshot=on

Which becomes internally something like:

    file.filename=/tmp/vl.AWQZCu,backing.file.driver=nbd,backing.file.host=localhost

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-11-29 13:40:37 +01:00
Fam Zheng 4cc70e9337 blkdebug: add "remove_break" command
This adds "remove_break" command which is the reverse of blkdebug
command "break": it removes all breakpoints with given tag and resumes
all the requests.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-29 13:40:37 +01:00
Fam Zheng 21b5683508 qapi: Change BlockDirtyInfo to list
We have multiple dirty bitmaps in BDS now, switch QAPI to allow query
it (BlockInfo.dirty_bitmaps), and also drop old BlockInfo.dirty.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-11-29 13:40:36 +01:00
Fam Zheng e4654d2d94 block: per caller dirty bitmap
Previously a BlockDriverState has only one dirty bitmap, so only one
caller (e.g. a block job) can keep track of writing. This changes the
dirty bitmap to a list and creates a BdrvDirtyBitmap for each caller, the
lifecycle is managed with these new functions:

    bdrv_create_dirty_bitmap
    bdrv_release_dirty_bitmap

Where BdrvDirtyBitmap is a linked list wrapper structure of HBitmap.

In place of bdrv_set_dirty_tracking, a BdrvDirtyBitmap pointer argument
is added to these functions, since each caller has its own dirty bitmap:

    bdrv_get_dirty
    bdrv_dirty_iter_init
    bdrv_get_dirty_count

bdrv_set_dirty and bdrv_reset_dirty prototypes are unchanged but will
internally walk the list of all dirty bitmaps and set them one by one.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-11-29 13:40:33 +01:00
Peter Lieven c3d8688470 block/get_block_status: fix BDRV_BLOCK_ZERO for unallocated blocks
this patch does 2 things:
a) only do additional call outs if BDRV_BLOCK_ZERO is not already set.
b) use the newly introduced bdrv_unallocated_blocks_are_zero()
   to return the zero state of an unallocated block. the used callout
   to bdrv_has_zero_init() is only valid right after bdrv_create.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:52 +01:00
Peter Lieven d75cbb5e68 block: introduce bdrv_make_zero
this patch adds a call to completely zero out a block device.
the operation is sped up by checking the block status and
only writing zeroes to the device if they currently do not
return zeroes. optionally the zero writing can be sped up
by setting the flag BDRV_REQ_MAY_UNMAP to emulate the zero
write by unmapping if the driver supports it.

Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:52 +01:00
Peter Lieven 6f14da5247 block: honour BlockLimits in bdrv_co_discard
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven c31cb70728 block: honour BlockLimits in bdrv_co_do_write_zeroes
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven 4ce786914b block: add wrappers for logical block provisioning information
This adds 2 wrappers to read the unallocated_blocks_are_zero and
can_write_zeroes_with_unmap info from the BDI. The wrappers are
required to check for the existence of a backing_hd and
if the devices are opened with the correct flags.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven d32f35cbc5 block: introduce BDRV_REQ_MAY_UNMAP request flag
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven aa7bfbfff7 block: add flags to bdrv_*_write_zeroes
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Peter Lieven 6faac15fa8 block: make BdrvRequestFlags public
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-28 10:30:51 +01:00
Kevin Wolf 06d22aa367 block: Fail if requested driver is not available
If an explicit driver option is present, but doesn't specify a valid
driver, then bdrv_open() should fail instead of probing the format.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Jeff Cody <jcody@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-15 13:37:48 +01:00
Fam Zheng b04b6b6ec3 block: Print its file name if backing file opening failed
If backing file doesn't exist, the error message is confusing and
misleading:

    $ qemu /tmp/a.qcow2
    qemu: could not open disk image /tmp/a.qcow2: Could not open file: No
    such file or directory

But...

    $ ls /tmp/a.qcow2
    /tmp/a.qcow2

    $ qemu-img info /tmp/a.qcow2
    image: /tmp/a.qcow2
    file format: qcow2
    virtual size: 8.0G (8589934592 bytes)
    disk size: 196K
    cluster_size: 65536
    backing file: /tmp/b.qcow2

Because...

    $ ls /tmp/b.qcow2
    ls: cannot access /tmp/b.qcow2: No such file or directory

This is not intuitive. It's better to have the missing file's name in
the error message. With this patch:

    $ qemu-io -c 'read 0 512' /tmp/a.qcow2
    qemu-io: can't open device /tmp/a.qcow2: Could not open backing
    file: Could not open '/stor/vm/arch.raw': No such file or directory
    no file open, try 'help open'

Which is a little bit better.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-11-14 13:09:06 +01:00
Fam Zheng 7e382003f1 block: Round up total_sectors
Since b94a2610, bdrv_getlength() is omitted when probing image. VMDK
monolithicFlat is broken by that because a file < 512 bytes can't be
read with its total_sectors truncated to 0. This patch round up the size
to BDRV_SECTOR_SIZE, when a image size is not sector aligned.

Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-08 10:44:24 +01:00
Max Reitz 17826bc159 block: Save errno before error_setg_errno
error_setg_errno() may overwrite errno; therefore, its value should be
read before calling that function and not afterwards.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-11-07 13:58:58 +01:00
Kevin Wolf b94a261057 block: Avoid unecessary drv->bdrv_getlength() calls
The block layer generally keeps the size of an image cached in
bs->total_sectors so that it doesn't have to perform expensive
operations to get the size whenever it needs it.

This doesn't work however when using a backend that can change its size
without qemu being aware of it, i.e. passthrough of removable media like
CD-ROMs or floppy disks. For this reason, the caching is disabled when a
removable device is used.

It is obvious that checking whether the _guest_ device has removable
media isn't the right thing to do when we want to know whether the size
of the host backend can change. To make things worse, non-top-level
BlockDriverStates never have any device attached, which makes qemu
assume they are removable, so drv->bdrv_getlength() is always called on
the protocol layer. In the case of raw-posix, this causes unnecessary
lseek() system calls, which turned out to be rather expensive.

This patch completely changes the logic and disables bs->total_sectors
caching only for certain block driver types, for which a size change is
expected: host_cdrom and host_floppy on POSIX, host_device on win32; also
the raw format in case it sits on top of one of these protocols, but in
the common case the nested bdrv_getlength() call on the protocol driver
will use the cache again and avoid an expensive drv->bdrv_getlength()
call.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Paolo Bonzini <pbonzini@redhat.com>
2013-10-29 13:10:26 +01:00
Thibaut LAURENT 87a5debd31 block: Disable BDRV_O_COPY_ON_READ for the backing file
Since commit 0ebd24e0a2,
bdrv_open_common will throw an error when trying to open a file
read-only with the BDRV_O_COPY_ON_READ flag set.
Although BDRV_O_RDWR is unset for the backing files,
BDRV_O_COPY_ON_READ is still passed on if copy-on-read was requested
for the drive. Let's unset this flag too before opening the backing
file, or bdrv_open_common will fail.

Signed-off-by: Thibaut LAURENT <thibaut.laurent@gmail.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-29 13:06:39 +01:00
Max Reitz 61ed268453 block: Don't copy backing file name on error
bdrv_open_backing_file() tries to copy the backing file name using
pstrcpy directly after calling bdrv_open() to open the backing file
without checking whether that was actually successful. If it was not,
ps->backing_hd->file will probably be NULL and qemu will crash.

Fix this by moving pstrcpy after checking whether bdrv_open() succeeded.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Amos Kong <kongjianjun@gmail.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-28 17:35:52 +01:00
Kevin Wolf 0ebd24e0a2 blockdev: Don't disable COR automatically with blockdev-add
If a read-only device is configured with copy-on-read=on, the old code
only prints a warning and automatically disables copy on read. Make it
a real error for blockdev-add.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
2013-10-11 16:50:02 +02:00
Kevin Wolf 8f94a6e40e block: Improve driver whitelist checks
The main intent of this patch is to consolidate the whitelist checks to
a single point in the code instead of spreading it everywhere. This adds
a nicer error message for read-only whitelisting, too, in places where
it was still missing.

The patch also contains a bonus bug fix: By finding the format first in
bdrv_open() and then independently checking against the whitelist only
later, we avoid the case that use of a non-whitelisted format results in
probing rather than an error message. Previously, this could happen when
using the driver=... option.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Reviewed-by: Fam Zheng <famz@redhat.com>
2013-10-11 16:50:00 +02:00
Benoît Canet f6186f49e2 block: Add BlockDriver.bdrv_check_ext_snapshot.
This field is used by blkverify to disable external snapshots creation.
It will also be used by block filters like quorum to disable external
snapshot creation.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 16:49:59 +02:00
Peter Lieven 92bc50a5ad block/get_block_status: avoid redundant callouts on raw devices
if a raw device like an iscsi target or host device is used
the current implementation makes a second call out to get
the block status of bs->file.

Signed-off-by: Peter Lieven <pl@kamp.de>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 16:49:59 +02:00
Max Reitz eae041fe6f block: Add bdrv_get_specific_info
Add a function for retrieving an ImageInfoSpecific object from a block
driver.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-10-11 10:52:54 +02:00
Dunrong Huang d4cea8dfb9 block: use correct filename
The content filename point to may be erased by qemu_opts_absorb_qdict()
in raw_open_common() in drv->bdrv_file_open()

So it's better to use bs->filename.

Signed-off-by: Dunrong Huang <riegamaths@gmail.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-10-07 13:23:19 +02:00
Dunrong Huang 2fa9aa59cf block: use correct filename for error report
The content filename point to will be erased by qemu_opts_absorb_qdict()
in raw_open_common() in drv->bdrv_file_open()

So it's better to use bs->filename.

Signed-off-by: Dunrong Huang <riegamaths@gmail.com>
Reviewed-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-10-02 11:41:50 +02:00
Fam Zheng d055a1fec3 block: use DIV_ROUND_UP in bdrv_co_do_readv
Signed-off-by: Fam Zheng <famz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-26 14:11:06 +02:00
Benoît Canet 5726d872f3 qdict: Extract qdict_extract_subqdict
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 16:21:28 +02:00
Benoît Canet 030be32184 block: introduce BlockDriver.bdrv_needs_filename to enable some drivers.
Some drivers will have driver specifics options but no filename.
This new bool allow the block layer to treat them correctly.

The .bdrv_needs_filename is set in drivers not having .bdrv_parse_filename and
not having .bdrv_open.

The first exception to this rule will be the quorum driver.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 16:21:28 +02:00
Peter Lieven 1f9db2243c block/get_block_status: avoid segfault if there is no backing_hd
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 10:08:56 +02:00
Peter Lieven 3e0a233d86 block/get_block_status: set *pnum = 0 on error
if the call is invoked through bdrv_is_allocated the caller might
expect *pnum = 0 on error. however, a new implementation of
bdrv_get_block_status might only return a negative exit value on
error while keeping *pnum untouched.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Peter Lieven <pl@kamp.de>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 10:08:56 +02:00
Fam Zheng dbecebddfa block: fix backing file overriding
Providing backing.file.filename doesn't override backing file as expected:

    $ x86_64-softmmu/qemu-system-x86_64 -drive \
        file=/tmp/child.qcow2,backing.file.filename=/tmp/fake.qcow2

    qemu-system-x86_64: -drive \
        file=/tmp/child.qcow2,backing.file.filename=/tmp/fake.qcow2: could not
        open disk image /tmp/child.qcow2: Can't specify 'file' and 'filename'
        options at the same time

With

    $ qemu-img info /tmp/child.qcow2
    image: /tmp/child.qcow2
    file format: qcow2
    virtual size: 1.0G (1073741824 bytes)
    disk size: 196K
    cluster_size: 65536
    backing file: /tmp/fake.qcow2

This fixes it by calling bdrv_get_full_backing_filename only if
backing.file.filename is not provided. Also save the backing file name
to bs->backing_file so the information is correct with HMP "info block".

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-25 10:08:56 +02:00
Fam Zheng bcb9d66e85 block: don't lose data from last incomplete sector
To read the last sector that is not aligned to sector boundary, current
code for growable backends, since commit 893a8f6 "block: Produce zeros
when protocols reading beyond end of file", drops the data and directly
returns zeroes. That is incorrect.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-20 19:27:26 +02:00
Max Reitz cc84d90ff5 block: Error parameter for create functions
Add an Error ** parameter to bdrv_create and its associated functions to
allow more specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:48 +02:00
Max Reitz 34b5d2c68e block: Error parameter for open functions
Add an Error ** parameter to bdrv_open, bdrv_file_open and associated
functions to allow more specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:48 +02:00
Max Reitz d5124c00d8 bdrv: Use "Error" for creating images
Add an Error ** parameter to BlockDriver.bdrv_create to allow more
specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:48 +02:00
Max Reitz 015a1036a7 bdrv: Use "Error" for opening images
Add an Error ** parameter to BlockDriver.bdrv_open and
BlockDriver.bdrv_file_open to allow more specific error messages.

Signed-off-by: Max Reitz <mreitz@redhat.com>
2013-09-12 10:12:47 +02:00
Max Reitz 6f176b48f9 block: Image file option amendment
This patch adds the "amend" option to qemu-img which allows changing
image options on existing image files. It also adds the generic bdrv
implementation which is basically just a wrapper for the image format
specific function.

Signed-off-by: Max Reitz <mreitz@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-09-12 10:12:46 +02:00
Paolo Bonzini 5daa74a6eb block: look for zero blocks in bs->file
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini 918e92d71b block: add default get_block_status implementation for protocols
Protocols return raw data, so you can assume the offsets to pass
through unchanged.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini f0ad5712d5 block: return BDRV_BLOCK_ZERO past end of backing file
If the sectors are unallocated and we are past the end of the
backing file, they will read as zero.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini 415b5b013c block: use bdrv_has_zero_init to return BDRV_BLOCK_ZERO
Alternatively, this could use a "discard zeroes data" flag returned
by bdrv_get_info.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini 4333bb7140 block: define get_block_status return value
Define the return value of get_block_status.  Bits 0, 1, 2 and 9-62
are valid; bit 63 (the sign bit) is reserved for errors.  Bits 3-8
are left for future extensions.

The return code is compatible with the old is_allocated API: if a driver
only returns 0 or 1 (aka BDRV_BLOCK_DATA) like is_allocated used to,
clients of is_allocated will not have any change in behavior.  Still,
we will return more precise information in the next patches and the
new definition of bdrv_is_allocated is already prepared for this.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini b6b8a33354 block: introduce bdrv_get_block_status API
For now, bdrv_get_block_status is just another name for bdrv_is_allocated.
The next patches will add more flags.

This also touches all block drivers with a mostly mechanical rename.  The
sole exception is cow; because it calls cow_co_is_allocated from the read
code, we keep that function and make cow_co_get_block_status a wrapper.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini 11212d8fa0 block: make bdrv_has_zero_init return false for copy-on-write-images
This helps implementing is_allocated on top of get_block_status.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini d663640c04 block: expect errors from bdrv_co_is_allocated
Some bdrv_is_allocated callers do not expect errors, but the fallback
in qcow2.c might make other callers trip on assertion failures or
infinite loops.

Fix the callers to always look for errors.

Cc: qemu-stable@nongnu.org
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini 4f5786376e block: remove bdrv_is_allocated_above/bdrv_co_is_allocated_above distinction
Now that bdrv_is_allocated detects coroutine context, the two can
use the same code.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:09 +02:00
Paolo Bonzini 617ccb466e block: do not use ->total_sectors in bdrv_co_is_allocated
This is more robust when the device has removable media.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Paolo Bonzini bdad13b9de block: make bdrv_co_is_allocated static
bdrv_is_allocated can detect coroutine context and go through a fast
path, similar to other block layer functions.

Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Paolo Bonzini df2a6f29a5 block: keep bs->total_sectors up to date even for growable block devices
If a BlockDriverState is growable, after every write we need to
check if bs->total_sectors might have changed.  With this change,
bdrv_getlength does not need anymore a system call.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Fam Zheng 4f6fd3491c block: make bdrv_delete() static
Manage BlockDriverState lifecycle with refcnt, so bdrv_delete() is no
longer public and should be called by bdrv_unref() if refcnt is
decreased to 0.

This is an identical change because effectively, there's no multiple
reference of BDS now: no caller of bdrv_ref() yet, only bdrv_new() sets
bs->refcnt to 1, so all bdrv_unref() now actually delete the BDS.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Fam Zheng 9fcb025146 block: implement reference count for BlockDriverState
Introduce bdrv_ref/bdrv_unref to manage the lifecycle of
BlockDriverState. They are unused for now but will used to replace
bdrv_delete() later.

Signed-off-by: Fam Zheng <famz@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:08 +02:00
Benoît Canet cc0681c454 block: Enable the new throttling code in the block layer.
Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-09-06 15:25:07 +02:00
Kevin Wolf 09da4a7292 block: Remove redundant assertion
The failing condition is checked immediately before the assertion, so
keeping the assertion is kind of redundant.

Signed-off-by: Kevin Wolf <kwolf@redhat.com>
2013-08-30 15:28:52 +02:00
Alex Bligh bc72ad6754 aio / timers: Switch entire codebase to the new timer API
This is an autogenerated patch using scripts/switch-timer-api.

Switch the entire code base to using the new timer API.

Note this patch may introduce some line length issues.

Signed-off-by: Alex Bligh <alex@alex.org.uk>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 19:14:24 +02:00
MORITA Kazutaka 893a8f6220 block: Produce zeros when protocols reading beyond end of file
While Asias is debugging an issue creating qcow2 images on top of
non-file protocols.  It boils down to this example using NBD:

$ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'

Notice the open -g option to set bs->growable.  This means you can
read/write beyond end of file.  Reading beyond end of file is supposed
to produce zeroes.

We rely on this behavior in qcow2_create2() during qcow2 image
creation.  We create a new file and then write the qcow2 header
structure using bdrv_pwrite().  Since QCowHeader is not a multiple of
sector size, block.c first uses bdrv_read() on the empty file to fetch
the first sector (should be all zeroes).

Here is the output from the qemu-io NBD example above:

$ qemu-io -c 'open -g nbd+unix:///?socket=/tmp/nbd.sock' -c 'read -v 0 512'
00000000:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
00000010:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
00000020:  ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab ab  ................
...

We are not zeroing the buffer!  As a result qcow2 image creation on top
of protocols is not guaranteed to work even when file creation is
supported by the protocol.

[Adapted this patch to use bs->zero_beyond_eof.
-- Stefan]

Signed-off-by: MORITA Kazutaka <morita.kazutaka@lab.ntt.co.jp>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 14:14:56 +02:00
Asias He 0d51b4debe block: Introduce bs->zero_beyond_eof
In 4146b46c42e0989cb5842e04d88ab6ccb1713a48 (block: Produce zeros when
protocols reading beyond end of file), we break qemu-iotests ./check
-qcow2 022. This happens because qcow2 temporarily sets ->growable = 1
for vmstate accesses (which are stored beyond the end of regular image
data).

We introduce the bs->zero_beyond_eof to allow qcow2_load_vmstate() to
disable ->zero_beyond_eof temporarily in addition to enable ->growable.

[Since the broken patch "block: Produce zeros when protocols reading
beyond end of file" has not been merged yet, I have applied this fix
*first* and will then apply the next patch to keep the tree bisectable.
-- Stefan]

Suggested-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Asias He <asias@redhat.com>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-22 14:10:21 +02:00
Stefan Hajnoczi 88266f5aa7 block: stop relying on io_flush() in bdrv_drain_all()
If a block driver has no file descriptors to monitor but there are still
active requests, it can return 1 from .io_flush().  This is used to spin
during synchronous I/O.

Stop relying on .io_flush() and instead check
QLIST_EMPTY(&bs->tracked_requests) to decide whether there are active
requests.

This is the first step in removing .io_flush() so that event loops no
longer need to have the concept of synchronous I/O.  Eventually we may
be able to kill synchronous I/O completely by running everything in a
coroutine, but that is future work.

Note this patch moves bs->throttled_reqs initialization to bdrv_new() so
that bdrv_requests_pending(bs) can safely access it.  In practice bs is
g_malloc0() so the memory is already zeroed but it's safer to initialize
the queue properly.

We also need to fix up block/stream.c:close_unused_images() to prevent
traversing a dangling pointer while it rearranges the backing file
chain.  This is necessary since the new bdrv_drain_all() traverses the
backing file chain.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:45:34 +02:00
Stefan Hajnoczi e1b5c52e04 block: ensure bdrv_drain_all() works during bdrv_delete()
In bdrv_delete() make sure to call bdrv_make_anon() *after* bdrv_close()
so that the device is still seen by bdrv_drain_all() when iterating
bdrv_states.

Cc: qemu-stable@nongnu.org
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-08-19 15:45:34 +02:00
Benoît Canet b681a1c73e block: Repair the throttling code.
The throttling code was segfaulting since commit
02ffb50448 because some qemu_co_queue_next caller
does not run in a coroutine.
qemu_co_queue_do_restart assume that the caller is a coroutinne.
As suggested by Stefan fix this by entering the coroutine directly.
Also make sure like suggested that qemu_co_queue_next() and
qemu_co_queue_restart_all() can be called only in coroutines.

Signed-off-by: Benoit Canet <benoit@irqsave.net>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
2013-07-29 17:07:37 +02:00