mirror_qemu

Commit Graph

Author	SHA1	Message	Date
Eric Blake	465fe887cc	block: Honor BDRV_REQ_FUA during write_zeroes The block layer has a couple of cases where it can lose Force Unit Access semantics when writing a large block of zeroes, such that the request returns before the zeroes have been guaranteed to land on underlying media. SCSI does not support FUA during WRITESAME(10/16); FUA is only supported if it falls back to WRITE(10/16). But where the underlying device is new enough to not need a fallback, it means that any upper layer request with FUA semantics was silently ignoring BDRV_REQ_FUA. Conversely, NBD has situations where it can support FUA but not ZERO_WRITE; when that happens, the generic block layer fallback to bdrv_driver_pwritev() (or the older bdrv_co_writev() in qemu 2.6) was losing the FUA flag. The problem of losing flags unrelated to ZERO_WRITE has been latent in bdrv_co_do_write_zeroes() since commit `aa7bfbff`, but back then, it did not matter because there was no FUA flag. It became observable when commit `93f5e6d8` paved the way for flags that can impact correctness, when we should have been using bdrv_co_writev_flags() with modified flags. Compare to commit `9eeb6dd`, which got flag manipulation right in bdrv_co_do_zero_pwritev(). Symptoms: I tested with qemu-io with default writethrough cache (which is supposed to use FUA semantics on every write), and targetted an NBD client connected to a server that intentionally did not advertise NBD_FLAG_SEND_FUA. When doing 'write 0 512', the NBD client sent two operations (NBD_CMD_WRITE then NBD_CMD_FLUSH) to get the fallback FUA semantics; but when doing 'write -z 0 512', the NBD client sent only NBD_CMD_WRITE. The fix is do to a cleanup bdrv_co_flush() at the end of the operation if any step in the middle relied on a BDS that does not natively support FUA for that step (note that we don't need to flush after every operation, if the operation is broken into chunks based on bounce-buffer sizing). Each BDS gains a new flag .supported_zero_flags, which parallels the use of .supported_write_flags but only when accessing a zero write operation (the flags MUST be different, because of SCSI having different semantics based on WRITE vs. WRITESAME; and also because BDRV_REQ_MAY_UNMAP only makes sense on zero writes). Also fix some documentation to describe -ENOTSUP semantics, particularly since iscsi depends on those semantics. Down the road, we may want to add a driver where its .bdrv_co_pwritev() honors all three of BDRV_REQ_FUA, BDRV_REQ_ZERO_WRITE, and BDRV_REQ_MAY_UNMAP, and advertise this via bs->supported_write_flags for blocks opened by that driver; such a driver should NOT supply .bdrv_co_write_zeroes nor .supported_zero_flags. But none of the drivers touched in this patch want to do that (the act of writing zeroes is different enough from normal writes to deserve a second callback). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Eric Blake	4df863f336	block: Make supported_write_flags a per-bds property Pre-patch, .supported_write_flags lives at the driver level, which means we are blindly declaring that all block devices using a given driver will either equally support FUA, or that we need a fallback at the block layer. But there are drivers where FUA support is a per-block decision: the NBD block driver is dependent on the remote server advertising NBD_FLAG_SEND_FUA (and has fallback code to duplicate the flush that the block layer would do if NBD had not set .supported_write_flags); and the iscsi block driver is dependent on the mode sense bits advertised by the underlying device (and is currently silently ignoring FUA requests if the underlying device does not support FUA). The fix is to make supported flags as a per-BDS option, set during .bdrv_open(). This patch moves the variable and fixes NBD and iscsi to set it only conditionally; later patches will then further simplify the NBD driver to quit duplicating work done at the block layer, as well as tackle the fact that SCSI does not support FUA semantics on WRITESAME(10/16) but only on WRITE(10/16). Signed-off-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Denis V. Lunev	2928abce6d	qcow2: improve qcow2_co_write_zeroes() There is a possibility that qcow2_co_write_zeroes() will be called with the partial block. This could be synthetically triggered with qemu-io -c "write -z 32k 4k" and can happen in the real life in qemu-nbd. The latter happens under the following conditions: (1) qemu-nbd is started with --detect-zeroes=on and is connected to the kernel NBD client (2) third party program opens kernel NBD device with O_DIRECT (3) third party program performs write operation with memory buffer not aligned to the page In this case qcow2_co_write_zeroes() is unable to perform the operation and mark entire cluster as zeroed and returns ENOTSUP. Thus the caller switches to non-optimized version and writes real zeroes to the disk. The patch creates a shortcut. If the block is read as zeroes, f.e. if it is unallocated, the request is extended to cover full block. User-visible situation with this block is not changed. Before the patch the block is filled in the image with real zeroes. After that patch the block is marked as zeroed in metadata. Thus any subsequent changes in backing store chain are not affected. Kevin, thank you for a cool suggestion. Signed-off-by: Denis V. Lunev <den@openvz.org> Reviewed-by: Roman Kagan <rkagan@virtuozzo.com> CC: Kevin Wolf <kwolf@redhat.com> CC: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Eric Blake	7b1deac84e	block: Kill unused sector-based blk_* functions Now that there are no remaining clients, we can drop the sector-based blk_read(), blk_write(), blk_aio_readv(), and blk_aio_writev(). Sadly, there are still remaining sector-based interfaces, such as blk_*discard(), or blk_write_compressed(); those will have to wait for another day. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:09 +02:00
Eric Blake	60cb2fa7eb	block: Introduce byte-based aio read/write blk_aio_readv() and blk_aio_writev() are annoying in that they can't access sub-sector granularity, and cannot pass flags. Also, they require the caller to pass redundant information about the size of the I/O (qiov->size in bytes must match nb_sectors in sectors). Add new blk_aio_preadv() and blk_aio_pwritev() functions to fix the flaws. The next few patches will upgrade callers, then finally delete the old interfaces. Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Eric Blake	983a160050	block: Switch blk_*write_zeroes() to byte interface Sector-based blk_write() should die; convert the one-off variant blk_write_zeroes() to use an offset/count interface instead. Likewise for blk_co_write_zeroes() and blk_aio_write_zeroes(). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Eric Blake	b7d17f9fa4	block: Switch blk_read_unthrottled() to byte interface Sector-based blk_read() should die; convert the one-off variant blk_read_unthrottled(). Signed-off-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Eric Blake	8341f00dc2	block: Allow BDRV_REQ_FUA through blk_pwrite() We have several block drivers that understand BDRV_REQ_FUA, and emulate it in the block layer for the rest by a full flush. But without a way to actually request BDRV_REQ_FUA during a pass-through blk_pwrite(), FUA-aware block drivers like NBD are forced to repeat the emulation logic of a full flush regardless of whether the backend they are writing to could do it more efficiently. This patch just wires up a flags argument; followup patches will actually make use of it in the NBD driver and in qemu-io. Signed-off-by: Eric Blake <eblake@redhat.com> Acked-by: Denis V. Lunev <den@openvz.org> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Janne Karhunen	f249924e96	Allow users to specify the vmdk virtual hardware version. Vmdk images have metadata to indicate the vmware virtual hardware version image was created/tested to run with. Allow users to specify that version via new 'hwversion' option. [ kwolf: Adjust qemu-iotests common.filter ] Signed-off-by: Janne Karhunen <Janne.Karhunen@gmail.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Zhou Jie	ed79f37d9b	block: always compile-check debug prints Files with conditional debug statements should ensure that the printf is always compiled. This prevents bitrot of the format string of the debug statement. And switch debug output to stderr. Signed-off-by: Zhou Jie <zhoujie2011@cn.fujitsu.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	e3ddef25e9	block: Remove BlockDriver.bdrv_read/write There are no block drivers left that implement the old .bdrv_read/write interface, so it can be removed now. This gets us rid of the corresponding emulation functions, too. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	4575eb496d	vvfat: Implement .bdrv_co_preadv/pwritev interfaces This doesn't really convert any of the actual vvfat logic to use vectored I/O (and it's doubtful whether that would make sense), but instead just adapts the wrappers to the modern interface. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	513b0f026b	vpc: Implement .bdrv_co_pwritev() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	d46b7cc680	vpc: Implement .bdrv_co_preadv() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	37b1d7d8c9	vmdk: Implement .bdrv_co_pwritev() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	f10cc24359	vmdk: Implement .bdrv_co_preadv() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	a844a2b0d4	vmdk: Add vmdk_find_offset_in_cluster() This is a byte granularity version of vmdk_find_index_in_cluster(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	fde9d56f5b	vdi: Implement .bdrv_co_pwritev() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	0865bb6f04	vdi: Implement .bdrv_co_preadv() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	3edf1e73d5	dmg: Implement .bdrv_co_preadv() interface This implements .bdrv_co_preadv() for the cloop block driver. While updating the error paths, change -1 to a valid -errno code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	5cd230819e	cloop: Implement .bdrv_co_preadv() interface This implements .bdrv_co_preadv() for the cloop block driver. While updating the error paths, change -1 to a valid -errno code. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	3b8fd33011	bochs: Implement .bdrv_co_preadv() interface Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	3fb06697ae	block: Introduce .bdrv_co_preadv/pwritev BlockDriver function Many parts of the block layer are already byte granularity. The block driver interface, however, was still missing an interface that allows making use of this. This patch introduces a new BlockDriver interface, which is based on coroutines, vectored, has flags and uses a byte granularity. This is now the preferred interface for new drivers. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	cab3a3563c	block: Rename bdrv_co_do_preadv/writev to bdrv_co_preadv/writev It used to be an internal helper function just for implementing bdrv_co_do_readv/writev(), but now that it's a public interface, it deserves a name without "do" in it. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:08 +02:00
Kevin Wolf	0884447382	block: Support AIO drivers in bdrv_driver_preadv/pwritev() Instead of registering emulation functions as .bdrv_co_writev, just directly check whether the function is there or not, and use the AIO interface if it isn't. This makes the read/write functions more consistent with how things are done in other places (flush, discard, etc.) Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:07 +02:00
Kevin Wolf	78a07294d5	block: Introduce bdrv_driver_pwritev() This is a function that simply calls into the block driver for doing a write, providing the byte granularity interface we want to eventually have everywhere, and using whatever interface that driver supports. This one is a bit more interesting than the version for reads: It adds support for .bdrv_co_writev_flags() everywhere, so that drivers implementing this function can drop .bdrv_co_writev() now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:07 +02:00
Kevin Wolf	166fe96051	block: Introduce bdrv_driver_preadv() This is a function that simply calls into the block driver for doing a read, providing the byte granularity interface we want to eventually have everywhere, and using whatever interface that driver supports. For now, this is just a wrapper for calling bs->drv->bdrv_co_readv(). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	dd7f7ed104	linux-aio: make it more type safe Replace void* with an opaque LinuxAioState type. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	6b98bd6495	block: plug whole tree at once, introduce bdrv_io_unplugged_begin/end Extract the handling of io_plug "depth" from linux-aio.c and let the main bdrv_drain loop do nothing but wait on I/O. Like the two newly introduced functions, bdrv_io_plug and bdrv_io_unplug now operate on all children. The visit order is now symmetrical between plug and unplug, making it possible for formats to implement plug/unplug. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	ce0f141259	block: introduce bdrv_no_throttling_begin/end Extract the handling of throttling from bdrv_flush_io_queue. These new functions will soon become BdrvChildRole callbacks, as they can be generalized to "beginning of drain" and "end of drain". Reviewed-by: Alberto Garcia <berto@igalia.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	b6e84c97ed	block: extract bdrv_drain_poll/bdrv_co_yield_to_drain from bdrv_drain/bdrv_co_drain Do not call bdrv_drain_recurse twice in bdrv_co_drain. A small tweak to the logic in Fam's patch, which is harmless since no one implements bdrv_drain anyway. But better get it right. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	a72f641407	block: move restarting of throttled reqs to block/throttle-groups.c We want to remove throttled_reqs from block/io.c. This is the easy part---hide the handling of throttled_reqs during disable/enable of throttling within throttle-groups.c. Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Paolo Bonzini	733bbc8cea	block: make bdrv_start_throttled_reqs return void The return value is unused and I am not sure why it would be useful. Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Kevin Wolf	90c78624f1	block: Don't disable I/O throttling on sync requests We had to disable I/O throttling with synchronous requests because we didn't use to run timers in nested event loops when the code was introduced. This isn't true any more, and throttling works just fine even when using the synchronous API. The removed code is in fact dead code since commit `a8823a3b` ('block: Use blk_co_pwritev() for blk_write()') because I/O throttling can only be set on the top layer, but BlockBackend always uses the coroutine interface now instead of using the sync API emulation in block.c. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Message-Id: <1458660792-3035-2-git-send-email-kwolf@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Acked-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-05-12 15:22:07 +02:00
Eric Blake	15c2f669e3	qapi: Split visit_end_struct() into pieces As mentioned in previous patches, we want to call visit_end_struct() functions unconditionally, so that visitors can release resources tied up since the matching visit_start_struct() without also having to worry about error priority if more than one error occurs. Even though error_propagate() can be safely used to ignore a second error during cleanup caused by a first error, it is simpler if the cleanup cannot set an error. So, split out the error checking portion (basically, input visitors checking for unvisited keys) into a new function visit_check_struct(), which can be safely skipped if any earlier errors are encountered, and leave the cleanup portion (which never fails, but must be called unconditionally if visit_start_struct() succeeded) in visit_end_struct(). Generated code in qapi-visit.c has diffs resembling: \|@@ -59,10 +59,12 @@ void visit_type_ACPIOSTInfo(Visitor *v, \| goto out_obj; \| } \| visit_type_ACPIOSTInfo_members(v, obj, &err); \|- error_propagate(errp, err); \|- err = NULL; \|+ if (err) { \|+ goto out_obj; \|+ } \|+ visit_check_struct(v, &err); \| out_obj: \|- visit_end_struct(v, &err); \|+ visit_end_struct(v); \| out: and in qapi-event.c: @@ -47,7 +47,10 @@ void qapi_event_send_acpi_device_ost(ACP \| goto out; \| } \| visit_type_q_obj_ACPI_DEVICE_OST_arg_members(v, &param, &err); \|- visit_end_struct(v, err ? NULL : &err); \|+ if (!err) { \|+ visit_check_struct(v, &err); \|+ } \|+ visit_end_struct(v); \| if (err) { \| goto out; Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1461879932-9020-20-git-send-email-eblake@redhat.com> [Conflict with a doc fixup resolved] Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-05-12 09:47:55 +02:00
Kevin Wolf	d208c50d9d	vvfat: Fix default volume label Commit `d5941dd` documented that it leaves the default volume name as it was ("QEMU VVFAT"), but it doesn't actually implement this. You get an empty name (eleven space characters) instead. This fixes the implementation to apply the advertised default. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-04-29 11:14:13 +02:00
Kevin Wolf	ebb72c9f06	vvfat: Fix volume name assertion Commit `d5941dd` made the volume name configurable, but it didn't consider that the rw code compares the volume name string to assert that the first directory entry is the volume name. This made vvfat crash in rw mode. This fixes the assertion to compare with the configured volume name instead of a literal string. Cc: qemu-stable@nongnu.org Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-04-29 11:14:08 +02:00
Fam Zheng	ab27c3b5e7	mirror: Workaround for unexpected iohandler events during completion Commit `5a7e7a0ba` moved mirror_exit to a BH handler but didn't add any protection against new requests that could sneak in just before the BH is dispatched. For example (assuming a code base at that commit): main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch aio_dispatch ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop qemu_iohandler_poll virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite main_loop_wait # 2 <snip> aio_dispatch aio_bh_poll (c) mirror_exit At (a) we know the BDS has no pending request. However, the same main_loop_wait call is going to dispatch iohandlers (EventNotifier events), which may lead to a new I/O from guest. So the invariant is already broken at (c). Data loss. Commit `f3926945c8` made iohandler to use aio API. The order of virtio_queue_host_notifier_read and block_job_defer_to_main_loop within a main_loop_wait becomes unpredictable, and even worse, if the host notifier event arrives at the next main_loop_wait call, the unpredictable order between mirror_exit and virtio_queue_host_notifier_read is also a trouble. As shown below, this commit made the bug easier to trigger: - Bug case 1: main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch (qemu_aio_context) ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop aio_ctx_dispatch (iohandler_ctx) virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite main_loop_wait # 2 ... aio_dispatch aio_bh_poll (c) mirror_exit - Bug case 2: main_loop_wait # 1 os_host_main_loop_wait g_main_context_dispatch aio_ctx_dispatch (qemu_aio_context) ... mirror_run bdrv_drain (a) block_job_defer_to_main_loop main_loop_wait # 2 ... aio_ctx_dispatch (iohandler_ctx) virtio_queue_host_notifier_read ... virtio_submit_multiwrite (b) blk_aio_multiwrite aio_dispatch aio_bh_poll (c) mirror_exit In both cases, (b) breaks the invariant wanted by (a) and (c). Until then, the request loss has been silent. Later, `3f09bfbc7b` added asserts at (c) to check the invariant (in bdrv_replace_in_backing_chain), and Max reported an assertion failure first visible there, by doing active committing while the guest is running bonnie++. 2.5 added bdrv_drained_begin at (a) to protect the dataplane case from similar problems, but we never realize the main loop bug until now. As a bandage, this patch disables iohandler's external events temporarily together with bs->ctx. Launchpad Bug: 1570134 Cc: qemu-stable@nongnu.org Signed-off-by: Fam Zheng <famz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-22 16:44:09 +02:00
Fam Zheng	4150ae60eb	mirror: Don't extend the last sub-chunk The last sub-chunk is rounded up to the copy granularity in the target image, resulting in a larger size than the source. Add a function to clip the copied sectors to the end. This undoes the "wrong" changes to tests/qemu-iotests/109.out in `e5b43573e2`. The remaining two offset changes are okay. [ kwolf: Use DIV_ROUND_UP to calculate nb_chunks now ] Reported-by: Kevin Wolf <kwolf@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Jeff Cody <jcody@redhat.com>	2016-04-20 16:52:55 +02:00
Max Reitz	f27a274259	block/mirror: Refresh stale bitmap iterator cache If the drive's dirty bitmap is dirtied while the mirror operation is running, the cache of the iterator used by the mirror code may become stale and not contain all dirty bits. This only becomes an issue if we are looking for contiguously dirty chunks on the drive. In that case, we can easily detect the discrepancy and just refresh the iterator if one occurs. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-20 16:52:55 +02:00
Max Reitz	9c83625bdd	block/mirror: Revive dead yielding code mirror_iteration() is supposed to wait if the current chunk is subject to a still in-flight mirroring operation. However, it mixed checking this conflict situation with checking the dirty status of a chunk. A simplification for the latter condition (the first chunk encountered is always dirty) led to neglecting the former: We just skip the first chunk and thus never test whether it conflicts with an in-flight operation. To fix this, pull out the code which waits for in-flight operations on the first chunk of the range to be mirrored to settle. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-20 16:52:55 +02:00
Jeff Cody	d85fa9eb87	block/gluster: prevent data loss after i/o error Upon receiving an I/O error after an fsync, by default gluster will dump its cache. However, QEMU will retry the fsync, which is especially useful when encountering errors such as ENOSPC when using the werror=stop option. When using caching with gluster, however, the last written data will be lost upon encountering ENOSPC. Using the write-behind-cache xlator option of 'resync-failed-syncs-after-fsync' should cause gluster to retain the cached data after a failed fsync, so that ENOSPC and other transient errors are recoverable. Unfortunately, we have no way of knowing if the 'resync-failed-syncs-after-fsync' xlator option is supported, so for now close the fd and set the BDS driver to NULL upon fsync error. Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-04-19 12:24:59 -04:00
Jeff Cody	5d4343e6c2	block/gluster: code movement of qemu_gluster_close() Move qemu_gluster_close() further up in the file, in preparation for the next patch, to avoid a forward declaration. Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-04-19 12:24:59 -04:00
Jeff Cody	a882745356	block/gluster: return correct error value Upon error, gluster will call the aio callback function with a ret value of -1, with errno set to the proper error value. If we set the acb->ret value to the return value in the callback, that results in every error being EPERM (i.e. 1). Instead, set it to the proper error result. Reviewed-by: Niels de Vos <ndevos@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-04-19 12:24:59 -04:00
Kevin Wolf	16aaf975ee	block: Don't ignore flags in blk_{,co,aio}_write_zeroes() Commit `57d6a428` neglected to pass the given flags to blk_aio_prwv(), which broke discard by WRITE SAME for scsi-disk (the UNMAP bit would be ignored). Commit `fc1453cd` introduced the same bug for blk_write_zeroes(). This is used for 'qemu-img convert' without has_zero_init (e.g. on a block device) and for preallocation=falloc in parallels. Commit `8896e088` is the version for blk_co_write_zeroes(). This function is only used in qemu-io. Reported-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-04-15 17:22:12 +02:00
Jeff Cody	9c057d0b68	block/vpc: update comments to be compliant w/coding guidelines Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:12 +02:00
Jeff Cody	32f6439cf7	block/vpc: set errp in vpc_open Add more useful error information to failure paths in vpc_open Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:12 +02:00
Jeff Cody	66176fc6a7	block/vpc: make checks on max table size a bit more lax The check on the max_table_size field not being larger than required is valid, and in accordance with the VHD spec. However, there have been VHD images encountered in the wild that have an out-of-spec max table size that is technically too large. There is no issue in allowing this larger table size, as we also later verify that the computed size (used for the pagetable) is large enough to fit all sectors. In addition, max_table_entries is bounds checked against SIZE_MAX and INT_MAX. Remove the strict check, so that we can accomodate these sorts of images that are benignly out of spec. Reported-by: Stefan Hajnoczi <stefanha@redhat.com> Reported-by: Grant Wu <grantwwu@gmail.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:12 +02:00
Jeff Cody	c23fb11bbb	block/vpc: Use the correct max sector count for VHD images The old VHD_MAX_SECTORS value is incorrect, and is a throwback to the CHS calculations. The VHD specification allows images up to 2040 GiB, which (using 512 byte sectors) corresponds to a maximum number of sectors of 0xff000000, rather than the old value of 0xfe0001ff. Update VHD_MAX_SECTORS to reflect the correct value. Also, update comment references to the actual size limit, and correct one compare so that we can have sizes up to the limit. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:12 +02:00
Jeff Cody	bab246db1d	block/vpc: use current_size field for XenConverter VHD images XenConverter VHD images are another VHD image where current_size is different from the CHS values in the the format header. Use current_size as the default, by looking at the creator_app signature field. Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:12 +02:00
Stefan Hajnoczi	9bdfb9e8ac	vpc: use current_size field for XenServer VHD images The vpc driver has two methods of determining virtual disk size. The correct one to use depends on the software that generated the image file. Add the XenServer creator_app signature so that image size is correctly detected for those images. Reported-by: Grant Wu <grantwwu@gmail.com> Reported-by: Spencer Baugh <sbaugh@catern.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:12 +02:00
Jeff Cody	0211b9becc	block/vpc: set errp in vpc_create Add more useful error information to failure paths in vpc_create(). Signed-off-by: Jeff Cody <jcody@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-15 17:22:11 +02:00
Kevin Wolf	7fa84cd8d4	block: Fix blk_aio_write_zeroes() Commit `57d6a428` broke blk_aio_write_zeroes() because in some write functions in the call path don't have an explicit length argument but reuse qiov->size instead. Which is great, except that write_zeroes doesn't have a qiov, which this commit interprets as 0 bytes. Consequently, blk_aio_write_zeroes() didn't effectively do anything. This patch introduces an explicit acb->bytes in BlkAioEmAIOCB and uses that instead of acb->rwco.size. The synchronous version of the function is okay because it does pass a qiov (with the right size and a NULL pointer as its base). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-04-15 17:22:11 +02:00
Max Reitz	4e876bcf2b	qcow2: Prevent backing file names longer than 1023 We reject backing file names with a length of more than 1023 characters when opening a qcow2 file, so we should not produce such files ourselves. Cc: qemu-stable@nongnu.org Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-12 18:06:51 +02:00
Paolo Bonzini	40a99aace3	vpc: fix return value check for blk_pwrite bdrv_pwrite_sync used to return zero or negative error, while blk_pwrite returns the number of written bytes when successful. This caused VPC image creation to fail spectacularly: it wrote the first 512 bytes, and then exited immediately because of the non-zero answer from blk_pwrite. But the truly spectacular part is that it returns a positive value (the 512 that blk_pwrite returned) causing everyone to believe that it succeeded. This fixes qemu-iotests with vpc format. Fixes: `b8f45cdf78` Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-04-12 18:06:51 +02:00
Fam Zheng	39bf92dd70	mirror: Replace bdrv_drain(bs) with bdrv_co_drain(bs) Suggested-by: Stefan Hajnoczi <stefanha@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1459855253-5378-3-git-send-email-famz@redhat.com Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-04-11 16:59:09 +01:00
Fam Zheng	a77fd4bb29	block: Fix bdrv_drain in coroutine Using the nested aio_poll() in coroutine is a bad idea. This patch replaces the aio_poll loop in bdrv_drain with a BH, if called in coroutine. For example, the bdrv_drain() in mirror.c can hang when a guest issued request is pending on it in qemu_co_mutex_lock(). Mirror coroutine in this case has just finished a request, and the block job is about to complete. It calls bdrv_drain() which waits for the other coroutine to complete. The other coroutine is a scsi-disk request. The deadlock happens when the latter is in turn pending on the former to yield/terminate, in qemu_co_mutex_lock(). The state flow is as below (assuming a qcow2 image): mirror coroutine scsi-disk coroutine ------------------------------------------------------------- do last write qcow2:qemu_co_mutex_lock() ... scsi disk read tracked request begin qcow2:qemu_co_mutex_lock.enter qcow2:qemu_co_mutex_unlock() bdrv_drain while (has tracked request) aio_poll() In the scsi-disk coroutine, the qemu_co_mutex_lock() will never return because the mirror coroutine is blocked in the aio_poll(blocking=true). With this patch, the added qemu_coroutine_yield() allows the scsi-disk coroutine to make progress as expected: mirror coroutine scsi-disk coroutine ------------------------------------------------------------- do last write qcow2:qemu_co_mutex_lock() ... scsi disk read tracked request begin qcow2:qemu_co_mutex_lock.enter qcow2:qemu_co_mutex_unlock() bdrv_drain.enter > schedule BH > qemu_coroutine_yield() > qcow2:qemu_co_mutex_lock.return > ... tracked request end ... (resumed from BH callback) bdrv_drain.return ... Reported-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Message-id: 1459855253-5378-2-git-send-email-famz@redhat.com Suggested-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Fam Zheng <famz@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>	2016-04-11 16:59:09 +01:00
Peter Maydell	31370dbe5d	Block layer patches for 2.6 -----BEGIN PGP SIGNATURE----- Version: GnuPG v2.0.22 (GNU/Linux) iQIcBAABAgAGBQJXA9qJAAoJEH8JsnLIjy/W+W8P/2xTXH8+h0qurUvv5Rz6HUbD HHNlGnqa3M5yMLMqmtlb0J/9dj0pTlGNkp7d/9blh3MlZH/ZpeUMtq8Lro23YhUF 0J94sRKGhK3T5GqYSA/BFbVvXQJ3yX7cKcYaQjmh7rK6Ua+65Mv/dulci+jbfGuu BkiVgumGAalSeaFqXZR685g61ZHbz+mQJnd3VFcvletnPBu0j1GMkuU0THAcy09q CTUjwWlL9CHu1lYkAa0KxgFtj6mZ+gEu5ws5Lvk8yFtSB+af/mJtzoHuq/7+Ske2 7SiVXotFW8kR7ic1TnWiEku8+31FSBVJp6xUcRVTDOHVG7oSQxBDg5bGwPn8TxXy bvLvTJDIFodGhkiDTuGLuttvX+U2xCl4GmBS01OiFF53UGWgjjY+pkDZiaNC4nFW vwItj7/KGKL2Nq6cVfGCDOYYjFtHAPGI3yyJ2babXecv+9nKr0WeJpk6cfKVnP17 rZs28Y3Ub/P2M4oOt4YdhRSanQZbe5eIQOsdfWX4q12hujL0zbsCtV6dpjeTsY74 J4CBLzBYCj6y9Jc8R+D6XLYJBtJQGaSj99Oqe9WdUuHEqTGQt4HMuYHAj77wImVG ccURYiBpmB+FChLhG+yIlo1PHS0kpgeD+ZkZDHC0gYdiFqdKnFeQ7lj3Jj2tSWXY 7Y7qbaOhtXp+20M6oO+G =ax/b -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/kevin/tags/for-upstream' into staging Block layer patches for 2.6 # gpg: Signature made Tue 05 Apr 2016 16:32:25 BST using RSA key ID C88F2FD6 # gpg: Good signature from "Kevin Wolf <kwolf@redhat.com>" * remotes/kevin/tags/for-upstream: crypto: Avoid memory leak on failure qemu-iotests: 149: Use "/usr/bin/env python" block: Forbid I/O throttling on nodes with multiple parents for 2.6 block: forbid x-blockdev-del from acting on DriveInfo Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-04-05 17:03:32 +01:00
Eric Blake	95c3df5a24	crypto: Avoid memory leak on failure Commit `7836857` introduced a memory leak due to invalid use of Error vs. visit_type_end(). If visiting the intermediate members fails, we clear the error and unconditionally use visit_end_struct() on the same error object; but if that cleanup succeeds, we then skip the qapi_free call. Until a later patch adds visit_check_struct(), the only safe approach is to use two separate error objects. Signed-off-by: Eric Blake <eblake@redhat.com> Message-id: 1459526222-30052-1-git-send-email-eblake@redhat.com Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-04-05 17:23:21 +02:00
Eric Blake	a89ef0c357	nbd: don't request FUA on FLUSH The NBD protocol does not clearly document what will happen if a client sends NBD_CMD_FLAG_FUA on NBD_CMD_FLUSH. Historically, both the qemu and upstream NBD servers silently ignored that flag, but that feels a bit risky. Meanwhile, the qemu NBD client unconditionally sends the flag (without even bothering to check whether the caller cares; at least with NBD_CMD_WRITE the client only sends FUA if requested by a higher layer). There is ongoing discussion on the NBD list to fix the protocol documentation to require that the server MUST ignore the flag (unless the kernel folks can better explain what FUA means for a flush), but until those doc improvements land, the current nbd.git master was recently changed to reject the flag with EINVAL (see nbd commit ab22e082), which now makes it impossible for a qemu client to use FLUSH with an upstream NBD server. We should not send FUA with flush unless the upstream protocol documents what it will do, and even then, it should be something that the caller can opt into, rather than being unconditional. Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1459526902-32561-1-git-send-email-eblake@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-04-05 11:46:52 +02:00
Stefan Hajnoczi	0d94b74655	block/nfs: add missing #include "qemu/cutils.h" parse_uint_full() used to be included from qemu-common.h but was moved to qemu/cutils.h in commit `f348b6d1a5` ("util: move declarations out of qemu-common.h"). Cc: Veronia Bahaa <veroniabahaa@gmail.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1459341994-20567-3-git-send-email-stefanha@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-03-30 16:50:39 -04:00
Stefan Hajnoczi	d165b8cb8b	block/nfs: add missing #include "qapi/error.h" error_setg() used to be included indirectly through qemu/osdep.h. Since commit `da34e65cb4` ("include/qemu/osdep.h: Don't include qapi/error.h") it requires an explicit include. Cc: Markus Armbruster <armbru@redhat.com> Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 1459341994-20567-2-git-send-email-stefanha@redhat.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-03-30 16:50:39 -04:00
Max Reitz	a90639270d	block/null-{co,aio}: Implement get_block_status() Signed-off-by: Max Reitz <mreitz@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:16:04 +02:00
Max Reitz	cd219eb1e5	block/null-{co,aio}: Allow reading zeroes This is optional so that it does not impede the null block driver's performance unless this behavior is desired. Signed-off-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Acked-by: Fam Zheng <famz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:16:03 +02:00
Kevin Wolf	09cf9db1bc	block: Remove bdrv_(set_)enable_write_cache() The only remaining users were block jobs (mirror and backup) which unconditionally enabled WCE on the BlockBackend of the target image. As these block jobs don't go through BlockBackend for their I/O requests, they aren't affected by this setting anyway but always get a writeback mode, so that call can be removed. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:03 +02:00
Kevin Wolf	61de4c6808	block: Remove BDRV_O_CACHE_WB The previous patches have successively made blk->enable_write_cache the true source for the information whether a writethrough mode must be implemented. The corresponding BDRV_O_CACHE_WB is only useless baggage we're carrying around, so now's the time to remove it. At the same time, we remove the 'cache.writeback' option parsing on the BDS level as the only effect was setting the BDRV_O_CACHE_WB flag. This change requires test cases that explicitly enabled the option to drop it. Other than that and the change of the error message when writethrough is enabled on the BDS level (from "Can't set writethrough mode" to "doesn't support the option"), there should be no change in behaviour. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:03 +02:00
Kevin Wolf	5481531154	raw: Support BDRV_REQ_FUA Pass through the FUA flag to the lower layer so that the separate flush can be saved in practically relevant cases where a (raw) format driver sits on top of the protocol driver. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	2b556518c3	nbd: Support BDRV_REQ_FUA The NBD server already used to send a FUA flag when the writethrough mode was set. This code was a remnant from the times where protocol drivers actually had to implement writethrough modes. Since nowadays the block layer sends flushes in writethrough mode and non-root nodes are always writeback, this was mostly dead code - only mostly because if NBD was configured to be used without a format, we sent _both_ FUA and an explicit flush afterwards, which makes the code not technically dead, but useless overhead. This patch changes the code so that the block layer's FUA flag is recognised and translated into a NBD FUA flag. The additional flush is avoided now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	9f0eb9e129	iscsi: Support BDRV_REQ_FUA This replaces the existing hack in the iscsi driver that sent the FUA bit in writethrough mode and ignored the following flush in order to optimise the number of roundtrips (see commit `73b5394e`). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	93f5e6d88a	block: Introduce bdrv_co_writev_flags() This function will allow drivers to implement BDRV_REQ_FUA natively instead of sending a separate flush after the write. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	c83f9fba2a	block/qapi: Use blk_enable_write_cache() Now that WCE is handled on the BlockBackend level, the flag is meaningless for BDSes. As the schema requires us to fill the field, we return an enabled write cache for them. Note that this means that querying the BlockBackend name may return writethrough as the cache information, whereas querying the node-name of the root of that same BlockBackend will return writeback. This may appear odd at first, but it actually makes sense because it correctly repesents the layer that implements the WCE handling. This becomes more apparent when you consider nodes that are the root node of multiple BlockBackends, where each BB can have its own WCE setting. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	bfd18d1e0b	block: Move enable_write_cache to BB level Whether a write cache is used or not is a decision that concerns the user (e.g. the guest device) rather than the backend. It was already logically part of the BB level as bdrv_move_feature_fields() always kept it on top of the BDS tree; with this patch, the core of it (the actual flag and the additional flushes) is also implemented there. Direct callers of bdrv_open() must pass BDRV_O_CACHE_WB now if bs doesn't have a BlockBackend attached. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:02 +02:00
Kevin Wolf	855a6a93a1	block: Handle flush error in bdrv_pwrite_sync() We don't want to silently ignore a flush error. Also, there is little point in avoiding the flush for writethrough modes and once WCE is moved to the BB layer, we definitely need the flush here because bdrv_pwrite() won't involve one any more. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:01 +02:00
Kevin Wolf	72e775c7d9	block: Always set writeback mode in blk_new_open() All callers of blk_new_open() either don't rely on the WCE bit set after blk_new_open() because they explicitly set it anyway, or they pass BDRV_O_CACHE_WB unconditionally. This patch changes blk_new_open() so that it always enables writeback mode and asserts that BDRV_O_CACHE_WB is clear. For those callers that used to pass BDRV_O_CACHE_WB unconditionally, the flag is removed now. Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Max Reitz <mreitz@redhat.com>	2016-03-30 12:16:01 +02:00
Pavel Dovgalyuk	63785678f3	replay: introduce block devices record/replay This patch introduces block driver that implement recording and replaying of block devices' operations. All block completion operations are added to the queue. Queue is flushed at checkpoints and information about processed requests is recorded to the log. In replay phase the queue is matched with events read from the log. Therefore block devices requests are processed deterministically. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> [ kwolf: Rebased onto modified and already applied part of the series ] Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:15:57 +02:00
Pavel Dovgalyuk	c32b82afaf	block: add flush callback This patch adds callback for flush request. This callback is responsible for flushing whole block devices stack. bdrv_flush function does not proceed to underlying devices. It should be performed by this callback function, if needed. Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:12:15 +02:00
Daniel P. Berrange	e6ff69bf5e	block: move encryption deprecation warning into qcow code For a couple of releases we have been warning Encrypted images are deprecated Support for them will be removed in a future release. You can use 'qemu-img convert' to convert your image to an unencrypted one. This warning was issued by system emulators, qemu-img, qemu-nbd and qemu-io. Such a broad warning was issued because the original intention was to rip out all the code for dealing with encryption inside the QEMU block layer APIs. The new block encryption framework used for the LUKS driver does not rely on the unloved block layer API for encryption keys, instead using the QOM 'secret' object type. It is thus no longer appropriate to warn about encryption unconditionally. When the qcow/qcow2 drivers are converted to use the new encryption framework too, it will be practical to keep AES-CBC support present for use in qemu-img, qemu-io & qemu-nbd to allow for interoperability with older QEMU versions and liberation of data from existing encrypted qcow2 files. This change moves the warning out of the generic block code and into the qcow/qcow2 drivers. Further, the warning is set to only appear when running the system emulators, since qemu-img, qemu-io, qemu-nbd are expected to support qcow2 encryption long term now that the maint burden has been eliminated. Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:12:15 +02:00
Daniel P. Berrange	78368575a6	block: add generic full disk encryption driver Add a block driver that is capable of supporting any full disk encryption format. This utilizes the previously added block encryption code, and at this time supports the LUKS format. The driver code is capable of supporting any format supported by the QCryptoBlock module, so it registers one block driver for each format. This patch only registers the "luks" driver since the "qcow" driver is there only for back-compatibility with existing qcow built-in encryption. New LUKS compatible volumes can be formatted using qemu-img with defaults for all settings. $ qemu-img create --object secret,data=123456,id=sec0 \ -f luks -o key-secret=sec0 demo.luks 10G Alternatively the cryptographic settings can be explicitly set $ qemu-img create --object secret,data=123456,id=sec0 \ -f luks -o key-secret=sec0,cipher-alg=aes-256,\ cipher-mode=cbc,ivgen-alg=plain64,hash-alg=sha256 \ demo.luks 10G And query its size $ qemu-img info demo.img image: demo.img file format: luks virtual size: 10G (10737418240 bytes) disk size: 132K encrypted: yes Note that it was not necessary to provide the password when querying info for the volume. The password is only required when performing I/O on the volume All volumes created by this new 'luks' driver should be capable of being opened by the kernel dm-crypt driver. The only algorithms listed in the LUKS spec that are not currently supported by this impl are sha512 and ripemd160 hashes and cast6 cipher. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> [ kwolf - Added #include to resolve conflict with `da34e65c` ] Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 12:11:26 +02:00
Daniel P. Berrange	abb06c5ac1	block: add flag to indicate that no I/O will be performed When opening an image it is useful to know whether the caller intends to perform I/O on the image or not. In the case of encrypted images this will allow the block driver to avoid having to prompt for decryption keys when we merely want to query header metadata about the image. eg qemu-img info This flag is enforced at the top level only, since even if we don't want todo I/O on the 'qcow2' file payload, the underlying 'file' driver will still need todo I/O to read the qcow2 header, for example. Reviewed-by: Eric Blake <eblake@redhat.com> Signed-off-by: Daniel P. Berrange <berrange@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Max Reitz	5430215699	block/qapi: Pass bdrv_query_blk_stats() s->stats bdrv_query_blk_stats() does not need access to all of BlockStats, BlockDeviceStats is enough and is what this function is actually supposed to fill. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Max Reitz	0e8f44bee9	block/qapi: Set s->device in bdrv_query_stats() This is the only instance of bdrv_query_blk_stats() accessing anything in the BlockStats structure other than s->stats, so let us move it to its caller (where it makes just as much sense) allowing us to make bdrv_query_blk_stats() take a pointer to the BlockDeviceStats instead of BlockStats. Signed-off-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Peter Xu	5eda622768	block/qapi: fix unbounded stack for dump_qdict Using heap instead of stack for better safety. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Peter Xu	853ccfed8f	block/qapi: make two printf() formats literal Fix two places to use literal printf format when possible. Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com> Reviewed-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Kevin Wolf	72f41b6fbd	block: Remove blk_set_bs() The function is unused since commit `f21d96d0` ('block: Use BdrvChild in BlockBackend'). Signed-off-by: Kevin Wolf <kwolf@redhat.com> Reviewed-by: Eric Blake <eblake@redhat.com>	2016-03-30 11:59:32 +02:00
Programmingkid	d0855f1235	block/raw-posix.c: Make physical devices usable in QEMU under Mac OS X host Mac OS X can be picky when it comes to allowing the user to use physical devices in QEMU. Most mounted volumes appear to be off limits to QEMU. If an issue is detected, a message is displayed showing the user how to unmount a volume. Now QEMU uses both CD and DVD media. Signed-off-by: John Arbuckle <programmingkidx@gmail.com> Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-30 11:59:32 +02:00
Peter Maydell	553934db66	-----BEGIN PGP SIGNATURE----- Version: GnuPG v1 iQIcBAABAgAGBQJW+dDJAAoJEL2+eyfA3jBXk6oP/R/zX4foUVFMTvDbxHwWc41t gXGk1BpIjFnteab/tzUBDIdgs/DPxzM6bClhe45gNInVBgnOyeVmpUwRGGNYQKbn FdkrAcC6Vy6BJv+xRTMMS+h4i6ebJ6HqqQPwkz0VulxsAknDPQsBebe0tM8uO7k9 G+ccMYOyUUiGTIRC3pBkRCu8APEialPSv3MpUTMtp71R3US+pEwmo1AgyOFq/lDu B/8LUBoR48XCEGfOA6ZixzoMwF1lTWpezx5/KF+fQ26sgnNzjpwYWnJk+LG7Gtvj 8PHYsHDoXSISlIgxzLpS0AA6s54+mutgIeNJG5FBXGrSSNlAB1+cKZsnZw42YjfI BVIHQkmcGT+h9UEDdekiOfQorypSYRm51ueTGO/lUbxNifvJ5LQA97F0G/filoCj ovGIfOwgpWaEBPCb//U1TRGhhTg+dNyCeC4GoxDEFyWmLPYp8p7Xtz+vsZOIdH4O Wl9i6BzzeNEgJyutKqn2qpNLl6Pfd548MOJJqAUkGxDGrCJMkmn2lJSpSSji6cdm y4Az/tPY0/xpxwjSRakaIMOlhDoGXmrQG+I6JG1TZLSH7x1+Ajhr2ryx4CBONceV 1quibAqoG1GwxCyYn7dv4aeJrDlg3XzEWQW6nJhuE91d9ZH+jF5u2+i+IZcQCDBe Cd6d0SZlcOnq3M5LiOrA =1ekF -----END PGP SIGNATURE----- Merge remote-tracking branch 'remotes/cody/tags/block-pull-request' into staging # gpg: Signature made Tue 29 Mar 2016 01:48:09 BST using RSA key ID C0DE3057 # gpg: Good signature from "Jeffrey Cody <jcody@redhat.com>" # gpg: aka "Jeffrey Cody <jeff@codyprime.org>" # gpg: aka "Jeffrey Cody <codyprime@gmail.com>" * remotes/cody/tags/block-pull-request: qemu-iotests: add no-op streaming test qemu-iotests: fix test_stream_partial() block: never cancel a streaming job without running stream_complete() Signed-off-by: Peter Maydell <peter.maydell@linaro.org>	2016-03-29 19:54:49 +01:00
Alberto Garcia	6578629e08	block: never cancel a streaming job without running stream_complete() We need to call stream_complete() in order to do all the necessary clean-ups, even if there's an early failure. At the moment it's only useful to make sure that s->backing_file_str is not leaked, but it will become more important if we introduce support for streaming to any intermediate node. Signed-off-by: Alberto Garcia <berto@igalia.com> Reviewed-by: Max Reitz <mreitz@redhat.com> Reviewed-by: Fam Zheng <famz@redhat.com> Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com> Message-id: 2abedf2debc65c250560237f31a8e6756883c8fc.1458566441.git.berto@igalia.com Signed-off-by: Jeff Cody <jcody@redhat.com>	2016-03-28 13:56:44 -04:00
Veronia Bahaa	f348b6d1a5	util: move declarations out of qemu-common.h Move declarations out of qemu-common.h for functions declared in utils/ files: e.g. include/qemu/path.h for utils/path.c. Move inline functions out of qemu-common.h and into new files (e.g. include/qemu/bcd.h) Signed-off-by: Veronia Bahaa <veroniabahaa@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:17 +01:00
Rutuja Shah	73bcb24d93	Replaced get_tick_per_sec() by NANOSECONDS_PER_SECOND This patch replaces get_ticks_per_sec() calls with the macro NANOSECONDS_PER_SECOND. Also, as there are no callers, get_ticks_per_sec() is then removed. This replacement improves the readability and understandability of code. For example, timer_mod(fdctrl->result_timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + (get_ticks_per_sec() / 50)); NANOSECONDS_PER_SECOND makes it obvious that qemu_clock_get_ns matches the unit of the expression on the right side of the plus. Signed-off-by: Rutuja Shah <rutu.shah.26@gmail.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:17 +01:00
Markus Armbruster	daf015ef5a	include/qemu/iov.h: Don't include qemu-common.h qemu-common.h should only be included by .c files. Its file comment explains why: "No header file should depend on qemu-common.h, as this would easily lead to circular header dependencies." qemu/iov.h includes qemu-common.h for QEMUIOVector stuff. Move all that to qemu/iov.h and drop the ill-advised include. Include qemu/iov.h where the QEMUIOVector stuff is now missing. Signed-off-by: Markus Armbruster <armbru@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:16 +01:00
Markus Armbruster	da34e65cb4	include/qemu/osdep.h: Don't include qapi/error.h Commit `57cb38b` included qapi/error.h into qemu/osdep.h to get the Error typedef. Since then, we've moved to include qemu/osdep.h everywhere. Its file comment explains: "To avoid getting into possible circular include dependencies, this file should not include any other QEMU headers, with the exceptions of config-host.h, compiler.h, os-posix.h and os-win32.h, all of which are doing a similar job to this file and are under similar constraints." qapi/error.h doesn't do a similar job, and it doesn't adhere to similar constraints: it includes qapi-types.h. That's in excess of 100KiB of crap most .c files don't actually need. Add the typedef to qemu/typedefs.h, and include that instead of qapi/error.h. Include qapi/error.h in .c files that need it and don't get it now. Include qapi-types.h in qom/object.h for uint16List. Update scripts/clean-includes accordingly. Update it further to match reality: replace config.h by config-target.h, add sysemu/os-posix.h, sysemu/os-win32.h. Update the list of includes in the qemu/osdep.h comment quoted above similarly. This reduces the number of objects depending on qapi/error.h from "all of them" to less than a third. Unfortunately, the number depending on qapi-types.h shrinks only a little. More work is needed for that one. Signed-off-by: Markus Armbruster <armbru@redhat.com> [Fix compilation without the spice devel packages. - Paolo] Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>	2016-03-22 22:20:15 +01:00
Eric Blake	32bafa8fdd	qapi: Don't special-case simple union wrappers Simple unions were carrying a special case that hid their 'data' QMP member from the resulting C struct, via the hack method QAPISchemaObjectTypeVariant.simple_union_type(). But by using the work we started by unboxing flat union and alternate branches, coupled with the ability to visit the members of an implicit type, we can now expose the simple union's implicit type in qapi-types.h: \| struct q_obj_ImageInfoSpecificQCow2_wrapper { \| ImageInfoSpecificQCow2 data; \| }; \| \| struct q_obj_ImageInfoSpecificVmdk_wrapper { \| ImageInfoSpecificVmdk data; \| }; ... \| struct ImageInfoSpecific { \| ImageInfoSpecificKind type; \| union { /* union tag is @type / \| void data; \|- ImageInfoSpecificQCow2 qcow2; \|- ImageInfoSpecificVmdk vmdk; \|+ q_obj_ImageInfoSpecificQCow2_wrapper qcow2; \|+ q_obj_ImageInfoSpecificVmdk_wrapper vmdk; \| } u; \| }; Doing this removes asymmetry between QAPI's QMP side and its C side (both sides now expose 'data'), and means that the treatment of a simple union as sugar for a flat union is now equivalent in both languages (previously the two approaches used a different layer of dereferencing, where the simple union could be converted to a flat union with equivalent C layout but different {} on the wire, or to an equivalent QMP wire form but with different C representation). Using the implicit type also lets us get rid of the simple_union_type() hack. Of course, now all clients of simple unions have to adjust from using su->u.member to using su->u.member.data; while this touches a number of files in the tree, some earlier cleanup patches helped minimize the change to the initialization of a temporary variable rather than every single member access. The generated qapi-visit.c code is also affected by the layout change: \|@@ -7393,10 +7393,10 @@ void visit_type_ImageInfoSpecific_member \| } \| switch (obj->type) { \| case IMAGE_INFO_SPECIFIC_KIND_QCOW2: \|- visit_type_ImageInfoSpecificQCow2(v, "data", &obj->u.qcow2, &err); \|+ visit_type_q_obj_ImageInfoSpecificQCow2_wrapper_members(v, &obj->u.qcow2, &err); \| break; \| case IMAGE_INFO_SPECIFIC_KIND_VMDK: \|- visit_type_ImageInfoSpecificVmdk(v, "data", &obj->u.vmdk, &err); \|+ visit_type_q_obj_ImageInfoSpecificVmdk_wrapper_members(v, &obj->u.vmdk, &err); \| break; \| default: \| abort(); Signed-off-by: Eric Blake <eblake@redhat.com> Message-Id: <1458254921-17042-13-git-send-email-eblake@redhat.com> Signed-off-by: Markus Armbruster <armbru@redhat.com>	2016-03-18 10:29:26 +01:00
Alberto Garcia	6049490df4	quorum: Emit QUORUM_REPORT_BAD for reads in fifo mode If there's an I/O error in one of Quorum children then QEMU should emit QUORUM_REPORT_BAD. However this is not working with read-pattern=fifo. This patch fixes this problem. Signed-off-by: Alberto Garcia <berto@igalia.com> Message-id: d57e39e8d3e8564003a1e2aadbd29c97286eb2d2.1458034554.git.berto@igalia.com Reviewed-by: Max Reitz <mreitz@redhat.com> Signed-off-by: Max Reitz <mreitz@redhat.com>	2016-03-17 16:43:30 +01:00
Kevin Wolf	8896e08814	block: Use blk_co_pwritev() in blk_co_write_zeroes() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 16:30:00 +01:00
Kevin Wolf	57d6a42883	block: Use blk_aio_prwv() for aio_read/write/write_zeroes Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 16:30:00 +01:00
Kevin Wolf	a55d3fba99	block: Use blk_prw() in blk_pread()/blk_pwrite() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Kevin Wolf	fc1453cdfc	block: Use blk_co_pwritev() in blk_write_zeroes() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Kevin Wolf	5bd5119667	block: Pull up blk_read_unthrottled() implementation Use blk_read(), so that it goes through blk_co_preadv() like all read requests from the BB to the BDS. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Kevin Wolf	a8823a3bfd	block: Use blk_co_pwritev() for blk_write() Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00
Kevin Wolf	1bf1cbc91f	block: Use blk_co_preadv() for blk_read() This patch introduces blk_co_preadv() as a central function on the BlockBackend level that is supposed to handle all read requests from the BB to its root BDS eventually. Signed-off-by: Kevin Wolf <kwolf@redhat.com>	2016-03-17 15:47:57 +01:00

1 2 3 4 5 ...

2449 Commits (07bdaa4196b51bc7ffa7c3f74e9e4a9dc8a7966a)