Compare commits

...

283 Commits

Author SHA1 Message Date
3808fa39eb Add vitastor support 2025-04-11 02:08:11 +03:00
Thomas Lamprecht
e0969989ac bump version to 9.2.0-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-04-04 16:16:02 +02:00
Thomas Lamprecht
b30e898392 PVE backup: backup-access api: simplify bitmap logic
See inner patch for details.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-04-04 16:14:24 +02:00
Thomas Lamprecht
053f68a7ac bump version to 9.2.0-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-04-03 17:00:39 +02:00
Thomas Lamprecht
cb662eb2c1 pve backup: implement features for external backup providers
This series of patches extends our existing QEMU backup integration
for PVE such that it can be used for a in-development external-backup
plugin API.

This includes among other things an API for backup access and teardown
as well as dirty-bitmap management. See the separate patches for more
details and check [v7] and [v8] of this series on the list. The former
is the last revision by Fiona which was already very advanced,
Wolfgang picked that up and extended/adapted it slightly to match
needs that he found during implementation of a test provider and from
feedback of (future) external backup provider like Bareos and Veritas.
This was mainly done to get the QEMU build out for broad QA, we can
still change things here as long as the rest of the series is not yet
applied.

[v7]: https://lore.proxmox.com/pve-devel/20250401173435.221892-1-f.ebner@proxmox.com/
[v8]: https://lore.proxmox.com/pve-devel/20250403123118.264974-1-w.bumiller@proxmox.com/

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-04-03 16:48:42 +02:00
Wolfgang Bumiller
a6ee093f1c merge async snapshot improvements
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2025-04-02 17:23:05 +02:00
Thomas Lamprecht
76dd2fb90b bump version to 9.2.0-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-03-26 09:57:04 +01:00
Fiona Ebner
3da02409af revert hpet changes to fix performance regression
As reported in the community forum [0][1], QEMU processes for Linux
guests would consume more CPU on the host after an update to QEMU 9.2.
The issue was reproduced and bisecting pointed to QEMU commit
f0ccf77078 ("hpet: fix and cleanup persistence of interrupt status").

Some quick experimentation suggests that in particular the last part
 is responsible for the issue:
> - the timer must be kept running even if not enabled, in
>   order to set the ISR flag, so writes to HPET_TN_CFG must
>   not call hpet_del_timer()

Users confirmed that setting the hpet=off machine flag works around
the issue[0]. For Windows (7 or later) guests, the flag is already
disabled, because of issues in the past [2].

Upstream suggested reverting the relevant patches for now [3], because
other issues were reported too. All except commit 5895879aca ("hpet:
remove unnecessary variable "index"") are actually dependent on each
other for cleanly reverting f0ccf77078, and while not strictly
required, that one was reverted too for completeness.

[0]: https://forum.proxmox.com/threads/163694/
[1]: https://forum.proxmox.com/threads/161849/post-756793
[2]: https://lists.proxmox.com/pipermail/pve-devel/2012-December/004958.html
[3]: https://lore.kernel.org/qemu-devel/CABgObfaKJ5NFVKmYLFmu4C0iZZLJJtcWksLCzyA0tBoz0koZ4A@mail.gmail.com/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-03-26 08:42:59 +01:00
Thomas Lamprecht
f7be446ebe bump version to 9.2.0-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-02-24 17:34:34 +01:00
Fiona Ebner
dc45786e06 code style: some more coccinelle fixes
Below are the commands that generated the changes along with the
rationale:

command: spatch --in-place scripts/coccinelle/error_propagate_null.cocci pve-backup.c
rationale: error_propagate() already checks for NULL in its second
           argument

command: spatch --in-place scripts/coccinelle/round.cocci vma-reader.c vma-writer.c
rationale: DIV_ROUND_UP() macro is more readable than the expanded
           calculation

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-02-24 17:21:14 +01:00
Fiona Ebner
f6dc6b54ba replicated zfs migration: fix assertion failure with multiple disks
It is necessary to reset the error pointer after error_report_err(),
because that function frees the error. Not doing so can lead to a
use-after-free and in particular error_setg() with the same error
pointer will run into assertion failure, because it asserts that no
previous error is set:

> #5  0x00007c1723674eb2 in __GI___assert_fail (assertion=assertion@entry=0x59132c9fc540 "*errp == NULL",
>     file=file@entry=0x59132c9fc530 "../util/error.c", line=line@entry=68,
>     function=function@entry=0x59132c9fc5f8 <__PRETTY_FUNCTION__.2> "error_setv")
> #6  0x000059132c7d250f in error_setv (errp=0x7c15839fafb8, src=0x59132c9af224 "../block/dirty-bitmap.c", line=182,
>     func=0x59132c9af9b0 <__func__.17> "bdrv_dirty_bitmap_check", err_class=err_class@entry=ERROR_CLASS_GENERIC_ERROR,
>     fmt=fmt@entry=0x59132c9af380 "Bitmap '%s' is currently in use by another operation and cannot be used", ap=0x7c15839fad60,
>     suffix=0x0)
> #7  0x000059132c7d265c in error_setg_internal (errp=errp@entry=0x7c15839fafb8,
>     src=src@entry=0x59132c9af224 "../block/dirty-bitmap.c", line=line@entry=182,
>     func=func@entry=0x59132c9af9b0 <__func__.17> "bdrv_dirty_bitmap_check",
>     fmt=fmt@entry=0x59132c9af380 "Bitmap '%s' is currently in use by another operation and cannot be used")
> #8  0x000059132c68fbc1 in bdrv_dirty_bitmap_check (bitmap=bitmap@entry=0x5913542d6190, flags=flags@entry=7,
>     errp=errp@entry=0x7c15839fafb8)
> #9  0x000059132c3b951d in add_bitmaps_to_list (s=s@entry=0x59132d87ee40 <dbm_state>, bs=bs@entry=0x591352d6b720,
>     bs_name=bs_name@entry=0x591352d69900 "drive-scsi1", alias_map=alias_map@entry=0x0, errp=errp@entry=0x7c15839fafb8)
> #10 0x000059132c3ba23d in init_dirty_bitmap_migration (errp=<optimized out>, s=0x59132d87ee40 <dbm_state>)
> #11 dirty_bitmap_save_setup (f=0x591352ebdd30, opaque=0x59132d87ee40 <dbm_state>, errp=0x7c15839fafb8)
> #12 0x000059132c3d81f0 in qemu_savevm_state_setup (f=0x591352ebdd30, errp=errp@entry=0x7c15839fafb8)

Fix created using the appropriate in-tree coccinelle script:
spatch --in-place scripts/coccinelle/error-use-after-free.cocci migration/block-dirty-bitmap.c

The problematic change exposing the issue was part of 7882afe ("update
submodule and patches to QEMU 9.1.2") adapting to QEMU 9.1, commit
dd03167725 ("migration: Add Error** argument to
add_bitmaps_to_list()"), where the add_bitmaps_to_list() function
gained an error pointer argument, replacing the local error variable
that was used before.

Fixes: 7882afe ("update submodule and patches to QEMU 9.1.2")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-02-24 17:21:14 +01:00
Thomas Lamprecht
4f4fca78f7 bump version to 9.2.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-02-04 08:49:24 +01:00
Fiona Ebner
e247b46563 stable fixes for QEMU 9.2.0
Most notabbly, there now is an upstream workaround for the "Windows
PCI Label bug" [0] and the revert of QEMU commit 44d975ef34 ("x86:
acpi: workaround Windows not handling name references in Package
properly") can be dropped.

Pick up some other fixes already merged in current master, for
emulation as well as x86(_64) KVM, some PCI/USB fixes and a pair of
regression fixes for the net subsystem.

[0]: https://gitlab.com/qemu-project/qemu/-/issues/774

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-02-04 08:37:47 +01:00
Fiona Ebner
670aa8ecdf update submodule and patches to QEMU 9.2.0
Notable changes:

* Commit 07bea2d35f ("block-backend: Remove deadcode") removed
  blk_op_{,un}block_all() which was used by PVE async savevm code.
  Fixed by switching to using bdrv_op_{,un}block_all().

* Drop patches that are already part of upstream.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-02-04 08:37:47 +01:00
Thomas Lamprecht
dfac8d424d bump version to 9.1.2-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-01-24 16:12:38 +01:00
Fiona Ebner
b7ad4dc467 vma: allow specifying disk formats for create operation
Previously, the format would be auto-detected which can lead to
confusing raw images as a different format and being unable to detect
corrupted images of non-raw formats.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-01-24 16:11:57 +01:00
Fiona Ebner
1d777d7fe6 async snapshot: explicitly specify raw format when loading the VM state file
Proxmox VE always writes state files in raw format.

Suggested-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-01-24 16:11:57 +01:00
Thomas Lamprecht
6390972c7b bump version to 9.1.2-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2025-01-17 16:34:12 +01:00
Fiona Ebner
acd551f801 adapt machine version deprecation for Proxmox VE
In commit a35f8577a0 ("include/hw: add macros for deprecation &
removal of versioned machines"), a new machine version deprecation and
removal policy was introduced. After only 3 years a machine version
will be deprecated while being removed after 6 years.

The deprecation is a bit early considering major PVE releases are
approximately every 2 years. This means that a deprecation warning can
already happen for a machine version that was introduced during the
previous major release. This would scare users for no good reason, so
avoid deprecating machine versions in PVE too early and define a
baseline of machine versions that will be supported throughout a
single major PVE release.

Reported-by: Martin Maurer <martin@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-01-16 17:40:00 +01:00
Fiona Ebner
f42ba1f272 vma reader: drop unused variable
The variable has been unused since commit 67af0fa ("rebased pve
patches") back in 2017. There is no comment to why, but before that,
it was used to error out if there were no disks in the vma archive.
This should be possible however, so it's safe to assume this was an
intentional change.

This fixes compilation with clang19.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2025-01-13 14:00:50 +01:00
Thomas Lamprecht
c4efa30b30 bump version to 9.1.2-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-12-11 16:47:25 +01:00
Fiona Ebner
0b40610f61 stable fixes for QEMU 9.1.2
Pick up to stable fixes for virtio-net, one fixing multiqueue
initialization and one fixing potential out-of-bounds access (in the
work_around_broken_dhclient() hack that luckily seems to be
unreachable when 'vhost=on' is used for the device, which Proxmox VE
does except when running a non-native VM arch or if the vhost device
is not available).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-12-11 16:10:36 +01:00
Fiona Ebner
28ad83b492 async snapshot: improve error handling for 'savevm-start' QMP command
Return values for qemu_savevm_state_setup() and blk_set_aio_context()
now get checked.

Move the qemu_coroutine_create() call to after the new early return
to avoid a potential memory leak.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-12-11 16:10:36 +01:00
Fiona Ebner
5fff8d91c7 async snapshot: code cleanup: use error_setg() helper
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-12-11 16:10:36 +01:00
Fiona Ebner
7882afe30d update submodule and patches to QEMU 9.1.2
Notable changes, most interestingly the two build system changes:

* avoid making 'migration' target depend on 'libproxmox_backup_qemu':

  Having pbs-state.c be part of the 'migration_files' makes the
  'migration' target depend on 'libproxmox_backup_qemu'. Adding the
  dependency to 'migration' and 'libmigration' would not be enough
  however, because pbs-state.c depends on savevm.c (for
  register_savevm_live()), and savevm.c is not itself part of the
  'migration_files' and would need to be moved too. Otherwise, linking
  the 'test-xbzrle' unit test is broken. Instead, don't declare
  pbs-state.c to be part of the 'migration_files'.

* meson: pbs-restore + vma: add qemuutil dependency explicitly

  Both pbs-restore and vma use "qemu/osdep.h" so the dependency is
  present. Being explicit is required after commit 414b180d42 ("meson:
  Pass objects and dependencies to declare_dependency()").

* QAPI docs "Notes:" to ".. note::" conversion following commit
  d461c27973 ("qapi: convert "Note" sections to plain rST").

* Removal of QERR_* macros following commit
  a95921f171 ("qapi: Inline and remove QERR_DEVICE_HAS_NO_MEDIUM
  definition") and friends.

* Signature change for .save_setup callbacks following commit
  01c3ac681b ("migration: Add Error** argument to .save_setup()
  handler").

* Removal of separate .bdrv_file_open callbacks following commit
  44b424dc4a ("block: remove separate bdrv_file_open callback")

* Adapt dirty bitmap migration error handling following commit
  dd03167725 ("migration: Add Error** argument to
  add_bitmaps_to_list()")

* Adapt savevm async to removed block migration following commit
  eef0bae3a7 ("migration: Remove block migration")

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-12-11 16:10:36 +01:00
Thomas Lamprecht
05089ab57d various PVE backup code refactoring/improvements
Mostly preparation for our external backup plugin work, but fine to
already commit now.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-11-12 16:48:28 +01:00
Thomas Lamprecht
9e8ef15831 PVE backup: improve error handling for fleecing
See Fiona's inner commit for details.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-11-12 16:48:07 +01:00
Thomas Lamprecht
531db7df01 block/reqlist: allow adding overlapping requests
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-11-12 16:42:52 +01:00
Thomas Lamprecht
d14bffa8c0 refresh patches
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-11-12 16:42:30 +01:00
Thomas Lamprecht
4bc8223ac9 bump version to 9.0.2-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-11-10 11:23:14 +01:00
Fiona Ebner
fd53092e9b async snapshot: stop vCPU throttling after finishing
In the community forum, users reported issues about RCU stalls and
sluggish VMs after taking a snapshot with RAM in Proxmox VE [0]. Mario
was also experiencing similar issues from time to time and recently,
obtained a GDB stacktrace. The stacktrace showed that, in his case,
the vCPU threads were waiting in cpu_throttle_thread(). It is a good
guess that the issues in the forum could also be because of that.

From searching in the source code, it seems that migration is the only
user of the vCPU throttling functions in QEMU relevant for Proxmox VE
(the only other place where it is used is the Cocoa UI). In
particular, RAM migration will begin throttling vCPUs for
auto-converge.

In migration_iteration_finish() there is an unconditional call to
cpu_throttle_stop(), so do the same in the async snapshot code
specific to Proxmox VE.

It's not clear why the issue began to surface more prominently only
now, since the vCPU throttling was there since commit 070afca258
("migration: Dynamic cpu throttling for auto-converge") in QEMU
v2.10.0. However, there were a lot of changes in the migration code
between v8.1.5 and v9.0.2 and a few of them might have affected the
likelihood of cpu_throttle_set() being called, for example, 4e1871c450
("migration: Don't serialize devices in qemu_savevm_state_iterate()")

[0]: https://forum.proxmox.com/threads/153483

Reported-by: Mario Loderer <m.loderer@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Mario Loderer <m.loderer@proxmox.com>
2024-11-10 11:20:39 +01:00
Thomas Lamprecht
7446610389 bump version to 9.0.2-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-09-06 16:22:13 +02:00
Fiona Ebner
903a63402e pick up stable fixes for 9.0
Includes fixes for VirtIO-net, ARM and x86(_64) emulation, CVEs to
harden NBD server against malicious clients, as well as a few others
(VNC, physmem, Intel IOMMU, ...).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-09-05 14:44:54 +02:00
Fiona Ebner
441072fc57 pick up fix for VirtIO PCI regressions
Commit f06b222 ("fixes for QEMU 9.0") included a revert for the QEMU
commit 2ce6cff94d ("virtio-pci: fix use of a released vector"). That
commit caused some regressions which sounded just as bad as the fix.
Those regressions have now been addressed upstream, so pick up the fix
and drop the revert. Dropping the revert fixes the original issue that
commit 2ce6cff94d ("virtio-pci: fix use of a released vector")
addressed.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-09-05 14:44:54 +02:00
Fiona Ebner
582fd47901 bump version to 9.0.2-2
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-08-07 10:17:15 +02:00
Fiona Ebner
356bc2483a actually bump submodule to v9.0.2
Fixes: cf40e92 ("update submodule and patches to QEMU 9.0.2")
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-08-07 09:43:16 +02:00
Thomas Lamprecht
9efd9cea96 bump version to 9.0.2-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-07-29 18:59:45 +02:00
Fiona Ebner
4154eea6e6 some more stable fixes for QEMU 9.0.2
Fix the two issues reported in the community forum[0][1], i.e.
regression in LSI-53c895a controller and ignored boot order for USB
storage (only possible via custom arguments in Proxmox VE), both
causing boot failures, and pick up fixes for VirtIO, ARM emulation,
char IO device and a graph lock fix for the block layer.

The block-copy patches that serve as a preparation for fleecing are
moved to the extra folder, because the graph lock fix requires them
to be present first. They have been applied upstream in the meantime
and should drop out with the rebase on 9.1.

[0]: https://forum.proxmox.com/threads/149772/post-679433
[1]: https://forum.proxmox.com/threads/149772/post-683459

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-07-29 18:56:46 +02:00
Fiona Ebner
cf40e92996 update submodule and patches to QEMU 9.0.2
Most relevant are some fixes for VirtIO and for ARM and i386
emulation. There also is a fix for VGA display to fix screen blanking,
which fixes: https://bugzilla.proxmox.com/show_bug.cgi?id=4786

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-07-29 18:56:46 +02:00
Fiona Ebner
14afbdd55f bump version to 9.0.0-6
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-07-08 16:13:44 +02:00
Fiona Ebner
54d1666680 zeroinit: fix regression with filename parsing
As reported in the community forum [0], cloning or importing images
to RBD storages (without the krbd setting) was broken. This is a
result of no filename parsing happening anymore in bdrv_open_child()
after commit b242e7f ("backport fix for CVE-2024-4467"), which the
zeroinit relied on for passing along the RBD filename+key-value pairs.

There is a dedicated function for opening the file child which still
does filename parsing. Use that for opening the file child. Role and
flags should still be the same as with the manual bdrv_open_child(),
because the zeroinit driver is a filter, and the assignment bs->file
is also done by bdrv_open_file_child().

Fixes: b242e7f ("backport fix for CVE-2024-4467")

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>

[0]: https://forum.proxmox.com/threads/qemu-9-0-available-on-pve-no-subscription-as-of-now.149772/post-681620
FG: added missing link

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2024-07-08 15:56:39 +02:00
Fiona Ebner
49125e1708 bump version to 9.0.0-5
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-07-03 13:50:23 +02:00
Fiona Ebner
b242e7f196 backport fix for CVE-2024-4467
This prevents that malicious qcow2 images can already cause bad
effects if being queried via 'qemu-img info'.

For Proxmox VE, this is an additional safe guard, as currently it
directly creates and manages the qcow2 images used by VMs and does not
allow unprivileged users to import them.

Reference: https://access.redhat.com/security/cve/cve-2024-4467

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-07-03 13:50:07 +02:00
Fiona Ebner
c2abb73df7 fix #4726: avoid superfluous check in vma code
The 'status' pointer is dereferenced regardless of the NULL check,
i.e. 'status->closed' is accessed after the branch with the check.
Since all callers pass in the address of a struct on the stack, the
pointer can never be NULL. Remove the superfluous check and add an
assert instead.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Reviewed-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2024-07-03 10:57:21 +02:00
Fiona Ebner
5bdf1bebba bump version to 9.0.0-4
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-07-01 14:02:00 +02:00
Fiona Ebner
99c80e7492 async snapshot: fix crash with VirtIO block with iothread when not saving VM state
As reported in the community forum [0], doing a snapshot without
saving the VM state for a VM with a VirtIO block device with iothread
would lead to an assertion failure [1] and thus crash.

The issue is that vm_start() is called from the coroutine
qmp_savevm_end() which violates assumptions about graph locking down
the line. Factor out the part of qmp_savevm_end() that actually needs
to be a coroutine into a separate helper and turn qmp_savevm_end()
into a non-coroutine, so that it can call vm_start() safely.

The issue is likely not new, but was exposed by the recent graph
locking rework introducing stricter checks.

The issue does not occur when saving the VM state, because then the
non-coroutine process_savevm_finalize() will already call vm_start()
before qmp_savevm_end().

[0]: https://forum.proxmox.com/threads/149883/

[1]:

> #0  0x00007353e6096e2c __pthread_kill_implementation (libc.so.6 + 0x8ae2c)
> #1  0x00007353e6047fb2 __GI_raise (libc.so.6 + 0x3bfb2)
> #2  0x00007353e6032472 __GI_abort (libc.so.6 + 0x26472)
> #3  0x00007353e6032395 __assert_fail_base (libc.so.6 + 0x26395)
> #4  0x00007353e6040eb2 __GI___assert_fail (libc.so.6 + 0x34eb2)
> #5  0x0000592002307bb3 bdrv_graph_rdlock_main_loop (qemu-system-x86_64 + 0x83abb3)
> #6  0x00005920022da455 bdrv_change_aio_context (qemu-system-x86_64 + 0x80d455)
> #7  0x00005920022da6cb bdrv_try_change_aio_context (qemu-system-x86_64 + 0x80d6cb)
> #8  0x00005920022fe122 blk_set_aio_context (qemu-system-x86_64 + 0x831122)
> #9  0x00005920021b7b90 virtio_blk_start_ioeventfd (qemu-system-x86_64 + 0x6eab90)
> #10 0x0000592002022927 virtio_bus_start_ioeventfd (qemu-system-x86_64 + 0x555927)
> #11 0x0000592002066cc4 vm_state_notify (qemu-system-x86_64 + 0x599cc4)
> #12 0x000059200205d517 vm_prepare_start (qemu-system-x86_64 + 0x590517)
> #13 0x000059200205d56b vm_start (qemu-system-x86_64 + 0x59056b)
> #14 0x00005920020a43fd qmp_savevm_end (qemu-system-x86_64 + 0x5d73fd)
> #15 0x00005920023f3749 qmp_marshal_savevm_end (qemu-system-x86_64 + 0x926749)
> #16 0x000059200242f1d8 qmp_dispatch (qemu-system-x86_64 + 0x9621d8)
> #17 0x000059200238fa98 monitor_qmp_dispatch (qemu-system-x86_64 + 0x8c2a98)
> #18 0x000059200239044e monitor_qmp_dispatcher_co (qemu-system-x86_64 + 0x8c344e)
> #19 0x000059200245359b coroutine_trampoline (qemu-system-x86_64 + 0x98659b)
> #20 0x00007353e605d9c0 n/a (libc.so.6 + 0x519c0)

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-06-28 10:57:35 +02:00
Fiona Ebner
9664f5a132 PVE backup: remove unused targetfile member from device info
This became unused after 9e0186f ("backup: drop broken
BACKUP_FORMAT_DIR").

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-06-14 15:15:14 +02:00
Fiona Ebner
b37841aa1a remove outdated comments about AioContext locking
AioContext locking got removed in QEMU 9.0.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-06-14 13:45:33 +02:00
Fiona Ebner
822c99f3c3 pbs block driver: use custom error message when returned aid is too large
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-06-10 16:05:49 +02:00
Jing Luo
51df4937bf pbs block driver: improve data type for aid member
On ARM, gcc warns (-Werror=type-limits) that it will always be false
for the if statement. This is because here s->aid is defined as char,
while proxmox_restore_open_image() returns an int.

This is probably because chars are treated as unsigned on arm arch but
signed on x86 arch:

https://developer.arm.com/documentation/den0013/d/Porting/Miscellaneous-C-porting-issues/unsigned-char-and-signed-char

Make aid an explicit uint8_t, because that is the type for functions
taking the aid as a parameter, e.g. proxmox_restore_get_image_length().

Signed-off-by: Jing Luo <jing@jing.rocks>
[FE: slightly improve commit message]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-06-10 15:45:10 +02:00
Fiona Ebner
bb80c7f323 bump version to 9.0.0-3
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-05-29 16:02:36 +02:00
Fiona Ebner
c1cd6a6221 more stable fixes for QEMU 9.0
Most importantly the first one "Revert "monitor: use
aio_co_reschedule_self()"", fixing a crash when doing hotplug+resize
with a disk using io_uring.

Other fixes (likely not too important) for TCG emulation of x86(_64)
and ARM.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-05-29 13:35:39 +02:00
Thomas Lamprecht
16b7dfe03b bump version to 9.0.0-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-05-17 17:05:10 +02:00
Fiona Ebner
f06b222ece fixes for QEMU 9.0
Most importantly, fix forwards and backwards migration with VirtIO-GPU
display.

Other fixes are for a regression in pflash device (introduced in 8.2)
and some fixes for x86(_64) TCG emulation. One of the patches needed
to be adapted, because it removed a helper that is still in use in
9.0.0.

There also is a revert for a fix in VirtIO PCI devices that turned out
to cause some issues, see the revert itself for more details.

Lastly, there is a change to move compatibility flags for a new
VirtIO-net feature to the correct machine type. The feature was
introduced in QEMU 8.2, but the compatibility flags got added to
machine version 8.0 instead of 8.1. This breaks backwards migration
with machine version 8.1 from a 8.2/9.0 binary to an 8.1 binary, in
cases where the guest kernel enables the feature (e.g. Ubuntu 23.10).
While that breaks migration with machine version 8.1 from an unpatched
to a patched binary, Proxmox VE only ever had 8.2 on the test
repository and 9.0 not yet in any public repository. An upstream
developer suggested it is the proper fix [0]. Upstream submission [1].

[0]: https://lore.kernel.org/qemu-devel/CACGkMEtZrJuhof+hUGVRvLLQE+8nQE5XmSHpT0NAQ1EpnqfmsA@mail.gmail.com/T/#u
[1]: https://lore.kernel.org/qemu-devel/20240517075336.104091-1-f.ebner@proxmox.com/T/#u

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-05-17 15:56:12 +02:00
Fiona Ebner
db293008ee backup: improve error when copy-before-write fails for fleecing
With fleecing, failure for copy-before-write does not fail the guest
write, but only sets the snapshot error that is associated to the
copy-before-write filter, making further requests to the snapshot
access fail with EACCES, which then also fails the job. But that error
code is not the root cause of why the backup failed, so bubble up the
original snapshot error instead.

Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-29 17:25:20 +02:00
Fiona Ebner
51232e2e40 fix #5409: backup: fix copy-before-write timeout
The type for the copy-before-write timeout in nanoseconds was wrong.
By being just uint32_t, a maximum of slightly over 4 seconds was
possible. Larger values would overflow and thus the 45 seconds set by
Proxmox's backup with fleecing, resulted in effectively 2 seconds
timeout for copy-before-write operations.

Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-29 17:25:20 +02:00
Thomas Lamprecht
2cd560e0d2 bump version to 9.0.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-29 15:29:52 +02:00
Fiona Ebner
4fbd50e2f9 update submodule and patches to QEMU 9.0.0
Biggest change is that AioContext locking got removed, but no changes
required other than dropping the calls to acquire and release it. As a
consequence, the single parameter for the bdrv_graph_wrlock() call got
removed which also required adaptation.

QAPI docs became stricter requiring to document all members.

Other minor changes:

- Single parameter from migration_is_running() was dropped.
- qemu_mutex_(un)lock_iothread() got renamed to bql_(un)lock().

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-29 15:29:52 +02:00
Thomas Lamprecht
766c61f1b6 d/lintian: ignore missing source warning for linux-user vdso objects
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-29 15:29:52 +02:00
Thomas Lamprecht
c19617bf9b bump version to 8.2.2-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-29 10:45:26 +02:00
Fiona Ebner
f1eed34ac7 update submodule and patches to QEMU 8.2.2
This version includes both the AioContext lock and the block graph
lock, so there might be some deadlocks lurking. It's not possible to
disable the block graph lock like was done in QEMU 8.1, because there
are no changes like the function bdrv_schedule_unref() that require
it. QEMU 9.0 will finally get rid of the AioContext locking.

During live-restore with a VirtIO SCSI drive with iothread there is a
known racy deadlock related to the AioContext lock. Not new [1], but
not sure if more likely now. Should be fixed in QEMU 9.0.

The block graph lock comes with annotations that can be checked by
clang's TSA. This required changes to the block drivers, i.e.
alloc-track, pbs, zeroinit as well as taking the appropriate locks
in pve-backup, savevm-async, vma-reader.

Local variable shadowing is prohibited via a compiler flag now,
required slight adaptation in vma.c.

Major changes only affect alloc-track:

* It is not possible to call a generated co-wrapper like
  bdrv_get_info() while holding the block graph lock exclusively [0],
  which does happen during initialization of alloc-track when the
  backing hd is set and the refresh_limits driver callback is invoked.

  The bdrv_get_info() call to get the cluster size is moved to
  directly after opening the file child in track_open().

  The important thing is that at least the request alignment for the
  write target is used, because then the RMW cycle in bdrv_pwritev
  will gather enough data from the backing file. Partial cluster
  allocations in the target are not a fundamental issue, because the
  driver returns its allocation status based on the bitmap, so any
  other data that maps to the same cluster will still be copied later
  by a stream job (or during writes to that cluster).

* Replacing the node cannot be done in the
  track_co_change_backing_file() callback, because it is a coroutine
  and cannot hold the block graph lock exclusively. So it is moved to
  the stream job itself with the auto-remove option not having an
  effect anymore (qemu-server would always set it anyways).

  In the future, there could either be a special option for the stream
  job, or maybe the upcoming blockdev-replace QMP command can be used.

  Replacing the backing child is actually already done in the stream
  job, so no need to do it in the track_co_change_backing_file()
  callback. It also cannot be called from a coroutine. Looking at the
  implementation in the qcow2 driver, it doesn't seem to be intended
  to change the backing child itself, just update driver-internal
  state.

Other changes:

* alloc-track: Error out early when used without auto-remove. Since
  replacing the node now happens in the stream job, where the option
  cannot be read from (it's internal to the driver), it will always be
  treated as 'on'. Makes sure to have users beside qemu-server notice
  the change (should they even exist). The option can be fully dropped
  in the future while adding a version guard in qemu-server.

* alloc-track: Avoid seemingly superfluous child permission update.
  Doesn't seem necessary nowadays (maybe after commit "alloc-track:
  fix deadlock during drop" where the dropping is not rescheduled and
  delayed anymore or some upstream change). Replacing the block node
  will already update the permissions of the new node (which was the
  file child before). Should there really be some issue, instead of
  having a drop state, this could also be just based off the fact
  whether there is still a backing child.

  Dumping the cumulative (shared) permissions for the BDS with a debug
  print yields the same values after this patch and with QEMU 8.1,
  namely 3 and 5.

* PBS block driver: compile unconditionally. Proxmox VE always needs
  it and something in the build process changed to make it not enabled
  by default. Probably would need to move the build option to meson
  otherwise.

* backup: job unreferencing during cleanup needs to happen outside of
  coroutine, so it was moved to before invoking the clean

* mirror: Cherry-pick stable fix to avoid potential deadlock.

* savevm-async: migrate_init now can fail, so propagate potential
  error.

* savevm-async: compression counters are not accessible outside
  migration/ram-compress now, so drop code that prophylactically set
  it to zero.

[0]: https://lore.kernel.org/qemu-devel/220be383-3b0d-4938-b584-69ad214e5d5d@proxmox.com/
[1]: https://lore.kernel.org/qemu-devel/e13b488e-bf13-44f2-acca-e724d14f43fd@proxmox.com/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-26 14:14:06 +02:00
Fiona Ebner
2e71c17f5b makefile: also filter 64-bit hppa ROM for QEMU 8.2
Same rationale as 6facdf3 ("also exclude hppa-firmware.img ROM from
build"), not used by Proxmox VE and would cause a failure during
build.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-26 14:13:53 +02:00
Fiona Ebner
f76e07f370 makefile: adapt firmware blob removal to changes for QEMU 8.2
Namely, it's also necessary to remove .dts source files from the
meson.build file, because the .dtb file names are not directly listed
anymore since commit 6e0dc9d2a8 ("meson: compile bundled device
trees").

The same commit also introduced a "'.dtb'" in a line not just listing
a file name and removing that line would break the script. Be more
precise and require an alphanumeric character before the suffix.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-26 14:13:49 +02:00
Fiona Ebner
71dd2d48f9 Makefile: drop -j option from dpkg-buildpackage
From man dpkg-buildpackage:

> -j, --jobs[=jobs|auto]
> Specifies the number of jobs allowed to be run simultaneously (since
> dpkg 1.14.7, long option since dpkg 1.18.8). The number of jobs
> matching the number of online processors if auto is specified (since
> dpkg 1.17.10), or unlimited number if jobs is not specified. The
> default behavior is auto (since dpkg 1.18.11) in non-forced mode
> (since dpkg 1.21.10), and as such it is always safer to use with any
> package including those that are not parallel-build safe.

The option was added in the Makefile by commit 4ba321f ("build qemu
multithreaded") which states:

> same as in pve-kernel where we have --jobs=auto

But according to the man page, -j without an argument is not the same
and means unlimited. Using the number of online cores seems more
sensible and was the original intention. Again, according to the man
page, the default is auto since dpkg 1.18.11 (or Debian Stretch), so
just drop the option.

The motivation to look into this was that after the recent upstream
commit d1ce2cc95b ("Makefile: preserve --jobserver-auth argument when
calling ninja") having -j as the make flag would be broken as it was
mistakenly passed to ninja (for which the argument for -j is not
optional). Should get fixed soon [0].

[0]: https://lore.kernel.org/qemu-devel/20240412100401.20047-2-pbonzini@redhat.com/T/#u

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-04-18 22:18:05 +02:00
Thomas Lamprecht
59ab88deb6 bump version to 8.1.5-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-11 20:05:02 +02:00
Thomas Lamprecht
20209d8d73 implement support for backup fleecing
Excerpt from Fiona's v3 cover-letter [0]:

When a backup for a VM is started, QEMU will install a
"copy-before-write" filter in its block layer. This filter ensures
that upon new guest writes, old data still needed for the backup is
sent to the backup target first. The guest write blocks until this
operation is finished so guest IO to not-yet-backed-up sectors will be
limited by the speed of the backup target.

With backup fleecing, such old data is cached in a fleecing image
rather than sent directly to the backup target. This can help guest IO
performance and even prevent hangs in certain scenarios, at the cost
of requiring more storage space.

With this series it will be possible to enable backup-fleecing via
e.g. `vzdump 123 --fleecing enabled=1,storage=local-lvm` with fleecing
images created on the storage `local-lvm`. The fleecing storage should
be a fast local storage which supports thin-provisioning and discard.
If the storage supports qcow2, that is used as the fleecing image
format. If the underlying file system does not support discard, with
qcow2 and preallocation=off, at least already allocated parts of the
image can be re-used later.

Fleecing images are created by qemu-server via pve-storage and
attached to QEMU before the backup starts, and cleaned up after the
backup finished or failed. The naming schema for fleecing images is
'vm-ID-fleece-N(.FORMAT)'. The allocated images are recorded in the
guest configuration, so that even after a hard failure, clean-up can
be re-attempted. While not too bad, it's a non-trivial amount of code
and I'm not 100% sure about the cost-benefit, so sending those as RFC.

The fleecing image needs to be the exact same size as the source, but
luckily, an explicit size can be specified when attaching a raw image
to QEMU so there are no size issues when using storages that have
coarser allocation/round up. For qcow2, it seems that virtual size can
be nearly arbitrary (i.e. modulo 512 byte granularity) during
allocation.

[0]: https://lists.proxmox.com/pipermail/pve-devel/2024-April/062815.html

Originally-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-04-11 20:05:02 +02:00
Thomas Lamprecht
47bdd04244 bump version to 8.1.5-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-03-12 14:08:48 +01:00
Thomas Lamprecht
8dd76cc52d backup: factor out & clean up gathering device info into helper
Squash the two original patches [0][1] from Fiona, which got send
separate to be easier to review, into the big patch that adds the
Proxmox backup integration.

[0]: https://lists.proxmox.com/pipermail/pve-devel/2024-January/061479.html
[1]: https://lists.proxmox.com/pipermail/pve-devel/2024-January/061478.html

Originally-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-03-12 13:55:00 +01:00
Fiona Ebner
cd7676f3e6 backup: avoid bubbling up first ECANCELED error
With pvebackup_propagate_error(), the first error wins. When one job
in the transaction fails, it is expected that later jobs get the
ECANCELED error. Those are not interesting and by skipping them a more
interesting error, which is likely the actual root cause, can win.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-03-12 13:20:28 +01:00
Fiona Ebner
862b46e3e0 cleanup: squash backup dump driver change into patch introducing the driver
Makes it simpler and shorter. Still results in the same code after
applying both patches in question.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-03-12 13:19:30 +01:00
Fiona Ebner
061e9ceb36 fix patch for accepting NULL qiov when padding
All callers of the function pass an address, so dereferencing once
before checking for NULL is required. It's also necessary to update
bytes and offset nevertheless, so the request will actually be aligned
later and not trigger an assertion failure.

Seems like this was accidentally broken in 8dca018 ("udpate and rebase
to QEMU v6.0.0") and this is effectively a revert to the original
version of the patch. The qiov functions changed back then, which
might've been the reason Stefan tried to simplify the patch.

Should fix live-import for certain kinds of VMDK images.

Reported-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-03-12 13:11:21 +01:00
Thomas Lamprecht
0d4462207b bump version to 8.1.5-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-02-21 20:11:27 +01:00
Fiona Ebner
ed159bc32a add patch to fix deadlock with VirtIO block and iothread during QMP stop
Backported from commit bfa36802d1 ("virtio-blk: avoid using ioeventfd
state in irqfd conditional") because the rework/rename dataplane ->
ioeventfd didn't happen yet.

Reported in the community forum [0] and reproduced doing a backup loop
to PBS with suspend mode with fio doing heavy IO in the guest and
using an RBD storage (with krbd).

[0]: https://forum.proxmox.com/threads/141320

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-02-21 20:09:22 +01:00
Fiona Ebner
86460aef76 fix #4507: add patch to automatically increase NOFILE soft limit
In many configurations, e.g. multiple vNICs with multiple queues or
with many Ceph OSDs, the default soft limit of 1024 is not enough.
QEMU is supposed to work fine with file descriptors >= 1024 and does
not use select() on POSIX. Bump the soft limit to the allowed hard
limit to avoid issues with the aforementioned configurations.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-02-06 10:33:12 +01:00
Thomas Lamprecht
676adda3c6 bump version to 8.1.5-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-02-02 19:41:31 +01:00
Thomas Lamprecht
4ff04bdfa5 work around stuck guest IO with iothread and VirtIO block/SCSI
This essentially repeats commit 6b7c181 ("add patch to work around
stuck guest IO with iothread and VirtIO block/SCSI") with an added
fix for the SCSI event virtqueue, which requires special handling.
This is to avoid the issue [3] that made the revert 2a49e66 ("Revert
"add patch to work around stuck guest IO with iothread and VirtIO
block/SCSI"") necessary the first time around.

When using iothread, after commits
1665d9326f ("virtio-blk: implement BlockDevOps->drained_begin()")
766aa2de0f ("virtio-scsi: implement BlockDevOps->drained_begin()")
it can happen that polling gets stuck when draining. This would cause
IO in the guest to get completely stuck.

A workaround for users is stopping and resuming the vCPUs because that
would also stop and resume the dataplanes which would kick the host
notifiers.

This can happen with block jobs like backup and drive mirror as well
as with hotplug [2].

Reports in the community forum that might be about this issue[0][1]
and there is also one in the enterprise support channel.

As a workaround in the code, just re-enable notifications and kick the
virt queue after draining. Draining is already costly and rare, so no
need to worry about a performance penalty here.

Take special care to attach the SCSI event virtqueue host notifier
with the _no_poll() variant like in virtio_scsi_dataplane_start().
This avoids the issue from the first attempted fix where the iothread
would suddenly loop with 100% CPU usage whenever some guest IO came in
[3]. This is necessary because of commit 38738f7dbb ("virtio-scsi:
don't waste CPU polling the event virtqueue"). See [4] for the
relevant discussion.

[0]: https://forum.proxmox.com/threads/137286/
[1]: https://forum.proxmox.com/threads/137536/
[2]: https://issues.redhat.com/browse/RHEL-3934
[3]: https://forum.proxmox.com/threads/138140/
[4]: https://lore.kernel.org/qemu-devel/bfc7b20c-2144-46e9-acbc-e726276c5a31@proxmox.com/

Link: https://lore.kernel.org/qemu-devel/20240202153158.788922-1-hreitz@redhat.com/
Originally-by: Fiona Ebner <f.ebner@proxmox.com>
 [ TL: Update to v2 and rebased patch series handling to v8.1.5 ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-02-02 19:35:34 +01:00
Thomas Lamprecht
12b69ed9c5 bump version to 8.1.5-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2024-02-02 19:08:16 +01:00
Fiona Ebner
5e8903f875 stable fixes for corner case in i386 emulation and crash with VNC clipboard
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-02-02 19:06:29 +01:00
Fiona Ebner
4b7975e75d update submodule and patches to QEMU 8.1.5
Most notable fixes from a Proxmox VE perspective are:

* "virtio-net: correctly copy vnet header when flushing TX"
  To prevent a stack overflow that could lead to leaking parts of the
  QEMU process's memory.
* "hw/pflash: implement update buffer for block writes"
  To prevent an edge case for half-completed writes. This potentially
  affected EFI disks.
* Fixes to i386 emulation and ARM emulation.

No changes for patches were necessary (all are just automatic context
changes).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2024-02-02 19:06:29 +01:00
Fiona Ebner
f366bb97ae bump version to 8.1.2-6
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-12-15 14:26:09 +01:00
Fiona Ebner
2a49e667ba Revert "add patch to work around stuck guest IO with iothread and VirtIO block/SCSI"
This reverts commit 6b7c1815e1.

The attempted fix has been reported to cause high CPU usage after
backup [0]. Not difficult to reproduce and it's iothreads getting
stuck in a loop. Downgrading to pve-qemu-kvm=8.1.2-4 helps which was
also verified by Christian, thanks! The issue this was supposed to fix
is much rarer, so revert for now, while upstream is still working on a
proper fix.

[0]: https://forum.proxmox.com/threads/138140/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-12-15 14:16:26 +01:00
Thomas Lamprecht
c6eb05a799 bump version to 8.1.2-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-12-11 16:59:16 +01:00
Fiona Ebner
dfac4f3593 pick fix for potential deadlock with QMP resize and iothread
While the patch gives bdrv_graph_wrlock() as an example where the
issue can manifest, something similar can happen even when that is
disabled. Was able to reproduce the issue with
while true; do qm resize 115 scsi0 +4M; sleep 1; done
while running
fio --name=make-mirror-work --size=100M --direct=1 --rw=randwrite \
 --bs=4k --ioengine=psync --numjobs=5 --runtime=1200 --time_based
in the VM.

Fix picked up from:
https://lists.nongnu.org/archive/html/qemu-devel/2023-12/msg01102.html

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-12-11 16:56:50 +01:00
Fiona Ebner
6b7c1815e1 add patch to work around stuck guest IO with iothread and VirtIO block/SCSI
When using iothread, after commits
1665d9326f ("virtio-blk: implement BlockDevOps->drained_begin()")
766aa2de0f ("virtio-scsi: implement BlockDevOps->drained_begin()")
it can happen that polling gets stuck when draining. This would cause
IO in the guest to get completely stuck.

A workaround for users is stopping and resuming the vCPUs because that
would also stop and resume the dataplanes which would kick the host
notifiers.

This can happen with block jobs like backup and drive mirror as well
as with hotplug [2].

Reports in the community forum that might be about this issue[0][1]
and there is also one in the enterprise support channel.

As a workaround in the code, just re-enable notifications and kick the
virt queue after draining. Draining is already costly and rare, so no
need to worry about a performance penalty here. This was taken from
the following comment of a QEMU developer [3] (in my debugging,
I had already found re-enabling notification to work around the issue,
but also kicking the queue is more complete).

[0]: https://forum.proxmox.com/threads/137286/
[1]: https://forum.proxmox.com/threads/137536/
[2]: https://issues.redhat.com/browse/RHEL-3934
[3]: https://issues.redhat.com/browse/RHEL-3934?focusedId=23562096&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-23562096

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-12-11 16:56:50 +01:00
Thomas Lamprecht
24d732ac0f bump version to 8.1.2-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-11-22 14:28:25 +01:00
Fiona Ebner
df2cc786ee add fix for vnc clipboard
This fixes the host->guest direction with noNVC as a client (and
likely others).

Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
2023-11-22 14:19:45 +01:00
Thomas Lamprecht
38726d3473 bump version to 8.1.2-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-11-20 10:35:52 +01:00
Fiona Ebner
89b46e17ec fix #5054: backport fix for software reset with SATA
The issue prevented FreeBSD 14 VMs with SATA disk from booting.

The commit it fixes e2a5d9b3d9c3 ("hw/ide/ahci: simplify and document
PxCI handling") is part of stable 8.1.2.

The patch was already applied to the block branch upstream:
https://lists.nongnu.org/archive/html/qemu-devel/2023-11/msg02711.html

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
2023-11-20 10:35:00 +01:00
Thomas Lamprecht
33b22c3fe0 bump version to 8.1.2-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-11-17 11:55:26 +01:00
Fiona Ebner
c38e337f5d revert commit breaking VirtIO network adapters for certain versions of Windows
As reported in the community forum [0] and reproduced locally this
breaks VirtIO network adapters in (at least) the German ISO of Windows
Server 2022. The fix itself was for

> Issue is not fatal but as result acpi-index/"PCI Label ID" property
> is either not shown in device details page or shows incorrect value.

so revert and tolerate that as a stop-gap, rather than have the
devices not working at all.

[0]: https://forum.proxmox.com/threads/92094/post-605684

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-11-17 11:52:52 +01:00
Fiona Ebner
763949965f fix #4710: vma create: don't use O_DIRECT for tmpfs
The implementation of the helper is_path_tmpfs() is similar to the
existing qemu_fd_getfs() function in util/mmap-alloc.c, which
unfortunately only takes an existing fd.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-11-07 16:37:34 +01:00
Thomas Lamprecht
1807330a6f bump version to 8.1.2-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Thomas Lamprecht
a31ab74058 d/control: add python3-venv as build-dependency
Seems to be required since commit 81e2b198a8 ("configure: create a
python venv unconditionally").

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner
b39f726f31 d/control: add versioned Breaks for qemu-server <= 8.0.6
Upstream QEMU commit 4271f40383 ("virtio-net: correctly report maximum
tx_queue_size value") made setting an invalid tx_queue_size for a
non-vDPA/vhost-user net device a hard error. Now, qemu-server before
commit 089aed81 ("cfg2cmd: netdev: fix value for tx_queue_size") did
just that, so the newer QEMU version would break start-up for most VMs
(a default vNIC configuration would be affected).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner
a36bda146c add patch to avoid huge snapshot performance regression
Taking a snapshot became prohibitively slow because of the
migration_transferred_bytes() call in migration_rate_exceeded() [0].

This also applied to the async snapshot taking in Proxmox VE, so
work around the issue until it is fixed upstream.

[0]: https://gitlab.com/qemu-project/qemu/-/issues/1821

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner
03ff63aa61 add patch to disable graph locking
There are still some issues with graph locking, e.g. deadlocks during
backup canceling [0] and initial attempts to fix it didn't work [1].
Because the AioContext locks still exist, it should still be safe to
disable graph locking.

[0]: https://lists.nongnu.org/archive/html/qemu-devel/2023-09/msg00729.html
[1]: https://lists.nongnu.org/archive/html/qemu-devel/2023-09/msg06905.html

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner
10e1093325 update submodule and patches to QEMU 8.1.2
Bigger notable changes:

* Commit 1a30b0f5d7 ("block: .bdrv_open is non-coroutine and
  unlocked") broke the PVE backup patches, in particular setting up
  the backup dump block driver, because bdrv_new_open_driver() cannot
  be called from a coroutine. To fix it, bdrv_co_open() is used
  instead, and while it's a much more involved function, the result
  should be essentially the same. The only difference I noticed is
  that the BDRV_O_ALLOW_RDWR flag is also set in the resulting bds
  (block driver state), but that shouldn't hurt.

Smaller notable changes:

* aio_set_fd_handler() dropped its 'is_external' parameter stating
  that all callers now pass false in 60f782b6b7 ("aio: remove
  aio_disable_external() API"). The calls in the PVE patches also
  passed false, so just drop the parameter too.

* global_state_store() does not have a return value anymore, so the
  user in the PVE savevm-async patch was adapted. For context, see
  c33f1829f8 ("migration: never fail in global_state_store()").

* Renames affecting the PVE savevm-async patch:
  migrate_use_block() -> migrate_block() and ram_counters -> mig_stats
  9d4b1e5f22 ("migration: Move migrate_use_block() to options.c")
  aff3f6606d ("migration: Rename ram_counters to mig_stats")

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner
0d9c737d61 buildsys: use QEMU's keycodemapdb again
instead of the split-out version that was last updated for QEMU 6.0.
This reverts the relevant part of 6838f03 ("bump version to 2.11.1-1")
which doesn't state a reason why the splitting was done. If something
breaks, we can still re-do it and document the reason this time.

Alternatively, it would be necessary to adapt the paths, because
keycodemapdb lives in subprojects/ rather than ui/ since QEMU commit
c53648abba ("meson: use subproject for keycodemapdb").

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner
a6ddea7ef7 buildsys: fixup submodule target
It's not enough to initialize the submodules anymore, as some got
replaced by wrap files, see QEMU commit 2019cabfee ("meson:
subprojects: replace submodules with wrap files").

Download the subprojects during initialization of the QEMU submodule,
so building (without the automagical --enable-download) can succeeed
afterwards.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Fiona Ebner
89520c1cd0 d/rules: use disable-download option instead of git-submodules=ignore
See the following QEMU commits for reference:
0c5f3dcbb2 ("configure: add --enable-pypi and --disable-pypi")
ac4ccac740 ("configure: rename --enable-pypi to --enable-download, control subprojects too")
6f3ae23b29 ("configure: remove --with-git-submodules=") removed

The last one removed the option and the closest thing to
git-submodule=ignore is using disable-download. Which will then just
verify that the submodules are present.

Building now will require running either
* Running 'meson subprojects download' in the qemu submodule first.
* Using --enable-download, but then the submodules would be downloaded
  for each build (if not already downloaded in the submodule first)
  and it's just a bit too surprising if downloads happen during build.

The disable-download option will also disable automatic downloading of
missing Python modules from PyPI. Hopefully, it's enough to add them
as Debian build dependencies when required.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-24 15:01:23 +02:00
Thomas Lamprecht
eca4daeeed bump version to 8.0.2-7
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-10-04 08:33:39 +02:00
Fiona Ebner
816077299c fix #2874: SATA: avoid unsolicited write to sector 0 during reset
If there is a pending DMA operation during ide_bus_reset(), the fact
that the IDEstate is already reset before the operation is canceled
can be problematic. In particular, ide_dma_cb() might be called and
then use the reset IDEstate which contains the signature after the
reset. When used to construct the IO operation this leads to
ide_get_sector() returning 0 and nsector being 1. This is particularly
bad, because a write command will thus destroy the first sector which
often contains a partition table or similar.

Upstream discussion:
https://lists.nongnu.org/archive/html/qemu-devel/2023-08/msg04239.html

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-09-26 11:30:22 +02:00
Fiona Ebner
ef3308db71 vma: avoid compiler warning about incompatible pointer type
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-09-08 11:18:30 +02:00
Filip Schauer
0ff45eb23e backup: Fix spelling error in function name
Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
[FE: fixup patch context]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-09-08 11:13:04 +02:00
Thomas Lamprecht
6c5563e30b bump version to 8.0.2-6
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-09-06 17:04:04 +02:00
Fiona Ebner
9e0186f289 backup: drop broken BACKUP_FORMAT_DIR
Since upstream QEMU 8.0, it's no longer possible to call
bdrv_img_create() from a coroutine anymore, meaning a backup with the
directory format would crash the QEMU instance.

The feature is only exposed via the monitor and was intended to be
experimental. There were no user reports about the breakage and it
only was noticed during the rebase for QEMU 8.1, because other parts
of the backup code needed adaptation and I decided to check the
BACKUP_FORMAT_DIR case too.

It should not stay in a broken state of course, but avoid the
maintenance cost and just make it a removed feature for Proxmox VE 8
retroactively.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-09-06 16:59:12 +02:00
Fiona Ebner
0cffb504e7 backup: create jobs in a drained section
With the drive-backup QMP command, upstream QEMU uses a drained
section for the source drive when creating the backup job. Do the same
here to avoid subtle bugs.

There, the drained section extends until after the job is started, but
this cannot be done here for multi-disk backups (could at most start
the first job). The important thing is that the cbw
(copy-before-write) node is in place and the bcs (block-copy-state)
bitmap is initialized, which both happen during job creation (ensured
by the "block/backup: move bcs bitmap initialization to job creation"
PVE patch).

One such bug is one reported in the community forum [0], where using a
drive with iothread can lead to an overlapping block-copy request and
consequently an assertion failure. The block-copy code relies on the
bcs bitmap to determine if a request for a certain range can be
created. Each time a request is created, it resets the bcs bitmap at
that range to indicate that it's being handled.

The duplicate request can happen as follows:
Thread A attaches the cbw node
Thread B creates a request and resets the bitmap at that range
Thread A clears the bitmap and merges it with the PBS bitmap
The merging can lead to the bitmap being set again at the range of
the previous request, so the block-copy code thinks it's fine to
create a request there.
Thread B creates another requests at an overlapping range before the
other request is finished.

The drained section ensures that nothing else can interfere with the
bcs bitmap between attaching the copy-before-write block node and
initialization of the bitmap.

[0]: https://forum.proxmox.com/threads/133149/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-09-06 16:59:12 +02:00
Fiona Ebner
f7eed6caa1 regenerate patch stats
Apparently wasn't correct in 0cff91a ("fix #1534: vma: Add extract
filter for disk images").

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-09-06 16:59:12 +02:00
Filip Schauer
0cff91a000 fix #1534: vma: Add extract filter for disk images
Add a filter to the "vma extract" command. A comma seperated list of
disk images that should be extracted can be passed with the "-d" option.

Example to extract an IDE drive and an SCSI drive from vzdump.vma:

vma extract vzdump.vma -d "drive-ide0,drive-scsi0" extractdir

Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
2023-08-30 10:40:51 +02:00
Fiona Ebner
6cadf3677d bump version to 8.0.2-5
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-08-16 11:56:49 +02:00
Fiona Ebner
5f9cb29c3a backup: trim heap after finishing
Reported in the community forum [0]. By default, there can be large
amounts of memory left assigned to the QEMU process after backup.
Likely because of fragmentation, it's necessary to explicitly call
malloc_trim() to tell glibc that it shouldn't keep all that memory
resident for the process.

QEMU itself already does a malloc_trim() in the RCU thread, but that
code path might not be reached (or not for a long time) under usual
operation. The value of 4 MiB for the argument was also copied from
there.

Example with the following configuration:
> agent: 1
> boot: order=scsi0
> cores: 4
> cpu: x86-64-v2-AES
> ide2: none,media=cdrom
> memory: 1024
> name: backup-mem
> net0: virtio=DA:58:18:26:59:9F,bridge=vmbr0,firewall=1
> numa: 0
> ostype: l26
> scsi0: rbd:base-107-disk-0/vm-106-disk-1,size=4302M
> scsihw: virtio-scsi-pci
> smbios1: uuid=b2d4511e-8d01-44f1-afd6-9581b30c24a6
> sockets: 2
> startup: order=2
> virtio0: lvmthin:vm-106-disk-1,iothread=1,size=1G
> virtio1: lvmthin:vm-106-disk-2,iothread=1,size=1G
> virtio2: lvmthin:vm-106-disk-3,iothread=1,size=1G
> vmgenid: 0a1d8751-5e02-449d-977e-c0160e900231

Before the change:

> root@pve8a1 ~ # grep VmRSS /proc/$(cat /var/run/qemu-server/106.pid)/status
> VmRSS:	  370948 kB
> root@pve8a1 ~ # vzdump 106 --storage pbs
> (...)
> INFO: Backup job finished successfully
> root@pve8a1 ~ # grep VmRSS /proc/$(cat /var/run/qemu-server/106.pid)/status
> VmRSS:	 2114964 kB

After the change:

> root@pve8a1 ~ # grep VmRSS /proc/$(cat /var/run/qemu-server/106.pid)/status
> VmRSS:	  398788 kB
> root@pve8a1 ~ # vzdump 106 --storage pbs
> (...)
> INFO: Backup job finished successfully
> root@pve8a1 ~ # grep VmRSS /proc/$(cat /var/run/qemu-server/106.pid)/status
> VmRSS:	  424356 kB

[0]: https://forum.proxmox.com/threads/131339/

Co-diagnosed-by: Friedrich Weber <f.weber@proxmox.com>
Co-diagnosed-by: Dominik Csapak <d.csapak@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2023-08-16 11:50:12 +02:00
Fiona Ebner
c36e3f9d17 refresh patch context
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2023-08-16 11:50:08 +02:00
Filip Schauer
b8b4ce0480 Add format attributes to function candidates
Add format attributes to functions that take printf-like arguments. This
provides additional compile-time checking that the correct parameters
are passed to the functions.

This fixes compiler warnings generated by the -Wsuggest-attribute=format
flag.

Signed-off-by: Filip Schauer <f.schauer@proxmox.com>
2023-08-08 09:08:48 +02:00
Fiona Ebner
df47146afe add patch fixing fd leak for vhost
Each pause+resume operation (which is also done as part of taking a VM
snapshot) would increase the number of open file descriptors by the
number of vhost devices (e.g. network devices by default). This could
lead to crashes during backup and surely other issues once the system
limit (default 1024) was reached [0].

[0]: https://forum.proxmox.com/threads/131603/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-08-03 17:40:13 +02:00
Fabian Grünbichler
d9cbfafeeb bump version to 8.0.2-4
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2023-07-28 12:59:10 +02:00
Fiona Ebner
5919ec1446 add patch fixing resume for snapshot and hibernate with drive with iothread and a dirty bitmap
Not difficult to run into, just have a drive with iothread, take a PBS
backup and then take a snapshot or hibernate. Resuming will fail with
> qemu: qemu_mutex_unlock_impl: Operation not permitted
because of not acquiring the correct AioContext first.

Migration is not affected, because it runs in coroutine context.

Reported in the community forum:
https://forum.proxmox.com/threads/129899/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-07-28 12:00:50 +02:00
Thomas Lamprecht
409db0cd7b bump version to 8.0.2-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-15 13:59:12 +02:00
Fiona Ebner
ea7662074d fix checks for drive mirror with bitmap
The QAPI change for QEMU 8.0 dropped redundant has_foo parameters, but
in the blockdev_mirror_common() function (which is not part of the
QAPI itself but called from there) the argument pair was has_bitmap
and bitmap_name rather than has_bitmap and bitmap.

Reported-by: Aaron Lauterer <a.lauterer@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-15 13:55:22 +02:00
Fiona Ebner
d847446186 regenerate patches
There's still some context changes not covered by earlier series. No
functional change intended.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-15 13:55:22 +02:00
Thomas Lamprecht
3aaa855e5c bump version to 8.0.2-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-09 07:58:59 +02:00
Fiona Ebner
99f9ce2cd2 drop deprecated custom drive snapshot QMP commands
They are not required anymore since qemu-server >= 5.0-36.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-07 19:35:53 +02:00
Fiona Ebner
a816d2969e drop patch for custom get_link_status QMP command
There doesn't seem to be any Proxmox VE code using this.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-07 19:35:40 +02:00
Thomas Lamprecht
0e9a7bfda2 bump version to 8.0.2-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-06-06 16:35:20 +02:00
Fiona Ebner
a39364b9d1 update reentrancy patches to version in upstream git
The previous version was picked from the mailing list and still had
an object_dynamic_cast call in a hot path, which is avoided with the
version that landed in git.

Also adds a few more exceptions for devices that need reentrancy.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-06 16:32:38 +02:00
Fiona Ebner
0f693c2cab update submodule and patches to QEMU 8.0.2
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-06-06 16:32:38 +02:00
Thomas Lamprecht
88b1550dfb buildsys: remove edk2 source tree when assembling build-dir
we ship it via pve-edk2-firmware anyway and it only results in bigger
source tar balls and lintian yelling at us due to edk2 not being the
simplest repo to ensure DFSG compat.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-24 10:37:10 +02:00
Thomas Lamprecht
bd3c1fa525 bump version to 8.0.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-23 14:09:12 +02:00
Thomas Lamprecht
de2dde2da9 buildsys: avoid handling noopt locally, rather extend CFLAGS
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-23 14:09:12 +02:00
Thomas Lamprecht
04e0262e2e d/rules: add identation for configure switches for readability
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:23:55 +02:00
Thomas Lamprecht
d3c2ae9683 d/control: drop obsolete build dependencies
drop autotools-dev, texi2html and texinfo build dependencies, they
are not used and have no effect

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:11:33 +02:00
Thomas Lamprecht
d0603efa38 buildsys: auto-generate dbgsym package
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:09:14 +02:00
Fiona Ebner
db5d2a4b77 squash related patches
where there is no good reason to keep them separate. It's a pain
during rebase if there are multiple patches changing the same code
over and over again. This was especially bad for the backup-related
patches. If the history of patches really is needed, it can be
extracted via git. Additionally, compilation with partial application
of patches was broken since a long time, because one of the master key
changes became part of an earlier patch during a past rebase.

If only the same files were changed by a subsequent patch and the
changes felt to belong together (obvious for later bug fixes, but also
done for features e.g. adding master key support for PBS), the patches
were squashed together.

The PBS namespace support patch was split into the individual parts
it changes, i.e. PBS block driver, pbs-restore binary and QMP backup
infrastructure, and squashed into the respective patches.

No code change is intended, git diff in the submodule should not show
any difference between applying all patches before this commit and
applying all patches after this commit.

The query-proxmox-support QMP function has been left as part of the
"PVE-Backup: Proxmox backup patches for QEMU" patch, because it's
currently only used there. If it ever is used elsewhere too, it can
be split out from there.

The recent alloc-track and BQL-related savevm-async changes have been
left separate for now, because it's not 100% clear they are the best
approach yet. This depends on what upstream decides about the BQL
stuff and whether and what kind of issues with the changes pop up.

The qemu-img dd snapshot patch has been re-ordered to after the other
qemu-img dd patches.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-22 15:09:14 +02:00
Fiona Ebner
b64c4dec1c PVE backup: don't call no_co_wrapper function from coroutine
Namely, pvebackup_co_prepare() needs to call bdrv_co_open() rather
than bdrv_open(), because it is a coroutine itself.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-22 15:09:14 +02:00
Fiona Ebner
53b56ca781 add stable patches for 8.0.0
Changes to other patches are all just metadata/context changes except
for pvebackup_co_prepare() needing to call bdrv_co_unref() rather than
bdrv_unref(), because it is a coroutine itself. This is documented in
d6ee2e324e ("block-coroutine-wrapper: Introduce no_co_wrapper"). The
change is necessary, because one of the stable fixes converts
bdrv_unref and blk_unref into no_co_wrappers (in preparation for a
second patch to fix a hang with the block resize QMP command).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-22 15:09:14 +02:00
Fiona Ebner
bf251437e9 update submodule and patches to QEMU 8.0.0
Many changes were necessary this time around:

* QAPI was changed to avoid redundant has_* variables, see commit
44ea9d9be3 ("qapi: Start to elide redundant has_FOO in generated C")
for details. This affected many QMP commands added by Proxmox too.

* Pending querying for migration got split into two functions, one to
estimate, one for exact value, see commit c8df4a7aef ("migration:
Split save_live_pending() into state_pending_*") for details. Relevant
for savevm-async and PBS dirty bitmap.

* Some block (driver) functions got converted to coroutines, so the
Proxmox block drivers needed to be adapted.

* Alloc track auto-detaching during PBS live restore got broken by
AioContext-related changes resulting in a deadlock. The current, hacky
method was replaced by a simpler one. Stefan apparently ran into a
problem with that when he wrote the driver, but there were
improvements in the stream job code since then and I didn't manage to
reproduce the issue. It's a separate patch "alloc-track: fix deadlock
during drop" for now, you can find the details there.

* Async snapshot-related changes:
  - The pending querying got adapted to the above-mentioned split and
  a patch is added to optimize it/make it more similar to what
  upstream code does.
  - Added initialization of the compression counters (for
    future-proofing).
  - It's necessary the hold the BQL (big QEMU lock = iothread mutex)
  during the setup phase, because block layer functions are used there
  and not doing so leads to racy, hard-to-debug crashes or hangs. It's
  necessary to change some upstream code too for this, a version of
  the patch "migration: for snapshots, hold the BQL during setup
  callbacks" is intended to be upstreamed.
  - Need to take the bdrv graph read lock before flushing.

* hmp_info_balloon was moved to a different file.

* Needed to include a new headers from time to time to still get the
correct functions.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-22 15:09:14 +02:00
Fiona Ebner
fb818ea5b9 d/rules: drop virtiofsd switch
virtiofsd is no longer part of QEMU 8.0. It got replaced by a separate
implementation written in Rust, which will be its own package.

See QEMU commit 0aaf44776e ("Merge tag 'pull-virtiofs-20230216b' of
https://gitlab.com/dagrh/qemu into staging").

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-05-22 15:09:14 +02:00
Thomas Lamprecht
3c995a426d makefile: convert to use simple parenthesis
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:09:14 +02:00
Thomas Lamprecht
be7ce325c7 d/lintian-overrides: ignore groff line breakage/adjustment warnings
not much we can do here anyway..

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:09:14 +02:00
Thomas Lamprecht
19b4b4c50f d/lintian-overrides: sort
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:09:14 +02:00
Thomas Lamprecht
590adba81a d/parse-machines: produce stable json output
Enabling the "canonical" option the keys will be sorted, improving
build reproducibility.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 15:09:14 +02:00
Fiona Ebner
abb04bb627 d/control: define compat level via build-depends and raise to 13
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 13:29:59 +02:00
Thomas Lamprecht
6facdf3a08 also exclude hppa-firmware.img ROM from build
We don't use it and with debhelper compat level >= 11, the switch
from detecting files for strip through patters to checking for an ELF
header caused a build failure with the hppa-firmware.img ROM, as some
tools cannot cope with HP PARISC files.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 13:29:59 +02:00
Thomas Lamprecht
cb2b3190a4 move cleanup of unused ROMs from d/rules to build-dir generation
this way we save a bit of space and should make build also slightly
faster, otherwise nothing should change.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 13:29:59 +02:00
Thomas Lamprecht
2e416ad9d5 d/rules: fix debian-rules-missing-required-target
until we switch fully over to the dh sequencer

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 13:29:59 +02:00
Thomas Lamprecht
d80ca49db8 d/rules: cleanup cruft and use dpkg makefile fragements
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 13:29:59 +02:00
Thomas Lamprecht
d65b507d3f buildsys: update lintian overrides
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 13:29:59 +02:00
Thomas Lamprecht
98fd8612cb add .gitignore file
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 12:05:14 +02:00
Thomas Lamprecht
4f56d29218 buildsys: use shorter variable name $@ in $(BUILDIR) target
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 12:05:14 +02:00
Thomas Lamprecht
cd148033f3 buildsys: only run lintian for phony dsc target
This allows the sbuild to start much faster (lintian takes ~ minutes
for such big packages), and that without loss as sbuild will run
lintian on both binary and source package anyway.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 12:05:14 +02:00
Thomas Lamprecht
92c6d84f6a d/control: avoid versioned build-dependcies with a -1 revision
no effect besides making it harder to build this for an eventual
backport.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 12:05:14 +02:00
Thomas Lamprecht
b8af8dd4fa debian: normalize packaging files with wrap-and-sort -tkn
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-22 12:05:13 +02:00
Fiona Ebner
6eb3e31968 d/rules: fix comment about when clean target is executed
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:51:16 +02:00
Fiona Ebner
c913853be7 d/rules: move copying config.guess and config.sub to config.status target
It causes problems when done as part of the clean target when building
the dsc with the following error due to the additional files:
dpkg-source: error: aborting due to unexpected upstream changes

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:51:16 +02:00
Fiona Ebner
4fc4b533b5 buildsys: fix lintian overrides
See https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=1007002 for more
information.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:51:16 +02:00
Fiona Ebner
023b916380 d/rules: set job flag for make based on DEB_BUILD_OPTIONS
Copied from Debian's QEMU package's d/rules. Otherwise, ninja will end
up using only a single job (in Debian Bookworm/Proxmox VE 8).

Suggested-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:51:16 +02:00
Fiona Ebner
19a11f24a5 buildsys: expand clean target
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
 [ T: remove all tarballs for a package and any .deb ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:50:56 +02:00
Fiona Ebner
030fa1db4b buildsys: create build directory atomically
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:50:56 +02:00
Fiona Ebner
2d17b4b4d9 buildsys: add sbuild convenience target
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:50:56 +02:00
Fiona Ebner
280d157f1c buildsys: add dsc target
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:50:56 +02:00
Fiona Ebner
f6be0ca51a buildsys: derive upload dist automatically
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-05-21 15:50:56 +02:00
Thomas Lamprecht
93d558c1ee bump version to 7.2.0-8
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-17 15:48:12 +01:00
Fiona Ebner
e752bbe5e2 cherry-pick TCG-related stable fixes for 7.2
When turning off the "KVM hardware virtualization" checkbox in Proxmox
VE, the TCG accelerator is used, so these fixes are relevant then.

The first patch is included to allow cherry-picking the others without
changes.

Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-03-17 15:46:20 +01:00
Thomas Lamprecht
018ef788b3 bump version to 7.2.0-8
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-17 12:12:02 +01:00
Fiona Ebner
72fc94c0c6 add patch fixing ACPI CPU hotplug issue with TCG
Required for the debian/edk2-vars-generator.py script in the
pve-edk2-firmware repository when building the edk2-stable202302
release. Without this patch, the QEMU process spawned by the script
would hang indefinietly.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-03-17 12:06:22 +01:00
Thomas Lamprecht
09186f4b6e bump version to 7.2.0-7
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-13 17:42:52 +01:00
Fiona Ebner
ffda59f626 add patches to fix regression with LSI SCSI controller
The patch 0008-memory-prevent-dma-reentracy-issues.patch introduced a
regression for the LSI SCSI controller leading to boot failures [0],
because, in its current form, it relies on reentrancy for a particular
ram_io region.

[0]: https://forum.proxmox.com/threads/123843

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-03-13 17:36:22 +01:00
Fiona Ebner
3c4f941ac7 add more stable fixes
The patches were selected from the recent "Patch Round-up for stable
7.2.1" [0]. Those that should be relevant for our supported use-cases
(and the upcoming nvme use-case) were picked. Most of the patches
added now have not been submitted to qemu-stable before.

The follow-up for the virtio-rng-pci migration fix will break
migration between versions with the fix and without the fix when a
virtio-pci-rng(-non)-transitional device is used. Luckily Proxmox VE
only uses the virtio-pci-rng device, and this was fixed by
0006-virtio-rng-pci-fix-migration-compat-for-vectors.patch which was
applied before any public version of Proxmox VE's QEMU 7.2 package was
released.

[0]: https://lists.nongnu.org/archive/html/qemu-stable/2023-03/msg00010.html
[1]: https://bugzilla.redhat.com/show_bug.cgi?id=2162569

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-03-13 17:36:19 +01:00
Fiona Ebner
3a94e1a186 fixup patch "ide: avoid potential deadlock when draining during trim"
The patch was incomplete and (re-)introduced an issue with a potential
failing assertion upon cancelation of the DMA request.

There is a patch on qemu-devel now[0], and it's the same as this one
code-wise (except for comments). But the discussion is still ongoing.
While there shouldn't be a real issue with the patch, there might be
better approaches. The plan is to use this as a stop-gap for now and
pick up the proper solution once it's ready.

[0]: https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg03325.html

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-03-13 17:36:19 +01:00
Thomas Lamprecht
67cae45f41 bump version to 7.2.0-6
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-08 14:32:22 +01:00
Fiona Ebner
58659169de add patch to avoid potential deadlock with trim for IDE/SATA and draining
In particular, the deadlock can occur, together with unlucky timing
between the QEMU threads, when the guest is issuing trim requests
during the start of a backup operation.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
 [ T: resolve trivial merge conflict in series file ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-03-08 14:22:36 +01:00
Fiona Ebner
10691e04e9 add patch fixing Linux boot failures with megasas SCSI
A regression in 7.2 and easily reproduced.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-03-07 19:50:12 +01:00
Thomas Lamprecht
09723b9298 bump version to 7.2.0-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-02-21 13:50:08 +01:00
Fiona Ebner
00e2507aac add fix for iscsi double free issue leading to crashes
Reported here[0] and here[1].

[0]: https://gitlab.com/qemu-project/qemu/-/issues/1378
[1]: https://forum.proxmox.com/threads/122776/

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-21 13:49:19 +01:00
Fiona Ebner
e7e5f63573 add patch fixing DMA reentrancy issues
that could lead to use-after-frees and stack overflows with a
malicious (or buggy) guest. See [0] for a good summary:

[0]: https://lore.kernel.org/qemu-devel/CAFEAcA_23vc7hE3iaM-JVA6W38LK4hJoWae5KcknhPRD5fPBZA@mail.gmail.com

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-21 10:18:35 +01:00
Fiona Ebner
1688b43738 QMP backup: use correct errno when getting blockdrive length fails
di->size would only be set later. The errno is minus the return value
from the function.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-21 09:19:16 +01:00
Fiona Ebner
eee064d954 savevm-async: keep more free space when entering final stage
In qemu-server, we already allocate 2 * $mem_size + 500 MiB for driver
state (which was 32 MiB long ago according to git history). It seems
likely that the 30 MiB cutoff in the savevm-async implementation was
chosen based on that.

In bug #4476 [0], another issue caused the iteration to not make any
progress and the state file filled up all the way to the 30 MiB +
pending_size cutoff. Since the guest is not stopped immediately after
the check, it can still dirty some RAM and the current cutoff is not
enough for a reproducer VM (was done while bug #4476 still was not
fixed), dirtying memory with
> stress-ng -B 2 --bigheap-growth 64.0M'
After entering the final stage, savevm actually filled up the state
file completely, leading to an I/O error. It's probably the same
scenario as reported in the bug report, the error message was fixed in
commit a020815 ("savevm-async: fix function name in error message")
after the bug report.

If not for the bug, the cutoff will only be reached by a VM that's
dirtying RAM faster than can be written to the storage, so increase
the cutoff to 100 MiB to have a bigger chance to finish successfully,
while still trying to not increase downtime too much for
non-hibernation snapshots.

[0]: https://bugzilla.proxmox.com/show_bug.cgi?id=4476

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-21 08:39:08 +01:00
Fiona Ebner
8051a24b5f fix #4476: savevm-async: avoid looping without progress
when pend_postcopy is large. By definition, pend_postcopy won't
decrease when iterating, so a value larger than the cutoff of 400000
would lead to essentially empty iterations, filling up the state file
until only 30 MiB + pending_size remain and the second half of the
check would trigger.

Avoid this, by not considering pend_postcopy for the cutoff to enter
the final phase.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-21 08:39:08 +01:00
Fiona Ebner
ade9f50160 d/rules: add note explaining why using noopt doesn't currenlty work
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-14 10:04:21 +01:00
Fiona Ebner
0fde60fd10 d/rules: add missing export for CFLAGS
Otherwise, they don't affect the build of QEMU at all.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-02-14 10:04:21 +01:00
Thomas Lamprecht
d82c5eb632 bump version to 7.2.0-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-01-27 09:37:53 +01:00
Fiona Ebner
d5f6ef56f0 add patch to fix issue with VirtIO disk using detect-zeroes=unmap
Affects Proxmox VE, when the discard disk setting is used for a
VirtIO disk.

Upstream bug report:
https://gitlab.com/qemu-project/qemu/-/issues/1404

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-01-27 09:36:41 +01:00
Fabian Grünbichler
658cba46ee d/control: also conflict with "qemu-system-data"
it ships files also shipped by our qemu package, switching from Debian qemu to
ours doesn't work without manual intervention otherwise..

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2023-01-26 10:55:37 +01:00
Fiona Ebner
a02081501a savevm-async: fix function name in error message
which also makes it distinguishable from the other
"qemu_savevm_state_iterate error" message.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-01-24 17:08:54 +01:00
Thomas Lamprecht
baf4e3132d bump version to 7.2.0-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-01-12 13:13:23 +01:00
Fiona Ebner
48c307550a add regression fix for migration with virtio-rng device
between QEMU less than 7.2 and QEMU 7.2 without the fix (both
directions are affected).

As mentioned in the patch message, this fix itself will break
migration between QEMU 7.2 and QEMU 7.2 with the fix (in both
directions, if a virtio-rng device is attached), but this is fine,
because no pve-qemu-kvm package with QEMU 7.2 has been publicly
released yet.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-01-12 13:10:19 +01:00
Thomas Lamprecht
89fdfe8975 bump version to 7.2.0-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2023-01-10 15:47:52 +01:00
Fiona Ebner
f64132208a cherry-pick stable fixes for 7.2
Two for virtio-mem and one for vIOMMU. Both features are not yet
exposed in PVE's qemu-server, but planned to be added.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-01-10 15:42:28 +01:00
Fiona Ebner
271ac0a8a7 add QAPI naming exceptions in patches introducing them
Avoids a patch and is required to compile when not all patches are
applied. No functional change is intended.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2023-01-10 15:42:16 +01:00
Fiona Ebner
f4ed54ec37 d/control: drop outdated jemalloc dependencies
Commit 3d785ea ("disable jemalloc") disabled jemalloc support, so
these are not needed anymore.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-12-21 13:52:16 +01:00
Fiona Ebner
2277182712 d/control: add libslirp-dev as a build dependency
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-12-21 13:52:16 +01:00
Fiona Ebner
0906461df0 d/rules: enable slirp again
Commit d03e1b3 ("update submodule and patches to 7.2.0") argued that
slirp is not explicitly supported in PVE, but that is not true. In
qemu-server, user networking is supported (via CLI/API) when no bridge
is set on a virtual NIC. So slirp needs to stay to keep such NICs
working.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-12-21 13:52:16 +01:00
Wolfgang Bumiller
29bee92c59 bump version to 7.2.0-1
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-12-16 13:23:29 +01:00
Fiona Ebner
82640bb859 d/rules: explicitly disable building slirp
Otherwise, it depends on whether libslirp-devel is installed or not.
See the previous commit message for more context.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-12-16 11:47:25 +01:00
Fiona Ebner
d03e1b3ce3 update submodule and patches to 7.2.0
User-facing breaking change:

The slirp submodule for user networking got removed. It would be
necessary to add the --enable-slirp option to the build and/or install
the appropriate library to continue building it. Since PVE is not
explicitly supporting it, it would require additionally installing the
libslirp0 package on all installations and there is *very* little
mention on the community forum when searching for "slirp" or
"netdev user", the plan is to only enable it again if there is some
real demand for it.

Notable changes:

* The big change for this release is the rework of job locking, using
  a job mutex and introducing _locked() variants of job API functions
  moving away from call-side AioContext locking. See (in the qemu
  submodule) commit 6f592e5aca ("job.c: enable job lock/unlock and
  remove Aiocontext locks") and previous commits for context.

  Changes required for the backup patches:
  * Use WITH_JOB_LOCK_GUARD() and call the _locked() variant of job
    API functions where appropriate (many are only availalbe as
    a _locked() variant).
  * Remove acquiring/releasing AioContext around functions taking the
    job mutex lock internally.

  The patch introducing sequential transaction support for jobs needs
  to temporarily unlock the job mutex to call job_start() when
  starting the next job in the transaction.

* The zeroinit block driver now marks its child as primary.

  The documentation in include/block/block-common.h states:
  > Filter node has exactly one FILTERED|PRIMARY child, and may have
  > other children which must not have these bits

  Without this, an assert will trigger when copying to a zeroinit target
  with qemu-img convert, because bdrv_child_cb_attach() expects any
  non-PRIMARY child to be not FILTERED:
  > qemu-img convert -n -p -f raw -O raw input.raw zeroinit:output.raw
  > qemu-img: ../block.c:1476: bdrv_child_cb_attach: Assertion
  > `!(child->role & BDRV_CHILD_FILTERED)' failed.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-12-16 11:47:20 +01:00
Thomas Lamprecht
55e33a045e bump version to 7.1.0-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-22 09:21:10 +01:00
Thomas Lamprecht
8a38e1da9e cherry-pick "block/block-backend: blk_set_enable_write_cache is IO_CODE"
albeit I was short from disarming that GLOBAL_STATE_CODE assert
completely, as its just bogus to assert that on runtime for a lot of
call sites, rather it should be verified on compilation (function
coloring with attributes and maybe a compiler plugin).

But, as this is already solved upstream lets take in that patch.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-11-22 09:19:00 +01:00
Thomas Lamprecht
3b3d5516ee bump version to 7.1.0-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-10-28 10:27:54 +02:00
Thomas Lamprecht
509409fb64 init: daemonize: defuse PID file resolve error to warning
fixes file restore, where we actively unlink the PID file of the
transient VM ourself after opening it - while we use it only for
tracking when the QEMU process itself has finished start up, it's
easier and cleaner to fix this regression now, than to rework that to
something that doesn't depends on the PID file at all.

Applying Fiona's patch as patch-patch tracked under extra, as I
expect that something similar to this gets accepted upstreamed.

Link: https://lists.proxmox.com/pipermail/pve-devel/2022-October/054448.html
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-10-28 10:22:26 +02:00
Wolfgang Bumiller
bf03cd367f bump version to 7.1.0-2
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-10-18 15:35:09 +02:00
Fiona Ebner
0af826b448 savevm async IO channel: channel writev: fix return value in error case
The documentation in include/io/channel.h states that -1 or
QIO_CHANNEL_ERR_BLOCK should be returned upon error. Simply passing
along the return value from the blk-functions has the potential to
confuse the call sides. Non-blocking mode is not implemented
currently, so -1 it is.

The "return ret" was mistakenly left over from the previous
QEMUFileOps based implementation. Also, use error_setg_errno(), since
the blk(_co)_p{readv,writev} functions return errno codes.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-18 15:32:13 +02:00
Wolfgang Bumiller
ed23707ed7 bump version to 7.1.0-1
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-10-14 14:55:53 +02:00
Fiona Ebner
4e1935c2c9 {alloc track, pbs} block driver: bdrv_co_preadv: adapt return values
to be in-line with what other implementations in QEMU do. Commit
1d39c7098bbfa6862cb96066c4f8f6735ea397c5 mentions the EIO bit and
the function is expected to return 0 upon success (see other
implementations).

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 14:52:36 +02:00
Fiona Ebner
a262e9642b savevm async: cleaner initialization of target_close_wait member
Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 14:52:34 +02:00
Fiona Ebner
73912aee39 cherry-pick upstream fixes for 7.1.0
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 14:52:32 +02:00
Fiona Ebner
5b15e2ecaf update submodule and patches to 7.1.0
Notable changes:
* The only big change is the switch to using a custom QIOChannel for
  savevm-async, because the previously used QEMUFileOps was dropped.

  Changes to the current implementation:

  * Switch to vector based methods as required for an IO channel. For
    short reads the passed-in IO vector is stuffed with zeroes at the
    end, just to be sure.

  * For reading: The documentation in include/io/channel.h states that
    at least one byte should be read, so also error out when whe are
    at the very end instead of returning 0.

  * For reading: Fix off-by-one error when request goes beyond end.

    The wrong code piece was:
    if ((pos + size) > maxlen) {
        size = maxlen - pos - 1;
    }

    Previously, the last byte would not be read. It's actually
    possible to get a snapshot .raw file that has content all the way
    up the final 512 byte (= BDRV_SECTOR_SIZE) boundary without any
    trailing zero bytes (I wrote a script to do it).

    Luckily, it didn't cause a real issue, because qemu_loadvm_state()
    is not interested in the final (i.e. QEMU_VM_VMDESCRIPTION)
    section. The buffer for reading it is simply freed up afterwards
    and the function will assume that it read the whole section, even
    if that's not the case.

  * For writing: Make use of the generated blk_pwritev() wrapper
    instead of manually wrapping the coroutine to simplify and save a
    few lines.

* Adapt to changed interfaces for blk_{pread,pwrite}:
  * a9262f551e ("block: Change blk_{pread,pwrite}() param order")
  * 3b35d4542c ("block: Add a 'flags' param to blk_pread()")
  * bf5b16fa40 ("block: Make blk_{pread,pwrite}() return 0 on success")
  Those changes especially affected the qemu-img dd patches, because
  the context also changed, but also some of our block drivers used
  the functions.

* Drop qemu-common.h include: it got renamed after essentially
  everything was moved to other headers. The only remaining user I
  could find for things dropped from the header between 7.0 and 7.1
  was qemu_get_vm_name() in the iscsi-initiatorname patch, but it
  already includes the header to which the function was moved.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-14 14:52:29 +02:00
Wolfgang Bumiller
2775b2e378 bump version to 7.0.0-4
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-10-10 11:56:27 +02:00
Wolfgang Bumiller
ed01236593 add patch: PVE Backup: allow passing max-workers performance setting
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-10-10 11:55:15 +02:00
Fiona Ebner
2b259b70ec d/rules: add revision to package version
This version string can be queried with $BINARY --version as well as
the query-version QMP command.

Useful for qemu-server to be able to report the running QEMU version
exactly. Could also be used to version guard against features as an
alternative to the query-proxmox-support QMP command.

Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-10-10 11:26:47 +02:00
Thomas Lamprecht
a186335be5 bump version to 7.0.0-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-08-30 12:54:12 +02:00
Fiona Ebner
1976ca4607 savevm-async: set SAVE_STATE_DONE when closing state file was successful
Without this change, it's necessary to send a second savevm-end QMP
command after aborting a snaphsot, before a new savevm-start QMP
command can succeed.

In process_savevm_finalize(), no longer set an error in the abort
scenario. If there already is another error, there's no need to
override it. If canceling was done intentionally, qmp_savevm_end()
is responsible for setting the state now.

Reported-by: Mira Limbeck <m.limbeck@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-08-19 09:44:16 +02:00
Fiona Ebner
563c592898 savevm-async: avoid segfault when aborting snapshot
Reported in the community forum[0].

For 6.1.0, there were a few changes to the coroutine-sleep API, but
the adaptations in f376b2b ("update and rebase to QEMU v6.1.0") made
a mistake.

Currently, target_close_wait is NULL when passed to
qemu_co_sleep_ns_wakeable(), which further passes it to
qemu_co_sleep(), but there, it is dereferenced when trying to access
the 'to_wake' member:

> Thread 1 "kvm" received signal SIGSEGV, Segmentation fault.
> qemu_co_sleep (w=0x0) at ../util/qemu-coroutine-sleep.c:57

To fix it, create a proper struct and pass its address instead. Also
call qemu_co_sleep_wake unconditionally, because the NULL check (for
the 'to_wake' member) is done inside the function itself.

This patch is based on what the QEMU commits introducing the changes
to the coroutine-sleep API did to the callers in QEMU:
eaee072085 ("coroutine-sleep: allow qemu_co_sleep_wake that wakes nothing")
29a6ea24eb ("coroutine-sleep: replace QemuCoSleepState pointer with struct in the API")

[0]: https://forum.proxmox.com/threads/112130/

Tested-by: Mira Limbeck <m.limbeck@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
2022-08-19 09:44:14 +02:00
Thomas Lamprecht
1de53d8a45 bump version to 7.0.0-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-07-20 09:17:13 +02:00
Fabian Ebner
0e88ec19db add two more stable patches
For the io_uring patch, it's not very clear which configurations can
trigger it, but it should be rather uncommon. See qemu commit
be6a166fde652589761cf70471bcde623e9bd72a for a bit more information.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-07-19 17:22:10 +02:00
Wolfgang Bumiller
9ee866b2e9 bump version to 7.0.0-1
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-06-30 11:08:36 +02:00
Fabian Ebner
14ed554660 cherry-pick upstream fixes for 7.0.0
coming in via qemu-stable (except for the vdmk fix, which was tagged
for-7.0 on the qemu-devel list, but didn't make it into the release).

Also took the chance to switch the gluster fix to the version that
made it into upstream.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-06-29 12:29:30 +02:00
Fabian Ebner
eba403aafc d/rules: adapt to changed opensbi riscv filenames in 7.0.0
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-06-29 12:29:28 +02:00
Fabian Ebner
b2685aee04 d/rules: drop outdated configure flags
See QEMU commits 9e8be4c546ce8469ca9702715bf8f198d604b685 and
a5730b8bd3675f484ed0eacea052452048eeb35d for more information.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-06-29 12:29:25 +02:00
Fabian Ebner
dc9827a6a4 update submodule and patches to 7.0.0
Only very minor changes needed:
* Most patches in extra (or some version of them) are part of 7.0.0.
* aio_set_fd_handler got an extra parameter, but can just pass NULL
  like we did for the related 'poll' parameter. See QEMU commit
  826cc32423db2a99d184dbf4f507c737d7e7a4ae for more.
* Add include for qemu/memalign.h in vma.c and vma-writer.c.
* Add reverts for fixups of already reverted 0347a8fd4c ("block/rbd:
  implement bdrv_co_block_status") that came in with 7.0.0. Those
  fixups are not enough, see Proxmox bugzilla #4047.
* Two trivial context changes for bitmap-mirror patches.
* block_int.h got split up into multiple headers.
* Some context changes in configure and meson.build.
* Used the oppurtunity to squash fixup of bdrv_backuo_dump_create typo
  in a later patch into the patch introducing the function (had to
  move code to new header during rebase).

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-06-29 12:29:21 +02:00
Thomas Lamprecht
4e4b9ab13f bump version to 6.2.0-11
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-06-22 15:54:58 +02:00
Thomas Lamprecht
39e84ba82d vma/alloc-track improvements
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-06-22 15:52:16 +02:00
Thomas Lamprecht
4fd0fa7fb3 re-export patches in normalized form
iow. using:

git format-patch --zero-commit --no-signature --no-numbered --diff-algorithm=myers ...

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-06-22 15:49:53 +02:00
Dominik Csapak
539e333eaa add 'namespace' to BlockdevOptionsPbs
so that we can use it for the -blockdev options (used for live-restore)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2022-06-22 15:10:49 +02:00
Fabian Grünbichler
68569ea2ff bump version to 6.2.0-10
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2022-06-09 16:35:57 +02:00
Fabian Grünbichler
41aedfb6db add d/source/include-binaries
to shutup dpkg-source when building a source package

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2022-06-09 16:35:01 +02:00
Fabian Ebner
7bd4d8645a fix #4101: acquire job's aio context before calling job_unref
Otherwise, we might run into an abort via bdrv_co_yield_to_drain()
(can at least happen when a disk with iothread is used):
> #0  0x00007fef4f5dece1 __GI_raise (libc.so.6 + 0x3bce1)
> #1  0x00007fef4f5c8537 __GI_abort (libc.so.6 + 0x25537)
> #2  0x00005641bce3c71f error_exit (qemu-system-x86_64 + 0x80371f)
> #3  0x00005641bce3d02b qemu_mutex_unlock_impl (qemu-system-x86_64 + 0x80402b)
> #4  0x00005641bcd51655 bdrv_co_yield_to_drain (qemu-system-x86_64 + 0x718655)
> #5  0x00005641bcd52de8 bdrv_do_drained_begin (qemu-system-x86_64 + 0x719de8)
> #6  0x00005641bcd47e07 blk_drain (qemu-system-x86_64 + 0x70ee07)
> #7  0x00005641bcd498cd blk_unref (qemu-system-x86_64 + 0x7108cd)
> #8  0x00005641bcd31e6f block_job_free (qemu-system-x86_64 + 0x6f8e6f)
> #9  0x00005641bcd32d65 job_unref (qemu-system-x86_64 + 0x6f9d65)
> #10 0x00005641bcd93b3d pvebackup_co_complete_stream (qemu-system-x86_64 + 0x75ab3d)
> #11 0x00005641bce4e353 coroutine_trampoline (qemu-system-x86_64 + 0x815353)

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-06-09 14:57:28 +02:00
Wolfgang Bumiller
ed3b5b8ab8 bump version to 6.2.0-9
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-06-08 14:04:09 +02:00
Wolfgang Bumiller
7f4326d1dc pbs cleanup fixes
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-06-08 13:10:51 +02:00
Wolfgang Bumiller
53bff441c5 delete patches which were dropped from the series file
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-06-08 13:07:04 +02:00
Thomas Lamprecht
f9a4b1cea7 bump version to 6.2.0-8
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-05-19 09:25:11 +02:00
Fabian Ebner
dc265df350 add revert to work around performance regression when backing up large RBD disk
resulting in QMP timeouts and very slow backups. The plan is to figure
out (ideally together with upstream) a way to make the implementation
of bdrv_co_block_status for RBD more efficient. But for now, revert
the problematic change as a stop-gap measure.

Upstream bug report:
https://gitlab.com/qemu-project/qemu/-/issues/1026

Forum threads:
https://forum.proxmox.com/threads/109272/
https://forum.proxmox.com/threads/109448/
https://forum.proxmox.com/threads/101334/ (partially)

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-05-19 09:23:38 +02:00
Thomas Lamprecht
e0076597c6 bump version to 6.2.0-7
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-05-12 16:06:00 +02:00
Thomas Lamprecht
ee99c1f813 d/control: bump build-depenceny of proxmox-backup-qemu to 1.3.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-05-12 16:05:30 +02:00
Wolfgang Bumiller
58a5492e9c namespace support
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-05-12 13:49:35 +02:00
Thomas Lamprecht
e67b8b5bd9 bump version to 6.2.0-6
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-05-11 10:42:57 +02:00
Thomas Lamprecht
309b5c1694 backport various fixes for gluster, qxl and vnc
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-05-11 10:40:14 +02:00
Thomas Lamprecht
4ce5937f89 bump version to 6.2.0-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-25 10:13:50 +02:00
Thomas Lamprecht
f87d0523df vma: allow partial restore
Introduce a new map line for skipping a certain drive, of the form
skip=drive-scsi0

Since in PVE, most archives are compressed and piped to vma for
restore, it's not easily possible to skip reads.

For the reader, a new skip flag for VmaRestoreState is added and the
target is allowed to be NULL if skip is specified when registering.
If
the skip flag is set, no writes will be made as well as no check for
duplicate clusters. Therefore, the flag is not set for verify.

Originally-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-25 10:07:37 +02:00
Thomas Lamprecht
2fd4ea2813 patches: update context
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-25 10:07:01 +02:00
Thomas Lamprecht
2653a5f029 vma: restore: call blk_unref for all opened block devices
Originally-by: Fabian Ebner <f.ebner@proxmox.com>
Link: https://lists.proxmox.com/pipermail/pve-devel/2022-April/052642.html
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-25 10:05:29 +02:00
Thomas Lamprecht
664ecf59a9 bump version to 6.2.0-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-22 11:52:34 +02:00
Thomas Lamprecht
4de9440f87 various stable backports
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-22 10:22:39 +02:00
Thomas Lamprecht
9b348f8c6d d/copyright: drop trailing whitespace
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-22 09:16:23 +02:00
Thomas Lamprecht
799cf8c5a3 d/control: add suggest dependency-hint for libgl1
It pulls in a lot of stuff via the libglx0 -> libglx-mesa0 dependency
chain, so only suggest it for now to avoid installing it in the
installer or via common "PVE on-top Debian" installations, VirGL
integration is experimental after all and we may drop/replace it with
the vulkan based venus one, once available (Debian 12?).

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-22 09:14:39 +02:00
Thomas Lamprecht
b02e62dba0 d/control: add libgbm to build dependencies
required for good virgl support

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-22 08:50:36 +02:00
Thomas Lamprecht
fc03e1b6bf bump version to 6.2.0-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-15 09:09:43 +02:00
Thomas Lamprecht
c8ba14bed0 cherry-pick fix for passing some acpi slic tables
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-15 08:07:34 +02:00
Wolfgang Bumiller
daea534caa bump version to 6.2.0-2
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-03-03 12:05:37 +01:00
Fabian Ebner
27199bd753 backup: add patch to initialize bcs bitmap early enough for PBS
This is necessary for multi-disk backups where not all jobs are
immediately started after they are created. QEMU commit
06e0a9c16405c0a4c1eca33cf286cc04c42066a2 did already part of the work,
ensuring that new writes after job creation don't pass through to the
backup, but not yet for the MIRROR_SYNC_MODE_BITMAP case which is used
for PBS.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-03-03 11:37:17 +01:00
Thomas Lamprecht
e050683663 d/control: mark numactl a recommended package
we do not call in anywhere unconditionally

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-24 07:51:19 +01:00
Thomas Lamprecht
13184117e4 d/control: drop sdl dependency, we disable it on compile tinme
disabled via d/rules since a while...

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-24 07:51:19 +01:00
Thomas Lamprecht
aa4b14ea10 d/control: libaio1 is added by dh shlibs
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-24 07:51:19 +01:00
Thomas Lamprecht
3aa5b7598d enable zstd support
plan to use that for multifd migration, among other things

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-24 07:51:19 +01:00
Thomas Lamprecht
13d3e10aa6 compile in virgl support
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-24 07:51:19 +01:00
Thomas Lamprecht
58c3533a58 bump version to 6.2.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-18 14:23:41 +01:00
Thomas Lamprecht
fe0542bed9 d/rules: disable libssh by default
was always disabled in our clean builds, this now also avoids
auto-enabling it on "dirty" build hosts

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-18 14:21:53 +01:00
Fabian Ebner
f6d40bfdf4 add patch for loading a snapshot with qemu-img dd
Will be used when cloning from a qcow2 efidisk.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-02-15 14:03:07 +01:00
Fabian Ebner
107132becc fix getopt-string when introducing -n option for qemu-img dd
The colon after U is wrong, because it doesn't take an argument.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-02-15 14:03:07 +01:00
Fabian Ebner
4567474e95 update submodule and patches to 6.2.0
Notable changes:
* bdrv_co_p{discard,readv,writev,write_zeroes} function signatures
  changed, to using int64_t for offsets/bytes and some still had int
  rather than BrdvRequestFlags for the flags.
* job_cancel_sync now has a force parameter. Commit messages in
  73895f3838cd7fdaf185cf1dbc47be58844a966f
  4cfb3f05627ad82af473e7f7ae113c3884cd04e3
  sound like using force=true makes more sense.
* Added 3 patches coming in via qemu-stable tag, most important one is
  to work around a librbd issue.
* Added another 3 patches from qemu-devel to fix issue leading to
  crash when live migrating with iothread.
* cluster_size calculation helper changed (see patch pve/0026).
* QAPI's if conditionals now use 'CONFIG_FOO' rather than
  'defined(CONFIG_FOO)'

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-02-15 14:03:07 +01:00
Thomas Lamprecht
33a2d3a826 bump version to 6.1.1-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-14 15:53:18 +01:00
Fabian Ebner
2bf61c3eb6 vma: create: register all streams before entering coroutines
Otherwise, the header might already get written by a coroutine and
registering further streams will fail after that.

Also adds a missing g_list_free call for the other GList that's used.

Reported in the community forum:
https://forum.proxmox.com/threads/104744/

Reproducer script (increase beyond 30 if the issue isn't triggered yet):
> #!/usr/bin/perl
>
> my $dir = "./vma-create-bug";
> mkdir $dir;
>
> my $archive_path = "$dir/vzdump-qemu-104-2202_02_02-00_00_00.vma";
> unlink $archive_path;
>
> my $cmd = "vma create $archive_path -v";
> for (my $i = 0; $i < 30; $i++) {
>   system("truncate -s 1M $dir/drive-virtio$i.img");
>   $cmd .= " drive-virtio$i=$dir/drive-virtio$i.img";
> }
> system($cmd);

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-02-14 15:38:58 +01:00
Thomas Lamprecht
c07b2203b3 bump version to 6.1.1-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-13 10:57:48 +01:00
Thomas Lamprecht
ddbf7a872d update submodule and patches to 6.1.1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-13 10:56:39 +01:00
Thomas Lamprecht
95c7156d1e bump version to 6.1.0-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-12-01 15:35:49 +01:00
Fabian Ebner
570d4ad51d fix #3738: cherry-pick "block: introduce max_hw_iov for use in scsi-generic"
which fixes the bad commit 18473467d55a20d643b6c9b3a52de42f705b4d35
that was tracked down via bisecting, and has a Cc for qemu-stable as
well.

Issue was easy enough to reproduce with a single virtio-block disk
using a few runs of dd if=/dev/urandom of=file bs=1M count=1000

Commit cc071629539dc1f303175a7e2d4ab854c0a8b20f upstream.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-12-01 15:34:27 +01:00
Dominik Csapak
c5e8e7c998 buildsys: fix build-dependencies on headers for 'vma' and 'pbs_restore'
both of them depend on generated header files, so we have to specify
them as sources. Otherwise, it happens (at least on some machines)
that they will be compiled before the headers are generated, aborting
the build.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-11-18 08:11:57 +01:00
Fabian Grünbichler
7cf6b60926 fix #3728: handle machine without type
libguestfs starts their helper VMs with `-machine accel=..` without a
machine type, and our pve version suffix handling would segfault in that
case. there might be other scripted use cases that are affected as well.

this regression was introduced with the rebase of our patch set on top
of 6.1.0

Fixes: f376b2b9e2

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2021-11-17 17:20:26 +01:00
Thomas Lamprecht
50999525c6 bump version to 6.1.0-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-11-16 09:38:18 +01:00
Fabian Grünbichler
edbcc10a69 cherry-pick segfault fix
this was reported multiple times in our forums[1 with backtraces, 2 & 3
with same log messages], fix is taken from upstream master.

1: https://forum.proxmox.com/threads/pve-7-0-14-1-vm-not-running-live-migration-kills-vm-post-ssd-move-pre-ram-move.99704/
2: https://forum.proxmox.com/threads/proxmox-7-0-14-1-crashes-vm-during-migrate-to-other-host.99678
3: https://forum.proxmox.com/threads/cannot-migrate-between-zfs-and-ceph.99685/#post-430152

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2021-11-16 09:23:43 +01:00
Thomas Lamprecht
cc707c362e bump version to 6.1.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-10-13 17:58:18 +02:00
Stefan Reiter
af64ed13eb add fixup patch for qxl migration logic
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-10-13 17:58:18 +02:00
Stefan Reiter
f376b2b9e2 update and rebase to QEMU v6.1.0
Very clean rebase, only the +pve version handling needed manual fixing.
Drops two applied patches from extra/ and adds one new from upstream
(extra/0001*, fixes VNC over unix sockets) as well as 3 of my own for
allowing password changes on custom VNC displays again (as seen and
reviewed upstream, but not yet applied).

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-10-11 15:13:26 +02:00
Thomas Lamprecht
89fa943ef9 bump version to 6.0.0-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-09-06 07:30:05 +02:00
Stefan Reiter
26eee146bc add temporary QMP race fix
same as the initial version sent to qemu-devel, it won't be the final
fix we plan to upstream but it should be enough band-aid to
workaround how PVE uses the QMP.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
 [ Thomas: add a bit reasoning to commit message body ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-09-06 07:28:07 +02:00
Wolfgang Bumiller
277d33454f drop patch force-disabling smm
This drops debian/patches/pve/0005-PVE-Config-smm_available-false.patch
(and renumbers the remaining patches)

From what I could gather, this patch was originally added
due to issues with old kernels. Now we have users which
seem to run into issues *with* the patch.

All this does is toggle an option, and it's available via a
qemu CLI option anyway, so if dropping this patch causes
issues for some people we can just add an option to
qemu-server & UI control smm explicitly.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: Alexandre Derumier <aderumier@odiso.com>
Tested-by: Stefan Reiter <s.reiter@proxmox.com>
2021-08-24 11:19:05 +02:00
Fabian Grünbichler
611b692181 bump version to 6.0.0-3
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2021-08-03 15:02:40 +02:00
Fabian Ebner
0114d3cd02 io_uring: resubmit when result is -EAGAIN
Linux SCSI can throw spurious -EAGAIN in some corner cases in its
completion path, which will end up being the result in the completed
io_uring request.

Resubmitting such requests should allow block jobs to complete, even
if such spurious errors are encountered.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-07-29 11:51:57 +02:00
Thomas Lamprecht
bb3e494bdc bump version to 6.0.0-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-06-23 11:04:47 +02:00
Stefan Reiter
403f23a0c3 enable io-uring support in QEMU builds
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-06-21 09:56:06 +02:00
Thomas Lamprecht
db687e3cac buildsys: change upload dist to bullseye
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-06-08 11:18:10 +02:00
Thomas Lamprecht
8893def37c bump version to 6.0.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-05-28 11:30:55 +02:00
Stefan Reiter
263fef5b4c update keycodemapdb for 6.0
QEMU 6.0 requires the updated version to build correctly, as the
keymap-gen tool gained some new parameters.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-05-28 11:29:44 +02:00
Stefan Reiter
eb96e850ac debian: ignore submodule checks in QEMU build
...we do those manually, and the build dir is not a git repo.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-05-28 11:29:44 +02:00
Stefan Reiter
8dca018b68 udpate and rebase to QEMU v6.0.0
Mostly minor changes, bigger ones summarized:
* QEMU's internal backup code now uses a new async system, which allows
  parallel requests - the default max_workers settings is 64, I chose
  less, since 64 put enough stress on QEMU that the guest became
  practically unusable during the backup, and 16 still shows quite a
  nice measureable performance improvement. Little code changes for us
  though.
* 'malformed' QAPI parameters/functions are now a build error (i.e.
  using '_' vs '-'), I chose to just whitelist our calls in the name of
  backwards compatibility.
* monitor OOB race fix now uses the upstream variant, cherry-picked from
  origin/master since it's not in 6.0 by default
* last patch fixes a bug with snapshot rollback related to the new yank
  system

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-05-28 11:29:44 +02:00
149 changed files with 11923 additions and 12039 deletions

7
.gitignore vendored Normal file
View File

@@ -0,0 +1,7 @@
/*.build
/*.buildinfo
/*.changes
/*.deb
/*.dsc
/*.tar*
/pve-qemu-kvm-*.*/

View File

@@ -1,60 +1,90 @@
include /usr/share/dpkg/pkg-info.mk
include /usr/share/dpkg/architecture.mk
include /usr/share/dpkg/default.mk
PACKAGE = pve-qemu-kvm
SRCDIR := qemu
BUILDDIR ?= ${PACKAGE}-${DEB_VERSION_UPSTREAM}
BUILDDIR ?= $(PACKAGE)-$(DEB_VERSION_UPSTREAM)
ORIG_SRC_TAR=$(PACKAGE)_$(DEB_VERSION_UPSTREAM).orig.tar.gz
GITVERSION := $(shell git rev-parse HEAD)
DEB = ${PACKAGE}_${DEB_VERSION_UPSTREAM_REVISION}_${DEB_BUILD_ARCH}.deb
DEB_DBG = ${PACKAGE}-dbg_${DEB_VERSION_UPSTREAM_REVISION}_${DEB_BUILD_ARCH}.deb
DSC=$(PACKAGE)_$(DEB_VERSION_UPSTREAM_REVISION).dsc
DEB = $(PACKAGE)_$(DEB_VERSION_UPSTREAM_REVISION)_$(DEB_BUILD_ARCH).deb
DEB_DBG = $(PACKAGE)-dbgsym_$(DEB_VERSION_UPSTREAM_REVISION)_$(DEB_BUILD_ARCH).deb
DEBS = $(DEB) $(DEB_DBG)
all: $(DEBS)
.PHONY: submodule
submodule:
test -f "${SRCDIR}/configure" || git submodule update --init --recursive
ifeq ($(shell test -f "$(SRCDIR)/configure" && echo 1 || echo 0), 0)
git submodule update --init --recursive
cd $(SRCDIR); meson subprojects download
endif
$(BUILDDIR): keycodemapdb | submodule
PC_BIOS_FW_PURGE_LIST_IN = \
hppa-firmware.img \
hppa-firmware64.img \
openbios-ppc \
openbios-sparc32 \
openbios-sparc64 \
palcode-clipper \
s390-ccw.img \
s390-netboot.img \
u-boot.e500 \
.*[a-zA-Z0-9]\.dtb \
.*[a-zA-Z0-9]\.dts \
qemu_vga.ndrv \
slof.bin \
opensbi-riscv.*-generic-fw_dynamic.bin \
BLOB_PURGE_SED_CMDS = $(foreach FILE,$(PC_BIOS_FW_PURGE_LIST_IN),-e "/$(FILE)/d")
BLOB_PURGE_FILTER = $(foreach FILE,$(PC_BIOS_FW_PURGE_LIST_IN),-e "$(FILE)")
$(BUILDDIR): submodule
# check if qemu/ was used for a build
# if so, please run 'make distclean' in the submodule and try again
test ! -f $(SRCDIR)/build/config.status
rm -rf $(BUILDDIR)
cp -a $(SRCDIR) $(BUILDDIR)
cp -a debian $(BUILDDIR)/debian
rm -rf $(BUILDDIR)/ui/keycodemapdb
cp -a keycodemapdb $(BUILDDIR)/ui/
echo "git clone git://git.proxmox.com/git/pve-qemu.git\\ngit checkout $(GITVERSION)" > $(BUILDDIR)/debian/SOURCE
rm -rf $@.tmp $@
cp -a $(SRCDIR) $@.tmp
cp -a debian $@.tmp/debian
rm -rf $@.tmp/roms/edk2 # packaged separately
find $@.tmp/pc-bios -type f | grep $(BLOB_PURGE_FILTER) | xargs rm -f
sed -i $(BLOB_PURGE_SED_CMDS) $@.tmp/pc-bios/meson.build
echo "git clone git://git.proxmox.com/git/pve-qemu.git\\ngit checkout $(GITVERSION)" > $@.tmp/debian/SOURCE
mv $@.tmp $@
.PHONY: deb kvm
deb kvm: $(DEBS)
$(DEB_DBG): $(DEB)
$(DEB): $(BUILDDIR)
cd $(BUILDDIR); dpkg-buildpackage -b -us -uc -j
cd $(BUILDDIR); dpkg-buildpackage -b -us -uc -j32
lintian $(DEBS)
.PHONY: update
update:
cd $(SRCDIR) && git submodule deinit ui/keycodemapdb || true
rm -rf $(SRCDIR)/ui/keycodemapdb
mkdir $(SRCDIR)/ui/keycodemapdb
cd $(SRCDIR) && git submodule update --init ui/keycodemapdb
rm -rf keycodemapdb
mkdir keycodemapdb
cp -R $(SRCDIR)/ui/keycodemapdb/* keycodemapdb/
git add keycodemapdb
sbuild: $(DSC)
sbuild $(DSC)
$(ORIG_SRC_TAR): $(BUILDDIR)
tar czf $(ORIG_SRC_TAR) --exclude="$(BUILDDIR)/debian" $(BUILDDIR)
.PHONY: dsc
dsc:
rm -rf *.dsc $(BUILDDIR)
$(MAKE) $(DSC)
lintian $(DSC)
$(DSC): $(ORIG_SRC_TAR) $(BUILDDIR)
cd $(BUILDDIR); dpkg-buildpackage -S -us -uc -d
.PHONY: upload
upload: UPLOAD_DIST ?= $(DEB_DISTRIBUTION)
upload: $(DEBS)
tar cf - ${DEBS} | ssh repoman@repo.proxmox.com upload --product pve --dist buster
tar cf - $(DEBS) | ssh repoman@repo.proxmox.com upload --product pve --dist $(UPLOAD_DIST)
.PHONY: distclean clean
distclean: clean
clean:
rm -rf $(BUILDDIR) $(PACKAGE)*.deb *.buildinfo *.changes
rm -rf $(PACKAGE)-[0-9]*/ $(PACKAGE)*.tar* *.deb *.dsc *.build *.buildinfo *.changes
.PHONY: dinstall
dinstall: $(DEBS)

656
debian/changelog vendored
View File

@@ -1,3 +1,659 @@
pve-qemu-kvm (9.2.0-5+vitastor1) bookworm; urgency=medium
* Add Vitastor support
-- Vitaliy Filippov <vitalif@yourcmc.ru> Fri, 11 Apr 2025 02:07:44 +0300
pve-qemu-kvm (9.2.0-5) bookworm; urgency=medium
* pve backup: backup-access api: simplify bitmap logic
-- Proxmox Support Team <support@proxmox.com> Fri, 04 Apr 2025 16:15:58 +0200
pve-qemu-kvm (9.2.0-4) bookworm; urgency=medium
* various async snapshot improvements, inclduing using a dedicated IO thread
for the state file when doing a live snapshot. That should reduce load on
the main thread and for it to get stuck on IO, i.e. same benefits as using
a dedicated IO thread for regular drives. This is particularly interesting
when the VM state storage is a network storage like NFS. It should also
address #6262.
* pve backup: implement basic features and API in preperation for external
backup provider storage plugins.
-- Proxmox Support Team <support@proxmox.com> Thu, 03 Apr 2025 17:00:34 +0200
pve-qemu-kvm (9.2.0-3) bookworm; urgency=medium
* revert changes to the High Precision Event Timer (HPET) to fix performance
regression
-- Proxmox Support Team <support@proxmox.com> Wed, 26 Mar 2025 09:56:01 +0100
pve-qemu-kvm (9.2.0-2) bookworm; urgency=medium
* fix assertion failure when migrating a VM with multiple disks on a
replicated ZFS.
-- Proxmox Support Team <support@proxmox.com> Mon, 24 Feb 2025 17:33:34 +0100
pve-qemu-kvm (9.2.0-1) bookworm; urgency=medium
* update submodule and patches to QEMU 9.2.0
-- Proxmox Support Team <support@proxmox.com> Tue, 04 Feb 2025 08:49:20 +0100
pve-qemu-kvm (9.1.2-3) bookworm; urgency=medium
* async snapshot: explicitly specify raw format when loading the VM state
file
* vma create: rework CLI parameters for passing disk to a more structured
style and use that to allow explicitly specifying the format
-- Proxmox Support Team <support@proxmox.com> Fri, 24 Jan 2025 16:12:34 +0100
pve-qemu-kvm (9.1.2-2) bookworm; urgency=medium
* adapt machine version deprecation for Proxmox VE release and support
cycle.
-- Proxmox Support Team <support@proxmox.com> Fri, 17 Jan 2025 16:34:06 +0100
pve-qemu-kvm (9.1.2-1) bookworm; urgency=medium
* update submodule and patches to QEMU 9.1.2
* improve error handling and edge cases with fleecing backups.
-- Proxmox Support Team <support@proxmox.com> Wed, 11 Dec 2024 16:47:21 +0100
pve-qemu-kvm (9.0.2-4) bookworm; urgency=medium
* async snapshot: ensure any dynamic vCPU-throttling applied for
auto-converge gets always disabled again after finishing the snapshot.
-- Proxmox Support Team <support@proxmox.com> Sun, 10 Nov 2024 11:23:09 +0100
pve-qemu-kvm (9.0.2-3) bookworm; urgency=medium
* pick up fix for VirtIO PCI regressions
* pick up stable fixes for 9.0, including fixes for VirtIO-net, ARM and
x86(_64) emulation, CVEs to harden NBD server against malicious clients,
as well as a few others (VNC, physmem, Intel IOMMU, ...).
-- Proxmox Support Team <support@proxmox.com> Fri, 06 Sep 2024 16:21:42 +0200
pve-qemu-kvm (9.0.2-2) bookworm; urgency=medium
* actually update submodule to QEMU 9.0.2. The previous release was still
based on 9.0.0 by mistake.
-- Proxmox Support Team <support@proxmox.com> Wed, 07 Aug 2024 10:16:01 +0200
pve-qemu-kvm (9.0.2-1) bookworm; urgency=medium
* update submodule and patches to QEMU 9.0.2. While our version had most
stable fixes included already, there are new fixes for VirtIO and VGA
display screen blanking (#4786)
* backport fix for a regression with the LSI-53c895a controller and one for
the boot order getting ignored for USB storage
-- Proxmox Support Team <support@proxmox.com> Mon, 29 Jul 2024 18:59:40 +0200
pve-qemu-kvm (9.0.0-6) bookworm; urgency=medium
* fix a regression in the zeroinit block driver that prevented importing and
cloning disks to RBD storages which are not using the krbd setting
-- Proxmox Support Team <support@proxmox.com> Mon, 08 Jul 2024 16:11:15 +0200
pve-qemu-kvm (9.0.0-5) bookworm; urgency=medium
* backport fix for CVE-2024-4467 to prevent malicious qcow2 image files from
already causing bad effects if being queried via 'qemu-img info'. For
Proxmox VE, this is an additional safe guard, as currently it directly
creates and manages the qcow2 images used by VMs and does not allow
unprivileged users to import them
* fix #4726: code cleanup: avoid superfluous check in vma backup code
-- Proxmox Support Team <support@proxmox.com> Wed, 03 Jul 2024 13:13:35 +0200
pve-qemu-kvm (9.0.0-4) bookworm; urgency=medium
* fix crash after saving a snapshot without including VM state when a VirtIO
block device with iothread is configured.
* fix edge case in error handling when opening a block device from PBS fails
* minor code cleanup in backup code
-- Proxmox Support Team <support@proxmox.com> Mon, 01 Jul 2024 11:26:11 +0200
pve-qemu-kvm (9.0.0-3) bookworm; urgency=medium
* fix crash when doing resize after hotplugging a disk using io_uring
* fix some minor issues in software CPU emulation (i.e. non-KVM) for ARM and
x86(_64)
-- Proxmox Support Team <support@proxmox.com> Wed, 29 May 2024 15:55:44 +0200
pve-qemu-kvm (9.0.0-2) bookworm; urgency=medium
* fix #5409: backup: fix copy-before-write timeout
* backup: improve error when copy-before-write fails for fleecing
* fix forwards and backwards migration with VirtIO-GPU display
* fix a regression in pflash device introduced in 8.2
* revert a commit for VirtIO PCI devices that turned out to cause more
potential security issues than what it fixed
* move compatibility flags for a new VirtIO-net feature to the correct
machine type. The feature was introduced in QEMU 8.2, but the
compatibility flags got added to machine version 8.0 instead of 8.1. This
breaks backwards migration with machine version 8.1 from a 8.2/9.0 binary
to an 8.1 binary, in cases where the guest kernel enables the feature
(e.g. Ubuntu 23.10).
While that breaks migration with machine version 8.1 from an unpatched to
a patched binary, Proxmox VE only ever had 8.2 on the test repository and
9.0 not yet in any public repository.
-- Proxmox Support Team <support@proxmox.com> Fri, 17 May 2024 17:04:52 +0200
pve-qemu-kvm (9.0.0-1) bookworm; urgency=medium
* update submodule and patches to QEMU 9.0.0
-- Proxmox Support Team <support@proxmox.com> Mon, 29 Apr 2024 10:51:37 +0200
pve-qemu-kvm (8.2.2-1) bookworm; urgency=medium
* update submodule and patches to QEMU 8.2.2
-- Proxmox Support Team <support@proxmox.com> Sat, 27 Apr 2024 12:44:30 +0200
pve-qemu-kvm (8.1.5-5) bookworm; urgency=medium
* implement support for backup fleecing
-- Proxmox Support Team <support@proxmox.com> Thu, 11 Apr 2024 17:46:48 +0200
pve-qemu-kvm (8.1.5-4) bookworm; urgency=medium
* fix live-import for certain kinds of VMDK images that rely on padding
* backup: avoid bubbling up first error if it's an ECANCELED one, as those
are often a result of cancling the job due to running into an actual
issue.
* backup: factor out & clean up gathering device info into helper
-- Proxmox Support Team <support@proxmox.com> Tue, 12 Mar 2024 14:08:40 +0100
pve-qemu-kvm (8.1.5-3) bookworm; urgency=medium
* backport fix for potential deadlock during QMP stop command if the VM has
disks attached through VirtIO-Block and IO-Thread enabled
* fix #4507: add patch to automatically increase NOFILE soft limit
-- Proxmox Support Team <support@proxmox.com> Wed, 21 Feb 2024 20:11:23 +0100
pve-qemu-kvm (8.1.5-2) bookworm; urgency=medium
* work around for a situation where guest IO might get stuck, if the VM is
configure with iothread and VirtIO block/SCSI
-- Proxmox Support Team <support@proxmox.com> Fri, 02 Feb 2024 19:41:27 +0100
pve-qemu-kvm (8.1.5-1) bookworm; urgency=medium
* update to 8.1.5 stable release, including more relevant fixes like:
- virtio-net: correctly copy vnet header when flushing TX
- hw/pflash: implement update buffer for block writes
- Fixes to i386 emulation and ARM emulation.
-- Proxmox Support Team <support@proxmox.com> Fri, 02 Feb 2024 19:08:13 +0100
pve-qemu-kvm (8.1.2-6) bookworm; urgency=medium
* revert attempted fix to avoid rare issue with stuck guest IO when using
iothread, because it caused a much more common issue with iothreads
consuming too much CPU
-- Proxmox Support Team <support@proxmox.com> Fri, 15 Dec 2023 14:22:06 +0100
pve-qemu-kvm (8.1.2-5) bookworm; urgency=medium
* backport workaround for stuck guest IO with iothread and VirtIO block/SCSI
in some rare edge cases
* backport fix for potential deadlock when issuing the "resize" QMP command
for a disk that is using iothread
-- Proxmox Support Team <support@proxmox.com> Mon, 11 Dec 2023 16:58:27 +0100
pve-qemu-kvm (8.1.2-4) bookworm; urgency=medium
* fix vnc clipboard in the host to guest direction
-- Proxmox Support Team <support@proxmox.com> Wed, 22 Nov 2023 14:28:21 +0100
pve-qemu-kvm (8.1.2-3) bookworm; urgency=medium
* fix #5054: backport fix for software reset with SATA, avoiding breakage
with, e.g., some FreeBSD VMs
-- Proxmox Support Team <support@proxmox.com> Mon, 20 Nov 2023 10:24:50 +0100
pve-qemu-kvm (8.1.2-2) bookworm; urgency=medium
* revert "x86: acpi: workaround Windows not handling name references in
Package properly" as that seems to have broken networking (and possibly
other things) one some localized variants of Windows (e.g., the German
versions).
-- Proxmox Support Team <support@proxmox.com> Fri, 17 Nov 2023 11:55:23 +0100
pve-qemu-kvm (8.1.2-1) bookworm; urgency=medium
* update submodule and patches to QEMU 8.1.2
* use QEMU's keycode-map-db again instead of our static copy from QEMU 6.0
* disable graph locking, newly introduced in the 8.1 release, as it has
still various deadlock issuess, e.g., during canceling backup jobs.
-- Proxmox Support Team <support@proxmox.com> Tue, 24 Oct 2023 13:42:45 +0200
pve-qemu-kvm (8.0.2-7) bookworm; urgency=medium
* fix #2874: SATA: avoid unsolicited write to sector 0 during reset
-- Proxmox Support Team <support@proxmox.com> Wed, 04 Oct 2023 08:33:35 +0200
pve-qemu-kvm (8.0.2-6) bookworm; urgency=medium
* fix #1534: vma: add extract-filter for disk images allowing users to pass
a comma separated list of the disks they want to extract from an archive.
* backup: create jobs in a drained section to avoid subtle bugs where
something interferes with the block-copy-state bitmap on initialization
* backup: drop experimental, and since a while also fully broken, directory
backup format (BACKUP_FORMAT_DIR). This format was never exposed via the
Proxmox VE API, but only available via QMP, as its broken since QEMU 8 and
we got zero reports about that, it's safe to assume that there are no
public users, so just remove it completely.
-- Proxmox Support Team <support@proxmox.com> Wed, 06 Sep 2023 17:03:59 +0200
pve-qemu-kvm (8.0.2-5) bookworm; urgency=medium
* improve memory footprint after backup by not keeping as much memory
resident.
* fix file descriptor leak for vhost (used by default by vNICs).
-- Proxmox Support Team <support@proxmox.com> Wed, 16 Aug 2023 11:52:24 +0200
pve-qemu-kvm (8.0.2-4) bookworm; urgency=medium
* fix resume for snapshot and hibernate in combination with iothread and
dirty bitmap
-- Proxmox Support Team <support@proxmox.com> Fri, 28 Jul 2023 12:58:22 +0200
pve-qemu-kvm (8.0.2-3) bookworm; urgency=medium
* fix regression in QEMU 8.0 for drive mirror with bitmap
-- Proxmox Support Team <support@proxmox.com> Thu, 15 Jun 2023 13:57:46 +0200
pve-qemu-kvm (8.0.2-2) bookworm; urgency=medium
* drop custom get_link_status QMP command, was never really used.
* drop custom & deprecated drive snapshot QMP commands, we use a better
alternative since a while.
-- Proxmox Support Team <support@proxmox.com> Fri, 09 Jun 2023 07:57:56 +0200
pve-qemu-kvm (8.0.2-1) bookworm; urgency=medium
* update to QEMU stable release 8.0.2
* update patches for avoiding issues with DMA reentrancy to current,
slightly optimized version.
-- Proxmox Support Team <support@proxmox.com> Tue, 06 Jun 2023 16:34:50 +0200
pve-qemu-kvm (8.0.0-1) bookworm; urgency=medium
* update to QEMU stable release 8.0.0
* re-build for Proxmox VE 8 / Debian 12 Bookworm
* adapt to the local virtiofsd C variant being dropped, it has been
rewritten in Rust and is now hosted in a separate source repository.
-- Proxmox Support Team <support@proxmox.com> Mon, 22 May 2023 13:45:49 +0200
pve-qemu-kvm (7.2.0-8) bullseye; urgency=medium
* backport fix for ACPI CPU hotplug issue with TCG
* cherry-pick TCG-related stable fixes for 7.2 for users that turned off KVM
HW acceleration
-- Proxmox Support Team <support@proxmox.com> Fri, 17 Mar 2023 15:47:08 +0100
pve-qemu-kvm (7.2.0-7) bullseye; urgency=medium
* improve fix for potential deadlock with trim for IDE/SATA and draining
* backport stable fixes:
- hw/nvme: fix missing endian conversions for doorbell buffers
- hw/smbios: fix field corruption in type 4 table
- virtio-rng-pci: fix transitional migration compat for vectors
- hw/timer/hpet: Fix expiration time overflow
- vhost/vdpa: stop all svq on device deletion
- vhost: avoid a potential use of an uninitialized variable in the call to
vhost_svq_poll
- chardev/char-socket: set s->listener = NULL in char_socket_finalize to
fix a potential crash after live-migration
- intel-iommu: fail MAP notifier without caching mode
- intel-iommu: fail DEVIOTLB_UNMAP without dt mode
* fix a regression for when the LSI SCSI controller is used
-- Proxmox Support Team <support@proxmox.com> Mon, 13 Mar 2023 17:42:49 +0100
pve-qemu-kvm (7.2.0-6) bullseye; urgency=medium
* fix 7.2 regression for Linux boot failures with megasas SCSI
* fix 7.0 regression for a potential deadlock with trim for IDE/SATA and
draining
-- Proxmox Support Team <support@proxmox.com> Wed, 08 Mar 2023 14:32:17 +0100
pve-qemu-kvm (7.2.0-5) bullseye; urgency=medium
* fix #4476: savevm-async: avoid looping without progress
* savevm-async: decrease the boundary for free space for (memory) state left
on target from 30 MiB to 100 MiB, improving the heuristic for when to
enter the final "pause and sync" stage.
* QMP backup: use correct error number when getting blockdrive length fails
* backport fix for some DMA reentrancy issues, better protecting against
malicious guests
* backport fix for iSCSI double free issue leading to crashes
-- Proxmox Support Team <support@proxmox.com> Tue, 21 Feb 2023 13:49:43 +0100
pve-qemu-kvm (7.2.0-4) bullseye; urgency=medium
* backport fix for a 7.2 regression when using VirtIO disk with
detect-zeroes=unmap
-- Proxmox Support Team <support@proxmox.com> Fri, 27 Jan 2023 09:37:49 +0100
pve-qemu-kvm (7.2.0-3) bullseye; urgency=medium
* add fix for live-migration with virtio-rng devices, which regressed in
QEMU 7.2.0.
-- Proxmox Support Team <support@proxmox.com> Thu, 12 Jan 2023 13:13:14 +0100
pve-qemu-kvm (7.2.0-2) bullseye; urgency=medium
* enable slirp again for now, as in qemu-server, user networking is
supported (via CLI/API) when no bridge is set on a virtual NIC
* cherry-pick stable fixes for 7.2. Two for virtio-mem and one for vIOMMU.
Both features are not yet exposed in PVE's qemu-server, but there's work
going on to change that.
-- Proxmox Support Team <support@proxmox.com> Tue, 10 Jan 2023 15:47:48 +0100
pve-qemu-kvm (7.2.0-1) bullseye; urgency=medium
* update to QEMU stable release 7.2.0
* drop 'slirp' networking
-- Proxmox Support Team <support@proxmox.com> Fri, 16 Dec 2022 13:18:21 +0100
pve-qemu-kvm (7.1.0-4) bullseye; urgency=medium
* cherry-pick "block/block-backend: blk_set_enable_write_cache is IO_CODE"
-- Proxmox Support Team <support@proxmox.com> Tue, 22 Nov 2022 09:21:06 +0100
pve-qemu-kvm (7.1.0-3) bullseye; urgency=medium
* init: daemonize: defuse PID file resolve error to a warning at max, fixing
some usecases that regressed with 7.1, like tracking start up in our
file-restore VM.
-- Proxmox Support Team <support@proxmox.com> Fri, 28 Oct 2022 10:27:49 +0200
pve-qemu-kvm (7.1.0-2) bullseye; urgency=medium
* fix an issue with error handling in async backup code
-- Proxmox Support Team <support@proxmox.com> Tue, 18 Oct 2022 15:33:44 +0200
pve-qemu-kvm (7.1.0-1) bullseye; urgency=medium
* update to QEMU stable release 7.1.0
* add fix for io_uring_register_ring_fd from upstream
-- Proxmox Support Team <support@proxmox.com> Fri, 14 Oct 2022 14:54:09 +0200
pve-qemu-kvm (7.0.0-4) bullseye; urgency=medium
* add revision to version output
* PVE Backup: allow passing max-workers performance setting
-- Proxmox Support Team <support@proxmox.com> Mon, 10 Oct 2022 11:55:37 +0200
pve-qemu-kvm (7.0.0-3) bullseye; urgency=medium
* savevm-async: avoid segfault when aborting snapshot creation task
* savevm-async: set SAVE_STATE_DONE when closing state file was successful
allowing one to start a new snapshot task after aborting one.
-- Proxmox Support Team <support@proxmox.com> Tue, 30 Aug 2022 12:54:03 +0200
pve-qemu-kvm (7.0.0-2) bullseye; urgency=medium
* backport "io_uring: fix short read slow path"
* backport "e1000: set RX descriptor status in a separate operation"
-- Proxmox Support Team <support@proxmox.com> Wed, 20 Jul 2022 09:17:07 +0200
pve-qemu-kvm (7.0.0-1) bullseye; urgency=medium
* update to QEMU stable release 7.0.0
-- Proxmox Support Team <support@proxmox.com> Thu, 30 Jun 2022 11:07:37 +0200
pve-qemu-kvm (6.2.0-11) bullseye; urgency=medium
* add 'namespace' to BlockdevOptionsPbs for live-restore support
* vma: create: support 64KiB-unaligned input images like to improve backing
up some VM templates
* block: alloc-track: avoid unlikely, but possible premature break
-- Proxmox Support Team <support@proxmox.com> Wed, 22 Jun 2022 15:54:54 +0200
pve-qemu-kvm (6.2.0-10) bullseye; urgency=medium
* fix #4101: fix backup cancellation bug with iothreads
-- Proxmox Support Team <support@proxmox.com> Thu, 9 Jun 2022 16:35:51 +0200
pve-qemu-kvm (6.2.0-9) bullseye; urgency=medium
* fix possible race conditions during cancellation of a PBS backup
-- Proxmox Support Team <support@proxmox.com> Wed, 08 Jun 2022 14:03:22 +0200
pve-qemu-kvm (6.2.0-8) bullseye; urgency=medium
* revert "block/rbd: implement bdrv_co_block_status" to work around
performance regression when backing up large RBD disk
-- Proxmox Support Team <support@proxmox.com> Thu, 19 May 2022 09:24:45 +0200
pve-qemu-kvm (6.2.0-7) bullseye; urgency=medium
* Proxmox Backup Server namespace support
-- Proxmox Support Team <support@proxmox.com> Thu, 12 May 2022 16:05:56 +0200
pve-qemu-kvm (6.2.0-6) bullseye; urgency=medium
* block/gluster: correctly set max_pdiscard which is int64_t to avoid
triggering assertion
* ui/vnc.c: Fixed a deadlock bug
* display/qxl-render: fix race condition in qxl_cursor (CVE-2021-4207) and
integer overflow in cursor_alloc (CVE-2021-4206)
-- Proxmox Support Team <support@proxmox.com> Wed, 11 May 2022 10:42:53 +0200
pve-qemu-kvm (6.2.0-5) bullseye; urgency=medium
* vma: allow partial restore by skipping some disk
-- Proxmox Support Team <support@proxmox.com> Mon, 25 Apr 2022 10:13:46 +0200
pve-qemu-kvm (6.2.0-4) bullseye; urgency=medium
* d/control: add libgbm to build dependencies
* d/control: add suggest dependency-hint for libgl1
* various stable backports:
+ virtio-net: fix map leaking on error during receive
+ memory: Fix incorrect calls of log_global_start/stop
+ acpi: fix OEM ID/OEM Table ID padding
+ vhost-vsock: detach the virqueue element in case of error
+ vhost-user: remove VirtQ notifier restore
+ vhost-user: fix VirtQ notifier cleanup
+ virtio: fix the condition for iommu_platform not supported
-- Proxmox Support Team <support@proxmox.com> Fri, 22 Apr 2022 11:52:30 +0200
pve-qemu-kvm (6.2.0-3) bullseye; urgency=medium
* cherry-pick fix for some manually added ACPI table SLIC entries via the
custom args flag.
-- Proxmox Support Team <support@proxmox.com> Fri, 15 Apr 2022 09:09:37 +0200
pve-qemu-kvm (6.2.0-2) bullseye; urgency=medium
* compile in virgl support
* enable zstd support
* drop sdl dependency (it was disabled at compile time already)
* recommend 'numactl'
* fix an issue with multi-disk backups where chunks would be written
multiple times
-- Proxmox Support Team <support@proxmox.com> Thu, 03 Mar 2022 12:03:44 +0100
pve-qemu-kvm (6.2.0-1) bullseye; urgency=medium
* update to QEMU stable release 6.2.0
-- Proxmox Support Team <support@proxmox.com> Thu, 17 Feb 2022 06:23:14 +0100
pve-qemu-kvm (6.1.1-2) bullseye; urgency=medium
* vma: create: register all streams before entering coroutines to avoid that
an early stream starts to write already before all got registered.
-- Proxmox Support Team <support@proxmox.com> Mon, 14 Feb 2022 15:53:15 +0100
pve-qemu-kvm (6.1.1-1) bullseye; urgency=medium
* update to 6.1.1 stable release
-- Proxmox Support Team <support@proxmox.com> Thu, 13 Jan 2022 10:57:43 +0100
pve-qemu-kvm (6.1.0-3) bullseye; urgency=medium
* fix #3738: cherry-pick "block: introduce max_hw_iov for use in scsi-
generic
-- Proxmox Support Team <support@proxmox.com> Wed, 01 Dec 2021 15:35:43 +0100
pve-qemu-kvm (6.1.0-2) bullseye; urgency=medium
* avoid a possible segmentation fault during block (disk) mirror
-- Proxmox Support Team <support@proxmox.com> Tue, 16 Nov 2021 09:38:10 +0100
pve-qemu-kvm (6.1.0-1) bullseye; urgency=medium
* update to QEMU stable release 6.1.0
-- Proxmox Support Team <support@proxmox.com> Mon, 11 Oct 2021 15:15:19 +0200
pve-qemu-kvm (6.0.0-4) bullseye; urgency=medium
* drop the ancient workaround that force disabled SMM due to observing VM
hangs on old kernel versions.
* monitor/qmp: fix race with clients disconnecting early resulting in other
clients receiving a message with the (now wrong) ID of the former
-- Proxmox Support Team <support@proxmox.com> Mon, 06 Sep 2021 07:30:00 +0200
pve-qemu-kvm (6.0.0-3) bullseye; urgency=medium
* io_uring: resubmit when result is -EAGAIN
-- Proxmox Support Team <support@proxmox.com> Tue, 3 Aug 2021 15:01:31 +0200
pve-qemu-kvm (6.0.0-2) bullseye; urgency=medium
* enable io-uring support in QEMU builds
-- Proxmox Support Team <support@proxmox.com> Wed, 23 Jun 2021 11:03:54 +0200
pve-qemu-kvm (6.0.0-1) bullseye; urgency=medium
* update to QEMU stable release 6.0.0
-- Proxmox Support Team <support@proxmox.com> Fri, 28 May 2021 11:30:50 +0200
pve-qemu-kvm (5.2.0-11) bullseye; urgency=medium
* re-build for Proxmox VE 7 / Debian Bullseye

1
debian/compat vendored
View File

@@ -1 +0,0 @@
10

36
debian/control vendored
View File

@@ -2,39 +2,43 @@ Source: pve-qemu-kvm
Section: admin
Priority: optional
Maintainer: Proxmox Support Team <support@proxmox.com>
Build-Depends: autotools-dev,
Build-Depends: debhelper-compat (= 13),
check,
debhelper (>= 9),
libacl1-dev,
libaio-dev,
libattr1-dev,
libcap-ng-dev,
libcurl4-gnutls-dev,
libepoxy-dev,
libfdt-dev,
libgbm-dev,
libglusterfs-dev (>= 5.2-2),
libgnutls28-dev,
libiscsi-dev (>= 1.12.0),
libjemalloc-dev,
libjpeg-dev,
libjson-perl,
libnuma-dev,
libpci-dev,
libpixman-1-dev,
libproxmox-backup-qemu0-dev (>= 1.0.3-1),
libproxmox-backup-qemu0-dev (>= 1.3.0),
librbd-dev (>= 0.48),
libsdl1.2-dev,
libseccomp-dev,
libslirp-dev,
libspice-protocol-dev (>= 0.12.14~),
libspice-server-dev (>= 0.14.0~),
libsystemd-dev,
libusb-1.0-0-dev (>= 1.0.17-1),
liburing-dev,
libusb-1.0-0-dev (>= 1.0.17),
libusbredirparser-dev (>= 0.6-2),
libvirglrenderer-dev,
libzstd-dev,
meson,
python3-minimal,
python3-sphinx,
python3-sphinx-rtd-theme,
python3-venv,
quilt,
texi2html,
texinfo,
uuid-dev,
xfslibs-dev,
Standards-Version: 3.7.2
@@ -43,7 +47,6 @@ Package: pve-qemu-kvm
Architecture: any
Depends: ceph-common (>= 0.48),
iproute2,
libaio1,
libgfapi0 | glusterfs-common (>= 5.6),
libgfchangelog0 | glusterfs-common (>= 5.6),
libgfdb0 | glusterfs-common (>= 5.6),
@@ -52,16 +55,16 @@ Depends: ceph-common (>= 0.48),
libglusterfs-dev | glusterfs-common (>= 5.6),
libglusterfs0 | glusterfs-common (>= 5.6),
libiscsi4 (>= 1.12.0) | libiscsi7,
libjemalloc2,
libjpeg62-turbo,
libsdl1.2debian,
libspice-server1 (>= 0.14.0~),
libusb-1.0-0 (>= 1.0.17-1),
libusbredirparser1 (>= 0.6-2),
vitastor-client (>= 0.9.4),
libuuid1,
numactl,
${misc:Depends},
${shlibs:Depends},
Recommends: numactl,
Suggests: libgl1,
Conflicts: kvm,
pve-kvm,
pve-qemu-kvm-2.6.18,
@@ -69,22 +72,17 @@ Conflicts: kvm,
qemu-kvm,
qemu-system-arm,
qemu-system-common,
qemu-system-data,
qemu-system-x86,
qemu-utils,
Provides: qemu-system-arm, qemu-system-x86, qemu-utils
Provides: qemu-system-arm, qemu-system-x86, qemu-utils,
Replaces: pve-kvm,
pve-qemu-kvm-2.6.18,
qemu-system-arm,
qemu-system-x86,
qemu-utils,
Breaks: qemu-server (<= 8.0.6)
Description: Full virtualization on x86 hardware
Using KVM, one can run multiple virtual PCs, each running unmodified Linux or
Windows images. Each virtual machine has private virtualized hardware: a
network card, disk, graphics adapter, etc.
Package: pve-qemu-kvm-dbg
Architecture: any
Section: debug
Depends: pve-qemu-kvm (= ${binary:Version})
Description: pve qemu debugging symbols
This package contains the debugging symbols for pve-qemu-kvm.

2
debian/copyright vendored
View File

@@ -25,7 +25,7 @@ License:
In particular, the QEMU virtual CPU core library (libqemu.a) is
released under the GNU Lesser General Public License version 2 or later.
On Debian systems, the complete text of the GNU Lesser General Public
On Debian systems, the complete text of the GNU Lesser General Public
License can be found in the file /usr/share/common-licenses/LGPL.
Some hardware device emulation sources and other QEMU functionality are

View File

@@ -24,4 +24,5 @@ while (<STDIN>) {
die "no QEMU machine types detected from STDIN input" if scalar (@$machines) <= 0;
print to_json($machines, { utf8 => 1 }) or die "$!\n";
print to_json($machines, { utf8 => 1, canonical => 1 })
or die "failed to encode detected machines as JSON - $!\n";

View File

@@ -26,19 +26,22 @@ Suggested-by: Ma Haocong <mahaocong@didichuxing.com>
Signed-off-by: Ma Haocong <mahaocong@didichuxing.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: rebased for 9.1.2]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/mirror.c | 98 ++++++++++++++++++++++++++++++-------
blockdev.c | 39 +++++++++++++--
include/block/block_int.h | 4 +-
qapi/block-core.json | 29 +++++++++--
tests/test-block-iothread.c | 4 +-
5 files changed, 145 insertions(+), 29 deletions(-)
block/mirror.c | 99 ++++++++++++++++++++------
blockdev.c | 38 +++++++++-
include/block/block_int-global-state.h | 4 +-
qapi/block-core.json | 25 ++++++-
tests/unit/test-block-iothread.c | 4 +-
5 files changed, 142 insertions(+), 28 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index 8e1ad6eceb..97843992c2 100644
index 2afe700b4d..c3d4be9b15 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -50,7 +50,7 @@ typedef struct MirrorBlockJob {
@@ -51,7 +51,7 @@ typedef struct MirrorBlockJob {
BlockDriverState *to_replace;
/* Used to block operations on the drive-mirror-replace target */
Error *replace_blocker;
@@ -47,7 +50,7 @@ index 8e1ad6eceb..97843992c2 100644
BlockMirrorBackingMode backing_mode;
/* Whether the target image requires explicit zero-initialization */
bool zero_target;
@@ -65,6 +65,8 @@ typedef struct MirrorBlockJob {
@@ -73,6 +73,8 @@ typedef struct MirrorBlockJob {
size_t buf_size;
int64_t bdev_length;
unsigned long *cow_bitmap;
@@ -56,9 +59,9 @@ index 8e1ad6eceb..97843992c2 100644
BdrvDirtyBitmap *dirty_bitmap;
BdrvDirtyBitmapIter *dbi;
uint8_t *buf;
@@ -677,7 +679,8 @@ static int mirror_exit_common(Job *job)
bdrv_child_refresh_perms(mirror_top_bs, mirror_top_bs->backing,
@@ -723,7 +725,8 @@ static int mirror_exit_common(Job *job)
&error_abort);
if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
- BlockDriverState *backing = s->is_none_mode ? src : s->base;
+ BlockDriverState *backing;
@@ -66,7 +69,7 @@ index 8e1ad6eceb..97843992c2 100644
BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
if (bdrv_cow_bs(unfiltered_target) != backing) {
@@ -774,6 +777,16 @@ static void mirror_abort(Job *job)
@@ -824,6 +827,16 @@ static void mirror_abort(Job *job)
assert(ret == 0);
}
@@ -83,7 +86,7 @@ index 8e1ad6eceb..97843992c2 100644
static void coroutine_fn mirror_throttle(MirrorBlockJob *s)
{
int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -955,7 +968,8 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
@@ -1020,7 +1033,8 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
mirror_free_init(s);
s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -93,23 +96,23 @@ index 8e1ad6eceb..97843992c2 100644
ret = mirror_dirty_init(s);
if (ret < 0 || job_is_cancelled(&s->common.job)) {
goto immediate_exit;
@@ -1188,6 +1202,7 @@ static const BlockJobDriver mirror_job_driver = {
@@ -1309,6 +1323,7 @@ static const BlockJobDriver mirror_job_driver = {
.run = mirror_run,
.prepare = mirror_prepare,
.abort = mirror_abort,
+ .clean = mirror_clean,
.pause = mirror_pause,
.complete = mirror_complete,
},
@@ -1203,6 +1218,7 @@ static const BlockJobDriver commit_active_job_driver = {
.cancel = mirror_cancel,
@@ -1327,6 +1342,7 @@ static const BlockJobDriver commit_active_job_driver = {
.run = mirror_run,
.prepare = mirror_prepare,
.abort = mirror_abort,
+ .clean = mirror_clean,
.pause = mirror_pause,
.complete = mirror_complete,
},
@@ -1550,7 +1566,10 @@ static BlockJob *mirror_start_job(
.cancel = commit_active_cancel,
@@ -1719,7 +1735,10 @@ static BlockJob *mirror_start_job(
BlockCompletionFunc *cb,
void *opaque,
const BlockJobDriver *driver,
@@ -120,10 +123,10 @@ index 8e1ad6eceb..97843992c2 100644
+ BlockDriverState *base,
bool auto_complete, const char *filter_node_name,
bool is_mirror, MirrorCopyMode copy_mode,
Error **errp)
@@ -1563,10 +1582,39 @@ static BlockJob *mirror_start_job(
Error *local_err = NULL;
int ret;
bool base_ro,
@@ -1734,10 +1753,39 @@ static BlockJob *mirror_start_job(
GLOBAL_STATE_CODE();
- if (granularity == 0) {
- granularity = bdrv_get_default_bitmap_granularity(target);
@@ -163,7 +166,7 @@ index 8e1ad6eceb..97843992c2 100644
assert(is_power_of_2(granularity));
if (buf_size < 0) {
@@ -1705,7 +1753,9 @@ static BlockJob *mirror_start_job(
@@ -1878,7 +1926,9 @@ static BlockJob *mirror_start_job(
s->replaces = g_strdup(replaces);
s->on_source_error = on_source_error;
s->on_target_error = on_target_error;
@@ -173,10 +176,10 @@ index 8e1ad6eceb..97843992c2 100644
+ s->bitmap_mode = bitmap_mode;
s->backing_mode = backing_mode;
s->zero_target = zero_target;
s->copy_mode = copy_mode;
@@ -1726,6 +1776,18 @@ static BlockJob *mirror_start_job(
bdrv_disable_dirty_bitmap(s->dirty_bitmap);
}
qatomic_set(&s->copy_mode, copy_mode);
@@ -1904,6 +1954,18 @@ static BlockJob *mirror_start_job(
*/
bdrv_disable_dirty_bitmap(s->dirty_bitmap);
+ if (s->sync_bitmap) {
+ bdrv_dirty_bitmap_set_busy(s->sync_bitmap, true);
@@ -190,10 +193,10 @@ index 8e1ad6eceb..97843992c2 100644
+ }
+ }
+
bdrv_graph_wrlock();
ret = block_job_add_bdrv(&s->common, "source", bs, 0,
BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
BLK_PERM_CONSISTENT_READ,
@@ -1803,6 +1865,9 @@ fail:
@@ -1986,6 +2048,9 @@ fail:
if (s->dirty_bitmap) {
bdrv_release_dirty_bitmap(s->dirty_bitmap);
}
@@ -203,7 +206,7 @@ index 8e1ad6eceb..97843992c2 100644
job_early_fail(&s->common.job);
}
@@ -1820,29 +1885,23 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
@@ -2008,35 +2073,28 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
BlockDriverState *target, const char *replaces,
int creation_flags, int64_t speed,
uint32_t granularity, int64_t buf_size,
@@ -220,25 +223,31 @@ index 8e1ad6eceb..97843992c2 100644
- bool is_none_mode;
BlockDriverState *base;
GLOBAL_STATE_CODE();
- if ((mode == MIRROR_SYNC_MODE_INCREMENTAL) ||
- (mode == MIRROR_SYNC_MODE_BITMAP)) {
- error_setg(errp, "Sync mode '%s' not supported",
- MirrorSyncMode_str(mode));
- return;
- }
-
bdrv_graph_rdlock_main_loop();
- is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
base = mode == MIRROR_SYNC_MODE_TOP ? bdrv_backing_chain_next(bs) : NULL;
bdrv_graph_rdunlock_main_loop();
mirror_start_job(job_id, bs, creation_flags, target, replaces,
speed, granularity, buf_size, backing_mode, zero_target,
on_source_error, on_target_error, unmap, NULL, NULL,
- &mirror_job_driver, is_none_mode, base, false,
- filter_node_name, true, copy_mode, errp);
- filter_node_name, true, copy_mode, false, errp);
+ &mirror_job_driver, mode, bitmap, bitmap_mode, base,
+ false, filter_node_name, true, copy_mode, errp);
+ false, filter_node_name, true, copy_mode, false, errp);
}
BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
@@ -1868,7 +1927,8 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
@@ -2063,7 +2121,8 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
job_id, bs, creation_flags, base, NULL, speed, 0, 0,
MIRROR_LEAVE_BACKING_CHAIN, false,
on_error, on_error, true, cb, opaque,
@@ -246,36 +255,35 @@ index 8e1ad6eceb..97843992c2 100644
+ &commit_active_job_driver, MIRROR_SYNC_MODE_FULL,
+ NULL, 0, base, auto_complete,
filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
&local_err);
if (local_err) {
base_read_only, errp);
if (!job) {
diff --git a/blockdev.c b/blockdev.c
index fe6fb5dc1d..394920613d 100644
index 6740663fda..38fa63155c 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2930,6 +2930,10 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -2781,6 +2781,9 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
BlockDriverState *target,
bool has_replaces, const char *replaces,
const char *replaces,
enum MirrorSyncMode sync,
+ bool has_bitmap,
+ const char *bitmap_name,
+ bool has_bitmap_mode,
+ BitmapSyncMode bitmap_mode,
BlockMirrorBackingMode backing_mode,
bool zero_target,
bool has_speed, int64_t speed,
@@ -2949,6 +2953,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -2799,6 +2802,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
{
BlockDriverState *unfiltered_bs;
int job_flags = JOB_DEFAULT;
+ BdrvDirtyBitmap *bitmap = NULL;
if (!has_speed) {
speed = 0;
@@ -3003,6 +3008,29 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
GLOBAL_STATE_CODE();
GRAPH_RDLOCK_GUARD_MAINLOOP();
@@ -2853,6 +2857,29 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
sync = MIRROR_SYNC_MODE_FULL;
}
+ if (has_bitmap) {
+ if (bitmap_name) {
+ if (granularity) {
+ error_setg(errp, "Granularity and bitmap cannot both be set");
+ return;
@@ -298,53 +306,53 @@ index fe6fb5dc1d..394920613d 100644
+ }
+ }
+
if (!has_replaces) {
if (!replaces) {
/* We want to mirror from @bs, but keep implicit filters on top */
unfiltered_bs = bdrv_skip_implicit_filters(bs);
@@ -3049,8 +3077,8 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -2894,8 +2921,8 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
* and will allow to check whether the node still exist at mirror completion
*/
mirror_start(job_id, bs, target,
- has_replaces ? replaces : NULL, job_flags,
- replaces, job_flags,
- speed, granularity, buf_size, sync, backing_mode, zero_target,
+ has_replaces ? replaces : NULL, job_flags, speed, granularity,
+ buf_size, sync, bitmap, bitmap_mode, backing_mode, zero_target,
+ replaces, job_flags, speed, granularity, buf_size, sync,
+ bitmap, bitmap_mode, backing_mode, zero_target,
on_source_error, on_target_error, unmap, filter_node_name,
copy_mode, errp);
}
@@ -3195,6 +3223,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
@@ -3039,6 +3066,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
blockdev_mirror_common(arg->has_job_id ? arg->job_id : NULL, bs, target_bs,
arg->has_replaces, arg->replaces, arg->sync,
+ arg->has_bitmap, arg->bitmap,
blockdev_mirror_common(arg->job_id, bs, target_bs,
arg->replaces, arg->sync,
+ arg->bitmap,
+ arg->has_bitmap_mode, arg->bitmap_mode,
backing_mode, zero_target,
arg->has_speed, arg->speed,
arg->has_granularity, arg->granularity,
@@ -3216,6 +3246,8 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
@@ -3058,6 +3087,8 @@ void qmp_blockdev_mirror(const char *job_id,
const char *device, const char *target,
bool has_replaces, const char *replaces,
const char *replaces,
MirrorSyncMode sync,
+ bool has_bitmap, const char *bitmap,
+ const char *bitmap,
+ bool has_bitmap_mode, BitmapSyncMode bitmap_mode,
bool has_speed, int64_t speed,
bool has_granularity, uint32_t granularity,
bool has_buf_size, int64_t buf_size,
@@ -3265,7 +3297,8 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
@@ -3098,7 +3129,8 @@ void qmp_blockdev_mirror(const char *job_id,
}
blockdev_mirror_common(has_job_id ? job_id : NULL, bs, target_bs,
- has_replaces, replaces, sync, backing_mode,
+ has_replaces, replaces, sync, has_bitmap,
blockdev_mirror_common(job_id, bs, target_bs,
- replaces, sync, backing_mode,
+ replaces, sync,
+ bitmap, has_bitmap_mode, bitmap_mode, backing_mode,
zero_target, has_speed, speed,
has_granularity, granularity,
has_buf_size, buf_size,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 95d9333be1..6f8eda629a 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1230,7 +1230,9 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
diff --git a/include/block/block_int-global-state.h b/include/block/block_int-global-state.h
index eb2d92a226..f0c642b194 100644
--- a/include/block/block_int-global-state.h
+++ b/include/block/block_int-global-state.h
@@ -158,7 +158,9 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
BlockDriverState *target, const char *replaces,
int creation_flags, int64_t speed,
uint32_t granularity, int64_t buf_size,
@@ -356,31 +364,26 @@ index 95d9333be1..6f8eda629a 100644
BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 04ad80bc1e..9db3120716 100644
index fd3bcc1c17..48ba32049f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -1971,10 +1971,19 @@
# (all the disk, only the sectors allocated in the topmost image, or
# only new I/O).
@@ -2178,6 +2178,15 @@
# destination (all the disk, only the sectors allocated in the
# topmost image, or only new I/O).
#
+# @bitmap: The name of a bitmap to use for sync=bitmap mode. This argument must
+# be present for bitmap mode and absent otherwise. The bitmap's
+# granularity is used instead of @granularity (since 4.1).
+# @bitmap: The name of a bitmap to use for sync=bitmap mode. This
+# argument must be present for bitmap mode and absent otherwise.
+# The bitmap's granularity is used instead of @granularity (Since
+# 4.1).
+#
+# @bitmap-mode: Specifies the type of data the bitmap should contain after
+# the operation concludes. Must be present if sync is "bitmap".
+# Must NOT be present otherwise. (Since 4.1)
+# @bitmap-mode: Specifies the type of data the bitmap should contain
+# after the operation concludes. Must be present if sync is
+# "bitmap". Must NOT be present otherwise. (Since 4.1)
+#
# @granularity: granularity of the dirty bitmap, default is 64K
# if the image format doesn't have clusters, 4K if the clusters
# are smaller than that, else the cluster size. Must be a
-# power of 2 between 512 and 64M (since 1.4).
+# power of 2 between 512 and 64M. Must not be specified if
+# @bitmap is present (since 1.4).
#
# @buf-size: maximum amount of data in flight from source to
# target (since 1.4).
@@ -2012,7 +2021,9 @@
# @granularity: granularity of the dirty bitmap, default is 64K if the
# image format doesn't have clusters, 4K if the clusters are
# smaller than that, else the cluster size. Must be a power of 2
@@ -2220,7 +2229,9 @@
{ 'struct': 'DriveMirror',
'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
'*format': 'str', '*node-name': 'str', '*replaces': 'str',
@@ -391,28 +394,23 @@ index 04ad80bc1e..9db3120716 100644
'*speed': 'int', '*granularity': 'uint32',
'*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
'*on-target-error': 'BlockdevOnError',
@@ -2280,10 +2291,19 @@
# (all the disk, only the sectors allocated in the topmost image, or
# only new I/O).
@@ -2499,6 +2510,15 @@
# destination (all the disk, only the sectors allocated in the
# topmost image, or only new I/O).
#
+# @bitmap: The name of a bitmap to use for sync=bitmap mode. This argument must
+# be present for bitmap mode and absent otherwise. The bitmap's
+# granularity is used instead of @granularity (since 4.1).
+# @bitmap: The name of a bitmap to use for sync=bitmap mode. This
+# argument must be present for bitmap mode and absent otherwise.
+# The bitmap's granularity is used instead of @granularity (since
+# 4.1).
+#
+# @bitmap-mode: Specifies the type of data the bitmap should contain after
+# the operation concludes. Must be present if sync is "bitmap".
+# Must NOT be present otherwise. (Since 4.1)
+# @bitmap-mode: Specifies the type of data the bitmap should contain
+# after the operation concludes. Must be present if sync is
+# "bitmap". Must NOT be present otherwise. (Since 4.1)
+#
# @granularity: granularity of the dirty bitmap, default is 64K
# if the image format doesn't have clusters, 4K if the clusters
# are smaller than that, else the cluster size. Must be a
-# power of 2 between 512 and 64M
+# power of 2 between 512 and 64M . Must not be specified if
+# @bitmap is present.
#
# @buf-size: maximum amount of data in flight from source to
# target
@@ -2332,7 +2352,8 @@
# @granularity: granularity of the dirty bitmap, default is 64K if the
# image format doesn't have clusters, 4K if the clusters are
# smaller than that, else the cluster size. Must be a power of 2
@@ -2547,7 +2567,8 @@
{ 'command': 'blockdev-mirror',
'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
'*replaces': 'str',
@@ -422,11 +420,11 @@ index 04ad80bc1e..9db3120716 100644
'*speed': 'int', '*granularity': 'uint32',
'*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
'*on-target-error': 'BlockdevOnError',
diff --git a/tests/test-block-iothread.c b/tests/test-block-iothread.c
index 3f866a35c6..500ede71c8 100644
--- a/tests/test-block-iothread.c
+++ b/tests/test-block-iothread.c
@@ -623,8 +623,8 @@ static void test_propagate_mirror(void)
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index 20ed54f570..4f50a99334 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -755,8 +755,8 @@ static void test_propagate_mirror(void)
/* Start a mirror job */
mirror_start("job0", src, target, NULL, JOB_DEFAULT, 0, 0, 0,
@@ -436,4 +434,4 @@ index 3f866a35c6..500ede71c8 100644
+ false, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
false, "filter_node", MIRROR_COPY_MODE_BACKGROUND,
&error_abort);
job = job_get("job0");

View File

@@ -18,15 +18,16 @@ incremental backup modes; we can use this bitmap to later refresh a
successfully created mirror.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/mirror.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index 97843992c2..d1cce079da 100644
index c3d4be9b15..7b6f7c0068 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -654,8 +654,6 @@ static int mirror_exit_common(Job *job)
@@ -694,8 +694,6 @@ static int mirror_exit_common(Job *job)
bdrv_unfreeze_backing_chain(mirror_top_bs, target_bs);
}
@@ -35,9 +36,9 @@ index 97843992c2..d1cce079da 100644
/* Make sure that the source BDS doesn't go away during bdrv_replace_node,
* before we can call bdrv_drained_end */
bdrv_ref(src);
@@ -755,6 +753,18 @@ static int mirror_exit_common(Job *job)
blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort);
@@ -805,6 +803,18 @@ static int mirror_exit_common(Job *job)
bdrv_drained_end(target_bs);
bdrv_unref(target_bs);
+ if (s->sync_bitmap) {
+ if (s->bitmap_mode == BITMAP_SYNC_MODE_ALWAYS ||
@@ -54,7 +55,7 @@ index 97843992c2..d1cce079da 100644
bs_opaque->job = NULL;
bdrv_drained_end(src);
@@ -1592,10 +1602,6 @@ static BlockJob *mirror_start_job(
@@ -1763,10 +1773,6 @@ static BlockJob *mirror_start_job(
" sync mode",
MirrorSyncMode_str(sync_mode));
return NULL;
@@ -65,7 +66,7 @@ index 97843992c2..d1cce079da 100644
}
} else if (bitmap) {
error_setg(errp,
@@ -1612,6 +1618,12 @@ static BlockJob *mirror_start_job(
@@ -1783,6 +1789,12 @@ static BlockJob *mirror_start_job(
return NULL;
}
granularity = bdrv_dirty_bitmap_granularity(bitmap);

View File

@@ -10,15 +10,16 @@ as one without the other does not make much sense with the current set
of modes.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
blockdev.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/blockdev.c b/blockdev.c
index 394920613d..4f8bd38b58 100644
index 38fa63155c..204cf6fad1 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3029,6 +3029,9 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -2878,6 +2878,9 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_ALLOW_RO, errp)) {
return;
}
@@ -27,4 +28,4 @@ index 394920613d..4f8bd38b58 100644
+ return;
}
if (!has_replaces) {
if (!replaces) {

View File

@@ -10,15 +10,16 @@ since sync_bitmap is busy at the point of merging, and we checked access
beforehand.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/mirror.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
block/mirror.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index d1cce079da..e6140cf018 100644
index 7b6f7c0068..2b1c07095d 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -759,8 +759,8 @@ static int mirror_exit_common(Job *job)
@@ -809,8 +809,8 @@ static int mirror_exit_common(Job *job)
job->ret == 0 && ret == 0)) {
/* Success; synchronize copy back to sync. */
bdrv_clear_dirty_bitmap(s->sync_bitmap, NULL);
@@ -29,14 +30,17 @@ index d1cce079da..e6140cf018 100644
}
}
bdrv_release_dirty_bitmap(s->dirty_bitmap);
@@ -1793,8 +1793,8 @@ static BlockJob *mirror_start_job(
@@ -1971,11 +1971,8 @@ static BlockJob *mirror_start_job(
}
if (s->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
- bdrv_merge_dirty_bitmap(s->dirty_bitmap, s->sync_bitmap,
- NULL, &local_err);
- if (local_err) {
- goto fail;
- }
+ bdrv_dirty_bitmap_merge_internal(s->dirty_bitmap, s->sync_bitmap,
+ NULL, true);
if (local_err) {
goto fail;
}
}
bdrv_graph_wrlock();

View File

@@ -20,11 +20,11 @@ intentionally keeping copyright and ownership of original test case to
honor provenance.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
tests/qemu-iotests/384 | 547 +++++++
tests/qemu-iotests/384.out | 2846 ++++++++++++++++++++++++++++++++++++
tests/qemu-iotests/group | 1 +
3 files changed, 3394 insertions(+)
2 files changed, 3393 insertions(+)
create mode 100755 tests/qemu-iotests/384
create mode 100644 tests/qemu-iotests/384.out
@@ -3433,15 +3433,3 @@ index 0000000000..9b7408b6d6
+{"execute": "blockdev-mirror", "arguments": {"bitmap": "bitmap0", "device": "drive0", "filter-node-name": "mirror-top", "job-id": "api_job", "sync": "none", "target": "mirror_target"}}
+{"error": {"class": "GenericError", "desc": "bitmap-mode must be specified if a bitmap is provided"}}
+
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 2960dff728..952dceba1f 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -270,6 +270,7 @@
253 rw quick
254 rw backing quick
255 rw quick
+384 rw
256 rw auto quick
257 rw
258 rw quick

View File

@@ -11,6 +11,9 @@ mode was never available for drive-mirror, it makes the interface more
uniform w.r.t. backup block jobs.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: rebase for 8.2.2]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/mirror.c | 28 +++------------
blockdev.c | 29 +++++++++++++++
@@ -18,12 +21,12 @@ Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
3 files changed, 70 insertions(+), 59 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index e6140cf018..3a08239a78 100644
index 2b1c07095d..f5787b380c 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1592,31 +1592,13 @@ static BlockJob *mirror_start_job(
Error *local_err = NULL;
int ret;
@@ -1763,31 +1763,13 @@ static BlockJob *mirror_start_job(
GLOBAL_STATE_CODE();
- if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
- error_setg(errp, "Sync mode '%s' not supported",
@@ -59,17 +62,17 @@ index e6140cf018..3a08239a78 100644
if (bitmap_mode != BITMAP_SYNC_MODE_NEVER) {
diff --git a/blockdev.c b/blockdev.c
index 4f8bd38b58..a40c6fd0f6 100644
index 204cf6fad1..79d47b1920 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3008,7 +3008,36 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -2857,7 +2857,36 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
sync = MIRROR_SYNC_MODE_FULL;
}
+ if ((sync == MIRROR_SYNC_MODE_BITMAP) ||
+ (sync == MIRROR_SYNC_MODE_INCREMENTAL)) {
+ /* done before desugaring 'incremental' to print the right message */
+ if (!has_bitmap) {
+ if (!bitmap_name) {
+ error_setg(errp, "Must provide a valid bitmap name for "
+ "'%s' sync mode", MirrorSyncMode_str(sync));
+ return;
@@ -90,7 +93,7 @@ index 4f8bd38b58..a40c6fd0f6 100644
+ bitmap_mode = BITMAP_SYNC_MODE_ON_SUCCESS;
+ }
+
if (has_bitmap) {
if (bitmap_name) {
+ if (sync != MIRROR_SYNC_MODE_BITMAP) {
+ error_setg(errp, "Sync mode '%s' not supported with bitmap.",
+ MirrorSyncMode_str(sync));

View File

@@ -1,33 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
Date: Mon, 14 Sep 2020 19:32:21 +0200
Subject: [PATCH] Revert "qemu-img convert: Don't pre-zero images"
This reverts commit edafc70c0c8510862f2f213a3acf7067113bcd08.
As it correlates with causing issues on LVM allocation
https://bugzilla.proxmox.com/show_bug.cgi?id=3002
---
qemu-img.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/qemu-img.c b/qemu-img.c
index 8bdea40b58..f9050bfaad 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2104,6 +2104,15 @@ static int convert_do_copy(ImgConvertState *s)
s->has_zero_init = bdrv_has_zero_init(blk_bs(s->target));
}
+ if (!s->has_zero_init && !s->target_has_backing &&
+ bdrv_can_write_zeroes_with_unmap(blk_bs(s->target)))
+ {
+ ret = blk_make_zero(s->target, BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK);
+ if (ret == 0) {
+ s->has_zero_init = true;
+ }
+ }
+
/* Allocate buffer for copied data. For compressed images, only one cluster
* can be copied at a time. */
if (s->compressed) {

View File

@@ -0,0 +1,206 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Mon, 23 Aug 2021 11:28:32 +0200
Subject: [PATCH] monitor/qmp: fix race with clients disconnecting early
The following sequence can produce a race condition that results in
responses meant for different clients being sent to the wrong one:
(QMP, no OOB)
1) client A connects
2) client A sends 'qmp_capabilities'
3) 'qmp_dispatch' runs in coroutine, schedules out to
'do_qmp_dispatch_bh' and yields
4) client A disconnects (i.e. aborts, crashes, etc...)
5) client B connects
6) 'do_qmp_dispatch_bh' runs 'qmp_capabilities' and wakes calling coroutine
7) capabilities are now set and 'mon->commands' is set to '&qmp_commands'
8) 'qmp_dispatch' returns to 'monitor_qmp_dispatch'
9) success message is sent to client B *without it ever having sent
'qmp_capabilities' itself*
9a) even if client B ignores it, it will now presumably send it's own
greeting, which will error because caps are already set
The fix proposed here uses an atomic, sequential connection number
stored in the MonitorQMP struct, which is incremented everytime a new
client connects. Since it is not changed on CHR_EVENT_CLOSED, the
behaviour of allowing a client to disconnect only one side of the
connection is retained.
The connection_nr needs to be exposed outside of the monitor subsystem,
since qmp_dispatch lives in qapi code. It needs to be checked twice,
once for actually running the command in the BH (fixes 7), and once for
sending back a response (fixes 9).
This satisfies my local reproducer - using multiple clients constantly
looping to open a connection, send the greeting, then exiting no longer
crashes other, normally behaving clients with unrelated responses.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
include/monitor/monitor.h | 1 +
monitor/monitor-internal.h | 7 +++++++
monitor/monitor.c | 15 +++++++++++++++
monitor/qmp.c | 15 ++++++++++++++-
qapi/qmp-dispatch.c | 21 +++++++++++++++++----
stubs/monitor-core.c | 5 +++++
6 files changed, 59 insertions(+), 5 deletions(-)
diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index c3740ec616..7f38ce6b8b 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -16,6 +16,7 @@ extern QemuOptsList qemu_mon_opts;
Monitor *monitor_cur(void);
Monitor *monitor_set_cur(Coroutine *co, Monitor *mon);
bool monitor_cur_is_qmp(void);
+int monitor_get_connection_nr(const Monitor *mon);
void monitor_init_globals(void);
void monitor_init_globals_core(void);
diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
index cb628f681d..93dbd62fc2 100644
--- a/monitor/monitor-internal.h
+++ b/monitor/monitor-internal.h
@@ -151,6 +151,13 @@ typedef struct {
QemuMutex qmp_queue_lock;
/* Input queue that holds all the parsed QMP requests */
GQueue *qmp_requests;
+
+ /*
+ * A sequential number that gets incremented on every new CHR_EVENT_OPENED.
+ * Used to avoid leftover responses in BHs from being sent to the wrong
+ * client. Access with atomics.
+ */
+ int connection_nr;
} MonitorQMP;
/**
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 56786c0ccc..30071d0c8a 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -116,6 +116,21 @@ bool monitor_cur_is_qmp(void)
return cur_mon && monitor_is_qmp(cur_mon);
}
+/**
+ * If @mon is a QMP monitor, return the connection_nr, otherwise -1.
+ */
+int monitor_get_connection_nr(const Monitor *mon)
+{
+ MonitorQMP *qmp_mon;
+
+ if (!monitor_is_qmp(mon)) {
+ return -1;
+ }
+
+ qmp_mon = container_of(mon, MonitorQMP, common);
+ return qatomic_read(&qmp_mon->connection_nr);
+}
+
/**
* Is @mon is using readline?
* Note: not all HMP monitors use readline, e.g., gdbserver has a
diff --git a/monitor/qmp.c b/monitor/qmp.c
index 5e538f34c0..eb181d5979 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -165,6 +165,8 @@ static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
QDict *rsp;
QDict *error;
+ int conn_nr_before = qatomic_read(&mon->connection_nr);
+
rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon),
&mon->common);
@@ -180,7 +182,17 @@ static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
}
}
- monitor_qmp_respond(mon, rsp);
+ /*
+ * qmp_dispatch might have yielded and waited for a BH, in which case there
+ * is a chance a new client connected in the meantime - if this happened,
+ * the command will not have been executed, but we also need to ensure that
+ * we don't send back a corresponding response on a line that no longer
+ * belongs to this request.
+ */
+ if (conn_nr_before == qatomic_read(&mon->connection_nr)) {
+ monitor_qmp_respond(mon, rsp);
+ }
+
qobject_unref(rsp);
}
@@ -461,6 +473,7 @@ static void monitor_qmp_event(void *opaque, QEMUChrEvent event)
switch (event) {
case CHR_EVENT_OPENED:
+ qatomic_inc_fetch(&mon->connection_nr);
mon->commands = &qmp_cap_negotiation_commands;
monitor_qmp_caps_reset(mon);
data = qmp_greeting(mon);
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index 176b549473..790bb7d1da 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -117,16 +117,28 @@ typedef struct QmpDispatchBH {
QObject **ret;
Error **errp;
Coroutine *co;
+ int conn_nr;
} QmpDispatchBH;
static void do_qmp_dispatch_bh(void *opaque)
{
QmpDispatchBH *data = opaque;
- assert(monitor_cur() == NULL);
- monitor_set_cur(qemu_coroutine_self(), data->cur_mon);
- data->cmd->fn(data->args, data->ret, data->errp);
- monitor_set_cur(qemu_coroutine_self(), NULL);
+ /*
+ * A QMP monitor tracks it's client with a connection number, if this
+ * changes during the scheduling delay of this BH, we must not execute the
+ * command. Otherwise a badly placed 'qmp_capabilities' might affect the
+ * connection state of a client it was never meant for.
+ */
+ if (data->conn_nr == monitor_get_connection_nr(data->cur_mon)) {
+ assert(monitor_cur() == NULL);
+ monitor_set_cur(qemu_coroutine_self(), data->cur_mon);
+ data->cmd->fn(data->args, data->ret, data->errp);
+ monitor_set_cur(qemu_coroutine_self(), NULL);
+ } else {
+ error_setg(data->errp, "active monitor connection changed");
+ }
+
aio_co_wake(data->co);
}
@@ -253,6 +265,7 @@ QDict *coroutine_mixed_fn qmp_dispatch(const QmpCommandList *cmds, QObject *requ
.ret = &ret,
.errp = &err,
.co = qemu_coroutine_self(),
+ .conn_nr = monitor_get_connection_nr(cur_mon),
};
aio_bh_schedule_oneshot(iohandler_get_aio_context(), do_qmp_dispatch_bh,
&data);
diff --git a/stubs/monitor-core.c b/stubs/monitor-core.c
index 1894cdfe1f..d74d0459f0 100644
--- a/stubs/monitor-core.c
+++ b/stubs/monitor-core.c
@@ -12,6 +12,11 @@ Monitor *monitor_set_cur(Coroutine *co, Monitor *mon)
return NULL;
}
+int monitor_get_connection_nr(const Monitor *mon)
+{
+ return -1;
+}
+
void qapi_event_emit(QAPIEvent event, QDict *qdict)
{
}

View File

@@ -1,38 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Thu, 28 Jan 2021 15:19:51 +0100
Subject: [PATCH] docs: don't install man page if guest agent is disabled
No sense outputting the qemu-ga and qemu-ga-ref man pages when the guest
agent binary itself is disabled. This mirrors behaviour from before the
meson switch.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
docs/meson.build | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/docs/meson.build b/docs/meson.build
index ebd85d59f9..cc6f5007f8 100644
--- a/docs/meson.build
+++ b/docs/meson.build
@@ -46,6 +46,8 @@ if build_docs
meson.source_root() / 'docs/sphinx/qmp_lexer.py',
qapi_gen_depends ]
+ have_ga = have_tools and config_host.has_key('CONFIG_GUEST_AGENT')
+
configure_file(output: 'index.html',
input: files('index.html.in'),
configuration: {'VERSION': meson.project_version()},
@@ -53,8 +55,8 @@ if build_docs
manuals = [ 'devel', 'interop', 'tools', 'specs', 'system', 'user' ]
man_pages = {
'interop' : {
- 'qemu-ga.8': (have_tools ? 'man8' : ''),
- 'qemu-ga-ref.7': 'man7',
+ 'qemu-ga.8': (have_ga ? 'man8' : ''),
+ 'qemu-ga-ref.7': (have_ga ? 'man7' : ''),
'qemu-qmp-ref.7': 'man7',
},
'tools': {

View File

@@ -0,0 +1,100 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Tue, 7 Mar 2023 15:03:02 +0100
Subject: [PATCH] ide: avoid potential deadlock when draining during trim
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The deadlock can happen as follows:
1. ide_issue_trim is called, and increments the in_flight counter.
2. ide_issue_trim_cb calls blk_aio_pdiscard.
3. Somebody else starts draining (e.g. backup to insert the cbw node).
4. ide_issue_trim_cb is called as the completion callback for
blk_aio_pdiscard.
5. ide_issue_trim_cb issues yet another blk_aio_pdiscard request.
6. The request is added to the wait queue via blk_wait_while_drained,
because draining has been started.
7. Nobody ever decrements the in_flight counter and draining can't
finish. This would be done by ide_trim_bh_cb, which is called after
ide_issue_trim_cb has issued its last request, but
ide_issue_trim_cb is not called anymore, because it's the
completion callback of blk_aio_pdiscard, which waits on draining.
Quoting Hanna Czenczek:
> The point of 7e5cdb345f was that we need any in-flight count to
> accompany a set s->bus->dma->aiocb. While blk_aio_pdiscard() is
> happening, we dont necessarily need another count. But we do need
> it while there is no blk_aio_pdiscard().
> ide_issue_trim_cb() returns in two cases (and, recursively through
> its callers, leaves s->bus->dma->aiocb set):
> 1. After calling blk_aio_pdiscard(), which will keep an in-flight
> count,
> 2. After calling replay_bh_schedule_event() (i.e.
> qemu_bh_schedule()), which does not keep an in-flight count.
Thus, even after moving the blk_inc_in_flight to above the
replay_bh_schedule_event call, the invariant "ide_issue_trim_cb
returns with an accompanying in-flight count" is still satisfied.
However, the issue 7e5cdb345f fixed for canceling resurfaces, because
ide_cancel_dma_sync assumes that it just needs to drain once. But now
the in_flight count is not consistently > 0 during the trim operation.
So, change it to drain until !s->bus->dma->aiocb, which means that the
operation finished (s->bus->dma->aiocb is cleared by ide_set_inactive
via the ide_dma_cb when the end of the transfer is reached).
Discussion here:
https://lists.nongnu.org/archive/html/qemu-devel/2023-03/msg02506.html
Fixes: 7e5cdb345f ("ide: Increment BB in-flight counter for TRIM BH")
Suggested-by: Hanna Czenczek <hreitz@redhat.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/ide/core.c | 12 ++++++------
1 file changed, 6 insertions(+), 6 deletions(-)
diff --git a/hw/ide/core.c b/hw/ide/core.c
index 08d9218455..20d8c0cf66 100644
--- a/hw/ide/core.c
+++ b/hw/ide/core.c
@@ -456,7 +456,7 @@ static void ide_trim_bh_cb(void *opaque)
iocb->bh = NULL;
qemu_aio_unref(iocb);
- /* Paired with an increment in ide_issue_trim() */
+ /* Paired with an increment in ide_issue_trim_cb() */
blk_dec_in_flight(blk);
}
@@ -516,6 +516,8 @@ static void ide_issue_trim_cb(void *opaque, int ret)
done:
iocb->aiocb = NULL;
if (iocb->bh) {
+ /* Paired with a decrement in ide_trim_bh_cb() */
+ blk_inc_in_flight(s->blk);
replay_bh_schedule_event(iocb->bh);
}
}
@@ -528,9 +530,6 @@ BlockAIOCB *ide_issue_trim(
IDEDevice *dev = s->unit ? s->bus->slave : s->bus->master;
TrimAIOCB *iocb;
- /* Paired with a decrement in ide_trim_bh_cb() */
- blk_inc_in_flight(s->blk);
-
iocb = blk_aio_get(&trim_aiocb_info, s->blk, cb, cb_opaque);
iocb->s = s;
iocb->bh = qemu_bh_new_guarded(ide_trim_bh_cb, iocb,
@@ -754,8 +753,9 @@ void ide_cancel_dma_sync(IDEState *s)
*/
if (s->bus->dma->aiocb) {
trace_ide_cancel_dma_sync_remaining();
- blk_drain(s->blk);
- assert(s->bus->dma->aiocb == NULL);
+ while (s->bus->dma->aiocb) {
+ blk_drain(s->blk);
+ }
}
}

View File

@@ -1,31 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Thu, 4 Feb 2021 17:06:19 +0100
Subject: [PATCH] migration: only check page size match if RAM postcopy is
enabled
Postcopy may also be advised for dirty-bitmap migration only, in which
case the remote page size will not be available and we'll instead read
bogus data, blocking migration with a mismatch error if the VM uses
hugepages.
Fixes: 58110f0acb ("migration: split common postcopy out of ram postcopy")
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
---
migration/ram.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/migration/ram.c b/migration/ram.c
index 7811cde643..6ace15261c 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3521,7 +3521,7 @@ static int ram_load_precopy(QEMUFile *f)
}
}
/* For postcopy we need to check hugepage sizes match */
- if (postcopy_advised &&
+ if (postcopy_advised && migrate_postcopy_ram() &&
block->page_size != qemu_host_page_size) {
uint64_t remote_page_size = qemu_get_be64(f);
if (remote_page_size != block->page_size) {

View File

@@ -0,0 +1,82 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Richard Henderson <richard.henderson@linaro.org>
Date: Sat, 7 Dec 2024 18:14:45 +0000
Subject: [PATCH] tcg: Reset free_temps before tcg_optimize
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
When allocating new temps during tcg_optmize, do not re-use
any EBB temps that were used within the TB. We do not have
any idea what span of the TB in which the temp was live.
Introduce tcg_temp_ebb_reset_freed and use before tcg_optimize,
as well as replacing the equivalent in plugin_gen_inject and
tcg_func_start.
Cc: qemu-stable@nongnu.org
Fixes: fb04ab7ddd8 ("tcg/optimize: Lower TCG_COND_TST{EQ,NE} if unsupported")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/2711
Reported-by: wannacu <wannacu2049@gmail.com>
Signed-off-by: Richard Henderson <richard.henderson@linaro.org>
Reviewed-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit 04e006ab36a8565b92d4e21dd346367fbade7d74)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
accel/tcg/plugin-gen.c | 2 +-
include/tcg/tcg-temp-internal.h | 6 ++++++
tcg/tcg.c | 5 ++++-
3 files changed, 11 insertions(+), 2 deletions(-)
diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 0f47bfbb48..1ef075552c 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -275,7 +275,7 @@ static void plugin_gen_inject(struct qemu_plugin_tb *plugin_tb)
* that might be live within the existing opcode stream.
* The simplest solution is to release them all and create new.
*/
- memset(tcg_ctx->free_temps, 0, sizeof(tcg_ctx->free_temps));
+ tcg_temp_ebb_reset_freed(tcg_ctx);
QTAILQ_FOREACH_SAFE(op, &tcg_ctx->ops, link, next) {
switch (op->opc) {
diff --git a/include/tcg/tcg-temp-internal.h b/include/tcg/tcg-temp-internal.h
index 44192c55a9..98f91e68b7 100644
--- a/include/tcg/tcg-temp-internal.h
+++ b/include/tcg/tcg-temp-internal.h
@@ -42,4 +42,10 @@ TCGv_i64 tcg_temp_ebb_new_i64(void);
TCGv_ptr tcg_temp_ebb_new_ptr(void);
TCGv_i128 tcg_temp_ebb_new_i128(void);
+/* Forget all freed EBB temps, so that new allocations produce new temps. */
+static inline void tcg_temp_ebb_reset_freed(TCGContext *s)
+{
+ memset(s->free_temps, 0, sizeof(s->free_temps));
+}
+
#endif /* TCG_TEMP_FREE_H */
diff --git a/tcg/tcg.c b/tcg/tcg.c
index 0babae1b88..4578b185be 100644
--- a/tcg/tcg.c
+++ b/tcg/tcg.c
@@ -1489,7 +1489,7 @@ void tcg_func_start(TCGContext *s)
s->nb_temps = s->nb_globals;
/* No temps have been previously allocated for size or locality. */
- memset(s->free_temps, 0, sizeof(s->free_temps));
+ tcg_temp_ebb_reset_freed(s);
/* No constant temps have been previously allocated. */
for (int i = 0; i < TCG_TYPE_COUNT; ++i) {
@@ -6120,6 +6120,9 @@ int tcg_gen_code(TCGContext *s, TranslationBlock *tb, uint64_t pc_start)
}
#endif
+ /* Do not reuse any EBB that may be allocated within the TB. */
+ tcg_temp_ebb_reset_freed(s);
+
tcg_optimize(s);
reachable_code_pass(s);

View File

@@ -0,0 +1,149 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
Date: Thu, 12 Dec 2024 15:51:15 +0100
Subject: [PATCH] target/i386: Reset TSCs of parked vCPUs too on VM reset
Since commit 5286c3662294 ("target/i386: properly reset TSC on reset")
QEMU writes the special value of "1" to each online vCPU TSC on VM reset
to reset it.
However parked vCPUs don't get that handling and due to that their TSCs
get desynchronized when the VM gets reset.
This in turn causes KVM to turn off PVCLOCK_TSC_STABLE_BIT in its exported
PV clock.
Note that KVM has no understanding of vCPU being currently parked.
Without PVCLOCK_TSC_STABLE_BIT the sched clock is marked unstable in
the guest's kvm_sched_clock_init().
This causes a performance regressions to show in some tests.
Fix this issue by writing the special value of "1" also to TSCs of parked
vCPUs on VM reset.
Reproducing the issue:
1) Boot a VM with "-smp 2,maxcpus=3" or similar
2) device_add host-x86_64-cpu,id=vcpu,node-id=0,socket-id=0,core-id=2,thread-id=0
3) Wait a few seconds
4) device_del vcpu
5) Inside the VM run:
# echo "t" >/proc/sysrq-trigger; dmesg | grep sched_clock_stable
Observe the sched_clock_stable() value is 1.
6) Reboot the VM
7) Once the VM boots once again run inside it:
# echo "t" >/proc/sysrq-trigger; dmesg | grep sched_clock_stable
Observe the sched_clock_stable() value is now 0.
Fixes: 5286c3662294 ("target/i386: properly reset TSC on reset")
Signed-off-by: Maciej S. Szmigiero <maciej.szmigiero@oracle.com>
Link: https://lore.kernel.org/r/5a605a88e9a231386dc803c60f5fed9b48108139.1734014926.git.maciej.szmigiero@oracle.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit 3f2a05b31ee9ce2ddb6c75a9bc3f5e7f7af9a76f)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
accel/kvm/kvm-all.c | 11 +++++++++++
configs/targets/i386-softmmu.mak | 1 +
configs/targets/x86_64-softmmu.mak | 1 +
include/sysemu/kvm.h | 8 ++++++++
target/i386/kvm/kvm.c | 15 +++++++++++++++
5 files changed, 36 insertions(+)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 801cff16a5..dec1d1c16a 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -437,6 +437,16 @@ int kvm_unpark_vcpu(KVMState *s, unsigned long vcpu_id)
return kvm_fd;
}
+static void kvm_reset_parked_vcpus(void *param)
+{
+ KVMState *s = param;
+ struct KVMParkedVcpu *cpu;
+
+ QLIST_FOREACH(cpu, &s->kvm_parked_vcpus, node) {
+ kvm_arch_reset_parked_vcpu(cpu->vcpu_id, cpu->kvm_fd);
+ }
+}
+
int kvm_create_vcpu(CPUState *cpu)
{
unsigned long vcpu_id = kvm_arch_vcpu_id(cpu);
@@ -2728,6 +2738,7 @@ static int kvm_init(MachineState *ms)
}
qemu_register_reset(kvm_unpoison_all, NULL);
+ qemu_register_reset(kvm_reset_parked_vcpus, s);
if (s->kernel_irqchip_allowed) {
kvm_irqchip_create(s);
diff --git a/configs/targets/i386-softmmu.mak b/configs/targets/i386-softmmu.mak
index 2ac69d5ba3..2eb0e86250 100644
--- a/configs/targets/i386-softmmu.mak
+++ b/configs/targets/i386-softmmu.mak
@@ -1,4 +1,5 @@
TARGET_ARCH=i386
TARGET_SUPPORTS_MTTCG=y
TARGET_KVM_HAVE_GUEST_DEBUG=y
+TARGET_KVM_HAVE_RESET_PARKED_VCPU=y
TARGET_XML_FILES= gdb-xml/i386-32bit.xml
diff --git a/configs/targets/x86_64-softmmu.mak b/configs/targets/x86_64-softmmu.mak
index e12ac3dc59..920e9a4200 100644
--- a/configs/targets/x86_64-softmmu.mak
+++ b/configs/targets/x86_64-softmmu.mak
@@ -2,4 +2,5 @@ TARGET_ARCH=x86_64
TARGET_BASE_ARCH=i386
TARGET_SUPPORTS_MTTCG=y
TARGET_KVM_HAVE_GUEST_DEBUG=y
+TARGET_KVM_HAVE_RESET_PARKED_VCPU=y
TARGET_XML_FILES= gdb-xml/i386-64bit.xml
diff --git a/include/sysemu/kvm.h b/include/sysemu/kvm.h
index c3a60b2890..ab17c09a55 100644
--- a/include/sysemu/kvm.h
+++ b/include/sysemu/kvm.h
@@ -377,6 +377,14 @@ int kvm_arch_init(MachineState *ms, KVMState *s);
int kvm_arch_init_vcpu(CPUState *cpu);
int kvm_arch_destroy_vcpu(CPUState *cpu);
+#ifdef TARGET_KVM_HAVE_RESET_PARKED_VCPU
+void kvm_arch_reset_parked_vcpu(unsigned long vcpu_id, int kvm_fd);
+#else
+static inline void kvm_arch_reset_parked_vcpu(unsigned long vcpu_id, int kvm_fd)
+{
+}
+#endif
+
bool kvm_vcpu_id_is_valid(int vcpu_id);
/* Returns VCPU ID to be used on KVM_CREATE_VCPU ioctl() */
diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index 8e17942c3b..2ff618fbf1 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2415,6 +2415,21 @@ void kvm_arch_after_reset_vcpu(X86CPU *cpu)
}
}
+void kvm_arch_reset_parked_vcpu(unsigned long vcpu_id, int kvm_fd)
+{
+ g_autofree struct kvm_msrs *msrs = NULL;
+
+ msrs = g_malloc0(sizeof(*msrs) + sizeof(msrs->entries[0]));
+ msrs->entries[0].index = MSR_IA32_TSC;
+ msrs->entries[0].data = 1; /* match the value in x86_cpu_reset() */
+ msrs->nmsrs++;
+
+ if (ioctl(kvm_fd, KVM_SET_MSRS, msrs) != 1) {
+ warn_report("parked vCPU %lu TSC reset failed: %d",
+ vcpu_id, errno);
+ }
+}
+
void kvm_arch_do_init_vcpu(X86CPU *cpu)
{
CPUX86State *env = &cpu->env;

View File

@@ -1,143 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Thu, 4 Feb 2021 18:34:35 +0000
Subject: [PATCH] virtiofsd: extract lo_do_open() from lo_open()
Both lo_open() and lo_create() have similar code to open a file. Extract
a common lo_do_open() function from lo_open() that will be used by
lo_create() in a later commit.
Since lo_do_open() does not otherwise need fuse_req_t req, convert
lo_add_fd_mapping() to use struct lo_data *lo instead.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210204150208.367837-2-stefanha@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
tools/virtiofsd/passthrough_ll.c | 73 ++++++++++++++++++++------------
1 file changed, 46 insertions(+), 27 deletions(-)
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 97485b22b4..218e20e9d7 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -471,17 +471,17 @@ static void lo_map_remove(struct lo_map *map, size_t key)
}
/* Assumes lo->mutex is held */
-static ssize_t lo_add_fd_mapping(fuse_req_t req, int fd)
+static ssize_t lo_add_fd_mapping(struct lo_data *lo, int fd)
{
struct lo_map_elem *elem;
- elem = lo_map_alloc_elem(&lo_data(req)->fd_map);
+ elem = lo_map_alloc_elem(&lo->fd_map);
if (!elem) {
return -1;
}
elem->fd = fd;
- return elem - lo_data(req)->fd_map.elems;
+ return elem - lo->fd_map.elems;
}
/* Assumes lo->mutex is held */
@@ -1661,6 +1661,38 @@ static void update_open_flags(int writeback, int allow_direct_io,
}
}
+static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
+ struct fuse_file_info *fi)
+{
+ char buf[64];
+ ssize_t fh;
+ int fd;
+
+ update_open_flags(lo->writeback, lo->allow_direct_io, fi);
+
+ sprintf(buf, "%i", inode->fd);
+ fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
+ if (fd == -1) {
+ return errno;
+ }
+
+ pthread_mutex_lock(&lo->mutex);
+ fh = lo_add_fd_mapping(lo, fd);
+ pthread_mutex_unlock(&lo->mutex);
+ if (fh == -1) {
+ close(fd);
+ return ENOMEM;
+ }
+
+ fi->fh = fh;
+ if (lo->cache == CACHE_NONE) {
+ fi->direct_io = 1;
+ } else if (lo->cache == CACHE_ALWAYS) {
+ fi->keep_cache = 1;
+ }
+ return 0;
+}
+
static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
mode_t mode, struct fuse_file_info *fi)
{
@@ -1701,7 +1733,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
ssize_t fh;
pthread_mutex_lock(&lo->mutex);
- fh = lo_add_fd_mapping(req, fd);
+ fh = lo_add_fd_mapping(lo, fd);
pthread_mutex_unlock(&lo->mutex);
if (fh == -1) {
close(fd);
@@ -1892,38 +1924,25 @@ static void lo_fsyncdir(fuse_req_t req, fuse_ino_t ino, int datasync,
static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
{
- int fd;
- ssize_t fh;
- char buf[64];
struct lo_data *lo = lo_data(req);
+ struct lo_inode *inode = lo_inode(req, ino);
+ int err;
fuse_log(FUSE_LOG_DEBUG, "lo_open(ino=%" PRIu64 ", flags=%d)\n", ino,
fi->flags);
- update_open_flags(lo->writeback, lo->allow_direct_io, fi);
-
- sprintf(buf, "%i", lo_fd(req, ino));
- fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
- if (fd == -1) {
- return (void)fuse_reply_err(req, errno);
- }
-
- pthread_mutex_lock(&lo->mutex);
- fh = lo_add_fd_mapping(req, fd);
- pthread_mutex_unlock(&lo->mutex);
- if (fh == -1) {
- close(fd);
- fuse_reply_err(req, ENOMEM);
+ if (!inode) {
+ fuse_reply_err(req, EBADF);
return;
}
- fi->fh = fh;
- if (lo->cache == CACHE_NONE) {
- fi->direct_io = 1;
- } else if (lo->cache == CACHE_ALWAYS) {
- fi->keep_cache = 1;
+ err = lo_do_open(lo, inode, fi);
+ lo_inode_put(lo, &inode);
+ if (err) {
+ fuse_reply_err(req, err);
+ } else {
+ fuse_reply_open(req, fi);
}
- fuse_reply_open(req, fi);
}
static void lo_release(fuse_req_t req, fuse_ino_t ino,

View File

@@ -0,0 +1,41 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Zhao Liu <zhao1.liu@intel.com>
Date: Wed, 6 Nov 2024 11:07:18 +0800
Subject: [PATCH] i386/cpu: Mark avx10_version filtered when prefix is NULL
In x86_cpu_filter_features(), if host doesn't support AVX10, the
configured avx10_version should be marked as filtered regardless of
whether prefix is NULL or not.
Check prefix before warn_report() instead of checking for
have_filtered_features.
Cc: qemu-stable@nongnu.org
Fixes: commit bccfb846fd52 ("target/i386: add AVX10 feature and AVX10 version property")
Signed-off-by: Zhao Liu <zhao1.liu@intel.com>
Reviewed-by: Tao Su <tao1.su@linux.intel.com>
Link: https://lore.kernel.org/r/20241106030728.553238-2-zhao1.liu@intel.com
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
(cherry picked from commit cf4c263551886964c5d58bd7b675b13fd497b402)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
target/i386/cpu.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index 3725dbbc4b..1981aeaba5 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -7718,8 +7718,10 @@ static bool x86_cpu_filter_features(X86CPU *cpu, bool verbose)
env->avx10_version = version;
have_filtered_features = true;
}
- } else if (env->avx10_version && prefix) {
- warn_report("%s: avx10.%d.", prefix, env->avx10_version);
+ } else if (env->avx10_version) {
+ if (prefix) {
+ warn_report("%s: avx10.%d.", prefix, env->avx10_version);
+ }
have_filtered_features = true;
}

View File

@@ -1,107 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Thu, 4 Feb 2021 18:34:36 +0000
Subject: [PATCH] virtiofsd: optionally return inode pointer from
lo_do_lookup()
lo_do_lookup() finds an existing inode or allocates a new one. It
increments nlookup so that the inode stays alive until the client
releases it.
Existing callers don't need the struct lo_inode so the function doesn't
return it. Extend the function to optionally return the inode. The next
commit will need it.
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210204150208.367837-3-stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
tools/virtiofsd/passthrough_ll.c | 29 +++++++++++++++++++++--------
1 file changed, 21 insertions(+), 8 deletions(-)
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 218e20e9d7..2bd050b620 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -843,11 +843,13 @@ static int do_statx(struct lo_data *lo, int dirfd, const char *pathname,
}
/*
- * Increments nlookup and caller must release refcount using
- * lo_inode_put(&parent).
+ * Increments nlookup on the inode on success. unref_inode_lolocked() must be
+ * called eventually to decrement nlookup again. If inodep is non-NULL, the
+ * inode pointer is stored and the caller must call lo_inode_put().
*/
static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
- struct fuse_entry_param *e)
+ struct fuse_entry_param *e,
+ struct lo_inode **inodep)
{
int newfd;
int res;
@@ -857,6 +859,10 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
struct lo_inode *inode = NULL;
struct lo_inode *dir = lo_inode(req, parent);
+ if (inodep) {
+ *inodep = NULL;
+ }
+
/*
* name_to_handle_at() and open_by_handle_at() can reach here with fuse
* mount point in guest, but we don't have its inode info in the
@@ -924,7 +930,14 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
pthread_mutex_unlock(&lo->mutex);
}
e->ino = inode->fuse_ino;
- lo_inode_put(lo, &inode);
+
+ /* Transfer ownership of inode pointer to caller or drop it */
+ if (inodep) {
+ *inodep = inode;
+ } else {
+ lo_inode_put(lo, &inode);
+ }
+
lo_inode_put(lo, &dir);
fuse_log(FUSE_LOG_DEBUG, " %lli/%s -> %lli\n", (unsigned long long)parent,
@@ -959,7 +972,7 @@ static void lo_lookup(fuse_req_t req, fuse_ino_t parent, const char *name)
return;
}
- err = lo_do_lookup(req, parent, name, &e);
+ err = lo_do_lookup(req, parent, name, &e, NULL);
if (err) {
fuse_reply_err(req, err);
} else {
@@ -1067,7 +1080,7 @@ static void lo_mknod_symlink(fuse_req_t req, fuse_ino_t parent,
goto out;
}
- saverr = lo_do_lookup(req, parent, name, &e);
+ saverr = lo_do_lookup(req, parent, name, &e, NULL);
if (saverr) {
goto out;
}
@@ -1544,7 +1557,7 @@ static void lo_do_readdir(fuse_req_t req, fuse_ino_t ino, size_t size,
if (plus) {
if (!is_dot_or_dotdot(name)) {
- err = lo_do_lookup(req, ino, name, &e);
+ err = lo_do_lookup(req, ino, name, &e, NULL);
if (err) {
goto error;
}
@@ -1742,7 +1755,7 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
}
fi->fh = fh;
- err = lo_do_lookup(req, parent, name, &e);
+ err = lo_do_lookup(req, parent, name, &e, NULL);
}
if (lo->cache == CACHE_NONE) {
fi->direct_io = 1;

View File

@@ -0,0 +1,67 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Laurent Vivier <lvivier@redhat.com>
Date: Fri, 17 Jan 2025 12:17:08 +0100
Subject: [PATCH] net: Fix announce_self
b9ad513e1876 ("net: Remove receive_raw()") adds an iovec entry
in qemu_deliver_packet_iov() to add the virtio-net header
in the data when QEMU_NET_PACKET_FLAG_RAW is set but forgets
to increase the number of iovec entries in the array, so
receive_iov() will only send the first entry (the virtio-net
entry, full of 0) and no data. The packet will be discarded.
The only user of QEMU_NET_PACKET_FLAG_RAW is announce_self.
We can see the problem with tcpdump:
- QEMU parameters:
.. -monitor stdio \
-netdev bridge,id=netdev0,br=virbr0 \
-device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0 \
- HMP command:
(qemu) announce_self
- TCP dump:
$ sudo tcpdump -nxi virbr0
without the fix:
<nothing>
with the fix:
ARP, Reverse Request who-is 9a:2b:2c:2d:2e:2f tell 9a:2b:2c:2d:2e:2f, length 46
0x0000: 0001 0800 0604 0003 9a2b 2c2d 2e2f 0000
0x0010: 0000 9a2b 2c2d 2e2f 0000 0000 0000 0000
0x0020: 0000 0000 0000 0000 0000 0000 0000
Reported-by: Xiaohui Li <xiaohli@redhat.com>
Bug: https://issues.redhat.com/browse/RHEL-73891
Fixes: b9ad513e1876 ("net: Remove receive_raw()")
Cc: akihiko.odaki@daynix.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
(picked from https://lore.kernel.org/qemu-devel/20250117111709.970789-2-lvivier@redhat.com/)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
net/net.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/net/net.c b/net/net.c
index 7ef6885876..fefa701bb2 100644
--- a/net/net.c
+++ b/net/net.c
@@ -822,6 +822,7 @@ static ssize_t qemu_deliver_packet_iov(NetClientState *sender,
iov_copy[0].iov_len = nc->vnet_hdr_len;
memcpy(&iov_copy[1], iov, iovcnt * sizeof(*iov));
iov = iov_copy;
+ iovcnt++;
}
if (nc->info->receive_iov) {

View File

@@ -1,296 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Hajnoczi <stefanha@redhat.com>
Date: Thu, 4 Feb 2021 18:34:37 +0000
Subject: [PATCH] virtiofsd: prevent opening of special files (CVE-2020-35517)
A well-behaved FUSE client does not attempt to open special files with
FUSE_OPEN because they are handled on the client side (e.g. device nodes
are handled by client-side device drivers).
The check to prevent virtiofsd from opening special files is missing in
a few cases, most notably FUSE_OPEN. A malicious client can cause
virtiofsd to open a device node, potentially allowing the guest to
escape. This can be exploited by a modified guest device driver. It is
not exploitable from guest userspace since the guest kernel will handle
special files inside the guest instead of sending FUSE requests.
This patch fixes this issue by introducing the lo_inode_open() function
to check the file type before opening it. This is a short-term solution
because it does not prevent a compromised virtiofsd process from opening
device nodes on the host.
Restructure lo_create() to try O_CREAT | O_EXCL first. Note that O_CREAT
| O_EXCL does not follow symlinks, so O_NOFOLLOW masking is not
necessary here. If the file exists and the user did not specify O_EXCL,
open it via lo_do_open().
Reported-by: Alex Xu <alex@alxu.ca>
Fixes: CVE-2020-35517
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Vivek Goyal <vgoyal@redhat.com>
Reviewed-by: Greg Kurz <groug@kaod.org>
Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Message-Id: <20210204150208.367837-4-stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
tools/virtiofsd/passthrough_ll.c | 144 ++++++++++++++++++++-----------
1 file changed, 92 insertions(+), 52 deletions(-)
diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
index 2bd050b620..03c5e0d13c 100644
--- a/tools/virtiofsd/passthrough_ll.c
+++ b/tools/virtiofsd/passthrough_ll.c
@@ -567,6 +567,38 @@ static int lo_fd(fuse_req_t req, fuse_ino_t ino)
return fd;
}
+/*
+ * Open a file descriptor for an inode. Returns -EBADF if the inode is not a
+ * regular file or a directory.
+ *
+ * Use this helper function instead of raw openat(2) to prevent security issues
+ * when a malicious client opens special files such as block device nodes.
+ * Symlink inodes are also rejected since symlinks must already have been
+ * traversed on the client side.
+ */
+static int lo_inode_open(struct lo_data *lo, struct lo_inode *inode,
+ int open_flags)
+{
+ g_autofree char *fd_str = g_strdup_printf("%d", inode->fd);
+ int fd;
+
+ if (!S_ISREG(inode->filetype) && !S_ISDIR(inode->filetype)) {
+ return -EBADF;
+ }
+
+ /*
+ * The file is a symlink so O_NOFOLLOW must be ignored. We checked earlier
+ * that the inode is not a special file but if an external process races
+ * with us then symlinks are traversed here. It is not possible to escape
+ * the shared directory since it is mounted as "/" though.
+ */
+ fd = openat(lo->proc_self_fd, fd_str, open_flags & ~O_NOFOLLOW);
+ if (fd < 0) {
+ return -errno;
+ }
+ return fd;
+}
+
static void lo_init(void *userdata, struct fuse_conn_info *conn)
{
struct lo_data *lo = (struct lo_data *)userdata;
@@ -696,9 +728,9 @@ static void lo_setattr(fuse_req_t req, fuse_ino_t ino, struct stat *attr,
if (fi) {
truncfd = fd;
} else {
- sprintf(procname, "%i", ifd);
- truncfd = openat(lo->proc_self_fd, procname, O_RDWR);
+ truncfd = lo_inode_open(lo, inode, O_RDWR);
if (truncfd < 0) {
+ errno = -truncfd;
goto out_err;
}
}
@@ -860,7 +892,7 @@ static int lo_do_lookup(fuse_req_t req, fuse_ino_t parent, const char *name,
struct lo_inode *dir = lo_inode(req, parent);
if (inodep) {
- *inodep = NULL;
+ *inodep = NULL; /* in case there is an error */
}
/*
@@ -1674,19 +1706,26 @@ static void update_open_flags(int writeback, int allow_direct_io,
}
}
+/*
+ * Open a regular file, set up an fd mapping, and fill out the struct
+ * fuse_file_info for it. If existing_fd is not negative, use that fd instead
+ * opening a new one. Takes ownership of existing_fd.
+ *
+ * Returns 0 on success or a positive errno.
+ */
static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
- struct fuse_file_info *fi)
+ int existing_fd, struct fuse_file_info *fi)
{
- char buf[64];
ssize_t fh;
- int fd;
+ int fd = existing_fd;
update_open_flags(lo->writeback, lo->allow_direct_io, fi);
- sprintf(buf, "%i", inode->fd);
- fd = openat(lo->proc_self_fd, buf, fi->flags & ~O_NOFOLLOW);
- if (fd == -1) {
- return errno;
+ if (fd < 0) {
+ fd = lo_inode_open(lo, inode, fi->flags);
+ if (fd < 0) {
+ return -fd;
+ }
}
pthread_mutex_lock(&lo->mutex);
@@ -1709,9 +1748,10 @@ static int lo_do_open(struct lo_data *lo, struct lo_inode *inode,
static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
mode_t mode, struct fuse_file_info *fi)
{
- int fd;
+ int fd = -1;
struct lo_data *lo = lo_data(req);
struct lo_inode *parent_inode;
+ struct lo_inode *inode = NULL;
struct fuse_entry_param e;
int err;
struct lo_cred old = {};
@@ -1737,36 +1777,38 @@ static void lo_create(fuse_req_t req, fuse_ino_t parent, const char *name,
update_open_flags(lo->writeback, lo->allow_direct_io, fi);
- fd = openat(parent_inode->fd, name, (fi->flags | O_CREAT) & ~O_NOFOLLOW,
- mode);
+ /* Try to create a new file but don't open existing files */
+ fd = openat(parent_inode->fd, name, fi->flags | O_CREAT | O_EXCL, mode);
err = fd == -1 ? errno : 0;
- lo_restore_cred(&old);
- if (!err) {
- ssize_t fh;
+ lo_restore_cred(&old);
- pthread_mutex_lock(&lo->mutex);
- fh = lo_add_fd_mapping(lo, fd);
- pthread_mutex_unlock(&lo->mutex);
- if (fh == -1) {
- close(fd);
- err = ENOMEM;
- goto out;
- }
+ /* Ignore the error if file exists and O_EXCL was not given */
+ if (err && (err != EEXIST || (fi->flags & O_EXCL))) {
+ goto out;
+ }
- fi->fh = fh;
- err = lo_do_lookup(req, parent, name, &e, NULL);
+ err = lo_do_lookup(req, parent, name, &e, &inode);
+ if (err) {
+ goto out;
}
- if (lo->cache == CACHE_NONE) {
- fi->direct_io = 1;
- } else if (lo->cache == CACHE_ALWAYS) {
- fi->keep_cache = 1;
+
+ err = lo_do_open(lo, inode, fd, fi);
+ fd = -1; /* lo_do_open() takes ownership of fd */
+ if (err) {
+ /* Undo lo_do_lookup() nlookup ref */
+ unref_inode_lolocked(lo, inode, 1);
}
out:
+ lo_inode_put(lo, &inode);
lo_inode_put(lo, &parent_inode);
if (err) {
+ if (fd >= 0) {
+ close(fd);
+ }
+
fuse_reply_err(req, err);
} else {
fuse_reply_create(req, &e, fi);
@@ -1780,7 +1822,6 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
pid_t pid, int *err)
{
struct lo_inode_plock *plock;
- char procname[64];
int fd;
plock =
@@ -1797,12 +1838,10 @@ static struct lo_inode_plock *lookup_create_plock_ctx(struct lo_data *lo,
}
/* Open another instance of file which can be used for ofd locks. */
- sprintf(procname, "%i", inode->fd);
-
/* TODO: What if file is not writable? */
- fd = openat(lo->proc_self_fd, procname, O_RDWR);
- if (fd == -1) {
- *err = errno;
+ fd = lo_inode_open(lo, inode, O_RDWR);
+ if (fd < 0) {
+ *err = -fd;
free(plock);
return NULL;
}
@@ -1949,7 +1988,7 @@ static void lo_open(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
return;
}
- err = lo_do_open(lo, inode, fi);
+ err = lo_do_open(lo, inode, -1, fi);
lo_inode_put(lo, &inode);
if (err) {
fuse_reply_err(req, err);
@@ -2005,39 +2044,40 @@ static void lo_flush(fuse_req_t req, fuse_ino_t ino, struct fuse_file_info *fi)
static void lo_fsync(fuse_req_t req, fuse_ino_t ino, int datasync,
struct fuse_file_info *fi)
{
+ struct lo_inode *inode = lo_inode(req, ino);
+ struct lo_data *lo = lo_data(req);
int res;
int fd;
- char *buf;
fuse_log(FUSE_LOG_DEBUG, "lo_fsync(ino=%" PRIu64 ", fi=0x%p)\n", ino,
(void *)fi);
- if (!fi) {
- struct lo_data *lo = lo_data(req);
-
- res = asprintf(&buf, "%i", lo_fd(req, ino));
- if (res == -1) {
- return (void)fuse_reply_err(req, errno);
- }
+ if (!inode) {
+ fuse_reply_err(req, EBADF);
+ return;
+ }
- fd = openat(lo->proc_self_fd, buf, O_RDWR);
- free(buf);
- if (fd == -1) {
- return (void)fuse_reply_err(req, errno);
+ if (!fi) {
+ fd = lo_inode_open(lo, inode, O_RDWR);
+ if (fd < 0) {
+ res = -fd;
+ goto out;
}
} else {
fd = lo_fi_fd(req, fi);
}
if (datasync) {
- res = fdatasync(fd);
+ res = fdatasync(fd) == -1 ? errno : 0;
} else {
- res = fsync(fd);
+ res = fsync(fd) == -1 ? errno : 0;
}
if (!fi) {
close(fd);
}
- fuse_reply_err(req, res == -1 ? errno : 0);
+out:
+ lo_inode_put(lo, &inode);
+ fuse_reply_err(req, res);
}
static void lo_read(fuse_req_t req, fuse_ino_t ino, size_t size, off_t offset,

View File

@@ -0,0 +1,67 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Laurent Vivier <lvivier@redhat.com>
Date: Fri, 17 Jan 2025 12:17:09 +0100
Subject: [PATCH] net/dump: Correctly compute Ethernet packet offset
When a packet is sent with QEMU_NET_PACKET_FLAG_RAW by QEMU it
never includes virtio-net header even if qemu_get_vnet_hdr_len()
is not 0, and filter-dump is not managing this case.
The only user of QEMU_NET_PACKET_FLAG_RAW is announce_self,
we can show the problem using it and tcpddump:
- QEMU parameters:
.. -monitor stdio \
-netdev bridge,id=netdev0,br=virbr0 \
-device virtio-net,mac=9a:2b:2c:2d:2e:2f,netdev=netdev0 \
-object filter-dump,netdev=netdev0,file=log.pcap,id=pcap0
- HMP command:
(qemu) announce_self
- TCP dump:
$ tcpdump -nxr log.pcap
without the fix:
08:00:06:04:00:03 > 2e:2f:80:35:00:01, ethertype Unknown (0x9a2b), length 50:
0x0000: 2c2d 2e2f 0000 0000 9a2b 2c2d 2e2f 0000
0x0010: 0000 0000 0000 0000 0000 0000 0000 0000
0x0020: 0000 0000
with the fix:
ARP, Reverse Request who-is 9a:2b:2c:2d:2e:2f tell 9a:2b:2c:2d:2e:2f, length 46
0x0000: 0001 0800 0604 0003 9a2b 2c2d 2e2f 0000
0x0010: 0000 9a2b 2c2d 2e2f 0000 0000 0000 0000
0x0020: 0000 0000 0000 0000 0000 0000 0000
Fixes: 481c52320a26 ("net: Strip virtio-net header when dumping")
Cc: akihiko.odaki@daynix.com
Signed-off-by: Laurent Vivier <lvivier@redhat.com>
Reviewed-by: Akihiko Odaki <akihiko.odaki@daynix.com>
Acked-by: Jason Wang <jasowang@redhat.com>
Reviewed-by: Michael Tokarev <mjt@tls.msk.ru>
(picked from https://lore.kernel.org/qemu-devel/20250117111709.970789-3-lvivier@redhat.com/)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
net/dump.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/net/dump.c b/net/dump.c
index 956e34a123..42ab8d7716 100644
--- a/net/dump.c
+++ b/net/dump.c
@@ -155,7 +155,8 @@ static ssize_t filter_dump_receive_iov(NetFilterState *nf, NetClientState *sndr,
{
NetFilterDumpState *nfds = FILTER_DUMP(nf);
- dump_receive_iov(&nfds->ds, iov, iovcnt, qemu_get_vnet_hdr_len(nf->netdev));
+ dump_receive_iov(&nfds->ds, iov, iovcnt, flags & QEMU_NET_PACKET_FLAG_RAW ?
+ 0 : qemu_get_vnet_hdr_len(nf->netdev));
return 0;
}

View File

@@ -1,29 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Greg Kurz <groug@kaod.org>
Date: Thu, 4 Feb 2021 18:34:38 +0000
Subject: [PATCH] virtiofsd: Add _llseek to the seccomp whitelist
This is how glibc implements lseek(2) on POWER.
BugLink: https://bugzilla.redhat.com/show_bug.cgi?id=1917692
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210121171540.1449777-1-groug@kaod.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
tools/virtiofsd/passthrough_seccomp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index 11623f56f2..bb8ef5b17f 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -68,6 +68,7 @@ static const int syscall_whitelist[] = {
SCMP_SYS(linkat),
SCMP_SYS(listxattr),
SCMP_SYS(lseek),
+ SCMP_SYS(_llseek), /* For POWER */
SCMP_SYS(madvise),
SCMP_SYS(mkdirat),
SCMP_SYS(mknodat),

View File

@@ -0,0 +1,96 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Igor Mammedov <imammedo@redhat.com>
Date: Wed, 15 Jan 2025 13:53:41 +0100
Subject: [PATCH] pci: acpi: Windows 'PCI Label Id' bug workaround
Current versions of Windows call _DSM(func=7) regardless
of whether it is supported or not. It leads to NICs having bogus
'PCI Label Id = 0', where none should be set at all.
Also presence of 'PCI Label Id' triggers another Windows bug
on localized versions that leads to hangs. The later bug is fixed
in latest updates for 'Windows Server' but not in consumer
versions of Windows (and there is no plans to fix it
as far as I'm aware).
Given it's easy, implement Microsoft suggested workaround
(return invalid Package) so that affected Windows versions
could boot on QEMU.
This would effectvely remove bogus 'PCI Label Id's on NICs,
but MS teem confirmed that flipping 'PCI Label Id' should not
change 'Network Connection' ennumeration, so it should be safe
for QEMU to change _DSM without any compat code.
Smoke tested with WinXP and WS2022
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/774
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20250115125342.3883374-3-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 0b053391985abcc40b16ac8fc4a7f6588d1d95c1)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/i386/acpi-build.c | 33 +++++++++++++++++++++++----------
1 file changed, 23 insertions(+), 10 deletions(-)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 9fcc2897b8..f7b961e04c 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -654,6 +654,7 @@ static Aml *aml_pci_pdsm(void)
Aml *acpi_index = aml_local(2);
Aml *zero = aml_int(0);
Aml *one = aml_int(1);
+ Aml *not_supp = aml_int(0xFFFFFFFF);
Aml *func = aml_arg(2);
Aml *params = aml_arg(4);
Aml *bnum = aml_derefof(aml_index(params, aml_int(0)));
@@ -678,7 +679,7 @@ static Aml *aml_pci_pdsm(void)
*/
ifctx1 = aml_if(aml_lnot(
aml_or(aml_equal(acpi_index, zero),
- aml_equal(acpi_index, aml_int(0xFFFFFFFF)), NULL)
+ aml_equal(acpi_index, not_supp), NULL)
));
{
/* have supported functions */
@@ -704,18 +705,30 @@ static Aml *aml_pci_pdsm(void)
{
Aml *pkg = aml_package(2);
- aml_append(pkg, zero);
- /*
- * optional, if not impl. should return null string
- */
- aml_append(pkg, aml_string("%s", ""));
- aml_append(ifctx, aml_store(pkg, ret));
-
aml_append(ifctx, aml_store(aml_call2("AIDX", bnum, sunum), acpi_index));
+ aml_append(ifctx, aml_store(pkg, ret));
/*
- * update acpi-index to actual value
+ * Windows calls func=7 without checking if it's available,
+ * as workaround Microsoft has suggested to return invalid for func7
+ * Package, so return 2 elements package but only initialize elements
+ * when acpi_index is supported and leave them uninitialized, which
+ * leads elements to being Uninitialized ObjectType and should trip
+ * Windows into discarding result as an unexpected and prevent setting
+ * bogus 'PCI Label' on the device.
*/
- aml_append(ifctx, aml_store(acpi_index, aml_index(ret, zero)));
+ ifctx1 = aml_if(aml_lnot(aml_lor(
+ aml_equal(acpi_index, zero), aml_equal(acpi_index, not_supp)
+ )));
+ {
+ aml_append(ifctx1, aml_store(acpi_index, aml_index(ret, zero)));
+ /*
+ * optional, if not impl. should return null string
+ */
+ aml_append(ifctx1, aml_store(aml_string("%s", ""),
+ aml_index(ret, one)));
+ }
+ aml_append(ifctx, ifctx1);
+
aml_append(ifctx, aml_return(ret));
}

View File

@@ -1,31 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Greg Kurz <groug@kaod.org>
Date: Thu, 4 Feb 2021 18:34:39 +0000
Subject: [PATCH] virtiofsd: Add restart_syscall to the seccomp whitelist
This is how linux restarts some system calls after SIGSTOP/SIGCONT.
This is needed to avoid virtiofsd termination when resuming execution
under GDB for example.
Signed-off-by: Greg Kurz <groug@kaod.org>
Message-Id: <20210201193305.136390-1-groug@kaod.org>
Reviewed-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Dr. David Alan Gilbert <dgilbert@redhat.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
tools/virtiofsd/passthrough_seccomp.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tools/virtiofsd/passthrough_seccomp.c b/tools/virtiofsd/passthrough_seccomp.c
index bb8ef5b17f..44d75e0e36 100644
--- a/tools/virtiofsd/passthrough_seccomp.c
+++ b/tools/virtiofsd/passthrough_seccomp.c
@@ -92,6 +92,7 @@ static const int syscall_whitelist[] = {
SCMP_SYS(renameat),
SCMP_SYS(renameat2),
SCMP_SYS(removexattr),
+ SCMP_SYS(restart_syscall),
SCMP_SYS(rt_sigaction),
SCMP_SYS(rt_sigprocmask),
SCMP_SYS(rt_sigreturn),

View File

@@ -0,0 +1,53 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Phil Dennis-Jordan <phil@philjordan.eu>
Date: Fri, 13 Dec 2024 17:06:14 +0100
Subject: [PATCH] hw/usb/hcd-xhci-pci: Use modulo to select MSI vector as per
spec
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
QEMU would crash with a failed assertion if the XHCI controller
attempted to raise the interrupt on an interrupter corresponding
to a MSI vector with a higher index than the highest configured
for the device by the guest driver.
This behaviour is correct on the MSI/PCI side: per PCI 3.0 spec,
devices must ensure they do not send MSI notifications for
vectors beyond the range of those allocated by the system/driver
software. Unlike MSI-X, there is no generic way for handling
aliasing in the case of fewer allocated vectors than requested,
so the specifics are up to device implementors. (Section
6.8.3.4. "Sending Messages")
It turns out the XHCI spec (Implementation Note in section 4.17,
"Interrupters") requires that the host controller signal the MSI
vector with the number computed by taking the interrupter number
modulo the number of enabled MSI vectors.
This change introduces that modulo calculation, fixing the
failed assertion. This makes the device work correctly in MSI mode
with macOS's XHCI driver, which only allocates a single vector.
Signed-off-by: Phil Dennis-Jordan <phil@philjordan.eu>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Message-ID: <20250112210056.16658-2-phil@philjordan.eu>
Signed-off-by: Philippe Mathieu-Daudé <philmd@linaro.org>
(cherry picked from commit bb5b7fced6b5d3334ab20702fc846e47bb1fb731)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/usb/hcd-xhci-pci.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/hw/usb/hcd-xhci-pci.c b/hw/usb/hcd-xhci-pci.c
index a039f5778a..516e6909d2 100644
--- a/hw/usb/hcd-xhci-pci.c
+++ b/hw/usb/hcd-xhci-pci.c
@@ -74,6 +74,7 @@ static bool xhci_pci_intr_raise(XHCIState *xhci, int n, bool level)
}
if (msi_enabled(pci_dev) && level) {
+ n %= msi_nr_vectors_allocated(pci_dev);
msi_notify(pci_dev, n);
return true;
}

View File

@@ -1,108 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Vitaly Cheptsov <cheptsov@ispras.ru>
Date: Tue, 2 Mar 2021 09:21:10 -0500
Subject: [PATCH] i386/acpi: restore device paths for pre-5.1 vms
After fixing the _UID value for the primary PCI root bridge in
af1b80ae it was discovered that this change updates Windows
configuration in an incompatible way causing network configuration
failure unless DHCP is used. More details provided on the list:
https://lists.gnu.org/archive/html/qemu-devel/2021-02/msg08484.html
This change reverts the _UID update from 1 to 0 for q35 and i440fx
VMs before version 5.2 to maintain the original behaviour when
upgrading.
Cc: qemu-stable@nongnu.org
Cc: qemu-devel@nongnu.org
Reported-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Suggested-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Vitaly Cheptsov <cheptsov@ispras.ru>
Message-Id: <20210301195919.9333-1-cheptsov@ispras.ru>
Tested-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Reviewed-by: Igor Mammedov <imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Fixes: af1b80ae56c9 ("i386/acpi: fix inconsistent QEMU/OVMF device paths")
---
hw/i386/acpi-build.c | 4 ++--
hw/i386/pc_piix.c | 2 ++
hw/i386/pc_q35.c | 2 ++
include/hw/i386/pc.h | 1 +
4 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index 1f5c211245..b5616582a5 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1513,7 +1513,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
dev = aml_device("PCI0");
aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A03")));
aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
- aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+ aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
aml_append(sb_scope, dev);
aml_append(dsdt, sb_scope);
@@ -1530,7 +1530,7 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
aml_append(dev, aml_name_decl("_HID", aml_eisaid("PNP0A08")));
aml_append(dev, aml_name_decl("_CID", aml_eisaid("PNP0A03")));
aml_append(dev, aml_name_decl("_ADR", aml_int(0)));
- aml_append(dev, aml_name_decl("_UID", aml_int(0)));
+ aml_append(dev, aml_name_decl("_UID", aml_int(pcmc->pci_root_uid)));
aml_append(dev, build_q35_osc_method());
aml_append(sb_scope, dev);
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 13d1628f13..2524c96216 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -417,6 +417,7 @@ static void pc_i440fx_machine_options(MachineClass *m)
{
PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
pcmc->default_nic_model = "e1000";
+ pcmc->pci_root_uid = 0;
m->family = "pc_piix";
m->desc = "Standard PC (i440FX + PIIX, 1996)";
@@ -448,6 +449,7 @@ static void pc_i440fx_5_1_machine_options(MachineClass *m)
compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
pcmc->kvmclock_create_always = false;
+ pcmc->pci_root_uid = 1;
}
DEFINE_I440FX_MACHINE(v5_1, "pc-i440fx-5.1", NULL,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index a3f4959c43..c58dad5ae3 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -329,6 +329,7 @@ static void pc_q35_machine_options(MachineClass *m)
{
PCMachineClass *pcmc = PC_MACHINE_CLASS(m);
pcmc->default_nic_model = "e1000e";
+ pcmc->pci_root_uid = 0;
m->family = "pc_q35";
m->desc = "Standard PC (Q35 + ICH9, 2009)";
@@ -364,6 +365,7 @@ static void pc_q35_5_1_machine_options(MachineClass *m)
compat_props_add(m->compat_props, hw_compat_5_1, hw_compat_5_1_len);
compat_props_add(m->compat_props, pc_compat_5_1, pc_compat_5_1_len);
pcmc->kvmclock_create_always = false;
+ pcmc->pci_root_uid = 1;
}
DEFINE_Q35_MACHINE(v5_1, "pc-q35-5.1", NULL,
diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 911e460097..7f8e1a791f 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -99,6 +99,7 @@ struct PCMachineClass {
int legacy_acpi_table_size;
unsigned acpi_data_size;
bool do_not_add_smb_acpi;
+ int pci_root_uid;
/* SMBIOS compat: */
bool smbios_defaults;

View File

@@ -1,48 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Mon, 22 Mar 2021 15:20:04 +0100
Subject: [PATCH] monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
The QMP dispatcher coroutine holds the qmp_queue_lock over a yield
point, where it expects to be rescheduled from the main context. If a
CHR_EVENT_CLOSED event is received just then, it can race and block the
main thread on the mutex in monitor_qmp_cleanup_queue_and_resume.
Calculate need_resume immediately after we pop a request from the queue,
so that we can release the mutex before yielding.
Suggested-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
monitor/qmp.c | 11 ++++++-----
1 file changed, 6 insertions(+), 5 deletions(-)
diff --git a/monitor/qmp.c b/monitor/qmp.c
index 2e37d11bd3..2aff833f7a 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -252,6 +252,12 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
}
}
+ mon = req_obj->mon;
+ /* qmp_oob_enabled() might change after "qmp_capabilities" */
+ need_resume = !qmp_oob_enabled(mon) ||
+ mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
+ qemu_mutex_unlock(&mon->qmp_queue_lock);
+
if (qatomic_xchg(&qmp_dispatcher_co_busy, true) == true) {
/*
* Someone rescheduled us (probably because a new requests
@@ -270,11 +276,6 @@ void coroutine_fn monitor_qmp_dispatcher_co(void *data)
aio_co_schedule(qemu_get_aio_context(), qmp_dispatcher_co);
qemu_coroutine_yield();
- mon = req_obj->mon;
- /* qmp_oob_enabled() might change after "qmp_capabilities" */
- need_resume = !qmp_oob_enabled(mon) ||
- mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;
- qemu_mutex_unlock(&mon->qmp_queue_lock);
if (req_obj->req) {
QDict *qdict = qobject_to(QDict, req_obj->req);
QObject *id = qdict ? qdict_get(qdict, "id") : NULL;

View File

@@ -0,0 +1,63 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Sebastian Ott <sebott@redhat.com>
Date: Tue, 3 Dec 2024 13:19:28 +0100
Subject: [PATCH] pci: ensure valid link status bits for downstream ports
PCI hotplug for downstream endpoints on arm fails because Linux'
PCIe hotplug driver doesn't like the QEMU provided LNKSTA:
pcieport 0000:08:01.0: pciehp: Slot(2): Card present
pcieport 0000:08:01.0: pciehp: Slot(2): Link Up
pcieport 0000:08:01.0: pciehp: Slot(2): Cannot train link: status 0x2000
There's 2 cases where LNKSTA isn't setup properly:
* the downstream device has no express capability
* max link width of the bridge is 0
Move the sanity checks added via 88c869198aa63
("pci: Sanity test minimum downstream LNKSTA") outside of the
branch to make sure downstream ports always have a valid LNKSTA.
Signed-off-by: Sebastian Ott <sebott@redhat.com>
Tested-by: Zhenyu Zhang <zhenyzha@redhat.com>
Message-Id: <20241203121928.14861-1-sebott@redhat.com>
Reviewed-by: Alex Williamson <alex.williamson@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 694632fd44987cc4618612a38ad151047524a590)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/pci/pcie.c | 12 ++++++++----
1 file changed, 8 insertions(+), 4 deletions(-)
diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 0b455c8654..1b12db6fa2 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -1113,18 +1113,22 @@ void pcie_sync_bridge_lnk(PCIDevice *bridge_dev)
if ((lnksta & PCI_EXP_LNKSTA_NLW) > (lnkcap & PCI_EXP_LNKCAP_MLW)) {
lnksta &= ~PCI_EXP_LNKSTA_NLW;
lnksta |= lnkcap & PCI_EXP_LNKCAP_MLW;
- } else if (!(lnksta & PCI_EXP_LNKSTA_NLW)) {
- lnksta |= QEMU_PCI_EXP_LNKSTA_NLW(QEMU_PCI_EXP_LNK_X1);
}
if ((lnksta & PCI_EXP_LNKSTA_CLS) > (lnkcap & PCI_EXP_LNKCAP_SLS)) {
lnksta &= ~PCI_EXP_LNKSTA_CLS;
lnksta |= lnkcap & PCI_EXP_LNKCAP_SLS;
- } else if (!(lnksta & PCI_EXP_LNKSTA_CLS)) {
- lnksta |= QEMU_PCI_EXP_LNKSTA_CLS(QEMU_PCI_EXP_LNK_2_5GT);
}
}
+ if (!(lnksta & PCI_EXP_LNKSTA_NLW)) {
+ lnksta |= QEMU_PCI_EXP_LNKSTA_NLW(QEMU_PCI_EXP_LNK_X1);
+ }
+
+ if (!(lnksta & PCI_EXP_LNKSTA_CLS)) {
+ lnksta |= QEMU_PCI_EXP_LNKSTA_CLS(QEMU_PCI_EXP_LNK_2_5GT);
+ }
+
pci_word_test_and_clear_mask(exp_cap + PCI_EXP_LNKSTA,
PCI_EXP_LNKSTA_CLS | PCI_EXP_LNKSTA_NLW);
pci_word_test_and_set_mask(exp_cap + PCI_EXP_LNKSTA, lnksta &

View File

@@ -1,42 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Kevin Wolf <kwolf@redhat.com>
Date: Thu, 3 Dec 2020 18:23:10 +0100
Subject: [PATCH] block: Fix locking in qmp_block_resize()
The drain functions assume that we hold the AioContext lock of the
drained block node. Make sure to actually take the lock.
Cc: qemu-stable@nongnu.org
Fixes: eb94b81a94bce112e6b206df846c1551aaf6cab6
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201203172311.68232-3-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
blockdev.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/blockdev.c b/blockdev.c
index fe6fb5dc1d..9a86e9fb4b 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2481,14 +2481,17 @@ void coroutine_fn qmp_block_resize(bool has_device, const char *device,
goto out;
}
+ bdrv_co_lock(bs);
bdrv_drained_begin(bs);
+ bdrv_co_unlock(bs);
+
old_ctx = bdrv_co_enter(bs);
blk_truncate(blk, size, false, PREALLOC_MODE_OFF, 0, errp);
bdrv_co_leave(bs, old_ctx);
- bdrv_drained_end(bs);
out:
bdrv_co_lock(bs);
+ bdrv_drained_end(bs);
blk_unref(blk);
bdrv_co_unlock(bs);
}

View File

@@ -0,0 +1,36 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Nicholas Piggin <npiggin@gmail.com>
Date: Thu, 12 Dec 2024 22:04:02 +1000
Subject: [PATCH] pci/msix: Fix msix pba read vector poll end calculation
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The end vector calculation has a bug that results in polling fewer
than required vectors when reading at a non-zero offset in PBA memory.
Fixes: bbef882cc193 ("msi: add API to get notified about pending bit poll")
Signed-off-by: Nicholas Piggin <npiggin@gmail.com>
Message-Id: <20241212120402.1475053-1-npiggin@gmail.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@linaro.org>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 42e2a7a0ab23784e44fcb18369e06067abc89305)
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/pci/msix.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/pci/msix.c b/hw/pci/msix.c
index 487e49834e..cc6e79ec67 100644
--- a/hw/pci/msix.c
+++ b/hw/pci/msix.c
@@ -250,7 +250,7 @@ static uint64_t msix_pba_mmio_read(void *opaque, hwaddr addr,
PCIDevice *dev = opaque;
if (dev->msix_vector_poll_notifier) {
unsigned vector_start = addr * 8;
- unsigned vector_end = MIN(addr + size * 8, dev->msix_entries_nr);
+ unsigned vector_end = MIN((addr + size) * 8, dev->msix_entries_nr);
dev->msix_vector_poll_notifier(dev, vector_start, vector_end);
}

View File

@@ -1,118 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Kevin Wolf <kwolf@redhat.com>
Date: Thu, 3 Dec 2020 18:23:11 +0100
Subject: [PATCH] block: Fix deadlock in bdrv_co_yield_to_drain()
If bdrv_co_yield_to_drain() is called for draining a block node that
runs in a different AioContext, it keeps that AioContext locked while it
yields and schedules a BH in the AioContext to do the actual drain.
As long as executing the BH is the very next thing that the event loop
of the node's AioContext does, this actually happens to work, but when
it tries to execute something else that wants to take the AioContext
lock, it will deadlock. (In the bug report, this other thing is a
virtio-scsi device running virtio_scsi_data_plane_handle_cmd().)
Instead, always drop the AioContext lock across the yield and reacquire
it only when the coroutine is reentered. The BH needs to unconditionally
take the lock for itself now.
This fixes the 'block_resize' QMP command on a block node that runs in
an iothread.
Cc: qemu-stable@nongnu.org
Fixes: eb94b81a94bce112e6b206df846c1551aaf6cab6
Fixes: https://bugzilla.redhat.com/show_bug.cgi?id=1903511
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Message-Id: <20201203172311.68232-4-kwolf@redhat.com>
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
block/io.c | 41 ++++++++++++++++++++++++-----------------
1 file changed, 24 insertions(+), 17 deletions(-)
diff --git a/block/io.c b/block/io.c
index ec5e152bb7..a9f56a9ab1 100644
--- a/block/io.c
+++ b/block/io.c
@@ -306,17 +306,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
if (bs) {
AioContext *ctx = bdrv_get_aio_context(bs);
- AioContext *co_ctx = qemu_coroutine_get_aio_context(co);
-
- /*
- * When the coroutine yielded, the lock for its home context was
- * released, so we need to re-acquire it here. If it explicitly
- * acquired a different context, the lock is still held and we don't
- * want to lock it a second time (or AIO_WAIT_WHILE() would hang).
- */
- if (ctx == co_ctx) {
- aio_context_acquire(ctx);
- }
+ aio_context_acquire(ctx);
bdrv_dec_in_flight(bs);
if (data->begin) {
assert(!data->drained_end_counter);
@@ -328,9 +318,7 @@ static void bdrv_co_drain_bh_cb(void *opaque)
data->ignore_bds_parents,
data->drained_end_counter);
}
- if (ctx == co_ctx) {
- aio_context_release(ctx);
- }
+ aio_context_release(ctx);
} else {
assert(data->begin);
bdrv_drain_all_begin();
@@ -348,13 +336,16 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
int *drained_end_counter)
{
BdrvCoDrainData data;
+ Coroutine *self = qemu_coroutine_self();
+ AioContext *ctx = bdrv_get_aio_context(bs);
+ AioContext *co_ctx = qemu_coroutine_get_aio_context(self);
/* Calling bdrv_drain() from a BH ensures the current coroutine yields and
* other coroutines run if they were queued by aio_co_enter(). */
assert(qemu_in_coroutine());
data = (BdrvCoDrainData) {
- .co = qemu_coroutine_self(),
+ .co = self,
.bs = bs,
.done = false,
.begin = begin,
@@ -368,13 +359,29 @@ static void coroutine_fn bdrv_co_yield_to_drain(BlockDriverState *bs,
if (bs) {
bdrv_inc_in_flight(bs);
}
- replay_bh_schedule_oneshot_event(bdrv_get_aio_context(bs),
- bdrv_co_drain_bh_cb, &data);
+
+ /*
+ * Temporarily drop the lock across yield or we would get deadlocks.
+ * bdrv_co_drain_bh_cb() reaquires the lock as needed.
+ *
+ * When we yield below, the lock for the current context will be
+ * released, so if this is actually the lock that protects bs, don't drop
+ * it a second time.
+ */
+ if (ctx != co_ctx) {
+ aio_context_release(ctx);
+ }
+ replay_bh_schedule_oneshot_event(ctx, bdrv_co_drain_bh_cb, &data);
qemu_coroutine_yield();
/* If we are resumed from some other event (such as an aio completion or a
* timer callback), it is a bug in the caller that should be fixed. */
assert(data.done);
+
+ /* Reaquire the AioContext of bs if we dropped it */
+ if (ctx != co_ctx) {
+ aio_context_acquire(ctx);
+ }
}
void bdrv_do_drained_begin_quiesce(BlockDriverState *bs,

1282
debian/patches/pve-qemu-9.2-vitastor.patch vendored Normal file

File diff suppressed because it is too large Load Diff

View File

@@ -14,10 +14,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index d5fd1dbcd2..bda3e606dc 100644
index 90fa54352c..e2ea071315 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -508,7 +508,7 @@ static QemuOptsList raw_runtime_opts = {
@@ -564,7 +564,7 @@ static QemuOptsList raw_runtime_opts = {
{
.name = "locking",
.type = QEMU_OPT_STRING,
@@ -26,7 +26,7 @@ index d5fd1dbcd2..bda3e606dc 100644
},
{
.name = "pr-manager",
@@ -606,7 +606,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
@@ -664,7 +664,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
s->use_lock = false;
break;
case ON_OFF_AUTO_AUTO:

View File

@@ -9,12 +9,12 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/net/net.h b/include/net/net.h
index 778fc787ca..fb2db6bb75 100644
index cdd5b109b0..653a37e9d1 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -210,8 +210,8 @@ void netdev_add(QemuOpts *opts, Error **errp);
@@ -305,8 +305,8 @@ void netdev_add(QemuOpts *opts, Error **errp);
int net_hub_id_for_client(NetClientState *nc, int *id);
NetClientState *net_hub_port_find(int hub_id);
-#define DEFAULT_NETWORK_SCRIPT CONFIG_SYSCONFDIR "/qemu-ifup"
-#define DEFAULT_NETWORK_DOWN_SCRIPT CONFIG_SYSCONFDIR "/qemu-ifdown"

View File

@@ -10,10 +10,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index 88e8586f8f..93563ee0c2 100644
index 4c239a6970..be09263fb0 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1973,9 +1973,9 @@ uint64_t cpu_get_tsc(CPUX86State *env);
@@ -2475,9 +2475,9 @@ uint64_t cpu_get_tsc(CPUX86State *env);
#define CPU_RESOLVING_TYPE TYPE_X86_CPU
#ifdef TARGET_X86_64
@@ -24,4 +24,4 @@ index 88e8586f8f..93563ee0c2 100644
+#define TARGET_DEFAULT_CPU_TYPE X86_CPU_TYPE_NAME("kvm32")
#endif
#define cpu_signal_handler cpu_x86_signal_handler
#define cpu_list x86_cpu_list

View File

@@ -9,10 +9,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/ui/spice-core.c b/ui/spice-core.c
index eea52f5389..d09ee7f09e 100644
index bd9dbe03f1..a7ecaad9c7 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -667,32 +667,35 @@ static void qemu_spice_init(void)
@@ -690,32 +690,35 @@ static void qemu_spice_init(void)
if (tls_port) {
x509_dir = qemu_opt_get(opts, "x509-dir");

View File

@@ -9,7 +9,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/block/gluster.c b/block/gluster.c
index 4f1448e2bc..715f06394c 100644
index e9c038042b..c8457a5014 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -42,7 +42,7 @@
@@ -21,15 +21,15 @@ index 4f1448e2bc..715f06394c 100644
/*
* Several versions of GlusterFS (3.12? -> 6.0.1) fail when the transfer size
* is greater or equal to 1024 MiB, so we are limiting the transfer size to 512
@@ -424,6 +424,7 @@ static struct glfs *qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
@@ -421,6 +421,7 @@ static struct glfs *qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
int old_errno;
SocketAddressList *server;
unsigned long long port;
uint64_t port;
+ const char *logfile;
glfs = glfs_find_preopened(gconf->volume);
if (glfs) {
@@ -466,9 +467,15 @@ static struct glfs *qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
@@ -463,9 +464,15 @@ static struct glfs *qemu_gluster_glfs_init(BlockdevOptionsGluster *gconf,
}
}

View File

@@ -1,24 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Alexandre Derumier <aderumier@odiso.com>
Date: Mon, 6 Apr 2020 12:16:34 +0200
Subject: [PATCH] PVE: [Config] smm_available = false
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
hw/i386/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 5944fc44ed..31b481b4e9 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1115,7 +1115,7 @@ bool x86_machine_is_smm_enabled(const X86MachineState *x86ms)
if (tcg_enabled() || qtest_enabled()) {
smm_available = true;
} else if (kvm_enabled()) {
- smm_available = kvm_has_smm();
+ smm_available = false;
}
if (smm_available) {

View File

@@ -18,10 +18,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+)
diff --git a/block/rbd.c b/block/rbd.c
index 9bd2bce716..c7195a2342 100644
index 04ed0e242e..728bce3b1e 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -609,6 +609,8 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
@@ -963,6 +963,8 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
rados_conf_set(*cluster, "rbd_cache", "false");
}

View File

@@ -16,7 +16,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/block/gluster.c b/block/gluster.c
index 715f06394c..6d02170d9b 100644
index c8457a5014..c3a9555591 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -57,6 +57,7 @@ typedef struct GlusterAIOCB {
@@ -27,7 +27,7 @@ index 715f06394c..6d02170d9b 100644
} GlusterAIOCB;
typedef struct BDRVGlusterState {
@@ -759,8 +760,10 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret,
@@ -746,8 +747,10 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret,
acb->ret = 0; /* Success */
} else if (ret < 0) {
acb->ret = -errno; /* Read/Write failed */
@@ -39,15 +39,15 @@ index 715f06394c..6d02170d9b 100644
}
aio_co_schedule(acb->aio_context, acb->coroutine);
@@ -1028,6 +1031,7 @@ static coroutine_fn int qemu_gluster_co_pwrite_zeroes(BlockDriverState *bs,
@@ -1018,6 +1021,7 @@ static coroutine_fn int qemu_gluster_co_pwrite_zeroes(BlockDriverState *bs,
acb.ret = 0;
acb.coroutine = qemu_coroutine_self();
acb.aio_context = bdrv_get_aio_context(bs);
+ acb.is_write = true;
ret = glfs_zerofill_async(s->fd, offset, size, gluster_finish_aiocb, &acb);
ret = glfs_zerofill_async(s->fd, offset, bytes, gluster_finish_aiocb, &acb);
if (ret < 0) {
@@ -1209,9 +1213,11 @@ static coroutine_fn int qemu_gluster_co_rw(BlockDriverState *bs,
@@ -1198,9 +1202,11 @@ static coroutine_fn int qemu_gluster_co_rw(BlockDriverState *bs,
acb.aio_context = bdrv_get_aio_context(bs);
if (write) {
@@ -59,7 +59,7 @@ index 715f06394c..6d02170d9b 100644
ret = glfs_preadv_async(s->fd, qiov->iov, qiov->niov, offset, 0,
gluster_finish_aiocb, &acb);
}
@@ -1275,6 +1281,7 @@ static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
@@ -1263,6 +1269,7 @@ static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
acb.ret = 0;
acb.coroutine = qemu_coroutine_self();
acb.aio_context = bdrv_get_aio_context(bs);
@@ -67,11 +67,11 @@ index 715f06394c..6d02170d9b 100644
ret = glfs_fsync_async(s->fd, gluster_finish_aiocb, &acb);
if (ret < 0) {
@@ -1321,6 +1328,7 @@ static coroutine_fn int qemu_gluster_co_pdiscard(BlockDriverState *bs,
@@ -1311,6 +1318,7 @@ static coroutine_fn int qemu_gluster_co_pdiscard(BlockDriverState *bs,
acb.ret = 0;
acb.coroutine = qemu_coroutine_self();
acb.aio_context = bdrv_get_aio_context(bs);
+ acb.is_write = true;
ret = glfs_discard_async(s->fd, offset, size, gluster_finish_aiocb, &acb);
ret = glfs_discard_async(s->fd, offset, bytes, gluster_finish_aiocb, &acb);
if (ret < 0) {

View File

@@ -9,10 +9,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/qemu-img.c b/qemu-img.c
index f9050bfaad..7e6666b5f7 100644
index 7668f86769..2575e97b43 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -3022,7 +3022,8 @@ static int img_info(int argc, char **argv)
@@ -3075,7 +3075,8 @@ static int img_info(int argc, char **argv)
list = collect_image_info_list(image_opts, filename, fmt, chain,
force_share);
if (!list) {

View File

@@ -1,88 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:37 +0200
Subject: [PATCH] PVE: [Up] qmp: add get_link_status
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
net/net.c | 27 +++++++++++++++++++++++++++
qapi/net.json | 15 +++++++++++++++
qapi/pragma.json | 1 +
3 files changed, 43 insertions(+)
diff --git a/net/net.c b/net/net.c
index 6a2c3d9567..a1e9514fb8 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1277,6 +1277,33 @@ void hmp_info_network(Monitor *mon, const QDict *qdict)
}
}
+int64_t qmp_get_link_status(const char *name, Error **errp)
+{
+ NetClientState *ncs[MAX_QUEUE_NUM];
+ NetClientState *nc;
+ int queues;
+ bool ret;
+
+ queues = qemu_find_net_clients_except(name, ncs,
+ NET_CLIENT_DRIVER__MAX,
+ MAX_QUEUE_NUM);
+
+ if (queues == 0) {
+ error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+ "Device '%s' not found", name);
+ return (int64_t) -1;
+ }
+
+ nc = ncs[0];
+ ret = ncs[0]->link_down;
+
+ if (nc->peer->info->type == NET_CLIENT_DRIVER_NIC) {
+ ret = ncs[0]->peer->link_down;
+ }
+
+ return (int64_t) ret ? 0 : 1;
+}
+
void colo_notify_filters_event(int event, Error **errp)
{
NetClientState *nc;
diff --git a/qapi/net.json b/qapi/net.json
index a3a1336001..b8092c4e20 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -35,6 +35,21 @@
##
{ 'command': 'set_link', 'data': {'name': 'str', 'up': 'bool'} }
+##
+# @get_link_status:
+#
+# Get the current link state of the nics or nic.
+#
+# @name: name of the nic you get the state of
+#
+# Return: If link is up 1
+# If link is down 0
+# If an error occure an empty string.
+#
+# Notes: this is an Proxmox VE extension and not offical part of Qemu.
+##
+{ 'command': 'get_link_status', 'data': {'name': 'str'} , 'returns': 'int' }
+
##
# @netdev_add:
#
diff --git a/qapi/pragma.json b/qapi/pragma.json
index cffae27666..5a3e3de95f 100644
--- a/qapi/pragma.json
+++ b/qapi/pragma.json
@@ -5,6 +5,7 @@
{ 'pragma': {
# Commands allowed to return a non-dictionary:
'returns-whitelist': [
+ 'get_link_status',
'human-monitor-command',
'qom-get',
'query-migrate-cache-size',

View File

@@ -31,16 +31,17 @@ override the output file's size.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
qemu-img-cmds.hx | 4 +-
qemu-img.c | 191 +++++++++++++++++++++++++++++------------------
2 files changed, 121 insertions(+), 74 deletions(-)
qemu-img.c | 202 ++++++++++++++++++++++++++++++-----------------
2 files changed, 133 insertions(+), 73 deletions(-)
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index b3620f29e5..e70ef3dc91 100644
index c9dd70a892..048788b23d 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -58,9 +58,9 @@ SRST
@@ -60,9 +60,9 @@ SRST
ERST
DEF("dd", img_dd,
@@ -53,10 +54,10 @@ index b3620f29e5..e70ef3dc91 100644
DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index 7e6666b5f7..44cf942bd2 100644
index 2575e97b43..8ec68b346f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4897,10 +4897,12 @@ static int img_bitmap(int argc, char **argv)
@@ -4993,10 +4993,12 @@ static int img_bitmap(int argc, char **argv)
#define C_IF 04
#define C_OF 010
#define C_SKIP 020
@@ -69,7 +70,7 @@ index 7e6666b5f7..44cf942bd2 100644
};
struct DdIo {
@@ -4976,6 +4978,19 @@ static int img_dd_skip(const char *arg,
@@ -5072,6 +5074,19 @@ static int img_dd_skip(const char *arg,
return 0;
}
@@ -89,7 +90,7 @@ index 7e6666b5f7..44cf942bd2 100644
static int img_dd(int argc, char **argv)
{
int ret = 0;
@@ -5016,6 +5031,7 @@ static int img_dd(int argc, char **argv)
@@ -5112,6 +5127,7 @@ static int img_dd(int argc, char **argv)
{ "if", img_dd_if, C_IF },
{ "of", img_dd_of, C_OF },
{ "skip", img_dd_skip, C_SKIP },
@@ -97,7 +98,7 @@ index 7e6666b5f7..44cf942bd2 100644
{ NULL, NULL, 0 }
};
const struct option long_options[] = {
@@ -5094,8 +5110,13 @@ static int img_dd(int argc, char **argv)
@@ -5187,91 +5203,112 @@ static int img_dd(int argc, char **argv)
arg = NULL;
}
@@ -105,53 +106,30 @@ index 7e6666b5f7..44cf942bd2 100644
- error_report("Must specify both input and output files");
+ if (!(dd.flags & C_IF) && (!fmt || strcmp(fmt, "raw") != 0)) {
+ error_report("Input format must be raw when readin from stdin");
+ ret = -1;
+ goto out;
+ }
ret = -1;
goto out;
}
-
- blk1 = img_open(image_opts, in.filename, fmt, 0, false, false,
- force_share);
-
- if (!blk1) {
+ if (!(dd.flags & C_OF) && strcmp(out_fmt, "raw") != 0) {
+ error_report("Output format must be raw when writing to stdout");
ret = -1;
goto out;
}
@@ -5107,85 +5128,101 @@ static int img_dd(int argc, char **argv)
goto out;
}
- blk1 = img_open(image_opts, in.filename, fmt, 0, false, false,
- force_share);
+ if (dd.flags & C_IF) {
+ blk1 = img_open(image_opts, in.filename, fmt, 0, false, false,
+ force_share);
- if (!blk1) {
- ret = -1;
- goto out;
+ if (!blk1) {
+ ret = -1;
+ goto out;
+ }
}
- drv = bdrv_find_format(out_fmt);
- if (!drv) {
- error_report("Unknown file format");
+ if (dd.flags & C_OSIZE) {
+ size = dd.osize;
+ } else if (dd.flags & C_IF) {
+ size = blk_getlength(blk1);
+ if (size < 0) {
+ error_report("Failed to get size for '%s'", in.filename);
+ ret = -1;
+ goto out;
+ }
+ } else if (dd.flags & C_COUNT) {
+ size = dd.count * in.bsz;
+ } else {
+ error_report("Output size must be known when reading from stdin");
ret = -1;
goto out;
}
- ret = -1;
- goto out;
- }
- proto_drv = bdrv_find_protocol(out.filename, true, &local_err);
+ if (dd.flags & C_IF) {
+ blk1 = img_open(image_opts, in.filename, fmt, 0, false, false,
+ force_share);
- if (!proto_drv) {
- error_report_err(local_err);
@@ -169,14 +147,50 @@ index 7e6666b5f7..44cf942bd2 100644
- proto_drv->format_name);
- ret = -1;
- goto out;
+ if (!(dd.flags & C_OSIZE) && dd.flags & C_COUNT && dd.count <= INT64_MAX / in.bsz &&
+ dd.count * in.bsz < size) {
+ size = dd.count * in.bsz;
+ if (!blk1) {
+ ret = -1;
+ goto out;
+ }
}
- create_opts = qemu_opts_append(create_opts, drv->create_opts);
- create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
-
- opts = qemu_opts_create(create_opts, NULL, 0, &error_abort);
- size = blk_getlength(blk1);
- if (size < 0) {
- error_report("Failed to get size for '%s'", in.filename);
+ if (dd.flags & C_OSIZE) {
+ size = dd.osize;
+ } else if (dd.flags & C_IF) {
+ size = blk_getlength(blk1);
+ if (size < 0) {
+ error_report("Failed to get size for '%s'", in.filename);
+ ret = -1;
+ goto out;
+ }
+ } else if (dd.flags & C_COUNT) {
+ size = dd.count * in.bsz;
+ } else {
+ error_report("Output size must be known when reading from stdin");
ret = -1;
goto out;
}
- if (dd.flags & C_COUNT && dd.count <= INT64_MAX / in.bsz &&
+ if (!(dd.flags & C_OSIZE) && dd.flags & C_COUNT && dd.count <= INT64_MAX / in.bsz &&
dd.count * in.bsz < size) {
size = dd.count * in.bsz;
}
- /* Overflow means the specified offset is beyond input image's size */
- if (dd.flags & C_SKIP && (in.offset > INT64_MAX / in.bsz ||
- size < in.bsz * in.offset)) {
- qemu_opt_set_number(opts, BLOCK_OPT_SIZE, 0, &error_abort);
- } else {
- qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
- size - in.bsz * in.offset, &error_abort);
- }
+ if (dd.flags & C_OF) {
+ drv = bdrv_find_format(out_fmt);
+ if (!drv) {
@@ -186,9 +200,11 @@ index 7e6666b5f7..44cf942bd2 100644
+ }
+ proto_drv = bdrv_find_protocol(out.filename, true, &local_err);
- size = blk_getlength(blk1);
- if (size < 0) {
- error_report("Failed to get size for '%s'", in.filename);
- ret = bdrv_create(drv, out.filename, opts, &local_err);
- if (ret < 0) {
- error_reportf_err(local_err,
- "%s: error while creating output image: ",
- out.filename);
- ret = -1;
- goto out;
- }
@@ -212,20 +228,18 @@ index 7e6666b5f7..44cf942bd2 100644
+ create_opts = qemu_opts_append(create_opts, drv->create_opts);
+ create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
- if (dd.flags & C_COUNT && dd.count <= INT64_MAX / in.bsz &&
- dd.count * in.bsz < size) {
- size = dd.count * in.bsz;
- }
- /* TODO, we can't honour --image-opts for the target,
- * since it needs to be given in a format compatible
- * with the bdrv_create() call above which does not
- * support image-opts style.
- */
- blk2 = img_open_file(out.filename, NULL, out_fmt, BDRV_O_RDWR,
- false, false, false);
+ opts = qemu_opts_create(create_opts, NULL, 0, &error_abort);
- /* Overflow means the specified offset is beyond input image's size */
- if (dd.flags & C_SKIP && (in.offset > INT64_MAX / in.bsz ||
- size < in.bsz * in.offset)) {
- qemu_opt_set_number(opts, BLOCK_OPT_SIZE, 0, &error_abort);
- } else {
- qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
- size - in.bsz * in.offset, &error_abort);
- }
- if (!blk2) {
- ret = -1;
- goto out;
+ /* Overflow means the specified offset is beyond input image's size */
+ if (dd.flags & C_OSIZE) {
+ qemu_opt_set_number(opts, BLOCK_OPT_SIZE, size, &error_abort);
@@ -236,15 +250,7 @@ index 7e6666b5f7..44cf942bd2 100644
+ qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
+ size - in.bsz * in.offset, &error_abort);
+ }
- ret = bdrv_create(drv, out.filename, opts, &local_err);
- if (ret < 0) {
- error_reportf_err(local_err,
- "%s: error while creating output image: ",
- out.filename);
- ret = -1;
- goto out;
- }
+
+ ret = bdrv_create(drv, out.filename, opts, &local_err);
+ if (ret < 0) {
+ error_reportf_err(local_err,
@@ -253,14 +259,7 @@ index 7e6666b5f7..44cf942bd2 100644
+ ret = -1;
+ goto out;
+ }
- /* TODO, we can't honour --image-opts for the target,
- * since it needs to be given in a format compatible
- * with the bdrv_create() call above which does not
- * support image-opts style.
- */
- blk2 = img_open_file(out.filename, NULL, out_fmt, BDRV_O_RDWR,
- false, false, false);
+
+ /* TODO, we can't honour --image-opts for the target,
+ * since it needs to be given in a format compatible
+ * with the bdrv_create() call above which does not
@@ -268,10 +267,7 @@ index 7e6666b5f7..44cf942bd2 100644
+ */
+ blk2 = img_open_file(out.filename, NULL, out_fmt, BDRV_O_RDWR,
+ false, false, false);
- if (!blk2) {
- ret = -1;
- goto out;
+
+ if (!blk2) {
+ ret = -1;
+ goto out;
@@ -279,41 +275,54 @@ index 7e6666b5f7..44cf942bd2 100644
}
if (dd.flags & C_SKIP && (in.offset > INT64_MAX / in.bsz ||
@@ -5203,11 +5240,17 @@ static int img_dd(int argc, char **argv)
@@ -5288,20 +5325,43 @@ static int img_dd(int argc, char **argv)
in.buf = g_new(uint8_t, in.bsz);
for (out_pos = 0; in_pos < size; block_count++) {
int in_ret, out_ret;
for (out_pos = 0; in_pos < size; ) {
+ int in_ret, out_ret;
int bytes = (in_pos + in.bsz > size) ? size - in_pos : in.bsz;
-
- if (in_pos + in.bsz > size) {
- in_ret = blk_pread(blk1, in_pos, in.buf, size - in_pos);
+ size_t in_bsz = in_pos + in.bsz > size ? size - in_pos : in.bsz;
- ret = blk_pread(blk1, in_pos, bytes, in.buf, 0);
- if (ret < 0) {
+ if (blk1) {
+ in_ret = blk_pread(blk1, in_pos, in.buf, in_bsz);
} else {
- in_ret = blk_pread(blk1, in_pos, in.buf, in.bsz);
+ in_ret = read(STDIN_FILENO, in.buf, in_bsz);
+ in_ret = blk_pread(blk1, in_pos, bytes, in.buf, 0);
+ if (in_ret == 0) {
+ in_ret = bytes;
+ }
+ } else {
+ in_ret = read(STDIN_FILENO, in.buf, bytes);
+ if (in_ret == 0) {
+ /* early EOF is considered an error */
+ error_report("Input ended unexpectedly");
+ ret = -1;
+ goto out;
+ }
}
if (in_ret < 0) {
+ }
+ if (in_ret < 0) {
error_report("error while reading from input image file: %s",
@@ -5217,9 +5260,13 @@ static int img_dd(int argc, char **argv)
- strerror(-ret));
+ strerror(-in_ret));
+ ret = -1;
goto out;
}
in_pos += in_ret;
in_pos += bytes;
- out_ret = blk_pwrite(blk2, out_pos, in.buf, in_ret, 0);
- ret = blk_pwrite(blk2, out_pos, bytes, in.buf, 0);
- if (ret < 0) {
+ if (blk2) {
+ out_ret = blk_pwrite(blk2, out_pos, in.buf, in_ret, 0);
+ out_ret = blk_pwrite(blk2, out_pos, in_ret, in.buf, 0);
+ if (out_ret == 0) {
+ out_ret = in_ret;
+ }
+ } else {
+ out_ret = write(STDOUT_FILENO, in.buf, in_ret);
+ }
- if (out_ret < 0) {
+
+ if (out_ret != in_ret) {
error_report("error while writing to output image file: %s",
strerror(-out_ret));
ret = -1;
- strerror(-ret));
+ strerror(-out_ret));
+ ret = -1;
goto out;
}
out_pos += bytes;

View File

@@ -10,15 +10,16 @@ an expected end of input.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
qemu-img.c | 28 +++++++++++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
diff --git a/qemu-img.c b/qemu-img.c
index 44cf942bd2..5ce60e8a45 100644
index 8ec68b346f..b98184bba1 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4898,11 +4898,13 @@ static int img_bitmap(int argc, char **argv)
@@ -4994,11 +4994,13 @@ static int img_bitmap(int argc, char **argv)
#define C_OF 010
#define C_SKIP 020
#define C_OSIZE 040
@@ -32,7 +33,7 @@ index 44cf942bd2..5ce60e8a45 100644
};
struct DdIo {
@@ -4991,6 +4993,19 @@ static int img_dd_osize(const char *arg,
@@ -5087,6 +5089,19 @@ static int img_dd_osize(const char *arg,
return 0;
}
@@ -52,13 +53,13 @@ index 44cf942bd2..5ce60e8a45 100644
static int img_dd(int argc, char **argv)
{
int ret = 0;
@@ -5005,12 +5020,14 @@ static int img_dd(int argc, char **argv)
@@ -5101,12 +5116,14 @@ static int img_dd(int argc, char **argv)
int c, i;
const char *out_fmt = "raw";
const char *fmt = NULL;
- int64_t size = 0;
+ int64_t size = 0, readsize = 0;
int64_t block_count = 0, out_pos, in_pos;
int64_t out_pos, in_pos;
bool force_share = false;
struct DdInfo dd = {
.flags = 0,
@@ -68,7 +69,7 @@ index 44cf942bd2..5ce60e8a45 100644
};
struct DdIo in = {
.bsz = 512, /* Block size is by default 512 bytes */
@@ -5032,6 +5049,7 @@ static int img_dd(int argc, char **argv)
@@ -5128,6 +5145,7 @@ static int img_dd(int argc, char **argv)
{ "of", img_dd_of, C_OF },
{ "skip", img_dd_skip, C_SKIP },
{ "osize", img_dd_osize, C_OSIZE },
@@ -76,20 +77,22 @@ index 44cf942bd2..5ce60e8a45 100644
{ NULL, NULL, 0 }
};
const struct option long_options[] = {
@@ -5238,14 +5256,18 @@ static int img_dd(int argc, char **argv)
@@ -5324,9 +5342,10 @@ static int img_dd(int argc, char **argv)
in.buf = g_new(uint8_t, in.bsz);
- for (out_pos = 0; in_pos < size; block_count++) {
- for (out_pos = 0; in_pos < size; ) {
+ readsize = (dd.isize > 0) ? dd.isize : size;
+ for (out_pos = 0; in_pos < readsize; block_count++) {
+ for (out_pos = 0; in_pos < readsize; ) {
int in_ret, out_ret;
- size_t in_bsz = in_pos + in.bsz > size ? size - in_pos : in.bsz;
+ size_t in_bsz = in_pos + in.bsz > readsize ? readsize - in_pos : in.bsz;
- int bytes = (in_pos + in.bsz > size) ? size - in_pos : in.bsz;
+ int bytes = (in_pos + in.bsz > readsize) ? readsize - in_pos : in.bsz;
if (blk1) {
in_ret = blk_pread(blk1, in_pos, in.buf, in_bsz);
in_ret = blk_pread(blk1, in_pos, bytes, in.buf, 0);
if (in_ret == 0) {
@@ -5335,6 +5354,9 @@ static int img_dd(int argc, char **argv)
} else {
in_ret = read(STDIN_FILENO, in.buf, in_bsz);
in_ret = read(STDIN_FILENO, in.buf, bytes);
if (in_ret == 0) {
+ if (dd.isize == 0) {
+ goto out;

View File

@@ -0,0 +1,121 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Alexandre Derumier <aderumier@odiso.com>
Date: Mon, 6 Apr 2020 12:16:42 +0200
Subject: [PATCH] PVE: [Up] qemu-img dd: add -n skip_create
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: fix getopt-string + add documentation]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
docs/tools/qemu-img.rst | 11 ++++++++++-
qemu-img-cmds.hx | 4 ++--
qemu-img.c | 23 ++++++++++++++---------
3 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index 3653adb963..d83e8fb3c0 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -212,6 +212,10 @@ Parameters to convert subcommand:
Parameters to dd subcommand:
+.. option:: -n
+
+ Skip the creation of the target volume
+
.. program:: qemu-img-dd
.. option:: bs=BLOCK_SIZE
@@ -492,7 +496,7 @@ Command description:
it doesn't need to be specified separately in this case.
-.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] if=INPUT of=OUTPUT
+.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [-n] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] if=INPUT of=OUTPUT
dd copies from *INPUT* file to *OUTPUT* file converting it from
*FMT* format to *OUTPUT_FMT* format.
@@ -503,6 +507,11 @@ Command description:
The size syntax is similar to :manpage:`dd(1)`'s size syntax.
+ If the ``-n`` option is specified, the target volume creation will be
+ skipped. This is useful for formats such as ``rbd`` if the target
+ volume has already been created with site specific options that cannot
+ be supplied through ``qemu-img``.
+
.. option:: info [--object OBJECTDEF] [--image-opts] [-f FMT] [--output=OFMT] [--backing-chain] [-U] FILENAME
Give information about the disk image *FILENAME*. Use it in
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 048788b23d..0b29a67a06 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -60,9 +60,9 @@ SRST
ERST
DEF("dd", img_dd,
- "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [bs=block_size] [count=blocks] [skip=blocks] [osize=output_size] if=input of=output")
+ "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [-n] [bs=block_size] [count=blocks] [skip=blocks] [osize=output_size] if=input of=output")
SRST
-.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] [osize=OUTPUT_SIZE] if=INPUT of=OUTPUT
+.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [-n] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] [osize=OUTPUT_SIZE] if=INPUT of=OUTPUT
ERST
DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index b98184bba1..6fc8384f64 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -5118,7 +5118,7 @@ static int img_dd(int argc, char **argv)
const char *fmt = NULL;
int64_t size = 0, readsize = 0;
int64_t out_pos, in_pos;
- bool force_share = false;
+ bool force_share = false, skip_create = false;
struct DdInfo dd = {
.flags = 0,
.count = 0,
@@ -5156,7 +5156,7 @@ static int img_dd(int argc, char **argv)
{ 0, 0, 0, 0 }
};
- while ((c = getopt_long(argc, argv, ":hf:O:U", long_options, NULL))) {
+ while ((c = getopt_long(argc, argv, ":hf:O:Un", long_options, NULL))) {
if (c == EOF) {
break;
}
@@ -5176,6 +5176,9 @@ static int img_dd(int argc, char **argv)
case 'h':
help();
break;
+ case 'n':
+ skip_create = true;
+ break;
case 'U':
force_share = true;
break;
@@ -5306,13 +5309,15 @@ static int img_dd(int argc, char **argv)
size - in.bsz * in.offset, &error_abort);
}
- ret = bdrv_create(drv, out.filename, opts, &local_err);
- if (ret < 0) {
- error_reportf_err(local_err,
- "%s: error while creating output image: ",
- out.filename);
- ret = -1;
- goto out;
+ if (!skip_create) {
+ ret = bdrv_create(drv, out.filename, opts, &local_err);
+ if (ret < 0) {
+ error_reportf_err(local_err,
+ "%s: error while creating output image: ",
+ out.filename);
+ ret = -1;
+ goto out;
+ }
}
/* TODO, we can't honour --image-opts for the target,

View File

@@ -0,0 +1,130 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Mon, 7 Feb 2022 14:21:01 +0100
Subject: [PATCH] qemu-img dd: add -l option for loading a snapshot
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
docs/tools/qemu-img.rst | 6 +++---
qemu-img-cmds.hx | 4 ++--
qemu-img.c | 33 +++++++++++++++++++++++++++++++--
3 files changed, 36 insertions(+), 7 deletions(-)
diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index d83e8fb3c0..61c6b21859 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -496,10 +496,10 @@ Command description:
it doesn't need to be specified separately in this case.
-.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [-n] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] if=INPUT of=OUTPUT
+.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [-n] [-l SNAPSHOT_PARAM] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] if=INPUT of=OUTPUT
- dd copies from *INPUT* file to *OUTPUT* file converting it from
- *FMT* format to *OUTPUT_FMT* format.
+ dd copies from *INPUT* file or snapshot *SNAPSHOT_PARAM* to *OUTPUT* file
+ converting it from *FMT* format to *OUTPUT_FMT* format.
The data is by default read and written using blocks of 512 bytes but can be
modified by specifying *BLOCK_SIZE*. If count=\ *BLOCKS* is specified
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 0b29a67a06..758f397232 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -60,9 +60,9 @@ SRST
ERST
DEF("dd", img_dd,
- "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [-n] [bs=block_size] [count=blocks] [skip=blocks] [osize=output_size] if=input of=output")
+ "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [-n] [-l snapshot_param] [bs=block_size] [count=blocks] [skip=blocks] [osize=output_size] if=input of=output")
SRST
-.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [-n] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] [osize=OUTPUT_SIZE] if=INPUT of=OUTPUT
+.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [-n] [-l SNAPSHOT_PARAM] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] [osize=OUTPUT_SIZE] if=INPUT of=OUTPUT
ERST
DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index 6fc8384f64..a6c88e0860 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -5110,6 +5110,7 @@ static int img_dd(int argc, char **argv)
BlockDriver *drv = NULL, *proto_drv = NULL;
BlockBackend *blk1 = NULL, *blk2 = NULL;
QemuOpts *opts = NULL;
+ QemuOpts *sn_opts = NULL;
QemuOptsList *create_opts = NULL;
Error *local_err = NULL;
bool image_opts = false;
@@ -5119,6 +5120,7 @@ static int img_dd(int argc, char **argv)
int64_t size = 0, readsize = 0;
int64_t out_pos, in_pos;
bool force_share = false, skip_create = false;
+ const char *snapshot_name = NULL;
struct DdInfo dd = {
.flags = 0,
.count = 0,
@@ -5156,7 +5158,7 @@ static int img_dd(int argc, char **argv)
{ 0, 0, 0, 0 }
};
- while ((c = getopt_long(argc, argv, ":hf:O:Un", long_options, NULL))) {
+ while ((c = getopt_long(argc, argv, ":hf:O:l:Un", long_options, NULL))) {
if (c == EOF) {
break;
}
@@ -5179,6 +5181,19 @@ static int img_dd(int argc, char **argv)
case 'n':
skip_create = true;
break;
+ case 'l':
+ if (strstart(optarg, SNAPSHOT_OPT_BASE, NULL)) {
+ sn_opts = qemu_opts_parse_noisily(&internal_snapshot_opts,
+ optarg, false);
+ if (!sn_opts) {
+ error_report("Failed in parsing snapshot param '%s'",
+ optarg);
+ goto out;
+ }
+ } else {
+ snapshot_name = optarg;
+ }
+ break;
case 'U':
force_share = true;
break;
@@ -5238,11 +5253,24 @@ static int img_dd(int argc, char **argv)
if (dd.flags & C_IF) {
blk1 = img_open(image_opts, in.filename, fmt, 0, false, false,
force_share);
-
if (!blk1) {
ret = -1;
goto out;
}
+ if (sn_opts) {
+ bdrv_snapshot_load_tmp(blk_bs(blk1),
+ qemu_opt_get(sn_opts, SNAPSHOT_OPT_ID),
+ qemu_opt_get(sn_opts, SNAPSHOT_OPT_NAME),
+ &local_err);
+ } else if (snapshot_name != NULL) {
+ bdrv_snapshot_load_tmp_by_id_or_name(blk_bs(blk1), snapshot_name,
+ &local_err);
+ }
+ if (local_err) {
+ error_reportf_err(local_err, "Failed to load snapshot: ");
+ ret = -1;
+ goto out;
+ }
}
if (dd.flags & C_OSIZE) {
@@ -5397,6 +5425,7 @@ static int img_dd(int argc, char **argv)
out:
g_free(arg);
qemu_opts_del(opts);
+ qemu_opts_del(sn_opts);
qemu_opts_free(create_opts);
blk_unref(blk1);
blk_unref(blk2);

View File

@@ -1,65 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Alexandre Derumier <aderumier@odiso.com>
Date: Mon, 6 Apr 2020 12:16:42 +0200
Subject: [PATCH] PVE: [Up] qemu-img dd: add -n skip_create
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
qemu-img.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/qemu-img.c b/qemu-img.c
index 5ce60e8a45..86bfd0288b 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -5022,7 +5022,7 @@ static int img_dd(int argc, char **argv)
const char *fmt = NULL;
int64_t size = 0, readsize = 0;
int64_t block_count = 0, out_pos, in_pos;
- bool force_share = false;
+ bool force_share = false, skip_create = false;
struct DdInfo dd = {
.flags = 0,
.count = 0,
@@ -5060,7 +5060,7 @@ static int img_dd(int argc, char **argv)
{ 0, 0, 0, 0 }
};
- while ((c = getopt_long(argc, argv, ":hf:O:U", long_options, NULL))) {
+ while ((c = getopt_long(argc, argv, ":hf:O:U:n", long_options, NULL))) {
if (c == EOF) {
break;
}
@@ -5080,6 +5080,9 @@ static int img_dd(int argc, char **argv)
case 'h':
help();
break;
+ case 'n':
+ skip_create = true;
+ break;
case 'U':
force_share = true;
break;
@@ -5220,13 +5223,15 @@ static int img_dd(int argc, char **argv)
size - in.bsz * in.offset, &error_abort);
}
- ret = bdrv_create(drv, out.filename, opts, &local_err);
- if (ret < 0) {
- error_reportf_err(local_err,
- "%s: error while creating output image: ",
- out.filename);
- ret = -1;
- goto out;
+ if (!skip_create) {
+ ret = bdrv_create(drv, out.filename, opts, &local_err);
+ if (ret < 0) {
+ error_reportf_err(local_err,
+ "%s: error while creating output image: ",
+ out.filename);
+ ret = -1;
+ goto out;
+ }
}
/* TODO, we can't honour --image-opts for the target,

View File

@@ -7,17 +7,62 @@ Actually provide memory information via the query-balloon
command.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: add BalloonInfo to member name exceptions list
rebase for 8.0 - moved to hw/core/machine-hmp-cmds.c]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/core/machine-hmp-cmds.c | 30 +++++++++++++++++++++++++++++-
hw/virtio/virtio-balloon.c | 33 +++++++++++++++++++++++++++++++--
monitor/hmp-cmds.c | 30 +++++++++++++++++++++++++++++-
qapi/machine.json | 22 +++++++++++++++++++++-
3 files changed, 81 insertions(+), 4 deletions(-)
qapi/pragma.json | 1 +
4 files changed, 82 insertions(+), 4 deletions(-)
diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
index 8701f00cc7..3b4c5ef403 100644
--- a/hw/core/machine-hmp-cmds.c
+++ b/hw/core/machine-hmp-cmds.c
@@ -179,7 +179,35 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict)
return;
}
- monitor_printf(mon, "balloon: actual=%" PRId64 "\n", info->actual >> 20);
+ monitor_printf(mon, "balloon: actual=%" PRId64, info->actual >> 20);
+ monitor_printf(mon, " max_mem=%" PRId64, info->max_mem >> 20);
+ if (info->has_total_mem) {
+ monitor_printf(mon, " total_mem=%" PRId64, info->total_mem >> 20);
+ }
+ if (info->has_free_mem) {
+ monitor_printf(mon, " free_mem=%" PRId64, info->free_mem >> 20);
+ }
+
+ if (info->has_mem_swapped_in) {
+ monitor_printf(mon, " mem_swapped_in=%" PRId64, info->mem_swapped_in);
+ }
+ if (info->has_mem_swapped_out) {
+ monitor_printf(mon, " mem_swapped_out=%" PRId64, info->mem_swapped_out);
+ }
+ if (info->has_major_page_faults) {
+ monitor_printf(mon, " major_page_faults=%" PRId64,
+ info->major_page_faults);
+ }
+ if (info->has_minor_page_faults) {
+ monitor_printf(mon, " minor_page_faults=%" PRId64,
+ info->minor_page_faults);
+ }
+ if (info->has_last_update) {
+ monitor_printf(mon, " last_update=%" PRId64,
+ info->last_update);
+ }
+
+ monitor_printf(mon, "\n");
qapi_free_BalloonInfo(info);
}
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index b22b5beda3..6e581439bf 100644
index afd2ad6dd6..c724218c17 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -805,8 +805,37 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
@@ -795,8 +795,37 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
{
VirtIOBalloon *dev = opaque;
@@ -57,54 +102,13 @@ index b22b5beda3..6e581439bf 100644
}
static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 65d8ff4849..705f08a8f1 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -695,7 +695,35 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict)
return;
}
- monitor_printf(mon, "balloon: actual=%" PRId64 "\n", info->actual >> 20);
+ monitor_printf(mon, "balloon: actual=%" PRId64, info->actual >> 20);
+ monitor_printf(mon, " max_mem=%" PRId64, info->max_mem >> 20);
+ if (info->has_total_mem) {
+ monitor_printf(mon, " total_mem=%" PRId64, info->total_mem >> 20);
+ }
+ if (info->has_free_mem) {
+ monitor_printf(mon, " free_mem=%" PRId64, info->free_mem >> 20);
+ }
+
+ if (info->has_mem_swapped_in) {
+ monitor_printf(mon, " mem_swapped_in=%" PRId64, info->mem_swapped_in);
+ }
+ if (info->has_mem_swapped_out) {
+ monitor_printf(mon, " mem_swapped_out=%" PRId64, info->mem_swapped_out);
+ }
+ if (info->has_major_page_faults) {
+ monitor_printf(mon, " major_page_faults=%" PRId64,
+ info->major_page_faults);
+ }
+ if (info->has_minor_page_faults) {
+ monitor_printf(mon, " minor_page_faults=%" PRId64,
+ info->minor_page_faults);
+ }
+ if (info->has_last_update) {
+ monitor_printf(mon, " last_update=%" PRId64,
+ info->last_update);
+ }
+
+ monitor_printf(mon, "\n");
qapi_free_BalloonInfo(info);
}
diff --git a/qapi/machine.json b/qapi/machine.json
index 7c9a263778..3e59199280 100644
index a6b8795b09..9f7ed0eaa0 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1205,10 +1205,30 @@
# @actual: the logical size of the VM in bytes
# Formula used: logical_vm_size = vm_ram_size - balloon_size
@@ -1163,9 +1163,29 @@
# @actual: the logical size of the VM in bytes Formula used:
# logical_vm_size = vm_ram_size - balloon_size
#
+# @last_update: time when stats got updated from guest
+#
@@ -122,8 +126,7 @@ index 7c9a263778..3e59199280 100644
+#
+# @max_mem: amount of memory (in bytes) assigned to the guest
+#
# Since: 0.14.0
#
# Since: 0.14
##
-{ 'struct': 'BalloonInfo', 'data': {'actual': 'int' } }
+{ 'struct': 'BalloonInfo',
@@ -134,3 +137,15 @@ index 7c9a263778..3e59199280 100644
##
# @query-balloon:
diff --git a/qapi/pragma.json b/qapi/pragma.json
index 023a2ef7bc..6aaa9cb975 100644
--- a/qapi/pragma.json
+++ b/qapi/pragma.json
@@ -81,6 +81,7 @@
'member-name-exceptions': [ # visible in:
'ACPISlotType', # query-acpi-ospm-status
'AcpiTableOptions', # -acpitable
+ 'BalloonInfo', # query-balloon
'BlkdebugEvent', # blockdev-add, -blockdev
'BlkdebugSetStateOptions', # blockdev-add, -blockdev
'BlockDeviceInfo', # query-block

View File

@@ -13,13 +13,13 @@ Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 5362c80a18..3fcb82ce2f 100644
index 130217da8f..52a6d74820 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -234,6 +234,12 @@ MachineInfoList *qmp_query_machines(Error **errp)
info->hotpluggable_cpus = mc->has_hotpluggable_cpus;
@@ -90,6 +90,12 @@ MachineInfoList *qmp_query_machines(bool has_compat_props, bool compat_props,
info->numa_mem_supported = mc->numa_mem_supported;
info->deprecated = !!mc->deprecation_reason;
info->acpi = !!object_class_property_find(OBJECT_CLASS(mc), "acpi");
+
+ if (strcmp(mc->name, MACHINE_GET_CLASS(current_machine)->name) == 0) {
+ info->has_is_current = true;
@@ -28,21 +28,21 @@ index 5362c80a18..3fcb82ce2f 100644
+
if (mc->default_cpu_type) {
info->default_cpu_type = g_strdup(mc->default_cpu_type);
info->has_default_cpu_type = true;
}
diff --git a/qapi/machine.json b/qapi/machine.json
index 3e59199280..dfc1a49d3c 100644
index 9f7ed0eaa0..16366b774a 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -318,6 +318,8 @@
@@ -167,6 +167,8 @@
#
# @is-default: whether the machine is default
#
+# @is-current: whether this machine is currently used
+#
# @cpu-max: maximum number of CPUs supported by the machine type
# (since 1.5.0)
# (since 1.5)
#
@@ -339,7 +341,7 @@
@@ -199,7 +201,7 @@
##
{ 'struct': 'MachineInfo',
'data': { 'name': 'str', '*alias': 'str',
@@ -50,4 +50,4 @@ index 3e59199280..dfc1a49d3c 100644
+ '*is-default': 'bool', '*is-current': 'bool', 'cpu-max': 'int',
'hotpluggable-cpus': 'bool', 'numa-mem-supported': 'bool',
'deprecated': 'bool', '*default-cpu-type': 'str',
'*default-ram-id': 'str' } }
'*default-ram-id': 'str', 'acpi': 'bool',

View File

@@ -6,40 +6,41 @@ Subject: [PATCH] PVE: qapi: modify spice query
Provide the last ticket in the SpiceInfo struct optionally.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: adapt to QAPI change]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
qapi/ui.json | 3 +++
ui/spice-core.c | 5 +++++
2 files changed, 8 insertions(+)
ui/spice-core.c | 4 ++++
2 files changed, 7 insertions(+)
diff --git a/qapi/ui.json b/qapi/ui.json
index 6c7b33cb72..39ff301d1e 100644
index 460a26b981..42b911bda3 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -215,11 +215,14 @@
@@ -312,11 +312,14 @@
#
# @channels: a list of @SpiceChannel for each active spice channel
#
+# @ticket: The last ticket set with set_password
+#
# Since: 0.14.0
# Since: 0.14
##
{ 'struct': 'SpiceInfo',
'data': {'enabled': 'bool', 'migrated': 'bool', '*host': 'str', '*port': 'int',
'*tls-port': 'int', '*auth': 'str', '*compiled-version': 'str',
+ '*ticket': 'str',
'mouse-mode': 'SpiceQueryMouseMode', '*channels': ['SpiceChannel']},
'if': 'defined(CONFIG_SPICE)' }
'if': 'CONFIG_SPICE' }
diff --git a/ui/spice-core.c b/ui/spice-core.c
index d09ee7f09e..da3d2644d1 100644
index a7ecaad9c7..fecf002d50 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -538,6 +538,11 @@ static SpiceInfo *qmp_query_spice_real(Error **errp)
@@ -548,6 +548,10 @@ static SpiceInfo *qmp_query_spice_real(Error **errp)
micro = SPICE_SERVER_VERSION & 0xff;
info->compiled_version = g_strdup_printf("%d.%d.%d", major, minor, micro);
+ if (auth_passwd) {
+ info->has_ticket = true;
+ info->ticket = g_strdup(auth_passwd);
+ }
+

View File

@@ -0,0 +1,284 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 13 Oct 2022 11:33:50 +0200
Subject: [PATCH] PVE: add IOChannel implementation for savevm-async
based on migration/channel-block.c and the implementation that was
present in migration/savevm-async.c before QEMU 7.1.
Passes along read/write requests to the given BlockBackend, while
ensuring that a read request going beyond the end results in a
graceful short read.
Additionally, allows tracking the current position from the outside
(intended to be used for progress tracking).
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
migration/channel-savevm-async.c | 184 +++++++++++++++++++++++++++++++
migration/channel-savevm-async.h | 51 +++++++++
migration/meson.build | 1 +
3 files changed, 236 insertions(+)
create mode 100644 migration/channel-savevm-async.c
create mode 100644 migration/channel-savevm-async.h
diff --git a/migration/channel-savevm-async.c b/migration/channel-savevm-async.c
new file mode 100644
index 0000000000..081a192f49
--- /dev/null
+++ b/migration/channel-savevm-async.c
@@ -0,0 +1,184 @@
+/*
+ * QIO Channel implementation to be used by savevm-async QMP calls
+ */
+#include "qemu/osdep.h"
+#include "migration/channel-savevm-async.h"
+#include "qapi/error.h"
+#include "sysemu/block-backend.h"
+#include "trace.h"
+
+QIOChannelSavevmAsync *
+qio_channel_savevm_async_new(BlockBackend *be, size_t *bs_pos)
+{
+ QIOChannelSavevmAsync *ioc;
+
+ ioc = QIO_CHANNEL_SAVEVM_ASYNC(object_new(TYPE_QIO_CHANNEL_SAVEVM_ASYNC));
+
+ bdrv_ref(blk_bs(be));
+ ioc->be = be;
+ ioc->bs_pos = bs_pos;
+
+ return ioc;
+}
+
+
+static void
+qio_channel_savevm_async_finalize(Object *obj)
+{
+ QIOChannelSavevmAsync *ioc = QIO_CHANNEL_SAVEVM_ASYNC(obj);
+
+ if (ioc->be) {
+ bdrv_unref(blk_bs(ioc->be));
+ ioc->be = NULL;
+ }
+ ioc->bs_pos = NULL;
+}
+
+
+static ssize_t
+qio_channel_savevm_async_readv(QIOChannel *ioc,
+ const struct iovec *iov,
+ size_t niov,
+ int **fds,
+ size_t *nfds,
+ int flags,
+ Error **errp)
+{
+ QIOChannelSavevmAsync *saioc = QIO_CHANNEL_SAVEVM_ASYNC(ioc);
+ BlockBackend *be = saioc->be;
+ int64_t maxlen = blk_getlength(be);
+ QEMUIOVector qiov;
+ size_t size;
+ int ret;
+
+ qemu_iovec_init_external(&qiov, (struct iovec *)iov, niov);
+
+ if (*saioc->bs_pos >= maxlen) {
+ error_setg(errp, "cannot read beyond maxlen");
+ return -1;
+ }
+
+ if (maxlen - *saioc->bs_pos < qiov.size) {
+ size = maxlen - *saioc->bs_pos;
+ } else {
+ size = qiov.size;
+ }
+
+ // returns 0 on success
+ ret = blk_preadv(be, *saioc->bs_pos, size, &qiov, 0);
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "blk_preadv failed");
+ return -1;
+ }
+
+ *saioc->bs_pos += size;
+ return size;
+}
+
+
+static ssize_t
+qio_channel_savevm_async_writev(QIOChannel *ioc,
+ const struct iovec *iov,
+ size_t niov,
+ int *fds,
+ size_t nfds,
+ int flags,
+ Error **errp)
+{
+ QIOChannelSavevmAsync *saioc = QIO_CHANNEL_SAVEVM_ASYNC(ioc);
+ BlockBackend *be = saioc->be;
+ QEMUIOVector qiov;
+ int ret;
+
+ qemu_iovec_init_external(&qiov, (struct iovec *)iov, niov);
+
+ if (qemu_in_coroutine()) {
+ ret = blk_co_pwritev(be, *saioc->bs_pos, qiov.size, &qiov, 0);
+ aio_wait_kick();
+ } else {
+ ret = blk_pwritev(be, *saioc->bs_pos, qiov.size, &qiov, 0);
+ }
+
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "blk(_co)_pwritev failed");
+ return -1;
+ }
+
+ *saioc->bs_pos += qiov.size;
+ return qiov.size;
+}
+
+
+static int
+qio_channel_savevm_async_set_blocking(QIOChannel *ioc,
+ bool enabled,
+ Error **errp)
+{
+ if (!enabled) {
+ error_setg(errp, "Non-blocking mode not supported for savevm-async");
+ return -1;
+ }
+ return 0;
+}
+
+
+static int
+qio_channel_savevm_async_close(QIOChannel *ioc,
+ Error **errp)
+{
+ QIOChannelSavevmAsync *saioc = QIO_CHANNEL_SAVEVM_ASYNC(ioc);
+ int rv = bdrv_flush(blk_bs(saioc->be));
+
+ if (rv < 0) {
+ error_setg_errno(errp, -rv, "Unable to flush VMState");
+ return -1;
+ }
+
+ bdrv_unref(blk_bs(saioc->be));
+ saioc->be = NULL;
+ saioc->bs_pos = NULL;
+
+ return 0;
+}
+
+
+static void
+qio_channel_savevm_async_set_aio_fd_handler(QIOChannel *ioc,
+ AioContext *read_ctx,
+ IOHandler *io_read,
+ AioContext *write_ctx,
+ IOHandler *io_write,
+ void *opaque)
+{
+ // if channel-block starts doing something, check if this needs adaptation
+}
+
+
+static void
+qio_channel_savevm_async_class_init(ObjectClass *klass,
+ void *class_data G_GNUC_UNUSED)
+{
+ QIOChannelClass *ioc_klass = QIO_CHANNEL_CLASS(klass);
+
+ ioc_klass->io_writev = qio_channel_savevm_async_writev;
+ ioc_klass->io_readv = qio_channel_savevm_async_readv;
+ ioc_klass->io_set_blocking = qio_channel_savevm_async_set_blocking;
+ ioc_klass->io_close = qio_channel_savevm_async_close;
+ ioc_klass->io_set_aio_fd_handler = qio_channel_savevm_async_set_aio_fd_handler;
+}
+
+static const TypeInfo qio_channel_savevm_async_info = {
+ .parent = TYPE_QIO_CHANNEL,
+ .name = TYPE_QIO_CHANNEL_SAVEVM_ASYNC,
+ .instance_size = sizeof(QIOChannelSavevmAsync),
+ .instance_finalize = qio_channel_savevm_async_finalize,
+ .class_init = qio_channel_savevm_async_class_init,
+};
+
+static void
+qio_channel_savevm_async_register_types(void)
+{
+ type_register_static(&qio_channel_savevm_async_info);
+}
+
+type_init(qio_channel_savevm_async_register_types);
diff --git a/migration/channel-savevm-async.h b/migration/channel-savevm-async.h
new file mode 100644
index 0000000000..17ae2cb261
--- /dev/null
+++ b/migration/channel-savevm-async.h
@@ -0,0 +1,51 @@
+/*
+ * QEMU I/O channels driver for savevm-async.c
+ *
+ * Copyright (c) 2022 Proxmox Server Solutions
+ *
+ * Authors:
+ * Fiona Ebner (f.ebner@proxmox.com)
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#ifndef QIO_CHANNEL_SAVEVM_ASYNC_H
+#define QIO_CHANNEL_SAVEVM_ASYNC_H
+
+#include "io/channel.h"
+#include "qom/object.h"
+
+#define TYPE_QIO_CHANNEL_SAVEVM_ASYNC "qio-channel-savevm-async"
+OBJECT_DECLARE_SIMPLE_TYPE(QIOChannelSavevmAsync, QIO_CHANNEL_SAVEVM_ASYNC)
+
+
+/**
+ * QIOChannelSavevmAsync:
+ *
+ * The QIOChannelBlock object provides a channel implementation that is able to
+ * perform I/O on any BlockBackend whose BlockDriverState directly contains a
+ * VMState (as opposed to indirectly, like qcow2). It allows tracking the
+ * current position from the outside.
+ */
+struct QIOChannelSavevmAsync {
+ QIOChannel parent;
+ BlockBackend *be;
+ size_t *bs_pos;
+};
+
+
+/**
+ * qio_channel_savevm_async_new:
+ * @be: the block backend
+ * @bs_pos: used to keep track of the IOChannels current position
+ *
+ * Create a new IO channel object that can perform I/O on a BlockBackend object
+ * whose BlockDriverState directly contains a VMState.
+ *
+ * Returns: the new channel object
+ */
+QIOChannelSavevmAsync *
+qio_channel_savevm_async_new(BlockBackend *be, size_t *bs_pos);
+
+#endif /* QIO_CHANNEL_SAVEVM_ASYNC_H */
diff --git a/migration/meson.build b/migration/meson.build
index d53cf3417a..b00d58064d 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -13,6 +13,7 @@ system_ss.add(files(
'block-dirty-bitmap.c',
'channel.c',
'channel-block.c',
+ 'channel-savevm-async.c',
'cpu-throttle.c',
'dirtyrate.c',
'exec.c',

View File

@@ -21,29 +21,38 @@ still opened by QEMU.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[improve aborting]
[SR: improve aborting
register yank before migration_incoming_state_destroy]
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
[FE: further improve aborting
adapt to removal of QEMUFileOps
improve condition for entering final stage
adapt to QAPI and other changes for 8.2
make sure to not call vm_start() from coroutine
stop CPU throttling after finishing
force raw format when loading state as suggested by Friedrich Weber]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hmp-commands-info.hx | 13 +
hmp-commands.hx | 33 ++
include/migration/snapshot.h | 1 +
include/monitor/hmp.h | 5 +
hmp-commands.hx | 17 ++
include/migration/snapshot.h | 2 +
include/monitor/hmp.h | 3 +
migration/meson.build | 1 +
migration/savevm-async.c | 598 +++++++++++++++++++++++++++++++++++
monitor/hmp-cmds.c | 57 ++++
qapi/migration.json | 34 ++
qapi/misc.json | 32 ++
migration/savevm-async.c | 571 +++++++++++++++++++++++++++++++++++
monitor/hmp-cmds.c | 38 +++
qapi/migration.json | 34 +++
qapi/misc.json | 18 ++
qemu-options.hx | 12 +
softmmu/vl.c | 10 +
11 files changed, 796 insertions(+)
system/vl.c | 10 +
11 files changed, 719 insertions(+)
create mode 100644 migration/savevm-async.c
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 117ba25f91..b3b797ca28 100644
index c59cd6637b..d1a7b99add 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -580,6 +580,19 @@ SRST
Show current migration xbzrle cache size.
@@ -512,6 +512,19 @@ SRST
Show current migration parameters.
ERST
+ {
@@ -63,13 +72,13 @@ index 117ba25f91..b3b797ca28 100644
.name = "balloon",
.args_type = "",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index ff2d7aa8f3..d294c234a5 100644
index 06746f0afc..0c7c6f2c16 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1866,3 +1866,36 @@ ERST
.flags = "p",
},
@@ -1859,3 +1859,20 @@ SRST
List event channels in the guest
ERST
#endif
+
+ {
+ .name = "savevm-start",
@@ -80,22 +89,6 @@ index ff2d7aa8f3..d294c234a5 100644
+ },
+
+ {
+ .name = "snapshot-drive",
+ .args_type = "device:s,name:s",
+ .params = "device name",
+ .help = "Create internal snapshot.",
+ .cmd = hmp_snapshot_drive,
+ },
+
+ {
+ .name = "delete-drive-snapshot",
+ .args_type = "device:s,name:s",
+ .params = "device name",
+ .help = "Delete internal snapshot.",
+ .cmd = hmp_delete_drive_snapshot,
+ },
+
+ {
+ .name = "savevm-end",
+ .args_type = "",
+ .params = "",
@@ -104,21 +97,21 @@ index ff2d7aa8f3..d294c234a5 100644
+ .coroutine = true,
+ },
diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
index c85b6ec75b..4411b7121d 100644
index 9e4dcaaa75..2581730d74 100644
--- a/include/migration/snapshot.h
+++ b/include/migration/snapshot.h
@@ -17,5 +17,6 @@
@@ -68,4 +68,6 @@ bool delete_snapshot(const char *name,
*/
void load_snapshot_resume(RunState state);
int save_snapshot(const char *name, Error **errp);
int load_snapshot(const char *name, Error **errp);
+int load_snapshot_from_blockdev(const char *filename, Error **errp);
+
#endif
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index ed2913fd18..4e06f89e8e 100644
index ae116d9804..2596cc2426 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -25,6 +25,7 @@ void hmp_info_status(Monitor *mon, const QDict *qdict);
@@ -28,6 +28,7 @@ void hmp_info_status(Monitor *mon, const QDict *qdict);
void hmp_info_uuid(Monitor *mon, const QDict *qdict);
void hmp_info_chardev(Monitor *mon, const QDict *qdict);
void hmp_info_mice(Monitor *mon, const QDict *qdict);
@@ -126,42 +119,44 @@ index ed2913fd18..4e06f89e8e 100644
void hmp_info_migrate(Monitor *mon, const QDict *qdict);
void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict);
void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict);
@@ -83,6 +84,10 @@ void hmp_netdev_add(Monitor *mon, const QDict *qdict);
void hmp_netdev_del(Monitor *mon, const QDict *qdict);
void hmp_getfd(Monitor *mon, const QDict *qdict);
void hmp_closefd(Monitor *mon, const QDict *qdict);
@@ -92,6 +93,8 @@ void hmp_closefd(Monitor *mon, const QDict *qdict);
void hmp_mouse_move(Monitor *mon, const QDict *qdict);
void hmp_mouse_button(Monitor *mon, const QDict *qdict);
void hmp_mouse_set(Monitor *mon, const QDict *qdict);
+void hmp_savevm_start(Monitor *mon, const QDict *qdict);
+void hmp_snapshot_drive(Monitor *mon, const QDict *qdict);
+void hmp_delete_drive_snapshot(Monitor *mon, const QDict *qdict);
+void hmp_savevm_end(Monitor *mon, const QDict *qdict);
void hmp_sendkey(Monitor *mon, const QDict *qdict);
void hmp_screendump(Monitor *mon, const QDict *qdict);
void coroutine_fn hmp_screendump(Monitor *mon, const QDict *qdict);
void hmp_chardev_add(Monitor *mon, const QDict *qdict);
diff --git a/migration/meson.build b/migration/meson.build
index 980e37865c..e62b79b60f 100644
index b00d58064d..075b013971 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -23,6 +23,7 @@ softmmu_ss.add(files(
'multifd-zlib.c',
@@ -29,6 +29,7 @@ system_ss.add(files(
'options.c',
'postcopy-ram.c',
'savevm.c',
+ 'savevm-async.c',
'socket.c',
'tls.c',
))
'threadinfo.c',
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
new file mode 100644
index 0000000000..593a619088
index 0000000000..ee8ef316d0
--- /dev/null
+++ b/migration/savevm-async.c
@@ -0,0 +1,598 @@
@@ -0,0 +1,571 @@
+#include "qemu/osdep.h"
+#include "migration/channel-savevm-async.h"
+#include "migration/migration.h"
+#include "migration/migration-stats.h"
+#include "migration/options.h"
+#include "migration/savevm.h"
+#include "migration/snapshot.h"
+#include "migration/global_state.h"
+#include "migration/ram.h"
+#include "migration/qemu-file.h"
+#include "sysemu/cpu-throttle.h"
+#include "sysemu/sysemu.h"
+#include "sysemu/runstate.h"
+#include "block/block.h"
@@ -173,15 +168,14 @@ index 0000000000..593a619088
+#include "qapi/qapi-commands-misc.h"
+#include "qapi/qapi-commands-block.h"
+#include "qemu/cutils.h"
+#include "qemu/error-report.h"
+#include "qemu/timer.h"
+#include "qemu/main-loop.h"
+#include "qemu/rcu.h"
+#include "qemu/yank.h"
+
+/* #define DEBUG_SAVEVM_STATE */
+
+/* used while emulated sync operation in progress */
+#define NOT_DONE -EINPROGRESS
+
+#ifdef DEBUG_SAVEVM_STATE
+#define DPRINTF(fmt, ...) \
+ do { printf("savevm-async: " fmt, ## __VA_ARGS__); } while (0)
@@ -210,7 +204,7 @@ index 0000000000..593a619088
+ int64_t total_time;
+ QEMUBH *finalize_bh;
+ Coroutine *co;
+ QemuCoSleepState *target_close_wait;
+ QemuCoSleep target_close_wait;
+} snap_state;
+
+static bool savevm_aborted(void)
@@ -229,24 +223,20 @@ index 0000000000..593a619088
+ info->bytes = s->bs_pos;
+ switch (s->state) {
+ case SAVE_STATE_ERROR:
+ info->has_status = true;
+ info->status = g_strdup("failed");
+ info->has_total_time = true;
+ info->total_time = s->total_time;
+ if (s->error) {
+ info->has_error = true;
+ info->error = g_strdup(error_get_pretty(s->error));
+ }
+ break;
+ case SAVE_STATE_ACTIVE:
+ info->has_status = true;
+ info->status = g_strdup("active");
+ info->has_total_time = true;
+ info->total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME)
+ - s->total_time;
+ break;
+ case SAVE_STATE_COMPLETED:
+ info->has_status = true;
+ info->status = g_strdup("completed");
+ info->has_total_time = true;
+ info->total_time = s->total_time;
@@ -268,9 +258,11 @@ index 0000000000..593a619088
+
+ if (snap_state.file) {
+ ret = qemu_fclose(snap_state.file);
+ snap_state.file = NULL;
+ }
+
+ if (snap_state.target) {
+ BlockDriverState *target_bs = blk_bs(snap_state.target);
+ if (!savevm_aborted()) {
+ /* try to truncate, but ignore errors (will fail on block devices).
+ * note1: bdrv_read() need whole blocks, so we need to round up
@@ -279,21 +271,21 @@ index 0000000000..593a619088
+ size_t size = QEMU_ALIGN_UP(snap_state.bs_pos, BDRV_SECTOR_SIZE*2);
+ blk_truncate(snap_state.target, size, false, PREALLOC_MODE_OFF, 0, NULL);
+ }
+ blk_op_unblock_all(snap_state.target, snap_state.blocker);
+ if (target_bs) {
+ bdrv_op_unblock_all(target_bs, snap_state.blocker);
+ }
+ error_free(snap_state.blocker);
+ snap_state.blocker = NULL;
+ blk_unref(snap_state.target);
+ snap_state.target = NULL;
+
+ if (snap_state.target_close_wait) {
+ qemu_co_sleep_wake(snap_state.target_close_wait);
+ }
+ qemu_co_sleep_wake(&snap_state.target_close_wait);
+ }
+
+ return ret;
+}
+
+static void save_snapshot_error(const char *fmt, ...)
+static void G_GNUC_PRINTF(1, 2) save_snapshot_error(const char *fmt, ...)
+{
+ va_list ap;
+ char *msg;
@@ -305,7 +297,7 @@ index 0000000000..593a619088
+ DPRINTF("save_snapshot_error: %s\n", msg);
+
+ if (!snap_state.error) {
+ error_set(&snap_state.error, ERROR_CLASS_GENERIC_ERROR, "%s", msg);
+ error_setg(&snap_state.error, "%s", msg);
+ }
+
+ g_free (msg);
@@ -313,64 +305,9 @@ index 0000000000..593a619088
+ snap_state.state = SAVE_STATE_ERROR;
+}
+
+static int block_state_close(void *opaque, Error **errp)
+{
+ snap_state.file = NULL;
+ return blk_flush(snap_state.target);
+}
+
+typedef struct BlkRwCo {
+ int64_t offset;
+ QEMUIOVector *qiov;
+ ssize_t ret;
+} BlkRwCo;
+
+static void coroutine_fn block_state_write_entry(void *opaque) {
+ BlkRwCo *rwco = opaque;
+ rwco->ret = blk_co_pwritev(snap_state.target, rwco->offset, rwco->qiov->size,
+ rwco->qiov, 0);
+ aio_wait_kick();
+}
+
+static ssize_t block_state_writev_buffer(void *opaque, struct iovec *iov,
+ int iovcnt, int64_t pos, Error **errp)
+{
+ QEMUIOVector qiov;
+ BlkRwCo rwco;
+
+ assert(pos == snap_state.bs_pos);
+ rwco = (BlkRwCo) {
+ .offset = pos,
+ .qiov = &qiov,
+ .ret = NOT_DONE,
+ };
+
+ qemu_iovec_init_external(&qiov, iov, iovcnt);
+
+ if (qemu_in_coroutine()) {
+ block_state_write_entry(&rwco);
+ } else {
+ Coroutine *co = qemu_coroutine_create(&block_state_write_entry, &rwco);
+ bdrv_coroutine_enter(blk_bs(snap_state.target), co);
+ BDRV_POLL_WHILE(blk_bs(snap_state.target), rwco.ret == NOT_DONE);
+ }
+ if (rwco.ret < 0) {
+ return rwco.ret;
+ }
+
+ snap_state.bs_pos += qiov.size;
+ return qiov.size;
+}
+
+static const QEMUFileOps block_file_ops = {
+ .writev_buffer = block_state_writev_buffer,
+ .close = block_state_close,
+};
+
+static void process_savevm_finalize(void *opaque)
+{
+ int ret;
+ AioContext *iohandler_ctx = iohandler_get_aio_context();
+ MigrationState *ms = migrate_get_current();
+
+ bool aborted = savevm_aborted();
@@ -387,9 +324,7 @@ index 0000000000..593a619088
+ * so move it back. It can stay in the main context and live out its live
+ * there, since we're done with it after this method ends anyway.
+ */
+ aio_context_acquire(iohandler_ctx);
+ blk_set_aio_context(snap_state.target, qemu_get_aio_context(), NULL);
+ aio_context_release(iohandler_ctx);
+
+ ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+ if (ret < 0) {
@@ -401,7 +336,7 @@ index 0000000000..593a619088
+ (void)qemu_savevm_state_complete_precopy(snap_state.file, false, false);
+ ret = qemu_file_get_error(snap_state.file);
+ if (ret < 0) {
+ save_snapshot_error("qemu_savevm_state_iterate error %d", ret);
+ save_snapshot_error("qemu_savevm_state_complete_precopy error %d", ret);
+ }
+ }
+
@@ -414,6 +349,12 @@ index 0000000000..593a619088
+ ret || aborted ? MIGRATION_STATUS_FAILED : MIGRATION_STATUS_COMPLETED);
+ ms->to_dst_file = NULL;
+
+ /*
+ * Same as in migration_iteration_finish(): saving RAM might've turned on CPU throttling for
+ * auto-converge, make sure to disable it.
+ */
+ cpu_throttle_stop();
+
+ qemu_savevm_state_cleanup();
+
+ ret = save_snapshot_cleanup();
@@ -422,8 +363,11 @@ index 0000000000..593a619088
+ } else if (snap_state.state == SAVE_STATE_ACTIVE) {
+ snap_state.state = SAVE_STATE_COMPLETED;
+ } else if (aborted) {
+ save_snapshot_error("process_savevm_cleanup: found aborted state: %d",
+ snap_state.state);
+ /*
+ * If there was an error, there's no need to set a new one here.
+ * If the snapshot was canceled, leave setting the state to
+ * qmp_savevm_end(), which is waked by save_snapshot_cleanup().
+ */
+ } else {
+ save_snapshot_error("process_savevm_cleanup: invalid state: %d",
+ snap_state.state);
@@ -455,18 +399,32 @@ index 0000000000..593a619088
+ }
+
+ while (snap_state.state == SAVE_STATE_ACTIVE) {
+ uint64_t pending_size, pend_precopy, pend_compatible, pend_postcopy;
+ uint64_t pending_size, pend_precopy, pend_postcopy;
+ uint64_t threshold = 400 * 1000;
+
+ /* pending is expected to be called without iothread lock */
+ qemu_mutex_unlock_iothread();
+ qemu_savevm_state_pending(snap_state.file, 0, &pend_precopy, &pend_compatible, &pend_postcopy);
+ qemu_mutex_lock_iothread();
+ /*
+ * pending_{estimate,exact} are expected to be called without iothread
+ * lock. Similar to what is done in migration.c, call the exact variant
+ * only once pend_precopy in the estimate is below the threshold.
+ */
+ bql_unlock();
+ qemu_savevm_state_pending_estimate(&pend_precopy, &pend_postcopy);
+ if (pend_precopy <= threshold) {
+ qemu_savevm_state_pending_exact(&pend_precopy, &pend_postcopy);
+ }
+ bql_lock();
+ pending_size = pend_precopy + pend_postcopy;
+
+ pending_size = pend_precopy + pend_compatible + pend_postcopy;
+ /*
+ * A guest reaching this cutoff is dirtying lots of RAM. It should be
+ * large enough so that the guest can't dirty this much between the
+ * check and the guest actually being stopped, but it should be small
+ * enough to avoid long downtimes for non-hibernation snapshots.
+ */
+ maxlen = blk_getlength(snap_state.target) - 100*1024*1024;
+
+ maxlen = blk_getlength(snap_state.target) - 30*1024*1024;
+
+ if (pending_size > 400000 && snap_state.bs_pos + pending_size < maxlen) {
+ /* Note that there is no progress for pend_postcopy when iterating */
+ if (pend_precopy > threshold && snap_state.bs_pos + pending_size < maxlen) {
+ ret = qemu_savevm_state_iterate(snap_state.file, false);
+ if (ret < 0) {
+ save_snapshot_error("qemu_savevm_state_iterate error %d", ret);
@@ -475,11 +433,7 @@ index 0000000000..593a619088
+ DPRINTF("savevm iterate pending size %lu ret %d\n", pending_size, ret);
+ } else {
+ qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
+ ret = global_state_store();
+ if (ret) {
+ save_snapshot_error("global_state_store error %d", ret);
+ break;
+ }
+ global_state_store();
+
+ DPRINTF("savevm iterate complete\n");
+ break;
@@ -498,19 +452,25 @@ index 0000000000..593a619088
+ * so move there now and after every flush.
+ */
+ aio_co_reschedule_self(qemu_get_aio_context());
+ for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+ bdrv_graph_co_rdlock();
+ bs = bdrv_first(&it);
+ bdrv_graph_co_rdunlock();
+ while (bs) {
+ /* target has BDRV_O_NO_FLUSH, no sense calling bdrv_flush on it */
+ if (bs == blk_bs(snap_state.target)) {
+ continue;
+ }
+
+ AioContext *bs_ctx = bdrv_get_aio_context(bs);
+ if (bs_ctx != qemu_get_aio_context()) {
+ DPRINTF("savevm: async flushing drive %s\n", bs->filename);
+ aio_co_reschedule_self(bs_ctx);
+ bdrv_flush(bs);
+ aio_co_reschedule_self(qemu_get_aio_context());
+ if (bs != blk_bs(snap_state.target)) {
+ AioContext *bs_ctx = bdrv_get_aio_context(bs);
+ if (bs_ctx != qemu_get_aio_context()) {
+ DPRINTF("savevm: async flushing drive %s\n", bs->filename);
+ aio_co_reschedule_self(bs_ctx);
+ bdrv_graph_co_rdlock();
+ bdrv_flush(bs);
+ bdrv_graph_co_rdunlock();
+ aio_co_reschedule_self(qemu_get_aio_context());
+ }
+ }
+ bdrv_graph_co_rdlock();
+ bs = bdrv_next(&it);
+ bdrv_graph_co_rdunlock();
+ }
+
+ DPRINTF("timing: async flushing took %ld ms\n",
@@ -519,28 +479,23 @@ index 0000000000..593a619088
+ qemu_bh_schedule(snap_state.finalize_bh);
+}
+
+void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
+void qmp_savevm_start(const char *statefile, Error **errp)
+{
+ Error *local_err = NULL;
+ MigrationState *ms = migrate_get_current();
+ AioContext *iohandler_ctx = iohandler_get_aio_context();
+ BlockDriverState *target_bs = NULL;
+ int ret = 0;
+
+ int bdrv_oflags = BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_NO_FLUSH;
+
+ if (snap_state.state != SAVE_STATE_DONE) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "VM snapshot already started\n");
+ error_setg(errp, "VM snapshot already started\n");
+ return;
+ }
+
+ if (migration_is_running(ms->state)) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, QERR_MIGRATION_ACTIVE);
+ return;
+ }
+
+ if (migrate_use_block()) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "Block migration and snapshots are incompatible");
+ if (migration_is_running()) {
+ error_setg(errp, "There's a migration process in progress");
+ return;
+ }
+
@@ -549,13 +504,14 @@ index 0000000000..593a619088
+ snap_state.bs_pos = 0;
+ snap_state.total_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+ snap_state.blocker = NULL;
+ snap_state.target_close_wait = (QemuCoSleep){ .to_wake = NULL };
+
+ if (snap_state.error) {
+ error_free(snap_state.error);
+ snap_state.error = NULL;
+ }
+
+ if (!has_statefile) {
+ if (!statefile) {
+ vm_stop(RUN_STATE_SAVE_VM);
+ snap_state.state = SAVE_STATE_COMPLETED;
+ return;
@@ -571,14 +527,21 @@ index 0000000000..593a619088
+ qdict_put_str(options, "driver", "raw");
+ snap_state.target = blk_new_open(statefile, NULL, options, bdrv_oflags, &local_err);
+ if (!snap_state.target) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open '%s'", statefile);
+ error_setg(errp, "failed to open '%s'", statefile);
+ goto restart;
+ }
+ target_bs = blk_bs(snap_state.target);
+ if (!target_bs) {
+ error_setg(errp, "failed to open '%s' - no block driver state", statefile);
+ goto restart;
+ }
+
+ snap_state.file = qemu_fopen_ops(&snap_state, &block_file_ops);
+ QIOChannel *ioc = QIO_CHANNEL(qio_channel_savevm_async_new(snap_state.target,
+ &snap_state.bs_pos));
+ snap_state.file = qemu_file_new_output(ioc);
+
+ if (!snap_state.file) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open '%s'", statefile);
+ error_setg(errp, "failed to open '%s'", statefile);
+ goto restart;
+ }
+
@@ -587,26 +550,36 @@ index 0000000000..593a619088
+ * State is cleared in process_savevm_co, but has to be initialized
+ * here (blocking main thread, from QMP) to avoid race conditions.
+ */
+ migrate_init(ms);
+ memset(&ram_counters, 0, sizeof(ram_counters));
+ if (migrate_init(ms, errp)) {
+ return;
+ }
+ memset(&mig_stats, 0, sizeof(mig_stats));
+ ms->to_dst_file = snap_state.file;
+
+ error_setg(&snap_state.blocker, "block device is in use by savevm");
+ blk_op_block_all(snap_state.target, snap_state.blocker);
+ bdrv_op_block_all(target_bs, snap_state.blocker);
+
+ snap_state.state = SAVE_STATE_ACTIVE;
+ snap_state.finalize_bh = qemu_bh_new(process_savevm_finalize, &snap_state);
+ snap_state.co = qemu_coroutine_create(&process_savevm_co, NULL);
+ qemu_mutex_unlock_iothread();
+ qemu_savevm_state_header(snap_state.file);
+ qemu_savevm_state_setup(snap_state.file);
+ qemu_mutex_lock_iothread();
+ ret = qemu_savevm_state_setup(snap_state.file, &local_err);
+ if (ret != 0) {
+ error_setg_errno(errp, -ret, "savevm state setup failed: %s",
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ return;
+ }
+
+ /* Async processing from here on out happens in iohandler context, so let
+ * the target bdrv have its home there.
+ */
+ blk_set_aio_context(snap_state.target, iohandler_ctx, &local_err);
+ ret = blk_set_aio_context(snap_state.target, iohandler_ctx, &local_err);
+ if (ret != 0) {
+ warn_report("failed to set iohandler context for VM state target: %s %s",
+ local_err ? error_get_pretty(local_err) : "unknown error",
+ strerror(-ret));
+ }
+
+ snap_state.co = qemu_coroutine_create(&process_savevm_co, NULL);
+ aio_co_schedule(iohandler_ctx, snap_state.co);
+
+ return;
@@ -621,29 +594,10 @@ index 0000000000..593a619088
+ }
+}
+
+void coroutine_fn qmp_savevm_end(Error **errp)
+static void coroutine_fn wait_for_close_co(void *opaque)
+{
+ int64_t timeout;
+
+ if (snap_state.state == SAVE_STATE_DONE) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "VM snapshot not started\n");
+ return;
+ }
+
+ if (snap_state.state == SAVE_STATE_ACTIVE) {
+ snap_state.state = SAVE_STATE_CANCELLED;
+ goto wait_for_close;
+ }
+
+ if (snap_state.saved_vm_running) {
+ vm_start();
+ snap_state.saved_vm_running = false;
+ }
+
+ snap_state.state = SAVE_STATE_DONE;
+
+wait_for_close:
+ if (!snap_state.target) {
+ DPRINTF("savevm-end: no target file open\n");
+ return;
@@ -653,9 +607,8 @@ index 0000000000..593a619088
+ * call exits the statefile will be closed and can be removed immediately */
+ DPRINTF("savevm-end: waiting for cleanup\n");
+ timeout = 30L * 1000 * 1000 * 1000;
+ qemu_co_sleep_ns_wakeable(QEMU_CLOCK_REALTIME, timeout,
+ &snap_state.target_close_wait);
+ snap_state.target_close_wait = NULL;
+ qemu_co_sleep_ns_wakeable(&snap_state.target_close_wait,
+ QEMU_CLOCK_REALTIME, timeout);
+ if (snap_state.target) {
+ save_snapshot_error("timeout waiting for target file close in "
+ "qmp_savevm_end");
@@ -664,67 +617,72 @@ index 0000000000..593a619088
+ return;
+ }
+
+ // File closed and no other error, so ensure next snapshot can be started.
+ if (snap_state.state != SAVE_STATE_ERROR) {
+ snap_state.state = SAVE_STATE_DONE;
+ }
+
+ DPRINTF("savevm-end: cleanup done\n");
+}
+
+// FIXME: Deprecated
+void qmp_snapshot_drive(const char *device, const char *name, Error **errp)
+void qmp_savevm_end(Error **errp)
+{
+ // Compatibility to older qemu-server.
+ qmp_blockdev_snapshot_internal_sync(device, name, errp);
+}
+
+// FIXME: Deprecated
+void qmp_delete_drive_snapshot(const char *device, const char *name,
+ Error **errp)
+{
+ // Compatibility to older qemu-server.
+ (void)qmp_blockdev_snapshot_delete_internal_sync(device, false, NULL,
+ true, name, errp);
+}
+
+static ssize_t loadstate_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
+ size_t size, Error **errp)
+{
+ BlockBackend *be = opaque;
+ int64_t maxlen = blk_getlength(be);
+ if (pos > maxlen) {
+ return -EIO;
+ if (snap_state.state == SAVE_STATE_DONE) {
+ error_setg(errp, "VM snapshot not started\n");
+ return;
+ }
+ if ((pos + size) > maxlen) {
+ size = maxlen - pos - 1;
+ }
+ if (size == 0) {
+ return 0;
+ }
+ return blk_pread(be, pos, buf, size);
+}
+
+static const QEMUFileOps loadstate_file_ops = {
+ .get_buffer = loadstate_get_buffer,
+};
+ Coroutine *wait_for_close = qemu_coroutine_create(wait_for_close_co, NULL);
+
+ if (snap_state.state == SAVE_STATE_ACTIVE) {
+ snap_state.state = SAVE_STATE_CANCELLED;
+ qemu_coroutine_enter(wait_for_close);
+ return;
+ }
+
+ if (snap_state.saved_vm_running) {
+ vm_start();
+ snap_state.saved_vm_running = false;
+ }
+
+ snap_state.state = SAVE_STATE_DONE;
+
+ qemu_coroutine_enter(wait_for_close);
+}
+
+int load_snapshot_from_blockdev(const char *filename, Error **errp)
+{
+ BlockBackend *be;
+ BlockDriverState *bs = NULL;
+ Error *local_err = NULL;
+ Error *blocker = NULL;
+ QDict *options;
+
+ QEMUFile *f;
+ size_t bs_pos = 0;
+ int ret = -EINVAL;
+
+ be = blk_new_open(filename, NULL, NULL, 0, &local_err);
+ options = qdict_new();
+ qdict_put_str(options, "driver", "raw");
+
+ be = blk_new_open(filename, NULL, options, 0, &local_err);
+
+ if (!be) {
+ error_setg(errp, "Could not open VM state file");
+ goto the_end;
+ }
+
+ bs = blk_bs(be);
+ if (!bs) {
+ error_setg(errp, "Could not open VM state file - missing block driver state");
+ goto the_end;
+ }
+
+ error_setg(&blocker, "block device is in use by load state");
+ blk_op_block_all(be, blocker);
+ bdrv_op_block_all(bs, blocker);
+
+ /* restore the VM state */
+ f = qemu_fopen_ops(be, &loadstate_file_ops);
+ f = qemu_file_new_input(QIO_CHANNEL(qio_channel_savevm_async_new(be, &bs_pos)));
+ if (!f) {
+ error_setg(errp, "Could not open VM state file");
+ goto the_end;
@@ -737,6 +695,10 @@ index 0000000000..593a619088
+ dirty_bitmap_mig_before_vm_start();
+
+ qemu_fclose(f);
+
+ /* state_destroy assumes a real migration which would have added a yank */
+ yank_register_instance(MIGRATION_YANK_INSTANCE, &error_abort);
+
+ migration_incoming_state_destroy();
+ if (ret < 0) {
+ error_setg_errno(errp, -ret, "Error while loading VM state");
@@ -747,46 +709,37 @@ index 0000000000..593a619088
+
+ the_end:
+ if (be) {
+ blk_op_unblock_all(be, blocker);
+ if (bs) {
+ bdrv_op_unblock_all(bs, blocker);
+ }
+ error_free(blocker);
+ blk_unref(be);
+ }
+ return ret;
+}
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 705f08a8f1..77ab152aab 100644
index f601d06ab8..874084565f 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1949,6 +1949,63 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
hmp_handle_error(mon, err);
@@ -24,6 +24,7 @@
#include "qapi/error.h"
#include "qapi/qapi-commands-control.h"
#include "qapi/qapi-commands-machine.h"
+#include "qapi/qapi-commands-migration.h"
#include "qapi/qapi-commands-misc.h"
#include "qapi/qmp/qdict.h"
#include "qemu/cutils.h"
@@ -434,3 +435,40 @@ void hmp_dumpdtb(Monitor *mon, const QDict *qdict)
monitor_printf(mon, "dtb dumped to %s", filename);
}
#endif
+
+void hmp_savevm_start(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+ const char *statefile = qdict_get_try_str(qdict, "statefile");
+
+ qmp_savevm_start(statefile != NULL, statefile, &errp);
+ hmp_handle_error(mon, errp);
+}
+
+void hmp_snapshot_drive(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+ const char *name = qdict_get_str(qdict, "name");
+ const char *device = qdict_get_str(qdict, "device");
+
+ qmp_snapshot_drive(device, name, &errp);
+ hmp_handle_error(mon, errp);
+}
+
+void hmp_delete_drive_snapshot(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+ const char *name = qdict_get_str(qdict, "name");
+ const char *device = qdict_get_str(qdict, "device");
+
+ qmp_delete_drive_snapshot(device, name, &errp);
+ qmp_savevm_start(statefile, &errp);
+ hmp_handle_error(mon, errp);
+}
+
@@ -803,7 +756,7 @@ index 705f08a8f1..77ab152aab 100644
+ SaveVMInfo *info;
+ info = qmp_query_savevm(NULL);
+
+ if (info->has_status) {
+ if (info->status) {
+ monitor_printf(mon, "savevm status: %s\n", info->status);
+ monitor_printf(mon, "total time: %" PRIu64 " milliseconds\n",
+ info->total_time);
@@ -813,21 +766,17 @@ index 705f08a8f1..77ab152aab 100644
+ if (info->has_bytes) {
+ monitor_printf(mon, "Bytes saved: %"PRIu64"\n", info->bytes);
+ }
+ if (info->has_error) {
+ if (info->error) {
+ monitor_printf(mon, "Error: %s\n", info->error);
+ }
+}
+
void hmp_info_iothreads(Monitor *mon, const QDict *qdict)
{
IOThreadInfoList *info_list = qmp_query_iothreads(NULL);
diff --git a/qapi/migration.json b/qapi/migration.json
index 3c75820527..cb3627884c 100644
index a605dc26db..927b1e1c7d 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -242,6 +242,40 @@
'*compression': 'CompressionStats',
'*socket-address': ['SocketAddress'] } }
@@ -276,6 +276,40 @@
'*dirty-limit-throttle-time-per-round': 'uint64',
'*dirty-limit-ring-full-time': 'uint64'} }
+##
+# @SaveVMInfo:
@@ -867,10 +816,10 @@ index 3c75820527..cb3627884c 100644
# @query-migrate:
#
diff --git a/qapi/misc.json b/qapi/misc.json
index 40df513856..4f5333d960 100644
index 559b66f201..7959e89c1e 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -476,6 +476,38 @@
@@ -454,6 +454,24 @@
##
{ 'command': 'query-fdsets', 'returns': ['FdsetInfo'] }
@@ -879,41 +828,27 @@ index 40df513856..4f5333d960 100644
+#
+# Prepare for snapshot and halt VM. Save VM state to statefile.
+#
+# @statefile: target file that state should be written to.
+#
+##
+{ 'command': 'savevm-start', 'data': { '*statefile': 'str' } }
+
+##
+# @snapshot-drive:
+#
+# Create an internal drive snapshot.
+#
+##
+{ 'command': 'snapshot-drive', 'data': { 'device': 'str', 'name': 'str' } }
+
+##
+# @delete-drive-snapshot:
+#
+# Delete a drive snapshot.
+#
+##
+{ 'command': 'delete-drive-snapshot', 'data': { 'device': 'str', 'name': 'str' } }
+
+##
+# @savevm-end:
+#
+# Resume VM after a snapshot.
+#
+##
+{ 'command': 'savevm-end', 'coroutine': true }
+{ 'command': 'savevm-end' }
+
##
# @CommandLineParameterType:
#
diff --git a/qemu-options.hx b/qemu-options.hx
index 104632ea34..c1352312c2 100644
index dacc9790a4..c05f411599 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3903,6 +3903,18 @@ SRST
@@ -4764,6 +4764,18 @@ SRST
Start right away with a saved state (``loadvm`` in monitor)
ERST
@@ -932,32 +867,22 @@ index 104632ea34..c1352312c2 100644
#ifndef _WIN32
DEF("daemonize", 0, QEMU_OPTION_daemonize, \
"-daemonize daemonize QEMU after initializing\n", QEMU_ARCH_ALL)
diff --git a/softmmu/vl.c b/softmmu/vl.c
index e6e0ad5a92..03152c816c 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2878,6 +2878,7 @@ void qemu_init(int argc, char **argv, char **envp)
int optind;
const char *optarg;
const char *loadvm = NULL;
+ const char *loadstate = NULL;
MachineClass *machine_class;
const char *cpu_option;
const char *vga_model = NULL;
@@ -3439,6 +3440,9 @@ void qemu_init(int argc, char **argv, char **envp)
case QEMU_OPTION_loadvm:
loadvm = optarg;
break;
+ case QEMU_OPTION_loadstate:
+ loadstate = optarg;
+ break;
case QEMU_OPTION_full_screen:
dpy.has_full_screen = true;
dpy.full_screen = true;
@@ -4478,6 +4482,12 @@ void qemu_init(int argc, char **argv, char **envp)
autostart = 0;
exit(1);
}
diff --git a/system/vl.c b/system/vl.c
index 2f855d83fb..39d451bb41 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -164,6 +164,7 @@ static const char *accelerators;
static bool have_custom_ram_size;
static const char *ram_memdev_id;
static QDict *machine_opts_dict;
+static const char *loadstate;
static QTAILQ_HEAD(, ObjectOption) object_opts = QTAILQ_HEAD_INITIALIZER(object_opts);
static QTAILQ_HEAD(, DeviceOption) device_opts = QTAILQ_HEAD_INITIALIZER(device_opts);
static int display_remote;
@@ -2725,6 +2726,12 @@ void qmp_x_exit_preconfig(Error **errp)
RunState state = autostart ? RUN_STATE_RUNNING : runstate_get();
load_snapshot(loadvm, NULL, false, NULL, &error_fatal);
load_snapshot_resume(state);
+ } else if (loadstate) {
+ Error *local_err = NULL;
+ if (load_snapshot_from_blockdev(loadstate, &local_err) < 0) {
@@ -967,3 +892,13 @@ index e6e0ad5a92..03152c816c 100644
}
if (replay_mode != REPLAY_MODE_NONE) {
replay_vmstate_init();
@@ -3262,6 +3269,9 @@ void qemu_init(int argc, char **argv)
case QEMU_OPTION_loadvm:
loadvm = optarg;
break;
+ case QEMU_OPTION_loadstate:
+ loadstate = optarg;
+ break;
case QEMU_OPTION_full_screen:
dpy.has_full_screen = true;
dpy.full_screen = true;

View File

@@ -9,19 +9,22 @@ increase performance storing the state onto ceph.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[increase max IOV count in QEMUFile to actually write more data]
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: adapt to removal of QEMUFileOps]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
migration/qemu-file.c | 38 +++++++++++++++++++++++++-------------
migration/qemu-file.h | 1 +
migration/savevm-async.c | 4 ++--
3 files changed, 28 insertions(+), 15 deletions(-)
migration/qemu-file.c | 48 +++++++++++++++++++++++++++-------------
migration/qemu-file.h | 2 ++
migration/savevm-async.c | 5 +++--
3 files changed, 38 insertions(+), 17 deletions(-)
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index be21518c57..1926b5202c 100644
index b6d2f588bd..754dc0b3f7 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -30,8 +30,8 @@
#include "trace.h"
#include "qapi/error.h"
@@ -34,8 +34,8 @@
#include "rdma.h"
#include "io/channel-file.h"
-#define IO_BUF_SIZE 32768
-#define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
@@ -29,9 +32,9 @@ index be21518c57..1926b5202c 100644
+#define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 256)
struct QEMUFile {
const QEMUFileOps *ops;
@@ -45,7 +45,8 @@ struct QEMUFile {
when reading */
QIOChannel *ioc;
@@ -43,7 +43,8 @@ struct QEMUFile {
int buf_index;
int buf_size; /* 0 when writing */
- uint8_t buf[IO_BUF_SIZE];
@@ -40,53 +43,77 @@ index be21518c57..1926b5202c 100644
DECLARE_BITMAP(may_free, MAX_IOV_SIZE);
struct iovec iov[MAX_IOV_SIZE];
@@ -101,7 +102,7 @@ bool qemu_file_mode_is_not_valid(const char *mode)
return false;
@@ -100,7 +101,9 @@ int qemu_file_shutdown(QEMUFile *f)
return 0;
}
-QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
+QEMUFile *qemu_fopen_ops_sized(void *opaque, const QEMUFileOps *ops, size_t buffer_size)
-static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
+static QEMUFile *qemu_file_new_impl(QIOChannel *ioc,
+ bool is_writable,
+ size_t buffer_size)
{
QEMUFile *f;
@@ -109,9 +110,17 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
f->opaque = opaque;
f->ops = ops;
@@ -109,6 +112,8 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
object_ref(ioc);
f->ioc = ioc;
f->is_writable = is_writable;
+ f->buf_allocated_size = buffer_size;
+ f->buf = malloc(buffer_size);
+
return f;
}
@@ -119,17 +124,27 @@ static QEMUFile *qemu_file_new_impl(QIOChannel *ioc, bool is_writable)
*/
QEMUFile *qemu_file_get_return_path(QEMUFile *f)
{
- return qemu_file_new_impl(f->ioc, !f->is_writable);
+ return qemu_file_new_impl(f->ioc, !f->is_writable, DEFAULT_IO_BUF_SIZE);
}
+QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
+{
+ return qemu_fopen_ops_sized(opaque, ops, DEFAULT_IO_BUF_SIZE);
QEMUFile *qemu_file_new_output(QIOChannel *ioc)
{
- return qemu_file_new_impl(ioc, true);
+ return qemu_file_new_impl(ioc, true, DEFAULT_IO_BUF_SIZE);
+}
+
+QEMUFile *qemu_file_new_output_sized(QIOChannel *ioc, size_t buffer_size)
+{
+ return qemu_file_new_impl(ioc, true, buffer_size);
}
void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
QEMUFile *qemu_file_new_input(QIOChannel *ioc)
{
@@ -346,7 +355,7 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
}
- return qemu_file_new_impl(ioc, false);
+ return qemu_file_new_impl(ioc, false, DEFAULT_IO_BUF_SIZE);
+}
+
+QEMUFile *qemu_file_new_input_sized(QIOChannel *ioc, size_t buffer_size)
+{
+ return qemu_file_new_impl(ioc, false, buffer_size);
}
len = f->ops->get_buffer(f->opaque, f->buf + pending, f->pos,
- IO_BUF_SIZE - pending, &local_error);
+ f->buf_allocated_size - pending, &local_error);
if (len > 0) {
f->buf_size += len;
f->pos += len;
@@ -386,6 +395,9 @@ int qemu_fclose(QEMUFile *f)
ret = ret2;
}
/*
@@ -327,7 +342,7 @@ static ssize_t coroutine_mixed_fn qemu_fill_buffer(QEMUFile *f)
do {
len = qio_channel_read(f->ioc,
(char *)f->buf + pending,
- IO_BUF_SIZE - pending,
+ f->buf_allocated_size - pending,
&local_error);
if (len == QIO_CHANNEL_ERR_BLOCK) {
if (qemu_in_coroutine()) {
@@ -367,6 +382,9 @@ int qemu_fclose(QEMUFile *f)
ret = ret2;
}
g_clear_pointer(&f->ioc, object_unref);
+
+ free(f->buf);
+
/* If any error was spotted before closing, we should report it
* instead of the close() return value.
*/
@@ -435,7 +447,7 @@ static void add_buf_to_iovec(QEMUFile *f, size_t len)
error_free(f->last_error_obj);
g_free(f);
trace_qemu_file_fclose();
@@ -415,7 +433,7 @@ static void add_buf_to_iovec(QEMUFile *f, size_t len)
{
if (!add_to_iovec(f, f->buf + f->buf_index, len, false)) {
f->buf_index += len;
@@ -95,7 +122,7 @@ index be21518c57..1926b5202c 100644
qemu_fflush(f);
}
}
@@ -461,7 +473,7 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size)
@@ -440,7 +458,7 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size)
}
while (size > 0) {
@@ -104,7 +131,7 @@ index be21518c57..1926b5202c 100644
if (l > size) {
l = size;
}
@@ -508,8 +520,8 @@ size_t qemu_peek_buffer(QEMUFile *f, uint8_t **buf, size_t size, size_t offset)
@@ -586,8 +604,8 @@ size_t coroutine_mixed_fn qemu_peek_buffer(QEMUFile *f, uint8_t **buf, size_t si
size_t index;
assert(!qemu_file_is_writable(f));
@@ -115,7 +142,7 @@ index be21518c57..1926b5202c 100644
/* The 1st byte to read from */
index = f->buf_index + offset;
@@ -559,7 +571,7 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
@@ -637,7 +655,7 @@ size_t coroutine_mixed_fn qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size
size_t res;
uint8_t *src;
@@ -124,16 +151,16 @@ index be21518c57..1926b5202c 100644
if (res == 0) {
return done;
}
@@ -593,7 +605,7 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
@@ -671,7 +689,7 @@ size_t coroutine_mixed_fn qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size
*/
size_t qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size)
size_t coroutine_mixed_fn qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size)
{
- if (size < IO_BUF_SIZE) {
+ if (size < f->buf_allocated_size) {
size_t res;
uint8_t *src;
uint8_t *src = NULL;
@@ -618,7 +630,7 @@ int qemu_peek_byte(QEMUFile *f, int offset)
@@ -696,7 +714,7 @@ int coroutine_mixed_fn qemu_peek_byte(QEMUFile *f, int offset)
int index = f->buf_index + offset;
assert(!qemu_file_is_writable(f));
@@ -142,46 +169,40 @@ index be21518c57..1926b5202c 100644
if (index >= f->buf_size) {
qemu_fill_buffer(f);
@@ -770,7 +782,7 @@ static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
const uint8_t *p, size_t size)
{
- ssize_t blen = IO_BUF_SIZE - f->buf_index - sizeof(int32_t);
+ ssize_t blen = f->buf_allocated_size - f->buf_index - sizeof(int32_t);
if (blen < compressBound(size)) {
return -1;
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index a9b6d6ccb7..8752d27c74 100644
index 11c2120edd..edf3c5d147 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -120,6 +120,7 @@ typedef struct QEMUFileHooks {
} QEMUFileHooks;
@@ -30,7 +30,9 @@
#include "io/channel.h"
QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
+QEMUFile *qemu_fopen_ops_sized(void *opaque, const QEMUFileOps *ops, size_t buffer_size);
void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
int qemu_get_fd(QEMUFile *f);
QEMUFile *qemu_file_new_input(QIOChannel *ioc);
+QEMUFile *qemu_file_new_input_sized(QIOChannel *ioc, size_t buffer_size);
QEMUFile *qemu_file_new_output(QIOChannel *ioc);
+QEMUFile *qemu_file_new_output_sized(QIOChannel *ioc, size_t buffer_size);
int qemu_fclose(QEMUFile *f);
/*
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index 593a619088..cc2552d977 100644
index ee8ef316d0..1e79fce9ba 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -418,7 +418,7 @@ void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
goto restart;
}
@@ -390,7 +390,7 @@ void qmp_savevm_start(const char *statefile, Error **errp)
- snap_state.file = qemu_fopen_ops(&snap_state, &block_file_ops);
+ snap_state.file = qemu_fopen_ops_sized(&snap_state, &block_file_ops, 4 * 1024 * 1024);
QIOChannel *ioc = QIO_CHANNEL(qio_channel_savevm_async_new(snap_state.target,
&snap_state.bs_pos));
- snap_state.file = qemu_file_new_output(ioc);
+ snap_state.file = qemu_file_new_output_sized(ioc, 4 * 1024 * 1024);
if (!snap_state.file) {
error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open '%s'", statefile);
@@ -567,7 +567,7 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
blk_op_block_all(be, blocker);
error_setg(errp, "failed to open '%s'", statefile);
@@ -534,7 +534,8 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
bdrv_op_block_all(bs, blocker);
/* restore the VM state */
- f = qemu_fopen_ops(be, &loadstate_file_ops);
+ f = qemu_fopen_ops_sized(be, &loadstate_file_ops, 4 * 1024 * 1024);
- f = qemu_file_new_input(QIO_CHANNEL(qio_channel_savevm_async_new(be, &bs_pos)));
+ f = qemu_file_new_input_sized(QIO_CHANNEL(qio_channel_savevm_async_new(be, &bs_pos)),
+ 4 * 1024 * 1024);
if (!f) {
error_setg(errp, "Could not open VM state file");
goto the_end;

View File

@@ -4,30 +4,34 @@ Date: Mon, 6 Apr 2020 12:16:47 +0200
Subject: [PATCH] PVE: block: add the zeroinit block driver filter
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: adapt to changed function signatures
adhere to block graph lock requirements
use dedicated function to open file child]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/meson.build | 1 +
block/zeroinit.c | 196 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 197 insertions(+)
block/zeroinit.c | 207 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 208 insertions(+)
create mode 100644 block/zeroinit.c
diff --git a/block/meson.build b/block/meson.build
index 5dcc1e5cce..c10d544864 100644
index f1262ec2ba..6a60b5d6b9 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -39,6 +39,7 @@ block_ss.add(files(
'vmdk.c',
'vpc.c',
'throttle.c',
'throttle-groups.c',
'write-threshold.c',
+ 'zeroinit.c',
), zstd, zlib)
softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
system_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
diff --git a/block/zeroinit.c b/block/zeroinit.c
new file mode 100644
index 0000000000..5529627f7e
index 0000000000..2b2b194ccf
--- /dev/null
+++ b/block/zeroinit.c
@@ -0,0 +1,196 @@
@@ -0,0 +1,207 @@
+/*
+ * Filter to fake a zero-initialized block device.
+ *
@@ -41,6 +45,8 @@ index 0000000000..5529627f7e
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/block-io.h"
+#include "block/graph-lock.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qstring.h"
+#include "qemu/cutils.h"
@@ -106,10 +112,9 @@ index 0000000000..5529627f7e
+ }
+
+ /* Open the raw file */
+ bs->file = bdrv_open_child(qemu_opt_get(opts, "x-next"), options, "next",
+ bs, &child_of_bds, BDRV_CHILD_FILTERED, false, &local_err);
+ if (local_err) {
+ ret = -EINVAL;
+ ret = bdrv_open_file_child(qemu_opt_get(opts, "x-next"), options, "next",
+ bs, &local_err);
+ if (ret < 0) {
+ error_propagate(errp, local_err);
+ goto fail;
+ }
@@ -120,7 +125,9 @@ index 0000000000..5529627f7e
+ ret = 0;
+fail:
+ if (ret < 0) {
+ bdrv_graph_wrlock();
+ bdrv_unref_child(bs, bs->file);
+ bdrv_graph_wrunlock();
+ }
+ qemu_opts_del(opts);
+ return ret;
@@ -132,28 +139,32 @@ index 0000000000..5529627f7e
+ (void)s;
+}
+
+static int64_t zeroinit_getlength(BlockDriverState *bs)
+static coroutine_fn int64_t GRAPH_RDLOCK
+zeroinit_co_getlength(BlockDriverState *bs)
+{
+ return bdrv_getlength(bs->file->bs);
+ return bdrv_co_getlength(bs->file->bs);
+}
+
+static int coroutine_fn zeroinit_co_preadv(BlockDriverState *bs,
+ uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
+static int coroutine_fn GRAPH_RDLOCK
+zeroinit_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn zeroinit_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+ int count, BdrvRequestFlags flags)
+static int coroutine_fn GRAPH_RDLOCK
+zeroinit_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ BdrvRequestFlags flags)
+{
+ BDRVZeroinitState *s = bs->opaque;
+ if (offset >= s->extents)
+ return 0;
+ return bdrv_pwrite_zeroes(bs->file, offset, count, flags);
+ return bdrv_pwrite_zeroes(bs->file, offset, bytes, flags);
+}
+
+static int coroutine_fn zeroinit_co_pwritev(BlockDriverState *bs,
+ uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
+static int coroutine_fn GRAPH_RDLOCK
+zeroinit_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ BDRVZeroinitState *s = bs->opaque;
+ int64_t extents = offset + bytes;
@@ -162,33 +173,37 @@ index 0000000000..5529627f7e
+ return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+}
+
+static coroutine_fn int zeroinit_co_flush(BlockDriverState *bs)
+static coroutine_fn int GRAPH_RDLOCK
+zeroinit_co_flush(BlockDriverState *bs)
+{
+ return bdrv_co_flush(bs->file->bs);
+}
+
+static int zeroinit_has_zero_init(BlockDriverState *bs)
+static int GRAPH_RDLOCK
+zeroinit_has_zero_init(BlockDriverState *bs)
+{
+ BDRVZeroinitState *s = bs->opaque;
+ return s->has_zero_init;
+}
+
+static int coroutine_fn zeroinit_co_pdiscard(BlockDriverState *bs,
+ int64_t offset, int count)
+static int coroutine_fn GRAPH_RDLOCK
+zeroinit_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
+{
+ return bdrv_co_pdiscard(bs->file, offset, count);
+ return bdrv_co_pdiscard(bs->file, offset, bytes);
+}
+
+static int zeroinit_co_truncate(BlockDriverState *bs, int64_t offset,
+ _Bool exact, PreallocMode prealloc,
+ BdrvRequestFlags req_flags, Error **errp)
+static int GRAPH_RDLOCK
+zeroinit_co_truncate(BlockDriverState *bs, int64_t offset, _Bool exact,
+ PreallocMode prealloc, BdrvRequestFlags req_flags,
+ Error **errp)
+{
+ return bdrv_co_truncate(bs->file, offset, exact, prealloc, req_flags, errp);
+}
+
+static int zeroinit_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+static coroutine_fn int GRAPH_RDLOCK
+zeroinit_co_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+ return bdrv_get_info(bs->file->bs, bdi);
+ return bdrv_co_get_info(bs->file->bs, bdi);
+}
+
+static BlockDriver bdrv_zeroinit = {
@@ -197,9 +212,9 @@ index 0000000000..5529627f7e
+ .instance_size = sizeof(BDRVZeroinitState),
+
+ .bdrv_parse_filename = zeroinit_parse_filename,
+ .bdrv_file_open = zeroinit_open,
+ .bdrv_open = zeroinit_open,
+ .bdrv_close = zeroinit_close,
+ .bdrv_getlength = zeroinit_getlength,
+ .bdrv_co_getlength = zeroinit_co_getlength,
+ .bdrv_child_perm = bdrv_default_perms,
+ .bdrv_co_flush_to_disk = zeroinit_co_flush,
+
@@ -215,7 +230,7 @@ index 0000000000..5529627f7e
+ .bdrv_co_pdiscard = zeroinit_co_pdiscard,
+
+ .bdrv_co_truncate = zeroinit_co_truncate,
+ .bdrv_get_info = zeroinit_get_info,
+ .bdrv_co_get_info = zeroinit_co_get_info,
+};
+
+static void bdrv_zeroinit_init(void)

View File

@@ -10,16 +10,16 @@ Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
qemu-options.hx | 3 +++
softmmu/vl.c | 8 ++++++++
system/vl.c | 8 ++++++++
2 files changed, 11 insertions(+)
diff --git a/qemu-options.hx b/qemu-options.hx
index c1352312c2..9a0cb6780e 100644
index c05f411599..0732077a0e 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -906,6 +906,9 @@ DEFHEADING()
@@ -1239,6 +1239,9 @@ legacy PC, they are not recommended for modern configurations.
DEFHEADING(Block device options:)
ERST
+DEF("id", HAS_ARG, QEMU_OPTION_id,
+ "-id n set the VMID", QEMU_ARCH_ALL)
@@ -27,21 +27,21 @@ index c1352312c2..9a0cb6780e 100644
DEF("fda", HAS_ARG, QEMU_OPTION_fda,
"-fda/-fdb file use 'file' as floppy disk 0/1 image\n", QEMU_ARCH_ALL)
DEF("fdb", HAS_ARG, QEMU_OPTION_fdb, "", QEMU_ARCH_ALL)
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 03152c816c..da204d24f0 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2866,6 +2866,7 @@ static char *find_datadir(void)
void qemu_init(int argc, char **argv, char **envp)
{
int i;
diff --git a/system/vl.c b/system/vl.c
index 39d451bb41..e7cae51f13 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -2762,6 +2762,7 @@ void qemu_init(int argc, char **argv)
MachineClass *machine_class;
bool userconfig = true;
FILE *vmstate_dump_file = NULL;
+ long vm_id;
int snapshot, linux_boot;
const char *initrd_filename;
const char *kernel_filename, *kernel_cmdline;
@@ -3557,6 +3558,13 @@ void qemu_init(int argc, char **argv, char **envp)
exit(1);
}
qemu_add_opts(&qemu_drive_opts);
qemu_add_drive_opts(&qemu_legacy_drive_opts);
@@ -3374,6 +3375,13 @@ void qemu_init(int argc, char **argv)
machine_parse_property_opt(qemu_find_opts("smp-opts"),
"smp", optarg);
break;
+ case QEMU_OPTION_id:
+ vm_id = strtol(optarg, (char **)&optarg, 10);
@@ -50,6 +50,6 @@ index 03152c816c..da204d24f0 100644
+ exit(1);
+ }
+ break;
#ifdef CONFIG_VNC
case QEMU_OPTION_vnc:
vnc_parse(optarg, &error_fatal);
break;
vnc_parse(optarg);

View File

@@ -11,10 +11,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 9 insertions(+)
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 502e94effc..590ef6ec8e 100644
index 62f3bbf203..89e0c7d995 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -278,6 +278,15 @@ static void apic_reset_common(DeviceState *dev)
@@ -263,6 +263,15 @@ static void apic_reset_common(DeviceState *dev)
info->vapic_base_update(s);
apic_init_reset(dev);

View File

@@ -8,15 +8,15 @@ Otherwise creating images on nfs/cifs can be problematic.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/file-posix.c | 61 +++++++++++++++++++++++++++++---------------
qapi/block-core.json | 3 ++-
2 files changed, 43 insertions(+), 21 deletions(-)
block/file-posix.c | 59 ++++++++++++++++++++++++++++++--------------
qapi/block-core.json | 7 +++++-
2 files changed, 46 insertions(+), 20 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index bda3e606dc..037839622e 100644
index e2ea071315..4c3dc56c8e 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2388,6 +2388,7 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
@@ -2884,6 +2884,7 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
int fd;
uint64_t perm, shared;
int result = 0;
@@ -24,7 +24,7 @@ index bda3e606dc..037839622e 100644
/* Validate options and set default values */
assert(options->driver == BLOCKDEV_DRIVER_FILE);
@@ -2428,19 +2429,22 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
@@ -2924,19 +2925,22 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
perm = BLK_PERM_WRITE | BLK_PERM_RESIZE;
shared = BLK_PERM_ALL & ~BLK_PERM_RESIZE;
@@ -59,7 +59,7 @@ index bda3e606dc..037839622e 100644
}
/* Clear the file by truncating it to 0 */
@@ -2494,13 +2498,15 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
@@ -2990,13 +2994,15 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
}
out_unlock:
@@ -82,7 +82,7 @@ index bda3e606dc..037839622e 100644
}
out_close:
@@ -2525,6 +2531,7 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
@@ -3020,6 +3026,7 @@ raw_co_create_opts(BlockDriver *drv, const char *filename,
PreallocMode prealloc;
char *buf = NULL;
Error *local_err = NULL;
@@ -90,7 +90,7 @@ index bda3e606dc..037839622e 100644
/* Skip file: protocol prefix */
strstart(filename, "file:", &filename);
@@ -2547,6 +2554,18 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
@@ -3042,6 +3049,18 @@ raw_co_create_opts(BlockDriver *drv, const char *filename,
return -EINVAL;
}
@@ -109,7 +109,7 @@ index bda3e606dc..037839622e 100644
options = (BlockdevCreateOptions) {
.driver = BLOCKDEV_DRIVER_FILE,
.u.file = {
@@ -2558,6 +2577,8 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
@@ -3053,6 +3072,8 @@ raw_co_create_opts(BlockDriver *drv, const char *filename,
.nocow = nocow,
.has_extent_size_hint = has_extent_size_hint,
.extent_size_hint = extent_size_hint,
@@ -118,20 +118,22 @@ index bda3e606dc..037839622e 100644
},
};
return raw_co_create(&options, errp);
@@ -3104,7 +3125,7 @@ static int raw_check_perm(BlockDriverState *bs, uint64_t perm, uint64_t shared,
}
/* Copy locks to the new fd */
- if (s->perm_change_fd) {
+ if (s->use_lock && s->perm_change_fd) {
ret = raw_apply_lock_bytes(NULL, s->perm_change_fd, perm, ~shared,
false, errp);
if (ret < 0) {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9db3120716..d285622589 100644
index 48ba32049f..321d1fd0e1 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4224,7 +4224,8 @@
@@ -4974,6 +4974,10 @@
# @extent-size-hint: Extent size hint to add to the image file; 0 for
# not adding an extent size hint (default: 1 MB, since 5.1)
#
+# @locking: whether to enable file locking. If set to 'auto', only
+# enable when Open File Descriptor (OFD) locking API is available
+# (default: auto).
+#
# Since: 2.12
##
{ 'struct': 'BlockdevCreateOptionsFile',
@@ -4981,7 +4985,8 @@
'size': 'size',
'*preallocation': 'PreallocMode',
'*nocow': 'bool',

View File

@@ -18,10 +18,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/monitor/qmp.c b/monitor/qmp.c
index b42f8c6af3..2e37d11bd3 100644
index eb181d5979..20fc0d20a6 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -466,8 +466,7 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
@@ -534,8 +534,7 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
qemu_chr_fe_set_echo(&mon->common.chr, true);
/* Note: we run QMP monitor in I/O thread when @chr supports that */

View File

@@ -26,10 +26,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index d0408049b5..5b38cf9356 100644
index f29fe95964..2c327fc36a 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -78,7 +78,8 @@ GlobalProperty hw_compat_4_0[] = {
@@ -180,7 +180,8 @@ GlobalProperty hw_compat_4_0[] = {
{ "virtio-vga", "edid", "false" },
{ "virtio-gpu-device", "edid", "false" },
{ "virtio-device", "use-started", "false" },

View File

@@ -10,100 +10,119 @@ Version is made available as 'pve-version' in query-machines (same as,
and only if 'is-current').
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: adapt to QAPI changes]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
hw/core/machine-qmp-cmds.c | 6 ++++++
hw/core/machine-qmp-cmds.c | 5 +++++
include/hw/boards.h | 2 ++
qapi/machine.json | 4 +++-
softmmu/vl.c | 15 ++++++++++++++-
4 files changed, 25 insertions(+), 2 deletions(-)
qapi/machine.json | 3 +++
system/vl.c | 24 ++++++++++++++++++++++++
4 files changed, 34 insertions(+)
diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 3fcb82ce2f..7868241bd5 100644
index 52a6d74820..362128842d 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -238,6 +238,12 @@ MachineInfoList *qmp_query_machines(Error **errp)
@@ -94,6 +94,11 @@ MachineInfoList *qmp_query_machines(bool has_compat_props, bool compat_props,
if (strcmp(mc->name, MACHINE_GET_CLASS(current_machine)->name) == 0) {
info->has_is_current = true;
info->is_current = true;
+
+ // PVE version string only exists for current machine
+ if (mc->pve_version) {
+ info->has_pve_version = true;
+ info->pve_version = g_strdup(mc->pve_version);
+ }
}
if (mc->default_cpu_type) {
diff --git a/include/hw/boards.h b/include/hw/boards.h
index a49e3a6b44..8e0a8c5571 100644
index 36fbb9b59d..d1741ea121 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -165,6 +165,8 @@ struct MachineClass {
@@ -268,6 +268,8 @@ struct MachineClass {
const char *desc;
const char *deprecation_reason;
+ const char *pve_version;
+
void (*init)(MachineState *state);
void (*reset)(MachineState *state);
void (*reset)(MachineState *state, ResetType type);
void (*wakeup)(MachineState *state);
diff --git a/qapi/machine.json b/qapi/machine.json
index dfc1a49d3c..32fc674042 100644
index 16366b774a..12cfd3f260 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -337,6 +337,8 @@
@@ -189,6 +189,8 @@
#
# @default-ram-id: the default ID of initial RAM memory backend (since 5.2)
# @acpi: machine type supports ACPI (since 8.0)
#
+# @pve-version: custom PVE version suffix specified as 'machine+pveN'
+#
# Since: 1.2.0
##
{ 'struct': 'MachineInfo',
@@ -344,7 +346,7 @@
'*is-default': 'bool', '*is-current': 'bool', 'cpu-max': 'int',
# @compat-props: The machine type's compatibility properties. Only
# present when query-machines argument @compat-props is true.
# (since 9.1)
@@ -205,6 +207,7 @@
'hotpluggable-cpus': 'bool', 'numa-mem-supported': 'bool',
'deprecated': 'bool', '*default-cpu-type': 'str',
- '*default-ram-id': 'str' } }
+ '*default-ram-id': 'str', '*pve-version': 'str' } }
'*default-ram-id': 'str', 'acpi': 'bool',
+ '*pve-version': 'str',
'*compat-props': { 'type': ['CompatProperty'],
'features': ['unstable'] } } }
##
# @query-machines:
diff --git a/softmmu/vl.c b/softmmu/vl.c
index da204d24f0..5b5512128e 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2325,6 +2325,8 @@ static MachineClass *machine_parse(const char *name, GSList *machines)
diff --git a/system/vl.c b/system/vl.c
index e7cae51f13..3f4916ac5a 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -1675,6 +1675,7 @@ static MachineClass *select_machine(QDict *qdict, Error **errp)
{
MachineClass *mc;
GSList *el;
+ size_t pvever_index = 0;
+ gchar *name_clean;
ERRP_GUARD();
const char *machine_type = qdict_get_try_str(qdict, "type");
+ const char *pvever = qdict_get_try_str(qdict, "pvever");
g_autoptr(GSList) machines = object_class_get_list(TYPE_MACHINE, false);
MachineClass *machine_class = NULL;
if (is_help_option(name)) {
printf("Supported machines are:\n");
@@ -2341,12 +2343,23 @@ static MachineClass *machine_parse(const char *name, GSList *machines)
exit(0);
}
- mc = find_machine(name, machines);
+ // PVE version is specified with '+' as seperator, e.g. pc-i440fx+pvever
+ pvever_index = strcspn(name, "+");
+
+ name_clean = g_strndup(name, pvever_index);
+ mc = find_machine(name_clean, machines);
+ g_free(name_clean);
+
if (!mc) {
error_report("unsupported machine type");
error_printf("Use -machine help to list supported machines\n");
exit(1);
@@ -1694,7 +1695,11 @@ static MachineClass *select_machine(QDict *qdict, Error **errp)
if (!machine_class) {
error_append_hint(errp,
"Use -machine help to list supported machines\n");
+ } else {
+ machine_class->pve_version = g_strdup(pvever);
+ qdict_del(qdict, "pvever");
}
+
+ if (pvever_index < strlen(name)) {
+ mc->pve_version = &name[pvever_index+1];
+ }
+
return mc;
return machine_class;
}
@@ -3316,12 +3321,31 @@ void qemu_init(int argc, char **argv)
case QEMU_OPTION_machine:
{
bool help;
+ size_t pvever_index, name_len;
+ const gchar *name;
+ gchar *name_clean, *pvever;
keyval_parse_into(machine_opts_dict, optarg, "type", &help, &error_fatal);
if (help) {
machine_help_func(machine_opts_dict);
exit(EXIT_SUCCESS);
}
+
+ // PVE version is specified with '+' as seperator, e.g. pc-i440fx+pvever
+ name = qdict_get_try_str(machine_opts_dict, "type");
+ if (name != NULL) {
+ name_len = strlen(name);
+ pvever_index = strcspn(name, "+");
+ if (pvever_index < name_len) {
+ name_clean = g_strndup(name, pvever_index);
+ pvever = g_strndup(name + pvever_index + 1, name_len - pvever_index - 1);
+ qdict_put_str(machine_opts_dict, "pvever", pvever);
+ qdict_put_str(machine_opts_dict, "type", name_clean);
+ g_free(name_clean);
+ g_free(pvever);
+ }
+ }
+
break;
}
case QEMU_OPTION_accel:

View File

@@ -0,0 +1,59 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Wed, 2 Mar 2022 08:35:05 +0100
Subject: [PATCH] block/backup: move bcs bitmap initialization to job creation
For backing up the state of multiple disks from the same time, a job
for each disk has to be created. It's convenient if the jobs don't
have to be started at the same time and if operation of the VM can be
resumed after job creation. This would lead to a window between job
creation and running the job, where writes can happen. But no writes
should happen between setting up the copy-before-write filter and
setting up the block copy state bitmap, because then new writes would
just pass through.
Commit 06e0a9c16405c0a4c1eca33cf286cc04c42066a2 moved initalization of
the bitmap to setting up the copy-before-write filter when sync_mode
is not MIRROR_SYNC_MODE_BITMAP. Ensure that the bitmap is initialized
upon job creation for the remaining case too, by moving the
backup_init_bcs_bitmap call to backup_job_create.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/backup.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/block/backup.c b/block/backup.c
index a1292c01ec..2e38b30d67 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -237,8 +237,8 @@ static void backup_init_bcs_bitmap(BackupBlockJob *job)
true);
} else if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
/*
- * We can't hog the coroutine to initialize this thoroughly.
- * Set a flag and resume work when we are able to yield safely.
+ * Initialization is costly here. Simply set a flag and let the
+ * backup_run coroutine resume work once it can yield safely.
*/
block_copy_set_skip_unallocated(job->bcs, true);
}
@@ -252,8 +252,6 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
int ret;
- backup_init_bcs_bitmap(s);
-
if (s->sync_mode == MIRROR_SYNC_MODE_TOP) {
int64_t offset = 0;
int64_t count;
@@ -502,6 +500,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
&error_abort);
bdrv_graph_wrunlock();
+ backup_init_bcs_bitmap(job);
+
return &job->common;
error:

View File

@@ -3,21 +3,31 @@ From: Dietmar Maurer <dietmar@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:57 +0200
Subject: [PATCH] PVE-Backup: add vma backup format code
Notes about partial restoring: skipping a certain drive is done via a
map line of the form skip=drive-scsi0. Since in PVE, most archives are
compressed and piped to vma for restore, it's not easily possible to
skip reads.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: improvements during create
allow partial restore
allow specifying disk formats for create operation]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/meson.build | 2 +
meson.build | 5 +
vma-reader.c | 857 ++++++++++++++++++++++++++++++++++++++++++++++
vma-writer.c | 790 ++++++++++++++++++++++++++++++++++++++++++
vma.c | 839 +++++++++++++++++++++++++++++++++++++++++++++
vma-reader.c | 867 ++++++++++++++++++++++++++++++++++++++++++
vma-writer.c | 816 ++++++++++++++++++++++++++++++++++++++++
vma.c | 941 ++++++++++++++++++++++++++++++++++++++++++++++
vma.h | 150 ++++++++
6 files changed, 2643 insertions(+)
6 files changed, 2781 insertions(+)
create mode 100644 vma-reader.c
create mode 100644 vma-writer.c
create mode 100644 vma.c
create mode 100644 vma.h
diff --git a/block/meson.build b/block/meson.build
index c10d544864..feffbc8623 100644
index 6a60b5d6b9..652c8cbdb7 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -42,6 +42,8 @@ block_ss.add(files(
@@ -26,38 +36,38 @@ index c10d544864..feffbc8623 100644
+block_ss.add(files('../vma-writer.c'), libuuid)
+
softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
system_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
system_ss.add(files('block-ram-registrar.c'))
block_ss.add(when: 'CONFIG_QCOW1', if_true: files('qcow.c'))
diff --git a/meson.build b/meson.build
index e3386196ba..d5b660516b 100644
index 147097c652..b9b673c271 100644
--- a/meson.build
+++ b/meson.build
@@ -725,6 +725,8 @@ keyutils = dependency('libkeyutils', required: false,
@@ -2129,6 +2129,8 @@ endif
has_gettid = cc.has_function('gettid')
+libuuid = cc.find_library('uuid', required: true)
+
# Malloc tests
# libselinux
selinux = dependency('libselinux',
required: get_option('selinux'),
@@ -4344,6 +4346,9 @@ if have_tools
dependencies: [blockdev, qemuutil, selinux],
install: true)
malloc = []
@@ -1907,6 +1909,9 @@ if have_tools
qemu_nbd = executable('qemu-nbd', files('qemu-nbd.c'),
dependencies: [blockdev, qemuutil], install: true)
+ vma = executable('vma', files('vma.c', 'vma-reader.c'),
+ dependencies: [authz, block, crypto, io, qom], install: true)
+ vma = executable('vma', files('vma.c', 'vma-reader.c') + genh,
+ dependencies: [authz, block, crypto, io, qemuutil, qom], install: true)
+
subdir('storage-daemon')
subdir('contrib/rdmacm-mux')
subdir('contrib/elf2dmp')
foreach exe: [ 'qemu-img', 'qemu-io', 'qemu-nbd', 'qemu-storage-daemon']
diff --git a/vma-reader.c b/vma-reader.c
new file mode 100644
index 0000000000..2b1d1cdab3
index 0000000000..bb65ad313c
--- /dev/null
+++ b/vma-reader.c
@@ -0,0 +1,857 @@
@@ -0,0 +1,867 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
@@ -75,11 +85,11 @@ index 0000000000..2b1d1cdab3
+#include <glib.h>
+#include <uuid/uuid.h>
+
+#include "qemu-common.h"
+#include "qemu/timer.h"
+#include "qemu/ratelimit.h"
+#include "vma.h"
+#include "block/block.h"
+#include "block/graph-lock.h"
+#include "sysemu/block-backend.h"
+
+static unsigned char zero_vma_block[VMA_BLOCK_SIZE];
@@ -89,6 +99,7 @@ index 0000000000..2b1d1cdab3
+ bool write_zeroes;
+ unsigned long *bitmap;
+ int bitmap_size;
+ bool skip;
+} VmaRestoreState;
+
+struct VmaReader {
@@ -252,6 +263,9 @@ index 0000000000..2b1d1cdab3
+ if (vmar->rstate[i].bitmap) {
+ g_free(vmar->rstate[i].bitmap);
+ }
+ if (vmar->rstate[i].target) {
+ blk_unref(vmar->rstate[i].target);
+ }
+ }
+
+ if (vmar->md5csum) {
@@ -368,7 +382,6 @@ index 0000000000..2b1d1cdab3
+ }
+
+
+ int count = 0;
+ for (i = 1; i < 256; i++) {
+ VmaDeviceInfoHeader *dih = &h->dev_info[i];
+ uint32_t devname_ptr = GUINT32_FROM_BE(dih->devname_ptr);
@@ -376,7 +389,6 @@ index 0000000000..2b1d1cdab3
+ const char *devname = get_header_str(vmar, devname_ptr);
+
+ if (size && devname) {
+ count++;
+ vmar->devinfo[i].size = size;
+ vmar->devinfo[i].devname = devname;
+
@@ -483,13 +495,14 @@ index 0000000000..2b1d1cdab3
+}
+
+static void allocate_rstate(VmaReader *vmar, guint8 dev_id,
+ BlockBackend *target, bool write_zeroes)
+ BlockBackend *target, bool write_zeroes, bool skip)
+{
+ assert(vmar);
+ assert(dev_id);
+
+ vmar->rstate[dev_id].target = target;
+ vmar->rstate[dev_id].write_zeroes = write_zeroes;
+ vmar->rstate[dev_id].skip = skip;
+
+ int64_t size = vmar->devinfo[dev_id].size;
+
@@ -504,28 +517,30 @@ index 0000000000..2b1d1cdab3
+}
+
+int vma_reader_register_bs(VmaReader *vmar, guint8 dev_id, BlockBackend *target,
+ bool write_zeroes, Error **errp)
+ bool write_zeroes, bool skip, Error **errp)
+{
+ assert(vmar);
+ assert(target != NULL);
+ assert(target != NULL || skip);
+ assert(dev_id);
+ assert(vmar->rstate[dev_id].target == NULL);
+ assert(vmar->rstate[dev_id].target == NULL && !vmar->rstate[dev_id].skip);
+
+ int64_t size = blk_getlength(target);
+ int64_t size_diff = size - vmar->devinfo[dev_id].size;
+ if (target != NULL) {
+ int64_t size = blk_getlength(target);
+ int64_t size_diff = size - vmar->devinfo[dev_id].size;
+
+ /* storage types can have different size restrictions, so it
+ * is not always possible to create an image with exact size.
+ * So we tolerate a size difference up to 4MB.
+ */
+ if ((size_diff < 0) || (size_diff > 4*1024*1024)) {
+ error_setg(errp, "vma_reader_register_bs for stream %s failed - "
+ "unexpected size %zd != %zd", vmar->devinfo[dev_id].devname,
+ size, vmar->devinfo[dev_id].size);
+ return -1;
+ /* storage types can have different size restrictions, so it
+ * is not always possible to create an image with exact size.
+ * So we tolerate a size difference up to 4MB.
+ */
+ if ((size_diff < 0) || (size_diff > 4*1024*1024)) {
+ error_setg(errp, "vma_reader_register_bs for stream %s failed - "
+ "unexpected size %zd != %zd", vmar->devinfo[dev_id].devname,
+ size, vmar->devinfo[dev_id].size);
+ return -1;
+ }
+ }
+
+ allocate_rstate(vmar, dev_id, target, write_zeroes);
+ allocate_rstate(vmar, dev_id, target, write_zeroes, skip);
+
+ return 0;
+}
@@ -583,10 +598,12 @@ index 0000000000..2b1d1cdab3
+ }
+ }
+ } else {
+ int res = blk_pwrite(target, sector_num * BDRV_SECTOR_SIZE, buf, nb_sectors * BDRV_SECTOR_SIZE, 0);
+ int res = blk_pwrite(target, sector_num * BDRV_SECTOR_SIZE, nb_sectors * BDRV_SECTOR_SIZE, buf, 0);
+ if (res < 0) {
+ bdrv_graph_rdlock_main_loop();
+ error_setg(errp, "blk_pwrite to %s failed (%d)",
+ bdrv_get_device_name(blk_bs(target)), res);
+ bdrv_graph_rdunlock_main_loop();
+ return -1;
+ }
+ }
@@ -618,19 +635,23 @@ index 0000000000..2b1d1cdab3
+ VmaRestoreState *rstate = &vmar->rstate[dev_id];
+ BlockBackend *target = NULL;
+
+ bool skip = rstate->skip;
+
+ if (dev_id != vmar->vmstate_stream) {
+ target = rstate->target;
+ if (!verify && !target) {
+ if (!verify && !target && !skip) {
+ error_setg(errp, "got wrong dev id %d", dev_id);
+ return -1;
+ }
+
+ if (vma_reader_get_bitmap(rstate, cluster_num)) {
+ error_setg(errp, "found duplicated cluster %zd for stream %s",
+ cluster_num, vmar->devinfo[dev_id].devname);
+ return -1;
+ if (!skip) {
+ if (vma_reader_get_bitmap(rstate, cluster_num)) {
+ error_setg(errp, "found duplicated cluster %zd for stream %s",
+ cluster_num, vmar->devinfo[dev_id].devname);
+ return -1;
+ }
+ vma_reader_set_bitmap(rstate, cluster_num, 1);
+ }
+ vma_reader_set_bitmap(rstate, cluster_num, 1);
+
+ max_sector = vmar->devinfo[dev_id].size/BDRV_SECTOR_SIZE;
+ } else {
@@ -676,7 +697,7 @@ index 0000000000..2b1d1cdab3
+ return -1;
+ }
+
+ if (!verify) {
+ if (!verify && !skip) {
+ int nb_sectors = end_sector - sector_num;
+ if (restore_write_data(vmar, dev_id, target, vmstate_fd,
+ buf + start, sector_num, nb_sectors,
@@ -712,7 +733,7 @@ index 0000000000..2b1d1cdab3
+ return -1;
+ }
+
+ if (!verify) {
+ if (!verify && !skip) {
+ int nb_sectors = end_sector - sector_num;
+ if (restore_write_data(vmar, dev_id, target, vmstate_fd,
+ buf + start, sector_num,
@@ -737,7 +758,7 @@ index 0000000000..2b1d1cdab3
+ vmar->partial_zero_cluster_data += zero_size;
+ }
+
+ if (rstate->write_zeroes && !verify) {
+ if (rstate->write_zeroes && !verify && !skip) {
+ if (restore_write_data(vmar, dev_id, target, vmstate_fd,
+ zero_vma_block, sector_num,
+ nb_sectors, errp) < 0) {
@@ -862,8 +883,7 @@ index 0000000000..2b1d1cdab3
+
+ int64_t cluster_num, end;
+
+ end = (vmar->devinfo[i].size + VMA_CLUSTER_SIZE - 1) /
+ VMA_CLUSTER_SIZE;
+ end = DIV_ROUND_UP(vmar->devinfo[i].size, VMA_CLUSTER_SIZE);
+
+ for (cluster_num = 0; cluster_num < end; cluster_num++) {
+ if (!vma_reader_get_bitmap(rstate, cluster_num)) {
@@ -908,7 +928,7 @@ index 0000000000..2b1d1cdab3
+
+ for (dev_id = 1; dev_id < 255; dev_id++) {
+ if (vma_reader_get_device_info(vmar, dev_id)) {
+ allocate_rstate(vmar, dev_id, NULL, false);
+ allocate_rstate(vmar, dev_id, NULL, false, false);
+ }
+ }
+
@@ -917,10 +937,10 @@ index 0000000000..2b1d1cdab3
+
diff --git a/vma-writer.c b/vma-writer.c
new file mode 100644
index 0000000000..11d8321ffd
index 0000000000..3f489092df
--- /dev/null
+++ b/vma-writer.c
@@ -0,0 +1,790 @@
@@ -0,0 +1,816 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
@@ -936,6 +956,8 @@ index 0000000000..11d8321ffd
+
+#include "qemu/osdep.h"
+#include <glib.h>
+#include <linux/magic.h>
+#include <sys/vfs.h>
+#include <uuid/uuid.h>
+
+#include "vma.h"
@@ -944,6 +966,8 @@ index 0000000000..11d8321ffd
+#include "qemu/main-loop.h"
+#include "qemu/coroutine.h"
+#include "qemu/cutils.h"
+#include "qemu/error-report.h"
+#include "qemu/memalign.h"
+
+#define DEBUG_VMA 0
+
@@ -1110,8 +1134,7 @@ index 0000000000..11d8321ffd
+ vmaw->stream_info[n].devname = g_strdup(devname);
+ vmaw->stream_info[n].size = size;
+
+ vmaw->stream_info[n].cluster_count = (size + VMA_CLUSTER_SIZE - 1) /
+ VMA_CLUSTER_SIZE;
+ vmaw->stream_info[n].cluster_count = DIV_ROUND_UP(size, VMA_CLUSTER_SIZE);
+
+ vmaw->stream_count = n;
+
@@ -1126,10 +1149,10 @@ index 0000000000..11d8321ffd
+{
+ assert(qemu_in_coroutine());
+ AioContext *ctx = qemu_get_current_aio_context();
+ aio_set_fd_handler(ctx, fd, false, NULL, (IOHandler *)qemu_coroutine_enter,
+ aio_set_fd_handler(ctx, fd, NULL, (IOHandler *)qemu_coroutine_enter, NULL,
+ NULL, qemu_coroutine_self());
+ qemu_coroutine_yield();
+ aio_set_fd_handler(ctx, fd, false, NULL, NULL, NULL, NULL);
+ aio_set_fd_handler(ctx, fd, NULL, NULL, NULL, NULL, NULL);
+}
+
+static ssize_t coroutine_fn
@@ -1178,6 +1201,23 @@ index 0000000000..11d8321ffd
+ return (done == bytes) ? bytes : -1;
+}
+
+static bool is_path_tmpfs(const char *path) {
+ struct statfs fs;
+ int ret;
+
+ do {
+ ret = statfs(path, &fs);
+ } while (ret != 0 && errno == EINTR);
+
+ if (ret != 0) {
+ warn_report("statfs call for %s failed, assuming not tmpfs - %s\n",
+ path, strerror(errno));
+ return false;
+ }
+
+ return fs.f_type == TMPFS_MAGIC;
+}
+
+VmaWriter *vma_writer_create(const char *filename, uuid_t uuid, Error **errp)
+{
+ const char *p;
@@ -1227,12 +1267,19 @@ index 0000000000..11d8321ffd
+ }
+ /* try to use O_NONBLOCK */
+ fcntl(vmaw->fd, F_SETFL, fcntl(vmaw->fd, F_GETFL)|O_NONBLOCK);
+ } else {
+ oflags = O_NONBLOCK|O_DIRECT|O_WRONLY|O_EXCL;
+ } else {
+ gchar *dirname = g_path_get_dirname(filename);
+ oflags = O_NONBLOCK|O_WRONLY|O_EXCL;
+ if (!is_path_tmpfs(dirname)) {
+ oflags |= O_DIRECT;
+ }
+ g_free(dirname);
+ vmaw->fd = qemu_create(filename, oflags, 0644, errp);
+ }
+
+ if (vmaw->fd < 0) {
+ error_free(*errp);
+ *errp = NULL;
+ error_setg(errp, "can't open file %s - %s\n", filename,
+ g_strerror(errno));
+ goto err;
@@ -1467,17 +1514,16 @@ index 0000000000..11d8321ffd
+ int i;
+
+ g_assert(vmaw != NULL);
+ g_assert(status != NULL);
+
+ if (status) {
+ status->status = vmaw->status;
+ g_strlcpy(status->errmsg, vmaw->errmsg, sizeof(status->errmsg));
+ for (i = 0; i <= 255; i++) {
+ status->stream_info[i] = vmaw->stream_info[i];
+ }
+
+ uuid_unparse_lower(vmaw->uuid, status->uuid_str);
+ status->status = vmaw->status;
+ g_strlcpy(status->errmsg, vmaw->errmsg, sizeof(status->errmsg));
+ for (i = 0; i <= 255; i++) {
+ status->stream_info[i] = vmaw->stream_info[i];
+ }
+
+ uuid_unparse_lower(vmaw->uuid, status->uuid_str);
+
+ status->closed = vmaw->closed;
+
+ return vmaw->status;
@@ -1713,10 +1759,10 @@ index 0000000000..11d8321ffd
+}
diff --git a/vma.c b/vma.c
new file mode 100644
index 0000000000..2eea2fc281
index 0000000000..8d4b4be414
--- /dev/null
+++ b/vma.c
@@ -0,0 +1,839 @@
@@ -0,0 +1,941 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
@@ -1734,11 +1780,11 @@ index 0000000000..2eea2fc281
+#include <glib.h>
+
+#include "vma.h"
+#include "qemu-common.h"
+#include "qemu/module.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
+#include "qemu/cutils.h"
+#include "qemu/memalign.h"
+#include "qapi/qmp/qdict.h"
+#include "sysemu/block-backend.h"
+
@@ -1748,9 +1794,9 @@ index 0000000000..2eea2fc281
+ "usage: vma command [command options]\n"
+ "\n"
+ "vma list <filename>\n"
+ "vma config <filename> [-c config]\n"
+ "vma create <filename> [-c config] pathname ...\n"
+ "vma extract <filename> [-r <fifo>] <targetdir>\n"
+ "vma config <filename> [-c <config>]\n"
+ "vma create <filename> [-c <config>] [-d format=<format>:<device name>=<path> [-d ...]]\n"
+ "vma extract <filename> [-d <drive-list>] [-r <fifo>] <targetdir>\n"
+ "vma verify <filename> [-v]\n"
+ ;
+
@@ -1857,6 +1903,7 @@ index 0000000000..2eea2fc281
+ char *throttling_group;
+ char *cache;
+ bool write_zero;
+ bool skip;
+} RestoreMap;
+
+static bool try_parse_option(char **line, const char *optname, char **out, const char *inbuf) {
@@ -1894,9 +1941,10 @@ index 0000000000..2eea2fc281
+ const char *filename;
+ const char *dirname;
+ const char *readmap = NULL;
+ gchar **drive_list = NULL;
+
+ for (;;) {
+ c = getopt(argc, argv, "hvr:");
+ c = getopt(argc, argv, "hvd:r:");
+ if (c == -1) {
+ break;
+ }
@@ -1905,6 +1953,9 @@ index 0000000000..2eea2fc281
+ case 'h':
+ help();
+ break;
+ case 'd':
+ drive_list = g_strsplit(optarg, ",", 254);
+ break;
+ case 'r':
+ readmap = optarg;
+ break;
@@ -1964,76 +2015,89 @@ index 0000000000..2eea2fc281
+ char *bps = NULL;
+ char *group = NULL;
+ char *cache = NULL;
+ char *devname = NULL;
+ bool skip = false;
+ uint64_t bps_value = 0;
+ const char *path = NULL;
+ bool write_zero = true;
+
+ if (!line || line[0] == '\0' || !strcmp(line, "done\n")) {
+ break;
+ }
+ int len = strlen(line);
+ if (line[len - 1] == '\n') {
+ line[len - 1] = '\0';
+ if (len == 1) {
+ len = len - 1;
+ if (len == 0) {
+ break;
+ }
+ }
+
+ while (1) {
+ if (!try_parse_option(&line, "format", &format, inbuf) &&
+ !try_parse_option(&line, "throttling.bps", &bps, inbuf) &&
+ !try_parse_option(&line, "throttling.group", &group, inbuf) &&
+ !try_parse_option(&line, "cache", &cache, inbuf))
+ {
+ break;
+ if (strncmp(line, "skip", 4) == 0) {
+ if (len < 6 || line[4] != '=') {
+ g_error("read map failed - option 'skip' has no value ('%s')",
+ inbuf);
+ } else {
+ devname = line + 5;
+ skip = true;
+ }
+ }
+
+ uint64_t bps_value = 0;
+ if (bps) {
+ bps_value = verify_u64(bps);
+ g_free(bps);
+ }
+
+ const char *path;
+ bool write_zero;
+ if (line[0] == '0' && line[1] == ':') {
+ path = line + 2;
+ write_zero = false;
+ } else if (line[0] == '1' && line[1] == ':') {
+ path = line + 2;
+ write_zero = true;
+ } else {
+ g_error("read map failed - parse error ('%s')", inbuf);
+ while (1) {
+ if (!try_parse_option(&line, "format", &format, inbuf) &&
+ !try_parse_option(&line, "throttling.bps", &bps, inbuf) &&
+ !try_parse_option(&line, "throttling.group", &group, inbuf) &&
+ !try_parse_option(&line, "cache", &cache, inbuf))
+ {
+ break;
+ }
+ }
+
+ if (bps) {
+ bps_value = verify_u64(bps);
+ g_free(bps);
+ }
+
+ if (line[0] == '0' && line[1] == ':') {
+ path = line + 2;
+ write_zero = false;
+ } else if (line[0] == '1' && line[1] == ':') {
+ path = line + 2;
+ write_zero = true;
+ } else {
+ g_error("read map failed - parse error ('%s')", inbuf);
+ }
+
+ path = extract_devname(path, &devname, -1);
+ }
+
+ char *devname = NULL;
+ path = extract_devname(path, &devname, -1);
+ if (!devname) {
+ g_error("read map failed - no dev name specified ('%s')",
+ inbuf);
+ }
+
+ RestoreMap *map = g_new0(RestoreMap, 1);
+ map->devname = g_strdup(devname);
+ map->path = g_strdup(path);
+ map->format = format;
+ map->throttling_bps = bps_value;
+ map->throttling_group = group;
+ map->cache = cache;
+ map->write_zero = write_zero;
+ RestoreMap *restore_map = g_new0(RestoreMap, 1);
+ restore_map->devname = g_strdup(devname);
+ restore_map->path = g_strdup(path);
+ restore_map->format = format;
+ restore_map->throttling_bps = bps_value;
+ restore_map->throttling_group = group;
+ restore_map->cache = cache;
+ restore_map->write_zero = write_zero;
+ restore_map->skip = skip;
+
+ g_hash_table_insert(devmap, map->devname, map);
+ g_hash_table_insert(devmap, restore_map->devname, restore_map);
+
+ };
+ }
+
+ int i;
+ int vmstate_fd = -1;
+ guint8 vmstate_stream = 0;
+
+ BlockBackend *blk = NULL;
+ bool drive_rename_bitmap[255];
+ memset(drive_rename_bitmap, 0, sizeof(drive_rename_bitmap));
+
+ for (i = 1; i < 255; i++) {
+ VmaDeviceInfo *di = vma_reader_get_device_info(vmar, i);
+ if (di && (strcmp(di->devname, "vmstate") == 0)) {
+ vmstate_stream = i;
+ char *statefn = g_strdup_printf("%s/vmstate.bin", dirname);
+ vmstate_fd = open(statefn, O_WRONLY|O_CREAT|O_EXCL, 0644);
+ if (vmstate_fd < 0) {
@@ -2049,8 +2113,25 @@ index 0000000000..2eea2fc281
+ const char *cache = NULL;
+ int flags = BDRV_O_RDWR;
+ bool write_zero = true;
+ bool skip = false;
+
+ if (readmap) {
+ BlockBackend *blk = NULL;
+
+ if (drive_list) {
+ skip = true;
+ int j;
+ for (j = 0; drive_list[j]; j++) {
+ if (strcmp(drive_list[j], di->devname) == 0) {
+ skip = false;
+ drive_rename_bitmap[i] = true;
+ break;
+ }
+ }
+ } else {
+ drive_rename_bitmap[i] = true;
+ }
+
+ if (!skip && readmap) {
+ RestoreMap *map;
+ map = (RestoreMap *)g_hash_table_lookup(devmap, di->devname);
+ if (map == NULL) {
@@ -2062,7 +2143,8 @@ index 0000000000..2eea2fc281
+ throttling_group = map->throttling_group;
+ cache = map->cache;
+ write_zero = map->write_zero;
+ } else {
+ skip = map->skip;
+ } else if (!skip) {
+ devfn = g_strdup_printf("%s/tmp-disk-%s.raw",
+ dirname, di->devname);
+ printf("DEVINFO %s %zd\n", devfn, di->size);
@@ -2080,57 +2162,60 @@ index 0000000000..2eea2fc281
+ write_zero = false;
+ }
+
+ size_t devlen = strlen(devfn);
+ QDict *options = NULL;
+ bool writethrough;
+ if (format) {
+ /* explicit format from commandline */
+ options = qdict_new();
+ qdict_put_str(options, "driver", format);
+ } else if ((devlen > 4 && strcmp(devfn+devlen-4, ".raw") == 0) ||
+ strncmp(devfn, "/dev/", 5) == 0)
+ {
+ /* This part is now deprecated for PVE as well (just as qemu
+ * deprecated not specifying an explicit raw format, too.
+ */
+ /* explicit raw format */
+ options = qdict_new();
+ qdict_put_str(options, "driver", "raw");
+ }
+ if (cache && bdrv_parse_cache_mode(cache, &flags, &writethrough)) {
+ g_error("invalid cache option: %s\n", cache);
+ }
+
+ if (errp || !(blk = blk_new_open(devfn, NULL, options, flags, &errp))) {
+ g_error("can't open file %s - %s", devfn,
+ error_get_pretty(errp));
+ }
+
+ if (cache) {
+ blk_set_enable_write_cache(blk, !writethrough);
+ }
+
+ if (throttling_group) {
+ blk_io_limits_enable(blk, throttling_group);
+ }
+
+ if (throttling_bps) {
+ if (!throttling_group) {
+ blk_io_limits_enable(blk, devfn);
+ if (!skip) {
+ size_t devlen = strlen(devfn);
+ QDict *options = NULL;
+ bool writethrough;
+ if (format) {
+ /* explicit format from commandline */
+ options = qdict_new();
+ qdict_put_str(options, "driver", format);
+ } else if ((devlen > 4 && strcmp(devfn+devlen-4, ".raw") == 0) ||
+ strncmp(devfn, "/dev/", 5) == 0)
+ {
+ /* This part is now deprecated for PVE as well (just as qemu
+ * deprecated not specifying an explicit raw format, too.
+ */
+ /* explicit raw format */
+ options = qdict_new();
+ qdict_put_str(options, "driver", "raw");
+ }
+
+ ThrottleConfig cfg;
+ throttle_config_init(&cfg);
+ cfg.buckets[THROTTLE_BPS_WRITE].avg = throttling_bps;
+ Error *err = NULL;
+ if (!throttle_is_valid(&cfg, &err)) {
+ error_report_err(err);
+ g_error("failed to apply throttling");
+ if (cache && bdrv_parse_cache_mode(cache, &flags, &writethrough)) {
+ g_error("invalid cache option: %s\n", cache);
+ }
+
+ if (errp || !(blk = blk_new_open(devfn, NULL, options, flags, &errp))) {
+ g_error("can't open file %s - %s", devfn,
+ error_get_pretty(errp));
+ }
+
+ if (cache) {
+ blk_set_enable_write_cache(blk, !writethrough);
+ }
+
+ if (throttling_group) {
+ blk_io_limits_enable(blk, throttling_group);
+ }
+
+ if (throttling_bps) {
+ if (!throttling_group) {
+ blk_io_limits_enable(blk, devfn);
+ }
+
+ ThrottleConfig cfg;
+ throttle_config_init(&cfg);
+ cfg.buckets[THROTTLE_BPS_WRITE].avg = throttling_bps;
+ Error *err = NULL;
+ if (!throttle_is_valid(&cfg, &err)) {
+ error_report_err(err);
+ g_error("failed to apply throttling");
+ }
+ blk_set_io_limits(blk, &cfg);
+ }
+ blk_set_io_limits(blk, &cfg);
+ }
+
+ if (vma_reader_register_bs(vmar, i, blk, write_zero, &errp) < 0) {
+ if (vma_reader_register_bs(vmar, i, blk, write_zero, skip, &errp) < 0) {
+ g_error("%s", error_get_pretty(errp));
+ }
+
@@ -2140,6 +2225,10 @@ index 0000000000..2eea2fc281
+ }
+ }
+
+ if (drive_list) {
+ g_strfreev(drive_list);
+ }
+
+ if (vma_reader_restore(vmar, vmstate_fd, verbose, &errp) < 0) {
+ g_error("restore failed - %s", error_get_pretty(errp));
+ }
@@ -2147,7 +2236,7 @@ index 0000000000..2eea2fc281
+ if (!readmap) {
+ for (i = 1; i < 255; i++) {
+ VmaDeviceInfo *di = vma_reader_get_device_info(vmar, i);
+ if (di && (i != vmstate_stream)) {
+ if (di && drive_rename_bitmap[i]) {
+ char *tmpfn = g_strdup_printf("%s/tmp-disk-%s.raw",
+ dirname, di->devname);
+ char *fn = g_strdup_printf("%s/disk-%s.raw",
@@ -2162,8 +2251,6 @@ index 0000000000..2eea2fc281
+
+ vma_reader_destroy(vmar);
+
+ blk_unref(blk);
+
+ bdrv_close_all();
+
+ return ret;
@@ -2248,7 +2335,7 @@ index 0000000000..2eea2fc281
+ struct iovec iov;
+ QEMUIOVector qiov;
+
+ int64_t start, end;
+ int64_t start, end, readlen;
+ int ret = 0;
+
+ unsigned char *buf = blk_blockalign(job->target, VMA_CLUSTER_SIZE);
@@ -2262,16 +2349,24 @@ index 0000000000..2eea2fc281
+ iov.iov_len = VMA_CLUSTER_SIZE;
+ qemu_iovec_init_external(&qiov, &iov, 1);
+
+ if (start + 1 == end) {
+ memset(buf, 0, VMA_CLUSTER_SIZE);
+ readlen = job->len - start * VMA_CLUSTER_SIZE;
+ assert(readlen > 0 && readlen <= VMA_CLUSTER_SIZE);
+ } else {
+ readlen = VMA_CLUSTER_SIZE;
+ }
+
+ ret = blk_co_preadv(job->target, start * VMA_CLUSTER_SIZE,
+ VMA_CLUSTER_SIZE, &qiov, 0);
+ readlen, &qiov, 0);
+ if (ret < 0) {
+ vma_writer_set_error(job->vmaw, "read error", -1);
+ vma_writer_set_error(job->vmaw, "read error");
+ goto out;
+ }
+
+ size_t zb = 0;
+ if (vma_writer_write(job->vmaw, job->dev_id, start, buf, &zb) < 0) {
+ vma_writer_set_error(job->vmaw, "backup_dump_cb vma_writer_write failed", -1);
+ vma_writer_set_error(job->vmaw, "backup_dump_cb vma_writer_write failed");
+ goto out;
+ }
+ }
@@ -2289,13 +2384,16 @@ index 0000000000..2eea2fc281
+
+static int create_archive(int argc, char **argv)
+{
+ int i, c;
+ int c;
+ int verbose = 0;
+ bool expect_format = true;
+ const char *archivename;
+ GList *backup_coroutines = NULL;
+ GList *config_files = NULL;
+ GList *disk_infos = NULL;
+
+ for (;;) {
+ c = getopt(argc, argv, "hvc:");
+ c = getopt(argc, argv, "hvc:d:");
+ if (c == -1) {
+ break;
+ }
@@ -2307,6 +2405,9 @@ index 0000000000..2eea2fc281
+ case 'c':
+ config_files = g_list_append(config_files, optarg);
+ break;
+ case 'd':
+ disk_infos = g_list_append(disk_infos, optarg);
+ break;
+ case 'v':
+ verbose = 1;
+ break;
@@ -2352,16 +2453,48 @@ index 0000000000..2eea2fc281
+ l = g_list_next(l);
+ }
+
+ int devcount = 0;
+ /*
+ * Don't allow mixing new and old way to specifiy disks.
+ * TODO PVE 9 drop old way and always require format.
+ */
+ if (optind < argc && g_list_first(disk_infos)) {
+ unlink(archivename);
+ g_error("Unexpected extra argument - specify all devices via '-d'");
+ }
+
+ while (optind < argc) {
+ const char *path = argv[optind++];
+ expect_format = false;
+ disk_infos = g_list_append(disk_infos, argv[optind++]);
+ }
+
+ int devcount = 0;
+ GList *disk_l = disk_infos;
+ while (disk_l && disk_l->data) {
+ char *disk_info = disk_l->data;
+ const char *path = NULL;
+ char *devname = NULL;
+ path = extract_devname(path, &devname, devcount++);
+ char *format = NULL;
+ QDict *options = qdict_new();
+
+ if (try_parse_option(&disk_info, "format", &format, disk_info)) {
+ qdict_put_str(options, "driver", format);
+ } else {
+ if (expect_format) {
+ unlink(archivename);
+ g_error("No format specified for device: '%s'", disk_info);
+ } else {
+ g_warning("Specifying a device without a format is deprecated"
+ " - use '-d format=<format>:%s'",
+ disk_info);
+ }
+ }
+
+ path = extract_devname(disk_info, &devname, devcount++);
+
+ Error *errp = NULL;
+ BlockBackend *target;
+
+ target = blk_new_open(path, NULL, NULL, 0, &errp);
+ target = blk_new_open(path, NULL, options, 0, &errp);
+ if (!target) {
+ unlink(archivename);
+ g_error("bdrv_open '%s' failed - %s", path, error_get_pretty(errp));
@@ -2380,7 +2513,11 @@ index 0000000000..2eea2fc281
+ job->dev_id = dev_id;
+
+ Coroutine *co = qemu_coroutine_create(backup_run, job);
+ qemu_coroutine_enter(co);
+ // Don't enter coroutine yet, because it might write the header before
+ // all streams can be registered.
+ backup_coroutines = g_list_append(backup_coroutines, co);
+
+ disk_l = g_list_next(disk_l);
+ }
+
+ VmaStatus vmastat;
@@ -2388,6 +2525,13 @@ index 0000000000..2eea2fc281
+ int last_percent = -1;
+
+ if (devcount) {
+ GList *entry = backup_coroutines;
+ while (entry && entry->data) {
+ Coroutine *co = entry->data;
+ qemu_coroutine_enter(co);
+ entry = g_list_next(entry);
+ }
+
+ while (1) {
+ main_loop_wait(false);
+ vma_writer_get_status(vmaw, &vmastat);
@@ -2437,6 +2581,7 @@ index 0000000000..2eea2fc281
+ vma_writer_get_status(vmaw, &vmastat);
+
+ if (verbose) {
+ int i;
+ for (i = 0; i < 256; i++) {
+ VmaStreamInfo *si = &vmastat.stream_info[i];
+ if (si->size) {
@@ -2452,6 +2597,9 @@ index 0000000000..2eea2fc281
+ g_error("creating vma archive failed");
+ }
+
+ g_list_free(backup_coroutines);
+ g_list_free(config_files);
+ g_list_free(disk_infos);
+ vma_writer_destroy(vmaw);
+ return 0;
+}
@@ -2558,7 +2706,7 @@ index 0000000000..2eea2fc281
+}
diff --git a/vma.h b/vma.h
new file mode 100644
index 0000000000..c895c97f6d
index 0000000000..86d2873aa5
--- /dev/null
+++ b/vma.h
@@ -0,0 +1,150 @@
@@ -2696,7 +2844,7 @@ index 0000000000..c895c97f6d
+int coroutine_fn vma_writer_flush_output(VmaWriter *vmaw);
+
+int vma_writer_get_status(VmaWriter *vmaw, VmaStatus *status);
+void vma_writer_set_error(VmaWriter *vmaw, const char *fmt, ...);
+void vma_writer_set_error(VmaWriter *vmaw, const char *fmt, ...) G_GNUC_PRINTF(2, 3);
+
+
+VmaReader *vma_reader_create(const char *filename, Error **errp);
@@ -2706,7 +2854,7 @@ index 0000000000..c895c97f6d
+VmaDeviceInfo *vma_reader_get_device_info(VmaReader *vmar, guint8 dev_id);
+int vma_reader_register_bs(VmaReader *vmar, guint8 dev_id,
+ BlockBackend *target, bool write_zeroes,
+ Error **errp);
+ bool skip, Error **errp);
+int vma_reader_restore(VmaReader *vmar, int vmstate_fd, bool verbose,
+ Error **errp);
+int vma_reader_verify(VmaReader *vmar, bool verbose, Error **errp);

View File

@@ -7,21 +7,25 @@ Subject: [PATCH] PVE-Backup: add backup-dump block driver
- move BackupBlockJob declaration from block/backup.c to include/block/block_int.h
- block/backup.c - backup-job-create: also consider source cluster size
- job.c: make job_should_pause non-static
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: adapt to coroutine changes]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/backup-dump.c | 168 ++++++++++++++++++++++++++++++++++++++
block/backup.c | 23 ++----
block/meson.build | 1 +
include/block/block_int.h | 30 +++++++
job.c | 3 +-
5 files changed, 206 insertions(+), 19 deletions(-)
block/backup-dump.c | 172 +++++++++++++++++++++++++++++++
block/backup.c | 30 ++----
block/meson.build | 1 +
include/block/block_int-common.h | 35 +++++++
job.c | 3 +-
5 files changed, 218 insertions(+), 23 deletions(-)
create mode 100644 block/backup-dump.c
diff --git a/block/backup-dump.c b/block/backup-dump.c
new file mode 100644
index 0000000000..93d7f46950
index 0000000000..e46abf1070
--- /dev/null
+++ b/block/backup-dump.c
@@ -0,0 +1,168 @@
@@ -0,0 +1,172 @@
+/*
+ * BlockDriver to send backup data stream to a callback function
+ *
@@ -33,7 +37,8 @@ index 0000000000..93d7f46950
+ */
+
+#include "qemu/osdep.h"
+#include "qemu-common.h"
+
+#include "qapi/qmp/qdict.h"
+#include "qom/object_interfaces.h"
+#include "block/block_int.h"
+
@@ -44,7 +49,8 @@ index 0000000000..93d7f46950
+ void *dump_cb_data;
+} BDRVBackupDumpState;
+
+static int qemu_backup_dump_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+static coroutine_fn int qemu_backup_dump_co_get_info(BlockDriverState *bs,
+ BlockDriverInfo *bdi)
+{
+ BDRVBackupDumpState *s = bs->opaque;
+
@@ -85,7 +91,7 @@ index 0000000000..93d7f46950
+ /* Nothing to do. */
+}
+
+static int64_t qemu_backup_dump_getlength(BlockDriverState *bs)
+static coroutine_fn int64_t qemu_backup_dump_co_getlength(BlockDriverState *bs)
+{
+ BDRVBackupDumpState *s = bs->opaque;
+
@@ -145,8 +151,8 @@ index 0000000000..93d7f46950
+
+ .bdrv_close = qemu_backup_dump_close,
+ .bdrv_has_zero_init = bdrv_has_zero_init_1,
+ .bdrv_getlength = qemu_backup_dump_getlength,
+ .bdrv_get_info = qemu_backup_dump_get_info,
+ .bdrv_co_getlength = qemu_backup_dump_co_getlength,
+ .bdrv_co_get_info = qemu_backup_dump_co_get_info,
+
+ .bdrv_co_writev = qemu_backup_dump_co_writev,
+
@@ -165,7 +171,7 @@ index 0000000000..93d7f46950
+block_init(bdrv_backup_dump_init);
+
+
+BlockDriverState *bdrv_backup_dump_create(
+BlockDriverState *coroutine_fn bdrv_co_backup_dump_create(
+ int dump_cb_block_size,
+ uint64_t byte_size,
+ BackupDumpFunc *dump_cb,
@@ -173,9 +179,11 @@ index 0000000000..93d7f46950
+ Error **errp)
+{
+ BDRVBackupDumpState *state;
+ BlockDriverState *bs = bdrv_new_open_driver(
+ &bdrv_backup_dump_drive, NULL, BDRV_O_RDWR, errp);
+
+ QDict *options = qdict_new();
+ qdict_put_str(options, "driver", "backup-dump-drive");
+
+ BlockDriverState *bs = bdrv_co_open(NULL, NULL, options, BDRV_O_RDWR, errp);
+ if (!bs) {
+ return NULL;
+ }
@@ -191,17 +199,18 @@ index 0000000000..93d7f46950
+ return bs;
+}
diff --git a/block/backup.c b/block/backup.c
index 9afa0bf3b4..3df3d532d5 100644
index 2e38b30d67..fe69723ada 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -32,24 +32,6 @@
@@ -29,28 +29,6 @@
#define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
#include "block/copy-before-write.h"
-typedef struct BackupBlockJob {
- BlockJob common;
- BlockDriverState *backup_top;
- BlockDriverState *cbw;
- BlockDriverState *source_bs;
- BlockDriverState *target_bs;
-
- BdrvDirtyBitmap *sync_bitmap;
-
@@ -210,29 +219,35 @@ index 9afa0bf3b4..3df3d532d5 100644
- BlockdevOnError on_source_error;
- BlockdevOnError on_target_error;
- uint64_t len;
- uint64_t bytes_read;
- int64_t cluster_size;
- BackupPerf perf;
-
- BlockCopyState *bcs;
-
- bool wait;
- BlockCopyCallState *bg_bcs_call;
-} BackupBlockJob;
-
static const BlockJobDriver backup_job_driver;
static void backup_progress_bytes_callback(int64_t bytes, void *opaque)
@@ -423,6 +405,11 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
goto error;
static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
@@ -462,6 +440,14 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
}
cluster_size = block_copy_cluster_size(bcs);
+ if (cluster_size < 0) {
+ goto error;
+ }
+
+ BlockDriverInfo bdi;
+ if (bdrv_get_info(bs, &bdi) == 0) {
+ cluster_size = MAX(cluster_size, bdi.cluster_size);
+ }
+
/*
* If source is in backing chain of target assume that target is going to be
* used for "image fleecing", i.e. it should represent a kind of snapshot of
if (perf->max_chunk && perf->max_chunk < cluster_size) {
error_setg(errp, "Required max-chunk (%" PRIi64 ") is less than backup "
diff --git a/block/meson.build b/block/meson.build
index feffbc8623..2507af1168 100644
index 652c8cbdb7..e1cf5a2e65 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -4,6 +4,7 @@ block_ss.add(files(
@@ -240,20 +255,28 @@ index feffbc8623..2507af1168 100644
'amend.c',
'backup.c',
+ 'backup-dump.c',
'backup-top.c',
'blkdebug.c',
'blklogwrites.c',
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6f8eda629a..5455102da8 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -63,6 +63,36 @@
'blkverify.c',
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index ebb4e56a50..e717a74e5f 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -26,6 +26,7 @@
#include "block/aio.h"
#include "block/block-common.h"
+#include "block/block-copy.h"
#include "block/block-global-state.h"
#include "block/snapshot.h"
#include "qemu/iov.h"
@@ -60,6 +61,40 @@
#define BLOCK_PROBE_BUF_SIZE 512
+typedef int BackupDumpFunc(void *opaque, uint64_t offset, uint64_t bytes, const void *buf);
+
+BlockDriverState *bdrv_backuo_dump_create(
+BlockDriverState *coroutine_fn bdrv_co_backup_dump_create(
+ int dump_cb_block_size,
+ uint64_t byte_size,
+ BackupDumpFunc *dump_cb,
@@ -265,8 +288,9 @@ index 6f8eda629a..5455102da8 100644
+typedef struct BlockCopyState BlockCopyState;
+typedef struct BackupBlockJob {
+ BlockJob common;
+ BlockDriverState *backup_top;
+ BlockDriverState *cbw;
+ BlockDriverState *source_bs;
+ BlockDriverState *target_bs;
+
+ BdrvDirtyBitmap *sync_bitmap;
+
@@ -275,26 +299,29 @@ index 6f8eda629a..5455102da8 100644
+ BlockdevOnError on_source_error;
+ BlockdevOnError on_target_error;
+ uint64_t len;
+ uint64_t bytes_read;
+ int64_t cluster_size;
+ BackupPerf perf;
+
+ BlockCopyState *bcs;
+
+ bool wait;
+ BlockCopyCallState *bg_bcs_call;
+} BackupBlockJob;
+
enum BdrvTrackedRequestType {
BDRV_TRACKED_READ,
BDRV_TRACKED_WRITE,
diff --git a/job.c b/job.c
index 8fecf38960..f9884e7d9d 100644
index 660ce22c56..baf54c8d60 100644
--- a/job.c
+++ b/job.c
@@ -269,7 +269,8 @@ static bool job_started(Job *job)
return job->co;
@@ -331,7 +331,8 @@ static bool job_started_locked(Job *job)
}
-static bool job_should_pause(Job *job)
+bool job_should_pause(Job *job);
+bool job_should_pause(Job *job)
/* Called with job_mutex held. */
-static bool job_should_pause_locked(Job *job)
+bool job_should_pause_locked(Job *job);
+bool job_should_pause_locked(Job *job)
{
return job->pause_count > 0;
}

File diff suppressed because it is too large Load Diff

View File

@@ -4,16 +4,17 @@ Date: Thu, 20 Aug 2020 14:31:59 +0200
Subject: [PATCH] PVE: Add sequential job transaction support
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
include/qemu/job.h | 12 ++++++++++++
job.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+)
job.c | 34 ++++++++++++++++++++++++++++++++++
2 files changed, 46 insertions(+)
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 32aabb1c60..f7a6a0926a 100644
index 2b873f2576..528cd6acb9 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -280,6 +280,18 @@ typedef enum JobCreateFlags {
@@ -362,6 +362,18 @@ void job_unlock(void);
*/
JobTxn *job_txn_new(void);
@@ -33,10 +34,10 @@ index 32aabb1c60..f7a6a0926a 100644
* Release a reference that was previously acquired with job_txn_add_job or
* job_txn_new. If it's the last reference to the object, it will be freed.
diff --git a/job.c b/job.c
index f9884e7d9d..05b7797e82 100644
index baf54c8d60..3ac5e5cde2 100644
--- a/job.c
+++ b/job.c
@@ -72,6 +72,8 @@ struct JobTxn {
@@ -94,6 +94,8 @@ struct JobTxn {
/* Reference count */
int refcnt;
@@ -44,8 +45,8 @@ index f9884e7d9d..05b7797e82 100644
+ bool sequential;
};
/* Right now, this mutex is only needed to synchronize accesses to job->busy
@@ -102,6 +104,25 @@ JobTxn *job_txn_new(void)
void job_lock(void)
@@ -119,6 +121,25 @@ JobTxn *job_txn_new(void)
return txn;
}
@@ -68,20 +69,23 @@ index f9884e7d9d..05b7797e82 100644
+ job_start(first);
+}
+
static void job_txn_ref(JobTxn *txn)
/* Called with job_mutex held. */
static void job_txn_ref_locked(JobTxn *txn)
{
txn->refcnt++;
@@ -841,6 +862,9 @@ static void job_completed_txn_success(Job *job)
@@ -1042,6 +1063,12 @@ static void job_completed_txn_success_locked(Job *job)
*/
QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
if (!job_is_completed(other_job)) {
if (!job_is_completed_locked(other_job)) {
+ if (txn->sequential) {
+ job_unlock();
+ /* Needs to be called without holding the job lock */
+ job_start(other_job);
+ job_lock();
+ }
return;
}
assert(other_job->ret == 0);
@@ -1011,6 +1035,13 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
@@ -1253,6 +1280,13 @@ int job_finish_sync_locked(Job *job,
return -EBUSY;
}
@@ -89,9 +93,9 @@ index f9884e7d9d..05b7797e82 100644
+ * of cancelling, these have not begun work so job_enter won't do anything,
+ * let's ensure they are marked as ABORTING if required */
+ if (job->status == JOB_STATUS_CREATED && job->txn->sequential) {
+ job_update_rc(job);
+ job_update_rc_locked(job);
+ }
+
AIO_WAIT_WHILE(job->aio_context,
(job_enter(job), !job_is_completed(job)));
job_unlock();
AIO_WAIT_WHILE_UNLOCKED(job->aio_context,
(job_enter(job), !job_is_completed(job)));

View File

@@ -1,452 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Mon, 29 Jun 2020 11:06:03 +0200
Subject: [PATCH] PVE-Backup: Add dirty-bitmap tracking for incremental backups
Uses QEMU's existing MIRROR_SYNC_MODE_BITMAP and a dirty-bitmap on top
of all backed-up drives. This will only execute the data-write callback
for any changed chunks, the PBS rust code will reuse chunks from the
previous index for everything it doesn't receive if reuse_index is true.
On error or cancellation, remove all dirty bitmaps to ensure
consistency.
Add PBS/incremental specific information to query backup info QMP and
HMP commands.
Only supported for PBS backups.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 1 +
monitor/hmp-cmds.c | 45 ++++++++++----
proxmox-backup-client.c | 3 +-
proxmox-backup-client.h | 1 +
pve-backup.c | 103 ++++++++++++++++++++++++++++++---
qapi/block-core.json | 12 +++-
6 files changed, 142 insertions(+), 23 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 9ba7c774a2..056d14deee 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1039,6 +1039,7 @@ void hmp_backup(Monitor *mon, const QDict *qdict)
false, NULL, // PBS fingerprint
false, NULL, // PBS backup-id
false, 0, // PBS backup-time
+ false, false, // PBS incremental
true, dir ? BACKUP_FORMAT_DIR : BACKUP_FORMAT_VMA,
false, NULL, false, NULL, !!devlist,
devlist, qdict_haskey(qdict, "speed"), speed, &error);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 182e79c943..604026bb37 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -221,19 +221,42 @@ void hmp_info_backup(Monitor *mon, const QDict *qdict)
monitor_printf(mon, "End time: %s", ctime(&info->end_time));
}
- int per = (info->has_total && info->total &&
- info->has_transferred && info->transferred) ?
- (info->transferred * 100)/info->total : 0;
- int zero_per = (info->has_total && info->total &&
- info->has_zero_bytes && info->zero_bytes) ?
- (info->zero_bytes * 100)/info->total : 0;
monitor_printf(mon, "Backup file: %s\n", info->backup_file);
monitor_printf(mon, "Backup uuid: %s\n", info->uuid);
- monitor_printf(mon, "Total size: %zd\n", info->total);
- monitor_printf(mon, "Transferred bytes: %zd (%d%%)\n",
- info->transferred, per);
- monitor_printf(mon, "Zero bytes: %zd (%d%%)\n",
- info->zero_bytes, zero_per);
+
+ if (!(info->has_total && info->total)) {
+ // this should not happen normally
+ monitor_printf(mon, "Total size: %d\n", 0);
+ } else {
+ bool incremental = false;
+ size_t total_or_dirty = info->total;
+ if (info->has_transferred) {
+ if (info->has_dirty && info->dirty) {
+ if (info->dirty < info->total) {
+ total_or_dirty = info->dirty;
+ incremental = true;
+ }
+ }
+ }
+
+ int per = (info->transferred * 100)/total_or_dirty;
+
+ monitor_printf(mon, "Backup mode: %s\n", incremental ? "incremental" : "full");
+
+ int zero_per = (info->has_zero_bytes && info->zero_bytes) ?
+ (info->zero_bytes * 100)/info->total : 0;
+ monitor_printf(mon, "Total size: %zd\n", info->total);
+ monitor_printf(mon, "Transferred bytes: %zd (%d%%)\n",
+ info->transferred, per);
+ monitor_printf(mon, "Zero bytes: %zd (%d%%)\n",
+ info->zero_bytes, zero_per);
+
+ if (info->has_reused) {
+ int reused_per = (info->reused * 100)/total_or_dirty;
+ monitor_printf(mon, "Reused bytes: %zd (%d%%)\n",
+ info->reused, reused_per);
+ }
+ }
}
qapi_free_BackupStatus(info);
diff --git a/proxmox-backup-client.c b/proxmox-backup-client.c
index a8f6653a81..4ce7bc0b5e 100644
--- a/proxmox-backup-client.c
+++ b/proxmox-backup-client.c
@@ -89,6 +89,7 @@ proxmox_backup_co_register_image(
ProxmoxBackupHandle *pbs,
const char *device_name,
uint64_t size,
+ bool incremental,
Error **errp)
{
Coroutine *co = qemu_coroutine_self();
@@ -98,7 +99,7 @@ proxmox_backup_co_register_image(
int pbs_res = -1;
proxmox_backup_register_image_async(
- pbs, device_name, size ,proxmox_backup_schedule_wake, &waker, &pbs_res, &pbs_err);
+ pbs, device_name, size, incremental, proxmox_backup_schedule_wake, &waker, &pbs_res, &pbs_err);
qemu_coroutine_yield();
if (pbs_res < 0) {
if (errp) error_setg(errp, "backup register image failed: %s", pbs_err ? pbs_err : "unknown error");
diff --git a/proxmox-backup-client.h b/proxmox-backup-client.h
index 1dda8b7d8f..8cbf645b2c 100644
--- a/proxmox-backup-client.h
+++ b/proxmox-backup-client.h
@@ -32,6 +32,7 @@ proxmox_backup_co_register_image(
ProxmoxBackupHandle *pbs,
const char *device_name,
uint64_t size,
+ bool incremental,
Error **errp);
diff --git a/pve-backup.c b/pve-backup.c
index d40f3f2fd6..1cd9d31d7c 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -28,6 +28,8 @@
*
*/
+const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
+
static struct PVEBackupState {
struct {
// Everithing accessed from qmp_backup_query command is protected using lock
@@ -39,7 +41,9 @@ static struct PVEBackupState {
uuid_t uuid;
char uuid_str[37];
size_t total;
+ size_t dirty;
size_t transferred;
+ size_t reused;
size_t zero_bytes;
} stat;
int64_t speed;
@@ -66,6 +70,7 @@ typedef struct PVEBackupDevInfo {
uint8_t dev_id;
bool completed;
char targetfile[PATH_MAX];
+ BdrvDirtyBitmap *bitmap;
BlockDriverState *target;
} PVEBackupDevInfo;
@@ -105,11 +110,12 @@ static bool pvebackup_error_or_canceled(void)
return error_or_canceled;
}
-static void pvebackup_add_transfered_bytes(size_t transferred, size_t zero_bytes)
+static void pvebackup_add_transfered_bytes(size_t transferred, size_t zero_bytes, size_t reused)
{
qemu_mutex_lock(&backup_state.stat.lock);
backup_state.stat.zero_bytes += zero_bytes;
backup_state.stat.transferred += transferred;
+ backup_state.stat.reused += reused;
qemu_mutex_unlock(&backup_state.stat.lock);
}
@@ -148,7 +154,8 @@ pvebackup_co_dump_pbs_cb(
pvebackup_propagate_error(local_err);
return pbs_res;
} else {
- pvebackup_add_transfered_bytes(size, !buf ? size : 0);
+ size_t reused = (pbs_res == 0) ? size : 0;
+ pvebackup_add_transfered_bytes(size, !buf ? size : 0, reused);
}
return size;
@@ -208,11 +215,11 @@ pvebackup_co_dump_vma_cb(
} else {
if (remaining >= VMA_CLUSTER_SIZE) {
assert(ret == VMA_CLUSTER_SIZE);
- pvebackup_add_transfered_bytes(VMA_CLUSTER_SIZE, zero_bytes);
+ pvebackup_add_transfered_bytes(VMA_CLUSTER_SIZE, zero_bytes, 0);
remaining -= VMA_CLUSTER_SIZE;
} else {
assert(ret == remaining);
- pvebackup_add_transfered_bytes(remaining, zero_bytes);
+ pvebackup_add_transfered_bytes(remaining, zero_bytes, 0);
remaining = 0;
}
}
@@ -248,6 +255,18 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
if (local_err != NULL) {
pvebackup_propagate_error(local_err);
}
+ } else {
+ // on error or cancel we cannot ensure synchronization of dirty
+ // bitmaps with backup server, so remove all and do full backup next
+ GList *l = backup_state.di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ if (di->bitmap) {
+ bdrv_release_dirty_bitmap(di->bitmap);
+ }
+ }
}
proxmox_backup_disconnect(backup_state.pbs);
@@ -303,6 +322,12 @@ static void pvebackup_complete_cb(void *opaque, int ret)
// remove self from job queue
backup_state.di_list = g_list_remove(backup_state.di_list, di);
+ if (di->bitmap && ret < 0) {
+ // on error or cancel we cannot ensure synchronization of dirty
+ // bitmaps with backup server, so remove all and do full backup next
+ bdrv_release_dirty_bitmap(di->bitmap);
+ }
+
g_free(di);
qemu_mutex_unlock(&backup_state.backup_mutex);
@@ -470,12 +495,18 @@ static bool create_backup_jobs(void) {
assert(di->target != NULL);
+ MirrorSyncMode sync_mode = MIRROR_SYNC_MODE_FULL;
+ BitmapSyncMode bitmap_mode = BITMAP_SYNC_MODE_NEVER;
+ if (di->bitmap) {
+ sync_mode = MIRROR_SYNC_MODE_BITMAP;
+ bitmap_mode = BITMAP_SYNC_MODE_ON_SUCCESS;
+ }
AioContext *aio_context = bdrv_get_aio_context(di->bs);
aio_context_acquire(aio_context);
BlockJob *job = backup_job_create(
- NULL, di->bs, di->target, backup_state.speed, MIRROR_SYNC_MODE_FULL, NULL,
- BITMAP_SYNC_MODE_NEVER, false, NULL, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
+ NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
+ bitmap_mode, false, NULL, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
JOB_DEFAULT, pvebackup_complete_cb, di, 1, NULL, &local_err);
aio_context_release(aio_context);
@@ -526,6 +557,8 @@ typedef struct QmpBackupTask {
const char *fingerprint;
bool has_fingerprint;
int64_t backup_time;
+ bool has_use_dirty_bitmap;
+ bool use_dirty_bitmap;
bool has_format;
BackupFormat format;
bool has_config_file;
@@ -617,6 +650,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
size_t total = 0;
+ size_t dirty = 0;
l = di_list;
while (l) {
@@ -654,6 +688,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
int dump_cb_block_size = PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE; // Hardcoded (4M)
firewall_name = "fw.conf";
+ bool use_dirty_bitmap = task->has_use_dirty_bitmap && task->use_dirty_bitmap;
+
char *pbs_err = NULL;
pbs = proxmox_backup_new(
task->backup_file,
@@ -673,7 +709,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
goto err;
}
- if (proxmox_backup_co_connect(pbs, task->errp) < 0)
+ int connect_result = proxmox_backup_co_connect(pbs, task->errp);
+ if (connect_result < 0)
goto err;
/* register all devices */
@@ -684,9 +721,40 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
const char *devname = bdrv_get_device_name(di->bs);
- int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, task->errp);
- if (dev_id < 0)
+ BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
+ bool expect_only_dirty = false;
+
+ if (use_dirty_bitmap) {
+ if (bitmap == NULL) {
+ bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, task->errp);
+ if (!bitmap) {
+ goto err;
+ }
+ } else {
+ expect_only_dirty = proxmox_backup_check_incremental(pbs, devname, di->size) != 0;
+ }
+
+ if (expect_only_dirty) {
+ dirty += bdrv_get_dirty_count(bitmap);
+ } else {
+ /* mark entire bitmap as dirty to make full backup */
+ bdrv_set_dirty_bitmap(bitmap, 0, di->size);
+ dirty += di->size;
+ }
+ di->bitmap = bitmap;
+ } else {
+ dirty += di->size;
+
+ /* after a full backup the old dirty bitmap is invalid anyway */
+ if (bitmap != NULL) {
+ bdrv_release_dirty_bitmap(bitmap);
+ }
+ }
+
+ int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, task->errp);
+ if (dev_id < 0) {
goto err;
+ }
if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, task->errp))) {
goto err;
@@ -695,6 +763,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
di->dev_id = dev_id;
}
} else if (format == BACKUP_FORMAT_VMA) {
+ dirty = total;
+
vmaw = vma_writer_create(task->backup_file, uuid, &local_err);
if (!vmaw) {
if (local_err) {
@@ -722,6 +792,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
}
} else if (format == BACKUP_FORMAT_DIR) {
+ dirty = total;
+
if (mkdir(task->backup_file, 0640) != 0) {
error_setg_errno(task->errp, errno, "can't create directory '%s'\n",
task->backup_file);
@@ -794,8 +866,10 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
char *uuid_str = g_strdup(backup_state.stat.uuid_str);
backup_state.stat.total = total;
+ backup_state.stat.dirty = dirty;
backup_state.stat.transferred = 0;
backup_state.stat.zero_bytes = 0;
+ backup_state.stat.reused = format == BACKUP_FORMAT_PBS && dirty >= total ? 0 : total - dirty;
qemu_mutex_unlock(&backup_state.stat.lock);
@@ -819,6 +893,10 @@ err:
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
+ if (di->bitmap) {
+ bdrv_release_dirty_bitmap(di->bitmap);
+ }
+
if (di->target) {
bdrv_unref(di->target);
}
@@ -860,6 +938,7 @@ UuidInfo *qmp_backup(
bool has_fingerprint, const char *fingerprint,
bool has_backup_id, const char *backup_id,
bool has_backup_time, int64_t backup_time,
+ bool has_use_dirty_bitmap, bool use_dirty_bitmap,
bool has_format, BackupFormat format,
bool has_config_file, const char *config_file,
bool has_firewall_file, const char *firewall_file,
@@ -878,6 +957,8 @@ UuidInfo *qmp_backup(
.backup_id = backup_id,
.has_backup_time = has_backup_time,
.backup_time = backup_time,
+ .has_use_dirty_bitmap = has_use_dirty_bitmap,
+ .use_dirty_bitmap = use_dirty_bitmap,
.has_format = has_format,
.format = format,
.has_config_file = has_config_file,
@@ -946,10 +1027,14 @@ BackupStatus *qmp_query_backup(Error **errp)
info->has_total = true;
info->total = backup_state.stat.total;
+ info->has_dirty = true;
+ info->dirty = backup_state.stat.dirty;
info->has_zero_bytes = true;
info->zero_bytes = backup_state.stat.zero_bytes;
info->has_transferred = true;
info->transferred = backup_state.stat.transferred;
+ info->has_reused = true;
+ info->reused = backup_state.stat.reused;
qemu_mutex_unlock(&backup_state.stat.lock);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9054db608c..d4e1c98c50 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -758,8 +758,13 @@
#
# @total: total amount of bytes involved in the backup process
#
+# @dirty: with incremental mode (PBS) this is the amount of bytes involved
+# in the backup process which are marked dirty.
+#
# @transferred: amount of bytes already backed up.
#
+# @reused: amount of bytes reused due to deduplication.
+#
# @zero-bytes: amount of 'zero' bytes detected.
#
# @start-time: time (epoch) when backup job started.
@@ -772,8 +777,8 @@
#
##
{ 'struct': 'BackupStatus',
- 'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int',
- '*transferred': 'int', '*zero-bytes': 'int',
+ 'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int', '*dirty': 'int',
+ '*transferred': 'int', '*zero-bytes': 'int', '*reused': 'int',
'*start-time': 'int', '*end-time': 'int',
'*backup-file': 'str', '*uuid': 'str' } }
@@ -816,6 +821,8 @@
#
# @backup-time: backup timestamp (Unix epoch, required for format 'pbs')
#
+# @use-dirty-bitmap: use dirty bitmap to detect incremental changes since last job (optional for format 'pbs')
+#
# Returns: the uuid of the backup job
#
##
@@ -826,6 +833,7 @@
'*fingerprint': 'str',
'*backup-id': 'str',
'*backup-time': 'int',
+ '*use-dirty-bitmap': 'bool',
'*format': 'BackupFormat',
'*config-file': 'str',
'*firewall-file': 'str',

File diff suppressed because it is too large Load Diff

View File

@@ -5,33 +5,35 @@ Subject: [PATCH] PVE-Backup: pbs-restore - new command to restore from proxmox
backup server
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[WB: add namespace support]
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
meson.build | 4 +
pbs-restore.c | 224 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 228 insertions(+)
pbs-restore.c | 236 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 240 insertions(+)
create mode 100644 pbs-restore.c
diff --git a/meson.build b/meson.build
index 3094f98c47..6f1fafee14 100644
index f6fb9b4fd8..f666d0f028 100644
--- a/meson.build
+++ b/meson.build
@@ -1913,6 +1913,10 @@ if have_tools
vma = executable('vma', files('vma.c', 'vma-reader.c'),
dependencies: [authz, block, crypto, io, qom], install: true)
@@ -4350,6 +4350,10 @@ if have_tools
vma = executable('vma', files('vma.c', 'vma-reader.c') + genh,
dependencies: [authz, block, crypto, io, qemuutil, qom], install: true)
+ pbs_restore = executable('pbs-restore', files('pbs-restore.c'),
+ dependencies: [authz, block, crypto, io, qom,
+ pbs_restore = executable('pbs-restore', files('pbs-restore.c') + genh,
+ dependencies: [authz, block, crypto, io, qemuutil, qom,
+ libproxmox_backup_qemu], install: true)
+
subdir('storage-daemon')
subdir('contrib/rdmacm-mux')
subdir('contrib/elf2dmp')
foreach exe: [ 'qemu-img', 'qemu-io', 'qemu-nbd', 'qemu-storage-daemon']
diff --git a/pbs-restore.c b/pbs-restore.c
new file mode 100644
index 0000000000..4d3f925a1b
index 0000000000..f03d9bab8d
--- /dev/null
+++ b/pbs-restore.c
@@ -0,0 +1,224 @@
@@ -0,0 +1,236 @@
+/*
+ * Qemu image restore helper for Proxmox Backup
+ *
@@ -50,7 +52,6 @@ index 0000000000..4d3f925a1b
+#include <getopt.h>
+#include <string.h>
+
+#include "qemu-common.h"
+#include "qemu/module.h"
+#include "qemu/error-report.h"
+#include "qemu/main-loop.h"
@@ -64,7 +65,7 @@ index 0000000000..4d3f925a1b
+static void help(void)
+{
+ const char *help_msg =
+ "usage: pbs-restore [--repository <repo>] snapshot archive-name target [command options]\n"
+ "usage: pbs-restore [--repository <repo>] [--ns namespace] snapshot archive-name target [command options]\n"
+ ;
+
+ printf("%s", help_msg);
@@ -96,7 +97,7 @@ index 0000000000..4d3f925a1b
+ }
+ res = blk_pwrite_zeroes(callback_data->target, offset, data_len, 0);
+ } else {
+ res = blk_pwrite(callback_data->target, offset, data, data_len, 0);
+ res = blk_pwrite(callback_data->target, offset, data_len, data, 0);
+ }
+
+ if (res < 0) {
@@ -112,6 +113,7 @@ index 0000000000..4d3f925a1b
+ Error *main_loop_err = NULL;
+ const char *format = "raw";
+ const char *repository = NULL;
+ const char *backup_ns = NULL;
+ const char *keyfile = NULL;
+ int verbose = false;
+ bool skip_zero = false;
@@ -125,6 +127,7 @@ index 0000000000..4d3f925a1b
+ {"verbose", no_argument, 0, 'v'},
+ {"format", required_argument, 0, 'f'},
+ {"repository", required_argument, 0, 'r'},
+ {"ns", required_argument, 0, 'n'},
+ {"keyfile", required_argument, 0, 'k'},
+ {0, 0, 0, 0}
+ };
@@ -145,6 +148,9 @@ index 0000000000..4d3f925a1b
+ case 'r':
+ repository = g_strdup(argv[optind - 1]);
+ break;
+ case 'n':
+ backup_ns = g_strdup(argv[optind - 1]);
+ break;
+ case 'k':
+ keyfile = g_strdup(argv[optind - 1]);
+ break;
@@ -195,8 +201,16 @@ index 0000000000..4d3f925a1b
+ fprintf(stderr, "connecting to repository '%s'\n", repository);
+ }
+ char *pbs_error = NULL;
+ ProxmoxRestoreHandle *conn = proxmox_restore_new(
+ repository, snapshot, password, keyfile, key_password, fingerprint, &pbs_error);
+ ProxmoxRestoreHandle *conn = proxmox_restore_new_ns(
+ repository,
+ snapshot,
+ backup_ns,
+ password,
+ keyfile,
+ key_password,
+ fingerprint,
+ &pbs_error
+ );
+ if (conn == NULL) {
+ fprintf(stderr, "restore failed: %s\n", pbs_error);
+ return -1;

View File

@@ -1,218 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Thu, 9 Jul 2020 12:53:08 +0200
Subject: [PATCH] PVE: various PBS fixes
pbs: fix crypt and compress parameters
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
PVE: handle PBS write callback with big blocks correctly
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
PVE: add zero block handling to PBS dump callback
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 4 ++-
pve-backup.c | 57 +++++++++++++++++++++++++++-------
qapi/block-core.json | 6 ++++
3 files changed, 54 insertions(+), 13 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 056d14deee..46c63b1cf9 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1039,7 +1039,9 @@ void hmp_backup(Monitor *mon, const QDict *qdict)
false, NULL, // PBS fingerprint
false, NULL, // PBS backup-id
false, 0, // PBS backup-time
- false, false, // PBS incremental
+ false, false, // PBS use-dirty-bitmap
+ false, false, // PBS compress
+ false, false, // PBS encrypt
true, dir ? BACKUP_FORMAT_DIR : BACKUP_FORMAT_VMA,
false, NULL, false, NULL, !!devlist,
devlist, qdict_haskey(qdict, "speed"), speed, &error);
diff --git a/pve-backup.c b/pve-backup.c
index 1cd9d31d7c..b8182aaf89 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -8,6 +8,7 @@
#include "block/blockjob.h"
#include "qapi/qapi-commands-block.h"
#include "qapi/qmp/qerror.h"
+#include "qemu/cutils.h"
/* PVE backup state and related function */
@@ -67,6 +68,7 @@ opts_init(pvebackup_init);
typedef struct PVEBackupDevInfo {
BlockDriverState *bs;
size_t size;
+ uint64_t block_size;
uint8_t dev_id;
bool completed;
char targetfile[PATH_MAX];
@@ -135,10 +137,13 @@ pvebackup_co_dump_pbs_cb(
PVEBackupDevInfo *di = opaque;
assert(backup_state.pbs);
+ assert(buf);
Error *local_err = NULL;
int pbs_res = -1;
+ bool is_zero_block = size == di->block_size && buffer_is_zero(buf, size);
+
qemu_co_mutex_lock(&backup_state.dump_callback_mutex);
// avoid deadlock if job is cancelled
@@ -147,17 +152,29 @@ pvebackup_co_dump_pbs_cb(
return -1;
}
- pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id, buf, start, size, &local_err);
- qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ uint64_t transferred = 0;
+ uint64_t reused = 0;
+ while (transferred < size) {
+ uint64_t left = size - transferred;
+ uint64_t to_transfer = left < di->block_size ? left : di->block_size;
- if (pbs_res < 0) {
- pvebackup_propagate_error(local_err);
- return pbs_res;
- } else {
- size_t reused = (pbs_res == 0) ? size : 0;
- pvebackup_add_transfered_bytes(size, !buf ? size : 0, reused);
+ pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id,
+ is_zero_block ? NULL : buf + transferred, start + transferred,
+ to_transfer, &local_err);
+ transferred += to_transfer;
+
+ if (pbs_res < 0) {
+ pvebackup_propagate_error(local_err);
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ return pbs_res;
+ }
+
+ reused += pbs_res == 0 ? to_transfer : 0;
}
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ pvebackup_add_transfered_bytes(size, is_zero_block ? size : 0, reused);
+
return size;
}
@@ -178,6 +195,7 @@ pvebackup_co_dump_vma_cb(
int ret = -1;
assert(backup_state.vmaw);
+ assert(buf);
uint64_t remaining = size;
@@ -204,9 +222,7 @@ pvebackup_co_dump_vma_cb(
qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
++cluster_num;
- if (buf) {
- buf += VMA_CLUSTER_SIZE;
- }
+ buf += VMA_CLUSTER_SIZE;
if (ret < 0) {
Error *local_err = NULL;
vma_writer_error_propagate(backup_state.vmaw, &local_err);
@@ -567,6 +583,10 @@ typedef struct QmpBackupTask {
const char *firewall_file;
bool has_devlist;
const char *devlist;
+ bool has_compress;
+ bool compress;
+ bool has_encrypt;
+ bool encrypt;
bool has_speed;
int64_t speed;
Error **errp;
@@ -690,6 +710,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
bool use_dirty_bitmap = task->has_use_dirty_bitmap && task->use_dirty_bitmap;
+
char *pbs_err = NULL;
pbs = proxmox_backup_new(
task->backup_file,
@@ -699,8 +720,10 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
task->has_password ? task->password : NULL,
task->has_keyfile ? task->keyfile : NULL,
task->has_key_password ? task->key_password : NULL,
+ task->has_compress ? task->compress : true,
+ task->has_encrypt ? task->encrypt : task->has_keyfile,
task->has_fingerprint ? task->fingerprint : NULL,
- &pbs_err);
+ &pbs_err);
if (!pbs) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
@@ -719,6 +742,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
+ di->block_size = dump_cb_block_size;
+
const char *devname = bdrv_get_device_name(di->bs);
BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
@@ -939,6 +964,8 @@ UuidInfo *qmp_backup(
bool has_backup_id, const char *backup_id,
bool has_backup_time, int64_t backup_time,
bool has_use_dirty_bitmap, bool use_dirty_bitmap,
+ bool has_compress, bool compress,
+ bool has_encrypt, bool encrypt,
bool has_format, BackupFormat format,
bool has_config_file, const char *config_file,
bool has_firewall_file, const char *firewall_file,
@@ -949,6 +976,8 @@ UuidInfo *qmp_backup(
.backup_file = backup_file,
.has_password = has_password,
.password = password,
+ .has_keyfile = has_keyfile,
+ .keyfile = keyfile,
.has_key_password = has_key_password,
.key_password = key_password,
.has_fingerprint = has_fingerprint,
@@ -959,6 +988,10 @@ UuidInfo *qmp_backup(
.backup_time = backup_time,
.has_use_dirty_bitmap = has_use_dirty_bitmap,
.use_dirty_bitmap = use_dirty_bitmap,
+ .has_compress = has_compress,
+ .compress = compress,
+ .has_encrypt = has_encrypt,
+ .encrypt = encrypt,
.has_format = has_format,
.format = format,
.has_config_file = has_config_file,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index d4e1c98c50..0fda1e3fd3 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -823,6 +823,10 @@
#
# @use-dirty-bitmap: use dirty bitmap to detect incremental changes since last job (optional for format 'pbs')
#
+# @compress: use compression (optional for format 'pbs', defaults to true)
+#
+# @encrypt: use encryption ((optional for format 'pbs', defaults to true if there is a keyfile)
+#
# Returns: the uuid of the backup job
#
##
@@ -834,6 +838,8 @@
'*backup-id': 'str',
'*backup-time': 'int',
'*use-dirty-bitmap': 'bool',
+ '*compress': 'bool',
+ '*encrypt': 'bool',
'*format': 'BackupFormat',
'*config-file': 'str',
'*firewall-file': 'str',

View File

@@ -6,35 +6,41 @@ Subject: [PATCH] PVE: Add PBS block driver to map backup archives into VMs
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
[error cleanups, file_open implementation]
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[WB: add namespace support]
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[FE: adapt to changed function signatures
make pbs_co_preadv return values consistent with QEMU
getlength is now a coroutine function]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/meson.build | 3 +
block/pbs.c | 271 +++++++++++++++++++++++++++++++++++++++++++
configure | 9 ++
meson.build | 1 +
qapi/block-core.json | 14 ++-
5 files changed, 297 insertions(+), 1 deletion(-)
block/meson.build | 2 +
block/pbs.c | 306 +++++++++++++++++++++++++++++++++++++++++++
meson.build | 2 +-
qapi/block-core.json | 29 ++++
qapi/pragma.json | 1 +
5 files changed, 339 insertions(+), 1 deletion(-)
create mode 100644 block/pbs.c
diff --git a/block/meson.build b/block/meson.build
index dfae565db3..a070060e53 100644
index 2367e1ac1b..e178047ec9 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -49,6 +49,9 @@ block_ss.add(files(
@@ -49,6 +49,8 @@ block_ss.add(files(
'../pve-backup.c',
), libproxmox_backup_qemu)
+block_ss.add(when: 'CONFIG_PBS_BDRV', if_true: files('pbs.c'))
+block_ss.add(when: 'CONFIG_PBS_BDRV', if_true: libproxmox_backup_qemu)
+block_ss.add(files('pbs.c'), libproxmox_backup_qemu)
+
softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
system_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
system_ss.add(files('block-ram-registrar.c'))
diff --git a/block/pbs.c b/block/pbs.c
new file mode 100644
index 0000000000..1481a2bfd1
index 0000000000..2d5e28ce8f
--- /dev/null
+++ b/block/pbs.c
@@ -0,0 +1,271 @@
@@ -0,0 +1,306 @@
+/*
+ * Proxmox Backup Server read-only block driver
+ */
@@ -47,10 +53,12 @@ index 0000000000..1481a2bfd1
+#include "qemu/option.h"
+#include "qemu/cutils.h"
+#include "block/block_int.h"
+#include "block/block-io.h"
+
+#include <proxmox-backup-qemu.h>
+
+#define PBS_OPT_REPOSITORY "repository"
+#define PBS_OPT_NAMESPACE "namespace"
+#define PBS_OPT_SNAPSHOT "snapshot"
+#define PBS_OPT_ARCHIVE "archive"
+#define PBS_OPT_KEYFILE "keyfile"
@@ -60,10 +68,11 @@ index 0000000000..1481a2bfd1
+
+typedef struct {
+ ProxmoxRestoreHandle *conn;
+ char aid;
+ uint8_t aid;
+ int64_t length;
+
+ char *repository;
+ char *namespace;
+ char *snapshot;
+ char *archive;
+} BDRVPBSState;
@@ -78,6 +87,11 @@ index 0000000000..1481a2bfd1
+ .help = "The server address and repository to connect to.",
+ },
+ {
+ .name = PBS_OPT_NAMESPACE,
+ .type = QEMU_OPT_STRING,
+ .help = "Optional: The snapshot's namespace.",
+ },
+ {
+ .name = PBS_OPT_SNAPSHOT,
+ .type = QEMU_OPT_STRING,
+ .help = "The snapshot to read.",
@@ -113,7 +127,7 @@ index 0000000000..1481a2bfd1
+
+
+// filename format:
+// pbs:repository=<repo>,snapshot=<snap>,password=<pw>,key_password=<kpw>,fingerprint=<fp>,archive=<archive>
+// pbs:repository=<repo>,namespace=<ns>,snapshot=<snap>,password=<pw>,key_password=<kpw>,fingerprint=<fp>,archive=<archive>
+static void pbs_parse_filename(const char *filename, QDict *options,
+ Error **errp)
+{
@@ -149,6 +163,7 @@ index 0000000000..1481a2bfd1
+ s->archive = g_strdup(qemu_opt_get(opts, PBS_OPT_ARCHIVE));
+ const char *keyfile = qemu_opt_get(opts, PBS_OPT_KEYFILE);
+ const char *password = qemu_opt_get(opts, PBS_OPT_PASSWORD);
+ const char *namespace = qemu_opt_get(opts, PBS_OPT_NAMESPACE);
+ const char *fingerprint = qemu_opt_get(opts, PBS_OPT_FINGERPRINT);
+ const char *key_password = qemu_opt_get(opts, PBS_OPT_ENCRYPTION_PASSWORD);
+
@@ -161,9 +176,12 @@ index 0000000000..1481a2bfd1
+ if (!key_password) {
+ key_password = getenv("PBS_ENCRYPTION_PASSWORD");
+ }
+ if (namespace) {
+ s->namespace = g_strdup(namespace);
+ }
+
+ /* connect to PBS server in read mode */
+ s->conn = proxmox_restore_new(s->repository, s->snapshot, password,
+ s->conn = proxmox_restore_new_ns(s->repository, s->snapshot, s->namespace, password,
+ keyfile, key_password, fingerprint, &pbs_error);
+
+ /* invalidates qemu_opt_get char pointers from above */
@@ -183,12 +201,18 @@ index 0000000000..1481a2bfd1
+ }
+
+ /* acquire handle and length */
+ s->aid = proxmox_restore_open_image(s->conn, s->archive, &pbs_error);
+ if (s->aid < 0) {
+ ret = proxmox_restore_open_image(s->conn, s->archive, &pbs_error);
+ if (ret < 0) {
+ if (pbs_error && errp) error_setg(errp, "PBS open_image failed: %s", pbs_error);
+ if (pbs_error) proxmox_backup_free_error(pbs_error);
+ return -ENODEV;
+ }
+ if (ret > UINT8_MAX) {
+ error_setg(errp, "PBS open_image returned an ID larger than %u", UINT8_MAX);
+ return -ENODEV;
+ }
+ s->aid = ret;
+
+ s->length = proxmox_restore_get_image_length(s->conn, s->aid, &pbs_error);
+ if (s->length < 0) {
+ if (pbs_error && errp) error_setg(errp, "PBS get_image_length failed: %s", pbs_error);
@@ -199,21 +223,17 @@ index 0000000000..1481a2bfd1
+ return 0;
+}
+
+static int pbs_file_open(BlockDriverState *bs, QDict *options, int flags,
+ Error **errp)
+{
+ return pbs_open(bs, options, flags, errp);
+}
+
+static void pbs_close(BlockDriverState *bs) {
+ BDRVPBSState *s = bs->opaque;
+ g_free(s->repository);
+ g_free(s->namespace);
+ g_free(s->snapshot);
+ g_free(s->archive);
+ proxmox_restore_disconnect(s->conn);
+}
+
+static int64_t pbs_getlength(BlockDriverState *bs)
+static coroutine_fn int64_t GRAPH_RDLOCK
+pbs_co_getlength(BlockDriverState *bs)
+{
+ BDRVPBSState *s = bs->opaque;
+ return s->length;
@@ -230,21 +250,35 @@ index 0000000000..1481a2bfd1
+ aio_co_schedule(rcb->ctx, rcb->co);
+}
+
+static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
+ uint64_t offset, uint64_t bytes,
+ QEMUIOVector *qiov, int flags)
+static coroutine_fn int GRAPH_RDLOCK
+pbs_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ BDRVPBSState *s = bs->opaque;
+ int ret;
+ char *pbs_error = NULL;
+ uint8_t *buf = malloc(bytes);
+ uint8_t *buf;
+ bool inline_buf = true;
+
+ /* for single-buffer IO vectors we can fast-path the write directly to it */
+ if (qiov->niov == 1 && qiov->iov->iov_len >= bytes) {
+ buf = qiov->iov->iov_base;
+ } else {
+ inline_buf = false;
+ buf = g_malloc(bytes);
+ }
+
+ if (offset < 0 || bytes < 0) {
+ fprintf(stderr, "unexpected negative 'offset' or 'bytes' value!\n");
+ return -EIO;
+ }
+
+ ReadCallbackData rcb = {
+ .co = qemu_coroutine_self(),
+ .ctx = bdrv_get_aio_context(bs),
+ };
+
+ proxmox_restore_read_image_at_async(s->conn, s->aid, buf, offset, bytes,
+ proxmox_restore_read_image_at_async(s->conn, s->aid, buf, (uint64_t)offset, (uint64_t)bytes,
+ read_callback, (void *) &rcb, &ret, &pbs_error);
+
+ qemu_coroutine_yield();
@@ -255,26 +289,34 @@ index 0000000000..1481a2bfd1
+ return -EIO;
+ }
+
+ qemu_iovec_from_buf(qiov, 0, buf, bytes);
+ free(buf);
+ if (!inline_buf) {
+ qemu_iovec_from_buf(qiov, 0, buf, bytes);
+ g_free(buf);
+ }
+
+ return ret;
+ return 0;
+}
+
+static coroutine_fn int pbs_co_pwritev(BlockDriverState *bs,
+ uint64_t offset, uint64_t bytes,
+ QEMUIOVector *qiov, int flags)
+static coroutine_fn int GRAPH_RDLOCK
+pbs_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ fprintf(stderr, "pbs-bdrv: cannot write to backup file, make sure "
+ "any attached disk devices are set to read-only!\n");
+ return -EPERM;
+}
+
+static void pbs_refresh_filename(BlockDriverState *bs)
+static void GRAPH_RDLOCK
+pbs_refresh_filename(BlockDriverState *bs)
+{
+ BDRVPBSState *s = bs->opaque;
+ snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s/%s(%s)",
+ s->repository, s->snapshot, s->archive);
+ if (s->namespace) {
+ snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s/%s:%s(%s)",
+ s->repository, s->namespace, s->snapshot, s->archive);
+ } else {
+ snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s/%s(%s)",
+ s->repository, s->snapshot, s->archive);
+ }
+}
+
+static const char *const pbs_strong_runtime_opts[] = {
@@ -288,10 +330,9 @@ index 0000000000..1481a2bfd1
+
+ .bdrv_parse_filename = pbs_parse_filename,
+
+ .bdrv_file_open = pbs_file_open,
+ .bdrv_open = pbs_open,
+ .bdrv_close = pbs_close,
+ .bdrv_getlength = pbs_getlength,
+ .bdrv_co_getlength = pbs_co_getlength,
+
+ .bdrv_co_preadv = pbs_co_preadv,
+ .bdrv_co_pwritev = pbs_co_pwritev,
@@ -306,73 +347,32 @@ index 0000000000..1481a2bfd1
+}
+
+block_init(bdrv_pbs_init);
diff --git a/configure b/configure
index 18c26e0389..33d9933871 100755
--- a/configure
+++ b/configure
@@ -436,6 +436,7 @@ vvfat="yes"
qed="yes"
parallels="yes"
sheepdog="no"
+pbs_bdrv="yes"
libxml2=""
debug_mutex="no"
libpmem=""
@@ -1461,6 +1462,10 @@ for opt do
;;
--enable-sheepdog) sheepdog="yes"
;;
+ --disable-pbs-bdrv) pbs_bdrv="no"
+ ;;
+ --enable-pbs-bdrv) pbs_bdrv="yes"
+ ;;
--disable-vhost-user) vhost_user="no"
;;
--enable-vhost-user) vhost_user="yes"
@@ -1843,6 +1848,7 @@ disabled with --disable-FEATURE, default is enabled if available:
qed qed image format support
parallels parallels image format support
sheepdog sheepdog block driver support (deprecated)
+ pbs-bdrv Proxmox backup server read-only block driver support
crypto-afalg Linux AF_ALG crypto backend driver
capstone capstone disassembler support
debug-mutex mutex debugging support
@@ -6682,6 +6688,9 @@ if test "$sheepdog" = "yes" ; then
add_to deprecated_features "sheepdog"
echo "CONFIG_SHEEPDOG=y" >> $config_host_mak
fi
+if test "$pbs_bdrv" = "yes" ; then
+ echo "CONFIG_PBS_BDRV=y" >> $config_host_mak
+fi
if test "$pty_h" = "yes" ; then
echo "HAVE_PTY_H=y" >> $config_host_mak
fi
diff --git a/meson.build b/meson.build
index 6f1fafee14..4d156d35ce 100644
index f666d0f028..4c85736ec3 100644
--- a/meson.build
+++ b/meson.build
@@ -2199,6 +2199,7 @@ summary_info += {'vvfat support': config_host.has_key('CONFIG_VVFAT')}
summary_info += {'qed support': config_host.has_key('CONFIG_QED')}
summary_info += {'parallels support': config_host.has_key('CONFIG_PARALLELS')}
summary_info += {'sheepdog support': config_host.has_key('CONFIG_SHEEPDOG')}
@@ -4815,7 +4815,7 @@ summary_info += {'Query Processing Library support': qpl}
summary_info += {'UADK Library support': uadk}
summary_info += {'qatzip support': qatzip}
summary_info += {'NUMA host support': numa}
-summary_info += {'capstone': capstone}
+summary_info += {'PBS bdrv support': config_host.has_key('CONFIG_PBS_BDRV')}
summary_info += {'capstone': capstone_opt == 'disabled' ? false : capstone_opt}
summary_info += {'libpmem support': config_host.has_key('CONFIG_LIBPMEM')}
summary_info += {'libdaxctl support': config_host.has_key('CONFIG_LIBDAXCTL')}
summary_info += {'libpmem support': libpmem}
summary_info += {'libdaxctl support': libdaxctl}
summary_info += {'libcbor support': libcbor}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 0fda1e3fd3..553112d998 100644
index 68caf30084..d45e8975a7 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2975,7 +2975,7 @@
'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
{ 'name': 'replication', 'if': 'defined(CONFIG_REPLICATION)' },
- 'sheepdog',
+ 'sheepdog', 'pbs',
'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
##
@@ -3039,6 +3039,17 @@
@@ -3466,6 +3466,7 @@
'parallels', 'preallocate', 'qcow', 'qcow2', 'qed', 'quorum',
'raw', 'rbd',
{ 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
+ 'pbs',
'ssh', 'throttle', 'vdi', 'vhdx',
{ 'name': 'virtio-blk-vfio-pci', 'if': 'CONFIG_BLKIO' },
{ 'name': 'virtio-blk-vhost-user', 'if': 'CONFIG_BLKIO' },
@@ -3552,6 +3553,33 @@
{ 'struct': 'BlockdevOptionsNull',
'data': { '*size': 'int', '*latency-ns': 'uint64', '*read-zeroes': 'bool' } }
@@ -381,20 +381,48 @@ index 0fda1e3fd3..553112d998 100644
+#
+# Driver specific block device options for the PBS backend.
+#
+# @repository: Proxmox Backup Server repository.
+#
+# @snapshot: backup snapshots ID.
+#
+# @archive: archive name.
+#
+# @keyfile: keyfile to use for encryption.
+#
+# @password: password to use for connection.
+#
+# @fingerprint: backup server fingerprint.
+#
+# @key_password: password to unlock key.
+#
+# @namespace: namespace where backup snapshot lives.
+#
+##
+{ 'struct': 'BlockdevOptionsPbs',
+ 'data': { 'repository': 'str', 'snapshot': 'str', 'archive': 'str',
+ '*keyfile': 'str', '*password': 'str', '*fingerprint': 'str',
+ '*key_password': 'str' } }
+ '*key_password': 'str', '*namespace': 'str' } }
+
##
# @BlockdevOptionsNVMe:
#
@@ -4148,6 +4159,7 @@
@@ -4993,6 +5021,7 @@
'nfs': 'BlockdevOptionsNfs',
'null-aio': 'BlockdevOptionsNull',
'null-co': 'BlockdevOptionsNull',
+ 'pbs': 'BlockdevOptionsPbs',
'nvme': 'BlockdevOptionsNVMe',
'parallels': 'BlockdevOptionsGenericFormat',
'qcow2': 'BlockdevOptionsQcow2',
'nvme-io_uring': { 'type': 'BlockdevOptionsNvmeIoUring',
'if': 'CONFIG_BLKIO' },
diff --git a/qapi/pragma.json b/qapi/pragma.json
index 6aaa9cb975..e9c595c4ba 100644
--- a/qapi/pragma.json
+++ b/qapi/pragma.json
@@ -91,6 +91,7 @@
'BlockInfo', # query-block
'BlockdevAioOptions', # blockdev-add, -blockdev
'BlockdevDriver', # blockdev-add, query-blockstats, ...
+ 'BlockdevOptionsPbs', # for PBS backwards compat
'BlockdevVmdkAdapterType', # blockdev-create (to match VMDK spec)
'BlockdevVmdkSubformat', # blockdev-create (to match VMDK spec)
'ColoCompareProperties', # object_add, -object

View File

@@ -1,74 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 8 Jul 2020 11:57:53 +0200
Subject: [PATCH] PVE: add query_proxmox_support QMP command
Generic interface for future use, currently used for PBS dirty-bitmap
backup support.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[PVE: query-proxmox-support: include library version]
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
pve-backup.c | 9 +++++++++
qapi/block-core.json | 29 +++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/pve-backup.c b/pve-backup.c
index b8182aaf89..98e79552ef 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -1073,3 +1073,12 @@ BackupStatus *qmp_query_backup(Error **errp)
return info;
}
+
+ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
+{
+ ProxmoxSupportStatus *ret = g_malloc0(sizeof(*ret));
+ ret->pbs_library_version = g_strdup(proxmox_backup_qemu_version());
+ ret->pbs_dirty_bitmap = true;
+ ret->pbs_dirty_bitmap_savevm = true;
+ return ret;
+}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 553112d998..f3608390c4 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -868,6 +868,35 @@
##
{ 'command': 'backup-cancel' }
+##
+# @ProxmoxSupportStatus:
+#
+# Contains info about supported features added by Proxmox.
+#
+# @pbs-dirty-bitmap: True if dirty-bitmap-incremental backups to PBS are
+# supported.
+#
+# @pbs-dirty-bitmap-savevm: True if 'dirty-bitmaps' migration capability can
+# safely be set for savevm-async.
+#
+# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
+#
+##
+{ 'struct': 'ProxmoxSupportStatus',
+ 'data': { 'pbs-dirty-bitmap': 'bool',
+ 'pbs-dirty-bitmap-savevm': 'bool',
+ 'pbs-library-version': 'str' } }
+
+##
+# @query-proxmox-support:
+#
+# Returns information about supported features added by Proxmox.
+#
+# Returns: @ProxmoxSupportStatus
+#
+##
+{ 'command': 'query-proxmox-support', 'returns': 'ProxmoxSupportStatus' }
+
##
# @BlockDeviceTimedStats:
#

View File

@@ -7,51 +7,53 @@ QEMU uses the logging for error messages usually, so LOG_ERR is most
fitting.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
meson.build | 2 ++
meson.build | 3 ++-
os-posix.c | 7 +++++--
2 files changed, 7 insertions(+), 2 deletions(-)
2 files changed, 7 insertions(+), 3 deletions(-)
diff --git a/meson.build b/meson.build
index 4d156d35ce..737ea9e5d7 100644
index 4c85736ec3..57f666d722 100644
--- a/meson.build
+++ b/meson.build
@@ -726,6 +726,7 @@ keyutils = dependency('libkeyutils', required: false,
@@ -2130,6 +2130,7 @@ endif
has_gettid = cc.has_function('gettid')
libuuid = cc.find_library('uuid', required: true)
+libsystemd = cc.find_library('systemd', required: true)
libproxmox_backup_qemu = cc.find_library('proxmox_backup_qemu', required: true)
# Malloc tests
@@ -1539,6 +1540,7 @@ blockdev_ss.add(files(
# os-posix.c contains POSIX-specific functions used by qemu-storage-daemon,
# os-win32.c does not
blockdev_ss.add(when: 'CONFIG_POSIX', if_true: files('os-posix.c'))
+blockdev_ss.add(when: 'CONFIG_POSIX', if_true: libsystemd)
softmmu_ss.add(when: 'CONFIG_WIN32', if_true: [files('os-win32.c')])
# libselinux
@@ -3744,7 +3745,7 @@ if have_block
if host_os == 'windows'
system_ss.add(files('os-win32.c'))
else
- blockdev_ss.add(files('os-posix.c'))
+ blockdev_ss.add(files('os-posix.c'), libsystemd)
endif
endif
common_ss.add(files('cpus-common.c'))
diff --git a/os-posix.c b/os-posix.c
index 1de2839554..ac4f652923 100644
index 43f9a43f3f..a47e46d1c2 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -28,6 +28,8 @@
@@ -29,6 +29,8 @@
#include <pwd.h>
#include <grp.h>
#include <libgen.h>
+#include <systemd/sd-journal.h>
+#include <syslog.h>
#include "qemu-common.h"
/* Needed early for CONFIG_BSD etc. */
@@ -288,9 +290,10 @@ void os_setup_post(void)
#include "qemu/error-report.h"
#include "qemu/log.h"
@@ -306,9 +308,10 @@ void os_setup_post(void)
dup2(fd, 0);
dup2(fd, 1);
- /* In case -D is given do not redirect stderr to /dev/null */
+ /* In case -D is given do not redirect stderr to journal */
if (!qemu_logfile) {
if (!qemu_log_enabled()) {
- dup2(fd, 2);
+ int journal_fd = sd_journal_stream_fd("QEMU", LOG_ERR, 0);
+ dup2(journal_fd, 2);

View File

@@ -12,49 +12,69 @@ Also add a flag to query-proxmox-support so qemu-server can determine if
safe migration is possible and makes sense.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: split up state_pending for 8.0]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
include/migration/misc.h | 3 ++
migration/meson.build | 2 +
migration/pbs-state.c | 106 +++++++++++++++++++++++++++++++++++++++
migration/migration.c | 1 +
migration/pbs-state.c | 104 +++++++++++++++++++++++++++++++++++++++
pve-backup.c | 1 +
qapi/block-core.json | 6 +++
softmmu/vl.c | 1 +
6 files changed, 119 insertions(+)
6 files changed, 117 insertions(+)
create mode 100644 migration/pbs-state.c
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 34e7d75713..f83816dd3c 100644
index 804eb23c06..c75b146ae6 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -75,4 +75,7 @@ bool migration_in_incoming_postcopy(void);
/* migration/block-dirty-bitmap.c */
void dirty_bitmap_mig_init(void);
@@ -106,4 +106,7 @@ bool migration_incoming_postcopy_advised(void);
/* True if background snapshot is active */
bool migration_in_bg_snapshot(void);
+/* migration/pbs-state.c */
+void pbs_state_mig_init(void);
+
#endif
diff --git a/migration/meson.build b/migration/meson.build
index e62b79b60f..b90a04aa75 100644
index 075b013971..eca57cb2a3 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -7,8 +7,10 @@ migration_files = files(
'qemu-file-channel.c',
@@ -8,6 +8,7 @@ migration_files = files(
'qemu-file.c',
'qjson.c',
+ 'pbs-state.c',
'yank_functions.c',
)
softmmu_ss.add(migration_files)
+softmmu_ss.add(libproxmox_backup_qemu)
+system_ss.add(libproxmox_backup_qemu)
softmmu_ss.add(files(
system_ss.add(files(
'block-dirty-bitmap.c',
@@ -27,6 +28,7 @@ system_ss.add(files(
'multifd-zlib.c',
'multifd-zero-page.c',
'options.c',
+ 'pbs-state.c',
'postcopy-ram.c',
'savevm.c',
'savevm-async.c',
diff --git a/migration/migration.c b/migration/migration.c
index 8c5bd0a75c..491d9aa017 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -266,6 +266,7 @@ void migration_object_init(void)
/* Initialize cpu throttle timers */
cpu_throttle_init();
+ pbs_state_mig_init();
}
typedef struct {
diff --git a/migration/pbs-state.c b/migration/pbs-state.c
new file mode 100644
index 0000000000..29f2b3860d
index 0000000000..a97187e4d7
--- /dev/null
+++ b/migration/pbs-state.c
@@ -0,0 +1,106 @@
@@ -0,0 +1,104 @@
+/*
+ * PBS (dirty-bitmap) state migration
+ */
@@ -73,11 +93,8 @@ index 0000000000..29f2b3860d
+/* state is accessed via this static variable directly, 'opaque' is NULL */
+static PBSState pbs_state;
+
+static void pbs_state_save_pending(QEMUFile *f, void *opaque,
+ uint64_t max_size,
+ uint64_t *res_precopy_only,
+ uint64_t *res_compatible,
+ uint64_t *res_postcopy_only)
+static void pbs_state_pending(void *opaque, uint64_t *must_precopy,
+ uint64_t *can_postcopy)
+{
+ /* we send everything in save_setup, so nothing is ever pending */
+}
@@ -103,7 +120,7 @@ index 0000000000..29f2b3860d
+}
+
+/* serialize PBS state and send to target via f, called on source */
+static int pbs_state_save_setup(QEMUFile *f, void *opaque)
+static int pbs_state_save_setup(QEMUFile *f, void *opaque, Error **errp)
+{
+ size_t buf_size;
+ uint8_t *buf = proxmox_export_state(&buf_size);
@@ -147,7 +164,8 @@ index 0000000000..29f2b3860d
+static SaveVMHandlers savevm_pbs_state_handlers = {
+ .save_setup = pbs_state_save_setup,
+ .has_postcopy = pbs_state_has_postcopy,
+ .save_live_pending = pbs_state_save_pending,
+ .state_pending_exact = pbs_state_pending,
+ .state_pending_estimate = pbs_state_pending,
+ .is_active_iterate = pbs_state_is_active_iterate,
+ .load_state = pbs_state_load,
+ .is_active = pbs_state_is_active,
@@ -162,22 +180,22 @@ index 0000000000..29f2b3860d
+ NULL);
+}
diff --git a/pve-backup.c b/pve-backup.c
index e671ed8d48..bd2647e5f3 100644
index fea0152de0..faa6a9b93c 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -1130,6 +1130,7 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
@@ -1083,6 +1083,7 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
ret->pbs_library_version = g_strdup(proxmox_backup_qemu_version());
ret->pbs_dirty_bitmap = true;
ret->pbs_dirty_bitmap_savevm = true;
+ ret->pbs_dirty_bitmap_migration = true;
ret->query_bitmap_info = true;
return ret;
}
ret->pbs_masterkey = true;
ret->backup_max_workers = true;
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9b827cbe43..30eb1262ff 100644
index d45e8975a7..9795247c1f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -884,6 +884,11 @@
@@ -1004,6 +1004,11 @@
# @pbs-dirty-bitmap-savevm: True if 'dirty-bitmaps' migration capability can
# safely be set for savevm-async.
#
@@ -186,26 +204,14 @@ index 9b827cbe43..30eb1262ff 100644
+# migration cap if this is false/unset may lead
+# to crashes on migration!
+#
# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
# @pbs-masterkey: True if the QMP backup call supports the 'master_keyfile'
# parameter.
#
##
@@ -891,6 +896,7 @@
@@ -1017,6 +1022,7 @@
'data': { 'pbs-dirty-bitmap': 'bool',
'query-bitmap-info': 'bool',
'pbs-dirty-bitmap-savevm': 'bool',
+ 'pbs-dirty-bitmap-migration': 'bool',
'pbs-library-version': 'str' } }
##
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 5b5512128e..6721889fee 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -4304,6 +4304,7 @@ void qemu_init(int argc, char **argv, char **envp)
blk_mig_init();
ram_mig_init();
dirty_bitmap_mig_init();
+ pbs_state_mig_init();
qemu_opts_foreach(qemu_find_opts("mon"),
mon_init_func, NULL, &error_fatal);
'pbs-masterkey': 'bool',
'pbs-library-version': 'str',
'backup-max-workers': 'bool' } }

View File

@@ -1,440 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 19 Aug 2020 17:02:00 +0200
Subject: [PATCH] PVE: add query-pbs-bitmap-info QMP call
Returns advanced information about dirty bitmaps used (or not used) for
the latest PBS backup.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
monitor/hmp-cmds.c | 28 ++++++-----
pve-backup.c | 117 ++++++++++++++++++++++++++++++++-----------
qapi/block-core.json | 56 +++++++++++++++++++++
3 files changed, 159 insertions(+), 42 deletions(-)
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 604026bb37..95f4e7f5c1 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -198,6 +198,7 @@ void hmp_info_mice(Monitor *mon, const QDict *qdict)
void hmp_info_backup(Monitor *mon, const QDict *qdict)
{
BackupStatus *info;
+ PBSBitmapInfoList *bitmap_info;
info = qmp_query_backup(NULL);
@@ -228,26 +229,29 @@ void hmp_info_backup(Monitor *mon, const QDict *qdict)
// this should not happen normally
monitor_printf(mon, "Total size: %d\n", 0);
} else {
- bool incremental = false;
size_t total_or_dirty = info->total;
- if (info->has_transferred) {
- if (info->has_dirty && info->dirty) {
- if (info->dirty < info->total) {
- total_or_dirty = info->dirty;
- incremental = true;
- }
- }
+ bitmap_info = qmp_query_pbs_bitmap_info(NULL);
+
+ while (bitmap_info) {
+ monitor_printf(mon, "Drive %s:\n",
+ bitmap_info->value->drive);
+ monitor_printf(mon, " bitmap action: %s\n",
+ PBSBitmapAction_str(bitmap_info->value->action));
+ monitor_printf(mon, " size: %zd\n",
+ bitmap_info->value->size);
+ monitor_printf(mon, " dirty: %zd\n",
+ bitmap_info->value->dirty);
+ bitmap_info = bitmap_info->next;
}
- int per = (info->transferred * 100)/total_or_dirty;
-
- monitor_printf(mon, "Backup mode: %s\n", incremental ? "incremental" : "full");
+ qapi_free_PBSBitmapInfoList(bitmap_info);
int zero_per = (info->has_zero_bytes && info->zero_bytes) ?
(info->zero_bytes * 100)/info->total : 0;
monitor_printf(mon, "Total size: %zd\n", info->total);
+ int trans_per = (info->transferred * 100)/total_or_dirty;
monitor_printf(mon, "Transferred bytes: %zd (%d%%)\n",
- info->transferred, per);
+ info->transferred, trans_per);
monitor_printf(mon, "Zero bytes: %zd (%d%%)\n",
info->zero_bytes, zero_per);
diff --git a/pve-backup.c b/pve-backup.c
index 98e79552ef..8305105fd5 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -46,6 +46,7 @@ static struct PVEBackupState {
size_t transferred;
size_t reused;
size_t zero_bytes;
+ GList *bitmap_list;
} stat;
int64_t speed;
VmaWriter *vmaw;
@@ -670,7 +671,6 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
size_t total = 0;
- size_t dirty = 0;
l = di_list;
while (l) {
@@ -691,18 +691,33 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
uuid_generate(uuid);
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.reused = 0;
+
+ /* clear previous backup's bitmap_list */
+ if (backup_state.stat.bitmap_list) {
+ GList *bl = backup_state.stat.bitmap_list;
+ while (bl) {
+ g_free(((PBSBitmapInfo *)bl->data)->drive);
+ g_free(bl->data);
+ bl = g_list_next(bl);
+ }
+ g_list_free(backup_state.stat.bitmap_list);
+ backup_state.stat.bitmap_list = NULL;
+ }
+
if (format == BACKUP_FORMAT_PBS) {
if (!task->has_password) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'password'");
- goto err;
+ goto err_mutex;
}
if (!task->has_backup_id) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-id'");
- goto err;
+ goto err_mutex;
}
if (!task->has_backup_time) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-time'");
- goto err;
+ goto err_mutex;
}
int dump_cb_block_size = PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE; // Hardcoded (4M)
@@ -729,12 +744,12 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
"proxmox_backup_new failed: %s", pbs_err);
proxmox_backup_free_error(pbs_err);
- goto err;
+ goto err_mutex;
}
int connect_result = proxmox_backup_co_connect(pbs, task->errp);
if (connect_result < 0)
- goto err;
+ goto err_mutex;
/* register all devices */
l = di_list;
@@ -745,6 +760,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
di->block_size = dump_cb_block_size;
const char *devname = bdrv_get_device_name(di->bs);
+ PBSBitmapAction action = PBS_BITMAP_ACTION_NOT_USED;
+ size_t dirty = di->size;
BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
bool expect_only_dirty = false;
@@ -753,49 +770,59 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (bitmap == NULL) {
bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, task->errp);
if (!bitmap) {
- goto err;
+ goto err_mutex;
}
+ action = PBS_BITMAP_ACTION_NEW;
} else {
expect_only_dirty = proxmox_backup_check_incremental(pbs, devname, di->size) != 0;
}
if (expect_only_dirty) {
- dirty += bdrv_get_dirty_count(bitmap);
+ /* track clean chunks as reused */
+ dirty = MIN(bdrv_get_dirty_count(bitmap), di->size);
+ backup_state.stat.reused += di->size - dirty;
+ action = PBS_BITMAP_ACTION_USED;
} else {
/* mark entire bitmap as dirty to make full backup */
bdrv_set_dirty_bitmap(bitmap, 0, di->size);
- dirty += di->size;
+ if (action != PBS_BITMAP_ACTION_NEW) {
+ action = PBS_BITMAP_ACTION_INVALID;
+ }
}
di->bitmap = bitmap;
} else {
- dirty += di->size;
-
/* after a full backup the old dirty bitmap is invalid anyway */
if (bitmap != NULL) {
bdrv_release_dirty_bitmap(bitmap);
+ action = PBS_BITMAP_ACTION_NOT_USED_REMOVED;
}
}
int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, task->errp);
if (dev_id < 0) {
- goto err;
+ goto err_mutex;
}
if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, task->errp))) {
- goto err;
+ goto err_mutex;
}
di->dev_id = dev_id;
+
+ PBSBitmapInfo *info = g_malloc(sizeof(*info));
+ info->drive = g_strdup(devname);
+ info->action = action;
+ info->size = di->size;
+ info->dirty = dirty;
+ backup_state.stat.bitmap_list = g_list_append(backup_state.stat.bitmap_list, info);
}
} else if (format == BACKUP_FORMAT_VMA) {
- dirty = total;
-
vmaw = vma_writer_create(task->backup_file, uuid, &local_err);
if (!vmaw) {
if (local_err) {
error_propagate(task->errp, local_err);
}
- goto err;
+ goto err_mutex;
}
/* register all devices for vma writer */
@@ -805,7 +832,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
l = g_list_next(l);
if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_vma_cb, di, task->errp))) {
- goto err;
+ goto err_mutex;
}
const char *devname = bdrv_get_device_name(di->bs);
@@ -813,16 +840,14 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (di->dev_id <= 0) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
"register_stream failed");
- goto err;
+ goto err_mutex;
}
}
} else if (format == BACKUP_FORMAT_DIR) {
- dirty = total;
-
if (mkdir(task->backup_file, 0640) != 0) {
error_setg_errno(task->errp, errno, "can't create directory '%s'\n",
task->backup_file);
- goto err;
+ goto err_mutex;
}
backup_dir = task->backup_file;
@@ -839,18 +864,18 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
di->size, flags, false, &local_err);
if (local_err) {
error_propagate(task->errp, local_err);
- goto err;
+ goto err_mutex;
}
di->target = bdrv_open(di->targetfile, NULL, NULL, flags, &local_err);
if (!di->target) {
error_propagate(task->errp, local_err);
- goto err;
+ goto err_mutex;
}
}
} else {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "unknown backup format");
- goto err;
+ goto err_mutex;
}
@@ -858,7 +883,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (task->has_config_file) {
if (pvebackup_co_add_config(task->config_file, config_name, format, backup_dir,
vmaw, pbs, task->errp) != 0) {
- goto err;
+ goto err_mutex;
}
}
@@ -866,12 +891,11 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (task->has_firewall_file) {
if (pvebackup_co_add_config(task->firewall_file, firewall_name, format, backup_dir,
vmaw, pbs, task->errp) != 0) {
- goto err;
+ goto err_mutex;
}
}
/* initialize global backup_state now */
-
- qemu_mutex_lock(&backup_state.stat.lock);
+ /* note: 'reused' and 'bitmap_list' are initialized earlier */
if (backup_state.stat.error) {
error_free(backup_state.stat.error);
@@ -891,10 +915,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
char *uuid_str = g_strdup(backup_state.stat.uuid_str);
backup_state.stat.total = total;
- backup_state.stat.dirty = dirty;
+ backup_state.stat.dirty = total - backup_state.stat.reused;
backup_state.stat.transferred = 0;
backup_state.stat.zero_bytes = 0;
- backup_state.stat.reused = format == BACKUP_FORMAT_PBS && dirty >= total ? 0 : total - dirty;
qemu_mutex_unlock(&backup_state.stat.lock);
@@ -911,6 +934,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
task->result = uuid_info;
return;
+err_mutex:
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
err:
l = di_list;
@@ -1074,11 +1100,42 @@ BackupStatus *qmp_query_backup(Error **errp)
return info;
}
+PBSBitmapInfoList *qmp_query_pbs_bitmap_info(Error **errp)
+{
+ PBSBitmapInfoList *head = NULL, **p_next = &head;
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+
+ GList *l = backup_state.stat.bitmap_list;
+ while (l) {
+ PBSBitmapInfo *info = (PBSBitmapInfo *)l->data;
+ l = g_list_next(l);
+
+ /* clone bitmap info to avoid auto free after QMP marshalling */
+ PBSBitmapInfo *info_ret = g_malloc0(sizeof(*info_ret));
+ info_ret->drive = g_strdup(info->drive);
+ info_ret->action = info->action;
+ info_ret->size = info->size;
+ info_ret->dirty = info->dirty;
+
+ PBSBitmapInfoList *info_list = g_malloc0(sizeof(*info_list));
+ info_list->value = info_ret;
+
+ *p_next = info_list;
+ p_next = &info_list->next;
+ }
+
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ return head;
+}
+
ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
{
ProxmoxSupportStatus *ret = g_malloc0(sizeof(*ret));
ret->pbs_library_version = g_strdup(proxmox_backup_qemu_version());
ret->pbs_dirty_bitmap = true;
ret->pbs_dirty_bitmap_savevm = true;
+ ret->query_bitmap_info = true;
return ret;
}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index f3608390c4..f57fda122c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -876,6 +876,8 @@
# @pbs-dirty-bitmap: True if dirty-bitmap-incremental backups to PBS are
# supported.
#
+# @query-bitmap-info: True if the 'query-pbs-bitmap-info' QMP call is supported.
+#
# @pbs-dirty-bitmap-savevm: True if 'dirty-bitmaps' migration capability can
# safely be set for savevm-async.
#
@@ -884,6 +886,7 @@
##
{ 'struct': 'ProxmoxSupportStatus',
'data': { 'pbs-dirty-bitmap': 'bool',
+ 'query-bitmap-info': 'bool',
'pbs-dirty-bitmap-savevm': 'bool',
'pbs-library-version': 'str' } }
@@ -897,6 +900,59 @@
##
{ 'command': 'query-proxmox-support', 'returns': 'ProxmoxSupportStatus' }
+##
+# @PBSBitmapAction:
+#
+# An action taken on a dirty-bitmap when a backup job was started.
+#
+# @not-used: Bitmap mode was not enabled.
+#
+# @not-used-removed: Bitmap mode was not enabled, but a bitmap from a
+# previous backup still existed and was removed.
+#
+# @new: A new bitmap was attached to the drive for this backup.
+#
+# @used: An existing bitmap will be used to only backup changed data.
+#
+# @invalid: A bitmap existed, but had to be cleared since it's associated
+# base snapshot did not match the base given for the current job or
+# the crypt mode has changed.
+#
+##
+{ 'enum': 'PBSBitmapAction',
+ 'data': ['not-used', 'not-used-removed', 'new', 'used', 'invalid'] }
+
+##
+# @PBSBitmapInfo:
+#
+# Contains information about dirty bitmaps used for each drive in a PBS backup.
+#
+# @drive: The underlying drive.
+#
+# @action: The action that was taken when the backup started.
+#
+# @size: The total size of the drive.
+#
+# @dirty: How much of the drive is considered dirty and will be backed up,
+# or 'size' if everything will be.
+#
+##
+{ 'struct': 'PBSBitmapInfo',
+ 'data': { 'drive': 'str', 'action': 'PBSBitmapAction', 'size': 'int',
+ 'dirty': 'int' } }
+
+##
+# @query-pbs-bitmap-info:
+#
+# Returns information about dirty bitmaps used on the most recently started
+# backup. Returns nothing when the last backup was not using PBS or if no
+# backup occured in this session.
+#
+# Returns: @PBSBitmapInfo
+#
+##
+{ 'command': 'query-pbs-bitmap-info', 'returns': ['PBSBitmapInfo'] }
+
##
# @BlockDeviceTimedStats:
#

View File

@@ -13,19 +13,24 @@ that are obviously marked as "busy", which would cause none at all to be
transferred.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
migration/block-dirty-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
migration/block-dirty-bitmap.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index c61d382be8..26e4e5c99c 100644
index a7d55048c2..44078ea670 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -534,7 +534,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
@@ -539,7 +539,11 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
}
if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, &local_err)) {
error_report_err(local_err);
if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, errp)) {
- return -1;
+ if (errp != NULL) {
+ error_report_err(*errp);
+ *errp = NULL;
+ }
+ continue;
}

View File

@@ -15,15 +15,16 @@ According to RFC 3720, an initiator name is at most 223 bytes long, so the
4 KiB buffer is big enough, even if many whitespaces are used.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/iscsi.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/block/iscsi.c b/block/iscsi.c
index e30a7e3606..6c70bbe351 100644
index 979bf90cb7..961714a4be 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1374,12 +1374,42 @@ static char *get_initiator_name(QemuOpts *opts)
@@ -1392,12 +1392,42 @@ static char *get_initiator_name(QemuOpts *opts)
const char *name;
char *iscsi_name;
UuidInfo *uuid_info;

View File

@@ -1,292 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Thu, 20 Aug 2020 14:25:00 +0200
Subject: [PATCH] PVE-Backup: Use a transaction to synchronize job states
By using a JobTxn, we can sync dirty bitmaps only when *all* jobs were
successful - meaning we don't need to remove them when the backup fails,
since QEMU's BITMAP_SYNC_MODE_ON_SUCCESS will now handle that for us.
To keep the rate-limiting and IO impact from before, we use a sequential
transaction, so drives will still be backed up one after the other.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
pve-backup.c | 167 +++++++++++++++------------------------------------
1 file changed, 49 insertions(+), 118 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 8305105fd5..d7f2b2206f 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -52,6 +52,7 @@ static struct PVEBackupState {
VmaWriter *vmaw;
ProxmoxBackupHandle *pbs;
GList *di_list;
+ JobTxn *txn;
QemuMutex backup_mutex;
CoMutex dump_callback_mutex;
} backup_state;
@@ -71,32 +72,12 @@ typedef struct PVEBackupDevInfo {
size_t size;
uint64_t block_size;
uint8_t dev_id;
- bool completed;
char targetfile[PATH_MAX];
BdrvDirtyBitmap *bitmap;
BlockDriverState *target;
+ BlockJob *job;
} PVEBackupDevInfo;
-static void pvebackup_run_next_job(void);
-
-static BlockJob *
-lookup_active_block_job(PVEBackupDevInfo *di)
-{
- if (!di->completed && di->bs) {
- for (BlockJob *job = block_job_next(NULL); job; job = block_job_next(job)) {
- if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
- continue;
- }
-
- BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
- if (bjob && bjob->source_bs == di->bs) {
- return job;
- }
- }
- }
- return NULL;
-}
-
static void pvebackup_propagate_error(Error *err)
{
qemu_mutex_lock(&backup_state.stat.lock);
@@ -272,18 +253,6 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
if (local_err != NULL) {
pvebackup_propagate_error(local_err);
}
- } else {
- // on error or cancel we cannot ensure synchronization of dirty
- // bitmaps with backup server, so remove all and do full backup next
- GList *l = backup_state.di_list;
- while (l) {
- PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
- l = g_list_next(l);
-
- if (di->bitmap) {
- bdrv_release_dirty_bitmap(di->bitmap);
- }
- }
}
proxmox_backup_disconnect(backup_state.pbs);
@@ -322,8 +291,6 @@ static void pvebackup_complete_cb(void *opaque, int ret)
qemu_mutex_lock(&backup_state.backup_mutex);
- di->completed = true;
-
if (ret < 0) {
Error *local_err = NULL;
error_setg(&local_err, "job failed with err %d - %s", ret, strerror(-ret));
@@ -336,20 +303,17 @@ static void pvebackup_complete_cb(void *opaque, int ret)
block_on_coroutine_fn(pvebackup_complete_stream, di);
- // remove self from job queue
+ // remove self from job list
backup_state.di_list = g_list_remove(backup_state.di_list, di);
- if (di->bitmap && ret < 0) {
- // on error or cancel we cannot ensure synchronization of dirty
- // bitmaps with backup server, so remove all and do full backup next
- bdrv_release_dirty_bitmap(di->bitmap);
- }
-
g_free(di);
- qemu_mutex_unlock(&backup_state.backup_mutex);
+ /* call cleanup if we're the last job */
+ if (!g_list_first(backup_state.di_list)) {
+ block_on_coroutine_fn(pvebackup_co_cleanup, NULL);
+ }
- pvebackup_run_next_job();
+ qemu_mutex_unlock(&backup_state.backup_mutex);
}
static void pvebackup_cancel(void)
@@ -371,36 +335,28 @@ static void pvebackup_cancel(void)
proxmox_backup_abort(backup_state.pbs, "backup canceled");
}
- qemu_mutex_unlock(&backup_state.backup_mutex);
-
- for(;;) {
-
- BlockJob *next_job = NULL;
-
- qemu_mutex_lock(&backup_state.backup_mutex);
-
- GList *l = backup_state.di_list;
- while (l) {
- PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
- l = g_list_next(l);
+ /* it's enough to cancel one job in the transaction, the rest will follow
+ * automatically */
+ GList *bdi = g_list_first(backup_state.di_list);
+ BlockJob *cancel_job = bdi && bdi->data ?
+ ((PVEBackupDevInfo *)bdi->data)->job :
+ NULL;
- BlockJob *job = lookup_active_block_job(di);
- if (job != NULL) {
- next_job = job;
- break;
- }
- }
+ /* ref the job before releasing the mutex, just to be safe */
+ if (cancel_job) {
+ job_ref(&cancel_job->job);
+ }
- qemu_mutex_unlock(&backup_state.backup_mutex);
+ /* job_cancel_sync may enter the job, so we need to release the
+ * backup_mutex to avoid deadlock */
+ qemu_mutex_unlock(&backup_state.backup_mutex);
- if (next_job) {
- AioContext *aio_context = next_job->job.aio_context;
- aio_context_acquire(aio_context);
- job_cancel_sync(&next_job->job);
- aio_context_release(aio_context);
- } else {
- break;
- }
+ if (cancel_job) {
+ AioContext *aio_context = cancel_job->job.aio_context;
+ aio_context_acquire(aio_context);
+ job_cancel_sync(&cancel_job->job);
+ job_unref(&cancel_job->job);
+ aio_context_release(aio_context);
}
}
@@ -459,51 +415,19 @@ static int coroutine_fn pvebackup_co_add_config(
goto out;
}
-bool job_should_pause(Job *job);
-
-static void pvebackup_run_next_job(void)
-{
- assert(!qemu_in_coroutine());
-
- qemu_mutex_lock(&backup_state.backup_mutex);
-
- GList *l = backup_state.di_list;
- while (l) {
- PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
- l = g_list_next(l);
-
- BlockJob *job = lookup_active_block_job(di);
-
- if (job) {
- qemu_mutex_unlock(&backup_state.backup_mutex);
-
- AioContext *aio_context = job->job.aio_context;
- aio_context_acquire(aio_context);
-
- if (job_should_pause(&job->job)) {
- bool error_or_canceled = pvebackup_error_or_canceled();
- if (error_or_canceled) {
- job_cancel_sync(&job->job);
- } else {
- job_resume(&job->job);
- }
- }
- aio_context_release(aio_context);
- return;
- }
- }
-
- block_on_coroutine_fn(pvebackup_co_cleanup, NULL); // no more jobs, run cleanup
-
- qemu_mutex_unlock(&backup_state.backup_mutex);
-}
-
static bool create_backup_jobs(void) {
assert(!qemu_in_coroutine());
Error *local_err = NULL;
+ /* create job transaction to synchronize bitmap commit and cancel all
+ * jobs in case one errors */
+ if (backup_state.txn) {
+ job_txn_unref(backup_state.txn);
+ }
+ backup_state.txn = job_txn_new_seq();
+
/* create and start all jobs (paused state) */
GList *l = backup_state.di_list;
while (l) {
@@ -524,7 +448,7 @@ static bool create_backup_jobs(void) {
BlockJob *job = backup_job_create(
NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
bitmap_mode, false, NULL, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
- JOB_DEFAULT, pvebackup_complete_cb, di, 1, NULL, &local_err);
+ JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn, &local_err);
aio_context_release(aio_context);
@@ -536,7 +460,8 @@ static bool create_backup_jobs(void) {
pvebackup_propagate_error(create_job_err);
break;
}
- job_start(&job->job);
+
+ di->job = job;
bdrv_unref(di->target);
di->target = NULL;
@@ -554,6 +479,10 @@ static bool create_backup_jobs(void) {
bdrv_unref(di->target);
di->target = NULL;
}
+
+ if (di->job) {
+ job_unref(&di->job->job);
+ }
}
}
@@ -944,10 +873,6 @@ err:
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
- if (di->bitmap) {
- bdrv_release_dirty_bitmap(di->bitmap);
- }
-
if (di->target) {
bdrv_unref(di->target);
}
@@ -1036,9 +961,15 @@ UuidInfo *qmp_backup(
block_on_coroutine_fn(pvebackup_co_prepare, &task);
if (*errp == NULL) {
- create_backup_jobs();
+ bool errors = create_backup_jobs();
qemu_mutex_unlock(&backup_state.backup_mutex);
- pvebackup_run_next_job();
+
+ if (!errors) {
+ /* start the first job in the transaction
+ * note: this might directly enter the job, so we need to do this
+ * after unlocking the backup_mutex */
+ job_txn_start_seq(backup_state.txn);
+ }
} else {
qemu_mutex_unlock(&backup_state.backup_mutex);
}

View File

@@ -4,15 +4,17 @@ Date: Tue, 2 Mar 2021 16:34:28 +0100
Subject: [PATCH] PVE: block/stream: increase chunk size
Ceph favors bigger chunks, so increase to 4M.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/stream.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/stream.c b/block/stream.c
index 236384f2f7..a5371420e3 100644
index 9076203193..1d1c65f061 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -26,7 +26,7 @@ enum {
@@ -27,7 +27,7 @@ enum {
* large enough to process multiple clusters in a single call, so
* that populating contiguous regions of the image is efficient.
*/

View File

@@ -1,500 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Mon, 28 Sep 2020 13:40:51 +0200
Subject: [PATCH] PVE-Backup: Don't block on finishing and cleanup
create_backup_jobs
proxmox_backup_co_finish is already async, but previously we would wait
for the coroutine using block_on_coroutine_fn(). Avoid this by
scheduling pvebackup_co_complete_stream (and thus pvebackup_co_cleanup)
as a real coroutine when calling from pvebackup_complete_cb. This is ok,
since complete_stream uses the backup_mutex internally to synchronize,
and other streams can happily continue writing in the meantime anyway.
To accomodate, backup_mutex is converted to a CoMutex. This means
converting every user to a coroutine. This is not just useful here, but
will come in handy once this series[0] is merged, and QMP calls can be
yield-able coroutines too. Then we can also finally get rid of
block_on_coroutine_fn.
Cases of aio_context_acquire/release from within what is now a coroutine
are changed to aio_co_reschedule_self, which works since a running
coroutine always holds the aio lock for the context it is running in.
job_cancel_sync is called from a BH since it can't be run from a
coroutine (uses AIO_WAIT_WHILE internally).
Same thing for create_backup_jobs, which is converted to a BH too.
To communicate the finishing state, a new property is introduced to
query-backup: 'finishing'. A new state is explicitly not used, since
that would break compatibility with older qemu-server versions.
Also fix create_backup_jobs:
No more weird bool returns, just the standard "errp" format used
everywhere else too. With this, if backup_job_create fails, the error
message is actually returned over QMP and can be shown to the user.
To facilitate correct cleanup on such an error, we call
create_backup_jobs as a bottom half directly from pvebackup_co_prepare.
This additionally allows us to actually hold the backup_mutex during
operation.
Also add a job_cancel_sync before job_unref, since a job must be in
STATUS_NULL to be deleted by unref, which could trigger an assert
before.
[0] https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg03515.html
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
pve-backup.c | 217 ++++++++++++++++++++++++++++---------------
qapi/block-core.json | 5 +-
2 files changed, 144 insertions(+), 78 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index d7f2b2206f..e671ed8d48 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -33,7 +33,9 @@ const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
static struct PVEBackupState {
struct {
- // Everithing accessed from qmp_backup_query command is protected using lock
+ // Everything accessed from qmp_backup_query command is protected using
+ // this lock. Do NOT hold this lock for long times, as it is sometimes
+ // acquired from coroutines, and thus any wait time may block the guest.
QemuMutex lock;
Error *error;
time_t start_time;
@@ -47,20 +49,22 @@ static struct PVEBackupState {
size_t reused;
size_t zero_bytes;
GList *bitmap_list;
+ bool finishing;
+ bool starting;
} stat;
int64_t speed;
VmaWriter *vmaw;
ProxmoxBackupHandle *pbs;
GList *di_list;
JobTxn *txn;
- QemuMutex backup_mutex;
+ CoMutex backup_mutex;
CoMutex dump_callback_mutex;
} backup_state;
static void pvebackup_init(void)
{
qemu_mutex_init(&backup_state.stat.lock);
- qemu_mutex_init(&backup_state.backup_mutex);
+ qemu_co_mutex_init(&backup_state.backup_mutex);
qemu_co_mutex_init(&backup_state.dump_callback_mutex);
}
@@ -72,6 +76,7 @@ typedef struct PVEBackupDevInfo {
size_t size;
uint64_t block_size;
uint8_t dev_id;
+ int completed_ret; // INT_MAX if not completed
char targetfile[PATH_MAX];
BdrvDirtyBitmap *bitmap;
BlockDriverState *target;
@@ -227,12 +232,12 @@ pvebackup_co_dump_vma_cb(
}
// assumes the caller holds backup_mutex
-static void coroutine_fn pvebackup_co_cleanup(void *unused)
+static void coroutine_fn pvebackup_co_cleanup(void)
{
assert(qemu_in_coroutine());
qemu_mutex_lock(&backup_state.stat.lock);
- backup_state.stat.end_time = time(NULL);
+ backup_state.stat.finishing = true;
qemu_mutex_unlock(&backup_state.stat.lock);
if (backup_state.vmaw) {
@@ -261,35 +266,29 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
g_list_free(backup_state.di_list);
backup_state.di_list = NULL;
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.end_time = time(NULL);
+ backup_state.stat.finishing = false;
+ qemu_mutex_unlock(&backup_state.stat.lock);
}
-// assumes the caller holds backup_mutex
-static void coroutine_fn pvebackup_complete_stream(void *opaque)
+static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
{
PVEBackupDevInfo *di = opaque;
+ int ret = di->completed_ret;
- bool error_or_canceled = pvebackup_error_or_canceled();
-
- if (backup_state.vmaw) {
- vma_writer_close_stream(backup_state.vmaw, di->dev_id);
+ qemu_mutex_lock(&backup_state.stat.lock);
+ bool starting = backup_state.stat.starting;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+ if (starting) {
+ /* in 'starting' state, no tasks have been run yet, meaning we can (and
+ * must) skip all cleanup, as we don't know what has and hasn't been
+ * initialized yet. */
+ return;
}
- if (backup_state.pbs && !error_or_canceled) {
- Error *local_err = NULL;
- proxmox_backup_co_close_image(backup_state.pbs, di->dev_id, &local_err);
- if (local_err != NULL) {
- pvebackup_propagate_error(local_err);
- }
- }
-}
-
-static void pvebackup_complete_cb(void *opaque, int ret)
-{
- assert(!qemu_in_coroutine());
-
- PVEBackupDevInfo *di = opaque;
-
- qemu_mutex_lock(&backup_state.backup_mutex);
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
if (ret < 0) {
Error *local_err = NULL;
@@ -301,7 +300,19 @@ static void pvebackup_complete_cb(void *opaque, int ret)
assert(di->target == NULL);
- block_on_coroutine_fn(pvebackup_complete_stream, di);
+ bool error_or_canceled = pvebackup_error_or_canceled();
+
+ if (backup_state.vmaw) {
+ vma_writer_close_stream(backup_state.vmaw, di->dev_id);
+ }
+
+ if (backup_state.pbs && !error_or_canceled) {
+ Error *local_err = NULL;
+ proxmox_backup_co_close_image(backup_state.pbs, di->dev_id, &local_err);
+ if (local_err != NULL) {
+ pvebackup_propagate_error(local_err);
+ }
+ }
// remove self from job list
backup_state.di_list = g_list_remove(backup_state.di_list, di);
@@ -310,21 +321,49 @@ static void pvebackup_complete_cb(void *opaque, int ret)
/* call cleanup if we're the last job */
if (!g_list_first(backup_state.di_list)) {
- block_on_coroutine_fn(pvebackup_co_cleanup, NULL);
+ pvebackup_co_cleanup();
}
- qemu_mutex_unlock(&backup_state.backup_mutex);
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
}
-static void pvebackup_cancel(void)
+static void pvebackup_complete_cb(void *opaque, int ret)
{
- assert(!qemu_in_coroutine());
+ PVEBackupDevInfo *di = opaque;
+ di->completed_ret = ret;
+
+ /*
+ * Schedule stream cleanup in async coroutine. close_image and finish might
+ * take a while, so we can't block on them here. This way it also doesn't
+ * matter if we're already running in a coroutine or not.
+ * Note: di is a pointer to an entry in the global backup_state struct, so
+ * it stays valid.
+ */
+ Coroutine *co = qemu_coroutine_create(pvebackup_co_complete_stream, di);
+ aio_co_enter(qemu_get_aio_context(), co);
+}
+
+/*
+ * job_cancel(_sync) does not like to be called from coroutines, so defer to
+ * main loop processing via a bottom half.
+ */
+static void job_cancel_bh(void *opaque) {
+ CoCtxData *data = (CoCtxData*)opaque;
+ Job *job = (Job*)data->data;
+ AioContext *job_ctx = job->aio_context;
+ aio_context_acquire(job_ctx);
+ job_cancel_sync(job);
+ aio_context_release(job_ctx);
+ aio_co_enter(data->ctx, data->co);
+}
+static void coroutine_fn pvebackup_co_cancel(void *opaque)
+{
Error *cancel_err = NULL;
error_setg(&cancel_err, "backup canceled");
pvebackup_propagate_error(cancel_err);
- qemu_mutex_lock(&backup_state.backup_mutex);
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
if (backup_state.vmaw) {
/* make sure vma writer does not block anymore */
@@ -342,27 +381,22 @@ static void pvebackup_cancel(void)
((PVEBackupDevInfo *)bdi->data)->job :
NULL;
- /* ref the job before releasing the mutex, just to be safe */
if (cancel_job) {
- job_ref(&cancel_job->job);
+ CoCtxData data = {
+ .ctx = qemu_get_current_aio_context(),
+ .co = qemu_coroutine_self(),
+ .data = &cancel_job->job,
+ };
+ aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
+ qemu_coroutine_yield();
}
- /* job_cancel_sync may enter the job, so we need to release the
- * backup_mutex to avoid deadlock */
- qemu_mutex_unlock(&backup_state.backup_mutex);
-
- if (cancel_job) {
- AioContext *aio_context = cancel_job->job.aio_context;
- aio_context_acquire(aio_context);
- job_cancel_sync(&cancel_job->job);
- job_unref(&cancel_job->job);
- aio_context_release(aio_context);
- }
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
}
void qmp_backup_cancel(Error **errp)
{
- pvebackup_cancel();
+ block_on_coroutine_fn(pvebackup_co_cancel, NULL);
}
// assumes the caller holds backup_mutex
@@ -415,10 +449,18 @@ static int coroutine_fn pvebackup_co_add_config(
goto out;
}
-static bool create_backup_jobs(void) {
+/*
+ * backup_job_create can *not* be run from a coroutine (and requires an
+ * acquired AioContext), so this can't either.
+ * The caller is responsible that backup_mutex is held nonetheless.
+ */
+static void create_backup_jobs_bh(void *opaque) {
assert(!qemu_in_coroutine());
+ CoCtxData *data = (CoCtxData*)opaque;
+ Error **errp = (Error**)data->data;
+
Error *local_err = NULL;
/* create job transaction to synchronize bitmap commit and cancel all
@@ -452,24 +494,19 @@ static bool create_backup_jobs(void) {
aio_context_release(aio_context);
- if (!job || local_err != NULL) {
- Error *create_job_err = NULL;
- error_setg(&create_job_err, "backup_job_create failed: %s",
- local_err ? error_get_pretty(local_err) : "null");
+ di->job = job;
- pvebackup_propagate_error(create_job_err);
+ if (!job || local_err) {
+ error_setg(errp, "backup_job_create failed: %s",
+ local_err ? error_get_pretty(local_err) : "null");
break;
}
- di->job = job;
-
bdrv_unref(di->target);
di->target = NULL;
}
- bool errors = pvebackup_error_or_canceled();
-
- if (errors) {
+ if (*errp) {
l = backup_state.di_list;
while (l) {
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
@@ -481,12 +518,17 @@ static bool create_backup_jobs(void) {
}
if (di->job) {
+ AioContext *ctx = di->job->job.aio_context;
+ aio_context_acquire(ctx);
+ job_cancel_sync(&di->job->job);
job_unref(&di->job->job);
+ aio_context_release(ctx);
}
}
}
- return errors;
+ /* return */
+ aio_co_enter(data->ctx, data->co);
}
typedef struct QmpBackupTask {
@@ -523,11 +565,12 @@ typedef struct QmpBackupTask {
UuidInfo *result;
} QmpBackupTask;
-// assumes the caller holds backup_mutex
static void coroutine_fn pvebackup_co_prepare(void *opaque)
{
assert(qemu_in_coroutine());
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
+
QmpBackupTask *task = opaque;
task->result = NULL; // just to be sure
@@ -548,8 +591,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
const char *firewall_name = "qemu-server.fw";
if (backup_state.di_list) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+ error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
"previous backup not finished");
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
return;
}
@@ -616,6 +660,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
di->size = size;
total += size;
+
+ di->completed_ret = INT_MAX;
}
uuid_generate(uuid);
@@ -847,6 +893,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
backup_state.stat.dirty = total - backup_state.stat.reused;
backup_state.stat.transferred = 0;
backup_state.stat.zero_bytes = 0;
+ backup_state.stat.finishing = false;
+ backup_state.stat.starting = true;
qemu_mutex_unlock(&backup_state.stat.lock);
@@ -861,6 +909,33 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
uuid_info->UUID = uuid_str;
task->result = uuid_info;
+
+ /* Run create_backup_jobs_bh outside of coroutine (in BH) but keep
+ * backup_mutex locked. This is fine, a CoMutex can be held across yield
+ * points, and we'll release it as soon as the BH reschedules us.
+ */
+ CoCtxData waker = {
+ .co = qemu_coroutine_self(),
+ .ctx = qemu_get_current_aio_context(),
+ .data = &local_err,
+ };
+ aio_bh_schedule_oneshot(waker.ctx, create_backup_jobs_bh, &waker);
+ qemu_coroutine_yield();
+
+ if (local_err) {
+ error_propagate(task->errp, local_err);
+ goto err;
+ }
+
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.starting = false;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ /* start the first job in the transaction */
+ job_txn_start_seq(backup_state.txn);
+
return;
err_mutex:
@@ -883,6 +958,7 @@ err:
g_free(di);
}
g_list_free(di_list);
+ backup_state.di_list = NULL;
if (devs) {
g_strfreev(devs);
@@ -903,6 +979,8 @@ err:
}
task->result = NULL;
+
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
return;
}
@@ -956,24 +1034,8 @@ UuidInfo *qmp_backup(
.errp = errp,
};
- qemu_mutex_lock(&backup_state.backup_mutex);
-
block_on_coroutine_fn(pvebackup_co_prepare, &task);
- if (*errp == NULL) {
- bool errors = create_backup_jobs();
- qemu_mutex_unlock(&backup_state.backup_mutex);
-
- if (!errors) {
- /* start the first job in the transaction
- * note: this might directly enter the job, so we need to do this
- * after unlocking the backup_mutex */
- job_txn_start_seq(backup_state.txn);
- }
- } else {
- qemu_mutex_unlock(&backup_state.backup_mutex);
- }
-
return task.result;
}
@@ -1025,6 +1087,7 @@ BackupStatus *qmp_query_backup(Error **errp)
info->transferred = backup_state.stat.transferred;
info->has_reused = true;
info->reused = backup_state.stat.reused;
+ info->finishing = backup_state.stat.finishing;
qemu_mutex_unlock(&backup_state.stat.lock);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index f57fda122c..9b827cbe43 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -775,12 +775,15 @@
#
# @uuid: uuid for this backup job
#
+# @finishing: if status='active' and finishing=true, then the backup process is
+# waiting for the target to finish.
+#
##
{ 'struct': 'BackupStatus',
'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int', '*dirty': 'int',
'*transferred': 'int', '*zero-bytes': 'int', '*reused': 'int',
'*start-time': 'int', '*end-time': 'int',
- '*backup-file': 'str', '*uuid': 'str' } }
+ '*backup-file': 'str', '*uuid': 'str', 'finishing': 'bool' } }
##
# @BackupFormat:

View File

@@ -19,22 +19,33 @@ well.
This only worked if the target supports backing images, so up until now
only for qcow2, with alloc-track any driver for the target can be used.
If 'auto-remove' is set, alloc-track will automatically detach itself
once the backing image is removed. It will be replaced by 'file'.
Replacing the node cannot be done in the
track_co_change_backing_file() callback, because replacing a node
cannot happen in a coroutine and requires the block graph lock
exclusively. Could either become a special option for the stream job,
or maybe the upcoming blockdev-replace QMP command can be used in the
future.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: adapt to changed function signatures
make error return value consistent with QEMU
avoid premature break during read
adhere to block graph lock requirements]
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/alloc-track.c | 342 ++++++++++++++++++++++++++++++++++++++++++++
block/alloc-track.c | 366 ++++++++++++++++++++++++++++++++++++++++++++
block/meson.build | 1 +
2 files changed, 343 insertions(+)
block/stream.c | 34 ++++
3 files changed, 401 insertions(+)
create mode 100644 block/alloc-track.c
diff --git a/block/alloc-track.c b/block/alloc-track.c
new file mode 100644
index 0000000000..b579380279
index 0000000000..b4a9851144
--- /dev/null
+++ b/block/alloc-track.c
@@ -0,0 +1,345 @@
@@ -0,0 +1,366 @@
+/*
+ * Node to allow backing images to be applied to any node. Assumes a blank
+ * image to begin with, only new writes are tracked as allocated, thus this
@@ -50,9 +61,12 @@ index 0000000000..b579380279
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "block/dirty-bitmap.h"
+#include "block/graph-lock.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qstring.h"
+#include "qemu/cutils.h"
+#include "qemu/error-report.h"
+#include "qemu/option.h"
+#include "qemu/module.h"
+#include "sysemu/block-backend.h"
@@ -61,12 +75,12 @@ index 0000000000..b579380279
+
+typedef enum DropState {
+ DropNone,
+ DropRequested,
+ DropInProgress,
+} DropState;
+
+typedef struct {
+ BdrvDirtyBitmap *bitmap;
+ uint64_t granularity;
+ DropState drop_state;
+ bool auto_remove;
+} BDRVAllocTrackState;
@@ -85,26 +99,29 @@ index 0000000000..b579380279
+ },
+};
+
+static void track_refresh_limits(BlockDriverState *bs, Error **errp)
+static void GRAPH_RDLOCK
+track_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+ BlockDriverInfo bdi;
+ BDRVAllocTrackState *s = bs->opaque;
+
+ if (!bs->file) {
+ return;
+ }
+
+ /* always use alignment from underlying write device so RMW cycle for
+ * bdrv_pwritev reads data from our backing via track_co_preadv (no partial
+ * cluster allocation in 'file') */
+ bdrv_get_info(bs->file->bs, &bdi);
+ /*
+ * Always use alignment from underlying write device so RMW cycle for
+ * bdrv_pwritev reads data from our backing via track_co_preadv. Also use at
+ * least the bitmap granularity.
+ */
+ bs->bl.request_alignment = MAX(bs->file->bs->bl.request_alignment,
+ MAX(bdi.cluster_size, BDRV_SECTOR_SIZE));
+ s->granularity);
+}
+
+static int track_open(BlockDriverState *bs, QDict *options, int flags,
+ Error **errp)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+ BdrvChild *file = NULL;
+ QemuOpts *opts;
+ Error *local_err = NULL;
+ int ret = 0;
@@ -120,18 +137,45 @@ index 0000000000..b579380279
+ s->auto_remove = qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false);
+
+ /* open the target (write) node, backing will be attached by block layer */
+ bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+ BDRV_CHILD_DATA | BDRV_CHILD_METADATA, false,
+ &local_err);
+ file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+ BDRV_CHILD_DATA | BDRV_CHILD_METADATA, false,
+ &local_err);
+ bdrv_graph_wrlock();
+ bs->file = file;
+ bdrv_graph_wrunlock();
+ if (local_err) {
+ ret = -EINVAL;
+ error_propagate(errp, local_err);
+ goto fail;
+ }
+
+ bdrv_graph_rdlock_main_loop();
+ BlockDriverInfo bdi = {0};
+ ret = bdrv_get_info(bs->file->bs, &bdi);
+ if (ret < 0) {
+ /*
+ * Not a hard failure. Worst that can happen is partial cluster
+ * allocation in the write target. However, the driver here returns its
+ * allocation status based on the dirty bitmap, so any other data that
+ * maps to such a cluster will still be copied later by a stream job (or
+ * during writes to that cluster).
+ */
+ warn_report("alloc-track: unable to query cluster size for write target: %s",
+ strerror(ret));
+ }
+ ret = 0;
+ /*
+ * Always consider alignment from underlying write device so RMW cycle for
+ * bdrv_pwritev reads data from our backing via track_co_preadv. Also try to
+ * avoid partial cluster allocation in the write target by considering the
+ * cluster size.
+ */
+ s->granularity = MAX(bs->file->bs->bl.request_alignment,
+ MAX(bdi.cluster_size, BDRV_SECTOR_SIZE));
+ track_refresh_limits(bs, errp);
+ uint64_t gran = bs->bl.request_alignment;
+ s->bitmap = bdrv_create_dirty_bitmap(bs->file->bs, gran, NULL, &local_err);
+ s->bitmap = bdrv_create_dirty_bitmap(bs->file->bs, s->granularity, NULL,
+ &local_err);
+ bdrv_graph_rdunlock_main_loop();
+ if (local_err) {
+ ret = -EIO;
+ error_propagate(errp, local_err);
@@ -142,7 +186,9 @@ index 0000000000..b579380279
+
+fail:
+ if (ret < 0) {
+ bdrv_graph_wrlock();
+ bdrv_unref_child(bs, bs->file);
+ bdrv_graph_wrunlock();
+ if (s->bitmap) {
+ bdrv_release_dirty_bitmap(s->bitmap);
+ }
@@ -159,13 +205,15 @@ index 0000000000..b579380279
+ }
+}
+
+static int64_t track_getlength(BlockDriverState *bs)
+static coroutine_fn int64_t GRAPH_RDLOCK
+track_co_getlength(BlockDriverState *bs)
+{
+ return bdrv_getlength(bs->file->bs);
+ return bdrv_co_getlength(bs->file->bs);
+}
+
+static int coroutine_fn track_co_preadv(BlockDriverState *bs,
+ uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
+static int coroutine_fn GRAPH_RDLOCK
+track_co_preadv(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+ QEMUIOVector local_qiov;
@@ -176,6 +224,11 @@ index 0000000000..b579380279
+ int64_t local_bytes;
+ bool alloc;
+
+ if (offset < 0 || bytes < 0) {
+ fprintf(stderr, "unexpected negative 'offset' or 'bytes' value!\n");
+ return -EIO;
+ }
+
+ /* a read request can span multiple granularity-sized chunks, and can thus
+ * contain blocks with different allocation status - we could just iterate
+ * granularity-wise, but for better performance use bdrv_dirty_bitmap_next_X
@@ -206,7 +259,8 @@ index 0000000000..b579380279
+ ret = bdrv_co_preadv(bs->backing, local_offset, local_bytes,
+ &local_qiov, flags);
+ } else {
+ ret = qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
+ qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
+ ret = 0;
+ }
+
+ if (ret != 0) {
@@ -217,36 +271,39 @@ index 0000000000..b579380279
+ return ret;
+}
+
+static int coroutine_fn track_co_pwritev(BlockDriverState *bs,
+ uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
+static int coroutine_fn GRAPH_RDLOCK
+track_co_pwritev(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn track_co_pwrite_zeroes(BlockDriverState *bs,
+ int64_t offset, int count, BdrvRequestFlags flags)
+static int coroutine_fn GRAPH_RDLOCK
+track_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset, int64_t bytes,
+ BdrvRequestFlags flags)
+{
+ return bdrv_co_pwrite_zeroes(bs->file, offset, count, flags);
+ return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
+}
+
+static int coroutine_fn track_co_pdiscard(BlockDriverState *bs,
+ int64_t offset, int count)
+static int coroutine_fn GRAPH_RDLOCK
+track_co_pdiscard(BlockDriverState *bs, int64_t offset, int64_t bytes)
+{
+ return bdrv_co_pdiscard(bs->file, offset, count);
+ return bdrv_co_pdiscard(bs->file, offset, bytes);
+}
+
+static coroutine_fn int track_co_flush(BlockDriverState *bs)
+static coroutine_fn int GRAPH_RDLOCK
+track_co_flush(BlockDriverState *bs)
+{
+ return bdrv_co_flush(bs->file->bs);
+}
+
+static int coroutine_fn track_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset,
+ int64_t bytes,
+ int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
+static int coroutine_fn GRAPH_RDLOCK
+track_co_block_status(BlockDriverState *bs, bool want_zero,
+ int64_t offset,
+ int64_t bytes,
+ int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+
@@ -272,10 +329,10 @@ index 0000000000..b579380279
+ return 0;
+}
+
+static void track_child_perm(BlockDriverState *bs, BdrvChild *c,
+ BdrvChildRole role, BlockReopenQueue *reopen_queue,
+ uint64_t perm, uint64_t shared,
+ uint64_t *nperm, uint64_t *nshared)
+static void GRAPH_RDLOCK
+track_child_perm(BlockDriverState *bs, BdrvChild *c, BdrvChildRole role,
+ BlockReopenQueue *reopen_queue, uint64_t perm, uint64_t shared,
+ uint64_t *nperm, uint64_t *nshared)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+
@@ -298,53 +355,28 @@ index 0000000000..b579380279
+ }
+}
+
+static void track_drop(void *opaque)
+static int coroutine_fn GRAPH_RDLOCK
+track_co_change_backing_file(BlockDriverState *bs, const char *backing_file,
+ const char *backing_fmt)
+{
+ BlockDriverState *bs = (BlockDriverState*)opaque;
+ BlockDriverState *file = bs->file->bs;
+ BDRVAllocTrackState *s = bs->opaque;
+
+ assert(file);
+
+ /* we rely on the fact that we're not used anywhere else, so let's wait
+ * until we're only used once - in the drive connected to the guest (and one
+ * ref is held by bdrv_ref in track_change_backing_file) */
+ if (bs->refcnt > 2) {
+ aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, opaque);
+ return;
+ }
+ AioContext *aio_context = bdrv_get_aio_context(bs);
+ aio_context_acquire(aio_context);
+
+ bdrv_drained_begin(bs);
+
+ /* now that we're drained, we can safely set 'DropInProgress' */
+ s->drop_state = DropInProgress;
+ bdrv_child_refresh_perms(bs, bs->file, &error_abort);
+
+ bdrv_replace_node(bs, file, &error_abort);
+ bdrv_set_backing_hd(bs, NULL, &error_abort);
+ bdrv_drained_end(bs);
+ bdrv_unref(bs);
+ aio_context_release(aio_context);
+}
+
+static int track_change_backing_file(BlockDriverState *bs,
+ const char *backing_file,
+ const char *backing_fmt)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+ if (s->auto_remove && s->drop_state == DropNone &&
+ backing_file == NULL && backing_fmt == NULL)
+ {
+ /* backing file has been disconnected, there's no longer any use for
+ * this node, so let's remove ourselves from the block graph - we need
+ * to schedule this for later however, since when this function is
+ * called, the blockjob modifying us is probably not done yet and has a
+ * blocker on 'bs' */
+ s->drop_state = DropRequested;
+ bdrv_ref(bs);
+ aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, (void*)bs);
+ /*
+ * Note that the actual backing file graph change is already done in the
+ * stream job itself with bdrv_set_backing_hd_drained(), so no need to
+ * actually do anything here. But still needs to be implemented, to make
+ * our caller (i.e. bdrv_co_change_backing_file() do the right thing).
+ *
+ * FIXME
+ * We'd like to auto-remove ourselves from the block graph, but it cannot
+ * be done from a coroutine. Currently done in the stream job, where it
+ * kinda fits better, but in the long-term, a special parameter would be
+ * nice (or done via qemu-server via upcoming blockdev-replace QMP command).
+ */
+ if (backing_file == NULL) {
+ BDRVAllocTrackState *s = bs->opaque;
+ bdrv_drained_begin(bs);
+ s->drop_state = DropInProgress;
+ bdrv_child_refresh_perms(bs, bs->file, &error_abort);
+ bdrv_drained_end(bs);
+ }
+
+ return 0;
@@ -354,9 +386,9 @@ index 0000000000..b579380279
+ .format_name = "alloc-track",
+ .instance_size = sizeof(BDRVAllocTrackState),
+
+ .bdrv_file_open = track_open,
+ .bdrv_open = track_open,
+ .bdrv_close = track_close,
+ .bdrv_getlength = track_getlength,
+ .bdrv_co_getlength = track_co_getlength,
+ .bdrv_child_perm = track_child_perm,
+ .bdrv_refresh_limits = track_refresh_limits,
+
@@ -371,7 +403,7 @@ index 0000000000..b579380279
+ .supports_backing = true,
+
+ .bdrv_co_block_status = track_co_block_status,
+ .bdrv_change_backing_file = track_change_backing_file,
+ .bdrv_co_change_backing_file = track_co_change_backing_file,
+};
+
+static void bdrv_alloc_track_init(void)
@@ -381,7 +413,7 @@ index 0000000000..b579380279
+
+block_init(bdrv_alloc_track_init);
diff --git a/block/meson.build b/block/meson.build
index a070060e53..e387990764 100644
index e178047ec9..7ef7250d31 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -2,6 +2,7 @@ block_ss.add(genh)
@@ -392,3 +424,48 @@ index a070060e53..e387990764 100644
'amend.c',
'backup.c',
'backup-dump.c',
diff --git a/block/stream.c b/block/stream.c
index 1d1c65f061..d499c8883f 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -120,6 +120,40 @@ static int stream_prepare(Job *job)
ret = -EPERM;
goto out;
}
+
+ /*
+ * This cannot be done in the co_change_backing_file callback, because
+ * bdrv_replace_node() cannot be done in a coroutine. The latter also
+ * requires the graph lock exclusively. Only required for the
+ * alloc-track driver.
+ *
+ * The long-term plan is to either have an explicit parameter for the
+ * stream job or use the upcoming blockdev-replace QMP command.
+ */
+ if (base_id == NULL && strcmp(unfiltered_bs->drv->format_name, "alloc-track") == 0) {
+ BlockDriverState *file_bs;
+
+ bdrv_graph_rdlock_main_loop();
+ file_bs = unfiltered_bs->file->bs;
+ bdrv_graph_rdunlock_main_loop();
+
+ bdrv_ref(unfiltered_bs); // unrefed by bdrv_replace_node()
+ bdrv_drained_begin(file_bs);
+ bdrv_graph_wrlock();
+
+ bdrv_replace_node(unfiltered_bs, file_bs, &local_err);
+
+ bdrv_graph_wrunlock();
+ bdrv_drained_end(file_bs);
+ bdrv_unref(unfiltered_bs);
+
+ if (local_err) {
+ error_prepend(&local_err, "failed to replace alloc-track node: ");
+ error_report_err(local_err);
+ ret = -EPERM;
+ goto out;
+ }
+ }
}
out:

View File

@@ -0,0 +1,81 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Thu, 23 Jun 2022 14:00:05 +0200
Subject: [PATCH] Revert "block/rbd: workaround for ceph issue #53784"
This reverts commit fc176116cdea816ceb8dd969080b2b95f58edbc0 in
preparation to revert 0347a8fd4c3faaedf119be04c197804be40a384b.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/rbd.c | 42 ++----------------------------------------
1 file changed, 2 insertions(+), 40 deletions(-)
diff --git a/block/rbd.c b/block/rbd.c
index 728bce3b1e..6c9a8e0add 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1515,7 +1515,6 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
int status, r;
RBDDiffIterateReq req = { .offs = offset };
uint64_t features, flags;
- uint64_t head = 0;
assert(offset + bytes <= s->image_size);
@@ -1543,43 +1542,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
return status;
}
-#if LIBRBD_VERSION_CODE < LIBRBD_VERSION(1, 17, 0)
- /*
- * librbd had a bug until early 2022 that affected all versions of ceph that
- * supported fast-diff. This bug results in reporting of incorrect offsets
- * if the offset parameter to rbd_diff_iterate2 is not object aligned.
- * Work around this bug by rounding down the offset to object boundaries.
- * This is OK because we call rbd_diff_iterate2 with whole_object = true.
- * However, this workaround only works for non cloned images with default
- * striping.
- *
- * See: https://tracker.ceph.com/issues/53784
- */
-
- /* check if RBD image has non-default striping enabled */
- if (features & RBD_FEATURE_STRIPINGV2) {
- return status;
- }
-
-#pragma GCC diagnostic push
-#pragma GCC diagnostic ignored "-Wdeprecated-declarations"
- /*
- * check if RBD image is a clone (= has a parent).
- *
- * rbd_get_parent_info is deprecated from Nautilus onwards, but the
- * replacement rbd_get_parent is not present in Luminous and Mimic.
- */
- if (rbd_get_parent_info(s->image, NULL, 0, NULL, 0, NULL, 0) != -ENOENT) {
- return status;
- }
-#pragma GCC diagnostic pop
-
- head = req.offs & (s->object_size - 1);
- req.offs -= head;
- bytes += head;
-#endif
-
- r = rbd_diff_iterate2(s->image, NULL, req.offs, bytes, true, true,
+ r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
qemu_rbd_diff_iterate_cb, &req);
if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
return status;
@@ -1598,8 +1561,7 @@ static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
}
- assert(req.bytes > head);
- *pnum = req.bytes - head;
+ *pnum = req.bytes;
return status;
}

View File

@@ -0,0 +1,36 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Thu, 23 Jun 2022 14:00:07 +0200
Subject: [PATCH] Revert "block/rbd: fix handling of holes in
.bdrv_co_block_status"
This reverts commit 9e302f64bb407a9bb097b626da97228c2654cfee in
preparation to revert 0347a8fd4c3faaedf119be04c197804be40a384b.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/rbd.c | 10 +++++-----
1 file changed, 5 insertions(+), 5 deletions(-)
diff --git a/block/rbd.c b/block/rbd.c
index 6c9a8e0add..6f5fe90f3a 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -1474,11 +1474,11 @@ static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
RBDDiffIterateReq *req = opaque;
assert(req->offs + req->bytes <= offs);
-
- /* treat a hole like an unallocated area and bail out */
- if (!exists) {
- return 0;
- }
+ /*
+ * we do not diff against a snapshot so we should never receive a callback
+ * for a hole.
+ */
+ assert(exists);
if (!req->exists && offs > req->offs) {
/*

View File

@@ -0,0 +1,162 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Tue, 17 May 2022 09:46:02 +0200
Subject: [PATCH] Revert "block/rbd: implement bdrv_co_block_status"
During backup, bdrv_co_block_status is called for each block copy
chunk. When RBD is used, the current implementation with
rbd_diff_iterate2() using whole_object=true takes about linearly more
time, depending on the image size. Since there are linearly more
chunks, the slowdown is quadratic, becoming unacceptable for large
images (starting somewhere between 500-1000 GiB in my testing).
This reverts commit 0347a8fd4c3faaedf119be04c197804be40a384b as a
stop-gap measure, until it's clear how to make the implemenation
more efficient.
Upstream bug report:
https://gitlab.com/qemu-project/qemu/-/issues/1026
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/rbd.c | 112 ----------------------------------------------------
1 file changed, 112 deletions(-)
diff --git a/block/rbd.c b/block/rbd.c
index 6f5fe90f3a..24e820d056 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -108,12 +108,6 @@ typedef struct RBDTask {
int64_t ret;
} RBDTask;
-typedef struct RBDDiffIterateReq {
- uint64_t offs;
- uint64_t bytes;
- bool exists;
-} RBDDiffIterateReq;
-
static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
BlockdevOptionsRbd *opts, bool cache,
const char *keypairs, const char *secretid,
@@ -1460,111 +1454,6 @@ static ImageInfoSpecific *qemu_rbd_get_specific_info(BlockDriverState *bs,
return spec_info;
}
-/*
- * rbd_diff_iterate2 allows to interrupt the exection by returning a negative
- * value in the callback routine. Choose a value that does not conflict with
- * an existing exitcode and return it if we want to prematurely stop the
- * execution because we detected a change in the allocation status.
- */
-#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
-
-static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
- int exists, void *opaque)
-{
- RBDDiffIterateReq *req = opaque;
-
- assert(req->offs + req->bytes <= offs);
- /*
- * we do not diff against a snapshot so we should never receive a callback
- * for a hole.
- */
- assert(exists);
-
- if (!req->exists && offs > req->offs) {
- /*
- * we started in an unallocated area and hit the first allocated
- * block. req->bytes must be set to the length of the unallocated area
- * before the allocated area. stop further processing.
- */
- req->bytes = offs - req->offs;
- return QEMU_RBD_EXIT_DIFF_ITERATE2;
- }
-
- if (req->exists && offs > req->offs + req->bytes) {
- /*
- * we started in an allocated area and jumped over an unallocated area,
- * req->bytes contains the length of the allocated area before the
- * unallocated area. stop further processing.
- */
- return QEMU_RBD_EXIT_DIFF_ITERATE2;
- }
-
- req->bytes += len;
- req->exists = true;
-
- return 0;
-}
-
-static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum,
- int64_t *map,
- BlockDriverState **file)
-{
- BDRVRBDState *s = bs->opaque;
- int status, r;
- RBDDiffIterateReq req = { .offs = offset };
- uint64_t features, flags;
-
- assert(offset + bytes <= s->image_size);
-
- /* default to all sectors allocated */
- status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
- *map = offset;
- *file = bs;
- *pnum = bytes;
-
- /* check if RBD image supports fast-diff */
- r = rbd_get_features(s->image, &features);
- if (r < 0) {
- return status;
- }
- if (!(features & RBD_FEATURE_FAST_DIFF)) {
- return status;
- }
-
- /* check if RBD fast-diff result is valid */
- r = rbd_get_flags(s->image, &flags);
- if (r < 0) {
- return status;
- }
- if (flags & RBD_FLAG_FAST_DIFF_INVALID) {
- return status;
- }
-
- r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
- qemu_rbd_diff_iterate_cb, &req);
- if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
- return status;
- }
- assert(req.bytes <= bytes);
- if (!req.exists) {
- if (r == 0) {
- /*
- * rbd_diff_iterate2 does not invoke callbacks for unallocated
- * areas. This here catches the case where no callback was
- * invoked at all (req.bytes == 0).
- */
- assert(req.bytes == 0);
- req.bytes = bytes;
- }
- status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
- }
-
- *pnum = req.bytes;
- return status;
-}
-
static int64_t coroutine_fn qemu_rbd_co_getlength(BlockDriverState *bs)
{
BDRVRBDState *s = bs->opaque;
@@ -1801,7 +1690,6 @@ static BlockDriver bdrv_rbd = {
#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
.bdrv_co_pwrite_zeroes = qemu_rbd_co_pwrite_zeroes,
#endif
- .bdrv_co_block_status = qemu_rbd_co_block_status,
.bdrv_snapshot_create = qemu_rbd_snap_create,
.bdrv_snapshot_delete = qemu_rbd_snap_remove,

View File

@@ -1,597 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Tue, 26 Jan 2021 15:45:30 +0100
Subject: [PATCH] PVE: Use coroutine QMP for backup/cancel_backup
Finally turn backup QMP calls into coroutines, now that it's possible.
This has the benefit that calls are asynchronous to the main loop, i.e.
long running operations like connecting to a PBS server will no longer
hang the VM.
Additionally, it allows us to get rid of block_on_coroutine_fn, which
was always a hacky workaround.
While we're already spring cleaning, also remove the QmpBackupTask
struct, since we can now put the 'prepare' function directly into
qmp_backup and thus no longer need those giant walls of text.
(Note that for our patches to work with 5.2.0 this change is actually
required, otherwise monitor_get_fd() fails as we're not in a QMP
coroutine, but one we start ourselves - we could of course set the
monitor for that coroutine ourselves, but let's just fix it the right
way instead)
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 4 +-
hmp-commands.hx | 2 +
proxmox-backup-client.c | 31 -----
pve-backup.c | 232 ++++++++++-----------------------
qapi/block-core.json | 4 +-
5 files changed, 77 insertions(+), 196 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 46c63b1cf9..11c84d5508 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1013,7 +1013,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
g_free(global_snapshots);
}
-void hmp_backup_cancel(Monitor *mon, const QDict *qdict)
+void coroutine_fn hmp_backup_cancel(Monitor *mon, const QDict *qdict)
{
Error *error = NULL;
@@ -1022,7 +1022,7 @@ void hmp_backup_cancel(Monitor *mon, const QDict *qdict)
hmp_handle_error(mon, error);
}
-void hmp_backup(Monitor *mon, const QDict *qdict)
+void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
{
Error *error = NULL;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 0c6b944850..54de3f80e6 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -108,6 +108,7 @@ ERST
"\n\t\t\t Use -d to dump data into a directory instead"
"\n\t\t\t of using VMA format.",
.cmd = hmp_backup,
+ .coroutine = true,
},
SRST
@@ -121,6 +122,7 @@ ERST
.params = "",
.help = "cancel the current VM backup",
.cmd = hmp_backup_cancel,
+ .coroutine = true,
},
SRST
diff --git a/proxmox-backup-client.c b/proxmox-backup-client.c
index 4ce7bc0b5e..0923037dec 100644
--- a/proxmox-backup-client.c
+++ b/proxmox-backup-client.c
@@ -5,37 +5,6 @@
/* Proxmox Backup Server client bindings using coroutines */
-typedef struct BlockOnCoroutineWrapper {
- AioContext *ctx;
- CoroutineEntry *entry;
- void *entry_arg;
- bool finished;
-} BlockOnCoroutineWrapper;
-
-static void coroutine_fn block_on_coroutine_wrapper(void *opaque)
-{
- BlockOnCoroutineWrapper *wrapper = opaque;
- wrapper->entry(wrapper->entry_arg);
- wrapper->finished = true;
- aio_wait_kick();
-}
-
-void block_on_coroutine_fn(CoroutineEntry *entry, void *entry_arg)
-{
- assert(!qemu_in_coroutine());
-
- AioContext *ctx = qemu_get_current_aio_context();
- BlockOnCoroutineWrapper wrapper = {
- .finished = false,
- .entry = entry,
- .entry_arg = entry_arg,
- .ctx = ctx,
- };
- Coroutine *wrapper_co = qemu_coroutine_create(block_on_coroutine_wrapper, &wrapper);
- aio_co_enter(ctx, wrapper_co);
- AIO_WAIT_WHILE(ctx, !wrapper.finished);
-}
-
// This is called from another thread, so we use aio_co_schedule()
static void proxmox_backup_schedule_wake(void *data) {
CoCtxData *waker = (CoCtxData *)data;
diff --git a/pve-backup.c b/pve-backup.c
index bd2647e5f3..dec9c0d188 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -357,7 +357,7 @@ static void job_cancel_bh(void *opaque) {
aio_co_enter(data->ctx, data->co);
}
-static void coroutine_fn pvebackup_co_cancel(void *opaque)
+void coroutine_fn qmp_backup_cancel(Error **errp)
{
Error *cancel_err = NULL;
error_setg(&cancel_err, "backup canceled");
@@ -394,11 +394,6 @@ static void coroutine_fn pvebackup_co_cancel(void *opaque)
qemu_co_mutex_unlock(&backup_state.backup_mutex);
}
-void qmp_backup_cancel(Error **errp)
-{
- block_on_coroutine_fn(pvebackup_co_cancel, NULL);
-}
-
// assumes the caller holds backup_mutex
static int coroutine_fn pvebackup_co_add_config(
const char *file,
@@ -531,50 +526,27 @@ static void create_backup_jobs_bh(void *opaque) {
aio_co_enter(data->ctx, data->co);
}
-typedef struct QmpBackupTask {
- const char *backup_file;
- bool has_password;
- const char *password;
- bool has_keyfile;
- const char *keyfile;
- bool has_key_password;
- const char *key_password;
- bool has_backup_id;
- const char *backup_id;
- bool has_backup_time;
- const char *fingerprint;
- bool has_fingerprint;
- int64_t backup_time;
- bool has_use_dirty_bitmap;
- bool use_dirty_bitmap;
- bool has_format;
- BackupFormat format;
- bool has_config_file;
- const char *config_file;
- bool has_firewall_file;
- const char *firewall_file;
- bool has_devlist;
- const char *devlist;
- bool has_compress;
- bool compress;
- bool has_encrypt;
- bool encrypt;
- bool has_speed;
- int64_t speed;
- Error **errp;
- UuidInfo *result;
-} QmpBackupTask;
-
-static void coroutine_fn pvebackup_co_prepare(void *opaque)
+UuidInfo coroutine_fn *qmp_backup(
+ const char *backup_file,
+ bool has_password, const char *password,
+ bool has_keyfile, const char *keyfile,
+ bool has_key_password, const char *key_password,
+ bool has_fingerprint, const char *fingerprint,
+ bool has_backup_id, const char *backup_id,
+ bool has_backup_time, int64_t backup_time,
+ bool has_use_dirty_bitmap, bool use_dirty_bitmap,
+ bool has_compress, bool compress,
+ bool has_encrypt, bool encrypt,
+ bool has_format, BackupFormat format,
+ bool has_config_file, const char *config_file,
+ bool has_firewall_file, const char *firewall_file,
+ bool has_devlist, const char *devlist,
+ bool has_speed, int64_t speed, Error **errp)
{
assert(qemu_in_coroutine());
qemu_co_mutex_lock(&backup_state.backup_mutex);
- QmpBackupTask *task = opaque;
-
- task->result = NULL; // just to be sure
-
BlockBackend *blk;
BlockDriverState *bs = NULL;
const char *backup_dir = NULL;
@@ -591,17 +563,17 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
const char *firewall_name = "qemu-server.fw";
if (backup_state.di_list) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
"previous backup not finished");
qemu_co_mutex_unlock(&backup_state.backup_mutex);
- return;
+ return NULL;
}
/* Todo: try to auto-detect format based on file name */
- BackupFormat format = task->has_format ? task->format : BACKUP_FORMAT_VMA;
+ format = has_format ? format : BACKUP_FORMAT_VMA;
- if (task->has_devlist) {
- devs = g_strsplit_set(task->devlist, ",;:", -1);
+ if (has_devlist) {
+ devs = g_strsplit_set(devlist, ",;:", -1);
gchar **d = devs;
while (d && *d) {
@@ -609,14 +581,14 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (blk) {
bs = blk_bs(blk);
if (!bdrv_is_inserted(bs)) {
- error_setg(task->errp, QERR_DEVICE_HAS_NO_MEDIUM, *d);
+ error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, *d);
goto err;
}
PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
di->bs = bs;
di_list = g_list_append(di_list, di);
} else {
- error_set(task->errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+ error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
"Device '%s' not found", *d);
goto err;
}
@@ -639,7 +611,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
if (!di_list) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "empty device list");
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "empty device list");
goto err;
}
@@ -649,13 +621,13 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
while (l) {
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
- if (bdrv_op_is_blocked(di->bs, BLOCK_OP_TYPE_BACKUP_SOURCE, task->errp)) {
+ if (bdrv_op_is_blocked(di->bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
goto err;
}
ssize_t size = bdrv_getlength(di->bs);
if (size < 0) {
- error_setg_errno(task->errp, -di->size, "bdrv_getlength failed");
+ error_setg_errno(errp, -di->size, "bdrv_getlength failed");
goto err;
}
di->size = size;
@@ -682,47 +654,44 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
if (format == BACKUP_FORMAT_PBS) {
- if (!task->has_password) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'password'");
+ if (!has_password) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'password'");
goto err_mutex;
}
- if (!task->has_backup_id) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-id'");
+ if (!has_backup_id) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-id'");
goto err_mutex;
}
- if (!task->has_backup_time) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-time'");
+ if (!has_backup_time) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-time'");
goto err_mutex;
}
int dump_cb_block_size = PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE; // Hardcoded (4M)
firewall_name = "fw.conf";
- bool use_dirty_bitmap = task->has_use_dirty_bitmap && task->use_dirty_bitmap;
-
-
char *pbs_err = NULL;
pbs = proxmox_backup_new(
- task->backup_file,
- task->backup_id,
- task->backup_time,
+ backup_file,
+ backup_id,
+ backup_time,
dump_cb_block_size,
- task->has_password ? task->password : NULL,
- task->has_keyfile ? task->keyfile : NULL,
- task->has_key_password ? task->key_password : NULL,
- task->has_compress ? task->compress : true,
- task->has_encrypt ? task->encrypt : task->has_keyfile,
- task->has_fingerprint ? task->fingerprint : NULL,
+ has_password ? password : NULL,
+ has_keyfile ? keyfile : NULL,
+ has_key_password ? key_password : NULL,
+ has_compress ? compress : true,
+ has_encrypt ? encrypt : has_keyfile,
+ has_fingerprint ? fingerprint : NULL,
&pbs_err);
if (!pbs) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
"proxmox_backup_new failed: %s", pbs_err);
proxmox_backup_free_error(pbs_err);
goto err_mutex;
}
- int connect_result = proxmox_backup_co_connect(pbs, task->errp);
+ int connect_result = proxmox_backup_co_connect(pbs, errp);
if (connect_result < 0)
goto err_mutex;
@@ -741,9 +710,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
bool expect_only_dirty = false;
- if (use_dirty_bitmap) {
+ if (has_use_dirty_bitmap && use_dirty_bitmap) {
if (bitmap == NULL) {
- bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, task->errp);
+ bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, errp);
if (!bitmap) {
goto err_mutex;
}
@@ -773,12 +742,12 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
}
- int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, task->errp);
+ int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, errp);
if (dev_id < 0) {
goto err_mutex;
}
- if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, task->errp))) {
+ if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, errp))) {
goto err_mutex;
}
@@ -792,10 +761,10 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
backup_state.stat.bitmap_list = g_list_append(backup_state.stat.bitmap_list, info);
}
} else if (format == BACKUP_FORMAT_VMA) {
- vmaw = vma_writer_create(task->backup_file, uuid, &local_err);
+ vmaw = vma_writer_create(backup_file, uuid, &local_err);
if (!vmaw) {
if (local_err) {
- error_propagate(task->errp, local_err);
+ error_propagate(errp, local_err);
}
goto err_mutex;
}
@@ -806,25 +775,25 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
- if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_vma_cb, di, task->errp))) {
+ if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_vma_cb, di, errp))) {
goto err_mutex;
}
const char *devname = bdrv_get_device_name(di->bs);
di->dev_id = vma_writer_register_stream(vmaw, devname, di->size);
if (di->dev_id <= 0) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
"register_stream failed");
goto err_mutex;
}
}
} else if (format == BACKUP_FORMAT_DIR) {
- if (mkdir(task->backup_file, 0640) != 0) {
- error_setg_errno(task->errp, errno, "can't create directory '%s'\n",
- task->backup_file);
+ if (mkdir(backup_file, 0640) != 0) {
+ error_setg_errno(errp, errno, "can't create directory '%s'\n",
+ backup_file);
goto err_mutex;
}
- backup_dir = task->backup_file;
+ backup_dir = backup_file;
l = di_list;
while (l) {
@@ -838,34 +807,34 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
bdrv_img_create(di->targetfile, "raw", NULL, NULL, NULL,
di->size, flags, false, &local_err);
if (local_err) {
- error_propagate(task->errp, local_err);
+ error_propagate(errp, local_err);
goto err_mutex;
}
di->target = bdrv_open(di->targetfile, NULL, NULL, flags, &local_err);
if (!di->target) {
- error_propagate(task->errp, local_err);
+ error_propagate(errp, local_err);
goto err_mutex;
}
}
} else {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "unknown backup format");
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "unknown backup format");
goto err_mutex;
}
/* add configuration file to archive */
- if (task->has_config_file) {
- if (pvebackup_co_add_config(task->config_file, config_name, format, backup_dir,
- vmaw, pbs, task->errp) != 0) {
+ if (has_config_file) {
+ if (pvebackup_co_add_config(config_file, config_name, format, backup_dir,
+ vmaw, pbs, errp) != 0) {
goto err_mutex;
}
}
/* add firewall file to archive */
- if (task->has_firewall_file) {
- if (pvebackup_co_add_config(task->firewall_file, firewall_name, format, backup_dir,
- vmaw, pbs, task->errp) != 0) {
+ if (has_firewall_file) {
+ if (pvebackup_co_add_config(firewall_file, firewall_name, format, backup_dir,
+ vmaw, pbs, errp) != 0) {
goto err_mutex;
}
}
@@ -883,7 +852,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (backup_state.stat.backup_file) {
g_free(backup_state.stat.backup_file);
}
- backup_state.stat.backup_file = g_strdup(task->backup_file);
+ backup_state.stat.backup_file = g_strdup(backup_file);
uuid_copy(backup_state.stat.uuid, uuid);
uuid_unparse_lower(uuid, backup_state.stat.uuid_str);
@@ -898,7 +867,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
qemu_mutex_unlock(&backup_state.stat.lock);
- backup_state.speed = (task->has_speed && task->speed > 0) ? task->speed : 0;
+ backup_state.speed = (has_speed && speed > 0) ? speed : 0;
backup_state.vmaw = vmaw;
backup_state.pbs = pbs;
@@ -908,8 +877,6 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
uuid_info = g_malloc0(sizeof(*uuid_info));
uuid_info->UUID = uuid_str;
- task->result = uuid_info;
-
/* Run create_backup_jobs_bh outside of coroutine (in BH) but keep
* backup_mutex locked. This is fine, a CoMutex can be held across yield
* points, and we'll release it as soon as the BH reschedules us.
@@ -923,7 +890,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
qemu_coroutine_yield();
if (local_err) {
- error_propagate(task->errp, local_err);
+ error_propagate(errp, local_err);
goto err;
}
@@ -936,7 +903,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
/* start the first job in the transaction */
job_txn_start_seq(backup_state.txn);
- return;
+ return uuid_info;
err_mutex:
qemu_mutex_unlock(&backup_state.stat.lock);
@@ -967,7 +934,7 @@ err:
if (vmaw) {
Error *err = NULL;
vma_writer_close(vmaw, &err);
- unlink(task->backup_file);
+ unlink(backup_file);
}
if (pbs) {
@@ -978,65 +945,8 @@ err:
rmdir(backup_dir);
}
- task->result = NULL;
-
qemu_co_mutex_unlock(&backup_state.backup_mutex);
- return;
-}
-
-UuidInfo *qmp_backup(
- const char *backup_file,
- bool has_password, const char *password,
- bool has_keyfile, const char *keyfile,
- bool has_key_password, const char *key_password,
- bool has_fingerprint, const char *fingerprint,
- bool has_backup_id, const char *backup_id,
- bool has_backup_time, int64_t backup_time,
- bool has_use_dirty_bitmap, bool use_dirty_bitmap,
- bool has_compress, bool compress,
- bool has_encrypt, bool encrypt,
- bool has_format, BackupFormat format,
- bool has_config_file, const char *config_file,
- bool has_firewall_file, const char *firewall_file,
- bool has_devlist, const char *devlist,
- bool has_speed, int64_t speed, Error **errp)
-{
- QmpBackupTask task = {
- .backup_file = backup_file,
- .has_password = has_password,
- .password = password,
- .has_keyfile = has_keyfile,
- .keyfile = keyfile,
- .has_key_password = has_key_password,
- .key_password = key_password,
- .has_fingerprint = has_fingerprint,
- .fingerprint = fingerprint,
- .has_backup_id = has_backup_id,
- .backup_id = backup_id,
- .has_backup_time = has_backup_time,
- .backup_time = backup_time,
- .has_use_dirty_bitmap = has_use_dirty_bitmap,
- .use_dirty_bitmap = use_dirty_bitmap,
- .has_compress = has_compress,
- .compress = compress,
- .has_encrypt = has_encrypt,
- .encrypt = encrypt,
- .has_format = has_format,
- .format = format,
- .has_config_file = has_config_file,
- .config_file = config_file,
- .has_firewall_file = has_firewall_file,
- .firewall_file = firewall_file,
- .has_devlist = has_devlist,
- .devlist = devlist,
- .has_speed = has_speed,
- .speed = speed,
- .errp = errp,
- };
-
- block_on_coroutine_fn(pvebackup_co_prepare, &task);
-
- return task.result;
+ return NULL;
}
BackupStatus *qmp_query_backup(Error **errp)
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 30eb1262ff..6ff5367383 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -847,7 +847,7 @@
'*config-file': 'str',
'*firewall-file': 'str',
'*devlist': 'str', '*speed': 'int' },
- 'returns': 'UuidInfo' }
+ 'returns': 'UuidInfo', 'coroutine': true }
##
# @query-backup:
@@ -869,7 +869,7 @@
# Notes: This command succeeds even if there is no backup process running.
#
##
-{ 'command': 'backup-cancel' }
+{ 'command': 'backup-cancel', 'coroutine': true }
##
# @ProxmoxSupportStatus:

View File

@@ -0,0 +1,43 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Tue, 26 Mar 2024 14:57:51 +0100
Subject: [PATCH] alloc-track: error out when auto-remove is not set
Since replacing the node now happens in the stream job, where the
option cannot be read from (it's internal to the driver), it will
always be treated as on.
qemu-server will always set it, make sure to have other users notice
the change (should they even exist). The option can be fully dropped
in the future while adding a version guard in qemu-server.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/alloc-track.c | 7 +++++--
1 file changed, 5 insertions(+), 2 deletions(-)
diff --git a/block/alloc-track.c b/block/alloc-track.c
index b4a9851144..fc7d58a5d0 100644
--- a/block/alloc-track.c
+++ b/block/alloc-track.c
@@ -34,7 +34,6 @@ typedef struct {
BdrvDirtyBitmap *bitmap;
uint64_t granularity;
DropState drop_state;
- bool auto_remove;
} BDRVAllocTrackState;
static QemuOptsList runtime_opts = {
@@ -86,7 +85,11 @@ static int track_open(BlockDriverState *bs, QDict *options, int flags,
goto fail;
}
- s->auto_remove = qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false);
+ if (!qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false)) {
+ error_setg(errp, "alloc-track: requires auto-remove option to be set to on");
+ ret = -EINVAL;
+ goto fail;
+ }
/* open the target (write) node, backing will be attached by block layer */
file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,

View File

@@ -1,97 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 10 Feb 2021 11:07:06 +0100
Subject: [PATCH] PBS: add master key support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
this requires a new enough libproxmox-backup-qemu0, and allows querying
from the PVE side to avoid QMP calls with unsupported parameters.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 1 +
pve-backup.c | 3 +++
qapi/block-core.json | 7 +++++++
3 files changed, 11 insertions(+)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 11c84d5508..0932deb28c 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1036,6 +1036,7 @@ void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
false, NULL, // PBS password
false, NULL, // PBS keyfile
false, NULL, // PBS key_password
+ false, NULL, // PBS master_keyfile
false, NULL, // PBS fingerprint
false, NULL, // PBS backup-id
false, 0, // PBS backup-time
diff --git a/pve-backup.c b/pve-backup.c
index dec9c0d188..076146cc1e 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -531,6 +531,7 @@ UuidInfo coroutine_fn *qmp_backup(
bool has_password, const char *password,
bool has_keyfile, const char *keyfile,
bool has_key_password, const char *key_password,
+ bool has_master_keyfile, const char *master_keyfile,
bool has_fingerprint, const char *fingerprint,
bool has_backup_id, const char *backup_id,
bool has_backup_time, int64_t backup_time,
@@ -679,6 +680,7 @@ UuidInfo coroutine_fn *qmp_backup(
has_password ? password : NULL,
has_keyfile ? keyfile : NULL,
has_key_password ? key_password : NULL,
+ has_master_keyfile ? master_keyfile : NULL,
has_compress ? compress : true,
has_encrypt ? encrypt : has_keyfile,
has_fingerprint ? fingerprint : NULL,
@@ -1042,5 +1044,6 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
ret->pbs_dirty_bitmap_savevm = true;
ret->pbs_dirty_bitmap_migration = true;
ret->query_bitmap_info = true;
+ ret->pbs_masterkey = true;
return ret;
}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6ff5367383..bef9b65fec 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -818,6 +818,8 @@
#
# @key-password: password for keyfile (optional for format 'pbs')
#
+# @master-keyfile: PEM-formatted master public keyfile (optional for format 'pbs')
+#
# @fingerprint: server cert fingerprint (optional for format 'pbs')
#
# @backup-id: backup ID (required for format 'pbs')
@@ -837,6 +839,7 @@
'*password': 'str',
'*keyfile': 'str',
'*key-password': 'str',
+ '*master-keyfile': 'str',
'*fingerprint': 'str',
'*backup-id': 'str',
'*backup-time': 'int',
@@ -889,6 +892,9 @@
# migration cap if this is false/unset may lead
# to crashes on migration!
#
+# @pbs-masterkey: True if the QMP backup call supports the 'master_keyfile'
+# parameter.
+#
# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
#
##
@@ -897,6 +903,7 @@
'query-bitmap-info': 'bool',
'pbs-dirty-bitmap-savevm': 'bool',
'pbs-dirty-bitmap-migration': 'bool',
+ 'pbs-masterkey': 'bool',
'pbs-library-version': 'str' } }
##

View File

@@ -0,0 +1,84 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Wed, 27 Mar 2024 11:15:39 +0100
Subject: [PATCH] alloc-track: avoid seemingly superfluous child permission
update
Doesn't seem necessary nowadays (maybe after commit "alloc-track: fix
deadlock during drop" where the dropping is not rescheduled and delayed
anymore or some upstream change). Should there really be some issue,
instead of having a drop state, this could also be just based off the
fact whether there is still a backing child.
Dumping the cumulative (shared) permissions for the BDS with a debug
print yields the same values after this patch and with QEMU 8.1,
namely 3 and 5.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
block/alloc-track.c | 26 --------------------------
1 file changed, 26 deletions(-)
diff --git a/block/alloc-track.c b/block/alloc-track.c
index fc7d58a5d0..b56425b7f0 100644
--- a/block/alloc-track.c
+++ b/block/alloc-track.c
@@ -25,15 +25,9 @@
#define TRACK_OPT_AUTO_REMOVE "auto-remove"
-typedef enum DropState {
- DropNone,
- DropInProgress,
-} DropState;
-
typedef struct {
BdrvDirtyBitmap *bitmap;
uint64_t granularity;
- DropState drop_state;
} BDRVAllocTrackState;
static QemuOptsList runtime_opts = {
@@ -137,8 +131,6 @@ static int track_open(BlockDriverState *bs, QDict *options, int flags,
goto fail;
}
- s->drop_state = DropNone;
-
fail:
if (ret < 0) {
bdrv_graph_wrlock();
@@ -289,18 +281,8 @@ track_child_perm(BlockDriverState *bs, BdrvChild *c, BdrvChildRole role,
BlockReopenQueue *reopen_queue, uint64_t perm, uint64_t shared,
uint64_t *nperm, uint64_t *nshared)
{
- BDRVAllocTrackState *s = bs->opaque;
-
*nshared = BLK_PERM_ALL;
- /* in case we're currently dropping ourselves, claim to not use any
- * permissions at all - which is fine, since from this point on we will
- * never issue a read or write anymore */
- if (s->drop_state == DropInProgress) {
- *nperm = 0;
- return;
- }
-
if (role & BDRV_CHILD_DATA) {
*nperm = perm & DEFAULT_PERM_PASSTHROUGH;
} else {
@@ -326,14 +308,6 @@ track_co_change_backing_file(BlockDriverState *bs, const char *backing_file,
* kinda fits better, but in the long-term, a special parameter would be
* nice (or done via qemu-server via upcoming blockdev-replace QMP command).
*/
- if (backing_file == NULL) {
- BDRVAllocTrackState *s = bs->opaque;
- bdrv_drained_begin(bs);
- s->drop_state = DropInProgress;
- bdrv_child_refresh_perms(bs, bs->file, &error_abort);
- bdrv_drained_end(bs);
- }
-
return 0;
}

View File

@@ -0,0 +1,337 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 11 Apr 2024 11:29:28 +0200
Subject: [PATCH] PVE backup: add fleecing option
When a fleecing option is given, it is expected that each device has
a corresponding "-fleecing" block device already attached, except for
EFI disk and TPM state, where fleecing is never used.
The following graph was adapted from [0] which also contains more
details about fleecing.
[guest]
|
| root
v file
[copy-before-write]<------[snapshot-access]
| |
| file | target
v v
[source] [fleecing]
For fleecing, a copy-before-write filter is inserted on top of the
source node, as well as a snapshot-access node pointing to the filter
node which allows to read the consistent state of the image at the
time it was inserted. New guest writes are passed through the
copy-before-write filter which will first copy over old data to the
fleecing image in case that old data is still needed by the
snapshot-access node.
The backup process will sequentially read from the snapshot access,
which has a bitmap and knows whether to read from the original image
or the fleecing image to get the "snapshot" state, i.e. data from the
source image at the time when the copy-before-write filter was
inserted. After reading, the copied sections are discarded from the
fleecing image to reduce space usage.
All of this can be restricted by an initial dirty bitmap to parts of
the source image that are required for an incremental backup.
For discard to work, it is necessary that the fleecing image does not
have a larger cluster size than the backup job granularity. Since
querying that size does not always work, e.g. for RBD with krbd, the
cluster size will not be reported, a minimum of 4 MiB is used. A job
with PBS target already has at least this granularity, so it's just
relevant for other targets. I.e. edge cases where this minimum is not
enough should be very rare in practice. If ever necessary in the
future, can still add a passed-in value for the backup QMP command to
override.
Additionally, the cbw-timeout and on-cbw-error=break-snapshot options
are set when installing the copy-before-write filter and
snapshot-access. When an error or timeout occurs, the problematic (and
each further) snapshot operation will fail and thus cancel the backup
instead of breaking the guest write.
Note that job_id cannot be inferred from the snapshot-access bs because
it has no parent, so just pass the one from the original bs.
[0]: https://www.mail-archive.com/qemu-devel@nongnu.org/msg876056.html
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 1 +
pve-backup.c | 135 ++++++++++++++++++++++++++++++++-
qapi/block-core.json | 10 ++-
3 files changed, 142 insertions(+), 4 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 439a7a14c8..d0e7771dcc 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1044,6 +1044,7 @@ void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
NULL, NULL,
devlist, qdict_haskey(qdict, "speed"), speed,
false, 0, // BackupPerf max-workers
+ false, false, // fleecing
&error);
hmp_handle_error(mon, error);
diff --git a/pve-backup.c b/pve-backup.c
index faa6a9b93c..4b0820c8a7 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -7,9 +7,11 @@
#include "sysemu/blockdev.h"
#include "block/block_int-global-state.h"
#include "block/blockjob.h"
+#include "block/copy-before-write.h"
#include "block/dirty-bitmap.h"
#include "block/graph-lock.h"
#include "qapi/qapi-commands-block.h"
+#include "qapi/qmp/qdict.h"
#include "qapi/qmp/qerror.h"
#include "qemu/cutils.h"
@@ -80,8 +82,15 @@ static void pvebackup_init(void)
// initialize PVEBackupState at startup
opts_init(pvebackup_init);
+typedef struct PVEBackupFleecingInfo {
+ BlockDriverState *bs;
+ BlockDriverState *cbw;
+ BlockDriverState *snapshot_access;
+} PVEBackupFleecingInfo;
+
typedef struct PVEBackupDevInfo {
BlockDriverState *bs;
+ PVEBackupFleecingInfo fleecing;
size_t size;
uint64_t block_size;
uint8_t dev_id;
@@ -353,6 +362,22 @@ static void pvebackup_complete_cb(void *opaque, int ret)
PVEBackupDevInfo *di = opaque;
di->completed_ret = ret;
+ /*
+ * Handle block-graph specific cleanup (for fleecing) outside of the coroutine, because the work
+ * won't be done as a coroutine anyways:
+ * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via bdrv_co_unref() would
+ * just spawn a BH calling bdrv_unref().
+ * - For cbw, draining would need to spawn a BH.
+ */
+ if (di->fleecing.snapshot_access) {
+ bdrv_unref(di->fleecing.snapshot_access);
+ di->fleecing.snapshot_access = NULL;
+ }
+ if (di->fleecing.cbw) {
+ bdrv_cbw_drop(di->fleecing.cbw);
+ di->fleecing.cbw = NULL;
+ }
+
/*
* Needs to happen outside of coroutine, because it takes the graph write lock.
*/
@@ -519,9 +544,77 @@ static void create_backup_jobs_bh(void *opaque) {
}
bdrv_drained_begin(di->bs);
+ BackupPerf perf = (BackupPerf){ .max_workers = backup_state.perf.max_workers };
+
+ BlockDriverState *source_bs = di->bs;
+ bool discard_source = false;
+ bdrv_graph_co_rdlock();
+ const char *job_id = bdrv_get_device_name(di->bs);
+ bdrv_graph_co_rdunlock();
+ if (di->fleecing.bs) {
+ QDict *cbw_opts = qdict_new();
+ qdict_put_str(cbw_opts, "driver", "copy-before-write");
+ qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs));
+ qdict_put_str(cbw_opts, "target", bdrv_get_node_name(di->fleecing.bs));
+
+ if (di->bitmap) {
+ /*
+ * Only guest writes to parts relevant for the backup need to be intercepted with
+ * old data being copied to the fleecing image.
+ */
+ qdict_put_str(cbw_opts, "bitmap.node", bdrv_get_node_name(di->bs));
+ qdict_put_str(cbw_opts, "bitmap.name", bdrv_dirty_bitmap_name(di->bitmap));
+ }
+ /*
+ * Fleecing storage is supposed to be fast and it's better to break backup than guest
+ * writes. Certain guest drivers like VirtIO-win have 60 seconds timeout by default, so
+ * abort a bit before that.
+ */
+ qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot");
+ qdict_put_int(cbw_opts, "cbw-timeout", 45);
+
+ di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, &local_err);
+
+ if (!di->fleecing.cbw) {
+ error_setg(errp, "appending cbw node for fleecing failed: %s",
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ break;
+ }
+
+ QDict *snapshot_access_opts = qdict_new();
+ qdict_put_str(snapshot_access_opts, "driver", "snapshot-access");
+ qdict_put_str(snapshot_access_opts, "file", bdrv_get_node_name(di->fleecing.cbw));
+
+ di->fleecing.snapshot_access =
+ bdrv_open(NULL, NULL, snapshot_access_opts, BDRV_O_RDWR | BDRV_O_UNMAP, &local_err);
+ if (!di->fleecing.snapshot_access) {
+ error_setg(errp, "setting up snapshot access for fleecing failed: %s",
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ break;
+ }
+ source_bs = di->fleecing.snapshot_access;
+ discard_source = true;
+
+ /*
+ * bdrv_get_info() just retuns 0 (= doesn't matter) for RBD when using krbd. But discard
+ * on the fleecing image won't work if the backup job's granularity is less than the RBD
+ * object size (default 4 MiB), so it does matter. Always use at least 4 MiB. With a PBS
+ * target, the backup job granularity would already be at least this much.
+ */
+ perf.min_cluster_size = 4 * 1024 * 1024;
+ /*
+ * For discard to work, cluster size for the backup job must be at least the same as for
+ * the fleecing image.
+ */
+ BlockDriverInfo bdi;
+ if (bdrv_get_info(di->fleecing.bs, &bdi) >= 0) {
+ perf.min_cluster_size = MAX(perf.min_cluster_size, bdi.cluster_size);
+ }
+ }
+
BlockJob *job = backup_job_create(
- NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
- bitmap_mode, false, NULL, &backup_state.perf, BLOCKDEV_ON_ERROR_REPORT,
+ job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
+ bitmap_mode, false, discard_source, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT,
BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
&local_err);
@@ -577,6 +670,14 @@ static void create_backup_jobs_bh(void *opaque) {
aio_co_enter(data->ctx, data->co);
}
+/*
+ * EFI disk and TPM state are small and it's just not worth setting up fleecing for them.
+ */
+static bool device_uses_fleecing(const char *device_id)
+{
+ return strncmp(device_id, "drive-efidisk", 13) && strncmp(device_id, "drive-tpmstate", 14);
+}
+
/*
* Returns a list of device infos, which needs to be freed by the caller. In
* case of an error, errp will be set, but the returned value might still be a
@@ -584,6 +685,7 @@ static void create_backup_jobs_bh(void *opaque) {
*/
static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
const char *devlist,
+ bool fleecing,
Error **errp)
{
gchar **devs = NULL;
@@ -607,6 +709,31 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
}
PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
di->bs = bs;
+
+ if (fleecing && device_uses_fleecing(*d)) {
+ g_autofree gchar *fleecing_devid = g_strconcat(*d, "-fleecing", NULL);
+ BlockBackend *fleecing_blk = blk_by_name(fleecing_devid);
+ if (!fleecing_blk) {
+ error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+ "Device '%s' not found", fleecing_devid);
+ goto err;
+ }
+ BlockDriverState *fleecing_bs = blk_bs(fleecing_blk);
+ if (!bdrv_co_is_inserted(fleecing_bs)) {
+ error_setg(errp, "Device '%s' has no medium", fleecing_devid);
+ goto err;
+ }
+ /*
+ * Fleecing image needs to be the same size to act as a cbw target.
+ */
+ if (bs->total_sectors != fleecing_bs->total_sectors) {
+ error_setg(errp, "Size mismatch for '%s' - sector count %ld != %ld",
+ fleecing_devid, fleecing_bs->total_sectors, bs->total_sectors);
+ goto err;
+ }
+ di->fleecing.bs = fleecing_bs;
+ }
+
di_list = g_list_append(di_list, di);
d++;
}
@@ -656,6 +783,7 @@ UuidInfo coroutine_fn *qmp_backup(
const char *devlist,
bool has_speed, int64_t speed,
bool has_max_workers, int64_t max_workers,
+ bool has_fleecing, bool fleecing,
Error **errp)
{
assert(qemu_in_coroutine());
@@ -684,7 +812,7 @@ UuidInfo coroutine_fn *qmp_backup(
format = has_format ? format : BACKUP_FORMAT_VMA;
bdrv_graph_co_rdlock();
- di_list = get_device_info(devlist, &local_err);
+ di_list = get_device_info(devlist, has_fleecing && fleecing, &local_err);
bdrv_graph_co_rdunlock();
if (local_err) {
error_propagate(errp, local_err);
@@ -1087,5 +1215,6 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
ret->query_bitmap_info = true;
ret->pbs_masterkey = true;
ret->backup_max_workers = true;
+ ret->backup_fleecing = true;
return ret;
}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 9795247c1f..c581f1f238 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -948,6 +948,10 @@
#
# @max-workers: see @BackupPerf for details. Default 16.
#
+# @fleecing: perform a backup with fleecing. For each device in @devlist, a
+# corresponing '-fleecing' device with the same size already needs to
+# be present.
+#
# Returns: the uuid of the backup job
#
##
@@ -968,7 +972,8 @@
'*firewall-file': 'str',
'*devlist': 'str',
'*speed': 'int',
- '*max-workers': 'int' },
+ '*max-workers': 'int',
+ '*fleecing': 'bool' },
'returns': 'UuidInfo', 'coroutine': true }
##
@@ -1014,6 +1019,8 @@
#
# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
#
+# @backup-fleecing: Whether backup fleecing is supported or not.
+#
# @backup-max-workers: Whether the 'max-workers' @BackupPerf setting is
# supported or not.
#
@@ -1025,6 +1032,7 @@
'pbs-dirty-bitmap-migration': 'bool',
'pbs-masterkey': 'bool',
'pbs-library-version': 'str',
+ 'backup-fleecing': 'bool',
'backup-max-workers': 'bool' } }
##

View File

@@ -1,52 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 9 Dec 2020 11:46:57 +0100
Subject: [PATCH] PVE: block/pbs: fast-path reads without allocation if
possible
...and switch over to g_malloc/g_free while at it to align with other
QEMU code.
Tracing shows the fast-path is taken almost all the time, though not
100% so the slow one is still necessary.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
block/pbs.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/block/pbs.c b/block/pbs.c
index 1481a2bfd1..fbf0d8d845 100644
--- a/block/pbs.c
+++ b/block/pbs.c
@@ -200,7 +200,16 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
BDRVPBSState *s = bs->opaque;
int ret;
char *pbs_error = NULL;
- uint8_t *buf = malloc(bytes);
+ uint8_t *buf;
+ bool inline_buf = true;
+
+ /* for single-buffer IO vectors we can fast-path the write directly to it */
+ if (qiov->niov == 1 && qiov->iov->iov_len >= bytes) {
+ buf = qiov->iov->iov_base;
+ } else {
+ inline_buf = false;
+ buf = g_malloc(bytes);
+ }
ReadCallbackData rcb = {
.co = qemu_coroutine_self(),
@@ -218,8 +227,10 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
return -EIO;
}
- qemu_iovec_from_buf(qiov, 0, buf, bytes);
- free(buf);
+ if (!inline_buf) {
+ qemu_iovec_from_buf(qiov, 0, buf, bytes);
+ g_free(buf);
+ }
return ret;
}

View File

@@ -0,0 +1,117 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Mon, 29 Apr 2024 14:43:58 +0200
Subject: [PATCH] PVE backup: improve error when copy-before-write fails for
fleecing
With fleecing, failure for copy-before-write does not fail the guest
write, but only sets the snapshot error that is associated to the
copy-before-write filter, making further requests to the snapshot
access fail with EACCES, which then also fails the job. But that error
code is not the root cause of why the backup failed, so bubble up the
original snapshot error instead.
Reported-by: Friedrich Weber <f.weber@proxmox.com>
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
Tested-by: Friedrich Weber <f.weber@proxmox.com>
---
block/copy-before-write.c | 18 ++++++++++++------
block/copy-before-write.h | 1 +
pve-backup.c | 9 +++++++++
3 files changed, 22 insertions(+), 6 deletions(-)
diff --git a/block/copy-before-write.c b/block/copy-before-write.c
index 81afeff1c7..fdf9cdc0cd 100644
--- a/block/copy-before-write.c
+++ b/block/copy-before-write.c
@@ -27,6 +27,7 @@
#include "qapi/qmp/qjson.h"
#include "sysemu/block-backend.h"
+#include "qemu/atomic.h"
#include "qemu/cutils.h"
#include "qapi/error.h"
#include "block/block_int.h"
@@ -75,7 +76,8 @@ typedef struct BDRVCopyBeforeWriteState {
* @snapshot_error is normally zero. But on first copy-before-write failure
* when @on_cbw_error == ON_CBW_ERROR_BREAK_SNAPSHOT, @snapshot_error takes
* value of this error (<0). After that all in-flight and further
- * snapshot-API requests will fail with that error.
+ * snapshot-API requests will fail with that error. To be accessed with
+ * atomics.
*/
int snapshot_error;
} BDRVCopyBeforeWriteState;
@@ -115,7 +117,7 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
return 0;
}
- if (s->snapshot_error) {
+ if (qatomic_read(&s->snapshot_error)) {
return 0;
}
@@ -139,9 +141,7 @@ static coroutine_fn int cbw_do_copy_before_write(BlockDriverState *bs,
WITH_QEMU_LOCK_GUARD(&s->lock) {
if (ret < 0) {
assert(s->on_cbw_error == ON_CBW_ERROR_BREAK_SNAPSHOT);
- if (!s->snapshot_error) {
- s->snapshot_error = ret;
- }
+ qatomic_cmpxchg(&s->snapshot_error, 0, ret);
} else {
bdrv_set_dirty_bitmap(s->done_bitmap, off, end - off);
}
@@ -215,7 +215,7 @@ cbw_snapshot_read_lock(BlockDriverState *bs, int64_t offset, int64_t bytes,
QEMU_LOCK_GUARD(&s->lock);
- if (s->snapshot_error) {
+ if (qatomic_read(&s->snapshot_error)) {
g_free(req);
return NULL;
}
@@ -595,6 +595,12 @@ void bdrv_cbw_drop(BlockDriverState *bs)
bdrv_unref(bs);
}
+int bdrv_cbw_snapshot_error(BlockDriverState *bs)
+{
+ BDRVCopyBeforeWriteState *s = bs->opaque;
+ return qatomic_read(&s->snapshot_error);
+}
+
static void cbw_init(void)
{
bdrv_register(&bdrv_cbw_filter);
diff --git a/block/copy-before-write.h b/block/copy-before-write.h
index 2a5d4ba693..969da3620f 100644
--- a/block/copy-before-write.h
+++ b/block/copy-before-write.h
@@ -44,5 +44,6 @@ BlockDriverState *bdrv_cbw_append(BlockDriverState *source,
BlockCopyState **bcs,
Error **errp);
void bdrv_cbw_drop(BlockDriverState *bs);
+int bdrv_cbw_snapshot_error(BlockDriverState *bs);
#endif /* COPY_BEFORE_WRITE_H */
diff --git a/pve-backup.c b/pve-backup.c
index 4b0820c8a7..81697d9bf9 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -374,6 +374,15 @@ static void pvebackup_complete_cb(void *opaque, int ret)
di->fleecing.snapshot_access = NULL;
}
if (di->fleecing.cbw) {
+ /*
+ * With fleecing, failure for cbw does not fail the guest write, but only sets the snapshot
+ * error, making further requests to the snapshot fail with EACCES, which then also fail the
+ * job. But that code is not the root cause and just confusing, so update it.
+ */
+ int snapshot_error = bdrv_cbw_snapshot_error(di->fleecing.cbw);
+ if (di->completed_ret == -EACCES && snapshot_error) {
+ di->completed_ret = snapshot_error;
+ }
bdrv_cbw_drop(di->fleecing.cbw);
di->fleecing.cbw = NULL;
}

View File

@@ -0,0 +1,103 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 7 Nov 2024 17:51:14 +0100
Subject: [PATCH] PVE backup: fixup error handling for fleecing
The drained section needs to be terminated before breaking out of the
loop in the error scenarios. Otherwise, guest IO on the drive would
become stuck.
If the job is created successfully, then the job completion callback
will clean up the snapshot access block nodes. In case failure
happened before the job is created, there was no cleanup for the
snapshot access block nodes yet. Add it.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
pve-backup.c | 38 +++++++++++++++++++++++++-------------
1 file changed, 25 insertions(+), 13 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 81697d9bf9..320c660589 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -357,22 +357,23 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
qemu_co_mutex_unlock(&backup_state.backup_mutex);
}
-static void pvebackup_complete_cb(void *opaque, int ret)
+static void cleanup_snapshot_access(PVEBackupDevInfo *di)
{
- PVEBackupDevInfo *di = opaque;
- di->completed_ret = ret;
-
- /*
- * Handle block-graph specific cleanup (for fleecing) outside of the coroutine, because the work
- * won't be done as a coroutine anyways:
- * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via bdrv_co_unref() would
- * just spawn a BH calling bdrv_unref().
- * - For cbw, draining would need to spawn a BH.
- */
if (di->fleecing.snapshot_access) {
bdrv_unref(di->fleecing.snapshot_access);
di->fleecing.snapshot_access = NULL;
}
+ if (di->fleecing.cbw) {
+ bdrv_cbw_drop(di->fleecing.cbw);
+ di->fleecing.cbw = NULL;
+ }
+}
+
+static void pvebackup_complete_cb(void *opaque, int ret)
+{
+ PVEBackupDevInfo *di = opaque;
+ di->completed_ret = ret;
+
if (di->fleecing.cbw) {
/*
* With fleecing, failure for cbw does not fail the guest write, but only sets the snapshot
@@ -383,10 +384,17 @@ static void pvebackup_complete_cb(void *opaque, int ret)
if (di->completed_ret == -EACCES && snapshot_error) {
di->completed_ret = snapshot_error;
}
- bdrv_cbw_drop(di->fleecing.cbw);
- di->fleecing.cbw = NULL;
}
+ /*
+ * Handle block-graph specific cleanup (for fleecing) outside of the coroutine, because the work
+ * won't be done as a coroutine anyways:
+ * - For snapshot_access, allows doing bdrv_unref() directly. Doing it via bdrv_co_unref() would
+ * just spawn a BH calling bdrv_unref().
+ * - For cbw, draining would need to spawn a BH.
+ */
+ cleanup_snapshot_access(di);
+
/*
* Needs to happen outside of coroutine, because it takes the graph write lock.
*/
@@ -587,6 +595,7 @@ static void create_backup_jobs_bh(void *opaque) {
if (!di->fleecing.cbw) {
error_setg(errp, "appending cbw node for fleecing failed: %s",
local_err ? error_get_pretty(local_err) : "unknown error");
+ bdrv_drained_end(di->bs);
break;
}
@@ -599,6 +608,8 @@ static void create_backup_jobs_bh(void *opaque) {
if (!di->fleecing.snapshot_access) {
error_setg(errp, "setting up snapshot access for fleecing failed: %s",
local_err ? error_get_pretty(local_err) : "unknown error");
+ cleanup_snapshot_access(di);
+ bdrv_drained_end(di->bs);
break;
}
source_bs = di->fleecing.snapshot_access;
@@ -637,6 +648,7 @@ static void create_backup_jobs_bh(void *opaque) {
}
if (!job || local_err) {
+ cleanup_snapshot_access(di);
error_setg(errp, "backup_job_create failed: %s",
local_err ? error_get_pretty(local_err) : "null");
break;

View File

@@ -1,42 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Tue, 2 Mar 2021 16:11:54 +0100
Subject: [PATCH] block/io: accept NULL qiov in bdrv_pad_request
Some operations, e.g. block-stream, perform reads while discarding the
results (only copy-on-read matters). In this case they will pass NULL as
the target QEMUIOVector, which will however trip bdrv_pad_request, since
it wants to extend its passed vector.
Simply check for NULL and do nothing, there's no reason to pad the
target if it will be discarded anyway.
---
block/io.c | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/block/io.c b/block/io.c
index ec5e152bb7..08dee005ec 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1613,13 +1613,16 @@ static bool bdrv_pad_request(BlockDriverState *bs,
return false;
}
- qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
- *qiov, *qiov_offset, *bytes,
- pad->buf + pad->buf_len - pad->tail, pad->tail);
+ if (*qiov) {
+ qemu_iovec_init_extended(&pad->local_qiov, pad->buf, pad->head,
+ *qiov, *qiov_offset, *bytes,
+ pad->buf + pad->buf_len - pad->tail, pad->tail);
+ *qiov = &pad->local_qiov;
+ *qiov_offset = 0;
+ }
+
*bytes += pad->head + pad->tail;
*offset -= pad->head;
- *qiov = &pad->local_qiov;
- *qiov_offset = 0;
return true;
}

View File

@@ -0,0 +1,135 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 7 Nov 2024 17:51:15 +0100
Subject: [PATCH] PVE backup: factor out setting up snapshot access for
fleecing
Avoids some line bloat in the create_backup_jobs_bh() function and is
in preparation for setting up the snapshot access independently of
fleecing, in particular that will be useful for providing access to
the snapshot via NBD.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
pve-backup.c | 95 ++++++++++++++++++++++++++++++++--------------------
1 file changed, 58 insertions(+), 37 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 320c660589..d8d0c04b0f 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -525,6 +525,62 @@ static int coroutine_fn pvebackup_co_add_config(
goto out;
}
+/*
+ * Setup a snapshot-access block node for a device with associated fleecing image.
+ */
+static int setup_snapshot_access(PVEBackupDevInfo *di, Error **errp)
+{
+ Error *local_err = NULL;
+
+ if (!di->fleecing.bs) {
+ error_setg(errp, "no associated fleecing image");
+ return -1;
+ }
+
+ QDict *cbw_opts = qdict_new();
+ qdict_put_str(cbw_opts, "driver", "copy-before-write");
+ qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs));
+ qdict_put_str(cbw_opts, "target", bdrv_get_node_name(di->fleecing.bs));
+
+ if (di->bitmap) {
+ /*
+ * Only guest writes to parts relevant for the backup need to be intercepted with
+ * old data being copied to the fleecing image.
+ */
+ qdict_put_str(cbw_opts, "bitmap.node", bdrv_get_node_name(di->bs));
+ qdict_put_str(cbw_opts, "bitmap.name", bdrv_dirty_bitmap_name(di->bitmap));
+ }
+ /*
+ * Fleecing storage is supposed to be fast and it's better to break backup than guest
+ * writes. Certain guest drivers like VirtIO-win have 60 seconds timeout by default, so
+ * abort a bit before that.
+ */
+ qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot");
+ qdict_put_int(cbw_opts, "cbw-timeout", 45);
+
+ di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, &local_err);
+
+ if (!di->fleecing.cbw) {
+ error_setg(errp, "appending cbw node for fleecing failed: %s",
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ return -1;
+ }
+
+ QDict *snapshot_access_opts = qdict_new();
+ qdict_put_str(snapshot_access_opts, "driver", "snapshot-access");
+ qdict_put_str(snapshot_access_opts, "file", bdrv_get_node_name(di->fleecing.cbw));
+
+ di->fleecing.snapshot_access =
+ bdrv_open(NULL, NULL, snapshot_access_opts, BDRV_O_RDWR | BDRV_O_UNMAP, &local_err);
+ if (!di->fleecing.snapshot_access) {
+ error_setg(errp, "setting up snapshot access for fleecing failed: %s",
+ local_err ? error_get_pretty(local_err) : "unknown error");
+ return -1;
+ }
+
+ return 0;
+}
+
/*
* backup_job_create can *not* be run from a coroutine, so this can't either.
* The caller is responsible that backup_mutex is held nonetheless.
@@ -569,49 +625,14 @@ static void create_backup_jobs_bh(void *opaque) {
const char *job_id = bdrv_get_device_name(di->bs);
bdrv_graph_co_rdunlock();
if (di->fleecing.bs) {
- QDict *cbw_opts = qdict_new();
- qdict_put_str(cbw_opts, "driver", "copy-before-write");
- qdict_put_str(cbw_opts, "file", bdrv_get_node_name(di->bs));
- qdict_put_str(cbw_opts, "target", bdrv_get_node_name(di->fleecing.bs));
-
- if (di->bitmap) {
- /*
- * Only guest writes to parts relevant for the backup need to be intercepted with
- * old data being copied to the fleecing image.
- */
- qdict_put_str(cbw_opts, "bitmap.node", bdrv_get_node_name(di->bs));
- qdict_put_str(cbw_opts, "bitmap.name", bdrv_dirty_bitmap_name(di->bitmap));
- }
- /*
- * Fleecing storage is supposed to be fast and it's better to break backup than guest
- * writes. Certain guest drivers like VirtIO-win have 60 seconds timeout by default, so
- * abort a bit before that.
- */
- qdict_put_str(cbw_opts, "on-cbw-error", "break-snapshot");
- qdict_put_int(cbw_opts, "cbw-timeout", 45);
-
- di->fleecing.cbw = bdrv_insert_node(di->bs, cbw_opts, BDRV_O_RDWR, &local_err);
-
- if (!di->fleecing.cbw) {
- error_setg(errp, "appending cbw node for fleecing failed: %s",
- local_err ? error_get_pretty(local_err) : "unknown error");
- bdrv_drained_end(di->bs);
- break;
- }
-
- QDict *snapshot_access_opts = qdict_new();
- qdict_put_str(snapshot_access_opts, "driver", "snapshot-access");
- qdict_put_str(snapshot_access_opts, "file", bdrv_get_node_name(di->fleecing.cbw));
-
- di->fleecing.snapshot_access =
- bdrv_open(NULL, NULL, snapshot_access_opts, BDRV_O_RDWR | BDRV_O_UNMAP, &local_err);
- if (!di->fleecing.snapshot_access) {
+ if (setup_snapshot_access(di, &local_err) < 0) {
error_setg(errp, "setting up snapshot access for fleecing failed: %s",
local_err ? error_get_pretty(local_err) : "unknown error");
cleanup_snapshot_access(di);
bdrv_drained_end(di->bs);
break;
}
+
source_bs = di->fleecing.snapshot_access;
discard_source = true;

View File

@@ -0,0 +1,135 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 7 Nov 2024 17:51:16 +0100
Subject: [PATCH] PVE backup: save device name in device info structure
The device name needs to be queried while holding the graph read lock
and since it doesn't change during the whole operation, just get it
once during setup and avoid the need to query it again in different
places.
Also in preparation to use it more often in error messages and for the
upcoming external backup access API.
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
pve-backup.c | 29 +++++++++++++++--------------
1 file changed, 15 insertions(+), 14 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index d8d0c04b0f..e2110ce0db 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -94,6 +94,7 @@ typedef struct PVEBackupDevInfo {
size_t size;
uint64_t block_size;
uint8_t dev_id;
+ char* device_name;
int completed_ret; // INT_MAX if not completed
BdrvDirtyBitmap *bitmap;
BlockDriverState *target;
@@ -327,6 +328,8 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
}
di->bs = NULL;
+ g_free(di->device_name);
+ di->device_name = NULL;
assert(di->target == NULL);
@@ -621,9 +624,6 @@ static void create_backup_jobs_bh(void *opaque) {
BlockDriverState *source_bs = di->bs;
bool discard_source = false;
- bdrv_graph_co_rdlock();
- const char *job_id = bdrv_get_device_name(di->bs);
- bdrv_graph_co_rdunlock();
if (di->fleecing.bs) {
if (setup_snapshot_access(di, &local_err) < 0) {
error_setg(errp, "setting up snapshot access for fleecing failed: %s",
@@ -654,7 +654,7 @@ static void create_backup_jobs_bh(void *opaque) {
}
BlockJob *job = backup_job_create(
- job_id, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
+ di->device_name, source_bs, di->target, backup_state.speed, sync_mode, di->bitmap,
bitmap_mode, false, discard_source, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT,
BLOCKDEV_ON_ERROR_REPORT, JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn,
&local_err);
@@ -751,6 +751,7 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
}
PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
di->bs = bs;
+ di->device_name = g_strdup(bdrv_get_device_name(bs));
if (fleecing && device_uses_fleecing(*d)) {
g_autofree gchar *fleecing_devid = g_strconcat(*d, "-fleecing", NULL);
@@ -789,6 +790,7 @@ static GList coroutine_fn GRAPH_RDLOCK *get_device_info(
PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
di->bs = bs;
+ di->device_name = g_strdup(bdrv_get_device_name(bs));
di_list = g_list_append(di_list, di);
}
}
@@ -956,9 +958,6 @@ UuidInfo coroutine_fn *qmp_backup(
di->block_size = dump_cb_block_size;
- bdrv_graph_co_rdlock();
- const char *devname = bdrv_get_device_name(di->bs);
- bdrv_graph_co_rdunlock();
PBSBitmapAction action = PBS_BITMAP_ACTION_NOT_USED;
size_t dirty = di->size;
@@ -973,7 +972,8 @@ UuidInfo coroutine_fn *qmp_backup(
}
action = PBS_BITMAP_ACTION_NEW;
} else {
- expect_only_dirty = proxmox_backup_check_incremental(pbs, devname, di->size) != 0;
+ expect_only_dirty =
+ proxmox_backup_check_incremental(pbs, di->device_name, di->size) != 0;
}
if (expect_only_dirty) {
@@ -997,7 +997,8 @@ UuidInfo coroutine_fn *qmp_backup(
}
}
- int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, errp);
+ int dev_id = proxmox_backup_co_register_image(pbs, di->device_name, di->size,
+ expect_only_dirty, errp);
if (dev_id < 0) {
goto err_mutex;
}
@@ -1009,7 +1010,7 @@ UuidInfo coroutine_fn *qmp_backup(
di->dev_id = dev_id;
PBSBitmapInfo *info = g_malloc(sizeof(*info));
- info->drive = g_strdup(devname);
+ info->drive = g_strdup(di->device_name);
info->action = action;
info->size = di->size;
info->dirty = dirty;
@@ -1032,10 +1033,7 @@ UuidInfo coroutine_fn *qmp_backup(
goto err_mutex;
}
- bdrv_graph_co_rdlock();
- const char *devname = bdrv_get_device_name(di->bs);
- bdrv_graph_co_rdunlock();
- di->dev_id = vma_writer_register_stream(vmaw, devname, di->size);
+ di->dev_id = vma_writer_register_stream(vmaw, di->device_name, di->size);
if (di->dev_id <= 0) {
error_set(errp, ERROR_CLASS_GENERIC_ERROR,
"register_stream failed");
@@ -1146,6 +1144,9 @@ err:
bdrv_co_unref(di->target);
}
+ g_free(di->device_name);
+ di->device_name = NULL;
+
g_free(di);
}
g_list_free(di_list);

View File

@@ -0,0 +1,25 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fiona Ebner <f.ebner@proxmox.com>
Date: Thu, 7 Nov 2024 17:51:17 +0100
Subject: [PATCH] PVE backup: include device name in error when setting up
snapshot access fails
Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
---
pve-backup.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/pve-backup.c b/pve-backup.c
index e2110ce0db..32352fb5ec 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -626,7 +626,8 @@ static void create_backup_jobs_bh(void *opaque) {
bool discard_source = false;
if (di->fleecing.bs) {
if (setup_snapshot_access(di, &local_err) < 0) {
- error_setg(errp, "setting up snapshot access for fleecing failed: %s",
+ error_setg(errp, "%s - setting up snapshot access for fleecing failed: %s",
+ di->device_name,
local_err ? error_get_pretty(local_err) : "unknown error");
cleanup_snapshot_access(di);
bdrv_drained_end(di->bs);

Some files were not shown because too many files have changed in this diff Show More