Compare commits

..

153 Commits

Author SHA1 Message Date
f0bb2e89bd Add bdrv_co_block_status 2023-01-13 23:52:17 +03:00
7564d2396c Add Vitastor support 2022-12-14 18:57:19 +03:00
Thomas Lamprecht
4e4b9ab13f bump version to 6.2.0-11
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-06-22 15:54:58 +02:00
Thomas Lamprecht
39e84ba82d vma/alloc-track improvements
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-06-22 15:52:16 +02:00
Thomas Lamprecht
4fd0fa7fb3 re-export patches in normalized form
iow. using:

git format-patch --zero-commit --no-signature --no-numbered --diff-algorithm=myers ...

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-06-22 15:49:53 +02:00
Dominik Csapak
539e333eaa add 'namespace' to BlockdevOptionsPbs
so that we can use it for the -blockdev options (used for live-restore)

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2022-06-22 15:10:49 +02:00
Fabian Grünbichler
68569ea2ff bump version to 6.2.0-10
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2022-06-09 16:35:57 +02:00
Fabian Grünbichler
41aedfb6db add d/source/include-binaries
to shutup dpkg-source when building a source package

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2022-06-09 16:35:01 +02:00
Fabian Ebner
7bd4d8645a fix #4101: acquire job's aio context before calling job_unref
Otherwise, we might run into an abort via bdrv_co_yield_to_drain()
(can at least happen when a disk with iothread is used):
> #0  0x00007fef4f5dece1 __GI_raise (libc.so.6 + 0x3bce1)
> #1  0x00007fef4f5c8537 __GI_abort (libc.so.6 + 0x25537)
> #2  0x00005641bce3c71f error_exit (qemu-system-x86_64 + 0x80371f)
> #3  0x00005641bce3d02b qemu_mutex_unlock_impl (qemu-system-x86_64 + 0x80402b)
> #4  0x00005641bcd51655 bdrv_co_yield_to_drain (qemu-system-x86_64 + 0x718655)
> #5  0x00005641bcd52de8 bdrv_do_drained_begin (qemu-system-x86_64 + 0x719de8)
> #6  0x00005641bcd47e07 blk_drain (qemu-system-x86_64 + 0x70ee07)
> #7  0x00005641bcd498cd blk_unref (qemu-system-x86_64 + 0x7108cd)
> #8  0x00005641bcd31e6f block_job_free (qemu-system-x86_64 + 0x6f8e6f)
> #9  0x00005641bcd32d65 job_unref (qemu-system-x86_64 + 0x6f9d65)
> #10 0x00005641bcd93b3d pvebackup_co_complete_stream (qemu-system-x86_64 + 0x75ab3d)
> #11 0x00005641bce4e353 coroutine_trampoline (qemu-system-x86_64 + 0x815353)

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-06-09 14:57:28 +02:00
Wolfgang Bumiller
ed3b5b8ab8 bump version to 6.2.0-9
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-06-08 14:04:09 +02:00
Wolfgang Bumiller
7f4326d1dc pbs cleanup fixes
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-06-08 13:10:51 +02:00
Wolfgang Bumiller
53bff441c5 delete patches which were dropped from the series file
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-06-08 13:07:04 +02:00
Thomas Lamprecht
f9a4b1cea7 bump version to 6.2.0-8
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-05-19 09:25:11 +02:00
Fabian Ebner
dc265df350 add revert to work around performance regression when backing up large RBD disk
resulting in QMP timeouts and very slow backups. The plan is to figure
out (ideally together with upstream) a way to make the implementation
of bdrv_co_block_status for RBD more efficient. But for now, revert
the problematic change as a stop-gap measure.

Upstream bug report:
https://gitlab.com/qemu-project/qemu/-/issues/1026

Forum threads:
https://forum.proxmox.com/threads/109272/
https://forum.proxmox.com/threads/109448/
https://forum.proxmox.com/threads/101334/ (partially)

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-05-19 09:23:38 +02:00
Thomas Lamprecht
e0076597c6 bump version to 6.2.0-7
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-05-12 16:06:00 +02:00
Thomas Lamprecht
ee99c1f813 d/control: bump build-depenceny of proxmox-backup-qemu to 1.3.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-05-12 16:05:30 +02:00
Wolfgang Bumiller
58a5492e9c namespace support
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-05-12 13:49:35 +02:00
Thomas Lamprecht
e67b8b5bd9 bump version to 6.2.0-6
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-05-11 10:42:57 +02:00
Thomas Lamprecht
309b5c1694 backport various fixes for gluster, qxl and vnc
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-05-11 10:40:14 +02:00
Thomas Lamprecht
4ce5937f89 bump version to 6.2.0-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-25 10:13:50 +02:00
Thomas Lamprecht
f87d0523df vma: allow partial restore
Introduce a new map line for skipping a certain drive, of the form
skip=drive-scsi0

Since in PVE, most archives are compressed and piped to vma for
restore, it's not easily possible to skip reads.

For the reader, a new skip flag for VmaRestoreState is added and the
target is allowed to be NULL if skip is specified when registering.
If
the skip flag is set, no writes will be made as well as no check for
duplicate clusters. Therefore, the flag is not set for verify.

Originally-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-25 10:07:37 +02:00
Thomas Lamprecht
2fd4ea2813 patches: update context
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-25 10:07:01 +02:00
Thomas Lamprecht
2653a5f029 vma: restore: call blk_unref for all opened block devices
Originally-by: Fabian Ebner <f.ebner@proxmox.com>
Link: https://lists.proxmox.com/pipermail/pve-devel/2022-April/052642.html
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-25 10:05:29 +02:00
Thomas Lamprecht
664ecf59a9 bump version to 6.2.0-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-22 11:52:34 +02:00
Thomas Lamprecht
4de9440f87 various stable backports
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-22 10:22:39 +02:00
Thomas Lamprecht
9b348f8c6d d/copyright: drop trailing whitespace
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-22 09:16:23 +02:00
Thomas Lamprecht
799cf8c5a3 d/control: add suggest dependency-hint for libgl1
It pulls in a lot of stuff via the libglx0 -> libglx-mesa0 dependency
chain, so only suggest it for now to avoid installing it in the
installer or via common "PVE on-top Debian" installations, VirGL
integration is experimental after all and we may drop/replace it with
the vulkan based venus one, once available (Debian 12?).

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-22 09:14:39 +02:00
Thomas Lamprecht
b02e62dba0 d/control: add libgbm to build dependencies
required for good virgl support

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-22 08:50:36 +02:00
Thomas Lamprecht
fc03e1b6bf bump version to 6.2.0-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-15 09:09:43 +02:00
Thomas Lamprecht
c8ba14bed0 cherry-pick fix for passing some acpi slic tables
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-04-15 08:07:34 +02:00
Wolfgang Bumiller
daea534caa bump version to 6.2.0-2
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-03-03 12:05:37 +01:00
Fabian Ebner
27199bd753 backup: add patch to initialize bcs bitmap early enough for PBS
This is necessary for multi-disk backups where not all jobs are
immediately started after they are created. QEMU commit
06e0a9c16405c0a4c1eca33cf286cc04c42066a2 did already part of the work,
ensuring that new writes after job creation don't pass through to the
backup, but not yet for the MIRROR_SYNC_MODE_BITMAP case which is used
for PBS.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2022-03-03 11:37:17 +01:00
Thomas Lamprecht
e050683663 d/control: mark numactl a recommended package
we do not call in anywhere unconditionally

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-24 07:51:19 +01:00
Thomas Lamprecht
13184117e4 d/control: drop sdl dependency, we disable it on compile tinme
disabled via d/rules since a while...

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-24 07:51:19 +01:00
Thomas Lamprecht
aa4b14ea10 d/control: libaio1 is added by dh shlibs
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-24 07:51:19 +01:00
Thomas Lamprecht
3aa5b7598d enable zstd support
plan to use that for multifd migration, among other things

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-24 07:51:19 +01:00
Thomas Lamprecht
13d3e10aa6 compile in virgl support
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-24 07:51:19 +01:00
Thomas Lamprecht
58c3533a58 bump version to 6.2.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-18 14:23:41 +01:00
Thomas Lamprecht
fe0542bed9 d/rules: disable libssh by default
was always disabled in our clean builds, this now also avoids
auto-enabling it on "dirty" build hosts

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-18 14:21:53 +01:00
Fabian Ebner
f6d40bfdf4 add patch for loading a snapshot with qemu-img dd
Will be used when cloning from a qcow2 efidisk.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-02-15 14:03:07 +01:00
Fabian Ebner
107132becc fix getopt-string when introducing -n option for qemu-img dd
The colon after U is wrong, because it doesn't take an argument.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-02-15 14:03:07 +01:00
Fabian Ebner
4567474e95 update submodule and patches to 6.2.0
Notable changes:
* bdrv_co_p{discard,readv,writev,write_zeroes} function signatures
  changed, to using int64_t for offsets/bytes and some still had int
  rather than BrdvRequestFlags for the flags.
* job_cancel_sync now has a force parameter. Commit messages in
  73895f3838cd7fdaf185cf1dbc47be58844a966f
  4cfb3f05627ad82af473e7f7ae113c3884cd04e3
  sound like using force=true makes more sense.
* Added 3 patches coming in via qemu-stable tag, most important one is
  to work around a librbd issue.
* Added another 3 patches from qemu-devel to fix issue leading to
  crash when live migrating with iothread.
* cluster_size calculation helper changed (see patch pve/0026).
* QAPI's if conditionals now use 'CONFIG_FOO' rather than
  'defined(CONFIG_FOO)'

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-02-15 14:03:07 +01:00
Thomas Lamprecht
33a2d3a826 bump version to 6.1.1-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-02-14 15:53:18 +01:00
Fabian Ebner
2bf61c3eb6 vma: create: register all streams before entering coroutines
Otherwise, the header might already get written by a coroutine and
registering further streams will fail after that.

Also adds a missing g_list_free call for the other GList that's used.

Reported in the community forum:
https://forum.proxmox.com/threads/104744/

Reproducer script (increase beyond 30 if the issue isn't triggered yet):
> #!/usr/bin/perl
>
> my $dir = "./vma-create-bug";
> mkdir $dir;
>
> my $archive_path = "$dir/vzdump-qemu-104-2202_02_02-00_00_00.vma";
> unlink $archive_path;
>
> my $cmd = "vma create $archive_path -v";
> for (my $i = 0; $i < 30; $i++) {
>   system("truncate -s 1M $dir/drive-virtio$i.img");
>   $cmd .= " drive-virtio$i=$dir/drive-virtio$i.img";
> }
> system($cmd);

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2022-02-14 15:38:58 +01:00
Thomas Lamprecht
c07b2203b3 bump version to 6.1.1-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-13 10:57:48 +01:00
Thomas Lamprecht
ddbf7a872d update submodule and patches to 6.1.1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2022-01-13 10:56:39 +01:00
Thomas Lamprecht
95c7156d1e bump version to 6.1.0-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-12-01 15:35:49 +01:00
Fabian Ebner
570d4ad51d fix #3738: cherry-pick "block: introduce max_hw_iov for use in scsi-generic"
which fixes the bad commit 18473467d55a20d643b6c9b3a52de42f705b4d35
that was tracked down via bisecting, and has a Cc for qemu-stable as
well.

Issue was easy enough to reproduce with a single virtio-block disk
using a few runs of dd if=/dev/urandom of=file bs=1M count=1000

Commit cc071629539dc1f303175a7e2d4ab854c0a8b20f upstream.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-12-01 15:34:27 +01:00
Dominik Csapak
c5e8e7c998 buildsys: fix build-dependencies on headers for 'vma' and 'pbs_restore'
both of them depend on generated header files, so we have to specify
them as sources. Otherwise, it happens (at least on some machines)
that they will be compiled before the headers are generated, aborting
the build.

Signed-off-by: Dominik Csapak <d.csapak@proxmox.com>
2021-11-18 08:11:57 +01:00
Fabian Grünbichler
7cf6b60926 fix #3728: handle machine without type
libguestfs starts their helper VMs with `-machine accel=..` without a
machine type, and our pve version suffix handling would segfault in that
case. there might be other scripted use cases that are affected as well.

this regression was introduced with the rebase of our patch set on top
of 6.1.0

Fixes: f376b2b9e2

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2021-11-17 17:20:26 +01:00
Thomas Lamprecht
50999525c6 bump version to 6.1.0-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-11-16 09:38:18 +01:00
Fabian Grünbichler
edbcc10a69 cherry-pick segfault fix
this was reported multiple times in our forums[1 with backtraces, 2 & 3
with same log messages], fix is taken from upstream master.

1: https://forum.proxmox.com/threads/pve-7-0-14-1-vm-not-running-live-migration-kills-vm-post-ssd-move-pre-ram-move.99704/
2: https://forum.proxmox.com/threads/proxmox-7-0-14-1-crashes-vm-during-migrate-to-other-host.99678
3: https://forum.proxmox.com/threads/cannot-migrate-between-zfs-and-ceph.99685/#post-430152

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2021-11-16 09:23:43 +01:00
Thomas Lamprecht
cc707c362e bump version to 6.1.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-10-13 17:58:18 +02:00
Stefan Reiter
af64ed13eb add fixup patch for qxl migration logic
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-10-13 17:58:18 +02:00
Stefan Reiter
f376b2b9e2 update and rebase to QEMU v6.1.0
Very clean rebase, only the +pve version handling needed manual fixing.
Drops two applied patches from extra/ and adds one new from upstream
(extra/0001*, fixes VNC over unix sockets) as well as 3 of my own for
allowing password changes on custom VNC displays again (as seen and
reviewed upstream, but not yet applied).

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-10-11 15:13:26 +02:00
Thomas Lamprecht
89fa943ef9 bump version to 6.0.0-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-09-06 07:30:05 +02:00
Stefan Reiter
26eee146bc add temporary QMP race fix
same as the initial version sent to qemu-devel, it won't be the final
fix we plan to upstream but it should be enough band-aid to
workaround how PVE uses the QMP.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
 [ Thomas: add a bit reasoning to commit message body ]
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-09-06 07:28:07 +02:00
Wolfgang Bumiller
277d33454f drop patch force-disabling smm
This drops debian/patches/pve/0005-PVE-Config-smm_available-false.patch
(and renumbers the remaining patches)

From what I could gather, this patch was originally added
due to issues with old kernels. Now we have users which
seem to run into issues *with* the patch.

All this does is toggle an option, and it's available via a
qemu CLI option anyway, so if dropping this patch causes
issues for some people we can just add an option to
qemu-server & UI control smm explicitly.

Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Cc: Alexandre Derumier <aderumier@odiso.com>
Tested-by: Stefan Reiter <s.reiter@proxmox.com>
2021-08-24 11:19:05 +02:00
Fabian Grünbichler
611b692181 bump version to 6.0.0-3
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2021-08-03 15:02:40 +02:00
Fabian Ebner
0114d3cd02 io_uring: resubmit when result is -EAGAIN
Linux SCSI can throw spurious -EAGAIN in some corner cases in its
completion path, which will end up being the result in the completed
io_uring request.

Resubmitting such requests should allow block jobs to complete, even
if such spurious errors are encountered.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-07-29 11:51:57 +02:00
Thomas Lamprecht
bb3e494bdc bump version to 6.0.0-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-06-23 11:04:47 +02:00
Stefan Reiter
403f23a0c3 enable io-uring support in QEMU builds
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-06-21 09:56:06 +02:00
Thomas Lamprecht
db687e3cac buildsys: change upload dist to bullseye
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-06-08 11:18:10 +02:00
Thomas Lamprecht
8893def37c bump version to 6.0.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-05-28 11:30:55 +02:00
Stefan Reiter
263fef5b4c update keycodemapdb for 6.0
QEMU 6.0 requires the updated version to build correctly, as the
keymap-gen tool gained some new parameters.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-05-28 11:29:44 +02:00
Stefan Reiter
eb96e850ac debian: ignore submodule checks in QEMU build
...we do those manually, and the build dir is not a git repo.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-05-28 11:29:44 +02:00
Stefan Reiter
8dca018b68 udpate and rebase to QEMU v6.0.0
Mostly minor changes, bigger ones summarized:
* QEMU's internal backup code now uses a new async system, which allows
  parallel requests - the default max_workers settings is 64, I chose
  less, since 64 put enough stress on QEMU that the guest became
  practically unusable during the backup, and 16 still shows quite a
  nice measureable performance improvement. Little code changes for us
  though.
* 'malformed' QAPI parameters/functions are now a build error (i.e.
  using '_' vs '-'), I chose to just whitelist our calls in the name of
  backwards compatibility.
* monitor OOB race fix now uses the upstream variant, cherry-picked from
  origin/master since it's not in 6.0 by default
* last patch fixes a bug with snapshot rollback related to the new yank
  system

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-05-28 11:29:44 +02:00
Thomas Lamprecht
1cbf147251 bump version to 5.2.0-11
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-05-13 19:30:59 +02:00
Thomas Lamprecht
9e4cba3943 d/control: add libjson-perl to build dependencies
we use that to build the available machine/flags list

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-05-13 19:30:37 +02:00
Thomas Lamprecht
ebcd9ada10 bump version to 5.2.0-6
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-15 16:29:52 +02:00
Thomas Lamprecht
0a88214b72 alloc track: use coroutine version of bdrv_pwrite_zeroes
as we're in a coroutine here too

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-06 16:31:53 +02:00
Thomas Lamprecht
76e464784e pbs block driver: run read in the AIO context of the bs
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-06 16:31:53 +02:00
Thomas Lamprecht
b36e8acc31 alloc track: acquire BS AIO context during dropping
ran into this when live-restoring a backup configured for IO-threads,
got the good ol':
> qemu: qemu_mutex_unlock_impl: Operation not permitted
error.

Checking out the history of the related bdrv_backup_top_drop(*bs)
method, we can see that it used to do the AIO context acquiring too,
but in the backup path this was problematic and was changed to be
higher up in the call path in a upstream series from Stefan[0].

That said, this is a completely different code path and it is safe to
do so here. We always run from the main threads's AIO context here
and we call it only indirectly once, guarded by checking for
`s->drop_state == DropNone` and set `s->drop_state = DropRequested`
shortly before we schedule the track_drop() in a bh.

[0]: https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg09139.html

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-06 16:27:48 +02:00
Thomas Lamprecht
aa42ea267e alloc track: keep track_drop() closer to similar block drivers
Reads just nicer with a drain begin *and* end call. Also clearing the
backing link of the alloc track BDS makes it closer to
bdrv_backup_top_drop() with which this driver has a bit in common.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-04-06 16:27:37 +02:00
Thomas Lamprecht
6d6894394c bump version to 5.2.0-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-03-30 18:19:47 +02:00
Stefan Reiter
e79be6c6c4 add upstream fixes for qmp_block_resize
cherry-picked cleanly from 6.0 development tree, fixes an issue with
resizing RBD drives (and reportedly also on krbd or potentially other
storage backends) with iothreads.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-03-30 18:14:37 +02:00
Thomas Lamprecht
0be2cab670 bump version to 5.2.0-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-03-23 15:41:33 +01:00
Stefan Reiter
bb751cab32 Add tentative fix for QMP hang
Not exactly as sent upstream[0] since we're missing a change in our
v5.2.0 branch (irrelevant for us), but functionally works the same.

[0] https://lists.gnu.org/archive/html/qemu-devel/2021-03/msg07590.html
2021-03-22 16:52:40 +01:00
Thomas Lamprecht
154a7c0f8d d/control: drop unused python from dependencies
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-03-18 14:15:31 +01:00
Stefan Reiter
677d0d169f add alloc-track block driver patch
See added patches for more info, overview:
0044: slightly increase PBS performance by reducing allocations
0045: slightly increase block-stream performance for Ceph
0046: don't crash with block-stream on RBD
0047: add alloc-track driver for live restore

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-03-16 20:53:18 +01:00
Stefan Reiter
e9b36665c7 fix saving and loading dirty bitmaps in snapshots
Saving dirty bitmaps from our savevm-async code didn't work, since we
use a coroutine which holds the iothread mutex already (upstream savevm
is sync, migration uses a thread). Release the mutex before calling the
one function that (according to it's documentation) requires the lock to
*not* be held: qemu_savevm_state_pending.

Additionally, loading dirty bitmaps requires a call to
dirty_bitmap_mig_before_vm_start after "loadvm", which the upstream
savevm does explicitly afterwards - do that too.

This is exposed via the query-proxmox-support property
"pbs-dirty-bitmap-savevm".

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-03-16 20:44:06 +01:00
Thomas Lamprecht
970196fc93 bump version to 5.2.0-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-03-05 16:23:16 +01:00
Thomas Lamprecht
6503e6e08d machine list: save as JSON
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-03-05 16:21:30 +01:00
Stefan Reiter
40e6b6e5a5 add ACPI compat patch for 5.1 and older machine types
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-03-05 15:20:14 +01:00
Stefan Reiter
a6ede89808 add static supported machines file
Same rationale as the CPU flags file, avoids calling QEMU binary just to
query machine types.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-03-05 15:20:14 +01:00
Stefan Reiter
2413972b46 move bitmap-mirror patches to seperate folder
...instead of having them in the middle of the backup related patches.
These might (hopefully) become upstream at some point as well.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-03-03 14:29:05 +01:00
Stefan Reiter
0c893fd820 clean up pve/ patches by squashing patches of patches
No functional change intended.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-03-03 14:29:05 +01:00
Thomas Lamprecht
25b9156448 bump version to 5.2.0-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-24 19:02:07 +01:00
Stefan Reiter
4194124719 pbs-restore: unref/close target block backend
Use blk_unref to drop the last reference, which will close the block
backend and flush all caches and outstanding writes.

This is especially important for restoring to Ceph, as the userspace
librbd caches will not be flushed if the application exits immediately,
leading to potentially incomplete restores.

Reported-by: Eneko Lacunza <elacunza@binovo.es>
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-02-24 19:02:07 +01:00
Thomas Lamprecht
42a90c4e1c d/patches: backport virtiofsd security fix
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-24 19:02:07 +01:00
Thomas Lamprecht
448136958e d/rules: build virtiofsd
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-24 19:02:07 +01:00
Thomas Lamprecht
8e231fbe8d bump version to 5.2.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-12 12:10:04 +01:00
Fabian Grünbichler
00fae7cdbe build: drop explicit libproxmox-backup-qemu0 dep
it ships a symbol file now, so it can be auto-generated based on the
build-dep and usage.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2021-02-12 12:07:12 +01:00
Stefan Reiter
0b8da68824 add PBS master key support
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-02-12 10:47:14 +01:00
Stefan Reiter
817b7667e8 Update to QEMU 5.2
Lots of patches touched and some slight changes to the build process
since QEMU switched to meson as their build system. Functionality-wise
very little rebasing required.

New patches introduced:
* pve/0058: to fix VMA backups and clean up some code in general with
  new 5.2 features now available to us (namely coroutine-enabled QMP).
* extra/0002: don't build man pages for guest agent when disabled
* extra/0003: fix live-migration with hugepages
* 0017 and 0018 are adjusted to fix snapshot abort and improve
  snap performance a bit

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2021-02-12 10:20:01 +01:00
Fabian Ebner
a16eaaffd3 fix #3084: fall back to open-iscsi initiatorname
Fixes vma restore when the target is an iSCSI storage which expects that
initiatorname. Also avoids the need to always explicitly set the initiatorname
in PVE code, thus fixing moving efidisks from and to such iSCSI storages.

Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
2021-02-06 15:09:15 +01:00
Wolfgang Bumiller
cfa22a5b2a bump version to 5.1.0-8
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2021-01-07 10:28:46 +01:00
Wolfgang Bumiller
b515d45e6b fix #3225: properly cancel jobs in 'created' state
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2021-01-07 10:26:37 +01:00
Stefan Reiter
3d785eabbe disable jemalloc
jemalloc does not play nice with our Rust library (proxmox-backup-qemu),
specifically it never releases memory allocated from Rust to the OS.
This leads to a problem with larger caches (e.g. for the PBS block driver).

It appears to be related to this GitHub issue:
https://github.com/jemalloc/jemalloc/issues/1398

The background_thread solution seems weirdly hacky, so let's disable
jemalloc entirely for now.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-12-15 14:43:34 +01:00
Thomas Lamprecht
55df4a9eb1 bump version to 5.1.0-7
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-25 14:09:24 +01:00
Stefan Reiter
cfe02b3b4e update patches with some pbs-state migration cleanups
...and literal cleanup, as in, call save_cleanup after success or error.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-11-25 11:49:06 +01:00
Stefan Reiter
32ee41155b update patches with squashed in 'include library version'
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>

bump build-dependency on libproxmox-backup-qemu0-dev with version query
support

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-11-25 11:49:06 +01:00
Thomas Lamprecht
424f6841d5 bump version to 5.1.0-6
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-05 18:59:44 +01:00
Thomas Lamprecht
66eae0ae75 fix dirty-bitmap state migration freeze
The idea in general is to migrate all the state, which is small for
us, in a single step once. But, QEMU only calls save state if we
return active true.

Hardcoding is-active to return true, like done initially, makes the
migration freeze, as QEMU thinks this is never done, and only stops
calling us and finishes after a few seconds.

So, add a state with an "active" boolean, set to true when
initializing a migration, and set it to false when the state was
saved.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-05 18:58:15 +01:00
Thomas Lamprecht
2130e925a8 bump version to 5.1.0-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-04 18:36:42 +01:00
Thomas Lamprecht
f36fa39113 migration/block-dirty-bitmap: migrate other bitmaps even if one fails
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-11-04 18:35:50 +01:00
Thomas Lamprecht
50eb3ea618 bump version to 5.1.0-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-10-29 18:09:22 +01:00
Thomas Lamprecht
3e2a0699e6 d/control: update versioned dependency for proxmox backup qemu library
to have the proxmox_export_state et al. available

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-10-29 18:07:14 +01:00
Thomas Lamprecht
d95ad93eed apply dirty-bitmap state migration + fix
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-10-29 18:05:43 +01:00
Stefan Reiter
72ae34ecce Several fixes for backup abort and error reporting
Also add my Signed-off-by to some patches where it was missing.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-10-29 17:57:47 +01:00
Thomas Lamprecht
60a607ad0f bump version to 5.1.0-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-09-29 09:23:06 +02:00
Stefan Reiter
d333327a1b Add transaction patches and fix for blocking finish
With the transaction patches, patch 0026-PVE-Backup-modify-job-api.patch
is no longer necessary, so drop it and rebase all following patches on
top.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-09-29 09:21:15 +02:00
Thomas Lamprecht
250d87c276 bump version to 5.1.0-2
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-09-14 20:01:32 +02:00
Thomas Lamprecht
4b7a18845c cherry-pick: "usb: fix setup_len init (CVE-2020-14364)"
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-09-14 19:38:34 +02:00
Thomas Lamprecht
7895b0d523 work around #3002: revert "qemu-img convert: Don't pre-zero images"
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-09-14 19:37:45 +02:00
Stefan Reiter
437d68473c Add systemd journal logging patch
Prints QEMU errors that occur *after* the "-daemonize" fork to the
systemd journal, instead of pushing them into /dev/null like before.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-09-08 17:13:29 +02:00
Thomas Lamprecht
46c0ec19ad bump version to 5.1.0-1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-08-20 14:48:58 +02:00
Fabian Grünbichler
a5feeebc51 allow backup of read-only block drives
this is needed for template backups with PBS until we have the backup
equivalent of 'pbs-restore'.

Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-08-20 14:11:39 +02:00
Stefan Reiter
60ae3775bf update to QEMU 5.1
No major semantic changes, mostly just deprecations and changed function
signatures. Drop the extra/ patches, as they have been applied upstream.

The added extra/ patch was accepted upstream[0] but has not been picked
up for 5.1. It is required for non-4M aligned backups to work with PBS.

[0] https://lists.gnu.org/archive/html/qemu-devel/2020-08/msg01671.html

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-08-20 13:40:36 +02:00
Thomas Lamprecht
878df11e78 bump version to 5.0.0-13
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-08-19 18:19:35 +02:00
Thomas Lamprecht
f00a720d7e PVE: add query-pbs-bitmap-info QMP call
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-08-19 18:11:23 +02:00
Thomas Lamprecht
c5f7dc1d72 PVE: add zero block handling to PBS dump callback
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-08-19 13:56:03 +02:00
Fabian Grünbichler
dde77467ca bump version to 5.0.0-12
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-08-11 11:29:21 +02:00
Fabian Grünbichler
2821f02d70 fix PBS write callback with big blocks
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
2020-08-11 11:14:36 +02:00
Oguz Bektas
95fd47ecb9 patch for possible DOS in qemu network packet processing
fixes an assertion failure in qemu network packet processing, which can
lead to DOS'ing the qemu process on the host. this affects 'e1000e' and
'vmxnet3' network devices.

patch is cherry-picked from the commit mentioned in the oss-security email.

more info on oss-security [0]

[0]: https://www.openwall.com/lists/oss-security/2020/08/10/1

Signed-off-by: Oguz Bektas <o.bektas@proxmox.com>
2020-08-11 11:08:39 +02:00
Thomas Lamprecht
41424ed9c8 bump version to 5.0.0-11
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-14 12:45:24 +02:00
Stefan Reiter
f257cc05f4 Fix dirty-bitmap PBS backup with multiple drives
"PVE backup: rename incremental to use-dirty-bitmap" merged two
variables (use_dirty_bitmap and incremental) into one, but they served
two different purposes. Rename the original use_dirty_bitmap to
"expect_only_dirty" so the new one doesn't conflict, and rework "PVE:
use proxmox_backup_check_incremental" around that semantic.

In practice, this had the effect that only one disk at a time would
have a bitmap added, as after the first "use_dirty_bitmap" would be set
to one and the rest would behave as if the QMP parameter of the same
name was unset.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-07-14 10:46:48 +02:00
Wolfgang Bumiller
9886892f10 bump version to 5.0.0-10
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-07-10 13:33:44 +02:00
Wolfgang Bumiller
6d46b2ff4c fix backup qmp parameters to pass along encryption info
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
2020-07-10 13:31:52 +02:00
Thomas Lamprecht
2e4f5f2a90 bump version to 5.0.0-9
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-09 13:18:48 +02:00
Thomas Lamprecht
1d606ec161 d/control: update build-dependency of libproxmox-backup-qemu0-dev
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-09 13:17:29 +02:00
Thomas Lamprecht
3499c5b45a PBS patches: block driver, adapat encrypt/compress param, add query-proxmox-support QMP cmd
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-09 13:15:49 +02:00
Thomas Lamprecht
102ddd7e59 bump version to 5.0.0-8
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-06 23:05:42 +02:00
Thomas Lamprecht
4c17eebee4 fixup: proxmox_backup_check_incremental is negated
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-06 23:05:42 +02:00
Thomas Lamprecht
dfed71b229 bump version to 5.0.0-7
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-06 22:19:12 +02:00
Thomas Lamprecht
3ab149ccdd update/add PBS integration patches
* rename "incremental" param to "use-dirty-bitmap", avoids confusion
  as the backup can be incrementally also with that param set to
  false.
* use new proxmox_backup_check_incremental
* fix setting dirty counter and adapt to new connect API semantic

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-06 22:13:12 +02:00
Thomas Lamprecht
1f8140323f bump version to 5.0.0-6
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-03 19:26:09 +02:00
Thomas Lamprecht
15b9c76e1f pbs: query-backup: set reused field also for dirty-bitmap
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-03 19:26:09 +02:00
Thomas Lamprecht
d7f4e01a34 debian/patches: squash some followup patches and regroup a bit more together
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-02 13:33:16 +02:00
Thomas Lamprecht
8efe995b49 fixup: qemu submodule should be at v5.0.0
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-02 13:33:16 +02:00
Thomas Lamprecht
20be7fa0a0 backup: improve QAPI info and remove all dirty-bitmaps on failed drive-job
effectively two commits merged as one:
https://pve.proxmox.com/pipermail/pve-devel/2020-July/044185.html
https://pve.proxmox.com/pipermail/pve-devel/2020-July/044194.html

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-07-02 13:03:49 +02:00
Thomas Lamprecht
0943af81a6 bump version to 5.0.0-5
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-06-30 12:59:26 +02:00
Thomas Lamprecht
04e436ea6b d/control: build depend on newer libproxmox-backup-qemu0-dev
to ensure we have the new 'incremental' parameter for the
proxmox_backup_register_image and proxmox_backup_register_image_async
functions available

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-06-30 12:53:01 +02:00
Dietmar Maurer
c96a4a38cb add incremental backup patches
and fix typo: s/BPS/PBS/
2020-06-30 10:34:00 +02:00
Stefan Reiter
f0b53ef0b2 fix #2794: Include legacy-igd passthrough fix
See https://bugs.launchpad.net/qemu/+bug/1882784

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
2020-06-25 13:24:34 +02:00
Thomas Lamprecht
b570f1c41e Fix backup for not 64k-aligned storages
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Tested-by: Roland Kammerer <roland.kammerer@linbit.com>
2020-06-24 16:26:30 +02:00
Thomas Lamprecht
fff7e250ee pbs-restore: flush verbose log before calling into library
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-06-02 13:20:56 +02:00
Thomas Lamprecht
5107839915 bump version to 5.0.0-4
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-05-30 18:48:00 +02:00
Thomas Lamprecht
bce72611f9 pbs-restore: be more verbose if asked to
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-05-30 18:48:00 +02:00
Thomas Lamprecht
c6979241f1 small cleanups for pbs-restore
Add trailing newline to two error messages, and drop an extra
unconditional `qdict_put_str(options, "driver", format);`
Besides that it's just formatting.

Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-05-30 15:24:20 +02:00
Thomas Lamprecht
4e74eca7ed install pbs-restore to usr/bin
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-05-30 13:54:08 +02:00
Thomas Lamprecht
62b7007c2a bump version to 5.0.0-3
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-05-27 18:54:06 +02:00
Thomas Lamprecht
f063a8aadb fix vmstate-snapshots with iothread=1
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2020-05-27 18:54:06 +02:00
111 changed files with 10837 additions and 3289 deletions

View File

@@ -19,6 +19,9 @@ submodule:
test -f "${SRCDIR}/configure" || git submodule update --init --recursive
$(BUILDDIR): keycodemapdb | submodule
# check if qemu/ was used for a build
# if so, please run 'make distclean' in the submodule and try again
test ! -f $(SRCDIR)/build/config.status
rm -rf $(BUILDDIR)
cp -a $(SRCDIR) $(BUILDDIR)
cp -a debian $(BUILDDIR)/debian
@@ -46,7 +49,7 @@ update:
.PHONY: upload
upload: $(DEBS)
tar cf - ${DEBS} | ssh repoman@repo.proxmox.com upload --product pve --dist buster
tar cf - ${DEBS} | ssh repoman@repo.proxmox.com upload --product pve --dist bullseye
.PHONY: distclean clean
distclean: clean

371
debian/changelog vendored
View File

@@ -1,7 +1,374 @@
pve-qemu-kvm (6.2.0-11+vitastor2) bullseye; urgency=medium
* Add bdrv_co_block_status implementation for QCOW2 export support
-- Vitaliy Filippov <vitalif@yourcmc.ru> Fri, 13 Jan 2023 20:14:59 +0300
pve-qemu-kvm (6.2.0-11+vitastor1) bullseye; urgency=medium
* Add Vitastor support
-- Vitaliy Filippov <vitalif@yourcmc.ru> Thu, 14 Dec 2022 18:13:59 +0300
pve-qemu-kvm (6.2.0-11) bullseye; urgency=medium
* add 'namespace' to BlockdevOptionsPbs for live-restore support
* vma: create: support 64KiB-unaligned input images like to improve backing
up some VM templates
* block: alloc-track: avoid unlikely, but possible premature break
-- Proxmox Support Team <support@proxmox.com> Wed, 22 Jun 2022 15:54:54 +0200
pve-qemu-kvm (6.2.0-10) bullseye; urgency=medium
* fix #4101: fix backup cancellation bug with iothreads
-- Proxmox Support Team <support@proxmox.com> Thu, 9 Jun 2022 16:35:51 +0200
pve-qemu-kvm (6.2.0-9) bullseye; urgency=medium
* fix possible race conditions during cancellation of a PBS backup
-- Proxmox Support Team <support@proxmox.com> Wed, 08 Jun 2022 14:03:22 +0200
pve-qemu-kvm (6.2.0-8) bullseye; urgency=medium
* revert "block/rbd: implement bdrv_co_block_status" to work around
performance regression when backing up large RBD disk
-- Proxmox Support Team <support@proxmox.com> Thu, 19 May 2022 09:24:45 +0200
pve-qemu-kvm (6.2.0-7) bullseye; urgency=medium
* Proxmox Backup Server namespace support
-- Proxmox Support Team <support@proxmox.com> Thu, 12 May 2022 16:05:56 +0200
pve-qemu-kvm (6.2.0-6) bullseye; urgency=medium
* block/gluster: correctly set max_pdiscard which is int64_t to avoid
triggering assertion
* ui/vnc.c: Fixed a deadlock bug
* display/qxl-render: fix race condition in qxl_cursor (CVE-2021-4207) and
integer overflow in cursor_alloc (CVE-2021-4206)
-- Proxmox Support Team <support@proxmox.com> Wed, 11 May 2022 10:42:53 +0200
pve-qemu-kvm (6.2.0-5) bullseye; urgency=medium
* vma: allow partial restore by skipping some disk
-- Proxmox Support Team <support@proxmox.com> Mon, 25 Apr 2022 10:13:46 +0200
pve-qemu-kvm (6.2.0-4) bullseye; urgency=medium
* d/control: add libgbm to build dependencies
* d/control: add suggest dependency-hint for libgl1
* various stable backports:
+ virtio-net: fix map leaking on error during receive
+ memory: Fix incorrect calls of log_global_start/stop
+ acpi: fix OEM ID/OEM Table ID padding
+ vhost-vsock: detach the virqueue element in case of error
+ vhost-user: remove VirtQ notifier restore
+ vhost-user: fix VirtQ notifier cleanup
+ virtio: fix the condition for iommu_platform not supported
-- Proxmox Support Team <support@proxmox.com> Fri, 22 Apr 2022 11:52:30 +0200
pve-qemu-kvm (6.2.0-3) bullseye; urgency=medium
* cherry-pick fix for some manually added ACPI table SLIC entries via the
custom args flag.
-- Proxmox Support Team <support@proxmox.com> Fri, 15 Apr 2022 09:09:37 +0200
pve-qemu-kvm (6.2.0-2) bullseye; urgency=medium
* compile in virgl support
* enable zstd support
* drop sdl dependency (it was disabled at compile time already)
* recommend 'numactl'
* fix an issue with multi-disk backups where chunks would be written
multiple times
-- Proxmox Support Team <support@proxmox.com> Thu, 03 Mar 2022 12:03:44 +0100
pve-qemu-kvm (6.2.0-1) bullseye; urgency=medium
* update to QEMU stable release 6.2.0
-- Proxmox Support Team <support@proxmox.com> Thu, 17 Feb 2022 06:23:14 +0100
pve-qemu-kvm (6.1.1-2) bullseye; urgency=medium
* vma: create: register all streams before entering coroutines to avoid that
an early stream starts to write already before all got registered.
-- Proxmox Support Team <support@proxmox.com> Mon, 14 Feb 2022 15:53:15 +0100
pve-qemu-kvm (6.1.1-1) bullseye; urgency=medium
* update to 6.1.1 stable release
-- Proxmox Support Team <support@proxmox.com> Thu, 13 Jan 2022 10:57:43 +0100
pve-qemu-kvm (6.1.0-3) bullseye; urgency=medium
* fix #3738: cherry-pick "block: introduce max_hw_iov for use in scsi-
generic
-- Proxmox Support Team <support@proxmox.com> Wed, 01 Dec 2021 15:35:43 +0100
pve-qemu-kvm (6.1.0-2) bullseye; urgency=medium
* avoid a possible segmentation fault during block (disk) mirror
-- Proxmox Support Team <support@proxmox.com> Tue, 16 Nov 2021 09:38:10 +0100
pve-qemu-kvm (6.1.0-1) bullseye; urgency=medium
* update to QEMU stable release 6.1.0
-- Proxmox Support Team <support@proxmox.com> Mon, 11 Oct 2021 15:15:19 +0200
pve-qemu-kvm (6.0.0-4) bullseye; urgency=medium
* drop the ancient workaround that force disabled SMM due to observing VM
hangs on old kernel versions.
* monitor/qmp: fix race with clients disconnecting early resulting in other
clients receiving a message with the (now wrong) ID of the former
-- Proxmox Support Team <support@proxmox.com> Mon, 06 Sep 2021 07:30:00 +0200
pve-qemu-kvm (6.0.0-3) bullseye; urgency=medium
* io_uring: resubmit when result is -EAGAIN
-- Proxmox Support Team <support@proxmox.com> Tue, 3 Aug 2021 15:01:31 +0200
pve-qemu-kvm (6.0.0-2) bullseye; urgency=medium
* enable io-uring support in QEMU builds
-- Proxmox Support Team <support@proxmox.com> Wed, 23 Jun 2021 11:03:54 +0200
pve-qemu-kvm (6.0.0-1) bullseye; urgency=medium
* update to QEMU stable release 6.0.0
-- Proxmox Support Team <support@proxmox.com> Fri, 28 May 2021 11:30:50 +0200
pve-qemu-kvm (5.2.0-11) bullseye; urgency=medium
* re-build for Proxmox VE 7 / Debian Bullseye
-- Proxmox Support Team <support@proxmox.com> Thu, 13 May 2021 14:03:00 +0200
pve-qemu-kvm (5.2.0-6) pve; urgency=medium
* improve the alloc-track and Proxmox Backup Server special block driver when
used with IO-threads for the new live-restore feature
-- Proxmox Support Team <support@proxmox.com> Thu, 15 Apr 2021 16:29:48 +0200
pve-qemu-kvm (5.2.0-5) pve; urgency=medium
* cherry-pick fixes for a possible deadlock when resizing a disk while using
IO-threads
-- Proxmox Support Team <support@proxmox.com> Tue, 30 Mar 2021 18:18:18 +0200
pve-qemu-kvm (5.2.0-4) pve; urgency=medium
* monitor/qmp: fix race on CHR_EVENT_CLOSED without OOB
* improve saving and loading dirty bitmaps in live-snapshots
-- Proxmox Support Team <support@proxmox.com> Tue, 23 Mar 2021 15:41:26 +0100
pve-qemu-kvm (5.2.0-3) pve; urgency=medium
* backport "i386/acpi: restore device paths for pre-5.1 vms" patch
* ship list of 'i440fx' and 'q35' machine versions this QEMU version supports
-- Proxmox Support Team <support@proxmox.com> Fri, 05 Mar 2021 16:23:06 +0100
pve-qemu-kvm (5.2.0-2) pve; urgency=medium
* Proxmox Backup restore: ensure all caches are flushed for userspace
attached storages (for example, Ceph when librbd and not KRBD is used)
* ship VirtIOFSd daemon
-- Proxmox Support Team <support@proxmox.com> Wed, 24 Feb 2021 18:25:14 +0100
pve-qemu-kvm (5.2.0-1) pve; urgency=medium
* update to QEMU stable release 5.2.0
* fix #3084: fall back to open-iscsi initiatorname
* fix snapshot abort and improve performance in some edge cases
* add basis for Proxmox Backup Server master key support
-- Proxmox Support Team <support@proxmox.com> Fri, 12 Feb 2021 12:09:59 +0100
pve-qemu-kvm (5.1.0-8) pve; urgency=medium
* disable jemalloc, as it does not play nice with our library written in
rust
* fix #3225: properly cancel jobs in 'created' state when cancelling or
failing backup jobs
-- Proxmox Support Team <support@proxmox.com> Thu, 07 Jan 2021 10:27:33 +0100
pve-qemu-kvm (5.1.0-7) pve; urgency=medium
* allow to query the loaded Proxmox Backup library version
-- Proxmox Support Team <support@proxmox.com> Wed, 25 Nov 2020 14:09:16 +0100
pve-qemu-kvm (5.1.0-6) pve; urgency=medium
* migration/block-dirty-bitmap: avoid telling QEMU that the bitmap migration
is active longer than required
-- Proxmox Support Team <support@proxmox.com> Thu, 05 Nov 2020 18:59:40 +0100
pve-qemu-kvm (5.1.0-5) pve; urgency=medium
* migration/block-dirty-bitmap: migrate other bitmaps even if one fails
-- Proxmox Support Team <support@proxmox.com> Wed, 04 Nov 2020 18:36:32 +0100
pve-qemu-kvm (5.1.0-4) pve; urgency=medium
* several fixes for backup abort edgecases and error reporting
* allow to migrate dirty-bitmap and Proxmox Backup Server state
-- Proxmox Support Team <support@proxmox.com> Thu, 29 Oct 2020 18:09:16 +0100
pve-qemu-kvm (5.1.0-3) pve; urgency=medium
* backup: make more use of coroutines and do not block on finishing
* backup: use transactions to synchronize the disk job states
-- Proxmox Support Team <support@proxmox.com> Tue, 29 Sep 2020 09:22:56 +0200
pve-qemu-kvm (5.1.0-2) pve; urgency=medium
* cherry-pick fix to harden checks for USB devices (CVE-2020-14364)
* work around #3002: revert "qemu-img convert: Don't pre-zero images" as it
correlates with an issue when using LVM as a target storage for disk move
operations.
-- Proxmox Support Team <support@proxmox.com> Mon, 14 Sep 2020 20:01:24 +0200
pve-qemu-kvm (5.1.0-1) pve; urgency=medium
* update to QEMU 5.1.0
-- Proxmox Support Team <support@proxmox.com> Thu, 20 Aug 2020 13:42:10 +0200
pve-qemu-kvm (5.0.0-13) pve; urgency=medium
* improve zero block handling for PBS backups
* allow querying a more detailed dirty-bitmap state per VM disk
-- Proxmox Support Team <support@proxmox.com> Wed, 19 Aug 2020 18:19:26 +0200
pve-qemu-kvm (5.0.0-12) pve; urgency=medium
* patch for possible DOS in qemu network packet processing
* fix PBS write callback with big blocks
-- Proxmox Support Team <support@proxmox.com> Tue, 11 Aug 2020 11:29:13 +0200
pve-qemu-kvm (5.0.0-11) pve; urgency=medium
* improve dirty-bitmap Proxmox Backup Server backup with multiple drives
-- Proxmox Support Team <support@proxmox.com> Tue, 14 Jul 2020 12:44:25 +0200
pve-qemu-kvm (5.0.0-10) pve; urgency=medium
* fix compression and encryption related backup parameters not being passed
on from the HMP command properly
-- Proxmox Support Team <support@proxmox.com> Fri, 10 Jul 2020 13:32:11 +0200
pve-qemu-kvm (5.0.0-9) pve; urgency=medium
* adapt to new compress and encrypt params of the proxmox-backup library
* add block driver for Proxmox Backup Server backed images
* add 'query-proxmox-support' QMP command
-- Proxmox Support Team <support@proxmox.com> Thu, 09 Jul 2020 13:18:45 +0200
pve-qemu-kvm (5.0.0-8) pve; urgency=medium
* backup: rename parameter for dirty-bitmap PBS backup to "use-dirty-bitmap"
* backup: improve checking if a previous backup is available when trying to
reuse it
-- Proxmox Support Team <support@proxmox.com> Mon, 06 Jul 2020 22:58:45 +0200
pve-qemu-kvm (5.0.0-6) pve; urgency=medium
* backup: improve query-backup information and remove all dirty-bitmaps on
failed drive-job
-- Proxmox Support Team <support@proxmox.com> Fri, 03 Jul 2020 17:00:30 +0200
pve-qemu-kvm (5.0.0-5) pve; urgency=medium
* fix backup for not 64k-aligned storages
* fix #2794: Include legacy-igd passthrough fix
* add initial support for incremental backup for running VMs and
Proxmox Backup Server as a target
-- Proxmox Support Team <support@proxmox.com> Tue, 30 Jun 2020 11:12:55 +0200
pve-qemu-kvm (5.0.0-4) pve; urgency=medium
* install missing restore helper binary
-- Proxmox Support Team <support@proxmox.com> Sat, 30 May 2020 15:25:38 +0200
pve-qemu-kvm (5.0.0-3) pve; urgency=medium
* ensure that a data-flush for all drives uses the correct AioContext. Fixes
a potential VM hang happening on some storage types if IOThreads are used.
-- Proxmox Support Team <support@proxmox.com> Wed, 27 May 2020 14:41:31 +0200
pve-qemu-kvm (5.0.0-2) pve; urgency=medium
* fix saving a VM-state (snapshot, suspend-to-disk) in combination with
IOThreads
* fix saving a VM-state (snapshot, suspend-to-disk) with QEMU 5.0
* try to use bigger chunks for saving a VM-state to improve performance on
storage backends like Ceph RBD

17
debian/control vendored
View File

@@ -10,25 +10,34 @@ Build-Depends: autotools-dev,
libattr1-dev,
libcap-ng-dev,
libcurl4-gnutls-dev,
libepoxy-dev,
libfdt-dev,
libgbm-dev,
libglusterfs-dev (>= 5.2-2),
libgnutls28-dev,
libiscsi-dev (>= 1.12.0),
libjemalloc-dev,
libjpeg-dev,
libjson-perl,
libnuma-dev,
libpci-dev,
libpixman-1-dev,
libproxmox-backup-qemu0-dev,
libproxmox-backup-qemu0-dev (>= 1.3.0-1),
librbd-dev (>= 0.48),
libsdl1.2-dev,
libseccomp-dev,
libspice-protocol-dev (>= 0.12.14~),
libspice-server-dev (>= 0.14.0~),
libsystemd-dev,
liburing-dev,
libusb-1.0-0-dev (>= 1.0.17-1),
libusbredirparser-dev (>= 0.6-2),
libvirglrenderer-dev,
libzstd-dev,
meson,
python3-minimal,
python3-sphinx,
python3-sphinx-rtd-theme,
quilt,
texi2html,
texinfo,
@@ -40,7 +49,6 @@ Package: pve-qemu-kvm
Architecture: any
Depends: ceph-common (>= 0.48),
iproute2,
libaio1,
libgfapi0 | glusterfs-common (>= 5.6),
libgfchangelog0 | glusterfs-common (>= 5.6),
libgfdb0 | glusterfs-common (>= 5.6),
@@ -51,15 +59,14 @@ Depends: ceph-common (>= 0.48),
libiscsi4 (>= 1.12.0) | libiscsi7,
libjemalloc2,
libjpeg62-turbo,
libsdl1.2debian,
libspice-server1 (>= 0.14.0~),
libusb-1.0-0 (>= 1.0.17-1),
libusbredirparser1 (>= 0.6-2),
libuuid1,
numactl,
python,
${misc:Depends},
${shlibs:Depends},
Recommends: numactl
Suggests: libgl1
Conflicts: kvm,
pve-kvm,
pve-qemu-kvm-2.6.18,

2
debian/copyright vendored
View File

@@ -25,7 +25,7 @@ License:
In particular, the QEMU virtual CPU core library (libqemu.a) is
released under the GNU Lesser General Public License version 2 or later.
On Debian systems, the complete text of the GNU Lesser General Public
On Debian systems, the complete text of the GNU Lesser General Public
License can be found in the file /usr/share/common-licenses/LGPL.
Some hardware device emulation sources and other QEMU functionality are

27
debian/parse-machines.pl vendored Executable file
View File

@@ -0,0 +1,27 @@
#!/usr/bin/perl
use warnings;
use strict;
use JSON;
my $machines = [];
while (<STDIN>) {
if (/^\s*Supported machines are:/) {
next;
}
s/^\s+//;
my @machine = split(/\s+/);
next if $machine[0] !~ m/^pc-(i440fx|q35)-(.+)$/;
push @$machines, {
'id' => $machine[0],
'type' => $1,
'version' => $2,
};
}
die "no QEMU machine types detected from STDIN input" if scalar (@$machines) <= 0;
print to_json($machines, { utf8 => 1 }) or die "$!\n";

View File

@@ -26,19 +26,20 @@ Suggested-by: Ma Haocong <mahaocong@didichuxing.com>
Signed-off-by: Ma Haocong <mahaocong@didichuxing.com>
Signed-off-by: John Snow <jsnow@redhat.com>
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/mirror.c | 98 ++++++++++++++++++++++++++++++-------
blockdev.c | 39 +++++++++++++--
include/block/block_int.h | 4 +-
qapi/block-core.json | 29 +++++++++--
tests/test-block-iothread.c | 4 +-
block/mirror.c | 98 +++++++++++++++++++++++++-------
blockdev.c | 39 ++++++++++++-
include/block/block_int.h | 4 +-
qapi/block-core.json | 29 ++++++++--
tests/unit/test-block-iothread.c | 4 +-
5 files changed, 145 insertions(+), 29 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index c26fd9260d..3c9cd42c50 100644
index efec2c7674..f7804638f9 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -49,7 +49,7 @@ typedef struct MirrorBlockJob {
@@ -50,7 +50,7 @@ typedef struct MirrorBlockJob {
BlockDriverState *to_replace;
/* Used to block operations on the drive-mirror-replace target */
Error *replace_blocker;
@@ -56,17 +57,17 @@ index c26fd9260d..3c9cd42c50 100644
BdrvDirtyBitmap *dirty_bitmap;
BdrvDirtyBitmapIter *dbi;
uint8_t *buf;
@@ -676,7 +678,8 @@ static int mirror_exit_common(Job *job)
@@ -695,7 +697,8 @@ static int mirror_exit_common(Job *job)
bdrv_child_refresh_perms(mirror_top_bs, mirror_top_bs->backing,
&error_abort);
if (!abort && s->backing_mode == MIRROR_SOURCE_BACKING_CHAIN) {
- BlockDriverState *backing = s->is_none_mode ? src : s->base;
+ BlockDriverState *backing;
+ backing = s->sync_mode == MIRROR_SYNC_MODE_NONE ? src : s->base;
if (backing_bs(target_bs) != backing) {
bdrv_set_backing_hd(target_bs, backing, &local_err);
if (local_err) {
@@ -771,6 +774,16 @@ static void mirror_abort(Job *job)
BlockDriverState *unfiltered_target = bdrv_skip_filters(target_bs);
if (bdrv_cow_bs(unfiltered_target) != backing) {
@@ -800,6 +803,16 @@ static void mirror_abort(Job *job)
assert(ret == 0);
}
@@ -83,7 +84,7 @@ index c26fd9260d..3c9cd42c50 100644
static void coroutine_fn mirror_throttle(MirrorBlockJob *s)
{
int64_t now = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -949,7 +962,8 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
@@ -979,7 +992,8 @@ static int coroutine_fn mirror_run(Job *job, Error **errp)
mirror_free_init(s);
s->last_pause_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
@@ -93,23 +94,23 @@ index c26fd9260d..3c9cd42c50 100644
ret = mirror_dirty_init(s);
if (ret < 0 || job_is_cancelled(&s->common.job)) {
goto immediate_exit;
@@ -1181,6 +1195,7 @@ static const BlockJobDriver mirror_job_driver = {
@@ -1221,6 +1235,7 @@ static const BlockJobDriver mirror_job_driver = {
.run = mirror_run,
.prepare = mirror_prepare,
.abort = mirror_abort,
+ .clean = mirror_clean,
.pause = mirror_pause,
.complete = mirror_complete,
},
@@ -1196,6 +1211,7 @@ static const BlockJobDriver commit_active_job_driver = {
.cancel = mirror_cancel,
@@ -1237,6 +1252,7 @@ static const BlockJobDriver commit_active_job_driver = {
.run = mirror_run,
.prepare = mirror_prepare,
.abort = mirror_abort,
+ .clean = mirror_clean,
.pause = mirror_pause,
.complete = mirror_complete,
},
@@ -1542,7 +1558,10 @@ static BlockJob *mirror_start_job(
.cancel = commit_active_cancel,
@@ -1602,7 +1618,10 @@ static BlockJob *mirror_start_job(
BlockCompletionFunc *cb,
void *opaque,
const BlockJobDriver *driver,
@@ -121,12 +122,11 @@ index c26fd9260d..3c9cd42c50 100644
bool auto_complete, const char *filter_node_name,
bool is_mirror, MirrorCopyMode copy_mode,
Error **errp)
@@ -1555,10 +1574,39 @@ static BlockJob *mirror_start_job(
Error *local_err = NULL;
@@ -1614,10 +1633,39 @@ static BlockJob *mirror_start_job(
uint64_t target_perms, target_shared_perms;
int ret;
- if (granularity == 0) {
- granularity = bdrv_get_default_bitmap_granularity(target);
+ if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
+ error_setg(errp, "Sync mode '%s' not supported",
+ MirrorSyncMode_str(sync_mode));
@@ -147,8 +147,8 @@ index c26fd9260d..3c9cd42c50 100644
+ "sync mode '%s' is not compatible with bitmaps",
+ MirrorSyncMode_str(sync_mode));
+ return NULL;
}
+ }
+
+ if (bitmap) {
+ if (granularity) {
+ error_setg(errp, "granularity (%d)"
@@ -158,12 +158,13 @@ index c26fd9260d..3c9cd42c50 100644
+ }
+ granularity = bdrv_dirty_bitmap_granularity(bitmap);
+ } else if (granularity == 0) {
+ granularity = bdrv_get_default_bitmap_granularity(target);
+ }
granularity = bdrv_get_default_bitmap_granularity(target);
}
-
assert(is_power_of_2(granularity));
if (buf_size < 0) {
@@ -1662,7 +1710,9 @@ static BlockJob *mirror_start_job(
@@ -1755,7 +1803,9 @@ static BlockJob *mirror_start_job(
s->replaces = g_strdup(replaces);
s->on_source_error = on_source_error;
s->on_target_error = on_target_error;
@@ -174,7 +175,7 @@ index c26fd9260d..3c9cd42c50 100644
s->backing_mode = backing_mode;
s->zero_target = zero_target;
s->copy_mode = copy_mode;
@@ -1682,6 +1732,18 @@ static BlockJob *mirror_start_job(
@@ -1776,6 +1826,18 @@ static BlockJob *mirror_start_job(
bdrv_disable_dirty_bitmap(s->dirty_bitmap);
}
@@ -193,7 +194,7 @@ index c26fd9260d..3c9cd42c50 100644
ret = block_job_add_bdrv(&s->common, "source", bs, 0,
BLK_PERM_WRITE_UNCHANGED | BLK_PERM_WRITE |
BLK_PERM_CONSISTENT_READ,
@@ -1735,6 +1797,9 @@ fail:
@@ -1853,6 +1915,9 @@ fail:
if (s->dirty_bitmap) {
bdrv_release_dirty_bitmap(s->dirty_bitmap);
}
@@ -203,7 +204,7 @@ index c26fd9260d..3c9cd42c50 100644
job_early_fail(&s->common.job);
}
@@ -1752,29 +1817,23 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
@@ -1870,29 +1935,23 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
BlockDriverState *target, const char *replaces,
int creation_flags, int64_t speed,
uint32_t granularity, int64_t buf_size,
@@ -227,7 +228,7 @@ index c26fd9260d..3c9cd42c50 100644
- return;
- }
- is_none_mode = mode == MIRROR_SYNC_MODE_NONE;
base = mode == MIRROR_SYNC_MODE_TOP ? backing_bs(bs) : NULL;
base = mode == MIRROR_SYNC_MODE_TOP ? bdrv_backing_chain_next(bs) : NULL;
mirror_start_job(job_id, bs, creation_flags, target, replaces,
speed, granularity, buf_size, backing_mode, zero_target,
on_source_error, on_target_error, unmap, NULL, NULL,
@@ -238,7 +239,7 @@ index c26fd9260d..3c9cd42c50 100644
}
BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
@@ -1800,7 +1859,8 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
@@ -1917,7 +1976,8 @@ BlockJob *commit_active_start(const char *job_id, BlockDriverState *bs,
job_id, bs, creation_flags, base, NULL, speed, 0, 0,
MIRROR_LEAVE_BACKING_CHAIN, false,
on_error, on_error, true, cb, opaque,
@@ -246,13 +247,13 @@ index c26fd9260d..3c9cd42c50 100644
+ &commit_active_job_driver, MIRROR_SYNC_MODE_FULL,
+ NULL, 0, base, auto_complete,
filter_node_name, false, MIRROR_COPY_MODE_BACKGROUND,
&local_err);
if (local_err) {
errp);
if (!job) {
diff --git a/blockdev.c b/blockdev.c
index f391c3b3c7..bbeff9c439 100644
index b35072644e..9940116fe0 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3159,6 +3159,10 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -2956,6 +2956,10 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
BlockDriverState *target,
bool has_replaces, const char *replaces,
enum MirrorSyncMode sync,
@@ -263,15 +264,15 @@ index f391c3b3c7..bbeff9c439 100644
BlockMirrorBackingMode backing_mode,
bool zero_target,
bool has_speed, int64_t speed,
@@ -3177,6 +3181,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
Error **errp)
@@ -2975,6 +2979,7 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
{
BlockDriverState *unfiltered_bs;
int job_flags = JOB_DEFAULT;
+ BdrvDirtyBitmap *bitmap = NULL;
if (!has_speed) {
speed = 0;
@@ -3231,6 +3236,29 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -3029,6 +3034,29 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
sync = MIRROR_SYNC_MODE_FULL;
}
@@ -298,10 +299,10 @@ index f391c3b3c7..bbeff9c439 100644
+ }
+ }
+
if (has_replaces) {
BlockDriverState *to_replace_bs;
AioContext *replace_aio_context;
@@ -3268,8 +3296,8 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
if (!has_replaces) {
/* We want to mirror from @bs, but keep implicit filters on top */
unfiltered_bs = bdrv_skip_implicit_filters(bs);
@@ -3075,8 +3103,8 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
* and will allow to check whether the node still exist at mirror completion
*/
mirror_start(job_id, bs, target,
@@ -312,7 +313,7 @@ index f391c3b3c7..bbeff9c439 100644
on_source_error, on_target_error, unmap, filter_node_name,
copy_mode, errp);
}
@@ -3410,6 +3438,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
@@ -3221,6 +3249,8 @@ void qmp_drive_mirror(DriveMirror *arg, Error **errp)
blockdev_mirror_common(arg->has_job_id ? arg->job_id : NULL, bs, target_bs,
arg->has_replaces, arg->replaces, arg->sync,
@@ -321,7 +322,7 @@ index f391c3b3c7..bbeff9c439 100644
backing_mode, zero_target,
arg->has_speed, arg->speed,
arg->has_granularity, arg->granularity,
@@ -3432,6 +3462,8 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
@@ -3242,6 +3272,8 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
const char *device, const char *target,
bool has_replaces, const char *replaces,
MirrorSyncMode sync,
@@ -330,7 +331,7 @@ index f391c3b3c7..bbeff9c439 100644
bool has_speed, int64_t speed,
bool has_granularity, uint32_t granularity,
bool has_buf_size, int64_t buf_size,
@@ -3482,7 +3514,8 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
@@ -3291,7 +3323,8 @@ void qmp_blockdev_mirror(bool has_job_id, const char *job_id,
}
blockdev_mirror_common(has_job_id ? job_id : NULL, bs, target_bs,
@@ -341,10 +342,10 @@ index f391c3b3c7..bbeff9c439 100644
has_granularity, granularity,
has_buf_size, buf_size,
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 6d234f1de9..180a5e00fd 100644
index f4c75e8ba9..ee0aeb1414 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1210,7 +1210,9 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
@@ -1287,7 +1287,9 @@ void mirror_start(const char *job_id, BlockDriverState *bs,
BlockDriverState *target, const char *replaces,
int creation_flags, int64_t speed,
uint32_t granularity, int64_t buf_size,
@@ -356,10 +357,10 @@ index 6d234f1de9..180a5e00fd 100644
BlockdevOnError on_source_error,
BlockdevOnError on_target_error,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 97d1f64636..8bdbccb397 100644
index 1d3dd9cb48..da5dca1e3b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -2054,10 +2054,19 @@
@@ -1995,10 +1995,19 @@
# (all the disk, only the sectors allocated in the topmost image, or
# only new I/O).
#
@@ -380,7 +381,7 @@ index 97d1f64636..8bdbccb397 100644
#
# @buf-size: maximum amount of data in flight from source to
# target (since 1.4).
@@ -2095,7 +2104,9 @@
@@ -2036,7 +2045,9 @@
{ 'struct': 'DriveMirror',
'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
'*format': 'str', '*node-name': 'str', '*replaces': 'str',
@@ -391,7 +392,7 @@ index 97d1f64636..8bdbccb397 100644
'*speed': 'int', '*granularity': 'uint32',
'*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
'*on-target-error': 'BlockdevOnError',
@@ -2362,10 +2373,19 @@
@@ -2308,10 +2319,19 @@
# (all the disk, only the sectors allocated in the topmost image, or
# only new I/O).
#
@@ -412,7 +413,7 @@ index 97d1f64636..8bdbccb397 100644
#
# @buf-size: maximum amount of data in flight from source to
# target
@@ -2414,7 +2434,8 @@
@@ -2360,7 +2380,8 @@
{ 'command': 'blockdev-mirror',
'data': { '*job-id': 'str', 'device': 'str', 'target': 'str',
'*replaces': 'str',
@@ -422,11 +423,11 @@ index 97d1f64636..8bdbccb397 100644
'*speed': 'int', '*granularity': 'uint32',
'*buf-size': 'int', '*on-source-error': 'BlockdevOnError',
'*on-target-error': 'BlockdevOnError',
diff --git a/tests/test-block-iothread.c b/tests/test-block-iothread.c
index 0c861809f0..da87a67a57 100644
--- a/tests/test-block-iothread.c
+++ b/tests/test-block-iothread.c
@@ -611,8 +611,8 @@ static void test_propagate_mirror(void)
diff --git a/tests/unit/test-block-iothread.c b/tests/unit/test-block-iothread.c
index aea660aeed..22b9770a3e 100644
--- a/tests/unit/test-block-iothread.c
+++ b/tests/unit/test-block-iothread.c
@@ -626,8 +626,8 @@ static void test_propagate_mirror(void)
/* Start a mirror job */
mirror_start("job0", src, target, NULL, JOB_DEFAULT, 0, 0, 0,

View File

@@ -18,15 +18,16 @@ incremental backup modes; we can use this bitmap to later refresh a
successfully created mirror.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/mirror.c | 24 ++++++++++++++++++------
1 file changed, 18 insertions(+), 6 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index 3c9cd42c50..08ac9827f2 100644
index f7804638f9..4f5f74e2cf 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -653,8 +653,6 @@ static int mirror_exit_common(Job *job)
@@ -672,8 +672,6 @@ static int mirror_exit_common(Job *job)
bdrv_unfreeze_backing_chain(mirror_top_bs, target_bs);
}
@@ -35,7 +36,7 @@ index 3c9cd42c50..08ac9827f2 100644
/* Make sure that the source BDS doesn't go away during bdrv_replace_node,
* before we can call bdrv_drained_end */
bdrv_ref(src);
@@ -752,6 +750,18 @@ static int mirror_exit_common(Job *job)
@@ -781,6 +779,18 @@ static int mirror_exit_common(Job *job)
blk_set_perm(bjob->blk, 0, BLK_PERM_ALL, &error_abort);
blk_insert_bs(bjob->blk, mirror_top_bs, &error_abort);
@@ -54,7 +55,7 @@ index 3c9cd42c50..08ac9827f2 100644
bs_opaque->job = NULL;
bdrv_drained_end(src);
@@ -1584,10 +1594,6 @@ static BlockJob *mirror_start_job(
@@ -1643,10 +1653,6 @@ static BlockJob *mirror_start_job(
" sync mode",
MirrorSyncMode_str(sync_mode));
return NULL;
@@ -65,7 +66,7 @@ index 3c9cd42c50..08ac9827f2 100644
}
} else if (bitmap) {
error_setg(errp,
@@ -1604,6 +1610,12 @@ static BlockJob *mirror_start_job(
@@ -1663,6 +1669,12 @@ static BlockJob *mirror_start_job(
return NULL;
}
granularity = bdrv_dirty_bitmap_granularity(bitmap);

View File

@@ -10,15 +10,16 @@ as one without the other does not make much sense with the current set
of modes.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
blockdev.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/blockdev.c b/blockdev.c
index bbeff9c439..fa3c2f5548 100644
index 9940116fe0..b113e57d68 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3257,6 +3257,9 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -3055,6 +3055,9 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_ALLOW_RO, errp)) {
return;
}
@@ -27,4 +28,4 @@ index bbeff9c439..fa3c2f5548 100644
+ return;
}
if (has_replaces) {
if (!has_replaces) {

View File

@@ -10,15 +10,16 @@ since sync_bitmap is busy at the point of merging, and we checked access
beforehand.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/mirror.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
block/mirror.c | 11 ++++-------
1 file changed, 4 insertions(+), 7 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index 08ac9827f2..c56b0f87e3 100644
index 4f5f74e2cf..7024f3bbf0 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -756,8 +756,8 @@ static int mirror_exit_common(Job *job)
@@ -785,8 +785,8 @@ static int mirror_exit_common(Job *job)
job->ret == 0 && ret == 0)) {
/* Success; synchronize copy back to sync. */
bdrv_clear_dirty_bitmap(s->sync_bitmap, NULL);
@@ -29,14 +30,17 @@ index 08ac9827f2..c56b0f87e3 100644
}
}
bdrv_release_dirty_bitmap(s->dirty_bitmap);
@@ -1749,8 +1749,8 @@ static BlockJob *mirror_start_job(
@@ -1843,11 +1843,8 @@ static BlockJob *mirror_start_job(
}
if (s->sync_mode == MIRROR_SYNC_MODE_BITMAP) {
- bdrv_merge_dirty_bitmap(s->dirty_bitmap, s->sync_bitmap,
- NULL, &local_err);
- if (local_err) {
- goto fail;
- }
+ bdrv_dirty_bitmap_merge_internal(s->dirty_bitmap, s->sync_bitmap,
+ NULL, true);
if (local_err) {
goto fail;
}
}
ret = block_job_add_bdrv(&s->common, "source", bs, 0,

View File

@@ -20,11 +20,11 @@ intentionally keeping copyright and ownership of original test case to
honor provenance.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
tests/qemu-iotests/384 | 547 +++++++
tests/qemu-iotests/384.out | 2846 ++++++++++++++++++++++++++++++++++++
tests/qemu-iotests/group | 1 +
3 files changed, 3394 insertions(+)
2 files changed, 3393 insertions(+)
create mode 100755 tests/qemu-iotests/384
create mode 100644 tests/qemu-iotests/384.out
@@ -3433,15 +3433,3 @@ index 0000000000..9b7408b6d6
+{"execute": "blockdev-mirror", "arguments": {"bitmap": "bitmap0", "device": "drive0", "filter-node-name": "mirror-top", "job-id": "api_job", "sync": "none", "target": "mirror_target"}}
+{"error": {"class": "GenericError", "desc": "bitmap-mode must be specified if a bitmap is provided"}}
+
diff --git a/tests/qemu-iotests/group b/tests/qemu-iotests/group
index 435dccd5af..939efd9c70 100644
--- a/tests/qemu-iotests/group
+++ b/tests/qemu-iotests/group
@@ -270,6 +270,7 @@
253 rw quick
254 rw backing quick
255 rw quick
+384 rw
256 rw auto quick
257 rw
258 rw quick

View File

@@ -11,6 +11,7 @@ mode was never available for drive-mirror, it makes the interface more
uniform w.r.t. backup block jobs.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/mirror.c | 28 +++------------
blockdev.c | 29 +++++++++++++++
@@ -18,11 +19,11 @@ Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
3 files changed, 70 insertions(+), 59 deletions(-)
diff --git a/block/mirror.c b/block/mirror.c
index c56b0f87e3..dbba6fc80e 100644
index 7024f3bbf0..6211ff22fc 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1584,31 +1584,13 @@ static BlockJob *mirror_start_job(
Error *local_err = NULL;
@@ -1643,31 +1643,13 @@ static BlockJob *mirror_start_job(
uint64_t target_perms, target_shared_perms;
int ret;
- if (sync_mode == MIRROR_SYNC_MODE_INCREMENTAL) {
@@ -59,10 +60,10 @@ index c56b0f87e3..dbba6fc80e 100644
if (bitmap_mode != BITMAP_SYNC_MODE_NEVER) {
diff --git a/blockdev.c b/blockdev.c
index fa3c2f5548..206de2b6c2 100644
index b113e57d68..4be0863050 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3236,7 +3236,36 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
@@ -3034,7 +3034,36 @@ static void blockdev_mirror_common(const char *job_id, BlockDriverState *bs,
sync = MIRROR_SYNC_MODE_FULL;
}

View File

@@ -0,0 +1,206 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Mon, 23 Aug 2021 11:28:32 +0200
Subject: [PATCH] monitor/qmp: fix race with clients disconnecting early
The following sequence can produce a race condition that results in
responses meant for different clients being sent to the wrong one:
(QMP, no OOB)
1) client A connects
2) client A sends 'qmp_capabilities'
3) 'qmp_dispatch' runs in coroutine, schedules out to
'do_qmp_dispatch_bh' and yields
4) client A disconnects (i.e. aborts, crashes, etc...)
5) client B connects
6) 'do_qmp_dispatch_bh' runs 'qmp_capabilities' and wakes calling coroutine
7) capabilities are now set and 'mon->commands' is set to '&qmp_commands'
8) 'qmp_dispatch' returns to 'monitor_qmp_dispatch'
9) success message is sent to client B *without it ever having sent
'qmp_capabilities' itself*
9a) even if client B ignores it, it will now presumably send it's own
greeting, which will error because caps are already set
The fix proposed here uses an atomic, sequential connection number
stored in the MonitorQMP struct, which is incremented everytime a new
client connects. Since it is not changed on CHR_EVENT_CLOSED, the
behaviour of allowing a client to disconnect only one side of the
connection is retained.
The connection_nr needs to be exposed outside of the monitor subsystem,
since qmp_dispatch lives in qapi code. It needs to be checked twice,
once for actually running the command in the BH (fixes 7), and once for
sending back a response (fixes 9).
This satisfies my local reproducer - using multiple clients constantly
looping to open a connection, send the greeting, then exiting no longer
crashes other, normally behaving clients with unrelated responses.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
include/monitor/monitor.h | 1 +
monitor/monitor-internal.h | 7 +++++++
monitor/monitor.c | 15 +++++++++++++++
monitor/qmp.c | 15 ++++++++++++++-
qapi/qmp-dispatch.c | 21 +++++++++++++++++----
stubs/monitor-core.c | 5 +++++
6 files changed, 59 insertions(+), 5 deletions(-)
diff --git a/include/monitor/monitor.h b/include/monitor/monitor.h
index 12d395d62d..b182943324 100644
--- a/include/monitor/monitor.h
+++ b/include/monitor/monitor.h
@@ -16,6 +16,7 @@ extern QemuOptsList qemu_mon_opts;
Monitor *monitor_cur(void);
Monitor *monitor_set_cur(Coroutine *co, Monitor *mon);
bool monitor_cur_is_qmp(void);
+int monitor_get_connection_nr(const Monitor *mon);
void monitor_init_globals(void);
void monitor_init_globals_core(void);
diff --git a/monitor/monitor-internal.h b/monitor/monitor-internal.h
index 3da3f86c6a..9953e0cd2d 100644
--- a/monitor/monitor-internal.h
+++ b/monitor/monitor-internal.h
@@ -151,6 +151,13 @@ typedef struct {
QemuMutex qmp_queue_lock;
/* Input queue that holds all the parsed QMP requests */
GQueue *qmp_requests;
+
+ /*
+ * A sequential number that gets incremented on every new CHR_EVENT_OPENED.
+ * Used to avoid leftover responses in BHs from being sent to the wrong
+ * client. Access with atomics.
+ */
+ int connection_nr;
} MonitorQMP;
/**
diff --git a/monitor/monitor.c b/monitor/monitor.c
index 21c7a68758..ad9813567a 100644
--- a/monitor/monitor.c
+++ b/monitor/monitor.c
@@ -135,6 +135,21 @@ bool monitor_cur_is_qmp(void)
return cur_mon && monitor_is_qmp(cur_mon);
}
+/**
+ * If @mon is a QMP monitor, return the connection_nr, otherwise -1.
+ */
+int monitor_get_connection_nr(const Monitor *mon)
+{
+ MonitorQMP *qmp_mon;
+
+ if (!monitor_is_qmp(mon)) {
+ return -1;
+ }
+
+ qmp_mon = container_of(mon, MonitorQMP, common);
+ return qatomic_read(&qmp_mon->connection_nr);
+}
+
/**
* Is @mon is using readline?
* Note: not all HMP monitors use readline, e.g., gdbserver has a
diff --git a/monitor/qmp.c b/monitor/qmp.c
index 092c527b6f..6b8cfcf6d8 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -141,6 +141,8 @@ static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
QDict *rsp;
QDict *error;
+ int conn_nr_before = qatomic_read(&mon->connection_nr);
+
rsp = qmp_dispatch(mon->commands, req, qmp_oob_enabled(mon),
&mon->common);
@@ -156,7 +158,17 @@ static void monitor_qmp_dispatch(MonitorQMP *mon, QObject *req)
}
}
- monitor_qmp_respond(mon, rsp);
+ /*
+ * qmp_dispatch might have yielded and waited for a BH, in which case there
+ * is a chance a new client connected in the meantime - if this happened,
+ * the command will not have been executed, but we also need to ensure that
+ * we don't send back a corresponding response on a line that no longer
+ * belongs to this request.
+ */
+ if (conn_nr_before == qatomic_read(&mon->connection_nr)) {
+ monitor_qmp_respond(mon, rsp);
+ }
+
qobject_unref(rsp);
}
@@ -444,6 +456,7 @@ static void monitor_qmp_event(void *opaque, QEMUChrEvent event)
switch (event) {
case CHR_EVENT_OPENED:
+ qatomic_inc_fetch(&mon->connection_nr);
mon->commands = &qmp_cap_negotiation_commands;
monitor_qmp_caps_reset(mon);
data = qmp_greeting(mon);
diff --git a/qapi/qmp-dispatch.c b/qapi/qmp-dispatch.c
index d378bccac7..fb8936e7cd 100644
--- a/qapi/qmp-dispatch.c
+++ b/qapi/qmp-dispatch.c
@@ -118,16 +118,28 @@ typedef struct QmpDispatchBH {
QObject **ret;
Error **errp;
Coroutine *co;
+ int conn_nr;
} QmpDispatchBH;
static void do_qmp_dispatch_bh(void *opaque)
{
QmpDispatchBH *data = opaque;
- assert(monitor_cur() == NULL);
- monitor_set_cur(qemu_coroutine_self(), data->cur_mon);
- data->cmd->fn(data->args, data->ret, data->errp);
- monitor_set_cur(qemu_coroutine_self(), NULL);
+ /*
+ * A QMP monitor tracks it's client with a connection number, if this
+ * changes during the scheduling delay of this BH, we must not execute the
+ * command. Otherwise a badly placed 'qmp_capabilities' might affect the
+ * connection state of a client it was never meant for.
+ */
+ if (data->conn_nr == monitor_get_connection_nr(data->cur_mon)) {
+ assert(monitor_cur() == NULL);
+ monitor_set_cur(qemu_coroutine_self(), data->cur_mon);
+ data->cmd->fn(data->args, data->ret, data->errp);
+ monitor_set_cur(qemu_coroutine_self(), NULL);
+ } else {
+ error_setg(data->errp, "active monitor connection changed");
+ }
+
aio_co_wake(data->co);
}
@@ -232,6 +244,7 @@ QDict *qmp_dispatch(const QmpCommandList *cmds, QObject *request,
.ret = &ret,
.errp = &err,
.co = qemu_coroutine_self(),
+ .conn_nr = monitor_get_connection_nr(cur_mon),
};
aio_bh_schedule_oneshot(qemu_get_aio_context(), do_qmp_dispatch_bh,
&data);
diff --git a/stubs/monitor-core.c b/stubs/monitor-core.c
index d058a2a00d..3290b58120 100644
--- a/stubs/monitor-core.c
+++ b/stubs/monitor-core.c
@@ -13,6 +13,11 @@ Monitor *monitor_set_cur(Coroutine *co, Monitor *mon)
return NULL;
}
+int monitor_get_connection_nr(const Monitor *mon)
+{
+ return -1;
+}
+
void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
{
}

View File

@@ -0,0 +1,55 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 1 Sep 2021 16:51:04 +0200
Subject: [PATCH] monitor/hmp: add support for flag argument with value
Adds support for the "-xS" parameter type, where "-x" denotes a flag
name and the "S" suffix indicates that this flag is supposed to take an
arbitrary string parameter.
These parameters are always optional, the entry in the qdict will be
omitted if the flag is not given.
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
monitor/hmp.c | 17 ++++++++++++++++-
1 file changed, 16 insertions(+), 1 deletion(-)
diff --git a/monitor/hmp.c b/monitor/hmp.c
index b20737e63c..b29dbb1833 100644
--- a/monitor/hmp.c
+++ b/monitor/hmp.c
@@ -981,6 +981,7 @@ static QDict *monitor_parse_arguments(Monitor *mon,
{
const char *tmp = p;
int skip_key = 0;
+ int ret;
/* option */
c = *typestr++;
@@ -1003,8 +1004,22 @@ static QDict *monitor_parse_arguments(Monitor *mon,
}
if (skip_key) {
p = tmp;
+ } else if (*typestr == 'S') {
+ /* has option with string value */
+ typestr++;
+ tmp = p++;
+ while (qemu_isspace(*p)) {
+ p++;
+ }
+ ret = get_str(buf, sizeof(buf), &p);
+ if (ret < 0) {
+ monitor_printf(mon, "%s: value expected for -%c\n",
+ cmd->name, *tmp);
+ goto fail;
+ }
+ qdict_put_str(qdict, key, buf);
} else {
- /* has option */
+ /* has boolean option */
p++;
qdict_put_bool(qdict, key, true);
}

View File

@@ -0,0 +1,477 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 25 Aug 2021 11:14:13 +0200
Subject: [PATCH] monitor: refactor set/expire_password and allow VNC display
id
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
It is possible to specify more than one VNC server on the command line,
either with an explicit ID or the auto-generated ones à la "default",
"vnc2", "vnc3", ...
It is not possible to change the password on one of these extra VNC
displays though. Fix this by adding a "display" parameter to the
"set_password" and "expire_password" QMP and HMP commands.
For HMP, the display is specified using the "-d" value flag.
For QMP, the schema is updated to explicitly express the supported
variants of the commands with protocol-discriminated unions.
Suggested-by: Eric Blake <eblake@redhat.com>
Suggested-by: Markus Armbruster <armbru@redhat.com>
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
hmp-commands.hx | 24 ++++---
monitor/hmp-cmds.c | 57 +++++++++++++++-
monitor/qmp-cmds.c | 62 ++++++-----------
qapi/ui.json | 165 ++++++++++++++++++++++++++++++++++++++-------
4 files changed, 231 insertions(+), 77 deletions(-)
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 70a9136ac2..5efb47fc32 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1514,33 +1514,35 @@ ERST
{
.name = "set_password",
- .args_type = "protocol:s,password:s,connected:s?",
- .params = "protocol password action-if-connected",
+ .args_type = "protocol:s,password:s,display:-dS,connected:s?",
+ .params = "protocol password [-d display] [action-if-connected]",
.help = "set spice/vnc password",
.cmd = hmp_set_password,
},
SRST
-``set_password [ vnc | spice ] password [ action-if-connected ]``
- Change spice/vnc password. *action-if-connected* specifies what
- should happen in case a connection is established: *fail* makes the
- password change fail. *disconnect* changes the password and
+``set_password [ vnc | spice ] password [ -d display ] [ action-if-connected ]``
+ Change spice/vnc password. *display* can be used with 'vnc' to specify
+ which display to set the password on. *action-if-connected* specifies
+ what should happen in case a connection is established: *fail* makes
+ the password change fail. *disconnect* changes the password and
disconnects the client. *keep* changes the password and keeps the
connection up. *keep* is the default.
ERST
{
.name = "expire_password",
- .args_type = "protocol:s,time:s",
- .params = "protocol time",
+ .args_type = "protocol:s,time:s,display:-dS",
+ .params = "protocol time [-d display]",
.help = "set spice/vnc password expire-time",
.cmd = hmp_expire_password,
},
SRST
-``expire_password [ vnc | spice ]`` *expire-time*
- Specify when a password for spice/vnc becomes
- invalid. *expire-time* accepts:
+``expire_password [ vnc | spice ] expire-time [ -d display ]``
+ Specify when a password for spice/vnc becomes invalid.
+ *display* behaves the same as in ``set_password``.
+ *expire-time* accepts:
``now``
Invalidate password instantly.
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 9c91bf93e9..2e91ccb738 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1384,10 +1384,41 @@ void hmp_set_password(Monitor *mon, const QDict *qdict)
{
const char *protocol = qdict_get_str(qdict, "protocol");
const char *password = qdict_get_str(qdict, "password");
+ const char *display = qdict_get_try_str(qdict, "display");
const char *connected = qdict_get_try_str(qdict, "connected");
Error *err = NULL;
+ DisplayProtocol proto;
- qmp_set_password(protocol, password, !!connected, connected, &err);
+ SetPasswordOptions opts = {
+ .password = g_strdup(password),
+ .u.vnc.display = NULL,
+ };
+
+ proto = qapi_enum_parse(&DisplayProtocol_lookup, protocol,
+ DISPLAY_PROTOCOL_VNC, &err);
+ if (err) {
+ hmp_handle_error(mon, err);
+ return;
+ }
+ opts.protocol = proto;
+
+ if (proto == DISPLAY_PROTOCOL_VNC) {
+ opts.u.vnc.has_display = !!display;
+ opts.u.vnc.display = g_strdup(display);
+ } else if (proto == DISPLAY_PROTOCOL_SPICE) {
+ opts.u.spice.has_connected = !!connected;
+ opts.u.spice.connected =
+ qapi_enum_parse(&SetPasswordAction_lookup, connected,
+ SET_PASSWORD_ACTION_KEEP, &err);
+ if (err) {
+ hmp_handle_error(mon, err);
+ return;
+ }
+ }
+
+ qmp_set_password(&opts, &err);
+ g_free(opts.password);
+ g_free(opts.u.vnc.display);
hmp_handle_error(mon, err);
}
@@ -1395,9 +1426,31 @@ void hmp_expire_password(Monitor *mon, const QDict *qdict)
{
const char *protocol = qdict_get_str(qdict, "protocol");
const char *whenstr = qdict_get_str(qdict, "time");
+ const char *display = qdict_get_try_str(qdict, "display");
Error *err = NULL;
+ DisplayProtocol proto;
- qmp_expire_password(protocol, whenstr, &err);
+ ExpirePasswordOptions opts = {
+ .time = g_strdup(whenstr),
+ .u.vnc.display = NULL,
+ };
+
+ proto = qapi_enum_parse(&DisplayProtocol_lookup, protocol,
+ DISPLAY_PROTOCOL_VNC, &err);
+ if (err) {
+ hmp_handle_error(mon, err);
+ return;
+ }
+ opts.protocol = proto;
+
+ if (proto == DISPLAY_PROTOCOL_VNC) {
+ opts.u.vnc.has_display = !!display;
+ opts.u.vnc.display = g_strdup(display);
+ }
+
+ qmp_expire_password(&opts, &err);
+ g_free(opts.time);
+ g_free(opts.u.vnc.display);
hmp_handle_error(mon, err);
}
diff --git a/monitor/qmp-cmds.c b/monitor/qmp-cmds.c
index 343353e27a..729ca7cceb 100644
--- a/monitor/qmp-cmds.c
+++ b/monitor/qmp-cmds.c
@@ -167,45 +167,30 @@ void qmp_system_wakeup(Error **errp)
qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, errp);
}
-void qmp_set_password(const char *protocol, const char *password,
- bool has_connected, const char *connected, Error **errp)
+void qmp_set_password(SetPasswordOptions *opts, Error **errp)
{
- int disconnect_if_connected = 0;
- int fail_if_connected = 0;
- int rc;
+ bool disconnect_if_connected = false;
+ bool fail_if_connected = false;
+ int rc = 0;
- if (has_connected) {
- if (strcmp(connected, "fail") == 0) {
- fail_if_connected = 1;
- } else if (strcmp(connected, "disconnect") == 0) {
- disconnect_if_connected = 1;
- } else if (strcmp(connected, "keep") == 0) {
- /* nothing */
- } else {
- error_setg(errp, QERR_INVALID_PARAMETER, "connected");
- return;
- }
- }
-
- if (strcmp(protocol, "spice") == 0) {
+ if (opts->protocol == DISPLAY_PROTOCOL_SPICE) {
if (!qemu_using_spice(errp)) {
return;
}
- rc = qemu_spice.set_passwd(password, fail_if_connected,
+ if (opts->u.spice.has_connected) {
+ fail_if_connected =
+ opts->u.spice.connected == SET_PASSWORD_ACTION_FAIL;
+ disconnect_if_connected =
+ opts->u.spice.connected == SET_PASSWORD_ACTION_DISCONNECT;
+ }
+ rc = qemu_spice.set_passwd(opts->password, fail_if_connected,
disconnect_if_connected);
- } else if (strcmp(protocol, "vnc") == 0) {
- if (fail_if_connected || disconnect_if_connected) {
- /* vnc supports "connected=keep" only */
- error_setg(errp, QERR_INVALID_PARAMETER, "connected");
- return;
- }
+ } else if (opts->protocol == DISPLAY_PROTOCOL_VNC) {
/* Note that setting an empty password will not disable login through
* this interface. */
- rc = vnc_display_password(NULL, password);
- } else {
- error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "protocol",
- "'vnc' or 'spice'");
- return;
+ rc = vnc_display_password(
+ opts->u.vnc.has_display ? opts->u.vnc.display : NULL,
+ opts->password);
}
if (rc != 0) {
@@ -213,11 +198,11 @@ void qmp_set_password(const char *protocol, const char *password,
}
}
-void qmp_expire_password(const char *protocol, const char *whenstr,
- Error **errp)
+void qmp_expire_password(ExpirePasswordOptions *opts, Error **errp)
{
time_t when;
int rc;
+ const char* whenstr = opts->time;
if (strcmp(whenstr, "now") == 0) {
when = 0;
@@ -229,17 +214,14 @@ void qmp_expire_password(const char *protocol, const char *whenstr,
when = strtoull(whenstr, NULL, 10);
}
- if (strcmp(protocol, "spice") == 0) {
+ if (opts->protocol == DISPLAY_PROTOCOL_SPICE) {
if (!qemu_using_spice(errp)) {
return;
}
rc = qemu_spice.set_pw_expire(when);
- } else if (strcmp(protocol, "vnc") == 0) {
- rc = vnc_display_pw_expire(NULL, when);
- } else {
- error_setg(errp, QERR_INVALID_PARAMETER_VALUE, "protocol",
- "'vnc' or 'spice'");
- return;
+ } else if (opts->protocol == DISPLAY_PROTOCOL_VNC) {
+ rc = vnc_display_pw_expire(
+ opts->u.vnc.has_display ? opts->u.vnc.display : NULL, when);
}
if (rc != 0) {
diff --git a/qapi/ui.json b/qapi/ui.json
index d7567ac866..4244c62c30 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -9,22 +9,23 @@
{ 'include': 'common.json' }
{ 'include': 'sockets.json' }
+##
+# @DisplayProtocol:
+#
+# Display protocols which support changing password options.
+#
+# Since: 6.2
+#
+##
+{ 'enum': 'DisplayProtocol',
+ 'data': [ { 'name': 'vnc', 'if': 'CONFIG_VNC' },
+ { 'name': 'spice', 'if': 'CONFIG_SPICE' } ] }
+
##
# @set_password:
#
# Sets the password of a remote display session.
#
-# @protocol: - 'vnc' to modify the VNC server password
-# - 'spice' to modify the Spice server password
-#
-# @password: the new password
-#
-# @connected: how to handle existing clients when changing the
-# password. If nothing is specified, defaults to 'keep'
-# 'fail' to fail the command if clients are connected
-# 'disconnect' to disconnect existing clients
-# 'keep' to maintain existing clients
-#
# Returns: - Nothing on success
# - If Spice is not enabled, DeviceNotFound
#
@@ -37,16 +38,123 @@
# <- { "return": {} }
#
##
-{ 'command': 'set_password',
- 'data': {'protocol': 'str', 'password': 'str', '*connected': 'str'} }
+{ 'command': 'set_password', 'boxed': true, 'data': 'SetPasswordOptions' }
+
+##
+# @SetPasswordOptions:
+#
+# Data required to set a new password on a display server protocol.
+#
+# @protocol: - 'vnc' to modify the VNC server password
+# - 'spice' to modify the Spice server password
+#
+# @password: the new password
+#
+# Since: 6.2
+#
+##
+{ 'union': 'SetPasswordOptions',
+ 'base': { 'protocol': 'DisplayProtocol',
+ 'password': 'str' },
+ 'discriminator': 'protocol',
+ 'data': { 'vnc': 'SetPasswordOptionsVnc',
+ 'spice': 'SetPasswordOptionsSpice' } }
+
+##
+# @SetPasswordAction:
+#
+# An action to take on changing a password on a connection with active clients.
+#
+# @fail: fail the command if clients are connected
+#
+# @disconnect: disconnect existing clients
+#
+# @keep: maintain existing clients
+#
+# Since: 6.2
+#
+##
+{ 'enum': 'SetPasswordAction',
+ 'data': [ 'fail', 'disconnect', 'keep' ] }
+
+##
+# @SetPasswordActionVnc:
+#
+# See @SetPasswordAction. VNC only supports the keep action. 'connection'
+# should just be omitted for VNC, this is kept for backwards compatibility.
+#
+# @keep: maintain existing clients
+#
+# Since: 6.2
+#
+##
+{ 'enum': 'SetPasswordActionVnc',
+ 'data': [ 'keep' ] }
+
+##
+# @SetPasswordOptionsSpice:
+#
+# Options for set_password specific to the VNC procotol.
+#
+# @connected: How to handle existing clients when changing the
+# password. If nothing is specified, defaults to 'keep'.
+#
+# Since: 6.2
+#
+##
+{ 'struct': 'SetPasswordOptionsSpice',
+ 'data': { '*connected': 'SetPasswordAction' } }
+
+##
+# @SetPasswordOptionsVnc:
+#
+# Options for set_password specific to the VNC procotol.
+#
+# @display: The id of the display where the password should be changed.
+# Defaults to the first.
+#
+# @connected: How to handle existing clients when changing the
+# password.
+#
+# Features:
+# @deprecated: For VNC, @connected will always be 'keep', parameter should be
+# omitted.
+#
+# Since: 6.2
+#
+##
+{ 'struct': 'SetPasswordOptionsVnc',
+ 'data': { '*display': 'str',
+ '*connected': { 'type': 'SetPasswordActionVnc',
+ 'features': ['deprecated'] } } }
##
# @expire_password:
#
# Expire the password of a remote display server.
#
-# @protocol: the name of the remote display protocol 'vnc' or 'spice'
+# Returns: - Nothing on success
+# - If @protocol is 'spice' and Spice is not active, DeviceNotFound
#
+# Since: 0.14
+#
+# Example:
+#
+# -> { "execute": "expire_password", "arguments": { "protocol": "vnc",
+# "time": "+60" } }
+# <- { "return": {} }
+#
+##
+{ 'command': 'expire_password', 'boxed': true, 'data': 'ExpirePasswordOptions' }
+
+##
+# @ExpirePasswordOptions:
+#
+# Data required to set password expiration on a display server protocol.
+#
+# @protocol: - 'vnc' to modify the VNC server expiration
+# - 'spice' to modify the Spice server expiration
+
# @time: when to expire the password.
#
# - 'now' to expire the password immediately
@@ -54,24 +162,33 @@
# - '+INT' where INT is the number of seconds from now (integer)
# - 'INT' where INT is the absolute time in seconds
#
-# Returns: - Nothing on success
-# - If @protocol is 'spice' and Spice is not active, DeviceNotFound
-#
-# Since: 0.14
-#
# Notes: Time is relative to the server and currently there is no way to
# coordinate server time with client time. It is not recommended to
# use the absolute time version of the @time parameter unless you're
# sure you are on the same machine as the QEMU instance.
#
-# Example:
+# Since: 6.2
#
-# -> { "execute": "expire_password", "arguments": { "protocol": "vnc",
-# "time": "+60" } }
-# <- { "return": {} }
+##
+{ 'union': 'ExpirePasswordOptions',
+ 'base': { 'protocol': 'DisplayProtocol',
+ 'time': 'str' },
+ 'discriminator': 'protocol',
+ 'data': { 'vnc': 'ExpirePasswordOptionsVnc' } }
+
+##
+# @ExpirePasswordOptionsVnc:
+#
+# Options for expire_password specific to the VNC procotol.
+#
+# @display: The id of the display where the expiration should be changed.
+# Defaults to the first.
+#
+# Since: 6.2
#
##
-{ 'command': 'expire_password', 'data': {'protocol': 'str', 'time': 'str'} }
+{ 'struct': 'ExpirePasswordOptionsVnc',
+ 'data': { '*display': 'str' } }
##
# @screendump:

View File

@@ -0,0 +1,43 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Tue, 18 Jan 2022 17:59:59 +0100
Subject: [PATCH] block/io: Update BSC only if want_zero is true
We update the block-status cache whenever we get new information from a
bdrv_co_block_status() call to the block driver. However, if we have
passed want_zero=false to that call, it may flag areas containing zeroes
as data, and so we would update the block-status cache with wrong
information.
Therefore, we should not update the cache with want_zero=false.
Reported-by: Nir Soffer <nsoffer@redhat.com>
Fixes: 0bc329fbb00 ("block: block-status cache for data regions")
Reviewed-by: Nir Soffer <nsoffer@redhat.com>
Cc: qemu-stable@nongnu.org
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
Message-Id: <20220118170000.49423-2-hreitz@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Signed-off-by: Eric Blake <eblake@redhat.com>
---
block/io.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/block/io.c b/block/io.c
index bb0a254def..4e4cb556c5 100644
--- a/block/io.c
+++ b/block/io.c
@@ -2497,8 +2497,12 @@ static int coroutine_fn bdrv_co_block_status(BlockDriverState *bs,
* non-protocol nodes, and then it is never used. However, filling
* the cache requires an RCU update, so double check here to avoid
* such an update if possible.
+ *
+ * Check want_zero, because we only want to update the cache when we
+ * have accurate information about what is zero and what is data.
*/
- if (ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
+ if (want_zero &&
+ ret == (BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID) &&
QLIST_EMPTY(&bs->children))
{
/*

View File

@@ -0,0 +1,40 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Wed, 9 Feb 2022 15:02:52 +0100
Subject: [PATCH] block/nbd: Delete reconnect delay timer when done
We start the reconnect delay timer to cancel the reconnection attempt
after a while. Once nbd_co_do_establish_connection() has returned, this
attempt is over, and we no longer need the timer.
Delete it before returning from nbd_reconnect_attempt(), so that it does
not persist beyond the I/O request that was paused for reconnecting; we
do not want it to fire in a drained section, because all sort of things
can happen in such a section (e.g. the AioContext might be changed, and
we do not want the timer to fire in the wrong context; or the BDS might
even be deleted, and so the timer CB would access already-freed data).
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
---
block/nbd.c | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/block/nbd.c b/block/nbd.c
index 5ef462db1b..b8e5a9b4cc 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -353,6 +353,13 @@ static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
}
nbd_co_do_establish_connection(s->bs, NULL);
+
+ /*
+ * The reconnect attempt is done (maybe successfully, maybe not), so
+ * we no longer need this timer. Delete it so it will not outlive
+ * this I/O request (so draining removes all timers).
+ */
+ reconnect_delay_timer_del(s);
}
static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)

View File

@@ -0,0 +1,34 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Wed, 9 Feb 2022 15:02:54 +0100
Subject: [PATCH] block/nbd: Assert there are no timers when closed
Our two timers must not remain armed beyond nbd_clear_bdrvstate(), or
they will access freed data when they fire.
This patch is separate from the patches that actually fix the issue
(HEAD^^ and HEAD^) so that you can run the associated regression iotest
(281) on a configuration that reproducibly exposes the bug.
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
[FE: backport (open_timer doesn't exist yet in 6.2.0)]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
block/nbd.c | 3 +++
1 file changed, 3 insertions(+)
diff --git a/block/nbd.c b/block/nbd.c
index b8e5a9b4cc..aab20125d8 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -108,6 +108,9 @@ static void nbd_clear_bdrvstate(BlockDriverState *bs)
yank_unregister_instance(BLOCKDEV_YANK_INSTANCE(bs->node_name));
+ /* Must not leave timers behind that would access freed data */
+ assert(!s->reconnect_delay_timer);
+
object_unref(OBJECT(s->tlscreds));
qapi_free_SocketAddress(s->saddr);
s->saddr = NULL;

View File

@@ -0,0 +1,90 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Hanna Reitz <hreitz@redhat.com>
Date: Wed, 9 Feb 2022 15:02:57 +0100
Subject: [PATCH] block/nbd: Move s->ioc on AioContext change
s->ioc must always be attached to the NBD node's AioContext. If that
context changes, s->ioc must be attached to the new context.
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2033626
Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Signed-off-by: Hanna Reitz <hreitz@redhat.com>
[FE: backport (open_timer doesn't exist yet in 6.2.0)]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
block/nbd.c | 41 +++++++++++++++++++++++++++++++++++++++++
1 file changed, 41 insertions(+)
diff --git a/block/nbd.c b/block/nbd.c
index aab20125d8..a3896c7f5f 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -2003,6 +2003,38 @@ static void nbd_cancel_in_flight(BlockDriverState *bs)
nbd_co_establish_connection_cancel(s->conn);
}
+static void nbd_attach_aio_context(BlockDriverState *bs,
+ AioContext *new_context)
+{
+ BDRVNBDState *s = bs->opaque;
+
+ /*
+ * The reconnect_delay_timer is scheduled in I/O paths when the
+ * connection is lost, to cancel the reconnection attempt after a
+ * given time. Once this attempt is done (successfully or not),
+ * nbd_reconnect_attempt() ensures the timer is deleted before the
+ * respective I/O request is resumed.
+ * Since the AioContext can only be changed when a node is drained,
+ * the reconnect_delay_timer cannot be active here.
+ */
+ assert(!s->reconnect_delay_timer);
+
+ if (s->ioc) {
+ qio_channel_attach_aio_context(s->ioc, new_context);
+ }
+}
+
+static void nbd_detach_aio_context(BlockDriverState *bs)
+{
+ BDRVNBDState *s = bs->opaque;
+
+ assert(!s->reconnect_delay_timer);
+
+ if (s->ioc) {
+ qio_channel_detach_aio_context(s->ioc);
+ }
+}
+
static BlockDriver bdrv_nbd = {
.format_name = "nbd",
.protocol_name = "nbd",
@@ -2026,6 +2058,9 @@ static BlockDriver bdrv_nbd = {
.bdrv_dirname = nbd_dirname,
.strong_runtime_opts = nbd_strong_runtime_opts,
.bdrv_cancel_in_flight = nbd_cancel_in_flight,
+
+ .bdrv_attach_aio_context = nbd_attach_aio_context,
+ .bdrv_detach_aio_context = nbd_detach_aio_context,
};
static BlockDriver bdrv_nbd_tcp = {
@@ -2051,6 +2086,9 @@ static BlockDriver bdrv_nbd_tcp = {
.bdrv_dirname = nbd_dirname,
.strong_runtime_opts = nbd_strong_runtime_opts,
.bdrv_cancel_in_flight = nbd_cancel_in_flight,
+
+ .bdrv_attach_aio_context = nbd_attach_aio_context,
+ .bdrv_detach_aio_context = nbd_detach_aio_context,
};
static BlockDriver bdrv_nbd_unix = {
@@ -2076,6 +2114,9 @@ static BlockDriver bdrv_nbd_unix = {
.bdrv_dirname = nbd_dirname,
.strong_runtime_opts = nbd_strong_runtime_opts,
.bdrv_cancel_in_flight = nbd_cancel_in_flight,
+
+ .bdrv_attach_aio_context = nbd_attach_aio_context,
+ .bdrv_detach_aio_context = nbd_detach_aio_context,
};
static void bdrv_nbd_init(void)

View File

@@ -0,0 +1,89 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Igor Mammedov <imammedo@redhat.com>
Date: Mon, 27 Dec 2021 14:31:17 -0500
Subject: [PATCH] acpi: fix QEMU crash when started with SLIC table
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
if QEMU is started with used provided SLIC table blob,
-acpitable sig=SLIC,oem_id='CRASH ',oem_table_id="ME",oem_rev=00002210,asl_compiler_id="",asl_compiler_rev=00000000,data=/dev/null
it will assert with:
hw/acpi/aml-build.c:61:build_append_padded_str: assertion failed: (len <= maxlen)
and following backtrace:
...
build_append_padded_str (array=0x555556afe320, str=0x555556afdb2e "CRASH ME", maxlen=0x6, pad=0x20) at hw/acpi/aml-build.c:61
acpi_table_begin (desc=0x7fffffffd1b0, array=0x555556afe320) at hw/acpi/aml-build.c:1727
build_fadt (tbl=0x555556afe320, linker=0x555557ca3830, f=0x7fffffffd318, oem_id=0x555556afdb2e "CRASH ME", oem_table_id=0x555556afdb34 "ME") at hw/acpi/aml-build.c:2064
...
which happens due to acpi_table_begin() expecting NULL terminated
oem_id and oem_table_id strings, which is normally the case, but
in case of user provided SLIC table, oem_id points to table's blob
directly and as result oem_id became longer than expected.
Fix issue by handling oem_id consistently and make acpi_get_slic_oem()
return NULL terminated strings.
PS:
After [1] refactoring, oem_id semantics became inconsistent, where
NULL terminated string was coming from machine and old way pointer
into byte array coming from -acpitable option. That used to work
since build_header() wasn't expecting NULL terminated string and
blindly copied the 1st 6 bytes only.
However commit [2] broke that by replacing build_header() with
acpi_table_begin(), which was expecting NULL terminated string
and was checking oem_id size.
1) 602b45820 ("acpi: Permit OEM ID and OEM table ID fields to be changed")
2)
Fixes: 4b56e1e4eb08 ("acpi: build_fadt: use acpi_table_begin()/acpi_table_end() instead of build_header()")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/786
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Message-Id: <20211227193120.1084176-2-imammedo@redhat.com>
Reviewed-by: Philippe Mathieu-Daudé <philmd@redhat.com>
Tested-by: Denis Lisov <dennis.lissov@gmail.com>
Tested-by: Alexander Tsoy <alexander@tsoy.me>
Cc: qemu-stable@nongnu.org
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 8cdb99af45365727ac17f45239a9b8c1d5155c6d)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
hw/acpi/core.c | 4 ++--
hw/i386/acpi-build.c | 2 ++
2 files changed, 4 insertions(+), 2 deletions(-)
diff --git a/hw/acpi/core.c b/hw/acpi/core.c
index 1e004d0078..3e811bf03c 100644
--- a/hw/acpi/core.c
+++ b/hw/acpi/core.c
@@ -345,8 +345,8 @@ int acpi_get_slic_oem(AcpiSlicOem *oem)
struct acpi_table_header *hdr = (void *)(u - sizeof(hdr->_length));
if (memcmp(hdr->sig, "SLIC", 4) == 0) {
- oem->id = hdr->oem_id;
- oem->table_id = hdr->oem_table_id;
+ oem->id = g_strndup(hdr->oem_id, 6);
+ oem->table_id = g_strndup(hdr->oem_table_id, 8);
return 0;
}
}
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index a99c6e4fe3..570f82997b 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2721,6 +2721,8 @@ void acpi_build(AcpiBuildTables *tables, MachineState *machine)
/* Cleanup memory that's no longer used. */
g_array_free(table_offsets, true);
+ g_free(slic_oem.id);
+ g_free(slic_oem.table_id);
}
static void acpi_ram_update(MemoryRegion *mr, GArray *data)

View File

@@ -0,0 +1,38 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Jason Wang <jasowang@redhat.com>
Date: Tue, 8 Mar 2022 10:42:51 +0800
Subject: [PATCH] virtio-net: fix map leaking on error during receive
Commit bedd7e93d0196 ("virtio-net: fix use after unmap/free for sg")
tries to fix the use after free of the sg by caching the virtqueue
elements in an array and unmap them at once after receiving the
packets, But it forgot to unmap the cached elements on error which
will lead to leaking of mapping and other unexpected results.
Fixing this by detaching the cached elements on error. This addresses
CVE-2022-26353.
Reported-by: Victor Tom <vv474172261@gmail.com>
Cc: qemu-stable@nongnu.org
Fixes: CVE-2022-26353
Fixes: bedd7e93d0196 ("virtio-net: fix use after unmap/free for sg")
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit abe300d9d894f7138e1af7c8e9c88c04bfe98b37)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
hw/net/virtio-net.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index f2014d5ea0..e1f4748831 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1862,6 +1862,7 @@ static ssize_t virtio_net_receive_rcu(NetClientState *nc, const uint8_t *buf,
err:
for (j = 0; j < i; j++) {
+ virtqueue_detach_element(q->rx_vq, elems[j], lens[j]);
g_free(elems[j]);
}

View File

@@ -0,0 +1,86 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Peter Xu <peterx@redhat.com>
Date: Tue, 30 Nov 2021 16:00:28 +0800
Subject: [PATCH] memory: Fix incorrect calls of log_global_start/stop
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
We should only call the log_global_start/stop when the global dirty track
bitmask changes from zero<->non-zero.
No real issue reported for this yet probably because no immediate user to
enable both dirty rate measurement and migration at the same time. However
it'll be good to be prepared for it.
Fixes: 63b41db4bc ("memory: make global_dirty_tracking a bitmask")
Cc: qemu-stable@nongnu.org
Cc: Hyman Huang <huangy81@chinatelecom.cn>
Cc: Paolo Bonzini <pbonzini@redhat.com>
Cc: Dr. David Alan Gilbert <dgilbert@redhat.com>
Cc: Juan Quintela <quintela@redhat.com>
Cc: David Hildenbrand <david@redhat.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
Reviewed-by: David Hildenbrand <david@redhat.com>
Message-Id: <20211130080028.6474-1-peterx@redhat.com>
Signed-off-by: Philippe Mathieu-Daudé <f4bug@amsat.org>
(cherry picked from commit 7b0538ed3a22ce30817f818449d10701fb0821f9)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
softmmu/memory.c | 27 ++++++++++++++-------------
1 file changed, 14 insertions(+), 13 deletions(-)
diff --git a/softmmu/memory.c b/softmmu/memory.c
index 7340e19ff5..81d4bf1454 100644
--- a/softmmu/memory.c
+++ b/softmmu/memory.c
@@ -2773,6 +2773,8 @@ static VMChangeStateEntry *vmstate_change;
void memory_global_dirty_log_start(unsigned int flags)
{
+ unsigned int old_flags = global_dirty_tracking;
+
if (vmstate_change) {
qemu_del_vm_change_state_handler(vmstate_change);
vmstate_change = NULL;
@@ -2781,15 +2783,14 @@ void memory_global_dirty_log_start(unsigned int flags)
assert(flags && !(flags & (~GLOBAL_DIRTY_MASK)));
assert(!(global_dirty_tracking & flags));
global_dirty_tracking |= flags;
-
trace_global_dirty_changed(global_dirty_tracking);
- MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward);
-
- /* Refresh DIRTY_MEMORY_MIGRATION bit. */
- memory_region_transaction_begin();
- memory_region_update_pending = true;
- memory_region_transaction_commit();
+ if (!old_flags) {
+ MEMORY_LISTENER_CALL_GLOBAL(log_global_start, Forward);
+ memory_region_transaction_begin();
+ memory_region_update_pending = true;
+ memory_region_transaction_commit();
+ }
}
static void memory_global_dirty_log_do_stop(unsigned int flags)
@@ -2800,12 +2801,12 @@ static void memory_global_dirty_log_do_stop(unsigned int flags)
trace_global_dirty_changed(global_dirty_tracking);
- /* Refresh DIRTY_MEMORY_MIGRATION bit. */
- memory_region_transaction_begin();
- memory_region_update_pending = true;
- memory_region_transaction_commit();
-
- MEMORY_LISTENER_CALL_GLOBAL(log_global_stop, Reverse);
+ if (!global_dirty_tracking) {
+ memory_region_transaction_begin();
+ memory_region_update_pending = true;
+ memory_region_transaction_commit();
+ MEMORY_LISTENER_CALL_GLOBAL(log_global_stop, Reverse);
+ }
}
static void memory_vm_change_state_handler(void *opaque, bool running,

View File

@@ -0,0 +1,59 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Igor Mammedov <imammedo@redhat.com>
Date: Wed, 12 Jan 2022 08:03:31 -0500
Subject: [PATCH] acpi: fix OEM ID/OEM Table ID padding
Commit [2] broke original '\0' padding of OEM ID and OEM Table ID
fields in headers of ACPI tables. While it doesn't have impact on
default values since QEMU uses 6 and 8 characters long values
respectively, it broke usecase where IDs are provided on QEMU CLI.
It shouldn't affect guest (but may cause licensing verification
issues in guest OS).
One of the broken usecases is user supplied SLIC table with IDs
shorter than max possible length, where [2] mangles IDs with extra
spaces in RSDT and FADT tables whereas guest OS expects those to
mirror the respective values of the used SLIC table.
Fix it by replacing whitespace padding with '\0' padding in
accordance with [1] and expectations of guest OS
1) ACPI spec, v2.0b
17.2 AML Grammar Definition
...
//OEM ID of up to 6 characters. If the OEM ID is
//shorter than 6 characters, it can be terminated
//with a NULL character.
2)
Fixes: 602b458201 ("acpi: Permit OEM ID and OEM table ID fields to be changed")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/707
Reported-by: Dmitry V. Orekhov <dima.orekhov@gmail.com>
Signed-off-by: Igor Mammedov <imammedo@redhat.com>
Cc: qemu-stable@nongnu.org
Message-Id: <20220112130332.1648664-4-imammedo@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Reviewed-by: Ani Sinha <ani@anisinha.ca>
Tested-by: Dmitry V. Orekhov dima.orekhov@gmail.com
(cherry picked from commit 748c030f360a940fe0c9382c8ca1649096c3a80d)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
hw/acpi/aml-build.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index b3b3310df3..65148d5b9d 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1724,9 +1724,9 @@ void acpi_table_begin(AcpiTable *desc, GArray *array)
build_append_int_noprefix(array, 0, 4); /* Length */
build_append_int_noprefix(array, desc->rev, 1); /* Revision */
build_append_int_noprefix(array, 0, 1); /* Checksum */
- build_append_padded_str(array, desc->oem_id, 6, ' '); /* OEMID */
+ build_append_padded_str(array, desc->oem_id, 6, '\0'); /* OEMID */
/* OEM Table ID */
- build_append_padded_str(array, desc->oem_table_id, 8, ' ');
+ build_append_padded_str(array, desc->oem_table_id, 8, '\0');
build_append_int_noprefix(array, 1, 4); /* OEM Revision */
g_array_append_vals(array, ACPI_BUILD_APPNAME8, 4); /* Creator ID */
build_append_int_noprefix(array, 1, 4); /* Creator Revision */

View File

@@ -0,0 +1,55 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefano Garzarella <sgarzare@redhat.com>
Date: Mon, 28 Feb 2022 10:50:58 +0100
Subject: [PATCH] vhost-vsock: detach the virqueue element in case of error
In vhost_vsock_common_send_transport_reset(), if an element popped from
the virtqueue is invalid, we should call virtqueue_detach_element() to
detach it from the virtqueue before freeing its memory.
Fixes: fc0b9b0e1c ("vhost-vsock: add virtio sockets device")
Fixes: CVE-2022-26354
Cc: qemu-stable@nongnu.org
Reported-by: VictorV <vv474172261@gmail.com>
Signed-off-by: Stefano Garzarella <sgarzare@redhat.com>
Message-Id: <20220228095058.27899-1-sgarzare@redhat.com>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 8d1b247f3748ac4078524130c6d7ae42b6140aaf)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
hw/virtio/vhost-vsock-common.c | 10 +++++++---
1 file changed, 7 insertions(+), 3 deletions(-)
diff --git a/hw/virtio/vhost-vsock-common.c b/hw/virtio/vhost-vsock-common.c
index 3f3771274e..ed706681ac 100644
--- a/hw/virtio/vhost-vsock-common.c
+++ b/hw/virtio/vhost-vsock-common.c
@@ -153,19 +153,23 @@ static void vhost_vsock_common_send_transport_reset(VHostVSockCommon *vvc)
if (elem->out_num) {
error_report("invalid vhost-vsock event virtqueue element with "
"out buffers");
- goto out;
+ goto err;
}
if (iov_from_buf(elem->in_sg, elem->in_num, 0,
&event, sizeof(event)) != sizeof(event)) {
error_report("vhost-vsock event virtqueue element is too short");
- goto out;
+ goto err;
}
virtqueue_push(vq, elem, sizeof(event));
virtio_notify(VIRTIO_DEVICE(vvc), vq);
-out:
+ g_free(elem);
+ return;
+
+err:
+ virtqueue_detach_element(vq, elem, 0);
g_free(elem);
}

View File

@@ -0,0 +1,98 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Xueming Li <xuemingl@nvidia.com>
Date: Mon, 7 Feb 2022 15:19:28 +0800
Subject: [PATCH] vhost-user: remove VirtQ notifier restore
Notifier set when vhost-user backend asks qemu to mmap an FD and
offset. When vhost-user backend restart or getting killed, VQ notifier
FD and mmap addresses become invalid. After backend restart, MR contains
the invalid address will be restored and fail on notifier access.
On the other hand, qemu should munmap the notifier, release underlying
hardware resources to enable backend restart and allocate hardware
notifier resources correctly.
Qemu shouldn't reference and use resources of disconnected backend.
This patch removes VQ notifier restore, uses the default vhost-user
notifier to avoid invalid address access.
After backend restart, the backend should ask qemu to install a hardware
notifier if needed.
Fixes: 44866521bd6e ("vhost-user: support registering external host notifiers")
Cc: qemu-stable@nongnu.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Message-Id: <20220207071929.527149-2-xuemingl@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit e867144b73b3c5009266b6df07d5ff44acfb82c3)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
hw/virtio/vhost-user.c | 19 +------------------
include/hw/virtio/vhost-user.h | 1 -
2 files changed, 1 insertion(+), 19 deletions(-)
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index bf6e50223c..c671719e9b 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1143,19 +1143,6 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
}
-static void vhost_user_host_notifier_restore(struct vhost_dev *dev,
- int queue_idx)
-{
- struct vhost_user *u = dev->opaque;
- VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
- VirtIODevice *vdev = dev->vdev;
-
- if (n->addr && !n->set) {
- virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true);
- n->set = true;
- }
-}
-
static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
int queue_idx)
{
@@ -1163,17 +1150,14 @@ static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
VirtIODevice *vdev = dev->vdev;
- if (n->addr && n->set) {
+ if (n->addr) {
virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
- n->set = false;
}
}
static int vhost_user_set_vring_base(struct vhost_dev *dev,
struct vhost_vring_state *ring)
{
- vhost_user_host_notifier_restore(dev, ring->index);
-
return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
}
@@ -1538,7 +1522,6 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
}
n->addr = addr;
- n->set = true;
return 0;
}
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index a9abca3288..f6012b2078 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -14,7 +14,6 @@
typedef struct VhostUserHostNotifier {
MemoryRegion mr;
void *addr;
- bool set;
} VhostUserHostNotifier;
typedef struct VhostUserState {

View File

@@ -0,0 +1,149 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Xueming Li <xuemingl@nvidia.com>
Date: Mon, 7 Feb 2022 15:19:29 +0800
Subject: [PATCH] vhost-user: fix VirtQ notifier cleanup
When vhost-user device cleanup, remove notifier MR and munmaps notifier
address in the event-handling thread, VM CPU thread writing the notifier
in concurrent fails with an error of accessing invalid address. It
happens because MR is still being referenced and accessed in another
thread while the underlying notifier mmap address is being freed and
becomes invalid.
This patch calls RCU and munmap notifiers in the callback after the
memory flatview update finish.
Fixes: 44866521bd6e ("vhost-user: support registering external host notifiers")
Cc: qemu-stable@nongnu.org
Signed-off-by: Xueming Li <xuemingl@nvidia.com>
Message-Id: <20220207071929.527149-3-xuemingl@nvidia.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
(cherry picked from commit 0b0af4d62f7002b31cd7b2762b26d2fcb76bb2ba)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
hw/virtio/vhost-user.c | 48 ++++++++++++++++++++--------------
include/hw/virtio/vhost-user.h | 2 ++
2 files changed, 31 insertions(+), 19 deletions(-)
diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index c671719e9b..ed5f9a5471 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -25,6 +25,7 @@
#include "migration/migration.h"
#include "migration/postcopy-ram.h"
#include "trace.h"
+#include "exec/ramblock.h"
#include <sys/ioctl.h>
#include <sys/socket.h>
@@ -1143,15 +1144,26 @@ static int vhost_user_set_vring_num(struct vhost_dev *dev,
return vhost_set_vring(dev, VHOST_USER_SET_VRING_NUM, ring);
}
-static void vhost_user_host_notifier_remove(struct vhost_dev *dev,
- int queue_idx)
+static void vhost_user_host_notifier_free(VhostUserHostNotifier *n)
{
- struct vhost_user *u = dev->opaque;
- VhostUserHostNotifier *n = &u->user->notifier[queue_idx];
- VirtIODevice *vdev = dev->vdev;
+ assert(n && n->unmap_addr);
+ munmap(n->unmap_addr, qemu_real_host_page_size);
+ n->unmap_addr = NULL;
+}
+
+static void vhost_user_host_notifier_remove(VhostUserState *user,
+ VirtIODevice *vdev, int queue_idx)
+{
+ VhostUserHostNotifier *n = &user->notifier[queue_idx];
if (n->addr) {
- virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
+ if (vdev) {
+ virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
+ }
+ assert(!n->unmap_addr);
+ n->unmap_addr = n->addr;
+ n->addr = NULL;
+ call_rcu(n, vhost_user_host_notifier_free, rcu);
}
}
@@ -1190,8 +1202,9 @@ static int vhost_user_get_vring_base(struct vhost_dev *dev,
.payload.state = *ring,
.hdr.size = sizeof(msg.payload.state),
};
+ struct vhost_user *u = dev->opaque;
- vhost_user_host_notifier_remove(dev, ring->index);
+ vhost_user_host_notifier_remove(u->user, dev->vdev, ring->index);
if (vhost_user_write(dev, &msg, NULL, 0) < 0) {
return -1;
@@ -1486,12 +1499,7 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
n = &user->notifier[queue_idx];
- if (n->addr) {
- virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, false);
- object_unparent(OBJECT(&n->mr));
- munmap(n->addr, page_size);
- n->addr = NULL;
- }
+ vhost_user_host_notifier_remove(user, vdev, queue_idx);
if (area->u64 & VHOST_USER_VRING_NOFD_MASK) {
return 0;
@@ -1510,9 +1518,12 @@ static int vhost_user_slave_handle_vring_host_notifier(struct vhost_dev *dev,
name = g_strdup_printf("vhost-user/host-notifier@%p mmaps[%d]",
user, queue_idx);
- if (!n->mr.ram) /* Don't init again after suspend. */
+ if (!n->mr.ram) { /* Don't init again after suspend. */
memory_region_init_ram_device_ptr(&n->mr, OBJECT(vdev), name,
page_size, addr);
+ } else {
+ n->mr.ram_block->host = addr;
+ }
g_free(name);
if (virtio_queue_set_host_notifier_mr(vdev, queue_idx, &n->mr, true)) {
@@ -2460,17 +2471,16 @@ bool vhost_user_init(VhostUserState *user, CharBackend *chr, Error **errp)
void vhost_user_cleanup(VhostUserState *user)
{
int i;
+ VhostUserHostNotifier *n;
if (!user->chr) {
return;
}
memory_region_transaction_begin();
for (i = 0; i < VIRTIO_QUEUE_MAX; i++) {
- if (user->notifier[i].addr) {
- object_unparent(OBJECT(&user->notifier[i].mr));
- munmap(user->notifier[i].addr, qemu_real_host_page_size);
- user->notifier[i].addr = NULL;
- }
+ n = &user->notifier[i];
+ vhost_user_host_notifier_remove(user, NULL, i);
+ object_unparent(OBJECT(&n->mr));
}
memory_region_transaction_commit();
user->chr = NULL;
diff --git a/include/hw/virtio/vhost-user.h b/include/hw/virtio/vhost-user.h
index f6012b2078..e44a41bb70 100644
--- a/include/hw/virtio/vhost-user.h
+++ b/include/hw/virtio/vhost-user.h
@@ -12,8 +12,10 @@
#include "hw/virtio/virtio.h"
typedef struct VhostUserHostNotifier {
+ struct rcu_head rcu;
MemoryRegion mr;
void *addr;
+ void *unmap_addr;
} VhostUserHostNotifier;
typedef struct VhostUserState {

View File

@@ -0,0 +1,101 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Halil Pasic <pasic@linux.ibm.com>
Date: Mon, 7 Feb 2022 12:28:57 +0100
Subject: [PATCH] virtio: fix the condition for iommu_platform not supported
The commit 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
unsupported") claims to fail the device hotplug when iommu_platform
is requested, but not supported by the (vhost) device. On the first
glance the condition for detecting that situation looks perfect, but
because a certain peculiarity of virtio_platform it ain't.
In fact the aforementioned commit introduces a regression. It breaks
virtio-fs support for Secure Execution, and most likely also for AMD SEV
or any other confidential guest scenario that relies encrypted guest
memory. The same also applies to any other vhost device that does not
support _F_ACCESS_PLATFORM.
The peculiarity is that iommu_platform and _F_ACCESS_PLATFORM collates
"device can not access all of the guest RAM" and "iova != gpa, thus
device needs to translate iova".
Confidential guest technologies currently rely on the device/hypervisor
offering _F_ACCESS_PLATFORM, so that, after the feature has been
negotiated, the guest grants access to the portions of memory the
device needs to see. So in for confidential guests, generally,
_F_ACCESS_PLATFORM is about the restricted access to memory, but not
about the addresses used being something else than guest physical
addresses.
This is the very reason for which commit f7ef7e6e3b ("vhost: correctly
turn on VIRTIO_F_IOMMU_PLATFORM") fences _F_ACCESS_PLATFORM from the
vhost device that does not need it, because on the vhost interface it
only means "I/O address translation is needed".
This patch takes inspiration from f7ef7e6e3b ("vhost: correctly turn on
VIRTIO_F_IOMMU_PLATFORM"), and uses the same condition for detecting the
situation when _F_ACCESS_PLATFORM is requested, but no I/O translation
by the device, and thus no device capability is needed. In this
situation claiming that the device does not support iommu_plattform=on
is counter-productive. So let us stop doing that!
Signed-off-by: Halil Pasic <pasic@linux.ibm.com>
Reported-by: Jakob Naucke <Jakob.Naucke@ibm.com>
Fixes: 04ceb61a40 ("virtio: Fail if iommu_platform is requested, but
unsupported")
Acked-by: Cornelia Huck <cohuck@redhat.com>
Reviewed-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Tested-by: Daniel Henrique Barboza <danielhb413@gmail.com>
Cc: Kevin Wolf <kwolf@redhat.com>
Cc: qemu-stable@nongnu.org
Message-Id: <20220207112857.607829-1-pasic@linux.ibm.com>
Reviewed-by: Michael S. Tsirkin <mst@redhat.com>
Signed-off-by: Michael S. Tsirkin <mst@redhat.com>
Acked-by: Jason Wang <jasowang@redhat.com>
(cherry picked from commit e65902a913bf31ba79a83a3bd3621108b85cf645)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
hw/virtio/virtio-bus.c | 12 +++++++-----
1 file changed, 7 insertions(+), 5 deletions(-)
diff --git a/hw/virtio/virtio-bus.c b/hw/virtio/virtio-bus.c
index d23db98c56..0f69d1c742 100644
--- a/hw/virtio/virtio-bus.c
+++ b/hw/virtio/virtio-bus.c
@@ -48,6 +48,7 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
VirtioBusClass *klass = VIRTIO_BUS_GET_CLASS(bus);
VirtioDeviceClass *vdc = VIRTIO_DEVICE_GET_CLASS(vdev);
bool has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
+ bool vdev_has_iommu;
Error *local_err = NULL;
DPRINTF("%s: plug device.\n", qbus->name);
@@ -69,11 +70,6 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
return;
}
- if (has_iommu && !virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM)) {
- error_setg(errp, "iommu_platform=true is not supported by the device");
- return;
- }
-
if (klass->device_plugged != NULL) {
klass->device_plugged(qbus->parent, &local_err);
}
@@ -82,9 +78,15 @@ void virtio_bus_device_plugged(VirtIODevice *vdev, Error **errp)
return;
}
+ vdev_has_iommu = virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
if (klass->get_dma_as != NULL && has_iommu) {
virtio_add_feature(&vdev->host_features, VIRTIO_F_IOMMU_PLATFORM);
vdev->dma_as = klass->get_dma_as(qbus->parent);
+ if (!vdev_has_iommu && vdev->dma_as != &address_space_memory) {
+ error_setg(errp,
+ "iommu_platform=true is not supported by the device");
+ return;
+ }
} else {
vdev->dma_as = &address_space_memory;
}

View File

@@ -0,0 +1,38 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Fri, 6 May 2022 14:38:35 +0200
Subject: [PATCH] block/gluster: correctly set max_pdiscard which is int64_t
Previously, max_pdiscard would be zero in the following assertion:
qemu-system-x86_64: ../block/io.c:3166: bdrv_co_pdiscard: Assertion
`max_pdiscard >= bs->bl.request_alignment' failed.
Fixes: 0c8022876f ("block: use int64_t instead of int in driver discard handlers")
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/gluster.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/gluster.c b/block/gluster.c
index 398976bc66..592e71b22a 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -891,7 +891,7 @@ out:
static void qemu_gluster_refresh_limits(BlockDriverState *bs, Error **errp)
{
bs->bl.max_transfer = GLUSTER_MAX_TRANSFER;
- bs->bl.max_pdiscard = SIZE_MAX;
+ bs->bl.max_pdiscard = INT64_MAX;
}
static int qemu_gluster_reopen_prepare(BDRVReopenState *state,
@@ -1304,7 +1304,7 @@ static coroutine_fn int qemu_gluster_co_pdiscard(BlockDriverState *bs,
GlusterAIOCB acb;
BDRVGlusterState *s = bs->opaque;
- assert(bytes <= SIZE_MAX); /* rely on max_pdiscard */
+ assert(bytes <= INT64_MAX); /* rely on max_pdiscard */
acb.size = 0;
acb.ret = 0;

View File

@@ -0,0 +1,72 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Rao Lei <lei.rao@intel.com>
Date: Fri, 6 May 2022 14:38:36 +0200
Subject: [PATCH] ui/vnc.c: Fixed a deadlock bug.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
The GDB statck is as follows:
(gdb) bt
0 __lll_lock_wait (futex=futex@entry=0x56211df20360, private=0) at lowlevellock.c:52
1 0x00007f263caf20a3 in __GI___pthread_mutex_lock (mutex=0x56211df20360) at ../nptl/pthread_mutex_lock.c:80
2 0x000056211a757364 in qemu_mutex_lock_impl (mutex=0x56211df20360, file=0x56211a804857 "../ui/vnc-jobs.h", line=60)
at ../util/qemu-thread-posix.c:80
3 0x000056211a0ef8c7 in vnc_lock_output (vs=0x56211df14200) at ../ui/vnc-jobs.h:60
4 0x000056211a0efcb7 in vnc_clipboard_send (vs=0x56211df14200, count=1, dwords=0x7ffdf1701338) at ../ui/vnc-clipboard.c:138
5 0x000056211a0f0129 in vnc_clipboard_notify (notifier=0x56211df244c8, data=0x56211dd1bbf0) at ../ui/vnc-clipboard.c:209
6 0x000056211a75dde8 in notifier_list_notify (list=0x56211afa17d0 <clipboard_notifiers>, data=0x56211dd1bbf0) at ../util/notify.c:39
7 0x000056211a0bf0e6 in qemu_clipboard_update (info=0x56211dd1bbf0) at ../ui/clipboard.c:50
8 0x000056211a0bf05d in qemu_clipboard_peer_release (peer=0x56211df244c0, selection=QEMU_CLIPBOARD_SELECTION_CLIPBOARD)
at ../ui/clipboard.c:41
9 0x000056211a0bef9b in qemu_clipboard_peer_unregister (peer=0x56211df244c0) at ../ui/clipboard.c:19
10 0x000056211a0d45f3 in vnc_disconnect_finish (vs=0x56211df14200) at ../ui/vnc.c:1358
11 0x000056211a0d4c9d in vnc_client_read (vs=0x56211df14200) at ../ui/vnc.c:1611
12 0x000056211a0d4df8 in vnc_client_io (ioc=0x56211ce70690, condition=G_IO_IN, opaque=0x56211df14200) at ../ui/vnc.c:1649
13 0x000056211a5b976c in qio_channel_fd_source_dispatch
(source=0x56211ce50a00, callback=0x56211a0d4d71 <vnc_client_io>, user_data=0x56211df14200) at ../io/channel-watch.c:84
14 0x00007f263ccede8e in g_main_context_dispatch () at /lib/x86_64-linux-gnu/libglib-2.0.so.0
15 0x000056211a77d4a1 in glib_pollfds_poll () at ../util/main-loop.c:232
16 0x000056211a77d51f in os_host_main_loop_wait (timeout=958545) at ../util/main-loop.c:255
17 0x000056211a77d630 in main_loop_wait (nonblocking=0) at ../util/main-loop.c:531
18 0x000056211a45bc8e in qemu_main_loop () at ../softmmu/runstate.c:726
19 0x000056211a0b45fa in main (argc=69, argv=0x7ffdf1701778, envp=0x7ffdf17019a8) at ../softmmu/main.c:50
From the call trace, we can see it is a deadlock bug.
vnc_disconnect_finish will acquire the output_mutex.
But, the output_mutex will be acquired again in vnc_clipboard_send.
Repeated locking will cause deadlock. So, I move
qemu_clipboard_peer_unregister() behind vnc_unlock_output();
Fixes: 0bf41cab93e ("ui/vnc: clipboard support")
Signed-off-by: Lei Rao <lei.rao@intel.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220105020808.597325-1-lei.rao@intel.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry-picked from commit 1dbbe6f172810026c51dc84ed927a3cc23017949)
[FE: trivial backport for 6.2]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
ui/vnc.c | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/ui/vnc.c b/ui/vnc.c
index af02522e84..b253e85c65 100644
--- a/ui/vnc.c
+++ b/ui/vnc.c
@@ -1354,12 +1354,12 @@ void vnc_disconnect_finish(VncState *vs)
/* last client gone */
vnc_update_server_surface(vs->vd);
}
+ vnc_unlock_output(vs);
+
if (vs->cbpeer.update.notify) {
qemu_clipboard_peer_unregister(&vs->cbpeer);
}
- vnc_unlock_output(vs);
-
qemu_mutex_destroy(&vs->output_mutex);
if (vs->bh != NULL) {
qemu_bh_delete(vs->bh);

View File

@@ -0,0 +1,37 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Mauro Matteo Cascella <mcascell@redhat.com>
Date: Thu, 7 Apr 2022 10:11:06 +0200
Subject: [PATCH] display/qxl-render: fix race condition in qxl_cursor
(CVE-2021-4207)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Avoid fetching 'width' and 'height' a second time to prevent possible
race condition. Refer to security advisory
https://starlabs.sg/advisories/22-4207/ for more information.
Fixes: CVE-2021-4207
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220407081106.343235-1-mcascell@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit 9569f5cb5b4bffa9d3ebc8ba7da1e03830a9a895)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
hw/display/qxl-render.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/display/qxl-render.c b/hw/display/qxl-render.c
index d28849b121..237ed293ba 100644
--- a/hw/display/qxl-render.c
+++ b/hw/display/qxl-render.c
@@ -266,7 +266,7 @@ static QEMUCursor *qxl_cursor(PCIQXLDevice *qxl, QXLCursor *cursor,
}
break;
case SPICE_CURSOR_TYPE_ALPHA:
- size = sizeof(uint32_t) * cursor->header.width * cursor->header.height;
+ size = sizeof(uint32_t) * c->width * c->height;
qxl_unpack_chunks(c->data, size, qxl, &cursor->chunk, group_id);
if (qxl->debug > 2) {
cursor_print_ascii_art(c, "qxl/alpha");

View File

@@ -0,0 +1,83 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Mauro Matteo Cascella <mcascell@redhat.com>
Date: Thu, 7 Apr 2022 10:17:12 +0200
Subject: [PATCH] ui/cursor: fix integer overflow in cursor_alloc
(CVE-2021-4206)
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Prevent potential integer overflow by limiting 'width' and 'height' to
512x512. Also change 'datasize' type to size_t. Refer to security
advisory https://starlabs.sg/advisories/22-4206/ for more information.
Fixes: CVE-2021-4206
Signed-off-by: Mauro Matteo Cascella <mcascell@redhat.com>
Reviewed-by: Marc-André Lureau <marcandre.lureau@redhat.com>
Message-Id: <20220407081712.345609-1-mcascell@redhat.com>
Signed-off-by: Gerd Hoffmann <kraxel@redhat.com>
(cherry picked from commit fa892e9abb728e76afcf27323ab29c57fb0fe7aa)
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
hw/display/qxl-render.c | 7 +++++++
hw/display/vmware_vga.c | 2 ++
ui/cursor.c | 8 +++++++-
3 files changed, 16 insertions(+), 1 deletion(-)
diff --git a/hw/display/qxl-render.c b/hw/display/qxl-render.c
index 237ed293ba..ca217004bf 100644
--- a/hw/display/qxl-render.c
+++ b/hw/display/qxl-render.c
@@ -247,6 +247,13 @@ static QEMUCursor *qxl_cursor(PCIQXLDevice *qxl, QXLCursor *cursor,
size_t size;
c = cursor_alloc(cursor->header.width, cursor->header.height);
+
+ if (!c) {
+ qxl_set_guest_bug(qxl, "%s: cursor %ux%u alloc error", __func__,
+ cursor->header.width, cursor->header.height);
+ goto fail;
+ }
+
c->hot_x = cursor->header.hot_spot_x;
c->hot_y = cursor->header.hot_spot_y;
switch (cursor->header.type) {
diff --git a/hw/display/vmware_vga.c b/hw/display/vmware_vga.c
index e2969a6c81..2b81d6122f 100644
--- a/hw/display/vmware_vga.c
+++ b/hw/display/vmware_vga.c
@@ -509,6 +509,8 @@ static inline void vmsvga_cursor_define(struct vmsvga_state_s *s,
int i, pixels;
qc = cursor_alloc(c->width, c->height);
+ assert(qc != NULL);
+
qc->hot_x = c->hot_x;
qc->hot_y = c->hot_y;
switch (c->bpp) {
diff --git a/ui/cursor.c b/ui/cursor.c
index 1d62ddd4d0..835f0802f9 100644
--- a/ui/cursor.c
+++ b/ui/cursor.c
@@ -46,6 +46,8 @@ static QEMUCursor *cursor_parse_xpm(const char *xpm[])
/* parse pixel data */
c = cursor_alloc(width, height);
+ assert(c != NULL);
+
for (pixel = 0, y = 0; y < height; y++, line++) {
for (x = 0; x < height; x++, pixel++) {
idx = xpm[line][x];
@@ -91,7 +93,11 @@ QEMUCursor *cursor_builtin_left_ptr(void)
QEMUCursor *cursor_alloc(int width, int height)
{
QEMUCursor *c;
- int datasize = width * height * sizeof(uint32_t);
+ size_t datasize = width * height * sizeof(uint32_t);
+
+ if (width > 512 || height > 512) {
+ return NULL;
+ }
c = g_malloc0(sizeof(QEMUCursor) + datasize);
c->width = width;

View File

@@ -0,0 +1,971 @@
Index: qemu/block/meson.build
===================================================================
--- qemu.orig/block/meson.build
+++ qemu/block/meson.build
@@ -91,6 +91,7 @@ foreach m : [
[libnfs, 'nfs', files('nfs.c')],
[libssh, 'ssh', files('ssh.c')],
[rbd, 'rbd', files('rbd.c')],
+ [vitastor, 'vitastor', files('vitastor.c')],
]
if m[0].found()
module_ss = ss.source_set()
Index: qemu/meson.build
===================================================================
--- qemu.orig/meson.build
+++ qemu/meson.build
@@ -838,6 +838,26 @@ if not get_option('rbd').auto() or have_
endif
endif
+vitastor = not_found
+if not get_option('vitastor').auto() or have_block
+ libvitastor_client = cc.find_library('vitastor_client', has_headers: ['vitastor_c.h'],
+ required: get_option('vitastor'), kwargs: static_kwargs)
+ if libvitastor_client.found()
+ if cc.links('''
+ #include <vitastor_c.h>
+ int main(void) {
+ vitastor_c_create_qemu(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
+ return 0;
+ }''', dependencies: libvitastor_client)
+ vitastor = declare_dependency(dependencies: libvitastor_client)
+ elif get_option('vitastor').enabled()
+ error('could not link libvitastor_client')
+ else
+ warning('could not link libvitastor_client, disabling')
+ endif
+ endif
+endif
+
glusterfs = not_found
glusterfs_ftruncate_has_stat = false
glusterfs_iocb_has_stat = false
@@ -1459,6 +1479,7 @@ config_host_data.set('CONFIG_LINUX_AIO',
config_host_data.set('CONFIG_LINUX_IO_URING', linux_io_uring.found())
config_host_data.set('CONFIG_LIBPMEM', libpmem.found())
config_host_data.set('CONFIG_RBD', rbd.found())
+config_host_data.set('CONFIG_VITASTOR', vitastor.found())
config_host_data.set('CONFIG_SDL', sdl.found())
config_host_data.set('CONFIG_SDL_IMAGE', sdl_image.found())
config_host_data.set('CONFIG_SECCOMP', seccomp.found())
@@ -3424,6 +3445,7 @@ if spice_protocol.found()
summary_info += {' spice server support': spice}
endif
summary_info += {'rbd support': rbd}
+summary_info += {'vitastor support': vitastor}
summary_info += {'xfsctl support': config_host.has_key('CONFIG_XFS')}
summary_info += {'smartcard support': cacard}
summary_info += {'U2F support': u2f}
Index: qemu/meson_options.txt
===================================================================
--- qemu.orig/meson_options.txt
+++ qemu/meson_options.txt
@@ -121,6 +121,8 @@ option('lzo', type : 'feature', value :
description: 'lzo compression support')
option('rbd', type : 'feature', value : 'auto',
description: 'Ceph block device driver')
+option('vitastor', type : 'feature', value : 'auto',
+ description: 'Vitastor block device driver')
option('gtk', type : 'feature', value : 'auto',
description: 'GTK+ user interface')
option('sdl', type : 'feature', value : 'auto',
Index: qemu/qapi/block-core.json
===================================================================
--- qemu.orig/qapi/block-core.json
+++ qemu/qapi/block-core.json
@@ -3179,7 +3179,7 @@
'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
{ 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
'pbs',
- 'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
+ 'ssh', 'throttle', 'vdi', 'vhdx', 'vitastor', 'vmdk', 'vpc', 'vvfat' ] }
##
# @BlockdevOptionsFile:
@@ -4125,6 +4125,28 @@
'*server': ['InetSocketAddressBase'] } }
##
+# @BlockdevOptionsVitastor:
+#
+# Driver specific block device options for vitastor
+#
+# @image: Image name
+# @inode: Inode number
+# @pool: Pool ID
+# @size: Desired image size in bytes
+# @config-path: Path to Vitastor configuration
+# @etcd-host: etcd connection address(es)
+# @etcd-prefix: etcd key/value prefix
+##
+{ 'struct': 'BlockdevOptionsVitastor',
+ 'data': { '*inode': 'uint64',
+ '*pool': 'uint64',
+ '*size': 'uint64',
+ '*image': 'str',
+ '*config-path': 'str',
+ '*etcd-host': 'str',
+ '*etcd-prefix': 'str' } }
+
+##
# @ReplicationMode:
#
# An enumeration of replication modes.
@@ -4520,6 +4542,7 @@
'throttle': 'BlockdevOptionsThrottle',
'vdi': 'BlockdevOptionsGenericFormat',
'vhdx': 'BlockdevOptionsGenericFormat',
+ 'vitastor': 'BlockdevOptionsVitastor',
'vmdk': 'BlockdevOptionsGenericCOWFormat',
'vpc': 'BlockdevOptionsGenericFormat',
'vvfat': 'BlockdevOptionsVVFAT'
@@ -4910,6 +4933,17 @@
'*encrypt' : 'RbdEncryptionCreateOptions' } }
##
+# @BlockdevCreateOptionsVitastor:
+#
+# Driver specific image creation options for Vitastor.
+#
+# @size: Size of the virtual disk in bytes
+##
+{ 'struct': 'BlockdevCreateOptionsVitastor',
+ 'data': { 'location': 'BlockdevOptionsVitastor',
+ 'size': 'size' } }
+
+##
# @BlockdevVmdkSubformat:
#
# Subformat options for VMDK images
@@ -5108,6 +5142,7 @@
'ssh': 'BlockdevCreateOptionsSsh',
'vdi': 'BlockdevCreateOptionsVdi',
'vhdx': 'BlockdevCreateOptionsVhdx',
+ 'vitastor': 'BlockdevCreateOptionsVitastor',
'vmdk': 'BlockdevCreateOptionsVmdk',
'vpc': 'BlockdevCreateOptionsVpc'
} }
Index: qemu/scripts/ci/org.centos/stream/8/x86_64/configure
===================================================================
--- qemu.orig/scripts/ci/org.centos/stream/8/x86_64/configure
+++ qemu/scripts/ci/org.centos/stream/8/x86_64/configure
@@ -31,7 +31,7 @@
--with-git=meson \
--with-git-submodules=update \
--target-list="x86_64-softmmu" \
---block-drv-rw-whitelist="qcow2,raw,file,host_device,nbd,iscsi,rbd,blkdebug,luks,null-co,nvme,copy-on-read,throttle,gluster" \
+--block-drv-rw-whitelist="qcow2,raw,file,host_device,nbd,iscsi,rbd,vitastor,blkdebug,luks,null-co,nvme,copy-on-read,throttle,gluster" \
--audio-drv-list="" \
--block-drv-ro-whitelist="vmdk,vhdx,vpc,https,ssh" \
--with-coroutine=ucontext \
@@ -183,6 +183,7 @@
--enable-opengl \
--enable-pie \
--enable-rbd \
+--enable-vitastor \
--enable-rdma \
--enable-seccomp \
--enable-snappy \
Index: a/block/vitastor.c
===================================================================
--- /dev/null
+++ a/block/vitastor.c
@@ -0,0 +1,797 @@
+// Copyright (c) Vitaliy Filippov, 2019+
+// License: VNPL-1.1 or GNU GPL-2.0+ (see README.md for details)
+
+// QEMU block driver
+
+#ifdef VITASTOR_SOURCE_TREE
+#define BUILD_DSO
+#define _GNU_SOURCE
+#endif
+#include "qemu/osdep.h"
+#include "qemu/main-loop.h"
+#include "block/block_int.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qerror.h"
+#include "qemu/uri.h"
+#include "qemu/error-report.h"
+#include "qemu/module.h"
+#include "qemu/option.h"
+
+#if QEMU_VERSION_MAJOR >= 3
+#include "qemu/units.h"
+#include "block/qdict.h"
+#include "qemu/cutils.h"
+#elif QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR >= 10
+#include "qemu/cutils.h"
+#include "qapi/qmp/qstring.h"
+#include "qapi/qmp/qjson.h"
+#else
+#include "qapi/qmp/qint.h"
+#define qdict_put_int(options, name, num_val) qdict_put_obj(options, name, QOBJECT(qint_from_int(num_val)))
+#define qdict_put_str(options, name, value) qdict_put_obj(options, name, QOBJECT(qstring_from_str(value)))
+#define qobject_unref QDECREF
+#endif
+
+#include "vitastor_c.h"
+
+#ifdef VITASTOR_SOURCE_TREE
+void qemu_module_dummy(void)
+{
+}
+
+void DSO_STAMP_FUN(void)
+{
+}
+#endif
+
+typedef struct VitastorClient
+{
+ void *proxy;
+ void *watch;
+ char *config_path;
+ char *etcd_host;
+ char *etcd_prefix;
+ char *image;
+ int skip_parents;
+ uint64_t inode;
+ uint64_t pool;
+ uint64_t size;
+ long readonly;
+ int use_rdma;
+ char *rdma_device;
+ int rdma_port_num;
+ int rdma_gid_index;
+ int rdma_mtu;
+ QemuMutex mutex;
+
+ uint64_t last_bitmap_inode, last_bitmap_offset, last_bitmap_len;
+ uint32_t last_bitmap_granularity;
+ uint8_t *last_bitmap;
+} VitastorClient;
+
+typedef struct VitastorRPC
+{
+ BlockDriverState *bs;
+ Coroutine *co;
+ QEMUIOVector *iov;
+ long ret;
+ int complete;
+ uint64_t inode, offset, len;
+ uint32_t bitmap_granularity;
+ uint8_t *bitmap;
+} VitastorRPC;
+
+static void vitastor_co_init_task(BlockDriverState *bs, VitastorRPC *task);
+static void vitastor_co_generic_bh_cb(void *opaque, long retval);
+static void vitastor_co_read_cb(void *opaque, long retval, uint64_t version);
+static void vitastor_close(BlockDriverState *bs);
+
+static char *qemu_vitastor_next_tok(char *src, char delim, char **p)
+{
+ char *end;
+ *p = NULL;
+ for (end = src; *end; ++end)
+ {
+ if (*end == delim)
+ break;
+ if (*end == '\\' && end[1] != '\0')
+ end++;
+ }
+ if (*end == delim)
+ {
+ *p = end + 1;
+ *end = '\0';
+ }
+ return src;
+}
+
+static void qemu_vitastor_unescape(char *src)
+{
+ char *p;
+ for (p = src; *src; ++src, ++p)
+ {
+ if (*src == '\\' && src[1] != '\0')
+ src++;
+ *p = *src;
+ }
+ *p = '\0';
+}
+
+// vitastor[:key=value]*
+// vitastor[:etcd_host=127.0.0.1]:inode=1:pool=1[:rdma_gid_index=3]
+// vitastor:config_path=/etc/vitastor/vitastor.conf:image=testimg
+static void vitastor_parse_filename(const char *filename, QDict *options, Error **errp)
+{
+ const char *start;
+ char *p, *buf;
+
+ if (!strstart(filename, "vitastor:", &start))
+ {
+ error_setg(errp, "File name must start with 'vitastor:'");
+ return;
+ }
+
+ buf = g_strdup(start);
+ p = buf;
+
+ // The following are all key/value pairs
+ while (p)
+ {
+ int i;
+ char *name, *value;
+ name = qemu_vitastor_next_tok(p, '=', &p);
+ if (!p)
+ {
+ error_setg(errp, "conf option %s has no value", name);
+ break;
+ }
+ for (i = 0; i < strlen(name); i++)
+ if (name[i] == '_')
+ name[i] = '-';
+ qemu_vitastor_unescape(name);
+ value = qemu_vitastor_next_tok(p, ':', &p);
+ qemu_vitastor_unescape(value);
+ if (!strcmp(name, "inode") ||
+ !strcmp(name, "pool") ||
+ !strcmp(name, "size") ||
+ !strcmp(name, "skip-parents") ||
+ !strcmp(name, "use-rdma") ||
+ !strcmp(name, "rdma-port_num") ||
+ !strcmp(name, "rdma-gid-index") ||
+ !strcmp(name, "rdma-mtu"))
+ {
+ unsigned long long num_val;
+ if (parse_uint_full(value, &num_val, 0))
+ {
+ error_setg(errp, "Illegal %s: %s", name, value);
+ goto out;
+ }
+ qdict_put_int(options, name, num_val);
+ }
+ else
+ {
+ qdict_put_str(options, name, value);
+ }
+ }
+ if (!qdict_get_try_str(options, "image"))
+ {
+ if (!qdict_get_try_int(options, "inode", 0))
+ {
+ error_setg(errp, "one of image (name) and inode (number) must be specified");
+ goto out;
+ }
+ if (!(qdict_get_try_int(options, "inode", 0) >> (64-POOL_ID_BITS)) &&
+ !qdict_get_try_int(options, "pool", 0))
+ {
+ error_setg(errp, "pool number must be specified or included in the inode number");
+ goto out;
+ }
+ if (!qdict_get_try_int(options, "size", 0))
+ {
+ error_setg(errp, "size must be specified when inode number is used instead of image name");
+ goto out;
+ }
+ }
+
+out:
+ g_free(buf);
+ return;
+}
+
+static void coroutine_fn vitastor_co_get_metadata(VitastorRPC *task)
+{
+ BlockDriverState *bs = task->bs;
+ VitastorClient *client = bs->opaque;
+ task->co = qemu_coroutine_self();
+
+ qemu_mutex_lock(&client->mutex);
+ vitastor_c_watch_inode(client->proxy, client->image, vitastor_co_generic_bh_cb, task);
+ qemu_mutex_unlock(&client->mutex);
+
+ while (!task->complete)
+ {
+ qemu_coroutine_yield();
+ }
+}
+
+static void vitastor_aio_set_fd_handler(void *ctx, int fd, int unused1, IOHandler *fd_read, IOHandler *fd_write, void *unused2, void *opaque)
+{
+ aio_set_fd_handler(ctx, fd,
+#if QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR >= 5 || QEMU_VERSION_MAJOR >= 3
+ 0 /*is_external*/,
+#endif
+ fd_read, fd_write,
+#if QEMU_VERSION_MAJOR == 1 && QEMU_VERSION_MINOR <= 6 || QEMU_VERSION_MAJOR < 1
+ NULL /*io_flush*/,
+#endif
+#if QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR >= 9 || QEMU_VERSION_MAJOR >= 3
+ NULL /*io_poll*/,
+#endif
+#if QEMU_VERSION_MAJOR >= 7
+ NULL /*io_poll_ready*/,
+#endif
+ opaque);
+}
+
+static int vitastor_file_open(BlockDriverState *bs, QDict *options, int flags, Error **errp)
+{
+ VitastorRPC task;
+ VitastorClient *client = bs->opaque;
+ void *image = NULL;
+ int64_t ret = 0;
+ qemu_mutex_init(&client->mutex);
+ client->config_path = g_strdup(qdict_get_try_str(options, "config-path"));
+ // FIXME: Rename to etcd_address
+ client->etcd_host = g_strdup(qdict_get_try_str(options, "etcd-host"));
+ client->etcd_prefix = g_strdup(qdict_get_try_str(options, "etcd-prefix"));
+ client->skip_parents = qdict_get_try_int(options, "skip-parents", 0);
+ client->use_rdma = qdict_get_try_int(options, "use-rdma", -1);
+ client->rdma_device = g_strdup(qdict_get_try_str(options, "rdma-device"));
+ client->rdma_port_num = qdict_get_try_int(options, "rdma-port-num", 0);
+ client->rdma_gid_index = qdict_get_try_int(options, "rdma-gid-index", 0);
+ client->rdma_mtu = qdict_get_try_int(options, "rdma-mtu", 0);
+ client->proxy = vitastor_c_create_qemu(
+ vitastor_aio_set_fd_handler, bdrv_get_aio_context(bs), client->config_path, client->etcd_host, client->etcd_prefix,
+ client->use_rdma, client->rdma_device, client->rdma_port_num, client->rdma_gid_index, client->rdma_mtu, 0
+ );
+ image = client->image = g_strdup(qdict_get_try_str(options, "image"));
+ client->readonly = (flags & BDRV_O_RDWR) ? 1 : 0;
+ // Get image metadata (size and readonly flag) or just wait until the client is ready
+ if (!image)
+ client->image = (char*)"x";
+ task.complete = 0;
+ task.bs = bs;
+ if (qemu_in_coroutine())
+ {
+ vitastor_co_get_metadata(&task);
+ }
+ else
+ {
+ bdrv_coroutine_enter(bs, qemu_coroutine_create((void(*)(void*))vitastor_co_get_metadata, &task));
+ BDRV_POLL_WHILE(bs, !task.complete);
+ }
+ client->image = image;
+ if (client->image)
+ {
+ client->watch = (void*)task.ret;
+ client->readonly = client->readonly || vitastor_c_inode_get_readonly(client->watch);
+ client->size = vitastor_c_inode_get_size(client->watch);
+ if (!vitastor_c_inode_get_num(client->watch))
+ {
+ error_setg(errp, "image does not exist");
+ vitastor_close(bs);
+ return -1;
+ }
+ if (!client->size)
+ {
+ client->size = qdict_get_try_int(options, "size", 0);
+ }
+ }
+ else
+ {
+ client->watch = NULL;
+ client->inode = qdict_get_try_int(options, "inode", 0);
+ client->pool = qdict_get_try_int(options, "pool", 0);
+ if (client->pool)
+ {
+ client->inode = (client->inode & (((uint64_t)1 << (64-POOL_ID_BITS)) - 1)) | (client->pool << (64-POOL_ID_BITS));
+ }
+ client->size = qdict_get_try_int(options, "size", 0);
+ vitastor_c_close_watch(client->proxy, (void*)task.ret);
+ }
+ if (!client->size)
+ {
+ error_setg(errp, "image size not specified");
+ vitastor_close(bs);
+ return -1;
+ }
+ bs->total_sectors = client->size / BDRV_SECTOR_SIZE;
+ //client->aio_context = bdrv_get_aio_context(bs);
+ qdict_del(options, "use-rdma");
+ qdict_del(options, "rdma-mtu");
+ qdict_del(options, "rdma-gid-index");
+ qdict_del(options, "rdma-port-num");
+ qdict_del(options, "rdma-device");
+ qdict_del(options, "config-path");
+ qdict_del(options, "etcd-host");
+ qdict_del(options, "etcd-prefix");
+ qdict_del(options, "image");
+ qdict_del(options, "inode");
+ qdict_del(options, "pool");
+ qdict_del(options, "size");
+ qdict_del(options, "skip-parents");
+ return ret;
+}
+
+static void vitastor_close(BlockDriverState *bs)
+{
+ VitastorClient *client = bs->opaque;
+ vitastor_c_destroy(client->proxy);
+ qemu_mutex_destroy(&client->mutex);
+ if (client->config_path)
+ g_free(client->config_path);
+ if (client->etcd_host)
+ g_free(client->etcd_host);
+ if (client->etcd_prefix)
+ g_free(client->etcd_prefix);
+ if (client->image)
+ g_free(client->image);
+ free(client->last_bitmap);
+ client->last_bitmap = NULL;
+}
+
+#if QEMU_VERSION_MAJOR >= 3 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR > 2
+static int vitastor_probe_blocksizes(BlockDriverState *bs, BlockSizes *bsz)
+{
+ bsz->phys = 4096;
+ bsz->log = 512;
+ return 0;
+}
+#endif
+
+#if QEMU_VERSION_MAJOR >= 3 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR >= 12
+static int coroutine_fn vitastor_co_create_opts(
+#if QEMU_VERSION_MAJOR >= 4
+ BlockDriver *drv,
+#endif
+ const char *url, QemuOpts *opts, Error **errp)
+{
+ QDict *options;
+ int ret;
+
+ options = qdict_new();
+ vitastor_parse_filename(url, options, errp);
+ if (*errp)
+ {
+ ret = -1;
+ goto out;
+ }
+
+ // inodes don't require creation in Vitastor. FIXME: They will when there will be some metadata
+
+ ret = 0;
+out:
+ qobject_unref(options);
+ return ret;
+}
+#endif
+
+#if QEMU_VERSION_MAJOR >= 3
+static int coroutine_fn vitastor_co_truncate(BlockDriverState *bs, int64_t offset,
+#if QEMU_VERSION_MAJOR >= 4
+ bool exact,
+#endif
+ PreallocMode prealloc,
+#if QEMU_VERSION_MAJOR >= 5 && QEMU_VERSION_MINOR >= 1 || QEMU_VERSION_MAJOR > 5 || defined RHEL_BDRV_CO_TRUNCATE_FLAGS
+ BdrvRequestFlags flags,
+#endif
+ Error **errp)
+{
+ VitastorClient *client = bs->opaque;
+
+ if (prealloc != PREALLOC_MODE_OFF)
+ {
+ error_setg(errp, "Unsupported preallocation mode '%s'", PreallocMode_str(prealloc));
+ return -ENOTSUP;
+ }
+
+ // TODO: Resize inode to <offset> bytes
+ client->size = offset / BDRV_SECTOR_SIZE;
+
+ return 0;
+}
+#endif
+
+static int vitastor_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
+{
+ bdi->cluster_size = 4096;
+ return 0;
+}
+
+static int64_t vitastor_getlength(BlockDriverState *bs)
+{
+ VitastorClient *client = bs->opaque;
+ return client->size;
+}
+
+#if QEMU_VERSION_MAJOR >= 3 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR > 0
+static void vitastor_refresh_limits(BlockDriverState *bs, Error **errp)
+#else
+static int vitastor_refresh_limits(BlockDriverState *bs)
+#endif
+{
+ bs->bl.request_alignment = 4096;
+#if QEMU_VERSION_MAJOR >= 3 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR > 3
+ bs->bl.min_mem_alignment = 4096;
+#endif
+ bs->bl.opt_mem_alignment = 4096;
+#if QEMU_VERSION_MAJOR < 2 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR == 0
+ return 0;
+#endif
+}
+
+//static int64_t vitastor_get_allocated_file_size(BlockDriverState *bs)
+//{
+// return 0;
+//}
+
+static void vitastor_co_init_task(BlockDriverState *bs, VitastorRPC *task)
+{
+ *task = (VitastorRPC) {
+ .co = qemu_coroutine_self(),
+ .bs = bs,
+ };
+}
+
+static void vitastor_co_generic_bh_cb(void *opaque, long retval)
+{
+ VitastorRPC *task = opaque;
+ task->ret = retval;
+ task->complete = 1;
+ if (qemu_coroutine_self() != task->co)
+ {
+#if QEMU_VERSION_MAJOR >= 3 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR > 8
+ aio_co_wake(task->co);
+#else
+ qemu_coroutine_enter(task->co, NULL);
+ qemu_aio_release(task);
+#endif
+ }
+}
+
+static void vitastor_co_read_cb(void *opaque, long retval, uint64_t version)
+{
+ vitastor_co_generic_bh_cb(opaque, retval);
+}
+
+static int coroutine_fn vitastor_co_preadv(BlockDriverState *bs,
+#if QEMU_VERSION_MAJOR >= 7 || QEMU_VERSION_MAJOR == 6 && QEMU_VERSION_MINOR >= 2
+ int64_t offset, int64_t bytes, QEMUIOVector *iov, BdrvRequestFlags flags
+#else
+ uint64_t offset, uint64_t bytes, QEMUIOVector *iov, int flags
+#endif
+)
+{
+ VitastorClient *client = bs->opaque;
+ VitastorRPC task;
+ vitastor_co_init_task(bs, &task);
+ task.iov = iov;
+
+ uint64_t inode = client->watch ? vitastor_c_inode_get_num(client->watch) : client->inode;
+ qemu_mutex_lock(&client->mutex);
+ vitastor_c_read(client->proxy, inode, offset, bytes, iov->iov, iov->niov, vitastor_co_read_cb, &task);
+ qemu_mutex_unlock(&client->mutex);
+
+ while (!task.complete)
+ {
+ qemu_coroutine_yield();
+ }
+
+ return task.ret;
+}
+
+static int coroutine_fn vitastor_co_pwritev(BlockDriverState *bs,
+#if QEMU_VERSION_MAJOR >= 7 || QEMU_VERSION_MAJOR == 6 && QEMU_VERSION_MINOR >= 2
+ int64_t offset, int64_t bytes, QEMUIOVector *iov, BdrvRequestFlags flags
+#else
+ uint64_t offset, uint64_t bytes, QEMUIOVector *iov, int flags
+#endif
+)
+{
+ VitastorClient *client = bs->opaque;
+ VitastorRPC task;
+ vitastor_co_init_task(bs, &task);
+ task.iov = iov;
+
+ if (client->last_bitmap)
+ {
+ // Invalidate last bitmap on write
+ free(client->last_bitmap);
+ client->last_bitmap = NULL;
+ }
+
+ uint64_t inode = client->watch ? vitastor_c_inode_get_num(client->watch) : client->inode;
+ qemu_mutex_lock(&client->mutex);
+ vitastor_c_write(client->proxy, inode, offset, bytes, 0, iov->iov, iov->niov, vitastor_co_generic_bh_cb, &task);
+ qemu_mutex_unlock(&client->mutex);
+
+ while (!task.complete)
+ {
+ qemu_coroutine_yield();
+ }
+
+ return task.ret;
+}
+
+#if defined VITASTOR_C_API_VERSION && VITASTOR_C_API_VERSION >= 1
+#if QEMU_VERSION_MAJOR >= 2 || QEMU_VERSION_MAJOR == 1 && QEMU_VERSION_MINOR >= 7
+static void vitastor_co_read_bitmap_cb(void *opaque, long retval, uint8_t *bitmap)
+{
+ VitastorRPC *task = opaque;
+ VitastorClient *client = task->bs->opaque;
+ task->ret = retval;
+ task->complete = 1;
+ if (retval >= 0)
+ {
+ task->bitmap = bitmap;
+ if (client->last_bitmap_inode == task->inode &&
+ client->last_bitmap_offset == task->offset &&
+ client->last_bitmap_len == task->len)
+ {
+ free(client->last_bitmap);
+ client->last_bitmap = bitmap;
+ }
+ }
+ if (qemu_coroutine_self() != task->co)
+ {
+#if QEMU_VERSION_MAJOR >= 3 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR > 8
+ aio_co_wake(task->co);
+#else
+ qemu_coroutine_enter(task->co, NULL);
+ qemu_aio_release(task);
+#endif
+ }
+}
+
+static int coroutine_fn vitastor_co_block_status(
+ BlockDriverState *bs, bool want_zero, int64_t offset, int64_t bytes,
+ int64_t *pnum, int64_t *map, BlockDriverState **file)
+{
+ // Allocated => return BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID
+ // Not allocated => return 0
+ // Error => return -errno
+ // Set pnum to length of the extent, `*map` = `offset`, `*file` = `bs`
+ VitastorRPC task;
+ VitastorClient *client = bs->opaque;
+ uint64_t inode = client->watch ? vitastor_c_inode_get_num(client->watch) : client->inode;
+ uint8_t bit = 0;
+ if (client->last_bitmap && client->last_bitmap_inode == inode &&
+ client->last_bitmap_offset <= offset &&
+ client->last_bitmap_offset+client->last_bitmap_len >= (want_zero ? offset+1 : offset+bytes))
+ {
+ // Use the previously read bitmap
+ task.bitmap_granularity = client->last_bitmap_granularity;
+ task.offset = client->last_bitmap_offset;
+ task.len = client->last_bitmap_len;
+ task.bitmap = client->last_bitmap;
+ }
+ else
+ {
+ // Read bitmap from this position, rounding to full inode PG blocks
+ uint32_t block_size = vitastor_c_inode_get_block_size(client->proxy, inode);
+ if (!block_size)
+ return -EAGAIN;
+ // Init coroutine
+ vitastor_co_init_task(bs, &task);
+ free(client->last_bitmap);
+ task.inode = client->last_bitmap_inode = inode;
+ task.bitmap_granularity = client->last_bitmap_granularity = vitastor_c_inode_get_bitmap_granularity(client->proxy, inode);
+ task.offset = client->last_bitmap_offset = offset / block_size * block_size;
+ task.len = client->last_bitmap_len = (offset+bytes+block_size-1) / block_size * block_size - task.offset;
+ task.bitmap = client->last_bitmap = NULL;
+ qemu_mutex_lock(&client->mutex);
+ vitastor_c_read_bitmap(client->proxy, task.inode, task.offset, task.len, !client->skip_parents, vitastor_co_read_bitmap_cb, &task);
+ qemu_mutex_unlock(&client->mutex);
+ while (!task.complete)
+ {
+ qemu_coroutine_yield();
+ }
+ if (task.ret < 0)
+ {
+ // Error
+ return task.ret;
+ }
+ }
+ if (want_zero)
+ {
+ // Get precise mapping with all holes
+ uint64_t bmp_pos = (offset-task.offset) / task.bitmap_granularity;
+ uint64_t bmp_len = task.len / task.bitmap_granularity;
+ uint64_t bmp_end = bmp_pos+1;
+ bit = (task.bitmap[bmp_pos >> 3] >> (bmp_pos & 0x7)) & 1;
+ while (bmp_end < bmp_len && ((task.bitmap[bmp_end >> 3] >> (bmp_end & 0x7)) & 1) == bit)
+ {
+ bmp_end++;
+ }
+ *pnum = (bmp_end-bmp_pos) * task.bitmap_granularity;
+ }
+ else
+ {
+ // Get larger allocated extents, possibly with false positives
+ uint64_t bmp_pos = (offset-task.offset) / task.bitmap_granularity;
+ uint64_t bmp_end = (offset+bytes-task.offset) / task.bitmap_granularity - bmp_pos;
+ while (bmp_pos < bmp_end)
+ {
+ if (!(bmp_pos & 7) && bmp_end >= bmp_pos+8)
+ {
+ bit = bit || task.bitmap[bmp_pos >> 3];
+ bmp_pos += 8;
+ }
+ else
+ {
+ bit = bit || ((task.bitmap[bmp_pos >> 3] >> (bmp_pos & 0x7)) & 1);
+ bmp_pos++;
+ }
+ }
+ *pnum = bytes;
+ }
+ if (bit)
+ {
+ *map = offset;
+ *file = bs;
+ }
+ return (bit ? (BDRV_BLOCK_DATA|BDRV_BLOCK_OFFSET_VALID) : 0);
+}
+#endif
+#if QEMU_VERSION_MAJOR == 1 && QEMU_VERSION_MINOR >= 7 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR < 12
+// QEMU 1.7-2.11
+static int64_t coroutine_fn vitastor_co_get_block_status(BlockDriverState *bs,
+ int64_t sector_num, int nb_sectors, int *pnum, BlockDriverState **file)
+{
+ int64_t map = 0;
+ int64_t pnumbytes = 0;
+ int r = vitastor_co_block_status(bs, 1, sector_num*BDRV_SECTOR_SIZE, nb_sectors*BDRV_SECTOR_SIZE, &pnumbytes, &map, &file);
+ *pnum = pnumbytes/BDRV_SECTOR_SIZE;
+ return r;
+}
+#endif
+#endif
+
+#if !( QEMU_VERSION_MAJOR >= 3 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR >= 7 )
+static int coroutine_fn vitastor_co_readv(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *iov)
+{
+ return vitastor_co_preadv(bs, sector_num*BDRV_SECTOR_SIZE, nb_sectors*BDRV_SECTOR_SIZE, iov, 0);
+}
+
+static int coroutine_fn vitastor_co_writev(BlockDriverState *bs, int64_t sector_num, int nb_sectors, QEMUIOVector *iov)
+{
+ return vitastor_co_pwritev(bs, sector_num*BDRV_SECTOR_SIZE, nb_sectors*BDRV_SECTOR_SIZE, iov, 0);
+}
+#endif
+
+static int coroutine_fn vitastor_co_flush(BlockDriverState *bs)
+{
+ VitastorClient *client = bs->opaque;
+ VitastorRPC task;
+ vitastor_co_init_task(bs, &task);
+
+ qemu_mutex_lock(&client->mutex);
+ vitastor_c_sync(client->proxy, vitastor_co_generic_bh_cb, &task);
+ qemu_mutex_unlock(&client->mutex);
+
+ while (!task.complete)
+ {
+ qemu_coroutine_yield();
+ }
+
+ return task.ret;
+}
+
+#if QEMU_VERSION_MAJOR >= 3 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR > 0
+static QemuOptsList vitastor_create_opts = {
+ .name = "vitastor-create-opts",
+ .head = QTAILQ_HEAD_INITIALIZER(vitastor_create_opts.head),
+ .desc = {
+ {
+ .name = BLOCK_OPT_SIZE,
+ .type = QEMU_OPT_SIZE,
+ .help = "Virtual disk size"
+ },
+ { /* end of list */ }
+ }
+};
+#else
+static QEMUOptionParameter vitastor_create_opts[] = {
+ {
+ .name = BLOCK_OPT_SIZE,
+ .type = OPT_SIZE,
+ .help = "Virtual disk size"
+ },
+ { NULL }
+};
+#endif
+
+#if QEMU_VERSION_MAJOR >= 4
+static const char *vitastor_strong_runtime_opts[] = {
+ "inode",
+ "pool",
+ "config-path",
+ "etcd-host",
+ "etcd-prefix",
+
+ NULL
+};
+#endif
+
+static BlockDriver bdrv_vitastor = {
+ .format_name = "vitastor",
+ .protocol_name = "vitastor",
+
+ .instance_size = sizeof(VitastorClient),
+ .bdrv_parse_filename = vitastor_parse_filename,
+
+ .bdrv_has_zero_init = bdrv_has_zero_init_1,
+ .bdrv_get_info = vitastor_get_info,
+ .bdrv_getlength = vitastor_getlength,
+#if QEMU_VERSION_MAJOR >= 3 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR > 2
+ .bdrv_probe_blocksizes = vitastor_probe_blocksizes,
+#endif
+ .bdrv_refresh_limits = vitastor_refresh_limits,
+
+ // FIXME: Implement it along with per-inode statistics
+ //.bdrv_get_allocated_file_size = vitastor_get_allocated_file_size,
+
+ .bdrv_file_open = vitastor_file_open,
+ .bdrv_close = vitastor_close,
+
+ // Option list for the create operation
+#if QEMU_VERSION_MAJOR >= 3 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR > 0
+ .create_opts = &vitastor_create_opts,
+#else
+ .create_options = vitastor_create_opts,
+#endif
+
+ // For qmp_blockdev_create(), used by the qemu monitor / QAPI
+ // Requires patching QAPI IDL, thus unimplemented
+ //.bdrv_co_create = vitastor_co_create,
+
+#if QEMU_VERSION_MAJOR >= 3 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR >= 12
+ // For bdrv_create(), used by qemu-img
+ .bdrv_co_create_opts = vitastor_co_create_opts,
+#endif
+
+#if QEMU_VERSION_MAJOR >= 3
+ .bdrv_co_truncate = vitastor_co_truncate,
+#endif
+
+#if defined VITASTOR_C_API_VERSION && VITASTOR_C_API_VERSION >= 1
+#if QEMU_VERSION_MAJOR >= 3 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR >= 12
+ // For snapshot export
+ .bdrv_co_block_status = vitastor_co_block_status,
+#elif QEMU_VERSION_MAJOR == 1 && QEMU_VERSION_MINOR >= 7 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR < 12
+ .bdrv_co_get_block_status = vitastor_co_get_block_status,
+#endif
+#endif
+
+#if QEMU_VERSION_MAJOR >= 3 || QEMU_VERSION_MAJOR == 2 && QEMU_VERSION_MINOR >= 7
+ .bdrv_co_preadv = vitastor_co_preadv,
+ .bdrv_co_pwritev = vitastor_co_pwritev,
+#else
+ .bdrv_co_readv = vitastor_co_readv,
+ .bdrv_co_writev = vitastor_co_writev,
+#endif
+
+ .bdrv_co_flush_to_disk = vitastor_co_flush,
+
+#if QEMU_VERSION_MAJOR >= 4
+ .strong_runtime_opts = vitastor_strong_runtime_opts,
+#endif
+};
+
+static void vitastor_block_init(void)
+{
+ bdrv_register(&bdrv_vitastor);
+}
+
+block_init(vitastor_block_init);

View File

@@ -14,10 +14,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index 7e19bbff5f..b527e82a82 100644
index b283093e5b..821405fd02 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -450,7 +450,7 @@ static QemuOptsList raw_runtime_opts = {
@@ -552,7 +552,7 @@ static QemuOptsList raw_runtime_opts = {
{
.name = "locking",
.type = QEMU_OPT_STRING,
@@ -26,7 +26,7 @@ index 7e19bbff5f..b527e82a82 100644
},
{
.name = "pr-manager",
@@ -550,7 +550,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
@@ -652,7 +652,7 @@ static int raw_open_common(BlockDriverState *bs, QDict *options,
s->use_lock = false;
break;
case ON_OFF_AUTO_AUTO:

View File

@@ -5,22 +5,21 @@ Subject: [PATCH] PVE: [Config] Adjust network script path to /etc/kvm/
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
include/net/net.h | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
include/net/net.h | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/include/net/net.h b/include/net/net.h
index 39085d9444..487e3ea1b4 100644
index 523136c7ac..c27859b4f6 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -208,8 +208,9 @@ void netdev_add(QemuOpts *opts, Error **errp);
@@ -226,8 +226,8 @@ void netdev_add(QemuOpts *opts, Error **errp);
int net_hub_id_for_client(NetClientState *nc, int *id);
NetClientState *net_hub_port_find(int hub_id);
-#define DEFAULT_NETWORK_SCRIPT "/etc/qemu-ifup"
-#define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/qemu-ifdown"
+#define DEFAULT_NETWORK_SCRIPT "/etc/kvm/kvm-ifup"
+#define DEFAULT_NETWORK_DOWN_SCRIPT "/etc/kvm/kvm-ifdown"
+
-#define DEFAULT_NETWORK_SCRIPT CONFIG_SYSCONFDIR "/qemu-ifup"
-#define DEFAULT_NETWORK_DOWN_SCRIPT CONFIG_SYSCONFDIR "/qemu-ifdown"
+#define DEFAULT_NETWORK_SCRIPT CONFIG_SYSCONFDIR "/kvm/kvm-ifup"
+#define DEFAULT_NETWORK_DOWN_SCRIPT CONFIG_SYSCONFDIR "/kvm/kvm-ifdown"
#define DEFAULT_BRIDGE_HELPER CONFIG_QEMU_HELPERDIR "/qemu-bridge-helper"
#define DEFAULT_BRIDGE_INTERFACE "br0"

View File

@@ -10,10 +10,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/target/i386/cpu.h b/target/i386/cpu.h
index e818fc712a..dd9bf7b3da 100644
index 04f2b790c9..19fdbb981c 100644
--- a/target/i386/cpu.h
+++ b/target/i386/cpu.h
@@ -1954,9 +1954,9 @@ uint64_t cpu_get_tsc(CPUX86State *env);
@@ -2039,9 +2039,9 @@ uint64_t cpu_get_tsc(CPUX86State *env);
#define CPU_RESOLVING_TYPE TYPE_X86_CPU
#ifdef TARGET_X86_64
@@ -24,4 +24,4 @@ index e818fc712a..dd9bf7b3da 100644
+#define TARGET_DEFAULT_CPU_TYPE X86_CPU_TYPE_NAME("kvm32")
#endif
#define cpu_signal_handler cpu_x86_signal_handler
#define cpu_list x86_cpu_list

View File

@@ -9,10 +9,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 9 insertions(+), 6 deletions(-)
diff --git a/ui/spice-core.c b/ui/spice-core.c
index ecc2ec2c55..ca04965ead 100644
index 31974b8d6c..a3acdbd682 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -668,32 +668,35 @@ void qemu_spice_init(void)
@@ -689,32 +689,35 @@ static void qemu_spice_init(void)
if (tls_port) {
x509_dir = qemu_opt_get(opts, "x509-dir");

View File

@@ -9,7 +9,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 11 insertions(+), 4 deletions(-)
diff --git a/block/gluster.c b/block/gluster.c
index 0aa1f2cda4..dcd1ef7ebc 100644
index 592e71b22a..aebfece6eb 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -42,7 +42,7 @@

View File

@@ -1,24 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Alexandre Derumier <aderumier@odiso.com>
Date: Mon, 6 Apr 2020 12:16:34 +0200
Subject: [PATCH] PVE: [Config] smm_available = false
Signed-off-by: Alexandre Derumier <aderumier@odiso.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
hw/i386/x86.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index b82770024c..bd05b3c79a 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -896,7 +896,7 @@ bool x86_machine_is_smm_enabled(X86MachineState *x86ms)
if (tcg_enabled() || qtest_enabled()) {
smm_available = true;
} else if (kvm_enabled()) {
- smm_available = kvm_has_smm();
+ smm_available = false;
}
if (smm_available) {

View File

@@ -18,10 +18,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+)
diff --git a/block/rbd.c b/block/rbd.c
index e637639a07..5717e7258c 100644
index def96292e0..a4b8fb482c 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -651,6 +651,8 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
@@ -820,6 +820,8 @@ static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
rados_conf_set(*cluster, "rbd_cache", "false");
}

View File

@@ -11,10 +11,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
3 files changed, 43 insertions(+)
diff --git a/net/net.c b/net/net.c
index 38778e831d..dabfb482f0 100644
index f0d14dbfc1..6d476c47ef 100644
--- a/net/net.c
+++ b/net/net.c
@@ -1331,6 +1331,33 @@ void hmp_info_network(Monitor *mon, const QDict *qdict)
@@ -1334,6 +1334,33 @@ void hmp_info_network(Monitor *mon, const QDict *qdict)
}
}
@@ -49,10 +49,10 @@ index 38778e831d..dabfb482f0 100644
{
NetClientState *nc;
diff --git a/qapi/net.json b/qapi/net.json
index cebb1b52e3..f6854483b1 100644
index 7fab2e7cd8..74c9a6109e 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -34,6 +34,21 @@
@@ -35,6 +35,21 @@
##
{ 'command': 'set_link', 'data': {'name': 'str', 'up': 'bool'} }
@@ -75,14 +75,14 @@ index cebb1b52e3..f6854483b1 100644
# @netdev_add:
#
diff --git a/qapi/pragma.json b/qapi/pragma.json
index cffae27666..5a3e3de95f 100644
index 3bc0335d1f..7c91ea3685 100644
--- a/qapi/pragma.json
+++ b/qapi/pragma.json
@@ -5,6 +5,7 @@
{ 'pragma': {
# Commands allowed to return a non-dictionary:
'returns-whitelist': [
+ 'get_link_status',
@@ -22,6 +22,7 @@
'system_reset',
'system_wakeup' ],
'command-returns-exceptions': [
+ 'get_link_status',
'human-monitor-command',
'qom-get',
'query-migrate-cache-size',
'query-tpm-models',

View File

@@ -16,7 +16,7 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 9 insertions(+), 1 deletion(-)
diff --git a/block/gluster.c b/block/gluster.c
index dcd1ef7ebc..ac79b4bdb4 100644
index aebfece6eb..3b7ee2f649 100644
--- a/block/gluster.c
+++ b/block/gluster.c
@@ -57,6 +57,7 @@ typedef struct GlusterAIOCB {
@@ -27,7 +27,7 @@ index dcd1ef7ebc..ac79b4bdb4 100644
} GlusterAIOCB;
typedef struct BDRVGlusterState {
@@ -763,8 +764,10 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret,
@@ -752,8 +753,10 @@ static void gluster_finish_aiocb(struct glfs_fd *fd, ssize_t ret,
acb->ret = 0; /* Success */
} else if (ret < 0) {
acb->ret = -errno; /* Read/Write failed */
@@ -39,15 +39,15 @@ index dcd1ef7ebc..ac79b4bdb4 100644
}
aio_co_schedule(acb->aio_context, acb->coroutine);
@@ -1035,6 +1038,7 @@ static coroutine_fn int qemu_gluster_co_pwrite_zeroes(BlockDriverState *bs,
@@ -1022,6 +1025,7 @@ static coroutine_fn int qemu_gluster_co_pwrite_zeroes(BlockDriverState *bs,
acb.ret = 0;
acb.coroutine = qemu_coroutine_self();
acb.aio_context = bdrv_get_aio_context(bs);
+ acb.is_write = true;
ret = glfs_zerofill_async(s->fd, offset, size, gluster_finish_aiocb, &acb);
ret = glfs_zerofill_async(s->fd, offset, bytes, gluster_finish_aiocb, &acb);
if (ret < 0) {
@@ -1216,9 +1220,11 @@ static coroutine_fn int qemu_gluster_co_rw(BlockDriverState *bs,
@@ -1203,9 +1207,11 @@ static coroutine_fn int qemu_gluster_co_rw(BlockDriverState *bs,
acb.aio_context = bdrv_get_aio_context(bs);
if (write) {
@@ -59,7 +59,7 @@ index dcd1ef7ebc..ac79b4bdb4 100644
ret = glfs_preadv_async(s->fd, qiov->iov, qiov->niov, offset, 0,
gluster_finish_aiocb, &acb);
}
@@ -1281,6 +1287,7 @@ static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
@@ -1269,6 +1275,7 @@ static coroutine_fn int qemu_gluster_co_flush_to_disk(BlockDriverState *bs)
acb.ret = 0;
acb.coroutine = qemu_coroutine_self();
acb.aio_context = bdrv_get_aio_context(bs);
@@ -67,11 +67,11 @@ index dcd1ef7ebc..ac79b4bdb4 100644
ret = glfs_fsync_async(s->fd, gluster_finish_aiocb, &acb);
if (ret < 0) {
@@ -1327,6 +1334,7 @@ static coroutine_fn int qemu_gluster_co_pdiscard(BlockDriverState *bs,
@@ -1317,6 +1324,7 @@ static coroutine_fn int qemu_gluster_co_pdiscard(BlockDriverState *bs,
acb.ret = 0;
acb.coroutine = qemu_coroutine_self();
acb.aio_context = bdrv_get_aio_context(bs);
+ acb.is_write = true;
ret = glfs_discard_async(s->fd, offset, size, gluster_finish_aiocb, &acb);
ret = glfs_discard_async(s->fd, offset, bytes, gluster_finish_aiocb, &acb);
if (ret < 0) {

View File

@@ -9,10 +9,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/qemu-img.c b/qemu-img.c
index 821cbf610e..667c540a89 100644
index f036a1d428..080ad9bca7 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -2821,7 +2821,8 @@ static int img_info(int argc, char **argv)
@@ -2989,7 +2989,8 @@ static int img_info(int argc, char **argv)
list = collect_image_info_list(image_opts, filename, fmt, chain,
force_share);
if (!list) {

View File

@@ -33,14 +33,14 @@ Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
qemu-img-cmds.hx | 4 +-
qemu-img.c | 192 +++++++++++++++++++++++++++++------------------
2 files changed, 122 insertions(+), 74 deletions(-)
qemu-img.c | 187 +++++++++++++++++++++++++++++------------------
2 files changed, 119 insertions(+), 72 deletions(-)
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index c9c54de1df..0f98033658 100644
index 72bcdcfbfa..0b2999f3ab 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -51,9 +51,9 @@ SRST
@@ -58,9 +58,9 @@ SRST
ERST
DEF("dd", img_dd,
@@ -53,10 +53,10 @@ index c9c54de1df..0f98033658 100644
DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index 667c540a89..6b7d1fcb51 100644
index 080ad9bca7..1f457d9e80 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4444,10 +4444,12 @@ out:
@@ -4805,10 +4805,12 @@ static int img_bitmap(int argc, char **argv)
#define C_IF 04
#define C_OF 010
#define C_SKIP 020
@@ -69,7 +69,7 @@ index 667c540a89..6b7d1fcb51 100644
};
struct DdIo {
@@ -4526,6 +4528,20 @@ static int img_dd_skip(const char *arg,
@@ -4884,6 +4886,19 @@ static int img_dd_skip(const char *arg,
return 0;
}
@@ -77,10 +77,9 @@ index 667c540a89..6b7d1fcb51 100644
+ struct DdIo *in, struct DdIo *out,
+ struct DdInfo *dd)
+{
+ dd->osize = cvtnum(arg);
+ dd->osize = cvtnum("size", arg);
+
+ if (dd->osize < 0) {
+ error_report("invalid number: '%s'", arg);
+ return 1;
+ }
+
@@ -90,7 +89,7 @@ index 667c540a89..6b7d1fcb51 100644
static int img_dd(int argc, char **argv)
{
int ret = 0;
@@ -4566,6 +4582,7 @@ static int img_dd(int argc, char **argv)
@@ -4924,6 +4939,7 @@ static int img_dd(int argc, char **argv)
{ "if", img_dd_if, C_IF },
{ "of", img_dd_of, C_OF },
{ "skip", img_dd_skip, C_SKIP },
@@ -98,7 +97,7 @@ index 667c540a89..6b7d1fcb51 100644
{ NULL, NULL, 0 }
};
const struct option long_options[] = {
@@ -4644,8 +4661,13 @@ static int img_dd(int argc, char **argv)
@@ -4999,91 +5015,112 @@ static int img_dd(int argc, char **argv)
arg = NULL;
}
@@ -106,53 +105,30 @@ index 667c540a89..6b7d1fcb51 100644
- error_report("Must specify both input and output files");
+ if (!(dd.flags & C_IF) && (!fmt || strcmp(fmt, "raw") != 0)) {
+ error_report("Input format must be raw when readin from stdin");
+ ret = -1;
+ goto out;
+ }
ret = -1;
goto out;
}
-
- blk1 = img_open(image_opts, in.filename, fmt, 0, false, false,
- force_share);
-
- if (!blk1) {
+ if (!(dd.flags & C_OF) && strcmp(out_fmt, "raw") != 0) {
+ error_report("Output format must be raw when writing to stdout");
ret = -1;
goto out;
}
@@ -4657,85 +4679,101 @@ static int img_dd(int argc, char **argv)
goto out;
}
- blk1 = img_open(image_opts, in.filename, fmt, 0, false, false,
- force_share);
+ if (dd.flags & C_IF) {
+ blk1 = img_open(image_opts, in.filename, fmt, 0, false, false,
+ force_share);
- if (!blk1) {
- ret = -1;
- goto out;
+ if (!blk1) {
+ ret = -1;
+ goto out;
+ }
}
- drv = bdrv_find_format(out_fmt);
- if (!drv) {
- error_report("Unknown file format");
+ if (dd.flags & C_OSIZE) {
+ size = dd.osize;
+ } else if (dd.flags & C_IF) {
+ size = blk_getlength(blk1);
+ if (size < 0) {
+ error_report("Failed to get size for '%s'", in.filename);
+ ret = -1;
+ goto out;
+ }
+ } else if (dd.flags & C_COUNT) {
+ size = dd.count * in.bsz;
+ } else {
+ error_report("Output size must be known when reading from stdin");
ret = -1;
goto out;
}
- ret = -1;
- goto out;
- }
- proto_drv = bdrv_find_protocol(out.filename, true, &local_err);
+ if (dd.flags & C_IF) {
+ blk1 = img_open(image_opts, in.filename, fmt, 0, false, false,
+ force_share);
- if (!proto_drv) {
- error_report_err(local_err);
@@ -170,14 +146,50 @@ index 667c540a89..6b7d1fcb51 100644
- proto_drv->format_name);
- ret = -1;
- goto out;
+ if (!(dd.flags & C_OSIZE) && dd.flags & C_COUNT && dd.count <= INT64_MAX / in.bsz &&
+ dd.count * in.bsz < size) {
+ size = dd.count * in.bsz;
+ if (!blk1) {
+ ret = -1;
+ goto out;
+ }
}
- create_opts = qemu_opts_append(create_opts, drv->create_opts);
- create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
- opts = qemu_opts_create(create_opts, NULL, 0, &error_abort);
-
- size = blk_getlength(blk1);
- if (size < 0) {
- error_report("Failed to get size for '%s'", in.filename);
+ if (dd.flags & C_OSIZE) {
+ size = dd.osize;
+ } else if (dd.flags & C_IF) {
+ size = blk_getlength(blk1);
+ if (size < 0) {
+ error_report("Failed to get size for '%s'", in.filename);
+ ret = -1;
+ goto out;
+ }
+ } else if (dd.flags & C_COUNT) {
+ size = dd.count * in.bsz;
+ } else {
+ error_report("Output size must be known when reading from stdin");
ret = -1;
goto out;
}
- if (dd.flags & C_COUNT && dd.count <= INT64_MAX / in.bsz &&
+ if (!(dd.flags & C_OSIZE) && dd.flags & C_COUNT && dd.count <= INT64_MAX / in.bsz &&
dd.count * in.bsz < size) {
size = dd.count * in.bsz;
}
- /* Overflow means the specified offset is beyond input image's size */
- if (dd.flags & C_SKIP && (in.offset > INT64_MAX / in.bsz ||
- size < in.bsz * in.offset)) {
- qemu_opt_set_number(opts, BLOCK_OPT_SIZE, 0, &error_abort);
- } else {
- qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
- size - in.bsz * in.offset, &error_abort);
- }
+ if (dd.flags & C_OF) {
+ drv = bdrv_find_format(out_fmt);
+ if (!drv) {
@@ -187,9 +199,11 @@ index 667c540a89..6b7d1fcb51 100644
+ }
+ proto_drv = bdrv_find_protocol(out.filename, true, &local_err);
- size = blk_getlength(blk1);
- if (size < 0) {
- error_report("Failed to get size for '%s'", in.filename);
- ret = bdrv_create(drv, out.filename, opts, &local_err);
- if (ret < 0) {
- error_reportf_err(local_err,
- "%s: error while creating output image: ",
- out.filename);
- ret = -1;
- goto out;
- }
@@ -213,20 +227,18 @@ index 667c540a89..6b7d1fcb51 100644
+ create_opts = qemu_opts_append(create_opts, drv->create_opts);
+ create_opts = qemu_opts_append(create_opts, proto_drv->create_opts);
- if (dd.flags & C_COUNT && dd.count <= INT64_MAX / in.bsz &&
- dd.count * in.bsz < size) {
- size = dd.count * in.bsz;
- }
- /* TODO, we can't honour --image-opts for the target,
- * since it needs to be given in a format compatible
- * with the bdrv_create() call above which does not
- * support image-opts style.
- */
- blk2 = img_open_file(out.filename, NULL, out_fmt, BDRV_O_RDWR,
- false, false, false);
+ opts = qemu_opts_create(create_opts, NULL, 0, &error_abort);
- /* Overflow means the specified offset is beyond input image's size */
- if (dd.flags & C_SKIP && (in.offset > INT64_MAX / in.bsz ||
- size < in.bsz * in.offset)) {
- qemu_opt_set_number(opts, BLOCK_OPT_SIZE, 0, &error_abort);
- } else {
- qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
- size - in.bsz * in.offset, &error_abort);
- }
- if (!blk2) {
- ret = -1;
- goto out;
+ /* Overflow means the specified offset is beyond input image's size */
+ if (dd.flags & C_OSIZE) {
+ qemu_opt_set_number(opts, BLOCK_OPT_SIZE, size, &error_abort);
@@ -237,15 +249,7 @@ index 667c540a89..6b7d1fcb51 100644
+ qemu_opt_set_number(opts, BLOCK_OPT_SIZE,
+ size - in.bsz * in.offset, &error_abort);
+ }
- ret = bdrv_create(drv, out.filename, opts, &local_err);
- if (ret < 0) {
- error_reportf_err(local_err,
- "%s: error while creating output image: ",
- out.filename);
- ret = -1;
- goto out;
- }
+
+ ret = bdrv_create(drv, out.filename, opts, &local_err);
+ if (ret < 0) {
+ error_reportf_err(local_err,
@@ -254,14 +258,7 @@ index 667c540a89..6b7d1fcb51 100644
+ ret = -1;
+ goto out;
+ }
- /* TODO, we can't honour --image-opts for the target,
- * since it needs to be given in a format compatible
- * with the bdrv_create() call above which does not
- * support image-opts style.
- */
- blk2 = img_open_file(out.filename, NULL, out_fmt, BDRV_O_RDWR,
- false, false, false);
+
+ /* TODO, we can't honour --image-opts for the target,
+ * since it needs to be given in a format compatible
+ * with the bdrv_create() call above which does not
@@ -269,10 +266,7 @@ index 667c540a89..6b7d1fcb51 100644
+ */
+ blk2 = img_open_file(out.filename, NULL, out_fmt, BDRV_O_RDWR,
+ false, false, false);
- if (!blk2) {
- ret = -1;
- goto out;
+
+ if (!blk2) {
+ ret = -1;
+ goto out;
@@ -280,7 +274,7 @@ index 667c540a89..6b7d1fcb51 100644
}
if (dd.flags & C_SKIP && (in.offset > INT64_MAX / in.bsz ||
@@ -4753,11 +4791,17 @@ static int img_dd(int argc, char **argv)
@@ -5101,11 +5138,17 @@ static int img_dd(int argc, char **argv)
for (out_pos = 0; in_pos < size; block_count++) {
int in_ret, out_ret;
@@ -302,7 +296,7 @@ index 667c540a89..6b7d1fcb51 100644
}
if (in_ret < 0) {
error_report("error while reading from input image file: %s",
@@ -4767,9 +4811,13 @@ static int img_dd(int argc, char **argv)
@@ -5115,9 +5158,13 @@ static int img_dd(int argc, char **argv)
}
in_pos += in_ret;

View File

@@ -11,14 +11,14 @@ an expected end of input.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
qemu-img.c | 29 ++++++++++++++++++++++++++---
1 file changed, 26 insertions(+), 3 deletions(-)
qemu-img.c | 28 +++++++++++++++++++++++++---
1 file changed, 25 insertions(+), 3 deletions(-)
diff --git a/qemu-img.c b/qemu-img.c
index 6b7d1fcb51..17393b2f53 100644
index 1f457d9e80..d9e8a8c4d4 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4445,11 +4445,13 @@ out:
@@ -4806,11 +4806,13 @@ static int img_bitmap(int argc, char **argv)
#define C_OF 010
#define C_SKIP 020
#define C_OSIZE 040
@@ -32,7 +32,7 @@ index 6b7d1fcb51..17393b2f53 100644
};
struct DdIo {
@@ -4542,6 +4544,20 @@ static int img_dd_osize(const char *arg,
@@ -4899,6 +4901,19 @@ static int img_dd_osize(const char *arg,
return 0;
}
@@ -40,10 +40,9 @@ index 6b7d1fcb51..17393b2f53 100644
+ struct DdIo *in, struct DdIo *out,
+ struct DdInfo *dd)
+{
+ dd->isize = cvtnum(arg);
+ dd->isize = cvtnum("size", arg);
+
+ if (dd->isize < 0) {
+ error_report("invalid number: '%s'", arg);
+ return 1;
+ }
+
@@ -53,7 +52,7 @@ index 6b7d1fcb51..17393b2f53 100644
static int img_dd(int argc, char **argv)
{
int ret = 0;
@@ -4556,12 +4572,14 @@ static int img_dd(int argc, char **argv)
@@ -4913,12 +4928,14 @@ static int img_dd(int argc, char **argv)
int c, i;
const char *out_fmt = "raw";
const char *fmt = NULL;
@@ -69,7 +68,7 @@ index 6b7d1fcb51..17393b2f53 100644
};
struct DdIo in = {
.bsz = 512, /* Block size is by default 512 bytes */
@@ -4583,6 +4601,7 @@ static int img_dd(int argc, char **argv)
@@ -4940,6 +4957,7 @@ static int img_dd(int argc, char **argv)
{ "of", img_dd_of, C_OF },
{ "skip", img_dd_skip, C_SKIP },
{ "osize", img_dd_osize, C_OSIZE },
@@ -77,7 +76,7 @@ index 6b7d1fcb51..17393b2f53 100644
{ NULL, NULL, 0 }
};
const struct option long_options[] = {
@@ -4789,14 +4808,18 @@ static int img_dd(int argc, char **argv)
@@ -5136,14 +5154,18 @@ static int img_dd(int argc, char **argv)
in.buf = g_new(uint8_t, in.bsz);

View File

@@ -0,0 +1,121 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Alexandre Derumier <aderumier@odiso.com>
Date: Mon, 6 Apr 2020 12:16:42 +0200
Subject: [PATCH] PVE: [Up] qemu-img dd: add -n skip_create
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: fix getopt-string + add documentation]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
docs/tools/qemu-img.rst | 11 ++++++++++-
qemu-img-cmds.hx | 4 ++--
qemu-img.c | 23 ++++++++++++++---------
3 files changed, 26 insertions(+), 12 deletions(-)
diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index d663dd92bd..a49badb158 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -208,6 +208,10 @@ Parameters to convert subcommand:
Parameters to dd subcommand:
+.. option:: -n
+
+ Skip the creation of the target volume
+
.. program:: qemu-img-dd
.. option:: bs=BLOCK_SIZE
@@ -488,7 +492,7 @@ Command description:
it doesn't need to be specified separately in this case.
-.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] if=INPUT of=OUTPUT
+.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [-n] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] if=INPUT of=OUTPUT
dd copies from *INPUT* file to *OUTPUT* file converting it from
*FMT* format to *OUTPUT_FMT* format.
@@ -499,6 +503,11 @@ Command description:
The size syntax is similar to :manpage:`dd(1)`'s size syntax.
+ If the ``-n`` option is specified, the target volume creation will be
+ skipped. This is useful for formats such as ``rbd`` if the target
+ volume has already been created with site specific options that cannot
+ be supplied through ``qemu-img``.
+
.. option:: info [--object OBJECTDEF] [--image-opts] [-f FMT] [--output=OFMT] [--backing-chain] [-U] FILENAME
Give information about the disk image *FILENAME*. Use it in
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index 0b2999f3ab..f3b2b1b4de 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -58,9 +58,9 @@ SRST
ERST
DEF("dd", img_dd,
- "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [bs=block_size] [count=blocks] [skip=blocks] [osize=output_size] if=input of=output")
+ "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [-n] [bs=block_size] [count=blocks] [skip=blocks] [osize=output_size] if=input of=output")
SRST
-.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] [osize=OUTPUT_SIZE] if=INPUT of=OUTPUT
+.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [-n] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] [osize=OUTPUT_SIZE] if=INPUT of=OUTPUT
ERST
DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index d9e8a8c4d4..015d6d2ce4 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4930,7 +4930,7 @@ static int img_dd(int argc, char **argv)
const char *fmt = NULL;
int64_t size = 0, readsize = 0;
int64_t block_count = 0, out_pos, in_pos;
- bool force_share = false;
+ bool force_share = false, skip_create = false;
struct DdInfo dd = {
.flags = 0,
.count = 0,
@@ -4968,7 +4968,7 @@ static int img_dd(int argc, char **argv)
{ 0, 0, 0, 0 }
};
- while ((c = getopt_long(argc, argv, ":hf:O:U", long_options, NULL))) {
+ while ((c = getopt_long(argc, argv, ":hf:O:Un", long_options, NULL))) {
if (c == EOF) {
break;
}
@@ -4988,6 +4988,9 @@ static int img_dd(int argc, char **argv)
case 'h':
help();
break;
+ case 'n':
+ skip_create = true;
+ break;
case 'U':
force_share = true;
break;
@@ -5118,13 +5121,15 @@ static int img_dd(int argc, char **argv)
size - in.bsz * in.offset, &error_abort);
}
- ret = bdrv_create(drv, out.filename, opts, &local_err);
- if (ret < 0) {
- error_reportf_err(local_err,
- "%s: error while creating output image: ",
- out.filename);
- ret = -1;
- goto out;
+ if (!skip_create) {
+ ret = bdrv_create(drv, out.filename, opts, &local_err);
+ if (ret < 0) {
+ error_reportf_err(local_err,
+ "%s: error while creating output image: ",
+ out.filename);
+ ret = -1;
+ goto out;
+ }
}
/* TODO, we can't honour --image-opts for the target,

View File

@@ -1,65 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Alexandre Derumier <aderumier@odiso.com>
Date: Mon, 6 Apr 2020 12:16:42 +0200
Subject: [PATCH] PVE: [Up] qemu-img dd : add -n skip_create
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
qemu-img.c | 23 ++++++++++++++---------
1 file changed, 14 insertions(+), 9 deletions(-)
diff --git a/qemu-img.c b/qemu-img.c
index 17393b2f53..574bb3c73d 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4574,7 +4574,7 @@ static int img_dd(int argc, char **argv)
const char *fmt = NULL;
int64_t size = 0, readsize = 0;
int64_t block_count = 0, out_pos, in_pos;
- bool force_share = false;
+ bool force_share = false, skip_create = false;
struct DdInfo dd = {
.flags = 0,
.count = 0,
@@ -4612,7 +4612,7 @@ static int img_dd(int argc, char **argv)
{ 0, 0, 0, 0 }
};
- while ((c = getopt_long(argc, argv, ":hf:O:U", long_options, NULL))) {
+ while ((c = getopt_long(argc, argv, ":hf:O:U:n", long_options, NULL))) {
if (c == EOF) {
break;
}
@@ -4632,6 +4632,9 @@ static int img_dd(int argc, char **argv)
case 'h':
help();
break;
+ case 'n':
+ skip_create = true;
+ break;
case 'U':
force_share = true;
break;
@@ -4772,13 +4775,15 @@ static int img_dd(int argc, char **argv)
size - in.bsz * in.offset, &error_abort);
}
- ret = bdrv_create(drv, out.filename, opts, &local_err);
- if (ret < 0) {
- error_reportf_err(local_err,
- "%s: error while creating output image: ",
- out.filename);
- ret = -1;
- goto out;
+ if (!skip_create) {
+ ret = bdrv_create(drv, out.filename, opts, &local_err);
+ if (ret < 0) {
+ error_reportf_err(local_err,
+ "%s: error while creating output image: ",
+ out.filename);
+ ret = -1;
+ goto out;
+ }
}
/* TODO, we can't honour --image-opts for the target,

View File

@@ -10,14 +10,14 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
hw/virtio/virtio-balloon.c | 33 +++++++++++++++++++++++++++++++--
monitor/hmp-cmds.c | 30 +++++++++++++++++++++++++++++-
qapi/misc.json | 22 +++++++++++++++++++++-
qapi/machine.json | 22 +++++++++++++++++++++-
3 files changed, 81 insertions(+), 4 deletions(-)
diff --git a/hw/virtio/virtio-balloon.c b/hw/virtio/virtio-balloon.c
index a4729f7fc9..97c1c16ccf 100644
index 9a4f491b54..1faa16234e 100644
--- a/hw/virtio/virtio-balloon.c
+++ b/hw/virtio/virtio-balloon.c
@@ -713,8 +713,37 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
@@ -812,8 +812,37 @@ static uint64_t virtio_balloon_get_features(VirtIODevice *vdev, uint64_t f,
static void virtio_balloon_stat(void *opaque, BalloonInfo *info)
{
VirtIOBalloon *dev = opaque;
@@ -58,10 +58,10 @@ index a4729f7fc9..97c1c16ccf 100644
static void virtio_balloon_to_target(void *opaque, ram_addr_t target)
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 9b94e67879..0c6f6ff331 100644
index 2e91ccb738..e9fa9af6bd 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -653,7 +653,35 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict)
@@ -696,7 +696,35 @@ void hmp_info_balloon(Monitor *mon, const QDict *qdict)
return;
}
@@ -98,13 +98,13 @@ index 9b94e67879..0c6f6ff331 100644
qapi_free_BalloonInfo(info);
}
diff --git a/qapi/misc.json b/qapi/misc.json
index 99b90ac80b..e2a6678eae 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -225,10 +225,30 @@
#
# @actual: the number of bytes the balloon currently contains
diff --git a/qapi/machine.json b/qapi/machine.json
index 067e3f5378..91f3be6f44 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -1018,10 +1018,30 @@
# @actual: the logical size of the VM in bytes
# Formula used: logical_vm_size = vm_ram_size - balloon_size
#
+# @last_update: time when stats got updated from guest
+#
@@ -122,7 +122,7 @@ index 99b90ac80b..e2a6678eae 100644
+#
+# @max_mem: amount of memory (in bytes) assigned to the guest
+#
# Since: 0.14.0
# Since: 0.14
#
##
-{ 'struct': 'BalloonInfo', 'data': {'actual': 'int' } }

View File

@@ -13,10 +13,10 @@ Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
2 files changed, 9 insertions(+), 1 deletion(-)
diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index eed5aeb2f7..1953633e82 100644
index 4f4ab30f8c..76fff60a6b 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -230,6 +230,12 @@ MachineInfoList *qmp_query_machines(Error **errp)
@@ -99,6 +99,12 @@ MachineInfoList *qmp_query_machines(Error **errp)
info->hotpluggable_cpus = mc->has_hotpluggable_cpus;
info->numa_mem_supported = mc->numa_mem_supported;
info->deprecated = !!mc->deprecation_reason;
@@ -30,24 +30,24 @@ index eed5aeb2f7..1953633e82 100644
info->default_cpu_type = g_strdup(mc->default_cpu_type);
info->has_default_cpu_type = true;
diff --git a/qapi/machine.json b/qapi/machine.json
index ff7b5032e3..f6cf28f9fd 100644
index 91f3be6f44..0905618e25 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -340,6 +340,8 @@
@@ -141,6 +141,8 @@
#
# @is-default: whether the machine is default
#
+# @is-current: whether this machine is currently used
+#
# @cpu-max: maximum number of CPUs supported by the machine type
# (since 1.5.0)
# (since 1.5)
#
@@ -359,7 +361,7 @@
@@ -162,7 +164,7 @@
##
{ 'struct': 'MachineInfo',
'data': { 'name': 'str', '*alias': 'str',
- '*is-default': 'bool', 'cpu-max': 'int',
+ '*is-default': 'bool', '*is-current': 'bool', 'cpu-max': 'int',
'hotpluggable-cpus': 'bool', 'numa-mem-supported': 'bool',
'deprecated': 'bool', '*default-cpu-type': 'str' } }
'deprecated': 'bool', '*default-cpu-type': 'str',
'*default-ram-id': 'str' } }

View File

@@ -12,29 +12,29 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2 files changed, 8 insertions(+)
diff --git a/qapi/ui.json b/qapi/ui.json
index e16e98a060..feda6ef090 100644
index 4244c62c30..f946fbd8c1 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -213,11 +213,14 @@
@@ -333,11 +333,14 @@
#
# @channels: a list of @SpiceChannel for each active spice channel
#
+# @ticket: The last ticket set with set_password
+#
# Since: 0.14.0
# Since: 0.14
##
{ 'struct': 'SpiceInfo',
'data': {'enabled': 'bool', 'migrated': 'bool', '*host': 'str', '*port': 'int',
'*tls-port': 'int', '*auth': 'str', '*compiled-version': 'str',
+ '*ticket': 'str',
'mouse-mode': 'SpiceQueryMouseMode', '*channels': ['SpiceChannel']},
'if': 'defined(CONFIG_SPICE)' }
'if': 'CONFIG_SPICE' }
diff --git a/ui/spice-core.c b/ui/spice-core.c
index ca04965ead..243466c13d 100644
index a3acdbd682..756776778d 100644
--- a/ui/spice-core.c
+++ b/ui/spice-core.c
@@ -539,6 +539,11 @@ SpiceInfo *qmp_query_spice(Error **errp)
@@ -534,6 +534,11 @@ static SpiceInfo *qmp_query_spice_real(Error **errp)
micro = SPICE_SERVER_VERSION & 0xff;
info->compiled_version = g_strdup_printf("%d.%d.%d", major, minor, micro);

View File

@@ -1,45 +1,49 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:46 +0200
Subject: [PATCH] PVE: internal snapshot async
Subject: [PATCH] PVE: add savevm-async for background state snapshots
Truncate at 1024 boundary (Fabian Ebner will send a patch for stable)
Put qemu_savevm_state_{header,setup} into the main loop and the rest
of the iteration into a coroutine. The former need to lock the
iothread (and we can't unlock it in the coroutine), and the latter
can't deal with being in a separate thread, so a coroutine it must
be.
Truncate output file at 1024 boundary.
Do not block the VM and save the state on aborting a snapshot, as the
snapshot will be invalid anyway.
Also, when aborting, wait for the target file to be closed, otherwise a
client might run into race-conditions when trying to remove the file
still opened by QEMU.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[improve aborting]
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
Makefile.objs | 1 +
hmp-commands-info.hx | 13 +
hmp-commands.hx | 32 +++
include/migration/snapshot.h | 1 +
hmp-commands.hx | 33 ++
include/migration/snapshot.h | 2 +
include/monitor/hmp.h | 5 +
monitor/hmp-cmds.c | 57 +++++
qapi/migration.json | 34 +++
qapi/misc.json | 32 +++
migration/meson.build | 1 +
migration/savevm-async.c | 598 +++++++++++++++++++++++++++++++++++
monitor/hmp-cmds.c | 57 ++++
qapi/migration.json | 34 ++
qapi/misc.json | 32 ++
qemu-options.hx | 12 +
savevm-async.c | 464 +++++++++++++++++++++++++++++++++++
softmmu/vl.c | 10 +
11 files changed, 661 insertions(+)
create mode 100644 savevm-async.c
11 files changed, 797 insertions(+)
create mode 100644 migration/savevm-async.c
diff --git a/Makefile.objs b/Makefile.objs
index a7c967633a..d0b4dde836 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -47,6 +47,7 @@ common-obj-y += bootdevice.o iothread.o
common-obj-y += dump/
common-obj-y += job-qmp.o
common-obj-y += monitor/
+common-obj-y += savevm-async.o
common-obj-y += net/
common-obj-y += qdev-monitor.o
common-obj-$(CONFIG_WIN32) += os-win32.o
diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index ca5198438d..89fea71972 100644
index 407a1da800..245f8acc55 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -579,6 +579,19 @@ SRST
Show current migration xbzrle cache size.
@@ -536,6 +536,19 @@ SRST
Show current migration parameters.
ERST
+ {
@@ -59,13 +63,13 @@ index ca5198438d..89fea71972 100644
.name = "balloon",
.args_type = "",
diff --git a/hmp-commands.hx b/hmp-commands.hx
index 7f0f3974ad..81fe305d07 100644
index 5efb47fc32..1ad13b668b 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -1814,3 +1814,35 @@ ERST
.flags = "p",
@@ -1746,3 +1746,36 @@ ERST
"\n\t\t\t -b to specify dirty bitmap as method of calculation)",
.cmd = hmp_calc_dirty_rate,
},
+
+ {
+ .name = "savevm-start",
@@ -96,24 +100,25 @@ index 7f0f3974ad..81fe305d07 100644
+ .args_type = "",
+ .params = "",
+ .help = "Resume VM after snaphot.",
+ .cmd = hmp_savevm_end,
+ .cmd = hmp_savevm_end,
+ .coroutine = true,
+ },
diff --git a/include/migration/snapshot.h b/include/migration/snapshot.h
index c85b6ec75b..4411b7121d 100644
index e72083b117..c846d37806 100644
--- a/include/migration/snapshot.h
+++ b/include/migration/snapshot.h
@@ -17,5 +17,6 @@
@@ -61,4 +61,6 @@ bool delete_snapshot(const char *name,
bool has_devices, strList *devices,
Error **errp);
int save_snapshot(const char *name, Error **errp);
int load_snapshot(const char *name, Error **errp);
+int load_snapshot_from_blockdev(const char *filename, Error **errp);
+
#endif
diff --git a/include/monitor/hmp.h b/include/monitor/hmp.h
index e33ca5a911..601827d43f 100644
index 96d014826a..3a39ba41b5 100644
--- a/include/monitor/hmp.h
+++ b/include/monitor/hmp.h
@@ -25,6 +25,7 @@ void hmp_info_status(Monitor *mon, const QDict *qdict);
@@ -26,6 +26,7 @@ void hmp_info_status(Monitor *mon, const QDict *qdict);
void hmp_info_uuid(Monitor *mon, const QDict *qdict);
void hmp_info_chardev(Monitor *mon, const QDict *qdict);
void hmp_info_mice(Monitor *mon, const QDict *qdict);
@@ -121,7 +126,7 @@ index e33ca5a911..601827d43f 100644
void hmp_info_migrate(Monitor *mon, const QDict *qdict);
void hmp_info_migrate_capabilities(Monitor *mon, const QDict *qdict);
void hmp_info_migrate_parameters(Monitor *mon, const QDict *qdict);
@@ -83,6 +84,10 @@ void hmp_netdev_add(Monitor *mon, const QDict *qdict);
@@ -80,6 +81,10 @@ void hmp_netdev_add(Monitor *mon, const QDict *qdict);
void hmp_netdev_del(Monitor *mon, const QDict *qdict);
void hmp_getfd(Monitor *mon, const QDict *qdict);
void hmp_closefd(Monitor *mon, const QDict *qdict);
@@ -132,191 +137,24 @@ index e33ca5a911..601827d43f 100644
void hmp_sendkey(Monitor *mon, const QDict *qdict);
void hmp_screendump(Monitor *mon, const QDict *qdict);
void hmp_chardev_add(Monitor *mon, const QDict *qdict);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index 0c6f6ff331..39c7474cea 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1876,6 +1876,63 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
hmp_handle_error(mon, err);
}
+void hmp_savevm_start(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+ const char *statefile = qdict_get_try_str(qdict, "statefile");
+
+ qmp_savevm_start(statefile != NULL, statefile, &errp);
+ hmp_handle_error(mon, errp);
+}
+
+void hmp_snapshot_drive(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+ const char *name = qdict_get_str(qdict, "name");
+ const char *device = qdict_get_str(qdict, "device");
+
+ qmp_snapshot_drive(device, name, &errp);
+ hmp_handle_error(mon, errp);
+}
+
+void hmp_delete_drive_snapshot(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+ const char *name = qdict_get_str(qdict, "name");
+ const char *device = qdict_get_str(qdict, "device");
+
+ qmp_delete_drive_snapshot(device, name, &errp);
+ hmp_handle_error(mon, errp);
+}
+
+void hmp_savevm_end(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+
+ qmp_savevm_end(&errp);
+ hmp_handle_error(mon, errp);
+}
+
+void hmp_info_savevm(Monitor *mon, const QDict *qdict)
+{
+ SaveVMInfo *info;
+ info = qmp_query_savevm(NULL);
+
+ if (info->has_status) {
+ monitor_printf(mon, "savevm status: %s\n", info->status);
+ monitor_printf(mon, "total time: %" PRIu64 " milliseconds\n",
+ info->total_time);
+ } else {
+ monitor_printf(mon, "savevm status: not running\n");
+ }
+ if (info->has_bytes) {
+ monitor_printf(mon, "Bytes saved: %"PRIu64"\n", info->bytes);
+ }
+ if (info->has_error) {
+ monitor_printf(mon, "Error: %s\n", info->error);
+ }
+}
+
void hmp_info_iothreads(Monitor *mon, const QDict *qdict)
{
IOThreadInfoList *info_list = qmp_query_iothreads(NULL);
diff --git a/qapi/migration.json b/qapi/migration.json
index eca2981d0a..081663d67a 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -222,6 +222,40 @@
'*compression': 'CompressionStats',
'*socket-address': ['SocketAddress'] } }
+##
+# @SaveVMInfo:
+#
+# Information about current migration process.
+#
+# @status: string describing the current savevm status.
+# This can be 'active', 'completed', 'failed'.
+# If this field is not returned, no savevm process
+# has been initiated
+#
+# @error: string containing error message is status is failed.
+#
+# @total-time: total amount of milliseconds since savevm started.
+# If savevm has ended, it returns the total save time
+#
+# @bytes: total amount of data transfered
+#
+# Since: 1.3
+##
+{ 'struct': 'SaveVMInfo',
+ 'data': {'*status': 'str', '*error': 'str',
+ '*total-time': 'int', '*bytes': 'int'} }
+
+##
+# @query-savevm:
+#
+# Returns information about current savevm process.
+#
+# Returns: @SaveVMInfo
+#
+# Since: 1.3
+##
+{ 'command': 'query-savevm', 'returns': 'SaveVMInfo' }
+
##
# @query-migrate:
#
diff --git a/qapi/misc.json b/qapi/misc.json
index e2a6678eae..0868de22b7 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -1165,6 +1165,38 @@
##
{ 'command': 'query-fdsets', 'returns': ['FdsetInfo'] }
+##
+# @savevm-start:
+#
+# Prepare for snapshot and halt VM. Save VM state to statefile.
+#
+##
+{ 'command': 'savevm-start', 'data': { '*statefile': 'str' } }
+
+##
+# @snapshot-drive:
+#
+# Create an internal drive snapshot.
+#
+##
+{ 'command': 'snapshot-drive', 'data': { 'device': 'str', 'name': 'str' } }
+
+##
+# @delete-drive-snapshot:
+#
+# Delete a drive snapshot.
+#
+##
+{ 'command': 'delete-drive-snapshot', 'data': { 'device': 'str', 'name': 'str' } }
+
+##
+# @savevm-end:
+#
+# Resume VM after a snapshot.
+#
+##
+{ 'command': 'savevm-end' }
+
##
# @AcpiTableOptions:
#
diff --git a/qemu-options.hx b/qemu-options.hx
index 292d4e7c0c..55eef64ddf 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -3832,6 +3832,18 @@ SRST
Start right away with a saved state (``loadvm`` in monitor)
ERST
+DEF("loadstate", HAS_ARG, QEMU_OPTION_loadstate, \
+ "-loadstate file\n" \
+ " start right away with a saved state\n",
+ QEMU_ARCH_ALL)
+SRST
+``-loadstate file``
+ Start right away with a saved state. This option does not rollback
+ disk state like @code{loadvm}, so user must make sure that disk
+ have correct state. @var{file} can be any valid device URL. See the section
+ for "Device URL Syntax" for more information.
+ERST
+
#ifndef _WIN32
DEF("daemonize", 0, QEMU_OPTION_daemonize, \
"-daemonize daemonize QEMU after initializing\n", QEMU_ARCH_ALL)
diff --git a/savevm-async.c b/savevm-async.c
diff --git a/migration/meson.build b/migration/meson.build
index f8714dcb15..ea9aedeefc 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -23,6 +23,7 @@ softmmu_ss.add(files(
'multifd-zlib.c',
'postcopy-ram.c',
'savevm.c',
+ 'savevm-async.c',
'socket.c',
'tls.c',
), gnutls)
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
new file mode 100644
index 0000000000..54ceeae26c
index 0000000000..79a0cda906
--- /dev/null
+++ b/savevm-async.c
@@ -0,0 +1,464 @@
+++ b/migration/savevm-async.c
@@ -0,0 +1,598 @@
+#include "qemu/osdep.h"
+#include "migration/migration.h"
+#include "migration/savevm.h"
@@ -335,6 +173,7 @@ index 0000000000..54ceeae26c
+#include "qapi/qapi-commands-misc.h"
+#include "qapi/qapi-commands-block.h"
+#include "qemu/cutils.h"
+#include "qemu/timer.h"
+#include "qemu/main-loop.h"
+#include "qemu/rcu.h"
+
@@ -369,10 +208,17 @@ index 0000000000..54ceeae26c
+ int saved_vm_running;
+ QEMUFile *file;
+ int64_t total_time;
+ QEMUBH *cleanup_bh;
+ QemuThread thread;
+ QEMUBH *finalize_bh;
+ Coroutine *co;
+ QemuCoSleep *target_close_wait;
+} snap_state;
+
+static bool savevm_aborted(void)
+{
+ return snap_state.state == SAVE_STATE_CANCELLED ||
+ snap_state.state == SAVE_STATE_ERROR;
+}
+
+SaveVMInfo *qmp_query_savevm(Error **errp)
+{
+ SaveVMInfo *info = g_malloc0(sizeof(*info));
@@ -425,17 +271,23 @@ index 0000000000..54ceeae26c
+ }
+
+ if (snap_state.target) {
+ /* try to truncate, but ignore errors (will fail on block devices).
+ * note1: bdrv_read() need whole blocks, so we need to round up
+ * note2: PVE requires 1024 (BDRV_SECTOR_SIZE*2) alignment
+ */
+ size_t size = QEMU_ALIGN_UP(snap_state.bs_pos, BDRV_SECTOR_SIZE*2);
+ blk_truncate(snap_state.target, size, false, PREALLOC_MODE_OFF, NULL);
+ if (!savevm_aborted()) {
+ /* try to truncate, but ignore errors (will fail on block devices).
+ * note1: bdrv_read() need whole blocks, so we need to round up
+ * note2: PVE requires 1024 (BDRV_SECTOR_SIZE*2) alignment
+ */
+ size_t size = QEMU_ALIGN_UP(snap_state.bs_pos, BDRV_SECTOR_SIZE*2);
+ blk_truncate(snap_state.target, size, false, PREALLOC_MODE_OFF, 0, NULL);
+ }
+ blk_op_unblock_all(snap_state.target, snap_state.blocker);
+ error_free(snap_state.blocker);
+ snap_state.blocker = NULL;
+ blk_unref(snap_state.target);
+ snap_state.target = NULL;
+
+ if (snap_state.target_close_wait) {
+ qemu_co_sleep_wake(snap_state.target_close_wait);
+ }
+ }
+
+ return ret;
@@ -477,6 +329,7 @@ index 0000000000..54ceeae26c
+ BlkRwCo *rwco = opaque;
+ rwco->ret = blk_co_pwritev(snap_state.target, rwco->offset, rwco->qiov->size,
+ rwco->qiov, 0);
+ aio_wait_kick();
+}
+
+static ssize_t block_state_writev_buffer(void *opaque, struct iovec *iov,
@@ -514,19 +367,63 @@ index 0000000000..54ceeae26c
+ .close = block_state_close,
+};
+
+static void process_savevm_cleanup(void *opaque)
+static void process_savevm_finalize(void *opaque)
+{
+ int ret;
+ qemu_bh_delete(snap_state.cleanup_bh);
+ snap_state.cleanup_bh = NULL;
+ qemu_mutex_unlock_iothread();
+ qemu_thread_join(&snap_state.thread);
+ qemu_mutex_lock_iothread();
+ AioContext *iohandler_ctx = iohandler_get_aio_context();
+ MigrationState *ms = migrate_get_current();
+
+ bool aborted = savevm_aborted();
+
+#ifdef DEBUG_SAVEVM_STATE
+ int64_t start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+#endif
+
+ qemu_bh_delete(snap_state.finalize_bh);
+ snap_state.finalize_bh = NULL;
+ snap_state.co = NULL;
+
+ /* We need to own the target bdrv's context for the following functions,
+ * so move it back. It can stay in the main context and live out its live
+ * there, since we're done with it after this method ends anyway.
+ */
+ aio_context_acquire(iohandler_ctx);
+ blk_set_aio_context(snap_state.target, qemu_get_aio_context(), NULL);
+ aio_context_release(iohandler_ctx);
+
+ ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+ if (ret < 0) {
+ save_snapshot_error("vm_stop_force_state error %d", ret);
+ }
+
+ if (!aborted) {
+ /* skip state saving if we aborted, snapshot will be invalid anyway */
+ (void)qemu_savevm_state_complete_precopy(snap_state.file, false, false);
+ ret = qemu_file_get_error(snap_state.file);
+ if (ret < 0) {
+ save_snapshot_error("qemu_savevm_state_iterate error %d", ret);
+ }
+ }
+
+ DPRINTF("state saving complete\n");
+ DPRINTF("timing: process_savevm_finalize (state saving) took %ld ms\n",
+ qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - start_time);
+
+ /* clear migration state */
+ migrate_set_state(&ms->state, MIGRATION_STATUS_SETUP,
+ ret || aborted ? MIGRATION_STATUS_FAILED : MIGRATION_STATUS_COMPLETED);
+ ms->to_dst_file = NULL;
+
+ qemu_savevm_state_cleanup();
+
+ ret = save_snapshot_cleanup();
+ if (ret < 0) {
+ save_snapshot_error("save_snapshot_cleanup error %d", ret);
+ } else if (snap_state.state == SAVE_STATE_ACTIVE) {
+ snap_state.state = SAVE_STATE_COMPLETED;
+ } else if (aborted) {
+ save_snapshot_error("process_savevm_cleanup: found aborted state: %d",
+ snap_state.state);
+ } else {
+ save_snapshot_error("process_savevm_cleanup: invalid state: %d",
+ snap_state.state);
@@ -535,82 +432,98 @@ index 0000000000..54ceeae26c
+ vm_start();
+ snap_state.saved_vm_running = false;
+ }
+
+ DPRINTF("timing: process_savevm_finalize (full) took %ld ms\n",
+ qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - start_time);
+}
+
+static void *process_savevm_thread(void *opaque)
+static void coroutine_fn process_savevm_co(void *opaque)
+{
+ int ret;
+ int64_t maxlen;
+ BdrvNextIterator it;
+ BlockDriverState *bs = NULL;
+
+ rcu_register_thread();
+#ifdef DEBUG_SAVEVM_STATE
+ int64_t start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+#endif
+
+ qemu_savevm_state_header(snap_state.file);
+ qemu_savevm_state_setup(snap_state.file);
+ ret = qemu_file_get_error(snap_state.file);
+
+ if (ret < 0) {
+ save_snapshot_error("qemu_savevm_state_setup failed");
+ rcu_unregister_thread();
+ return NULL;
+ return;
+ }
+
+ while (snap_state.state == SAVE_STATE_ACTIVE) {
+ uint64_t pending_size, pend_precopy, pend_compatible, pend_postcopy;
+
+ /* pending is expected to be called without iothread lock */
+ qemu_mutex_unlock_iothread();
+ qemu_savevm_state_pending(snap_state.file, 0, &pend_precopy, &pend_compatible, &pend_postcopy);
+ qemu_mutex_lock_iothread();
+
+ pending_size = pend_precopy + pend_compatible + pend_postcopy;
+
+ maxlen = blk_getlength(snap_state.target) - 30*1024*1024;
+
+ if (pending_size > 400000 && snap_state.bs_pos + pending_size < maxlen) {
+ qemu_mutex_lock_iothread();
+ ret = qemu_savevm_state_iterate(snap_state.file, false);
+ if (ret < 0) {
+ save_snapshot_error("qemu_savevm_state_iterate error %d", ret);
+ break;
+ }
+ qemu_mutex_unlock_iothread();
+ DPRINTF("savevm inerate pending size %lu ret %d\n", pending_size, ret);
+ DPRINTF("savevm iterate pending size %lu ret %d\n", pending_size, ret);
+ } else {
+ qemu_mutex_lock_iothread();
+ qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
+ ret = global_state_store();
+ if (ret) {
+ save_snapshot_error("global_state_store error %d", ret);
+ break;
+ }
+ ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
+ if (ret < 0) {
+ save_snapshot_error("vm_stop_force_state error %d", ret);
+ break;
+ }
+ DPRINTF("savevm inerate finished\n");
+ /* upstream made the return value here inconsistent
+ * (-1 instead of 'ret' in one case and 0 after flush which can
+ * still set a file error...)
+ */
+ (void)qemu_savevm_state_complete_precopy(snap_state.file, false, false);
+ ret = qemu_file_get_error(snap_state.file);
+ if (ret < 0) {
+ save_snapshot_error("qemu_savevm_state_iterate error %d", ret);
+ break;
+ }
+ qemu_savevm_state_cleanup();
+ DPRINTF("save complete\n");
+
+ DPRINTF("savevm iterate complete\n");
+ break;
+ }
+ }
+
+ qemu_bh_schedule(snap_state.cleanup_bh);
+ qemu_mutex_unlock_iothread();
+ DPRINTF("timing: process_savevm_co took %ld ms\n",
+ qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - start_time);
+
+ rcu_unregister_thread();
+ return NULL;
+#ifdef DEBUG_SAVEVM_STATE
+ int64_t start_time_flush = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+#endif
+ /* If a drive runs in an IOThread we can flush it async, and only
+ * need to sync-flush whatever IO happens between now and
+ * vm_stop_force_state. bdrv_next can only be called from main AioContext,
+ * so move there now and after every flush.
+ */
+ aio_co_reschedule_self(qemu_get_aio_context());
+ for (bs = bdrv_first(&it); bs; bs = bdrv_next(&it)) {
+ /* target has BDRV_O_NO_FLUSH, no sense calling bdrv_flush on it */
+ if (bs == blk_bs(snap_state.target)) {
+ continue;
+ }
+
+ AioContext *bs_ctx = bdrv_get_aio_context(bs);
+ if (bs_ctx != qemu_get_aio_context()) {
+ DPRINTF("savevm: async flushing drive %s\n", bs->filename);
+ aio_co_reschedule_self(bs_ctx);
+ bdrv_flush(bs);
+ aio_co_reschedule_self(qemu_get_aio_context());
+ }
+ }
+
+ DPRINTF("timing: async flushing took %ld ms\n",
+ qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - start_time_flush);
+
+ qemu_bh_schedule(snap_state.finalize_bh);
+}
+
+void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
+{
+ Error *local_err = NULL;
+ MigrationState *ms = migrate_get_current();
+ AioContext *iohandler_ctx = iohandler_get_aio_context();
+
+ int bdrv_oflags = BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_NO_FLUSH;
+
@@ -620,6 +533,17 @@ index 0000000000..54ceeae26c
+ return;
+ }
+
+ if (migration_is_running(ms->state)) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, QERR_MIGRATION_ACTIVE);
+ return;
+ }
+
+ if (migrate_use_block()) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "Block migration and snapshots are incompatible");
+ return;
+ }
+
+ /* initialize snapshot info */
+ snap_state.saved_vm_running = runstate_is_running();
+ snap_state.bs_pos = 0;
@@ -658,14 +582,32 @@ index 0000000000..54ceeae26c
+ goto restart;
+ }
+
+ /*
+ * qemu_savevm_* paths use migration code and expect a migration state.
+ * State is cleared in process_savevm_co, but has to be initialized
+ * here (blocking main thread, from QMP) to avoid race conditions.
+ */
+ migrate_init(ms);
+ memset(&ram_counters, 0, sizeof(ram_counters));
+ ms->to_dst_file = snap_state.file;
+
+ error_setg(&snap_state.blocker, "block device is in use by savevm");
+ blk_op_block_all(snap_state.target, snap_state.blocker);
+
+ snap_state.state = SAVE_STATE_ACTIVE;
+ snap_state.cleanup_bh = qemu_bh_new(process_savevm_cleanup, &snap_state);
+ qemu_thread_create(&snap_state.thread, "savevm-async", process_savevm_thread,
+ NULL, QEMU_THREAD_JOINABLE);
+ snap_state.finalize_bh = qemu_bh_new(process_savevm_finalize, &snap_state);
+ snap_state.co = qemu_coroutine_create(&process_savevm_co, NULL);
+ qemu_mutex_unlock_iothread();
+ qemu_savevm_state_header(snap_state.file);
+ qemu_savevm_state_setup(snap_state.file);
+ qemu_mutex_lock_iothread();
+
+ /* Async processing from here on out happens in iohandler context, so let
+ * the target bdrv have its home there.
+ */
+ blk_set_aio_context(snap_state.target, iohandler_ctx, &local_err);
+
+ aio_co_schedule(iohandler_ctx, snap_state.co);
+
+ return;
+
@@ -675,11 +617,14 @@ index 0000000000..54ceeae26c
+
+ if (snap_state.saved_vm_running) {
+ vm_start();
+ snap_state.saved_vm_running = false;
+ }
+}
+
+void qmp_savevm_end(Error **errp)
+void coroutine_fn qmp_savevm_end(Error **errp)
+{
+ int64_t timeout;
+
+ if (snap_state.state == SAVE_STATE_DONE) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "VM snapshot not started\n");
@@ -688,14 +633,38 @@ index 0000000000..54ceeae26c
+
+ if (snap_state.state == SAVE_STATE_ACTIVE) {
+ snap_state.state = SAVE_STATE_CANCELLED;
+ return;
+ goto wait_for_close;
+ }
+
+ if (snap_state.saved_vm_running) {
+ vm_start();
+ snap_state.saved_vm_running = false;
+ }
+
+ snap_state.state = SAVE_STATE_DONE;
+
+wait_for_close:
+ if (!snap_state.target) {
+ DPRINTF("savevm-end: no target file open\n");
+ return;
+ }
+
+ /* wait until cleanup is done before returning, this ensures that after this
+ * call exits the statefile will be closed and can be removed immediately */
+ DPRINTF("savevm-end: waiting for cleanup\n");
+ timeout = 30L * 1000 * 1000 * 1000;
+ qemu_co_sleep_ns_wakeable(snap_state.target_close_wait,
+ QEMU_CLOCK_REALTIME, timeout);
+ snap_state.target_close_wait = NULL;
+ if (snap_state.target) {
+ save_snapshot_error("timeout waiting for target file close in "
+ "qmp_savevm_end");
+ /* we cannot assume the snapshot finished in this case, so leave the
+ * state alone - caller has to figure something out */
+ return;
+ }
+
+ DPRINTF("savevm-end: cleanup done\n");
+}
+
+// FIXME: Deprecated
@@ -764,6 +733,9 @@ index 0000000000..54ceeae26c
+ qemu_system_reset(SHUTDOWN_CAUSE_NONE);
+ ret = qemu_loadvm_state(f);
+
+ /* dirty bitmap migration has a special case we need to trigger manually */
+ dirty_bitmap_mig_before_vm_start();
+
+ qemu_fclose(f);
+ migration_incoming_state_destroy();
+ if (ret < 0) {
@@ -781,32 +753,201 @@ index 0000000000..54ceeae26c
+ }
+ return ret;
+}
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index e9fa9af6bd..5000ce39d1 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -1903,6 +1903,63 @@ void hmp_info_memory_devices(Monitor *mon, const QDict *qdict)
hmp_handle_error(mon, err);
}
+void hmp_savevm_start(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+ const char *statefile = qdict_get_try_str(qdict, "statefile");
+
+ qmp_savevm_start(statefile != NULL, statefile, &errp);
+ hmp_handle_error(mon, errp);
+}
+
+void hmp_snapshot_drive(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+ const char *name = qdict_get_str(qdict, "name");
+ const char *device = qdict_get_str(qdict, "device");
+
+ qmp_snapshot_drive(device, name, &errp);
+ hmp_handle_error(mon, errp);
+}
+
+void hmp_delete_drive_snapshot(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+ const char *name = qdict_get_str(qdict, "name");
+ const char *device = qdict_get_str(qdict, "device");
+
+ qmp_delete_drive_snapshot(device, name, &errp);
+ hmp_handle_error(mon, errp);
+}
+
+void coroutine_fn hmp_savevm_end(Monitor *mon, const QDict *qdict)
+{
+ Error *errp = NULL;
+
+ qmp_savevm_end(&errp);
+ hmp_handle_error(mon, errp);
+}
+
+void hmp_info_savevm(Monitor *mon, const QDict *qdict)
+{
+ SaveVMInfo *info;
+ info = qmp_query_savevm(NULL);
+
+ if (info->has_status) {
+ monitor_printf(mon, "savevm status: %s\n", info->status);
+ monitor_printf(mon, "total time: %" PRIu64 " milliseconds\n",
+ info->total_time);
+ } else {
+ monitor_printf(mon, "savevm status: not running\n");
+ }
+ if (info->has_bytes) {
+ monitor_printf(mon, "Bytes saved: %"PRIu64"\n", info->bytes);
+ }
+ if (info->has_error) {
+ monitor_printf(mon, "Error: %s\n", info->error);
+ }
+}
+
void hmp_info_iothreads(Monitor *mon, const QDict *qdict)
{
IOThreadInfoList *info_list = qmp_query_iothreads(NULL);
diff --git a/qapi/migration.json b/qapi/migration.json
index bbfd48cf0b..45686390a2 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -247,6 +247,40 @@
'*compression': 'CompressionStats',
'*socket-address': ['SocketAddress'] } }
+##
+# @SaveVMInfo:
+#
+# Information about current migration process.
+#
+# @status: string describing the current savevm status.
+# This can be 'active', 'completed', 'failed'.
+# If this field is not returned, no savevm process
+# has been initiated
+#
+# @error: string containing error message is status is failed.
+#
+# @total-time: total amount of milliseconds since savevm started.
+# If savevm has ended, it returns the total save time
+#
+# @bytes: total amount of data transfered
+#
+# Since: 1.3
+##
+{ 'struct': 'SaveVMInfo',
+ 'data': {'*status': 'str', '*error': 'str',
+ '*total-time': 'int', '*bytes': 'int'} }
+
+##
+# @query-savevm:
+#
+# Returns information about current savevm process.
+#
+# Returns: @SaveVMInfo
+#
+# Since: 1.3
+##
+{ 'command': 'query-savevm', 'returns': 'SaveVMInfo' }
+
##
# @query-migrate:
#
diff --git a/qapi/misc.json b/qapi/misc.json
index 358548abe1..25b3febc52 100644
--- a/qapi/misc.json
+++ b/qapi/misc.json
@@ -435,6 +435,38 @@
##
{ 'command': 'query-fdsets', 'returns': ['FdsetInfo'] }
+##
+# @savevm-start:
+#
+# Prepare for snapshot and halt VM. Save VM state to statefile.
+#
+##
+{ 'command': 'savevm-start', 'data': { '*statefile': 'str' } }
+
+##
+# @snapshot-drive:
+#
+# Create an internal drive snapshot.
+#
+##
+{ 'command': 'snapshot-drive', 'data': { 'device': 'str', 'name': 'str' } }
+
+##
+# @delete-drive-snapshot:
+#
+# Delete a drive snapshot.
+#
+##
+{ 'command': 'delete-drive-snapshot', 'data': { 'device': 'str', 'name': 'str' } }
+
+##
+# @savevm-end:
+#
+# Resume VM after a snapshot.
+#
+##
+{ 'command': 'savevm-end', 'coroutine': true }
+
##
# @CommandLineParameterType:
#
diff --git a/qemu-options.hx b/qemu-options.hx
index ae2c6dbbfc..423144abeb 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -4171,6 +4171,18 @@ SRST
Start right away with a saved state (``loadvm`` in monitor)
ERST
+DEF("loadstate", HAS_ARG, QEMU_OPTION_loadstate, \
+ "-loadstate file\n" \
+ " start right away with a saved state\n",
+ QEMU_ARCH_ALL)
+SRST
+``-loadstate file``
+ Start right away with a saved state. This option does not rollback
+ disk state like @code{loadvm}, so user must make sure that disk
+ have correct state. @var{file} can be any valid device URL. See the section
+ for "Device URL Syntax" for more information.
+ERST
+
#ifndef _WIN32
DEF("daemonize", 0, QEMU_OPTION_daemonize, \
"-daemonize daemonize QEMU after initializing\n", QEMU_ARCH_ALL)
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 32c0047889..4b45eb0c37 100644
index 620a1f1367..fd82efb8b3 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2827,6 +2827,7 @@ void qemu_init(int argc, char **argv, char **envp)
int optind;
const char *optarg;
const char *loadvm = NULL;
+ const char *loadstate = NULL;
MachineClass *machine_class;
const char *cpu_option;
const char *vga_model = NULL;
@@ -3391,6 +3392,9 @@ void qemu_init(int argc, char **argv, char **envp)
case QEMU_OPTION_loadvm:
loadvm = optarg;
break;
+ case QEMU_OPTION_loadstate:
+ loadstate = optarg;
+ break;
case QEMU_OPTION_full_screen:
dpy.has_full_screen = true;
dpy.full_screen = true;
@@ -4447,6 +4451,12 @@ void qemu_init(int argc, char **argv, char **envp)
autostart = 0;
exit(1);
}
@@ -156,6 +156,7 @@ static const char *incoming;
static const char *loadvm;
static const char *accelerators;
static QDict *machine_opts_dict;
+static const char *loadstate;
static QTAILQ_HEAD(, ObjectOption) object_opts = QTAILQ_HEAD_INITIALIZER(object_opts);
static QTAILQ_HEAD(, DeviceOption) device_opts = QTAILQ_HEAD_INITIALIZER(device_opts);
static ram_addr_t maxram_size;
@@ -2743,6 +2744,12 @@ void qmp_x_exit_preconfig(Error **errp)
if (loadvm) {
load_snapshot(loadvm, NULL, false, NULL, &error_fatal);
+ } else if (loadstate) {
+ Error *local_err = NULL;
+ if (load_snapshot_from_blockdev(loadstate, &local_err) < 0) {
@@ -816,3 +957,13 @@ index 32c0047889..4b45eb0c37 100644
}
if (replay_mode != REPLAY_MODE_NONE) {
replay_vmstate_init();
@@ -3284,6 +3291,9 @@ void qemu_init(int argc, char **argv, char **envp)
case QEMU_OPTION_loadvm:
loadvm = optarg;
break;
+ case QEMU_OPTION_loadstate:
+ loadstate = optarg;
+ break;
case QEMU_OPTION_full_screen:
dpy.has_full_screen = true;
dpy.full_screen = true;

View File

@@ -1,31 +1,36 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
Date: Mon, 4 May 2020 11:05:08 +0200
Subject: [PATCH] add optional buffer size to QEMUFile
Subject: [PATCH] PVE: add optional buffer size to QEMUFile
So we can use a 4M buffer for savevm-async which should
increase performance storing the state onto ceph.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
[increase max IOV count in QEMUFile to actually write more data]
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
migration/qemu-file.c | 36 ++++++++++++++++++++++++------------
migration/qemu-file.h | 1 +
savevm-async.c | 4 ++--
3 files changed, 27 insertions(+), 14 deletions(-)
migration/qemu-file.c | 38 +++++++++++++++++++++++++-------------
migration/qemu-file.h | 1 +
migration/savevm-async.c | 4 ++--
3 files changed, 28 insertions(+), 15 deletions(-)
diff --git a/migration/qemu-file.c b/migration/qemu-file.c
index 1c3a358a14..7362e51c71 100644
index 6338d8e2ff..6697a93a7e 100644
--- a/migration/qemu-file.c
+++ b/migration/qemu-file.c
@@ -30,7 +30,7 @@
@@ -30,8 +30,8 @@
#include "trace.h"
#include "qapi/error.h"
-#define IO_BUF_SIZE 32768
-#define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 64)
+#define DEFAULT_IO_BUF_SIZE 32768
#define MAX_IOV_SIZE MIN(IOV_MAX, 64)
+#define MAX_IOV_SIZE MIN_CONST(IOV_MAX, 256)
struct QEMUFile {
const QEMUFileOps *ops;
@@ -45,7 +45,8 @@ struct QEMUFile {
when reading */
int buf_index;
@@ -36,34 +41,34 @@ index 1c3a358a14..7362e51c71 100644
DECLARE_BITMAP(may_free, MAX_IOV_SIZE);
struct iovec iov[MAX_IOV_SIZE];
@@ -101,7 +102,7 @@ bool qemu_file_mode_is_not_valid(const char *mode)
@@ -103,7 +104,7 @@ bool qemu_file_mode_is_not_valid(const char *mode)
return false;
}
-QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
+QEMUFile *qemu_fopen_ops_sized(void *opaque, const QEMUFileOps *ops, size_t buffer_size)
-QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops, bool has_ioc)
+QEMUFile *qemu_fopen_ops_sized(void *opaque, const QEMUFileOps *ops, bool has_ioc, size_t buffer_size)
{
QEMUFile *f;
@@ -109,9 +110,17 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
@@ -112,9 +113,17 @@ QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops, bool has_ioc)
f->opaque = opaque;
f->ops = ops;
f->has_ioc = has_ioc;
+ f->buf_allocated_size = buffer_size;
+ f->buf = malloc(buffer_size);
+
return f;
}
+QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops)
+QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops, bool has_ioc)
+{
+ return qemu_fopen_ops_sized(opaque, ops, DEFAULT_IO_BUF_SIZE);
+ return qemu_fopen_ops_sized(opaque, ops, has_ioc, DEFAULT_IO_BUF_SIZE);
+}
+
void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks)
{
@@ -346,7 +355,7 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
@@ -349,7 +358,7 @@ static ssize_t qemu_fill_buffer(QEMUFile *f)
}
len = f->ops->get_buffer(f->opaque, f->buf + pending, f->pos,
@@ -72,7 +77,7 @@ index 1c3a358a14..7362e51c71 100644
if (len > 0) {
f->buf_size += len;
f->pos += len;
@@ -386,6 +395,9 @@ int qemu_fclose(QEMUFile *f)
@@ -389,6 +398,9 @@ int qemu_fclose(QEMUFile *f)
ret = ret2;
}
}
@@ -82,7 +87,7 @@ index 1c3a358a14..7362e51c71 100644
/* If any error was spotted before closing, we should report it
* instead of the close() return value.
*/
@@ -435,7 +447,7 @@ static void add_buf_to_iovec(QEMUFile *f, size_t len)
@@ -443,7 +455,7 @@ static void add_buf_to_iovec(QEMUFile *f, size_t len)
{
if (!add_to_iovec(f, f->buf + f->buf_index, len, false)) {
f->buf_index += len;
@@ -91,7 +96,7 @@ index 1c3a358a14..7362e51c71 100644
qemu_fflush(f);
}
}
@@ -461,7 +473,7 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size)
@@ -469,7 +481,7 @@ void qemu_put_buffer(QEMUFile *f, const uint8_t *buf, size_t size)
}
while (size > 0) {
@@ -100,7 +105,7 @@ index 1c3a358a14..7362e51c71 100644
if (l > size) {
l = size;
}
@@ -508,8 +520,8 @@ size_t qemu_peek_buffer(QEMUFile *f, uint8_t **buf, size_t size, size_t offset)
@@ -516,8 +528,8 @@ size_t qemu_peek_buffer(QEMUFile *f, uint8_t **buf, size_t size, size_t offset)
size_t index;
assert(!qemu_file_is_writable(f));
@@ -111,7 +116,7 @@ index 1c3a358a14..7362e51c71 100644
/* The 1st byte to read from */
index = f->buf_index + offset;
@@ -559,7 +571,7 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
@@ -567,7 +579,7 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
size_t res;
uint8_t *src;
@@ -120,16 +125,16 @@ index 1c3a358a14..7362e51c71 100644
if (res == 0) {
return done;
}
@@ -593,7 +605,7 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
@@ -601,7 +613,7 @@ size_t qemu_get_buffer(QEMUFile *f, uint8_t *buf, size_t size)
*/
size_t qemu_get_buffer_in_place(QEMUFile *f, uint8_t **buf, size_t size)
{
- if (size < IO_BUF_SIZE) {
+ if (size < f->buf_allocated_size) {
size_t res;
uint8_t *src;
uint8_t *src = NULL;
@@ -618,7 +630,7 @@ int qemu_peek_byte(QEMUFile *f, int offset)
@@ -626,7 +638,7 @@ int qemu_peek_byte(QEMUFile *f, int offset)
int index = f->buf_index + offset;
assert(!qemu_file_is_writable(f));
@@ -138,7 +143,7 @@ index 1c3a358a14..7362e51c71 100644
if (index >= f->buf_size) {
qemu_fill_buffer(f);
@@ -770,7 +782,7 @@ static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
@@ -778,7 +790,7 @@ static int qemu_compress_data(z_stream *stream, uint8_t *dest, size_t dest_len,
ssize_t qemu_put_compression_data(QEMUFile *f, z_stream *stream,
const uint8_t *p, size_t size)
{
@@ -148,36 +153,36 @@ index 1c3a358a14..7362e51c71 100644
if (blen < compressBound(size)) {
return -1;
diff --git a/migration/qemu-file.h b/migration/qemu-file.h
index a9b6d6ccb7..8752d27c74 100644
index 3f36d4dc8c..67501fd9cf 100644
--- a/migration/qemu-file.h
+++ b/migration/qemu-file.h
@@ -120,6 +120,7 @@ typedef struct QEMUFileHooks {
@@ -121,6 +121,7 @@ typedef struct QEMUFileHooks {
} QEMUFileHooks;
QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops);
+QEMUFile *qemu_fopen_ops_sized(void *opaque, const QEMUFileOps *ops, size_t buffer_size);
QEMUFile *qemu_fopen_ops(void *opaque, const QEMUFileOps *ops, bool has_ioc);
+QEMUFile *qemu_fopen_ops_sized(void *opaque, const QEMUFileOps *ops, bool has_ioc, size_t buffer_size);
void qemu_file_set_hooks(QEMUFile *f, const QEMUFileHooks *hooks);
int qemu_get_fd(QEMUFile *f);
int qemu_fclose(QEMUFile *f);
diff --git a/savevm-async.c b/savevm-async.c
index af865b9a0a..c3fe741c38 100644
--- a/savevm-async.c
+++ b/savevm-async.c
@@ -338,7 +338,7 @@ void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index 79a0cda906..970ee3b3fc 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -418,7 +418,7 @@ void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
goto restart;
}
- snap_state.file = qemu_fopen_ops(&snap_state, &block_file_ops);
+ snap_state.file = qemu_fopen_ops_sized(&snap_state, &block_file_ops, 4 * 1024 * 1024);
+ snap_state.file = qemu_fopen_ops_sized(&snap_state, &block_file_ops, false, 4 * 1024 * 1024);
if (!snap_state.file) {
error_set(errp, ERROR_CLASS_GENERIC_ERROR, "failed to open '%s'", statefile);
@@ -454,7 +454,7 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
@@ -567,7 +567,7 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
blk_op_block_all(be, blocker);
/* restore the VM state */
- f = qemu_fopen_ops(be, &loadstate_file_ops);
+ f = qemu_fopen_ops_sized(be, &loadstate_file_ops, 4 * 1024 * 1024);
+ f = qemu_fopen_ops_sized(be, &loadstate_file_ops, false, 4 * 1024 * 1024);
if (!f) {
error_setg(errp, "Could not open VM state file");
goto the_end;

View File

@@ -4,30 +4,32 @@ Date: Mon, 6 Apr 2020 12:16:47 +0200
Subject: [PATCH] PVE: block: add the zeroinit block driver filter
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[adapt to changed function signatures]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
block/Makefile.objs | 1 +
block/zeroinit.c | 197 ++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 198 insertions(+)
block/meson.build | 1 +
block/zeroinit.c | 196 ++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 197 insertions(+)
create mode 100644 block/zeroinit.c
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 3635b6b4c1..1282445672 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -11,6 +11,7 @@ block-obj-$(CONFIG_QED) += qed.o qed-l2-cache.o qed-table.o qed-cluster.o
block-obj-$(CONFIG_QED) += qed-check.o
block-obj-y += vhdx.o vhdx-endian.o vhdx-log.o
block-obj-y += quorum.o
+block-obj-y += zeroinit.o
block-obj-y += blkdebug.o blkverify.o blkreplay.o
block-obj-$(CONFIG_PARALLELS) += parallels.o
block-obj-y += blklogwrites.o
diff --git a/block/meson.build b/block/meson.build
index deb73ca389..c9d1fdca7d 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -41,6 +41,7 @@ block_ss.add(files(
'vmdk.c',
'vpc.c',
'write-threshold.c',
+ 'zeroinit.c',
), zstd, zlib, gnutls)
softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
diff --git a/block/zeroinit.c b/block/zeroinit.c
new file mode 100644
index 0000000000..ff38388d94
index 0000000000..20ee611f22
--- /dev/null
+++ b/block/zeroinit.c
@@ -0,0 +1,197 @@
@@ -0,0 +1,196 @@
+/*
+ * Filter to fake a zero-initialized block device.
+ *
@@ -107,7 +109,7 @@ index 0000000000..ff38388d94
+
+ /* Open the raw file */
+ bs->file = bdrv_open_child(qemu_opt_get(opts, "x-next"), options, "next",
+ bs, &child_file, false, &local_err);
+ bs, &child_of_bds, BDRV_CHILD_FILTERED, false, &local_err);
+ if (local_err) {
+ ret = -EINVAL;
+ error_propagate(errp, local_err);
@@ -138,22 +140,22 @@ index 0000000000..ff38388d94
+}
+
+static int coroutine_fn zeroinit_co_preadv(BlockDriverState *bs,
+ uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
+ int64_t offset, int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ return bdrv_co_preadv(bs->file, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn zeroinit_co_pwrite_zeroes(BlockDriverState *bs, int64_t offset,
+ int count, BdrvRequestFlags flags)
+ int64_t bytes, BdrvRequestFlags flags)
+{
+ BDRVZeroinitState *s = bs->opaque;
+ if (offset >= s->extents)
+ return 0;
+ return bdrv_pwrite_zeroes(bs->file, offset, count, flags);
+ return bdrv_pwrite_zeroes(bs->file, offset, bytes, flags);
+}
+
+static int coroutine_fn zeroinit_co_pwritev(BlockDriverState *bs,
+ uint64_t offset, uint64_t bytes, QEMUIOVector *qiov, int flags)
+ int64_t offset, int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ BDRVZeroinitState *s = bs->opaque;
+ int64_t extents = offset + bytes;
@@ -174,15 +176,16 @@ index 0000000000..ff38388d94
+}
+
+static int coroutine_fn zeroinit_co_pdiscard(BlockDriverState *bs,
+ int64_t offset, int count)
+ int64_t offset, int64_t bytes)
+{
+ return bdrv_co_pdiscard(bs->file, offset, count);
+ return bdrv_co_pdiscard(bs->file, offset, bytes);
+}
+
+static int zeroinit_co_truncate(BlockDriverState *bs, int64_t offset,
+ _Bool exact, PreallocMode prealloc, Error **errp)
+ _Bool exact, PreallocMode prealloc,
+ BdrvRequestFlags req_flags, Error **errp)
+{
+ return bdrv_co_truncate(bs->file, offset, exact, prealloc, errp);
+ return bdrv_co_truncate(bs->file, offset, exact, prealloc, req_flags, errp);
+}
+
+static int zeroinit_get_info(BlockDriverState *bs, BlockDriverInfo *bdi)
@@ -199,7 +202,7 @@ index 0000000000..ff38388d94
+ .bdrv_file_open = zeroinit_open,
+ .bdrv_close = zeroinit_close,
+ .bdrv_getlength = zeroinit_getlength,
+ .bdrv_child_perm = bdrv_filter_default_perms,
+ .bdrv_child_perm = bdrv_default_perms,
+ .bdrv_co_flush_to_disk = zeroinit_co_flush,
+
+ .bdrv_co_pwrite_zeroes = zeroinit_co_pwrite_zeroes,
@@ -211,8 +214,6 @@ index 0000000000..ff38388d94
+
+ .bdrv_has_zero_init = zeroinit_has_zero_init,
+
+ .bdrv_co_block_status = bdrv_co_block_status_from_file,
+
+ .bdrv_co_pdiscard = zeroinit_co_pdiscard,
+
+ .bdrv_co_truncate = zeroinit_co_truncate,

View File

@@ -14,10 +14,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
2 files changed, 11 insertions(+)
diff --git a/qemu-options.hx b/qemu-options.hx
index 55eef64ddf..e11b4f8ff5 100644
index 423144abeb..4879471aeb 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -904,6 +904,9 @@ DEFHEADING()
@@ -1019,6 +1019,9 @@ DEFHEADING()
DEFHEADING(Block device options:)
@@ -28,20 +28,20 @@ index 55eef64ddf..e11b4f8ff5 100644
"-fda/-fdb file use 'file' as floppy disk 0/1 image\n", QEMU_ARCH_ALL)
DEF("fdb", HAS_ARG, QEMU_OPTION_fdb, "", QEMU_ARCH_ALL)
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 4b45eb0c37..9de81875fd 100644
index fd82efb8b3..eb05e5a000 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2815,6 +2815,7 @@ static void create_default_memdev(MachineState *ms, const char *path)
void qemu_init(int argc, char **argv, char **envp)
{
int i;
@@ -2779,6 +2779,7 @@ void qemu_init(int argc, char **argv, char **envp)
MachineClass *machine_class;
bool userconfig = true;
FILE *vmstate_dump_file = NULL;
+ long vm_id;
int snapshot, linux_boot;
const char *initrd_filename;
const char *kernel_filename, *kernel_cmdline;
@@ -3518,6 +3519,13 @@ void qemu_init(int argc, char **argv, char **envp)
exit(1);
}
qemu_add_opts(&qemu_drive_opts);
qemu_add_drive_opts(&qemu_legacy_drive_opts);
@@ -3421,6 +3422,13 @@ void qemu_init(int argc, char **argv, char **envp)
machine_parse_property_opt(qemu_find_opts("smp-opts"),
"smp", optarg);
break;
+ case QEMU_OPTION_id:
+ vm_id = strtol(optarg, (char **)&optarg, 10);
@@ -51,5 +51,5 @@ index 4b45eb0c37..9de81875fd 100644
+ }
+ break;
case QEMU_OPTION_vnc:
vnc_parse(optarg, &error_fatal);
vnc_parse(optarg);
break;

View File

@@ -11,10 +11,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 9 insertions(+)
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 9ec0f2deb2..a00d45251f 100644
index 2a20982066..7968ad5a93 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -259,6 +259,15 @@ static void apic_reset_common(DeviceState *dev)
@@ -278,6 +278,15 @@ static void apic_reset_common(DeviceState *dev)
info->vapic_base_update(s);
apic_init_reset(dev);

View File

@@ -8,15 +8,15 @@ Otherwise creating images on nfs/cifs can be problematic.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/file-posix.c | 61 +++++++++++++++++++++++++++++---------------
block/file-posix.c | 59 ++++++++++++++++++++++++++++++--------------
qapi/block-core.json | 3 ++-
2 files changed, 43 insertions(+), 21 deletions(-)
2 files changed, 42 insertions(+), 20 deletions(-)
diff --git a/block/file-posix.c b/block/file-posix.c
index b527e82a82..36ebd0967e 100644
index 821405fd02..e3b6c3c524 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -2309,6 +2309,7 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
@@ -2465,6 +2465,7 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
int fd;
uint64_t perm, shared;
int result = 0;
@@ -24,7 +24,7 @@ index b527e82a82..36ebd0967e 100644
/* Validate options and set default values */
assert(options->driver == BLOCKDEV_DRIVER_FILE);
@@ -2342,19 +2343,22 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
@@ -2505,19 +2506,22 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
perm = BLK_PERM_WRITE | BLK_PERM_RESIZE;
shared = BLK_PERM_ALL & ~BLK_PERM_RESIZE;
@@ -59,7 +59,7 @@ index b527e82a82..36ebd0967e 100644
}
/* Clear the file by truncating it to 0 */
@@ -2387,13 +2391,15 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
@@ -2571,13 +2575,15 @@ raw_co_create(BlockdevCreateOptions *options, Error **errp)
}
out_unlock:
@@ -82,7 +82,7 @@ index b527e82a82..36ebd0967e 100644
}
out_close:
@@ -2416,6 +2422,7 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
@@ -2602,6 +2608,7 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
PreallocMode prealloc;
char *buf = NULL;
Error *local_err = NULL;
@@ -90,7 +90,7 @@ index b527e82a82..36ebd0967e 100644
/* Skip file: protocol prefix */
strstart(filename, "file:", &filename);
@@ -2433,6 +2440,18 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
@@ -2624,6 +2631,18 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
return -EINVAL;
}
@@ -109,34 +109,25 @@ index b527e82a82..36ebd0967e 100644
options = (BlockdevCreateOptions) {
.driver = BLOCKDEV_DRIVER_FILE,
.u.file = {
@@ -2442,6 +2461,8 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
.preallocation = prealloc,
.has_nocow = true,
@@ -2635,6 +2654,8 @@ static int coroutine_fn raw_co_create_opts(BlockDriver *drv,
.nocow = nocow,
.has_extent_size_hint = has_extent_size_hint,
.extent_size_hint = extent_size_hint,
+ .has_locking = true,
+ .locking = locking,
},
};
return raw_co_create(&options, errp);
@@ -2983,7 +3004,7 @@ static int raw_check_perm(BlockDriverState *bs, uint64_t perm, uint64_t shared,
}
/* Copy locks to the new fd */
- if (s->perm_change_fd) {
+ if (s->use_lock && s->perm_change_fd) {
ret = raw_apply_lock_bytes(NULL, s->perm_change_fd, perm, ~shared,
false, errp);
if (ret < 0) {
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 943df1926a..4c55464f86 100644
index 1d3dd9cb48..3f81d6a5c0 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4183,7 +4183,8 @@
'data': { 'filename': 'str',
'size': 'size',
'*preallocation': 'PreallocMode',
- '*nocow': 'bool' } }
+ '*nocow': 'bool',
@@ -4445,7 +4445,8 @@
'size': 'size',
'*preallocation': 'PreallocMode',
'*nocow': 'bool',
- '*extent-size-hint': 'size'} }
+ '*extent-size-hint': 'size',
+ '*locking': 'OnOffAuto' } }
##

View File

@@ -18,10 +18,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 1 insertion(+), 2 deletions(-)
diff --git a/monitor/qmp.c b/monitor/qmp.c
index f89e7daf27..ed5e39fcf7 100644
index 6b8cfcf6d8..3ec67e32d3 100644
--- a/monitor/qmp.c
+++ b/monitor/qmp.c
@@ -406,8 +406,7 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
@@ -519,8 +519,7 @@ void monitor_init_qmp(Chardev *chr, bool pretty, Error **errp)
qemu_chr_fe_set_echo(&mon->common.chr, true);
/* Note: we run QMP monitor in I/O thread when @chr supports that */

View File

@@ -1,22 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Thomas Lamprecht <t.lamprecht@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:51 +0200
Subject: [PATCH] PVE: savevm-async: kick AIO wait on block state write
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
savevm-async.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/savevm-async.c b/savevm-async.c
index 54ceeae26c..393d55af2a 100644
--- a/savevm-async.c
+++ b/savevm-async.c
@@ -158,6 +158,7 @@ static void coroutine_fn block_state_write_entry(void *opaque) {
BlkRwCo *rwco = opaque;
rwco->ret = blk_co_pwritev(snap_state.target, rwco->offset, rwco->qiov->size,
rwco->qiov, 0);
+ aio_wait_kick();
}
static ssize_t block_state_writev_buffer(void *opaque, struct iovec *iov,

View File

@@ -26,10 +26,10 @@ Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/hw/core/machine.c b/hw/core/machine.c
index c1a444cb75..9f56ecc4e8 100644
index 53a99abc56..ad2cb2592e 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -56,7 +56,8 @@ GlobalProperty hw_compat_4_0[] = {
@@ -113,7 +113,8 @@ GlobalProperty hw_compat_4_0[] = {
{ "virtio-vga", "edid", "false" },
{ "virtio-gpu-device", "edid", "false" },
{ "virtio-device", "use-started", "false" },

View File

@@ -1,38 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:52 +0200
Subject: [PATCH] PVE: move snapshot cleanup into bottom half
as per:
(0ceccd858a8d) migration: qemu_savevm_state_cleanup() in cleanup
may affect held locks and therefore change assumptions made
by that function!
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
savevm-async.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/savevm-async.c b/savevm-async.c
index 393d55af2a..790e27ae37 100644
--- a/savevm-async.c
+++ b/savevm-async.c
@@ -201,6 +201,8 @@ static void process_savevm_cleanup(void *opaque)
int ret;
qemu_bh_delete(snap_state.cleanup_bh);
snap_state.cleanup_bh = NULL;
+ qemu_savevm_state_cleanup();
+
qemu_mutex_unlock_iothread();
qemu_thread_join(&snap_state.thread);
qemu_mutex_lock_iothread();
@@ -277,7 +279,6 @@ static void *process_savevm_thread(void *opaque)
save_snapshot_error("qemu_savevm_state_iterate error %d", ret);
break;
}
- qemu_savevm_state_cleanup();
DPRINTF("save complete\n");
break;
}

View File

@@ -0,0 +1,128 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:55 +0200
Subject: [PATCH] PVE: Allow version code in machine type
E.g. pc-i440fx-4.0+pve3 would print 'pve3' as version code while
selecting pc-i440fx-4.0 as machine type.
Version is made available as 'pve-version' in query-machines (same as,
and only if 'is-current').
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
hw/core/machine-qmp-cmds.c | 6 ++++++
include/hw/boards.h | 2 ++
qapi/machine.json | 4 +++-
softmmu/vl.c | 25 +++++++++++++++++++++++++
4 files changed, 36 insertions(+), 1 deletion(-)
diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 76fff60a6b..ec9201fb9a 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -103,6 +103,12 @@ MachineInfoList *qmp_query_machines(Error **errp)
if (strcmp(mc->name, MACHINE_GET_CLASS(current_machine)->name) == 0) {
info->has_is_current = true;
info->is_current = true;
+
+ // PVE version string only exists for current machine
+ if (mc->pve_version) {
+ info->has_pve_version = true;
+ info->pve_version = g_strdup(mc->pve_version);
+ }
}
if (mc->default_cpu_type) {
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 9c1c190104..51e04bde62 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -227,6 +227,8 @@ struct MachineClass {
const char *desc;
const char *deprecation_reason;
+ const char *pve_version;
+
void (*init)(MachineState *state);
void (*reset)(MachineState *state);
void (*wakeup)(MachineState *state);
diff --git a/qapi/machine.json b/qapi/machine.json
index 0905618e25..a05c46e253 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -160,6 +160,8 @@
#
# @default-ram-id: the default ID of initial RAM memory backend (since 5.2)
#
+# @pve-version: custom PVE version suffix specified as 'machine+pveN'
+#
# Since: 1.2
##
{ 'struct': 'MachineInfo',
@@ -167,7 +169,7 @@
'*is-default': 'bool', '*is-current': 'bool', 'cpu-max': 'int',
'hotpluggable-cpus': 'bool', 'numa-mem-supported': 'bool',
'deprecated': 'bool', '*default-cpu-type': 'str',
- '*default-ram-id': 'str' } }
+ '*default-ram-id': 'str', '*pve-version': 'str' } }
##
# @query-machines:
diff --git a/softmmu/vl.c b/softmmu/vl.c
index eb05e5a000..f306d21d63 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1655,6 +1655,7 @@ static const QEMUOption *lookup_opt(int argc, char **argv,
static MachineClass *select_machine(QDict *qdict, Error **errp)
{
const char *optarg = qdict_get_try_str(qdict, "type");
+ const char *pvever = qdict_get_try_str(qdict, "pvever");
GSList *machines = object_class_get_list(TYPE_MACHINE, false);
MachineClass *machine_class;
Error *local_err = NULL;
@@ -1672,6 +1673,11 @@ static MachineClass *select_machine(QDict *qdict, Error **errp)
}
}
+ if (machine_class) {
+ machine_class->pve_version = g_strdup(pvever);
+ qdict_del(qdict, "pvever");
+ }
+
g_slist_free(machines);
if (local_err) {
error_append_hint(&local_err, "Use -machine help to list supported machines\n");
@@ -3363,12 +3369,31 @@ void qemu_init(int argc, char **argv, char **envp)
case QEMU_OPTION_machine:
{
bool help;
+ size_t pvever_index, name_len;
+ const gchar *name;
+ gchar *name_clean, *pvever;
keyval_parse_into(machine_opts_dict, optarg, "type", &help, &error_fatal);
if (help) {
machine_help_func(machine_opts_dict);
exit(EXIT_SUCCESS);
}
+
+ // PVE version is specified with '+' as seperator, e.g. pc-i440fx+pvever
+ name = qdict_get_try_str(machine_opts_dict, "type");
+ if (name != NULL) {
+ name_len = strlen(name);
+ pvever_index = strcspn(name, "+");
+ if (pvever_index < name_len) {
+ name_clean = g_strndup(name, pvever_index);
+ pvever = g_strndup(name + pvever_index + 1, name_len - pvever_index - 1);
+ qdict_put_str(machine_opts_dict, "pvever", pvever);
+ qdict_put_str(machine_opts_dict, "type", name_clean);
+ g_free(name_clean);
+ g_free(pvever);
+ }
+ }
+
break;
}
case QEMU_OPTION_accel:

View File

@@ -0,0 +1,59 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Wed, 2 Mar 2022 08:35:05 +0100
Subject: [PATCH] block/backup: move bcs bitmap initialization to job creation
For backing up the state of multiple disks from the same time, a job
for each disk has to be created. It's convenient if the jobs don't
have to be started at the same time and if operation of the VM can be
resumed after job creation. This would lead to a window between job
creation and running the job, where writes can happen. But no writes
should happen between setting up the copy-before-write filter and
setting up the block copy state bitmap, because then new writes would
just pass through.
Commit 06e0a9c16405c0a4c1eca33cf286cc04c42066a2 moved initalization of
the bitmap to setting up the copy-before-write filter when sync_mode
is not MIRROR_SYNC_MODE_BITMAP. Ensure that the bitmap is initialized
upon job creation for the remaining case too, by moving the
backup_init_bcs_bitmap call to backup_job_create.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/backup.c | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/block/backup.c b/block/backup.c
index 21d5983779..47e218857d 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -239,8 +239,8 @@ static void backup_init_bcs_bitmap(BackupBlockJob *job)
assert(ret);
} else if (job->sync_mode == MIRROR_SYNC_MODE_TOP) {
/*
- * We can't hog the coroutine to initialize this thoroughly.
- * Set a flag and resume work when we are able to yield safely.
+ * Initialization is costly here. Simply set a flag and let the
+ * backup_run coroutine resume work once it can yield safely.
*/
block_copy_set_skip_unallocated(job->bcs, true);
}
@@ -254,8 +254,6 @@ static int coroutine_fn backup_run(Job *job, Error **errp)
BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);
int ret;
- backup_init_bcs_bitmap(s);
-
if (s->sync_mode == MIRROR_SYNC_MODE_TOP) {
int64_t offset = 0;
int64_t count;
@@ -493,6 +491,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
&error_abort);
+ backup_init_bcs_bitmap(job);
+
return &job->common;
error:

View File

@@ -1,101 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:55 +0200
Subject: [PATCH] PVE: Allow version code in machine type
E.g. pc-i440fx-4.0+pve3 would print 'pve3' as version code while
selecting pc-i440fx-4.0 as machine type.
Version is made available as 'pve-version' in query-machines (same as,
and only if 'is-current').
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
hw/core/machine-qmp-cmds.c | 6 ++++++
include/hw/boards.h | 2 ++
qapi/machine.json | 3 ++-
softmmu/vl.c | 15 ++++++++++++++-
4 files changed, 24 insertions(+), 2 deletions(-)
diff --git a/hw/core/machine-qmp-cmds.c b/hw/core/machine-qmp-cmds.c
index 1953633e82..ca8c0dc53d 100644
--- a/hw/core/machine-qmp-cmds.c
+++ b/hw/core/machine-qmp-cmds.c
@@ -234,6 +234,12 @@ MachineInfoList *qmp_query_machines(Error **errp)
if (strcmp(mc->name, MACHINE_GET_CLASS(current_machine)->name) == 0) {
info->has_is_current = true;
info->is_current = true;
+
+ // PVE version string only exists for current machine
+ if (mc->pve_version) {
+ info->has_pve_version = true;
+ info->pve_version = g_strdup(mc->pve_version);
+ }
}
if (mc->default_cpu_type) {
diff --git a/include/hw/boards.h b/include/hw/boards.h
index fd4d62b501..dd395e9232 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -170,6 +170,8 @@ struct MachineClass {
const char *desc;
const char *deprecation_reason;
+ const char *pve_version;
+
void (*init)(MachineState *state);
void (*reset)(MachineState *state);
void (*wakeup)(MachineState *state);
diff --git a/qapi/machine.json b/qapi/machine.json
index f6cf28f9fd..a7f9c79a59 100644
--- a/qapi/machine.json
+++ b/qapi/machine.json
@@ -363,7 +363,8 @@
'data': { 'name': 'str', '*alias': 'str',
'*is-default': 'bool', '*is-current': 'bool', 'cpu-max': 'int',
'hotpluggable-cpus': 'bool', 'numa-mem-supported': 'bool',
- 'deprecated': 'bool', '*default-cpu-type': 'str' } }
+ 'deprecated': 'bool', '*default-cpu-type': 'str',
+ '*pve-version': 'str' } }
##
# @query-machines:
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 9de81875fd..8340c4ca53 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2300,6 +2300,8 @@ static MachineClass *machine_parse(const char *name, GSList *machines)
{
MachineClass *mc;
GSList *el;
+ size_t pvever_index = 0;
+ gchar *name_clean;
if (is_help_option(name)) {
printf("Supported machines are:\n");
@@ -2316,12 +2318,23 @@ static MachineClass *machine_parse(const char *name, GSList *machines)
exit(0);
}
- mc = find_machine(name, machines);
+ // PVE version is specified with '+' as seperator, e.g. pc-i440fx+pvever
+ pvever_index = strcspn(name, "+");
+
+ name_clean = g_strndup(name, pvever_index);
+ mc = find_machine(name_clean, machines);
+ g_free(name_clean);
+
if (!mc) {
error_report("unsupported machine type");
error_printf("Use -machine help to list supported machines\n");
exit(1);
}
+
+ if (pvever_index < strlen(name)) {
+ mc->pve_version = &name[pvever_index+1];
+ }
+
return mc;
}

View File

@@ -3,58 +3,64 @@ From: Dietmar Maurer <dietmar@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:57 +0200
Subject: [PATCH] PVE-Backup: add vma backup format code
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[FE: create: register all streams before entering coroutines]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
Makefile | 3 +-
Makefile.objs | 1 +
vma-reader.c | 857 ++++++++++++++++++++++++++++++++++++++++++++++++++
vma-writer.c | 771 +++++++++++++++++++++++++++++++++++++++++++++
vma.c | 837 ++++++++++++++++++++++++++++++++++++++++++++++++
vma.h | 150 +++++++++
6 files changed, 2618 insertions(+), 1 deletion(-)
block/meson.build | 2 +
meson.build | 5 +
vma-reader.c | 860 ++++++++++++++++++++++++++++++++++++++++++++++
vma-writer.c | 790 ++++++++++++++++++++++++++++++++++++++++++
vma.c | 849 +++++++++++++++++++++++++++++++++++++++++++++
vma.h | 150 ++++++++
6 files changed, 2656 insertions(+)
create mode 100644 vma-reader.c
create mode 100644 vma-writer.c
create mode 100644 vma.c
create mode 100644 vma.h
diff --git a/Makefile b/Makefile
index 8a9113e666..74c2039005 100644
--- a/Makefile
+++ b/Makefile
@@ -479,7 +479,7 @@ dummy := $(call unnest-vars,, \
diff --git a/block/meson.build b/block/meson.build
index c9d1fdca7d..72081a9974 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -44,6 +44,8 @@ block_ss.add(files(
'zeroinit.c',
), zstd, zlib, gnutls)
include $(SRC_PATH)/tests/Makefile.include
+block_ss.add(files('../vma-writer.c'), libuuid)
+
softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
-all: $(DOCS) $(if $(BUILD_DOCS),sphinxdocs) $(TOOLS) $(HELPERS-y) recurse-all modules $(vhost-user-json-y)
+all: $(DOCS) $(if $(BUILD_DOCS),sphinxdocs) $(TOOLS) vma$(EXESUF) $(HELPERS-y) recurse-all modules $(vhost-user-json-y)
block_ss.add(when: 'CONFIG_QCOW1', if_true: files('qcow.c'))
diff --git a/meson.build b/meson.build
index 96de1a6ef9..54c23b9567 100644
--- a/meson.build
+++ b/meson.build
@@ -1202,6 +1202,8 @@ keyutils = dependency('libkeyutils', required: false,
qemu-version.h: FORCE
$(call quiet-command, \
@@ -608,6 +608,7 @@ qemu-img$(EXESUF): qemu-img.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io
qemu-nbd$(EXESUF): qemu-nbd.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
qemu-io$(EXESUF): qemu-io.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
qemu-storage-daemon$(EXESUF): qemu-storage-daemon.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(chardev-obj-y) $(io-obj-y) $(qom-obj-y) $(storage-daemon-obj-y) $(COMMON_LDADDS)
+vma$(EXESUF): vma.o vma-reader.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
has_gettid = cc.has_function('gettid')
qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o $(COMMON_LDADDS)
diff --git a/Makefile.objs b/Makefile.objs
index d0b4dde836..05031a3da7 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -18,6 +18,7 @@ block-obj-y += block.o blockjob.o job.o
block-obj-y += block/ scsi/
block-obj-y += qemu-io-cmds.o
block-obj-$(CONFIG_REPLICATION) += replication.o
+block-obj-y += vma-writer.o
block-obj-m = block/
+libuuid = cc.find_library('uuid', required: true)
+
# libselinux
selinux = dependency('libselinux',
required: get_option('selinux'),
@@ -3070,6 +3072,9 @@ if have_tools
dependencies: [blockdev, qemuutil, gnutls, selinux],
install: true)
+ vma = executable('vma', files('vma.c', 'vma-reader.c') + genh,
+ dependencies: [authz, block, crypto, io, qom], install: true)
+
subdir('storage-daemon')
subdir('contrib/rdmacm-mux')
subdir('contrib/elf2dmp')
diff --git a/vma-reader.c b/vma-reader.c
new file mode 100644
index 0000000000..2b1d1cdab3
index 0000000000..4f4ee2b47b
--- /dev/null
+++ b/vma-reader.c
@@ -0,0 +1,857 @@
@@ -0,0 +1,860 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
@@ -249,6 +255,9 @@ index 0000000000..2b1d1cdab3
+ if (vmar->rstate[i].bitmap) {
+ g_free(vmar->rstate[i].bitmap);
+ }
+ if (vmar->rstate[i].target) {
+ blk_unref(vmar->rstate[i].target);
+ }
+ }
+
+ if (vmar->md5csum) {
@@ -914,10 +923,10 @@ index 0000000000..2b1d1cdab3
+
diff --git a/vma-writer.c b/vma-writer.c
new file mode 100644
index 0000000000..fe86b18a60
index 0000000000..11d8321ffd
--- /dev/null
+++ b/vma-writer.c
@@ -0,0 +1,771 @@
@@ -0,0 +1,790 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
@@ -1213,20 +1222,20 @@ index 0000000000..fe86b18a60
+
+ if ((stat(filename, &st) == 0) && S_ISFIFO(st.st_mode)) {
+ oflags = O_NONBLOCK|O_WRONLY;
+ vmaw->fd = qemu_open(filename, oflags, 0644);
+ vmaw->fd = qemu_open(filename, oflags, errp);
+ } else if (strstart(filename, "/dev/fdset/", &tmp_id_str)) {
+ oflags = O_NONBLOCK|O_WRONLY;
+ vmaw->fd = qemu_open(filename, oflags, 0644);
+ vmaw->fd = qemu_open(filename, oflags, errp);
+ } else if (strstart(filename, "/dev/fdname/", &tmp_id_str)) {
+ vmaw->fd = monitor_get_fd(cur_mon, tmp_id_str, errp);
+ vmaw->fd = monitor_get_fd(monitor_cur(), tmp_id_str, errp);
+ if (vmaw->fd < 0) {
+ goto err;
+ }
+ /* try to use O_NONBLOCK */
+ fcntl(vmaw->fd, F_SETFL, fcntl(vmaw->fd, F_GETFL)|O_NONBLOCK);
+ } else {
+ oflags = O_NONBLOCK|O_DIRECT|O_WRONLY|O_CREAT|O_EXCL;
+ vmaw->fd = qemu_open(filename, oflags, 0644);
+ oflags = O_NONBLOCK|O_DIRECT|O_WRONLY|O_EXCL;
+ vmaw->fd = qemu_create(filename, oflags, 0644, errp);
+ }
+
+ if (vmaw->fd < 0) {
@@ -1553,17 +1562,33 @@ index 0000000000..fe86b18a60
+
+ DPRINTF("VMA WRITE %d %zd\n", dev_id, cluster_num);
+
+ uint64_t dev_size = vmaw->stream_info[dev_id].size;
+ uint16_t mask = 0;
+
+ if (buf) {
+ int i;
+ int bit = 1;
+ uint64_t byte_offset = cluster_num * VMA_CLUSTER_SIZE;
+ for (i = 0; i < 16; i++) {
+ const unsigned char *vmablock = buf + (i*VMA_BLOCK_SIZE);
+ if (!buffer_is_zero(vmablock, VMA_BLOCK_SIZE)) {
+
+ // Note: If the source is not 64k-aligned, we might reach 4k blocks
+ // after the end of the device. Always mark these as zero in the
+ // mask, so the restore handles them correctly.
+ if (byte_offset < dev_size &&
+ !buffer_is_zero(vmablock, VMA_BLOCK_SIZE))
+ {
+ mask |= bit;
+ memcpy(vmaw->outbuf + vmaw->outbuf_pos, vmablock,
+ VMA_BLOCK_SIZE);
+
+ // prevent memory leakage on unaligned last block
+ if (byte_offset + VMA_BLOCK_SIZE > dev_size) {
+ uint64_t real_data_in_block = dev_size - byte_offset;
+ memset(vmaw->outbuf + vmaw->outbuf_pos + real_data_in_block,
+ 0, VMA_BLOCK_SIZE - real_data_in_block);
+ }
+
+ vmaw->outbuf_pos += VMA_BLOCK_SIZE;
+ } else {
+ DPRINTF("VMA WRITE %zd ZERO BLOCK %d\n", cluster_num, i);
@@ -1571,6 +1596,7 @@ index 0000000000..fe86b18a60
+ *zero_bytes += VMA_BLOCK_SIZE;
+ }
+
+ byte_offset += VMA_BLOCK_SIZE;
+ bit = bit << 1;
+ }
+ } else {
@@ -1596,8 +1622,8 @@ index 0000000000..fe86b18a60
+
+ if (dev_id != vmaw->vmstate_stream) {
+ uint64_t last = (cluster_num + 1) * VMA_CLUSTER_SIZE;
+ if (last > vmaw->stream_info[dev_id].size) {
+ uint64_t diff = last - vmaw->stream_info[dev_id].size;
+ if (last > dev_size) {
+ uint64_t diff = last - dev_size;
+ if (diff >= VMA_CLUSTER_SIZE) {
+ vma_writer_set_error(vmaw, "vma_writer_write: "
+ "read after last cluster");
@@ -1687,14 +1713,16 @@ index 0000000000..fe86b18a60
+ g_checksum_free(vmaw->md5csum);
+ }
+
+ qemu_vfree(vmaw->headerbuf);
+ qemu_vfree(vmaw->outbuf);
+ g_free(vmaw);
+}
diff --git a/vma.c b/vma.c
new file mode 100644
index 0000000000..a82752448a
index 0000000000..89440733b1
--- /dev/null
+++ b/vma.c
@@ -0,0 +1,837 @@
@@ -0,0 +1,849 @@
+/*
+ * VMA: Virtual Machine Archive
+ *
@@ -2006,8 +2034,6 @@ index 0000000000..a82752448a
+ int vmstate_fd = -1;
+ guint8 vmstate_stream = 0;
+
+ BlockBackend *blk = NULL;
+
+ for (i = 1; i < 255; i++) {
+ VmaDeviceInfo *di = vma_reader_get_device_info(vmar, i);
+ if (di && (strcmp(di->devname, "vmstate") == 0)) {
@@ -2028,6 +2054,8 @@ index 0000000000..a82752448a
+ int flags = BDRV_O_RDWR;
+ bool write_zero = true;
+
+ BlockBackend *blk = NULL;
+
+ if (readmap) {
+ RestoreMap *map;
+ map = (RestoreMap *)g_hash_table_lookup(devmap, di->devname);
@@ -2140,8 +2168,6 @@ index 0000000000..a82752448a
+
+ vma_reader_destroy(vmar);
+
+ blk_unref(blk);
+
+ bdrv_close_all();
+
+ return ret;
@@ -2262,6 +2288,7 @@ index 0000000000..a82752448a
+ g_warning("vma_writer_close failed %s", error_get_pretty(err));
+ }
+ }
+ qemu_vfree(buf);
+}
+
+static int create_archive(int argc, char **argv)
@@ -2269,6 +2296,7 @@ index 0000000000..a82752448a
+ int i, c;
+ int verbose = 0;
+ const char *archivename;
+ GList *backup_coroutines = NULL;
+ GList *config_files = NULL;
+
+ for (;;) {
@@ -2357,7 +2385,9 @@ index 0000000000..a82752448a
+ job->dev_id = dev_id;
+
+ Coroutine *co = qemu_coroutine_create(backup_run, job);
+ qemu_coroutine_enter(co);
+ // Don't enter coroutine yet, because it might write the header before
+ // all streams can be registered.
+ backup_coroutines = g_list_append(backup_coroutines, co);
+ }
+
+ VmaStatus vmastat;
@@ -2365,6 +2395,13 @@ index 0000000000..a82752448a
+ int last_percent = -1;
+
+ if (devcount) {
+ GList *entry = backup_coroutines;
+ while (entry && entry->data) {
+ Coroutine *co = entry->data;
+ qemu_coroutine_enter(co);
+ entry = g_list_next(entry);
+ }
+
+ while (1) {
+ main_loop_wait(false);
+ vma_writer_get_status(vmaw, &vmastat);
@@ -2429,6 +2466,9 @@ index 0000000000..a82752448a
+ g_error("creating vma archive failed");
+ }
+
+ g_list_free(backup_coroutines);
+ g_list_free(config_files);
+ vma_writer_destroy(vmaw);
+ return 0;
+}
+

View File

@@ -7,33 +7,23 @@ Subject: [PATCH] PVE-Backup: add backup-dump block driver
- move BackupBlockJob declaration from block/backup.c to include/block/block_int.h
- block/backup.c - backup-job-create: also consider source cluster size
- job.c: make job_should_pause non-static
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/Makefile.objs | 1 +
block/backup-dump.c | 169 ++++++++++++++++++++++++++++++++++++++
block/backup.c | 23 ++----
include/block/block_int.h | 30 +++++++
block/backup-dump.c | 168 ++++++++++++++++++++++++++++++++++++++
block/backup.c | 30 ++-----
block/meson.build | 1 +
include/block/block_int.h | 35 ++++++++
job.c | 3 +-
5 files changed, 207 insertions(+), 19 deletions(-)
5 files changed, 214 insertions(+), 23 deletions(-)
create mode 100644 block/backup-dump.c
diff --git a/block/Makefile.objs b/block/Makefile.objs
index 1282445672..8af7073c83 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -34,6 +34,7 @@ block-obj-$(CONFIG_RBD) += rbd.o
block-obj-$(CONFIG_GLUSTERFS) += gluster.o
block-obj-$(CONFIG_VXHS) += vxhs.o
block-obj-$(CONFIG_LIBSSH) += ssh.o
+block-obj-y += backup-dump.o
block-obj-y += accounting.o dirty-bitmap.o
block-obj-y += write-threshold.o
block-obj-y += backup.o
diff --git a/block/backup-dump.c b/block/backup-dump.c
new file mode 100644
index 0000000000..3066ab0698
index 0000000000..93d7f46950
--- /dev/null
+++ b/block/backup-dump.c
@@ -0,0 +1,169 @@
@@ -0,0 +1,168 @@
+/*
+ * BlockDriver to send backup data stream to a callback function
+ *
@@ -61,7 +51,6 @@ index 0000000000..3066ab0698
+ BDRVBackupDumpState *s = bs->opaque;
+
+ bdi->cluster_size = s->dump_cb_block_size;
+ bdi->unallocated_blocks_are_zero = true;
+ return 0;
+}
+
@@ -142,7 +131,7 @@ index 0000000000..3066ab0698
+static void qemu_backup_dump_child_perm(
+ BlockDriverState *bs,
+ BdrvChild *c,
+ const BdrvChildRole *role,
+ BdrvChildRole role,
+ BlockReopenQueue *reopen_queue,
+ uint64_t perm, uint64_t shared,
+ uint64_t *nperm, uint64_t *nshared)
@@ -204,17 +193,18 @@ index 0000000000..3066ab0698
+ return bs;
+}
diff --git a/block/backup.c b/block/backup.c
index ecd93e91e0..cf8f5ad25d 100644
index 47e218857d..4d8fad70c4 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -32,24 +32,6 @@
@@ -29,28 +29,6 @@
#define BACKUP_CLUSTER_SIZE_DEFAULT (1 << 16)
#include "block/copy-before-write.h"
-typedef struct BackupBlockJob {
- BlockJob common;
- BlockDriverState *backup_top;
- BlockDriverState *cbw;
- BlockDriverState *source_bs;
- BlockDriverState *target_bs;
-
- BdrvDirtyBitmap *sync_bitmap;
-
@@ -223,32 +213,58 @@ index ecd93e91e0..cf8f5ad25d 100644
- BlockdevOnError on_source_error;
- BlockdevOnError on_target_error;
- uint64_t len;
- uint64_t bytes_read;
- int64_t cluster_size;
- BackupPerf perf;
-
- BlockCopyState *bcs;
-
- bool wait;
- BlockCopyCallState *bg_bcs_call;
-} BackupBlockJob;
-
static const BlockJobDriver backup_job_driver;
static void backup_progress_bytes_callback(int64_t bytes, void *opaque)
@@ -411,6 +393,11 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
goto error;
static void backup_cleanup_sync_bitmap(BackupBlockJob *job, int ret)
@@ -455,6 +433,14 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
}
cluster_size = block_copy_cluster_size(bcs);
+ if (cluster_size < 0) {
+ goto error;
+ }
+
+ BlockDriverInfo bdi;
+ if (bdrv_get_info(bs, &bdi) == 0) {
+ cluster_size = MAX(cluster_size, bdi.cluster_size);
+ }
+
/*
* If source is in backing chain of target assume that target is going to be
* used for "image fleecing", i.e. it should represent a kind of snapshot of
if (perf->max_chunk && perf->max_chunk < cluster_size) {
error_setg(errp, "Required max-chunk (%" PRIi64 ") is less than backup "
diff --git a/block/meson.build b/block/meson.build
index 72081a9974..7883df047c 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -4,6 +4,7 @@ block_ss.add(files(
'aio_task.c',
'amend.c',
'backup.c',
+ 'backup-dump.c',
'copy-before-write.c',
'blkdebug.c',
'blklogwrites.c',
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 336f71e69d..62e5579723 100644
index f4c75e8ba9..169dc43d59 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -60,6 +60,36 @@
@@ -26,6 +26,7 @@
#include "block/accounting.h"
#include "block/block.h"
+#include "block/block-copy.h"
#include "block/aio-wait.h"
#include "qemu/queue.h"
#include "qemu/coroutine.h"
@@ -64,6 +65,40 @@
#define BLOCK_PROBE_BUF_SIZE 512
@@ -266,8 +282,9 @@ index 336f71e69d..62e5579723 100644
+typedef struct BlockCopyState BlockCopyState;
+typedef struct BackupBlockJob {
+ BlockJob common;
+ BlockDriverState *backup_top;
+ BlockDriverState *cbw;
+ BlockDriverState *source_bs;
+ BlockDriverState *target_bs;
+
+ BdrvDirtyBitmap *sync_bitmap;
+
@@ -276,20 +293,23 @@ index 336f71e69d..62e5579723 100644
+ BlockdevOnError on_source_error;
+ BlockdevOnError on_target_error;
+ uint64_t len;
+ uint64_t bytes_read;
+ int64_t cluster_size;
+ BackupPerf perf;
+
+ BlockCopyState *bcs;
+
+ bool wait;
+ BlockCopyCallState *bg_bcs_call;
+} BackupBlockJob;
+
enum BdrvTrackedRequestType {
BDRV_TRACKED_READ,
BDRV_TRACKED_WRITE,
diff --git a/job.c b/job.c
index e82253e041..bcbbb0be02 100644
index dbfa67bb0a..af25dd5b98 100644
--- a/job.c
+++ b/job.c
@@ -269,7 +269,8 @@ static bool job_started(Job *job)
@@ -276,7 +276,8 @@ static bool job_started(Job *job)
return job->co;
}

View File

@@ -1,92 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
Date: Mon, 6 Apr 2020 12:16:56 +0200
Subject: [PATCH] PVE-Backup: modify job api
Introduce a pause_count parameter to start a backup in
paused mode. This way backups of multiple drives can be
started up sequentially via the completion callback while
having been started at the same point in time.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/backup.c | 3 +++
block/replication.c | 2 +-
blockdev.c | 3 ++-
include/block/block_int.h | 1 +
job.c | 2 +-
5 files changed, 8 insertions(+), 3 deletions(-)
diff --git a/block/backup.c b/block/backup.c
index a7a7dcaf4c..ecd93e91e0 100644
--- a/block/backup.c
+++ b/block/backup.c
@@ -338,6 +338,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
BlockdevOnError on_target_error,
int creation_flags,
BlockCompletionFunc *cb, void *opaque,
+ int pause_count,
JobTxn *txn, Error **errp)
{
int64_t len;
@@ -459,6 +460,8 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
block_job_add_bdrv(&job->common, "target", target, 0, BLK_PERM_ALL,
&error_abort);
+ job->common.job.pause_count += pause_count;
+
return &job->common;
error:
diff --git a/block/replication.c b/block/replication.c
index da013c2041..17246a822c 100644
--- a/block/replication.c
+++ b/block/replication.c
@@ -554,7 +554,7 @@ static void replication_start(ReplicationState *rs, ReplicationMode mode,
0, MIRROR_SYNC_MODE_NONE, NULL, 0, false, NULL,
BLOCKDEV_ON_ERROR_REPORT,
BLOCKDEV_ON_ERROR_REPORT, JOB_INTERNAL,
- backup_job_completed, bs, NULL, &local_err);
+ backup_job_completed, bs, 0, NULL, &local_err);
if (local_err) {
error_propagate(errp, local_err);
backup_job_cleanup(bs);
diff --git a/blockdev.c b/blockdev.c
index 5faddaa705..65c358e4ef 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -3114,7 +3114,8 @@ static BlockJob *do_backup_common(BackupCommon *backup,
backup->filter_node_name,
backup->on_source_error,
backup->on_target_error,
- job_flags, NULL, NULL, txn, errp);
+ job_flags, NULL, NULL, 0, txn, errp);
+
return job;
}
diff --git a/include/block/block_int.h b/include/block/block_int.h
index 4c3587ea19..336f71e69d 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -1219,6 +1219,7 @@ BlockJob *backup_job_create(const char *job_id, BlockDriverState *bs,
BlockdevOnError on_target_error,
int creation_flags,
BlockCompletionFunc *cb, void *opaque,
+ int pause_count,
JobTxn *txn, Error **errp);
BdrvChild *bdrv_root_attach_child(BlockDriverState *child_bs,
diff --git a/job.c b/job.c
index 53be57a3a0..e82253e041 100644
--- a/job.c
+++ b/job.c
@@ -918,7 +918,7 @@ void job_start(Job *job)
job->co = qemu_coroutine_create(job_co_entry, job);
job->pause_count--;
job->busy = true;
- job->paused = false;
+ job->paused = job->pause_count > 0;
job_state_transition(job, JOB_STATUS_RUNNING);
aio_co_enter(job->aio_context, job->co);
}

View File

@@ -4,40 +4,34 @@ Date: Mon, 6 Apr 2020 12:17:01 +0200
Subject: [PATCH] PVE-Backup: pbs-restore - new command to restore from proxmox
backup server
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
Makefile | 4 +-
pbs-restore.c | 208 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 211 insertions(+), 1 deletion(-)
meson.build | 4 +
pbs-restore.c | 224 ++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 228 insertions(+)
create mode 100644 pbs-restore.c
diff --git a/Makefile b/Makefile
index dbd9542ae4..7c1fb58e18 100644
--- a/Makefile
+++ b/Makefile
@@ -479,7 +479,7 @@ dummy := $(call unnest-vars,, \
include $(SRC_PATH)/tests/Makefile.include
-all: $(DOCS) $(if $(BUILD_DOCS),sphinxdocs) $(TOOLS) vma$(EXESUF) $(HELPERS-y) recurse-all modules $(vhost-user-json-y)
+all: $(DOCS) $(if $(BUILD_DOCS),sphinxdocs) $(TOOLS) vma$(EXESUF) pbs-restore$(EXESUF) $(HELPERS-y) recurse-all modules $(vhost-user-json-y)
qemu-version.h: FORCE
$(call quiet-command, \
@@ -610,6 +610,8 @@ qemu-io$(EXESUF): qemu-io.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-o
qemu-storage-daemon$(EXESUF): qemu-storage-daemon.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(chardev-obj-y) $(io-obj-y) $(qom-obj-y) $(storage-daemon-obj-y) $(COMMON_LDADDS)
qemu-storage-daemon$(EXESUF): LIBS += -lproxmox_backup_qemu
vma$(EXESUF): vma.o vma-reader.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
+pbs-restore$(EXESUF): pbs-restore.o $(authz-obj-y) $(block-obj-y) $(crypto-obj-y) $(io-obj-y) $(qom-obj-y) $(COMMON_LDADDS)
+pbs-restore$(EXESUF): LIBS += -lproxmox_backup_qemu
qemu-bridge-helper$(EXESUF): qemu-bridge-helper.o $(COMMON_LDADDS)
diff --git a/meson.build b/meson.build
index 37dab249cc..1a4dfab4e2 100644
--- a/meson.build
+++ b/meson.build
@@ -3076,6 +3076,10 @@ if have_tools
vma = executable('vma', files('vma.c', 'vma-reader.c') + genh,
dependencies: [authz, block, crypto, io, qom], install: true)
+ pbs_restore = executable('pbs-restore', files('pbs-restore.c') + genh,
+ dependencies: [authz, block, crypto, io, qom,
+ libproxmox_backup_qemu], install: true)
+
subdir('storage-daemon')
subdir('contrib/rdmacm-mux')
subdir('contrib/elf2dmp')
diff --git a/pbs-restore.c b/pbs-restore.c
new file mode 100644
index 0000000000..f65de8b890
index 0000000000..4d3f925a1b
--- /dev/null
+++ b/pbs-restore.c
@@ -0,0 +1,208 @@
@@ -0,0 +1,224 @@
+/*
+ * Qemu image restore helper for Proxmox Backup
+ *
@@ -124,7 +118,7 @@ index 0000000000..f65de8b890
+
+ error_init(argv[0]);
+
+ for(;;) {
+ for (;;) {
+ static const struct option long_options[] = {
+ {"help", no_argument, 0, 'h'},
+ {"skip-zero", no_argument, 0, 'S'},
@@ -138,31 +132,31 @@ index 0000000000..f65de8b890
+ if (c == -1) {
+ break;
+ }
+ switch(c) {
+ case ':':
+ fprintf(stderr, "missing argument for option '%s'", argv[optind - 1]);
+ return -1;
+ case '?':
+ fprintf(stderr, "unrecognized option '%s'", argv[optind - 1]);
+ return -1;
+ case 'f':
+ format = g_strdup(argv[optind - 1]);
+ break;
+ case 'r':
+ repository = g_strdup(argv[optind - 1]);
+ break;
+ case 'k':
+ keyfile = g_strdup(argv[optind - 1]);
+ break;
+ case 'v':
+ verbose = true;
+ break;
+ case 'S':
+ skip_zero = true;
+ break;
+ case 'h':
+ help();
+ return 0;
+ switch (c) {
+ case ':':
+ fprintf(stderr, "missing argument for option '%s'\n", argv[optind - 1]);
+ return -1;
+ case '?':
+ fprintf(stderr, "unrecognized option '%s'\n", argv[optind - 1]);
+ return -1;
+ case 'f':
+ format = g_strdup(argv[optind - 1]);
+ break;
+ case 'r':
+ repository = g_strdup(argv[optind - 1]);
+ break;
+ case 'k':
+ keyfile = g_strdup(argv[optind - 1]);
+ break;
+ case 'v':
+ verbose = true;
+ break;
+ case 'S':
+ skip_zero = true;
+ break;
+ case 'h':
+ help();
+ return 0;
+ }
+ }
+
@@ -197,31 +191,42 @@ index 0000000000..f65de8b890
+ bdrv_init();
+ module_call_init(MODULE_INIT_QOM);
+
+ if (verbose) {
+ fprintf(stderr, "connecting to repository '%s'\n", repository);
+ }
+ char *pbs_error = NULL;
+ ProxmoxRestoreHandle *conn = proxmox_restore_connect(
+ ProxmoxRestoreHandle *conn = proxmox_restore_new(
+ repository, snapshot, password, keyfile, key_password, fingerprint, &pbs_error);
+ if (conn == NULL) {
+ fprintf(stderr, "restore failed: %s\n", pbs_error);
+ return -1;
+ }
+
+ int res = proxmox_restore_connect(conn, &pbs_error);
+ if (res < 0 || pbs_error) {
+ fprintf(stderr, "restore failed (connection error): %s\n", pbs_error);
+ return -1;
+ }
+
+ QDict *options = qdict_new();
+ qdict_put_str(options, "driver", format);
+
+ if (format) {
+ qdict_put_str(options, "driver", format);
+ }
+
+
+ if (verbose) {
+ fprintf(stderr, "open block backend for target '%s'\n", target);
+ }
+ Error *local_err = NULL;
+ int flags = BDRV_O_RDWR;
+
+ BlockBackend *blk = blk_new_open(target, NULL, options, flags, &local_err);
+ if (!blk) {
+ fprintf(stderr, "%s\n", error_get_pretty(local_err));
+ return -1;
+ }
+
+ CallbackData *callback_data = calloc(sizeof( CallbackData), 1);
+ CallbackData *callback_data = calloc(sizeof(CallbackData), 1);
+
+ callback_data->target = blk;
+ callback_data->skip_zero = skip_zero;
@@ -229,7 +234,11 @@ index 0000000000..f65de8b890
+
+ // blk_set_enable_write_cache(blk, !writethrough);
+
+ int res = proxmox_restore_image(
+ if (verbose) {
+ fprintf(stderr, "starting to restore snapshot '%s'\n", snapshot);
+ fflush(stderr); // ensure we do not get printed after the progress log
+ }
+ res = proxmox_restore_image(
+ conn,
+ archive_name,
+ write_callback,
@@ -238,6 +247,7 @@ index 0000000000..f65de8b890
+ verbose);
+
+ proxmox_restore_disconnect(conn);
+ blk_unref(blk);
+
+ if (res < 0) {
+ fprintf(stderr, "restore failed: %s\n", pbs_error);

View File

@@ -0,0 +1,452 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Mon, 29 Jun 2020 11:06:03 +0200
Subject: [PATCH] PVE-Backup: Add dirty-bitmap tracking for incremental backups
Uses QEMU's existing MIRROR_SYNC_MODE_BITMAP and a dirty-bitmap on top
of all backed-up drives. This will only execute the data-write callback
for any changed chunks, the PBS rust code will reuse chunks from the
previous index for everything it doesn't receive if reuse_index is true.
On error or cancellation, remove all dirty bitmaps to ensure
consistency.
Add PBS/incremental specific information to query backup info QMP and
HMP commands.
Only supported for PBS backups.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 1 +
monitor/hmp-cmds.c | 45 ++++++++++----
proxmox-backup-client.c | 3 +-
proxmox-backup-client.h | 1 +
pve-backup.c | 103 ++++++++++++++++++++++++++++++---
qapi/block-core.json | 12 +++-
6 files changed, 142 insertions(+), 23 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index f6668ab01d..3c06734e6d 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1042,6 +1042,7 @@ void hmp_backup(Monitor *mon, const QDict *qdict)
false, NULL, // PBS fingerprint
false, NULL, // PBS backup-id
false, 0, // PBS backup-time
+ false, false, // PBS incremental
true, dir ? BACKUP_FORMAT_DIR : BACKUP_FORMAT_VMA,
false, NULL, false, NULL, !!devlist,
devlist, qdict_haskey(qdict, "speed"), speed, &error);
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index b2687eae3a..cfd7a60f32 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -221,19 +221,42 @@ void hmp_info_backup(Monitor *mon, const QDict *qdict)
monitor_printf(mon, "End time: %s", ctime(&info->end_time));
}
- int per = (info->has_total && info->total &&
- info->has_transferred && info->transferred) ?
- (info->transferred * 100)/info->total : 0;
- int zero_per = (info->has_total && info->total &&
- info->has_zero_bytes && info->zero_bytes) ?
- (info->zero_bytes * 100)/info->total : 0;
monitor_printf(mon, "Backup file: %s\n", info->backup_file);
monitor_printf(mon, "Backup uuid: %s\n", info->uuid);
- monitor_printf(mon, "Total size: %zd\n", info->total);
- monitor_printf(mon, "Transferred bytes: %zd (%d%%)\n",
- info->transferred, per);
- monitor_printf(mon, "Zero bytes: %zd (%d%%)\n",
- info->zero_bytes, zero_per);
+
+ if (!(info->has_total && info->total)) {
+ // this should not happen normally
+ monitor_printf(mon, "Total size: %d\n", 0);
+ } else {
+ bool incremental = false;
+ size_t total_or_dirty = info->total;
+ if (info->has_transferred) {
+ if (info->has_dirty && info->dirty) {
+ if (info->dirty < info->total) {
+ total_or_dirty = info->dirty;
+ incremental = true;
+ }
+ }
+ }
+
+ int per = (info->transferred * 100)/total_or_dirty;
+
+ monitor_printf(mon, "Backup mode: %s\n", incremental ? "incremental" : "full");
+
+ int zero_per = (info->has_zero_bytes && info->zero_bytes) ?
+ (info->zero_bytes * 100)/info->total : 0;
+ monitor_printf(mon, "Total size: %zd\n", info->total);
+ monitor_printf(mon, "Transferred bytes: %zd (%d%%)\n",
+ info->transferred, per);
+ monitor_printf(mon, "Zero bytes: %zd (%d%%)\n",
+ info->zero_bytes, zero_per);
+
+ if (info->has_reused) {
+ int reused_per = (info->reused * 100)/total_or_dirty;
+ monitor_printf(mon, "Reused bytes: %zd (%d%%)\n",
+ info->reused, reused_per);
+ }
+ }
}
qapi_free_BackupStatus(info);
diff --git a/proxmox-backup-client.c b/proxmox-backup-client.c
index a8f6653a81..4ce7bc0b5e 100644
--- a/proxmox-backup-client.c
+++ b/proxmox-backup-client.c
@@ -89,6 +89,7 @@ proxmox_backup_co_register_image(
ProxmoxBackupHandle *pbs,
const char *device_name,
uint64_t size,
+ bool incremental,
Error **errp)
{
Coroutine *co = qemu_coroutine_self();
@@ -98,7 +99,7 @@ proxmox_backup_co_register_image(
int pbs_res = -1;
proxmox_backup_register_image_async(
- pbs, device_name, size ,proxmox_backup_schedule_wake, &waker, &pbs_res, &pbs_err);
+ pbs, device_name, size, incremental, proxmox_backup_schedule_wake, &waker, &pbs_res, &pbs_err);
qemu_coroutine_yield();
if (pbs_res < 0) {
if (errp) error_setg(errp, "backup register image failed: %s", pbs_err ? pbs_err : "unknown error");
diff --git a/proxmox-backup-client.h b/proxmox-backup-client.h
index 1dda8b7d8f..8cbf645b2c 100644
--- a/proxmox-backup-client.h
+++ b/proxmox-backup-client.h
@@ -32,6 +32,7 @@ proxmox_backup_co_register_image(
ProxmoxBackupHandle *pbs,
const char *device_name,
uint64_t size,
+ bool incremental,
Error **errp);
diff --git a/pve-backup.c b/pve-backup.c
index 88f5ee133f..1c49cd178d 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -28,6 +28,8 @@
*
*/
+const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
+
static struct PVEBackupState {
struct {
// Everithing accessed from qmp_backup_query command is protected using lock
@@ -39,7 +41,9 @@ static struct PVEBackupState {
uuid_t uuid;
char uuid_str[37];
size_t total;
+ size_t dirty;
size_t transferred;
+ size_t reused;
size_t zero_bytes;
} stat;
int64_t speed;
@@ -66,6 +70,7 @@ typedef struct PVEBackupDevInfo {
uint8_t dev_id;
bool completed;
char targetfile[PATH_MAX];
+ BdrvDirtyBitmap *bitmap;
BlockDriverState *target;
} PVEBackupDevInfo;
@@ -105,11 +110,12 @@ static bool pvebackup_error_or_canceled(void)
return error_or_canceled;
}
-static void pvebackup_add_transfered_bytes(size_t transferred, size_t zero_bytes)
+static void pvebackup_add_transfered_bytes(size_t transferred, size_t zero_bytes, size_t reused)
{
qemu_mutex_lock(&backup_state.stat.lock);
backup_state.stat.zero_bytes += zero_bytes;
backup_state.stat.transferred += transferred;
+ backup_state.stat.reused += reused;
qemu_mutex_unlock(&backup_state.stat.lock);
}
@@ -148,7 +154,8 @@ pvebackup_co_dump_pbs_cb(
pvebackup_propagate_error(local_err);
return pbs_res;
} else {
- pvebackup_add_transfered_bytes(size, !buf ? size : 0);
+ size_t reused = (pbs_res == 0) ? size : 0;
+ pvebackup_add_transfered_bytes(size, !buf ? size : 0, reused);
}
return size;
@@ -208,11 +215,11 @@ pvebackup_co_dump_vma_cb(
} else {
if (remaining >= VMA_CLUSTER_SIZE) {
assert(ret == VMA_CLUSTER_SIZE);
- pvebackup_add_transfered_bytes(VMA_CLUSTER_SIZE, zero_bytes);
+ pvebackup_add_transfered_bytes(VMA_CLUSTER_SIZE, zero_bytes, 0);
remaining -= VMA_CLUSTER_SIZE;
} else {
assert(ret == remaining);
- pvebackup_add_transfered_bytes(remaining, zero_bytes);
+ pvebackup_add_transfered_bytes(remaining, zero_bytes, 0);
remaining = 0;
}
}
@@ -248,6 +255,18 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
if (local_err != NULL) {
pvebackup_propagate_error(local_err);
}
+ } else {
+ // on error or cancel we cannot ensure synchronization of dirty
+ // bitmaps with backup server, so remove all and do full backup next
+ GList *l = backup_state.di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ if (di->bitmap) {
+ bdrv_release_dirty_bitmap(di->bitmap);
+ }
+ }
}
proxmox_backup_disconnect(backup_state.pbs);
@@ -303,6 +322,12 @@ static void pvebackup_complete_cb(void *opaque, int ret)
// remove self from job queue
backup_state.di_list = g_list_remove(backup_state.di_list, di);
+ if (di->bitmap && ret < 0) {
+ // on error or cancel we cannot ensure synchronization of dirty
+ // bitmaps with backup server, so remove all and do full backup next
+ bdrv_release_dirty_bitmap(di->bitmap);
+ }
+
g_free(di);
qemu_mutex_unlock(&backup_state.backup_mutex);
@@ -472,12 +497,18 @@ static bool create_backup_jobs(void) {
assert(di->target != NULL);
+ MirrorSyncMode sync_mode = MIRROR_SYNC_MODE_FULL;
+ BitmapSyncMode bitmap_mode = BITMAP_SYNC_MODE_NEVER;
+ if (di->bitmap) {
+ sync_mode = MIRROR_SYNC_MODE_BITMAP;
+ bitmap_mode = BITMAP_SYNC_MODE_ON_SUCCESS;
+ }
AioContext *aio_context = bdrv_get_aio_context(di->bs);
aio_context_acquire(aio_context);
BlockJob *job = backup_job_create(
- NULL, di->bs, di->target, backup_state.speed, MIRROR_SYNC_MODE_FULL, NULL,
- BITMAP_SYNC_MODE_NEVER, false, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
+ NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
+ bitmap_mode, false, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
JOB_DEFAULT, pvebackup_complete_cb, di, NULL, &local_err);
aio_context_release(aio_context);
@@ -528,6 +559,8 @@ typedef struct QmpBackupTask {
const char *fingerprint;
bool has_fingerprint;
int64_t backup_time;
+ bool has_use_dirty_bitmap;
+ bool use_dirty_bitmap;
bool has_format;
BackupFormat format;
bool has_config_file;
@@ -619,6 +652,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
size_t total = 0;
+ size_t dirty = 0;
l = di_list;
while (l) {
@@ -656,6 +690,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
int dump_cb_block_size = PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE; // Hardcoded (4M)
firewall_name = "fw.conf";
+ bool use_dirty_bitmap = task->has_use_dirty_bitmap && task->use_dirty_bitmap;
+
char *pbs_err = NULL;
pbs = proxmox_backup_new(
task->backup_file,
@@ -675,7 +711,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
goto err;
}
- if (proxmox_backup_co_connect(pbs, task->errp) < 0)
+ int connect_result = proxmox_backup_co_connect(pbs, task->errp);
+ if (connect_result < 0)
goto err;
/* register all devices */
@@ -686,9 +723,40 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
const char *devname = bdrv_get_device_name(di->bs);
- int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, task->errp);
- if (dev_id < 0)
+ BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
+ bool expect_only_dirty = false;
+
+ if (use_dirty_bitmap) {
+ if (bitmap == NULL) {
+ bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, task->errp);
+ if (!bitmap) {
+ goto err;
+ }
+ } else {
+ expect_only_dirty = proxmox_backup_check_incremental(pbs, devname, di->size) != 0;
+ }
+
+ if (expect_only_dirty) {
+ dirty += bdrv_get_dirty_count(bitmap);
+ } else {
+ /* mark entire bitmap as dirty to make full backup */
+ bdrv_set_dirty_bitmap(bitmap, 0, di->size);
+ dirty += di->size;
+ }
+ di->bitmap = bitmap;
+ } else {
+ dirty += di->size;
+
+ /* after a full backup the old dirty bitmap is invalid anyway */
+ if (bitmap != NULL) {
+ bdrv_release_dirty_bitmap(bitmap);
+ }
+ }
+
+ int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, task->errp);
+ if (dev_id < 0) {
goto err;
+ }
if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, task->errp))) {
goto err;
@@ -697,6 +765,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
di->dev_id = dev_id;
}
} else if (format == BACKUP_FORMAT_VMA) {
+ dirty = total;
+
vmaw = vma_writer_create(task->backup_file, uuid, &local_err);
if (!vmaw) {
if (local_err) {
@@ -724,6 +794,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
}
} else if (format == BACKUP_FORMAT_DIR) {
+ dirty = total;
+
if (mkdir(task->backup_file, 0640) != 0) {
error_setg_errno(task->errp, errno, "can't create directory '%s'\n",
task->backup_file);
@@ -796,8 +868,10 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
char *uuid_str = g_strdup(backup_state.stat.uuid_str);
backup_state.stat.total = total;
+ backup_state.stat.dirty = dirty;
backup_state.stat.transferred = 0;
backup_state.stat.zero_bytes = 0;
+ backup_state.stat.reused = format == BACKUP_FORMAT_PBS && dirty >= total ? 0 : total - dirty;
qemu_mutex_unlock(&backup_state.stat.lock);
@@ -821,6 +895,10 @@ err:
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
+ if (di->bitmap) {
+ bdrv_release_dirty_bitmap(di->bitmap);
+ }
+
if (di->target) {
bdrv_unref(di->target);
}
@@ -862,6 +940,7 @@ UuidInfo *qmp_backup(
bool has_fingerprint, const char *fingerprint,
bool has_backup_id, const char *backup_id,
bool has_backup_time, int64_t backup_time,
+ bool has_use_dirty_bitmap, bool use_dirty_bitmap,
bool has_format, BackupFormat format,
bool has_config_file, const char *config_file,
bool has_firewall_file, const char *firewall_file,
@@ -880,6 +959,8 @@ UuidInfo *qmp_backup(
.backup_id = backup_id,
.has_backup_time = has_backup_time,
.backup_time = backup_time,
+ .has_use_dirty_bitmap = has_use_dirty_bitmap,
+ .use_dirty_bitmap = use_dirty_bitmap,
.has_format = has_format,
.format = format,
.has_config_file = has_config_file,
@@ -948,10 +1029,14 @@ BackupStatus *qmp_query_backup(Error **errp)
info->has_total = true;
info->total = backup_state.stat.total;
+ info->has_dirty = true;
+ info->dirty = backup_state.stat.dirty;
info->has_zero_bytes = true;
info->zero_bytes = backup_state.stat.zero_bytes;
info->has_transferred = true;
info->transferred = backup_state.stat.transferred;
+ info->has_reused = true;
+ info->reused = backup_state.stat.reused;
qemu_mutex_unlock(&backup_state.stat.lock);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 551ee28275..b9d6f52f0c 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -757,8 +757,13 @@
#
# @total: total amount of bytes involved in the backup process
#
+# @dirty: with incremental mode (PBS) this is the amount of bytes involved
+# in the backup process which are marked dirty.
+#
# @transferred: amount of bytes already backed up.
#
+# @reused: amount of bytes reused due to deduplication.
+#
# @zero-bytes: amount of 'zero' bytes detected.
#
# @start-time: time (epoch) when backup job started.
@@ -771,8 +776,8 @@
#
##
{ 'struct': 'BackupStatus',
- 'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int',
- '*transferred': 'int', '*zero-bytes': 'int',
+ 'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int', '*dirty': 'int',
+ '*transferred': 'int', '*zero-bytes': 'int', '*reused': 'int',
'*start-time': 'int', '*end-time': 'int',
'*backup-file': 'str', '*uuid': 'str' } }
@@ -815,6 +820,8 @@
#
# @backup-time: backup timestamp (Unix epoch, required for format 'pbs')
#
+# @use-dirty-bitmap: use dirty bitmap to detect incremental changes since last job (optional for format 'pbs')
+#
# Returns: the uuid of the backup job
#
##
@@ -825,6 +832,7 @@
'*fingerprint': 'str',
'*backup-id': 'str',
'*backup-time': 'int',
+ '*use-dirty-bitmap': 'bool',
'*format': 'BackupFormat',
'*config-file': 'str',
'*firewall-file': 'str',

View File

@@ -1,39 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Mon, 6 Apr 2020 12:17:00 +0200
Subject: [PATCH] PVE-Backup: aquire aio_context before calling
backup_job_create
And do not set target in same aoi_context as source, because
this is already done in bdrv_backup_top_append ...
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
---
pve-backup.c | 8 +++-----
1 file changed, 3 insertions(+), 5 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 9ae89fb679..38dd33e28b 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -757,17 +757,15 @@ static void coroutine_fn pvebackup_co_start(void *opaque)
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
- // make sure target runs in same aoi_context as source
AioContext *aio_context = bdrv_get_aio_context(di->bs);
aio_context_acquire(aio_context);
- GSList *ignore = NULL;
- bdrv_set_aio_context_ignore(di->target, aio_context, &ignore);
- g_slist_free(ignore);
- aio_context_release(aio_context);
job = backup_job_create(NULL, di->bs, di->target, backup_state.speed, MIRROR_SYNC_MODE_FULL, NULL,
BITMAP_SYNC_MODE_NEVER, false, NULL, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
JOB_DEFAULT, pvebackup_complete_cb, di, 1, NULL, &local_err);
+
+ aio_context_release(aio_context);
+
if (!job || local_err != NULL) {
qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
error_setg(&backup_state.stat.error, "backup_job_create failed");

View File

@@ -0,0 +1,219 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Thu, 9 Jul 2020 12:53:08 +0200
Subject: [PATCH] PVE: various PBS fixes
pbs: fix crypt and compress parameters
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
PVE: handle PBS write callback with big blocks correctly
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
PVE: add zero block handling to PBS dump callback
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 4 ++-
pve-backup.c | 57 +++++++++++++++++++++++++++-------
qapi/block-core.json | 6 ++++
3 files changed, 54 insertions(+), 13 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 3c06734e6d..4481b60a5c 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1042,7 +1042,9 @@ void hmp_backup(Monitor *mon, const QDict *qdict)
false, NULL, // PBS fingerprint
false, NULL, // PBS backup-id
false, 0, // PBS backup-time
- false, false, // PBS incremental
+ false, false, // PBS use-dirty-bitmap
+ false, false, // PBS compress
+ false, false, // PBS encrypt
true, dir ? BACKUP_FORMAT_DIR : BACKUP_FORMAT_VMA,
false, NULL, false, NULL, !!devlist,
devlist, qdict_haskey(qdict, "speed"), speed, &error);
diff --git a/pve-backup.c b/pve-backup.c
index 1c49cd178d..c15abefdda 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -8,6 +8,7 @@
#include "block/blockjob.h"
#include "qapi/qapi-commands-block.h"
#include "qapi/qmp/qerror.h"
+#include "qemu/cutils.h"
/* PVE backup state and related function */
@@ -67,6 +68,7 @@ opts_init(pvebackup_init);
typedef struct PVEBackupDevInfo {
BlockDriverState *bs;
size_t size;
+ uint64_t block_size;
uint8_t dev_id;
bool completed;
char targetfile[PATH_MAX];
@@ -135,10 +137,13 @@ pvebackup_co_dump_pbs_cb(
PVEBackupDevInfo *di = opaque;
assert(backup_state.pbs);
+ assert(buf);
Error *local_err = NULL;
int pbs_res = -1;
+ bool is_zero_block = size == di->block_size && buffer_is_zero(buf, size);
+
qemu_co_mutex_lock(&backup_state.dump_callback_mutex);
// avoid deadlock if job is cancelled
@@ -147,17 +152,29 @@ pvebackup_co_dump_pbs_cb(
return -1;
}
- pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id, buf, start, size, &local_err);
- qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ uint64_t transferred = 0;
+ uint64_t reused = 0;
+ while (transferred < size) {
+ uint64_t left = size - transferred;
+ uint64_t to_transfer = left < di->block_size ? left : di->block_size;
- if (pbs_res < 0) {
- pvebackup_propagate_error(local_err);
- return pbs_res;
- } else {
- size_t reused = (pbs_res == 0) ? size : 0;
- pvebackup_add_transfered_bytes(size, !buf ? size : 0, reused);
+ pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id,
+ is_zero_block ? NULL : buf + transferred, start + transferred,
+ to_transfer, &local_err);
+ transferred += to_transfer;
+
+ if (pbs_res < 0) {
+ pvebackup_propagate_error(local_err);
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ return pbs_res;
+ }
+
+ reused += pbs_res == 0 ? to_transfer : 0;
}
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
+ pvebackup_add_transfered_bytes(size, is_zero_block ? size : 0, reused);
+
return size;
}
@@ -178,6 +195,7 @@ pvebackup_co_dump_vma_cb(
int ret = -1;
assert(backup_state.vmaw);
+ assert(buf);
uint64_t remaining = size;
@@ -204,9 +222,7 @@ pvebackup_co_dump_vma_cb(
qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
++cluster_num;
- if (buf) {
- buf += VMA_CLUSTER_SIZE;
- }
+ buf += VMA_CLUSTER_SIZE;
if (ret < 0) {
Error *local_err = NULL;
vma_writer_error_propagate(backup_state.vmaw, &local_err);
@@ -569,6 +585,10 @@ typedef struct QmpBackupTask {
const char *firewall_file;
bool has_devlist;
const char *devlist;
+ bool has_compress;
+ bool compress;
+ bool has_encrypt;
+ bool encrypt;
bool has_speed;
int64_t speed;
Error **errp;
@@ -692,6 +712,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
bool use_dirty_bitmap = task->has_use_dirty_bitmap && task->use_dirty_bitmap;
+
char *pbs_err = NULL;
pbs = proxmox_backup_new(
task->backup_file,
@@ -701,8 +722,10 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
task->has_password ? task->password : NULL,
task->has_keyfile ? task->keyfile : NULL,
task->has_key_password ? task->key_password : NULL,
+ task->has_compress ? task->compress : true,
+ task->has_encrypt ? task->encrypt : task->has_keyfile,
task->has_fingerprint ? task->fingerprint : NULL,
- &pbs_err);
+ &pbs_err);
if (!pbs) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
@@ -721,6 +744,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
+ di->block_size = dump_cb_block_size;
+
const char *devname = bdrv_get_device_name(di->bs);
BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
@@ -941,6 +966,8 @@ UuidInfo *qmp_backup(
bool has_backup_id, const char *backup_id,
bool has_backup_time, int64_t backup_time,
bool has_use_dirty_bitmap, bool use_dirty_bitmap,
+ bool has_compress, bool compress,
+ bool has_encrypt, bool encrypt,
bool has_format, BackupFormat format,
bool has_config_file, const char *config_file,
bool has_firewall_file, const char *firewall_file,
@@ -951,6 +978,8 @@ UuidInfo *qmp_backup(
.backup_file = backup_file,
.has_password = has_password,
.password = password,
+ .has_keyfile = has_keyfile,
+ .keyfile = keyfile,
.has_key_password = has_key_password,
.key_password = key_password,
.has_fingerprint = has_fingerprint,
@@ -961,6 +990,10 @@ UuidInfo *qmp_backup(
.backup_time = backup_time,
.has_use_dirty_bitmap = has_use_dirty_bitmap,
.use_dirty_bitmap = use_dirty_bitmap,
+ .has_compress = has_compress,
+ .compress = compress,
+ .has_encrypt = has_encrypt,
+ .encrypt = encrypt,
.has_format = has_format,
.format = format,
.has_config_file = has_config_file,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index b9d6f52f0c..5d8e2eb303 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -822,6 +822,10 @@
#
# @use-dirty-bitmap: use dirty bitmap to detect incremental changes since last job (optional for format 'pbs')
#
+# @compress: use compression (optional for format 'pbs', defaults to true)
+#
+# @encrypt: use encryption ((optional for format 'pbs', defaults to true if there is a keyfile)
+#
# Returns: the uuid of the backup job
#
##
@@ -833,6 +837,8 @@
'*backup-id': 'str',
'*backup-time': 'int',
'*use-dirty-bitmap': 'bool',
+ '*compress': 'bool',
+ '*encrypt': 'bool',
'*format': 'BackupFormat',
'*config-file': 'str',
'*firewall-file': 'str',

View File

@@ -0,0 +1,407 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 8 Jul 2020 09:50:54 +0200
Subject: [PATCH] PVE: Add PBS block driver to map backup archives into VMs
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
[error cleanups, file_open implementation]
Signed-off-by: Dietmar Maurer <dietmar@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[adapt to changed function signatures]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
block/meson.build | 3 +
block/pbs.c | 276 +++++++++++++++++++++++++++++++++++++++++++
configure | 9 ++
meson.build | 1 +
qapi/block-core.json | 13 ++
5 files changed, 302 insertions(+)
create mode 100644 block/pbs.c
diff --git a/block/meson.build b/block/meson.build
index 9d3dd5b7c3..8c758c0218 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -51,6 +51,9 @@ block_ss.add(files(
'../pve-backup.c',
), libproxmox_backup_qemu)
+block_ss.add(when: 'CONFIG_PBS_BDRV', if_true: files('pbs.c'))
+block_ss.add(when: 'CONFIG_PBS_BDRV', if_true: libproxmox_backup_qemu)
+
softmmu_ss.add(when: 'CONFIG_TCG', if_true: files('blkreplay.c'))
diff --git a/block/pbs.c b/block/pbs.c
new file mode 100644
index 0000000000..0b05ea9080
--- /dev/null
+++ b/block/pbs.c
@@ -0,0 +1,276 @@
+/*
+ * Proxmox Backup Server read-only block driver
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qstring.h"
+#include "qemu/module.h"
+#include "qemu/option.h"
+#include "qemu/cutils.h"
+#include "block/block_int.h"
+
+#include <proxmox-backup-qemu.h>
+
+#define PBS_OPT_REPOSITORY "repository"
+#define PBS_OPT_SNAPSHOT "snapshot"
+#define PBS_OPT_ARCHIVE "archive"
+#define PBS_OPT_KEYFILE "keyfile"
+#define PBS_OPT_PASSWORD "password"
+#define PBS_OPT_FINGERPRINT "fingerprint"
+#define PBS_OPT_ENCRYPTION_PASSWORD "key_password"
+
+typedef struct {
+ ProxmoxRestoreHandle *conn;
+ char aid;
+ int64_t length;
+
+ char *repository;
+ char *snapshot;
+ char *archive;
+} BDRVPBSState;
+
+static QemuOptsList runtime_opts = {
+ .name = "pbs",
+ .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
+ .desc = {
+ {
+ .name = PBS_OPT_REPOSITORY,
+ .type = QEMU_OPT_STRING,
+ .help = "The server address and repository to connect to.",
+ },
+ {
+ .name = PBS_OPT_SNAPSHOT,
+ .type = QEMU_OPT_STRING,
+ .help = "The snapshot to read.",
+ },
+ {
+ .name = PBS_OPT_ARCHIVE,
+ .type = QEMU_OPT_STRING,
+ .help = "Which archive within the snapshot should be accessed.",
+ },
+ {
+ .name = PBS_OPT_PASSWORD,
+ .type = QEMU_OPT_STRING,
+ .help = "Server password. Can be passed as env var 'PBS_PASSWORD'.",
+ },
+ {
+ .name = PBS_OPT_FINGERPRINT,
+ .type = QEMU_OPT_STRING,
+ .help = "Server fingerprint. Can be passed as env var 'PBS_FINGERPRINT'.",
+ },
+ {
+ .name = PBS_OPT_ENCRYPTION_PASSWORD,
+ .type = QEMU_OPT_STRING,
+ .help = "Optional: Key password. Can be passed as env var 'PBS_ENCRYPTION_PASSWORD'.",
+ },
+ {
+ .name = PBS_OPT_KEYFILE,
+ .type = QEMU_OPT_STRING,
+ .help = "Optional: The path to the keyfile to use.",
+ },
+ { /* end of list */ }
+ },
+};
+
+
+// filename format:
+// pbs:repository=<repo>,snapshot=<snap>,password=<pw>,key_password=<kpw>,fingerprint=<fp>,archive=<archive>
+static void pbs_parse_filename(const char *filename, QDict *options,
+ Error **errp)
+{
+
+ if (!strstart(filename, "pbs:", &filename)) {
+ if (errp) error_setg(errp, "pbs_parse_filename failed - missing 'pbs:' prefix");
+ }
+
+
+ QemuOpts *opts = qemu_opts_parse_noisily(&runtime_opts, filename, false);
+ if (!opts) {
+ if (errp) error_setg(errp, "pbs_parse_filename failed");
+ return;
+ }
+
+ qemu_opts_to_qdict(opts, options);
+
+ qemu_opts_del(opts);
+}
+
+static int pbs_open(BlockDriverState *bs, QDict *options, int flags,
+ Error **errp)
+{
+ QemuOpts *opts;
+ BDRVPBSState *s = bs->opaque;
+ char *pbs_error = NULL;
+
+ opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
+ qemu_opts_absorb_qdict(opts, options, &error_abort);
+
+ s->repository = g_strdup(qemu_opt_get(opts, PBS_OPT_REPOSITORY));
+ s->snapshot = g_strdup(qemu_opt_get(opts, PBS_OPT_SNAPSHOT));
+ s->archive = g_strdup(qemu_opt_get(opts, PBS_OPT_ARCHIVE));
+ const char *keyfile = qemu_opt_get(opts, PBS_OPT_KEYFILE);
+ const char *password = qemu_opt_get(opts, PBS_OPT_PASSWORD);
+ const char *fingerprint = qemu_opt_get(opts, PBS_OPT_FINGERPRINT);
+ const char *key_password = qemu_opt_get(opts, PBS_OPT_ENCRYPTION_PASSWORD);
+
+ if (!password) {
+ password = getenv("PBS_PASSWORD");
+ }
+ if (!fingerprint) {
+ fingerprint = getenv("PBS_FINGERPRINT");
+ }
+ if (!key_password) {
+ key_password = getenv("PBS_ENCRYPTION_PASSWORD");
+ }
+
+ /* connect to PBS server in read mode */
+ s->conn = proxmox_restore_new(s->repository, s->snapshot, password,
+ keyfile, key_password, fingerprint, &pbs_error);
+
+ /* invalidates qemu_opt_get char pointers from above */
+ qemu_opts_del(opts);
+
+ if (!s->conn) {
+ if (pbs_error && errp) error_setg(errp, "PBS restore_new failed: %s", pbs_error);
+ if (pbs_error) proxmox_backup_free_error(pbs_error);
+ return -ENOMEM;
+ }
+
+ int ret = proxmox_restore_connect(s->conn, &pbs_error);
+ if (ret < 0) {
+ if (pbs_error && errp) error_setg(errp, "PBS connect failed: %s", pbs_error);
+ if (pbs_error) proxmox_backup_free_error(pbs_error);
+ return -ECONNREFUSED;
+ }
+
+ /* acquire handle and length */
+ s->aid = proxmox_restore_open_image(s->conn, s->archive, &pbs_error);
+ if (s->aid < 0) {
+ if (pbs_error && errp) error_setg(errp, "PBS open_image failed: %s", pbs_error);
+ if (pbs_error) proxmox_backup_free_error(pbs_error);
+ return -ENODEV;
+ }
+ s->length = proxmox_restore_get_image_length(s->conn, s->aid, &pbs_error);
+ if (s->length < 0) {
+ if (pbs_error && errp) error_setg(errp, "PBS get_image_length failed: %s", pbs_error);
+ if (pbs_error) proxmox_backup_free_error(pbs_error);
+ return -EINVAL;
+ }
+
+ return 0;
+}
+
+static int pbs_file_open(BlockDriverState *bs, QDict *options, int flags,
+ Error **errp)
+{
+ return pbs_open(bs, options, flags, errp);
+}
+
+static void pbs_close(BlockDriverState *bs) {
+ BDRVPBSState *s = bs->opaque;
+ g_free(s->repository);
+ g_free(s->snapshot);
+ g_free(s->archive);
+ proxmox_restore_disconnect(s->conn);
+}
+
+static int64_t pbs_getlength(BlockDriverState *bs)
+{
+ BDRVPBSState *s = bs->opaque;
+ return s->length;
+}
+
+typedef struct ReadCallbackData {
+ Coroutine *co;
+ AioContext *ctx;
+} ReadCallbackData;
+
+static void read_callback(void *callback_data)
+{
+ ReadCallbackData *rcb = callback_data;
+ aio_co_schedule(rcb->ctx, rcb->co);
+}
+
+static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
+ int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ BDRVPBSState *s = bs->opaque;
+ int ret;
+ char *pbs_error = NULL;
+ uint8_t *buf = malloc(bytes);
+
+ if (offset < 0 || bytes < 0) {
+ fprintf(stderr, "unexpected negative 'offset' or 'bytes' value!\n");
+ return -EINVAL;
+ }
+
+ ReadCallbackData rcb = {
+ .co = qemu_coroutine_self(),
+ .ctx = bdrv_get_aio_context(bs),
+ };
+
+ proxmox_restore_read_image_at_async(s->conn, s->aid, buf, (uint64_t)offset, (uint64_t)bytes,
+ read_callback, (void *) &rcb, &ret, &pbs_error);
+
+ qemu_coroutine_yield();
+
+ if (ret < 0) {
+ fprintf(stderr, "error during PBS read: %s\n", pbs_error ? pbs_error : "unknown error");
+ if (pbs_error) proxmox_backup_free_error(pbs_error);
+ return -EIO;
+ }
+
+ qemu_iovec_from_buf(qiov, 0, buf, bytes);
+ free(buf);
+
+ return ret;
+}
+
+static coroutine_fn int pbs_co_pwritev(BlockDriverState *bs,
+ int64_t offset, int64_t bytes,
+ QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ fprintf(stderr, "pbs-bdrv: cannot write to backup file, make sure "
+ "any attached disk devices are set to read-only!\n");
+ return -EPERM;
+}
+
+static void pbs_refresh_filename(BlockDriverState *bs)
+{
+ BDRVPBSState *s = bs->opaque;
+ snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s/%s(%s)",
+ s->repository, s->snapshot, s->archive);
+}
+
+static const char *const pbs_strong_runtime_opts[] = {
+ NULL
+};
+
+static BlockDriver bdrv_pbs_co = {
+ .format_name = "pbs",
+ .protocol_name = "pbs",
+ .instance_size = sizeof(BDRVPBSState),
+
+ .bdrv_parse_filename = pbs_parse_filename,
+
+ .bdrv_file_open = pbs_file_open,
+ .bdrv_open = pbs_open,
+ .bdrv_close = pbs_close,
+ .bdrv_getlength = pbs_getlength,
+
+ .bdrv_co_preadv = pbs_co_preadv,
+ .bdrv_co_pwritev = pbs_co_pwritev,
+
+ .bdrv_refresh_filename = pbs_refresh_filename,
+ .strong_runtime_opts = pbs_strong_runtime_opts,
+};
+
+static void bdrv_pbs_init(void)
+{
+ bdrv_register(&bdrv_pbs_co);
+}
+
+block_init(bdrv_pbs_init);
diff --git a/configure b/configure
index 48c21775f3..eda4e9225a 100755
--- a/configure
+++ b/configure
@@ -356,6 +356,7 @@ vdi=${default_feature:-yes}
vvfat=${default_feature:-yes}
qed=${default_feature:-yes}
parallels=${default_feature:-yes}
+pbs_bdrv="yes"
debug_mutex="no"
plugins="$default_feature"
rng_none="no"
@@ -1126,6 +1127,10 @@ for opt do
;;
--enable-parallels) parallels="yes"
;;
+ --disable-pbs-bdrv) pbs_bdrv="no"
+ ;;
+ --enable-pbs-bdrv) pbs_bdrv="yes"
+ ;;
--disable-vhost-user) vhost_user="no"
;;
--enable-vhost-user) vhost_user="yes"
@@ -1465,6 +1470,7 @@ cat << EOF
vvfat vvfat image format support
qed qed image format support
parallels parallels image format support
+ pbs-bdrv Proxmox backup server read-only block driver support
crypto-afalg Linux AF_ALG crypto backend driver
debug-mutex mutex debugging support
rng-none dummy RNG, avoid using /dev/(u)random and getrandom()
@@ -3534,6 +3540,9 @@ if test "$xen" = "enabled" ; then
echo "XEN_CFLAGS=$xen_cflags" >> $config_host_mak
echo "XEN_LIBS=$xen_libs" >> $config_host_mak
fi
+if test "$pbs_bdrv" = "yes" ; then
+ echo "CONFIG_PBS_BDRV=y" >> $config_host_mak
+fi
if test "$vhost_scsi" = "yes" ; then
echo "CONFIG_VHOST_SCSI=y" >> $config_host_mak
fi
diff --git a/meson.build b/meson.build
index 1a4dfab4e2..85b3c63199 100644
--- a/meson.build
+++ b/meson.build
@@ -3448,6 +3448,7 @@ summary_info += {'lzfse support': liblzfse}
summary_info += {'zstd support': zstd}
summary_info += {'NUMA host support': config_host.has_key('CONFIG_NUMA')}
summary_info += {'libxml2': libxml2}
+summary_info += {'PBS bdrv support': config_host.has_key('CONFIG_PBS_BDRV')}
summary_info += {'capstone': capstone_opt == 'internal' ? capstone_opt : capstone}
summary_info += {'libpmem support': libpmem}
summary_info += {'libdaxctl support': libdaxctl}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 5d8e2eb303..777863e33b 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -3053,6 +3053,7 @@
'luks', 'nbd', 'nfs', 'null-aio', 'null-co', 'nvme', 'parallels',
'preallocate', 'qcow', 'qcow2', 'qed', 'quorum', 'raw', 'rbd',
{ 'name': 'replication', 'if': 'CONFIG_REPLICATION' },
+ 'pbs',
'ssh', 'throttle', 'vdi', 'vhdx', 'vmdk', 'vpc', 'vvfat' ] }
##
@@ -3125,6 +3126,17 @@
{ 'struct': 'BlockdevOptionsNull',
'data': { '*size': 'int', '*latency-ns': 'uint64', '*read-zeroes': 'bool' } }
+##
+# @BlockdevOptionsPbs:
+#
+# Driver specific block device options for the PBS backend.
+#
+##
+{ 'struct': 'BlockdevOptionsPbs',
+ 'data': { 'repository': 'str', 'snapshot': 'str', 'archive': 'str',
+ '*keyfile': 'str', '*password': 'str', '*fingerprint': 'str',
+ '*key_password': 'str' } }
+
##
# @BlockdevOptionsNVMe:
#
@@ -4367,6 +4379,7 @@
'nfs': 'BlockdevOptionsNfs',
'null-aio': 'BlockdevOptionsNull',
'null-co': 'BlockdevOptionsNull',
+ 'pbs': 'BlockdevOptionsPbs',
'nvme': 'BlockdevOptionsNVMe',
'parallels': 'BlockdevOptionsGenericFormat',
'preallocate':'BlockdevOptionsPreallocate',

View File

@@ -1,884 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Mon, 6 Apr 2020 12:17:02 +0200
Subject: [PATCH] PVE-Backup: avoid coroutines to fix AIO freeze, cleanups
We observed various AIO pool loop freezes, so we decided to avoid
coroutines and restrict ourselfes using similar code as upstream
(see blockdev.c: do_backup_common).
* avoid coroutine for job related code (causes hangs with iothreads)
- this imply to use normal QemuRecMutex instead of CoMutex
* split pvebackup_co_dump_cb into:
- pvebackup_co_dump_pbs_cb and
- pvebackup_co_dump_pbs_cb
* new helper functions
- pvebackup_propagate_error
- pvebackup_error_or_canceled
- pvebackup_add_transfered_bytes
* avoid cancel flag (not needed)
* simplify backup_cancel logic
There is progress on upstream to support running qmp commands inside
coroutines, see:
https://lists.gnu.org/archive/html/qemu-devel/2020-02/msg04852.html
We should consider using that when it is available in upstream qemu.
---
pve-backup.c | 611 +++++++++++++++++++++++++--------------------------
1 file changed, 299 insertions(+), 312 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 38dd33e28b..169f0c68d0 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -11,11 +11,10 @@
/* PVE backup state and related function */
-
static struct PVEBackupState {
struct {
- // Everithing accessed from qmp command, protected using rwlock
- CoRwlock rwlock;
+ // Everithing accessed from qmp_backup_query command is protected using lock
+ QemuRecMutex lock;
Error *error;
time_t start_time;
time_t end_time;
@@ -25,19 +24,18 @@ static struct PVEBackupState {
size_t total;
size_t transferred;
size_t zero_bytes;
- bool cancel;
} stat;
int64_t speed;
VmaWriter *vmaw;
ProxmoxBackupHandle *pbs;
GList *di_list;
- CoMutex backup_mutex;
+ QemuRecMutex backup_mutex;
} backup_state;
static void pvebackup_init(void)
{
- qemu_co_rwlock_init(&backup_state.stat.rwlock);
- qemu_co_mutex_init(&backup_state.backup_mutex);
+ qemu_rec_mutex_init(&backup_state.stat.lock);
+ qemu_rec_mutex_init(&backup_state.backup_mutex);
}
// initialize PVEBackupState at startup
@@ -52,10 +50,54 @@ typedef struct PVEBackupDevInfo {
BlockDriverState *target;
} PVEBackupDevInfo;
-static void pvebackup_co_run_next_job(void);
+static void pvebackup_run_next_job(void);
+
+static BlockJob *
+lookup_active_block_job(PVEBackupDevInfo *di)
+{
+ if (!di->completed && di->bs) {
+ for (BlockJob *job = block_job_next(NULL); job; job = block_job_next(job)) {
+ if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
+ continue;
+ }
+
+ BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
+ if (bjob && bjob->source_bs == di->bs) {
+ return job;
+ }
+ }
+ }
+ return NULL;
+}
+
+static void pvebackup_propagate_error(Error *err)
+{
+ qemu_rec_mutex_lock(&backup_state.stat.lock);
+ error_propagate(&backup_state.stat.error, err);
+ qemu_rec_mutex_unlock(&backup_state.stat.lock);
+}
+
+static bool pvebackup_error_or_canceled(void)
+{
+ qemu_rec_mutex_lock(&backup_state.stat.lock);
+ bool error_or_canceled = !!backup_state.stat.error;
+ qemu_rec_mutex_unlock(&backup_state.stat.lock);
+
+ return error_or_canceled;
+}
+static void pvebackup_add_transfered_bytes(size_t transferred, size_t zero_bytes)
+{
+ qemu_rec_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.zero_bytes += zero_bytes;
+ backup_state.stat.transferred += transferred;
+ qemu_rec_mutex_unlock(&backup_state.stat.lock);
+}
+
+// This may get called from multiple coroutines in multiple io-threads
+// Note1: this may get called after job_cancel()
static int coroutine_fn
-pvebackup_co_dump_cb(
+pvebackup_co_dump_pbs_cb(
void *opaque,
uint64_t start,
uint64_t bytes,
@@ -67,137 +109,129 @@ pvebackup_co_dump_cb(
const unsigned char *buf = pbuf;
PVEBackupDevInfo *di = opaque;
- qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
- bool cancel = backup_state.stat.cancel;
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
+ assert(backup_state.pbs);
+
+ Error *local_err = NULL;
+ int pbs_res = -1;
+
+ qemu_rec_mutex_lock(&backup_state.backup_mutex);
- if (cancel) {
- return size; // return success
+ // avoid deadlock if job is cancelled
+ if (pvebackup_error_or_canceled()) {
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ return -1;
}
- qemu_co_mutex_lock(&backup_state.backup_mutex);
+ pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id, buf, start, size, &local_err);
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
- int ret = -1;
+ if (pbs_res < 0) {
+ pvebackup_propagate_error(local_err);
+ return pbs_res;
+ } else {
+ pvebackup_add_transfered_bytes(size, !buf ? size : 0);
+ }
- if (backup_state.vmaw) {
- size_t zero_bytes = 0;
- uint64_t remaining = size;
-
- uint64_t cluster_num = start / VMA_CLUSTER_SIZE;
- if ((cluster_num * VMA_CLUSTER_SIZE) != start) {
- qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
- if (!backup_state.stat.error) {
- qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
- error_setg(&backup_state.stat.error,
- "got unaligned write inside backup dump "
- "callback (sector %ld)", start);
- }
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
- qemu_co_mutex_unlock(&backup_state.backup_mutex);
- return -1; // not aligned to cluster size
- }
+ return size;
+}
- while (remaining > 0) {
- ret = vma_writer_write(backup_state.vmaw, di->dev_id, cluster_num,
- buf, &zero_bytes);
- ++cluster_num;
- if (buf) {
- buf += VMA_CLUSTER_SIZE;
- }
- if (ret < 0) {
- qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
- if (!backup_state.stat.error) {
- qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
- vma_writer_error_propagate(backup_state.vmaw, &backup_state.stat.error);
- }
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
+// This may get called from multiple coroutines in multiple io-threads
+static int coroutine_fn
+pvebackup_co_dump_vma_cb(
+ void *opaque,
+ uint64_t start,
+ uint64_t bytes,
+ const void *pbuf)
+{
+ assert(qemu_in_coroutine());
- qemu_co_mutex_unlock(&backup_state.backup_mutex);
- return ret;
- } else {
- qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
- backup_state.stat.zero_bytes += zero_bytes;
- if (remaining >= VMA_CLUSTER_SIZE) {
- backup_state.stat.transferred += VMA_CLUSTER_SIZE;
- remaining -= VMA_CLUSTER_SIZE;
- } else {
- backup_state.stat.transferred += remaining;
- remaining = 0;
- }
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
- }
- }
- } else if (backup_state.pbs) {
- Error *local_err = NULL;
- int pbs_res = -1;
+ const uint64_t size = bytes;
+ const unsigned char *buf = pbuf;
+ PVEBackupDevInfo *di = opaque;
- pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id, buf, start, size, &local_err);
- qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
+ int ret = -1;
- if (pbs_res < 0) {
- error_propagate(&backup_state.stat.error, local_err);
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
- qemu_co_mutex_unlock(&backup_state.backup_mutex);
- return pbs_res;
- } else {
- if (!buf) {
- backup_state.stat.zero_bytes += size;
- }
- backup_state.stat.transferred += size;
+ assert(backup_state.vmaw);
+
+ uint64_t remaining = size;
+
+ uint64_t cluster_num = start / VMA_CLUSTER_SIZE;
+ if ((cluster_num * VMA_CLUSTER_SIZE) != start) {
+ Error *local_err = NULL;
+ error_setg(&local_err,
+ "got unaligned write inside backup dump "
+ "callback (sector %ld)", start);
+ pvebackup_propagate_error(local_err);
+ return -1; // not aligned to cluster size
+ }
+
+ while (remaining > 0) {
+ qemu_rec_mutex_lock(&backup_state.backup_mutex);
+ // avoid deadlock if job is cancelled
+ if (pvebackup_error_or_canceled()) {
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ return -1;
}
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
+ size_t zero_bytes = 0;
+ ret = vma_writer_write(backup_state.vmaw, di->dev_id, cluster_num, buf, &zero_bytes);
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
- } else {
- qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
- if (!buf) {
- backup_state.stat.zero_bytes += size;
+ ++cluster_num;
+ if (buf) {
+ buf += VMA_CLUSTER_SIZE;
+ }
+ if (ret < 0) {
+ Error *local_err = NULL;
+ vma_writer_error_propagate(backup_state.vmaw, &local_err);
+ pvebackup_propagate_error(local_err);
+ return ret;
+ } else {
+ if (remaining >= VMA_CLUSTER_SIZE) {
+ assert(ret == VMA_CLUSTER_SIZE);
+ pvebackup_add_transfered_bytes(VMA_CLUSTER_SIZE, zero_bytes);
+ remaining -= VMA_CLUSTER_SIZE;
+ } else {
+ assert(ret == remaining);
+ pvebackup_add_transfered_bytes(remaining, zero_bytes);
+ remaining = 0;
+ }
}
- backup_state.stat.transferred += size;
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
}
- qemu_co_mutex_unlock(&backup_state.backup_mutex);
-
return size;
}
-static void coroutine_fn pvebackup_co_cleanup(void)
+static void coroutine_fn pvebackup_co_cleanup(void *unused)
{
assert(qemu_in_coroutine());
- qemu_co_mutex_lock(&backup_state.backup_mutex);
+ qemu_rec_mutex_lock(&backup_state.backup_mutex);
- qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
+ qemu_rec_mutex_lock(&backup_state.stat.lock);
backup_state.stat.end_time = time(NULL);
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
+ qemu_rec_mutex_unlock(&backup_state.stat.lock);
if (backup_state.vmaw) {
Error *local_err = NULL;
vma_writer_close(backup_state.vmaw, &local_err);
if (local_err != NULL) {
- qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
- error_propagate(&backup_state.stat.error, local_err);
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
- }
+ pvebackup_propagate_error(local_err);
+ }
backup_state.vmaw = NULL;
}
if (backup_state.pbs) {
- qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
- bool error_or_canceled = backup_state.stat.error || backup_state.stat.cancel;
- if (!error_or_canceled) {
+ if (!pvebackup_error_or_canceled()) {
Error *local_err = NULL;
proxmox_backup_co_finish(backup_state.pbs, &local_err);
if (local_err != NULL) {
- qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
- error_propagate(&backup_state.stat.error, local_err);
- }
+ pvebackup_propagate_error(local_err);
+ }
}
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
proxmox_backup_disconnect(backup_state.pbs);
backup_state.pbs = NULL;
@@ -205,43 +239,14 @@ static void coroutine_fn pvebackup_co_cleanup(void)
g_list_free(backup_state.di_list);
backup_state.di_list = NULL;
- qemu_co_mutex_unlock(&backup_state.backup_mutex);
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
}
-typedef struct PVEBackupCompeteCallbackData {
- PVEBackupDevInfo *di;
- int result;
-} PVEBackupCompeteCallbackData;
-
-static void coroutine_fn pvebackup_co_complete_cb(void *opaque)
+static void coroutine_fn pvebackup_complete_stream(void *opaque)
{
- assert(qemu_in_coroutine());
-
- PVEBackupCompeteCallbackData *cb_data = opaque;
-
- qemu_co_mutex_lock(&backup_state.backup_mutex);
-
- PVEBackupDevInfo *di = cb_data->di;
- int ret = cb_data->result;
-
- di->completed = true;
-
- qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
- bool error_or_canceled = backup_state.stat.error || backup_state.stat.cancel;
-
- if (ret < 0 && !backup_state.stat.error) {
- qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
- error_setg(&backup_state.stat.error, "job failed with err %d - %s",
- ret, strerror(-ret));
- }
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-
- di->bs = NULL;
+ PVEBackupDevInfo *di = opaque;
- if (di->target) {
- bdrv_unref(di->target);
- di->target = NULL;
- }
+ bool error_or_canceled = pvebackup_error_or_canceled();
if (backup_state.vmaw) {
vma_writer_close_stream(backup_state.vmaw, di->dev_id);
@@ -251,108 +256,96 @@ static void coroutine_fn pvebackup_co_complete_cb(void *opaque)
Error *local_err = NULL;
proxmox_backup_co_close_image(backup_state.pbs, di->dev_id, &local_err);
if (local_err != NULL) {
- qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
- error_propagate(&backup_state.stat.error, local_err);
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
+ pvebackup_propagate_error(local_err);
}
}
+}
- // remove self from job queue
- backup_state.di_list = g_list_remove(backup_state.di_list, di);
- g_free(di);
+static void pvebackup_complete_cb(void *opaque, int ret)
+{
+ assert(!qemu_in_coroutine());
+
+ PVEBackupDevInfo *di = opaque;
- int pending_jobs = g_list_length(backup_state.di_list);
+ qemu_rec_mutex_lock(&backup_state.backup_mutex);
- qemu_co_mutex_unlock(&backup_state.backup_mutex);
+ di->completed = true;
- if (pending_jobs > 0) {
- pvebackup_co_run_next_job();
- } else {
- pvebackup_co_cleanup();
+ if (ret < 0) {
+ Error *local_err = NULL;
+ error_setg(&local_err, "job failed with err %d - %s", ret, strerror(-ret));
+ pvebackup_propagate_error(local_err);
}
-}
-static void pvebackup_complete_cb(void *opaque, int ret)
-{
- // This can be called from the main loop, or from a coroutine
- PVEBackupCompeteCallbackData cb_data = {
- .di = opaque,
- .result = ret,
- };
+ di->bs = NULL;
- if (qemu_in_coroutine()) {
- pvebackup_co_complete_cb(&cb_data);
- } else {
- block_on_coroutine_fn(pvebackup_co_complete_cb, &cb_data);
- }
-}
+ assert(di->target == NULL);
-static void coroutine_fn pvebackup_co_cancel(void *opaque)
-{
- assert(qemu_in_coroutine());
+ block_on_coroutine_fn(pvebackup_complete_stream, di);
- qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
- backup_state.stat.cancel = true;
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
+ // remove self from job queue
+ backup_state.di_list = g_list_remove(backup_state.di_list, di);
- qemu_co_mutex_lock(&backup_state.backup_mutex);
+ g_free(di);
- // Avoid race between block jobs and backup-cancel command:
- if (!(backup_state.vmaw || backup_state.pbs)) {
- qemu_co_mutex_unlock(&backup_state.backup_mutex);
- return;
- }
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
- qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
- if (!backup_state.stat.error) {
- qemu_co_rwlock_upgrade(&backup_state.stat.rwlock);
- error_setg(&backup_state.stat.error, "backup cancelled");
- }
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
+ pvebackup_run_next_job();
+}
+
+static void pvebackup_cancel(void)
+{
+ Error *cancel_err = NULL;
+ error_setg(&cancel_err, "backup canceled");
+ pvebackup_propagate_error(cancel_err);
+
+ qemu_rec_mutex_lock(&backup_state.backup_mutex);
if (backup_state.vmaw) {
/* make sure vma writer does not block anymore */
- vma_writer_set_error(backup_state.vmaw, "backup cancelled");
+ vma_writer_set_error(backup_state.vmaw, "backup canceled");
}
if (backup_state.pbs) {
- proxmox_backup_abort(backup_state.pbs, "backup cancelled");
+ proxmox_backup_abort(backup_state.pbs, "backup canceled");
}
- bool running_jobs = 0;
- GList *l = backup_state.di_list;
- while (l) {
- PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
- l = g_list_next(l);
- if (!di->completed && di->bs) {
- for (BlockJob *job = block_job_next(NULL); job; job = block_job_next(job)) {
- if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
- continue;
- }
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
- BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
- if (bjob && bjob->source_bs == di->bs) {
- AioContext *aio_context = job->job.aio_context;
- aio_context_acquire(aio_context);
+ for(;;) {
- if (!di->completed) {
- running_jobs += 1;
- job_cancel(&job->job, false);
- }
- aio_context_release(aio_context);
- }
+ BlockJob *next_job = NULL;
+
+ qemu_rec_mutex_lock(&backup_state.backup_mutex);
+
+ GList *l = backup_state.di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ BlockJob *job = lookup_active_block_job(di);
+ if (job != NULL) {
+ next_job = job;
+ break;
}
}
- }
- qemu_co_mutex_unlock(&backup_state.backup_mutex);
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
- if (running_jobs == 0) pvebackup_co_cleanup(); // else job will call completion handler
+ if (next_job) {
+ AioContext *aio_context = next_job->job.aio_context;
+ aio_context_acquire(aio_context);
+ job_cancel_sync(&next_job->job);
+ aio_context_release(aio_context);
+ } else {
+ break;
+ }
+ }
}
void qmp_backup_cancel(Error **errp)
{
- block_on_coroutine_fn(pvebackup_co_cancel, NULL);
+ pvebackup_cancel();
}
static int coroutine_fn pvebackup_co_add_config(
@@ -406,46 +399,97 @@ static int coroutine_fn pvebackup_co_add_config(
bool job_should_pause(Job *job);
-static void coroutine_fn pvebackup_co_run_next_job(void)
+static void pvebackup_run_next_job(void)
{
- assert(qemu_in_coroutine());
+ assert(!qemu_in_coroutine());
- qemu_co_mutex_lock(&backup_state.backup_mutex);
+ qemu_rec_mutex_lock(&backup_state.backup_mutex);
GList *l = backup_state.di_list;
while (l) {
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
- if (!di->completed && di->bs) {
- for (BlockJob *job = block_job_next(NULL); job; job = block_job_next(job)) {
- if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
- continue;
- }
- BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
- if (bjob && bjob->source_bs == di->bs) {
- AioContext *aio_context = job->job.aio_context;
- qemu_co_mutex_unlock(&backup_state.backup_mutex);
- aio_context_acquire(aio_context);
-
- if (job_should_pause(&job->job)) {
- qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
- bool error_or_canceled = backup_state.stat.error || backup_state.stat.cancel;
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-
- if (error_or_canceled) {
- job_cancel(&job->job, false);
- } else {
- job_resume(&job->job);
- }
- }
- aio_context_release(aio_context);
- return;
+ BlockJob *job = lookup_active_block_job(di);
+
+ if (job) {
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+
+ AioContext *aio_context = job->job.aio_context;
+ aio_context_acquire(aio_context);
+
+ if (job_should_pause(&job->job)) {
+ bool error_or_canceled = pvebackup_error_or_canceled();
+ if (error_or_canceled) {
+ job_cancel_sync(&job->job);
+ } else {
+ job_resume(&job->job);
}
}
+ aio_context_release(aio_context);
+ return;
}
}
- qemu_co_mutex_unlock(&backup_state.backup_mutex);
+
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+
+ block_on_coroutine_fn(pvebackup_co_cleanup, NULL); // no more jobs, run cleanup
+}
+
+static bool create_backup_jobs(void) {
+
+ assert(!qemu_in_coroutine());
+
+ Error *local_err = NULL;
+
+ /* create and start all jobs (paused state) */
+ GList *l = backup_state.di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ assert(di->target != NULL);
+
+ AioContext *aio_context = bdrv_get_aio_context(di->bs);
+ aio_context_acquire(aio_context);
+
+ BlockJob *job = backup_job_create(
+ NULL, di->bs, di->target, backup_state.speed, MIRROR_SYNC_MODE_FULL, NULL,
+ BITMAP_SYNC_MODE_NEVER, false, NULL, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
+ JOB_DEFAULT, pvebackup_complete_cb, di, 1, NULL, &local_err);
+
+ aio_context_release(aio_context);
+
+ if (!job || local_err != NULL) {
+ Error *create_job_err = NULL;
+ error_setg(&create_job_err, "backup_job_create failed: %s",
+ local_err ? error_get_pretty(local_err) : "null");
+
+ pvebackup_propagate_error(create_job_err);
+ break;
+ }
+ job_start(&job->job);
+
+ bdrv_unref(di->target);
+ di->target = NULL;
+ }
+
+ bool errors = pvebackup_error_or_canceled();
+
+ if (errors) {
+ l = backup_state.di_list;
+ while (l) {
+ PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
+ l = g_list_next(l);
+
+ if (di->target) {
+ bdrv_unref(di->target);
+ di->target = NULL;
+ }
+ }
+ }
+
+ return errors;
}
typedef struct QmpBackupTask {
@@ -476,7 +520,7 @@ typedef struct QmpBackupTask {
UuidInfo *result;
} QmpBackupTask;
-static void coroutine_fn pvebackup_co_start(void *opaque)
+static void coroutine_fn pvebackup_co_prepare(void *opaque)
{
assert(qemu_in_coroutine());
@@ -495,15 +539,14 @@ static void coroutine_fn pvebackup_co_start(void *opaque)
GList *di_list = NULL;
GList *l;
UuidInfo *uuid_info;
- BlockJob *job;
const char *config_name = "qemu-server.conf";
const char *firewall_name = "qemu-server.fw";
- qemu_co_mutex_lock(&backup_state.backup_mutex);
+ qemu_rec_mutex_lock(&backup_state.backup_mutex);
if (backup_state.di_list) {
- qemu_co_mutex_unlock(&backup_state.backup_mutex);
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
"previous backup not finished");
return;
@@ -631,7 +674,7 @@ static void coroutine_fn pvebackup_co_start(void *opaque)
if (dev_id < 0)
goto err;
- if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_cb, di, task->errp))) {
+ if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, task->errp))) {
goto err;
}
@@ -652,7 +695,7 @@ static void coroutine_fn pvebackup_co_start(void *opaque)
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
- if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_cb, di, task->errp))) {
+ if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_vma_cb, di, task->errp))) {
goto err;
}
@@ -717,9 +760,7 @@ static void coroutine_fn pvebackup_co_start(void *opaque)
}
/* initialize global backup_state now */
- qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
-
- backup_state.stat.cancel = false;
+ qemu_rec_mutex_lock(&backup_state.stat.lock);
if (backup_state.stat.error) {
error_free(backup_state.stat.error);
@@ -742,7 +783,7 @@ static void coroutine_fn pvebackup_co_start(void *opaque)
backup_state.stat.transferred = 0;
backup_state.stat.zero_bytes = 0;
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
+ qemu_rec_mutex_unlock(&backup_state.stat.lock);
backup_state.speed = (task->has_speed && task->speed > 0) ? task->speed : 0;
@@ -751,45 +792,7 @@ static void coroutine_fn pvebackup_co_start(void *opaque)
backup_state.di_list = di_list;
- /* start all jobs (paused state) */
- l = di_list;
- while (l) {
- PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
- l = g_list_next(l);
-
- AioContext *aio_context = bdrv_get_aio_context(di->bs);
- aio_context_acquire(aio_context);
-
- job = backup_job_create(NULL, di->bs, di->target, backup_state.speed, MIRROR_SYNC_MODE_FULL, NULL,
- BITMAP_SYNC_MODE_NEVER, false, NULL, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
- JOB_DEFAULT, pvebackup_complete_cb, di, 1, NULL, &local_err);
-
- aio_context_release(aio_context);
-
- if (!job || local_err != NULL) {
- qemu_co_rwlock_wrlock(&backup_state.stat.rwlock);
- error_setg(&backup_state.stat.error, "backup_job_create failed");
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
- break;
- }
- job_start(&job->job);
- if (di->target) {
- bdrv_unref(di->target);
- di->target = NULL;
- }
- }
-
- qemu_co_mutex_unlock(&backup_state.backup_mutex);
-
- qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
- bool no_errors = !backup_state.stat.error;
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-
- if (no_errors) {
- pvebackup_co_run_next_job(); // run one job
- } else {
- pvebackup_co_cancel(NULL);
- }
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
uuid_info = g_malloc0(sizeof(*uuid_info));
uuid_info->UUID = uuid_str;
@@ -833,7 +836,7 @@ err:
rmdir(backup_dir);
}
- qemu_co_mutex_unlock(&backup_state.backup_mutex);
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
task->result = NULL;
return;
@@ -878,32 +881,28 @@ UuidInfo *qmp_backup(
.errp = errp,
};
- block_on_coroutine_fn(pvebackup_co_start, &task);
+ block_on_coroutine_fn(pvebackup_co_prepare, &task);
+
+ if (*errp == NULL) {
+ qemu_rec_mutex_lock(&backup_state.backup_mutex);
+ create_backup_jobs();
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ pvebackup_run_next_job();
+ }
return task.result;
}
-
-typedef struct QmpQueryBackupTask {
- Error **errp;
- BackupStatus *result;
-} QmpQueryBackupTask;
-
-static void coroutine_fn pvebackup_co_query(void *opaque)
+BackupStatus *qmp_query_backup(Error **errp)
{
- assert(qemu_in_coroutine());
-
- QmpQueryBackupTask *task = opaque;
-
BackupStatus *info = g_malloc0(sizeof(*info));
- qemu_co_rwlock_rdlock(&backup_state.stat.rwlock);
+ qemu_rec_mutex_lock(&backup_state.stat.lock);
if (!backup_state.stat.start_time) {
/* not started, return {} */
- task->result = info;
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
- return;
+ qemu_rec_mutex_unlock(&backup_state.stat.lock);
+ return info;
}
info->has_status = true;
@@ -939,19 +938,7 @@ static void coroutine_fn pvebackup_co_query(void *opaque)
info->has_transferred = true;
info->transferred = backup_state.stat.transferred;
- task->result = info;
+ qemu_rec_mutex_unlock(&backup_state.stat.lock);
- qemu_co_rwlock_unlock(&backup_state.stat.rwlock);
-}
-
-BackupStatus *qmp_query_backup(Error **errp)
-{
- QmpQueryBackupTask task = {
- .errp = errp,
- .result = NULL,
- };
-
- block_on_coroutine_fn(pvebackup_co_query, &task);
-
- return task.result;
+ return info;
}

View File

@@ -0,0 +1,74 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 8 Jul 2020 11:57:53 +0200
Subject: [PATCH] PVE: add query_proxmox_support QMP command
Generic interface for future use, currently used for PBS dirty-bitmap
backup support.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[PVE: query-proxmox-support: include library version]
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
pve-backup.c | 9 +++++++++
qapi/block-core.json | 29 +++++++++++++++++++++++++++++
2 files changed, 38 insertions(+)
diff --git a/pve-backup.c b/pve-backup.c
index c15abefdda..4684789813 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -1075,3 +1075,12 @@ BackupStatus *qmp_query_backup(Error **errp)
return info;
}
+
+ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
+{
+ ProxmoxSupportStatus *ret = g_malloc0(sizeof(*ret));
+ ret->pbs_library_version = g_strdup(proxmox_backup_qemu_version());
+ ret->pbs_dirty_bitmap = true;
+ ret->pbs_dirty_bitmap_savevm = true;
+ return ret;
+}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 777863e33b..cfd980b70f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -867,6 +867,35 @@
##
{ 'command': 'backup-cancel' }
+##
+# @ProxmoxSupportStatus:
+#
+# Contains info about supported features added by Proxmox.
+#
+# @pbs-dirty-bitmap: True if dirty-bitmap-incremental backups to PBS are
+# supported.
+#
+# @pbs-dirty-bitmap-savevm: True if 'dirty-bitmaps' migration capability can
+# safely be set for savevm-async.
+#
+# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
+#
+##
+{ 'struct': 'ProxmoxSupportStatus',
+ 'data': { 'pbs-dirty-bitmap': 'bool',
+ 'pbs-dirty-bitmap-savevm': 'bool',
+ 'pbs-library-version': 'str' } }
+
+##
+# @query-proxmox-support:
+#
+# Returns information about supported features added by Proxmox.
+#
+# Returns: @ProxmoxSupportStatus
+#
+##
+{ 'command': 'query-proxmox-support', 'returns': 'ProxmoxSupportStatus' }
+
##
# @BlockDeviceTimedStats:
#

View File

@@ -0,0 +1,441 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 19 Aug 2020 17:02:00 +0200
Subject: [PATCH] PVE: add query-pbs-bitmap-info QMP call
Returns advanced information about dirty bitmaps used (or not used) for
the latest PBS backup.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
monitor/hmp-cmds.c | 28 ++++++-----
pve-backup.c | 117 ++++++++++++++++++++++++++++++++-----------
qapi/block-core.json | 56 +++++++++++++++++++++
3 files changed, 159 insertions(+), 42 deletions(-)
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index cfd7a60f32..b613190a3c 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -198,6 +198,7 @@ void hmp_info_mice(Monitor *mon, const QDict *qdict)
void hmp_info_backup(Monitor *mon, const QDict *qdict)
{
BackupStatus *info;
+ PBSBitmapInfoList *bitmap_info;
info = qmp_query_backup(NULL);
@@ -228,26 +229,29 @@ void hmp_info_backup(Monitor *mon, const QDict *qdict)
// this should not happen normally
monitor_printf(mon, "Total size: %d\n", 0);
} else {
- bool incremental = false;
size_t total_or_dirty = info->total;
- if (info->has_transferred) {
- if (info->has_dirty && info->dirty) {
- if (info->dirty < info->total) {
- total_or_dirty = info->dirty;
- incremental = true;
- }
- }
+ bitmap_info = qmp_query_pbs_bitmap_info(NULL);
+
+ while (bitmap_info) {
+ monitor_printf(mon, "Drive %s:\n",
+ bitmap_info->value->drive);
+ monitor_printf(mon, " bitmap action: %s\n",
+ PBSBitmapAction_str(bitmap_info->value->action));
+ monitor_printf(mon, " size: %zd\n",
+ bitmap_info->value->size);
+ monitor_printf(mon, " dirty: %zd\n",
+ bitmap_info->value->dirty);
+ bitmap_info = bitmap_info->next;
}
- int per = (info->transferred * 100)/total_or_dirty;
-
- monitor_printf(mon, "Backup mode: %s\n", incremental ? "incremental" : "full");
+ qapi_free_PBSBitmapInfoList(bitmap_info);
int zero_per = (info->has_zero_bytes && info->zero_bytes) ?
(info->zero_bytes * 100)/info->total : 0;
monitor_printf(mon, "Total size: %zd\n", info->total);
+ int trans_per = (info->transferred * 100)/total_or_dirty;
monitor_printf(mon, "Transferred bytes: %zd (%d%%)\n",
- info->transferred, per);
+ info->transferred, trans_per);
monitor_printf(mon, "Zero bytes: %zd (%d%%)\n",
info->zero_bytes, zero_per);
diff --git a/pve-backup.c b/pve-backup.c
index 4684789813..f90abaa50a 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -46,6 +46,7 @@ static struct PVEBackupState {
size_t transferred;
size_t reused;
size_t zero_bytes;
+ GList *bitmap_list;
} stat;
int64_t speed;
VmaWriter *vmaw;
@@ -672,7 +673,6 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
size_t total = 0;
- size_t dirty = 0;
l = di_list;
while (l) {
@@ -693,18 +693,33 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
uuid_generate(uuid);
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.reused = 0;
+
+ /* clear previous backup's bitmap_list */
+ if (backup_state.stat.bitmap_list) {
+ GList *bl = backup_state.stat.bitmap_list;
+ while (bl) {
+ g_free(((PBSBitmapInfo *)bl->data)->drive);
+ g_free(bl->data);
+ bl = g_list_next(bl);
+ }
+ g_list_free(backup_state.stat.bitmap_list);
+ backup_state.stat.bitmap_list = NULL;
+ }
+
if (format == BACKUP_FORMAT_PBS) {
if (!task->has_password) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'password'");
- goto err;
+ goto err_mutex;
}
if (!task->has_backup_id) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-id'");
- goto err;
+ goto err_mutex;
}
if (!task->has_backup_time) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-time'");
- goto err;
+ goto err_mutex;
}
int dump_cb_block_size = PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE; // Hardcoded (4M)
@@ -731,12 +746,12 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
"proxmox_backup_new failed: %s", pbs_err);
proxmox_backup_free_error(pbs_err);
- goto err;
+ goto err_mutex;
}
int connect_result = proxmox_backup_co_connect(pbs, task->errp);
if (connect_result < 0)
- goto err;
+ goto err_mutex;
/* register all devices */
l = di_list;
@@ -747,6 +762,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
di->block_size = dump_cb_block_size;
const char *devname = bdrv_get_device_name(di->bs);
+ PBSBitmapAction action = PBS_BITMAP_ACTION_NOT_USED;
+ size_t dirty = di->size;
BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
bool expect_only_dirty = false;
@@ -755,49 +772,59 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (bitmap == NULL) {
bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, task->errp);
if (!bitmap) {
- goto err;
+ goto err_mutex;
}
+ action = PBS_BITMAP_ACTION_NEW;
} else {
expect_only_dirty = proxmox_backup_check_incremental(pbs, devname, di->size) != 0;
}
if (expect_only_dirty) {
- dirty += bdrv_get_dirty_count(bitmap);
+ /* track clean chunks as reused */
+ dirty = MIN(bdrv_get_dirty_count(bitmap), di->size);
+ backup_state.stat.reused += di->size - dirty;
+ action = PBS_BITMAP_ACTION_USED;
} else {
/* mark entire bitmap as dirty to make full backup */
bdrv_set_dirty_bitmap(bitmap, 0, di->size);
- dirty += di->size;
+ if (action != PBS_BITMAP_ACTION_NEW) {
+ action = PBS_BITMAP_ACTION_INVALID;
+ }
}
di->bitmap = bitmap;
} else {
- dirty += di->size;
-
/* after a full backup the old dirty bitmap is invalid anyway */
if (bitmap != NULL) {
bdrv_release_dirty_bitmap(bitmap);
+ action = PBS_BITMAP_ACTION_NOT_USED_REMOVED;
}
}
int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, task->errp);
if (dev_id < 0) {
- goto err;
+ goto err_mutex;
}
if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, task->errp))) {
- goto err;
+ goto err_mutex;
}
di->dev_id = dev_id;
+
+ PBSBitmapInfo *info = g_malloc(sizeof(*info));
+ info->drive = g_strdup(devname);
+ info->action = action;
+ info->size = di->size;
+ info->dirty = dirty;
+ backup_state.stat.bitmap_list = g_list_append(backup_state.stat.bitmap_list, info);
}
} else if (format == BACKUP_FORMAT_VMA) {
- dirty = total;
-
vmaw = vma_writer_create(task->backup_file, uuid, &local_err);
if (!vmaw) {
if (local_err) {
error_propagate(task->errp, local_err);
}
- goto err;
+ goto err_mutex;
}
/* register all devices for vma writer */
@@ -807,7 +834,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
l = g_list_next(l);
if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_vma_cb, di, task->errp))) {
- goto err;
+ goto err_mutex;
}
const char *devname = bdrv_get_device_name(di->bs);
@@ -815,16 +842,14 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (di->dev_id <= 0) {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
"register_stream failed");
- goto err;
+ goto err_mutex;
}
}
} else if (format == BACKUP_FORMAT_DIR) {
- dirty = total;
-
if (mkdir(task->backup_file, 0640) != 0) {
error_setg_errno(task->errp, errno, "can't create directory '%s'\n",
task->backup_file);
- goto err;
+ goto err_mutex;
}
backup_dir = task->backup_file;
@@ -841,18 +866,18 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
di->size, flags, false, &local_err);
if (local_err) {
error_propagate(task->errp, local_err);
- goto err;
+ goto err_mutex;
}
di->target = bdrv_open(di->targetfile, NULL, NULL, flags, &local_err);
if (!di->target) {
error_propagate(task->errp, local_err);
- goto err;
+ goto err_mutex;
}
}
} else {
error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "unknown backup format");
- goto err;
+ goto err_mutex;
}
@@ -860,7 +885,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (task->has_config_file) {
if (pvebackup_co_add_config(task->config_file, config_name, format, backup_dir,
vmaw, pbs, task->errp) != 0) {
- goto err;
+ goto err_mutex;
}
}
@@ -868,12 +893,11 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (task->has_firewall_file) {
if (pvebackup_co_add_config(task->firewall_file, firewall_name, format, backup_dir,
vmaw, pbs, task->errp) != 0) {
- goto err;
+ goto err_mutex;
}
}
/* initialize global backup_state now */
-
- qemu_mutex_lock(&backup_state.stat.lock);
+ /* note: 'reused' and 'bitmap_list' are initialized earlier */
if (backup_state.stat.error) {
error_free(backup_state.stat.error);
@@ -893,10 +917,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
char *uuid_str = g_strdup(backup_state.stat.uuid_str);
backup_state.stat.total = total;
- backup_state.stat.dirty = dirty;
+ backup_state.stat.dirty = total - backup_state.stat.reused;
backup_state.stat.transferred = 0;
backup_state.stat.zero_bytes = 0;
- backup_state.stat.reused = format == BACKUP_FORMAT_PBS && dirty >= total ? 0 : total - dirty;
qemu_mutex_unlock(&backup_state.stat.lock);
@@ -913,6 +936,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
task->result = uuid_info;
return;
+err_mutex:
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
err:
l = di_list;
@@ -1076,11 +1102,42 @@ BackupStatus *qmp_query_backup(Error **errp)
return info;
}
+PBSBitmapInfoList *qmp_query_pbs_bitmap_info(Error **errp)
+{
+ PBSBitmapInfoList *head = NULL, **p_next = &head;
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+
+ GList *l = backup_state.stat.bitmap_list;
+ while (l) {
+ PBSBitmapInfo *info = (PBSBitmapInfo *)l->data;
+ l = g_list_next(l);
+
+ /* clone bitmap info to avoid auto free after QMP marshalling */
+ PBSBitmapInfo *info_ret = g_malloc0(sizeof(*info_ret));
+ info_ret->drive = g_strdup(info->drive);
+ info_ret->action = info->action;
+ info_ret->size = info->size;
+ info_ret->dirty = info->dirty;
+
+ PBSBitmapInfoList *info_list = g_malloc0(sizeof(*info_list));
+ info_list->value = info_ret;
+
+ *p_next = info_list;
+ p_next = &info_list->next;
+ }
+
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ return head;
+}
+
ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
{
ProxmoxSupportStatus *ret = g_malloc0(sizeof(*ret));
ret->pbs_library_version = g_strdup(proxmox_backup_qemu_version());
ret->pbs_dirty_bitmap = true;
ret->pbs_dirty_bitmap_savevm = true;
+ ret->query_bitmap_info = true;
return ret;
}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index cfd980b70f..8833060385 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -875,6 +875,8 @@
# @pbs-dirty-bitmap: True if dirty-bitmap-incremental backups to PBS are
# supported.
#
+# @query-bitmap-info: True if the 'query-pbs-bitmap-info' QMP call is supported.
+#
# @pbs-dirty-bitmap-savevm: True if 'dirty-bitmaps' migration capability can
# safely be set for savevm-async.
#
@@ -883,6 +885,7 @@
##
{ 'struct': 'ProxmoxSupportStatus',
'data': { 'pbs-dirty-bitmap': 'bool',
+ 'query-bitmap-info': 'bool',
'pbs-dirty-bitmap-savevm': 'bool',
'pbs-library-version': 'str' } }
@@ -896,6 +899,59 @@
##
{ 'command': 'query-proxmox-support', 'returns': 'ProxmoxSupportStatus' }
+##
+# @PBSBitmapAction:
+#
+# An action taken on a dirty-bitmap when a backup job was started.
+#
+# @not-used: Bitmap mode was not enabled.
+#
+# @not-used-removed: Bitmap mode was not enabled, but a bitmap from a
+# previous backup still existed and was removed.
+#
+# @new: A new bitmap was attached to the drive for this backup.
+#
+# @used: An existing bitmap will be used to only backup changed data.
+#
+# @invalid: A bitmap existed, but had to be cleared since it's associated
+# base snapshot did not match the base given for the current job or
+# the crypt mode has changed.
+#
+##
+{ 'enum': 'PBSBitmapAction',
+ 'data': ['not-used', 'not-used-removed', 'new', 'used', 'invalid'] }
+
+##
+# @PBSBitmapInfo:
+#
+# Contains information about dirty bitmaps used for each drive in a PBS backup.
+#
+# @drive: The underlying drive.
+#
+# @action: The action that was taken when the backup started.
+#
+# @size: The total size of the drive.
+#
+# @dirty: How much of the drive is considered dirty and will be backed up,
+# or 'size' if everything will be.
+#
+##
+{ 'struct': 'PBSBitmapInfo',
+ 'data': { 'drive': 'str', 'action': 'PBSBitmapAction', 'size': 'int',
+ 'dirty': 'int' } }
+
+##
+# @query-pbs-bitmap-info:
+#
+# Returns information about dirty bitmaps used on the most recently started
+# backup. Returns nothing when the last backup was not using PBS or if no
+# backup occured in this session.
+#
+# Returns: @PBSBitmapInfo
+#
+##
+{ 'command': 'query-pbs-bitmap-info', 'returns': ['PBSBitmapInfo'] }
+
##
# @BlockDeviceTimedStats:
#

View File

@@ -0,0 +1,61 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Tue, 12 Jan 2021 14:12:20 +0100
Subject: [PATCH] PVE: redirect stderr to journal when daemonized
QEMU uses the logging for error messages usually, so LOG_ERR is most
fitting.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
meson.build | 2 ++
os-posix.c | 7 +++++--
2 files changed, 7 insertions(+), 2 deletions(-)
diff --git a/meson.build b/meson.build
index 85b3c63199..31ba7d70d6 100644
--- a/meson.build
+++ b/meson.build
@@ -1203,6 +1203,7 @@ keyutils = dependency('libkeyutils', required: false,
has_gettid = cc.has_function('gettid')
libuuid = cc.find_library('uuid', required: true)
+libsystemd = cc.find_library('systemd', required: true)
libproxmox_backup_qemu = cc.find_library('proxmox_backup_qemu', required: true)
# libselinux
@@ -2571,6 +2572,7 @@ if have_block
# os-posix.c contains POSIX-specific functions used by qemu-storage-daemon,
# os-win32.c does not
blockdev_ss.add(when: 'CONFIG_POSIX', if_true: files('os-posix.c'))
+ blockdev_ss.add(when: 'CONFIG_POSIX', if_true: libsystemd)
softmmu_ss.add(when: 'CONFIG_WIN32', if_true: [files('os-win32.c')])
endif
diff --git a/os-posix.c b/os-posix.c
index ae6c9f2a5e..36807806bf 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -28,6 +28,8 @@
#include <pwd.h>
#include <grp.h>
#include <libgen.h>
+#include <systemd/sd-journal.h>
+#include <syslog.h>
#include "qemu-common.h"
/* Needed early for CONFIG_BSD etc. */
@@ -291,9 +293,10 @@ void os_setup_post(void)
dup2(fd, 0);
dup2(fd, 1);
- /* In case -D is given do not redirect stderr to /dev/null */
+ /* In case -D is given do not redirect stderr to journal */
if (!qemu_logfile) {
- dup2(fd, 2);
+ int journal_fd = sd_journal_stream_fd("QEMU", LOG_ERR, 0);
+ dup2(journal_fd, 2);
}
close(fd);

View File

@@ -0,0 +1,98 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Thu, 20 Aug 2020 14:31:59 +0200
Subject: [PATCH] PVE: Add sequential job transaction support
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
include/qemu/job.h | 12 ++++++++++++
job.c | 31 +++++++++++++++++++++++++++++++
2 files changed, 43 insertions(+)
diff --git a/include/qemu/job.h b/include/qemu/job.h
index 6e67b6977f..60376c99ee 100644
--- a/include/qemu/job.h
+++ b/include/qemu/job.h
@@ -294,6 +294,18 @@ typedef enum JobCreateFlags {
*/
JobTxn *job_txn_new(void);
+/**
+ * Create a new transaction and set it to sequential mode, i.e. run all jobs
+ * one after the other instead of at the same time.
+ */
+JobTxn *job_txn_new_seq(void);
+
+/**
+ * Helper method to start the first job in a sequential transaction to kick it
+ * off. Other jobs will be run after this one completes.
+ */
+void job_txn_start_seq(JobTxn *txn);
+
/**
* Release a reference that was previously acquired with job_txn_add_job or
* job_txn_new. If it's the last reference to the object, it will be freed.
diff --git a/job.c b/job.c
index af25dd5b98..d0d152e697 100644
--- a/job.c
+++ b/job.c
@@ -72,6 +72,8 @@ struct JobTxn {
/* Reference count */
int refcnt;
+
+ bool sequential;
};
/* Right now, this mutex is only needed to synchronize accesses to job->busy
@@ -102,6 +104,25 @@ JobTxn *job_txn_new(void)
return txn;
}
+JobTxn *job_txn_new_seq(void)
+{
+ JobTxn *txn = job_txn_new();
+ txn->sequential = true;
+ return txn;
+}
+
+void job_txn_start_seq(JobTxn *txn)
+{
+ assert(txn->sequential);
+ assert(!txn->aborting);
+
+ Job *first = QLIST_FIRST(&txn->jobs);
+ assert(first);
+ assert(first->status == JOB_STATUS_CREATED);
+
+ job_start(first);
+}
+
static void job_txn_ref(JobTxn *txn)
{
txn->refcnt++;
@@ -888,6 +909,9 @@ static void job_completed_txn_success(Job *job)
*/
QLIST_FOREACH(other_job, &txn->jobs, txn_list) {
if (!job_is_completed(other_job)) {
+ if (txn->sequential) {
+ job_start(other_job);
+ }
return;
}
assert(other_job->ret == 0);
@@ -1082,6 +1106,13 @@ int job_finish_sync(Job *job, void (*finish)(Job *, Error **errp), Error **errp)
return -EBUSY;
}
+ /* in a sequential transaction jobs with status CREATED can appear at time
+ * of cancelling, these have not begun work so job_enter won't do anything,
+ * let's ensure they are marked as ABORTING if required */
+ if (job->status == JOB_STATUS_CREATED && job->txn->sequential) {
+ job_update_rc(job);
+ }
+
AIO_WAIT_WHILE(job->aio_context,
(job_enter(job), !job_is_completed(job)));

View File

@@ -0,0 +1,295 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Thu, 20 Aug 2020 14:25:00 +0200
Subject: [PATCH] PVE-Backup: Use a transaction to synchronize job states
By using a JobTxn, we can sync dirty bitmaps only when *all* jobs were
successful - meaning we don't need to remove them when the backup fails,
since QEMU's BITMAP_SYNC_MODE_ON_SUCCESS will now handle that for us.
To keep the rate-limiting and IO impact from before, we use a sequential
transaction, so drives will still be backed up one after the other.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[add new force parameter to job_cancel_sync calls]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
pve-backup.c | 167 +++++++++++++++------------------------------------
1 file changed, 49 insertions(+), 118 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index f90abaa50a..63c686463f 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -52,6 +52,7 @@ static struct PVEBackupState {
VmaWriter *vmaw;
ProxmoxBackupHandle *pbs;
GList *di_list;
+ JobTxn *txn;
QemuMutex backup_mutex;
CoMutex dump_callback_mutex;
} backup_state;
@@ -71,32 +72,12 @@ typedef struct PVEBackupDevInfo {
size_t size;
uint64_t block_size;
uint8_t dev_id;
- bool completed;
char targetfile[PATH_MAX];
BdrvDirtyBitmap *bitmap;
BlockDriverState *target;
+ BlockJob *job;
} PVEBackupDevInfo;
-static void pvebackup_run_next_job(void);
-
-static BlockJob *
-lookup_active_block_job(PVEBackupDevInfo *di)
-{
- if (!di->completed && di->bs) {
- for (BlockJob *job = block_job_next(NULL); job; job = block_job_next(job)) {
- if (job->job.driver->job_type != JOB_TYPE_BACKUP) {
- continue;
- }
-
- BackupBlockJob *bjob = container_of(job, BackupBlockJob, common);
- if (bjob && bjob->source_bs == di->bs) {
- return job;
- }
- }
- }
- return NULL;
-}
-
static void pvebackup_propagate_error(Error *err)
{
qemu_mutex_lock(&backup_state.stat.lock);
@@ -272,18 +253,6 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
if (local_err != NULL) {
pvebackup_propagate_error(local_err);
}
- } else {
- // on error or cancel we cannot ensure synchronization of dirty
- // bitmaps with backup server, so remove all and do full backup next
- GList *l = backup_state.di_list;
- while (l) {
- PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
- l = g_list_next(l);
-
- if (di->bitmap) {
- bdrv_release_dirty_bitmap(di->bitmap);
- }
- }
}
proxmox_backup_disconnect(backup_state.pbs);
@@ -322,8 +291,6 @@ static void pvebackup_complete_cb(void *opaque, int ret)
qemu_mutex_lock(&backup_state.backup_mutex);
- di->completed = true;
-
if (ret < 0) {
Error *local_err = NULL;
error_setg(&local_err, "job failed with err %d - %s", ret, strerror(-ret));
@@ -336,20 +303,17 @@ static void pvebackup_complete_cb(void *opaque, int ret)
block_on_coroutine_fn(pvebackup_complete_stream, di);
- // remove self from job queue
+ // remove self from job list
backup_state.di_list = g_list_remove(backup_state.di_list, di);
- if (di->bitmap && ret < 0) {
- // on error or cancel we cannot ensure synchronization of dirty
- // bitmaps with backup server, so remove all and do full backup next
- bdrv_release_dirty_bitmap(di->bitmap);
- }
-
g_free(di);
- qemu_mutex_unlock(&backup_state.backup_mutex);
+ /* call cleanup if we're the last job */
+ if (!g_list_first(backup_state.di_list)) {
+ block_on_coroutine_fn(pvebackup_co_cleanup, NULL);
+ }
- pvebackup_run_next_job();
+ qemu_mutex_unlock(&backup_state.backup_mutex);
}
static void pvebackup_cancel(void)
@@ -371,36 +335,28 @@ static void pvebackup_cancel(void)
proxmox_backup_abort(backup_state.pbs, "backup canceled");
}
- qemu_mutex_unlock(&backup_state.backup_mutex);
-
- for(;;) {
-
- BlockJob *next_job = NULL;
-
- qemu_mutex_lock(&backup_state.backup_mutex);
-
- GList *l = backup_state.di_list;
- while (l) {
- PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
- l = g_list_next(l);
+ /* it's enough to cancel one job in the transaction, the rest will follow
+ * automatically */
+ GList *bdi = g_list_first(backup_state.di_list);
+ BlockJob *cancel_job = bdi && bdi->data ?
+ ((PVEBackupDevInfo *)bdi->data)->job :
+ NULL;
- BlockJob *job = lookup_active_block_job(di);
- if (job != NULL) {
- next_job = job;
- break;
- }
- }
+ /* ref the job before releasing the mutex, just to be safe */
+ if (cancel_job) {
+ job_ref(&cancel_job->job);
+ }
- qemu_mutex_unlock(&backup_state.backup_mutex);
+ /* job_cancel_sync may enter the job, so we need to release the
+ * backup_mutex to avoid deadlock */
+ qemu_mutex_unlock(&backup_state.backup_mutex);
- if (next_job) {
- AioContext *aio_context = next_job->job.aio_context;
- aio_context_acquire(aio_context);
- job_cancel_sync(&next_job->job, true);
- aio_context_release(aio_context);
- } else {
- break;
- }
+ if (cancel_job) {
+ AioContext *aio_context = cancel_job->job.aio_context;
+ aio_context_acquire(aio_context);
+ job_cancel_sync(&cancel_job->job, true);
+ job_unref(&cancel_job->job);
+ aio_context_release(aio_context);
}
}
@@ -459,51 +415,19 @@ static int coroutine_fn pvebackup_co_add_config(
goto out;
}
-bool job_should_pause(Job *job);
-
-static void pvebackup_run_next_job(void)
-{
- assert(!qemu_in_coroutine());
-
- qemu_mutex_lock(&backup_state.backup_mutex);
-
- GList *l = backup_state.di_list;
- while (l) {
- PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
- l = g_list_next(l);
-
- BlockJob *job = lookup_active_block_job(di);
-
- if (job) {
- qemu_mutex_unlock(&backup_state.backup_mutex);
-
- AioContext *aio_context = job->job.aio_context;
- aio_context_acquire(aio_context);
-
- if (job_should_pause(&job->job)) {
- bool error_or_canceled = pvebackup_error_or_canceled();
- if (error_or_canceled) {
- job_cancel_sync(&job->job, true);
- } else {
- job_resume(&job->job);
- }
- }
- aio_context_release(aio_context);
- return;
- }
- }
-
- block_on_coroutine_fn(pvebackup_co_cleanup, NULL); // no more jobs, run cleanup
-
- qemu_mutex_unlock(&backup_state.backup_mutex);
-}
-
static bool create_backup_jobs(void) {
assert(!qemu_in_coroutine());
Error *local_err = NULL;
+ /* create job transaction to synchronize bitmap commit and cancel all
+ * jobs in case one errors */
+ if (backup_state.txn) {
+ job_txn_unref(backup_state.txn);
+ }
+ backup_state.txn = job_txn_new_seq();
+
BackupPerf perf = { .max_workers = 16 };
/* create and start all jobs (paused state) */
@@ -526,7 +450,7 @@ static bool create_backup_jobs(void) {
BlockJob *job = backup_job_create(
NULL, di->bs, di->target, backup_state.speed, sync_mode, di->bitmap,
bitmap_mode, false, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
- JOB_DEFAULT, pvebackup_complete_cb, di, NULL, &local_err);
+ JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn, &local_err);
aio_context_release(aio_context);
@@ -538,7 +462,8 @@ static bool create_backup_jobs(void) {
pvebackup_propagate_error(create_job_err);
break;
}
- job_start(&job->job);
+
+ di->job = job;
bdrv_unref(di->target);
di->target = NULL;
@@ -556,6 +481,10 @@ static bool create_backup_jobs(void) {
bdrv_unref(di->target);
di->target = NULL;
}
+
+ if (di->job) {
+ job_unref(&di->job->job);
+ }
}
}
@@ -946,10 +875,6 @@ err:
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
- if (di->bitmap) {
- bdrv_release_dirty_bitmap(di->bitmap);
- }
-
if (di->target) {
bdrv_unref(di->target);
}
@@ -1038,9 +963,15 @@ UuidInfo *qmp_backup(
block_on_coroutine_fn(pvebackup_co_prepare, &task);
if (*errp == NULL) {
- create_backup_jobs();
+ bool errors = create_backup_jobs();
qemu_mutex_unlock(&backup_state.backup_mutex);
- pvebackup_run_next_job();
+
+ if (!errors) {
+ /* start the first job in the transaction
+ * note: this might directly enter the job, so we need to do this
+ * after unlocking the backup_mutex */
+ job_txn_start_seq(backup_state.txn);
+ }
} else {
qemu_mutex_unlock(&backup_state.backup_mutex);
}

View File

@@ -0,0 +1,503 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Mon, 28 Sep 2020 13:40:51 +0200
Subject: [PATCH] PVE-Backup: Don't block on finishing and cleanup
create_backup_jobs
proxmox_backup_co_finish is already async, but previously we would wait
for the coroutine using block_on_coroutine_fn(). Avoid this by
scheduling pvebackup_co_complete_stream (and thus pvebackup_co_cleanup)
as a real coroutine when calling from pvebackup_complete_cb. This is ok,
since complete_stream uses the backup_mutex internally to synchronize,
and other streams can happily continue writing in the meantime anyway.
To accomodate, backup_mutex is converted to a CoMutex. This means
converting every user to a coroutine. This is not just useful here, but
will come in handy once this series[0] is merged, and QMP calls can be
yield-able coroutines too. Then we can also finally get rid of
block_on_coroutine_fn.
Cases of aio_context_acquire/release from within what is now a coroutine
are changed to aio_co_reschedule_self, which works since a running
coroutine always holds the aio lock for the context it is running in.
job_cancel_sync is called from a BH since it can't be run from a
coroutine (uses AIO_WAIT_WHILE internally).
Same thing for create_backup_jobs, which is converted to a BH too.
To communicate the finishing state, a new property is introduced to
query-backup: 'finishing'. A new state is explicitly not used, since
that would break compatibility with older qemu-server versions.
Also fix create_backup_jobs:
No more weird bool returns, just the standard "errp" format used
everywhere else too. With this, if backup_job_create fails, the error
message is actually returned over QMP and can be shown to the user.
To facilitate correct cleanup on such an error, we call
create_backup_jobs as a bottom half directly from pvebackup_co_prepare.
This additionally allows us to actually hold the backup_mutex during
operation.
Also add a job_cancel_sync before job_unref, since a job must be in
STATUS_NULL to be deleted by unref, which could trigger an assert
before.
[0] https://lists.gnu.org/archive/html/qemu-devel/2020-09/msg03515.html
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[add new force parameter to job_cancel_sync calls]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
pve-backup.c | 217 ++++++++++++++++++++++++++++---------------
qapi/block-core.json | 5 +-
2 files changed, 144 insertions(+), 78 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 63c686463f..6f05796fad 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -33,7 +33,9 @@ const char *PBS_BITMAP_NAME = "pbs-incremental-dirty-bitmap";
static struct PVEBackupState {
struct {
- // Everithing accessed from qmp_backup_query command is protected using lock
+ // Everything accessed from qmp_backup_query command is protected using
+ // this lock. Do NOT hold this lock for long times, as it is sometimes
+ // acquired from coroutines, and thus any wait time may block the guest.
QemuMutex lock;
Error *error;
time_t start_time;
@@ -47,20 +49,22 @@ static struct PVEBackupState {
size_t reused;
size_t zero_bytes;
GList *bitmap_list;
+ bool finishing;
+ bool starting;
} stat;
int64_t speed;
VmaWriter *vmaw;
ProxmoxBackupHandle *pbs;
GList *di_list;
JobTxn *txn;
- QemuMutex backup_mutex;
+ CoMutex backup_mutex;
CoMutex dump_callback_mutex;
} backup_state;
static void pvebackup_init(void)
{
qemu_mutex_init(&backup_state.stat.lock);
- qemu_mutex_init(&backup_state.backup_mutex);
+ qemu_co_mutex_init(&backup_state.backup_mutex);
qemu_co_mutex_init(&backup_state.dump_callback_mutex);
}
@@ -72,6 +76,7 @@ typedef struct PVEBackupDevInfo {
size_t size;
uint64_t block_size;
uint8_t dev_id;
+ int completed_ret; // INT_MAX if not completed
char targetfile[PATH_MAX];
BdrvDirtyBitmap *bitmap;
BlockDriverState *target;
@@ -227,12 +232,12 @@ pvebackup_co_dump_vma_cb(
}
// assumes the caller holds backup_mutex
-static void coroutine_fn pvebackup_co_cleanup(void *unused)
+static void coroutine_fn pvebackup_co_cleanup(void)
{
assert(qemu_in_coroutine());
qemu_mutex_lock(&backup_state.stat.lock);
- backup_state.stat.end_time = time(NULL);
+ backup_state.stat.finishing = true;
qemu_mutex_unlock(&backup_state.stat.lock);
if (backup_state.vmaw) {
@@ -261,35 +266,29 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
g_list_free(backup_state.di_list);
backup_state.di_list = NULL;
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.end_time = time(NULL);
+ backup_state.stat.finishing = false;
+ qemu_mutex_unlock(&backup_state.stat.lock);
}
-// assumes the caller holds backup_mutex
-static void coroutine_fn pvebackup_complete_stream(void *opaque)
+static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
{
PVEBackupDevInfo *di = opaque;
+ int ret = di->completed_ret;
- bool error_or_canceled = pvebackup_error_or_canceled();
-
- if (backup_state.vmaw) {
- vma_writer_close_stream(backup_state.vmaw, di->dev_id);
+ qemu_mutex_lock(&backup_state.stat.lock);
+ bool starting = backup_state.stat.starting;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+ if (starting) {
+ /* in 'starting' state, no tasks have been run yet, meaning we can (and
+ * must) skip all cleanup, as we don't know what has and hasn't been
+ * initialized yet. */
+ return;
}
- if (backup_state.pbs && !error_or_canceled) {
- Error *local_err = NULL;
- proxmox_backup_co_close_image(backup_state.pbs, di->dev_id, &local_err);
- if (local_err != NULL) {
- pvebackup_propagate_error(local_err);
- }
- }
-}
-
-static void pvebackup_complete_cb(void *opaque, int ret)
-{
- assert(!qemu_in_coroutine());
-
- PVEBackupDevInfo *di = opaque;
-
- qemu_mutex_lock(&backup_state.backup_mutex);
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
if (ret < 0) {
Error *local_err = NULL;
@@ -301,7 +300,19 @@ static void pvebackup_complete_cb(void *opaque, int ret)
assert(di->target == NULL);
- block_on_coroutine_fn(pvebackup_complete_stream, di);
+ bool error_or_canceled = pvebackup_error_or_canceled();
+
+ if (backup_state.vmaw) {
+ vma_writer_close_stream(backup_state.vmaw, di->dev_id);
+ }
+
+ if (backup_state.pbs && !error_or_canceled) {
+ Error *local_err = NULL;
+ proxmox_backup_co_close_image(backup_state.pbs, di->dev_id, &local_err);
+ if (local_err != NULL) {
+ pvebackup_propagate_error(local_err);
+ }
+ }
// remove self from job list
backup_state.di_list = g_list_remove(backup_state.di_list, di);
@@ -310,21 +321,49 @@ static void pvebackup_complete_cb(void *opaque, int ret)
/* call cleanup if we're the last job */
if (!g_list_first(backup_state.di_list)) {
- block_on_coroutine_fn(pvebackup_co_cleanup, NULL);
+ pvebackup_co_cleanup();
}
- qemu_mutex_unlock(&backup_state.backup_mutex);
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
}
-static void pvebackup_cancel(void)
+static void pvebackup_complete_cb(void *opaque, int ret)
{
- assert(!qemu_in_coroutine());
+ PVEBackupDevInfo *di = opaque;
+ di->completed_ret = ret;
+
+ /*
+ * Schedule stream cleanup in async coroutine. close_image and finish might
+ * take a while, so we can't block on them here. This way it also doesn't
+ * matter if we're already running in a coroutine or not.
+ * Note: di is a pointer to an entry in the global backup_state struct, so
+ * it stays valid.
+ */
+ Coroutine *co = qemu_coroutine_create(pvebackup_co_complete_stream, di);
+ aio_co_enter(qemu_get_aio_context(), co);
+}
+
+/*
+ * job_cancel(_sync) does not like to be called from coroutines, so defer to
+ * main loop processing via a bottom half.
+ */
+static void job_cancel_bh(void *opaque) {
+ CoCtxData *data = (CoCtxData*)opaque;
+ Job *job = (Job*)data->data;
+ AioContext *job_ctx = job->aio_context;
+ aio_context_acquire(job_ctx);
+ job_cancel_sync(job, true);
+ aio_context_release(job_ctx);
+ aio_co_enter(data->ctx, data->co);
+}
+static void coroutine_fn pvebackup_co_cancel(void *opaque)
+{
Error *cancel_err = NULL;
error_setg(&cancel_err, "backup canceled");
pvebackup_propagate_error(cancel_err);
- qemu_mutex_lock(&backup_state.backup_mutex);
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
if (backup_state.vmaw) {
/* make sure vma writer does not block anymore */
@@ -342,27 +381,22 @@ static void pvebackup_cancel(void)
((PVEBackupDevInfo *)bdi->data)->job :
NULL;
- /* ref the job before releasing the mutex, just to be safe */
if (cancel_job) {
- job_ref(&cancel_job->job);
+ CoCtxData data = {
+ .ctx = qemu_get_current_aio_context(),
+ .co = qemu_coroutine_self(),
+ .data = &cancel_job->job,
+ };
+ aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
+ qemu_coroutine_yield();
}
- /* job_cancel_sync may enter the job, so we need to release the
- * backup_mutex to avoid deadlock */
- qemu_mutex_unlock(&backup_state.backup_mutex);
-
- if (cancel_job) {
- AioContext *aio_context = cancel_job->job.aio_context;
- aio_context_acquire(aio_context);
- job_cancel_sync(&cancel_job->job, true);
- job_unref(&cancel_job->job);
- aio_context_release(aio_context);
- }
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
}
void qmp_backup_cancel(Error **errp)
{
- pvebackup_cancel();
+ block_on_coroutine_fn(pvebackup_co_cancel, NULL);
}
// assumes the caller holds backup_mutex
@@ -415,10 +449,18 @@ static int coroutine_fn pvebackup_co_add_config(
goto out;
}
-static bool create_backup_jobs(void) {
+/*
+ * backup_job_create can *not* be run from a coroutine (and requires an
+ * acquired AioContext), so this can't either.
+ * The caller is responsible that backup_mutex is held nonetheless.
+ */
+static void create_backup_jobs_bh(void *opaque) {
assert(!qemu_in_coroutine());
+ CoCtxData *data = (CoCtxData*)opaque;
+ Error **errp = (Error**)data->data;
+
Error *local_err = NULL;
/* create job transaction to synchronize bitmap commit and cancel all
@@ -454,24 +496,19 @@ static bool create_backup_jobs(void) {
aio_context_release(aio_context);
- if (!job || local_err != NULL) {
- Error *create_job_err = NULL;
- error_setg(&create_job_err, "backup_job_create failed: %s",
- local_err ? error_get_pretty(local_err) : "null");
+ di->job = job;
- pvebackup_propagate_error(create_job_err);
+ if (!job || local_err) {
+ error_setg(errp, "backup_job_create failed: %s",
+ local_err ? error_get_pretty(local_err) : "null");
break;
}
- di->job = job;
-
bdrv_unref(di->target);
di->target = NULL;
}
- bool errors = pvebackup_error_or_canceled();
-
- if (errors) {
+ if (*errp) {
l = backup_state.di_list;
while (l) {
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
@@ -483,12 +520,17 @@ static bool create_backup_jobs(void) {
}
if (di->job) {
+ AioContext *ctx = di->job->job.aio_context;
+ aio_context_acquire(ctx);
+ job_cancel_sync(&di->job->job, true);
job_unref(&di->job->job);
+ aio_context_release(ctx);
}
}
}
- return errors;
+ /* return */
+ aio_co_enter(data->ctx, data->co);
}
typedef struct QmpBackupTask {
@@ -525,11 +567,12 @@ typedef struct QmpBackupTask {
UuidInfo *result;
} QmpBackupTask;
-// assumes the caller holds backup_mutex
static void coroutine_fn pvebackup_co_prepare(void *opaque)
{
assert(qemu_in_coroutine());
+ qemu_co_mutex_lock(&backup_state.backup_mutex);
+
QmpBackupTask *task = opaque;
task->result = NULL; // just to be sure
@@ -550,8 +593,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
const char *firewall_name = "qemu-server.fw";
if (backup_state.di_list) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+ error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
"previous backup not finished");
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
return;
}
@@ -618,6 +662,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
di->size = size;
total += size;
+
+ di->completed_ret = INT_MAX;
}
uuid_generate(uuid);
@@ -849,6 +895,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
backup_state.stat.dirty = total - backup_state.stat.reused;
backup_state.stat.transferred = 0;
backup_state.stat.zero_bytes = 0;
+ backup_state.stat.finishing = false;
+ backup_state.stat.starting = true;
qemu_mutex_unlock(&backup_state.stat.lock);
@@ -863,6 +911,33 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
uuid_info->UUID = uuid_str;
task->result = uuid_info;
+
+ /* Run create_backup_jobs_bh outside of coroutine (in BH) but keep
+ * backup_mutex locked. This is fine, a CoMutex can be held across yield
+ * points, and we'll release it as soon as the BH reschedules us.
+ */
+ CoCtxData waker = {
+ .co = qemu_coroutine_self(),
+ .ctx = qemu_get_current_aio_context(),
+ .data = &local_err,
+ };
+ aio_bh_schedule_oneshot(waker.ctx, create_backup_jobs_bh, &waker);
+ qemu_coroutine_yield();
+
+ if (local_err) {
+ error_propagate(task->errp, local_err);
+ goto err;
+ }
+
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
+
+ qemu_mutex_lock(&backup_state.stat.lock);
+ backup_state.stat.starting = false;
+ qemu_mutex_unlock(&backup_state.stat.lock);
+
+ /* start the first job in the transaction */
+ job_txn_start_seq(backup_state.txn);
+
return;
err_mutex:
@@ -885,6 +960,7 @@ err:
g_free(di);
}
g_list_free(di_list);
+ backup_state.di_list = NULL;
if (devs) {
g_strfreev(devs);
@@ -905,6 +981,8 @@ err:
}
task->result = NULL;
+
+ qemu_co_mutex_unlock(&backup_state.backup_mutex);
return;
}
@@ -958,24 +1036,8 @@ UuidInfo *qmp_backup(
.errp = errp,
};
- qemu_mutex_lock(&backup_state.backup_mutex);
-
block_on_coroutine_fn(pvebackup_co_prepare, &task);
- if (*errp == NULL) {
- bool errors = create_backup_jobs();
- qemu_mutex_unlock(&backup_state.backup_mutex);
-
- if (!errors) {
- /* start the first job in the transaction
- * note: this might directly enter the job, so we need to do this
- * after unlocking the backup_mutex */
- job_txn_start_seq(backup_state.txn);
- }
- } else {
- qemu_mutex_unlock(&backup_state.backup_mutex);
- }
-
return task.result;
}
@@ -1027,6 +1089,7 @@ BackupStatus *qmp_query_backup(Error **errp)
info->transferred = backup_state.stat.transferred;
info->has_reused = true;
info->reused = backup_state.stat.reused;
+ info->finishing = backup_state.stat.finishing;
qemu_mutex_unlock(&backup_state.stat.lock);
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 8833060385..6a67adf923 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -774,12 +774,15 @@
#
# @uuid: uuid for this backup job
#
+# @finishing: if status='active' and finishing=true, then the backup process is
+# waiting for the target to finish.
+#
##
{ 'struct': 'BackupStatus',
'data': {'*status': 'str', '*errmsg': 'str', '*total': 'int', '*dirty': 'int',
'*transferred': 'int', '*zero-bytes': 'int', '*reused': 'int',
'*start-time': 'int', '*end-time': 'int',
- '*backup-file': 'str', '*uuid': 'str' } }
+ '*backup-file': 'str', '*uuid': 'str', 'finishing': 'bool' } }
##
# @BackupFormat:

View File

@@ -0,0 +1,212 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Thu, 22 Oct 2020 17:34:18 +0200
Subject: [PATCH] PVE: Migrate dirty bitmap state via savevm
QEMU provides 'savevm' registrations as a mechanism for arbitrary state
to be migrated along with a VM. Use this to send a serialized version of
dirty bitmap state data from proxmox-backup-qemu, and restore it on the
target node.
Also add a flag to query-proxmox-support so qemu-server can determine if
safe migration is possible and makes sense.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
include/migration/misc.h | 3 ++
migration/meson.build | 2 +
migration/migration.c | 1 +
migration/pbs-state.c | 106 +++++++++++++++++++++++++++++++++++++++
pve-backup.c | 1 +
qapi/block-core.json | 6 +++
6 files changed, 119 insertions(+)
create mode 100644 migration/pbs-state.c
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 465906710d..4f0aeceb6f 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -75,4 +75,7 @@ bool migration_in_bg_snapshot(void);
/* migration/block-dirty-bitmap.c */
void dirty_bitmap_mig_init(void);
+/* migration/pbs-state.c */
+void pbs_state_mig_init(void);
+
#endif
diff --git a/migration/meson.build b/migration/meson.build
index ea9aedeefc..c27dc9bd97 100644
--- a/migration/meson.build
+++ b/migration/meson.build
@@ -7,8 +7,10 @@ migration_files = files(
'qemu-file-channel.c',
'qemu-file.c',
'yank_functions.c',
+ 'pbs-state.c',
)
softmmu_ss.add(migration_files)
+softmmu_ss.add(libproxmox_backup_qemu)
softmmu_ss.add(files(
'block-dirty-bitmap.c',
diff --git a/migration/migration.c b/migration/migration.c
index abaf6f9e3d..d925fd7488 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -213,6 +213,7 @@ void migration_object_init(void)
blk_mig_init();
ram_mig_init();
dirty_bitmap_mig_init();
+ pbs_state_mig_init();
}
void migration_cancel(const Error *error)
diff --git a/migration/pbs-state.c b/migration/pbs-state.c
new file mode 100644
index 0000000000..29f2b3860d
--- /dev/null
+++ b/migration/pbs-state.c
@@ -0,0 +1,106 @@
+/*
+ * PBS (dirty-bitmap) state migration
+ */
+
+#include "qemu/osdep.h"
+#include "migration/misc.h"
+#include "qemu-file.h"
+#include "migration/vmstate.h"
+#include "migration/register.h"
+#include "proxmox-backup-qemu.h"
+
+typedef struct PBSState {
+ bool active;
+} PBSState;
+
+/* state is accessed via this static variable directly, 'opaque' is NULL */
+static PBSState pbs_state;
+
+static void pbs_state_save_pending(QEMUFile *f, void *opaque,
+ uint64_t max_size,
+ uint64_t *res_precopy_only,
+ uint64_t *res_compatible,
+ uint64_t *res_postcopy_only)
+{
+ /* we send everything in save_setup, so nothing is ever pending */
+}
+
+/* receive PBS state via f and deserialize, called on target */
+static int pbs_state_load(QEMUFile *f, void *opaque, int version_id)
+{
+ /* safe cast, we cannot migrate to target with less bits than source */
+ size_t buf_size = (size_t)qemu_get_be64(f);
+
+ uint8_t *buf = (uint8_t *)malloc(buf_size);
+ size_t read = qemu_get_buffer(f, buf, buf_size);
+
+ if (read < buf_size) {
+ fprintf(stderr, "error receiving PBS state: not enough data\n");
+ return -EIO;
+ }
+
+ proxmox_import_state(buf, buf_size);
+
+ free(buf);
+ return 0;
+}
+
+/* serialize PBS state and send to target via f, called on source */
+static int pbs_state_save_setup(QEMUFile *f, void *opaque)
+{
+ size_t buf_size;
+ uint8_t *buf = proxmox_export_state(&buf_size);
+
+ /* LV encoding */
+ qemu_put_be64(f, buf_size);
+ qemu_put_buffer(f, buf, buf_size);
+
+ proxmox_free_state_buf(buf);
+ pbs_state.active = false;
+ return 0;
+}
+
+static bool pbs_state_is_active(void *opaque)
+{
+ /* we need to return active exactly once, else .save_setup is never called,
+ * but if we'd just return true the migration doesn't make progress since
+ * it'd be waiting for us */
+ return pbs_state.active;
+}
+
+static bool pbs_state_is_active_iterate(void *opaque)
+{
+ /* we don't iterate, everything is sent in save_setup */
+ return pbs_state_is_active(opaque);
+}
+
+static bool pbs_state_has_postcopy(void *opaque)
+{
+ /* PBS state can't change during a migration (since that's blocking any
+ * potential backups), so we can copy everything before the VM is stopped */
+ return false;
+}
+
+static void pbs_state_save_cleanup(void *opaque)
+{
+ /* reset active after migration succeeds or fails */
+ pbs_state.active = false;
+}
+
+static SaveVMHandlers savevm_pbs_state_handlers = {
+ .save_setup = pbs_state_save_setup,
+ .has_postcopy = pbs_state_has_postcopy,
+ .save_live_pending = pbs_state_save_pending,
+ .is_active_iterate = pbs_state_is_active_iterate,
+ .load_state = pbs_state_load,
+ .is_active = pbs_state_is_active,
+ .save_cleanup = pbs_state_save_cleanup,
+};
+
+void pbs_state_mig_init(void)
+{
+ pbs_state.active = true;
+ register_savevm_live("pbs-state", 0, 1,
+ &savevm_pbs_state_handlers,
+ NULL);
+}
diff --git a/pve-backup.c b/pve-backup.c
index 6f05796fad..5fa3cc1352 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -1132,6 +1132,7 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
ret->pbs_library_version = g_strdup(proxmox_backup_qemu_version());
ret->pbs_dirty_bitmap = true;
ret->pbs_dirty_bitmap_savevm = true;
+ ret->pbs_dirty_bitmap_migration = true;
ret->query_bitmap_info = true;
return ret;
}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 6a67adf923..c99ddf8628 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -883,6 +883,11 @@
# @pbs-dirty-bitmap-savevm: True if 'dirty-bitmaps' migration capability can
# safely be set for savevm-async.
#
+# @pbs-dirty-bitmap-migration: True if safe migration of dirty-bitmaps including
+# PBS state is supported. Enabling 'dirty-bitmaps'
+# migration cap if this is false/unset may lead
+# to crashes on migration!
+#
# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
#
##
@@ -890,6 +895,7 @@
'data': { 'pbs-dirty-bitmap': 'bool',
'query-bitmap-info': 'bool',
'pbs-dirty-bitmap-savevm': 'bool',
+ 'pbs-dirty-bitmap-migration': 'bool',
'pbs-library-version': 'str' } }
##

View File

@@ -1,88 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 8 Apr 2020 15:29:03 +0200
Subject: [PATCH] PVE: savevm-async: set up migration state
code mostly adapted from upstream savevm.c
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
savevm-async.c | 30 ++++++++++++++++++++++++++++--
1 file changed, 28 insertions(+), 2 deletions(-)
diff --git a/savevm-async.c b/savevm-async.c
index 790e27ae37..a38b15d652 100644
--- a/savevm-async.c
+++ b/savevm-async.c
@@ -225,6 +225,7 @@ static void *process_savevm_thread(void *opaque)
{
int ret;
int64_t maxlen;
+ MigrationState *ms = migrate_get_current();
rcu_register_thread();
@@ -234,8 +235,7 @@ static void *process_savevm_thread(void *opaque)
if (ret < 0) {
save_snapshot_error("qemu_savevm_state_setup failed");
- rcu_unregister_thread();
- return NULL;
+ goto out;
}
while (snap_state.state == SAVE_STATE_ACTIVE) {
@@ -287,6 +287,12 @@ static void *process_savevm_thread(void *opaque)
qemu_bh_schedule(snap_state.cleanup_bh);
qemu_mutex_unlock_iothread();
+out:
+ /* set migration state accordingly and clear soon-to-be stale file */
+ migrate_set_state(&ms->state, MIGRATION_STATUS_SETUP,
+ ret ? MIGRATION_STATUS_FAILED : MIGRATION_STATUS_COMPLETED);
+ ms->to_dst_file = NULL;
+
rcu_unregister_thread();
return NULL;
}
@@ -294,6 +300,7 @@ static void *process_savevm_thread(void *opaque)
void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
{
Error *local_err = NULL;
+ MigrationState *ms = migrate_get_current();
int bdrv_oflags = BDRV_O_RDWR | BDRV_O_RESIZE | BDRV_O_NO_FLUSH;
@@ -303,6 +310,17 @@ void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
return;
}
+ if (migration_is_running(ms->state)) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, QERR_MIGRATION_ACTIVE);
+ return;
+ }
+
+ if (migrate_use_block()) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
+ "Block migration and snapshots are incompatible");
+ return;
+ }
+
/* initialize snapshot info */
snap_state.saved_vm_running = runstate_is_running();
snap_state.bs_pos = 0;
@@ -341,6 +359,14 @@ void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
goto restart;
}
+ /*
+ * qemu_savevm_* paths use migration code and expect a migration state.
+ * State is cleared in process_savevm_thread, but has to be initialized
+ * here (blocking main thread, from QMP) to avoid race conditions.
+ */
+ migrate_init(ms);
+ memset(&ram_counters, 0, sizeof(ram_counters));
+ ms->to_dst_file = snap_state.file;
error_setg(&snap_state.blocker, "block device is in use by savevm");
blk_op_block_all(snap_state.target, snap_state.blocker);

View File

@@ -0,0 +1,33 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Tue, 3 Nov 2020 14:57:32 +0100
Subject: [PATCH] migration/block-dirty-bitmap: migrate other bitmaps even if
one fails
If the checks in bdrv_dirty_bitmap_check fail, that only means that this
one specific bitmap cannot be migrated. That is not an error condition
for any other bitmaps on the same block device.
Fixes dirty-bitmap migration with sync=bitmap, as the bitmaps used for
that are obviously marked as "busy", which would cause none at all to be
transferred.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
migration/block-dirty-bitmap.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/migration/block-dirty-bitmap.c b/migration/block-dirty-bitmap.c
index 9aba7d9c22..f4ecf9c9f9 100644
--- a/migration/block-dirty-bitmap.c
+++ b/migration/block-dirty-bitmap.c
@@ -538,7 +538,7 @@ static int add_bitmaps_to_list(DBMSaveState *s, BlockDriverState *bs,
if (bdrv_dirty_bitmap_check(bitmap, BDRV_BITMAP_DEFAULT, &local_err)) {
error_report_err(local_err);
- return -1;
+ continue;
}
if (bitmap_aliases) {

View File

@@ -1,211 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Fri, 17 Apr 2020 08:57:47 +0200
Subject: [PATCH] PVE Backup: avoid use QemuRecMutex inside coroutines
---
pve-backup.c | 59 +++++++++++++++++++++++++++++++++-------------------
1 file changed, 38 insertions(+), 21 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 169f0c68d0..dddf430399 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -11,6 +11,23 @@
/* PVE backup state and related function */
+/*
+ * Note: A resume from a qemu_coroutine_yield can happen in a different thread,
+ * so you may not use normal mutexes within coroutines:
+ *
+ * ---bad-example---
+ * qemu_rec_mutex_lock(lock)
+ * ...
+ * qemu_coroutine_yield() // wait for something
+ * // we are now inside a different thread
+ * qemu_rec_mutex_unlock(lock) // Crash - wrong thread!!
+ * ---end-bad-example--
+ *
+ * ==> Always use CoMutext inside coroutines.
+ * ==> Never acquire/release AioContext withing coroutines (because that use QemuRecMutex)
+ *
+ */
+
static struct PVEBackupState {
struct {
// Everithing accessed from qmp_backup_query command is protected using lock
@@ -30,12 +47,14 @@ static struct PVEBackupState {
ProxmoxBackupHandle *pbs;
GList *di_list;
QemuRecMutex backup_mutex;
+ CoMutex dump_callback_mutex;
} backup_state;
static void pvebackup_init(void)
{
qemu_rec_mutex_init(&backup_state.stat.lock);
qemu_rec_mutex_init(&backup_state.backup_mutex);
+ qemu_co_mutex_init(&backup_state.dump_callback_mutex);
}
// initialize PVEBackupState at startup
@@ -114,16 +133,16 @@ pvebackup_co_dump_pbs_cb(
Error *local_err = NULL;
int pbs_res = -1;
- qemu_rec_mutex_lock(&backup_state.backup_mutex);
+ qemu_co_mutex_lock(&backup_state.dump_callback_mutex);
// avoid deadlock if job is cancelled
if (pvebackup_error_or_canceled()) {
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
return -1;
}
pbs_res = proxmox_backup_co_write_data(backup_state.pbs, di->dev_id, buf, start, size, &local_err);
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
if (pbs_res < 0) {
pvebackup_propagate_error(local_err);
@@ -149,7 +168,6 @@ pvebackup_co_dump_vma_cb(
const unsigned char *buf = pbuf;
PVEBackupDevInfo *di = opaque;
-
int ret = -1;
assert(backup_state.vmaw);
@@ -167,16 +185,16 @@ pvebackup_co_dump_vma_cb(
}
while (remaining > 0) {
- qemu_rec_mutex_lock(&backup_state.backup_mutex);
+ qemu_co_mutex_lock(&backup_state.dump_callback_mutex);
// avoid deadlock if job is cancelled
if (pvebackup_error_or_canceled()) {
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
return -1;
}
size_t zero_bytes = 0;
ret = vma_writer_write(backup_state.vmaw, di->dev_id, cluster_num, buf, &zero_bytes);
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ qemu_co_mutex_unlock(&backup_state.dump_callback_mutex);
++cluster_num;
if (buf) {
@@ -203,12 +221,11 @@ pvebackup_co_dump_vma_cb(
return size;
}
+// assumes the caller holds backup_mutex
static void coroutine_fn pvebackup_co_cleanup(void *unused)
{
assert(qemu_in_coroutine());
- qemu_rec_mutex_lock(&backup_state.backup_mutex);
-
qemu_rec_mutex_lock(&backup_state.stat.lock);
backup_state.stat.end_time = time(NULL);
qemu_rec_mutex_unlock(&backup_state.stat.lock);
@@ -239,9 +256,9 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
g_list_free(backup_state.di_list);
backup_state.di_list = NULL;
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
}
+// assumes the caller holds backup_mutex
static void coroutine_fn pvebackup_complete_stream(void *opaque)
{
PVEBackupDevInfo *di = opaque;
@@ -295,6 +312,8 @@ static void pvebackup_complete_cb(void *opaque, int ret)
static void pvebackup_cancel(void)
{
+ assert(!qemu_in_coroutine());
+
Error *cancel_err = NULL;
error_setg(&cancel_err, "backup canceled");
pvebackup_propagate_error(cancel_err);
@@ -348,6 +367,7 @@ void qmp_backup_cancel(Error **errp)
pvebackup_cancel();
}
+// assumes the caller holds backup_mutex
static int coroutine_fn pvebackup_co_add_config(
const char *file,
const char *name,
@@ -431,9 +451,9 @@ static void pvebackup_run_next_job(void)
}
}
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
-
block_on_coroutine_fn(pvebackup_co_cleanup, NULL); // no more jobs, run cleanup
+
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
}
static bool create_backup_jobs(void) {
@@ -520,6 +540,7 @@ typedef struct QmpBackupTask {
UuidInfo *result;
} QmpBackupTask;
+// assumes the caller holds backup_mutex
static void coroutine_fn pvebackup_co_prepare(void *opaque)
{
assert(qemu_in_coroutine());
@@ -543,11 +564,8 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
const char *config_name = "qemu-server.conf";
const char *firewall_name = "qemu-server.fw";
- qemu_rec_mutex_lock(&backup_state.backup_mutex);
-
if (backup_state.di_list) {
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+ error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
"previous backup not finished");
return;
}
@@ -792,8 +810,6 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
backup_state.di_list = di_list;
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
-
uuid_info = g_malloc0(sizeof(*uuid_info));
uuid_info->UUID = uuid_str;
@@ -836,8 +852,6 @@ err:
rmdir(backup_dir);
}
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
-
task->result = NULL;
return;
}
@@ -881,13 +895,16 @@ UuidInfo *qmp_backup(
.errp = errp,
};
+ qemu_rec_mutex_lock(&backup_state.backup_mutex);
+
block_on_coroutine_fn(pvebackup_co_prepare, &task);
if (*errp == NULL) {
- qemu_rec_mutex_lock(&backup_state.backup_mutex);
create_backup_jobs();
qemu_rec_mutex_unlock(&backup_state.backup_mutex);
pvebackup_run_next_job();
+ } else {
+ qemu_rec_mutex_unlock(&backup_state.backup_mutex);
}
return task.result;

View File

@@ -0,0 +1,69 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Tue, 17 Nov 2020 10:51:05 +0100
Subject: [PATCH] PVE: fall back to open-iscsi initiatorname
When no explicit option is given, try reading the initiator name from
/etc/iscsi/initiatorname.iscsi and only use the generic fallback, i.e.
iqn.2008-11.org.linux-kvmXXX, as a third alternative.
This avoids the need to add an explicit option for vma and to explicitly set it
for each call to qemu that deals with iSCSI disks, while still allowing to set
the option if a different name is needed.
According to RFC 3720, an initiator name is at most 223 bytes long, so the
4 KiB buffer is big enough, even if many whitespaces are used.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/iscsi.c | 30 ++++++++++++++++++++++++++++++
1 file changed, 30 insertions(+)
diff --git a/block/iscsi.c b/block/iscsi.c
index 57aa07a40d..a8902b84d5 100644
--- a/block/iscsi.c
+++ b/block/iscsi.c
@@ -1386,12 +1386,42 @@ static char *get_initiator_name(QemuOpts *opts)
const char *name;
char *iscsi_name;
UuidInfo *uuid_info;
+ FILE *name_fh;
name = qemu_opt_get(opts, "initiator-name");
if (name) {
return g_strdup(name);
}
+ name_fh = fopen("/etc/iscsi/initiatorname.iscsi", "r");
+ if (name_fh) {
+ const char *key = "InitiatorName";
+ char buffer[4096];
+ char *line;
+
+ while ((line = fgets(buffer, sizeof(buffer), name_fh))) {
+ line = g_strstrip(line);
+ if (!strncmp(line, key, strlen(key))) {
+ line = strchr(line, '=');
+ if (!line || strlen(line) == 1) {
+ continue;
+ }
+ line++;
+ g_strstrip(line);
+ if (!strlen(line)) {
+ continue;
+ }
+ name = line;
+ break;
+ }
+ }
+ fclose(name_fh);
+
+ if (name) {
+ return g_strdup(name);
+ }
+ }
+
uuid_info = qmp_query_uuid(NULL);
if (strcmp(uuid_info->UUID, UUID_NONE) == 0) {
name = qemu_get_vm_name();

View File

@@ -1,227 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Dietmar Maurer <dietmar@proxmox.com>
Date: Fri, 17 Apr 2020 08:57:48 +0200
Subject: [PATCH] PVE Backup: use QemuMutex instead of QemuRecMutex
We acquire/release all mutexes outside coroutines now, so we can now
correctly use a normal mutex.
---
pve-backup.c | 58 ++++++++++++++++++++++++++--------------------------
1 file changed, 29 insertions(+), 29 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index dddf430399..bb917ee972 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -31,7 +31,7 @@
static struct PVEBackupState {
struct {
// Everithing accessed from qmp_backup_query command is protected using lock
- QemuRecMutex lock;
+ QemuMutex lock;
Error *error;
time_t start_time;
time_t end_time;
@@ -46,14 +46,14 @@ static struct PVEBackupState {
VmaWriter *vmaw;
ProxmoxBackupHandle *pbs;
GList *di_list;
- QemuRecMutex backup_mutex;
+ QemuMutex backup_mutex;
CoMutex dump_callback_mutex;
} backup_state;
static void pvebackup_init(void)
{
- qemu_rec_mutex_init(&backup_state.stat.lock);
- qemu_rec_mutex_init(&backup_state.backup_mutex);
+ qemu_mutex_init(&backup_state.stat.lock);
+ qemu_mutex_init(&backup_state.backup_mutex);
qemu_co_mutex_init(&backup_state.dump_callback_mutex);
}
@@ -91,26 +91,26 @@ lookup_active_block_job(PVEBackupDevInfo *di)
static void pvebackup_propagate_error(Error *err)
{
- qemu_rec_mutex_lock(&backup_state.stat.lock);
+ qemu_mutex_lock(&backup_state.stat.lock);
error_propagate(&backup_state.stat.error, err);
- qemu_rec_mutex_unlock(&backup_state.stat.lock);
+ qemu_mutex_unlock(&backup_state.stat.lock);
}
static bool pvebackup_error_or_canceled(void)
{
- qemu_rec_mutex_lock(&backup_state.stat.lock);
+ qemu_mutex_lock(&backup_state.stat.lock);
bool error_or_canceled = !!backup_state.stat.error;
- qemu_rec_mutex_unlock(&backup_state.stat.lock);
+ qemu_mutex_unlock(&backup_state.stat.lock);
return error_or_canceled;
}
static void pvebackup_add_transfered_bytes(size_t transferred, size_t zero_bytes)
{
- qemu_rec_mutex_lock(&backup_state.stat.lock);
+ qemu_mutex_lock(&backup_state.stat.lock);
backup_state.stat.zero_bytes += zero_bytes;
backup_state.stat.transferred += transferred;
- qemu_rec_mutex_unlock(&backup_state.stat.lock);
+ qemu_mutex_unlock(&backup_state.stat.lock);
}
// This may get called from multiple coroutines in multiple io-threads
@@ -226,9 +226,9 @@ static void coroutine_fn pvebackup_co_cleanup(void *unused)
{
assert(qemu_in_coroutine());
- qemu_rec_mutex_lock(&backup_state.stat.lock);
+ qemu_mutex_lock(&backup_state.stat.lock);
backup_state.stat.end_time = time(NULL);
- qemu_rec_mutex_unlock(&backup_state.stat.lock);
+ qemu_mutex_unlock(&backup_state.stat.lock);
if (backup_state.vmaw) {
Error *local_err = NULL;
@@ -284,7 +284,7 @@ static void pvebackup_complete_cb(void *opaque, int ret)
PVEBackupDevInfo *di = opaque;
- qemu_rec_mutex_lock(&backup_state.backup_mutex);
+ qemu_mutex_lock(&backup_state.backup_mutex);
di->completed = true;
@@ -305,7 +305,7 @@ static void pvebackup_complete_cb(void *opaque, int ret)
g_free(di);
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ qemu_mutex_unlock(&backup_state.backup_mutex);
pvebackup_run_next_job();
}
@@ -318,7 +318,7 @@ static void pvebackup_cancel(void)
error_setg(&cancel_err, "backup canceled");
pvebackup_propagate_error(cancel_err);
- qemu_rec_mutex_lock(&backup_state.backup_mutex);
+ qemu_mutex_lock(&backup_state.backup_mutex);
if (backup_state.vmaw) {
/* make sure vma writer does not block anymore */
@@ -329,13 +329,13 @@ static void pvebackup_cancel(void)
proxmox_backup_abort(backup_state.pbs, "backup canceled");
}
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ qemu_mutex_unlock(&backup_state.backup_mutex);
for(;;) {
BlockJob *next_job = NULL;
- qemu_rec_mutex_lock(&backup_state.backup_mutex);
+ qemu_mutex_lock(&backup_state.backup_mutex);
GList *l = backup_state.di_list;
while (l) {
@@ -349,7 +349,7 @@ static void pvebackup_cancel(void)
}
}
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ qemu_mutex_unlock(&backup_state.backup_mutex);
if (next_job) {
AioContext *aio_context = next_job->job.aio_context;
@@ -423,7 +423,7 @@ static void pvebackup_run_next_job(void)
{
assert(!qemu_in_coroutine());
- qemu_rec_mutex_lock(&backup_state.backup_mutex);
+ qemu_mutex_lock(&backup_state.backup_mutex);
GList *l = backup_state.di_list;
while (l) {
@@ -433,7 +433,7 @@ static void pvebackup_run_next_job(void)
BlockJob *job = lookup_active_block_job(di);
if (job) {
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ qemu_mutex_unlock(&backup_state.backup_mutex);
AioContext *aio_context = job->job.aio_context;
aio_context_acquire(aio_context);
@@ -453,7 +453,7 @@ static void pvebackup_run_next_job(void)
block_on_coroutine_fn(pvebackup_co_cleanup, NULL); // no more jobs, run cleanup
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ qemu_mutex_unlock(&backup_state.backup_mutex);
}
static bool create_backup_jobs(void) {
@@ -778,7 +778,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
/* initialize global backup_state now */
- qemu_rec_mutex_lock(&backup_state.stat.lock);
+ qemu_mutex_lock(&backup_state.stat.lock);
if (backup_state.stat.error) {
error_free(backup_state.stat.error);
@@ -801,7 +801,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
backup_state.stat.transferred = 0;
backup_state.stat.zero_bytes = 0;
- qemu_rec_mutex_unlock(&backup_state.stat.lock);
+ qemu_mutex_unlock(&backup_state.stat.lock);
backup_state.speed = (task->has_speed && task->speed > 0) ? task->speed : 0;
@@ -895,16 +895,16 @@ UuidInfo *qmp_backup(
.errp = errp,
};
- qemu_rec_mutex_lock(&backup_state.backup_mutex);
+ qemu_mutex_lock(&backup_state.backup_mutex);
block_on_coroutine_fn(pvebackup_co_prepare, &task);
if (*errp == NULL) {
create_backup_jobs();
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ qemu_mutex_unlock(&backup_state.backup_mutex);
pvebackup_run_next_job();
} else {
- qemu_rec_mutex_unlock(&backup_state.backup_mutex);
+ qemu_mutex_unlock(&backup_state.backup_mutex);
}
return task.result;
@@ -914,11 +914,11 @@ BackupStatus *qmp_query_backup(Error **errp)
{
BackupStatus *info = g_malloc0(sizeof(*info));
- qemu_rec_mutex_lock(&backup_state.stat.lock);
+ qemu_mutex_lock(&backup_state.stat.lock);
if (!backup_state.stat.start_time) {
/* not started, return {} */
- qemu_rec_mutex_unlock(&backup_state.stat.lock);
+ qemu_mutex_unlock(&backup_state.stat.lock);
return info;
}
@@ -955,7 +955,7 @@ BackupStatus *qmp_query_backup(Error **errp)
info->has_transferred = true;
info->transferred = backup_state.stat.transferred;
- qemu_rec_mutex_unlock(&backup_state.stat.lock);
+ qemu_mutex_unlock(&backup_state.stat.lock);
return info;
}

View File

@@ -0,0 +1,598 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Tue, 26 Jan 2021 15:45:30 +0100
Subject: [PATCH] PVE: Use coroutine QMP for backup/cancel_backup
Finally turn backup QMP calls into coroutines, now that it's possible.
This has the benefit that calls are asynchronous to the main loop, i.e.
long running operations like connecting to a PBS server will no longer
hang the VM.
Additionally, it allows us to get rid of block_on_coroutine_fn, which
was always a hacky workaround.
While we're already spring cleaning, also remove the QmpBackupTask
struct, since we can now put the 'prepare' function directly into
qmp_backup and thus no longer need those giant walls of text.
(Note that for our patches to work with 5.2.0 this change is actually
required, otherwise monitor_get_fd() fails as we're not in a QMP
coroutine, but one we start ourselves - we could of course set the
monitor for that coroutine ourselves, but let's just fix it the right
way instead)
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 4 +-
hmp-commands.hx | 2 +
proxmox-backup-client.c | 31 -----
pve-backup.c | 232 ++++++++++-----------------------
qapi/block-core.json | 4 +-
5 files changed, 77 insertions(+), 196 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 4481b60a5c..c9849a5b29 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1016,7 +1016,7 @@ void hmp_info_snapshots(Monitor *mon, const QDict *qdict)
g_free(global_snapshots);
}
-void hmp_backup_cancel(Monitor *mon, const QDict *qdict)
+void coroutine_fn hmp_backup_cancel(Monitor *mon, const QDict *qdict)
{
Error *error = NULL;
@@ -1025,7 +1025,7 @@ void hmp_backup_cancel(Monitor *mon, const QDict *qdict)
hmp_handle_error(mon, error);
}
-void hmp_backup(Monitor *mon, const QDict *qdict)
+void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
{
Error *error = NULL;
diff --git a/hmp-commands.hx b/hmp-commands.hx
index d4bb00216e..4e21911fa6 100644
--- a/hmp-commands.hx
+++ b/hmp-commands.hx
@@ -109,6 +109,7 @@ ERST
"\n\t\t\t Use -d to dump data into a directory instead"
"\n\t\t\t of using VMA format.",
.cmd = hmp_backup,
+ .coroutine = true,
},
SRST
@@ -122,6 +123,7 @@ ERST
.params = "",
.help = "cancel the current VM backup",
.cmd = hmp_backup_cancel,
+ .coroutine = true,
},
SRST
diff --git a/proxmox-backup-client.c b/proxmox-backup-client.c
index 4ce7bc0b5e..0923037dec 100644
--- a/proxmox-backup-client.c
+++ b/proxmox-backup-client.c
@@ -5,37 +5,6 @@
/* Proxmox Backup Server client bindings using coroutines */
-typedef struct BlockOnCoroutineWrapper {
- AioContext *ctx;
- CoroutineEntry *entry;
- void *entry_arg;
- bool finished;
-} BlockOnCoroutineWrapper;
-
-static void coroutine_fn block_on_coroutine_wrapper(void *opaque)
-{
- BlockOnCoroutineWrapper *wrapper = opaque;
- wrapper->entry(wrapper->entry_arg);
- wrapper->finished = true;
- aio_wait_kick();
-}
-
-void block_on_coroutine_fn(CoroutineEntry *entry, void *entry_arg)
-{
- assert(!qemu_in_coroutine());
-
- AioContext *ctx = qemu_get_current_aio_context();
- BlockOnCoroutineWrapper wrapper = {
- .finished = false,
- .entry = entry,
- .entry_arg = entry_arg,
- .ctx = ctx,
- };
- Coroutine *wrapper_co = qemu_coroutine_create(block_on_coroutine_wrapper, &wrapper);
- aio_co_enter(ctx, wrapper_co);
- AIO_WAIT_WHILE(ctx, !wrapper.finished);
-}
-
// This is called from another thread, so we use aio_co_schedule()
static void proxmox_backup_schedule_wake(void *data) {
CoCtxData *waker = (CoCtxData *)data;
diff --git a/pve-backup.c b/pve-backup.c
index 5fa3cc1352..323014744c 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -357,7 +357,7 @@ static void job_cancel_bh(void *opaque) {
aio_co_enter(data->ctx, data->co);
}
-static void coroutine_fn pvebackup_co_cancel(void *opaque)
+void coroutine_fn qmp_backup_cancel(Error **errp)
{
Error *cancel_err = NULL;
error_setg(&cancel_err, "backup canceled");
@@ -394,11 +394,6 @@ static void coroutine_fn pvebackup_co_cancel(void *opaque)
qemu_co_mutex_unlock(&backup_state.backup_mutex);
}
-void qmp_backup_cancel(Error **errp)
-{
- block_on_coroutine_fn(pvebackup_co_cancel, NULL);
-}
-
// assumes the caller holds backup_mutex
static int coroutine_fn pvebackup_co_add_config(
const char *file,
@@ -533,50 +528,27 @@ static void create_backup_jobs_bh(void *opaque) {
aio_co_enter(data->ctx, data->co);
}
-typedef struct QmpBackupTask {
- const char *backup_file;
- bool has_password;
- const char *password;
- bool has_keyfile;
- const char *keyfile;
- bool has_key_password;
- const char *key_password;
- bool has_backup_id;
- const char *backup_id;
- bool has_backup_time;
- const char *fingerprint;
- bool has_fingerprint;
- int64_t backup_time;
- bool has_use_dirty_bitmap;
- bool use_dirty_bitmap;
- bool has_format;
- BackupFormat format;
- bool has_config_file;
- const char *config_file;
- bool has_firewall_file;
- const char *firewall_file;
- bool has_devlist;
- const char *devlist;
- bool has_compress;
- bool compress;
- bool has_encrypt;
- bool encrypt;
- bool has_speed;
- int64_t speed;
- Error **errp;
- UuidInfo *result;
-} QmpBackupTask;
-
-static void coroutine_fn pvebackup_co_prepare(void *opaque)
+UuidInfo coroutine_fn *qmp_backup(
+ const char *backup_file,
+ bool has_password, const char *password,
+ bool has_keyfile, const char *keyfile,
+ bool has_key_password, const char *key_password,
+ bool has_fingerprint, const char *fingerprint,
+ bool has_backup_id, const char *backup_id,
+ bool has_backup_time, int64_t backup_time,
+ bool has_use_dirty_bitmap, bool use_dirty_bitmap,
+ bool has_compress, bool compress,
+ bool has_encrypt, bool encrypt,
+ bool has_format, BackupFormat format,
+ bool has_config_file, const char *config_file,
+ bool has_firewall_file, const char *firewall_file,
+ bool has_devlist, const char *devlist,
+ bool has_speed, int64_t speed, Error **errp)
{
assert(qemu_in_coroutine());
qemu_co_mutex_lock(&backup_state.backup_mutex);
- QmpBackupTask *task = opaque;
-
- task->result = NULL; // just to be sure
-
BlockBackend *blk;
BlockDriverState *bs = NULL;
const char *backup_dir = NULL;
@@ -593,17 +565,17 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
const char *firewall_name = "qemu-server.fw";
if (backup_state.di_list) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
"previous backup not finished");
qemu_co_mutex_unlock(&backup_state.backup_mutex);
- return;
+ return NULL;
}
/* Todo: try to auto-detect format based on file name */
- BackupFormat format = task->has_format ? task->format : BACKUP_FORMAT_VMA;
+ format = has_format ? format : BACKUP_FORMAT_VMA;
- if (task->has_devlist) {
- devs = g_strsplit_set(task->devlist, ",;:", -1);
+ if (has_devlist) {
+ devs = g_strsplit_set(devlist, ",;:", -1);
gchar **d = devs;
while (d && *d) {
@@ -611,14 +583,14 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (blk) {
bs = blk_bs(blk);
if (!bdrv_is_inserted(bs)) {
- error_setg(task->errp, QERR_DEVICE_HAS_NO_MEDIUM, *d);
+ error_setg(errp, QERR_DEVICE_HAS_NO_MEDIUM, *d);
goto err;
}
PVEBackupDevInfo *di = g_new0(PVEBackupDevInfo, 1);
di->bs = bs;
di_list = g_list_append(di_list, di);
} else {
- error_set(task->errp, ERROR_CLASS_DEVICE_NOT_FOUND,
+ error_set(errp, ERROR_CLASS_DEVICE_NOT_FOUND,
"Device '%s' not found", *d);
goto err;
}
@@ -641,7 +613,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
if (!di_list) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "empty device list");
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "empty device list");
goto err;
}
@@ -651,13 +623,13 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
while (l) {
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
- if (bdrv_op_is_blocked(di->bs, BLOCK_OP_TYPE_BACKUP_SOURCE, task->errp)) {
+ if (bdrv_op_is_blocked(di->bs, BLOCK_OP_TYPE_BACKUP_SOURCE, errp)) {
goto err;
}
ssize_t size = bdrv_getlength(di->bs);
if (size < 0) {
- error_setg_errno(task->errp, -di->size, "bdrv_getlength failed");
+ error_setg_errno(errp, -di->size, "bdrv_getlength failed");
goto err;
}
di->size = size;
@@ -684,47 +656,44 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
if (format == BACKUP_FORMAT_PBS) {
- if (!task->has_password) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'password'");
+ if (!has_password) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'password'");
goto err_mutex;
}
- if (!task->has_backup_id) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-id'");
+ if (!has_backup_id) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-id'");
goto err_mutex;
}
- if (!task->has_backup_time) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-time'");
+ if (!has_backup_time) {
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "missing parameter 'backup-time'");
goto err_mutex;
}
int dump_cb_block_size = PROXMOX_BACKUP_DEFAULT_CHUNK_SIZE; // Hardcoded (4M)
firewall_name = "fw.conf";
- bool use_dirty_bitmap = task->has_use_dirty_bitmap && task->use_dirty_bitmap;
-
-
char *pbs_err = NULL;
pbs = proxmox_backup_new(
- task->backup_file,
- task->backup_id,
- task->backup_time,
+ backup_file,
+ backup_id,
+ backup_time,
dump_cb_block_size,
- task->has_password ? task->password : NULL,
- task->has_keyfile ? task->keyfile : NULL,
- task->has_key_password ? task->key_password : NULL,
- task->has_compress ? task->compress : true,
- task->has_encrypt ? task->encrypt : task->has_keyfile,
- task->has_fingerprint ? task->fingerprint : NULL,
+ has_password ? password : NULL,
+ has_keyfile ? keyfile : NULL,
+ has_key_password ? key_password : NULL,
+ has_compress ? compress : true,
+ has_encrypt ? encrypt : has_keyfile,
+ has_fingerprint ? fingerprint : NULL,
&pbs_err);
if (!pbs) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
"proxmox_backup_new failed: %s", pbs_err);
proxmox_backup_free_error(pbs_err);
goto err_mutex;
}
- int connect_result = proxmox_backup_co_connect(pbs, task->errp);
+ int connect_result = proxmox_backup_co_connect(pbs, errp);
if (connect_result < 0)
goto err_mutex;
@@ -743,9 +712,9 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
BdrvDirtyBitmap *bitmap = bdrv_find_dirty_bitmap(di->bs, PBS_BITMAP_NAME);
bool expect_only_dirty = false;
- if (use_dirty_bitmap) {
+ if (has_use_dirty_bitmap && use_dirty_bitmap) {
if (bitmap == NULL) {
- bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, task->errp);
+ bitmap = bdrv_create_dirty_bitmap(di->bs, dump_cb_block_size, PBS_BITMAP_NAME, errp);
if (!bitmap) {
goto err_mutex;
}
@@ -775,12 +744,12 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
}
}
- int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, task->errp);
+ int dev_id = proxmox_backup_co_register_image(pbs, devname, di->size, expect_only_dirty, errp);
if (dev_id < 0) {
goto err_mutex;
}
- if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, task->errp))) {
+ if (!(di->target = bdrv_backup_dump_create(dump_cb_block_size, di->size, pvebackup_co_dump_pbs_cb, di, errp))) {
goto err_mutex;
}
@@ -794,10 +763,10 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
backup_state.stat.bitmap_list = g_list_append(backup_state.stat.bitmap_list, info);
}
} else if (format == BACKUP_FORMAT_VMA) {
- vmaw = vma_writer_create(task->backup_file, uuid, &local_err);
+ vmaw = vma_writer_create(backup_file, uuid, &local_err);
if (!vmaw) {
if (local_err) {
- error_propagate(task->errp, local_err);
+ error_propagate(errp, local_err);
}
goto err_mutex;
}
@@ -808,25 +777,25 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
l = g_list_next(l);
- if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_vma_cb, di, task->errp))) {
+ if (!(di->target = bdrv_backup_dump_create(VMA_CLUSTER_SIZE, di->size, pvebackup_co_dump_vma_cb, di, errp))) {
goto err_mutex;
}
const char *devname = bdrv_get_device_name(di->bs);
di->dev_id = vma_writer_register_stream(vmaw, devname, di->size);
if (di->dev_id <= 0) {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR,
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR,
"register_stream failed");
goto err_mutex;
}
}
} else if (format == BACKUP_FORMAT_DIR) {
- if (mkdir(task->backup_file, 0640) != 0) {
- error_setg_errno(task->errp, errno, "can't create directory '%s'\n",
- task->backup_file);
+ if (mkdir(backup_file, 0640) != 0) {
+ error_setg_errno(errp, errno, "can't create directory '%s'\n",
+ backup_file);
goto err_mutex;
}
- backup_dir = task->backup_file;
+ backup_dir = backup_file;
l = di_list;
while (l) {
@@ -840,34 +809,34 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
bdrv_img_create(di->targetfile, "raw", NULL, NULL, NULL,
di->size, flags, false, &local_err);
if (local_err) {
- error_propagate(task->errp, local_err);
+ error_propagate(errp, local_err);
goto err_mutex;
}
di->target = bdrv_open(di->targetfile, NULL, NULL, flags, &local_err);
if (!di->target) {
- error_propagate(task->errp, local_err);
+ error_propagate(errp, local_err);
goto err_mutex;
}
}
} else {
- error_set(task->errp, ERROR_CLASS_GENERIC_ERROR, "unknown backup format");
+ error_set(errp, ERROR_CLASS_GENERIC_ERROR, "unknown backup format");
goto err_mutex;
}
/* add configuration file to archive */
- if (task->has_config_file) {
- if (pvebackup_co_add_config(task->config_file, config_name, format, backup_dir,
- vmaw, pbs, task->errp) != 0) {
+ if (has_config_file) {
+ if (pvebackup_co_add_config(config_file, config_name, format, backup_dir,
+ vmaw, pbs, errp) != 0) {
goto err_mutex;
}
}
/* add firewall file to archive */
- if (task->has_firewall_file) {
- if (pvebackup_co_add_config(task->firewall_file, firewall_name, format, backup_dir,
- vmaw, pbs, task->errp) != 0) {
+ if (has_firewall_file) {
+ if (pvebackup_co_add_config(firewall_file, firewall_name, format, backup_dir,
+ vmaw, pbs, errp) != 0) {
goto err_mutex;
}
}
@@ -885,7 +854,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
if (backup_state.stat.backup_file) {
g_free(backup_state.stat.backup_file);
}
- backup_state.stat.backup_file = g_strdup(task->backup_file);
+ backup_state.stat.backup_file = g_strdup(backup_file);
uuid_copy(backup_state.stat.uuid, uuid);
uuid_unparse_lower(uuid, backup_state.stat.uuid_str);
@@ -900,7 +869,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
qemu_mutex_unlock(&backup_state.stat.lock);
- backup_state.speed = (task->has_speed && task->speed > 0) ? task->speed : 0;
+ backup_state.speed = (has_speed && speed > 0) ? speed : 0;
backup_state.vmaw = vmaw;
backup_state.pbs = pbs;
@@ -910,8 +879,6 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
uuid_info = g_malloc0(sizeof(*uuid_info));
uuid_info->UUID = uuid_str;
- task->result = uuid_info;
-
/* Run create_backup_jobs_bh outside of coroutine (in BH) but keep
* backup_mutex locked. This is fine, a CoMutex can be held across yield
* points, and we'll release it as soon as the BH reschedules us.
@@ -925,7 +892,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
qemu_coroutine_yield();
if (local_err) {
- error_propagate(task->errp, local_err);
+ error_propagate(errp, local_err);
goto err;
}
@@ -938,7 +905,7 @@ static void coroutine_fn pvebackup_co_prepare(void *opaque)
/* start the first job in the transaction */
job_txn_start_seq(backup_state.txn);
- return;
+ return uuid_info;
err_mutex:
qemu_mutex_unlock(&backup_state.stat.lock);
@@ -969,7 +936,7 @@ err:
if (vmaw) {
Error *err = NULL;
vma_writer_close(vmaw, &err);
- unlink(task->backup_file);
+ unlink(backup_file);
}
if (pbs) {
@@ -980,65 +947,8 @@ err:
rmdir(backup_dir);
}
- task->result = NULL;
-
qemu_co_mutex_unlock(&backup_state.backup_mutex);
- return;
-}
-
-UuidInfo *qmp_backup(
- const char *backup_file,
- bool has_password, const char *password,
- bool has_keyfile, const char *keyfile,
- bool has_key_password, const char *key_password,
- bool has_fingerprint, const char *fingerprint,
- bool has_backup_id, const char *backup_id,
- bool has_backup_time, int64_t backup_time,
- bool has_use_dirty_bitmap, bool use_dirty_bitmap,
- bool has_compress, bool compress,
- bool has_encrypt, bool encrypt,
- bool has_format, BackupFormat format,
- bool has_config_file, const char *config_file,
- bool has_firewall_file, const char *firewall_file,
- bool has_devlist, const char *devlist,
- bool has_speed, int64_t speed, Error **errp)
-{
- QmpBackupTask task = {
- .backup_file = backup_file,
- .has_password = has_password,
- .password = password,
- .has_keyfile = has_keyfile,
- .keyfile = keyfile,
- .has_key_password = has_key_password,
- .key_password = key_password,
- .has_fingerprint = has_fingerprint,
- .fingerprint = fingerprint,
- .has_backup_id = has_backup_id,
- .backup_id = backup_id,
- .has_backup_time = has_backup_time,
- .backup_time = backup_time,
- .has_use_dirty_bitmap = has_use_dirty_bitmap,
- .use_dirty_bitmap = use_dirty_bitmap,
- .has_compress = has_compress,
- .compress = compress,
- .has_encrypt = has_encrypt,
- .encrypt = encrypt,
- .has_format = has_format,
- .format = format,
- .has_config_file = has_config_file,
- .config_file = config_file,
- .has_firewall_file = has_firewall_file,
- .firewall_file = firewall_file,
- .has_devlist = has_devlist,
- .devlist = devlist,
- .has_speed = has_speed,
- .speed = speed,
- .errp = errp,
- };
-
- block_on_coroutine_fn(pvebackup_co_prepare, &task);
-
- return task.result;
+ return NULL;
}
BackupStatus *qmp_query_backup(Error **errp)
diff --git a/qapi/block-core.json b/qapi/block-core.json
index c99ddf8628..829dc7b8e9 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -846,7 +846,7 @@
'*config-file': 'str',
'*firewall-file': 'str',
'*devlist': 'str', '*speed': 'int' },
- 'returns': 'UuidInfo' }
+ 'returns': 'UuidInfo', 'coroutine': true }
##
# @query-backup:
@@ -868,7 +868,7 @@
# Notes: This command succeeds even if there is no backup process running.
#
##
-{ 'command': 'backup-cancel' }
+{ 'command': 'backup-cancel', 'coroutine': true }
##
# @ProxmoxSupportStatus:

View File

@@ -0,0 +1,98 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 10 Feb 2021 11:07:06 +0100
Subject: [PATCH] PBS: add master key support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
this requires a new enough libproxmox-backup-qemu0, and allows querying
from the PVE side to avoid QMP calls with unsupported parameters.
Signed-off-by: Fabian Grünbichler <f.gruenbichler@proxmox.com>
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 1 +
pve-backup.c | 3 +++
qapi/block-core.json | 7 +++++++
3 files changed, 11 insertions(+)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index c9849a5b29..52ddbf95ad 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1039,6 +1039,7 @@ void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
false, NULL, // PBS password
false, NULL, // PBS keyfile
false, NULL, // PBS key_password
+ false, NULL, // PBS master_keyfile
false, NULL, // PBS fingerprint
false, NULL, // PBS backup-id
false, 0, // PBS backup-time
diff --git a/pve-backup.c b/pve-backup.c
index 323014744c..9f6c04a512 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -533,6 +533,7 @@ UuidInfo coroutine_fn *qmp_backup(
bool has_password, const char *password,
bool has_keyfile, const char *keyfile,
bool has_key_password, const char *key_password,
+ bool has_master_keyfile, const char *master_keyfile,
bool has_fingerprint, const char *fingerprint,
bool has_backup_id, const char *backup_id,
bool has_backup_time, int64_t backup_time,
@@ -681,6 +682,7 @@ UuidInfo coroutine_fn *qmp_backup(
has_password ? password : NULL,
has_keyfile ? keyfile : NULL,
has_key_password ? key_password : NULL,
+ has_master_keyfile ? master_keyfile : NULL,
has_compress ? compress : true,
has_encrypt ? encrypt : has_keyfile,
has_fingerprint ? fingerprint : NULL,
@@ -1044,5 +1046,6 @@ ProxmoxSupportStatus *qmp_query_proxmox_support(Error **errp)
ret->pbs_dirty_bitmap_savevm = true;
ret->pbs_dirty_bitmap_migration = true;
ret->query_bitmap_info = true;
+ ret->pbs_masterkey = true;
return ret;
}
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 829dc7b8e9..d089328a1f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -817,6 +817,8 @@
#
# @key-password: password for keyfile (optional for format 'pbs')
#
+# @master-keyfile: PEM-formatted master public keyfile (optional for format 'pbs')
+#
# @fingerprint: server cert fingerprint (optional for format 'pbs')
#
# @backup-id: backup ID (required for format 'pbs')
@@ -836,6 +838,7 @@
'*password': 'str',
'*keyfile': 'str',
'*key-password': 'str',
+ '*master-keyfile': 'str',
'*fingerprint': 'str',
'*backup-id': 'str',
'*backup-time': 'int',
@@ -888,6 +891,9 @@
# migration cap if this is false/unset may lead
# to crashes on migration!
#
+# @pbs-masterkey: True if the QMP backup call supports the 'master_keyfile'
+# parameter.
+#
# @pbs-library-version: Running version of libproxmox-backup-qemu0 library.
#
##
@@ -896,6 +902,7 @@
'query-bitmap-info': 'bool',
'pbs-dirty-bitmap-savevm': 'bool',
'pbs-dirty-bitmap-migration': 'bool',
+ 'pbs-masterkey': 'bool',
'pbs-library-version': 'str' } }
##

View File

@@ -1,111 +0,0 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
Date: Thu, 30 Apr 2020 15:55:37 +0200
Subject: [PATCH] move savevm-async back into a coroutine
Move qemu_savevm_state_{header,setup} into the main loop and
the rest of the iteration into a coroutine. The former need
to lock the iothread (and we can't unlock it in the
coroutine), and the latter can't deal with being in a
separate thread, so a coroutine it must be.
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
savevm-async.c | 28 +++++++++-------------------
1 file changed, 9 insertions(+), 19 deletions(-)
diff --git a/savevm-async.c b/savevm-async.c
index a38b15d652..af865b9a0a 100644
--- a/savevm-async.c
+++ b/savevm-async.c
@@ -51,7 +51,7 @@ static struct SnapshotState {
QEMUFile *file;
int64_t total_time;
QEMUBH *cleanup_bh;
- QemuThread thread;
+ Coroutine *co;
} snap_state;
SaveVMInfo *qmp_query_savevm(Error **errp)
@@ -201,11 +201,9 @@ static void process_savevm_cleanup(void *opaque)
int ret;
qemu_bh_delete(snap_state.cleanup_bh);
snap_state.cleanup_bh = NULL;
+ snap_state.co = NULL;
qemu_savevm_state_cleanup();
- qemu_mutex_unlock_iothread();
- qemu_thread_join(&snap_state.thread);
- qemu_mutex_lock_iothread();
ret = save_snapshot_cleanup();
if (ret < 0) {
save_snapshot_error("save_snapshot_cleanup error %d", ret);
@@ -221,18 +219,13 @@ static void process_savevm_cleanup(void *opaque)
}
}
-static void *process_savevm_thread(void *opaque)
+static void process_savevm_coro(void *opaque)
{
int ret;
int64_t maxlen;
MigrationState *ms = migrate_get_current();
- rcu_register_thread();
-
- qemu_savevm_state_header(snap_state.file);
- qemu_savevm_state_setup(snap_state.file);
ret = qemu_file_get_error(snap_state.file);
-
if (ret < 0) {
save_snapshot_error("qemu_savevm_state_setup failed");
goto out;
@@ -247,16 +240,13 @@ static void *process_savevm_thread(void *opaque)
maxlen = blk_getlength(snap_state.target) - 30*1024*1024;
if (pending_size > 400000 && snap_state.bs_pos + pending_size < maxlen) {
- qemu_mutex_lock_iothread();
ret = qemu_savevm_state_iterate(snap_state.file, false);
if (ret < 0) {
save_snapshot_error("qemu_savevm_state_iterate error %d", ret);
break;
}
- qemu_mutex_unlock_iothread();
DPRINTF("savevm inerate pending size %lu ret %d\n", pending_size, ret);
} else {
- qemu_mutex_lock_iothread();
qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
ret = global_state_store();
if (ret) {
@@ -285,16 +275,12 @@ static void *process_savevm_thread(void *opaque)
}
qemu_bh_schedule(snap_state.cleanup_bh);
- qemu_mutex_unlock_iothread();
out:
/* set migration state accordingly and clear soon-to-be stale file */
migrate_set_state(&ms->state, MIGRATION_STATUS_SETUP,
ret ? MIGRATION_STATUS_FAILED : MIGRATION_STATUS_COMPLETED);
ms->to_dst_file = NULL;
-
- rcu_unregister_thread();
- return NULL;
}
void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
@@ -373,8 +359,12 @@ void qmp_savevm_start(bool has_statefile, const char *statefile, Error **errp)
snap_state.state = SAVE_STATE_ACTIVE;
snap_state.cleanup_bh = qemu_bh_new(process_savevm_cleanup, &snap_state);
- qemu_thread_create(&snap_state.thread, "savevm-async", process_savevm_thread,
- NULL, QEMU_THREAD_JOINABLE);
+ snap_state.co = qemu_coroutine_create(&process_savevm_coro, NULL);
+ qemu_mutex_unlock_iothread();
+ qemu_savevm_state_header(snap_state.file);
+ qemu_savevm_state_setup(snap_state.file);
+ qemu_mutex_lock_iothread();
+ aio_co_schedule(iohandler_get_aio_context(), snap_state.co);
return;

View File

@@ -0,0 +1,53 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 9 Dec 2020 11:46:57 +0100
Subject: [PATCH] PVE: block/pbs: fast-path reads without allocation if
possible
...and switch over to g_malloc/g_free while at it to align with other
QEMU code.
Tracing shows the fast-path is taken almost all the time, though not
100% so the slow one is still necessary.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/pbs.c | 17 ++++++++++++++---
1 file changed, 14 insertions(+), 3 deletions(-)
diff --git a/block/pbs.c b/block/pbs.c
index 0b05ea9080..c5eb4d5bad 100644
--- a/block/pbs.c
+++ b/block/pbs.c
@@ -200,7 +200,16 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
BDRVPBSState *s = bs->opaque;
int ret;
char *pbs_error = NULL;
- uint8_t *buf = malloc(bytes);
+ uint8_t *buf;
+ bool inline_buf = true;
+
+ /* for single-buffer IO vectors we can fast-path the write directly to it */
+ if (qiov->niov == 1 && qiov->iov->iov_len >= bytes) {
+ buf = qiov->iov->iov_base;
+ } else {
+ inline_buf = false;
+ buf = g_malloc(bytes);
+ }
if (offset < 0 || bytes < 0) {
fprintf(stderr, "unexpected negative 'offset' or 'bytes' value!\n");
@@ -223,8 +232,10 @@ static coroutine_fn int pbs_co_preadv(BlockDriverState *bs,
return -EIO;
}
- qemu_iovec_from_buf(qiov, 0, buf, bytes);
- free(buf);
+ if (!inline_buf) {
+ qemu_iovec_from_buf(qiov, 0, buf, bytes);
+ g_free(buf);
+ }
return ret;
}

View File

@@ -0,0 +1,25 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Tue, 2 Mar 2021 16:34:28 +0100
Subject: [PATCH] PVE: block/stream: increase chunk size
Ceph favors bigger chunks, so increase to 4M.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/stream.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/block/stream.c b/block/stream.c
index e45113aed6..c3c0c5febe 100644
--- a/block/stream.c
+++ b/block/stream.c
@@ -28,7 +28,7 @@ enum {
* large enough to process multiple clusters in a single call, so
* that populating contiguous regions of the image is efficient.
*/
- STREAM_CHUNK = 512 * 1024, /* in bytes */
+ STREAM_CHUNK = 4 * 1024 * 1024, /* in bytes */
};
typedef struct StreamBlockJob {

View File

@@ -0,0 +1,33 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Tue, 2 Mar 2021 16:11:54 +0100
Subject: [PATCH] block/io: accept NULL qiov in bdrv_pad_request
Some operations, e.g. block-stream, perform reads while discarding the
results (only copy-on-read matters). In this case they will pass NULL as
the target QEMUIOVector, which will however trip bdrv_pad_request, since
it wants to extend its passed vector.
Simply check for NULL and do nothing, there's no reason to pad the
target if it will be discarded anyway.
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
block/io.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/block/io.c b/block/io.c
index 4e4cb556c5..04061f1e68 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1765,6 +1765,10 @@ static int bdrv_pad_request(BlockDriverState *bs,
{
int ret;
+ if (!qiov) {
+ return 0;
+ }
+
bdrv_check_qiov_request(*offset, *bytes, *qiov, *qiov_offset, &error_abort);
if (!bdrv_init_padding(bs, *offset, *bytes, pad)) {

View File

@@ -0,0 +1,402 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Mon, 7 Dec 2020 15:21:03 +0100
Subject: [PATCH] block: add alloc-track driver
Add a new filter node 'alloc-track', which seperates reads and writes to
different children, thus allowing to put a backing image behind any
blockdev (regardless of driver support). Since we can't detect any
pre-allocated blocks, we can only track new writes, hence the write
target ('file') for this node must always be empty.
Intended use case is for live restoring, i.e. add a backup image as a
block device into a VM, then put an alloc-track on the restore target
and set the backup as backing. With this, one can use a regular
'block-stream' to restore the image, while the VM can already run in the
background. Copy-on-read will help make progress as the VM reads as
well.
This only worked if the target supports backing images, so up until now
only for qcow2, with alloc-track any driver for the target can be used.
If 'auto-remove' is set, alloc-track will automatically detach itself
once the backing image is removed. It will be replaced by 'file'.
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
[adapt to changed function signatures]
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
block/alloc-track.c | 350 ++++++++++++++++++++++++++++++++++++++++++++
block/meson.build | 1 +
2 files changed, 351 insertions(+)
create mode 100644 block/alloc-track.c
diff --git a/block/alloc-track.c b/block/alloc-track.c
new file mode 100644
index 0000000000..6b50fbe537
--- /dev/null
+++ b/block/alloc-track.c
@@ -0,0 +1,350 @@
+/*
+ * Node to allow backing images to be applied to any node. Assumes a blank
+ * image to begin with, only new writes are tracked as allocated, thus this
+ * must never be put on a node that already contains data.
+ *
+ * Copyright (c) 2020 Proxmox Server Solutions GmbH
+ * Copyright (c) 2020 Stefan Reiter <s.reiter@proxmox.com>
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "block/block_int.h"
+#include "qapi/qmp/qdict.h"
+#include "qapi/qmp/qstring.h"
+#include "qemu/cutils.h"
+#include "qemu/option.h"
+#include "qemu/module.h"
+#include "sysemu/block-backend.h"
+
+#define TRACK_OPT_AUTO_REMOVE "auto-remove"
+
+typedef enum DropState {
+ DropNone,
+ DropRequested,
+ DropInProgress,
+} DropState;
+
+typedef struct {
+ BdrvDirtyBitmap *bitmap;
+ DropState drop_state;
+ bool auto_remove;
+} BDRVAllocTrackState;
+
+static QemuOptsList runtime_opts = {
+ .name = "alloc-track",
+ .head = QTAILQ_HEAD_INITIALIZER(runtime_opts.head),
+ .desc = {
+ {
+ .name = TRACK_OPT_AUTO_REMOVE,
+ .type = QEMU_OPT_BOOL,
+ .help = "automatically replace this node with 'file' when 'backing'"
+ "is detached",
+ },
+ { /* end of list */ }
+ },
+};
+
+static void track_refresh_limits(BlockDriverState *bs, Error **errp)
+{
+ BlockDriverInfo bdi;
+
+ if (!bs->file) {
+ return;
+ }
+
+ /* always use alignment from underlying write device so RMW cycle for
+ * bdrv_pwritev reads data from our backing via track_co_preadv (no partial
+ * cluster allocation in 'file') */
+ bdrv_get_info(bs->file->bs, &bdi);
+ bs->bl.request_alignment = MAX(bs->file->bs->bl.request_alignment,
+ MAX(bdi.cluster_size, BDRV_SECTOR_SIZE));
+}
+
+static int track_open(BlockDriverState *bs, QDict *options, int flags,
+ Error **errp)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+ QemuOpts *opts;
+ Error *local_err = NULL;
+ int ret = 0;
+
+ opts = qemu_opts_create(&runtime_opts, NULL, 0, &error_abort);
+ qemu_opts_absorb_qdict(opts, options, &local_err);
+ if (local_err) {
+ error_propagate(errp, local_err);
+ ret = -EINVAL;
+ goto fail;
+ }
+
+ s->auto_remove = qemu_opt_get_bool(opts, TRACK_OPT_AUTO_REMOVE, false);
+
+ /* open the target (write) node, backing will be attached by block layer */
+ bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
+ BDRV_CHILD_DATA | BDRV_CHILD_METADATA, false,
+ &local_err);
+ if (local_err) {
+ ret = -EINVAL;
+ error_propagate(errp, local_err);
+ goto fail;
+ }
+
+ track_refresh_limits(bs, errp);
+ uint64_t gran = bs->bl.request_alignment;
+ s->bitmap = bdrv_create_dirty_bitmap(bs->file->bs, gran, NULL, &local_err);
+ if (local_err) {
+ ret = -EIO;
+ error_propagate(errp, local_err);
+ goto fail;
+ }
+
+ s->drop_state = DropNone;
+
+fail:
+ if (ret < 0) {
+ bdrv_unref_child(bs, bs->file);
+ if (s->bitmap) {
+ bdrv_release_dirty_bitmap(s->bitmap);
+ }
+ }
+ qemu_opts_del(opts);
+ return ret;
+}
+
+static void track_close(BlockDriverState *bs)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+ if (s->bitmap) {
+ bdrv_release_dirty_bitmap(s->bitmap);
+ }
+}
+
+static int64_t track_getlength(BlockDriverState *bs)
+{
+ return bdrv_getlength(bs->file->bs);
+}
+
+static int coroutine_fn track_co_preadv(BlockDriverState *bs,
+ int64_t offset, int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+ QEMUIOVector local_qiov;
+ int ret;
+
+ /* 'cur_offset' is relative to 'offset', 'local_offset' to image start */
+ uint64_t cur_offset, local_offset;
+ int64_t local_bytes;
+ bool alloc;
+
+ if (offset < 0 || bytes < 0) {
+ fprintf(stderr, "unexpected negative 'offset' or 'bytes' value!\n");
+ return -EINVAL;
+ }
+
+ /* a read request can span multiple granularity-sized chunks, and can thus
+ * contain blocks with different allocation status - we could just iterate
+ * granularity-wise, but for better performance use bdrv_dirty_bitmap_next_X
+ * to find the next flip and consider everything up to that in one go */
+ for (cur_offset = 0; cur_offset < bytes; cur_offset += local_bytes) {
+ local_offset = offset + cur_offset;
+ alloc = bdrv_dirty_bitmap_get(s->bitmap, local_offset);
+ if (alloc) {
+ local_bytes = bdrv_dirty_bitmap_next_zero(s->bitmap, local_offset,
+ bytes - cur_offset);
+ } else {
+ local_bytes = bdrv_dirty_bitmap_next_dirty(s->bitmap, local_offset,
+ bytes - cur_offset);
+ }
+
+ /* _bitmap_next_X return is -1 if no end found within limit, otherwise
+ * offset of next flip (to start of image) */
+ local_bytes = local_bytes < 0 ?
+ bytes - cur_offset :
+ local_bytes - local_offset;
+
+ qemu_iovec_init_slice(&local_qiov, qiov, cur_offset, local_bytes);
+
+ if (alloc) {
+ ret = bdrv_co_preadv(bs->file, local_offset, local_bytes,
+ &local_qiov, flags);
+ } else if (bs->backing) {
+ ret = bdrv_co_preadv(bs->backing, local_offset, local_bytes,
+ &local_qiov, flags);
+ } else {
+ ret = qemu_iovec_memset(&local_qiov, cur_offset, 0, local_bytes);
+ }
+
+ if (ret != 0) {
+ break;
+ }
+ }
+
+ return ret;
+}
+
+static int coroutine_fn track_co_pwritev(BlockDriverState *bs,
+ int64_t offset, int64_t bytes, QEMUIOVector *qiov, BdrvRequestFlags flags)
+{
+ return bdrv_co_pwritev(bs->file, offset, bytes, qiov, flags);
+}
+
+static int coroutine_fn track_co_pwrite_zeroes(BlockDriverState *bs,
+ int64_t offset, int64_t bytes, BdrvRequestFlags flags)
+{
+ return bdrv_co_pwrite_zeroes(bs->file, offset, bytes, flags);
+}
+
+static int coroutine_fn track_co_pdiscard(BlockDriverState *bs,
+ int64_t offset, int64_t bytes)
+{
+ return bdrv_co_pdiscard(bs->file, offset, bytes);
+}
+
+static coroutine_fn int track_co_flush(BlockDriverState *bs)
+{
+ return bdrv_co_flush(bs->file->bs);
+}
+
+static int coroutine_fn track_co_block_status(BlockDriverState *bs,
+ bool want_zero,
+ int64_t offset,
+ int64_t bytes,
+ int64_t *pnum,
+ int64_t *map,
+ BlockDriverState **file)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+
+ bool alloc = bdrv_dirty_bitmap_get(s->bitmap, offset);
+ int64_t next_flipped;
+ if (alloc) {
+ next_flipped = bdrv_dirty_bitmap_next_zero(s->bitmap, offset, bytes);
+ } else {
+ next_flipped = bdrv_dirty_bitmap_next_dirty(s->bitmap, offset, bytes);
+ }
+
+ /* in case not the entire region has the same state, we need to set pnum to
+ * indicate for how many bytes our result is valid */
+ *pnum = next_flipped == -1 ? bytes : next_flipped - offset;
+ *map = offset;
+
+ if (alloc) {
+ *file = bs->file->bs;
+ return BDRV_BLOCK_RAW | BDRV_BLOCK_OFFSET_VALID;
+ } else if (bs->backing) {
+ *file = bs->backing->bs;
+ }
+ return 0;
+}
+
+static void track_child_perm(BlockDriverState *bs, BdrvChild *c,
+ BdrvChildRole role, BlockReopenQueue *reopen_queue,
+ uint64_t perm, uint64_t shared,
+ uint64_t *nperm, uint64_t *nshared)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+
+ *nshared = BLK_PERM_ALL;
+
+ /* in case we're currently dropping ourselves, claim to not use any
+ * permissions at all - which is fine, since from this point on we will
+ * never issue a read or write anymore */
+ if (s->drop_state == DropInProgress) {
+ *nperm = 0;
+ return;
+ }
+
+ if (role & BDRV_CHILD_DATA) {
+ *nperm = perm & DEFAULT_PERM_PASSTHROUGH;
+ } else {
+ /* 'backing' is also a child of our BDS, but we don't expect it to be
+ * writeable, so we only forward 'consistent read' */
+ *nperm = perm & BLK_PERM_CONSISTENT_READ;
+ }
+}
+
+static void track_drop(void *opaque)
+{
+ BlockDriverState *bs = (BlockDriverState*)opaque;
+ BlockDriverState *file = bs->file->bs;
+ BDRVAllocTrackState *s = bs->opaque;
+
+ assert(file);
+
+ /* we rely on the fact that we're not used anywhere else, so let's wait
+ * until we're only used once - in the drive connected to the guest (and one
+ * ref is held by bdrv_ref in track_change_backing_file) */
+ if (bs->refcnt > 2) {
+ aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, opaque);
+ return;
+ }
+ AioContext *aio_context = bdrv_get_aio_context(bs);
+ aio_context_acquire(aio_context);
+
+ bdrv_drained_begin(bs);
+
+ /* now that we're drained, we can safely set 'DropInProgress' */
+ s->drop_state = DropInProgress;
+ bdrv_child_refresh_perms(bs, bs->file, &error_abort);
+
+ bdrv_replace_node(bs, file, &error_abort);
+ bdrv_set_backing_hd(bs, NULL, &error_abort);
+ bdrv_drained_end(bs);
+ bdrv_unref(bs);
+ aio_context_release(aio_context);
+}
+
+static int track_change_backing_file(BlockDriverState *bs,
+ const char *backing_file,
+ const char *backing_fmt)
+{
+ BDRVAllocTrackState *s = bs->opaque;
+ if (s->auto_remove && s->drop_state == DropNone &&
+ backing_file == NULL && backing_fmt == NULL)
+ {
+ /* backing file has been disconnected, there's no longer any use for
+ * this node, so let's remove ourselves from the block graph - we need
+ * to schedule this for later however, since when this function is
+ * called, the blockjob modifying us is probably not done yet and has a
+ * blocker on 'bs' */
+ s->drop_state = DropRequested;
+ bdrv_ref(bs);
+ aio_bh_schedule_oneshot(qemu_get_aio_context(), track_drop, (void*)bs);
+ }
+
+ return 0;
+}
+
+static BlockDriver bdrv_alloc_track = {
+ .format_name = "alloc-track",
+ .instance_size = sizeof(BDRVAllocTrackState),
+
+ .bdrv_file_open = track_open,
+ .bdrv_close = track_close,
+ .bdrv_getlength = track_getlength,
+ .bdrv_child_perm = track_child_perm,
+ .bdrv_refresh_limits = track_refresh_limits,
+
+ .bdrv_co_pwrite_zeroes = track_co_pwrite_zeroes,
+ .bdrv_co_pwritev = track_co_pwritev,
+ .bdrv_co_preadv = track_co_preadv,
+ .bdrv_co_pdiscard = track_co_pdiscard,
+
+ .bdrv_co_flush = track_co_flush,
+ .bdrv_co_flush_to_disk = track_co_flush,
+
+ .supports_backing = true,
+
+ .bdrv_co_block_status = track_co_block_status,
+ .bdrv_change_backing_file = track_change_backing_file,
+};
+
+static void bdrv_alloc_track_init(void)
+{
+ bdrv_register(&bdrv_alloc_track);
+}
+
+block_init(bdrv_alloc_track_init);
diff --git a/block/meson.build b/block/meson.build
index 8c758c0218..45b72e10f1 100644
--- a/block/meson.build
+++ b/block/meson.build
@@ -2,6 +2,7 @@ block_ss.add(genh)
block_ss.add(files(
'accounting.c',
'aio_task.c',
+ 'alloc-track.c',
'amend.c',
'backup.c',
'backup-dump.c',

View File

@@ -0,0 +1,33 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 26 May 2021 15:26:30 +0200
Subject: [PATCH] PVE: whitelist 'invalid' QAPI names for backwards compat
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
qapi/pragma.json | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/qapi/pragma.json b/qapi/pragma.json
index 7c91ea3685..c3888d654c 100644
--- a/qapi/pragma.json
+++ b/qapi/pragma.json
@@ -12,6 +12,7 @@
'device_add',
'device_del',
'expire_password',
+ 'get_link_status',
'migrate_cancel',
'netdev_add',
'netdev_del',
@@ -60,6 +61,8 @@
'SysEmuTarget', # query-cpu-fast, query-target
'UuidInfo', # query-uuid
'VncClientInfo', # query-vnc, query-vnc-servers, ...
- 'X86CPURegister32' # qom-get of x86 CPU properties
+ 'X86CPURegister32', # qom-get of x86 CPU properties
# feature-words, filtered-features
+ 'BlockdevOptionsPbs', # for PBS backwards compat
+ 'BalloonInfo'
] } }

View File

@@ -0,0 +1,35 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Stefan Reiter <s.reiter@proxmox.com>
Date: Wed, 26 May 2021 17:36:55 +0200
Subject: [PATCH] PVE: savevm-async: register yank before
migration_incoming_state_destroy
Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
migration/savevm-async.c | 5 +++++
1 file changed, 5 insertions(+)
diff --git a/migration/savevm-async.c b/migration/savevm-async.c
index 970ee3b3fc..b3ccc069f1 100644
--- a/migration/savevm-async.c
+++ b/migration/savevm-async.c
@@ -19,6 +19,7 @@
#include "qemu/timer.h"
#include "qemu/main-loop.h"
#include "qemu/rcu.h"
+#include "qemu/yank.h"
/* #define DEBUG_SAVEVM_STATE */
@@ -580,6 +581,10 @@ int load_snapshot_from_blockdev(const char *filename, Error **errp)
dirty_bitmap_mig_before_vm_start();
qemu_fclose(f);
+
+ /* state_destroy assumes a real migration which would have added a yank */
+ yank_register_instance(MIGRATION_YANK_INSTANCE, &error_abort);
+
migration_incoming_state_destroy();
if (ret < 0) {
error_setg_errno(errp, -ret, "Error while loading VM state");

View File

@@ -0,0 +1,130 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Mon, 7 Feb 2022 14:21:01 +0100
Subject: [PATCH] qemu-img: dd: add -l option for loading a snapshot
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
docs/tools/qemu-img.rst | 6 +++---
qemu-img-cmds.hx | 4 ++--
qemu-img.c | 33 +++++++++++++++++++++++++++++++--
3 files changed, 36 insertions(+), 7 deletions(-)
diff --git a/docs/tools/qemu-img.rst b/docs/tools/qemu-img.rst
index a49badb158..1039aec01c 100644
--- a/docs/tools/qemu-img.rst
+++ b/docs/tools/qemu-img.rst
@@ -492,10 +492,10 @@ Command description:
it doesn't need to be specified separately in this case.
-.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [-n] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] if=INPUT of=OUTPUT
+.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [-n] [-l SNAPSHOT_PARAM] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] if=INPUT of=OUTPUT
- dd copies from *INPUT* file to *OUTPUT* file converting it from
- *FMT* format to *OUTPUT_FMT* format.
+ dd copies from *INPUT* file or snapshot *SNAPSHOT_PARAM* to *OUTPUT* file
+ converting it from *FMT* format to *OUTPUT_FMT* format.
The data is by default read and written using blocks of 512 bytes but can be
modified by specifying *BLOCK_SIZE*. If count=\ *BLOCKS* is specified
diff --git a/qemu-img-cmds.hx b/qemu-img-cmds.hx
index f3b2b1b4de..e77ed9347f 100644
--- a/qemu-img-cmds.hx
+++ b/qemu-img-cmds.hx
@@ -58,9 +58,9 @@ SRST
ERST
DEF("dd", img_dd,
- "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [-n] [bs=block_size] [count=blocks] [skip=blocks] [osize=output_size] if=input of=output")
+ "dd [--image-opts] [-U] [-f fmt] [-O output_fmt] [-n] [-l snapshot_param] [bs=block_size] [count=blocks] [skip=blocks] [osize=output_size] if=input of=output")
SRST
-.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [-n] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] [osize=OUTPUT_SIZE] if=INPUT of=OUTPUT
+.. option:: dd [--image-opts] [-U] [-f FMT] [-O OUTPUT_FMT] [-n] [-l SNAPSHOT_PARAM] [bs=BLOCK_SIZE] [count=BLOCKS] [skip=BLOCKS] [osize=OUTPUT_SIZE] if=INPUT of=OUTPUT
ERST
DEF("info", img_info,
diff --git a/qemu-img.c b/qemu-img.c
index 015d6d2ce4..7031195e32 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -4922,6 +4922,7 @@ static int img_dd(int argc, char **argv)
BlockDriver *drv = NULL, *proto_drv = NULL;
BlockBackend *blk1 = NULL, *blk2 = NULL;
QemuOpts *opts = NULL;
+ QemuOpts *sn_opts = NULL;
QemuOptsList *create_opts = NULL;
Error *local_err = NULL;
bool image_opts = false;
@@ -4931,6 +4932,7 @@ static int img_dd(int argc, char **argv)
int64_t size = 0, readsize = 0;
int64_t block_count = 0, out_pos, in_pos;
bool force_share = false, skip_create = false;
+ const char *snapshot_name = NULL;
struct DdInfo dd = {
.flags = 0,
.count = 0,
@@ -4968,7 +4970,7 @@ static int img_dd(int argc, char **argv)
{ 0, 0, 0, 0 }
};
- while ((c = getopt_long(argc, argv, ":hf:O:Un", long_options, NULL))) {
+ while ((c = getopt_long(argc, argv, ":hf:O:l:Un", long_options, NULL))) {
if (c == EOF) {
break;
}
@@ -4991,6 +4993,19 @@ static int img_dd(int argc, char **argv)
case 'n':
skip_create = true;
break;
+ case 'l':
+ if (strstart(optarg, SNAPSHOT_OPT_BASE, NULL)) {
+ sn_opts = qemu_opts_parse_noisily(&internal_snapshot_opts,
+ optarg, false);
+ if (!sn_opts) {
+ error_report("Failed in parsing snapshot param '%s'",
+ optarg);
+ goto out;
+ }
+ } else {
+ snapshot_name = optarg;
+ }
+ break;
case 'U':
force_share = true;
break;
@@ -5050,11 +5065,24 @@ static int img_dd(int argc, char **argv)
if (dd.flags & C_IF) {
blk1 = img_open(image_opts, in.filename, fmt, 0, false, false,
force_share);
-
if (!blk1) {
ret = -1;
goto out;
}
+ if (sn_opts) {
+ bdrv_snapshot_load_tmp(blk_bs(blk1),
+ qemu_opt_get(sn_opts, SNAPSHOT_OPT_ID),
+ qemu_opt_get(sn_opts, SNAPSHOT_OPT_NAME),
+ &local_err);
+ } else if (snapshot_name != NULL) {
+ bdrv_snapshot_load_tmp_by_id_or_name(blk_bs(blk1), snapshot_name,
+ &local_err);
+ }
+ if (local_err) {
+ error_reportf_err(local_err, "Failed to load snapshot: ");
+ ret = -1;
+ goto out;
+ }
}
if (dd.flags & C_OSIZE) {
@@ -5203,6 +5231,7 @@ static int img_dd(int argc, char **argv)
out:
g_free(arg);
qemu_opts_del(opts);
+ qemu_opts_del(sn_opts);
qemu_opts_free(create_opts);
blk_unref(blk1);
blk_unref(blk2);

View File

@@ -0,0 +1,407 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Thu, 21 Apr 2022 13:26:48 +0200
Subject: [PATCH] vma: allow partial restore
Introduce a new map line for skipping a certain drive, of the form
skip=drive-scsi0
Since in PVE, most archives are compressed and piped to vma for
restore, it's not easily possible to skip reads.
For the reader, a new skip flag for VmaRestoreState is added and the
target is allowed to be NULL if skip is specified when registering. If
the skip flag is set, no writes will be made as well as no check for
duplicate clusters. Therefore, the flag is not set for verify.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Acked-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
Signed-off-by: Thomas Lamprecht <t.lamprecht@proxmox.com>
---
vma-reader.c | 64 ++++++++++++---------
vma.c | 157 +++++++++++++++++++++++++++++----------------------
vma.h | 2 +-
3 files changed, 126 insertions(+), 97 deletions(-)
diff --git a/vma-reader.c b/vma-reader.c
index 4f4ee2b47b..844d95a5ba 100644
--- a/vma-reader.c
+++ b/vma-reader.c
@@ -29,6 +29,7 @@ typedef struct VmaRestoreState {
bool write_zeroes;
unsigned long *bitmap;
int bitmap_size;
+ bool skip;
} VmaRestoreState;
struct VmaReader {
@@ -426,13 +427,14 @@ VmaDeviceInfo *vma_reader_get_device_info(VmaReader *vmar, guint8 dev_id)
}
static void allocate_rstate(VmaReader *vmar, guint8 dev_id,
- BlockBackend *target, bool write_zeroes)
+ BlockBackend *target, bool write_zeroes, bool skip)
{
assert(vmar);
assert(dev_id);
vmar->rstate[dev_id].target = target;
vmar->rstate[dev_id].write_zeroes = write_zeroes;
+ vmar->rstate[dev_id].skip = skip;
int64_t size = vmar->devinfo[dev_id].size;
@@ -447,28 +449,30 @@ static void allocate_rstate(VmaReader *vmar, guint8 dev_id,
}
int vma_reader_register_bs(VmaReader *vmar, guint8 dev_id, BlockBackend *target,
- bool write_zeroes, Error **errp)
+ bool write_zeroes, bool skip, Error **errp)
{
assert(vmar);
- assert(target != NULL);
+ assert(target != NULL || skip);
assert(dev_id);
- assert(vmar->rstate[dev_id].target == NULL);
-
- int64_t size = blk_getlength(target);
- int64_t size_diff = size - vmar->devinfo[dev_id].size;
-
- /* storage types can have different size restrictions, so it
- * is not always possible to create an image with exact size.
- * So we tolerate a size difference up to 4MB.
- */
- if ((size_diff < 0) || (size_diff > 4*1024*1024)) {
- error_setg(errp, "vma_reader_register_bs for stream %s failed - "
- "unexpected size %zd != %zd", vmar->devinfo[dev_id].devname,
- size, vmar->devinfo[dev_id].size);
- return -1;
+ assert(vmar->rstate[dev_id].target == NULL && !vmar->rstate[dev_id].skip);
+
+ if (target != NULL) {
+ int64_t size = blk_getlength(target);
+ int64_t size_diff = size - vmar->devinfo[dev_id].size;
+
+ /* storage types can have different size restrictions, so it
+ * is not always possible to create an image with exact size.
+ * So we tolerate a size difference up to 4MB.
+ */
+ if ((size_diff < 0) || (size_diff > 4*1024*1024)) {
+ error_setg(errp, "vma_reader_register_bs for stream %s failed - "
+ "unexpected size %zd != %zd", vmar->devinfo[dev_id].devname,
+ size, vmar->devinfo[dev_id].size);
+ return -1;
+ }
}
- allocate_rstate(vmar, dev_id, target, write_zeroes);
+ allocate_rstate(vmar, dev_id, target, write_zeroes, skip);
return 0;
}
@@ -561,19 +565,23 @@ static int restore_extent(VmaReader *vmar, unsigned char *buf,
VmaRestoreState *rstate = &vmar->rstate[dev_id];
BlockBackend *target = NULL;
+ bool skip = rstate->skip;
+
if (dev_id != vmar->vmstate_stream) {
target = rstate->target;
- if (!verify && !target) {
+ if (!verify && !target && !skip) {
error_setg(errp, "got wrong dev id %d", dev_id);
return -1;
}
- if (vma_reader_get_bitmap(rstate, cluster_num)) {
- error_setg(errp, "found duplicated cluster %zd for stream %s",
- cluster_num, vmar->devinfo[dev_id].devname);
- return -1;
+ if (!skip) {
+ if (vma_reader_get_bitmap(rstate, cluster_num)) {
+ error_setg(errp, "found duplicated cluster %zd for stream %s",
+ cluster_num, vmar->devinfo[dev_id].devname);
+ return -1;
+ }
+ vma_reader_set_bitmap(rstate, cluster_num, 1);
}
- vma_reader_set_bitmap(rstate, cluster_num, 1);
max_sector = vmar->devinfo[dev_id].size/BDRV_SECTOR_SIZE;
} else {
@@ -619,7 +627,7 @@ static int restore_extent(VmaReader *vmar, unsigned char *buf,
return -1;
}
- if (!verify) {
+ if (!verify && !skip) {
int nb_sectors = end_sector - sector_num;
if (restore_write_data(vmar, dev_id, target, vmstate_fd,
buf + start, sector_num, nb_sectors,
@@ -655,7 +663,7 @@ static int restore_extent(VmaReader *vmar, unsigned char *buf,
return -1;
}
- if (!verify) {
+ if (!verify && !skip) {
int nb_sectors = end_sector - sector_num;
if (restore_write_data(vmar, dev_id, target, vmstate_fd,
buf + start, sector_num,
@@ -680,7 +688,7 @@ static int restore_extent(VmaReader *vmar, unsigned char *buf,
vmar->partial_zero_cluster_data += zero_size;
}
- if (rstate->write_zeroes && !verify) {
+ if (rstate->write_zeroes && !verify && !skip) {
if (restore_write_data(vmar, dev_id, target, vmstate_fd,
zero_vma_block, sector_num,
nb_sectors, errp) < 0) {
@@ -851,7 +859,7 @@ int vma_reader_verify(VmaReader *vmar, bool verbose, Error **errp)
for (dev_id = 1; dev_id < 255; dev_id++) {
if (vma_reader_get_device_info(vmar, dev_id)) {
- allocate_rstate(vmar, dev_id, NULL, false);
+ allocate_rstate(vmar, dev_id, NULL, false, false);
}
}
diff --git a/vma.c b/vma.c
index 89440733b1..21e765a469 100644
--- a/vma.c
+++ b/vma.c
@@ -138,6 +138,7 @@ typedef struct RestoreMap {
char *throttling_group;
char *cache;
bool write_zero;
+ bool skip;
} RestoreMap;
static bool try_parse_option(char **line, const char *optname, char **out, const char *inbuf) {
@@ -245,47 +246,61 @@ static int extract_content(int argc, char **argv)
char *bps = NULL;
char *group = NULL;
char *cache = NULL;
+ char *devname = NULL;
+ bool skip = false;
+ uint64_t bps_value = 0;
+ const char *path = NULL;
+ bool write_zero = true;
+
if (!line || line[0] == '\0' || !strcmp(line, "done\n")) {
break;
}
int len = strlen(line);
if (line[len - 1] == '\n') {
line[len - 1] = '\0';
- if (len == 1) {
+ len = len - 1;
+ if (len == 0) {
break;
}
}
- while (1) {
- if (!try_parse_option(&line, "format", &format, inbuf) &&
- !try_parse_option(&line, "throttling.bps", &bps, inbuf) &&
- !try_parse_option(&line, "throttling.group", &group, inbuf) &&
- !try_parse_option(&line, "cache", &cache, inbuf))
- {
- break;
+ if (strncmp(line, "skip", 4) == 0) {
+ if (len < 6 || line[4] != '=') {
+ g_error("read map failed - option 'skip' has no value ('%s')",
+ inbuf);
+ } else {
+ devname = line + 5;
+ skip = true;
+ }
+ } else {
+ while (1) {
+ if (!try_parse_option(&line, "format", &format, inbuf) &&
+ !try_parse_option(&line, "throttling.bps", &bps, inbuf) &&
+ !try_parse_option(&line, "throttling.group", &group, inbuf) &&
+ !try_parse_option(&line, "cache", &cache, inbuf))
+ {
+ break;
+ }
}
- }
- uint64_t bps_value = 0;
- if (bps) {
- bps_value = verify_u64(bps);
- g_free(bps);
- }
+ if (bps) {
+ bps_value = verify_u64(bps);
+ g_free(bps);
+ }
- const char *path;
- bool write_zero;
- if (line[0] == '0' && line[1] == ':') {
- path = line + 2;
- write_zero = false;
- } else if (line[0] == '1' && line[1] == ':') {
- path = line + 2;
- write_zero = true;
- } else {
- g_error("read map failed - parse error ('%s')", inbuf);
+ if (line[0] == '0' && line[1] == ':') {
+ path = line + 2;
+ write_zero = false;
+ } else if (line[0] == '1' && line[1] == ':') {
+ path = line + 2;
+ write_zero = true;
+ } else {
+ g_error("read map failed - parse error ('%s')", inbuf);
+ }
+
+ path = extract_devname(path, &devname, -1);
}
- char *devname = NULL;
- path = extract_devname(path, &devname, -1);
if (!devname) {
g_error("read map failed - no dev name specified ('%s')",
inbuf);
@@ -299,6 +314,7 @@ static int extract_content(int argc, char **argv)
map->throttling_group = group;
map->cache = cache;
map->write_zero = write_zero;
+ map->skip = skip;
g_hash_table_insert(devmap, map->devname, map);
@@ -328,6 +344,7 @@ static int extract_content(int argc, char **argv)
const char *cache = NULL;
int flags = BDRV_O_RDWR;
bool write_zero = true;
+ bool skip = false;
BlockBackend *blk = NULL;
@@ -343,6 +360,7 @@ static int extract_content(int argc, char **argv)
throttling_group = map->throttling_group;
cache = map->cache;
write_zero = map->write_zero;
+ skip = map->skip;
} else {
devfn = g_strdup_printf("%s/tmp-disk-%s.raw",
dirname, di->devname);
@@ -361,57 +379,60 @@ static int extract_content(int argc, char **argv)
write_zero = false;
}
- size_t devlen = strlen(devfn);
- QDict *options = NULL;
- bool writethrough;
- if (format) {
- /* explicit format from commandline */
- options = qdict_new();
- qdict_put_str(options, "driver", format);
- } else if ((devlen > 4 && strcmp(devfn+devlen-4, ".raw") == 0) ||
- strncmp(devfn, "/dev/", 5) == 0)
- {
- /* This part is now deprecated for PVE as well (just as qemu
- * deprecated not specifying an explicit raw format, too.
- */
- /* explicit raw format */
- options = qdict_new();
- qdict_put_str(options, "driver", "raw");
- }
- if (cache && bdrv_parse_cache_mode(cache, &flags, &writethrough)) {
- g_error("invalid cache option: %s\n", cache);
- }
+ if (!skip) {
+ size_t devlen = strlen(devfn);
+ QDict *options = NULL;
+ bool writethrough;
+ if (format) {
+ /* explicit format from commandline */
+ options = qdict_new();
+ qdict_put_str(options, "driver", format);
+ } else if ((devlen > 4 && strcmp(devfn+devlen-4, ".raw") == 0) ||
+ strncmp(devfn, "/dev/", 5) == 0)
+ {
+ /* This part is now deprecated for PVE as well (just as qemu
+ * deprecated not specifying an explicit raw format, too.
+ */
+ /* explicit raw format */
+ options = qdict_new();
+ qdict_put_str(options, "driver", "raw");
+ }
- if (errp || !(blk = blk_new_open(devfn, NULL, options, flags, &errp))) {
- g_error("can't open file %s - %s", devfn,
- error_get_pretty(errp));
- }
+ if (cache && bdrv_parse_cache_mode(cache, &flags, &writethrough)) {
+ g_error("invalid cache option: %s\n", cache);
+ }
- if (cache) {
- blk_set_enable_write_cache(blk, !writethrough);
- }
+ if (errp || !(blk = blk_new_open(devfn, NULL, options, flags, &errp))) {
+ g_error("can't open file %s - %s", devfn,
+ error_get_pretty(errp));
+ }
- if (throttling_group) {
- blk_io_limits_enable(blk, throttling_group);
- }
+ if (cache) {
+ blk_set_enable_write_cache(blk, !writethrough);
+ }
- if (throttling_bps) {
- if (!throttling_group) {
- blk_io_limits_enable(blk, devfn);
+ if (throttling_group) {
+ blk_io_limits_enable(blk, throttling_group);
}
- ThrottleConfig cfg;
- throttle_config_init(&cfg);
- cfg.buckets[THROTTLE_BPS_WRITE].avg = throttling_bps;
- Error *err = NULL;
- if (!throttle_is_valid(&cfg, &err)) {
- error_report_err(err);
- g_error("failed to apply throttling");
+ if (throttling_bps) {
+ if (!throttling_group) {
+ blk_io_limits_enable(blk, devfn);
+ }
+
+ ThrottleConfig cfg;
+ throttle_config_init(&cfg);
+ cfg.buckets[THROTTLE_BPS_WRITE].avg = throttling_bps;
+ Error *err = NULL;
+ if (!throttle_is_valid(&cfg, &err)) {
+ error_report_err(err);
+ g_error("failed to apply throttling");
+ }
+ blk_set_io_limits(blk, &cfg);
}
- blk_set_io_limits(blk, &cfg);
}
- if (vma_reader_register_bs(vmar, i, blk, write_zero, &errp) < 0) {
+ if (vma_reader_register_bs(vmar, i, blk, write_zero, skip, &errp) < 0) {
g_error("%s", error_get_pretty(errp));
}
diff --git a/vma.h b/vma.h
index c895c97f6d..1b62859165 100644
--- a/vma.h
+++ b/vma.h
@@ -142,7 +142,7 @@ GList *vma_reader_get_config_data(VmaReader *vmar);
VmaDeviceInfo *vma_reader_get_device_info(VmaReader *vmar, guint8 dev_id);
int vma_reader_register_bs(VmaReader *vmar, guint8 dev_id,
BlockBackend *target, bool write_zeroes,
- Error **errp);
+ bool skip, Error **errp);
int vma_reader_restore(VmaReader *vmar, int vmstate_fd, bool verbose,
Error **errp);
int vma_reader_verify(VmaReader *vmar, bool verbose, Error **errp);

View File

@@ -0,0 +1,233 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Wolfgang Bumiller <w.bumiller@proxmox.com>
Date: Tue, 26 Apr 2022 16:06:28 +0200
Subject: [PATCH] pbs: namespace support
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
block/monitor/block-hmp-cmds.c | 1 +
block/pbs.c | 25 +++++++++++++++++++++----
pbs-restore.c | 19 ++++++++++++++++---
pve-backup.c | 6 +++++-
qapi/block-core.json | 5 ++++-
5 files changed, 47 insertions(+), 9 deletions(-)
diff --git a/block/monitor/block-hmp-cmds.c b/block/monitor/block-hmp-cmds.c
index 52ddbf95ad..69c868887a 100644
--- a/block/monitor/block-hmp-cmds.c
+++ b/block/monitor/block-hmp-cmds.c
@@ -1041,6 +1041,7 @@ void coroutine_fn hmp_backup(Monitor *mon, const QDict *qdict)
false, NULL, // PBS key_password
false, NULL, // PBS master_keyfile
false, NULL, // PBS fingerprint
+ false, NULL, // PBS backup-ns
false, NULL, // PBS backup-id
false, 0, // PBS backup-time
false, false, // PBS use-dirty-bitmap
diff --git a/block/pbs.c b/block/pbs.c
index c5eb4d5bad..7471e2ef9d 100644
--- a/block/pbs.c
+++ b/block/pbs.c
@@ -14,6 +14,7 @@
#include <proxmox-backup-qemu.h>
#define PBS_OPT_REPOSITORY "repository"
+#define PBS_OPT_NAMESPACE "namespace"
#define PBS_OPT_SNAPSHOT "snapshot"
#define PBS_OPT_ARCHIVE "archive"
#define PBS_OPT_KEYFILE "keyfile"
@@ -27,6 +28,7 @@ typedef struct {
int64_t length;
char *repository;
+ char *namespace;
char *snapshot;
char *archive;
} BDRVPBSState;
@@ -40,6 +42,11 @@ static QemuOptsList runtime_opts = {
.type = QEMU_OPT_STRING,
.help = "The server address and repository to connect to.",
},
+ {
+ .name = PBS_OPT_NAMESPACE,
+ .type = QEMU_OPT_STRING,
+ .help = "Optional: The snapshot's namespace.",
+ },
{
.name = PBS_OPT_SNAPSHOT,
.type = QEMU_OPT_STRING,
@@ -76,7 +83,7 @@ static QemuOptsList runtime_opts = {
// filename format:
-// pbs:repository=<repo>,snapshot=<snap>,password=<pw>,key_password=<kpw>,fingerprint=<fp>,archive=<archive>
+// pbs:repository=<repo>,namespace=<ns>,snapshot=<snap>,password=<pw>,key_password=<kpw>,fingerprint=<fp>,archive=<archive>
static void pbs_parse_filename(const char *filename, QDict *options,
Error **errp)
{
@@ -112,6 +119,7 @@ static int pbs_open(BlockDriverState *bs, QDict *options, int flags,
s->archive = g_strdup(qemu_opt_get(opts, PBS_OPT_ARCHIVE));
const char *keyfile = qemu_opt_get(opts, PBS_OPT_KEYFILE);
const char *password = qemu_opt_get(opts, PBS_OPT_PASSWORD);
+ const char *namespace = qemu_opt_get(opts, PBS_OPT_NAMESPACE);
const char *fingerprint = qemu_opt_get(opts, PBS_OPT_FINGERPRINT);
const char *key_password = qemu_opt_get(opts, PBS_OPT_ENCRYPTION_PASSWORD);
@@ -124,9 +132,12 @@ static int pbs_open(BlockDriverState *bs, QDict *options, int flags,
if (!key_password) {
key_password = getenv("PBS_ENCRYPTION_PASSWORD");
}
+ if (namespace) {
+ s->namespace = g_strdup(namespace);
+ }
/* connect to PBS server in read mode */
- s->conn = proxmox_restore_new(s->repository, s->snapshot, password,
+ s->conn = proxmox_restore_new_ns(s->repository, s->snapshot, s->namespace, password,
keyfile, key_password, fingerprint, &pbs_error);
/* invalidates qemu_opt_get char pointers from above */
@@ -171,6 +182,7 @@ static int pbs_file_open(BlockDriverState *bs, QDict *options, int flags,
static void pbs_close(BlockDriverState *bs) {
BDRVPBSState *s = bs->opaque;
g_free(s->repository);
+ g_free(s->namespace);
g_free(s->snapshot);
g_free(s->archive);
proxmox_restore_disconnect(s->conn);
@@ -252,8 +264,13 @@ static coroutine_fn int pbs_co_pwritev(BlockDriverState *bs,
static void pbs_refresh_filename(BlockDriverState *bs)
{
BDRVPBSState *s = bs->opaque;
- snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s/%s(%s)",
- s->repository, s->snapshot, s->archive);
+ if (s->namespace) {
+ snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s/%s:%s(%s)",
+ s->repository, s->namespace, s->snapshot, s->archive);
+ } else {
+ snprintf(bs->exact_filename, sizeof(bs->exact_filename), "%s/%s(%s)",
+ s->repository, s->snapshot, s->archive);
+ }
}
static const char *const pbs_strong_runtime_opts[] = {
diff --git a/pbs-restore.c b/pbs-restore.c
index 4d3f925a1b..62042bdd93 100644
--- a/pbs-restore.c
+++ b/pbs-restore.c
@@ -30,7 +30,7 @@
static void help(void)
{
const char *help_msg =
- "usage: pbs-restore [--repository <repo>] snapshot archive-name target [command options]\n"
+ "usage: pbs-restore [--repository <repo>] [--ns namespace] snapshot archive-name target [command options]\n"
;
printf("%s", help_msg);
@@ -78,6 +78,7 @@ int main(int argc, char **argv)
Error *main_loop_err = NULL;
const char *format = "raw";
const char *repository = NULL;
+ const char *backup_ns = NULL;
const char *keyfile = NULL;
int verbose = false;
bool skip_zero = false;
@@ -91,6 +92,7 @@ int main(int argc, char **argv)
{"verbose", no_argument, 0, 'v'},
{"format", required_argument, 0, 'f'},
{"repository", required_argument, 0, 'r'},
+ {"ns", required_argument, 0, 'n'},
{"keyfile", required_argument, 0, 'k'},
{0, 0, 0, 0}
};
@@ -111,6 +113,9 @@ int main(int argc, char **argv)
case 'r':
repository = g_strdup(argv[optind - 1]);
break;
+ case 'n':
+ backup_ns = g_strdup(argv[optind - 1]);
+ break;
case 'k':
keyfile = g_strdup(argv[optind - 1]);
break;
@@ -161,8 +166,16 @@ int main(int argc, char **argv)
fprintf(stderr, "connecting to repository '%s'\n", repository);
}
char *pbs_error = NULL;
- ProxmoxRestoreHandle *conn = proxmox_restore_new(
- repository, snapshot, password, keyfile, key_password, fingerprint, &pbs_error);
+ ProxmoxRestoreHandle *conn = proxmox_restore_new_ns(
+ repository,
+ snapshot,
+ backup_ns,
+ password,
+ keyfile,
+ key_password,
+ fingerprint,
+ &pbs_error
+ );
if (conn == NULL) {
fprintf(stderr, "restore failed: %s\n", pbs_error);
return -1;
diff --git a/pve-backup.c b/pve-backup.c
index 9f6c04a512..f6a5f8c785 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -10,6 +10,8 @@
#include "qapi/qmp/qerror.h"
#include "qemu/cutils.h"
+#include <proxmox-backup-qemu.h>
+
/* PVE backup state and related function */
/*
@@ -535,6 +537,7 @@ UuidInfo coroutine_fn *qmp_backup(
bool has_key_password, const char *key_password,
bool has_master_keyfile, const char *master_keyfile,
bool has_fingerprint, const char *fingerprint,
+ bool has_backup_ns, const char *backup_ns,
bool has_backup_id, const char *backup_id,
bool has_backup_time, int64_t backup_time,
bool has_use_dirty_bitmap, bool use_dirty_bitmap,
@@ -674,8 +677,9 @@ UuidInfo coroutine_fn *qmp_backup(
firewall_name = "fw.conf";
char *pbs_err = NULL;
- pbs = proxmox_backup_new(
+ pbs = proxmox_backup_new_ns(
backup_file,
+ has_backup_ns ? backup_ns : NULL,
backup_id,
backup_time,
dump_cb_block_size,
diff --git a/qapi/block-core.json b/qapi/block-core.json
index d089328a1f..705f0c97ba 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -821,6 +821,8 @@
#
# @fingerprint: server cert fingerprint (optional for format 'pbs')
#
+# @backup-ns: backup namespace (required for format 'pbs')
+#
# @backup-id: backup ID (required for format 'pbs')
#
# @backup-time: backup timestamp (Unix epoch, required for format 'pbs')
@@ -840,6 +842,7 @@
'*key-password': 'str',
'*master-keyfile': 'str',
'*fingerprint': 'str',
+ '*backup-ns': 'str',
'*backup-id': 'str',
'*backup-time': 'int',
'*use-dirty-bitmap': 'bool',
@@ -3236,7 +3239,7 @@
{ 'struct': 'BlockdevOptionsPbs',
'data': { 'repository': 'str', 'snapshot': 'str', 'archive': 'str',
'*keyfile': 'str', '*password': 'str', '*fingerprint': 'str',
- '*key_password': 'str' } }
+ '*key_password': 'str', '*namespace': 'str' } }
##
# @BlockdevOptionsNVMe:

View File

@@ -0,0 +1,161 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Tue, 17 May 2022 09:46:02 +0200
Subject: [PATCH] Revert "block/rbd: implement bdrv_co_block_status"
During backup, bdrv_co_block_status is called for each block copy
chunk. When RBD is used, the current implementation with
rbd_diff_iterate2() using whole_object=true takes about linearly more
time, depending on the image size. Since there are linearly more
chunks, the slowdown is quadratic, becoming unacceptable for large
images (starting somewhere between 500-1000 GiB in my testing).
This reverts commit 0347a8fd4c3faaedf119be04c197804be40a384b as a
stop-gap measure, until it's clear how to make the implemenation
more efficient.
Upstream bug report:
https://gitlab.com/qemu-project/qemu/-/issues/1026
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
block/rbd.c | 112 ----------------------------------------------------
1 file changed, 112 deletions(-)
diff --git a/block/rbd.c b/block/rbd.c
index a4b8fb482c..3393b06a4e 100644
--- a/block/rbd.c
+++ b/block/rbd.c
@@ -97,12 +97,6 @@ typedef struct RBDTask {
int64_t ret;
} RBDTask;
-typedef struct RBDDiffIterateReq {
- uint64_t offs;
- uint64_t bytes;
- bool exists;
-} RBDDiffIterateReq;
-
static int qemu_rbd_connect(rados_t *cluster, rados_ioctx_t *io_ctx,
BlockdevOptionsRbd *opts, bool cache,
const char *keypairs, const char *secretid,
@@ -1267,111 +1261,6 @@ static ImageInfoSpecific *qemu_rbd_get_specific_info(BlockDriverState *bs,
return spec_info;
}
-/*
- * rbd_diff_iterate2 allows to interrupt the exection by returning a negative
- * value in the callback routine. Choose a value that does not conflict with
- * an existing exitcode and return it if we want to prematurely stop the
- * execution because we detected a change in the allocation status.
- */
-#define QEMU_RBD_EXIT_DIFF_ITERATE2 -9000
-
-static int qemu_rbd_diff_iterate_cb(uint64_t offs, size_t len,
- int exists, void *opaque)
-{
- RBDDiffIterateReq *req = opaque;
-
- assert(req->offs + req->bytes <= offs);
- /*
- * we do not diff against a snapshot so we should never receive a callback
- * for a hole.
- */
- assert(exists);
-
- if (!req->exists && offs > req->offs) {
- /*
- * we started in an unallocated area and hit the first allocated
- * block. req->bytes must be set to the length of the unallocated area
- * before the allocated area. stop further processing.
- */
- req->bytes = offs - req->offs;
- return QEMU_RBD_EXIT_DIFF_ITERATE2;
- }
-
- if (req->exists && offs > req->offs + req->bytes) {
- /*
- * we started in an allocated area and jumped over an unallocated area,
- * req->bytes contains the length of the allocated area before the
- * unallocated area. stop further processing.
- */
- return QEMU_RBD_EXIT_DIFF_ITERATE2;
- }
-
- req->bytes += len;
- req->exists = true;
-
- return 0;
-}
-
-static int coroutine_fn qemu_rbd_co_block_status(BlockDriverState *bs,
- bool want_zero, int64_t offset,
- int64_t bytes, int64_t *pnum,
- int64_t *map,
- BlockDriverState **file)
-{
- BDRVRBDState *s = bs->opaque;
- int status, r;
- RBDDiffIterateReq req = { .offs = offset };
- uint64_t features, flags;
-
- assert(offset + bytes <= s->image_size);
-
- /* default to all sectors allocated */
- status = BDRV_BLOCK_DATA | BDRV_BLOCK_OFFSET_VALID;
- *map = offset;
- *file = bs;
- *pnum = bytes;
-
- /* check if RBD image supports fast-diff */
- r = rbd_get_features(s->image, &features);
- if (r < 0) {
- return status;
- }
- if (!(features & RBD_FEATURE_FAST_DIFF)) {
- return status;
- }
-
- /* check if RBD fast-diff result is valid */
- r = rbd_get_flags(s->image, &flags);
- if (r < 0) {
- return status;
- }
- if (flags & RBD_FLAG_FAST_DIFF_INVALID) {
- return status;
- }
-
- r = rbd_diff_iterate2(s->image, NULL, offset, bytes, true, true,
- qemu_rbd_diff_iterate_cb, &req);
- if (r < 0 && r != QEMU_RBD_EXIT_DIFF_ITERATE2) {
- return status;
- }
- assert(req.bytes <= bytes);
- if (!req.exists) {
- if (r == 0) {
- /*
- * rbd_diff_iterate2 does not invoke callbacks for unallocated
- * areas. This here catches the case where no callback was
- * invoked at all (req.bytes == 0).
- */
- assert(req.bytes == 0);
- req.bytes = bytes;
- }
- status = BDRV_BLOCK_ZERO | BDRV_BLOCK_OFFSET_VALID;
- }
-
- *pnum = req.bytes;
- return status;
-}
-
static int64_t qemu_rbd_getlength(BlockDriverState *bs)
{
BDRVRBDState *s = bs->opaque;
@@ -1607,7 +1496,6 @@ static BlockDriver bdrv_rbd = {
#ifdef LIBRBD_SUPPORTS_WRITE_ZEROES
.bdrv_co_pwrite_zeroes = qemu_rbd_co_pwrite_zeroes,
#endif
- .bdrv_co_block_status = qemu_rbd_co_block_status,
.bdrv_snapshot_create = qemu_rbd_snap_create,
.bdrv_snapshot_delete = qemu_rbd_snap_remove,

View File

@@ -0,0 +1,59 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Wed, 25 May 2022 13:59:37 +0200
Subject: [PATCH] PVE-Backup: create jobs: correctly cancel in error scenario
The first call to job_cancel_sync() will cancel and free all jobs in
the transaction, so ensure that it's called only once and get rid of
the job_unref() that would operate on freed memory.
It's also necessary to NULL backup_state.pbs in the error scenario,
because a subsequent backup_cancel QMP call (as happens in PVE when
the backup QMP command fails) would try to call proxmox_backup_abort()
and run into a segfault.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
pve-backup.c | 10 ++++++++--
1 file changed, 8 insertions(+), 2 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index f6a5f8c785..5bed6f4014 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -506,6 +506,11 @@ static void create_backup_jobs_bh(void *opaque) {
}
if (*errp) {
+ /*
+ * It's enough to cancel one job in the transaction, the rest will
+ * follow automatically.
+ */
+ bool canceled = false;
l = backup_state.di_list;
while (l) {
PVEBackupDevInfo *di = (PVEBackupDevInfo *)l->data;
@@ -516,12 +521,12 @@ static void create_backup_jobs_bh(void *opaque) {
di->target = NULL;
}
- if (di->job) {
+ if (!canceled && di->job) {
AioContext *ctx = di->job->job.aio_context;
aio_context_acquire(ctx);
job_cancel_sync(&di->job->job, true);
- job_unref(&di->job->job);
aio_context_release(ctx);
+ canceled = true;
}
}
}
@@ -947,6 +952,7 @@ err:
if (pbs) {
proxmox_backup_disconnect(pbs);
+ backup_state.pbs = NULL;
}
if (backup_dir) {

View File

@@ -0,0 +1,76 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Wed, 25 May 2022 13:59:38 +0200
Subject: [PATCH] PVE-Backup: ensure jobs in di_list are referenced
Ensures that qmp_backup_cancel doesn't pick a job that's already been
freed. With unlucky timings it seems possible that:
1. job_exit -> job_completed -> job_finalize_single starts
2. pvebackup_co_complete_stream gets spawned in completion callback
3. job finalize_single finishes -> job's refcount hits zero -> job is
freed
4. qmp_backup_cancel comes in and locks backup_state.backup_mutex
before pvebackup_co_complete_stream can remove the job from the
di_list
5. qmp_backup_cancel will pick a job that's already been freed
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
pve-backup.c | 25 ++++++++++++++++++++-----
1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 5bed6f4014..0c34428713 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -316,6 +316,14 @@ static void coroutine_fn pvebackup_co_complete_stream(void *opaque)
}
}
+ if (di->job) {
+ AioContext *ctx = di->job->job.aio_context;
+ aio_context_acquire(ctx);
+ job_unref(&di->job->job);
+ di->job = NULL;
+ aio_context_release(ctx);
+ }
+
// remove self from job list
backup_state.di_list = g_list_remove(backup_state.di_list, di);
@@ -491,9 +499,12 @@ static void create_backup_jobs_bh(void *opaque) {
bitmap_mode, false, NULL, &perf, BLOCKDEV_ON_ERROR_REPORT, BLOCKDEV_ON_ERROR_REPORT,
JOB_DEFAULT, pvebackup_complete_cb, di, backup_state.txn, &local_err);
- aio_context_release(aio_context);
-
di->job = job;
+ if (job) {
+ job_ref(&job->job);
+ }
+
+ aio_context_release(aio_context);
if (!job || local_err) {
error_setg(errp, "backup_job_create failed: %s",
@@ -521,12 +532,16 @@ static void create_backup_jobs_bh(void *opaque) {
di->target = NULL;
}
- if (!canceled && di->job) {
+ if (di->job) {
AioContext *ctx = di->job->job.aio_context;
aio_context_acquire(ctx);
- job_cancel_sync(&di->job->job, true);
+ if (!canceled) {
+ job_cancel_sync(&di->job->job, true);
+ canceled = true;
+ }
+ job_unref(&di->job->job);
+ di->job = NULL;
aio_context_release(ctx);
- canceled = true;
}
}
}

View File

@@ -0,0 +1,120 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Wed, 25 May 2022 13:59:39 +0200
Subject: [PATCH] PVE-Backup: avoid segfault issues upon backup-cancel
When canceling a backup in PVE via a signal it's easy to run into a
situation where the job is already failing when the backup_cancel QMP
command comes in. With a bit of unlucky timing on top, it can happen
that job_exit() runs between schedulung of job_cancel_bh() and
execution of job_cancel_bh(). But job_cancel_sync() does not expect
that the job is already finalized (in fact, the job might've been
freed already, but even if it isn't, job_cancel_sync() would try to
deref job->txn which would be NULL at that point).
It is not possible to simply use the job_cancel() (which is advertised
as being async but isn't in all cases) in qmp_backup_cancel() for the
same reason job_cancel_sync() cannot be used. Namely, because it can
invoke job_finish_sync() (which uses AIO_WAIT_WHILE and thus hangs if
called from a coroutine). This happens when there's multiple jobs in
the transaction and job->deferred_to_main_loop is true (is set before
scheduling job_exit()) or if the job was not started yet.
Fix the issue by selecting the job to cancel in job_cancel_bh() itself
using the first job that's not completed yet. This is not necessarily
the first job in the list, because pvebackup_co_complete_stream()
might not yet have removed a completed job when job_cancel_bh() runs.
An alternative would be to continue using only the first job and
checking against JOB_STATUS_CONCLUDED or JOB_STATUS_NULL to decide if
it's still necessary and possible to cancel, but the approach with
using the first non-completed job seemed more robust.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
Signed-off-by: Wolfgang Bumiller <w.bumiller@proxmox.com>
---
pve-backup.c | 61 +++++++++++++++++++++++++++++++++-------------------
1 file changed, 39 insertions(+), 22 deletions(-)
diff --git a/pve-backup.c b/pve-backup.c
index 0c34428713..2e22030eec 100644
--- a/pve-backup.c
+++ b/pve-backup.c
@@ -355,15 +355,42 @@ static void pvebackup_complete_cb(void *opaque, int ret)
/*
* job_cancel(_sync) does not like to be called from coroutines, so defer to
- * main loop processing via a bottom half.
+ * main loop processing via a bottom half. Assumes that caller holds
+ * backup_mutex.
*/
static void job_cancel_bh(void *opaque) {
CoCtxData *data = (CoCtxData*)opaque;
- Job *job = (Job*)data->data;
- AioContext *job_ctx = job->aio_context;
- aio_context_acquire(job_ctx);
- job_cancel_sync(job, true);
- aio_context_release(job_ctx);
+
+ /*
+ * Be careful to pick a valid job to cancel:
+ * 1. job_cancel_sync() does not expect the job to be finalized already.
+ * 2. job_exit() might run between scheduling and running job_cancel_bh()
+ * and pvebackup_co_complete_stream() might not have removed the job from
+ * the list yet (in fact, cannot, because it waits for the backup_mutex).
+ * Requiring !job_is_completed() ensures that no finalized job is picked.
+ */
+ GList *bdi = g_list_first(backup_state.di_list);
+ while (bdi) {
+ if (bdi->data) {
+ BlockJob *bj = ((PVEBackupDevInfo *)bdi->data)->job;
+ if (bj) {
+ Job *job = &bj->job;
+ if (!job_is_completed(job)) {
+ AioContext *job_ctx = job->aio_context;
+ aio_context_acquire(job_ctx);
+ job_cancel_sync(job, true);
+ aio_context_release(job_ctx);
+ /*
+ * It's enough to cancel one job in the transaction, the
+ * rest will follow automatically.
+ */
+ break;
+ }
+ }
+ }
+ bdi = g_list_next(bdi);
+ }
+
aio_co_enter(data->ctx, data->co);
}
@@ -384,22 +411,12 @@ void coroutine_fn qmp_backup_cancel(Error **errp)
proxmox_backup_abort(backup_state.pbs, "backup canceled");
}
- /* it's enough to cancel one job in the transaction, the rest will follow
- * automatically */
- GList *bdi = g_list_first(backup_state.di_list);
- BlockJob *cancel_job = bdi && bdi->data ?
- ((PVEBackupDevInfo *)bdi->data)->job :
- NULL;
-
- if (cancel_job) {
- CoCtxData data = {
- .ctx = qemu_get_current_aio_context(),
- .co = qemu_coroutine_self(),
- .data = &cancel_job->job,
- };
- aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
- qemu_coroutine_yield();
- }
+ CoCtxData data = {
+ .ctx = qemu_get_current_aio_context(),
+ .co = qemu_coroutine_self(),
+ };
+ aio_bh_schedule_oneshot(data.ctx, job_cancel_bh, &data);
+ qemu_coroutine_yield();
qemu_co_mutex_unlock(&backup_state.backup_mutex);
}

View File

@@ -0,0 +1,57 @@
From 0000000000000000000000000000000000000000 Mon Sep 17 00:00:00 2001
From: Fabian Ebner <f.ebner@proxmox.com>
Date: Wed, 22 Jun 2022 10:45:11 +0200
Subject: [PATCH] vma: create: support 64KiB-unaligned input images
which fixes backing up templates with such disks in PVE, for example
efitype=4m EFI disks on a file-based storage (size = 540672).
If there is not enough left to read, blk_co_preadv will return -EIO,
so limit the size in the last iteration.
For writing, an unaligned end is already handled correctly.
The call to memset is not strictly necessary, because writing also
checks that it doesn't write data beyond the end of the image. But
there are two reasons to do it:
1. It's cleaner that way.
2. It allows detecting when the final piece is all zeroes, which might
not happen if the buffer still contains data from the previous
iteration.
Signed-off-by: Fabian Ebner <f.ebner@proxmox.com>
---
vma.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/vma.c b/vma.c
index 21e765a469..6d02b29047 100644
--- a/vma.c
+++ b/vma.c
@@ -548,7 +548,7 @@ static void coroutine_fn backup_run(void *opaque)
struct iovec iov;
QEMUIOVector qiov;
- int64_t start, end;
+ int64_t start, end, readlen;
int ret = 0;
unsigned char *buf = blk_blockalign(job->target, VMA_CLUSTER_SIZE);
@@ -562,8 +562,16 @@ static void coroutine_fn backup_run(void *opaque)
iov.iov_len = VMA_CLUSTER_SIZE;
qemu_iovec_init_external(&qiov, &iov, 1);
+ if (start + 1 == end) {
+ memset(buf, 0, VMA_CLUSTER_SIZE);
+ readlen = job->len - start * VMA_CLUSTER_SIZE;
+ assert(readlen > 0 && readlen <= VMA_CLUSTER_SIZE);
+ } else {
+ readlen = VMA_CLUSTER_SIZE;
+ }
+
ret = blk_co_preadv(job->target, start * VMA_CLUSTER_SIZE,
- VMA_CLUSTER_SIZE, &qiov, 0);
+ readlen, &qiov, 0);
if (ret < 0) {
vma_writer_set_error(job->vmaw, "read error", -1);
goto out;

Some files were not shown because too many files have changed in this diff Show More